Thursday, December 29, 2016

Progress

I'm going to lay out my current goals more specifically in this post because I think that helps them be more achievable. While these goals will not pass the Lovelace 2.0 test by themselves, I think they are a good first step.

So a first question: assuming that we can interpret human language into a reasonable form, how do we design an algorithm that can create something that meets arbitrary constraints?

The truth is that we can't without background knowledge. If I say to someone who has no baking experience "please bake me a pie", then without the internet or help from others they would not know what to do. Even given a perfect example of a pie, they can't really recreate that pie without knowing some of the process that went into its creation.

One way to look at this issue is through the lens of "one-shot learning".

In one-shot learning, the goal is to make a classifier that, for example, can distinguish images of a cat from images of anything else. The tricky part is that the learner only gets one picture of a cat and must then make all future decisions using only that picture. Once it is done "training", it is given more images to classify as either containing a cat or not, receives no feedback about how it did, and is not allowed to use those new images to change the way it thinks about the world.

There are some simple heuristics one could use. For example, try some fancy image similarity measure and then choose a threshold of "how similar" a picture must be to the cat picture in order to be classified as containing a cat. While this might do decently, the problem is genuinely difficult, mostly because the typical approaches scientists use to classify things don't apply here.
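To make the heuristic concrete, here is a minimal sketch. The "fancy similarity measure" is stood in for by plain cosine similarity on normalized pixels, and the threshold value is arbitrary; a real system would use something much more robust.

```python
import numpy as np

def image_features(img):
    # Stand-in for a real similarity measure: flatten and normalize pixels.
    v = np.asarray(img, dtype=float).ravel()
    return v / (np.linalg.norm(v) + 1e-9)

def make_one_shot_classifier(cat_image, threshold=0.95):
    # Remember the feature vector of the single training example.
    reference = image_features(cat_image)
    def classify(img):
        # Cosine similarity against the one stored cat picture.
        return float(image_features(img) @ reference) >= threshold
    return classify

# Toy 2x2 "images": one cat example, one near-duplicate, one very different.
cat = [[0.9, 0.8], [0.7, 0.9]]
classify = make_one_shot_classifier(cat)
print(classify([[0.9, 0.8], [0.7, 0.85]]))  # True: near-duplicate of the example
print(classify([[0.0, 1.0], [0.0, 0.0]]))   # False: nothing like the example
```

The weakness is exactly the one described below: raw pixel similarity knows nothing about what makes a cat a cat.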

Specifically, one usually classifies these images by giving an algorithm thousands of examples of images with and without cats, along with labels saying which is which. The algorithm then builds a decent understanding of what it "means to look like a cat" and can do pretty well. Yet with only one or two examples these models don't work well at all.

The typical method used to get around this is known as "transfer learning". The idea is that you first train an algorithm on data that is easily obtainable, say, distinguishing images that contain an animal from those that don't. You pick multiple categories like this (does the animal have fur, is the animal larger than a microwave oven, etc.) and train an algorithm to do well at all of them. Then, when given an image of a cat, the algorithm can say "okay, this has paws, fur, a tail, eyes, ears", and so on, and it can distinguish future pictures based on whether or not they contain those characteristics as well. It isn't perfect, but it does much better than most heuristics and allows one to use standard machine learning techniques.

A natural extension to classification is generation: instead of saying "tell me whether or not this picture has a cat in it", I say "give me a new picture that has a cat in it". These are known as generative models: rather than learning a boundary between classes, they learn the underlying distribution of the data and can then sample new examples from it.
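As a minimal sketch of the generative idea: fit a distribution to the training examples, then draw new samples from it. A single Gaussian is about the simplest possible generative model; real ones (VAEs, GANs, and the like) learn far richer distributions, but the principle is the same.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_gaussian(examples):
    # "Train": estimate a per-dimension mean and spread from the data.
    X = np.asarray(examples, dtype=float)
    return X.mean(axis=0), X.std(axis=0) + 1e-9

def generate(mean, std, n=1):
    # "Create": sample brand-new examples from the learned distribution.
    return rng.normal(mean, std, size=(n, mean.shape[0]))

examples = [[0.9, 0.8], [0.8, 0.9], [0.85, 0.85]]
mean, std = fit_gaussian(examples)
samples = generate(mean, std, n=5)  # five new "images" near the training data
```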


Intro and goals

There is this notion of computers "creating something". It is in some sense at the heart of artificial intelligence, but it is a little tangential to what people typically consider artificial intelligence.

The goal of this blog is to document my understanding of how to design an algorithm to pass the Lovelace 2.0 test. Sorta like the Turing Test, it's a test such that an algorithm that passes it can be considered creative. It goes as follows:

To pass the test, an artificial agent $a$ is challenged as follows:

- $a$ must create an artifact $o$ of type $t$
- $o$ must conform to a set of constraints $C$ where $c_i \in C$ is any criterion expressible in natural language
- A human evaluator $h$, having chosen $t$ and $C$, is satisfied that $o$ is a valid instance of $t$ and meets $C$.
- A human referee $r$ determines the combination of $t$ and $C$ to not be unrealistic for an average human.
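The test above can be sketched as a data structure plus a check. Everything here (the names `LovelaceChallenge`, `passes`, and so on) is my own framing, not notation from the test's authors; the human evaluator and referee are modeled as callables.

```python
from dataclasses import dataclass
from typing import Callable, List

Artifact = object  # whatever the agent produces: an image, a poem, a pie recipe...

@dataclass
class LovelaceChallenge:
    artifact_type: str                         # t
    constraints: List[str]                     # C: criteria in natural language
    evaluator: Callable[[Artifact], bool]      # h: valid instance of t meeting C?
    referee: Callable[[str, List[str]], bool]  # r: realistic for an average human?

def passes(agent: Callable[[str, List[str]], Artifact],
           challenge: LovelaceChallenge) -> bool:
    if not challenge.referee(challenge.artifact_type, challenge.constraints):
        return False  # an unrealistic combination of t and C doesn't count
    artifact = agent(challenge.artifact_type, challenge.constraints)
    return challenge.evaluator(artifact)
```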

There are certainly some flaws in this test. The main one that I think is worth addressing is that it is not "open ended" - it is not judging creativity, it is judging someone's ability to be a commissioned artist. A human who is considered creative usually makes things they weren't asked to make.

However, I think that if an algorithm can pass this test reasonably well, then we are close to being open ended. Instead of being given a task, the algorithm could, say, generate a random task and then complete it. It could do this many times, use some reasonable method of predicting whether humans will think the results are good, and then share the good ones. This has a very similar effect to what an outside observer watching an artist might see, so while not exactly the same, I'd say it is very close.
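That loop can be sketched in a few lines. The three functions passed in (`generate_task`, `solve`, `predicted_human_score`) are hypothetical placeholders for the hard parts; the loop itself just wires them together.

```python
import random

def open_ended_loop(generate_task, solve, predicted_human_score,
                    rounds=100, keep_threshold=0.8):
    # Generate tasks, attempt them, and keep only the artifacts that a
    # quality model predicts humans will like.
    portfolio = []
    for _ in range(rounds):
        task = generate_task()
        artifact = solve(task)
        if predicted_human_score(artifact) >= keep_threshold:
            portfolio.append((task, artifact))
    return portfolio

# Toy usage: tasks are random digits, "solving" doubles them, and the
# quality model only likes artifacts of at least 10.
random.seed(0)
portfolio = open_ended_loop(
    generate_task=lambda: random.randint(0, 9),
    solve=lambda t: t * 2,
    predicted_human_score=lambda a: 1.0 if a >= 10 else 0.0,
    rounds=50,
)
```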

Now that that goal is set out it's time to make something that passes it :)