Thursday, December 29, 2016

Progress

I'm going to lay out my current goals more specifically in this post because I think that makes them more achievable. While these goals will not pass the Lovelace 2.0 test by themselves, I think they are a good first step.

So a first question: assuming that we can interpret human language into a reasonable form, how do we design an algorithm that can create something that meets arbitrary constraints?

The truth is that we can't without background knowledge. If I say to someone who has no baking experience "please bake me a pie", then without the internet or anyone to ask, they would not know what to do. They might have a perfect example of a pie given to them, but they can't really recreate that pie without knowing something about the process that went into its creation.

One way to look at this issue is through the lens of "one-shot learning".

In one-shot learning, the goal is to make a classifier that, for example, can distinguish between images of a cat and images of something else. The tricky part is that the learner only gets one picture of a cat and must then make all future decisions using only that picture. Once it is done "training", it is given more images to classify as either having a cat or not, and it gets no feedback about how it did. It is also not allowed to use those new images to change the way it thinks about the world.

There are some simple heuristics one could use. For example, try some fancy image similarity measure and then choose a threshold for "how similar" a picture must be to the cat picture in order to be classified as having a cat. While this might do decently, the problem remains difficult, mostly because the typical approaches scientists use to classify things don't apply here.
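As a rough sketch of that heuristic (using short made-up feature vectors as a stand-in for real images, and an arbitrary distance threshold):

```python
import numpy as np

# Toy stand-in for images: short feature vectors instead of real pixels.
# The single cat example is all the classifier ever gets to "train" on.
cat_example = np.array([0.9, 0.8, 0.1, 0.2])

def is_cat(image, reference=cat_example, threshold=0.5):
    """Heuristic one-shot classifier: 'cat' if the image is close enough
    (in Euclidean distance) to the single reference picture."""
    return bool(np.linalg.norm(image - reference) < threshold)

# New images to classify; no feedback, no updating of the reference.
print(is_cat(np.array([0.8, 0.7, 0.2, 0.2])))  # close to the example
print(is_cat(np.array([0.1, 0.2, 0.9, 0.8])))  # far from the example
```

The threshold is the weak point: set it too tight and every slightly different cat is rejected, too loose and everything passes.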

Specifically, the usual way to classify these images is to give an algorithm thousands of examples of images with and without cats, telling it which are which. The algorithm then has a decent understanding of what it "means to look like a cat" and can do pretty well. Yet with only one or two examples, these models don't work well at all.
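To see why the standard approach needs so much data, here is a minimal sketch: a tiny perceptron trained on thousands of labeled toy examples (clusters of made-up feature vectors standing in for real images).

```python
import numpy as np

rng = np.random.default_rng(0)

# Thousands of labeled toy examples: "cat" features cluster around one
# point, "not cat" features around another.
cats = rng.normal(loc=[1.0, 1.0], scale=0.3, size=(1000, 2))
others = rng.normal(loc=[-1.0, -1.0], scale=0.3, size=(1000, 2))
X = np.vstack([cats, others])
y = np.array([1] * 1000 + [0] * 1000)

# A tiny perceptron: with plenty of labeled data it can fit a boundary.
w, b = np.zeros(2), 0.0
for _ in range(10):
    for xi, yi in zip(X, y):
        pred = 1 if xi @ w + b > 0 else 0
        w += 0.1 * (yi - pred) * xi
        b += 0.1 * (yi - pred)

accuracy = np.mean([(1 if xi @ w + b > 0 else 0) == yi for xi, yi in zip(X, y)])
print(f"training accuracy: {accuracy:.2f}")
```

With 2000 examples the boundary is easy to find; hand the same learner two examples and there is nothing to fit.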

The typical method used to get around this is known as "transfer learning". The idea is that you train an algorithm beforehand to, say, distinguish images that contain an animal from images that don't, because that data is easily obtainable. You pick multiple categories like this (does the animal have fur, is the animal larger than a microwave oven, etc.) and train an algorithm to do well at all of them. Then when given an image of a cat, the algorithm can say "okay, this has paws, fur, a tail, eyes, ears", and so on, and it can distinguish future pictures based on whether or not they contain those characteristics as well. It isn't perfect, but it does much better than most heuristics and allows one to use standard machine learning techniques.
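A minimal sketch of that idea, with heavy assumptions: the "pretrained attribute detectors" here are just fixed random linear scorers (in reality they would be networks trained on abundant labeled data), and the raw "images" are short vectors. The one-shot comparison then happens in the transferred attribute space rather than in raw pixel space.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical pretrained "attribute detectors" (has fur, has paws, has a
# tail, ...), modeled as fixed linear scorers over raw image features.
detectors = rng.normal(size=(5, 8))  # 5 attributes, 8 raw features

def embed(image):
    """Map a raw image vector into attribute space via the detectors."""
    return detectors @ image

cat_image = rng.normal(size=8)  # the single cat example
cat_embedding = embed(cat_image)

def is_cat(image, threshold=2.0):
    """One-shot decision made in the transferred attribute space."""
    return bool(np.linalg.norm(embed(image) - cat_embedding) < threshold)

print(is_cat(cat_image + 0.01 * rng.normal(size=8)))  # slight variation
```

The point is that distances in attribute space ("how catlike are its parts") are far more meaningful than distances between raw pixels, so the single example goes much further.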

A natural extension of classification is generation: instead of saying "tell me whether or not this picture has a cat in it", I say "give me a new picture that has a cat in it". These are known as generative models: rather than learning a decision boundary between classes, they learn an approximation of the distribution the training images came from, so that sampling from the model produces new, plausible examples.
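Real image generators are far more sophisticated, but the fit-then-sample idea can be shown in miniature: fit a simple distribution (here just a Gaussian, over toy feature vectors) to the training data, then draw brand-new samples from it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Training data: toy "cat feature" vectors (stand-ins for real images).
data = rng.normal(loc=[2.0, -1.0], scale=0.5, size=(500, 2))

# A very simple generative model: fit a Gaussian to the data...
mean = data.mean(axis=0)
std = data.std(axis=0)

# ...then generate new samples from the fitted distribution.
new_samples = rng.normal(loc=mean, scale=std, size=(3, 2))
print(new_samples)  # new points that resemble, but do not copy, the data
```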
