August 07, 2018
There is an important technique in deep learning called transfer learning: fine-tuning a pretrained network on new data to repurpose it. This greatly reduces both computation cost and data requirements. Consider a business problem where you need a machine to recognize SpongeBob SquarePants characters. How could you quickly tackle this pressing issue?
VGG19 is a very deep convolutional network for image recognition: a 19-layer network trained by the University of Oxford for the 2014 ImageNet Challenge. It can classify 1,000 different objects, so it's a perfect baseline for our task. The following is a diagram of VGG19's architecture:
Let's see how well it does at recognizing an image of SpongeBob SquarePants.
The network's top guess for the image came in at only 17.6656% confidence. As expected, it fails because it wasn't trained on SpongeBob SquarePants.
To apply transfer learning, we need to perform the following steps:
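In outline: load the pretrained network, freeze the early layers, replace the classifier with our own, and retrain on the new data. A minimal Keras sketch of that workflow (the framework, function name, and head layer sizes here are my illustrative assumptions, not a definitive implementation):

```python
from tensorflow.keras.applications import VGG19
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model

def build_transfer_model(num_classes=3, n_frozen=5, weights="imagenet"):
    # Step 1: load VGG19 pretrained on ImageNet, without its 1000-way top.
    base = VGG19(weights=weights, include_top=False, input_shape=(224, 224, 3))
    # Step 2: freeze the early layers so their weights stay fixed.
    for layer in base.layers[:n_frozen]:
        layer.trainable = False
    # Step 3: add a new classification head for our own classes.
    x = Flatten()(base.output)
    x = Dense(256, activation="relu")(x)
    out = Dense(num_classes, activation="softmax")(x)
    model = Model(inputs=base.input, outputs=out)
    # Step 4: compile; fine-tuning then happens with model.fit(...).
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Pass `weights=None` to skip the ImageNet download while experimenting with the architecture.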
You can find my code and data for this on my GitHub.
I built a very small dataset for this task. It consisted of 3 characters (classes):
For each class, I had 31 images: 27 for training, 3 for validation, and 1 for final prediction.
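The 27/3/1 split per class can be sketched in plain Python (the function name and file naming are illustrative; my actual directory layout is in the GitHub repo):

```python
import random

def split_class(image_paths, seed=42):
    """Split one class's 31 images into train/validation/prediction sets."""
    assert len(image_paths) == 31
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)  # deterministic shuffle for a fixed seed
    return {
        "train": paths[:27],         # 27 images for training
        "validation": paths[27:30],  # 3 images for validation
        "predict": paths[30:],       # 1 held-out image for the final prediction
    }
```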
I used the VGG19 network for this task. The objects this network was trained to recognize are real objects, not cartoons, so I wanted to see whether it could generalize to screenshots of cartoon characters.
Freezing layers means the network will not update a given set of layers during training. We may freeze many of the layers if we don't have sufficient data, or if those layers are already well trained on a useful set of features. It took a few tries to find the right number of layers to freeze. My combinations were:
My best result was freezing the first five layers.
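In Keras, freezing is just setting `trainable = False` on a layer, and each combination can be sanity-checked by counting the parameters that will still be updated (a sketch; `weights=None` avoids the ImageNet download, use `weights="imagenet"` for the real fine-tuning):

```python
from tensorflow.keras.applications import VGG19

def freeze_first_n(model, n):
    """Mark the first n layers as non-trainable; the rest keep training."""
    for layer in model.layers[:n]:
        layer.trainable = False
    return model

def trainable_params(model):
    """Count the parameters that will still be updated during training."""
    return sum(layer.count_params() for layer in model.layers if layer.trainable)

base = freeze_first_n(VGG19(weights=None, include_top=False), 5)
print(trainable_params(base))
```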
The last few layers of the VGG19 network classify images into its 1,000 classes. We need to rip these out and add our own. Mine were as follows:
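With `include_top=False`, Keras drops VGG19's original classifier, and a fresh head goes on top. One plausible head for the 3 classes might look like this (the layer sizes and dropout rate are illustrative assumptions, not necessarily the exact layers I used):

```python
from tensorflow.keras.applications import VGG19
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.models import Model

def add_classification_head(base, num_classes=3):
    """Attach a new classifier for our own classes on top of the base network."""
    x = Flatten()(base.output)
    x = Dense(256, activation="relu")(x)
    x = Dropout(0.5)(x)  # guard against overfitting on a tiny dataset
    out = Dense(num_classes, activation="softmax")(x)
    return Model(inputs=base.input, outputs=out)

base = VGG19(weights=None, include_top=False, input_shape=(224, 224, 3))
model = add_classification_head(base)
```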
Training the model was very easy. Because most of the work was already done, I was able to train all of the freezing combinations above in under 30 minutes. I used a batch size of 16 and trained until accuracy stopped increasing.
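"Trained until accuracy stopped increasing" maps naturally onto Keras's EarlyStopping callback. A sketch, assuming images live in class-subdirectory folders (the directory names and patience value are my assumptions):

```python
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.preprocessing.image import ImageDataGenerator

BATCH_SIZE = 16  # the batch size used in the post

def make_generators(train_dir, val_dir, target_size=(224, 224)):
    """Stream images from class-subdirectory folders (paths are assumptions)."""
    gen = ImageDataGenerator(rescale=1.0 / 255)
    train_flow = gen.flow_from_directory(train_dir, target_size=target_size,
                                         batch_size=BATCH_SIZE,
                                         class_mode="categorical")
    val_flow = gen.flow_from_directory(val_dir, target_size=target_size,
                                       batch_size=BATCH_SIZE,
                                       class_mode="categorical")
    return train_flow, val_flow

def train(model, train_flow, val_flow, max_epochs=50):
    # Stop once validation accuracy stops improving, keeping the best weights.
    stop = EarlyStopping(monitor="val_accuracy", patience=3,
                         restore_best_weights=True)
    return model.fit(train_flow, validation_data=val_flow,
                     epochs=max_epochs, callbacks=[stop])
```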
The following are my results.
Written by Peter Chau, a Canadian Software Engineer building AIs, APIs, UIs, and robots.