July 13, 2018
Following my initial exploration of Image Style Transfer with Machine Learning in part 1, I took a number of steps to improve my best models. The major focus here was tuning the hyperparameters. Next, the model was productionized as a web application, giving it a user interface that made the model easier to use. Finally, I used an implementation of VGG style transfer to start getting true style transfer. Thankfully, there are many implementations online. Let’s examine these improvements.
It became clear that my models needed a better implementation to break past training barriers. I had 3 issues to tackle:
TensorFlow’s documentation made it clear that using feed_dict to input data for training is sub-optimal. Instead, a pipeline should be built to handle the extract, transform, load (ETL) process. This allows for just-in-time delivery of data during training. Furthermore, performing the ETL steps in TensorFlow rather than NumPy lets the framework apply graph optimizations. Nice.
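As a rough illustration, here is a minimal tf.data ETL sketch, assuming TensorFlow 2.x. In a real run the extract step would be Dataset.list_files plus tf.io.read_file and tf.image.decode_jpeg; random tensors stand in here so the sketch is self-contained, and all shapes are illustrative.

```python
import tensorflow as tf

def preprocess(img):
    # Transform: resize and scale pixel values to [0, 1].
    img = tf.image.resize(img, [128, 128])
    return img / 255.0

# Extract: stand-in for reading and decoding image files from disk.
images = tf.random.uniform([100, 256, 256, 3], maxval=255.0)

dataset = (tf.data.Dataset.from_tensor_slices(images)
           .map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
           .batch(32)
           .prefetch(tf.data.AUTOTUNE))  # Load: overlap input prep with training

batch = next(iter(dataset))
# model.fit(dataset, epochs=...)  # instead of feeding batches via feed_dict
```

The prefetch call is what gives the just-in-time delivery: the pipeline prepares the next batch while the current one trains.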
Upon building an input pipeline, I found that it could easily handle 5000 images. Previously, I had to wait for the ETL process to finish before I could run additional code, and I suspect this is why I couldn’t load more than 2000 images: my computer was likely trying to load them all into memory prior to training. Along with parallelized data transformation, the pipeline greatly reduced overall training times. Additionally, the reconstructions were greatly improved!
My training machine has a very low-end CPU from 2010 and a CUDA-incompatible GPU. Training was painfully slow, even with the prior improvements. It was time to get serious, so I scored me a GeForce GTX 1050 2GB GPU. Not a high-end accelerator, but a big step up. CUDA setup was a challenge, but after getting through it and updating my models to use the GPU, training became at least 20 times faster! I had the input pipeline use the CPU while the new GPU was used for training, and I continued to use the old GPU to run my monitors.
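The CPU/GPU split described above can be sketched with tf.device scopes, which pin the ops created inside them to a device. This is a minimal illustration with stand-in data, not the actual training setup; it falls back to the CPU when no GPU is present.

```python
import tensorflow as tf

# Keep the ETL work on the CPU so the GPU stays free for training.
with tf.device("/CPU:0"):
    data = tf.random.uniform([64, 128, 128, 3])
    dataset = (tf.data.Dataset.from_tensor_slices(data)
               .batch(16)
               .prefetch(tf.data.AUTOTUNE))

# Put the model on the GPU when one is available.
device = "/GPU:0" if tf.config.list_physical_devices("GPU") else "/CPU:0"
with tf.device(device):
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(128, 128, 3)),
        tf.keras.layers.Dense(10),
    ])
    out = model(next(iter(dataset)))
```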
This hardware upgrade allowed me to quickly train far more epochs with larger batch sizes. I was even able to increase the size of my input images from 32 px by 32 px to 128 px by 128 px, and use color images. The difference in the resulting reconstructions was night and day.
With the prior issues out of my way, I could focus on experimenting with better model architectures. The prior improvements allowed me to iterate faster and train more complicated models. I was able to get much better reconstructions with a deeper Autoencoder, and by using more filters at each layer. I also used batch normalization after each Convolutional layer to reduce the effects of vanishing and exploding gradients. This allowed me to reduce the number of training epochs and decrease the batch size.
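A deeper convolutional autoencoder with batch normalization after each convolutional layer might look like the following sketch. The filter counts and layer depths are illustrative, not the exact architecture used.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_autoencoder():
    inputs = tf.keras.Input(shape=(128, 128, 3))
    x = inputs
    # Encoder: downsample while increasing the number of filters.
    for filters in (32, 64, 128):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.BatchNormalization()(x)  # after each conv layer
        x = layers.MaxPooling2D()(x)
    # Decoder: mirror the encoder to reconstruct the input.
    for filters in (128, 64, 32):
        x = layers.Conv2DTranspose(filters, 3, strides=2, padding="same",
                                   activation="relu")(x)
        x = layers.BatchNormalization()(x)
    outputs = layers.Conv2D(3, 3, padding="same", activation="sigmoid")(x)
    return tf.keras.Model(inputs, outputs)

autoencoder = build_autoencoder()
autoencoder.compile(optimizer="adam", loss="mse")
```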
It is important to note that I had to balance model complexity and batch size, otherwise the GPU would run out of memory. I’ll need to investigate how this can be handled in software.
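One software-side option worth noting (assuming TensorFlow 2.x) is memory growth, which asks TensorFlow to allocate GPU memory on demand instead of grabbing it all up front. It must be set before any GPU is initialized, and it is a no-op on a machine without GPUs.

```python
import tensorflow as tf

# Allocate GPU memory on demand rather than reserving it all at startup.
gpus = tf.config.list_physical_devices("GPU")
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)
```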
The first step was to train a model and export it from Keras. This model can’t be used directly in the browser, so I transformed it into a layers format which can be ingested by TensorFlow.js. The converted model was put on a Content Delivery Network for consumption. Finally, I built a React.js web application from create-react-app. This web application loads the converted model and allows users to make inferences on their own device. Users can even upload their own photos. SICK.
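The export side of that workflow can be sketched as below. The model here is a tiny placeholder; the conversion itself is done offline with the tensorflowjs_converter CLI (from the tensorflowjs pip package), shown in the comment.

```python
import tensorflow as tf

# Train/export side: save the Keras model, then convert it with the
# tensorflowjs_converter CLI into the layers format TensorFlow.js loads:
#
#   tensorflowjs_converter --input_format keras model.h5 tfjs_model/
#
# The resulting tfjs_model/ directory (model.json + weight shards) is what
# gets uploaded to the CDN for the web app to fetch.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(4, input_shape=(8,), activation="relu"),
    tf.keras.layers.Dense(1),
])
model.save("model.h5")
```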
Having pushed the boundaries of my knowledge, it was time to defer to the experts. I could no longer get better reconstructions because I had not consulted the literature on style transfer. It was a good learning experience though.
As it turns out, I was close in my methodology. Using a Convolutional model was the right choice. However, I didn’t need an Autoencoder: the literature says that a Convolutional classification model works fine. Another mistake of mine was to train with anime images and then feed a photo through the network for reconstruction. Instead, I needed to feed 1 style image and 1 content image into a pre-trained network, then use 2 losses to generate a style-transferred image.
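The 2 losses can be sketched as follows: a content loss comparing feature maps directly, and a style loss comparing Gram matrices of feature maps. The feature maps would come from a pre-trained classification network; random tensors stand in here so the sketch is self-contained.

```python
import tensorflow as tf

def gram_matrix(features):
    # Correlations between feature channels capture "style".
    b, h, w, c = features.shape
    flat = tf.reshape(features, [b, h * w, c])
    return tf.matmul(flat, flat, transpose_a=True) / tf.cast(h * w, tf.float32)

def content_loss(content_feats, generated_feats):
    return tf.reduce_mean(tf.square(generated_feats - content_feats))

def style_loss(style_feats, generated_feats):
    return tf.reduce_mean(tf.square(gram_matrix(generated_feats)
                                    - gram_matrix(style_feats)))

# Stand-ins for feature maps extracted from a pre-trained network.
content = tf.random.uniform([1, 64, 64, 128])
style = tf.random.uniform([1, 64, 64, 128])
generated = tf.random.uniform([1, 64, 64, 128])
total = content_loss(content, generated) + style_loss(style, generated)
```

Minimizing the weighted sum of the two losses with respect to the generated image is what produces the style-transferred result.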
I found a simple implementation in Keras to try. It used the pretrained VGG19 network and 3 loss functions (I only used 2 of them to start). The resulting style transfers were much better… this is definitely the way to go!
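Building a feature extractor from VGG19 might look like this sketch. A real style-transfer run would use weights="imagenet" (which downloads the pretrained weights); weights=None keeps the sketch offline, and the layer choices are illustrative.

```python
import tensorflow as tf

# Build a model that exposes intermediate VGG19 activations for the
# style and content losses.
vgg = tf.keras.applications.VGG19(include_top=False, weights=None)
vgg.trainable = False

style_layers = ["block1_conv1", "block2_conv1", "block3_conv1"]
content_layer = "block5_conv2"
outputs = [vgg.get_layer(name).output for name in style_layers + [content_layer]]
extractor = tf.keras.Model(vgg.input, outputs)

feats = extractor(tf.random.uniform([1, 128, 128, 3]))
```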
In the third and final part of this series, I will:
Written by Peter Chau, a Canadian Software Engineer building AIs, APIs, UIs, and robots.