July 21, 2018
In this final installment on Style Transfer using Neural Networks, I will be tuning the implementation from the previous installment. Some differences in our use case make it difficult to get good results out of the box. The implementation was effective when the content image was of an object and the style image was of an environment: this produced a generated image where the object was evenly stylized with the background. Since we want to make an object look like it was drawn as anime, that approach doesn't work directly. I addressed this with several methods.
The style image is what the network uses to stylize the content image. For our use case, we should select a style image with a pose and scale similar to the content image. This constraint makes it difficult to productionize this implementation.
I chose the following image to use for styling:
This was the content image:
Here are some samples of the generated images:
Selecting initial weights for the style loss and content loss is important for getting good results: they control the balance of content to style in the generated image. If the initial content weight is too large, the generated image will look too much like the content image and won't be stylized enough. It took many manual iterations to find a set of initial loss weights that I liked.
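As a sketch of what this balancing looks like (the function name and weight values here are my own illustration, not the values from my implementation), the overall objective is a weighted sum of the two losses:

```python
def total_loss(content_loss, style_loss, content_weight=1.0, style_weight=1e4):
    """Combine content and style losses with tunable weights.

    A content_weight that is too large relative to style_weight pulls
    the generated image toward the content image, leaving it under-styled.
    """
    return content_weight * content_loss + style_weight * style_loss

# With a heavy content weight, the content term dominates the objective,
# so the optimizer mostly reproduces the content image.
heavy_content = total_loss(0.5, 0.001, content_weight=100.0, style_weight=1.0)
balanced = total_loss(0.5, 0.001, content_weight=1.0, style_weight=1e4)
```

Tuning then amounts to adjusting the ratio of these two weights and inspecting the generated image after each run.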
The generated images tended to look rough: the content and style image features were visually distinct in the output. To address this, I added Total Variation Loss, which smoothed out the generated image.
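Total variation loss penalizes differences between neighboring pixels, which discourages the rough, speckled look. A minimal NumPy sketch (the L1 form shown here is one common variant; my implementation may differ in detail):

```python
import numpy as np

def total_variation_loss(img):
    """L1 total variation of an image of shape (H, W, C).

    Sums absolute differences between vertically and horizontally
    adjacent pixels; minimizing this term smooths the image.
    """
    dh = np.abs(img[1:, :, :] - img[:-1, :, :]).sum()  # vertical neighbors
    dw = np.abs(img[:, 1:, :] - img[:, :-1, :]).sum()  # horizontal neighbors
    return dh + dw
```

A perfectly flat image has zero total variation, while a noisy one scores high, so adding this term to the objective trades a little fidelity for smoothness.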
Model layer outputs of the VGG16 network are used to calculate the losses. The VGG16 network has 13 convolutional layers to choose from. The referenced implementation selected 1 layer for content loss and 5 layers for style loss. This worked reasonably well; however, the generated image often had eyeballs and hair in random locations. I found that using 2 layers for content loss and 7 layers for style loss gave me the best results.
Here are some results from different layer combinations:
I used an image size of 128px x 128px for the prior testing. This size was selected for fast iteration on hyperparameters. I found that generated images smaller than 200px x 200px were blurry. Larger dimensions were more computationally intensive but consistently produced clearer results.
Written by Peter Chau, a Canadian Software Engineer building AIs, APIs, UIs, and robots.