PyTorch's neural style transfer tutorial produces drastically different results from the original paper

I also posted this on GitHub.

Running the official tutorial with van Gogh’s Starry Night as the style image produces the results below. I’ve included the results from the original paper for comparison. The notebook that generated these plots can be viewed here.

The original paper used Caffe’s pre-trained VGG19 network, which differs from PyTorch’s in a few ways:

  1. The pixels are in the range [0, 255] (PyTorch’s model was trained on inputs in [0, 1])
  2. The color channels are in BGR order (PyTorch’s are in RGB order)
  3. The normalization is different: Caffe only subtracts the per-channel ImageNet mean (in [0, 255] pixel units), whereas PyTorch subtracts a mean and divides by a standard deviation (in [0, 1] units).
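
For reference, here is roughly what the Caffe-style preprocessing looks like when written as a PyTorch transform (a sketch only; the BGR mean values are the widely used Caffe VGG ImageNet means, and whether the tutorial should be using exactly this is part of my question):

```python
import torch

# Caffe-style VGG19 preprocessing, sketched in PyTorch.
# Assumes `img` is an RGB tensor of shape (3, H, W) in [0, 1],
# e.g. the output of torchvision.transforms.ToTensor().
CAFFE_MEAN_BGR = torch.tensor([103.939, 116.779, 123.68]).view(3, 1, 1)

def caffe_preprocess(img: torch.Tensor) -> torch.Tensor:
    img = img * 255.0            # difference 1: [0, 1] -> [0, 255]
    img = img[[2, 1, 0], :, :]   # difference 2: RGB -> BGR
    return img - CAFFE_MEAN_BGR  # difference 3: mean subtraction only, no std
```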

Despite these differences, I would expect the results to be much closer to the original paper’s. What else could be causing such a large difference?

I would assume it has to do with hyperparameters, most importantly the learning rate and, following the notation from the original paper, the alpha and beta parameters that control how much of the total loss comes from the style loss vs. the content loss. I also think you could try initializing with the content image (instead of a random image) and see if that helps. Let us know how it goes.
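
For concreteness, those two knobs look roughly like this in the tutorial's terms (a sketch; the numbers are the tutorial's defaults as far as I recall, not a recommendation, and `content_img` stands for whatever preprocessed content tensor you already have):

```python
import torch

# alpha / beta from the paper correspond to content_weight / style_weight here.
content_weight = 1      # alpha (tutorial default)
style_weight = 1e6      # beta  (tutorial default; worth sweeping)

# total_loss = content_weight * content_loss + style_weight * style_loss

# Initializing from the content image instead of random noise
# (`content_img` is assumed to be your preprocessed content tensor):
input_img = content_img.clone().requires_grad_(True)
# input_img = torch.randn_like(content_img).requires_grad_(True)  # noise init
```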

The images above come from Figure 1 of the original paper, which applied back-propagation to a noise image to capture the style alone (the content loss weight being effectively 0). I have also tried learning rates in the range [1e-5, 1e3] and gotten similar results.
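
To be explicit about the setup I'm running, it is essentially the tutorial's loop with the content term switched off and a noise image as the starting point (a sketch; `model` and `style_losses` are assumed to come from the tutorial's `get_style_model_and_losses`, and the image size and step count are arbitrary):

```python
import torch

# Style-only reconstruction, as in Figure 1 of the paper: optimize a noise
# image so that only the style loss is minimized.
# `model` and `style_losses` are assumed to already be in scope, e.g. from the
# tutorial's get_style_model_and_losses(); size and step count are arbitrary.
input_img = torch.randn(1, 3, 512, 512, requires_grad=True)
optimizer = torch.optim.LBFGS([input_img])

def closure():
    optimizer.zero_grad()
    model(input_img)                                 # populates the loss modules
    style_score = sum(sl.loss for sl in style_losses)
    loss = style_score                               # content term weighted to 0
    loss.backward()
    return loss

for _ in range(300):
    optimizer.step(closure)
    with torch.no_grad():
        input_img.clamp_(0, 1)
```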

If I load the Caffe model, switch the method of normalization, and follow this same process, I get results comparable to those in the original paper.