L-BFGS gradients moving loss in the wrong direction

Hi, I’m seeing some strange behavior with L-BFGS. It’s quite possible I’m doing something wrong or don’t fully understand how L-BFGS works. I’m trying to implement style transfer from the PyTorch tutorial, and I started noticing green pixels popping up in all my images. To debug this, I eliminated the style-transfer portion and optimized for content alone (i.e. recreating the input image).
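For reference, my content-only reconstruction boils down to something like the sketch below. To keep the snippet self-contained I’ve substituted a small randomly initialized conv stack for the pretrained VGG features (that part is my stand-in, not the tutorial’s code — the real script uses torchvision’s VGG as in the tutorial):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in for the VGG feature extractor (assumption: a random conv stack,
# so the example runs without downloading pretrained weights).
features = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 8, 3, padding=1),
)
for p in features.parameters():
    p.requires_grad_(False)

content = torch.rand(1, 3, 32, 32)   # the "input image"
target = features(content).detach()  # its content features

# Optimize the image itself so its features match the target features.
image = torch.rand(1, 3, 32, 32, requires_grad=True)
optimizer = torch.optim.LBFGS([image], line_search_fn='strong_wolfe')

def closure():
    optimizer.zero_grad()
    loss = F.mse_loss(features(image), target)
    loss.backward()
    return loss

start_loss = closure().item()
for _ in range(5):
    loss = optimizer.step(closure)
```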

I quickly noticed that for certain layers of VGG, the optimizer “converged” but the resulting image was total nonsense. Printing the loss and gradients shows the loss gradually declining up to a point, then sharply spiking and getting “stuck” at that high value. Notice that between iterations 1250 and 1300 the loss jumps from roughly 0.00009 to 5.0.

> Iteration: 1150 Loss: 0.0001372118276776746 Gradient Range: (-9.439394489163533e-06...1.4998562619439326e-05)
> Iteration: 1200 Loss: 0.00010962127998936921 Gradient Range: (-1.3320541256689467e-05...9.216311809723265e-06)
> Epoch #60 Content Loss: 9.982399933505803e-05
> Iteration: 1250 Loss: 8.824308315524831e-05 Gradient Range: (-1.279950447496958e-05...2.7290301659377292e-05)
> Iteration: 1300 Loss: 5.023702144622803 Gradient Range: (-0.0018774743657559156...0.0014377052430063486)
> Epoch #80 Content Loss: 5.023702144622803
> Iteration: 1350 Loss: 5.023702144622803 Gradient Range: (-0.0018774743657559156...0.0014377052430063486)

I stepped through some of the optimizer code and found that L-BFGS evaluates the loss, takes a step, and then evaluates the loss a second time; if the difference is below a tolerance, it stops optimizing. I’m not sure why the loss gets worse or why it can’t recover, but it seems to get “stuck” because the optimizer starts aborting iterations once it sees no change in the loss.
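For anyone else digging into this, the behavior matches my reading of the optimizer’s `tolerance_change` check, and the multiple loss evaluations come from the closure that `torch.optim.LBFGS` requires. A minimal repro of that pattern on a toy quadratic (the setup here is mine, just to show the mechanics):

```python
import torch

torch.manual_seed(0)
x = torch.randn(8, requires_grad=True)

# tolerance_change is the threshold under which L-BFGS treats a step as
# producing "no difference" in loss/parameters and terminates early.
optimizer = torch.optim.LBFGS([x], lr=1.0, tolerance_change=1e-9)

def closure():
    # Unlike SGD's one evaluation per step, L-BFGS may call this closure
    # several times per .step() while it re-measures the loss.
    optimizer.zero_grad()
    loss = (x ** 2).sum()
    loss.backward()
    return loss

for _ in range(10):
    loss = optimizer.step(closure)
```

On this well-behaved quadratic the early stop is exactly what you want; the question is why it also triggers after the loss has spiked.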

When I switch to SGD, the image converges to reasonable values, so my assumption is that this is an issue with the optimizer. I’ve created a notebook that illustrates the problem.
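For comparison, here is the SGD version of the same toy reconstruction (again with a random conv stack standing in for VGG so the snippet is self-contained; the learning rate and step count are values I picked for the toy problem, not the tutorial’s):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Random conv stack standing in for VGG features (assumption, to avoid
# downloading pretrained weights).
features = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 8, 3, padding=1),
)
for p in features.parameters():
    p.requires_grad_(False)

target = features(torch.rand(1, 3, 32, 32)).detach()
image = torch.rand(1, 3, 32, 32, requires_grad=True)

# Plain SGD: one loss evaluation per step, no internal convergence check,
# so there is nothing to "abort" even if a step temporarily gets worse.
optimizer = torch.optim.SGD([image], lr=0.1)

start_loss = F.mse_loss(features(image), target).item()
for _ in range(200):
    optimizer.zero_grad()
    loss = F.mse_loss(features(image), target)
    loss.backward()
    optimizer.step()
```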