Why is network output changing?

I am attempting to understand what the following code is doing and why the loss is changing. net is my network module.

      preParams = net.state_dict()
      while True:
        net.load_state_dict(preParams)   # restore the saved parameters
        optimizer.zero_grad()
        preOutputs = net(inputs)
        preLoss = criterion(preOutputs, labels)
        print(preLoss)
        preLoss.backward()
        preParams = net.state_dict()     # save the current parameters
        optimizer.step()

Each time through this loop, the value of preLoss changes.

If I comment out optimizer.step(), the loss doesn't change.
What is optimizer.step() doing that isn't undone by the lines above it? What else do I need to call to undo what optimizer.step() did?

I would have thought that load_state_dict would set the parameters back to the values stored before entering the loop, that preLoss would be computed for those parameter values, that preLoss.backward() would compute the gradient at that point, and that optimizer.step() would take a single optimization step. Then everything would be undone and start over in the next iteration.

Where did I go wrong?

net.state_dict() returns references to the network's parameter tensors, not copies. optimizer.step() updates the parameters in place, so the saved preParams sees the same update. If you use

      preParams = {k: v.clone() for k, v in net.state_dict().items()}

instead, it will work as you expect.
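
To see what is happening under the hood, here is a minimal sketch (using a hypothetical tiny torch.nn.Linear standing in for your net) showing that the saved state_dict references move together with an in-place parameter update, while cloned copies stay put:

      import torch

      # Hypothetical tiny model standing in for `net`
      net = torch.nn.Linear(2, 1)

      refs = net.state_dict()                                       # shares storage with the parameters
      copies = {k: v.clone() for k, v in net.state_dict().items()}  # independent snapshot

      # Mimic the in-place update that optimizer.step() performs
      with torch.no_grad():
          for p in net.parameters():
              p.add_(1.0)

      print(torch.equal(refs["weight"], net.weight))    # True:  the saved tensor changed along with the parameters
      print(torch.equal(copies["weight"], net.weight))  # False: the clone kept the old values

With the cloned snapshot, load_state_dict(preParams) at the top of your loop really does restore the old parameters, so preLoss stays the same in every iteration.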

Best regards

Thomas
