Simple question about loss.backward()

I’m trying to create a version of A3C reinforcement learning in Caffe and have a question about how ‘loss.backward()’ works.

In Caffe (with which I am familiar), the loss value is calculated and stored in a variable such as a ‘double’ or ‘float’. It is then used to calculate the gradient diffs in each layer’s backward function, and the optimizer later applies those diffs to the learnable blobs based on the learning rate, decay, etc.

How does this work in PyTorch? For example, I see in numerous A3C examples how the loss is calculated, but what actually happens when ‘loss.backward()’ is called?

Does ‘loss.backward()’ apply the loss ‘value’ to each layer of the model in much the same way that Caffe does, or is there an extra bit of magic that I’m missing?

Any comments are appreciated.


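Basically, the backward call traverses the computation graph that was built during the forward pass and computes the gradients using the chain rule. The result for each parameter is stored in its .grad attribute, and the optimizer's step() call is what later applies those gradients to the weights.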
You can find more information here.
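
To make that concrete, here is a minimal sketch of one training step. The single linear layer, the random data and the SGD settings are assumptions picked only for illustration; they are not taken from any A3C example.

```python
import torch

torch.manual_seed(0)

# Hypothetical toy model and optimizer, used only to illustrate the mechanics.
model = torch.nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(4, 3)
target = torch.randn(4, 1)

# Forward pass: autograd records the operations that produce `loss`.
loss = torch.nn.functional.mse_loss(model(x), target)

# Unlike a plain float, the loss tensor keeps a reference to its graph.
has_graph = loss.grad_fn is not None

# No gradient has been computed for the weights yet.
grad_before = model.weight.grad

# backward() walks the graph and fills .grad on every parameter.
loss.backward()
grad_shape = tuple(model.weight.grad.shape)

# step() reads .grad and updates the parameters (the "apply the diffs" part).
weight_before = model.weight.detach().clone()
optimizer.step()
weight_changed = not torch.equal(weight_before, model.weight.detach())
```

So the rough Caffe analogy would be: .grad plays the role of the blob diffs, backward() fills them, and the optimizer applies them.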

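Note that in PyTorch the gradients are accumulated: each backward call adds to the existing .grad values instead of overwriting them. So you usually want to zero them (for example with optimizer.zero_grad()) before the next backward pass.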
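
A small sketch of the accumulation behaviour, using a hand-made tensor `w` (a hypothetical example, not part of any model):

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# d/dw of sum(w^2) is 2*w, i.e. [2, 4].
(w ** 2).sum().backward()
first = w.grad.clone()

# A second backward without zeroing adds to the stored gradient.
(w ** 2).sum().backward()
accumulated = w.grad.clone()

# Zero the gradient in place, then backward again gives the plain gradient.
w.grad.zero_()
(w ** 2).sum().backward()
after_reset = w.grad.clone()
```

In a normal training loop you would not zero each tensor by hand; calling optimizer.zero_grad() once per iteration does it for all parameters the optimizer manages.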