Tensor.backward() low-level computation

Hi everyone,
I’m trying to understand what .backward() does when it is called, in terms of the dtypes of the tensors involved in the computation. I am running the following code:

# Before training:
net.to('cuda')
net.half()  # 16-bit net

# Training code (first part):
outputs = net(inputs)                      # 16 bit
loss = criterion(outputs, targets)         # 16 bit
scaledloss = scalefactor * loss.float()    # 32 bit
net.zero_grad()
scaledloss.backward()                      # 16-bit grads inside net (parameter.grad)

May I conclude that, if my network has fp16 parameters and its computed gradients are fp16, then backpropagation is computed in fp16?
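As a minimal, self-contained sketch of what I mean (a toy nn.Linear standing in for my actual net and criterion, and 128.0 as an arbitrary scale factor), this is how I am checking the gradient dtypes after backward:

import torch
import torch.nn as nn

net = nn.Linear(4, 2).to('cuda').half()            # FP16 parameters
inputs = torch.randn(8, 4, device='cuda', dtype=torch.float16)
targets = torch.randn(8, 2, device='cuda', dtype=torch.float16)

loss = nn.functional.mse_loss(net(inputs), targets)
scaledloss = 128.0 * loss.float()                  # scale the loss in FP32
net.zero_grad()
scaledloss.backward()

for name, p in net.named_parameters():
    print(name, p.dtype, p.grad.dtype)             # both report torch.float16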
Furthermore, while inspecting the C++ code, I found that there is a library of functions with their own backward methods (https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/FunctionsManual.cpp): are these the ones called during the .backward() computation?
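As a rough way to see which backward nodes a given tensor will invoke, I have been printing the grad_fn chain (reusing net, criterion and inputs from the snippet above):

loss = criterion(net(inputs), targets)
print(loss.grad_fn)                  # e.g. something like <MseLossBackward0 ...>
print(loss.grad_fn.next_functions)   # the upstream backward nodes in the autograd graph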

Yes, the dtype would be recorded by Autograd and the backward computation would be executed in the same precision as its forward pass, if you don’t change it.
This also means that your FP16 gradients might easily underflow, which is why we recommend using the mixed-precision training utilities via torch.cuda.amp.
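A minimal sketch of that approach (a toy nn.Linear model and SGD optimizer are used here purely for illustration):

import torch
import torch.nn as nn

net = nn.Linear(4, 2).to('cuda')                 # keep the parameters in FP32
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()

inputs = torch.randn(8, 4, device='cuda')
targets = torch.randn(8, 2, device='cuda')

optimizer.zero_grad()
with torch.cuda.amp.autocast():                  # forward pass runs in mixed precision
    outputs = net(inputs)
    loss = nn.functional.mse_loss(outputs, targets)

scaler.scale(loss).backward()                    # scales the loss to avoid gradient underflow
scaler.step(optimizer)                           # unscales the gradients before the optimizer step
scaler.update()                                  # adjusts the scale factor for the next iteration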
