Hi everyone,
I’m trying to understand what .backward() does when it is called, specifically which dtypes the tensors involved in the computation have. I am running the following code:
Before training code:

net.to('cuda')
net.half()  # cast the whole network to float16
Training code (first part):

outputs = net(inputs)                      # float16 forward pass
loss = criterion(outputs, targets)         # float16 loss
scaled_loss = scale_factor * loss.float()  # float32 scaled loss
net.zero_grad()
scaled_loss.backward()                     # float16 grads land in each parameter.grad
May I conclude that, since the network's parameters are fp16 and the gradients that end up in parameter.grad are fp16, the backward pass itself is computed in fp16?
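For reference, here is the minimal repro I used to check the gradient dtypes (the toy Linear model, the random data, and the scale factor of 128 are just stand-ins for my real setup, and it assumes a CUDA device is available):

import torch
import torch.nn.functional as F

net = torch.nn.Linear(4, 2).to('cuda').half()   # float16 parameters
inputs = torch.randn(8, 4, device='cuda', dtype=torch.float16)
targets = torch.randn(8, 2, device='cuda', dtype=torch.float16)

outputs = net(inputs)                 # float16 activations
loss = F.mse_loss(outputs, targets)   # float16 loss
scaled_loss = 128.0 * loss.float()    # float32 scalar, as in my snippet above
net.zero_grad()
scaled_loss.backward()

for name, p in net.named_parameters():
    print(name, p.grad.dtype)         # prints torch.float16 for weight and bias

If I understand correctly, the .float() cast is itself recorded by autograd, so during backward the incoming float32 gradient is cast back to float16 at that node, and everything upstream of it runs in float16.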
Furthermore, while inspecting the C++ source, I found a collection of functions implementing backward formulas (https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/FunctionsManual.cpp): are these the ones called during the .backward() computation?
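Continuing the sketch above, I also tried walking the autograd graph from the loss via grad_fn to see which backward nodes get invoked (the node names printed are examples; the exact names depend on the PyTorch version):

out = net(inputs)
loss = F.mse_loss(out, targets)
print(loss.grad_fn)                 # e.g. <MseLossBackward0 object at 0x...>
print(loss.grad_fn.next_functions)  # the parent nodes in the backward graph

From what I can tell, these grad_fn nodes are generated from the derivative definitions in tools/autograd/derivatives.yaml, and many of them call into the hand-written helpers in FunctionsManual.cpp. Is that the right mental model?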