PyTorch comes with a component called autograd, which provides automatic differentiation for all operations on Tensors; a Tensor remembers where it "came from", i.e. the operations that produced it.
From the PyTorch docs:
torch.Tensor is the central class of the package. If you set its attribute .requires_grad as True, it starts to track all operations on it. When you finish your computation you can call .backward() and have all the gradients computed automatically. The gradient for this tensor will be accumulated into .grad attribute.
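For example, here is a minimal sketch of that mechanism (the values are arbitrary and just for illustration):

import torch

x = torch.ones(3, requires_grad=True)  # x now tracks all operations on it
y = (2 * x).sum()                      # y "remembers" it came from x
y.backward()                           # compute dy/dx automatically
print(x.grad)                          # tensor([2., 2., 2.])

y = (2 * x).sum()
y.backward()
print(x.grad)                          # tensor([4., 4., 4.]) - gradients accumulate in .grad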
When a neural network is defined in PyTorch, it subclasses the base class torch.nn.Module. All your submodules and layers can be initialized in the module, which leads to them (and their parameters) being tracked by the Module.
Let us say Net1 looked like this (subclassing nn.Module):
import torch.nn as nn
import torch.nn.functional as F

class Net1(nn.Module):
    def __init__(self):
        super(Net1, self).__init__()
        # assigning layers as attributes registers them as submodules,
        # so their parameters are tracked by the Module
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))
Then when you instantiate the model and call it on an input, you get an output:

net1 = Net1()
outputs1 = net1(inputs1)
This output has been propagated through the forward() pass of net1 (calling the module invokes forward() via nn.Module.__call__, along with any hooks). Then you calculate the loss:
loss1 = criterion(outputs1, labels1)
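Here criterion stands for whatever loss you defined earlier; a typical choice, assuming a classification task, would be:

criterion = nn.CrossEntropyLoss()  # any loss module that fits your task works here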
Now we call the .backward() method on the loss (not on the optimizer); autograd will backpropagate through the tensors which have requires_grad set to True and calculate the gradient w.r.t. the parameters, all the way back to where they came from.
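Concretely, continuing the sketch above:

loss1.backward()  # gradients are now accumulated in .grad for every parameter of net1

for name, p in net1.named_parameters():
    print(name, p.grad.shape)  # e.g. conv1.weight torch.Size([20, 1, 5, 5])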
Then when you call optimizer1.step(), it looks at each param.grad and updates the value of param by subtracting the learning_rate times the grad from it (this is plain SGD; other optimizers apply different update rules).
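In other words, for vanilla SGD the step is roughly equivalent to this sketch (ignoring momentum, weight decay, etc.; learning_rate is whatever you passed to the optimizer):

with torch.no_grad():  # the update itself should not be tracked by autograd
    for p in net1.parameters():
        p -= learning_rate * p.grad  # p = p - lr * p.grad
net1.zero_grad()  # clear .grad so gradients don't accumulate across steps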
The key here is that Tensors know where they came from, and each Net is backpropagated through automatically - so both your Nets know exactly which gradients belong to them. This is abstracted away from the user, which makes it quite friendly.
Note: I am fairly new to PyTorch, so if I explained something wrong, please feel free to correct me.
David Alford