I am reading in the book Deep Learning with PyTorch that calling nn.Module.parameters() also collects the parameters of the submodules registered in the module's __init__ constructor. To understand and help visualize the process, I would like to use this ensemble example from ptrblck:
import torch
import torch.nn as nn
import torch.nn.functional as F


class MyEnsemble(nn.Module):
    def __init__(self, modelA, modelB, nb_classes=10):
        super(MyEnsemble, self).__init__()
        self.modelA = modelA
        self.modelB = modelB
        # Remove last linear layer
        self.modelA.fc = nn.Identity()
        self.modelB.fc = nn.Identity()
        # Create new classifier
        self.classifier = nn.Linear(2048 + 512, nb_classes)

    def forward(self, x):
        x1 = self.modelA(x.clone())  # clone to make sure x is not changed by inplace methods
        x1 = x1.view(x1.size(0), -1)
        x2 = self.modelB(x)
        x2 = x2.view(x2.size(0), -1)
        x = torch.cat((x1, x2), dim=1)
        x = self.classifier(F.relu(x))
        return x
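To make this concrete for myself, I would instantiate it roughly like this (just a sketch; I am assuming modelA is a torchvision ResNet-50 and modelB is a ResNet-18, since their removed fc layers take 2048 and 512 input features respectively):

import torchvision.models as models

modelA = models.resnet50(pretrained=True)   # fc input features: 2048
modelB = models.resnet18(pretrained=True)   # fc input features: 512
ensemble = MyEnsemble(modelA, modelB, nb_classes=10)

x = torch.randn(4, 3, 224, 224)  # dummy batch of 4 RGB images
out = ensemble(x)                # shape: [4, 10]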
In this nn.Module, both self.modelA = modelA and self.modelB = modelB are assigned in the __init__ constructor. Therefore, by calling MyEnsemble.parameters() we would get back the parameters of MyEnsemble, modelA, and modelB, i.e. the parameters with respect to which autograd would calculate the gradients?
Unless, of course, requires_grad=False is set for the parameters of self.modelA and self.modelB, in which case autograd would not calculate the gradients with respect to the parameters of those models and only with respect to the new classifier in MyEnsemble?
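For example, I imagine freezing them roughly like this (just a sketch of what I mean):

# Freeze the two backbones so autograd skips their parameters
for param in ensemble.modelA.parameters():
    param.requires_grad = False
for param in ensemble.modelB.parameters():
    param.requires_grad = False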
Is this thinking correct? Please correct my language if it is off. I am trying to learn as best I can and all help is appreciated.
Yes, you are correct that the gradients won't be calculated for parameters that have requires_grad=False. However, model.parameters() would still return all parameters if you are not filtering them out.
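E.g. something like this sketch would pass only the trainable parameters to the optimizer (assuming model is the MyEnsemble instance from above):

import torch.optim as optim

# Keep only the parameters which still require gradients
optimizer = optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)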
So if I iterate over Module.parameters() and look at each parameter's .grad attribute, I will be able to see the gradients? And I should call optimizer.zero_grad() each epoch to clear the gradients so they don't accumulate? (It doesn't matter where you call zero_grad().)
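Something like this is what I have in mind (a rough sketch, assuming model, optimizer, and a batch x, target are already defined):

output = model(x)
loss = F.cross_entropy(output, target)
loss.backward()

# Inspect the gradient of each parameter
for name, param in model.named_parameters():
    print(name, None if param.grad is None else param.grad.shape)

optimizer.step()
optimizer.zero_grad()  # clear the gradients so they don't accumulate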
EDIT: Also, what do you think is the area of PyTorch most people have trouble with? My focus is computer vision. Is there a concept or area in PyTorch I can focus my energy on, where you think people commonly make mistakes?
I mean in general framework usage. I see you are very knowledgeable about PyTorch. Where would you recommend someone focus their energy when learning PyTorch? Are there any areas where you see more mistakes than others that I could focus on?
When you are trying to learn PyTorch, I would suggest picking an interesting (and personal) project you could spend some time on. E.g. if you are interested in photography and would like to experiment with some style transfer approaches, this would be a great way to learn more about GANs and other architectures. You would learn the framework just by working on the project.
On the other hand, if you would like to contribute to PyTorch, I would recommend having a look at the usability / simple-fixes or misc category here.
Also, the good first issue label is useful to check for some starter PRs. The Contribution Guide is a good way to get started.
Generally, I would say that a lot of new users have some trouble with language models, i.e. how the shapes are used in RNNs, and where and when to detach() the activations, etc.
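As a rough illustration of the detach() point (not a complete training script): with truncated backpropagation through time, the hidden state is usually detached between chunks so the graph doesn't keep growing:

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=10, hidden_size=20, batch_first=True)
h = torch.zeros(1, 4, 20)  # [num_layers, batch_size, hidden_size]

for chunk in torch.randn(5, 4, 7, 10):  # 5 chunks, each [batch=4, seq_len=7, features=10]
    out, h = rnn(chunk, h)   # out: [batch_size, seq_len, hidden_size]
    h = h.detach()           # cut the graph so gradients don't flow into previous chunks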