Hi,
As this is my first post, I'd like to start by thanking the developers for this amazing library that helps a lot of us in our deep learning escapades. I've finished running through the first tutorial involving the CIFAR10 dataset and have some questions.
Code
In this code block,
```python
# forward + backward + optimize
outputs = net(inputs)
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()
```
could someone explain in detail what is going on? These might be more Python-related questions than PyTorch questions, but I think it's crucial to understand what is happening here. I understand what's happening at an abstract level, but not at the code level. In particular:
- `net` is an object, so what is `net(inputs)` calling? It's not a constructor, so I'm not sure what's happening here. It also returns the output, but where is this function (if it is a function at all) defined, and what does it do?
- Where do we call the `forward` function that was defined as part of the model class? I'm guessing this has something to do with the previous question.
- Similar to the first point, where is the function behind `criterion(outputs, labels)` defined? I checked the docs for `CrossEntropyLoss()`, and it's a class whose constructor only takes `weight` and `size_average`.
In the prediction code block,
```python
outputs = net(Variable(images))
_, predicted = torch.max(outputs.data, 1)
total += labels.size(0)
correct += (predicted == labels).sum()
```
This code (`net(images)`) is similar to the training stage, so I'm not sure how we are "testing", because we haven't switched to any testing mode. For example, in Keras we use `model.fit` for training and `model.evaluate` for testing, and I'm not seeing a similar distinction here.
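For what it's worth, the usual idiom in current PyTorch (the tutorial's `Variable`-era code predates some of this) is to switch the module itself with `net.eval()` and to disable gradient tracking with `torch.no_grad()` at test time; a minimal sketch:

```python
import torch
import torch.nn as nn

# A tiny model with a layer whose behaviour differs between modes (Dropout).
net = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))

net.eval()                  # evaluation mode: Dropout becomes a no-op
x = torch.ones(1, 4)
with torch.no_grad():       # skip gradient bookkeeping while "testing"
    out = net(x)

print(out.requires_grad)    # False: nothing here will be backpropagated
net.train()                 # switch back before the next training epoch
```

So the train/test distinction lives in the module's mode and the gradient context, not in separate `fit`/`evaluate` methods.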
EDIT-1: I got the answers to the above questions from the Learning PyTorch with Examples tutorial. It all happens through the `__call__` method in Python.
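For concreteness, here is a minimal pure-Python sketch of that mechanism (not the real `nn.Module` code, which also runs registered hooks) showing how defining `__call__` makes `net(inputs)` dispatch to `forward`:

```python
class Module:
    """Toy stand-in for nn.Module: instances become callable via __call__."""
    def __call__(self, *args):
        # The real nn.Module.__call__ also runs hooks, then dispatches
        # to self.forward exactly like this.
        return self.forward(*args)

class Net(Module):
    def forward(self, x):
        return x * 2    # stand-in for the real network computation

net = Net()
print(net(3))  # 6 -- calling the instance invokes Net.forward
```

`criterion(outputs, labels)` works the same way: loss classes like `CrossEntropyLoss` inherit this callable behaviour and implement their computation in `forward`.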
Others
- Can I get a small dataset out of the dataloader, to overfit on before I train on the whole thing? I'm guessing I could just run the for loop up to `train_loader[:small_number]`; any thoughts?
- The dataloader only provides train and test splits; how would I get a validation set out of this?
- We print out `loss.data[0]`; does it contain the loss for the entire mini-batch? Could I get some pointers on how to keep track of the loss history over entire epochs (for plotting purposes)?
- If I want to use the GPU, do I have to call `.cuda()` everywhere I have Variables and instantiate my models? Or is there some global parameter I can set that automatically makes all the Variables and instantiated nets CUDA-compatible?
- Why is `torch.save(model.state_dict())` recommended over `torch.save(model)`, since the latter can be used to save the entire model, including both architecture and parameters?
- The `Normalize` transform takes two tuples representing the desired mean and stddev for each of the color channels. Are those calculated within that particular set? How would I normalize the test set with the training set's mean and stddev?
- Can I add to the post category list, or is it strictly confined to the four that are defined?
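On pulling a small subset for overfitting: a `DataLoader` itself is not sliceable, but the underlying dataset is indexable, so one option (a sketch, assuming `torch.utils.data.Subset` from a reasonably recent PyTorch, with a random tensor dataset standing in for CIFAR10) is:

```python
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset

# Hypothetical stand-in for the CIFAR10 trainset.
full = TensorDataset(torch.randn(100, 3), torch.randint(0, 2, (100,)))

# Wrap the first few indices in a Subset and build a loader from that.
small = Subset(full, range(8))
small_loader = DataLoader(small, batch_size=4, shuffle=True)

print(len(small), len(small_loader))  # 8 samples, 2 batches
```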
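On carving out a validation set: one common approach (a sketch, assuming `torch.utils.data.random_split`; a random tensor dataset again stands in for the CIFAR10 trainset) is to split the training data and build a separate loader for each part:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Hypothetical stand-in for the CIFAR10 trainset.
trainset = TensorDataset(torch.randn(100, 3), torch.randint(0, 2, (100,)))

# Reserve part of the training data for validation.
train_part, val_part = random_split(trainset, [80, 20])
train_loader = DataLoader(train_part, batch_size=16, shuffle=True)
val_loader = DataLoader(val_part, batch_size=16)

print(len(train_part), len(val_part))  # 80 20
```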
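On the loss question: `CrossEntropyLoss` averages over the mini-batch by default, and in current PyTorch `loss.item()` plays the role of the older `loss.data[0]`. A sketch of tracking a per-epoch loss history for plotting (random logits stand in for real batches):

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()   # averages over the mini-batch by default

history = []                        # one averaged entry per epoch
for epoch in range(2):
    running, n_batches = 0.0, 0
    for _ in range(3):              # stand-in for `for data in trainloader:`
        outputs = torch.randn(4, 5)            # fake logits, batch of 4
        labels = torch.randint(0, 5, (4,))
        loss = criterion(outputs, labels)
        running += loss.item()      # .item() replaces the older loss.data[0]
        n_batches += 1
    history.append(running / n_batches)

print(len(history))  # 2 epoch-level averages, ready for plotting
```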
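On GPU usage: the tutorial has no global switch, but the common idiom in current PyTorch is to pick a `torch.device` once and move the model and each batch with `.to(device)`, which works the same whether or not CUDA is available; a sketch:

```python
import torch
import torch.nn as nn

# Choose the device once; everything else is written device-agnostically.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

net = nn.Linear(4, 2).to(device)         # moves all parameters at once
x = torch.ones(1, 4, device=device)      # create (or .to(device)) each batch
out = net(x)

print(out.shape)  # torch.Size([1, 2])
```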
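On saving: note that `state_dict` is a method, so it must be called, as in `torch.save(net.state_dict())`. The usual reasoning is that a state dict is just a dictionary of tensors, so it stays loadable even if the model class is moved or renamed, whereas `torch.save(model)` pickles the class by reference. A round-trip sketch (using an in-memory buffer in place of a file on disk):

```python
import io
import torch
import torch.nn as nn

net = nn.Linear(4, 2)

# Save only the parameters: state_dict() is a method call.
buf = io.BytesIO()                   # stands in for a file on disk
torch.save(net.state_dict(), buf)

# To reload, rebuild the architecture in code, then load the weights into it.
buf.seek(0)
net2 = nn.Linear(4, 2)
net2.load_state_dict(torch.load(buf))

print(torch.equal(net.weight, net2.weight))  # True
```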
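On normalization: `Normalize(mean, std)` simply applies `(x - mean) / std` per channel with whatever numbers you pass in, so the usual practice is to compute the statistics over the training set once and pass those same values to both the train and test pipelines. A torch-only sketch of the idea (random tensors stand in for CIFAR10 images, and `normalize` is a hypothetical helper mimicking the transform):

```python
import torch

# Stand-in for the CIFAR10 training images: N x C x H x W in [0, 1].
train_data = torch.rand(100, 3, 8, 8)

# Per-channel mean/std computed over the *training* set only.
mean = train_data.mean(dim=(0, 2, 3))
std = train_data.std(dim=(0, 2, 3))

# Equivalent of Normalize(mean, std): apply the same training-set
# statistics to training AND test images.
def normalize(img, mean, std):
    return (img - mean[:, None, None]) / std[:, None, None]

test_img = torch.rand(3, 8, 8)           # a test-set image
out = normalize(test_img, mean, std)
print(out.shape)  # torch.Size([3, 8, 8])
```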
I apologize for the whole lot of questions, most of them born out of ignorance, and I'm sure I'll have more as I start using PyTorch for my own problems. If I need to split them up into separate posts, please let me know and I'll edit the post accordingly.
Thanks and I appreciate everyone’s help!