Hi, PyTorch beginner here. I don't fully understand how backpropagation with loss.backward() and optimizer.step() works in my code. Can someone help me clarify this? My question has two parts:
- I have a neural network I'm training that has 3 output neurons. So, when I feed an input into my model, I get a predicted of size 3x1. I then compare that to my actual of size 3x1 to get a loss:
loss = torch.nn.functional.mse_loss(predicted, actual)
self.optimizer.zero_grad()
loss.backward()
self.optimizer.step()
loss here is a single number (a 0-dim scalar). Is loss supposed to be a single number, even with the output layer being size 3? My understanding is that since loss is an MSE over the 3 actual - predicted differences (i.e. averaging over them, since the default reduction is 'mean'), it contains the information needed to calculate gradients & backpropagate through all 3 output neurons. Please correct me on anything I'm wrong about here.
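To show what I mean, here's a minimal sketch with made-up numbers standing in for my 3 output neurons (the tensor values are just for illustration):

```python
import torch
import torch.nn.functional as F

# hypothetical 3-element prediction and target, standing in for my 3 output neurons
predicted = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
actual = torch.tensor([1.5, 2.0, 2.0])

# default reduction='mean' averages the 3 squared differences into one scalar
loss = F.mse_loss(predicted, actual)
print(loss.shape)  # torch.Size([]) -- a 0-dim scalar

# backward() on the scalar still produces a gradient for every output element
loss.backward()
print(predicted.grad)  # 2 * (predicted - actual) / 3, one value per neuron
```

So even though loss is one number, predicted.grad has 3 entries, which is what made me think the scalar loss is enough to backprop through all 3 neurons.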
- In reality I'm computing batches of 32 at a time, so my predicted and actual are 32x3 in size. Using the above lines, loss is still a single scalar, with which I update my model at the end of the batch. Is this the correct way to backpropagate when doing mini-batch training like I'm doing? My loss here is a single number that averages over the 3 outputs and across the 32 samples in the batch. Or should I be getting 1 loss per sample instead?
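Here's a sketch of the batched case (random tensors standing in for my real data), comparing the default scalar loss with per-sample losses via reduction='none':

```python
import torch
import torch.nn.functional as F

# hypothetical 32x3 batch, same shapes as my mini-batch setup
predicted = torch.randn(32, 3, requires_grad=True)
actual = torch.randn(32, 3)

# default: one scalar, averaged over all 32 * 3 = 96 elements
loss = F.mse_loss(predicted, actual)                         # shape: [] (scalar)

# per-element losses, in case I want to inspect individual samples
per_elem = F.mse_loss(predicted, actual, reduction='none')   # shape: [32, 3]
per_sample = per_elem.mean(dim=1)                            # shape: [32]

# the scalar loss is just the mean of the per-sample losses
assert torch.allclose(loss, per_sample.mean())

loss.backward()  # one backward pass computes gradients for the whole batch
```

So as far as I can tell, the single scalar and the mean of 32 per-sample losses are the same number; I just want to confirm that calling backward() once on that scalar is the standard way to do mini-batch training.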
Sorry if this question doesn't make a whole lot of sense. Currently very confused.