The average of the batch losses will give you an estimate of the “epoch loss” during training.
Since you are calculating the loss anyway, you could just sum it and calculate the mean after the epoch finishes.
This training loss shows how well your model performs on the training dataset.
Alternatively, you could also plot the per-batch loss values, but this is usually not necessary and will produce a lot of output.
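A minimal sketch of this accumulation pattern (the `model`, `criterion`, `loader`, and `optimizer` names are placeholders for your own objects):

```python
import torch

def run_epoch(model, criterion, loader, optimizer):
    running_loss = 0.0
    num_batches = 0
    for inputs, targets in loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()  # accumulate the batch loss as a Python float
        num_batches += 1
    # mean of the batch losses = estimate of the "epoch loss"
    return running_loss / num_batches
```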
I’ve just used Colab a few times in the past for debugging purposes. Not sure if the runtime changed, but I don’t think Colab is a good fit for “real-time” deployment, e.g. since the notebook runtime is limited.
What if we use .detach() when working with PyTorch Lightning? It would give us the data without any computation graph. Would it be correct to use .detach() instead of .item()?
.detach() will return a tensor, which is detached from the computation graph, while .item() will return a Python scalar. I don’t know how and where this is needed in PyTorch Lightning; depending on the use case, detach() might also work.
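A quick illustration of the difference between the two return types:

```python
import torch

x = torch.tensor([2.0], requires_grad=True)
loss = (x * 3).sum()

detached = loss.detach()  # still a torch.Tensor, but cut from the graph
scalar = loss.item()      # a plain Python float

print(type(detached), detached.requires_grad)  # tensor type, requires_grad=False
print(type(scalar))                            # float
```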
Thanks for such a quick reply. Yes, I understand the concept now. Since the PyTorch docs mention that tensors can be passed for logging, I am sure I can do the following:
However, I am a little unsure whether the following implementation is correct, in which I replaced .item() with .detach() before the loss value is returned by the model. I am not getting any syntax error, but I am a little worried it might interfere with the gradient calculation and affect performance.
NOTE: The reason I am trying to replace .item() is that I am training the model on multiple GPUs and it was taking very long to train just one epoch. While going through the PyTorch Lightning docs I came across this:
Calls into item() might slow down your code, as they are synchronizing.
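One caution here: Lightning calls backward() on the loss returned from training_step, so the returned loss itself must stay attached to the graph; only the copy you keep for logging should be detached. A sketch of that pattern in plain PyTorch (the function and `logged_losses` argument are hypothetical names, not Lightning API):

```python
import torch

def training_step_sketch(model, criterion, batch, logged_losses):
    inputs, targets = batch
    loss = criterion(model(inputs), targets)
    # detach the copy kept for logging: no .item() call, so no forced sync here
    logged_losses.append(loss.detach())
    # return the attached loss: this is what backward() needs
    return loss
```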
While detach() could potentially avoid synchronizations, a push to the CPU would still wait for the GPU to finish the calculation and would thus synchronize, so I don’t think your self.metrics would behave differently in this case.
However, let me know if you see any changes.
Yes, you were correct. I could only reduce the time a little by using more num_workers in the DataLoader and using 2 GPUs. But it is still taking around 3.5 hours per epoch for 50,000 images, which I feel is still a lot.
From what I understand, I don’t need to specify the device for any tensor, as PyTorch Lightning takes care of it. So would it be okay to replace to_cpu(x).item() with just x.detach()?
You can try it out, but I assume the implementation in Lightning might be there for a reason.
E.g. if you need to print these values after returning them, you would still need to synchronize the code, since these values have to be computed first.