CUDA out of memory during training

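A good suggestion: wrap the test and validation phases in with torch.no_grad() so that intermediate tensors are freed right away instead of being kept for backpropagation, and detach the loss before adding it to the running total so that the computation graph cached for each batch can be released.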
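Here is a minimal sketch of both ideas, assuming a standard PyTorch training loop. The function names train_one_epoch and evaluate, and the model, loader, loss_fn and optimizer arguments, are hypothetical placeholders and not from the original question.

```python
import torch
from torch import nn


def train_one_epoch(model, loader, loss_fn, optimizer):
    """Hypothetical training loop that accumulates a detached loss."""
    model.train()
    total_loss = 0.0
    for inputs, targets in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()
        # detach() drops the reference to the graph; adding `loss` itself
        # would keep every batch's graph alive through total_loss.
        total_loss = total_loss + loss.detach()
    return float(total_loss / len(loader))


def evaluate(model, loader, loss_fn):
    """Hypothetical validation/test loop run without autograd tracking."""
    model.eval()
    total_loss = 0.0
    # Inside no_grad() no graph is built, so activations are not stored
    # for a backward pass that will never happen.
    with torch.no_grad():
        for inputs, targets in loader:
            total_loss = total_loss + loss_fn(model(inputs), targets)
    return float(total_loss / len(loader))
```

Calling loss.item() instead of loss.detach() should work as well if you only need a Python number for logging.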