Hi guys,
I am learning about the DeepLabV3+ model these days.
I've run into a strange phenomenon: using the same batch size that works fine in training triggers "RuntimeError: CUDA out of memory." during evaluation.
Yet the inference speed seems quite a bit faster than training.
Are you seeing the OOM error right away or only after a few iterations? Also, how did you measure that the validation step is faster?
Since Python uses function scoping, you might want to wrap parts of your code in dedicated functions, so that the local tensors are freed once the function returns, as explained here.
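Here is a minimal sketch of what I mean, assuming a standard classification-style loop (the `model`/`loader` names are placeholders, not your actual code):

```python
import torch
import torch.nn as nn

def validate(model, loader, device):
    # Tensors created here are local to this function, so once it returns,
    # Python drops the references and their memory can be reused.
    model.eval()
    criterion = nn.CrossEntropyLoss()
    total_loss = 0.0
    with torch.no_grad():  # also avoids storing activations for backward
        for data, target in loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            total_loss += criterion(output, target).item()  # .item() detaches to a Python float
    return total_loss / len(loader)
```

Accumulating the loss via `.item()` instead of the tensor itself is also important, since keeping the tensor alive would keep the whole computation graph alive in training mode.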
Could you post a code snippet to reproduce this issue?
Instead of your real data, you could initialize the input and target with random tensors, so that we can debug this issue.
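Something along these lines would already be enough (the shapes, the number of classes, and the single-conv "model" are just placeholders standing in for your DeepLabV3+ setup):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"

# Placeholder model; swap in your DeepLabV3+ instance here.
model = nn.Conv2d(3, 10, kernel_size=3, padding=1).to(device)

# Random stand-ins for real images and segmentation targets:
# batch of 4, 3x64x64 inputs, per-pixel class labels in [0, 10).
data = torch.randn(4, 3, 64, 64, device=device)
target = torch.randint(0, 10, (4, 64, 64), device=device)

output = model(data)
loss = F.cross_entropy(output, target)
loss.backward()
print(loss.item())
```

With a self-contained snippet like this we could run the same batch size in training and eval mode and compare the memory usage directly.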
You could use torch.cuda.memory_allocated(), torch.cuda.memory_reserved() (formerly memory_cached()), etc. in your script to check the memory usage. Also, nvidia-smi will give you the overall memory usage (including the CUDA context).
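A small helper you could sprinkle between the suspicious lines of your script (the function name and MiB formatting are just my choices):

```python
import torch

def print_memory_stats(prefix=""):
    # memory_allocated: memory currently occupied by tensors.
    # memory_reserved: memory held by PyTorch's caching allocator.
    # nvidia-smi will report more than these, since it also counts the CUDA context.
    if torch.cuda.is_available():
        allocated = torch.cuda.memory_allocated() / 1024**2
        reserved = torch.cuda.memory_reserved() / 1024**2
        print(f"{prefix} allocated: {allocated:.1f} MiB, reserved: {reserved:.1f} MiB")
    else:
        print(f"{prefix} CUDA not available")
```

Calling it e.g. before and after the forward pass should show where the allocation jumps.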