Out-of-memory error with the same code on a bigger GPU

I have the same code running on two platforms:
platform1: Quadro P4000 (8119 MB RAM)
platform2: Titan V (12033 MB RAM)
I'm using mini-batches of size 400 on both platforms.
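For reference, this is the rough back-of-the-envelope calculation I use to sanity-check how much memory a batch should need (the `(400, 1000)` input shape below is just a hypothetical example, not my actual model's shape):

```python
def tensor_bytes(shape, dtype_bytes=4):
    """Rough memory footprint of a dense tensor in bytes (float32 by default)."""
    n = 1
    for dim in shape:
        n *= dim
    return n * dtype_bytes

# Hypothetical: a batch of 400 samples, each a 1000-dim float32 vector
print(f"{tensor_bytes((400, 1000)) / 1024**2:.2f} MiB")  # ~1.53 MiB for the input alone
```

Of course the real footprint is dominated by activations and gradients, not the input batch, but either way a batch of 400 should fit comfortably in 12 GB if it fits in 8 GB.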

On platform1 everything works fine: training consumes 7295/8119 MB at 99% volatile GPU utilization.

On platform2, however, the run fails with a CUDA out-of-memory error while consuming only 1129/12033 MB (0% volatile GPU utilization) and stops:

Traceback (most recent call last):
  File "run_meenet1.py", line 151, in <module>
    criterion=criterion)
  File "/home/ubuntu/projectSSML/meenet/modules/helpers.py", line 90, in train_batchwise
    loss.backward()
  File "/home/ubuntu/.local/lib/python3.7/site-packages/torch/tensor.py", line 93, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/ubuntu/.local/lib/python3.7/site-packages/torch/autograd/__init__.py", line 90, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: CUDA error: out of memory

The scripts are identical on both machines.

What might be going wrong?