About Runtime Error

(Qiang Sun) #1

Hi guys, can you please tell me how to deal with the following problem? Cheers!

RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1524584710464/work/aten/src/THC/THCTensorRandom.cu:25


(ptrblck) #2

Your GPU does not have enough memory left to run the operation.

If this GPU is used by other processes, try to stop them.

Otherwise, your code simply needs too much memory.
You could try lowering the batch size, for example. If you are already at batch_size=1, you could try to make your model smaller until it fits.
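As a concrete sketch of lowering the batch size (a toy dataset; the names, shapes, and batch size here are made up for illustration):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset: 100 samples with 10 features each
dataset = TensorDataset(torch.randn(100, 10), torch.randn(100, 1))

# If e.g. batch_size=64 runs out of GPU memory, halve it until training fits
loader = DataLoader(dataset, batch_size=16, shuffle=True)

xb, yb = next(iter(loader))
print(xb.shape)  # each iteration now only moves 16 samples to the GPU
```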

Without seeing the code, the cause of this error is hard to guess. You might also have a memory leak, e.g. from storing the computation graph with something like total_loss += loss in your training loop.
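The total_loss += loss pitfall can be sketched with a minimal toy loop (the model, shapes, and optimizer below are made up for illustration, not the poster's actual code):

```python
import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = torch.nn.MSELoss()

total_loss = 0.0
for _ in range(3):
    x = torch.randn(8, 10)
    y = torch.randn(8, 1)
    loss = criterion(model(x), y)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Leaky:  total_loss += loss
    #   `loss` is a tensor still attached to its computation graph, so the
    #   graph from every iteration is kept alive and GPU memory grows.
    # Safe:
    total_loss += loss.item()  # a plain Python float; the graph can be freed
```

Calling .item() (or .detach()) breaks the reference to the graph, so each iteration's intermediate buffers can be released.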

(Qiang Sun) #3

Well, I see, ptrblck. Cheers, man! :+1:


(ptrblck) #4

If you can’t figure out the reason, you could post your code so that we can have a look at it.

(Qiang Sun) #5

Let me try your advice now. If I run into any problems, I will post the code as you suggested.

(Qiang Sun) #6

Hi ptrblck,

Now the code is running smoothly without any modification. What could have happened?


(ptrblck) #7

Well, I can only speculate without the code. Maybe a process was not killed properly and was still using the GPU?
You can check its memory usage with nvidia-smi in your terminal.
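For reference, two ways to inspect GPU memory from the terminal (assuming the NVIDIA driver tools are installed; the exact output depends on your system):

```shell
# List GPUs, their memory usage, and the processes currently holding memory
nvidia-smi

# Query just the memory figures in CSV form
nvidia-smi --query-gpu=memory.used,memory.total --format=csv
```

If a stale process shows up in the process list, killing it should free its GPU memory.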

(Qiang Sun) #8

Fair enough! I will let you know later; the code is still running. Thanks!