About Runtime Error

(Qiang Sun) #1

Hi guys, can you please tell me how to deal with the following problem? Cheers!

RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1524584710464/work/aten/src/THC/THCTensorRandom.cu:25


(ptrblck) #2

Your GPU does not have enough memory left to run the operation.

If this GPU is used by other processes, try to stop them.

Otherwise, your code simply needs too much memory.
You could try lowering the batch size, for example. If you are already at batch_size=1, you could try to make your model smaller until it fits.
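As a concrete sketch of lowering the batch size (a toy dataset; the names, shapes, and batch size here are made up for illustration):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset: 100 samples with 10 features each
dataset = TensorDataset(torch.randn(100, 10), torch.randn(100, 1))

# If e.g. batch_size=64 runs out of GPU memory, halve it until training fits
loader = DataLoader(dataset, batch_size=16, shuffle=True)

xb, yb = next(iter(loader))
print(xb.shape)  # each iteration now only moves 16 samples to the GPU
```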

Without seeing the code, the cause of this error is hard to guess. You might also have a memory leak, e.g. from storing the computation graph with something like total_loss += loss in your training loop.
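The total_loss += loss pitfall can be sketched with a minimal toy loop (the model, shapes, and optimizer below are made up for illustration, not the poster's actual code):

```python
import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = torch.nn.MSELoss()

total_loss = 0.0
for _ in range(3):
    x = torch.randn(8, 10)
    y = torch.randn(8, 1)
    loss = criterion(model(x), y)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Leaky:  total_loss += loss
    #   `loss` is a tensor still attached to its computation graph, so the
    #   graph from every iteration is kept alive and GPU memory grows.
    # Safe:
    total_loss += loss.item()  # a plain Python float; the graph can be freed
```

Calling .item() (or .detach()) breaks the reference to the graph, so each iteration's intermediate buffers can be released.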

(Qiang Sun) #3

Well, I see, ptrblck. Cheers, man! :+1:


(ptrblck) #4

If you can’t figure out the reason, you could post your code so that we can have a look at it.

(Qiang Sun) #5

Let me try your advice now. If I run into any problems, I will post the code as you suggested.

(Qiang Sun) #6

Hi ptrblck,

Now the code is running smoothly without any modification. What could have happened?


(ptrblck) #7

Well, I can only speculate without the code. Maybe a process was not killed properly and was still using the GPU?
You can check its memory usage with nvidia-smi in your terminal.
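For reference, two ways to inspect GPU memory from the terminal (assuming the NVIDIA driver tools are installed; the exact output depends on your system):

```shell
# List GPUs, their memory usage, and the processes currently holding memory
nvidia-smi

# Query just the memory figures in CSV form
nvidia-smi --query-gpu=memory.used,memory.total --format=csv
```

If a stale process shows up in the process list, killing it should free its GPU memory.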

(Qiang Sun) #8

Fair enough! I will let you know later; the code is still running. Thanks!