About Runtime Error

Qiang_Sun · July 12, 2018, 7:45pm

Hi guys, can you please tell me how to deal with the following error? Cheers!

RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1524584710464/work/aten/src/THC/THCTensorRandom.cu:25

ptrblck · July 12, 2018, 7:58pm

Your GPU does not have enough memory left to run the operation.

If this GPU is used by other processes, try to stop them.

If that’s not the case, your code simply needs more memory than the GPU has.
You could try, for example, lowering the batch size. If you are already at batch_size=1, you could try to make your model smaller until it fits.
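As a rough sketch of the "lower the batch size" idea, the helper below halves the batch size until one step runs without an out-of-memory error. Both `find_fitting_batch_size` and `run_step` are hypothetical names made up for this illustration; `run_step` stands in for whatever runs one forward/backward pass in your code. The check relies on the error message containing "out of memory", as it does in the traceback above.

```python
def find_fitting_batch_size(run_step, start_batch_size):
    """Halve the batch size until run_step(batch_size) succeeds.

    run_step is a hypothetical callable that runs one training step
    and raises RuntimeError on a CUDA out-of-memory failure.
    Returns the first batch size that worked, or None if even 1 failed.
    """
    batch_size = start_batch_size
    while batch_size >= 1:
        try:
            run_step(batch_size)
            return batch_size
        except RuntimeError as err:
            # Only swallow out-of-memory errors; re-raise anything else.
            if "out of memory" not in str(err):
                raise
            batch_size //= 2
    return None


def fake_step(batch_size):
    """Stand-in for a real training step: pretends that only 8 samples fit."""
    if batch_size > 8:
        raise RuntimeError("cuda runtime error (2) : out of memory")


print(find_fitting_batch_size(fake_step, 64))
```

In real training code you would replace `fake_step` with a function that builds a batch of the given size and runs it through your model. If this returns None, the model itself is too large for the GPU.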

The reason for this error is hard to guess without code. You could also have a memory leak, e.g. by keeping the computation graph alive with something like total_loss += loss in your training loop. Accumulating loss.item() instead avoids this.
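To illustrate the total_loss issue, here is a minimal sketch of a training loop that accumulates the loss as a plain Python number. The model, data, and hyperparameters are placeholders invented for the example and it runs on the CPU; only the accumulation line matters.

```python
import torch

# Placeholder model and optimizer, just to make the loop runnable.
model = torch.nn.Linear(4, 1)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

total_loss = 0.0
for _ in range(3):
    x = torch.randn(8, 4)  # random stand-in for a batch of inputs
    y = torch.randn(8, 1)  # random stand-in for the targets

    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()

    # `total_loss += loss` would turn total_loss into a tensor that still
    # references every iteration's graph. loss.item() returns a Python
    # float instead, so nothing from the graph is kept around.
    total_loss += loss.item()

print(type(total_loss))
```

The same applies to anything else you store across iterations (lists of losses, outputs kept for logging): detach it or convert it to a Python number first.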

Qiang_Sun · July 12, 2018, 8:03pm

Well, I see, ptrblck. Cheers, man!

ptrblck · July 12, 2018, 8:05pm

If you can’t figure out the reason, you could post your code so that we could have a look at it.

Qiang_Sun · July 12, 2018, 8:06pm

Let me try that now, following your advice. If I run into any problems, I will post the code.

Qiang_Sun · July 12, 2018, 8:08pm

Hi ptrblck,

Now the code is running smoothly without any modification. What could have changed?

ptrblck · July 12, 2018, 8:12pm

Well, I can only speculate without code. Maybe a process was not killed properly and was still using the GPU?
You can check the GPU’s memory usage with nvidia-smi in your terminal.
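If you would rather do that check from Python, something like the sketch below might help. It calls nvidia-smi with its CSV query options and lists the processes currently holding GPU memory. The helper names `parse_compute_apps` and `list_gpu_processes` are made up for this example, and it assumes nvidia-smi is on your PATH (it returns an empty list otherwise).

```python
import subprocess

# Ask nvidia-smi for one CSV row per compute process: pid, memory in MiB, name.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,used_memory,process_name",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(text):
    """Parse rows of the form 'pid, used_memory, process_name'."""
    rows = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split(",", 2)]
        if len(fields) != 3 or not fields[0].isdigit():
            continue  # skip blank or unexpected lines
        pid, used, name = fields
        rows.append({
            "pid": int(pid),
            # Memory may be reported as a non-numeric placeholder.
            "used_mib": int(used) if used.isdigit() else None,
            "name": name,
        })
    return rows


def list_gpu_processes():
    """Return the parsed process list, or [] if nvidia-smi cannot be run."""
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return parse_compute_apps(result.stdout)


if __name__ == "__main__":
    for proc in list_gpu_processes():
        print(proc["pid"], proc["used_mib"], proc["name"])
```

A leftover process from an earlier run will show up here with its PID, and you can then stop it (for example with kill on Linux) to free the memory.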

Qiang_Sun · July 12, 2018, 8:14pm

Fair enough! I will let you know later, since the code is still running. Thanks!

Qiang_Sun · July 27, 2018, 5:16pm

Hi ptrblck,

Sorry for the late reply.

Yes, you are correct. At that time a process was actually not killed properly and was still using the GPU.

Thanks again!