For some reason when I set model.cuda() along with the other examples I get the following error:
*** Error in `python': free(): invalid pointer: 0x00007f8af6c2bae0 ***
However when I no longer set model.cuda() I get no pointer free errors and the model trains fine. Do I have to set .cuda() on every single variable including the criterion?
I am using python2 and using the Udacity Tensorflow g2.2xlarge instance on Amazon AWS.
Here is a link to my code:
The .cuda() operation is not in-place for tensors; you should do
input = input.cuda().
That being said it should just raise an error, not fail like that.
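To illustrate the difference, here is a minimal sketch. It assumes a recent PyTorch release (the `torch.device` / `.to()` API is newer than the version in this thread), and the layer sizes and variable names are made up for the example. Modules are moved in place, tensors are not, and a criterion like `nn.MSELoss` holds no parameters, so there is nothing to move for it (a loss constructed with a weight tensor would be a different case).

```python
import torch
import torch.nn as nn

# Fall back to the CPU so the sketch also runs on machines without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical toy model; calling .to() (or .cuda()) on a module moves
# its parameters in place and also returns the module itself.
model = nn.Linear(4, 2)
model.to(device)

# Tensors are NOT moved in place: the call returns a new tensor,
# so the result has to be assigned back.
inputs = torch.randn(8, 4)
inputs = inputs.to(device)
targets = torch.zeros(8, 2).to(device)

# MSELoss has no parameters or buffers, so it needs no device move.
criterion = nn.MSELoss()

loss = criterion(model(inputs), targets)
loss.backward()
```

On a machine with a GPU, `inputs.to(device)` without the assignment would leave `inputs` on the CPU while the model's weights sit on the GPU, which is the mismatch described above.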
I changed everything to .cuda() now, but this is the error I get instead:
THCudaCheck FAIL file=/py/conda-bld/pytorch_1490983232023/work/torch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
Traceback (most recent call last):
File "model.py", line 244, in <module>
train(train_loader, model, criterion, optimizer, epoch)
File "model.py", line 118, in train
File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/torch/autograd/variable.py", line 146, in backward
self._execution_engine.run_backward((self,), (gradient,), retain_variables)
RuntimeError: cuda runtime error (2) : out of memory at /py/conda-bld/pytorch_1490983232023/work/torch/lib/THC/generic/THCStorage.cu:66
You don’t have enough memory on the GPU; you may want to reduce the batch size.
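As a rough sketch of what that looks like, assuming the training data is served through `torch.utils.data.DataLoader` (the dataset below is a made-up stand-in, not the poster's data): the batch size is set where the loader is built, and the memory held for activations usually grows with it, while the memory for the model's parameters does not.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in data: 64 samples with 4 features and 2 targets each.
dataset = TensorDataset(torch.randn(64, 4), torch.zeros(64, 2))

# A smaller batch_size means fewer samples are held on the GPU per step.
train_loader = DataLoader(dataset, batch_size=8, shuffle=True)

# Inspect one batch to confirm its shape.
first_inputs, first_targets = next(iter(train_loader))
num_batches = len(train_loader)
```

If the error persists even at a very small batch size, the model itself (or another process using the same GPU) is probably what fills the memory.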
I get the same error when running the cartpole example with CUDA. However, as mentioned above, it runs fine without CUDA. The error persists even after reducing batch_size to 2. Any solutions?
sudo apt-get install libtcmalloc-minimal4
Fixes the error.
It works, thanks. But do you know why?
Indeed it solves the “invalid pointer error”! Can anyone explain why?
Is there any solution to this for someone on an academic institution cluster without sudo privileges?