PyTorch 0.4.1 installation issues for EDNP implementation

I am trying to run the following implementation: GitHub - Lotayou/everybody_dance_now_pytorch: A PyTorch Implementation of "Everybody Dance Now" from Berkeley AI lab.

As mentioned in the repo, the requirements are:

  • Ubuntu 18.04 (But 16.04 should be fine too)
  • Python 3.6
  • CUDA 9.0.176
  • PyTorch 0.4.1post2

So I installed the above using:

!pip3 install http://download.pytorch.org/whl/cu90/torch-0.4.1-cp36-cp36m-linux_x86_64.whl # for pytorch
!pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ torchvision==0.2.1.post2 # for torchvision
!pip uninstall pillow
!pip install pillow==5.2.0 # this is requirement of torchvision's version 0.2.1.post2

and confirmed the installation using:

import torch
import torchvision 

print(torch.__version__)
print(torchvision.__version__)
print(torch.version.cuda)

output:

0.4.1
0.2.1
9.0.176

Everything seems okay, but when I run the training file

!python /content/dance_now/train.py 

I get the following error:

RuntimeError: cuda runtime error (11) : invalid argument at /pytorch/aten/src/THC/THCGeneral.cpp:663 pytorch

I have tried many things, such as upgrading/downgrading PyTorch and installing cuDNN, but nothing worked.

Kindly help, thanks.

The error is raised in the old THCudaMalloc call, so I guess that your current CUDA setup is not working correctly.
It’s a bit hard to tell what the reason might be, as 0.4.1 was released in July 2018.
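To narrow down whether the CUDA setup itself is failing, independent of train.py, a small standalone check can help. The sketch below is only an illustration: the helper name `check_cuda_setup` and the report keys are made up for this example and are not part of the repository. It records the PyTorch version, the CUDA version the binary was built with, the GPU name and compute capability, and then tries a tiny allocation on the GPU. If the allocation already fails here, the problem is most likely in the environment rather than in the training code.

```python
def check_cuda_setup():
    """Collect basic facts about the local PyTorch/CUDA setup without raising.

    Hypothetical helper for illustration; the dictionary keys are arbitrary.
    """
    report = {"torch_installed": False}
    try:
        import torch
    except ImportError:
        # PyTorch is not importable at all, so nothing else can be checked.
        return report

    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    report["built_with_cuda"] = torch.version.cuda  # None for CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()

    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
        report["compute_capability"] = torch.cuda.get_device_capability(0)
        try:
            # A tiny allocation plus one op exercises the CUDA allocator.
            x = torch.ones(8, device="cuda")
            report["allocation_ok"] = bool((x + 1).sum().item() == 16.0)
        except RuntimeError as err:
            # Keep the message so it can be compared with the training error.
            report["allocation_ok"] = False
            report["allocation_error"] = str(err)
    return report


if __name__ == "__main__":
    for key, value in check_cuda_setup().items():
        print(f"{key}: {value}")
```

The reported device name and compute capability may also be worth comparing against what a CUDA 9.0 build supports, since a GPU released after 2018 might not be covered by such an old binary. That is a guess on my side and not something confirmed in this thread.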

The better approach would probably be to update the repository and make it compatible with the latest stable release. Alternatively, you could try to use an old Docker container from 2018 with CUDA 9.0 and get this old version running there.