CUDA Error While Porting TensorFlow Model to PyTorch: "iter.device(arg).is_cuda() INTERNAL ASSERT FAILED"

Hi everyone,

I’ve been working on porting a TensorFlow model for Real-Time Super-Resolution (braindotai/Real-Time-Super-Resolution on GitHub: "Real-time Super Resolution enhancement (4x) with content loss and relativistic adversarial optimization") to PyTorch over the past week. However, I’ve hit a roadblock and could use some help, since I haven't found many resources online that address this specific issue.

The model trains fine on the CPU, but when I run train.py on the GPU (device ‘cuda’), I get the following error:

RuntimeError: iter.device(arg).is_cuda() INTERNAL ASSERT FAILED at "/opt/conda/conda-bld/pytorch_1695392035891/work/aten/src/ATen/native/cuda/Loops.cuh":89, please report a bug to PyTorch. argument 2: expected a CUDA device but found cpu
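For context, this assertion generally means that one operand of a CUDA kernel is still a CPU tensor. One common cause when porting from TensorFlow (offered here as an assumption, not something confirmed in this repository) is a constant tensor, such as normalization statistics for a content-loss network, stored as a plain module attribute: Module.to(device) moves parameters and registered buffers but skips plain tensor attributes. The two normalizer classes below are hypothetical and the mean/std values are only example numbers. The sketch casts to float64 instead of moving to "cuda" so the difference is visible on any machine.

```python
import torch
import torch.nn as nn


class PlainAttrNormalize(nn.Module):
    """Hypothetical normalizer that keeps its constants as plain attributes."""

    def __init__(self):
        super().__init__()
        # Plain tensor attributes: Module.to() neither moves nor casts these.
        self.mean = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
        self.std = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)

    def forward(self, x):
        return (x - self.mean) / self.std


class BufferNormalize(nn.Module):
    """Same hypothetical normalizer, with constants registered as buffers."""

    def __init__(self):
        super().__init__()
        # Buffers are part of the module state, so Module.to() moves/casts them.
        self.register_buffer("mean", torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1))
        self.register_buffer("std", torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1))

    def forward(self, x):
        return (x - self.mean) / self.std


# A dtype cast stands in for .to("cuda") so this runs without a GPU.
plain = PlainAttrNormalize().to(torch.float64)
buffered = BufferNormalize().to(torch.float64)
print(plain.mean.dtype)     # left behind as float32
print(buffered.mean.dtype)  # converted along with the module
```

With .to("cuda") the plain-attribute version would keep its constants on the CPU, which is the kind of mismatch the assertion reports.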

You can check out my repository on GitHub: snehilchatterjee/Real-Time-Super-Resolution-PyTorch

To reproduce the error, just run train.py (no additional arguments are required).

I’ve been stuck on this for a while and would appreciate any insights or suggestions on how to resolve this issue. Thank you in advance for your help!

Your code fails with a data dependency:

FileNotFoundError: [Errno 2] No such file or directory: '../../DIV2K_Complete/DIV2K_train'

so try to narrow down the failing code section and post a minimal, executable code snippet that reproduces the issue.
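One way to start narrowing it down is to check each module after moving it to the GPU. The helper below is a hypothetical sketch, not part of PyTorch or of the repository: it lists parameters, registered buffers and plain tensor attributes whose device type differs from the expected one. Running it on the generator, the discriminator and any loss network would show whether a module itself is inconsistent; an empty list suggests the stray CPU tensor is created elsewhere, for example inside forward() or the training loop.

```python
import torch
import torch.nn as nn


def find_off_device(module: nn.Module, expected: str = "cuda") -> list:
    """Return names of parameters, buffers and plain tensor attributes
    whose device type differs from `expected` ("cuda" or "cpu")."""
    want = torch.device(expected).type
    bad = []
    # Parameters and registered buffers, with fully qualified names.
    for name, t in list(module.named_parameters()) + list(module.named_buffers()):
        if t.device.type != want:
            bad.append(name)
    # Plain tensor attributes live in the module's __dict__ and are skipped by Module.to().
    for mod_name, mod in module.named_modules():
        for attr, value in vars(mod).items():
            if isinstance(value, torch.Tensor) and value.device.type != want:
                prefix = f"{mod_name}." if mod_name else ""
                bad.append(f"{prefix}{attr} (plain attribute)")
    return bad


# Usage on a toy model with a hypothetical plain-attribute tensor.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU())
model.scale = torch.ones(1)
print(find_off_device(model, "cuda"))
```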


Hello,

I apologize for the inconvenience caused by the data dependency. To make the error easy to reproduce, I have uploaded a Colab notebook that runs without any local data: it includes code that downloads the subset of the data needed to run the model once.

You can access the Colab notebook here: Colab Notebook

If you prefer to run the notebook on a CPU, simply set force_cpu=True when creating the SRGAN object.
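I don't know how the SRGAN class wires force_cpu internally; a common pattern is to choose a single torch.device once and create every new tensor on it. The sketch below assumes that pattern. Both pick_device and adversarial_targets are hypothetical helpers. The second one shows a frequent GAN-specific pitfall: real/fake targets built with torch.ones(n) land on the CPU by default, whereas torch.ones_like(logits) inherits the discriminator output's device and dtype.

```python
import torch


def pick_device(force_cpu: bool = False) -> torch.device:
    """Choose one device for the whole run; fall back to CPU if CUDA is unavailable."""
    if force_cpu or not torch.cuda.is_available():
        return torch.device("cpu")
    return torch.device("cuda")


def adversarial_targets(logits: torch.Tensor):
    """Hypothetical helper: real/fake targets on the same device and dtype
    as the discriminator output, instead of torch.ones(n) on the CPU."""
    real = torch.ones_like(logits)
    fake = torch.zeros_like(logits)
    return real, fake


device = pick_device(force_cpu=True)
logits = torch.randn(4, 1, device=device)  # stand-in for discriminator output
real, fake = adversarial_targets(logits)
print(device, real.device, fake.shape)
```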

I apologize for not being able to isolate the failing code section. The error message does not say which operation fails, and stepping through the code line by line with the VS Code debugger did not help me identify the problem.
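Since the assertion does not name the failing operation, one option is to make each module report itself. The sketch below uses forward pre-hooks (register_forward_pre_hook is a real nn.Module method); the add_device_checks helper and its messages are hypothetical. Before each submodule runs, the hook checks its positional tensor inputs and its own parameters/buffers and raises an error naming the first offender. It only sees module calls, so a CPU tensor created in a loss function or the training loop would still need to be found by inspection.

```python
import torch
import torch.nn as nn


def add_device_checks(model: nn.Module, expected: str = "cuda"):
    """Register forward pre-hooks that raise a readable error naming the first
    submodule whose tensor inputs or own parameters/buffers are not on `expected`."""
    want = torch.device(expected).type
    handles = []

    def make_hook(name):
        label = name or "model"

        def hook(module, inputs):
            # Positional inputs passed to this module's forward().
            for i, x in enumerate(inputs):
                if isinstance(x, torch.Tensor) and x.device.type != want:
                    raise RuntimeError(f"{label}: input {i} is on {x.device}, expected {want}")
            # Tensors owned directly by this module (children get their own hooks).
            own = list(module.named_parameters(recurse=False)) + list(module.named_buffers(recurse=False))
            for pname, p in own:
                if p.device.type != want:
                    raise RuntimeError(f"{label}.{pname} is on {p.device}, expected {want}")

        return hook

    for name, sub in model.named_modules():
        handles.append(sub.register_forward_pre_hook(make_hook(name)))
    return handles  # call h.remove() on each handle to detach the checks


# Usage: attach, run one batch, read the module name in the error.
net = nn.Sequential(nn.Linear(4, 4), nn.ReLU())
handles = add_device_checks(net, "cpu")
out = net(torch.randn(2, 4))
for h in handles:
    h.remove()
```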

Thank you for your understanding.