isInt() INTERNAL ASSERT FAILED

Joachim · April 27, 2020, 9:24pm

Hi

I am receiving the following RuntimeError:

RuntimeError: isInt() INTERNAL ASSERT FAILED at /tmp/pip-req-build-xd7oc4a9/aten/src/ATen/core/ivalue.h:221, please report a bug to PyTorch. 
The above operation failed in interpreter, with the following stack trace:
at /network_extended.py:49:30
        outputs = []
        for i in range(len(inputs)):
            state = self.cell(inputs[i], state)
                              ~~~~~~~~ <--- HERE
            outputs += [state]
        return torch.stack(outputs)

when attempting to train on a machine with this environment. Note that I am executing in a Singularity container built using Nvidia’s pytorch_20.01-py3 image. As such, whilst the Tesla V100 allocated to the system appears unavailable in the above dump, it is very much available and works with other PyTorch models trained the same way.

However, curiously, when executing the exact same code on an older Windows 8.1 machine with this environment, everything works as expected. I can thus only assume that it is due to the specific environment on my Linux machine as opposed to my code.

I have for this reason opted for writing here as opposed to submitting an issue. I am naturally happy to close this thread and submit an issue with additional information if that is more appropriate.

Any thoughts on what this could be are greatly appreciated!

Thanks!

ptrblck · April 28, 2020, 6:50am

Could you wipe the PyTorch installation inside the container via pip uninstall torch -y and install the latest binary?
Besides the OS change, the PyTorch version also differs (1.4.0 on Windows, while the 20.01 container uses a pre-1.4.0 build), so this might have been a known issue that could have been fixed in 1.4.0.
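To check whether an environment is running a build that predates a given release, a small script along these lines might help. This is only a sketch: the helper names `split_version` and `predates_release` are made up for illustration, the version strings in the comments are hypothetical examples, and the pre-release detection is a simplified take on the usual `a`/`b`/`rc`/`dev` suffixes rather than a full version parser.

```python
import re


def split_version(version: str):
    """Split a version string into numeric (major, minor, patch) and a pre-release flag.

    Hypothetical helper: handles strings shaped like '1.4.0', '1.4.0a0+abc1234'
    or '1.5.0.dev20200401'. A '+local' suffix alone is not a pre-release marker.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?(.*)", str(version))
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch, suffix = match.groups()
    numbers = (int(major), int(minor), int(patch or 0))
    # Treat a/b/rc/dev directly after the numbers as a pre-release marker.
    is_pre = re.match(r"\.?(a|b|rc|dev)", suffix) is not None
    return numbers, is_pre


def predates_release(version: str, release: str = "1.4.0") -> bool:
    """Return True if `version` is older than `release` or is a pre-release of it."""
    numbers, is_pre = split_version(version)
    target, _ = split_version(release)
    return numbers < target or (numbers == target and is_pre)


if __name__ == "__main__":
    try:
        import torch  # only needed to inspect the installed build
    except ImportError:
        print("torch is not installed in this environment")
    else:
        print("installed:", torch.__version__)
        print("predates 1.4.0:", predates_release(torch.__version__))
```

Running this inside the container and on the Windows machine should show whether the two installs really differ in the way described above.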

Joachim · April 28, 2020, 9:36am

Thanks for your response. It was indeed an issue with the pre-1.4.0 build. Upgrading the entire Singularity container using the pytorch_20.03-py3 image resolved the issue. Thanks!