CUDA runtime error (8): invalid device function at THCTensorMathPointwise.cu with the latest build

I have two Anaconda environments: (dev), where I compiled PyTorch from source for development purposes, and (lisa), where I installed the stable release following the instructions on pytorch.org. When I run the VAE example from the PyTorch tutorials with the source build (the newest version), I get the following error:

(dev) [SLURM] suhubdyd@kepler2:~/research/models/pytorch-models/vae$ python main.py 
THCudaCheck FAIL file=/u/suhubdyd/research/dl-frameworks/pytorch/torch/lib/THC/generated/../generic/THCTensorMathPointwise.cu line=247 error=8 : invalid device function
Traceback (most recent call last):
  File "main.py", line 138, in <module>
    train(epoch)
  File "main.py", line 108, in train
    recon_batch, mu, logvar = model(data)
  File "/u/suhubdyd/.conda/envs/dev/lib/python2.7/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "main.py", line 71, in forward
    mu, logvar = self.encode(x.view(-1, 784))
  File "main.py", line 54, in encode
    h1 = self.relu(self.fc1(x))
  File "/u/suhubdyd/.conda/envs/dev/lib/python2.7/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/u/suhubdyd/.conda/envs/dev/lib/python2.7/site-packages/torch/nn/modules/linear.py", line 54, in forward
    return self._backend.Linear.apply(input, self.weight, self.bias)
  File "/u/suhubdyd/.conda/envs/dev/lib/python2.7/site-packages/torch/nn/_functions/linear.py", line 14, in forward
    output.add_(bias.expand_as(output))
RuntimeError: cuda runtime error (8) : invalid device function at /u/suhubdyd/research/dl-frameworks/pytorch/torch/lib/THC/generated/../generic/THCTensorMathPointwise.cu:247

The same script runs without any error on the stable distribution:

(lisa) [SLURM] suhubdyd@kepler2:~/research/models/pytorch-models/vae$ python main.py 
Train Epoch: 1 [0/60000 (0%)]   Loss: 549.307800
Train Epoch: 1 [1280/60000 (2%)]        Loss: 311.565613
Train Epoch: 1 [2560/60000 (4%)]        Loss: 236.338989
Train Epoch: 1 [3840/60000 (6%)]        Loss: 225.371078
Train Epoch: 1 [5120/60000 (9%)]        Loss: 208.993668
Train Epoch: 1 [6400/60000 (11%)]       Loss: 206.427368
Train Epoch: 1 [7680/60000 (13%)]       Loss: 208.580597
Train Epoch: 1 [8960/60000 (15%)]       Loss: 201.324646
Train Epoch: 1 [10240/60000 (17%)]      Loss: 191.824326
Train Epoch: 1 [11520/60000 (19%)]      Loss: 195.214188
Train Epoch: 1 [12800/60000 (21%)]      Loss: 189.770447
Train Epoch: 1 [14080/60000 (23%)]      Loss: 173.119644
Train Epoch: 1 [15360/60000 (26%)]      Loss: 179.030197
Train Epoch: 1 [16640/60000 (28%)]      Loss: 170.247345
Train Epoch: 1 [17920/60000 (30%)]      Loss: 169.193451
Train Epoch: 1 [19200/60000 (32%)]      Loss: 162.828690
Train Epoch: 1 [20480/60000 (34%)]      Loss: 158.171326
Train Epoch: 1 [21760/60000 (36%)]      Loss: 158.530518
Train Epoch: 1 [23040/60000 (38%)]      Loss: 155.896255
Train Epoch: 1 [24320/60000 (41%)]      Loss: 158.835968
Train Epoch: 1 [25600/60000 (43%)]      Loss: 152.416977
Train Epoch: 1 [26880/60000 (45%)]      Loss: 153.593964
Train Epoch: 1 [28160/60000 (47%)]      Loss: 147.944260
Train Epoch: 1 [29440/60000 (49%)]      Loss: 148.223892
Train Epoch: 1 [30720/60000 (51%)]      Loss: 145.770905
Train Epoch: 1 [32000/60000 (53%)]      Loss: 144.410706
Train Epoch: 1 [33280/60000 (55%)]      Loss: 147.592163
Train Epoch: 1 [34560/60000 (58%)]      Loss: 149.320328
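
One common cause of "invalid device function" is that the CUDA kernels were compiled for a GPU architecture other than the one in the machine. That would be consistent with a source build failing where the prebuilt package works, but it is not confirmed for this case. The sketch below is one way to check it, assuming the build was configured through the `TORCH_CUDA_ARCH_LIST` environment variable with numeric entries such as `3.5;5.2+PTX`, and that the installed PyTorch provides `torch.cuda.get_device_capability`. The helper names (`parse_arch_list`, `arch_list_covers`, `report`) are made up for illustration.

```python
import os


def parse_arch_list(arch_list):
    """Parse a string like "3.5;5.2+PTX" into [((3, 5), False), ((5, 2), True)].

    Assumption: only numeric entries are handled; architecture names
    such as "Kepler" would raise ValueError here.
    """
    entries = []
    for item in arch_list.replace(",", ";").replace(" ", ";").split(";"):
        item = item.strip()
        if not item:
            continue
        has_ptx = item.endswith("+PTX")
        if has_ptx:
            item = item[: -len("+PTX")]
        major, minor = item.split(".")
        entries.append(((int(major), int(minor)), has_ptx))
    return entries


def arch_list_covers(capability, arch_list):
    """Return True if a device with `capability` (major, minor) can run
    code built for at least one entry of `arch_list`."""
    for arch, has_ptx in parse_arch_list(arch_list):
        # Compiled binary code runs on the same major version with an
        # equal or higher minor version.
        binary_ok = arch[0] == capability[0] and arch[1] <= capability[1]
        # PTX can be JIT-compiled for any equal or newer architecture.
        ptx_ok = has_ptx and arch <= capability
        if binary_ok or ptx_ok:
            return True
    return False


def report():
    """Compare the first GPU against TORCH_CUDA_ARCH_LIST (needs torch and a GPU)."""
    import torch  # imported lazily so the helpers above work without torch

    capability = tuple(torch.cuda.get_device_capability(0))
    arch_list = os.environ.get("TORCH_CUDA_ARCH_LIST", "")
    print("device capability:", capability)
    print("TORCH_CUDA_ARCH_LIST:", arch_list or "(not set)")
    if arch_list:
        print("covered by arch list:", arch_list_covers(capability, arch_list))
```

Calling `report()` inside the (dev) environment prints the device's compute capability next to the architecture list from the current shell. If the device is not covered, rebuilding with a matching `TORCH_CUDA_ARCH_LIST` may be worth trying.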

For anyone looking for a follow-up, see this GitHub issue: https://github.com/pytorch/pytorch/issues/1955
