Hello, I have this problem when running code!

Bcw_93 · July 10, 2020, 12:00pm

I get this error when running this code. Can anyone help me?
Traceback (most recent call last):
  File "d:\VsCode Project\nlp-tutorial-master\5-2.BERT\BERT-Torch.py", line 229, in <module>
    logits_lm, logits_clsf = model(input_ids.to(device), segment_ids.to(device), masked_pos.to(device))
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "d:\VsCode Project\nlp-tutorial-master\5-2.BERT\BERT-Torch.py", line 202, in forward
    output, enc_self_attn = layer(output.to(device), enc_self_attn_mask.to(device))
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "d:\VsCode Project\nlp-tutorial-master\5-2.BERT\BERT-Torch.py", line 177, in forward
    enc_outputs = self.pos_ffn(enc_outputs) # enc_outputs: [batch_size x len_q x d_model]
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "d:\VsCode Project\nlp-tutorial-master\5-2.BERT\BERT-Torch.py", line 166, in forward
    return self.fc2(gelu(self.fc1(x)))
  File "d:\VsCode Project\nlp-tutorial-master\5-2.BERT\BERT-Torch.py", line 102, in gelu
    return x * 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))
RuntimeError: CUDA error: an illegal memory access was encountered

ptrblck · July 12, 2020, 2:44am

Is your code running fine on the CPU?
This would probably give you a better error message than the illegal memory access.
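
One possible way to force a CPU run without touching the model code is to hide the GPU from the process. This is only a sketch: it assumes the script chooses its device with `torch.cuda.is_available()`, which is a guess about BERT-Torch.py. If the script hard-codes `"cuda"`, change that line to `"cpu"` instead.

```python
import os

# Hide all GPUs from this process. Put this at the very top of the script,
# before `import torch`, because the variable is read when CUDA is initialized.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

# With no visible GPU, torch.cuda.is_available() returns False, so a script
# that selects its device like this (an assumption) falls back to the CPU:
#   device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```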

Also, if you are using PyTorch 1.5.0, I would strongly recommend updating to 1.5.1, as 1.5.0 has a bug where internal CUDA assert statements are skipped: instead of raising a proper error, the code runs into other potential issues (such as an illegal memory access).
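
A quick way to check for the affected release is to look at the version string, roughly as sketched here. The helper name `is_affected_release` is made up for illustration; the only real API involved is `torch.__version__`, which may carry a build suffix such as `+cu101`.

```python
def is_affected_release(version: str) -> bool:
    """Return True if the version string names the 1.5.0 release."""
    # Drop a local build suffix such as "+cu101" before comparing.
    base = version.split("+", 1)[0]
    return base == "1.5.0"


# Usage inside your own environment (requires PyTorch to be installed):
#   import torch
#   print(torch.__version__, is_affected_release(torch.__version__))
```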

If the code is running fine on the CPU and you are already using 1.5.1, rerun the code with

CUDA_LAUNCH_BLOCKING=1 python script.py args

and post the stack trace here, please.
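
The `VAR=1 command` form above is POSIX shell syntax, and the traceback paths suggest a Windows machine, where it will not work as written in `cmd`. A small launcher like the following sketch sets the variable in a platform-independent way. The function name `run_with_blocking_launches` is hypothetical, and `script.py args` stands in for your own script and arguments.

```python
import os
import subprocess
import sys


def run_with_blocking_launches(argv):
    """Run `python <argv...>` with CUDA_LAUNCH_BLOCKING=1 and return its exit code."""
    # Copy the current environment and add the variable for the child process only.
    env = dict(os.environ, CUDA_LAUNCH_BLOCKING="1")
    completed = subprocess.run([sys.executable, *argv], env=env)
    return completed.returncode


# Usage, equivalent to the command above:
#   run_with_blocking_launches(["script.py", "args"])
```

With blocking launches each CUDA kernel finishes before the next Python line runs, so the stack trace should point at the operation that actually failed rather than a later one.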

Sagnik_Mukherjee · July 25, 2020, 7:56am

I also ran into this issue and found the solution here,

Apparently the latest cuDNN v8 has it fixed.