I get this error when running this code. Can anyone help me?
Traceback (most recent call last):
  File "d:\VsCode Project\nlp-tutorial-master\5-2.BERT\BERT-Torch.py", line 229, in <module>
    logits_lm, logits_clsf = model(input_ids.to(device), segment_ids.to(device), masked_pos.to(device))
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "d:\VsCode Project\nlp-tutorial-master\5-2.BERT\BERT-Torch.py", line 202, in forward
    output, enc_self_attn = layer(output.to(device), enc_self_attn_mask.to(device))
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "d:\VsCode Project\nlp-tutorial-master\5-2.BERT\BERT-Torch.py", line 177, in forward
    enc_outputs = self.pos_ffn(enc_outputs) # enc_outputs: [batch_size x len_q x d_model]
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "d:\VsCode Project\nlp-tutorial-master\5-2.BERT\BERT-Torch.py", line 166, in forward
    return self.fc2(gelu(self.fc1(x)))
  File "d:\VsCode Project\nlp-tutorial-master\5-2.BERT\BERT-Torch.py", line 102, in gelu
    return x * 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))
RuntimeError: CUDA error: an illegal memory access was encountered
Is your code running fine on the CPU?
This would probably give you a better error message than the illegal memory access.
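To illustrate why the CPU run is useful, here is a minimal sketch. It assumes the underlying problem is an out-of-range index passed to an embedding layer, which is one common cause of this kind of CUDA failure but is not confirmed for the script above. The sizes and ids are made up for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration.
vocab_size, d_model = 10, 4
embedding = nn.Embedding(vocab_size, d_model)  # module stays on the CPU

# One id (10) lies outside the valid range [0, vocab_size - 1].
input_ids = torch.tensor([[1, 2, 10]])

error = None
try:
    embedding(input_ids)
except (IndexError, RuntimeError) as exc:
    # On the CPU the failure surfaces as an ordinary Python exception
    # raised at the offending call, not at some later, unrelated kernel.
    error = exc

print(type(error).__name__, error)
```

If the real script fails on the CPU in a similar way, the exception points directly at the offending layer and input.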
Also, if you are using PyTorch 1.5.0, I would strongly recommend updating to 1.5.1, as 1.5.0 has a bug where internal CUDA assert statements are skipped: instead of raising a proper error, the code can run into other issues (such as an illegal memory access).
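A quick way to check whether an installation is the affected release is to compare the version string. The helper below is a hypothetical sketch (the name `is_affected_version` is not part of PyTorch); in a real session you would pass it `torch.__version__`.

```python
import re

def is_affected_version(version: str) -> bool:
    """Return True if the version string denotes exactly release 1.5.0."""
    # Only the leading "major.minor.patch" part is compared, so local
    # build suffixes such as "+cu101" are ignored.
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        return False
    return tuple(int(part) for part in match.groups()) == (1, 5, 0)

# Typical use (assumes torch is installed):
#   import torch
#   is_affected_version(torch.__version__)
print(is_affected_version("1.5.0+cu101"))
print(is_affected_version("1.5.1"))
```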
If the code is running fine on the CPU and you are already using 1.5.1, rerun the code with
CUDA_LAUNCH_BLOCKING=1 python script.py args
and post the stack trace here, please.
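Note that the `VAR=1 python ...` form is POSIX shell syntax; on Windows `cmd` the variable has to be set separately (`set CUDA_LAUNCH_BLOCKING=1`) before launching Python. As an alternative sketch, the variable can be set at the very top of the script itself. This assumes nothing has initialized CUDA yet, so it is safest to do it before `torch` is imported.

```python
import os

# Force synchronous CUDA kernel launches so that the Python stack trace
# points at the operation that actually failed. This must run before
# CUDA is initialized, so place it above the torch import.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch   # import torch only after the variable is set
# ... rest of the script ...

print(os.environ["CUDA_LAUNCH_BLOCKING"])
```

Synchronous launches slow the program down, so this setting is meant for debugging only and should be removed afterwards.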
I also ran into this issue and found the solution here.
Apparently the latest cuDNN v8 has it fixed.