I’ve encountered a weird problem while running Hugging Face Transformers’ BART model. My code ran just fine until it hit one particular batch of data, at which point the following error occurred:
CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasLtMatmul( ltHandle, computeDesc.descriptor(), &alpha_val, mat1_ptr, Adesc.descriptor(), mat2_ptr, Bdesc.descriptor(), &beta_val, result_ptr, Cdesc.descriptor(), result_ptr, Cdesc.descriptor(), &heuristicResult.algo, workspace.data_ptr(), workspaceSize, at::cuda::getCurrentCUDAStream())`
After that, any tensor I move onto CUDA ends up holding insanely large values.
For example, if I declare `torch.arange(1, 16, 1)` and put it onto CUDA, the printed values turn into huge garbage numbers.
What’s more, those insanely large values are different every time I print the tensor.
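Here is a minimal sketch of the kind of check I mean (I have not reproduced the exact garbage output, since it changes on every run):

```python
import torch

x = torch.arange(1, 16, 1)
print(x)          # on the CPU this prints 1 .. 15, as expected

x = x.to("cuda")  # after the cuBLAS failure above, printing the same tensor
print(x)          # shows huge, seemingly random values instead of 1 .. 15
```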
It seems like something has gone wrong with the memory on the CUDA device, but what exactly could it be?
Has anybody come across the same problem? I have been struggling with this for ages, and I’d really appreciate any help.
P.S. The full traceback is as follows:
File "my_conda_env_path/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
File "my_conda_env_path/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "my_conda_env_path/lib/python3.8/site-packages/transformers/models/bart/modeling_bart.py", line 192, in forward
query_states = self.q_proj(hidden_states) * self.scaling
File "my_conda_env_path/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "my_conda_env_path/lib/python3.8/site-packages/transformers/models/bart/modeling_bart.py", line 331, in forward
hidden_states, attn_weights, _ = self.self_attn(
File "my_conda_env_path/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "my_conda_env_path/lib/python3.8/site-packages/transformers/models/bart/modeling_bart.py", line 856, in forward
layer_outputs = encoder_layer(
File "my_conda_env_path/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "my_conda_env_path/lib/python3.8/site-packages/transformers/models/bart/modeling_bart.py", line 1237, in forward
encoder_outputs = self.encoder(
File "my_conda_env_path/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "my_conda_env_path/lib/python3.8/site-packages/transformers/models/bart/modeling_bart.py", line 1373, in forward
outputs = self.model(
File "demos.py", line 43, in main
model_res = model.forward(**batch.to(device), output_hidden_states=True)
File "my_conda_env_path/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "demos.py", line 81, in <module>
main()
The batch of data that triggered the problem is as follows:
{'input_ids': tensor([[ 0, 41552, 45692, ..., 1, 1, 1],
[ 0, 41552, 45692, ..., 1, 1, 1],
[ 0, 41552, 45692, ..., 1, 1, 1],
...,
[ 0, 41552, 45692, ..., 15698, 50264, 2],
[ 0, 41552, 45692, ..., 1, 1, 1],
[ 0, 41552, 45692, ..., 1, 1, 1]]), 'attention_mask': tensor([[1, 1, 1, ..., 0, 0, 0],
[1, 1, 1, ..., 0, 0, 0],
[1, 1, 1, ..., 0, 0, 0],
...,
[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 0, 0, 0],
[1, 1, 1, ..., 0, 0, 0]]), 'labels': tensor([[ 0, 41552, 45692, ..., -100, -100, -100],
[ 0, 41552, 45692, ..., -100, -100, -100],
[ 0, 41552, 45692, ..., -100, -100, -100],
...,
[ 0, 41552, 45692, ..., 442, 479, 2],
[ 0, 41552, 45692, ..., -100, -100, -100],
[ 0, 41552, 45692, ..., -100, -100, -100]])}
To the best of my knowledge, neither `input_ids` nor `labels` contains any index outside my model’s embedding range (except for `-100`, which is the ignore index used in `labels`).
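For reference, this is roughly the kind of check I mean (a sketch only; here `model` is the BART model and `batch` is the dict shown above):

```python
# Rough sketch of the sanity check: every token id must lie inside the
# embedding table, ignoring the -100 entries that the loss skips in `labels`.
vocab_size = model.config.vocab_size  # size of the model's embedding table

input_ids = batch["input_ids"]
labels = batch["labels"]

assert input_ids.min().item() >= 0
assert input_ids.max().item() < vocab_size

real_labels = labels[labels != -100]  # drop the ignore-index entries
assert real_labels.min().item() >= 0
assert real_labels.max().item() < vocab_size
```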