How to fix “CUDA error: device-side assert triggered” error?

I use a Hugging Face Transformer to fine-tune a binary classification model. When I run an inference job on a large dataset, in rare cases it triggers a “CUDA error: device-side assert triggered” error. Strangely, when I debug the single failing batch, it passes (on both GPU and CPU), and I don’t know why.

The error is first raised on

probs = probs.cpu().numpy()

and after that it is raised on

input_ids = torch.tensor(batch['input_ids'], dtype=torch.long).to(device)

2021-11-18 20:18:41,251 - non_news_model.py[line:342] - ERROR: Traceback (most recent call last):
  File "/data/project/NonNewsInference/InferenceServices/non_news_model.py", line 307, in DoInference
    probs = probs.cpu().numpy()
RuntimeError: CUDA error: device-side assert triggered
, DoInference is dead!
2021-11-18 20:18:41,263 - non_news_model.py[line:345] - ERROR: Bad batch recorded!
2021-11-18 20:18:41,292 - non_news_model.py[line:342] - ERROR: Traceback (most recent call last):
  File "/data/project/NonNewsInference/InferenceServices/non_news_model.py", line 298, in DoInference
    dtype=torch.long).to(device)
RuntimeError: CUDA error: device-side assert triggered
, DoInference is dead!

Could anyone tell me how to solve this problem? Thanks!!

CUDA operations are executed asynchronously, so the stack trace might point to the wrong line of code. Rerun your script via CUDA_LAUNCH_BLOCKING=1 python script.py args and check the failing operation in the reported stack trace. Often these asserts are triggered by an invalid indexing operation.
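If the script is started from another Python program (for example a service launcher), the same effect can be had by putting the variable into the child process environment, which guarantees it is set before the child imports torch and creates its CUDA context. A minimal sketch; `run_with_blocking_launches` is a hypothetical helper name, not a PyTorch API:

```python
import os
import subprocess
import sys


def run_with_blocking_launches(script_path, *args):
    """Run `script_path` in a new Python process with CUDA_LAUNCH_BLOCKING=1.

    Equivalent to `CUDA_LAUNCH_BLOCKING=1 python script.py args` in a shell:
    kernels launch synchronously, so the reported stack trace points at the
    operation that actually failed.
    """
    env = dict(os.environ, CUDA_LAUNCH_BLOCKING="1")
    return subprocess.run(
        [sys.executable, script_path, *args],
        env=env,
        capture_output=True,  # keep the child's traceback for inspection
        text=True,
    )
```

Print `result.stderr` of the returned `CompletedProcess` to see the synchronous stack trace. Expect the run to be slower, so this is a debugging setting rather than something to leave on in production.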


Thank you @ptrblck. This error is only occasionally triggered during model inference, not often. If I encounter this exception during a large-scale inference task, how can I accurately find the bad batch of data? As you said, CUDA operations are asynchronous, so if I catch the exception and log the bad batch, will that actually locate the wrong batch?

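Like this:

import pickle
import traceback

import torch

# inference_queue, model, device and logger are defined elsewhere in the service
try:
    batch = inference_queue.get(block=True)
    with torch.no_grad():
        input_ids = torch.tensor(batch['input_ids'],
                                 dtype=torch.long).to(device)
        attention_mask = torch.tensor(batch['attention_mask'],
                                      dtype=torch.long).to(device)
        inputs = {
            "input_ids": input_ids,
            "attention_mask": attention_mask
        }
        logits = model(**inputs)[0]
        probs = torch.softmax(logits, dim=-1)
    # first synchronizing call: an earlier asynchronous kernel failure can surface here
    probs = probs.cpu().numpy()
except Exception:
    logger.error(f'{traceback.format_exc()}, DoInference is dead!')
    # batch is assumed to hold plain Python lists, so pickling it does not touch CUDA
    with open('./BadBatch.pkl', 'wb') as f:
        pickle.dump(batch, f)
    logger.error('Bad batch recorded!')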

You could run the script with the aforementioned env variable, which would point to the operation raising the error.
Your approach could work, but note that once you are running into an assert the CUDA context might be corrupted and I don’t know if you would be able to store any additional data.
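One way around the corrupted-context concern is to write each batch to disk before any CUDA work is launched for it, so that whatever happens afterwards, the last file on disk is the batch being processed. A minimal sketch under that assumption; `checkpoint_batch` and the file name `CurrentBatch.pkl` are hypothetical, and the batch is assumed to be picklable host-side data:

```python
import pickle
from pathlib import Path


def checkpoint_batch(batch, path="./CurrentBatch.pkl"):
    """Persist the host-side `batch` to `path` before any CUDA work starts.

    If a device-side assert then kills the worker, or leaves the CUDA context
    unusable, the file on disk still holds the batch that was being processed.
    """
    target = Path(path)
    tmp = target.with_name(target.name + ".tmp")
    with open(tmp, "wb") as f:
        pickle.dump(batch, f)
    # Write to a temporary file, then rename, so a crash never leaves a half-written file.
    tmp.replace(target)
    return target
```

In the loop from the question, `checkpoint_batch(batch)` would be called right after `inference_queue.get(block=True)`. The trade-off is one extra disk write per batch, which may matter for very large inference jobs.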


Thank you! I solved this problem. It was caused by my tokenizer output not matching the model’s vocabulary size. 😀
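A mismatch like this can be caught on the CPU before the batch ever reaches the GPU by checking that every token id is a valid row of the embedding table. A minimal pure-Python sketch; `find_out_of_range_ids` is a hypothetical helper, and for most Hugging Face models `vocab_size` would come from `model.config.vocab_size`, which can then be compared against `len(tokenizer)`:

```python
def find_out_of_range_ids(input_ids, vocab_size):
    """Return (row, col, token_id) for every id outside [0, vocab_size).

    `input_ids` is a list of token-id lists, such as batch['input_ids'].
    An embedding lookup with any of these ids is the kind of invalid
    indexing that shows up as a device-side assert on the GPU.
    """
    bad = []
    for row, seq in enumerate(input_ids):
        for col, token_id in enumerate(seq):
            if not 0 <= token_id < vocab_size:
                bad.append((row, col, token_id))
    return bad
```

An empty result means the batch is safe with respect to the embedding lookup; a non-empty one tells you exactly which sequence and position carry the bad id.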


Could you describe what your solution to that error was? I am also facing the “CUDA error: device-side assert triggered” error when running inference with YOLOv8 tracking. Thank you.

I have the same error while trying to resume YOLOv8 training:
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Rerun your script via:

CUDA_LAUNCH_BLOCKING=1 python script.py args

and check the failing operation in the reported stack trace as already mentioned in this topic.


Thanks, your comment made me re-check that the tokenizer and the model come from the same repo.

Run your code on the CPU and you will see the actual error.
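On the CPU an invalid index raises an ordinary Python exception (for an out-of-range embedding id, an `IndexError`) at the operation that caused it. A minimal sketch for re-running one suspect batch that way, assuming the batch maps input names such as `input_ids` and `attention_mask` to nested lists of ids and that the model accepts them as keyword arguments; `rerun_on_cpu` is a hypothetical helper:

```python
import torch


def rerun_on_cpu(model, batch):
    """Re-run a single (suspected bad) batch on the CPU.

    Invalid indexing then fails synchronously with a readable Python
    exception instead of an asynchronous device-side assert.
    """
    model = model.to("cpu").eval()  # note: nn.Module.to() moves the model in place
    inputs = {name: torch.tensor(ids, dtype=torch.long) for name, ids in batch.items()}
    with torch.no_grad():
        return model(**inputs)
```

This only helps if the batch itself is the cause. Since the context may be unusable after an assert, it is safest to do this in a fresh process, for example on a batch saved to disk.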


I encountered the same error. These steps helped me resolve it:

  1. Add these two lines at the top of the first file that is executed, ideally before importing torch so the variable is set before CUDA is initialized. This made the stack trace show the exact line of code in the current file that was causing the problem:
    import os
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

  2. In my case, the error was in this line:
    channel_select_filtered_positive = all_filtered.view(-1)[indices.long()].view(1, height, width)
    Changing it to
    channel_select_filtered_positive = all_filtered.view(-1)[indices.int()].view(1, height, width)
    resolved the error.

I would recommend understanding the fix in detail, as it seems the conversion from long to int (and thus the corresponding reduction of the numerical range) “fixed” the issue, while the clipping appears to be only a side effect.
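Rather than relying on a dtype change, the indices can be checked explicitly before the flat indexing, which turns a silent wrap-around or a device-side assert into a readable error. A minimal sketch, assuming the indices are meant to be non-negative flat positions into `all_filtered`; `checked_flat_index` is a hypothetical helper:

```python
import torch


def checked_flat_index(t: torch.Tensor, indices: torch.Tensor) -> torch.Tensor:
    """Return t.reshape(-1)[indices] after checking every index is in [0, numel).

    Negative values (for example, large int64 values that may have wrapped
    around after an int32 cast) are treated as errors here rather than as
    Python-style "count from the end" indices.
    """
    flat = t.reshape(-1)  # like view(-1), but also works for non-contiguous tensors
    idx = indices.long()
    n = flat.numel()
    if idx.numel() > 0:
        lo, hi = idx.min().item(), idx.max().item()  # forces a device sync on the GPU
        if lo < 0 or hi >= n:
            raise IndexError(f"indices span [{lo}, {hi}] but the tensor has {n} elements")
    return flat[idx]
```

With this, the failing line would become `checked_flat_index(all_filtered, indices).view(1, height, width)`, and the error message shows how far out of range the indices actually are.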

I’m facing the same issue in the forward pass of my Llama 3 Transformer.
With the CUDA debugging environment variable set, the error is reported at this line:

def forward(self, tokens: torch.Tensor):
    """Perform a forward pass through the Transformer model.

    Args:
        tokens (torch.Tensor): Input token indices.

    Returns:
        torch.Tensor: Output logits after applying the Transformer model.

    """
    # ERROR IS RAISED HERE
    # passthrough for nonexistent layers, allows easy configuration of pipeline parallel stages
    h = self.tok_embeddings(tokens) if self.tok_embeddings else tokens

    for layer in self.layers.values():
        h = layer(h, self.freqs_cis)

    h = self.norm(h) if self.norm else h
    return self.output(h).float() if self.output else h

How did you fix it?
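When the assert is reported at the embedding lookup, a common cause is a token id that is negative or not smaller than the number of rows in the embedding table. A minimal guard, assuming `self.tok_embeddings` is an `nn.Embedding`; `check_token_range` is a hypothetical helper that could be called at the top of `forward` while debugging:

```python
import torch
import torch.nn as nn


def check_token_range(tokens: torch.Tensor, embedding: nn.Embedding) -> None:
    """Raise a readable error if any token id is invalid for `embedding`."""
    if tokens.numel() == 0:
        return
    vocab = embedding.num_embeddings
    lo, hi = tokens.min().item(), tokens.max().item()  # forces a device sync on the GPU
    if lo < 0 or hi >= vocab:
        raise ValueError(
            f"token ids span [{lo}, {hi}] but the embedding table has {vocab} rows"
        )
```

For example, `if self.tok_embeddings: check_token_range(tokens, self.tok_embeddings)` before the lookup. The sync makes it too slow to keep in a hot loop, so it is meant as a temporary check.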

It seems that when using any tokenizer from Hugging Face, it must be initialized and run on a CPU device. Running it directly on a GPU device might cause device-side assertion errors.

To potentially resolve this issue, I would suggest changing your code from

input_ids = torch.tensor(batch['input_ids'], dtype=torch.long).to(device)

To

input_ids = torch.tensor(batch['input_ids'], dtype=torch.long)

This might solve the problem.
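Note that Hugging Face tokenizers always run on the CPU and return Python lists (or CPU tensors), and `torch.tensor(...)` creates the tensor on the CPU by default; `.to(device)` only copies it afterwards. PyTorch also requires the inputs and the model weights to be on the same device, so if `input_ids` stays on the CPU, the model has to run on the CPU too. A minimal sketch that builds the tensors on the CPU and then places them wherever the model lives; `batch_to_model_device` is a hypothetical helper:

```python
import torch


def batch_to_model_device(batch, model):
    """Build LongTensors on the CPU from tokenizer output, then move them
    to the device holding `model`'s parameters.

    Feeding CPU inputs to a GPU model raises a device-mismatch RuntimeError,
    so the two must be kept on the same device either way.
    """
    device = next(model.parameters()).device
    return {
        key: torch.tensor(value, dtype=torch.long).to(device)
        for key, value in batch.items()
    }
```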