Device-side assert triggered, cannot find the solution

When I try to calculate the loss between the model output and the label values, I get the exception below. There are some similar topics, but I still could not find a solution.
Can you help me?

File "/usr/local/lib/python3.6/site-packages/torch/tensor.py", line 71, in __repr__
return torch._tensor_str._str(self)
File "/usr/local/lib/python3.6/site-packages/torch/_tensor_str.py", line 286, in _str
tensor_str = _tensor_str(self, indent)
File "/usr/local/lib/python3.6/site-packages/torch/_tensor_str.py", line 201, in _tensor_str
formatter = _Formatter(get_summarized_data(self) if summarize else self)
File "/usr/local/lib/python3.6/site-packages/torch/_tensor_str.py", line 87, in __init__
nonzero_finite_vals = torch.masked_select(tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0))
RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/aten/src/THC/THCReduceAll.cuh:327

My code is:

for step,(x,y1) in enumerate(train_loader):
    
    x_data, y_model = x.to(device), y1.to(device)

    optimizer.zero_grad()

    output_model = model(x_data, h0)

    print("output_model.shape:", output_model.size())
    print("y_model.shape:", y_model.size())

    print("output_model:", output_model)
    print("y_model:", y_model)
    
    loss_model = loss_func(output_model, y_model)
    print("loss_model:", loss_model)    # the error occurred on this line

The printed result is:
output_model.shape: torch.Size([64, 2])
y_model.shape: torch.Size([64])

output_model: tensor([[-0.9913, -0.4638],
[-0.9701, -0.4765],
[-1.0105, -0.4526],
[-0.9497, -0.4891],
[-1.0548, -0.4281],
[-1.1764, -0.3687],
[-0.9274, -0.5035],
[-0.9197, -0.5086],…
[-0.9032, -0.5197]], device='cuda:0', grad_fn=<…>)

y_model: tensor([3, 3, 2, 1, 0, 3, 2, 2, 1, 3, 1, 3, 1, 1, 1, 1, 0, 3, 1, 1, 0, 1, 2, 2,
2, 1, 0, 1, 0, 2, 2, 2, 0, 3, 1, 1, 1, 1, 3, 2, 2, 2, 1, 1, 1, 3, 3, 0,
2, 2, 2, 3, 3, 0, 0, 2, 3, 0, 3, 2, 2, 3, 3, 2], device='cuda:0')

I don’t know exactly why the crash is happening, but your model outputs and labels don’t match: the output has 2 columns while the labels contain values up to 3. Are you trying to do binary classification? Then I suspect you want to change the model to have only one output instead of two.
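For what it’s worth, this particular assert is typically raised by `NLLLoss`/`CrossEntropyLoss` when a target index falls outside `[0, num_classes - 1]` — here the labels contain 2 and 3, but the output only has 2 columns. A minimal sketch (with made-up tensors) that reproduces the failure on CPU, where the error message is readable instead of a device-side assert:

```python
import torch
import torch.nn.functional as F

# 64 samples, but only 2 output classes, as in the post above
output = F.log_softmax(torch.randn(64, 2), dim=1)

# Labels include 2 and 3, which are out of range for a 2-class output
labels = torch.tensor([0, 1, 2, 3] * 16)

try:
    F.nll_loss(output, labels)
except (RuntimeError, IndexError):
    # On CUDA the same condition surfaces as "device-side assert triggered"
    print("target index out of bounds for a 2-class output")
```

Running the failing step on CPU (or with `CUDA_LAUNCH_BLOCKING=1`) is generally the quickest way to see which index is out of bounds.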

Yes, it is a binary classification problem. But I have not done one-hot encoding for the model output.
The last operation of the model is:

F.log_softmax(pred_model)

Is there some problem?
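One thing to check: `F.log_softmax` without an explicit `dim` is deprecated and can pick the wrong axis. Also, since the printed labels span 0–3, a setup that `nn.NLLLoss` accepts would need four output units. A minimal sketch with hypothetical tensors, assuming four classes:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

loss_func = nn.NLLLoss()

# 64 samples; labels 0-3 imply 4 classes, so the head must emit 4 values
logits = torch.randn(64, 4)            # stand-in for the model output
output = F.log_softmax(logits, dim=1)  # explicit dim avoids the deprecation warning
labels = torch.randint(0, 4, (64,))    # targets must lie in [0, num_classes - 1]

loss = loss_func(output, labels)
print(loss.item())  # a finite scalar
```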

Thank you.
I checked the model outputs and labels again, and the problem is that they are not the same shape.
I have fixed the bug.
Thanks