CUDA exception: "Event device type CUDA does not match blocking stream's device type CPU"

Hi,

I ran into an exception while setting up CUDA.

GPU: GeForce RTX 3090 Ti
CUDA ver.: 11.6
cuDNN ver.: 8.5
torch ver.: 1.12.1+cu116

Exception: ================================================================

Traceback (most recent call last):
  File "C:\Program Files\Python39\lib\code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 9, in <module>
  File "D:\Projects\PycharmProjects\scat\scat\analysis\neural_net\torch_wrapper.py", line 97, in train
    loss.backward()
  File "D:\Projects\PycharmProjects\venv\scat-venv\lib\site-packages\torch\_tensor.py", line 396, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "D:\Projects\PycharmProjects\venv\scat-venv\lib\site-packages\torch\autograd\__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: Event device type CUDA does not match blocking stream's device type CPU.

=========================================================================

A simple test code like

a = torch.FloatTensor([1, 2, 3, 4, 5]).to('cuda:0')
b = torch.FloatTensor([11, 22, 33, 44, 55]).to('cuda:0')
c = a + b
print(c)

runs without any exceptions.

However, only loss.backward() raises this exception in my MLP code.
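
For reference, here is a minimal autograd check (just a rough sketch, not taken from my project code) that I assume would show whether a bare backward() on the GPU already reproduces the error:

import torch

# hypothetical minimal check: does a bare backward() on CUDA already fail?
x = torch.randn(5, requires_grad=True, device='cuda:0')
loss = (x * 2).sum()
loss.backward()   # if this raises the same RuntimeError, the issue is not specific to the MLP
print(x.grad)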

How can I solve this problem?

Could you post a minimal, executable code snippet to reproduce the issue, please?

Here is my code.

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
from sklearn.model_selection import train_test_split


class MLP(nn.Module):
    def __init__(self, samples: int) -> None:
        super(MLP, self).__init__()
        self.no_samples = samples
        self.fc1 = nn.Linear(self.no_samples, 300, bias=True)
        self.dropout1 = nn.Dropout(0.2)
        self.fc2 = nn.Linear(300, 300, bias=True)
        self.dropout2 = nn.Dropout(0.2)
        self.fc3 = nn.Linear(300, 256, bias=True)

    def forward(self, x):
        x = self.fc1(x)
        x = self.dropout1(x)
        x = torch.sigmoid(self.fc2(x))
        x = self.dropout2(x)
        x = torch.sigmoid(self.fc3(x))
        return x


class TensorData(Dataset):
    def __init__(self, x_data, y_data):
        self.x_data = torch.FloatTensor(x_data)
        self.y_data = torch.LongTensor(y_data)
        self.len = self.y_data.shape[0]

    def __getitem__(self, index):
        return self.x_data[index], self.y_data[index]

    def __len__(self):
        return self.len

    def __add__(self, other):
        return super().__add__(other)


#  skip
# x = np.load(...)
# y = np.load(...)

x_train, x_val, y_train, y_val = train_test_split(x, y, test_size=0.7, random_state=123)
# x_train, x_val, y_train, y_val = train_test_split(x_data, y_data, test_size=0.7, random_state=123)
train_loader = DataLoader(TensorData(x_train, y_train), batch_size=100, shuffle=False)
val_loader = DataLoader(TensorData(x_val, y_val), batch_size=100, shuffle=False)

# device = 'cpu'
device = 'cuda:0'

model = MLP(1500).to(device)
optimizer = torch.optim.Adam(model.parameters(), 0.001)

for epoch in range(100):
    model.train()
    batch_loss = []
    for i, (train_batch_x, train_batch_y) in enumerate(train_loader):
        optimizer.zero_grad()
        prediction = model(train_batch_x.to(device))
        loss = torch.nn.CrossEntropyLoss()(prediction, train_batch_y.to(device))
        loss.backward()   # Error
        optimizer.step()
        batch_loss.append(loss.item())
    model.eval()
    # eval..
    print(epoch)

and the exception is as follows:

Traceback (most recent call last):
  File "C:\Users\daehyeon\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\221.6008.17\plugins\python\helpers\pydev\pydevd.py", line 1491, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Users\daehyeon\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\221.6008.17\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "D:/Projects/PycharmProjects/scat/pytorch cuda test.py", line 72, in <module>
    loss.backward()   # Error
  File "D:\Projects\PycharmProjects\venv\scat-venv\lib\site-packages\torch\_tensor.py", line 396, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "D:\Projects\PycharmProjects\venv\scat-venv\lib\site-packages\torch\autograd\__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: Event device type CUDA does not match blocking stream's device type CPU.

Thank you.

Thanks for the update. I cannot reproduce the issue and the code runs correctly using random input data:

x_train, x_val, y_train, y_val = np.random.randn(10, 1500), np.random.randn(10, 1500), np.random.randint(0, 10, (10,)), np.random.randint(0, 10, (10,))

and the latest stable PyTorch release (1.12.1).
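
It might also help to compare environments; a quick check like the one below (just a sketch using standard torch attributes) would show the versions and device actually being used on your side:

import torch

print(torch.__version__)                 # e.g. 1.12.1+cu116
print(torch.version.cuda)                # CUDA version PyTorch was built with
print(torch.backends.cudnn.version())    # cuDNN version
print(torch.cuda.get_device_name(0))     # active GPU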

After replacing the input data as below, the exception still occurred at the same point (loss.backward()).

x_train, x_val, y_train, y_val = np.random.randn(10, 1500), np.random.randn(10, 1500), np.random.randint(0, 10, (10,)), np.random.randint(0, 10, (10,))

I have encountered the same problem when calling loss.backward(). Have you solved your problem? If so, could you share how you solved it? Thanks very much.

I also encountered exactly the same problem, and only when calling loss.backward(). I am using an A100 with PyTorch 2.1 and CUDA 12.1, but I started seeing this problem with PyTorch 1.11; there was no issue at all with PyTorch 1.10.