CUDA RuntimeError: CUDA error: an illegal memory access was encountered

mbacher · November 2, 2020, 6:02am

Hi, I am using Colab with PyTorch version 1.6.0+cu101. When I try to move a model or a parameter to the GPU, I get the error below. I have tried several of the tips and fixes listed in the forum, but none of them has worked so far. Any ideas?
Thanks!

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device:', device)
g = nn.Parameter(torch.rand((100,128)), requires_grad=False)
if device.type=='cuda':
  g = g.to(device)
  print(g.is_cuda) # returns a boolean; an nn.Parameter is itself a tensor and has no .parameters() method

Here is the error message:

Using device: cuda
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-35-f9356a863a45> in <module>()
      4 g = nn.Parameter(torch.rand((100,128)), requires_grad=False)
      5 if device.type=='cuda':
----> 6   g = g.to(device)
      7   print(next(g.parameters()).is_cuda) # returns a boolean

RuntimeError: CUDA error: an illegal memory access was encountered

ptrblck · November 2, 2020, 6:53am

Have you tried to run the code with CUDA_LAUNCH_BLOCKING=1 python script.py args?
If so, could you post the stack trace here, please?
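In a notebook such as Colab there is no `python script.py` command line, so the variable has to be set from inside the session. Below is a minimal sketch. It assumes the cell runs in a fresh runtime, before anything has initialized CUDA. The IPython magic `%env CUDA_LAUNCH_BLOCKING=1` should have the same effect.

```python
import os

# Set this before PyTorch initializes CUDA, ideally before `import torch`.
# Once the CUDA context exists, changing it may no longer have any effect.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # imported after setting the variable on purpose

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.ones(4, 4, device=device)
# With CUDA, a failing kernel is now reported at the call that launched it
print(device, x.sum().item())
```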

mbacher · November 2, 2020, 8:48am

For some reason, after restarting Colab, I get a new error:

RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

ptrblck · November 2, 2020, 9:07am

Do you get this error message with the blocking command?
If so, could you post the complete stack trace?

mbacher · November 2, 2020, 9:18am

This is what I am trying to run now. Without .cuda() it runs OK.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Uniform, Normal
import torch.utils.data as tdata
import torch.optim as optim
import numpy as np


class DepthToSpace(nn.Module):
    def __init__(self, block_size):
        super().__init__()
        self.block_size = block_size
        self.block_size_sq = block_size * block_size
 
    def forward(self, input):
        output = input.permute(0, 2, 3, 1)
        (batch_size, d_height, d_width, d_depth) = output.size()
        s_depth = int(d_depth / self.block_size_sq)
        s_width = int(d_width * self.block_size)
        s_height = int(d_height * self.block_size)
        t_1 = output.reshape(batch_size, d_height, d_width, self.block_size_sq, s_depth)
        spl = t_1.split(self.block_size, 3)
        stack = [t_t.reshape(batch_size, d_height, s_width, s_depth) for t_t in spl]
        output = torch.stack(stack, 0).transpose(0, 1).permute(0, 2, 1, 3, 4).reshape(batch_size, s_height, s_width,
                                                                                      s_depth)
        output = output.permute(0, 3, 1, 2)
        return output

# Spatial Upsampling with Nearest Neighbors
class Upsample_Conv2d(nn.Module):
  def __init__(self, in_dim, out_dim, kernel_size=(3, 3), stride=1, padding=1):
    super(Upsample_Conv2d, self).__init__()
    self.depth_to_space = DepthToSpace(block_size=2)
    self.conv2d = nn.Conv2d(in_dim, out_dim, kernel_size, stride=stride, 
                            padding=padding)

  def forward(self, x):
    x_ = torch.cat([x, x, x, x], dim=1)
    x_ = self.depth_to_space(x_)
    x_ = self.conv2d(x_)
    return x_


class ReshapeGan(nn.Module):
  def __init__(self, out_shape):
    super(ReshapeGan, self).__init__()
    self.out_shape = out_shape
  
  def forward(self, x):
    b = x.shape[0]
    return x.view(b, *self.out_shape)

class Generator(nn.Module):
  def __init__(self, n_filters=128):
    super(Generator, self).__init__()
    self.n_filters = n_filters
    self.latent = torch.distributions.Normal(torch.tensor(0.), torch.tensor(1.))
    self.network = nn.Sequential(
        nn.Linear(n_filters, 4*4*256),
        ReshapeGan((256,4,4)),
        ResnetBlockUp(in_dim=256, n_filters=n_filters),
        ResnetBlockUp(in_dim=n_filters, n_filters=n_filters),
        ResnetBlockUp(in_dim=n_filters, n_filters=n_filters),
        nn.BatchNorm2d(n_filters),
        nn.ReLU(),
        nn.Conv2d(n_filters, 3, kernel_size=(3, 3), padding=1),
        nn.Tanh()
    )

  def forward(self, z):
    return self.network(z)

  def sample(self, n):
    z = self.latent.sample([n, self.n_filters])
    return self.forward(z)

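import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # a bare CUDA_LAUNCH_BLOCKING=1 only creates a Python variable; the env var must be set before CUDA is initialized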
g = Generator(128)
g = g.cuda() # HERE IT CRASHES
z = g.sample(10)
print(z.shape)

ptrblck · November 2, 2020, 9:19am

The code is not executable, since ResnetBlockUp is undefined.

mbacher · November 2, 2020, 9:22am

Oops, sorry:

class ResnetBlockUp(nn.Module):
  def __init__(self, in_dim, kernel_size=(3, 3), n_filters=256):
    super(ResnetBlockUp, self).__init__()
    self.layers = nn.ModuleList(
        [nn.BatchNorm2d(in_dim), 
         nn.ReLU(),
         nn.Conv2d(in_dim, n_filters, kernel_size, padding=1),
         nn.BatchNorm2d(n_filters), 
         nn.ReLU(),
         Upsample_Conv2d(n_filters, n_filters, kernel_size=kernel_size, padding=1)])
    self.shortcut = Upsample_Conv2d(in_dim, n_filters, 
                                    kernel_size=(1, 1), padding=0)

  def forward(self, x):
    x_ = x
    for layer in self.layers:
      x_ = layer(x_)
    x = self.shortcut(x)
    return x + x_

The strange thing is that it crashes with error code:

RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

right after resetting Colab, at the g.sample(10) line. If I then rerun the cell without resetting Colab, I get the illegal memory access error at the .cuda() line…

ptrblck · November 2, 2020, 9:50am

Thanks for the rest of the code.
While running it I get a proper error message:

RuntimeError: Tensor for argument #2 'mat1' is on CPU, but expected it to be on GPU (while checking arguments for addmm)

which points to a device mismatch in Generator.
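After checking the code, it seems that z is created on the CPU, while the parameters of the model are on the GPU: self.latent is a plain attribute built from CPU tensors, so g.cuda() does not move it (only parameters and buffers are moved).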
Use

z = self.latent.sample([n, self.n_filters]).cuda()

or

z = self.latent.sample([n, self.n_filters]).to(next(self.parameters()).device)

to fix this issue.
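Another option besides the two above is to register the distribution's parameters as buffers. `.cuda()` and `.to()` then move them together with the weights, and the sampled `z` lands on the right device automatically. This is a minimal sketch: `TinyGenerator` is a hypothetical stand-in for the `Generator` above, with a single `nn.Linear` instead of the ResNet blocks. Only the sampling logic matters here.

```python
import torch
import torch.nn as nn


class TinyGenerator(nn.Module):
    """Hypothetical minimal generator; only the sampling logic matters here."""

    def __init__(self, latent_dim=128, out_dim=16):
        super().__init__()
        self.latent_dim = latent_dim
        self.net = nn.Linear(latent_dim, out_dim)
        # Buffers follow .to()/.cuda() like parameters do, unlike a
        # torch.distributions object stored as a plain attribute.
        self.register_buffer("loc", torch.tensor(0.0))
        self.register_buffer("scale", torch.tensor(1.0))

    def sample(self, n):
        # Build the distribution from the buffers at sampling time, so the
        # samples are created on whatever device the module currently lives on.
        latent = torch.distributions.Normal(self.loc, self.scale)
        z = latent.sample((n, self.latent_dim))
        return self.net(z)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
g = TinyGenerator().to(device)
out = g.sample(10)
print(out.shape, out.device)
```

The design choice is to rebuild the `Normal` object on each call instead of caching it in `__init__`. A cached object would keep pointing at the tensors it was created with.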

mbacher · November 2, 2020, 12:09pm

Amazing help, thank you very much! I had simply missed moving the data to the GPU.
BTW, what should I have done to get that error message?
Again, thanks!

ptrblck · November 2, 2020, 7:14pm

It shouldn’t be the case, but maybe the notebook is somehow raising the “wrong” error message?
In any case, updating to the latest stable release (1.7) or the nightly build is a good idea and might help.
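You can also make a device mismatch fail early with a readable message, regardless of how asynchronous CUDA errors get reported. To do that, check devices explicitly before the forward pass. Below is a minimal sketch using PyTorch's `register_forward_pre_hook`. The helper names `find_device_mismatches` and `device_check_hook` are hypothetical. For the `Generator` above, the hook could be attached to its first layer, e.g. `g.network[0]`.

```python
import torch
import torch.nn as nn


def find_device_mismatches(module, inputs):
    """Return (index, device) for every tensor in `inputs` that is not on the
    same device as the module's first parameter (hypothetical helper)."""
    try:
        expected = next(module.parameters()).device
    except StopIteration:  # module without parameters: nothing to compare
        return []
    return [(i, t.device) for i, t in enumerate(inputs)
            if isinstance(t, torch.Tensor) and t.device != expected]


def device_check_hook(module, inputs):
    # Forward pre-hooks receive the positional inputs as a tuple and run
    # before module.forward, so no kernel is launched on mismatched data.
    mismatches = find_device_mismatches(module, inputs)
    if mismatches:
        expected = next(module.parameters()).device
        raise RuntimeError(
            f"{type(module).__name__}: parameters are on {expected}, "
            f"but inputs {mismatches} are on a different device")


model = nn.Linear(128, 16)
model.register_forward_pre_hook(device_check_hook)
out = model(torch.randn(4, 128))  # same device as the weights: passes
print(out.shape)
```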

pelin_tas · November 4, 2020, 7:21pm

Can you help me?
I’m trying to write code for disease diagnosis from retinal images. I have tried different versions, but each one fails with a different error.

First attempt:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import torch.optim as optim
import numpy as np
train_transforms = transforms.Compose([transforms.Grayscale(),
                                       transforms.Resize(255),
                                       transforms.RandomRotation(38),
                                       transforms.RandomResizedCrop(224),
                                       transforms.ToTensor(),
                                       transforms.Normalize([0.5],[0.5])])
test_transforms=transforms.Compose([transforms.Grayscale(),
                                    transforms.Resize(255),
                                    transforms.CenterCrop(244),
                                    transforms.ToTensor()])
train_dataset=torchvision.datasets.ImageFolder(root='C:/OCT2017/train', transform=train_transforms)
test_dataset=torchvision.datasets.ImageFolder(root='C:/OCT2017/test', transform=test_transforms)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size = 100, shuffle = False)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size = 100, shuffle = False)
class NN(nn.Module):
    def __init__(self, input_size, num_classes):
        super(NN, self).__init__()
        self.fc1=nn.Linear(input_size,50)
    def forward(self,x):
        x=F.relu(self.fc1(x))
        return x

class OCTModel(nn.Module):
    def __init__(self,in_channels=1, num_classes =4):
        super(OCTModel, self).__init__()
        # Convolution 1
        self.conv1= nn.Conv2d(in_channels=1, out_channels=32, kernel_size=(3,3), stride=(1))
        self.relu1 = nn.ReLU()
        self.pool = nn.MaxPool2d(kernel_size=2)
        self.dropout1=nn.Dropout(p=0.2)
        self.relu2=nn.ReLU()
        # NOTE: for 224x224 inputs the flattened size after conv1 + pool is 32*111*111, not 256
        self.fc1 = nn.Linear(256,num_classes)

    def forward(self, x):
        # Convolution 1
        x = self.conv1(x)
        x = self.relu1(x)
        x = self.pool(x)
        x = self.dropout1(x)
        x = self.relu2(x)
        x = x.reshape(x.shape[0], -1)
        x = self.fc1(x)

        return x

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
in_channels=1
num_classes=4
learning_rate=0.001
batch_size=10
num_epochs=1
model = OCTModel().to(device)
Criterion=nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for epoch in range (num_epochs):
    for batch_idx, (data,targets) in enumerate(train_loader,0):
        data=data.to(device=device)
        targets=targets.to(device=device)
        scores=model(data)
        loss = Criterion(scores,targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
def check_accuracy (loader, model):
    # NOTE: DataLoader has .dataset (not .datasets), and ImageFolder has no .train attribute
    if loader.datasets.train:
        print ("Checking accuracy on training data")
    else:
        print("Checking accuracy on test data")
    num_correct=0
    num_samples=0
    model.eval()
    with torch.no_grad():
        for x, y in loader:
            x=x.to(device=device)
            y=y.to(device=device)
            scores=model(x)
            _, predictions = scores.max(1)
            num_correct += (predictions == y).sum()
            num_samples += predictions.size(0)
        print(f'Got {num_correct} / {num_samples} with accuracy {float(num_correct)/float(num_samples)*100:.2f}')
    model.train()
check_accuracy(train_loader,model)
check_accuracy(test_loader,model)

RuntimeError: CUDA out of memory. Tried to allocate 152.00 MiB (GPU 0; 2.00 GiB total capacity; 1.20 GiB already allocated; 121.74 MiB free; 9.86 MiB cached)

Second attempt:

class OCTModel(nn.Module):
    def __init__(self,in_channels=1, num_classes =4):
        super(OCTModel, self).__init__()
        # Convolution 1
        self.conv1= nn.Conv2d(in_channels=1, out_channels=32, kernel_size=(3,3), stride=(1))
        self.relu1 = nn.ReLU()
        self.pool = nn.MaxPool2d(kernel_size=2)
        self.dropout1=nn.Dropout(p=0.2)
        self.relu2=nn.ReLU()
        self.fc1 = nn.Linear(256,num_classes)

    def forward(self, x):
        # Convolution 1
        x = self.conv1(x)
        x = self.relu1(x)
        x = self.pool(x)
        x = self.dropout1(x)
        x = self.relu2(x)
        x = self.fc1(x)

        return x

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
input_size=784
num_classes = 4
learning_rate=0.001
batch_size=32
num_epochs=1
train_transforms = transforms.Compose([transforms.Grayscale(),
                                       transforms.Resize(255),
                                       transforms.RandomRotation(38),
                                       transforms.RandomResizedCrop(224),
                                       transforms.ToTensor(),
                                       transforms.Normalize([0.5],[0.5])])
test_transforms=transforms.Compose([transforms.Grayscale(),
                                    transforms.Resize(255),
                                    transforms.CenterCrop(244),
                                    transforms.ToTensor()])
train_dataset=torchvision.datasets.ImageFolder(root='C:/OCT2017/train', transform=train_transforms)
test_dataset=torchvision.datasets.ImageFolder(root='C:/OCT2017/test', transform=test_transforms)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size = 32, shuffle = False)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size = 32, shuffle = False)
model = OCTModel().to(device)
Criterion=nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for epoch in range (num_epochs):
    for batch_idx, (data,targets) in enumerate(train_loader,0):
        data=data.to(device=device)
        targets=targets.to(device=device)
        data = data.reshape(data.shape[0],-1)
        scores=model(data)
        loss = Criterion(scores,targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
def check_accuracy (loader, model):
    # NOTE: DataLoader has .dataset (not .datasets), and ImageFolder has no .train attribute
    if loader.datasets.train:
        print ("Checking accuracy on training data")
    else:
        print("Checking accuracy on test data")
    num_correct=0
    num_samples=0
    model.eval()
    with torch.no_grad():
        for x, y in loader:
            x=x.to(device=device)
            y=y.to(device=device)
            x=x.reshape(x.shape[0],-1)
            scores=model(x.unsqueeze(dim=1))
            _, predictions = scores.max(1)
            num_correct += (predictions == y).sum()
            num_samples += predictions.size(0)
        print(f'Got {num_correct} / {num_samples} with accuracy {float(num_correct)/float(num_samples)*100:.2f}')
    model.train()
check_accuracy(train_loader,model)
check_accuracy(test_loader,model)

RuntimeError: Expected 4-dimensional input for 4-dimensional weight 32 1, but got 2-dimensional input of size [32, 50176] instead

ptrblck · November 5, 2020, 2:50am

The error message is:

RuntimeError: Expected 4-dimensional input for 4-dimensional weight 32 1, but got 2-dimensional input of size [32, 50176] instead

which doesn’t point towards an illegal memory access, but a shape mismatch.
Most likely you are trying to pass a 2-dimensional input to e.g. an nn.Conv2d layer, which expects an input in the shape [batch_size, channels, height, width].
Since this error is unrelated, feel free to create a new topic, if you get stuck.
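The shape issue is easy to reproduce in isolation. The sketch below uses a 224x224 grayscale batch, as produced by `Grayscale()` + `ToTensor()` after the 224 crop. It uses a batch size of 2 instead of the loader's 32 to keep it light, which only changes the first dimension. The exact error wording depends on the PyTorch version.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3)

# 4-D batch: [batch_size, channels, height, width]
images = torch.randn(2, 1, 224, 224)
print(conv(images).shape)  # torch.Size([2, 32, 222, 222])

# Flattening first, as data.reshape(data.shape[0], -1) does,
# leaves a 2-D tensor of shape [batch_size, 224 * 224]
flat = images.reshape(images.shape[0], -1)
print(flat.shape)  # torch.Size([2, 50176])

error = None
try:
    conv(flat)  # nn.Conv2d rejects the 2-D input
except RuntimeError as err:
    error = err
print("RuntimeError:", error)
```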

PS: you can post code snippets by wrapping them into three backticks ```, which would make debugging easier.

pelin_tas · November 5, 2020, 11:52am

The database has four categories of grayscale retinal images. I guess I’m making a mistake here. I’m very sorry, but I don’t quite understand what to do.


ptrblck · November 5, 2020, 7:19pm

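Your input images should have the mentioned shape ([batch_size, channels, height, width]), while yours are 2-dimensional tensors: data.reshape(data.shape[0], -1) flattens each batch to [batch_size, height * width] before it reaches conv1.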
Have a look at this tutorial for an example on how to build a model and how the data tensors should be created.
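As a rough sketch of how the posted `OCTModel` could be restructured, the model keeps the input 4-D through the conv/pool part and flattens only right before the linear layer. `OCTModelSketch` is a hypothetical name. It keeps the posted layer choices and derives `in_features` from a dummy forward pass instead of hard-coding 256. It assumes 224x224 inputs, matching `RandomResizedCrop(224)`; the test transform's `CenterCrop(244)` would have to produce the same size. For 224x224 images, the 3x3 convolution without padding gives 222x222, and the 2x2 max-pool gives 111x111. The classifier therefore sees 32 * 111 * 111 = 394,272 features.

```python
import torch
import torch.nn as nn


class OCTModelSketch(nn.Module):
    """Hypothetical rework of the posted OCTModel: same layers, but the
    flattened size is computed instead of hard-coded."""

    def __init__(self, in_channels=1, num_classes=4, image_size=224):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),
            nn.Dropout(p=0.2),
        )
        # Run one dummy 4-D image through the feature extractor to find out
        # how many values reach the linear layer.
        with torch.no_grad():
            dummy = torch.zeros(1, in_channels, image_size, image_size)
            n_features = self.features(dummy).numel()
        self.classifier = nn.Linear(n_features, num_classes)

    def forward(self, x):
        # x must be 4-D: [batch_size, channels, height, width]; do not reshape
        # it in the training loop, flatten only after the conv/pool layers.
        x = self.features(x)
        x = torch.flatten(x, start_dim=1)
        return self.classifier(x)


model = OCTModelSketch()
batch = torch.randn(2, 1, 224, 224)  # shaped like a DataLoader batch
scores = model(batch)
print(scores.shape)  # torch.Size([2, 4])
```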