CUDA RuntimeError: CUDA error: an illegal memory access was encountered

Hi, I am using Colab with PyTorch version 1.6.0+cu101. When I try to move a model or a parameter to the GPU, I get the error below. I have tried several of the tips and fixes listed in the forum, but none has worked so far. Any ideas?
Thanks!

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device:', device)
g = nn.Parameter(torch.rand((100,128)), requires_grad=False)
if device.type=='cuda':
  g = g.to(device)
  print(next(g.parameters()).is_cuda) # returns a boolean

Here is the error message:

Using device: cuda
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-35-f9356a863a45> in <module>()
      4 g = nn.Parameter(torch.rand((100,128)), requires_grad=False)
      5 if device.type=='cuda':
----> 6   g = g.to(device)
      7   print(next(g.parameters()).is_cuda) # returns a boolean

RuntimeError: CUDA error: an illegal memory access was encountered

Have you tried to run the code with CUDA_LAUNCH_BLOCKING=1 python script.py args?
If so, could you post the stack trace here, please?

For some reason, after restarting the Colab runtime, I get a new error:

RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

Do you get this error message with the blocking command?
If so, could you post the complete stack trace?

This is what I am trying to run now. Without .cuda() it runs OK.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Uniform, Normal
import torch.utils.data as tdata
import torch.optim as optim
import numpy as np


class DepthToSpace(nn.Module):
    def __init__(self, block_size):
        super().__init__()
        self.block_size = block_size
        self.block_size_sq = block_size * block_size
 
    def forward(self, input):
        output = input.permute(0, 2, 3, 1)
        (batch_size, d_height, d_width, d_depth) = output.size()
        s_depth = int(d_depth / self.block_size_sq)
        s_width = int(d_width * self.block_size)
        s_height = int(d_height * self.block_size)
        t_1 = output.reshape(batch_size, d_height, d_width, self.block_size_sq, s_depth)
        spl = t_1.split(self.block_size, 3)
        stack = [t_t.reshape(batch_size, d_height, s_width, s_depth) for t_t in spl]
        output = torch.stack(stack, 0).transpose(0, 1).permute(0, 2, 1, 3, 4).reshape(batch_size, s_height, s_width,
                                                                                      s_depth)
        output = output.permute(0, 3, 1, 2)
        return output

# Spatial Upsampling with Nearest Neighbors
class Upsample_Conv2d(nn.Module):
  def __init__(self, in_dim, out_dim, kernel_size=(3, 3), stride=1, padding=1):
    super(Upsample_Conv2d, self).__init__()
    self.depth_to_space = DepthToSpace(block_size=2)
    self.conv2d = nn.Conv2d(in_dim, out_dim, kernel_size, stride=stride, 
                            padding=padding)

  def forward(self, x):
    x_ = torch.cat([x, x, x, x], dim=1)
    x_ = self.depth_to_space(x_)
    x_ = self.conv2d(x_)
    return x_


class ReshapeGan(nn.Module):
  def __init__(self, out_shape):
    super(ReshapeGan, self).__init__()
    self.out_shape = out_shape
  
  def forward(self, x):
    b = x.shape[0]
    return x.view(b, *self.out_shape)

class Generator(nn.Module):
  def __init__(self, n_filters=128):
    super(Generator, self).__init__()
    self.n_filters = n_filters
    self.latent = torch.distributions.Normal(torch.tensor(0.), torch.tensor(1.))
    self.network = nn.Sequential(
        nn.Linear(n_filters, 4*4*256),
        ReshapeGan((256,4,4)),
        ResnetBlockUp(in_dim=256, n_filters=n_filters),
        ResnetBlockUp(in_dim=n_filters, n_filters=n_filters),
        ResnetBlockUp(in_dim=n_filters, n_filters=n_filters),
        nn.BatchNorm2d(n_filters),
        nn.ReLU(),
        nn.Conv2d(n_filters, 3, kernel_size=(3, 3), padding=1),
        nn.Tanh()
    )

  def forward(self, z):
    return self.network(z)

  def sample(self, n):
    z = self.latent.sample([n, self.n_filters])
    return self.forward(z)

CUDA_LAUNCH_BLOCKING=1
g = Generator(128)
g = g.cuda() # HERE IT CRASHES
z = g.sample(10)
print(z.shape)

The code is not executable, since ResnetBlockUp is undefined.

Oops, sorry:

class ResnetBlockUp(nn.Module):
  def __init__(self, in_dim, kernel_size=(3, 3), n_filters=256):
    super(ResnetBlockUp, self).__init__()
    self.layers = nn.ModuleList(
        [nn.BatchNorm2d(in_dim), 
         nn.ReLU(),
         nn.Conv2d(in_dim, n_filters, kernel_size, padding=1),
         nn.BatchNorm2d(n_filters), 
         nn.ReLU(),
         Upsample_Conv2d(n_filters, n_filters, kernel_size=kernel_size, padding=1)])
    self.shortcut = Upsample_Conv2d(in_dim, n_filters, 
                                    kernel_size=(1, 1), padding=0)

  def forward(self, x):
    x_ = x
    for layer in self.layers:
      x_ = layer(x_)
    x = self.shortcut(x)
    return x + x_

The strange thing is that it crashes with error code:

RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

right after restarting the Colab runtime, at the g.sample(10) line. If I then rerun the cell without restarting, I get the illegal memory access error at the .cuda() line instead.

Thanks for the rest of the code.
While running it I get a proper error message:

RuntimeError: Tensor for argument #2 'mat1' is on CPU, but expected it to be on GPU (while checking arguments for addmm)

which points to a device mismatch in Generator.
After checking the code it seems that z is created on the CPU, while the parameters of the model are on the GPU.
Use

z = self.latent.sample([n, self.n_filters]).cuda()

or

z = self.latent.sample([n, self.n_filters]).to(next(self.parameters()).device)

to fix this issue.
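A minimal sketch of the second variant, assuming a stripped-down stand-in for the Generator (the class name TinyGenerator and its single linear layer are hypothetical, not the original model). It also swaps torch.distributions.Normal for torch.randn, which draws from the same standard normal but accepts a device argument directly:

```python
import torch
import torch.nn as nn


class TinyGenerator(nn.Module):
    """Hypothetical stand-in for the Generator above, reduced to one layer."""

    def __init__(self, n_filters=128):
        super().__init__()
        self.n_filters = n_filters
        self.network = nn.Linear(n_filters, 16)

    def forward(self, z):
        return self.network(z)

    def sample(self, n):
        # Look up the device the parameters currently live on, so the
        # latent tensor follows the model wherever .to()/.cuda() moved it.
        device = next(self.parameters()).device
        z = torch.randn(n, self.n_filters, device=device)
        return self.forward(z)


device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
g = TinyGenerator(128).to(device)
out = g.sample(10)
print(out.shape, out.device)
```

Because the device is looked up at sampling time, the same code runs unchanged on CPU-only machines.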

Amazing help! Thank you very much, I had simply forgotten to move the data to the GPU.
BTW, what should I have done in order to get this error message?
Again, thanks!

It shouldn't be the case, but maybe the notebook is somehow raising the "wrong" error message?
In any case, updating to the latest stable release (1.7) or the nightly is also a good idea and might help.
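One possible reason the blocking flag had no effect in the notebook: a bare CUDA_LAUNCH_BLOCKING=1 line in a Python cell only creates a Python variable, it does not set the environment variable. A minimal sketch of setting it from inside a notebook, assuming the cell runs in a freshly restarted runtime before any CUDA work has happened:

```python
import os

# Set the environment variable for the current process. This has to happen
# before CUDA is initialized, so run it first in a freshly restarted runtime.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Only afterwards import torch and move the model to the GPU, e.g.:
#   import torch
#   g = Generator(128).cuda()
print(os.environ["CUDA_LAUNCH_BLOCKING"])
```

With kernel launches made synchronous, the Python stack trace should point at the operation that actually failed rather than at a later, unrelated call.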

Can you help me?
I'm trying to write code for disease diagnosis from retinal images. I have tried different versions of the code, but each one gives a different error.

First code:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import torch.optim as optim
import numpy as np

train_transforms = transforms.Compose([transforms.Grayscale(),
                                       transforms.Resize(255),
                                       transforms.RandomRotation(38),
                                       transforms.RandomResizedCrop(224),
                                       transforms.ToTensor(),
                                       transforms.Normalize([0.5], [0.5])])
test_transforms = transforms.Compose([transforms.Grayscale(),
                                      transforms.Resize(255),
                                      transforms.CenterCrop(244),
                                      transforms.ToTensor()])
train_dataset = torchvision.datasets.ImageFolder(root='C:/OCT2017/train', transform=train_transforms)
test_dataset = torchvision.datasets.ImageFolder(root='C:/OCT2017/test', transform=test_transforms)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=100, shuffle=False)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=100, shuffle=False)


class NN(nn.Module):
    def __init__(self, input_size, num_classes):
        super(NN, self).__init__()
        self.fc1 = nn.Linear(input_size, 50)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        return x


class OCTModel(nn.Module):
    def __init__(self, in_channels=1, num_classes=4):
        super(OCTModel, self).__init__()
        # Convolution 1
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=(3, 3), stride=(1))
        self.relu1 = nn.ReLU()
        self.pool = nn.MaxPool2d(kernel_size=2)
        self.dropout1 = nn.Dropout(p=0.2)
        self.relu2 = nn.ReLU()
        self.fc1 = nn.Linear(256, num_classes)

    def forward(self, x):
        # Convolution 1
        x = self.conv1(x)
        x = self.relu1(x)
        x = self.pool(x)
        x = self.dropout1(x)
        x = self.relu2(x)
        x = x.reshape(x.shape[0], -1)
        x = self.fc1(x)

        return x


device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
in_channels = 1
num_classes = 4
learning_rate = 0.001
batch_size = 10
num_epochs = 1
model = OCTModel().to(device)
Criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

for epoch in range(num_epochs):
    for batch_idx, (data, targets) in enumerate(train_loader, 0):
        data = data.to(device=device)
        targets = targets.to(device=device)
        scores = model(data)
        loss = Criterion(scores, targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()


def check_accuracy(loader, model):
    if loader.datasets.train:
        print("Checking accuracy on training data")
    else:
        print("Checking accuracy on test data")
    num_correct = 0
    num_samples = 0
    model.eval()
    with torch.no_grad():
        for x, y in loader:
            x = x.to(device=device)
            y = y.to(device=device)
            scores = model(x)
            _, predictions = scores.max(1)
            num_correct += (predictions == y).sum()
            num_samples += predictions.size(0)
        print(f'Got {num_correct} / {num_samples} with accuracy {float(num_correct)/float(num_samples)*100:.2f}')
    model.train()


check_accuracy(train_loader, model)
check_accuracy(test_loader, model)

RuntimeError: CUDA out of memory. Tried to allocate 152.00 MiB (GPU 0; 2.00 GiB total capacity; 1.20 GiB already allocated; 121.74 MiB free; 9.86 MiB cached)

Second code:

class OCTModel(nn.Module):
    def __init__(self, in_channels=1, num_classes=4):
        super(OCTModel, self).__init__()
        # Convolution 1
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=(3, 3), stride=(1))
        self.relu1 = nn.ReLU()
        self.pool = nn.MaxPool2d(kernel_size=2)
        self.dropout1 = nn.Dropout(p=0.2)
        self.relu2 = nn.ReLU()
        self.fc1 = nn.Linear(256, num_classes)

    def forward(self, x):
        # Convolution 1
        x = self.conv1(x)
        x = self.relu1(x)
        x = self.pool(x)
        x = self.dropout1(x)
        x = self.relu2(x)
        x = self.fc1(x)

        return x


device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
input_size = 784
num_classes = 4
learning_rate = 0.001
batch_size = 32
num_epochs = 1
train_transforms = transforms.Compose([transforms.Grayscale(),
                                       transforms.Resize(255),
                                       transforms.RandomRotation(38),
                                       transforms.RandomResizedCrop(224),
                                       transforms.ToTensor(),
                                       transforms.Normalize([0.5], [0.5])])
test_transforms = transforms.Compose([transforms.Grayscale(),
                                      transforms.Resize(255),
                                      transforms.CenterCrop(244),
                                      transforms.ToTensor()])
train_dataset = torchvision.datasets.ImageFolder(root='C:/OCT2017/train', transform=train_transforms)
test_dataset = torchvision.datasets.ImageFolder(root='C:/OCT2017/test', transform=test_transforms)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=False)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=32, shuffle=False)
model = OCTModel().to(device)
Criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

for epoch in range(num_epochs):
    for batch_idx, (data, targets) in enumerate(train_loader, 0):
        data = data.to(device=device)
        targets = targets.to(device=device)
        data = data.reshape(data.shape[0], -1)
        scores = model(data)
        loss = Criterion(scores, targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()


def check_accuracy(loader, model):
    if loader.datasets.train:
        print("Checking accuracy on training data")
    else:
        print("Checking accuracy on test data")
    num_correct = 0
    num_samples = 0
    model.eval()
    with torch.no_grad():
        for x, y in loader:
            x = x.to(device=device)
            y = y.to(device=device)
            x = x.reshape(x.shape[0], -1)
            scores = model(x.unsqueeze(dim=1))
            _, predictions = scores.max(1)
            num_correct += (predictions == y).sum()
            num_samples += predictions.size(0)
        print(f'Got {num_correct} / {num_samples} with accuracy {float(num_correct)/float(num_samples)*100:.2f}')
    model.train()


check_accuracy(train_loader, model)
check_accuracy(test_loader, model)

RuntimeError: Expected 4-dimensional input for 4-dimensional weight 32 1, but got 2-dimensional input of size [32, 50176] instead

The error message is:

RuntimeError: Expected 4-dimensional input for 4-dimensional weight 32 1, but got 2-dimensional input of size [32, 50176] instead

which doesn’t point towards an illegal memory access, but a shape mismatch.
Most likely you are trying to pass a 2-dimensional input to e.g. an nn.Conv2d layer, which expects an input in the shape [batch_size, channels, height, width].
Since this error is unrelated, feel free to create a new topic, if you get stuck.
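As a quick check of where the numbers in the message come from: 50176 is 224 * 224, i.e. one grayscale 224x224 crop flattened by the data.reshape(data.shape[0], -1) call in the training loop. The sketch below applies the standard output-size formula for convolution and pooling layers (without dilation) to estimate how many features the posted conv/pool stack would produce for a proper 4-dimensional input. The helper name out_size is made up for this illustration:

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


side = 224                                   # RandomResizedCrop(224)
print(side * side)                           # 50176, the flattened size in the error

after_conv = out_size(side, kernel=3)        # Conv2d(1, 32, kernel_size=3, stride=1)
after_pool = out_size(after_conv, kernel=2, stride=2)  # MaxPool2d(kernel_size=2)
flat_features = 32 * after_pool * after_pool
print(after_conv, after_pool, flat_features)  # 222 111 394272
```

Under these assumptions, nn.Linear(256, num_classes) would not match the flattened conv output either once the input shape is fixed. Also note that the test transform crops to 244 rather than 224, which would give a different feature count (468512); if that is a typo, both crops probably need to match.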

PS: You can post code snippets by wrapping them in three backticks ```, which makes debugging easier.

The dataset has four categories of grayscale retinal images. I guess I'm making a mistake here. I'm very sorry, but I don't quite understand what to do.

Your input images should have the mentioned shape ([batch_size, channels, height, width]), but they currently reach the model as 2-dimensional tensors.
Have a look at this tutorial for an example on how to build a model and how the data tensors should be created.
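A minimal sketch of what that could look like for the model in this thread, assuming 224x224 grayscale inputs. The class name SmallOCTNet and the nn.Flatten layer are illustrative additions, and a random tensor stands in for a batch from the DataLoader. The batch stays 4-dimensional through the conv/pool stage and is flattened only right before the linear layer:

```python
import torch
import torch.nn as nn


class SmallOCTNet(nn.Module):
    """Illustrative variant of OCTModel; not the original poster's code."""

    def __init__(self, in_channels=1, num_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, stride=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),
        )
        self.flatten = nn.Flatten()
        # 224 -> 222 after the 3x3 conv -> 111 after 2x2 pooling, 32 channels
        self.fc1 = nn.Linear(32 * 111 * 111, num_classes)

    def forward(self, x):
        x = self.features(x)   # expects [batch_size, channels, height, width]
        x = self.flatten(x)    # flatten only after the conv/pool stage
        return self.fc1(x)


model = SmallOCTNet()
data = torch.rand(2, 1, 224, 224)  # stand-in for a batch of grayscale images
scores = model(data)
print(data.shape, scores.shape)
```

In the training loop this means passing data to the model as it comes out of the DataLoader, without the reshape beforehand.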
