CUDA error: an illegal memory access (when fine-tuning GroupNorm)

I’m trying to fine-tune ResNet18 with BatchNorm replaced by GroupNorm. It works on CPU, it works when I don’t unfreeze the GroupNorm layers, and it works with BatchNorm. But with GroupNorm unfrozen on the GPU, it always fails with RuntimeError: CUDA error: an illegal memory access was encountered.

Here is a minimal example that reproduces the issue:

## Repro CUDA issue
import torch
from torchvision.models import resnet18

m = resnet18(pretrained=True)

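def replace_bn(m):
    # Recursively swap every BatchNorm2d for a single-group GroupNorm
    # with the same number of channels.
    for name, child in m.named_children():
        if isinstance(child, torch.nn.BatchNorm2d):
            setattr(m, name, torch.nn.GroupNorm(num_groups=1, num_channels=child.num_features))
        replace_bn(child)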

replace_bn(m)

# Freeze the model, except GroupNorm
for p in m.parameters():
    p.requires_grad = False

def unfreeze_norm_layers(m):
    if isinstance(m, (torch.nn.BatchNorm2d, torch.nn.GroupNorm)):
        for p in m.parameters():
            p.requires_grad = True
m.apply(unfreeze_norm_layers)

device = torch.device('cuda')
m.to(device)
inp = torch.randn(64, 3, 218, 178).to(device)
labels = torch.randint(0, 1000, (64,)).to(device)

opt = torch.optim.SGD(m.parameters(), lr=0.01, momentum=0.9)
opt.zero_grad()
l = torch.nn.CrossEntropyLoss()(m(inp), labels)
l.backward()
l.item()

# -> RuntimeError: CUDA error: an illegal memory access was encountered
# It works if you don't replace the BatchNorm (you can unfreeze BatchNorms)
# It works if you don't unfreeze the GroupNorm
# It works on CPU

# torch version: 1.6.0+cu92

Any idea what I am doing wrong?

Could you update to the latest PyTorch release and also try the CUDA 10.2 or 11.0 binaries?
If you are still hitting the illegal memory access, could you run the script via:

CUDA_LAUNCH_BLOCKING=1 python script.py args

and post the stack trace here?
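For context on why that flag helps: CUDA kernels are launched asynchronously, so an illegal memory access is often reported at a later, unrelated call (here, l.item()). Setting CUDA_LAUNCH_BLOCKING=1 makes launches synchronous, so the stack trace should point at the operation that actually failed. If setting the variable in the shell is awkward, a small launcher along these lines could do the same thing. This is only a sketch: the helper names launch_blocking_env and run_with_launch_blocking are made up for illustration, and script.py stands in for the repro script.

```python
import os
import subprocess
import sys


def launch_blocking_env(base_env=None):
    """Return a copy of the environment with CUDA_LAUNCH_BLOCKING=1 set.

    Hypothetical helper: base_env defaults to the current process environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_with_launch_blocking(script, *args):
    """Run `python script args...` in a child process with blocking launches.

    The variable is set for the child before it starts, so it is in place
    before the child initialises CUDA.
    """
    cmd = [sys.executable, script, *args]
    return subprocess.run(cmd, env=launch_blocking_env())
```

Calling run_with_launch_blocking("script.py") is then equivalent to the shell command above; the returned CompletedProcess carries the child's exit code.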

Yep, with PyTorch 1.7.1 and CUDA 10.1 it doesn’t crash.
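For anyone landing here with the same error, a quick way to see whether an environment is still on the older build is to compare the installed version against 1.7.1, the version reported above as not crashing. The snippet below is a rough sketch under that assumption only: the thread does not say which release actually contains the fix, and parse_torch_version, is_at_least_known_good and KNOWN_GOOD are illustrative names, not part of PyTorch.

```python
import re


def parse_torch_version(version):
    """Turn a string such as '1.6.0+cu92' into a comparable tuple (1, 6, 0)."""
    # Drop the local build suffix ('+cu92') before reading the numeric fields.
    public = version.split("+", 1)[0]
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", public)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


# Version reported in this thread as running the repro without a crash
# (an assumption taken from the thread, not a confirmed fix version).
KNOWN_GOOD = (1, 7, 1)


def is_at_least_known_good(version):
    """True if `version` is 1.7.1 or newer."""
    return parse_torch_version(version) >= KNOWN_GOOD


try:
    import torch
except ImportError:
    torch = None  # lets the helpers above be used without PyTorch installed

if torch is not None:
    print("torch:", torch.__version__, "| CUDA build:", torch.version.cuda)
    print("at least 1.7.1:", is_at_least_known_good(torch.__version__))
```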