I’m trying to fine-tune ResNet18 with BatchNorm replaced by GroupNorm. It works on CPU, it works when I don’t unfreeze the GroupNorm layers, and it works with BatchNorm, but with GroupNorm unfrozen on the GPU it always fails with
RuntimeError: CUDA error: an illegal memory access was encountered.
Here is a minimal example that reproduces the issue:
```python
## Repro CUDA issue
import torch
from torchvision.models import resnet18

m = resnet18(pretrained=True)

def replace_bn(m):
    for name, child in m.named_children():
        if type(child) == torch.nn.BatchNorm2d:
            setattr(m, name, torch.nn.GroupNorm(num_groups=1, num_channels=child.num_features))
        replace_bn(child)

replace_bn(m)

# Freeze the model, except GroupNorm
for p in m.parameters():
    p.requires_grad = False

def unfreeze_norm_layers(m):
    if type(m) == torch.nn.modules.batchnorm.BatchNorm2d or type(m) == torch.nn.GroupNorm:
        for p in m.parameters():
            p.requires_grad = True

m.apply(unfreeze_norm_layers);

device = torch.device('cuda')
m.to(device)
inp = torch.randn(64, 3, 218, 178).to(device)
labels = torch.randint(0, 1000, (64,)).to(device)
opt = torch.optim.SGD(m.parameters(), lr=0.01, momentum=0.9)
opt.zero_grad()
l = torch.nn.CrossEntropyLoss()(m(inp), labels)
l.backward()
l.item()
# -> RuntimeError: CUDA error: an illegal memory access was encountered

# It works if you don't replace the BatchNorm (you can unfreeze BatchNorms)
# It works if you don't unfreeze the GroupNorm
# It works on CPU
# torch version: 1.6.0+cu92
```
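One thing that might help localize the failure: CUDA kernels launch asynchronously, so the Python line where the error surfaces (`l.item()` here) is not necessarily the op that actually faulted. A minimal sketch for getting a more accurate traceback, assuming the environment variable is set before torch initializes CUDA (I have not confirmed which kernel it points to in this case):

```python
import os

# Assumption: this runs before torch is imported, so the setting is already
# in place when the CUDA context is created.
# CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous, so the error is
# raised at the op that caused it instead of at a later synchronization point.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# ... then run the repro above unchanged (import torch, build the model, etc.).
```

The same effect can be had by exporting `CUDA_LAUNCH_BLOCKING=1` in the shell before launching the script. It slows execution, so it is only meant for debugging.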
Any idea what I am doing wrong?