torch.compile() segfaults on CUDA 11.6

When I run a model wrapped in torch.compile() with a PyTorch 2.0 nightly build and CUDA 11.6, even a minimal sample script segfaults. When I remove torch.compile(), the same code runs fine. Any insight into what might be going on would be greatly appreciated :slight_smile:.

Here is my CUDA environment:

NVIDIA-SMI 470.161.03   Driver Version: 470.161.03   CUDA Version: 11.6 

The exact Python packages:

pytorch-triton           2.0.0+0d7e753227
torch                    2.0.0.dev20230202+cu116
torchaudio               2.0.0.dev20230201+cu116
torchvision              0.15.0.dev20230201+cu116
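
For anyone comparing environments, a listing like the one above can be produced with a short script. This is only a sketch: the helper name `installed_versions` is made up for illustration, and it relies on the standard-library `importlib.metadata` module rather than anything PyTorch-specific.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    packages = ["pytorch-triton", "torch", "torchaudio", "torchvision"]
    for name, ver in installed_versions(packages).items():
        print(f"{name:<24} {ver or 'not installed'}")
```

Reporting missing packages as "not installed" instead of raising keeps the output usable when only some of the packages are present.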

And the code that I’m trying to run:

import torch
import torchvision.models as models

if __name__ == "__main__":
    model = models.resnet18().cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    # compiled_model = model # Works fine when not actually compiled
    compiled_model = torch.compile(model)

    x = torch.randn(16, 3, 224, 224).cuda()
    optimizer.zero_grad()
    out = compiled_model(x)
    out.sum().backward()
    optimizer.step()
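
One way to narrow down where the crash comes from, as a sketch rather than a verified diagnosis: run the repro in a child interpreter with Python's `faulthandler` enabled, so a segfault prints a Python-level traceback, and repeat it with different `torch.compile` backends. The helper names `run_isolated` and `segfaulted` are hypothetical, the signal check assumes Linux/POSIX, and the backend strings `"eager"`, `"aot_eager"`, and `"inductor"` are assumed to be available in this nightly build.

```python
import signal
import subprocess
import sys

# Same model and input as the repro above, with the backend left as a parameter.
REPRO = """
import torch
import torchvision.models as models

model = models.resnet18().cuda()
compiled_model = torch.compile(model, backend={backend!r})
x = torch.randn(16, 3, 224, 224).cuda()
compiled_model(x).sum().backward()
"""


def run_isolated(code):
    """Run `code` in a fresh interpreter with faulthandler enabled."""
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
    )


def segfaulted(proc):
    """On POSIX, a negative return code means the child was killed by that signal."""
    return proc.returncode == -signal.SIGSEGV


if __name__ == "__main__":
    # Each backend gets its own process, so one crash does not stop the others.
    for backend in ("eager", "aot_eager", "inductor"):
        proc = run_isolated(REPRO.format(backend=backend))
        status = "segfault" if segfaulted(proc) else f"exit code {proc.returncode}"
        print(f"{backend}: {status}")
        if segfaulted(proc):
            print(proc.stderr)
```

If only `"inductor"` crashes, that would point toward code generation or Triton rather than graph capture, though that reading is an assumption and not something confirmed here.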