When I try to run inference on an nn.TransformerEncoder compiled with torch.compile (PyTorch 2.1.0, CUDA 12.1), I get the following runtime error:
TorchRuntimeError: Failed running call_module fn(*(FakeTensor(..., device='cuda:0', size=(1, 2, 4)),), **{'src_key_padding_mask': FakeTensor(..., device='cuda:0', size=(1, 2), dtype=torch.bool)}):
meta converter nyi
from user code:
  File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/external_utils.py", line 17, in inner
    return fn(*args, **kwargs)
Here is a minimal example that reproduces the error:
import torch
from torch import nn

encoder_layer = nn.TransformerEncoderLayer(4, 2, 16, batch_first=True, device='cuda')
model = nn.TransformerEncoder(encoder_layer, 2)
model.eval()  # eval mode enables the nested-tensor fast path
model_opt = torch.compile(model)

with torch.inference_mode():
    x = torch.arange(0, .8, .1, device='cuda').reshape((1, 2, 4))
    mask = torch.tensor([[False, True]], device='cuda')  # second position is padding
    model_opt(x, src_key_padding_mask=mask)
If I run the same code with a PyTorch nightly build (2.3.0.dev20240228 at the time of writing), I get a different runtime error:
torch._dynamo.exc.TorchRuntimeError: Failed running call_module fn(*(FakeTensor(..., device='cuda:0', size=(1, 2, 4)),), **{'src_key_padding_mask': FakeTensor(..., device='cuda:0', size=(1, 2), dtype=torch.bool)}):
strided nested tensors are not supported by meta conversion
from user code:
  File "/home/nvidia/anaconda3/envs/tfm-david/lib/python3.11/site-packages/torch/_dynamo/external_utils.py", line 25, in inner
    return fn(*args, **kwargs)
The error disappears if I run the model in training mode or pass enable_nested_tensor=False when creating the TransformerEncoder.
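For reference, this is the workaround I am using for now. It is the same example as above with enable_nested_tensor=False passed to the TransformerEncoder constructor, and it runs without the error on my setup:

import torch
from torch import nn

encoder_layer = nn.TransformerEncoderLayer(4, 2, 16, batch_first=True, device='cuda')
# Disabling nested tensors sidesteps the meta-conversion failure under torch.compile
model = nn.TransformerEncoder(encoder_layer, 2, enable_nested_tensor=False)
model.eval()
model_opt = torch.compile(model)

with torch.inference_mode():
    x = torch.arange(0, .8, .1, device='cuda').reshape((1, 2, 4))
    mask = torch.tensor([[False, True]], device='cuda')
    model_opt(x, src_key_padding_mask=mask)  # completes without raising

Presumably this loses whatever speedup the nested-tensor fast path provides for padded batches, which is why I would prefer not to rely on it.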
Am I doing something wrong, or is there a way to work around this issue without disabling nested tensors?