Using optimize_for_inference on Torchscript model causes error

I fine-tuned the I3D action detection model on a custom dataset, saved it as TorchScript, and I'm loading it for inference like this:

# the model is
model = torch.hub.load("facebookresearch/pytorchvideo", "i3d_r50", pretrained=True)

# training on custom dataset
[...]

# save the trained model
torch.jit.save(torch.jit.script(model), some_local_path)

# load it back
model = torch.jit.load(some_local_path, map_location=device).to(device)

# run inference
predictions = model(some_input)

If I just use the model loaded this way, everything is fine. However, if I try adding:

model = torch.jit.load(some_local_path, map_location=device).to(device)
model = torch.jit.optimize_for_inference(model)
predictions = model(some_input)

I’m getting the error:

Traceback of TorchScript (most recent call last):

    graph(%input, %weight, %bias, %stride:int[], %padding:int[], %dilation:int[], %groups:int):
        %res = aten::cudnn_convolution_relu(%input, %weight, %bias, %stride, %padding, %dilation, %groups)
               ~~~~ <--- HERE
        return (%res)
RuntimeError: CUDNN_BACKEND_OPERATIONGRAPH_DESCRIPTOR: cudnnFinalize Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED

Is this a problem in the optimization of the specific model?

P.S. I was advised against using TorchScript here, but this is a fast way to get this running before I explore other options :slight_smile:

Yes, since TorchScript is in maintenance mode and I don't believe it will get major fixes anymore.
My guess is that the FuseFrozenConvAddRelu pass fails, but I haven't verified it.
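
If you want to narrow it down, you could try freezing the model without the extra fusion passes; optimize_for_inference is roughly torch.jit.freeze plus additional optimizations such as the conv-ReLU fusion. A minimal sketch, reusing the placeholders from your snippet:

# load the scripted model and switch it to eval mode (freezing requires eval)
model = torch.jit.load(some_local_path, map_location=device).to(device).eval()

# freeze only: inlines parameters and attributes, but skips the extra
# inference fusions that optimize_for_inference applies on top
frozen = torch.jit.freeze(model)

# if this runs fine while optimize_for_inference fails, the
# cudnn_convolution_relu fusion is the likely culprit
predictions = frozen(some_input)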

Fun fact: I noticed that if I use optimize_for_inference after casting the model to half precision, it works:

model = torch.jit.load(some_local_path, map_location=device).to(device)
model = model.half()
model = torch.jit.optimize_for_inference(model)
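
Note: with the model cast to half precision, the inputs need to be float16 as well, since the conv kernels expect matching dtypes, e.g.:

# cast the input to match the model's dtype
predictions = model(some_input.half())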

Since TorchScript is in maintenance mode, what serialization format would you suggest as an alternative for running inference in C++?
Thanks in advance!

I don’t have an alternative to suggest and also don’t know what the roadmap of the code owners is regarding the C++ frontend.

Hi,

Do you think using optimize_for_inference can cause the inference results to differ slightly? I found the maximum difference can be as large as 1e-4. Is this normal?

The absolute error can be expected due to rounding caused by the limited floating-point precision. You should double-check the relative error as well, and could also compare against a wider dtype, e.g. float64.
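
For example, something along these lines (out_ref and out_opt are hypothetical names for the outputs of the original and the optimized model):

# out_ref: output without optimize_for_inference
# out_opt: output with optimize_for_inference
diff = (out_ref - out_opt).abs()
max_abs_err = diff.max().item()
max_rel_err = (diff / out_ref.abs().clamp_min(1e-12)).max().item()
print(f"max abs err: {max_abs_err:.3e}, max rel err: {max_rel_err:.3e}")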