Using optimize_for_inference on Torchscript model causes error

I fine-tuned the I3D action detection model on a custom dataset, saved it as TorchScript, and I'm loading it for inference like this:

# the model is
model = torch.hub.load("facebookresearch/pytorchvideo", "i3d_r50", pretrained=True)

# training on custom dataset
[...]

# save the trained model
torch.jit.save(torch.jit.script(model), some_local_path)

# load it back
model = torch.jit.load(some_local_path, map_location=device).to(device)

# run inference
predictions = model(some_input)

If I just use the model loaded this way, everything is fine. However, if I try adding:

model = torch.jit.load(some_local_path, map_location=device).to(device)
model = torch.jit.optimize_for_inference(model)
predictions = model(some_input)

I’m getting the error:

Traceback of TorchScript (most recent call last):

    graph(%input, %weight, %bias, %stride:int[], %padding:int[], %dilation:int[], %groups:int):
        %res = aten::cudnn_convolution_relu(%input, %weight, %bias, %stride, %padding, %dilation, %groups)
               ~~~~ <--- HERE
        return (%res)
RuntimeError: CUDNN_BACKEND_OPERATIONGRAPH_DESCRIPTOR: cudnnFinalize Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED

Is this a problem in the optimization of the specific model?

P.S. I was advised against using TorchScript here, but this is a fast way to get this running before I explore other options :slight_smile:

Yes, since TorchScript is in maintenance mode and I don't believe it will get major fixes anymore.
My guess is that the FuseFrozenConvAddRelu pass fails, but I haven't verified it.
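
If you want to narrow it down, you could try freezing the model without the extra fusion passes; optimize_for_inference is roughly torch.jit.freeze plus additional optimizations such as the conv-ReLU fusion. A minimal sketch, reusing the placeholders from your snippet:

# load the scripted model and switch it to eval mode (freezing requires eval)
model = torch.jit.load(some_local_path, map_location=device).to(device).eval()

# freeze only: inlines parameters and attributes, but skips the extra
# inference fusions that optimize_for_inference applies on top
frozen = torch.jit.freeze(model)

# if this runs fine while optimize_for_inference fails, the
# cudnn_convolution_relu fusion is the likely culprit
predictions = frozen(some_input)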

Fun fact: I noticed that if I use optimize_for_inference after casting the model to half precision, it works:

model = torch.jit.load(some_local_path, map_location=device).to(device)
model = model.half()
model = torch.jit.optimize_for_inference(model)
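
Note: with the model cast to half precision, the inputs need to be float16 as well, since the conv kernels expect matching dtypes, e.g.:

# cast the input to match the model's dtype
predictions = model(some_input.half())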

Since TorchScript is in maintenance mode, what serialization format would you suggest as an alternative for running inference in C++?
Thanks in advance!

I don’t have an alternative to suggest and also don’t know what the roadmap of the code owners is regarding the C++ frontend.

Hi,

Do you think using optimize_for_inference can cause the inference results to differ slightly? I found the maximum difference can be as large as 1e-4. Is this normal?

The absolute error can be expected due to rounding caused by the limited floating-point precision. You should double-check the relative error as well, and could also compare against a wider dtype, e.g. float64.
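
For example, something along these lines (out_ref and out_opt are hypothetical names for the outputs of the original and the optimized model):

# out_ref: output without optimize_for_inference
# out_opt: output with optimize_for_inference
diff = (out_ref - out_opt).abs()
max_abs_err = diff.max().item()
max_rel_err = (diff / out_ref.abs().clamp_min(1e-12)).max().item()
print(f"max abs err: {max_abs_err:.3e}, max rel err: {max_rel_err:.3e}")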