Hi all,
I have an issue with a TorchScript module traced from torchvision.models.segmentation.deeplabv3_resnet50. Since it only contains operations that do not depend on a fixed input size, I was hoping to load it in C++ and use it for inference on 2D inputs of different sizes.
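To make the size reasoning concrete, here is a minimal sketch of how a spatial dimension propagates through the downsampling stages. It assumes the usual ResNet-50 layer parameters with dilation replacing the stride in the last two stages (output stride 8); the helper names conv_out and backbone_size are made up for illustration and are not part of torchvision.

```python
def conv_out(n, kernel, stride, padding, dilation=1):
    """Output length of a conv/pool layer along one spatial axis (floor mode)."""
    return (n + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def backbone_size(n):
    """Spatial size after the three stride-2 stages assumed for the dilated backbone."""
    n = conv_out(n, kernel=7, stride=2, padding=3)  # stem convolution (assumed 7x7, stride 2)
    n = conv_out(n, kernel=3, stride=2, padding=1)  # max pooling (assumed 3x3, stride 2)
    n = conv_out(n, kernel=3, stride=2, padding=1)  # first block of layer2 (assumed stride 2)
    return n


# The sizes used in the reproduction code.
for size in (2048, 2448, 1444, 1443):
    print(size, "->", backbone_size(size))
```

Under these assumptions every input size maps to a valid feature-map size, and 1444 and 1443 even map to the same one, so nothing in the arithmetic itself singles out a particular shape.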
Basically, in Python I create the model (training is omitted, since it is not relevant to the error), trace it, and save it. Then in C++ I load it and run inference. At runtime the input size can vary, and I thought this would not be a problem, but for any input whose shape differs from the example_input provided to trace I always get:
Microsoft C++ exception: cudnn_frontend::cudnnException at memory location 0x00000052344FBEA0.
Am I
a) wrong in my understanding that the model should be able to accept variably shaped input, or
b) doing something wrong?
Thank you for your help. Example code and library versions are below for reproduction. Let me know if anything else is needed.
In Python (torch==2.0.0+cu118, torchvision==0.15.1+cu118 on win64), I do:
import torch
import torchvision


class wrapper(torch.nn.Module):
    def __init__(self, model):
        super(wrapper, self).__init__()
        self.model = model

    def forward(self, input):
        results = []
        output = self.model(input)
        return output["out"]


numClasses = 29
net = torchvision.models.segmentation.deeplabv3_resnet50()
net.classifier[4] = torch.nn.Conv2d(256, numClasses, kernel_size=(1, 1), stride=(1, 1))
net.eval()
model = wrapper(net)
device = torch.device("cuda")
model = model.to(device)
with torch.no_grad():
    raw_t = torch.rand((1, 3, 2048, 2448), device=device)
    traced_script_module = torch.jit.trace(model, raw_t)
traced_script_module.save("D:\\Models\\model_2048_2448_cuda.pt")
and then in C++ (libtorch 2.0.0+cu118 in Visual Studio 2019 v142):
#include <torch/script.h>

#include <chrono>
#include <iostream>
#include <vector>

void testjit() {
    auto model = torch::jit::load("D:/Models/model_2048_2448_cuda.pt");
    model.to(torch::kCUDA);
    model.eval();
    if (true)
    {
        torch::NoGradGuard guard;
        torch::Tensor inp = torch::rand({ 1, 3, 2048, 2448 }, torch::TensorOptions().device(torch::kCUDA).dtype(torch::kFloat32));
        std::vector<torch::jit::IValue> inputs;
        inputs.push_back(inp);
        auto full_image_start = std::chrono::high_resolution_clock::now();
        model.forward(inputs); // No problem, same shape as used for tracing in Python
        std::cout << "2048, 2448 ran successfully" << std::endl;
    }
    if (true)
    {
        torch::Tensor inp = torch::rand({ 1, 3, 1444, 1444 }, torch::TensorOptions().device(torch::kCUDA).dtype(torch::kFloat32));
        std::vector<torch::jit::IValue> inputs;
        inputs.push_back(inp);
        model.forward(inputs);
        std::cout << "1444, 1444 ran successfully" << std::endl;
    }
    if (true)
    {
        torch::Tensor inp = torch::rand({ 1, 3, 1444, 1443 }, torch::TensorOptions().device(torch::kCUDA).dtype(torch::kFloat32));
        std::vector<torch::jit::IValue> inputs;
        inputs.push_back(inp);
        model.forward(inputs); // Microsoft C++ exception: cudnn_frontend::cudnnException at memory location 0x00000052344FBEA0.
        std::cout << "1444, 1443 ran successfully" << std::endl;
    }
}