TorchScript model loading guidance

I have a question: if I save the model in TorchScript format (as a ScriptModule), do I no longer need to initialize the model architecture and load the weights in order to run inference?

In the script below, the saved traced model is loaded directly. Does that mean we don’t need to initialize the model and load the saved weights?

import torch
import torchvision

# An instance of your model.
model = torchvision.models.resnet18()

# An example input you would normally provide to your model's forward() method.
example = torch.rand(1, 3, 224, 224)

# Use torch.jit.trace to generate a torch.jit.ScriptModule via tracing.
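traced_script_module = torch.jit.trace(model, example)

# Save the traced module to disk.
traced_script_module.save("")

# Load the traced module directly, without re-creating the model class.
loaded = torch.jit.load('')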

However, the traditional way to load saved weights is to first initialize the model and then load the saved weights, as in the steps below:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# Define model
class TheModelClass(nn.Module):
    def __init__(self):
        super(TheModelClass, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

# Initialize model
model = TheModelClass()

# Initialize optimizer
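optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# Save only the weights (state_dict) to PATH
torch.save(model.state_dict(), PATH)

# Re-create the model, then load the saved weights into it
model = TheModelClass(*args, **kwargs)
model.load_state_dict(torch.load(PATH))
model.eval()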

Can someone please help me understand this?

@ptrblck #vision #torchscript #deployment #jit

Yes, that’s right. You can load the scripted or traced model directly (e.g. as is done if you want to execute this model in libtorch after exporting it in the Python API).
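As a minimal sketch of that round trip in Python: the traced file stores the graph together with the weights, so loading it does not need the original class. `TinyNet` and the temporary file path are hypothetical stand-ins chosen to keep the example small; they are not from the original post.

```python
import os
import tempfile

import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in model; any traceable nn.Module works the same way."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return torch.relu(self.fc(x))


model = TinyNet().eval()
example = torch.rand(1, 4)

# Trace and save: the file holds both the recorded graph and the weights.
traced = torch.jit.trace(model, example)
path = os.path.join(tempfile.mkdtemp(), "tiny_traced.pt")  # illustrative path
traced.save(path)

# torch.jit.load needs only the file; the TinyNet class is not referenced here.
loaded = torch.jit.load(path)
loaded.eval()

# The loaded module should reproduce the eager model's output.
with torch.no_grad():
    same = torch.allclose(loaded(example), model(example))
print("outputs match:", same)
```

With the state_dict approach, by contrast, the class definition has to be importable wherever the weights are loaded.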

Thanks for your reply, @ptrblck. I believe libtorch is the C++ package, and I’m afraid I’m not a C++ developer.
Is it mandatory to load the model using libtorch to run inference?

I have some follow-up questions:

  1. Will I see a difference in inference time only if I load the model with libtorch and run inference there?
  2. Is it possible to load the model in Python and run inference? Will that reduce the inference time?

No, you can also deploy the workload using the Python frontend.

Is that done the usual way, i.e. by loading the TorchScript model using

model = torch.jit.load('')

and then running inference?

If I do it this way, will it reduce the inference time?

I ask because I tested it with an example from the official PyTorch documentation that uses a resnet50 model. I saved the model both the traditional way and as TorchScript. After some warmup iterations, I didn’t see any difference in inference time.

Could you please help here?

Scripting the model can provide a speedup, but this depends on a lot of factors, such as the JIT backend in use (which in turn depends on the PyTorch version).
As mentioned in the other post, you could try out the current nightly to see if scripting the model yields a speedup with the new features in the nvfuser backend we have been working on.
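To check whether a given setup shows any difference, a rough timing sketch like the one below can help. The helper `mean_latency`, the small `nn.Sequential` model, and the warmup/iteration counts are all assumptions for illustration; results will vary with the model, device, and PyTorch version, and a small CPU model may show no speedup at all.

```python
import time

import torch
import torch.nn as nn


def mean_latency(module, inputs, warmup=5, iters=20):
    """Return mean seconds per forward pass, measured after warm-up runs."""
    with torch.no_grad():
        for _ in range(warmup):  # gives the JIT a chance to optimize first
            module(inputs)
        if inputs.is_cuda:
            torch.cuda.synchronize()  # wait for queued GPU work before timing
        start = time.perf_counter()
        for _ in range(iters):
            module(inputs)
        if inputs.is_cuda:
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters


# Hypothetical small model; substitute the real network under test.
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 8)).eval()
x = torch.rand(16, 64)
traced = torch.jit.trace(model, x)

eager_s = mean_latency(model, x)
traced_s = mean_latency(traced, x)
print(f"eager: {eager_s * 1e3:.3f} ms  traced: {traced_s * 1e3:.3f} ms")
```

On a GPU, the explicit `torch.cuda.synchronize()` calls matter because CUDA kernels run asynchronously; without them the timer would mostly measure launch overhead.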

Thanks for your reply, @ptrblck. Could you please tell me which post/article you’re referring to?

Could you please send me an example article/blog so I can reproduce this and see the difference?

Sure, I was referring to your other post where I’ve mentioned the same.

Once the release is done, I can forward you the article(s).

Thanks very much for your reply, @ptrblck.