Converting DeepLabv3 for inference

Hi,

I have fine-tuned a DeepLabv3 model on a custom dataset.

I converted the trained model to a TorchScript model using torch.jit.script(), and the scripted model worked fine and gave the desired output during inference. But when I try to convert the same model to TorchScript again, the results during inference are undesirable and look completely different.

I am new to the JIT, so I am still trying to figure things out.

The model was trained on 2 GPUs, then converted to a single-GPU model, and then I tried converting it to TorchScript.
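
For context, the conversion step is roughly the following (a minimal sketch; the torchvision deeplabv3_resnet101 constructor, the file names and the class count are placeholders for my actual setup):

import torch
from torchvision.models.segmentation import deeplabv3_resnet101

num_classes = 21  # placeholder, use the number of classes of the custom dataset
model = deeplabv3_resnet101(num_classes=num_classes)

state_dict = torch.load("deeplabv3_finetuned.pth", map_location="cpu")  # placeholder path
model.load_state_dict(state_dict)
model.eval()

scripted = torch.jit.script(model)
scripted.save("deeplabv3_scripted.pt")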

Could you explain how you are converting the model again? Are you calling torch.jit.script on the already scripted model?

I am calling torch.jit.script on the model after loading a state dict. One conversion was almost a month ago and one was today, on the same pretrained model. I printed the model's weights and they seem to vary between the two.

I am not calling torch.jit.script on a scripted model

In that case the state_dict seems to have changed and you should check it.
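
A quick way to check whether the two state_dicts really differ is to load both checkpoints and compare them key by key (the file names here are just examples):

import torch

sd_old = torch.load("checkpoint_month_ago.pth", map_location="cpu")  # example paths
sd_new = torch.load("checkpoint_today.pth", map_location="cpu")

print(sd_old.keys() == sd_new.keys())  # True if both contain the same parameter names

for key in sd_old:
    if not torch.equal(sd_old[key], sd_new[key]):
        print(f"{key} differs between the two checkpoints")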

I am converting from the multi-GPU (nn.DataParallel) model to a single-GPU model with this method:

new_state_dict = {}
# deeplab_model is the state_dict loaded from the nn.DataParallel checkpoint
for key in deeplab_model:
    new_key = key.replace('module.', '')
    new_state_dict[new_key] = deeplab_model[key]

Is this the right way to do it?

It looks correct, and model.load_state_dict would also fail if you had mismatching keys.


This is the inference output from the multi-GPU model:

This is the output from the single-GPU model:

Both outputs were generated before converting to a TorchScript model, so I think converting from the multi-GPU to the single-GPU model is causing the issue. Is there a way to convert a multi-GPU model to a single-GPU model other than the above-mentioned one?
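
One alternative I'm aware of: recent PyTorch versions ship a helper that strips the "module." prefix from a state_dict in place, so the renaming doesn't have to be done by hand (sketch below, with placeholder paths and class count):

import torch
from torch.nn.modules.utils import consume_prefix_in_state_dict_if_present
from torchvision.models.segmentation import deeplabv3_resnet101

model = deeplabv3_resnet101(num_classes=21)  # placeholder class count

state_dict = torch.load("deeplab_dataparallel.pth", map_location="cpu")  # placeholder path
consume_prefix_in_state_dict_if_present(state_dict, "module.")  # strips the prefix in place if present

model.load_state_dict(state_dict)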

Found this post.

Will try this method, and let you know if it works
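
For reference, my rough understanding of that kind of approach (this may not match the linked post exactly) is to load the checkpoint back into an nn.DataParallel wrapper and then take its .module attribute as the single-GPU model:

import torch
from torch import nn
from torchvision.models.segmentation import deeplabv3_resnet101

model = deeplabv3_resnet101(num_classes=21)  # placeholder class count
parallel_model = nn.DataParallel(model)

# the checkpoint was saved from an nn.DataParallel model, so the "module." keys line up here
parallel_model.load_state_dict(torch.load("deeplab_dataparallel.pth", map_location="cpu"))

single_gpu_model = parallel_model.module  # the underlying single-GPU model
single_gpu_model.eval()
# single_gpu_model can then be scripted with torch.jit.script as before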

Using the method from the linked post to convert the nn.DataParallel model to a single-GPU model works consistently, unlike the key replacement in the state_dict. Now I am able to get the desired results, and the issue was indeed with the state_dict, as foreseen by @ptrblck.