Model converted with TorchScript differs from the Python version

Hi,

I'm trying to convert my model with TorchScript and I get the following warning:

TracerWarning: Output nr 1. of the traced function does not match the corresponding output of the Python function. Detailed error:
Not within tolerance rtol=1e-05 atol=1e-05 at input[4, 0, 35, 33] (0.406830757856369 vs. 0.40629732608795166) and 6114 other locations (18.00%)
check_tolerance, _force_outplace, True, _module_class)

The code is this

model=torch.load("...\\Model\\version1\\train0\\gray\\ConvAuto750.pth")
model.eval()
model.cuda()
example = torch.rand(8, 1, 64, 64).cuda()
traced_script_module = torch.jit.trace(model, example)

I want to use a batch size greater than one to speed up inference.

The strange thing is that if I use the model with its default weight initialization, the warning does not appear, even with a batch size greater than one.

This is the code:

model = simple_autoPrelu()
model.eval()
model.cuda()
example = torch.rand(8, 1, 64, 64).cuda()
traced_script_module = torch.jit.trace(model, example)

Of course, the model that I load has the same architecture as the one above; the only difference is that I trained it for 750 epochs.

If I change the batch size from 8 to 1, the warning does not appear.

Could someone explain this behavior or suggest a way to resolve this warning?
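For reference, the tolerance used in the tracer's consistency check (the rtol/atol pair shown in the traceback) can be relaxed through the check_tolerance argument of torch.jit.trace. A minimal CPU sketch with a hypothetical stand-in model (the thread's simple_autoPrelu is not shown):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the convolutional autoencoder in this thread.
model = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1),            # encoder
    nn.PReLU(),
    nn.ConvTranspose2d(8, 1, 3, padding=1),   # decoder
)
model.eval()

example = torch.rand(8, 1, 64, 64)  # CPU here; add .cuda() for GPU tracing

# check_tolerance sets the rtol/atol the tracer uses when it re-runs the
# Python module and compares the result against the traced graph (default 1e-5).
traced = torch.jit.trace(model, example, check_tolerance=1e-4)
out = traced(example)
```

Note that this only silences the check; it does not change the numerical behavior of the traced model.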

Thanks

Did you save the state_dict by chance?
If not, could you load your model using your current script, store it via torch.save(model.state_dict(), PATH), and change your script to initialize the model and load the state_dict?
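A minimal sketch of that workflow, using a hypothetical stand-in module and a temporary path (your simple_autoPrelu and checkpoint path would go in their place):

```python
import os
import tempfile

import torch
import torch.nn as nn

# Hypothetical stand-in for the thread's convolutional autoencoder.
class SimpleAuto(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Conv2d(1, 8, 3, padding=1)
        self.act = nn.PReLU()
        self.dec = nn.ConvTranspose2d(8, 1, 3, padding=1)

    def forward(self, x):
        return self.dec(self.act(self.enc(x)))

# 1) Load the full pickled model as in the current script, then store only
#    its parameters (a fresh model is used here purely for illustration).
model = SimpleAuto()
path = os.path.join(tempfile.mkdtemp(), "ConvAuto750_state.pth")
torch.save(model.state_dict(), path)

# 2) In the tracing script: initialize the architecture, then load the state_dict.
restored = SimpleAuto()
restored.load_state_dict(torch.load(path))
restored.eval()
```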

Hi @ptrblck,

thanks for the response. I have tried it, but I got the same warning. Maybe there is a problem with my custom model, but it is a simple convolutional autoencoder and I cannot understand what could cause this warning after training. If you have other hints about what might cause it, I can run some tests.

Thanks

Would it be possible to post a code snippet to reproduce this error?

Unfortunately, I cannot share the code because it is proprietary. I have compared the inference accuracy of the two converted models, one traced with batch size 1 (which gives no warning) and one with batch size 8, and the results are quite similar; I checked the output pixel values in both cases. In the next few days I will try to build a version of the code without proprietary restrictions, and if the error persists I will share it.
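For anyone making the same comparison: a minimal sketch (hypothetical stand-in model, CPU) that quantifies how far a traced module's output drifts from the eager one:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in model; substitute your trained autoencoder here.
model = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1),
    nn.PReLU(),
    nn.ConvTranspose2d(8, 1, 3, padding=1),
)
model.eval()

example = torch.rand(8, 1, 64, 64)
traced = torch.jit.trace(model, example)

with torch.no_grad():
    ref = model(example)    # eager (Python) output
    out = traced(example)   # traced output

max_abs_diff = (ref - out).abs().max().item()
# Same tolerances the tracer's consistency check uses by default.
within_tol = torch.allclose(ref, out, rtol=1e-5, atol=1e-5)
print(max_abs_diff, within_tol)
```

Comparing the maximum absolute difference, rather than just the warning, makes it easy to judge whether the mismatch actually matters for your accuracy requirements.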

Thanks.