torch.jit.trace affects the accuracy of the model on GPU

I tried to export my torch model to a `.pt` file (while the model is located on the GPU), using this code:

traced_script_module = torch.jit.trace(self.model, gray_img_tens, strict=False)

But when I run the traced model on the same input, using:

tracedPT_outputs = traced_script_module.forward(gray_img_tens)

I receive a different output. The output is usually close to the original, but I can't rely on that: the small mispredictions accumulate through the pipeline, and the final result is far from what I expect.

  • The problem occurs only on the GPU!
    When I do the same on the CPU, the outputs are identical.
    In that case, however, the latency is far too high, so I can't rely on that either.
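For reference, a minimal way to quantify the mismatch is to trace the model and compare the traced and eager outputs directly. The model and input below are stand-ins (the original `self.model` and `gray_img_tens` are not shown in full), so substitute your own:

```python
import torch
import torch.nn as nn

# Stand-in for the real model; substitute your own module and input.
model = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 1, 3, padding=1),
).eval()

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
gray_img_tens = torch.randn(1, 1, 64, 64, device=device)

with torch.no_grad():
    traced_script_module = torch.jit.trace(model, gray_img_tens, strict=False)
    eager_out = model(gray_img_tens)
    traced_out = traced_script_module(gray_img_tens)

# Report the largest absolute deviation between the two outputs.
max_diff = (eager_out - traced_out).abs().max().item()
print(f"max abs diff: {max_diff:.3e}")
```

Printing the maximum absolute difference (rather than eyeballing the tensors) makes it easy to tell whether you are looking at float32 round-off or a genuinely different computation.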

It might be difficult, but could you check whether it is possible to isolate where the differences between the traced and non-traced versions of the model first appear? For example, one approach would be to remove layers until the model is very small yet still shows differences.
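If the model can be treated as a sequence of layers, that bisection can be sketched as tracing progressively longer prefixes and reporting where the outputs first diverge. The model, input, and helper name below are illustrative, not from the original post:

```python
import torch
import torch.nn as nn

def find_first_divergence(model: nn.Sequential, x: torch.Tensor, tol: float = 1e-5):
    """Trace growing prefixes of `model`; return the index of the first layer
    after which the traced output diverges from eager by more than `tol`,
    together with the observed difference, or None if no prefix diverges."""
    with torch.no_grad():
        for k in range(1, len(model) + 1):
            prefix = model[:k]  # nn.Sequential supports slicing
            traced = torch.jit.trace(prefix, x, strict=False)
            diff = (prefix(x) - traced(x)).abs().max().item()
            if diff > tol:
                return k - 1, diff  # index of the offending layer
    return None

# Toy stand-in model; substitute your own layers and GPU input.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 16)).eval()
x = torch.randn(4, 16)
print(find_first_divergence(model, x))
```

For models that are not a plain `nn.Sequential`, the same idea applies manually: comment out blocks of the forward pass until the smallest sub-network that still diverges is found.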

Alternatively, depending on the size of the model and the numerics of the traced ops, small numerical differences might be expected. A litmus test for this would be to run the model in higher (e.g., double) precision and see whether the traced model's output is further from that reference than the standard eager-mode output.
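A rough sketch of that litmus test, again with placeholder model and input names, could look like this: compare both the eager and the traced float32 outputs against a double-precision reference of the same network.

```python
import copy
import torch
import torch.nn as nn

# Placeholder model and input; swap in your own (moved to GPU as needed).
model = nn.Sequential(nn.Linear(16, 32), nn.Tanh(), nn.Linear(32, 16)).eval()
x = torch.randn(4, 16)

with torch.no_grad():
    traced = torch.jit.trace(model, x, strict=False)

    # Double-precision reference: deviations from it measure pure round-off.
    ref = copy.deepcopy(model).double()(x.double())

    err_eager = (model(x).double() - ref).abs().max().item()
    err_traced = (traced(x).double() - ref).abs().max().item()

print(f"eager  vs fp64 reference: {err_eager:.3e}")
print(f"traced vs fp64 reference: {err_traced:.3e}")
```

If `err_traced` is of the same order as `err_eager`, the traced model is only showing expected float32 round-off; if it is much larger, the trace itself changed the computation and the bisection approach above is worth pursuing.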