Version compatibility scheme for PyTorch and compiled Torch extensions

charliebudd · October 28, 2022, 3:18pm

I’m currently experiencing interop issues with code compiled for torch 1.12.0 being called from Python running torch 1.12.1, and vice versa. What compatibility should I expect for code compiled for different patch versions of torch? Is this a bug introduced by 1.12.1, or is it a miracle that it has worked for the other minor versions of PyTorch so far?

ptrblck · October 28, 2022, 5:59pm

PyTorch should be backwards compatible. I.e. a model written in 1.12.0 should work in 1.12.1, but not necessarily vice versa.

charliebudd · October 31, 2022, 9:54am

Sorry, my original question was poorly worded. To clarify, Python-level compatibility is not my issue. I have a compiled C++ .so containing a torch extension, which was compiled with torch 1.12.0. My question is: should we expect this to be callable from Python code running torch 1.12.1 (and any subsequent patch versions of 1.12)? This certainly works across the patch versions of 1.10 and 1.11, and the symbol definitions have not changed between 1.12.0 and 1.12.1.

The error occurs when passing a JIT model down to C++. Before 1.12.1, the _c attribute on a JIT script model returned a torch.jit.ScriptModule, which casts to a torch::jit::Module when passed to a C++ function. In 1.12.1, however, the _c attribute returns a torch.ScriptModule, which no longer casts in the same way, so an exception is thrown.

ptrblck · October 31, 2022, 7:04pm

No, I would not expect this to work and would assume you need to rebuild all libs (e.g. torchvision) as well as extensions using the matching PyTorch version.
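A minimal sketch of a load-time guard along these lines, assuming the extension records the torch version it was built against. The names `parse_version`, `check_torch_version`, and `BUILT_WITH_TORCH` are hypothetical and not part of any PyTorch API; the only PyTorch attribute referenced, in the usage comment, is `torch.__version__`.

```python
import re


def parse_version(version):
    """Extract (major, minor, patch) from a string such as '1.12.1+cu113'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", str(version))
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def check_torch_version(built_with, running, strict_patch=True):
    """Raise RuntimeError if the running torch does not match the build version."""
    built = parse_version(built_with)
    current = parse_version(running)
    # Always require the same major.minor; optionally require the same patch.
    same = built == current if strict_patch else built[:2] == current[:2]
    if not same:
        raise RuntimeError(
            f"extension built against torch {built_with}, "
            f"but torch {running} is installed; rebuild the extension"
        )


# Hypothetical usage in the extension's __init__.py, before loading the .so:
#   import torch
#   check_torch_version(BUILT_WITH_TORCH, torch.__version__)
```

With `strict_patch=True` the guard enforces an exact match, which is the conservative reading; `strict_patch=False` only compares major and minor, for anyone willing to rely on patch releases being binary compatible.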

charliebudd · November 1, 2022, 9:23am

I’m not sure that this should be the intended behaviour. Taken to its extreme, where even basic functionality breaks between patches, developers wishing to build CUDA extensions for distribution would need to build for every patch version, multiplied by every Python version, multiplied by every CUDA version. You’re essentially limiting PyTorch extension code to being distributed as source and compiled on the user’s machine, which also requires that they have the CUDA toolkit installed and that its version matches the one used to build their version of torch. The build matrix is already pretty substantial just for covering recent minor versions of PyTorch.
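To make the multiplication concrete, here is a rough sketch that counts wheel builds per minor version versus per patch version. The version lists are illustrative assumptions rather than an actual support matrix, and the count is an upper bound because not every combination is necessarily published.

```python
from itertools import product

# Illustrative values only, chosen to show how the matrix grows.
torch_minor_versions = ["1.10", "1.11", "1.12"]
torch_patch_versions = ["1.10.0", "1.10.1", "1.10.2", "1.11.0", "1.12.0", "1.12.1"]
python_versions = ["3.7", "3.8", "3.9", "3.10"]
cuda_versions = ["10.2", "11.3", "11.6"]


def build_matrix(torch_versions, python_versions, cuda_versions):
    """Return every (torch, python, cuda) combination as a list of tuples."""
    return list(product(torch_versions, python_versions, cuda_versions))


per_minor = build_matrix(torch_minor_versions, python_versions, cuda_versions)
per_patch = build_matrix(torch_patch_versions, python_versions, cuda_versions)

print(len(per_minor))  # 3 * 4 * 3 = 36 builds
print(len(per_patch))  # 6 * 4 * 3 = 72 builds
```

Under these assumed lists, building per patch doubles the number of binaries, and the gap widens with every additional patch release.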

ptrblck · November 1, 2022, 4:43pm

Yes, you are right that the support matrix is large, but it’s also a balance to support users on older GPUs while also allowing the latest generation(s) to run. You are asking for an ability to e.g. mix and match torch and torchvision (or any other sub-library/extension) freely, which I’m sure won’t work in some cases and which is why the releases are aligned.
However, if you are only concerned about patch releases (not major or minor releases), then I think your extension should just work, since only critical bug fixes should land in patch releases.

charliebudd · November 2, 2022, 9:22am

Yes, I think it’s reasonable to expect issues between major and minor versions, and my build matrix accounts for all of these. My current issue is due to a patch difference, though. Thanks for your help; I’ll open a GitHub issue explaining the situation.

Git Issue: https://github.com/pytorch/pytorch/issues/88301