Hi all, I have a custom regression model implemented in PyTorch, and I am noticing discrepancies in performance that seem larger than random drift.
First, let me describe my data. I have a training set (generated 'randomly'), a dummy test set (generated 'randomly' in the same way as the training set), and a real-world test set (significantly smaller than the dummy test set, and not generated in the same way).
There are 3 cases I am comparing (rough sketches below):
case 1: training the model (a custom nn.Module) with no JIT at all
case 2: training torch.jit.script(model), where the submodules are plain nn.Modules
case 3: training a model whose nn.Modules have been rewritten as torch.jit.ScriptModules, with all methods decorated with @torch.jit.script_method
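For concreteness, here is a minimal sketch of the three setups. `MyModel`, `MyScriptModel`, and the layer sizes are placeholders, not my actual architecture:

```python
import torch
import torch.nn as nn

# Cases 1 and 2: a plain nn.Module (placeholder architecture).
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(16, 4)

    def forward(self, x):
        return self.linear(x)

# Case 3: the same module rewritten as a ScriptModule,
# with forward decorated as a script method.
class MyScriptModel(torch.jit.ScriptModule):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(16, 4)

    @torch.jit.script_method
    def forward(self, x):
        return self.linear(x)

model_case1 = MyModel()                    # case 1: no JIT
model_case2 = torch.jit.script(MyModel())  # case 2: scripted after construction
model_case3 = MyScriptModel()              # case 3: ScriptModule + script_method
```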
In cases 1 and 2 the results are similar enough in terms of test/predict error and run times. Case 3 is where things get interesting: there I obtain significantly worse results on a subset of the output features (~10% worse MAPE than cases 1 and 2) on the dummy test set. That alone would be fine; it probably means there's a bug in my ScriptModule code. BUT, using this ScriptModule to predict on the real-world test set yields results ~10% better than cases 1 or 2.

The issue here is twofold. First, I don't actually care about my results on the dummy test set. Second, in this experiment I happen to have ground-truth labels for my real-world test data, but in practice I will not have access to them and can only compare models by their performance on the dummy test data.

How do I proceed in situations like this? I feel like I've opened Pandora's box here, because I only started using JIT to obtain some performance gains and was not expecting large differences in model results.
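For reference, the per-feature MAPE I'm quoting is computed roughly like this (a sketch; `preds` and `targets` stand in for the outputs of my actual prediction loop):

```python
import torch

def per_feature_mape(preds: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # preds, targets: shape (n_samples, n_output_features)
    # Returns the MAPE of each output feature, in percent.
    eps = 1e-8  # guard against division by zero
    return 100.0 * ((preds - targets).abs() / (targets.abs() + eps)).mean(dim=0)
```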