[Solved] Python 3.5 vs Python 3.6 Performance Difference

Hello all,
I’ve designed a CNN-based network in PyTorch and had been running my experiments on Windows 10 with Python 3.6 without any problems. Recently I transferred the same code to Python 3.5 running on Ubuntu 16.04 and found (to my surprise) that I was getting up to a 20% increase in classification accuracy on Ubuntu compared to Windows.

Following the training process, it is clear that the network is training better on Ubuntu (i.e., lower validation loss after each epoch) than on Windows. To determine whether it was a Windows or Ubuntu issue, I created an Anaconda environment with Python 3.5 installed, ran the same code, and found about the same massive improvement in this 3.5 environment as well.

Can anyone please explain why this might be happening? I’ve checked the code many times and it certainly doesn’t rely on anything Python 3.5-specific but yet…

I’m using PyTorch 0.4.1 (CPU Only).

Thanks in advance!

I assume you were running the exact same code.
Here is a wild guess. Can it be due to the random initialisation of the parameters?
Did you set the same seed while training in python 3.5, 3.6?

Hello @InnovArul,
Yeah I was. Literally copied and pasted.
Err, I don’t know. I don’t think I’ve ever set the seed when training. I assume by default that PyTorch does that for me…

Could you try to set the random seed in both solutions and run them again?
You can find more info here.
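
For reference, a minimal sketch of what setting the seed could look like. The helper name `seed_everything` is made up for illustration, and the NumPy and PyTorch parts are only applied if those packages are installed. It assumes the training code draws its randomness from `random`, `numpy.random`, and `torch`; other sources of randomness would need their own handling.

```python
import random


def seed_everything(seed=0):
    """Seed the random number generators a typical training script uses.

    Hypothetical helper, not part of PyTorch itself.
    """
    # Python's built-in RNG (e.g. random.shuffle on a list of samples)
    random.seed(seed)

    # NumPy's global RNG, if NumPy is available
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass

    # PyTorch's RNG (weight initialisation, dropout, random samplers)
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass


# Call once at the top of the script, before building the model or loaders
seed_everything(42)
```

Calling it with the same value in both environments rules out random initialisation as the cause, although it does not by itself make results identical across Python versions.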

Would it be possible to get an executable script to reproduce this issue?

Hello @ptrblck,
Setting the same seed in both environments doesn’t seem to have any effect. Python 3.5 still outperforms 3.6 by a very large margin.
That’d be quite difficult to be honest - but I can provide any details of the code/model as required …

Hello @ptrblck,
Sorry - I know it’s been a long while, but I recently got permission to share the code. Can I still share the executable script complete with pickled train and test data?

Sure! This issue is still quite interesting.

Hello @ptrblck,
Apologies for the late response. I was trying to put code together to replicate the issue, and it seems the mystery is finally resolved.
Apparently the issue was coming from my own code - I’d written a custom Dataset class which relied heavily on dicts. The behavior of dicts changed drastically between Python 3.5 and 3.6, which inadvertently caused some of my training data to bleed into my testing data on Python 3.5, while this wasn’t happening on Python 3.6. Therefore the reported performance gain was completely spurious, and I can confirm that the results can now be replicated across Python releases.

Really sorry for wasting your time with this. Thank you very much for your input. I appreciate it a lot.

PS: I will change the title of this thread to reflect this turn of events.

I’m glad it’s solved! :slight_smile:
Just out of personal interest, which dict operations created the data leak?


Well, I was using a for … in loop over a dict. Apparently, in Python 3.5, the order in which items are returned is not guaranteed to match the order in which they were inserted. From 3.6 onward, the retrieval order is the same as the insertion order…a trivial change really, but I’d assumed ordered behavior right from the start. Turns out I was wrong.
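
To illustrate the kind of fix this suggests, here is a small sketch that does not depend on dict iteration order at all. It is not the original Dataset code (which was never posted). The `samples` dict, its keys, and the `split_keys` helper are hypothetical. The idea is to impose an explicit order before splitting and then check that the two splits do not overlap.

```python
def split_keys(samples, train_fraction=0.8):
    """Split the keys of a dict into train/test lists.

    Sorting the keys makes the split independent of the dict's
    iteration order, so it is the same on Python 3.5 and 3.6+.
    """
    keys = sorted(samples)
    cut = int(len(keys) * train_fraction)
    return keys[:cut], keys[cut:]


# Hypothetical data: sample id -> label
samples = {"img_%03d" % i: i % 2 for i in range(10)}

train_keys, test_keys = split_keys(samples)

# Cheap guard against leakage: no sample may appear in both splits
assert set(train_keys).isdisjoint(test_keys)
```

A disjointness check like the last line is cheap to keep in the data pipeline and would flag this kind of leak regardless of its cause.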

Thanks for the explanation!
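Oh right. As far as I know, a Python dict in <=3.5 was implemented as a plain hash table, so its iteration order was arbitrary.
In CPython 3.6 insertion ordering is an implementation detail; from 3.7 it is guaranteed by the language.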
However, if you don’t want to rely on the version, you could also use an OrderedDict.
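
A short sketch of the OrderedDict suggestion, using made-up keys. `collections.OrderedDict` iterates in insertion order on Python 3.5 as well, so code written this way behaves the same across versions.

```python
from collections import OrderedDict

# Hypothetical sample names, inserted in a deliberate (non-sorted) order
records = OrderedDict()
for name in ["c", "a", "b"]:
    records[name] = len(records)

# Iteration follows insertion order, regardless of Python version
ordered_names = list(records)
assert ordered_names == ["c", "a", "b"]
```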

@ptrblck It’s my pleasure. Least I can do after making such a fuss :slight_smile:

Yeah, I later found that out. Currently experimenting with some alternatives and it’s…in the list. No pun intended…

Thanks for the tip!
