Wanted to stop by and check whether things are working as expected before opening an issue on the timm GitHub repo.
In short, I am getting non-reproducible results when networks have a BlurPool module from timm. Other networks (without BlurPool) are exactly reproducible after setting the seed across the needed libraries (numpy, random, torch, torch.cuda, etc).
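For reference, my seeding looks roughly like this (the helper name is mine, not from any library):

```python
import os
import random

import numpy as np
import torch


def seed_everything(seed: int = 0) -> None:
    # seed every RNG a typical PyTorch training run touches
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # also seeds torch.cuda on all devices
    # note: only takes effect if set before the interpreter starts
    os.environ["PYTHONHASHSEED"] = str(seed)


seed_everything(42)
a = torch.randn(3)
seed_everything(42)
b = torch.randn(3)
print(torch.equal(a, b))  # prints True
```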
It is the strangest thing, since nothing in the module looks like a source of hidden randomness.
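For context, here is my sketch of what such a layer computes (simplified, not timm's actual code): a fixed, input-independent binomial blur kernel applied as a strided depthwise convolution, so on the face of it nothing stochastic.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleBlurPool2d(nn.Module):
    """Sketch of a BlurPool-style layer (not timm's exact code):
    a fixed binomial blur filter applied as a strided depthwise conv."""

    def __init__(self, channels: int, filt_size: int = 3, stride: int = 2):
        super().__init__()
        self.stride = stride
        self.padding = (filt_size - 1) // 2
        # binomial coefficients, e.g. [1, 2, 1] for filt_size=3
        coeffs = torch.tensor(
            [math.comb(filt_size - 1, i) for i in range(filt_size)],
            dtype=torch.float32,
        )
        blur = torch.outer(coeffs, coeffs)
        blur = blur / blur.sum()  # normalize so the filter preserves mean intensity
        # one copy of the kernel per channel for a depthwise convolution
        self.register_buffer("filt", blur[None, None].repeat(channels, 1, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.conv2d(x, self.filt, stride=self.stride,
                        padding=self.padding, groups=x.shape[1])


pool = SimpleBlurPool2d(channels=4)
x = torch.randn(1, 4, 8, 8)
y1, y2 = pool(x), pool(x)
print(torch.equal(y1, y2))  # prints True: on CPU the layer is deterministic
```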
I’ve confirmed this on several machines with different environments and versions of CUDA.
Has anyone seen something similar, or have some suggestions about what is going on?
Or is this expected behavior from BlurPool?
Thanks in advance for any help; please let me know if more details are needed.
I assume you’ve followed all steps described in the reproducibility docs to get deterministic results?
In particular, since a convolution is used inside this layer you would have to make sure to use deterministic cuDNN algorithms.
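Concretely, these are the switches I mean, following the PyTorch reproducibility notes:

```python
import os

import torch

# opt into deterministic kernels before any CUDA work happens
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"  # needed by some ops on CUDA >= 10.2
torch.backends.cudnn.benchmark = False      # disable autotuning, which can pick different algorithms run-to-run
torch.backends.cudnn.deterministic = True   # force deterministic cuDNN convolution algorithms
torch.use_deterministic_algorithms(True)    # error out on any op without a deterministic implementation
```

With `use_deterministic_algorithms(True)` PyTorch will raise an error at the offending op rather than silently give you nondeterministic results, which should point you at the culprit inside BlurPool if there is one.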
I am also seeding the DataLoader workers via worker_init_fn in the same way.
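i.e. roughly this, following the DataLoader randomness recipe from the PyTorch docs (the commented loader line shows how it is hooked up):

```python
import random

import numpy as np
import torch


def worker_init_fn(worker_id: int) -> None:
    # derive a distinct but reproducible seed for each DataLoader worker
    worker_seed = torch.initial_seed() % 2**32
    random.seed(worker_seed)
    np.random.seed(worker_seed)


# loader = torch.utils.data.DataLoader(
#     dataset, num_workers=4, worker_init_fn=worker_init_fn,
#     generator=torch.Generator().manual_seed(0))
```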
The strangest thing is that I can make things non-reproducible just by swapping the pooling layer. In the ResNets I am using, that is either the max-pool after the stem or the anti-aliasing avg-pool in the downsampling paths.
As soon as I swap either of those for a BlurPool2d, things become irreproducible.