Non-reproducible results with BlurPool in timm

clck10 · August 27, 2021, 9:23pm

Hello all,

I wanted to check whether things are working as expected before opening an issue on the timm GitHub page.

In short, I am getting non-reproducible results whenever a network contains a BlurPool module from timm. Other networks (without BlurPool) are exactly reproducible after setting the seed across the relevant libraries (numpy, random, torch, torch.cuda, etc.).

It is the strangest thing, since there is nothing in the module that screams hidden randomness:

rwightman/pytorch-image-models/blob/master/timm/models/layers/blur_pool.py

"""
BlurPool layer inspired by
 - Kornia's Max_BlurPool2d
 - Making Convolutional Networks Shift-Invariant Again :cite:`zhang2019shiftinvar`

Hacked together by Chris Ha and Ross Wightman
"""

import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
from .padding import get_padding


class BlurPool2d(nn.Module):
    r"""Creates a module that computes blurs and downsample a given feature map.
    See :cite:`zhang2019shiftinvar` for more details.
    Corresponds to the Downsample class, which does blurring and subsampling

This file has been truncated.

I’ve confirmed this on several machines with different environments and versions of CUDA.

Has anyone seen something similar, or does anyone have suggestions about what is going on?
Or is this expected behavior from BlurPool?

Thanks in advance for any help. Please let me know if more details are needed.

Ty!

ptrblck · August 28, 2021, 8:25am

I assume you’ve followed all steps described in the reproducibility docs to get deterministic results?
In particular, since a convolution is used inside this layer you would have to make sure to use deterministic cuDNN algorithms.
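
For reference, a minimal sketch of those settings, assuming PyTorch 1.8 or later (where `torch.use_deterministic_algorithms` is available). The helper name `enable_determinism` is made up for illustration, and the `CUBLAS_WORKSPACE_CONFIG` value follows the reproducibility notes for CUDA 10.2 and newer:

```python
import os

# Assumption: set before any CUDA work starts, as the reproducibility
# notes require for deterministic cuBLAS behavior on CUDA >= 10.2.
os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")

import torch


def enable_determinism(seed: int = 0) -> None:
    """Hypothetical helper: seed PyTorch and request deterministic kernels."""
    torch.manual_seed(seed)  # seeds the CPU and CUDA generators
    torch.backends.cudnn.benchmark = False  # no autotuned algorithm selection
    torch.backends.cudnn.deterministic = True  # deterministic cuDNN algorithms
    # Raise a RuntimeError when an op without a deterministic
    # implementation is called, instead of running it silently.
    torch.use_deterministic_algorithms(True)


enable_determinism(0)
```

With the last flag enabled, an op that has no deterministic implementation fails loudly, which can help narrow down which call inside a layer is responsible.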

clck10 · August 28, 2021, 4:53pm

Thank you for your response! Yes, I believe so. Here is my current seed setter that is called in the main training function:

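    import os
    import random
    import numpy as np
    import torch

    def set_seed(seed):
        np.random.seed(seed)
        random.seed(seed)
        torch.manual_seed(seed)
        torch.cuda.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        torch.backends.cudnn.benchmark = False
        torch.backends.cudnn.deterministic = True
        # Read at interpreter startup, so this only affects child processes.
        os.environ['PYTHONHASHSEED'] = str(seed)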

I am also seeding the DataLoader workers in the same way, through the worker_init_fn argument.
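
Roughly, the worker seeding looks like the sketch below. It follows the pattern from the DataLoader docs rather than my exact code; `seed_worker` and `make_loader` are placeholder names:

```python
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset


def seed_worker(worker_id: int) -> None:
    # Inside a worker, torch.initial_seed() already differs per worker,
    # so NumPy and `random` are seeded from it.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)


def make_loader(dataset, seed: int, batch_size: int = 4,
                num_workers: int = 0) -> DataLoader:
    # A dedicated generator fixes the shuffling order and the base seed
    # handed to the workers.
    g = torch.Generator()
    g.manual_seed(seed)
    return DataLoader(dataset, batch_size=batch_size, shuffle=True,
                      num_workers=num_workers,
                      worker_init_fn=seed_worker, generator=g)
```

Two loaders built with the same seed should then yield batches in the same order.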

The strangest thing is that I can make things non-reproducible just by swapping the pooling layer. In the ResNets I am using, that is either the max-pool after the stem or the anti-aliasing avg-pool in the downsampling paths.

As soon as I swap either of those for a BlurPool2d, things become irreproducible.
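
One way to isolate a single layer is to run the same forward and backward pass twice and compare the input gradients bit for bit. This is only a sketch: `grads_match` is a made-up helper, and the `AvgPool2d` in the usage lines is a stand-in, so the actual `BlurPool2d` instance would need to be passed in instead (on a CUDA device, since that is where the issue shows up):

```python
import torch
import torch.nn as nn


def grads_match(module: nn.Module, x: torch.Tensor, seed: int = 0) -> bool:
    """Run forward+backward twice; True if input grads are bitwise equal."""
    grads = []
    for _ in range(2):
        torch.manual_seed(seed)  # same upstream gradient in both runs
        inp = x.detach().clone().requires_grad_(True)
        out = module(inp)
        # A random upstream gradient makes accumulation-order
        # differences more likely to show up than an all-ones one.
        out.backward(torch.randn_like(out))
        grads.append(inp.grad.detach().clone())
        module.zero_grad(set_to_none=True)
    return torch.equal(grads[0], grads[1])


device = "cuda" if torch.cuda.is_available() else "cpu"
layer = nn.AvgPool2d(2).to(device)  # stand-in; swap in the layer under test
x = torch.randn(8, 64, 56, 56, device=device)
print(grads_match(layer, x))
```

A `True` result from a single comparison does not prove the layer is deterministic, but a `False` one points at that layer's forward or backward pass.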

Thank you again!