Non-deterministic training (gradient update) in Resnet

Hello, I am implementing a 3D ResNet model in PyTorch but constantly get non-deterministic results.
I do have all seeds set up:
def seed_torch(seed=123):
    os.environ['PYTHONHASHSEED'] = str(seed)
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True

def _init_fn(worker_id):

I also switched the model to AlexNet, and then the result is deterministic, so there must be some function inside ResNet that causes this issue.

I am using the 3D ResNet from:

I tried both Wide ResNet and ResNet, at depths 18 and 50; all have the same issue.

Probably the backward of the downsampling:

Seems to be one of the things everyone wants to have but nobody takes the time to do.

Best regards


Do you mean in shortcut A, the down_sample_basic_block?
I used shortcut B, in which the downsample is basically a conv + batchnorm.

Did you click on the link?

A number of operations have backwards that use atomicAdd, in particular torch.nn.functional.embedding_bag(), torch.nn.functional.ctc_loss() and many forms of pooling, padding, and sampling. There currently is no simple way of avoiding non-determinism in these functions.

From a cursory reading of the code, you use pooling.

Best regards


Yes, I did read that, although it is very vague…
But I also use 3D max pooling / average pooling in AlexNet, which is fine, so I figure pooling is not the issue.

To add more info:
for the 2D models I worked with before, ResNet had no such issue.

The implementations can differ, both for the networks on top of the ops and for the ops themselves. For example, 3D average pooling seems to use atomicAdd for cases where the windows can overlap, so something like

self.avgpool = nn.AvgPool3d((last_duration, last_size, last_size), stride=1)

is suspect.

Best regards


Thank you Tom, I tried substituting this with MaxPool3d but it is still non-deterministic…
However, MaxPool3d is deterministic in my AlexNet.
I will continue to explore other layers' behavior.
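To explore layer by layer systematically rather than swapping modules one at a time, you can fingerprint every parameter's gradient across repeated identical runs and list the ones that differ. This is a sketch with a toy model standing in for the 3D ResNet (both helper names are illustrative); on CUDA, layers whose backward uses atomicAdd would show up in the mismatch list:

```python
import torch
import torch.nn as nn

def grad_fingerprints(model, x, seed=0):
    # One forward/backward pass; record a checksum per parameter gradient.
    torch.manual_seed(seed)
    model.zero_grad()
    model(x.clone().requires_grad_(True)).sum().backward()
    return {name: p.grad.double().sum().item()
            for name, p in model.named_parameters() if p.grad is not None}

def nondeterministic_layers(model, x, trials=3):
    # Compare checksums across repeated identical runs; any mismatch
    # points at a layer with a non-deterministic backward.
    runs = [grad_fingerprints(model, x) for _ in range(trials)]
    return [name for name in runs[0]
            if any(runs[0][name] != r[name] for r in runs[1:])]

# Toy stand-in; substitute the actual 3D ResNet and a real input batch.
model = nn.Sequential(nn.Conv3d(1, 2, 3), nn.ReLU(), nn.AvgPool3d(2))
x = torch.randn(1, 1, 8, 8, 8)
print(nondeterministic_layers(model, x))  # empty when all backwards match
```

The exact-equality comparison is intentional: with identical seeds and inputs, a deterministic backward reproduces gradients bit-for-bit, so any difference at all is a real signal.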