Hello, I am implementing a 3D ResNet model in PyTorch but constantly get non-deterministic results.
I do have all the seeds set up:
import os
import random
import numpy as np
import torch

def seed_torch(seed=123):
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True

def _init_fn(worker_id):
    seed_torch(SEED)
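For reference, here is a minimal sketch of how that worker seeding can be wired into a DataLoader; seed_torch is repeated so the sketch is self-contained, and the tiny TensorDataset and batch size are made up for illustration (they are not from the actual training setup):

```python
import os
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

SEED = 123

def seed_torch(seed=SEED):
    # Same seeding as above: Python, NumPy, and all CUDA devices.
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True

def _init_fn(worker_id):
    seed_torch(SEED)

seed_torch()
dataset = TensorDataset(torch.randn(8, 3, 4, 4, 4),
                        torch.zeros(8, dtype=torch.long))

def make_loader():
    # A fresh, identically seeded generator per epoch gives the
    # same shuffle order every time the loader is rebuilt.
    g = torch.Generator().manual_seed(SEED)
    return DataLoader(dataset, batch_size=4, shuffle=True,
                      num_workers=0, worker_init_fn=_init_fn,
                      generator=g)

first = [x.clone() for x, _ in make_loader()]
second = [x.clone() for x, _ in make_loader()]
print("identical batch order:",
      all(torch.equal(a, b) for a, b in zip(first, second)))
```

With num_workers > 0, the worker_init_fn re-seeds each worker process; the generator argument controls the shuffle order itself.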
When I change the model to AlexNet, the result is deterministic, so there must be some function inside ResNet that causes this issue.
I am using the 3D ResNet implementation from:
I tried both Wide ResNet and plain ResNet, at depths 18 and 50; all have the same issue.
A number of operations have backward passes that use atomicAdd, in particular torch.nn.functional.embedding_bag(), torch.nn.functional.ctc_loss(), and many forms of pooling, padding, and sampling. There is currently no simple way of avoiding non-determinism in these functions.
From a cursory reading of the code, you use pooling.
Yes, I did read that, although it is quite vague.
But I also use 3D max pooling/avg pooling in AlexNet, which is fine, so I figure pooling is not the issue.
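One way to test a single op in isolation, rather than the whole network, is to run the same forward/backward twice from an identical input and compare gradients. A sketch (the shapes are made up; note that these kernels are deterministic on CPU, so the probe is only meaningful on a GPU):

```python
import torch
import torch.nn.functional as F

def grads_of(op, x):
    # One forward/backward pass; returns the input gradient.
    x = x.clone().requires_grad_(True)
    op(x).sum().backward()
    return x.grad.clone()

torch.manual_seed(0)
x = torch.randn(2, 3, 8, 8, 8)
# Move x to .cuda() to exercise the CUDA kernels, where the
# atomicAdd non-determinism actually lives; on CPU this always matches.

op = lambda t: F.avg_pool3d(t, kernel_size=3, stride=2)  # overlapping windows
g1 = grads_of(op, x)
g2 = grads_of(op, x)
print("deterministic:", torch.equal(g1, g2))
```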
To add more info:
for the 2D models I worked with before, ResNet had no such issue.
The implementations can differ, both for the networks on top of the ops and for the ops themselves. For example, 3D average pooling seems to use atomicAdd in the backward pass for cases where the windows can overlap, so something like that could be the source.
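To illustrate the overlap condition: windows overlap whenever stride < kernel_size, while with stride == kernel_size each input voxel belongs to exactly one window and the backward pass has no write conflicts to resolve. A small sketch (the tensor shape is made up):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 8, 8, 8)

# Non-overlapping: stride == kernel_size; each input voxel feeds one window,
# so the backward pass has exactly one gradient contribution per voxel.
y_disjoint = F.avg_pool3d(x, kernel_size=2, stride=2)

# Overlapping: stride < kernel_size; a voxel feeds several windows, so the
# backward pass accumulates multiple contributions into the same input
# location (done with atomicAdd on CUDA, hence the non-determinism).
y_overlap = F.avg_pool3d(x, kernel_size=3, stride=2)

print(y_disjoint.shape)  # torch.Size([1, 1, 4, 4, 4])
print(y_overlap.shape)   # torch.Size([1, 1, 3, 3, 3])
```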
Thank you Tom, I tried to substitute this with 3D max pooling, but it is still non-deterministic.
However, 3D max pooling is deterministic in my AlexNet.
I will continue to explore other layers' behavior.
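Instead of swapping layers one by one, another option is to run two identical forward/backward passes over the whole model and compare the gradients of every named parameter; the layers whose grads differ point at the offending op. A sketch, assuming a generic model (the tiny Conv3d stack here is a hypothetical stand-in, not the actual 3D ResNet; on CPU it will report nothing, since the issue only appears on CUDA):

```python
import torch
import torch.nn as nn

def param_grads(model, x, target, loss_fn):
    # Zero grads, run one forward/backward, return grads keyed by name.
    model.zero_grad()
    loss_fn(model(x), target).backward()
    return {n: p.grad.clone() for n, p in model.named_parameters()}

torch.manual_seed(0)
model = nn.Sequential(
    nn.Conv3d(1, 4, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AvgPool3d(kernel_size=3, stride=2),  # suspect overlapping-pool op
    nn.Flatten(),
    nn.Linear(4 * 3 * 3 * 3, 2),
)
x = torch.randn(2, 1, 8, 8, 8)
target = torch.tensor([0, 1])
loss_fn = nn.CrossEntropyLoss()

g1 = param_grads(model, x, target, loss_fn)
g2 = param_grads(model, x, target, loss_fn)
for name in g1:
    if not torch.equal(g1[name], g2[name]):
        print("non-deterministic grad at:", name)
```

Moving the model and tensors to .cuda() lets the same loop flag the layers whose backward kernels use atomicAdd.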