I was trying to reproduce the results of a CNN on two different machines (both running on CPU), but I ended up with different results despite setting the same Python and torch seeds in advance. After checking the CNN layers individually, I noticed that the difference originates in the Dropout layer specifically. I couldn't find the exact implementation of Dropout, so I was wondering if anyone has an idea of what may be behind this?
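For reference, this is my understanding of what Dropout(p) computes in training mode (a conceptual sketch only, not the actual ATen kernel, which is implemented in C++):
import torch
p = 0.3
x = torch.randn(4, 4)
mask = torch.bernoulli(torch.full_like(x, 1 - p))  # 0/1 keep-mask, keep probability 1 - p
out = x * mask / (1 - p)                           # inverted dropout: scale survivors by 1/(1-p)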
Some additional details:
import random
import torch
random.seed(2)        # seeds Python's RNG (does not affect torch)
torch.manual_seed(3)  # seeds torch's default CPU generator, which Dropout uses
a = torch.randn(4, 4)
dp1 = torch.nn.Dropout(p=0.3)
dp1(a)
# Result on one of the machines:
# tensor([[-1.0361, -3.5810, -0.0000, 1.8477],
# [-2.0117, -0.0000, -0.0785, 1.7217],
# [-0.0000, -2.3690, -0.0955, -0.7392],
# [-0.0000, 1.5011, 1.8367, 2.3154]])
# Result on the other machine:
# tensor([[-1.0361, -0.0000, -1.1379, 1.8477],
# [-2.0117, -1.2687, -0.0000, 1.7217],
# [-1.4952, -2.3690, -0.0000, -0.7392],
# [-0.4844, 1.5011, 1.8367, 2.3154]])
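To narrow this down, one check I can run on both machines is to compare the raw RNG stream directly (a minimal sketch, assuming Dropout draws its mask from the default CPU generator):
import torch
torch.manual_seed(3)
print(torch.rand(4, 4))  # if these values already differ across machines, the divergence is in the RNG stream itself, not in Dropout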
- I also tried the tips from the reproducibility page (using deterministic algorithms, …), but the results are still different (see the snippet after this list).
- I used a Docker image, so the library versions are exactly the same on both machines (Python 3.9 / torchvision 0.11.1 / torch 1.10.0).
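For completeness, this is the kind of call the reproducibility page suggests (shown as a sketch; it forces deterministic implementations where available, but as far as I can tell it does not change which random numbers Dropout draws):
import torch
torch.use_deterministic_algorithms(True)  # raise an error on nondeterministic ops (available since torch 1.8)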
Any tips/ideas are welcome, thanks.