Hi,
I’ve been scratching my head for a while now.
Env: Python 2.7; PyTorch 1.8 + CUDA
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models


class Model(nn.Module):
    def __init__(self, feature_extractor, dropout=0, pretrained=True, feat_dim=2048):
        super(Model, self).__init__()  # two-argument form so it also runs on Python 2.7
        self.dropout = dropout
        self.feature_extractor = feature_extractor
        self.feature_extractor.avgpool = nn.AdaptiveAvgPool2d(1)
        fe_out_planes = self.feature_extractor.fc.in_features
        self.feature_extractor.fc = nn.Linear(fe_out_planes, feat_dim)
        self.fc_t = nn.Linear(feat_dim, 3)
        self.fc_q = nn.Linear(feat_dim, 3)

        # initialize the model
        if pretrained:
            init_modules = [self.feature_extractor.fc, self.fc_t, self.fc_q]
        else:
            init_modules = self.modules()
        for m in init_modules:
            if isinstance(m, nn.Conv2d) or isinstance(m, nn.Linear):
                nn.init.constant_(m.weight.data, 0.01)  # constant weights
                if m.bias is not None:
                    nn.init.constant_(m.bias.data, 0)

    def forward(self, x):
        s = x.size()
        x = x.view(-1, *s[2:])
        x = self.feature_extractor(x)
        x = F.relu(x)
        # dropout deliberately disabled for this comparison
        # if self.dropout > 0:
        #     x = F.dropout(x, p=self.dropout)
        t = self.fc_t(x)
        q = self.fc_q(x)
        out = torch.cat((t, q), 1)
        out = out.view(s[0], s[1], -1)
        return out


torch.manual_seed(seed)
if torch.cuda.is_available():
    torch.cuda.manual_seed(seed)

feature_extractor = models.resnet34(pretrained=True)
model = Model(feature_extractor, dropout=0, feat_dim=2048)
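In case it matters, this is the quick sanity check I ran to confirm the constant init really applied to the replaced layers (just an illustrative check, not part of the original script):

# Sanity check (illustrative): with the init above, every replaced Linear layer
# should have all weights == 0.01 and all biases == 0.
print(torch.unique(model.fc_t.weight.detach()))                  # expect tensor([0.0100])
print(torch.unique(model.fc_q.bias.detach()))                    # expect tensor([0.])
print(torch.unique(model.feature_extractor.fc.weight.detach()))  # expect tensor([0.0100])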
Now, the interesting part:
When I run,
model.cuda()
print('Feed a random batch to test the model: ')
input = torch.ones(1, 64, 3, 7, 7)*0.3
input = input.cuda()
model.eval()
output = model(input)
print(output)
tensor([[106.7721, 106.7721, 106.7721, 106.7721, 106.7721, 106.7721], … ]], device='cuda:0')
Compared to when I run without model.cuda(), i.e.:
print('Feed a random batch to test the model: ')
input = torch.ones(1, 64, 3, 7, 7)*0.3
model.eval()
output = model(input)
print(output)
tensor([[53.3860, 53.3860, 53.3860, 53.3860, 53.3860, 53.3860],…])
The CPU values are almost exactly half of the GPU values, and this is consistent across different inputs.
I actually discovered this while porting the original repo to Python 3 (v3.8, same PyTorch version): I compared the outputs of the two versions using the same input data, the same constant weight init, no shuffling, no dropout, and model.eval(), but the outputs were different.
After many print statements, I narrowed it down to this. Interestingly, the Python 3 version does not show this behaviour: moving the model and data to the GPU with model.cuda() and input = input.cuda() gives the same output as running without .cuda().
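For reference, this is roughly how I compare the CPU and GPU outputs (a minimal sketch; the tolerance and the cuDNN flags are just things I tried, not from the original repo):

# Minimal CPU-vs-GPU comparison sketch (assumes `model` and `input` from above).
torch.backends.cudnn.deterministic = True   # avoid non-deterministic cuDNN kernels
torch.backends.cudnn.benchmark = False

model.eval()
with torch.no_grad():
    out_cpu = model.cpu()(input.cpu())
    out_gpu = model.cuda()(input.cuda()).cpu()

print(torch.allclose(out_cpu, out_gpu, atol=1e-4))  # I expected True, but get False
print((out_gpu / out_cpu).mean())                   # ratio is roughly 2.0 in my case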
I'm really confused about what the issue is, and I couldn't find any relevant documentation on this.
Please help.
Thank you.