Hi,
I'm failing to get two models that share the same parameters to produce the same result. It seems that a model containing a 1x1-kernel convolution module yields mismatched gradients after its weights are copied.
OS: ubuntu 16.04
PyTorch version: 0.3.1
How you installed PyTorch (conda, pip, source): conda, in a Docker container (Ubuntu 16.04)
Python version: 3.6.1
CUDA/cuDNN version: 9.0
GPU models and configuration: k80 (aws p2 instance)
GCC version (if compiling from source):
Code to reproduce:

import torch
import torch.nn as nn
from torch.autograd import Variable
import numpy as np
class Model(nn.Module):
    def __init__(self, num_classes=1000):
        super(Model, self).__init__()
        # the 1x1 convolution whose gradients end up differing between copies
        self.conv1x1 = nn.Conv2d(3, 64, kernel_size=1, stride=1, padding=0, bias=False)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(64, num_classes)

    def forward(self, x):
        x_c = self.conv1x1(x)
        x_pool = self.pool(x_c).view(-1, 64)
        output = self.fc(x_pool)
        return x_c, x_pool, output
np.random.seed(1)
torch.manual_seed(1)
model1 = Model().cuda()
model2 = Model().cuda()
# copy model1's weights so both models start from identical parameters
model2.load_state_dict(model1.state_dict())
model1.train()
model2.train()
input_np = np.random.random((1, 3, 64, 64))
input = Variable(torch.Tensor(input_np), requires_grad=False).cuda()
criterion1 = nn.CrossEntropyLoss().cuda()
criterion2 = nn.CrossEntropyLoss().cuda()
target = Variable(torch.LongTensor(np.array([0])), requires_grad=False).cuda()
x_c_1, x_p_1, o1 = model1(input)
x_c_2, x_p_2, o2 = model2(input)
loss1 = criterion1(o1, target)
loss2 = criterion2(o2, target)
loss1.backward()
loss2.backward()
param1 = list(model1.parameters())
param2 = list(model2.parameters())
for p1, p2 in zip(param1, param2):
    # prints False when the parameter belongs to the conv layer
    print("params have same gradient?: ", p1.data.numel() == (p1.grad.data == p2.grad.data).nonzero().size(0))
When the input's height and width are small (<= 32x32), the module's gradient is identical to the master model's gradient. And 3x3-kernel convolution modules didn't cause the problem, whatever their input size was.
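My guess is that for larger inputs cuDNN picks a non-deterministic backward algorithm for the 1x1 convolution, so the two backward passes are not bit-identical even with identical weights. One check I could run (assuming the torch.backends.cudnn.deterministic flag is available in 0.3.1) is:

import torch.backends.cudnn as cudnn

# force deterministic convolution algorithms (assumption: flag exists in 0.3.1)
cudnn.deterministic = True
# fallback check: disable cuDNN entirely and use the native convolution
# cudnn.enabled = False

If the gradients match with these settings, the differences come from nondeterministic algorithm selection rather than from the copy itself.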
Am I missing something? Thanks!