Hi @smth, thanks for your reply! It will be hard to reproduce this fully, but I'll try my best to provide as many details as possible.
For object detection in 3D images, I have a U-shaped ResNet architecture within a Faster R-CNN-style framework, i.e., the network predicts both objectness scores/probabilities and the coordinates of the bounding boxes.
So in training, I have something like this:
for i, (data, target) in enumerate(data_loader):
    data = Variable(data.cuda(async=True))
    target = Variable(target.cuda(async=True))
    out_dict = net(data)
    loss_output = loss(out_dict['predictions'], target, train=True)
    optimizer.zero_grad()
    loss_output.backward()
    optimizer.step()

if epoch % save_freq == 0:
    save_model(net, optimizer, epoch, os.path.join(save_dir, '%03d.ckpt' % epoch))
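For context, loss here combines an objectness term with a box-regression term. A rough sketch of that idea (simplified, not my exact implementation; the (N, num_anchors, 5) layout with channels (objectness, z, y, x, d) is only assumed for illustration):

import torch.nn.functional as F

def detection_loss(predictions, target):
    # predictions / target: (N, num_anchors, 5); channel 0 = objectness,
    # channels 1-4 = box offsets (z, y, x, d) -- layout assumed for this sketch
    cls_pred, box_pred = predictions[..., 0], predictions[..., 1:]
    cls_target, box_target = target[..., 0], target[..., 1:]

    # objectness: binary cross-entropy over all anchors
    cls_loss = F.binary_cross_entropy_with_logits(cls_pred, cls_target)

    # box regression: smooth L1, only on positive anchors
    pos = cls_target > 0.5
    if pos.any():
        reg_loss = F.smooth_l1_loss(box_pred[pos], box_target[pos])
    else:
        reg_loss = box_pred.sum() * 0  # keep the graph, contribute zero
    return cls_loss + reg_loss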
The checkpoint is saved with the following function:
def save_model(net, optim, epoch, ckpt_fname):
    # net is wrapped in DataParallel, so save the underlying module's weights
    state_dict = net.module.state_dict()
    for key in state_dict.keys():
        # move weights to CPU so the checkpoint can be loaded anywhere
        state_dict[key] = state_dict[key].cpu()
    torch.save({
        'epoch': epoch,
        'state_dict': state_dict,
        'optimizer': optim},
        ckpt_fname)
Then, to resume training, I load the pre-trained model file and use it to continue from the training stage of, say, Epoch k. Before training actually starts, I check whether this is a regular or a resumed run:
from collections import OrderedDict

if pretrained is not None:
    # resumed run: restore weights, epoch counter and optimizer from the checkpoint
    state_dict = torch.load(pretrained)
    new_state_dict = OrderedDict()
    for k, value in state_dict['state_dict'].iteritems():
        # re-add the "module." prefix that was dropped by saving net.module
        key = "module.{}".format(k)
        new_state_dict[key] = value
    net.load_state_dict(new_state_dict)
    epoch = state_dict['epoch']
    print "pre-trained epoch number: {}".format(epoch)
    optimizer = state_dict['optimizer']
else:
    # regular run: build a fresh optimizer
    optimizer = SGD(
        net.parameters(),
        learning_rate,
        momentum=momentum,
        weight_decay=weight_decay)
For the learning rate, I use stepwise decay: when training for 100 epochs, I reduce the learning rate to 1/10 of its value at Epoch 50, and by another factor of 10 at Epoch 80. In this example k=5, so the learning rate should be the same for the two comparison runs above (baseline 1 and resumed training 2).
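Roughly, the schedule is applied like this (an illustrative helper, not my exact code):

def get_lr(epoch, base_lr, total_epochs=100):
    # stepwise decay: 1/10 at epoch 50, another 1/10 at epoch 80
    if epoch < 0.5 * total_epochs:
        return base_lr
    elif epoch < 0.8 * total_epochs:
        return base_lr * 0.1
    else:
        return base_lr * 0.01

# set at the start of every epoch, for both fresh and resumed runs
for param_group in optimizer.param_groups:
    param_group['lr'] = get_lr(epoch, learning_rate)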
The result: in baseline 1 I saw gradually decreasing false negative rates, while in the 2nd case I didn't see such progress during resumed training. The loss values were also significantly different. I'm aware of the non-deterministic nature of GPU training, but that should not be the entire reason for such a discrepancy.
I ran both setups multiple times and got the same distinct results each time.
Did I miss anything important here?
Thank you!