I defined an extra linear layer after building the network. This layer is not part of the network, and its weights are not updated during training. It is created just before the start of the first epoch, as shown below.
mod1_encoder = linear([1024, 256, 256])
mod2_encoder = linear([1024, 512, 256])
mod1_classifier = linear([256, 4, 23])
mod2_classifier = linear([256, 4, 23])
net = encoder_individual_classifier(mod1_encoder, mod2_encoder, mod1_classifier, mod2_classifier)
net.cuda()
dset_train = data_loader(root_data_path=params['root_data'], labels_txt=all_labels, mode='trn', transform=None, randomize=True)
train_loader = DataLoader(dset_train, batch_size=params['batch_size'], num_workers=10, sampler=samplr)
dset_val = data_loader(root_data_path=params['root_data'], labels_txt=all_labels, mode='val', transform=None, randomize=False)
val_loader = DataLoader(dset_val, batch_size=params['batch_size'], shuffle=False, num_workers=10)
optimizer = torch.optim.Adam(net.parameters(), lr=params['lr'])
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=params['decay'], gamma=0.1)
tmp = linear([2, 2])  # the extra layer in question; not part of `net`, never used in the forward pass
for epoch in ...:  # training loop (sketched below)
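Roughly, the loop body looks like this (paraphrased; `criterion`, the `params['num_epochs']` key, and the batch unpacking are placeholders, not my exact code):

```python
import torch.nn as nn

criterion = nn.CrossEntropyLoss()  # placeholder for my actual loss

for epoch in range(params['num_epochs']):
    net.train()
    for mod1, mod2, labels in train_loader:
        mod1, mod2, labels = mod1.cuda(), mod2.cuda(), labels.cuda()
        out1, out2 = net(mod1, mod2)   # one prediction per modality
        loss = criterion(out1, labels) + criterion(out2, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```

The loop itself is identical in both runs; the only difference between the two experiments is whether the `tmp = linear([2, 2])` line above is present.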
I noticed that the backpropagated gradients are different after adding the `tmp` layer, even though I use the same seed in both cases. What could explain this?
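To make the question concrete, here is a minimal, self-contained sketch of the kind of behaviour I am seeing (tiny stand-in layers and random data instead of my real network and loaders, not my actual code):

```python
import torch
import torch.nn as nn

def one_step(create_tmp):
    torch.manual_seed(0)
    net = nn.Linear(8, 2)        # stand-in for my network (built before tmp)
    if create_tmp:
        tmp = nn.Linear(2, 2)    # extra unused layer, analogous to my `tmp`
    x = torch.randn(4, 8)        # stand-in for anything random that happens
                                 # after tmp is created (batch order, dropout, ...)
    net(x).sum().backward()
    return net.weight.grad.clone()

print(torch.equal(one_step(False), one_step(True)))  # prints False for me
```

Even with the same seed at the top of both runs, the gradients from the two calls do not match.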