I defined an extra linear layer after building the network. This layer is not part of the network, and its weights are not updated during training. It is created just before the start of the first epoch, as shown below.
mod1_encoder = linear([1024, 256, 256])
mod2_encoder = linear([1024, 512, 256])
mod1_classifier = linear([256, 4, 23])
mod2_classifier = linear([256, 4, 23])
net = encoder_individual_classifier(mod1_encoder, mod2_encoder, mod1_classifier, mod2_classifier)
net.cuda()
dset_train = data_loader(root_data_path=params['root_data'], labels_txt=all_labels, mode='trn', transform=None, randomize=True)
train_loader = DataLoader(dset_train, batch_size=params['batch_size'], num_workers=10, sampler=samplr)
dset_val = data_loader(root_data_path=params['root_data'], labels_txt=all_labels, mode='val', transform=None, randomize=False)
val_loader = DataLoader(dset_val, batch_size=params['batch_size'], shuffle=False, num_workers=10)
optimizer = torch.optim.Adam(net.parameters(), lr=params['lr'])
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=params['decay'], gamma=0.1)
tmp = linear([2, 2])  # the extra layer in question; not part of `net`, never used in the forward pass
for epoch in ...:  # training loop (sketched below)
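Roughly, the loop body looks like this (paraphrased; `criterion`, the `params['num_epochs']` key, and the batch unpacking are placeholders, not my exact code):

```python
import torch.nn as nn

criterion = nn.CrossEntropyLoss()  # placeholder for my actual loss

for epoch in range(params['num_epochs']):
    net.train()
    for mod1, mod2, labels in train_loader:
        mod1, mod2, labels = mod1.cuda(), mod2.cuda(), labels.cuda()
        out1, out2 = net(mod1, mod2)   # one prediction per modality
        loss = criterion(out1, labels) + criterion(out2, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```

The loop itself is identical in both runs; the only difference between the two experiments is whether the `tmp = linear([2, 2])` line above is present.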
I noticed that the backpropagated gradients are different after adding the `tmp` layer, even though I use the same seed in both cases. What could explain this?
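To make the question concrete, here is a minimal, self-contained sketch of the kind of behaviour I am seeing (tiny stand-in layers and random data instead of my real network and loaders, not my actual code):

```python
import torch
import torch.nn as nn

def one_step(create_tmp):
    torch.manual_seed(0)
    net = nn.Linear(8, 2)        # stand-in for my network (built before tmp)
    if create_tmp:
        tmp = nn.Linear(2, 2)    # extra unused layer, analogous to my `tmp`
    x = torch.randn(4, 8)        # stand-in for anything random that happens
                                 # after tmp is created (batch order, dropout, ...)
    net(x).sum().backward()
    return net.weight.grad.clone()

print(torch.equal(one_step(False), one_step(True)))  # prints False for me
```

Even with the same seed at the top of both runs, the gradients from the two calls do not match.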