Hi, everyone! I am confronted by a really weird phenomenon.
I've written some PyTorch code whose backbone is listed below. (Please keep in mind that target_feature_extraction_module and target_classification_module have been adequately trained beforehand, so the classification accuracy reaches 100% when they are used together.)
```python
for cur_epoch in range(epoch_num):
    # All five models are instantiated objects of nn.Module
    target_feature_extraction_module.train()
    target_classification_module.train()
    source_feature_extraction_module.train()
    source_to_target_feature_trans.train()
    source_classification_module.train()

    target_data = list(enumerate(target_train_loader))
    source_data = list(enumerate(source_train_loader))
    rounds_per_epoch = min(len(target_data), len(source_data))
    for batch_idx in range(rounds_per_epoch):
        _, (target_train, target_label) = target_data[batch_idx]
        _, (source_train, source_label) = source_data[batch_idx]
        if with_nvidia:
            target_train = target_train.float().cuda()
            target_label = target_label.cuda()
            source_train = source_train.float().cuda()
        target_feature = target_feature_extraction_module(target_train)
        source_feature = source_feature_extraction_module(source_train)
        source_shape_changed_feature = source_to_target_feature_trans(source_feature)
        target_classification_result, target_before_last_linear = target_classification_module(target_feature)

        # Just print the classification accuracy
        y_predict = target_classification_result.detach().cpu().numpy()
        y_predict = np.argmax(y_predict, axis=1)
        acc = accuracy_score(target_label.cpu().numpy(), y_predict)
        print(acc)  # always equals 1 (in every batch, under every circumstance), since the models have been adequately trained beforehand

        # The following line is the most horrible, even magical, part for me:
        # a, b = target_classification_module(source_shape_changed_feature)

        target_classification_loss = nn.CrossEntropyLoss()(target_classification_result, target_label)
        str_out = ("Epoch:" + str(cur_epoch) + " batch_num:" + str(batch_idx)
                   + " t_c_loss:" + str(target_classification_loss.data.cpu().numpy()))
        print(str_out)
        target_classification_loss.backward()
        for the_optim in optimizer_list:
            the_optim.step()
        for the_optim in optimizer_list:
            the_optim.zero_grad()

    if cur_epoch % 2 == 0:
        target_feature_extraction_module.eval()
        target_classification_module.eval()
        # This function computes the accuracy on the training set, in the same way as
        # the print(acc) above, so its output is also expected to be 1.
        eval_model_traindata(target_feature_extraction_module, target_classification_module,
                             target_train_loader, cur_epoch, with_nvidia)
```
Please pay attention to the commented-out line `# a, b = target_classification_module(source_shape_changed_feature)` in the middle of the code above. If the code is left unchanged, eval_model_traindata outputs 1. However, if I turn that comment into actual code, `a, b = target_classification_module(source_shape_changed_feature)`, eval_model_traindata outputs 0.3, which is really weird! Since a and b seem irrelevant to the backpropagation process, neither the accuracy nor the parameters of the models should be affected at all.
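To make concrete what I mean by "the forward pass shouldn't touch the models", here is a minimal self-contained check I tried (the nn.BatchNorm1d layer is only an assumption, standing in for whatever stateful layers the real modules might contain). It turns out that even a gradient-free forward pass in train() mode mutates the running-statistics buffers, while the learnable parameters stay untouched:

```python
import torch
import torch.nn as nn

# Toy stand-in for one of the modules above; the BatchNorm1d layer is an
# assumption -- the real modules' internals are not shown here.
bn = nn.BatchNorm1d(4)
bn.train()

mean_before = bn.running_mean.clone()
weight_before = bn.weight.clone()

with torch.no_grad():          # no gradients, no backward(), no optimizer.step()
    _ = bn(torch.ones(8, 4))   # a bare forward pass in train() mode

print(torch.equal(mean_before, bn.running_mean))  # False: running stats were updated
print(torch.equal(weight_before, bn.weight))      # True: learnable parameters untouched
```

So "no backward pass" does not necessarily mean "no state change" for modules in train() mode, which is the part I find confusing.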
I can promise that eval_model_traindata correctly computes the accuracy of a given model, which is also verified by the fact that, with the line commented out, it outputs 1 as expected.
I've run the code many times today. If `a, b = target_classification_module(source_shape_changed_feature)` is commented out, both acc and the output of eval_model_traindata equal 1. However, with that line active, only acc equals 1, and eval_model_traindata outputs a value below 0.5.
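In case it helps with the diagnosis, here is a small helper I wrote while debugging (changed_state_entries is my own hypothetical name, not part of the code above) that reports which state_dict entries of a module change after a plain forward pass, with no backward() involved:

```python
import torch
import torch.nn as nn

def changed_state_entries(module, run_forward):
    """Return the state_dict keys whose tensors change after calling run_forward()."""
    before = {k: v.clone() for k, v in module.state_dict().items()}
    with torch.no_grad():
        run_forward()
    after = module.state_dict()
    return [k for k in before if not torch.equal(before[k], after[k])]

# Toy stand-in for my pipeline (the real modules' internals are not shown here)
net = nn.Sequential(nn.Linear(4, 4), nn.BatchNorm1d(4))
net.train()
x = torch.randn(8, 4)
result = changed_state_entries(net, lambda: net(x))
print(result)  # typically ['1.running_mean', '1.running_var', '1.num_batches_tracked']
```

Only the buffers of the BatchNorm layer show up; the Linear weights stay fixed, which matches what I assumed about forward passes not touching parameters.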
I sincerely appreciate all your help and suggestions!