Hi, everyone! I've run into a really weird phenomenon.
I've written some PyTorch code, whose skeleton is shown below. (Please keep in mind that target_feature_extraction_module and target_classification_module were already adequately trained beforehand, so that together they reach 100% classification accuracy.)
    for cur_epoch in range(epoch_num):
        target_feature_extraction_module.train()
        target_classification_module.train()
        source_feature_extraction_module.train()
        source_to_target_feature_trans.train()
        source_classification_module.train()
        # All five models are required; each is an instance of an nn.Module subclass

        target_data = list(enumerate(target_train_loader))
        source_data = list(enumerate(source_train_loader))
        rounds_per_epoch = min(len(target_data), len(source_data))

        for batch_idx in range(rounds_per_epoch):
            _, (target_train, target_label) = target_data[batch_idx]
            _, (source_train, source_label) = source_data[batch_idx]
            if with_nvidia:
                target_train = target_train.float().cuda()
                target_label = target_label.cuda()
                source_train = source_train.float().cuda()

            target_feature = target_feature_extraction_module(target_train)
            source_feature = source_feature_extraction_module(source_train)
            source_shape_changed_feature = source_to_target_feature_trans(source_feature)
            target_classification_result, target_before_last_linear = target_classification_module(target_feature)

            # just print the accuracy of classification
            y_predict = target_classification_result.detach().cpu().numpy()
            y_predict = np.argmax(y_predict, axis=1)
            acc = accuracy_score(y_predict, target_label.cpu().numpy())
            # always equals 1 (in every batch, under any circumstance),
            # since the models have been adequately trained before
            print(acc)

            # the following line is the most horrible and even magical part for me
            #a , b = target_classification_module(source_shape_changed_feature)

            target_classification_loss = nn.CrossEntropyLoss()(target_classification_result, target_label)
            str_out = "Epoch:" + str(cur_epoch) + " batch_num:" + str(batch_idx) + " t_c_loss:" + str(target_classification_loss.data.cpu().numpy())
            print(str_out)

            target_classification_loss.backward()
            for the_optim in optimizer_list:
                the_optim.step()
            for the_optim in optimizer_list:
                the_optim.zero_grad()

        if cur_epoch % 2 == 0:
            target_feature_extraction_module.eval()
            target_classification_module.eval()
            # This function calculates the accuracy on the training set in the same way
            # as the print(acc) part above, so its output is expected to be 1 as well.
            eval_model_traindata(target_feature_extraction_module, target_classification_module, target_train_loader, cur_epoch, with_nvidia)
Please pay attention to the line “#a , b = target_classification_module(source_shape_changed_feature)” in the middle of the code above. If the code is left as it is, eval_model_traindata outputs 1. However, if I turn that comment into a line of actual code, “a, b = target_classification_module(source_shape_changed_feature)”, the output of eval_model_traindata drops to about 0.3, which is really weird! Neither a nor b is used in the loss, so the backpropagation process seems unrelated to them, and I would expect the accuracy and the parameters of the models not to be affected at all.
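One way to test the expectation that an extra forward pass leaves the model untouched is to snapshot state_dict() before and after the call and list what changed. The snippet below is only a minimal sketch: toy_module is a hypothetical stand-in, and the BatchNorm1d layer inside it is an assumption, since the real architecture of target_classification_module is not shown in this post. The helper name changed_state_entries is also made up for illustration.

```python
import copy

import torch
import torch.nn as nn


def changed_state_entries(module, inputs):
    """Run one forward pass and return the state_dict keys whose tensors changed."""
    before = copy.deepcopy(module.state_dict())
    with torch.no_grad():  # no gradients and no optimizer step: only the forward pass runs
        module(inputs)
    after = module.state_dict()
    return [key for key in before if not torch.equal(before[key], after[key])]


torch.manual_seed(0)
# Hypothetical stand-in for target_classification_module; the BatchNorm1d layer is an assumption.
toy_module = nn.Sequential(
    nn.Linear(8, 16),
    nn.BatchNorm1d(16),
    nn.ReLU(),
    nn.Linear(16, 3),
)
batch = torch.randn(32, 8)

toy_module.train()
changed_in_train = changed_state_entries(toy_module, batch)

toy_module.eval()
changed_in_eval = changed_state_entries(toy_module, batch)

print("changed by a forward pass in train mode:", changed_in_train)
print("changed by a forward pass in eval mode:", changed_in_eval)
```

If the same check on the real target_classification_module, fed with source_shape_changed_feature, reports changed entries such as running statistics, then the extra forward pass does modify state that is later used in eval mode, even though a and b never reach the loss. If it reports nothing, this explanation can be ruled out.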
I can promise that the eval_model_traindata function calculates the accuracy of a given model correctly. This is also confirmed by the fact that, with the line left as the comment “#a , b = target_classification_module(source_shape_changed_feature)”, it outputs 1 as expected.
I've run the code many times today. If “#a , b = target_classification_module(source_shape_changed_feature)” stays commented out, both acc and the output of eval_model_traindata are 1. However, with “a , b = target_classification_module(source_shape_changed_feature)” as actual code, only acc equals 1, and the output of eval_model_traindata is lower than 0.5.
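For reference, an evaluation function of this kind can be sketched roughly as follows. This is a hypothetical reconstruction, not the actual eval_model_traindata: the name eval_model_traindata_sketch, the ToyClassifier class, and the toy loader are all made up for illustration. The only details taken from the code above are that the classification module returns a pair and that accuracy is the fraction of argmax predictions matching the labels.

```python
import torch
import torch.nn as nn


def eval_model_traindata_sketch(feature_module, classification_module, loader, with_nvidia=False):
    """Return the fraction of samples in `loader` that are classified correctly."""
    feature_module.eval()
    classification_module.eval()
    correct, total = 0, 0
    with torch.no_grad():  # evaluation only, no gradients needed
        for inputs, labels in loader:
            inputs = inputs.float()
            if with_nvidia:
                inputs, labels = inputs.cuda(), labels.cuda()
            # The classification module is assumed to return (logits, penultimate features).
            logits, _ = classification_module(feature_module(inputs))
            correct += (logits.argmax(dim=1) == labels).sum().item()
            total += labels.numel()
    return correct / total


class ToyClassifier(nn.Module):
    """Hypothetical classifier that treats its input as logits and returns a pair."""

    def forward(self, x):
        return x, x


# Toy data: one-hot rows whose argmax is 0, 1, 2; the second batch has one wrong label.
toy_loader = [
    (torch.eye(3), torch.tensor([0, 1, 2])),
    (torch.eye(3), torch.tensor([0, 1, 1])),
]
toy_classifier = ToyClassifier()
toy_acc = eval_model_traindata_sketch(nn.Identity(), toy_classifier, toy_loader)
print(toy_acc)
```

Note that a function like this runs both modules in eval mode, while the print(acc) inside the training loop runs them in train mode, so the two numbers are computed under different module modes.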
I sincerely appreciate all your help and suggestions!