Gradients Across Two Networks ; autograd

Henbe · August 2, 2024, 6:24pm

Hi,

Context: I have a noisy training set (set A) and a very clean validation set (set B). For each batch of training, I want to:

1. Use set A in a first NN and calculate the loss (loss A)
2. Use the calculated gradients to adjust the weights of a second NN
3. Use set B on the second NN and calculate the loss (loss B)
4. Calculate the gradient of loss B w.r.t. the parameter changes induced by set A (so w.r.t. “loss vector A”)

My goal is to understand how each element of a batch from set A influences loss B, by checking the similarity between the training and validation inputs and their gradient directions.
This way, I can adjust the importance (through a weight vector) assigned to each noisy example based on its “helpfulness” (similar gradient direction) to the validation set
(up-weighting helpful training examples and down-weighting harmful ones).
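
To make the "helpfulness" idea concrete, here is how I understand the quantity in step 4, written as a sketch. The symbols are my own notation and not from any library: $\theta$ are the current parameters, $w_i$ the weight of training example $i$, $\ell_i^{A}$ its per-example loss, $L_B$ the loss on set B, and $\alpha$ an assumed step size (it is 1 in my code below).

```latex
\theta'(w) = \theta - \alpha \sum_i w_i \, \nabla_\theta \ell_i^{A}(\theta)
\qquad\Longrightarrow\qquad
\frac{\partial L_B\big(\theta'(w)\big)}{\partial w_i}
  = -\alpha \, \nabla_{\theta'} L_B(\theta')^{\top} \, \nabla_\theta \ell_i^{A}(\theta)
```

If this is right, a training example whose gradient points in the same direction as the validation gradient gets a negative derivative, so its negated value is a positive "helpfulness" score.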

To achieve this, I planned to compute these gradients with autograd. However, I’m facing issues in trying to make them part of the same graph. I keep encountering different errors and am unsure of the best approach.

Please find below my initial approach and an alternative I’m considering.
(I am in the middle of trying the alternative approach, but I wanted to post this before the weekend to maximize the chances of getting a reply.)

Please, any help / indication / comments would be VERY HIGHLY appreciated!

This is my first time posting on this forum. Please don’t hesitate to let me know if I should change the format of the topic, the way it is presented, the wording, etc.

CODE 1 : initial approach

  def reweight_autodiff_example_NewTry(self, inp_a, label_a, inp_b, label_b, bsize_a, bsize_b, eps=0.0):
        
        model = self
        model.train()  

        # --- First pass (Noisy) ---   
        optimizer = torch_optim.Adam(model.parameters(), lr=0.001)
        optimizer.zero_grad()  

        output_a = model(inp_a)  
        criterion = nn.BCELoss(reduction='none')
        loss_a = criterion(output_a, label_a)
        
        # Initialize example weights ex_wts_a with requires_grad=True
        ex_wts_a = torch.ones(bsize_a, dtype=torch.float32, requires_grad=True) / bsize_a 
        weighted_loss_a = (loss_a * ex_wts_a).sum()
        
        # Compute gradients of weighted_loss_a w.r.t. model parameters
        grads = torch.autograd.grad(weighted_loss_a, model.parameters(), create_graph=True)
        
        # --- Apply / adjust parameters in second NN with validation set   ---
        model_new = CSNeuralNetwork(model.lin_layer1.in_features)
        with torch.no_grad():
            for param_new, param, grad in zip(model_new.parameters(), model.parameters(), grads):
                param_new.copy_(param - grad)
        
        """    Or removing the 'torch.no_grad():'?  but then there are leaf issues (a leaf Variable that requires grad is being used in an in-place operation)
        for param_new, param, grad in zip(model_new.parameters(), model.parameters(), grads):
                param_new.copy_(param - grad)
        """  

        # --- Second pass (Clean) ---
        model_new.train()
        output_b = model_new(inp_b)
        loss_b = criterion(output_b, label_b)
        
        ex_wts_b = torch.ones(bsize_b, dtype=torch.float32) / bsize_b  

        # Compute the clean weighted loss
        weighted_loss_b = (loss_b * ex_wts_b).sum()


        # Compute gradient of weighted_loss_b w.r.t. ex_wts_a
        grads_ex_wts = torch.autograd.grad(weighted_loss_b, ex_wts_a, create_graph=True)[0]
        
        #grads_ex_wts = torch.autograd.grad(weighted_loss_b, ex_wts_a, create_graph=True, allow_unused=True)[0] #Trying stuff around
        #grads_ex_wts = torch.autograd.grad(weighted_loss_b, weighted_loss_a, create_graph=True, allow_unused=True)[0] #Trying stuff around


        # --- Compute new example weights ---
        ex_weight = -grads_ex_wts
        ex_weight_plus = torch.maximum(ex_weight, torch.tensor(eps))
        ex_weight_sum = ex_weight_plus.sum()
        ex_weight_sum += torch.eq(ex_weight_sum, 0.0).float()
        ex_weight_norm = ex_weight_plus / ex_weight_sum

        print('\n\n***********************\n')
        print('At the end of reweight_autodiff_examples')
        print(f'loss_a: {loss_a}\n')
        print(f'weighted_loss_a: {weighted_loss_a}\n')
        print(f'grads: {grads}\n')
        print(f'weighted_loss_b: {weighted_loss_b}\n')
        print(f'grads_ex_wts: {grads_ex_wts}\n')
        print(f'ex_weight_norm: {ex_weight_norm}\n')

        return ex_weight_norm
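
For comparison, here is a minimal sketch of one way the two passes might be kept in a single graph. Copying into `model_new` with `copy_` under `torch.no_grad()` cuts the link back to `ex_wts_a`, so instead the updated parameters are kept as ordinary (non-leaf) tensors and the second forward pass is run with `torch.func.functional_call`. This assumes PyTorch 2.0 or newer; the small `nn.Sequential` model and the random data are hypothetical stand-ins, since `CSNeuralNetwork` is not shown here.

```python
import torch
import torch.nn as nn
from torch.func import functional_call

torch.manual_seed(0)

# Hypothetical stand-in for CSNeuralNetwork (the real class is not shown in this post).
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1), nn.Sigmoid())
criterion = nn.BCELoss(reduction='none')

# Hypothetical random data in place of the noisy set A and the clean set B.
bsize_a, bsize_b = 6, 5
inp_a, label_a = torch.randn(bsize_a, 4), torch.randint(0, 2, (bsize_a, 1)).float()
inp_b, label_b = torch.randn(bsize_b, 4), torch.randint(0, 2, (bsize_b, 1)).float()

# Example weights as a leaf tensor that requires grad.
ex_wts_a = torch.full((bsize_a,), 1.0 / bsize_a, requires_grad=True)

# --- First pass (noisy): per-example losses, weighted sum ---
loss_a = criterion(model(inp_a), label_a).squeeze(1)  # shape (bsize_a,)
weighted_loss_a = (loss_a * ex_wts_a).sum()

names, params = zip(*model.named_parameters())
grads = torch.autograd.grad(weighted_loss_a, params, create_graph=True)

# --- Differentiable update: new tensors, no in-place copy, no no_grad ---
# Step size 1, as in `param - grad` above.
new_params = {n: p - g for n, p, g in zip(names, params, grads)}

# --- Second pass (clean): same module, run with the updated parameters ---
output_b = functional_call(model, new_params, (inp_b,))
weighted_loss_b = criterion(output_b, label_b).mean()

# The graph now reaches back to ex_wts_a, so this gradient is defined.
grads_ex_wts = torch.autograd.grad(weighted_loss_b, ex_wts_a)[0]
print(grads_ex_wts.shape)
```

With this structure no second network object is needed; the "second NN" is just the same architecture evaluated with `new_params`. I have not checked this against my full training loop, so treat it as a sketch under the assumptions above.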

Alternative approach

Keeping the same logic as the code above, should I try computing the two sets of gradients separately and then taking the gradient of one with respect to the other? Such as:

grads_a = torch.autograd.grad(weighted_loss_a, model.parameters(), create_graph=True)
grads_b = torch.autograd.grad(weighted_loss_b, model_new.parameters(), create_graph=True) 
grads_ex_wts = torch.autograd.grad(grads_a, [ex_wts_a], grads_b, create_graph=True)[0]
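
To see what the three-line alternative would compute, here is a small check on a toy problem. Passing `grads_b` as `grad_outputs` makes the last call a vector-Jacobian product, and the sketch compares it with differentiating loss B directly through the parameter update. Everything in it (the logistic-regression setup, `theta`, `w`, `per_example_loss`, the random data) is a hypothetical stand-in, and it assumes the second network's parameters are a detached copy of the updated ones with step size 1.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical toy setup: logistic regression with a single weight vector.
theta = torch.randn(3, requires_grad=True)
x_a, y_a = torch.randn(4, 3), torch.tensor([0., 1., 1., 0.])
x_b, y_b = torch.randn(5, 3), torch.tensor([1., 0., 1., 1., 0.])
w = torch.full((4,), 0.25, requires_grad=True)  # per-example weights for set A

def per_example_loss(params, x, y):
    # One loss value per row of x.
    return F.binary_cross_entropy_with_logits(x @ params, y, reduction='none')

# grads_a: gradient of the weighted loss A w.r.t. theta, still a function of w.
loss_a = (per_example_loss(theta, x_a, y_a) * w).sum()
(grad_a,) = torch.autograd.grad(loss_a, theta, create_graph=True)

# "Second network": a detached copy of the updated parameters (step size 1).
theta_new = (theta - grad_a).detach().requires_grad_(True)
loss_b = per_example_loss(theta_new, x_b, y_b).mean()
(grad_b,) = torch.autograd.grad(loss_b, theta_new)

# Alternative approach: vector-Jacobian product grad_b^T d(grad_a)/dw.
(vjp,) = torch.autograd.grad(grad_a, w, grad_outputs=grad_b, retain_graph=True)

# Reference: differentiate loss B through the update in a single graph.
loss_b_direct = per_example_loss(theta - grad_a, x_b, y_b).mean()
(direct,) = torch.autograd.grad(loss_b_direct, w)

# Under these assumptions the two results should differ only by sign.
print(torch.allclose(direct, -vjp, atol=1e-5))
```

If the check holds, the alternative would already give the positive "helpfulness" score, so the extra negation (`ex_weight = -grads_ex_wts`) would not be needed in that variant.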

Thank you very much for reading this. Any help is greatly appreciated.