Is there a way to train a CNN with the loss of another net?

I have 2 CNNs. One is called net and the other is called mask. Both have the same architecture but are initialised differently. My training loop looks like this:

for epoch in range(epochs):

  print(f"Epoch {epoch + 1}")
  
  net.train()
  mask.train()
  
  # I use the mask to prune the net
  net_masked = apply_mask(net, mask)
  
  running_loss = 0.0
  
  for images, labels in train_loader:
    images, labels = images.to(device), labels.to(device)
  
    # I use the images to calculate the loss of the net
    outputs_net = net_masked(images)
    loss_net = criterion(outputs_net, labels)
  
    # I also compute the loss of the mask itself
    # (I keep it as a tensor, I don't call loss_mask.item() here)
    outputs_mask = mask(images)
    loss_mask = criterion(outputs_mask, labels)
  
    loss = loss_net + loss_mask * 0 # Now loss looks like this: tensor(2.2969, grad_fn=<AddBackward0>)
  
    optimizer_mask.zero_grad()
  
    loss.backward() # Here I try to calculate the gradients
  
  
    optimizer_mask.step()  # Update weights
  
    running_loss += loss_net.item()
  
  train_loss.append(running_loss / len(train_loader))
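
For completeness, the setup before this loop is roughly the following (simplified; SmallCNN, the optimizer choice and the learning rate are just placeholders for my real code, but optimizer_mask really does get only mask's parameters):

import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Two CNNs with the same architecture, initialised differently
net = SmallCNN().to(device)   # SmallCNN stands in for my actual model class
mask = SmallCNN().to(device)

criterion = nn.CrossEntropyLoss()

# Only the mask's parameters go into the optimizer,
# because I only want to update the mask
optimizer_mask = torch.optim.Adam(mask.parameters(), lr=1e-3)

epochs = 10          # placeholder
train_loss = []      # per-epoch average training loss
# train_loader is an ordinary DataLoader over my dataset (not shown)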

What I am trying to do is update the mask's weights so that it prunes net in a different way each time. My problem is that when I try to compute the gradients, I get None for all of them.
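
This is roughly how I check the gradients right after the backward pass (just printing them for the mask's parameters):

# Right after loss.backward(), inside the training loop
for name, param in mask.named_parameters():
    print(name, param.grad)   # prints None for every parameter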

I don't know exactly why the gradients are not being computed; maybe it's the way I am computing the loss.

If someone can help me, I would be very grateful.