I have two CNNs: one is called net and the other is called mask. Both have the same architecture but are initialised differently. My training loop looks like this:
```python
for epoch in range(epochs):
    print(f"Epoch {epoch + 1}")
    net.train()
    mask.train()
    # I use the mask to prune the net
    net_masked = apply_mask(net, mask)
    running_loss = 0.0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        # I use the images to calculate the loss of the net
        outputs_net = net_masked(images)
        loss_net = criterion(outputs_net, labels)
        # I also get the loss of the mask (I don't use loss_mask.item())
        outputs_mask = mask(images)
        loss_mask = criterion(outputs_mask, labels)
        # Now loss looks like this: tensor(2.2969, grad_fn=<AddBackward0>)
        loss = loss_net + loss_mask * 0
        optimizer_mask.zero_grad()
        loss.backward()        # Here I try to calculate the gradients
        optimizer_mask.step()  # Update weights
        running_loss += loss_net.item()
    train_loss.append(running_loss / len(train_loader))
```
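To make the setup concrete, here is a small runnable toy version of what I am aiming for. The linear layers stand in for my CNNs, and `masked_forward` is a hypothetical differentiable gating, not my real `apply_mask` — the point is just that if the mask's parameters stay in the computation graph, `loss.backward()` gives them gradients:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny stand-ins for the two networks (same architecture, different init).
net = nn.Linear(4, 2)
mask = nn.Linear(4, 2)

def masked_forward(x, net, mask):
    # Hypothetical differentiable "pruning": gate net's weights with a
    # sigmoid of mask's weights, so gradients can flow back into mask.
    gate = torch.sigmoid(mask.weight)
    return x @ (net.weight * gate).t() + net.bias

criterion = nn.CrossEntropyLoss()
images = torch.randn(8, 4)
labels = torch.randint(0, 2, (8,))

outputs = masked_forward(images, net, mask)
loss = criterion(outputs, labels)
loss.backward()

# mask.weight was used in the graph, so it now has a gradient.
print(mask.weight.grad is not None)  # True
```

In my real code the masking is done once per epoch by `apply_mask`, before the batch loop, but the wiring question is the same: the mask's parameters only get gradients if the masked forward pass is built from them without detaching.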
What I am trying to do is update the mask weights so that it prunes net in a different way each time. My problem is that when I try to compute the gradients, they all come back as None.
I don't know exactly why the gradients are not being computed; maybe it is the way I am computing the loss.
If someone can help me I would be very grateful.