I’m a little new to PyTorch. I’m training a model where the loss is computed as the MSE between running the output of the model through a function and running the training sample through the same function. To run the output through the function, I need to access the values of the output tensor in my model and convert them to numpy arrays (which requires you to detach the tensor before converting). However, I can’t detach the output because I need to maintain the computation graph for back propagation. How can I access the value of the output tensor without detaching it?
You could use clone. You could do something like:
loss_as_numpy_array = loss.clone().detach().numpy()  # copy and detach
loss.backward()  # backprop loss
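A minimal sketch of that suggestion, in case it helps. The model, batch, and target here are toy stand-ins (assumptions, not the poster's code). It shows that a detached copy can go to numpy while the original tensors keep their graph:

```python
import torch
import torch.nn as nn

# Toy stand-ins (assumptions, not the poster's model or data).
torch.manual_seed(0)
net = nn.Linear(4, 2)
batch = torch.randn(8, 4)
target = torch.randn(8, 2)

output = net(batch)
loss = nn.functional.mse_loss(output, target)

# Detached copies: safe to hand to numpy, but they carry no graph.
output_np = output.detach().cpu().numpy()
loss_value = loss.detach().item()

# The original tensors are still attached, so backward() works.
loss.backward()
print(type(output_np), net.weight.grad.shape)
```

Note that anything computed from `output_np` is invisible to autograd, so this only helps for logging or inspection, not for building the loss itself.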
Unfortunately that’s not fixing the error I’m seeing (RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn), but I realize the issue is likely elsewhere in my code.
I assume the input values to the loss function (nn.MSELoss()) need to have the graph structure attached in order to backpropagate the loss through the structure. Right now, I’m using the output of the function mentioned above, which obviously doesn’t include the graph structure, as input to the loss function. Is there any way to reattach the graph structure? I’ve looked into creating custom loss functions in PyTorch, but the function I’m passing the outputs through can’t be easily implemented using PyTorch functions alone. (I’m using another neural network as a discriminator, and I want the network I’m training to minimize the loss between running the output through the NN discriminator and running the training sample through the NN discriminator. The loss should still be differentiable given that I only need to take the MSE between the two outputs…)
Could you share some code or a reproducible snippet?
Sure. I’ve done my best to clean up the code and paraphrase where I can. The actual code is really, really messy since the discriminator model is implemented in Tensorflow (don’t ask). Here’s the training loop:
criterion = nn.MSELoss()
for epoch in range(num_epochs):
    for i, training_batch in enumerate(training_samples):
        net.zero_grad()
        output = net(training_batch)

        # Reformat training batch for discriminator
        training_batch2 = training_batch.clone().detach().cpu().numpy()
        # Omitted code: format training_batch2 to fit into discriminator model

        # Run discriminator model with training batch
        # paraphrased code:
        output_pred_vals = discriminator(training_batch2)

        # Reformat model output for discriminator
        output2 = output.clone().detach().cpu().numpy()
        # Omitted code: format output2 to fit into discriminator model

        # Run discriminator model with model output
        # paraphrased code:
        training_pred_vals = discriminator(output2)

        output_pred = torch.as_tensor(output_pred_vals).to(device, dtype=torch.float)
        training_pred = torch.as_tensor(training_pred_vals).to(device, dtype=torch.float)

        training_loss = criterion(output_pred, training_pred)
        training_loss.backward()
        optimizer.step()

        # print training loss, validation loss, etc.
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
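For anyone trying to reproduce this, here is a small sketch of what seems to trigger the error: a tensor rebuilt from a numpy array has no `grad_fn`, so a loss built from it has nothing to backpropagate through. The linear layer is a toy stand-in, not the poster's network.

```python
import torch
import torch.nn as nn

net = nn.Linear(4, 2)  # toy stand-in for the model being trained
output = net(torch.randn(8, 4))
print(output.grad_fn is not None)  # output is attached to net's graph

# Round trip through numpy, as in the training loop above.
rebuilt = torch.as_tensor(output.detach().cpu().numpy())
print(rebuilt.requires_grad, rebuilt.grad_fn)  # no graph survives the trip

loss = nn.functional.mse_loss(rebuilt, torch.zeros(8, 2))
message = None
try:
    loss.backward()
except RuntimeError as err:
    message = str(err)
    print(message)
```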
I think this error happens because of the following line:
training_batch2 = training_batch.clone().detach().cpu().numpy()
You do not compute the gradients of the input (batch), so it has requires_grad=False. Does removing .detach() from that line remove the error?
Also, following on from what @zilong said, make sure you don’t detach the tensors whose gradients you want to compute. You stated you want one copy for numpy and one for gradients; make sure the numpy version is detached and the one that propagates through to the loss is not detached!
This gets back to the original problem I posted about. I can’t get the numpy ndarray of the output tensor from the model without detaching it first, but I need the numpy ndarray of the tensor to compute the gradients of the loss.
RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.
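A short sketch of the behaviour behind this message, using a toy tensor rather than the poster's model: `.numpy()` refuses to run on a tensor that requires grad, while `.detach().numpy()` returns only the values and leaves the original tensor attached to its graph.

```python
import torch

x = torch.ones(3, requires_grad=True)
y = x * 2  # y is part of the autograd graph

try:
    y.numpy()
    raised = False
except RuntimeError as err:
    raised = True
    print(err)

y_np = y.detach().numpy()  # values only, no graph attached
y.sum().backward()         # y itself is still attached, so this works
print(y_np, x.grad)
```

So reading the values is never the obstacle. The obstacle is that whatever is computed from `y_np` afterwards is outside autograd's view.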
There is a second problem here in that the graph of the network I am training needs to be reattached to the output_pred and training_pred variables in the code sample above.
Perhaps what I am trying to do (use the output of a separate NN run on the output of the network I am training and the training sample and compute the MSE loss between these two outputs) is impossible given how back-propagation works, but unfortunately I don’t know enough about back-propagation calculations to answer this definitively.
Are you using a numpy function to calculate some intermediate value? If so, which one?
It seems like you’re calculating some term with numpy and that breaks the graph. Is this true?
The problem with this loop is that the loss you are trying to backpropagate is still detached from the graph.
I once came up with the idea of multiplying a dummy loss by zero and adding the value calculated on the side, so that the graph stays intact, but now I think this operation will probably zero all gradients upstream.
Maybe there’s a way to replace the value of the calculated dummy loss in place with the new value, but I’m not sure how, since torch doesn’t allow modifying the values of leaf tensors in place.
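A quick check of the multiply-by-zero idea, as a sketch with a toy model and a made-up side value (both are assumptions for illustration). It appears to confirm the suspicion: backward() runs, but every gradient that reaches the model is zero.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Linear(4, 2)  # toy stand-in for the model being trained
output = net(torch.randn(8, 4))

dummy_loss = output.pow(2).mean()  # attached to the graph
side_value = torch.tensor(3.14)    # hypothetical value computed outside autograd

loss = dummy_loss * 0 + side_value
loss.backward()  # runs without error...

print(loss.item())
print(net.weight.grad.abs().max().item())  # ...but the gradients are all zero
```

The loss has the right value, but its derivative with respect to the model output is zero, so the optimizer step would not move the weights.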
As described above, I’m using a separate, fully trained neural network to calculate intermediate values from which the loss for the model I’m training is calculated. This NN isn’t directly implemented in numpy.
Extending torch.autograd.Function (Extending PyTorch — PyTorch master documentation) looks like a promising option to calculate the loss from the intermediate values while maintaining the graph structure.
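A minimal sketch of what that could look like. This is not the poster's discriminator: `NumpySin` is a hypothetical example whose forward pass runs in numpy and whose derivative is written by hand, just to show the mechanism.

```python
import numpy as np
import torch


class NumpySin(torch.autograd.Function):
    """Hypothetical example: forward runs in numpy, backward is written by hand."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        result = np.sin(x.detach().cpu().numpy())
        return torch.as_tensor(result, dtype=x.dtype, device=x.device)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # d/dx sin(x) = cos(x); supplied manually because autograd cannot see numpy.
        return grad_output * torch.cos(x)


x = torch.linspace(0.0, 1.0, 5, requires_grad=True)
y = NumpySin.apply(x)
y.sum().backward()
print(torch.allclose(x.grad, torch.cos(x.detach())))
```

For an external network, the backward step would presumably have to ask the other framework for the gradient of its output with respect to its input, which is the part that gets messy.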
There’s got to be an easier solution, but extending Tensor and adding a function to retrieve a numpy ndarray of the tensor’s data without detaching it from the graph might be another option.
Hmmm, so I’m struggling to visualize why numpy is even called. If your loss is calculated purely by an nn.Module, why do you need to detach it to numpy?
I assume this is the reason why you’re detaching to numpy, so you can pass your values to TF? I don’t think that’ll work: how is PyTorch meant to track TF’s operations? Also, TF and PyTorch are structured completely differently. PyTorch is dynamic whereas TF is static. Is it not possible to rewrite the TF code in PyTorch?
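If the discriminator can be ported, the loop becomes straightforward. Here is a sketch under that assumption, with small made-up modules standing in for both networks: the discriminator is frozen, the model output goes through it without any detach, and the gradients reach only the model being trained.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical stand-ins for the real models.
net = nn.Linear(4, 4)  # model being trained
discriminator = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 1))

# Freeze the pretrained discriminator so only net is updated.
discriminator.eval()
for p in discriminator.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.SGD(net.parameters(), lr=0.1)
criterion = nn.MSELoss()
training_batch = torch.randn(16, 4)

optimizer.zero_grad()
output = net(training_batch)
output_pred = discriminator(output)  # stays attached to net's graph
with torch.no_grad():
    training_pred = discriminator(training_batch)  # target needs no graph
loss = criterion(output_pred, training_pred)
loss.backward()
optimizer.step()
print(loss.item())
```

Freezing the parameters does not block the gradient with respect to the discriminator's input, which is what lets the loss reach `net`.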
torch.autograd.Function is a potential avenue for adding additional functionality to PyTorch with the Python API, but there are some caveats to it. For example, you won’t be able to JIT your network. You’ll also need to manually define your derivatives, which gets extremely messy if your operation involves linear algebra operations.
Excellent question. Have you found a solution already?