Get values from tensor without detaching?

colinrsmall · December 3, 2021, 5:19pm

I’m a little new to PyTorch. I’m training a model where the loss is computed as the MSE between running the output of the model through a function and running the training sample through the same function. To run the output through the function, I need to access the values of the output tensor in my model and convert them to numpy arrays (which requires you to detach the tensor before converting). However, I can’t detach the output because I need to maintain the computation graph for back propagation. How can I access the value of the output tensor without detaching it?

AlphaBetaGamma96 · December 3, 2021, 6:09pm

You could use clone. You could do something like:

loss_as_numpy_array = loss.clone().detach().numpy() #copy and detach
loss.backward() #backprop loss
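A minimal sketch of why the snippet above is safe, assuming a toy parameter `w` and a toy scalar `loss` (both hypothetical, not from the original post): `detach()` returns a separate tensor that is cut off from the graph, while the original tensor keeps its `grad_fn` and can still be backpropagated.

```python
import torch

# Hypothetical stand-ins for a model parameter and a scalar loss.
w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
loss = (w ** 2).sum()

# clone().detach() yields a graph-free copy; `loss` itself is left untouched.
loss_as_numpy_array = loss.clone().detach().numpy()

# Backprop still works because `loss` kept its grad_fn.
loss.backward()

print(loss_as_numpy_array)  # plain NumPy value of the loss
print(w.grad)               # gradient of sum(w^2) with respect to w
```

Note that anything computed from `loss_as_numpy_array` afterwards is invisible to autograd; only operations on the original tensor are tracked.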

colinrsmall · December 3, 2021, 7:31pm

Unfortunately that’s not fixing the error I’m seeing (RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn), but I realize the issue is likely elsewhere in my code.

I assume the input values to the loss function (nn.MSELoss()) need to have the graph structure attached in order to backpropagate the loss through the structure. Right now, I’m using the output of the function mentioned above as input to the loss function, which obviously doesn’t include the graph structure. Is there any way to reattach the graph structure? I’ve looked into creating custom loss functions in PyTorch, but the function I’m passing the outputs through can’t be easily implemented using PyTorch functions alone. (I’m using another neural network as a discriminator, and I want the network I’m training to minimize the loss between running the output through the NN discriminator and running the training sample through the NN discriminator. The loss should still be differentiable given that I only need to take the MSE between the two outputs…)
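The situation described here can be reproduced in a few lines. This is only a sketch: `net` is a hypothetical one-layer stand-in for the network being trained, and the NumPy round trip stands in for the call to the external function.

```python
import torch
import torch.nn as nn

net = nn.Linear(4, 4)      # hypothetical stand-in for the network being trained
criterion = nn.MSELoss()
batch = torch.randn(8, 4)

output = net(batch)
print(output.grad_fn is not None)  # the raw output is attached to the graph

# Round trip through NumPy, as when calling an external (non-PyTorch) function.
output_vals = output.detach().cpu().numpy()
output_pred = torch.as_tensor(output_vals)
print(output_pred.requires_grad, output_pred.grad_fn)  # no graph on the new tensor

loss = criterion(output_pred, batch)
try:
    loss.backward()
    error_message = None
except RuntimeError as err:
    error_message = str(err)
print(error_message)
```

The tensor rebuilt from NumPy values is a fresh leaf with no history, so nothing connects the loss back to the parameters of `net`.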

AlphaBetaGamma96 · December 3, 2021, 8:53pm

Could you share some code or a reproducible snippet?

colinrsmall · December 3, 2021, 9:12pm

Sure. I’ve done my best to clean up the code and paraphrase where I can. The actual code is really, really messy since the discriminator model is implemented in Tensorflow (don’t ask). Here’s the training loop:

criterion = nn.MSELoss()

for epoch in range(num_epochs):
  for i, training_batch in enumerate(training_samples):
    
    net.zero_grad()
    output = net(training_batch)

    # Reformat training batch for discriminator
    training_batch2 = training_batch.clone().detach().cpu().numpy()
    # Omitted code: format training_batch2 to fit into discriminator model

    # Run discriminator model with training batch
    # paraphrased code: training_pred_vals = discriminator(training_batch2)

    # Reformat model output for discriminator
    output2 = output.clone().detach().cpu().numpy()
    # Omitted code: format output2 to fit into discriminator model

    # Run discriminator model with model output
    # paraphrased code: output_pred_vals = discriminator(output2)
    
    output_pred = torch.as_tensor(output_pred_vals).to(device, dtype=torch.float) 
    training_pred = torch.as_tensor(training_pred_vals).to(device, dtype=torch.float) 

    training_loss = criterion(output_pred, training_pred)

    training_loss.backward()
    optimizer.step()

    # print training loss, validation loss, etc.

zilong · December 4, 2021, 1:40am

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

I think this error happens because of the following line:

training_batch2 = training_batch.clone().detach().cpu().numpy()

You do not compute the gradients of the input (batch), so it has requires_grad=False. Does removing .detach() from that line get rid of the error?

AlphaBetaGamma96 · December 4, 2021, 3:47am

Also, following on from what @zilong said, make sure you don’t detach the tensors whose gradients you want to compute. You stated you want a copy for numpy and one for gradients; make sure the numpy version is detached and the one that propagates through to the loss is not detached!

colinrsmall · December 4, 2021, 5:20pm

This gets back to the original problem I posted about. I can’t get the numpy ndarray of the output tensor from the model without detaching it first, but I need the numpy ndarray of the tensor to compute the gradients of the loss.

RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.
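For reference, a small sketch that triggers this error and shows the detached conversion next to it. The tensors `x` and `y` are hypothetical examples.

```python
import torch

x = torch.ones(3, requires_grad=True)
y = x * 2  # y is part of the autograd graph

try:
    y.numpy()  # refused: NumPy cannot carry the graph along
    numpy_error = None
except RuntimeError as err:
    numpy_error = str(err)
print(numpy_error)

# The detached tensor converts fine, and y itself stays in the graph.
values = y.detach().numpy()
print(values, y.requires_grad)
```

The values are accessible through `detach()`, but whatever NumPy (or another framework) computes from them is not recorded by autograd.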

There is a second problem here in that the graph of the network I am training needs to be reattached to the output_pred and training_pred variables in the code sample above.

Perhaps what I am trying to do (use the output of a separate NN run on the output of the network I am training and the training sample and compute the MSE loss between these two outputs) is impossible given how back-propagation works, but unfortunately I don’t know enough about back-propagation calculations to answer this definitively.

AlphaBetaGamma96 · December 4, 2021, 6:02pm

Are you using a numpy function to calculate some intermediate value? If so, which one?

It seems like you’re calculating some term with numpy and that breaks the graph. Is this true?

my3bikaht · December 5, 2021, 10:22am

The problem with this loop is that the loss you are trying to backpropagate is still detached from the graph.

I once came up with the idea of multiplying a dummy loss by zero and adding the value calculated on the side. That way the graph stays intact, but now I think this operation will probably zero all gradients upstream.
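A quick check of the multiply-by-zero idea, as a sketch with a hypothetical one-layer `net` and a made-up `side_value` standing in for the externally computed loss. The loss takes the desired value and `backward()` runs, but every gradient comes out as zero.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Linear(4, 1)            # hypothetical network being trained
batch = torch.randn(8, 4)

dummy_loss = net(batch).pow(2).mean()  # connected to the graph
side_value = torch.tensor(3.5)         # hypothetical loss computed outside the graph

# The value comes from side_value, the graph from dummy_loss.
loss = dummy_loss * 0 + side_value
loss.backward()

print(loss.item())
print(net.weight.grad)  # all zeros: the factor 0 is applied in the chain rule
print(net.bias.grad)
```

The gradient of `loss` with respect to `dummy_loss` is 0, so the optimizer would not move the parameters at all.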

Maybe there’s a way to replace the value of the calculated dummy loss in place with a new value, but I’m not sure how, since torch doesn’t allow modifying the values of leaf tensors in place.

colinrsmall · December 6, 2021, 3:32pm

As described above, I’m using a separate, fully trained neural network to calculate intermediate values from which the loss for the model I’m training is calculated. This NN isn’t directly implemented in numpy.

colinrsmall · December 6, 2021, 4:01pm

Extending torch.autograd.Function (Extending PyTorch — PyTorch master documentation) looks like a promising option to calculate the loss from the intermediate values while maintaining the graph structure.

There’s got to be an easier solution, but extending Tensor and adding a function to retrieve a numpy ndarray of the tensor’s data without detaching it from the graph might be another option.
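A minimal sketch of the torch.autograd.Function route, under loud assumptions: `external_fn` is a hypothetical NumPy-only function (here just an elementwise sine, not the discriminator from this thread), and `external_fn_grad` is its derivative written by hand. The forward pass leaves PyTorch, and the backward pass supplies the gradient manually so the graph stays connected.

```python
import numpy as np
import torch


def external_fn(x: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for code that only accepts NumPy arrays."""
    return np.sin(x)


def external_fn_grad(x: np.ndarray) -> np.ndarray:
    """Derivative of external_fn; it has to be supplied by hand."""
    return np.cos(x)


class ExternalFn(torch.autograd.Function):
    @staticmethod
    def forward(ctx, inp):
        ctx.save_for_backward(inp)
        out = external_fn(inp.detach().cpu().numpy())
        return torch.as_tensor(out, dtype=inp.dtype, device=inp.device)

    @staticmethod
    def backward(ctx, grad_output):
        (inp,) = ctx.saved_tensors
        local = external_fn_grad(inp.detach().cpu().numpy())
        local = torch.as_tensor(local, dtype=inp.dtype, device=inp.device)
        # Chain rule for an elementwise function.
        return grad_output * local


x = torch.tensor([0.0, 1.0, 2.0], requires_grad=True)
y = ExternalFn.apply(x)
y.sum().backward()

print(y.grad_fn is not None)  # the output is attached to the graph
print(x.grad)                 # matches cos(x)
```

For a full external network the backward pass would need the product of `grad_output` with that network's Jacobian rather than an elementwise derivative, which presumably has to come from the external framework's own differentiation.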

AlphaBetaGamma96 · December 6, 2021, 6:58pm

Hmmm, so I’m struggling to see why numpy is even called. If your loss is calculated purely by an nn.Module, why do you need to detach it to numpy?

I assume the reason you’re detaching to numpy is so you can pass your values to TF? I don’t think that’ll work, as how is PyTorch meant to track TF’s operations? Also, TF and PyTorch are structured completely differently. PyTorch is dynamic whereas TF is static. Is it not possible to rewrite the TF code in PyTorch?
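If the discriminator can be ported, one training step might look like the sketch below. Everything here is assumed for illustration: `net` and `discriminator` are tiny hypothetical modules, and the real port would load the trained weights. The discriminator is frozen, but the output of `net` passes through it as a tensor, so the loss stays attached to the graph.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

net = nn.Linear(4, 4)  # hypothetical network being trained
# Hypothetical PyTorch port of the discriminator.
discriminator = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 1))

# Freeze the discriminator: gradients flow through it, but it is not updated.
for p in discriminator.parameters():
    p.requires_grad_(False)
discriminator.eval()

criterion = nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)
training_batch = torch.randn(8, 4)

optimizer.zero_grad()
output = net(training_batch)

output_pred = discriminator(output)  # no detach, no NumPy: graph is kept
with torch.no_grad():
    training_pred = discriminator(training_batch)  # target only, no graph needed

training_loss = criterion(output_pred, training_pred)
training_loss.backward()
optimizer.step()

print(training_loss.grad_fn is not None)
print(net.weight.grad.abs().sum())
```

Only the parameters of `net` receive gradients here; the frozen discriminator acts as a fixed differentiable function.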

Also, torch.autograd.Function is a potential avenue for adding additional functionality to PyTorch with the Python API, but there are some caveats to it. For example, you won’t be able to JIT your network, and you’ll need to manually define your derivatives, which gets extremely messy if your operation involves linear algebra operations.

Avi_Gershon · December 11, 2021, 10:57am

Excellent question. Have you found a solution already?

Sourabh · April 14, 2023, 3:55pm

Was a solution to this found??

AlphaBetaGamma96 · April 15, 2023, 5:27pm

I’d ask @colinrsmall