Error when calling numpy() on a Tensor that requires grad

Hi all,

I am implementing part of the code for Federated Matched Averaging (full code found here: https://github.com/IBM/FedMA/blob/master/language_modeling/language_fedma.py)

An error occurs when the following line runs:

batch_weights_norm = [w * s for w, s in zip(weights_bias, sigma_inv_layer)]

The runtime error is:

RuntimeError                              Traceback (most recent call last)
c:\Users\65967\Desktop\Federated Learning\Python\PyT_FedMA.py in <module>
     345                                                                     it=it,
     346                                                                     n_layers=NUM_LAYERS,
---> 347                                                                     matching_shapes=matching_shapes)
     348     matching_shapes.append(next_layer_shape)
     349     assignments_list.append(assignments)

c:\Users\65967\Desktop\Papers\battery\Federated Learning\Python\language_fedma.py in layerwise_fedma(batch_weights, layer_index, sigma_layers, sigma0_layers, gamma_layers, it, n_layers, matching_shapes)
    283     ########################################
    284     assignment_c, global_weights_c, global_sigmas_c, popularity_counts = match_layer(weights_bias, sigma_inv_layer, mean_prior,
--> 285                                                                   sigma_inv_prior, gamma, it)
    286 
    287     ########################################

c:\Users\65967\Desktop\Federated Learning\Python\language_fedma.py in match_layer(weights_bias, sigma_inv_layer, mean_prior, sigma_inv_prior, gamma, it)
    146     #AA: On how to use built-in sorted() function with key attribute https://www.programiz.com/python-programming/methods/built-in/sorted
    147     group_order = sorted(range(J), key=lambda x: -weights_bias[x].shape[0]) #AA: This sorts index of J from one with largest negative number (hence smallest value) to lowest negative number (hence largest value)?? Does it??
--> 148     batch_weights_norm = [w * s for w, s in zip(weights_bias, sigma_inv_layer)] #AA: RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.
    149     #batch_weights_norm = [w.detach().numpy() * s.detach().numpy() for w, s in zip(weights_bias, sigma_inv_layer)]
    150     #batch_weights_norm = [torch.Tensor.cpu(w) * torch.Tensor.cpu(s) for w, s in zip(weights_bias, sigma_inv_layer)]

c:\Users\65967\Desktop\Federated Learning\Python\language_fedma.py in <listcomp>(.0)
    146     #AA: On how to use built-in sorted() function with key attribute https://www.programiz.com/python-programming/methods/built-in/sorted
    147     group_order = sorted(range(J), key=lambda x: -weights_bias[x].shape[0]) #AA: This sorts index of J from one with largest negative number (hence smallest value) to lowest negative number (hence largest value)?? Does it??
--> 148     batch_weights_norm = [w * s for w, s in zip(weights_bias, sigma_inv_layer)] #AA: RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.
    149     #batch_weights_norm = [w.detach().numpy() * s.detach().numpy() for w, s in zip(weights_bias, sigma_inv_layer)]
    150     #batch_weights_norm = [torch.Tensor.cpu(w) * torch.Tensor.cpu(s) for w, s in zip(weights_bias, sigma_inv_layer)]

~\anaconda3\envs\torch17cuda11\lib\site-packages\torch\_tensor.py in __array__(self, dtype)
    676             return handle_torch_function(Tensor.__array__, (self,), self, dtype=dtype)
    677         if dtype is None:
--> 678             return self.numpy()
    679         else:
    680             return self.numpy().astype(dtype, copy=False)

RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.

I have tried using w.detach().numpy() and torch.Tensor.cpu(w) to rectify the error, but I got a message suggesting that w is already a numpy array.

Could anyone please enlighten me on what is triggering this error?

Thank you very much.

Calling .detach().numpy() should work, since the error message explains that explicit detaching is needed, as seen here:

x = torch.randn(1, requires_grad=True)
x.numpy()
# > RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.
print(x.detach().numpy())
# > array([-0.13592759], dtype=float32)

Thank you for the reply!

I had tried w.detach().numpy() previously. Specifically, this was how I revised line 148:

 batch_weights_norm = [w.detach().numpy() * s.detach().numpy() for w, s in zip(weights_bias, sigma_inv_layer)]

I got the following AttributeError, which seemed to indicate that w is already a numpy array. Hence my confusion.

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
c:\Users\65967\Desktop\Federated Learning\Python\PyT_FedMA.py in <module>
     345                                                                     it=it,
     346                                                                     n_layers=NUM_LAYERS,
---> 347                                                                     matching_shapes=matching_shapes)
     348     matching_shapes.append(next_layer_shape)
     349     assignments_list.append(assignments)

c:\Users\65967\Desktop\Federated Learning\Python\language_fedma.py in layerwise_fedma(batch_weights, layer_index, sigma_layers, sigma0_layers, gamma_layers, it, n_layers, matching_shapes)
    283     ########################################
    284     assignment_c, global_weights_c, global_sigmas_c, popularity_counts = match_layer(weights_bias, sigma_inv_layer, mean_prior,
--> 285                                                                   sigma_inv_prior, gamma, it)
    286 
    287     ########################################

c:\Users\65967\Desktop\Federated Learning\Python\language_fedma.py in match_layer(weights_bias, sigma_inv_layer, mean_prior, sigma_inv_prior, gamma, it)
    147     group_order = sorted(range(J), key=lambda x: -weights_bias[x].shape[0]) #AA: This sorts index of J from one with largest negative number (hence smallest value) to lowest negative number (hence largest value)?? Does it??
    148     #batch_weights_norm = [w * s for w, s in zip(weights_bias, sigma_inv_layer)] #AA: RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.
--> 149     batch_weights_norm = [w.detach().numpy() * s.detach().numpy() for w, s in zip(weights_bias, sigma_inv_layer)]
    150     #batch_weights_norm = [torch.Tensor.cpu(w) * torch.Tensor.cpu(s) for w, s in zip(weights_bias, sigma_inv_layer)]
    151 

c:\Users\65967\Desktop\Federated Learning\Python\language_fedma.py in <listcomp>(.0)
    147     group_order = sorted(range(J), key=lambda x: -weights_bias[x].shape[0]) #AA: This sorts index of J from one with largest negative number (hence smallest value) to lowest negative number (hence largest value)?? Does it??
    148     #batch_weights_norm = [w * s for w, s in zip(weights_bias, sigma_inv_layer)] #AA: RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.
--> 149     batch_weights_norm = [w.detach().numpy() * s.detach().numpy() for w, s in zip(weights_bias, sigma_inv_layer)]
    150     #batch_weights_norm = [torch.Tensor.cpu(w) * torch.Tensor.cpu(s) for w, s in zip(weights_bias, sigma_inv_layer)]
    151 

AttributeError: 'numpy.ndarray' object has no attribute 'detach'

I guess weights_bias might be a tensor sometimes and a numpy array other times?
If so, you could add an if condition to check for its type.
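For example, something along these lines (just a sketch, assuming you want plain numpy arrays on both sides):

batch_weights_norm = [
    (w.detach().cpu().numpy() if isinstance(w, torch.Tensor) else w)
    * (s.detach().cpu().numpy() if isinstance(s, torch.Tensor) else s)
    for w, s in zip(weights_bias, sigma_inv_layer)
]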

Thank you for the reply and spending time on my question! I appreciate it.

It is an interesting thought that weights_bias might sometimes be a tensor and sometimes a numpy array. To clarify, I am working on a federated learning problem with 3 local clients. weights_bias is actually a list containing the weights of the first layer of each client's model, so its length is 3 (one entry per client). With this in mind, I checked each element of the lists weights_bias and sigma_inv_layer by logging their shapes (which also reveal their types):

    logger.info("weights bias: {}".format(weights_bias[0].shape)) #AA: weights_bias[0] refers to first client
    logger.info("weights bias: {}".format(weights_bias[1].shape))
    logger.info("weights bias: {}".format(weights_bias[2].shape))
    logger.info("sigma_inv_layer: {}".format(sigma_inv_layer[0].shape))
    logger.info("sigma_inv_layer: {}".format(sigma_inv_layer[1].shape))
    logger.info("sigma_inv_layer: {}".format(sigma_inv_layer[2].shape))

The output below shows that each element of weights_bias is a tensor, while the elements of sigma_inv_layer are numpy arrays. Hence, as you suggested, the type mismatch is what causes w * s to fail.

INFO:root:weights bias: torch.Size([3, 800])
INFO:root:weights bias: torch.Size([3, 800])
INFO:root:weights bias: torch.Size([3, 800])
INFO:root:sigma_inv_layer: (800,)
INFO:root:sigma_inv_layer: (800,)
INFO:root:sigma_inv_layer: (800,)
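
As a side note, an explicit type check would confirm this as well, e.g.:

logger.info("weights_bias[0] type: {}".format(type(weights_bias[0])))
logger.info("sigma_inv_layer[0] type: {}".format(type(sigma_inv_layer[0])))

which should report torch.Tensor for the former and numpy.ndarray for the latter.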

As I need w to be a tensor, one solution I can think of is to convert s to a tensor as well. The revised code below eliminates the original error message, but I am afraid this tweak throws further errors downstream:

batch_weights_norm = [w * torch.tensor(s, requires_grad=True) for w, s in zip(weights_bias, sigma_inv_layer)]

Thus, may I check whether there are any alternative approaches to making the original line of code work? (It seems to have worked for the authors of the GitHub repo.) If it is not possible, that's okay; I am new to PyTorch.

batch_weights_norm = [w * s for w, s in zip(weights_bias, sigma_inv_layer)]

Thank you once again!

Ah, good catch!
Based on your original code snippet, you don’t want w or s to track the gradient history, since you are explicitly detaching both and are using the numpy arrays (which Autograd won’t be able to track).
In this case, keep the w.detach().numpy() usage and just multiply it with s directly (without trying to detach() it).
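E.g. something along these lines (just a sketch using your variable names):

batch_weights_norm = [w.detach().numpy() * s for w, s in zip(weights_bias, sigma_inv_layer)]

Since s is already a numpy array, the result is a plain numpy array that Autograd won't track.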

Thank you for your reply! I was only able to catch it thanks to your astute observation.

To clarify, the original code snippet I was referring to is the one below (sorry I wasn’t clearer in my question).

batch_weights_norm = [w * s for w, s in zip(weights_bias, sigma_inv_layer)]

w needs to track the gradient, as it holds the weights. It looks like s does not need to, since it is not a tensor in the first place. I only used w.detach().numpy() because of the initial runtime error.

So, in conclusion, may I confirm that it is not possible to perform the w * s elementwise multiplication without converting s to a tensor?

Thank you!

Yes, if s is a numpy array, create the tensor via torch.from_numpy(s). Otherwise, w * s will try to convert w to a numpy array, and thus the error will be raised.
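E.g. (a sketch with your variable names):

batch_weights_norm = [w * torch.from_numpy(s) for w, s in zip(weights_bias, sigma_inv_layer)]

torch.from_numpy shares memory with the underlying numpy array, and the resulting tensor has requires_grad=False by default, so gradients will still flow through w only. If w is float32 and s happens to be float64, you might also want torch.from_numpy(s).float() to keep the dtypes consistent.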

Thank you! I understand it now. :smiley: