Hello, I need to manually accumulate the `parameter.grad` attributes of my network based on the values of another tensor. I'm working with the following variables:
- I have a bit-string array of sampled bits, `m`, of size `(M,)` (where M is the number of samples).
- I also have two tensors of gradients, `grad1` and `grad2`. `grad1[i]` and `grad2[i]` represent the gradients corresponding to sampled bits of -1 and 1, respectively. Each specific gradient tensor located at `i` is a different size based on the parameter being modified. For example, if M = 5 (5 sampled bits), `grad1` and `grad2` will each be tensors of 5 parameter gradient matrices.
- I have `total_grad`, which takes the same shape as `grad1` and `grad2`. `total_grad` is increased by either `grad1` or `grad2` (a concrete sketch of this layout is just below).
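To make that layout concrete, here is a minimal sketch of how I picture the variables (the values are made up; the two parameter shapes `(M, 4, 2)` and `(M, 4)` are the sizes from my real case mentioned further down):

```python
import torch

M = 5  # number of samples (example value from above)

# Sampled bits, shape (M,); the values here are dummy data.
m = torch.tensor([1., -1., 1., 1., -1.])

# One entry per parameter; each entry stacks the M per-sample gradients.
grad1 = [torch.randn(M, 4, 2), torch.randn(M, 4)]        # gradients for sampled bit -1
grad2 = [torch.randn(M, 4, 2), torch.randn(M, 4)]        # gradients for sampled bit 1
total_grads = [torch.zeros(M, 4, 2), torch.zeros(M, 4)]  # same shapes as grad1/grad2
```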
I'm trying to parallelize the process of accumulating `total_grad` based on `m`. If `m[i] > 0`, I want to add `grad1[i]` to `total_grad[i]`, and otherwise `grad2[i]`, for all samples.
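Written as an explicit loop over parameters and samples (continuing from the setup sketch above), the accumulation I want looks like this:

```python
# For every parameter and every sample, pick grad1 when the sampled bit
# is positive and grad2 otherwise, then accumulate into total_grads.
for param in range(len(total_grads)):
    for i in range(M):
        if m[i] > 0:
            total_grads[param][i] += grad1[param][i]
        else:
            total_grads[param][i] += grad2[param][i]
```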
However, I don't want to do this with a for loop (for obvious efficiency reasons), so I tried to parallelize it using `torch.where()`:
```python
# Accumulate per-parameter gradients selected by the sampled bits.
for param in range(params):
    total_grads[param][:] += torch.where(m[:] > 0, grad1[param][:], grad2[param][:])
```
This code appears to work for the first list of parameter gradients (of size `(M, 4, 2)`), but not for the second (of size `(M, 4)`). I'm receiving the following runtime error:

`The size of tensor a (2) must match the size of tensor b (4) at non-singleton dimension 1`
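Here is a stripped-down snippet that triggers the same error with dummy data (the sizes below are toy values I picked only so that the `(M, 4, 2)` case happens to broadcast while the `(M, 4)` case fails, matching what I'm seeing; my real tensors differ):

```python
import torch

M = 2  # dummy value, not my real sample count
m = torch.tensor([1., -1.])  # condition, shape (M,)

g1_a, g2_a = torch.randn(M, 4, 2), torch.randn(M, 4, 2)  # first parameter
g1_b, g2_b = torch.randn(M, 4), torch.randn(M, 4)        # second parameter

out_a = torch.where(m > 0, g1_a, g2_a)  # runs: (M,) happens to broadcast against (M, 4, 2)
out_b = torch.where(m > 0, g1_b, g2_b)  # RuntimeError: the size of tensor a (2) must match
                                        # the size of tensor b (4) at non-singleton dimension 1
```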
Can someone help me understand what's going on here, or suggest a better way to do this? Thanks!