I want to get the gradient of each sample's loss w.r.t. each parameter.
- total number of samples: N
- mini-batch size: B
- parameter (filter) shape: [C_in, C_out, w, h]
I want the gradients for each sample, that is:
- gradients shape: [N, C_in, C_out, w, h]
To get the gradients for all samples, I tried to compute the gradients for each mini-batch:
- gradients shape: [B, C_in, C_out, w, h]
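
For concreteness, here is the minimal setup I'll assume below (the toy model, the layer sizes, and all variable names are placeholders I picked for illustration; note that PyTorch's nn.Conv2d actually stores its weight as [C_out, C_in, h, w]):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

N, B = 1000, 32                  # total samples, mini-batch size (placeholders)
C_in, C_out, w, h = 3, 16, 3, 3  # channels and kernel size (placeholders)
num_classes = 10

# Toy model: one conv layer + global average pooling + linear head.
conv = nn.Conv2d(C_in, C_out, kernel_size=(h, w))
head = nn.Linear(C_out, num_classes)

def forward(x):                                # x: [B, C_in, H, W]
    feat = F.relu(conv(x)).mean(dim=(2, 3))    # global-pool to [B, C_out]
    return F.log_softmax(head(feat), dim=1)    # pred: [B, num_classes]
```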
However, after I run

loss = torch.sum(-onehot * pred, dim=1)        # onehot: [B, num_classes], pred: [B, num_classes]
loss.backward(gradient=torch.ones_like(loss))
I found that the gradient of the loss w.r.t. the parameters has shape [C_in, C_out, w, h], not [B, C_in, C_out, w, h]. That is, the per-sample gradients are summed (accumulated) over the batch.
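
Here is a minimal reproduction using the toy setup above (the input spatial size and the one-hot construction are arbitrary choices of mine):

```python
x = torch.randn(B, C_in, 8, 8)                 # dummy mini-batch
onehot = F.one_hot(torch.randint(0, num_classes, (B,)),
                   num_classes).float()        # [B, num_classes]

pred = forward(x)                              # [B, num_classes]
loss = torch.sum(-onehot * pred, dim=1)        # per-sample loss: [B]
loss.backward(gradient=torch.ones_like(loss))  # equivalent to loss.sum().backward()

# The batch dimension is gone: autograd sums the per-sample gradients.
print(conv.weight.grad.shape)                  # torch.Size([16, 3, 3, 3])
```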
How can I get the gradient for each sample? Is looping over the samples the only way (sketched below)?
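
For reference, this is the naive for-loop version I would like to avoid: one backward pass per sample, which gives the right shape but is roughly B times slower (names continue from the snippets above):

```python
per_sample_grads = []
for i in range(B):
    conv.zero_grad()
    head.zero_grad()
    pred_i = forward(x[i:i + 1])                   # keep a batch dim of 1
    loss_i = torch.sum(-onehot[i:i + 1] * pred_i)  # scalar loss for sample i
    loss_i.backward()
    per_sample_grads.append(conv.weight.grad.clone())

grads = torch.stack(per_sample_grads)              # [B, C_out, C_in, h, w]
print(grads.shape)                                 # torch.Size([32, 16, 3, 3, 3])
```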