When I pass inputs through the model, o = model(x), and print o.grad_fn, I get an AddmmBackward0.
However, when I index a single output, for example o[1].grad_fn, I get a SelectBackward0.
Why is this?
When I use a DataLoader with batch_size=1, I get AddmmBackward0.
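Here is a minimal sketch of what I mean (a toy nn.Linear model stands in for my real one):

import torch
import torch.nn as nn

model = nn.Linear(4, 2)   # toy stand-in for my real model
x = torch.randn(8, 4)     # a batch of 8 inputs
o = model(x)

print(o.grad_fn)          # <AddmmBackward0 object at ...>
print(o[1].grad_fn)       # <SelectBackward0 object at ...>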
Anyway, further down the line I run into this issue:
>>> o[i]  # from calling the model on the full batch up front
tensor([-2.0692, 2.0274], grad_fn=<SelectBackward0>)

# got this from the DataLoader:
# for i, data in enumerate(unknown_dataloader):
#     inputs, labels = data
#     outputs = model(inputs)
>>> outputs
tensor([[-2.0692, 2.0274]], grad_fn=<AddmmBackward0>)
If I call self.criterion(o[i], labels) I get an error: RuntimeError: size mismatch (got input: [2], target: [1]).
How would I fix this for all of o? I don't want to use a DataLoader just to run all of the inputs through the model again with a batch size of 1.
You are seeing SelectBackward0 because you are indexing/selecting the output via o[0], which is a differentiable operation, and are then checking the .grad_fn attribute of this indexed tensor.
You would need to explain the use case a bit more, i.e. which criterion is used, what the output and target shapes are expected to be, why you are indexing the output etc.
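That said, if the criterion is e.g. nn.CrossEntropyLoss, the size mismatch most likely comes from the missing batch dimension: o[i] has shape [2], while the criterion expects an input of shape [N, C] and a target of shape [N]. A rough sketch using your printed values (the label here is made up):

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
o_i = torch.tensor([-2.0692, 2.0274])       # shape [2], what o[i] gives you
labels = torch.tensor([1])                  # shape [1], a made-up target

# criterion(o_i, labels)                    # would raise the size mismatch you saw
loss = criterion(o_i.unsqueeze(0), labels)  # input [1, 2] vs. target [1] works

Slicing with o[i:i+1] instead of o[i] would also keep the batch dimension.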
I’m using SoftMax and CrossEntropy; this is a classification problem.
I’m implementing a novel approach that uses the backpropagated gradient values of a trained model, without taking any optimization step. In essence, I want to save the backpropagation gradient values for every input I’m given for this task.
for i, data in enumerate(some_dataloader):  # DataLoader batch size is 1
    inputs, labels = data
    model.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, labels)  # torch.nn.CrossEntropyLoss()
    # backpropagation
    loss.backward()
    # get the gradient of some layer
    grad = some_layer.bias.grad  # I use this later
Before this loop runs I have already passed all the data I need through the model once, o = model(data),
so I want to reuse that output o[i] instead of recomputing it inside the loop all over again.
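Roughly, what I’m hoping for is something like this sketch (all_inputs / all_labels stand for my full dataset; the retain_graph flag and the clone are my guesses at what would be needed):

o = model(all_inputs)  # one forward pass over everything, done once

for i in range(len(o)):
    model.zero_grad()
    loss = criterion(o[i:i+1], all_labels[i:i+1])  # slicing keeps the batch dimension
    loss.backward(retain_graph=True)               # the graph is reused across iterations
    grad = some_layer.bias.grad.clone()            # copy before the next zero_grad() clears it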
Edit:
While we’re on the topic, is there a way to make this work per sample with a larger batch size? For example, would loss = criterion(outputs, labels) with a batch size of 32 give 32 gradients rather than 1?
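To illustrate what I mean: with reduction='none' I can get 32 per-sample losses, but a single backward() still leaves me with one summed gradient in .grad (toy model and data, just to show the shapes):

import torch
import torch.nn as nn

model = nn.Linear(4, 2)   # toy stand-in
some_layer = model
inputs = torch.randn(32, 4)
labels = torch.randint(0, 2, (32,))

outputs = model(inputs)
per_sample_loss = nn.CrossEntropyLoss(reduction='none')(outputs, labels)  # shape [32]
per_sample_loss.sum().backward()
print(some_layer.bias.grad.shape)  # torch.Size([2]) -- one summed gradient, not 32 of them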
Would you know of a solution to the first part of my problem? Even if I compute the gradients efficiently, I would still need to pass the entire test set through the model twice.