Hi everyone,
I’m trying to implement importance sampling based on https://github.com/idiap/importance-sampling in my PyTorch project.
My code:
size = len(dataloader.dataset)
model.train()
for batch, (X, y) in enumerate(dataloader):
    X, y = X.to(device), y.to(device)
    pred = model(X)
    # Compute per sample loss
    per_sample_loss_fn = torch.nn.CrossEntropyLoss(reduction='none')
    per_sample_loss = per_sample_loss_fn(pred, y)
    # Compute per sample gradient w.r.t. the last layer
    last_layer_params = model.lin3.parameters()
    per_sample_last_layer_grads = torch.autograd.grad(per_sample_loss, last_layer_params)
    # Compute prediction error
    loss = loss_fn(pred, y)
    # Backpropagation
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
I am getting the following error when computing the per sample gradients w.r.t. the last layer:
File "/home/stdmichal/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/tmp/pycharm_project_451/test.py", line 128, in <module>
idx, scores = train(train_dataloader, model, loss_fn, optimizer)
File "/tmp/pycharm_project_451/test.py", line 46, in train
per_sample_last_layer_grads = torch.autograd.grad(per_sample_loss, last_layer_params)
File "/home/stdmichal/miniconda3/envs/pytorch_env/lib/python3.7/site-packages/torch/autograd/__init__.py", line 218, in grad
grad_outputs_ = _make_grads(outputs, grad_outputs_)
File "/home/stdmichal/miniconda3/envs/pytorch_env/lib/python3.7/site-packages/torch/autograd/__init__.py", line 50, in _make_grads
raise RuntimeError("grad can be implicitly created only for scalar outputs")
RuntimeError: grad can be implicitly created only for scalar outputs
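From the docs I understand that torch.autograd.grad can handle a non-scalar output if grad_outputs is supplied, so a sketch along these lines should avoid the error (with retain_graph=True so the later loss.backward() still works):

    per_sample_last_layer_grads = torch.autograd.grad(
        per_sample_loss,
        list(model.lin3.parameters()),
        grad_outputs=torch.ones_like(per_sample_loss),  # weight each sample's loss by 1
        retain_graph=True,  # keep the graph alive for the later loss.backward()
    )

But if I understand correctly, this is equivalent to differentiating per_sample_loss.sum(), i.e. it returns the gradients summed over the whole batch, which is again not per sample.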
I found some similar problems (1, 2), but I am not trying to compute the full backward pass as in 1, nor can I reduce the loss to a scalar, because I need the per sample gradients to determine each sample’s importance.
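The only fallback I can see is looping over the batch and calling torch.autograd.grad once per sample (a naive sketch, reusing my model.lin3 layer):

    per_sample_grads = []
    for i in range(per_sample_loss.shape[0]):
        # Gradient of sample i's loss w.r.t. the last layer only
        grads_i = torch.autograd.grad(
            per_sample_loss[i],
            list(model.lin3.parameters()),
            retain_graph=True,  # graph is reused for every sample and for loss.backward()
        )
        per_sample_grads.append(grads_i)

but that looks far too slow for larger batches.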
What would be the correct way to do this, please?