Changing view of input breaks graph

VRehnberg · May 4, 2021, 3:12pm

I want to compute the gradient of an output with respect to an input. However, when I change the view of the input, I get the following error message:

RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.

Here is what I’ve tried:

import torch

func = torch.nn.Linear(5, 6)
x = torch.rand(3, 5, requires_grad=True)
for x_row in x.unbind(0):
    y_row = func(x_row)
    torch.autograd.grad(y_row, [x_row], grad_outputs=torch.ones_like(y_row), create_graph=True)[0]
    # This is fine

y = func(x)
for x_row, y_row in zip(torch.split(x, 1, dim=0), torch.split(y, 1, dim=0)):
    torch.autograd.grad(y_row, [x_row], grad_outputs=torch.ones_like(y_row), create_graph=True)[0]
    # RuntimeError

y = func(x)
for x_row, y_row in zip(x.unbind(0), y.unbind(0)):
    torch.autograd.grad(y_row, [x_row], grad_outputs=torch.ones_like(y_row), create_graph=True)[0]
    # RuntimeError


x_row = torch.rand(1, 5, requires_grad=True)
x_row = x_row.flatten()
y_row = func(x_row)
torch.autograd.grad(y_row, [x_row], grad_outputs=torch.ones_like(y_row), create_graph=True)[0]
# This is fine

x_row = torch.rand(1, 5, requires_grad=True)
y_row = func(x_row)
x_row = x_row.flatten()
y_row = y_row.flatten()
torch.autograd.grad(y_row, [x_row], grad_outputs=torch.ones_like(y_row), create_graph=True)[0]
# RuntimeError

How am I supposed to do this?

VRehnberg · May 5, 2021, 11:36am

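So the problem turns out to be fairly obvious: it has to do with the direction of the edges in the autograd DAG. A view created from x is a new node downstream of x, so an output computed from x itself does not depend on that view, and autograd reports the view as unused.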
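To make the direction issue concrete, here is a minimal sketch. The variable names `unused`, `grad_x` and `grad_row` are mine, and the seed is only there for reproducibility. It shows that autograd returns None for the view when `allow_unused=True` is set, and that differentiating with respect to x itself and then indexing gives the gradient for that row.

```python
import torch

torch.manual_seed(0)
func = torch.nn.Linear(5, 6)
x = torch.rand(3, 5, requires_grad=True)
y = func(x)

# These views are new nodes created from x. y was computed from x,
# not from the views, so there is no path from y_row back to x_row.
x_row = x.unbind(0)[0]
y_row = y.unbind(0)[0]

(unused,) = torch.autograd.grad(
    y_row, [x_row],
    grad_outputs=torch.ones_like(y_row),
    retain_graph=True,   # keep the graph for the second call below
    allow_unused=True,   # return None instead of raising RuntimeError
)
print(unused)  # None

# x is upstream of y_row, so this works. Only row 0 of the result
# is non-zero, because y_row depends only on x[0].
(grad_x,) = torch.autograd.grad(
    y_row, [x],
    grad_outputs=torch.ones_like(y_row),
    create_graph=True,
)
grad_row = grad_x[0]
```

This only helps when x is available as the tensor the output was actually computed from, and it relies on the model treating rows independently.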

A possible way to rework this is:

import torch

func = torch.nn.Linear(5, 6)
x = torch.rand(3, 5, requires_grad=True)
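x_rows = x.unbind(0)        # take the row views first
x = torch.vstack(x_rows)    # rebuild the input from the views, so they sit upstream of y
y_rows = func(x).unbind(0)  # each y_row now depends on the x_rows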
for x_row, y_row in zip(x_rows, y_rows):
    torch.autograd.grad(y_row, [x_row], grad_outputs=torch.ones_like(y_row), create_graph=True)[0]

However, I wouldn’t consider this a real solution, since it requires control over what is fed into the model. The best outcome would be if the experimental vmap feature turned out to work together with autograd.grad.
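For reference, a hedged sketch of the vmap route. It assumes PyTorch 2.0 or newer, where `torch.func` provides `grad` and `vmap`, and it uses `torch.func.grad` rather than `torch.autograd.grad`, so it is not the exact combination asked about above. The helper `summed_output` is a hypothetical name. Summing the output plays the role of `grad_outputs=torch.ones_like(y_row)`.

```python
import torch
from torch.func import grad, vmap  # assumption: PyTorch >= 2.0

torch.manual_seed(0)
func = torch.nn.Linear(5, 6)
x = torch.rand(3, 5)

def summed_output(x_row):
    # Scalar function of a single row; its gradient equals the
    # vector-Jacobian product with a vector of ones.
    return func(x_row).sum()

# grad differentiates with respect to the row, vmap maps over dim 0 of x.
per_row_grads = vmap(grad(summed_output))(x)
print(per_row_grads.shape)  # torch.Size([3, 5])
```

Here the rows are passed to the model one at a time (conceptually), so each gradient is taken with respect to the tensor the output was computed from and the unused-tensor error does not arise.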