I know the question seems to answer itself (compute weight gradients without gradients?). But the issue I am trying to resolve is that I need to change leaf variables before computing the backward pass. Without no_grad, PyTorch complains about in-place modifications of leaf variables that require grad.
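To make the complaint concrete, here is a minimal repro (a toy `nn.Linear`, not my actual model) of the error I mean:

```
import torch
import torch.nn as nn

layer = nn.Linear(2, 2)

# Outside no_grad: an in-place update on a leaf that requires grad raises.
try:
    layer.weight *= 3
    raised = False
except RuntimeError:
    raised = True

# Inside no_grad: the same in-place update is allowed.
with torch.no_grad():
    layer.weight *= 3

print(raised)  # → True
```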

- Define the model (it is split in two stages of a pipeline):

```
import torch
import torch.nn as nn
import torch.nn.functional as F

class Pipe1(nn.Module):
    def __init__(self):
        super(Pipe1, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2(x), 2))
        return x

class Pipe2(nn.Module):
    def __init__(self):
        super(Pipe2, self).__init__()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)
```

- Make the pipeline:

```
pipe = [Pipe1(), Pipe2()]
```

- Run a micro-batch through the pipeline:

```
with torch.no_grad():
    for batch_idx, (data, target) in enumerate(train_loader):
        output = pipe[1](pipe[0](data))
        loss = F.nll_loss(output, target)
        pipe[0].conv1.weight *= 3
```

And… what would step four be? Given the current loss, I would ideally like to compute the derivatives with respect to the newly updated weights. Is there any "proper" way of doing this in PyTorch, or would I need to compute the gradients myself?
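The closest I can sketch of what I am after (toy model and random data standing in for my real pipeline; I am not sure this is the intended idiom) is: do the in-place update shielded by no_grad, then rerun the forward pass with grad tracking so the graph is built on top of the updated weights:

```
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(4, 3)
data = torch.randn(8, 4)
target = torch.randint(0, 3, (8,))

# 1. Update the weights in place, shielded from autograd.
with torch.no_grad():
    model.weight *= 3

# 2. Recompute the forward pass *with* grad tracking, so the graph
#    is built on top of the updated weights.
loss = F.nll_loss(F.log_softmax(model(data), dim=1), target)
loss.backward()

# model.weight.grad now holds derivatives w.r.t. the updated weights.
print(model.weight.grad.shape)  # torch.Size([3, 4])
```

But this pays for a second forward pass, which is what I was hoping to avoid by computing the loss inside the no_grad block.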