retain_graph=True works, but why tho?


I have the following model and training routine:

model = nn.Sequential(
        # integer division: nn.Linear expects int feature sizes
        nn.Linear(feats_size, feats_size // 2),
        nn.Linear(feats_size // 2, feats_size // 4),
        nn.Linear(feats_size // 4, num_images),
)

for batch_id, batch in enumerate(self.t_loader):
        feats = batch[0].to(device)
        labels = batch[1].to(device)
        output = model(feats)
        loss = criterion(output, labels)
        optimizer.zero_grad()
        loss.backward(retain_graph=True)
        optimizer.step()

From other related discussion threads, I gathered that retain_graph=True is required when part of the computation graph is shared between batches (i.e., across training iterations). There is clearly no such sharing in the above case, but PyTorch still throws an error without retain_graph=True. What am I missing?
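For illustration, the kind of cross-iteration sharing those threads describe can be sketched as below. This is a hypothetical example (encoder, head and static_input are made-up names, not from the post): a tensor computed once outside the loop becomes part of every iteration's graph, so the second backward() needs saved values that the first one already freed. Passing retain_graph=True makes it run, while recomputing the tensor inside the loop removes the sharing altogether.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
encoder = nn.Linear(4, 4)  # hypothetical module whose output gets shared
head = nn.Linear(4, 1)
static_input = torch.randn(8, 4)


def run(recompute_inside_loop: bool, retain_graph: bool = False) -> None:
    # Computed once, outside the loop: this part of the graph (including the
    # output that tanh saves for its backward) is shared by every iteration.
    shared = torch.tanh(encoder(static_input))
    for _ in range(3):
        if recompute_inside_loop:
            # Fresh graph each iteration, so nothing freed earlier is reused.
            shared = torch.tanh(encoder(static_input))
        loss = head(shared).pow(2).mean()
        # Without retain_graph, this frees the saved values of the whole graph,
        # including the shared tanh node.
        loss.backward(retain_graph=retain_graph)


try:
    run(recompute_inside_loop=False)
except RuntimeError as err:
    print("shared graph, no retain_graph, fails:", "second time" in str(err))

run(recompute_inside_loop=False, retain_graph=True)  # runs, but saved buffers are kept alive
run(recompute_inside_loop=True)  # runs: no cross-iteration sharing
print("both alternatives ran")
```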

Retaining the graph adds overhead, as the computation graph and its stored intermediate forward activations won’t be freed.
I don’t see why it should fail without it, so could you explain a bit more what failure you expect?

My bad. By error, I meant the following message, which appears when retain_graph=True is removed from the above code:

Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
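The message itself can be reproduced without any model or loader. In this minimal sketch, exp() saves its output for the backward pass; the first backward() frees it, so a second pass over the same graph fails unless the first call passes retain_graph=True (gradients then accumulate in .grad).

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# exp() saves its output, which its backward needs.
y = torch.exp(x).sum()
y.backward()  # frees the saved output of exp()
try:
    y.backward()  # second pass through the same, already freed graph
except RuntimeError as err:
    print("second backward failed:", "second time" in str(err))

x.grad = None
y = torch.exp(x).sum()
y.backward(retain_graph=True)  # keep the saved values alive for another pass
y.backward()  # works; the gradient is accumulated into x.grad a second time
print(torch.allclose(x.grad, 2 * torch.exp(x.detach())))  # True
```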

And I don’t understand why the previous batch’s graph would be needed for later backward passes. :thinking:

I cannot reproduce the error using your model definition and your training loop with random input data:

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

feats_size = 16
num_images = 10
model = nn.Sequential(
        nn.Linear(feats_size, feats_size // 2),
        nn.Linear(feats_size // 2, feats_size // 4),
        nn.Linear(feats_size // 4, num_images),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

num_samples = 100
dataset = TensorDataset(torch.randn(num_samples, feats_size), torch.randint(0, num_images, (num_samples,)))
loader = DataLoader(dataset, batch_size=5)

criterion = nn.CrossEntropyLoss()
device = 'cpu'

for batch_id, batch in enumerate(loader):
        feats = batch[0].to(device)
        labels = batch[1].to(device)
        output = model(feats)
        loss = criterion(output, labels)
        optimizer.zero_grad()
        loss.backward()  # no retain_graph=True needed
        optimizer.step()

I also don’t see how this error could be raised in your code, as the training loop doesn’t seem to attach new operations to the computation graph of previous iterations.
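One quick way to check whether backward() can reach into an older graph is to inspect the input batches: batches built from plain data tensors have requires_grad=False and grad_fn=None, so a batch that requires grad means backward() will also flow into whatever produced the dataset tensors. The helper below is a hypothetical sketch (assert_no_history is a made-up name) that flags such batches.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def assert_no_history(*tensors: torch.Tensor) -> None:
    """Raise if any input tensor requires grad or carries autograd history."""
    for i, t in enumerate(tensors):
        if t.requires_grad or t.grad_fn is not None:
            raise ValueError(
                f"input {i}: requires_grad={t.requires_grad}, grad_fn={t.grad_fn}"
            )


clean = TensorDataset(torch.randn(10, 4), torch.randint(0, 3, (10,)))
leaky = TensorDataset(torch.randn(10, 4, requires_grad=True), torch.randint(0, 3, (10,)))

for feats, labels in DataLoader(clean, batch_size=5):
    assert_no_history(feats, labels)  # passes: plain data tensors

feats, labels = next(iter(DataLoader(leaky, batch_size=5)))
try:
    assert_no_history(feats, labels)
except ValueError as err:
    print(err)  # reports that input 0 requires grad
```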

Yes! The above code works fine without retain_graph=True.
It made me realise that the problem was neither the model nor the training procedure. It was, of course, the dataset/loader:

  • To construct the Dataset, I was loading data (feature tensors) from disk that had requires_grad = True (another process had saved them that way).

Just doing data = data.detach() before constructing the Dataset worked. Thanks for helping out :slight_smile:
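The post doesn’t say exactly what was done to the loaded tensors, so the following is only a sketch under an assumption: a hypothetical normalization applied to the whole requires_grad tensor before building the Dataset. That single preprocessing step becomes a graph segment shared by every batch, which reproduces the error from the second batch onward; calling detach() on the data removes it.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(detach_data: bool) -> None:
    torch.manual_seed(0)
    feats_size, num_images, num_samples = 16, 10, 100

    # Stand-in for feature tensors loaded from disk with requires_grad=True.
    raw = torch.randn(num_samples, feats_size, requires_grad=True)
    # Hypothetical preprocessing of the WHOLE tensor before building the Dataset.
    # Because raw requires grad, this records one graph segment shared by all batches.
    data = (raw - raw.mean(dim=0)) / raw.std(dim=0)
    if detach_data:
        data = data.detach()  # the poster's fix: drop the autograd history

    labels = torch.randint(0, num_images, (num_samples,))
    loader = DataLoader(TensorDataset(data, labels), batch_size=5)

    model = nn.Sequential(
        nn.Linear(feats_size, feats_size // 2),
        nn.Linear(feats_size // 2, feats_size // 4),
        nn.Linear(feats_size // 4, num_images),
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    criterion = nn.CrossEntropyLoss()

    for feats, target in loader:
        loss = criterion(model(feats), target)
        optimizer.zero_grad()
        # The first call frees the shared preprocessing buffers; without detach,
        # the second batch backpropagates into them again and raises.
        loss.backward()
        optimizer.step()


try:
    train_one_epoch(detach_data=False)
except RuntimeError as err:
    print("without detach, fails:", "second time" in str(err))

train_one_epoch(detach_data=True)
print("with detach: finished without retain_graph=True")
```

retain_graph=True would also make the first version run, but every backward() would then keep flowing into raw, which is usually not wanted for input data.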

Oh, that’s tricky to narrow down, but good to hear you’ve found the issue and it’s working now. :slight_smile:

