Hello!
I have the following model and training routine:
model = nn.Sequential(
    # nn.Linear expects integer sizes, so use floor division
    nn.Linear(feats_size, feats_size // 2),
    nn.ReLU(),
    nn.Linear(feats_size // 2, feats_size // 4),
    nn.ReLU(),
    nn.Linear(feats_size // 4, num_images),
    nn.Tanh()
)
for batch_id, batch in enumerate(self.t_loader):
    feats = batch[0].to(device)
    labels = batch[1].to(device)
    output = model(feats)
    loss = criterion(output, labels)
    loss.backward(retain_graph=True)
    optimizer.step()
    optimizer.zero_grad()
From other related discussion threads, I gathered that retain_graph=True is required when there is some sharing between/across batches. Clearly there is no such sharing in the above case, but PyTorch still throws an error without retain_graph=True. What am I missing?
Retaining the graph would add an overhead as the computation graph with the stored intermediate forward activations won’t be freed.
I don’t see why it should fail without it, so could you explain which failure you are expecting?
My bad. By "error", I was referring to the following (raised when retain_graph=True is removed from the above code):
Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
And I don’t understand why the previous batch’s graph would be required for later backward passes.
I cannot reproduce the error using your model definition and your training loop with random input data:
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

feats_size = 16
num_images = 10
model = nn.Sequential(
    nn.Linear(feats_size, feats_size // 2),
    nn.ReLU(),
    nn.Linear(feats_size // 2, feats_size // 4),
    nn.ReLU(),
    nn.Linear(feats_size // 4, num_images),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
num_samples = 100
# Random features and integer class labels
dataset = TensorDataset(torch.randn(num_samples, feats_size), torch.randint(0, num_images, (num_samples,)))
loader = DataLoader(dataset, batch_size=5)
criterion = nn.CrossEntropyLoss()
device = 'cpu'
for batch_id, batch in enumerate(loader):
    feats = batch[0].to(device)
    labels = batch[1].to(device)
    output = model(feats)
    loss = criterion(output, labels)
    loss.backward()  # no retain_graph needed
    optimizer.step()
    optimizer.zero_grad()
I also don’t see how this error could be raised in your code as the training loop doesn’t seem to append operations to the computation graph from previous iterations.
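One way to check whether the inputs themselves bring a graph along is to look at their `grad_fn` before they reach the model. This is only a rough sketch: the helper name `carries_graph` is made up for illustration and is not a PyTorch API, and the tensors are random stand-ins for real data.

```python
import torch

def carries_graph(tensor: torch.Tensor) -> bool:
    """Return True if the tensor was produced by an autograd-tracked operation."""
    return tensor.grad_fn is not None

# Plain data, as a loader would normally yield it: no graph attached.
plain = torch.randn(5, 4)

# Data derived from a tensor that requires grad: every slice shares one graph.
tracked = torch.randn(5, 4, requires_grad=True) * 2.0

plain_has_graph = carries_graph(plain[:2])
tracked_has_graph = carries_graph(tracked[:2])
print(plain_has_graph, tracked_has_graph)
```

If a batch reports a `grad_fn` here, its graph extends back past the current iteration, which is the situation in which a second `backward()` can hit already-freed saved tensors.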
Yes! The above code works fine (without retain_graph=True).
It made me realise that the problem is neither with the model nor with the training procedure. It was, of course, with the dataset/loader:
- For constructing the Dataset, I was loading data (feature tensors) from disk, which had requires_grad = True (they were saved that way by some other process).
Just doing data = data.detach() before constructing the Dataset worked. Thanks for helping out!
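For anyone landing here later, a minimal sketch of how this kind of failure can arise and how `detach()` avoids it. The Dataset code is not shown in this thread, so everything here is an assumption: the `raw * raw` preprocessing step, the tiny `nn.Linear` model, and the helper name `run_epoch` are all hypothetical, and manual slicing stands in for the DataLoader.

```python
import torch
from torch import nn

torch.manual_seed(0)

def run_epoch(feats, labels, batch_size=5):
    """One pass over the data with plain loss.backward() (no retain_graph)."""
    model = nn.Linear(feats.shape[1], 3)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    criterion = nn.CrossEntropyLoss()
    # Manual slicing stands in for the batching a DataLoader would do.
    for start in range(0, feats.shape[0], batch_size):
        output = model(feats[start:start + batch_size])
        loss = criterion(output, labels[start:start + batch_size])
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# Stand-in for feature tensors loaded from disk with requires_grad=True.
raw = torch.randn(20, 4, requires_grad=True)
# ASSUMPTION: a one-time preprocessing op applied before batching.
# Its saved tensors belong to a single graph shared by all batches.
feats = raw * raw
labels = torch.randint(0, 3, (20,))

try:
    run_epoch(feats, labels)
    error_message = None
except RuntimeError as err:
    # The second batch tries to backward through the shared, already-freed graph.
    error_message = str(err)

# Detaching cuts the data off from the shared graph, so each batch
# only builds (and frees) its own small graph.
run_epoch(feats.detach(), labels)
detached_ok = True
```

In this sketch the first batch's `backward()` frees the saved tensors of the shared preprocessing op, and the second batch then fails when it reaches the same node. With the detached features there is nothing shared left to traverse.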
Oh, that’s tricky to narrow down but good to hear you’ve found the issue and it’s working now.