I create blocks of layers and combine them in a tree structure. The blocks are stored in a list:
    blocks = [block0, block1, block2]
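For context, the blocks are ordinary sub-modules held on the model. A minimal sketch of my setup (the Linear layers are only placeholders for my real blocks, and the class name is just for illustration):

    import torch.nn as nn

    class TreeModel(nn.Module):  # hypothetical name, for illustration only
        def __init__(self):
            super().__init__()
            # ModuleList so each block's parameters are registered with the model
            self.blocks = nn.ModuleList([
                nn.Linear(100, 100),  # block0 (placeholder)
                nn.Linear(100, 10),   # block1 (placeholder)
                nn.Linear(100, 10),   # block2 (placeholder)
            ])
            self.nodes = len(self.blocks)  # 3
            self.outputs = []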
The forward function then uses these blocks hierarchically:
             -> block1 -> output1
    block0 -|
             -> block2 -> output2
When I implement this without recursion, it works:
    def forward(self, x):
        x = self.blocks[0](x)    # block0
        y1 = self.blocks[1](x)   # block1
        y2 = self.blocks[2](x)   # block2
        self.outputs = [y1, y2]
        return self.outputs
But when I implement it with recursion, it raises a RuntimeError:
    def forward(self, x):
        x = self.blocks[0](x)  # root: block0
        self.traverse(0, x)
        return self.outputs

    def traverse(self, now, x):
        leaves = True
        if now * 2 + 1 < self.nodes:  # self.nodes for this case is 3
            y = self.blocks[now * 2 + 1](x)
            self.traverse(now * 2 + 1, y)
            leaves = False
        if now * 2 + 2 < self.nodes:
            y = self.blocks[now * 2 + 2](x)
            self.traverse(now * 2 + 2, y)
            leaves = False
        if leaves:
            self.outputs.append(x)
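The traverse method treats the list as an implicit binary tree: node i's children sit at indices 2*i + 1 and 2*i + 2. A tiny standalone helper, purely to illustrate that indexing (not part of my model):

    # Hypothetical helper, only to show the implicit-tree indexing used above.
    def children(i, nodes=3):
        return [c for c in (2 * i + 1, 2 * i + 2) if c < nodes]

    print(children(0))  # [1, 2]  -> block0 feeds block1 and block2
    print(children(1))  # []      -> block1 is a leaf, its output is appended
    print(children(2))  # []      -> block2 is a leaf, its output is appended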
It raises this error:
    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [100, 10]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
My training loop:
    model.train()
    with torch.autograd.detect_anomaly():
        for i in range(epoch):
            for image, label in tqdm(trainloaders):
                image = image.to(device)
                label = label.to(device)
                out = model(image)
                out = torch.mean(torch.stack(out), dim=0)
                loss = criterion(out, label)
                optimizer.zero_grad()
                loss.backward(retain_graph=True)
                optimizer.step()
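For reference, and unrelated to my actual model, here is a minimal standalone snippet that raises the same class of error; it is only meant to illustrate what the "version" in the message refers to:

    import torch

    x = torch.ones(3, requires_grad=True)
    y = x.exp()         # autograd saves y itself to compute exp's backward
    y += 1              # in-place op bumps y's version counter
    y.sum().backward()  # RuntimeError: ... modified by an inplace operation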
Can someone help me understand why this error happens, please?