I just started using PyTorch and I plan to build an ensemble of networks. I implemented it with a list, but I got the following error:
I'm confused by this error; the network has just two fc layers. Could you please help me explain it? Thanks in advance.
Traceback (most recent call last):
File "test_list.py", line 107, in <module>
list_of_loss[i].backward()
File "/home/weiguo/anaconda3/envs/pomdp/lib/python3.6/site-packages/torch/tensor.py", line 221, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/home/weiguo/anaconda3/envs/pomdp/lib/python3.6/site-packages/torch/autograd/__init__.py", line 132, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling backward the first time.
Here’s my code:
num_model = 3
list_of_model = []
list_of_cri = []
list_of_opt = []
for _ in range(num_model):
    tem_model = TestNet(input_size, hidden_size, num_classes).to(device)
    tem_cri = nn.CrossEntropyLoss()
    list_of_model.append(tem_model)
    list_of_cri.append(tem_cri)
for i in range(num_model):
    list_of_opt.append(torch.optim.Adam(list_of_model[i].parameters(), lr=learning_rate))

list_of_loss = []
total_step = len(train_loader)
for epoch in range(num_epochs):
    for j, (images, labels) in enumerate(train_loader):
        for i in range(num_model):
            # print(i)
            images = images.reshape(-1, 28*28).to(device)
            labels = labels.to(device)
            output = list_of_model[i](images)
            loss = list_of_cri[i](output, labels)
            list_of_loss.append(loss)
            list_of_opt[i].zero_grad()
            list_of_loss[i].backward()
            # loss.backward()
            list_of_opt[i].step()
            if (j+1) % 100 == 0:
                _, predicted = torch.max(output.data, 1)
                correct = (predicted == labels).sum().item()
                if i == 0:
                    print("########")
                print(correct/labels.size(0))
However, if I comment out list_of_loss.append(loss) and list_of_loss[i].backward() and instead call loss.backward() directly in each loop, there's no error.
list_of_loss = []
...
for epoch in range(num_epochs):
    list_of_loss = [0 for _ in range(num_model)]
    for j, (images, labels) in enumerate(train_loader):
        for i in range(num_model):
            ...
            list_of_loss[i] = list_of_cri[i](output, labels)
            ...
            list_of_loss[i].backward()
To solve this, I overwrite the list entries in place instead of appending. Based on this numerical experiment, it seems to work. Is this correct?
My core question is: can we put all of these terms into lists, i.e. use a list of optimizers and a list of losses, update list_of_opt[i], and call backward() on list_of_loss[i]? I wrote this demo code to verify that numerically.
Yes, absolutely this will work. It's a perfectly reasonable approach
to take. Of course, as you saw with the issue in your first version,
you have to do it correctly, but that's true of any code. The error
in your first version is an indexing problem: because you kept
appending to list_of_loss, on every batch after the first,
list_of_loss[i] referred to a loss from an earlier batch whose graph
had already been freed by its first backward() call, so calling
backward() on it again raised the RuntimeError.
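A minimal sketch of that failure mode, using a bare tensor instead of a real model, shows the same RuntimeError without any dataset or network:

```python
import torch

# Calling backward() twice on the same loss reproduces the error:
# the first backward() frees the graph's saved intermediate results,
# just as happened to the stale entries of list_of_loss.
x = torch.randn(4, requires_grad=True)
loss = (x * x).sum()

loss.backward()          # first backward: fine; the graph is freed afterwards
try:
    loss.backward()      # second backward on the same (now-freed) graph
except RuntimeError as err:
    print("RuntimeError:", err)
```

The fix in your second version works precisely because each iteration rebuilds list_of_loss[i] from a fresh forward pass, so every backward() call runs on a graph that has not been freed yet.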
A few comments, mostly stylistic:
Your first list_of_loss = [] has no effect because you subsequently
overwrite it with list_of_loss = [0 for _ in range(num_model)]
before you ever use it.
I would create the list of length num_model once, outside of all of
the loops. Stylistically, I would initialize the elements of the list with None rather than 0. (These initial values are never used, so which
initialization you use has no practical effect.) I would also not use
a list comprehension merely to create a list of a given length. Thus,
I would probably do something like this:
list_of_loss = [None] * num_model
only_need_one_cri = nn.CrossEntropyLoss()
...
for epoch in range(num_epochs):
    for j, (images, labels) in enumerate(train_loader):
        for i in range(num_model):
            ...
            list_of_loss[i] = only_need_one_cri(output, labels)
            ...
            list_of_loss[i].backward()
Also, I've illustrated that because an instance of CrossEntropyLoss
contains no state (none that depends on i, on list_of_model[i], or on
whether it's been called before), we don't need a list_of_cri that
contains multiple instances of CrossEntropyLoss; a single instance suffices.
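You can check that statelessness directly. This sketch (with random logits standing in for the outputs of two hypothetical models) compares one shared criterion against dedicated per-model instances:

```python
import torch
import torch.nn as nn

# Random logits standing in for the outputs of two different models.
logits_a = torch.randn(8, 10)
logits_b = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))

shared = nn.CrossEntropyLoss()       # one criterion shared across models
dedicated_a = nn.CrossEntropyLoss()  # one criterion per model
dedicated_b = nn.CrossEntropyLoss()

# The shared instance produces exactly the same loss values, and calling
# it on model a's outputs has no effect on what it returns for model b's.
assert torch.equal(shared(logits_a, labels), dedicated_a(logits_a, labels))
assert torch.equal(shared(logits_b, labels), dedicated_b(logits_b, labels))
print("shared criterion matches dedicated instances")
```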