Loss error when using a list for an ensemble of networks

I just started using PyTorch and I plan to build an ensemble of networks. I implemented it with lists, but I got the error below.
I’m confused by it, since each network has just 2 fc layers. Could you please help me understand it? Thanks in advance.

Traceback (most recent call last):
  File "test_list.py", line 107, in <module>
    list_of_loss[i].backward()
  File "/home/weiguo/anaconda3/envs/pomdp/lib/python3.6/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/weiguo/anaconda3/envs/pomdp/lib/python3.6/site-packages/torch/autograd/__init__.py", line 132, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling backward the first time.

Here’s my code:

num_model = 3

list_of_model = []
list_of_cri = []
list_of_opt = []

for _ in range(num_model):
    tem_model = TestNet(input_size, hidden_size, num_classes).to(device)
    tem_cri = nn.CrossEntropyLoss()
    list_of_model.append(tem_model)
    list_of_cri.append(tem_cri)

for i in range(num_model):
    list_of_opt.append(torch.optim.Adam(list_of_model[i].parameters(), lr=learning_rate))

list_of_loss = []


total_step = len(train_loader)
for epoch in range(num_epochs):
    for j, (images, labels) in enumerate(train_loader):
        for i in range(num_model):
            # print(i)
            images = images.reshape(-1, 28*28).to(device)
            labels = labels.to(device)

            output = list_of_model[i](images)
            loss = list_of_cri[i](output, labels)
            list_of_loss.append(loss)

            list_of_opt[i].zero_grad()
            list_of_loss[i].backward()
            # loss.backward()
            list_of_opt[i].step()

            if (j+1) % 100 == 0:
                _, predicted = torch.max(output.data, 1)
                correct = (predicted == labels).sum().item()
                if i == 0:
                    print("########")
                print(correct/labels.size(0))

However, if I comment out list_of_loss.append(loss) and list_of_loss[i].backward(), and instead call loss.backward() directly in each iteration, there’s no error.

Hi Wei!

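Your list_of_loss keeps growing with every iteration of the loops
over num_epochs and train_loader, while i only ever runs from 0
to num_model - 1.

Therefore, from the second batch onward, list_of_loss[i].backward()
is called on the same losses over and over again: the ones that were
created for the first batch in your first epoch, whose graphs were
already freed by their first backward() call.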

Put another way, the newly appended loss could be accessed and
backpropagated as:

list_of_loss[-1].backward()

which is not the same as:

list_of_loss[i].backward()
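
To make the difference concrete, here is a minimal sketch that reproduces the problem. The tiny nn.Linear models and the random x and y are hypothetical stand-ins for the networks and data in the question, not the original code:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the ensemble and one batch of data.
num_model = 3
models = [nn.Linear(4, 2) for _ in range(num_model)]
criterion = nn.CrossEntropyLoss()
x = torch.randn(8, 4)
y = torch.randint(0, 2, (8,))

list_of_loss = []
errors = []  # (batch, i, was list_of_loss[i] the newest loss?)

for batch in range(2):  # two batches are enough to trigger the error
    for i in range(num_model):
        loss = criterion(models[i](x), y)
        list_of_loss.append(loss)

        # Only during the first batch does index i point at the newest loss.
        is_newest = list_of_loss[i] is list_of_loss[-1]
        try:
            list_of_loss[i].backward()
        except RuntimeError:
            # Stale loss from the first batch: its graph was already freed.
            errors.append((batch, i, is_newest))
            # The newly appended loss still has its graph, so this works.
            list_of_loss[-1].backward()
```

In this sketch every call in the first batch succeeds, and every call in the second batch lands in the except branch, because list_of_loss[i] still refers to a loss from the first batch.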

Best.

K. Frank

Thank you, Frank!

I now use


list_of_loss = []
...
for epoch in range(num_epochs):
    list_of_loss = [0 for _ in range(num_model)]

    for j, (images, labels) in enumerate(train_loader):
        for i in range(num_model):
            ...
            list_of_loss[i] = list_of_cri[i](output, labels)

            ...
            list_of_loss[i].backward()

This solves the problem. Based on a numerical experiment, it seems to work. Is this correct?

My core question is: can we put the models, optimizers, and losses into lists, call backward() on list_of_loss[i], and then step list_of_opt[i]? I wrote this demo code to check that numerically.

Hi Wei!

Yes, absolutely this will work. It’s a perfectly reasonable approach
to take. Of course, as you saw with the issue in your first version,
you have to do it correctly, but that’s true with any code.

A few comments, mostly stylistic:

Your first list_of_loss = [] has no effect because you subsequently
overwrite it with list_of_loss = [0 for _ in range(num_model)]
before you ever use it.

I would create the list of length num_model once, outside of all of
the loops. Stylistically, I would initialize the elements of the list with
None rather than 0. (These initial values are never used, so which
initialization you use has no practical effect.) I would also not use
a list comprehension merely to create a list of a given length. Thus,
I would probably do something like this:

list_of_loss = [None] * num_model
only_need_one_cri = nn.CrossEntropyLoss()
...
for epoch in range(num_epochs):
    for j, (images, labels) in enumerate(train_loader):
        for i in range(num_model):
            ...
            list_of_loss[i] = only_need_one_cri(output, labels)
            ...
            list_of_loss[i].backward()
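
For anyone who wants to run the outline above end to end, here is one possible self-contained sketch. TinyNet, the layer sizes, the learning rate, and the random batches are all assumptions made up for illustration; they stand in for TestNet and train_loader, which are not shown in this thread:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


class TinyNet(nn.Module):
    """Hypothetical two-layer network standing in for TestNet."""

    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))


num_model, num_epochs = 3, 2
# Random (inputs, labels) batches standing in for train_loader.
batches = [(torch.randn(16, 10), torch.randint(0, 3, (16,))) for _ in range(4)]

list_of_model = [TinyNet(10, 8, 3) for _ in range(num_model)]
list_of_opt = [torch.optim.Adam(m.parameters(), lr=1e-2) for m in list_of_model]
criterion = nn.CrossEntropyLoss()      # one shared instance
list_of_loss = [None] * num_model      # fixed length, created once

for epoch in range(num_epochs):
    for inputs, labels in batches:
        for i in range(num_model):
            output = list_of_model[i](inputs)
            # Overwrite slot i, so index i always holds the newest loss.
            list_of_loss[i] = criterion(output, labels)

            list_of_opt[i].zero_grad()
            list_of_loss[i].backward()
            list_of_opt[i].step()
```

Because each slot is overwritten rather than appended to, the list never grows and backward() is only ever called once per loss.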

Also, I’ve illustrated that because an instance of CrossEntropyLoss
contains no state (that depends on i, list_of_model[i], or whether
it’s been called before), we don’t need a list_of_cri that contains
multiple instances of CrossEntropyLoss; a single instance suffices.
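
If you want to convince yourself of this, a quick check along the following lines should do. It assumes CrossEntropyLoss is constructed with its default arguments, and the random logits and labels are made up for the test:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Made-up logits from two "models" and one set of labels.
logits_a = torch.randn(5, 3)
logits_b = torch.randn(5, 3)
labels = torch.randint(0, 3, (5,))

shared = nn.CrossEntropyLoss()
separate = nn.CrossEntropyLoss()

# With default arguments the criterion has no learnable parameters.
num_params = len(list(shared.parameters()))

first = shared(logits_a, labels)
_ = shared(logits_b, labels)           # intervening call on other data
again = shared(logits_a, labels)

# Same inputs give the same loss, whichever instance is used and
# regardless of what the instance was called on in between.
same_as_separate = torch.equal(first, separate(logits_a, labels))
same_after_reuse = torch.equal(first, again)
```

A criterion built with a weight tensor does carry that tensor around, but it is fixed configuration rather than per-call state, so sharing one instance would still be fine as long as all models want the same weights.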

Best.

K. Frank

Thank you so much, Frank! That really helps me a lot.