Autograd for partial classification

I am building a task-incremental continual learning model. In each task, predictions are derived from only a subset of the output nodes.

Below is a simple example with 7 classes, [0,1,2,3,4,5,6]:

import torch
import copy

classes_in_task = {
    0: [0, 1],
    1: [2, 3],
    2: [4, 5],
    3: [6],
}

no_class_per_task = 2
labels = torch.randint(high=7, size=(15,)).to('cuda')
inputs = torch.randn((15, 7)).to('cuda')

model = torch.nn.Sequential(
    torch.nn.Linear(7, 8),
    torch.nn.Linear(8, 7)).to('cuda')

mc = copy.deepcopy(model)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)


def equal_(model, mc):
    # report every parameter that changed; True in the mask means "unchanged"
    all_equal = True
    for (n, p), (_, pc) in zip(model.named_parameters(), mc.named_parameters()):
        if not torch.all(p.eq(pc)).item():
            print(n, "\n", p.eq(pc), sep='\t')
            all_equal = False
    return all_equal

for task_no in range(4):
    conditions = torch.BoolTensor([l in classes_in_task[task_no] for l in labels]).to('cuda')
    print(task_no)

    for epoch in range(2):
        optimizer.zero_grad()
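        # keep only the current task's samples and its output logits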
        y = model(inputs)[conditions][:,classes_in_task[task_no]]
        l = labels[conditions]
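        # shift global labels to task-local indices (e.g. classes [2,3] -> [0,1])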
        loss = torch.nn.CrossEntropyLoss()(y, l - task_no * no_class_per_task)
        loss.backward()
        optimizer.step()
        equal_(model,mc)
        mc = copy.deepcopy(model)

For task 0 I get the expected gradient flow, i.e., in the output layer only the weights of the two classifier nodes corresponding to classes [0,1] change (False below marks an entry that changed):

0.weight
        tensor([[False, False, False, False, False, False, False],
        [False, False, False, False, False, False, False],
        [False, False, False, False, False, False, False],
        [False, False, False, False, False, False, False],
        [False, False, False, False, False, False, False],
        [False, False, False, False, False, False, False],
        [False, False, False, False, False, False, False],
        [False, False, False, False, False, False, False]], device='cuda:0')
0.bias
        tensor([False, False, False, False, False, False, False, False],
       device='cuda:0')
1.weight
        tensor([[False, False, False, False, False, False, False, False],
        [False, False, False, False, False, False, False, False],
        [ True,  True,  True,  True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True,  True,  True,  True]],
       device='cuda:0')
1.bias
        tensor([False, False,  True,  True,  True,  True,  True], device='cuda:0')

But for the next task, the nodes corresponding to the previous task's classes also change, even though they are supposed to stay static; only the nodes corresponding to the current task's classes should vary:

0.weight
        tensor([[False, False, False, False, False, False, False],
        [False, False, False, False, False, False, False],
        [False, False, False, False, False, False, False],
        [False, False, False, False, False, False, False],
        [False, False, False, False, False, False, False],
        [False, False, False, False, False, False, False],
        [False, False, False, False, False, False, False],
        [False, False, False, False, False, False, False]], device='cuda:0')
0.bias
        tensor([False, False, False, False, False, False, False, False],
       device='cuda:0')
1.weight
        tensor([[False, False, False, False, False, False, False, False],
        [False, False, False, False, False, False, False, False],
        [False, False, False, False, False, False, False, False],
        [False, False, False, False, False, False, False, False],
        [ True,  True,  True,  True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True,  True,  True,  True]],
       device='cuda:0')
1.bias
        tensor([False, False, False, False,  True,  True,  True], device='cuda:0')
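
For reference, the gradient rows of the previous task's classes are exactly zero right after backward(), so autograd itself seems to behave correctly; an illustrative check, run inside the loop during task 1:

# illustrative check, run right after loss.backward() during task 1:
# the output-layer gradient rows for the old classes [0, 1] are zero
print(model[1].weight.grad[[0, 1]].abs().sum())  # tensor(0., device='cuda:0')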

How do I fix this?
Thanks in advance!

Some optimizers, such as Adam, track running stats for each parameter and will keep updating a parameter even if its current gradient is zero, since the momentum accumulated in earlier steps is still non-zero.
You could either recreate the optimizer for each task, or try setting the .grad attributes of the unwanted parameters to None (double-check the behavior of this approach).
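
A minimal sketch of the first option, reusing the variables from the question: constructing a fresh optimizer at each task boundary discards Adam's per-parameter running stats, so output rows whose gradients stay zero also receive zero updates.

# sketch: a fresh optimizer per task discards the Adam momentum
# accumulated during earlier tasks
for task_no in range(4):
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
    conditions = torch.BoolTensor([l in classes_in_task[task_no] for l in labels]).to('cuda')

    for epoch in range(2):
        optimizer.zero_grad()
        y = model(inputs)[conditions][:, classes_in_task[task_no]]
        l = labels[conditions]
        loss = torch.nn.CrossEntropyLoss()(y, l - task_no * no_class_per_task)
        loss.backward()
        optimizer.step()  # rows with zero gradient and zero momentum stay untouched

Note that in this model the output rows for different tasks live in the same weight tensor, so setting that parameter's .grad to None would also skip the update for the current task's rows; the per-task optimizer is the simpler route here.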
