I am building a task-incremental continual-learning model. In each task, the loss is computed only from the subset of output nodes assigned to that task.
Here is a minimal example with 7 classes, [0, 1, 2, 3, 4, 5, 6], split across 4 tasks:
import torch
import copy

classes_in_task = {
    0: [0, 1],
    1: [2, 3],
    2: [4, 5],
    3: [6],
}
no_class_per_task = 2

labels = torch.randint(high=7, size=(15,)).to('cuda')
inputs = torch.randn((15, 7)).to('cuda')

model = torch.nn.Sequential(
    torch.nn.Linear(7, 8),
    torch.nn.Linear(8, 7)).to('cuda')
mc = copy.deepcopy(model)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

def equal_(model, mc):
    # print, for every parameter that changed, which entries still equal the snapshot
    for (n, p), (_, pc) in zip(model.named_parameters(), mc.named_parameters()):
        if not torch.all(p.eq(pc)):
            print(n, "\n", p.eq(pc), sep='\t')

for task_no in range(4):
    conditions = torch.tensor([l in classes_in_task[task_no] for l in labels],
                              dtype=torch.bool, device='cuda')
    print(task_no)
    for epoch in range(2):
        optimizer.zero_grad()
        # keep only the samples and output columns of the current task
        y = model(inputs)[conditions][:, classes_in_task[task_no]]
        l = labels[conditions]
        # remap labels into [0, no_class_per_task)
        loss = torch.nn.CrossEntropyLoss()(y, l - task_no * no_class_per_task)
        loss.backward()
        optimizer.step()
    equal_(model, mc)
    mc = copy.deepcopy(model)
For task 0 I get the expected behaviour: in the output layer, only the weights of the two rows corresponding to classes [0, 1] change (in the dumps below, False means the entry changed since the snapshot, True means it is unchanged):
0.weight
tensor([[False, False, False, False, False, False, False],
[False, False, False, False, False, False, False],
[False, False, False, False, False, False, False],
[False, False, False, False, False, False, False],
[False, False, False, False, False, False, False],
[False, False, False, False, False, False, False],
[False, False, False, False, False, False, False],
[False, False, False, False, False, False, False]], device='cuda:0')
0.bias
tensor([False, False, False, False, False, False, False, False],
device='cuda:0')
1.weight
tensor([[False, False, False, False, False, False, False, False],
[False, False, False, False, False, False, False, False],
[ True, True, True, True, True, True, True, True],
[ True, True, True, True, True, True, True, True],
[ True, True, True, True, True, True, True, True],
[ True, True, True, True, True, True, True, True],
[ True, True, True, True, True, True, True, True]],
device='cuda:0')
1.bias
tensor([False, False, True, True, True, True, True], device='cuda:0')
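For reference, this expected behaviour is easy to reproduce in isolation: with a stateless optimizer such as plain SGD (a simplified sketch on CPU, not my actual setup), slicing the logits really does confine the update to the selected output rows:

```python
# Sanity check (CPU, plain SGD instead of Adam): slicing the logits to the
# current task's columns gives zero gradient for every other output row,
# so a stateless optimizer leaves those rows untouched.
import torch

torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Linear(7, 8), torch.nn.Linear(8, 7))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

inputs = torch.randn(4, 7)
labels = torch.tensor([0, 1, 0, 1])      # task-0 labels in [0, 1]
before = model[1].weight.detach().clone()

opt.zero_grad()
y = model(inputs)[:, [0, 1]]             # keep only task-0 logits
torch.nn.CrossEntropyLoss()(y, labels).backward()
opt.step()

changed = (model[1].weight != before).any(dim=1)
print(changed)  # only rows 0 and 1 of the output layer changed
```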
But for the next task, the rows corresponding to the previous task's classes also change, even though they should stay static; only the rows for the current task's classes should vary:
0.weight
tensor([[False, False, False, False, False, False, False],
[False, False, False, False, False, False, False],
[False, False, False, False, False, False, False],
[False, False, False, False, False, False, False],
[False, False, False, False, False, False, False],
[False, False, False, False, False, False, False],
[False, False, False, False, False, False, False],
[False, False, False, False, False, False, False]], device='cuda:0')
0.bias
tensor([False, False, False, False, False, False, False, False],
device='cuda:0')
1.weight
tensor([[False, False, False, False, False, False, False, False],
[False, False, False, False, False, False, False, False],
[False, False, False, False, False, False, False, False],
[False, False, False, False, False, False, False, False],
[ True, True, True, True, True, True, True, True],
[ True, True, True, True, True, True, True, True],
[ True, True, True, True, True, True, True, True]],
device='cuda:0')
1.bias
tensor([False, False, False, False, True, True, True], device='cuda:0')
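To narrow this down I also ran a stripped-down check (CPU, two tasks, illustrative labels, not my real data): after backward() on the second task's slice, the gradient of the earlier-task rows is exactly zero, yet those rows still move once optimizer.step() is called, so the drift does not seem to come through .grad:

```python
# Stripped-down repro: two sequential "tasks" with Adam. After the task-1
# backward pass, the task-0 rows of the output layer have exactly zero
# gradient, but they still move when optimizer.step() is applied.
import torch

torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Linear(7, 8), torch.nn.Linear(8, 7))
opt = torch.optim.Adam(model.parameters(), lr=0.01)
inputs = torch.randn(6, 7)
labels = torch.tensor([0, 1, 0, 1, 0, 1])  # already remapped per task

# task 0: classes [0, 1]
opt.zero_grad()
y0 = model(inputs)[:, [0, 1]]
torch.nn.CrossEntropyLoss()(y0, labels).backward()
opt.step()

# task 1: classes [2, 3]
before = model[1].weight.detach().clone()
opt.zero_grad()
y1 = model(inputs)[:, [2, 3]]
torch.nn.CrossEntropyLoss()(y1, labels).backward()

grad_rows = model[1].weight.grad.abs().sum(dim=1)
print(grad_rows[:2])   # gradient of the task-0 rows is exactly zero ...
opt.step()
moved = (model[1].weight != before).any(dim=1)
print(moved[:2])       # ... yet those rows still move after opt.step()
```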
How do I fix this?
Thanks in advance!