Probable **bug** with parameter updating using the optim package

Hello,

I have a large neural network containing 100 layers. In my net, I use a network pre-trained for object detection, called YOLO, with some extra layers added on top to perform an additional task. I would like to freeze the actual YOLO weights and learn only the extra layers' parameters. Based on some discussion topics here, I've written the code below:

def release_weight(yolo):
    # Make every parameter in the network trainable again.
    for param in yolo.parameters():
        param.requires_grad = True
    return yolo

def freeze_weight(yolo):
    # Freeze every parameter of the pre-trained YOLO submodules
    # so the optimizer should not update them.
    frozen_modules = [yolo.stage1, yolo.stage4, yolo.stage5, yolo.stage5_1,
                      yolo.parallel1, yolo.parallel2, yolo.stage7, yolo.final, yolo.d]
    for module in frozen_modules:
        for param in module.parameters():
            param.requires_grad = False
    return yolo

yolo = YoloV2(model_path, 425)
yolo = release_weight(yolo)
yolo = freeze_weight(yolo)

Then I selected the parameters with the requires_grad = True flag. Here is my code:

import itertools
from torch import optim

# Keep only the parameters that still require gradients (the extra layers).
parameters = itertools.filterfalse(lambda p: not p.requires_grad, yolo.parameters())
optimizer = optim.SGD(parameters, lr=baseLR, weight_decay=0.001, momentum=0.9)
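(As a side note, optim.SGD accepts any iterable of parameters, so an equivalent and slightly more direct way to make the same selection, assuming the same yolo model, is the built-in filter:)

# Equivalent selection of trainable parameters, without the double negative.
parameters = filter(lambda p: p.requires_grad, yolo.parameters())
optimizer = optim.SGD(parameters, lr=baseLR, weight_decay=0.001, momentum=0.9)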

I then checked whether the names of the parameters selected by the code above are the intended ones, and the code below assured me that the selection is correct.

print("Names of training parameters")
    for name, param in yolo.named_parameters():
        if param.requires_grad:
            print(name)

But when I started training, I found that the weights of the pre-trained part had apparently changed: after running some epochs, I compared the outputs of the frozen parts and they were different. :frowning:

So could you please help me find the source of the problem? I explicitly determine which parameters should be trained and which ones should not!
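For reference, here is a rough sketch of how I could snapshot the frozen parameters before training and compare them afterwards, to check whether the parameter tensors themselves change (as opposed to just the outputs of the frozen part):

import torch

# Snapshot the frozen parameters before training (clone so later updates don't touch the copies).
frozen_before = {name: param.detach().clone()
                 for name, param in yolo.named_parameters()
                 if not param.requires_grad}

# ... train for a few epochs ...

# Any mismatch here means a frozen parameter tensor was actually modified.
for name, param in yolo.named_parameters():
    if not param.requires_grad and not torch.equal(frozen_before[name], param.detach()):
        print("Changed:", name)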


I have the same problem. I'm not sure why the other parameters are being updated; it's really weird.

Really? @smth, it may be a really important bug :frowning:. Do you have any idea how this is possible?

Yes, I think it comes from the BN layers: even with requires_grad = False, BatchNorm layers still update their running statistics (the running_mean / running_var buffers) during every forward pass while the model is in training mode, so the output of the "frozen" part can change even though the optimizer never touches its weights.
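If that is the cause, a common workaround (just a sketch, reusing the submodule names from the code above) is to put the BatchNorm layers of the frozen submodules into eval mode before training, so their running statistics stop updating:

import torch.nn as nn

def freeze_bn_stats(module):
    # Put every BatchNorm layer inside this module into eval mode,
    # so running_mean / running_var are no longer updated in the forward pass.
    for m in module.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.eval()

# Call this after model.train(), e.g. at the start of every epoch.
yolo.train()
for sub in [yolo.stage1, yolo.stage4, yolo.stage5, yolo.stage5_1,
            yolo.parallel1, yolo.parallel2, yolo.stage7, yolo.final, yolo.d]:
    freeze_bn_stats(sub)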