Why do auxiliary logits have to be set to False in train mode?

rajasekhar · March 23, 2019, 2:39pm

I’m trying to train a classifier on 15k images in five categories using the GoogLeNet architecture.

I followed the fine-tuning tutorial (but with pretrained=False, to train from scratch).

But training only works if I set aux_logits to False:

model.aux_logits = False

Can someone explain why I have to do this for training?

ptrblck · March 24, 2019, 11:19am

Do you get an error if you leave aux_logits=True?
As far as I remember they were in fact only used during training in the original paper.

rajasekhar · March 24, 2019, 3:03pm

Hi, thanks. I’m following the pytorch/examples/imagenet script for training. I also saw your similar comments after posting this topic.

When I set it to True and changed the loss line to

loss = criterion(output[0], target)

then every other place in the script that uses output gives a dimension error.

If I don’t use aux_logits=False, will that affect the val_acc?

ptrblck · March 24, 2019, 5:00pm

Could you post the dimension error?
Since the Inception model is quite deep, the auxiliary loss was used to stabilize the training.
If you are training from scratch, using the aux_loss might help.

rajasekhar · March 25, 2019, 9:27am

Thanks. I used this approach from the PyTorch tutorial on fine-tuning Inception and modified examples/imagenet/train.py:

outputs, aux_outputs = model(inputs)
loss1 = criterion(outputs, target)
loss2 = criterion(aux_outputs, target)
loss = loss1 + 0.4*loss2

Now the error is:

Traceback (most recent call last):
  File "imagenet.py", line 406, in <module>
    main()
  File "imagenet.py", line 113, in main
    main_worker(args.gpu, ngpus_per_node, args)
  File "imagenet.py", line 239, in main_worker
    train(train_loader, model, criterion, optimizer, epoch, args)
  File "imagenet.py", line 279, in train
    output, aux_outputs = model(input)
ValueError: too many values to unpack (expected 2)

ptrblck · March 25, 2019, 12:46pm

Is your model in train() mode and did you leave aux_logits=True?

rajasekhar · March 25, 2019, 12:59pm

Hi. Yes, it is in train() mode and I left aux_logits=True.

The imagenet-based script iterates over all batches for one epoch, and then, before the second epoch, it gave me that error. It switches to eval() after the first epoch to report the accuracy for that epoch, right?

Below is a snippet of the main_worker function.

def main_worker(gpu, ngpus_per_node, args):
    .....
    .....
    .....
    for epoch in range(args.start_epoch, args.epochs):
        if args.distributed:
            train_sampler.set_epoch(epoch)
        adjust_learning_rate(optimizer, epoch, args)

        # train for one epoch
        train(train_loader, model, criterion, optimizer, epoch, args)

        # evaluate on validation set
        acc1 = validate(val_loader, model, criterion, args)

        # remember best acc@1 and save checkpoint
        is_best = acc1 > best_acc1
        best_acc1 = max(acc1, best_acc1)

        if not args.multiprocessing_distributed or (args.multiprocessing_distributed
                and args.rank % ngpus_per_node == 0):
            save_checkpoint({
                'epoch': epoch + 1,
                'arch': args.arch,
                'state_dict': model.state_dict(),
                'best_acc1': best_acc1,
                'optimizer' : optimizer.state_dict(),
            }, is_best)

ptrblck · March 25, 2019, 1:05pm

It seems that the model might still be in eval() after the first epoch.
The aux_logits will only be returned in train() mode, so make sure to activate it before the next epoch.
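
One way to make the same forward call work in both modes is to normalize whatever the model returns before computing the loss. The sketch below is only an illustration: `split_outputs` and its `main_index` parameter are hypothetical names, not part of torchvision, and the position of the main logits in the returned tuple depends on the model and the torchvision version, so check what your model actually returns.

```python
def split_outputs(out, main_index=0):
    """Split a model's return value into (main_output, list_of_aux_outputs).

    Hypothetical helper: `out` is either a single object (eval mode, or
    aux_logits=False) or a tuple/namedtuple that also carries aux outputs.
    """
    if isinstance(out, tuple):
        items = list(out)
        main = items.pop(main_index)  # remove the main output, keep the rest
        return main, items
    return out, []


# Stand-in strings instead of tensors, just to show the shape of the results.
print(split_outputs(("main", "aux"), main_index=0))
print(split_outputs(("aux1", "aux2", "main"), main_index=-1))
print(split_outputs("main"))
```

With something like this, validate() can keep using only the main output, while train() decides what to do with the auxiliary list.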

rajasekhar · March 25, 2019, 1:12pm

No, it didn’t go to eval(). It had just finished train(), and the error was thrown before it got to eval().
I only changed the code in def train() and left def validate() as it is. Am I doing something wrong?

imagenet script organization:

def main()
def main_worker()
def train()
   outputs, aux_outputs = model(inputs)
   loss1 = criterion(outputs, target)
   loss2 = criterion(aux_outputs, target)
   loss = loss1 + 0.4*loss2
def validate()
   outputs = model(inputs)
   loss = criterion(outputs, target)
def adjust_learning_rate()
def accuracy()

ptrblck · March 25, 2019, 1:18pm

Try to set the desired mode specifically in both functions:

def train():
    model.train()
    ...

def validate():
    model.eval()
    ...

rajasekhar · March 25, 2019, 1:21pm

Yeah, I set the desired mode in both functions and ran the command below:

python imagenet.py -a googlenet --epochs 30 --batch-size 96 --gpu 1 data/

ptrblck · March 25, 2019, 1:23pm

And you still get the ValueError?
Could you create a gist so that I could have a look at the complete code?

rajasekhar · March 25, 2019, 1:31pm

Yes, I still get the error. ;/ What usually causes this error?
ValueError: too many values to unpack (expected 2).

Does it mean that in outputs, aux_outputs = model(inputs), the model returns a different number of values than the 2 expected?

gist: https://gist.github.com/rajasekharponakala/80514484ea7a38dc444cdc244dc9f950
(same but just the loss modification for googlenet, script from pytorch/examples/imagenet/main.py)
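
On the question of what this ValueError means: Python raises it whenever the right-hand side yields more values than there are names on the left. The snippet below reproduces it with a hypothetical `fake_model` that returns three values, standing in for a network with two auxiliary outputs; it is not the actual model code.

```python
def fake_model(inputs):
    # Hypothetical stand-in: returns three values, like a network
    # with two auxiliary classifiers plus the main output.
    return "aux1", "aux2", "main"


try:
    outputs, aux_outputs = fake_model(None)  # two names, three values
except ValueError as err:
    message = str(err)
    print(message)

# Using three names on the left matches the three returned values.
aux1, aux2, outputs = fake_model(None)
```

So the number in "expected 2" counts the names on the left of the assignment, not the arguments passed to the model.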

rajasekhar · March 25, 2019, 3:18pm

There must be a mismatch between the number of values on the two sides of this assignment. Not sure. ;/

outputs, aux_outputs = model(inputs)

rajasekhar · March 25, 2019, 8:30pm

For GoogLeNet, there are 2 aux branches, so we have to do it this way:

aux1, aux2, outputs = model(inputs)
loss1 = criterion(outputs, target)
loss2 = criterion(aux1, target)
loss3 = criterion(aux2, target)
loss = loss1 + 0.3*(loss2+loss3)

Inception v3 has only one aux branch:

outputs, aux_outputs = model(inputs)
loss1 = criterion(outputs, target)
loss2 = criterion(aux_outputs, target)
loss = loss1 + 0.4*loss2

Now it’s working! Thanks for the follow-up.
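
The two loss formulas above follow the same pattern: the main loss plus a weighted sum of the auxiliary losses. A small sketch of that pattern, where `total_loss` is a hypothetical helper and plain floats (made-up values) stand in for the loss tensors returned by the criterion:

```python
def total_loss(main_loss, aux_losses, aux_weight):
    """Main loss plus `aux_weight` times the sum of the auxiliary losses.

    Works on plain numbers here; the same arithmetic applies to scalar
    loss tensors, since only + and * are used.
    """
    combined = main_loss
    for aux in aux_losses:
        combined = combined + aux_weight * aux
    return combined


# GoogLeNet-style: two auxiliary losses weighted by 0.3 (values are made up).
googlenet_loss = total_loss(1.0, [2.0, 3.0], aux_weight=0.3)
# Inception v3-style: one auxiliary loss weighted by 0.4.
inception_loss = total_loss(1.0, [2.0], aux_weight=0.4)
print(googlenet_loss, inception_loss)
```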

ptrblck · March 26, 2019, 12:38am

Thanks for the information and sorry for missing that you are using GoogLeNet and not Inception_v3.
I’m glad you figured it out!

rajasekhar · March 26, 2019, 5:03am

Yeah, I hadn’t noticed either. Someone in a GitHub PyTorch issue on the same topic figured it out.

rajasekhar · April 3, 2019, 3:59pm

@ptrblck
Hi!

This time there is a little confusion about the fc layer. I followed the fine-tuning tutorial (I just want to run with aux_logits=True). For Inception, as there is only one aux_logit, the snippet below works fine:

    elif model_name == "inception":
        """ Inception v3 
        Be careful, expects (299,299) sized images and has auxiliary output
        """
        model_ft = models.inception_v3(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        # Handle the auxiliary net
        num_ftrs = model_ft.AuxLogits.fc.in_features
        model_ft.AuxLogits.fc = nn.Linear(num_ftrs, num_classes)
        # Handle the primary net
        num_ftrs = model_ft.fc.in_features
        model_ft.fc = nn.Linear(num_ftrs,num_classes)
        input_size = 299

The corresponding inception_v3 net file snippet:

if self.training and self.aux_logits:
    aux = self.AuxLogits(x)

and the fc snippet:

self.fc = nn.Linear(768, num_classes)

GoogLeNet, on the other hand, has two auxiliary outputs. The net file snippet has:

if self.training and self.aux_logits:
    aux1 = self.aux1(x)
.....
if self.training and self.aux_logits:
    aux2 = self.aux2(x)

and the fc snippets:

self.fc1 = nn.Linear(2048, 1024)
self.fc2 = nn.Linear(1024, num_classes)

Now, my confusion is about handling these fc layers in the fine-tuning script. How should I adapt them?

num_ftrs = model_ft.(aux1/aux2).(fc1/fc2).in_features
model_ft.(aux1/aux2).(fc1/fc2) = nn.Linear(num_ftrs, num_classes)

any thoughts?
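
A possible approach, sketched under assumptions rather than taken from the tutorial: in the snippets quoted above, each GoogLeNet auxiliary branch ends in fc2, while fc1 is an internal layer with a fixed output size of 1024, so only fc2 of aux1 and aux2 plus the main classifier fc would need replacing. `reset_googlenet_heads` is a hypothetical helper name; it assumes the model exposes `aux1.fc2`, `aux2.fc2` and `fc` as `nn.Linear` layers, as torchvision's GoogLeNet does.

```python
import torch.nn as nn


def reset_googlenet_heads(model, num_classes):
    """Replace the final Linear layer of each classifier head in place.

    Assumes `model.fc` is the main classifier and that `model.aux1` and
    `model.aux2` (when present) each end in an `fc2` Linear layer.
    """
    for aux_name in ("aux1", "aux2"):
        aux = getattr(model, aux_name, None)
        if aux is not None:  # aux branches are absent when aux_logits=False
            aux.fc2 = nn.Linear(aux.fc2.in_features, num_classes)
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model
```

In the fine-tuning script this would be called on the model right after it is created, in the same place where the Inception branch replaces AuxLogits.fc and fc. When training from scratch, passing num_classes to the model constructor should make the replacement unnecessary.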