I’m trying to train a classifier on 15k images across five categories using the GoogLeNet architecture.
I followed the fine-tuning tutorial (but set pretrained=False to train from scratch).
However, training only works if I set aux_logits to False:
```
model.aux_logits = False
```
Can someone explain why I have to do this for training?
Could you post the dimension error?
Since the Inception model is quite deep, the auxiliary loss was used to stabilize the training.
If you are training from scratch, using the aux_loss might help.
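A minimal sketch of how the auxiliary losses are usually folded into the training loss, assuming the model returns a tuple of (main output, auxiliary outputs...) in train() mode. The helper name `combined_loss` and the 0.3 weight are assumptions for illustration (the fine-tuning tutorial uses a different weight for Inception v3), so treat the weight as a tunable value:

```python
def combined_loss(outputs, target, criterion, aux_weight=0.3):
    """Sum the main loss and the down-weighted auxiliary losses.

    `outputs` is assumed to be the training-mode tuple returned by the model:
    the main output first, followed by one or more auxiliary outputs.
    `aux_weight` is an assumed value, not something fixed by the library.
    """
    main, *aux = outputs
    loss = criterion(main, target)
    for aux_output in aux:
        # Each auxiliary head is trained against the same target.
        loss = loss + aux_weight * criterion(aux_output, target)
    return loss
```

In a training loop this would be called as `loss = combined_loss(model(input), target, criterion)` before `loss.backward()`. Because it unpacks however many auxiliary outputs are present, the same helper covers one auxiliary head or two.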
Traceback (most recent call last):
  File "imagenet.py", line 406, in <module>
    main()
  File "imagenet.py", line 113, in main
    main_worker(args.gpu, ngpus_per_node, args)
  File "imagenet.py", line 239, in main_worker
    train(train_loader, model, criterion, optimizer, epoch, args)
  File "imagenet.py", line 279, in train
    output, aux_outputs = model(input)
ValueError: too many values to unpack (expected 2)
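The message itself points at the likely cause: the call returns more than two values. As far as I know, torchvision's GoogLeNet in train() mode with aux_logits=True returns three values (the main logits plus two auxiliary outputs), so unpacking into two names fails. The sketch below reproduces this with a hypothetical stand-in (`fake_model` and `FakeGoogLeNetOutputs` are made up for illustration, and strings stand in for tensors):

```python
from collections import namedtuple

# Hypothetical stand-in for the training-mode return value:
# one main output plus two auxiliary outputs.
FakeGoogLeNetOutputs = namedtuple(
    "FakeGoogLeNetOutputs", ["logits", "aux_logits2", "aux_logits1"]
)

def fake_model(batch):
    # Only the number of returned values matters for this illustration.
    return FakeGoogLeNetOutputs("main", "aux2", "aux1")

try:
    output, aux_outputs = fake_model(None)  # two names, three values
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Unpacking into three names matches the number of returned values.
output, aux2, aux1 = fake_model(None)
```

With a real model, the equivalent change in train() would be `output, aux2, aux1 = model(input)`, with both auxiliary outputs then contributing to the loss.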
Hi. Yes, the model is in train() mode and I left aux_logits=True.
The ImageNet-based script iterates over all batches for one epoch, and then, before the second epoch, it gave me that error. It switches to eval() after the first epoch to report the accuracy, right?
Below is the snippet from the main_worker function.
def main_worker(gpu, ngpus_per_node, args):
    .....
    .....
    .....
    for epoch in range(args.start_epoch, args.epochs):
        if args.distributed:
            train_sampler.set_epoch(epoch)
        adjust_learning_rate(optimizer, epoch, args)
        # train for one epoch
        train(train_loader, model, criterion, optimizer, epoch, args)
        # evaluate on validation set
        acc1 = validate(val_loader, model, criterion, args)
        # remember best acc@1 and save checkpoint
        is_best = acc1 > best_acc1
        best_acc1 = max(acc1, best_acc1)
        if not args.multiprocessing_distributed or (args.multiprocessing_distributed
                and args.rank % ngpus_per_node == 0):
            save_checkpoint({
                'epoch': epoch + 1,
                'arch': args.arch,
                'state_dict': model.state_dict(),
                'best_acc1': best_acc1,
                'optimizer' : optimizer.state_dict(),
            }, is_best)
It seems that the model might still be in eval() after the first epoch.
The aux_logits will only be returned in train() mode, so make sure to activate it before the next epoch.
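Since the return value differs between train() and eval(), one option is to normalise it in a small helper so the same calling code works in both modes. This is only a sketch: `split_outputs` and `TrainOutputs` are hypothetical names, and it assumes the model returns a tuple (or namedtuple) in train() mode and a single output in eval() mode:

```python
from collections import namedtuple

def split_outputs(outputs):
    """Return (main_output, list_of_aux_outputs) for either mode."""
    if isinstance(outputs, tuple):  # train() mode; namedtuples are tuples too
        main, *aux = outputs
        return main, list(aux)
    return outputs, []              # eval() mode: a single output, no aux

# Hypothetical stand-ins: strings play the role of tensors.
TrainOutputs = namedtuple("TrainOutputs", ["logits", "aux_logits2", "aux_logits1"])
train_main, train_aux = split_outputs(TrainOutputs("main", "aux2", "aux1"))
eval_main, eval_aux = split_outputs("main")
```

In validate() the auxiliary list is simply empty, so accuracy can be computed from the main output alone.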
No, it didn’t go to eval(); the error is thrown once train() is done, before it gets to eval().
I only changed the parameters in def train() and left def validate() as it is. Am I incorrect?
This time, there is a little confusion about the fc layer. I followed the fine-tuning tutorial (I just want to run with aux_logits=True). For Inception, since there is only one auxiliary output, the snippet below works fine.
elif model_name == "inception":
    """ Inception v3
    Be careful, expects (299,299) sized images and has auxiliary output
    """
    model_ft = models.inception_v3(pretrained=use_pretrained)
    set_parameter_requires_grad(model_ft, feature_extract)
    # Handle the auxiliary net
    num_ftrs = model_ft.AuxLogits.fc.in_features
    model_ft.AuxLogits.fc = nn.Linear(num_ftrs, num_classes)
    # Handle the primary net
    num_ftrs = model_ft.fc.in_features
    model_ft.fc = nn.Linear(num_ftrs, num_classes)
    input_size = 299
The corresponding inception_v3 source snippet:
if self.training and self.aux_logits:
    aux = self.AuxLogits(x)
and the fc snippet:
self.fc = nn.Linear(768, num_classes)
Whereas GoogLeNet has two auxiliary outputs, and its source file has:
if self.training and self.aux_logits:
    aux1 = self.aux1(x)
.....
if self.training and self.aux_logits:
    aux2 = self.aux2(x)