Testing the fine-tuned resnet18 model

andyhx · March 28, 2017, 12:55pm

Hi, I am trying to fine-tune the ResNet model on my own data. I followed the main.py example in the imagenet folder and modified the fc layer as shown below. I am only fine-tuning ResNet, not AlexNet.

def main():
    global args, best_prec1
    args = parser.parse_args()

    # create model
    if args.pretrained:
        print("=> using pre-trained model '{}'".format(args.arch))
        model = models.__dict__[args.arch](pretrained=True)
        # modify the fc layer
        model.fc = nn.Linear(512, 100)
    else:
        print("=> creating model '{}'".format(args.arch))
        model = models.__dict__[args.arch]()

    if args.arch.startswith('alexnet') or args.arch.startswith('vgg'):
        model.features = torch.nn.DataParallel(model.features)
        model.cuda()
    else:
        model = torch.nn.DataParallel(model).cuda()

    # optionally resume from a checkpoint
    if args.resume:
        if os.path.isfile(args.resume):
            print("=> loading checkpoint '{}'".format(args.resume))
            checkpoint = torch.load(args.resume)
            args.start_epoch = checkpoint['epoch']
            best_prec1 = checkpoint['best_prec1']
            model.load_state_dict(checkpoint['state_dict'])
            print("=> loaded checkpoint '{}' (epoch {})"
                  .format(args.resume, checkpoint['epoch']))
        else:
            print("=> no checkpoint found at '{}'".format(args.resume))

    cudnn.benchmark = True

The rest of the code is the same as the imagenet main.py:
https://github.com/pytorch/examples/blob/master/imagenet/main.py
When I test the model I trained, the fc layer still has 1000 outputs.
I have struggled with this for a long time, but it is still the same, and I don't know why.

(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
(relu): ReLU (inplace)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
(downsample): Sequential (
  (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
  (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
)

)
(1): BasicBlock (
(conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
(relu): ReLU (inplace)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
)
)
(avgpool): AvgPool2d (
)
(fc): Linear (512 -> 1000)
)
)

Here is my testing code:

import torch
import torch.nn as nn
#from __future__ import print_function
import argparse
from PIL import Image
import torchvision.models as models
import skimage.io
from torch.autograd import Variable as V
from torch.nn import functional as f
from torchvision import transforms as trn

# define image transformation
centre_crop = trn.Compose([
        trn.ToPILImage(),
        trn.Scale(256),
        trn.CenterCrop(224),
        trn.ToTensor(),
        trn.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
filename=r'2780-0-20161221_0001.jpg'
img = skimage.io.imread(filename)
x = V(centre_crop(img).unsqueeze(0), volatile=True)
model = models.__dict__['resnet18']()
model = torch.nn.DataParallel(model).cuda()
checkpoint = torch.load('model_best1.pth.tar')
model.load_state_dict(checkpoint['state_dict'])
best_prec1 = checkpoint['best_prec1']
logit = model(x)
print(logit)
print(len(logit))
h_x = f.softmax(logit).data.squeeze()

Can anyone tell me where I went wrong, and how to extract the features from the last average pooling layer? Thank you so much!

andyhx · March 28, 2017, 1:05pm

I also tried another version of the testing code. Do I have to modify the ResNet model again when testing the saved model?
import torch
import torch.nn as nn
#from torchvision import models
#from __future__ import print_function
import argparse
#import torch
#from torch.autograd import Variable
from PIL import Image
#from torchvision.transforms import ToTensor
import torchvision.models as models
import skimage.io
from torch.autograd import Variable as V
from torch.nn import functional as f
from torchvision import transforms as trn

# define image transformation
centre_crop = trn.Compose([
        trn.ToPILImage(),
        trn.Scale(256),
        trn.CenterCrop(224),
        trn.ToTensor(),
        trn.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
filename=r'2780-0-20161221_0001.jpg'
img = skimage.io.imread(filename)
x = V(centre_crop(img).unsqueeze(0), volatile=True)

model = models.__dict__['resnet18']()
model.fc=nn.Linear(512,100)
checkpoint = torch.load('model_best1.pth.tar')
best_prec1 = checkpoint['best_prec1']
model.load_state_dict(checkpoint['state_dict'])
model = torch.nn.DataParallel(model).cuda()
logit = model(x)
print(logit)
print(len(logit))
h_x = f.softmax(logit).data.squeeze()

An error occurred, and I have no idea why.

andyhx · March 28, 2017, 11:24pm

Help wanted, many thanks. Do I have to add freezing code like this during training

for param in model.parameters():
    param.requires_grad = False

and update the SGD optimizer?

fmassa · March 29, 2017, 1:27am

This was a bug in PyTorch that was fixed in https://github.com/pytorch/pytorch/pull/982, which was merged into master 16 days ago. Did you try updating your PyTorch installation?

andyhx · March 29, 2017, 1:51am

My version is torch-0.1.10.post2-cp27-none-linux_x86_64.whl, which I downloaded 14 days ago. Is there any problem with this version? Can I update the wheel with an update command? I believe there is something wrong with my code, but I can't figure it out.

fmassa · March 29, 2017, 2:00am

Can you try this minimal example in your interpreter and see if it changes the layer? In my PyTorch installation it works without problems.

import torch.nn as nn

class M(nn.Module):
    def __init__(self):
        super(M, self).__init__()
        self.m = nn.Linear(2,2)
    def forward(self, x):
        return self.m(x)

m = M()
print(m)
# should be 
# M (
#   (m): Linear (2 -> 2)
# )

m.m = nn.Linear(3, 3)
print(m)
# should be 
# M (
#   (m): Linear (3 -> 3)
# )

andyhx · March 29, 2017, 2:06am

I get the same output as you.

fmassa · March 29, 2017, 2:13am

Ok, now try doing the same thing but with resnet

from torchvision import models
import torch.nn as nn

m = models.resnet18()
m.fc = nn.Linear(512, 10)
print(m) # see if the last layer was modified

If the last layer is correctly modified, then there is an inconsistency with what you have written in the first message, and we might be missing information to help you further debug your problem

andyhx · March 29, 2017, 2:17am

I tried retraining with the freeze method, following the example, but I didn't succeed. My code is as follows:
def main():
    global args, best_prec1
    args = parser.parse_args()

    # create model
    if args.pretrained:
        print("=> using pre-trained model '{}'".format(args.arch))
        model = models.__dict__[args.arch](pretrained=True)
       #xxxxxxxxxxxxxx to modify  resnet 18 the fc layer xxxxxxxxxxxxxx
        model.fc=nn.Linear(512,100)
    else:
        print("=> creating model '{}'".format(args.arch))
        model = models.__dict__[args.arch]()

    #for param in model.parameters():
        #param.requires_grad = False
    if args.arch.startswith('alexnet') or args.arch.startswith('vgg'):
        model.features = torch.nn.DataParallel(model.features)
        model.cuda()
    else:
        model = torch.nn.DataParallel(model).cuda()

    # optionally resume from a checkpoint
    if args.resume:
        if os.path.isfile(args.resume):
            print("=> loading checkpoint '{}'".format(args.resume))
            checkpoint = torch.load(args.resume)
            args.start_epoch = checkpoint['epoch']
            best_prec1 = checkpoint['best_prec1']
            model.load_state_dict(checkpoint['state_dict'])
            print("=> loaded checkpoint '{}' (epoch {})"
                  .format(args.resume, checkpoint['epoch']))
        else:
            print("=> no checkpoint found at '{}'".format(args.resume))
#xxxxxxxxxxxxxx freeze  update xxxxxxxxxxxxxx
    for param in model.parameters():
        param.requires_grad = False
    
    # Replace the last fully-connected layer
    # Parameters of newly constructed modules have requires_grad=True by default
    #model.fc = torch.nn.Linear(512, 3)
    print(model)

    cudnn.benchmark = True

    # Data loading code
    traindir = os.path.join(args.data, 'train')
    valdir = os.path.join(args.data, 'val')
    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])

    train_loader = torch.utils.data.DataLoader(
        datasets.ImageFolder(traindir, transforms.Compose([
            transforms.RandomSizedCrop(224),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            normalize,
        ])),
        batch_size=args.batch_size, shuffle=True,
        num_workers=args.workers, pin_memory=True)

    val_loader = torch.utils.data.DataLoader(
        datasets.ImageFolder(valdir, transforms.Compose([
            transforms.Scale(256),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            normalize,
        ])),
        batch_size=args.batch_size, shuffle=False,
        num_workers=args.workers, pin_memory=True)

    # define loss function (criterion) and optimizer
    criterion = nn.CrossEntropyLoss().cuda()
   #xxxxxxxxxxxxxx try to  make sgd only changing fc layer xxxxxxxxxxxxxx
    ignored_params = list(map(id, model.module.fc.parameters()))
    base_params = filter(lambda p: id(p) not in ignored_params,
                     model.module.parameters())


    optimizer = torch.optim.SGD([
            {'params': base_params},
            {'params': model.module.fc.parameters()
           }], args.lr,momentum=args.momentum, weight_decay=args.weight_decay)
    # optimizer = torch.optim.SGD(model.module.fc.parameters(), args.lr,momentum=args.momentum,                          weight_decay=args.weight_decay)

This raised an error:
ValueError: optimizing a parameter that doesn't require gradients
What did I miss?

fmassa · March 29, 2017, 2:21am

You are freezing all the parameters of your network, so the optimizer is complaining that you don’t have parameters to optimize.
If you only want to train the newly added fully-connected layer, you should do instead

for param in model.parameters():
    param.requires_grad = False
for param in model.fc.parameters():
    param.requires_grad = True
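
A minimal sketch of the freezing pattern above, carried through to the optimizer. It assumes a recent PyTorch release. `TinyNet` is a hypothetical stand-in for resnet18 (only the `fc` attribute name matches the real model), and the learning rate and momentum values are placeholders. If the model is wrapped in DataParallel, as in the training script, the layer would be reached as `model.module.fc` instead.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for resnet18: a small backbone plus an `fc` head."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(8, 4)
        self.fc = nn.Linear(4, 1000)

    def forward(self, x):
        return self.fc(torch.relu(self.backbone(x)))


model = TinyNet()

# Freeze every existing parameter.
for param in model.parameters():
    param.requires_grad = False

# Replace the head; parameters of a newly constructed module
# have requires_grad=True by default.
model.fc = nn.Linear(4, 100)

# Hand the optimizer only the parameters that still require gradients.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.01, momentum=0.9)

# One training step on random data: the frozen backbone must not change.
backbone_before = model.backbone.weight.clone()
loss = model(torch.randn(2, 8)).sum()
loss.backward()
optimizer.step()
```

Passing only the trainable parameters keeps the optimizer from ever seeing a frozen one, which is the situation the ValueError in this thread points at.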

andyhx · March 29, 2017, 2:21am

Yes, it is modified.
So is something wrong with my testing code, when loading the model?

fmassa · March 29, 2017, 2:25am

In your first testing code, you forgot to modify the fc layer.
In the second one, I’d recommend adding the DataParallel before you load the state dict. Your models were saved using DataParallel, so they need to be wrapped in DataParallel to be properly deserialized.
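
The key mismatch can be seen without any model. A checkpoint saved from a DataParallel-wrapped model stores every key with a `module.` prefix, so it loads into another DataParallel wrapper as-is, while a bare model needs the prefix removed first. The helper below, `strip_module_prefix`, is a hypothetical name and not part of PyTorch. It is a sketch that works on any mapping and uses plain strings in place of tensors.

```python
from collections import OrderedDict


def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading `prefix` removed from each key.

    Hypothetical helper, not part of PyTorch. Keys without the prefix are
    kept unchanged, and the original ordering is preserved.
    """
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        if key.startswith(prefix):
            key = key[len(prefix):]
        cleaned[key] = value
    return cleaned


# Keys as saved from a DataParallel-wrapped model (values stand in for tensors).
saved = OrderedDict([
    ("module.conv1.weight", "w0"),
    ("module.fc.weight", "w1"),
    ("module.fc.bias", "b1"),
])

cleaned = strip_module_prefix(saved)
print(list(cleaned.keys()))
```

With a real checkpoint, the result could then be passed to the bare model, for example `model.load_state_dict(strip_module_prefix(checkpoint['state_dict']))`, with `model.fc` already replaced by the 100-way layer so the shapes match. The alternative is to wrap the model in DataParallel first and load the checkpoint unchanged.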

andyhx · March 29, 2017, 2:49am

That doesn't seem right. How do I use load_state_dict after wrapping the model in DataParallel?
I also tried
model.module.load_state_dict(checkpoint['state_dict'])

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/public/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 311, in load_state_dict
    .format(name))
KeyError: 'unexpected key "module.conv1.weight" in state_dict'

andyhx · March 29, 2017, 3:11am

Thank you, sir. I retrained with the freeze method and got a new model. Loading the model works now and the number of classes is right. Thank you!