Model.save and load giving different results

Igo312 · April 5, 2020, 3:28am

As other similar threads describe, when I try to resume training, the model starts from what looks like a random prediction point. So far I have not found a solution.
Here is what I have tried:

Using different datasets:
- I used MNIST with zero padding of 114, which gives inputs of size (256, 256). After reloading, prediction returns a high accuracy, so I think reloading works fine on the MNIST dataset.
- I also used torch.ones(input_shape) to train and evaluate; the reloaded model outputs the same result as the trained model.
- Using only one spectrum (from my audio dataset), the model predicts well after reloading in a different session.
Training for a while and reloading in the same session: this gives a high accuracy, the same as the model right after training. But in a different session it no longer works.
As for the method of yielding data: I use a Python generator to yield data and then transform it into a Tensor. I also tried torch.utils.data.Dataset and DataLoader. Unfortunately, that does not work either.
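One thing that may be worth checking with a plain Python generator: it can be iterated only once, so if it is created once before the epoch loop, every epoch after the first sees no batches. The sketch below only illustrates that behaviour; `generator` and `count_batches_per_epoch` here are hypothetical stand-ins, not the real data pipeline.

```python
def generator(n_batches):
    """Hypothetical stand-in for the audio generator: yields (x, y) pairs."""
    for i in range(n_batches):
        yield i, i % 2


def count_batches_per_epoch(make_data, epochs, recreate):
    """Return how many batches each epoch actually iterates over."""
    counts = []
    data = make_data()  # created once, before the epoch loop
    for _ in range(epochs):
        if recreate:
            data = make_data()  # fresh generator for this epoch
        counts.append(sum(1 for _ in data))
    return counts


once = count_batches_per_epoch(lambda: generator(4), epochs=3, recreate=False)
fresh = count_batches_per_epoch(lambda: generator(4), epochs=3, recreate=True)
print(once)   # [4, 0, 0]
print(fresh)  # [4, 4, 4]
```

A Dataset wrapped in a DataLoader does not have this problem, because the DataLoader builds a new iterator each time it is looped over.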
According to operation No. 3, it may not be a dataset problem? I thought it might be a problem with AvgPool2d, so I replaced it with AdaptiveAvgPool2d, but it still does not work…
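A minimal way to test the save/load round trip on its own, separate from the data, is sketched below. It assumes a small hypothetical stand-in network (`make_model`) instead of MobileNet1 and an in-memory buffer instead of a file. It does not claim to identify the cause of the problem. It only checks that a reloaded model reproduces the original outputs in eval mode, and that the BatchNorm running statistics are part of the saved state_dict.

```python
import io

import torch
import torch.nn as nn

torch.manual_seed(0)


def make_model():
    # Hypothetical small stand-in for MobileNet1: conv + BatchNorm + pooling + linear.
    return nn.Sequential(
        nn.Conv2d(1, 4, 3, padding=1, bias=False),
        nn.BatchNorm2d(4),
        nn.ReLU(inplace=True),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(4, 3),
    )


model = make_model()

# A few forward passes in train mode so the BatchNorm running statistics move
# away from their initial values.
model.train()
for _ in range(5):
    model(torch.randn(8, 1, 16, 16))

# Save to an in-memory buffer; a file path works the same way.
buffer = io.BytesIO()
torch.save({"state_dict": model.state_dict()}, buffer)
buffer.seek(0)

# Load into a freshly constructed model, as a new session would.
reloaded = make_model()
reloaded.load_state_dict(torch.load(buffer)["state_dict"])

# BatchNorm buffers are stored alongside the learnable weights.
has_running_stats = any("running_mean" in key for key in model.state_dict())

# Compare both models in eval mode on the same input.
x = torch.randn(2, 1, 16, 16)
model.eval()
reloaded.eval()
with torch.no_grad():
    same_in_eval = torch.allclose(model(x), reloaded(x))

print(has_running_stats, same_in_eval)
```

If this check passes for the real model with a fixed input tensor, the weights themselves are being restored, and the difference between sessions is more likely to come from something outside the state_dict, such as the data or label handling.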

Here is my model script (MobileNetV1):

import torch.nn as nn
import torch.nn.functional as F

class MobileNet1(nn.Module):
    def __init__(self,initial_channel,n_class):
        super(MobileNet1, self).__init__()
        self.class_num = n_class

        def conv_bn(inp, oup, stride):
            return nn.Sequential(
                nn.Conv2d(inp, oup, 3, stride, 1, bias=False),
                nn.BatchNorm2d(oup),
                nn.ReLU(inplace=True)
            )

        def conv_dw(inp, oup, stride):
            return nn.Sequential(
                nn.Conv2d(inp, inp, 3, stride, 1, groups=inp, bias=False),
                nn.BatchNorm2d(inp),
                nn.ReLU(inplace=True),

                nn.Conv2d(inp, oup, 1, 1, 0, bias=False),
                nn.BatchNorm2d(oup),
                nn.ReLU(inplace=True),
            )

        self.model = nn.Sequential(
            conv_bn(initial_channel, 32, 2),
            conv_dw(32, 64, 1),
            conv_dw(64, 128, 2),
            conv_dw(128, 128, 1),
            conv_dw(128, 256, 2),
            conv_dw(256, 256, 1),
            conv_dw(256, 512, 2),
            conv_dw(512, 512, 1),
            conv_dw(512, 512, 1),
            conv_dw(512, 512, 1),
            conv_dw(512, 512, 1),
            conv_dw(512, 512, 1),
            conv_dw(512, 1024, 2),
            conv_dw(1024, 1024, 1),
            nn.AvgPool2d(8),  # 8x8 window: matches the 8x8 feature map of a 256x256 input
        ) 
        self.fc = nn.Linear(1024, self.class_num)

    def forward(self, x):
        x = self.model(x)
        x = x.view(-1, 1024)  # flatten (N, 1024, 1, 1) to (N, 1024)
        x = self.fc(x)
        return x
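The fixed AvgPool2d(8) window and the view(-1, 1024) call both rely on the feature map being 8x8 before pooling. The short calculation below is a sketch that uses the standard convolution output-size formula and the strides listed in the model above. `conv_out` and `final_feature_size` are hypothetical helper names introduced only for this illustration.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Output length of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


# Strides of the 3x3 convolutions in self.model, in order.
# The 1x1 pointwise convolutions keep the spatial size, so they are omitted.
strides = [2, 1, 2, 1, 2, 1, 2, 1, 1, 1, 1, 1, 2, 1]


def final_feature_size(input_size):
    """Spatial size of the feature map just before the pooling layer."""
    size = input_size
    for stride in strides:
        size = conv_out(size, stride=stride)
    return size


print(final_feature_size(256))  # 8
print(final_feature_size(224))  # 7
```

With a 256x256 input the map is 8x8, so AvgPool2d(8) reduces it to 1x1. With a 224x224 input the map is 7x7, which is smaller than the 8x8 window, whereas AdaptiveAvgPool2d(1) would accept either size.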

Here is a simplified version of my main code:

import random

import numpy as np
import torch

#-----------------------fix random seed---------------------------------#
args.seed = 5153
print("Random Seed: ", args.seed)
random.seed(args.seed)
np.random.seed(args.seed)
torch.manual_seed(args.seed)
if args.gpus:
    # Sets the seed for generating random numbers on all GPUs.
    torch.cuda.manual_seed(args.seed)
    torch.cuda.manual_seed_all(args.seed)

#---------------------------training-------------------------------------#
# the constructor requires both arguments (their values are not shown here)
model = MobileNet1(initial_channel, n_class)
model.load_state_dict(weight['state_dict'])  # weight: the loaded checkpoint dict (not shown)
for epoch in range(epochs):
    # generator is a python generator; it is exhausted after one pass,
    # so it is created again for every epoch
    data = generator(data_path)
    for x, y in data:
        output = model(x)
        loss = criterion(output, y)

        optimizer.zero_grad()
        # the optimizer steps on every batch, so this division only scales the gradient
        loss /= accumulate_step
        loss.backward()
        optimizer.step()
    scheduler.step()
    save_checkpoint(filepath=args.save,
                    filename='{}-epoch{}-val_loss{:.4f}.pth'.format(
                        args.model_name, epoch, val_loss),
                    state={'epoch': epoch,
                           'state_dict': model.state_dict(),
                           'best_prec1': best_test,
                           'optimizer': optimizer.state_dict()})
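The checkpoint above stores 'epoch', 'state_dict' and 'optimizer', but the resume code only restores the model weights. The sketch below shows one way to restore all three. It assumes a hypothetical small stand-in network (`make_model`), SGD with momentum, and an in-memory buffer in place of a file. It then checks that continuing from the checkpoint gives the same result as training that was never interrupted.

```python
import io

import torch
import torch.nn as nn


def make_model():
    # Hypothetical stand-in for MobileNet1; any nn.Module is handled the same way.
    return nn.Sequential(nn.Linear(4, 8), nn.BatchNorm1d(8), nn.ReLU(), nn.Linear(8, 2))


def train_steps(model, optimizer, batches):
    """Run one optimizer step per batch in train mode."""
    criterion = nn.CrossEntropyLoss()
    model.train()
    for x, y in batches:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()


torch.manual_seed(0)
batches = [(torch.randn(16, 4), torch.randint(0, 2, (16,))) for _ in range(6)]

model = make_model()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
train_steps(model, optimizer, batches[:3])

# Save the same keys as the checkpoint in the post (minus 'best_prec1').
buffer = io.BytesIO()
torch.save({"epoch": 0,
            "state_dict": model.state_dict(),
            "optimizer": optimizer.state_dict()}, buffer)
buffer.seek(0)

# "New session": rebuild model and optimizer, then restore both states.
checkpoint = torch.load(buffer)
resumed = make_model()
resumed.load_state_dict(checkpoint["state_dict"])
resumed_optimizer = torch.optim.SGD(resumed.parameters(), lr=0.1, momentum=0.9)
resumed_optimizer.load_state_dict(checkpoint["optimizer"])
start_epoch = checkpoint["epoch"] + 1

# Continue the original and the resumed run on the same remaining batches.
train_steps(model, optimizer, batches[3:])
train_steps(resumed, resumed_optimizer, batches[3:])

model.eval()
resumed.eval()
with torch.no_grad():
    x = torch.randn(5, 4)
    matches = torch.allclose(model(x), resumed(x))

print(start_epoch, matches)
```

Restoring the optimizer state matters for optimizers that keep per-parameter history, such as momentum buffers. A learning-rate scheduler has its own state_dict and could be saved and restored in the same way.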