Model save and load giving different results

This is the same problem other posts describe: when I try to resume training, the model starts from what looks like random predictions. So far I have not found a solution.
Here's what I have tried:

  • Using different datasets:

    • MNIST with zero padding of 114 on each side (Zerospadding(114)), giving 256x256 inputs. After reloading, prediction accuracy is still high, so I think reloading works fine on MNIST.
    • I also trained and evaluated on torch.ones(input_shape); the reloaded model outputs the same result as the trained model.
    • Using only one spectrum from my audio dataset, the model still predicts well after reloading in a different session.
  • Training for a while and reloading in the same session: the reloaded model gives the same high accuracy as the model right after training. In a different session, however, it no longer works.

  • As for how I yield data: I use a Python generator to yield the data and then convert it to tensors. I also tried torch.utils.data.Dataset and DataLoader, but that did not work either.

  • Based on the third test above (a single spectrum), it may not be a dataset problem. I suspected AvgPool2d, so I replaced it with AdaptiveAvgPool2d, but that did not work either.
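Since everything works within one session but fails across sessions, it may be worth ruling out anything in the data pipeline that changes between Python processes. One common example, assuming the labels are derived from class names (for instance folder names), is building the class-to-index mapping by iterating over a `set`: string hashing is randomized per process (see `PYTHONHASHSEED`), so the iteration order, and with it the label indices, can differ between sessions. A minimal sketch of a deterministic mapping; `build_class_index` and the class names are hypothetical:

```python
def build_class_index(labels):
    """Map each distinct class name to an integer index.

    Sorting makes the mapping identical in every Python process.
    Iterating over set(labels) directly would not guarantee that,
    because str hashing is randomized per process.
    """
    return {name: idx for idx, name in enumerate(sorted(set(labels)))}


# Hypothetical class names, e.g. as read from folder names.
labels = ["speech", "music", "noise", "music", "speech"]
class_to_idx = build_class_index(labels)
print(class_to_idx)  # {'music': 0, 'noise': 1, 'speech': 2}
```

Storing `class_to_idx` in the checkpoint next to the `state_dict` would also let a later session reuse exactly the same mapping.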

Here's my model script (MobileNetV1):

import torch.nn as nn
import torch.nn.functional as F

class MobileNet1(nn.Module):
    def __init__(self,initial_channel,n_class):
        super(MobileNet1, self).__init__()
        self.class_num = n_class

        def conv_bn(inp, oup, stride):
            return nn.Sequential(
                nn.Conv2d(inp, oup, 3, stride, 1, bias=False),
                nn.BatchNorm2d(oup),
                nn.ReLU(inplace=True)
            )

        def conv_dw(inp, oup, stride):
            return nn.Sequential(
                nn.Conv2d(inp, inp, 3, stride, 1, groups=inp, bias=False),
                nn.BatchNorm2d(inp),
                nn.ReLU(inplace=True),

                nn.Conv2d(inp, oup, 1, 1, 0, bias=False),
                nn.BatchNorm2d(oup),
                nn.ReLU(inplace=True),
            )

        self.model = nn.Sequential(
            conv_bn(initial_channel, 32, 2),
            conv_dw(32, 64, 1),
            conv_dw(64, 128, 2),
            conv_dw(128, 128, 1),
            conv_dw(128, 256, 2),
            conv_dw(256, 256, 1),
            conv_dw(256, 512, 2),
            conv_dw(512, 512, 1),
            conv_dw(512, 512, 1),
            conv_dw(512, 512, 1),
            conv_dw(512, 512, 1),
            conv_dw(512, 512, 1),
            conv_dw(512, 1024, 2),
            conv_dw(1024, 1024, 1),
            nn.AvgPool2d(8),  # assumes an 8x8 map: 256x256 input / total stride 32
        ) 
        self.fc = nn.Linear(1024, self.class_num)

    def forward(self, x):
        x = self.model(x)
        x = x.view(-1, 1024)  # (N, 1024, 1, 1) -> (N, 1024)
        x = self.fc(x)
        return x
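`nn.AvgPool2d(8)` only reduces the final feature map to 1x1 when that map is 8x8. The network has five stride-2 convolutions (a total stride of 32), so this holds for 256x256 inputs. A small sketch using the standard convolution output-size formula, `(size + 2*padding - kernel) // stride + 1`, for the 3x3, padding-1 convolutions in `MobileNet1.model`; the helper names are illustrative:

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Spatial output size of a Conv2d/pooling layer (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


# Strides of the 3x3 convolutions in MobileNet1.model, in order:
# conv_bn, then the depthwise conv of each conv_dw block.
# The 1x1 pointwise convs (stride 1, padding 0) keep the size unchanged.
STRIDES = [2, 1, 2, 1, 2, 1, 2, 1, 1, 1, 1, 1, 2, 1]


def final_feature_size(input_size):
    """Side length of the feature map that reaches the pooling layer."""
    size = input_size
    for s in STRIDES:
        size = conv_out(size, kernel=3, stride=s, padding=1)
    return size


for n in (224, 256, 512):
    print(n, "->", final_feature_size(n))
# 224 -> 7
# 256 -> 8
# 512 -> 16
```

`nn.AdaptiveAvgPool2d(1)` removes the fixed-size assumption. If every input is 256x256, though, the pooling layer is unlikely to explain the cross-session difference, which is consistent with the swap to AdaptiveAvgPool2d not helping.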

Here's a simplified version of my main code:

import random

import numpy as np
import torch

#-----------------------fix random seed---------------------------------#
args.seed = 5153
print("Random Seed: ", args.seed)
random.seed(args.seed)
np.random.seed(args.seed)
torch.manual_seed(args.seed)
if args.gpus:
    # Sets the seed for generating random numbers on all GPUs.
    torch.cuda.manual_seed(args.seed)
    torch.cuda.manual_seed_all(args.seed)

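#---------------------------training-------------------------------------#
# initial_channel, n_class, checkpoint_path (hypothetical name), epochs,
# accumulate_step, criterion, optimizer (built on model.parameters()),
# scheduler, generator, data_path and save_checkpoint are defined elsewhere.
model = MobileNet1(initial_channel, n_class)  # the constructor needs both arguments
weight = torch.load(checkpoint_path)          # checkpoint written by save_checkpoint
model.load_state_dict(weight['state_dict'])
for epoch in range(epochs):
    model.train()
    # Re-create the generator every epoch: a Python generator is
    # exhausted after a single pass.
    data = generator(data_path)
    optimizer.zero_grad()
    for step, (x, y) in enumerate(data, start=1):
        output = model(x)
        loss = criterion(output, y) / accumulate_step
        loss.backward()
        # Step only every accumulate_step batches so gradients accumulate.
        if step % accumulate_step == 0:
            optimizer.step()
            optimizer.zero_grad()
    scheduler.step()
    # val_loss and best_test come from a validation pass not shown here.
    save_checkpoint(filepath=args.save,
                    filename='{}-epoch{}-val_loss{:.4f}.pth'.format(
                        args.model_name, epoch, val_loss),
                    state={'epoch': epoch,
                           'state_dict': model.state_dict(),
                           'best_prec1': best_test,
                           'optimizer': optimizer.state_dict()})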
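A minimal sketch of a save/resume round trip, assuming the same checkpoint keys that `save_checkpoint` writes above (`epoch`, `state_dict`, `optimizer`). It uses a tiny hypothetical stand-in model (`make_model`) with a BatchNorm layer instead of MobileNet1, and a temporary file instead of `args.save`. It restores both model and optimizer state in a fresh "session" and checks that the reloaded model produces identical outputs in `eval()` mode. BatchNorm normalizes with per-batch statistics in `train()` mode and with the saved running statistics in `eval()` mode, so evaluating a reloaded model without `model.eval()` is one common source of different predictions.

```python
import os
import tempfile

import torch
import torch.nn as nn

torch.manual_seed(0)


def make_model():
    """Hypothetical tiny stand-in for MobileNet1: conv + BatchNorm + linear head."""
    return nn.Sequential(
        nn.Conv2d(1, 4, 3, padding=1, bias=False),
        nn.BatchNorm2d(4),
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(4, 3),
    )


model = make_model()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# A few training steps so BatchNorm running stats and momentum buffers are non-trivial.
model.train()
for _ in range(5):
    x = torch.randn(8, 1, 16, 16)
    y = torch.randint(0, 3, (8,))
    loss = nn.functional.cross_entropy(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Save the same keys as save_checkpoint (best_prec1 omitted for brevity).
path = os.path.join(tempfile.mkdtemp(), "checkpoint.pth")
torch.save({"epoch": 4,
            "state_dict": model.state_dict(),
            "optimizer": optimizer.state_dict()}, path)

# Simulated new session: fresh model and optimizer, then restore both.
restored = make_model()
restored_opt = torch.optim.SGD(restored.parameters(), lr=0.1, momentum=0.9)
checkpoint = torch.load(path, map_location="cpu")
restored.load_state_dict(checkpoint["state_dict"])
restored_opt.load_state_dict(checkpoint["optimizer"])
start_epoch = checkpoint["epoch"] + 1

# Compare predictions in eval mode (running statistics, no batch statistics).
x_test = torch.randn(4, 1, 16, 16)
model.eval()
restored.eval()
with torch.no_grad():
    same_eval = torch.equal(model(x_test), restored(x_test))

print(same_eval)     # True
print(start_epoch)   # 5
```

Applied to the training loop above, this would mean also restoring `weight['optimizer']` (and the scheduler, if its state is saved) when resuming, and calling `model.eval()` before measuring accuracy on the reloaded model.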