How to disable dropout in a saved model in eval mode

Hi, I use a dropout layer in my ANN and it works pretty well in training mode. The problem is that it stays enabled even though I set the model to eval() before saving it, so the same input is mapped to different outputs. Do you have any advice on how to solve this problem?

The dropout layer is defined in the __init__ function.

If the dropout layer is defined as an attribute in the model's __init__, it should be disabled when you call model.eval().
Could you check for usages of the functional API (F.dropout) where the training argument wasn't passed?
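
For reference, here is a minimal sketch with a hypothetical toy module (not your model) showing that a dropout layer registered in __init__ is switched off by model.eval():

import torch
import torch.nn as nn

# Hypothetical toy module: dropout registered as an attribute in __init__
class Toy(nn.Module):
    def __init__(self):
        super(Toy, self).__init__()
        self.fc = nn.Linear(4, 4)
        self.dropout = nn.Dropout(0.5)

    def forward(self, x):
        return self.dropout(self.fc(x))

model = Toy()
x = torch.randn(1, 4)

model.train()
out_a, out_b = model(x), model(x)
print(torch.equal(out_a, out_b))   # usually False: the dropout masks differ per call

model.eval()                       # disables the registered nn.Dropout
out_a, out_b = model(x), model(x)
print(torch.equal(out_a, out_b))   # True: dropout is a no-op in eval mode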

Hi, I replaced the dropout layers with the function from the nn.functional API. Printing the state dict still shows dropout.training. This behaviour is not as expected. I double-checked my code to make sure there is no network = network.eval() call in the wrong place.

If you replaced the modules with functional API calls, you would have to pass the training flag manually, since F.dropout uses training=True by default:

def forward(self, x):
    x = F.dropout(x, training=self.training)  # self.training follows train()/eval()

Could you post the model definition so that we can have a look?

This is my model definition. You can choose between the two dropout implementations. Thank you so much for your help.

    
import torch
import torch.nn as nn
import torch.nn.functional as F

class Network(torch.jit.ScriptModule):
    def __init__(self):
        super(Network, self).__init__()

        #Conv Layers
        self.conv1 = nn.Conv2d(1, 8, kernel_size=(5, 5))
        self.conv2 = nn.Conv2d(8, 16, kernel_size=(3, 3))
        self.conv3 = nn.Conv2d(16, 32, kernel_size=(3, 3))
        self.conv4 = nn.Conv2d(32, 64, kernel_size=(5, 5))
        
        #Maxpool Layers
        self.maxp1 = nn.MaxPool2d((2, 2))
        self.maxp2 = nn.MaxPool2d((2, 2))
        self.maxp3 = nn.MaxPool2d((2, 2))
        self.maxp4 = nn.MaxPool2d((2, 2))
        
        #Dropout Layer
        self.dropout = nn.Dropout(0.25)
        
        #Linear Layers
        self.linear1 = nn.Linear(64 * 5 * 5, 600)
        self.linear2 = nn.Linear(600, 100)
        self.linear3 = nn.Linear(100, 2)
        
        
    @torch.jit.script_method
    def forward(self, x):
        
        x = F.relu(self.conv1(x))# In: 1*128*128, Out: 8x124x124
    
        x = self.maxp1(x)        # In: 8x124x124, Out: 8*62*62
        #x = self.dropout(x)
        #x = F.dropout(x)
        #print(x.size())
        
        x = F.relu(self.conv2(x))# In: 8*62*62,   Out: 16*60*60
        #print(x.size())
        x = self.maxp2(x)        # In: 16x60x60,   Out: 16*30*30
        #x = self.dropout(x)
        #x =  F.dropout(x)
        
        x = F.relu(self.conv3(x))# In: 16*30*30,  Out: 32*28*28
        #print(x.size())
        x = self.maxp3(x)        # In: 32*28*28,  Out: 32*14*14
        #x = self.dropout(x)
        #x = F.dropout(x)
        
        x = F.relu(self.conv4(x))# In: 32*14*14,  Out:64*10*10
        #print(x.size())
        x = self.maxp4(x)
        # In: 64*10*10,  Out:64*5*5
        #x = F.dropout(x)
        #x = self.dropout(x)
        
        x = x.view(-1,64*5*5)
        
        x = F.relu(self.linear1(x))
        #x =  F.dropout(x)
        #x =  self.dropout(x)
       
        x = F.relu(self.linear2(x))
        
        #x = F.dropout(x)
        #x = self.dropout(x)
        
        x = self.linear3(x)

        # note: nn.CrossEntropyLoss already applies log-softmax internally,
        # so the softmax here is usually omitted during training
        x = F.softmax(x, dim=1)
        
        return x
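
A quick sanity check for whether the dropout toggles with eval() could look like this (just a sketch, assuming a plain nn.Module version of the class without the TorchScript decorator, and the 1x128x128 input size from the comments):

net = Network()
net.eval()
print(net.training, net.dropout.training)   # both should be False after eval()

x = torch.randn(1, 1, 128, 128)             # input size from the comments above
with torch.no_grad():
    out1, out2 = net(x), net(x)
print(torch.allclose(out1, out2))            # True if dropout is really disabled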

Here is the part where most of the magic happens. I reduced it to the relevant code.

import torch
import torch.optim as opt


def train(network=None, epochs=10, train_loader=None, learning_rate=0.001):

    network = network.train()

    optimizer = opt.Adam(network.parameters(), lr=learning_rate)

    loss_fn = torch.nn.CrossEntropyLoss()
    
    current_acc = 0.0

    for epoch in range(epochs):

        for inputs, targets in train_loader:
            
            optimizer.zero_grad()
            
            outputs = network(inputs)
            
            loss = loss_fn(outputs, targets)
            
            loss.backward()
            
            optimizer.step()
        print("Epoch: ", epoch)
  
        # evaluate(), eval_loader, arrEpochs and arrAccs come from the parts of the script not shown here
        accuracy = evaluate(trained_network=network, eval_loader=eval_loader)
        
        arrEpochs.append(epoch)
        
        arrAccs.append(accuracy)
        
        
        if accuracy > current_acc:
            print("saving Model...")
            
            current_acc = accuracy
            
            network = network.eval()
            torch.save(network.state_dict(), 'trained_model.pt')
            network = network.train()
           
    return network

When I push data manually through this ANN, I always set the network to eval().
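
For context, a minimal sketch of such a manual check (assuming the Network class and the 'trained_model.pt' file from above) could look like this; note that the train/eval mode itself is not stored in the state_dict, so eval() has to be called again on the freshly loaded model:

# Sketch: reload the saved state_dict and switch the loaded model to eval mode
network = Network()
network.load_state_dict(torch.load('trained_model.pt'))
network.eval()                                # must be called again after loading

sample_input = torch.randn(1, 1, 128, 128)    # dummy input with the expected shape
with torch.no_grad():
    prediction = network(sample_input)
print(prediction)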

I think the behavior might be related to scripting the module.
If you are using @torch.jit.script_method, you should derive the module from torch.jit.ScriptModule.
Could you update PyTorch to the nightly builds (have a look at the instructions) and run the code again?

Thanks for your answer. The model class actually derives from the torch.jit.ScriptModule class. I will update PyTorch and report back. Thank you for your time.