Maintaining dropout layers for deployment

Hello,
I am trying to deploy a PyTorch model with dropout layers to the C++ API using TorchScript. However, the parameters that can be loaded by the regular Python script differ from those of the TorchScript model. When I set up the model by inheriting from torch.jit.ScriptModule (which is necessary to export the annotated model), an empty parameter named *.training is created for every dropout layer. This causes a problem when loading weights from a file stream, because the weights exported from the model that inherits from nn.Module have no entries for the dropout layers (see the example outputs below).

I realize that it is uncommon to keep dropout layers active outside of training; however, in this case the model is supposed to retain its dropout layers so that it behaves stochastically at inference time.
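
One way to keep dropout stochastic regardless of train/eval mode is to call torch.nn.functional.dropout with training=True. The sketch below is an assumption, not part of the original model: AlwaysDropout is a hypothetical helper that could stand in for nn.Dropout inside the model's nn.Sequential. It has no parameters or buffers, so it adds nothing to the state dict.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AlwaysDropout(nn.Module):
    """Hypothetical drop-in for nn.Dropout that stays active in eval() mode."""

    def __init__(self, p: float = 0.5):
        super().__init__()
        self.p = p

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # training=True applies dropout even when self.training is False
        return F.dropout(x, p=self.p, training=True)


layer = AlwaysDropout(0.5).eval()
x = torch.ones(1000)
y1 = layer(x)
y2 = layer(x)
# Kept units are scaled by 1 / (1 - p) = 2, dropped units become 0
print(sorted(torch.unique(y1).tolist()))  # [0.0, 2.0]
print(torch.equal(y1, y2))  # almost surely False: a new mask is drawn per call
print(len(layer.state_dict()))  # 0
```

Because the flag is hard-coded, calling eval() (in Python or C++) will not turn this dropout off; the trade-off is that the layer cannot be disabled without changing the model.
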
Below are two versions of the code I run for the model: one uses plain PyTorch, and the other tries to create a TorchScript module. Along with the code, I have attached a comparison of the state dictionaries of the TorchScript and PyTorch modules.

Below is the output from printing the first few entries of the model's state dictionary when the model is created with torch.jit.ScriptModule (notice the empty tensor for layer 2, “fc.2.training”):

Model's state_dict:
fc.0.bias        torch.Size([1280])
fc.0.weight      torch.Size([1280, 74])
fc.1.weight      torch.Size([1])
fc.2.training    torch.Size([])
fc.3.bias        torch.Size([896])
fc.3.weight      torch.Size([896, 1280])

Below is the same output when the model is created with nn.Module (notice that there is no entry for layer 2):

Model's state_dict:
fc.0.weight      torch.Size([1280, 74])
fc.0.bias        torch.Size([1280])
fc.1.weight      torch.Size([1])
fc.3.weight      torch.Size([896, 1280])
fc.3.bias        torch.Size([896])
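
The difference between the two outputs can be checked programmatically by comparing key sets. This sketch assumes a plain nn.Module holding only the first layers of the posted fc stack (MLPHead and key_diff are hypothetical names), and the scripted key list is reconstructed by hand from the output above rather than produced by the old ScriptModule API.

```python
import torch.nn as nn


class MLPHead(nn.Module):
    """Hypothetical: the first layers of the posted MLP as a plain nn.Module."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(74, 1280), nn.PReLU(), nn.Dropout(),
                                nn.Linear(1280, 896))

    def forward(self, x):
        return self.fc(x)


def key_diff(expected, provided):
    """Return (missing, unexpected) keys when loading `provided` into `expected`."""
    missing = sorted(set(expected) - set(provided))
    unexpected = sorted(set(provided) - set(expected))
    return missing, unexpected


eager_keys = list(MLPHead().state_dict().keys())
print(eager_keys)
# ['fc.0.weight', 'fc.0.bias', 'fc.1.weight', 'fc.3.weight', 'fc.3.bias']

# Hypothetical: the extra key reported for the ScriptModule version above
scripted_keys = eager_keys + ['fc.2.training']
print(key_diff(scripted_keys, eager_keys))  # (['fc.2.training'], [])
```

Since nn.Dropout owns no parameters or buffers, any key under a dropout index (fc.2 here) can only come from the scripting machinery, not from the trained weights.
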

[Image: the two Python programs used to produce these results]

If anyone can help me figure out how to export my trained model so that I can load it into a C++ program with the dropout preserved, that would be much appreciated.
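
With the current torch.jit.script API, one way to preserve standard nn.Dropout behaviour after export is to keep the scripted module in training mode, because nn.Dropout only drops units while its training flag is set. This is a sketch under assumptions: the small Sequential stands in for the trained MLP (whose weights are assumed to be loaded already), and the file name is hypothetical.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Small stand-in for the trained MLP; in practice its weights would be
# loaded with load_state_dict() before scripting.
model = nn.Sequential(nn.Linear(74, 64), nn.PReLU(), nn.Dropout(p=0.5),
                      nn.Linear(64, 7))

scripted = torch.jit.script(model)
scripted.train()  # nn.Dropout is only active while training is True

# Hypothetical output path
path = os.path.join(tempfile.gettempdir(), "mlp_with_dropout.pt")
scripted.save(path)

loaded = torch.jit.load(path)
loaded.train()  # set the mode explicitly after loading, to be safe
x = torch.ones(1, 74)
with torch.no_grad():
    y1 = loaded(x)
    y2 = loaded(x)
print(torch.equal(y1, y2))  # almost surely False: dropout masks differ per call
```

On the C++ side the saved file is loaded with torch::jit::load; calling eval() on the returned module would switch dropout off, so call train() there instead. Train mode also changes layers such as BatchNorm, so a model containing them may be better served by calling torch.nn.functional.dropout with training=True directly.
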

The training parameter was a hack that is no longer needed; this PR fixes it so that it won't show up in the state dict anymore. Could you post your model as code so we can run it and repro your issue to make sure it's fixed? Thanks!
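
In recent PyTorch releases this can be checked on a single layer: the scripted module still carries a training attribute, but it is not a state-dict entry. A minimal sketch, assuming a PyTorch version that provides torch.jit.script:

```python
import torch
import torch.nn as nn

scripted_dropout = torch.jit.script(nn.Dropout(p=0.5))
# training is still an attribute of the module, but not a state-dict entry
print(scripted_dropout.training)  # True
print(list(scripted_dropout.state_dict().keys()))  # []
```
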


import torch
import torch.nn as nn


class Encoder_End2End_Annotated(torch.jit.ScriptModule):
    __constants__ = ['encoder']

    def __init__(self):
        super(Encoder_End2End_Annotated, self).__init__()
        self.encoder = nn.Sequential(nn.Linear(16053, 256), nn.PReLU(),  # no dropout layers in the encoder
                                     nn.Linear(256, 256), nn.PReLU(),
                                     nn.Linear(256, 60))

    @torch.jit.script_method
    def forward(self, x):
        x = self.encoder(x)
        return x


class MLP_NN_Annotated(torch.jit.ScriptModule):
    __constants__ = ['fc']
    def __init__(self):
        super(MLP_NN_Annotated, self).__init__()
        self.fc = nn.Sequential(
            nn.Linear(74, 1280), nn.PReLU(), nn.Dropout(),
            nn.Linear(1280, 896), nn.PReLU(), nn.Dropout(),
            nn.Linear(896, 512), nn.PReLU(), nn.Dropout(),
            nn.Linear(512, 384), nn.PReLU(), nn.Dropout(),
            nn.Linear(384, 256), nn.PReLU(), nn.Dropout(),
            nn.Linear(256, 128), nn.PReLU(), nn.Dropout(),
            nn.Linear(128, 64), nn.PReLU(), nn.Dropout(),
            nn.Linear(64, 32), nn.PReLU(),
            nn.Linear(32, 7))

    @torch.jit.script_method
    def forward(self, x):
        out = self.fc(x)
        return out




# Instantiate the scripted modules
encoder = Encoder_End2End_Annotated()
MLP = MLP_NN_Annotated()

# Load the trained weights on the CPU
device = torch.device('cpu')
cae_filename = 'cae_encoder_140.pkl'
mlp_filename = 'mlp_PReLU_ae_dd140.pkl'
encoder.load_state_dict(torch.load(cae_filename, map_location=device))

# Print model's state_dict
print("Model's state_dict:")
for param_tensor in MLP.state_dict():
    print(param_tensor, "\t", MLP.state_dict()[param_tensor].size())


#print(MLP.state_dict())


MLP.load_state_dict(torch.load(mlp_filename, map_location=device))


encoder.save("encoder_annotated.pt")
MLP.save("mlp_annotated.pt")
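
For comparison, the same two networks can be written as plain nn.Module classes and compiled with torch.jit.script, which in current PyTorch replaces subclassing torch.jit.ScriptModule and decorating methods with @torch.jit.script_method. This is a sketch, not a verified port of the original files: the class names are hypothetical, and the load_state_dict calls are commented out because the .pkl files are not available here.

```python
import torch
import torch.nn as nn


class EncoderEnd2End(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(16053, 256), nn.PReLU(),
                                     nn.Linear(256, 256), nn.PReLU(),
                                     nn.Linear(256, 60))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.encoder(x)


class MLPNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(74, 1280), nn.PReLU(), nn.Dropout(),
            nn.Linear(1280, 896), nn.PReLU(), nn.Dropout(),
            nn.Linear(896, 512), nn.PReLU(), nn.Dropout(),
            nn.Linear(512, 384), nn.PReLU(), nn.Dropout(),
            nn.Linear(384, 256), nn.PReLU(), nn.Dropout(),
            nn.Linear(256, 128), nn.PReLU(), nn.Dropout(),
            nn.Linear(128, 64), nn.PReLU(), nn.Dropout(),
            nn.Linear(64, 32), nn.PReLU(),
            nn.Linear(32, 7))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(x)


encoder = EncoderEnd2End()
mlp = MLPNN()
# Load the trained weights into the eager modules first, e.g.:
# encoder.load_state_dict(torch.load('cae_encoder_140.pkl', map_location='cpu'))
# mlp.load_state_dict(torch.load('mlp_PReLU_ae_dd140.pkl', map_location='cpu'))

# Scripting copies the loaded weights; new modules start in training mode,
# so the MLP's dropout layers remain active in the exported file.
scripted_encoder = torch.jit.script(encoder)
scripted_mlp = torch.jit.script(mlp)

scripted_encoder.save("encoder_annotated.pt")
scripted_mlp.save("mlp_annotated.pt")
```

Because the weights are loaded into the eager model before scripting, the .pkl files saved from nn.Module load with the same keys, and no extra dropout entries are involved.
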