What is causing my results to change every time I initialize my model and use it for inference?

I have defined my model in the following way:

from torch import nn
from torch.nn import CrossEntropyLoss
from transformers.modeling_outputs import SequenceClassifierOutput

class Net(nn.Module):

    def __init__(self, pre_classifier_init, classifier_init):
        super(Net, self).__init__()
        self.num_labels = 2
        self.pre_classifier = nn.Linear(768, 768)
        self.classifier = nn.Linear(768, 2)
        self.dropout = nn.Dropout(0.1)
        # copy the pretrained weights into the new layers
        self.pre_classifier.weight.data = pre_classifier_init.weight.data
        self.classifier.weight.data = classifier_init.weight.data

    def forward(self, x, labels=None):
        x = self.pre_classifier(x)
        x = nn.ReLU()(x)
        x = self.dropout(x)
        logits = self.classifier(x)
        loss = None
        if labels is not None:
            loss_fct = CrossEntropyLoss()
            loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))

        return SequenceClassifierOutput(
            loss=loss,
            logits=logits,
        )

net = Net(model.pre_classifier, model.classifier1)
net.eval()

As you can see, I am passing parameters to the constructor so that the weights of the two layers are set each time the model is instantiated, and I am putting the model in evaluation mode to disable the dropout layer. With this setup I would expect no randomness to be involved and my results to be deterministic. However, the performance of my model varies each time I instantiate it, so there must still be some randomness somewhere.

I confirm this by running this code beforehand:

import random
import numpy as np
import torch

seed = 0
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)

So where is this randomness coming from?

The non-deterministic results could be created by e.g. cuBLAS, as described in the Reproducibility docs, so you could set torch.use_deterministic_algorithms(True) and see if non-deterministic algorithms are being used.
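A minimal sketch of the flags described in the Reproducibility docs (the environment variable and the example tensors here are illustrative; the cuBLAS workspace setting only matters when running on CUDA):

```python
import os

# Required for deterministic cuBLAS on CUDA >= 10.2; must be set
# before any CUDA work is done.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

import torch

torch.manual_seed(0)
# Raise an error whenever an op would fall back to a
# non-deterministic implementation instead of silently using it.
torch.use_deterministic_algorithms(True)

x = torch.randn(4, 8)
linear = torch.nn.Linear(8, 2)
with torch.no_grad():
    out1 = linear(x)
    out2 = linear(x)
print(torch.equal(out1, out2))  # the two passes match exactly
```

With this flag enabled, any op that has no deterministic implementation raises a RuntimeError, which points you at the source of the randomness.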

Hi @ptrblck, thank you for the reply. I discovered that I was only setting the weights upon instantiation, but I also needed to set the biases by adding these lines to __init__:

        self.pre_classifier.bias.data = pre_classifier_init.bias.data
        self.classifier.bias.data = classifier_init.bias.data
With these lines, my results become deterministic.

Good to hear it’s working now!
A small addition: don’t use the .data attribute, as it could yield unwanted side effects. Wrap the assignments in a with torch.no_grad() block instead and assign the new values to the parameters.