How to reset model weights to effectively implement cross-validation?

I am trying to implement cross-validation by training an instance of my LSTM model on each fold's dataset. The problem is that the model keeps its learned weights from one fold to the next. What is the easiest way to reset the weights so that each fold starts from a fresh random initialization and does not carry over what was learned on previous folds?

Here is my model as currently defined:

class LSTMModel(nn.Module):
    """LSTM model parameters."""
    def __init__(self, input_dim, hidden_dim, num_layers, output_dim):
        super(LSTMModel, self).__init__()
        self.input_dim = input_dim
        self.hidden_dim = hidden_dim
        self.output_dim = output_dim
        self.num_layers = num_layers
        self.lstm = nn.LSTM(self.input_dim, self.hidden_dim, self.num_layers)
        self.forecast = nn.Linear(self.hidden_dim, self.output_dim)

    def forward(self, x):

        batch_size = 1
        h0 = torch.zeros(self.num_layers, batch_size, self.hidden_dim).requires_grad_()
        c0 = torch.zeros(self.num_layers, batch_size, self.hidden_dim).requires_grad_()

        out, (hn, cn) = self.lstm(x, (h0.detach(), c0.detach()))

        out = self.forecast(out[:, -1, :])
        out = out.unsqueeze_(-1)
        return out

def train_model(datafile, date_col, target_col, hidden_dim,
                num_layers, output_dim, num_epochs, learning_rate):
    dataset = CreateDataset(datafile, date_col, target_col)
    input_dim = dataset.num_features

    X_train_sets = dataset.X_train_sets
    y_train_sets = dataset.y_train_sets
    X_test_sets = dataset.X_test_sets
    y_test_sets = dataset.y_test_sets

    model = LSTMModel(input_dim, hidden_dim, num_layers, output_dim)
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

    hist = np.zeros(num_epochs)
    test_error = []

    for X_train, y_train, X_test, y_test in zip(
            X_train_sets, y_train_sets, X_test_sets, y_test_sets):
        for epoch in range(num_epochs):
            output = model(X_train)
            loss = criterion(output, y_train)

            if epoch % 100 == 0:
                print('Epoch ', epoch, 'Loss: ', loss.item())
            hist[epoch] = loss.item()

You could call .reset_parameters() on all child modules:

model = LSTMModel(1, 1, 1, 1)
for name, module in model.named_children():
    print('resetting ', name)
    module.reset_parameters()
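
A slightly more general variant, as a sketch: `named_children()` only visits direct children, and not every module defines `reset_parameters()`. The helper `reset_weights` below is hypothetical (not part of PyTorch); it is passed to `model.apply`, which visits every submodule recursively. The toy `nn.Sequential` model is an assumption used only for illustration.

```python
import torch
import torch.nn as nn


def reset_weights(module: nn.Module) -> None:
    # Re-initialize only modules that define reset_parameters
    # (e.g. nn.Linear, nn.LSTM); containers and activations are skipped.
    reset = getattr(module, "reset_parameters", None)
    if callable(reset):
        reset()


# Toy model for illustration only
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
before = {k: v.clone() for k, v in model.state_dict().items()}

# apply() calls reset_weights on every submodule and on the model itself
model.apply(reset_weights)

after = model.state_dict()
changed = [k for k in before if not torch.equal(before[k], after[k])]
print("re-initialized tensors:", changed)
```

Note that this only resets the parameters; any optimizer state (for example Adam's running averages) would still need to be reset separately.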

Thank you very much!

would it also work to just do:

model = LSTMModel(input_dim, hidden_dim, num_layers, output_dim)

under the first loop block every time?

This could work; it depends a bit on your workflow.
Creating a new module would also force you to recreate the optimizer, since the old optimizer would still hold references to the parameters of the discarded model.
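
As a rough sketch of that workflow, the loop below creates a new model and a new optimizer at the start of every fold. The `make_model` factory, the random fold data, and the hyperparameters are placeholders for illustration, not the code from this thread.

```python
import torch
import torch.nn as nn


def make_model() -> nn.Module:
    # Hypothetical stand-in for LSTMModel(...) or any other network
    return nn.Sequential(nn.Linear(3, 8), nn.ReLU(), nn.Linear(8, 1))


torch.manual_seed(0)
# Random placeholder data: one (X, y) pair per fold
folds = [(torch.randn(16, 3), torch.randn(16, 1)) for _ in range(3)]

criterion = nn.MSELoss()
initial_weights = []

for X, y in folds:
    # New model and new optimizer for every fold, so nothing carries over
    model = make_model()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
    initial_weights.append(model[0].weight.detach().clone())

    for _ in range(5):
        optimizer.zero_grad()
        loss = criterion(model(X), y)
        loss.backward()
        optimizer.step()
```

Because the optimizer is built from `model.parameters()` inside the loop, it always refers to the current fold's parameters.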

Well, for my problem I was doing 5-fold cross-validation with a UNet, creating a new instance of both the model and the optimizer for every fold. I assumed it was working because the F1 score, which I track every epoch, starts at 0 at the beginning of each fold.

Yes, as explained, it might fit your use case if you are re-initializing the optimizer as well.


When you say it might work, is there a possibility that it is not working as intended? I just want to be absolutely sure. I am using the UNet in this repo:

and my fold loop looks like this:

for fold in range(5):
    model = UNet().to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
    criterion = ...
    # select data

Not necessarily, but I can’t claim it’ll work without seeing code or reproducing it myself.
That being said, the approach is right, and especially since you are seeing the desired behavior, you can assume it’s working.

Is there any way to absolutely guarantee that the weights are reset? I just want to be as precise as possible because it is for a paper I am working on.

Alternatively, I could send you the code privately if you would be willing to take a look, since it is not in a public repository. I'll just need to verify that it is OK to do so.

No, I wouldn’t recommend sending private repositories to people outside your lab (or other collaborators).

A new model instance will have newly initialized parameters. You could double-check this by creating a deepcopy of the already trained model, creating the new model and optimizer, and comparing the state_dicts of both.
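
A minimal sketch of that check, assuming a toy model and random data in place of the real network (`make_model` is a hypothetical factory, not part of the original code):

```python
import copy

import torch
import torch.nn as nn


def make_model() -> nn.Module:
    # Hypothetical stand-in for the real network
    return nn.Sequential(nn.Linear(3, 8), nn.ReLU(), nn.Linear(8, 1))


torch.manual_seed(0)
model = make_model()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

# Train briefly on random placeholder data
X, y = torch.randn(16, 3), torch.randn(16, 1)
for _ in range(5):
    optimizer.zero_grad()
    nn.functional.mse_loss(model(X), y).backward()
    optimizer.step()

# Keep a copy of the trained model, then start the next fold from scratch
trained = copy.deepcopy(model)
model = make_model()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

# Compare the state_dicts tensor by tensor
trained_sd, fresh_sd = trained.state_dict(), model.state_dict()
all_differ = all(not torch.equal(trained_sd[k], fresh_sd[k]) for k in fresh_sd)

# A newly created optimizer has no per-parameter state yet
optimizer_is_fresh = len(optimizer.state_dict()["state"]) == 0
print(all_differ, optimizer_is_fresh)
```

If any tensor compares equal, or the optimizer state is not empty before the first step of a fold, something is being carried over.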

OK, one follow-up question. Suppose I pretrain the model on one dataset and save the weights and the optimizer so I can continue training later. Once I load these weights to train on a new dataset, should I reset the optimizer, or should I load the saved optimizer state as well? The PyTorch documentation on saving models says to save the optimizer when you want to continue training, but I am not sure whether that applies to pretraining and then training on a new dataset.

If your optimizer contains internal states (running estimates of the gradients etc.), it would probably make sense to restore it. Otherwise you might see a spike in your loss when you continue training on the new dataset. I don’t think loading the optimizer is dataset-dependent, but let us know if you see different behavior for your use case (which would be interesting to know).
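
For illustration, a sketch of saving and restoring both state_dicts. The toy model, the random data, and the in-memory buffer (standing in for a checkpoint file on disk) are assumptions, not part of the original setup.

```python
import io

import torch
import torch.nn as nn


def make_model() -> nn.Module:
    # Hypothetical stand-in for the real network
    return nn.Sequential(nn.Linear(3, 8), nn.ReLU(), nn.Linear(8, 1))


torch.manual_seed(0)
model = make_model()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# "Pretraining" on random placeholder data
X, y = torch.randn(16, 3), torch.randn(16, 1)
for _ in range(5):
    optimizer.zero_grad()
    nn.functional.mse_loss(model(X), y).backward()
    optimizer.step()

# Save both state_dicts; the buffer stands in for a checkpoint file
buffer = io.BytesIO()
torch.save({"model": model.state_dict(),
            "optimizer": optimizer.state_dict()}, buffer)
buffer.seek(0)

# Later: rebuild the objects first, then restore their states
checkpoint = torch.load(buffer)
new_model = make_model()
new_optimizer = torch.optim.Adam(new_model.parameters(), lr=1e-3)
new_model.load_state_dict(checkpoint["model"])
new_optimizer.load_state_dict(checkpoint["optimizer"])

# Adam's running estimates and step count come back with the state
restored_state = new_optimizer.state_dict()["state"]
restored_steps = float(restored_state[0]["step"])
```

To start the new dataset with a fresh optimizer instead, one would simply skip the `new_optimizer.load_state_dict(...)` call; comparing the loss curves of both variants is one way to see whether restoring matters for a given use case.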

Ok yeah, I’ll try it both ways and update you on the results! Thanks!