Training loop checking validation accuracy


When training my model, I check the accuracy on the validation set at the end of each epoch. To do this I call model.eval() and then switch back with model.train() after the check. This leads to an accuracy of around 90%. However, when I train without checking the validation set until the whole training run is complete, the accuracy is around 80%. This could just be random variation in the final accuracy, but I was wondering whether I am doing something wrong. The code is roughly as below. check_accuracy just feeds the dataloader through the model and compares the outputs to the labels to calculate the accuracy.

for e in range(epochs):
    for t, x in enumerate(my_dataloader):
        x_arr = np.array(x)
        x1 = x_arr[:, 0]
        x1 = Batch.from_data_list(x1).to(device)
        x2 = torch.stack(x_arr[:, 2].tolist(), dim=0).to(device=device, dtype=torch.long)
        x3 = x_arr[:, 3]
        x3 = Batch.from_data_list(x3).to(device)

        model.train()  # put model to training mode
        # x =  # move to device, e.g. GPU
        outputs = model(x1, x3, x2)
        # maxes = outputs.max(1)[0].reshape(-1,1)
        # print(len(maxes))
        loss = nn.CrossEntropyLoss(weight=weight)(outputs, x1.y)

        # Update the parameters of the model using the gradients

        if t % print_every == 0:
            print('Epoch: %d, Iteration %d, loss = %.4f' % (e, t, loss.item()))

    with torch.no_grad():
        my_dataloader = DataListLoader(X_test, batch_size=128)
        acc, y_pred, y_true = check_accuracy(my_dataloader, model)
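
For reference, a helper along the lines of check_accuracy could be sketched as follows. This is a hypothetical, simplified version and not the code from the post: it assumes each batch is a plain (inputs, labels) pair of tensors rather than the graph batches used above, and toy_model and toy_batches are made-up stand-ins that only show the call. It switches to eval mode itself and restores whatever mode the model was in afterwards.

```python
import torch
from torch import nn


def check_accuracy(loader, model):
    """Return (accuracy, y_pred, y_true) for `model` over `loader`.

    Assumption: each batch is an (inputs, labels) pair of tensors. The
    graph batches from the post would need their own unpacking step.
    """
    was_training = model.training  # remember the mode we were called in
    model.eval()                   # dropout off, batch norm uses running stats
    preds, trues = [], []
    with torch.no_grad():          # no autograd bookkeeping during evaluation
        for inputs, labels in loader:
            scores = model(inputs)
            preds.append(scores.argmax(dim=1))
            trues.append(labels)
    model.train(was_training)      # restore the previous mode
    y_pred = torch.cat(preds)
    y_true = torch.cat(trues)
    acc = (y_pred == y_true).float().mean().item()
    return acc, y_pred, y_true


# Hypothetical stand-in model and data, only to show the call.
torch.manual_seed(0)
toy_model = nn.Sequential(nn.Linear(4, 8), nn.Dropout(0.5), nn.Linear(8, 3))
toy_batches = [(torch.randn(16, 4), torch.randint(0, 3, (16,))) for _ in range(2)]
toy_model.train()
acc, y_pred, y_true = check_accuracy(toy_batches, toy_model)
```

Restoring the mode inside the helper means the training loop does not have to remember to call model.train() again after every validation pass.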

The code looks alright.
How reproducible is this behavior? That is, how many runs did you try with and without the eval code, and what are the mean and standard deviation of the final accuracy in each case?
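
A minimal sketch of how that comparison could be collected, assuming the whole training run can be wrapped in a function that takes a seed and returns the final accuracy. summarize_runs and fake_train_and_eval are hypothetical names; the fake function only returns a noisy number (and ignores its eval_every_epoch flag) so that the snippet runs on its own.

```python
import random
import statistics


def summarize_runs(train_and_eval, seeds):
    """Run `train_and_eval(seed)` once per seed; return (mean, std, accuracies)."""
    accuracies = [train_and_eval(seed) for seed in seeds]
    mean = statistics.mean(accuracies)
    # Sample standard deviation; it needs at least two runs.
    std = statistics.stdev(accuracies) if len(accuracies) > 1 else 0.0
    return mean, std, accuracies


def fake_train_and_eval(seed, eval_every_epoch=False):
    """Hypothetical stand-in for a real training run: returns a noisy accuracy."""
    rng = random.Random(seed)
    return 0.85 + rng.uniform(-0.05, 0.05)


seeds = range(5)
mean_a, std_a, runs_a = summarize_runs(lambda s: fake_train_and_eval(s, True), seeds)
mean_b, std_b, runs_b = summarize_runs(lambda s: fake_train_and_eval(s, False), seeds)
print(f"eval every epoch: {mean_a:.3f} +/- {std_a:.3f}")
print(f"eval at the end:  {mean_b:.3f} +/- {std_b:.3f}")
```

In a real experiment the stand-in would seed the random number generators from the given seed, build a fresh model, train it, and return the final validation accuracy. One detail worth ruling out while doing so: the snippet rebinds my_dataloader to the X_test loader inside the epoch loop, so if the training loader is not recreated at the start of each epoch, later epochs would iterate over X_test. Keeping the evaluation loader under its own name avoids that ambiguity.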