Will not using eval() when validating affect training performance?

I’ve just started using PyTorch and am working on my first project. What got me confused was the eval() function. The introductory tutorial I followed didn’t use eval() for validation, and I trained a model with satisfactory validation performance that way. Later, while looking something up, I came across the train and eval modes. As far as I understand, not calling eval() should only affect how accurate my validation actually is (because dropout and batch normalization would still behave as in training during validation), but it should not affect the training itself. However, when I added model.eval() and model.train() lines around the validation step, I noticed that the training loss for the same number of epochs is now different.

for i in range(1, epochs + 1):
    start = time.time()
    y_pred = model(categorical_train_data, numerical_train_data)
    single_loss = loss_function(y_pred, train_outputs)
    aggregated_losses.append(single_loss)
    print(f'epoch: {i:3} loss: {single_loss.item():10.8f}')

    optimizer.zero_grad()
    single_loss.backward()
    optimizer.step()
    with torch.no_grad():
        y_val = model(categorical_test_data, numerical_test_data)
        loss = loss_function(y_val, test_outputs)

The code above produces

epoch:   1 loss: 0.80006623
epoch:   2 loss: 0.78904432

however, if I change it to

for i in range(1, epochs + 1):
    start = time.time()
    y_pred = model(categorical_train_data, numerical_train_data)
    single_loss = loss_function(y_pred, train_outputs)
    aggregated_losses.append(single_loss)
    print(f'epoch: {i:3} loss: {single_loss.item():10.8f}')

    optimizer.zero_grad()
    single_loss.backward()
    optimizer.step()
    model.eval()   # switch dropout/batchnorm to evaluation behavior
    with torch.no_grad():
        y_val = model(categorical_test_data, numerical_test_data)
        loss = loss_function(y_val, test_outputs)
    model.train()  # back to training mode for the next epoch

I get a different output:

epoch:   1 loss: 0.80006623
epoch:   2 loss: 0.78863680

I used torch.manual_seed(0), so I know the difference isn’t caused by the initial weights. I can run the code multiple times and get the same output in both cases.

As far as I understand, this means that the weights of the model were updated. Does that mean that without eval mode I let the validation set influence the training of the actual model, so the validation performance isn’t actually validation?
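A quick way to check this would be to snapshot the parameters before the validation pass and compare them afterwards. A minimal sketch with a toy model (not the actual model from above):

```python
import torch

# Hypothetical stand-in model containing a dropout layer.
model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.Dropout(p=0.5))
val_x = torch.randn(8, 4)

# Snapshot the parameters, run a "validation" forward pass in train
# mode (no eval()), and confirm the weights themselves are unchanged.
before = {k: v.clone() for k, v in model.state_dict().items()}
with torch.no_grad():
    _ = model(val_x)
after = model.state_dict()
assert all(torch.equal(before[k], after[k]) for k in before)
```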

No, that won’t be the case, since you are never updating the model during validation (and gradient calculation is also disabled via torch.no_grad()).
The difference in the losses most likely comes from additional calls into the random number generator during the validation loop while the model is still in training mode.
As you’ve already described, the dropout layers will still be active during validation; they sample random numbers and therefore change the state of the random number generator for the subsequent training steps.
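This effect on the random number generator can be seen with a standalone dropout layer (a minimal sketch, assuming p=0.5; the exact values don’t matter, only that the RNG state diverges):

```python
import torch

torch.manual_seed(0)
drop = torch.nn.Dropout(p=0.5)
x = torch.ones(1, 4)

# In train mode the forward pass samples a dropout mask,
# consuming numbers from the global RNG.
drop.train()
_ = drop(x)
after_train = torch.rand(1)

# Same seed, but in eval mode dropout is a no-op and the RNG is untouched.
torch.manual_seed(0)
drop.eval()
_ = drop(x)
after_eval = torch.rand(1)

# The next random draw differs depending on the mode used before it.
print(after_train, after_eval)
```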
Besides that, the running stats in all batchnorm layers will be updated with the validation batch statistics, which would be a data leak.
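The running-stat update can be demonstrated with a standalone batchnorm layer (again a minimal sketch, not the model from the question); note that it happens even under torch.no_grad():

```python
import torch

torch.manual_seed(0)
bn = torch.nn.BatchNorm1d(3)
val_batch = torch.randn(8, 3) + 5.0  # stand-in "validation" data

before = bn.running_mean.clone()

# A forward pass in train mode updates running_mean/running_var
# from the batch statistics, even inside torch.no_grad().
bn.train()
with torch.no_grad():
    _ = bn(val_batch)
assert not torch.equal(bn.running_mean, before)  # stats leaked from val_batch

# In eval mode the running stats are left untouched.
bn.eval()
frozen = bn.running_mean.clone()
with torch.no_grad():
    _ = bn(val_batch)
assert torch.equal(bn.running_mean, frozen)
```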