Training and test discrepancy (libtorch)

So I’ve been playing around with PyTorch/libtorch (C++) to build something similar to AlphaGo but for Gomoku, and have almost everything working.
What I’ve noticed is that the values the model produces during training don’t match what I get when I feed the same inputs back in afterwards.

If someone wants any part of the source code I’ll post it but I’ll try and summarize first.
My model takes a tensor of shape (x, 4, 15, 15) (where x is the batch size)
and outputs a single tanh-activated float between -1.0 and 1.0.

As a test I gave it 2 boards and told it that board 1 should be 1.0 and board 2 should be -1.0.
As it trains with an Adam optimizer I record what forward outputs, and the last outputs are roughly 0.98 for board 1 and -0.84 for board 2.

However, if I feed board 1 by itself into my model as a shape (1, 4, 15, 15) tensor, the model spits out 0.5648.
If I then send in both board 1 and board 2 as a shape (2, 4, 15, 15) tensor, only then does it output my recorded 0.98/-0.84 above.

Is this expected behaviour? Am I training my model wrong?

P.S. I train my model by manually creating a shape (x, 4, 15, 15) inputTensor and a shape (x, 1) valueAnswerTensor, with the following training loop.

torch::Tensor valueTensor = m_pNetworkGpu->forward(inputTensor);
// Mean-squared error against the recorded game outcomes
torch::Tensor valueLoss = (valueTensor - valueAnswerTensor).pow(2).mean();
std::cout << valueTensor;

For a game-playing neural network to work I need to send individual boards to the network to evaluate, but I’m under the impression you want to send batches for training. The results above seem to suggest you have to train using 1 example at a time?

For any future developers that come upon this: I figured out why. I had some dropout-based layers in my network, which means whether the network is in train or eval mode matters.

I was able to fix this by calling model.train() at the beginning of my training function and model.eval() when I just want an accurate evaluation of a board state.