TransformerEncoderLayer Neural Network model accuracy changes based on batch size

I have trained a neural network model that contains a TransformerEncoderLayer. I saved the model, but when I load and evaluate it, the accuracy changes depending on the batch size. Why is this?

Can you show a simple example (with some reproducible code) that illustrates this?

It happens any time I use nn.TransformerEncoderLayer in any way with a saved model: if the data is in a different order, I get different results. Is there a way to save the encode table? This would be in the MultiheadAttention part of the TransformerEncoderLayer, right?

I’m somewhat new to Transformers.

I’m just using TransformerEncoderLayer, saving the model, and then using np.random.permutation() to shuffle the input data. This always gives me different results unless I use the same order every time.

I have this layer in my model like this: self.transformer = nn.TransformerEncoderLayer()
and save the model like so, path)
Does this not save the nn.TransformerEncoderLayer() or something?

Typically one saves the state_dict, per the instructions:, path)
They both do the same thing; my way just means you don’t have to redefine the model. Sorry for the late response, I got sick.
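
For anyone following along, here is a minimal sketch of the state_dict route, which also checks that the TransformerEncoderLayer weights really are part of what gets saved. TinyModel, its sizes, and the file name are hypothetical stand-ins, not the model from this thread. Note that the restored model is put in eval() mode before comparing, since dropout is active in training mode and would change the outputs between calls.

```python
import os
import tempfile

import torch
import torch.nn as nn


class TinyModel(nn.Module):
    """Hypothetical stand-in for the model discussed in this thread."""

    def __init__(self, d_model=16, nhead=4):
        super().__init__()
        self.transformer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.head = nn.Linear(d_model, 2)

    def forward(self, x):  # x: (batch, seq, feature)
        return self.head(self.transformer(x).mean(dim=1))


torch.manual_seed(0)
model = TinyModel().eval()  # eval() disables dropout
x = torch.randn(4, 5, 16)

# The encoder layer's parameters appear in the state_dict under "transformer."
saved_keys = list(model.state_dict().keys())
has_transformer_weights = any(k.startswith("transformer.") for k in saved_keys)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model_state.pt")  # hypothetical file name
    torch.save(model.state_dict(), path)

    # Loading a state_dict requires rebuilding the model first
    restored = TinyModel()
    restored.load_state_dict(torch.load(path))
    restored.eval()

with torch.no_grad():
    same_output = torch.allclose(model(x), restored(x), atol=1e-6)

print(has_transformer_weights, same_output)
```

If both checks pass on your side too, the saving step is probably not what changes the results.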

It has something to do with the nn.TransformerEncoderLayer(), because it works fine if I take this layer out.

Got it. What is the shape of your input, and are you using batch_first=True or not? One thing to make sure of is that you don’t have your batch and sequence dimensions mixed up in your implementation.

batch_first – If True, then the input and output tensors are provided as (batch, seq, feature). Default: False (seq, batch, feature).
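
To illustrate why this matters, here is a rough sketch (not the original poster's model; the sizes and variable names are made up). It feeds a (batch, seq, feature) tensor to two layers, one with the default batch_first=False and one with batch_first=True, and compares the output for the first two samples when they are evaluated inside a batch of 8 versus a batch of 2. With the default setting, the layer treats the batch dimension as the sequence, so samples attend to each other and the result depends on which samples share a batch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

d_model, nhead = 16, 4  # hypothetical sizes
batch, seq = 8, 5

# Data laid out as (batch, seq, feature)
x = torch.randn(batch, seq, d_model)


def batch_size_sensitivity(layer):
    """Max difference in the first two samples' outputs: batch of 8 vs batch of 2."""
    layer.eval()  # disable dropout so it does not add noise of its own
    with torch.no_grad():
        out_full = layer(x)[:2]   # evaluated together with 6 other samples
        out_small = layer(x[:2])  # evaluated on their own
    return (out_full - out_small).abs().max().item()


# Default batch_first=False: dim 0 is interpreted as the sequence dimension
mixed_up = nn.TransformerEncoderLayer(d_model, nhead)
# batch_first=True matches the (batch, seq, feature) layout of x
matching = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)

diff_mixed_up = batch_size_sensitivity(mixed_up)
diff_matching = batch_size_sensitivity(matching)
print(f"batch_first=False: {diff_mixed_up:.6f}")
print(f"batch_first=True:  {diff_matching:.6f}")
```

I would expect the first number to be clearly non-zero and the second to be zero up to floating-point noise. If your saved model behaves like the first case, the layout of the input is the likely culprit rather than the saving or loading.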

Thank you! This might be my problem.