I trained a neural network model that contains a TransformerEncoderLayer. I saved the model, but when I load it and evaluate it, the accuracy changes depending on the batch size. Why is this?
Can you show a simple example (with some reproducible code) that illustrates this?
Any time I use nn.TransformerEncoderLayer in any way with a saved model, I get different results if the data is in a different order. Is there a way to save the encode table? That would be in the MultiheadAttention part of the TransformerEncoderLayer, right?
I’m somewhat new to Transformers.
edit:
I just use TransformerEncoderLayer, save the model, and then shuffle the input data with np.random.permutation(). This always gives me different results unless I use the same order every time.
I have this layer in my model like this: self.transformer = nn.TransformerEncoderLayer()
and I save the model like so:
torch.save(model, path)
Does this not save the nn.TransformerEncoderLayer() or something?
Typically one saves the state_dict, per the instructions:
torch.save(model.state_dict(), path)
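A minimal sketch of that save and load round trip, in case it helps rule out the saving step. TinyModel and its sizes are hypothetical stand-ins for the real model, and an in-memory buffer is used in place of path so the snippet runs on its own.

```python
import io

import torch
import torch.nn as nn


class TinyModel(nn.Module):
    """Hypothetical stand-in for the poster's model (sizes are assumptions)."""

    def __init__(self):
        super().__init__()
        self.transformer = nn.TransformerEncoderLayer(d_model=16, nhead=4)

    def forward(self, x):
        return self.transformer(x)


torch.manual_seed(0)
model = TinyModel()

# Save only the parameters; a BytesIO buffer stands in for `path` here.
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
buffer.seek(0)

# Loading a state_dict requires re-creating the model class first.
restored = TinyModel()
restored.load_state_dict(torch.load(buffer))

# eval() turns off dropout, which is active by default in training mode.
model.eval()
restored.eval()

x = torch.randn(5, 2, 16)  # (seq, batch, feature), the default layout
with torch.no_grad():
    same = torch.allclose(model(x), restored(x))

# The attention weights live under the layer's name in the state_dict.
saved_keys = list(model.state_dict().keys())
print(same, "transformer.self_attn.in_proj_weight" in saved_keys)
```

If the restored model matches the original on the same input, the transformer weights were saved, and the difference is more likely to come from how the data is fed in. Saving the whole module with torch.save(model, path) pickles a reference to the class as well, so the class definition still has to be importable at load time.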
They both do the same thing; my way just means you don't have to redefine the model. Sorry for the late response, I got sick.
It has something to do with nn.TransformerEncoderLayer(), because it works fine if I take this layer out.
Got it. What is the shape of your input, and are you using batch_first=True or not? Basically, one thing to make sure of is that you don’t have your batch and sequence dimensions mixed up in your implementation.
batch_first – If True, then the input and output tensors are provided as (batch, seq, feature). Default: False (seq, batch, feature).
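A rough way to check for this mix-up, assuming the data is laid out as (batch, seq, feature): compare the output for one sample when it is evaluated alone and when it is evaluated inside a larger batch. The sizes and the helper name first_sample_is_stable are made up for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_model, nhead = 16, 4  # assumed sizes, for illustration only

# Data laid out as (batch, seq, feature): 8 samples, 5 tokens each.
data = torch.randn(8, 5, d_model)


def first_sample_is_stable(layer):
    """Return True if slice 0 of the output is the same alone and in the full batch."""
    layer.eval()  # disable dropout so the comparison is deterministic
    with torch.no_grad():
        in_full_batch = layer(data)[0]
        alone = layer(data[:1])[0]
    return torch.allclose(in_full_batch, alone, atol=1e-4)


# Layout matches the data: attention runs over the 5 tokens of each sample.
matching = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)

# Default batch_first=False: dim 0 is treated as the sequence, so attention
# runs across the 8 samples and each output depends on its batch neighbours.
mixed_up = nn.TransformerEncoderLayer(d_model, nhead)

stable_matching = first_sample_is_stable(matching)
stable_mixed_up = first_sample_is_stable(mixed_up)
print(stable_matching, stable_mixed_up)
```

When the layer's layout matches the data, a sample's output should not depend on what else is in the batch. When the dimensions are swapped, the layer attends across samples, which would explain results that change with batch size or data order.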
Thank you! This might be my problem.