The model trained in PyTorch produces inconsistent predictions for the same image when processed individually versus in a batch.

I am noticing a significant difference in model predictions when running inference on a single image versus on the whole dataset. The model, trained with PyTorch, gives drastically different predictions for the same image depending on whether it is processed individually or as part of a batch. Is there any way to ensure that the predictions for a given image are consistent in both cases?

How large are the relative and absolute errors?

For the single image it's [0.37732467 0.2642143 0.35846105],
and for that same image in a batch it's [0.3185594 0.40971586 0.2717247].
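
Worked out element-wise (plain NumPy arithmetic on the two vectors above), the errors are substantial:

import numpy as np

single = np.array([0.37732467, 0.2642143, 0.35846105])   # batch size 1
batched = np.array([0.3185594, 0.40971586, 0.2717247])   # batch size 320

abs_err = np.abs(single - batched)   # ~[0.059, 0.146, 0.087]
rel_err = abs_err / np.abs(single)   # ~[0.16, 0.55, 0.24] relative to the single-image output
print(abs_err)
print(rel_err)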

Could you post a minimal and executable code snippet reproducing these differences?

I am sharing the following code for training and predicting with my model. This is how the training process works:

  1. Training: The model is trained using the Trainer class from HuggingFace’s transformers library. The training arguments are set using TrainingArguments, and the model is trained on the train_dataset and evaluated on the val_dataset. After training, the model is saved.
  2. Prediction: Once the model is trained, I load the saved model and training arguments. Then, I use the Trainer class again to predict on the test_dataset (the load_model helper I use is sketched just after this list).
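
For completeness, the load_model helper used in step 2 is essentially a wrapper around from_pretrained; this is a sketch from memory, the exact helper differs only in details:

def load_model(model_save_path, model_class):
    # config_class on the model class lets from_pretrained parse the saved config.json
    model = model_class.from_pretrained(model_save_path)
    model.eval()  # inference mode (dropout off); Trainer.predict also does this
    return model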

Here’s the issue I’m encountering:

When I comment out the line training_args_loaded.per_device_eval_batch_size = 1, the evaluation batch size stays at the saved value of 320. When I leave it in, the batch size is 1. That change alone produces a noticeable difference in the predictions for the same image: batch size 1 gives different results than batch size 320.

Could you help me understand why this discrepancy occurs and suggest how I can ensure consistent predictions regardless of the batch size setting?
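
The discrepancy can also be checked against the raw model outputs, outside the Trainer, with something like the following (test_X here stands for the test features as a float32 tensor; the name is only for this snippet):

import torch

model.eval()  # make sure dropout is disabled
with torch.no_grad():
    single_logits = model(test_X[0:1])         # the first image alone
    batch_logits = model(test_X[0:320])[0:1]   # the same image inside a batch of 320
print(torch.max(torch.abs(single_logits - batch_logits)))  # ~0 if batching were consistent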

Below is the code:

from transformers import Trainer, TrainingArguments
from transformers import PreTrainedModel, PretrainedConfig, TrainerCallback
import torch
import torch.nn.functional as F
import numpy as np
from torch.utils.data import Dataset

numOfFeatures = 128

class SequenceDataset(Dataset):
    def __init__(self, X, y):
        self.X = torch.tensor(X, dtype=torch.float32)
        self.y = torch.tensor(y, dtype=torch.long)

    def __len__(self):
        return len(self.y)

    def __getitem__(self, idx):
        return {"input_ids": self.X[idx], "labels": self.y[idx]}

class SequenceConfig(PretrainedConfig):
    model_type = "sequence_transformer"

    def __init__(self, num_features=numOfFeatures, num_classes=3, d_model=1024, nhead=4, num_layers=4, dim_feedforward=512, **kwargs):
        self.num_features = num_features
        self.num_classes = num_classes
        self.d_model = d_model
        self.nhead = nhead
        self.num_layers = num_layers
        self.dim_feedforward = dim_feedforward
        super().__init__(**kwargs)

class SequenceTransformer(PreTrainedModel):
    config_class = SequenceConfig

    def __init__(self, config):
        super().__init__(config)
        self.embedding = torch.nn.Linear(config.num_features, config.d_model)
        # learned offset of shape (1, d_model), broadcast over the batch
        self.positional_encoding = torch.nn.Parameter(torch.zeros(1, config.d_model))
        encoder_layer = torch.nn.TransformerEncoderLayer(d_model=config.d_model, nhead=config.nhead, dim_feedforward=config.dim_feedforward, batch_first=True)
        self.transformer_encoder = torch.nn.TransformerEncoder(encoder_layer, num_layers=config.num_layers)
        self.fc = torch.nn.Linear(config.d_model, config.num_classes)

    def forward(self, input_ids, labels=None):
        src = self.embedding(input_ids) + self.positional_encoding
        output = self.transformer_encoder(src)
        logits = self.fc(output)
        probs = F.softmax(logits, dim=-1)

        loss = None
        if labels is not None:
            loss_fct = torch.nn.CrossEntropyLoss()
            loss = loss_fct(logits, labels)

        return {"loss": loss, "logits": logits, "probs": probs} if loss is not None else logits

Training Code

config = SequenceConfig()
model = SequenceTransformer(config)
metrics=[]

# Training Arguments
batchSize = 32
numWarmUpSteps = int(np.shape(train_image)[0] / batchSize / numOfBreakpointsPerEpoch / 10)  # train_image and numOfBreakpointsPerEpoch are defined earlier in my script
training_args = TrainingArguments(
    output_dir=path,
    num_train_epochs=1, 
    per_device_train_batch_size=batchSize,
    per_device_eval_batch_size=320,
    warmup_steps=numWarmUpSteps,
    weight_decay=0.1,
    logging_strategy='no',
    eval_strategy="epoch",
    save_strategy="epoch",
    metric_for_best_model="accuracy",
    save_only_model=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics,
    callbacks=[],
)

train_output = trainer.train()

Prediction Code

training_args_loaded = torch.load(path + "\\SavedModels\\training_args.bin")
model_save_path = path + "\\SavedModels\\"
model = load_model(model_save_path, SequenceTransformer)

training_args_loaded.per_device_eval_batch_size=1

trainer = Trainer(model=model, compute_metrics=compute_metrics, args=training_args_loaded)

testPredictions = trainer.predict(test_dataset)  # test_dataset is a SequenceDataset like train/val; predict needs a Dataset, not a raw tensor
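
To make the comparison concrete, this is roughly the loop I use to produce the numbers posted above; because forward returns probs alongside logits when labels are present, Trainer gathers them as the second element of predictions:

for bs in (1, 320):
    training_args_loaded.per_device_eval_batch_size = bs
    trainer = Trainer(model=model, compute_metrics=compute_metrics, args=training_args_loaded)
    preds = trainer.predict(test_dataset).predictions
    # preds is (logits, probs) when the dataset has labels, otherwise raw logits
    probs = preds[1] if isinstance(preds, tuple) else torch.softmax(torch.tensor(preds), dim=-1).numpy()
    print(bs, probs[0])  # probability vector for the first test image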