Assume we train a model using X of shape `(batch_size, num_feature)` and Y of shape `(batch_size, output_num)`, then use it to predict some test input. Sometimes `model(X)[-n:]` and `model(X[-n:])` give different results. For example:

```
import torch
from torch.utils.data import TensorDataset, DataLoader
from sklearn.datasets import fetch_california_housing
from sklearn.preprocessing import MinMaxScaler

torch.manual_seed(42)

# load the data
X, y = fetch_california_housing(return_X_y=True)

# normalize the data
scaler_x = MinMaxScaler()
scaler_y = MinMaxScaler()
X = scaler_x.fit_transform(X)
y = scaler_y.fit_transform(y.reshape(-1, 1))
X = torch.tensor(X, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.float32)
print(X.shape, y.shape)

train = TensorDataset(X, y)
train_loader = DataLoader(train, batch_size=64, shuffle=False)

# define the model
model = torch.nn.Sequential(
    torch.nn.Linear(X.shape[1], 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, y.shape[1]),
    # torch.nn.Sigmoid()
)

# define the loss function and the optimizer
criterion = torch.nn.MSELoss(reduction='mean')
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# train the model
for epoch in range(10):
    for x_batch, y_batch in train_loader:
        # forward pass: compute predicted y
        y_pred = model(x_batch)
        # compute loss
        loss = criterion(y_pred, y_batch)
        # backward pass
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

model.eval()

# test the model
n = 3
y_pred_whole = model(X).detach().numpy().flatten()
y_pred_tail = model(X[-n:]).detach().numpy().flatten()
print(y_pred_whole[-n:] == y_pred_tail)
print(y_pred_whole[-n:] - y_pred_tail)
```

The output on my computer (torch version 1.12.1):

```
torch.Size([20640, 8]) torch.Size([20640, 1])
[False False False]
[7.450581e-09 7.450581e-09 7.450581e-09]
```

A gap of around 1e-09 lies between the two prediction batches, yet both are predictions for the last 3 samples of the training data, so they are supposed to be identical. I know it may just be a float-precision issue (see IEEE 754), but it's interesting that the error changes with different `n` values (last 1, 2, 3, 4, … samples) or different runtime environments (running the sample code on Colab with version 2.1.0+cu118, the difference becomes `[-1.4901161e-08 0.0000000e+00 0.0000000e+00]`). I never came across the same problem when using Keras, because the batch size is fixed when building the model. Does that mean we also have to fix the batch size of inputs when sharing a trained/pre-trained model with others in PyTorch?
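For reference, here is a minimal sketch of the same effect without any training involved, just a single matrix multiply (the names `W`, `full`, `tail` are mine, not from the code above). Depending on the backend, the BLAS kernel may block and vectorize the two matrix shapes differently, so bitwise equality can fail, while a tolerance-based comparison via `torch.allclose` treats the results as equal:

```python
import torch

torch.manual_seed(0)

# a toy "layer": one matrix multiply, no training involved
W = torch.randn(8, 64)
X = torch.randn(20640, 8)

full = (X @ W)[-3:]   # run the whole batch, then slice the last 3 rows
tail = X[-3:] @ W     # slice first, then run only the last 3 rows

# bitwise equality may or may not hold, depending on how the
# backend processes each matrix shape
print(torch.equal(full, tail))

# a tolerance-based comparison reports the results as equal
print(torch.allclose(full, tail, rtol=1e-5, atol=1e-8))
```

So comparing predictions with a tolerance (`torch.allclose` / `numpy.allclose`) rather than `==` seems to be the appropriate check here.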