Even when using model.eval(), I get different predictions when changing the batch size. I first ran into this issue in a project using Faster R-CNN with my own data, but I can replicate it with the "TorchVision Object Detection Finetuning Tutorial" (PyTorch Tutorials 1.9.0+cu102 documentation), which uses Mask R-CNN.
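For context, the inference pattern in my own project is essentially the standard torchvision one sketched below; the actual model is fine-tuned on my own classes, so the pretrained weights and the dummy images here are only placeholders to show the call signature:

import torch
import torchvision

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Plain pretrained Faster R-CNN, only to illustrate how I run inference;
# in my project the model is fine-tuned on custom data.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True).to(device)
model.eval()

# Detection models take a list of CHW float tensors in [0, 1]
images = [torch.rand(3, 400, 500, device=device) for _ in range(2)]
with torch.no_grad():
    outputs = model(images)  # one dict with 'boxes', 'scores', 'labels' per image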
Steps to replicate the issue:
- Open the Colab version of the tutorial (Google Colaboratory)
- Run all cells
- Insert a new cell at the bottom with the code below and run it:
def get_device():
    if torch.cuda.is_available():
        return torch.device('cuda')
    else:
        return torch.device('cpu')


def predict(model, image_tensors):
    """
    Generate the model's predictions (bounding boxes, scores and labels) for a
    batch of image tensors.
    """
    model.eval()
    with torch.no_grad():
        predictions = model([x.to(get_device()) for x in image_tensors])
    return predictions


def generate_preds(model, batch_size):
    """
    Create a dataloader for the test dataset with a configurable batch size,
    generate predictions and return a list of predictions per sample.
    """
    # dataset_test and utils come from the earlier cells of the tutorial notebook
    dataloader = torch.utils.data.DataLoader(
        dataset_test, batch_size=batch_size, shuffle=False, num_workers=2,
        collate_fn=utils.collate_fn)
    all_pred = []
    for batch in dataloader:
        image_tensors, targets = batch
        predictions = predict(model, image_tensors)
        all_pred += predictions
    return all_pred


# Generate two sets of predictions; the only change is the batch size
preds1 = generate_preds(model, 1)
preds8 = generate_preds(model, 8)
assert len(preds1) == len(preds8)

# Inspect the first five samples:
for x in range(5):
    print(f"\nSample {x}:")
    print("-Boxes")
    print(preds1[x]["boxes"])
    print(preds8[x]["boxes"])
    print("-Scores")
    print(preds1[x]["scores"])
    print(preds8[x]["scores"])
    print("-Labels")
    print(preds1[x]["labels"])
    print(preds8[x]["labels"])
The code above generates two sets of predictions for the test set: the first with batch size 1, the second with batch size 8. This is the output I get when I run that cell:
Sample 0:
-Boxes
tensor([[ 61.2343, 37.6461, 197.8525, 325.6508],
[276.4769, 23.9664, 290.8987, 73.1913]], device='cuda:0')
tensor([[ 59.1616, 36.3829, 201.7858, 331.4406],
[276.4261, 23.7988, 290.8489, 72.8123],
[ 81.2091, 37.6342, 192.8113, 217.8009]], device='cuda:0')
-Scores
tensor([0.9989, 0.5048], device='cuda:0')
tensor([0.9988, 0.6410, 0.1294], device='cuda:0')
-Labels
tensor([1, 1], device='cuda:0')
tensor([1, 1, 1], device='cuda:0')
Sample 1:
-Boxes
tensor([[ 90.7305, 60.1291, 232.4859, 341.7854],
[245.7694, 56.3715, 305.2585, 349.5301],
[243.0723, 16.5198, 360.2888, 351.5983]], device='cuda:0')
tensor([[ 91.1201, 59.8146, 233.0968, 342.2685],
[245.7369, 56.6024, 305.2173, 349.3939],
[241.1119, 32.6983, 362.4162, 346.0358]], device='cuda:0')
-Scores
tensor([0.9976, 0.9119, 0.1945], device='cuda:0')
tensor([0.9975, 0.9128, 0.1207], device='cuda:0')
-Labels
tensor([1, 1, 1], device='cuda:0')
tensor([1, 1, 1], device='cuda:0')
Sample 2:
-Boxes
tensor([[281.1774, 53.5141, 428.7436, 330.3915],
[139.6456, 23.7953, 264.7703, 330.2114]], device='cuda:0')
tensor([[281.7463, 53.2942, 429.3290, 327.9640],
[138.7147, 23.8612, 264.6823, 332.3202]], device='cuda:0')
-Scores
tensor([0.9969, 0.9947], device='cuda:0')
tensor([0.9968, 0.9945], device='cuda:0')
-Labels
tensor([1, 1], device='cuda:0')
tensor([1, 1], device='cuda:0')
Sample 3:
-Boxes
tensor([[175.3683, 34.3320, 289.3029, 306.8307],
[ 76.7871, 15.4444, 187.0855, 299.1662],
[ 0.0000, 45.9045, 51.3796, 222.0583],
[319.1224, 53.0593, 377.1693, 232.7251],
[260.2587, 55.8976, 309.0191, 229.4261],
[ 70.2029, 27.2173, 126.4584, 234.3767],
[ 38.0638, 55.5370, 65.4132, 164.1965],
[ 98.7189, 91.5356, 172.5915, 295.5404],
[ 70.1933, 56.1804, 103.6161, 218.4743]], device='cuda:0')
tensor([[175.1848, 36.0377, 288.8358, 305.3505],
[ 76.8171, 15.7485, 187.4645, 299.5779],
[ 0.0000, 45.9045, 51.3796, 222.0582],
[319.1060, 53.0140, 377.3391, 232.7926],
[260.2587, 55.8976, 309.0191, 229.4261],
[ 70.2030, 27.2173, 126.4584, 234.3767],
[ 38.0638, 55.5370, 65.4132, 164.1965],
[ 70.1933, 56.1804, 103.6161, 218.4743]], device='cuda:0')
-Scores
tensor([0.9968, 0.9959, 0.9942, 0.9937, 0.9271, 0.8133, 0.4273, 0.1163, 0.0884],
device='cuda:0')
tensor([0.9974, 0.9965, 0.9942, 0.9937, 0.9271, 0.8133, 0.4273, 0.0884],
device='cuda:0')
-Labels
tensor([1, 1, 1, 1, 1, 1, 1, 1, 1], device='cuda:0')
tensor([1, 1, 1, 1, 1, 1, 1, 1], device='cuda:0')
Sample 4:
-Boxes
tensor([[318.0241, 60.4089, 450.3268, 348.4254],
[167.0622, 27.6761, 242.5035, 316.6244],
[221.8452, 26.9947, 310.0547, 291.2983],
[295.6860, 23.4690, 379.8831, 260.1526],
[140.3205, 44.4713, 223.6427, 281.9173],
[141.0462, 24.9851, 313.7406, 301.5022],
[252.8210, 28.4908, 358.8223, 261.0169]], device='cuda:0')
tensor([[317.8378, 63.2861, 450.5063, 350.6856],
[167.0629, 27.6768, 242.5045, 316.6241],
[221.8452, 26.9948, 310.0548, 291.2983],
[295.6860, 23.4690, 379.8831, 260.1525],
[142.1777, 24.9079, 313.1906, 302.9822],
[140.3205, 44.4713, 223.6428, 281.9174],
[252.8209, 28.4907, 358.8222, 261.0172]], device='cuda:0')
-Scores
tensor([0.9969, 0.9948, 0.9910, 0.9733, 0.1821, 0.1696, 0.0668],
device='cuda:0')
tensor([0.9968, 0.9948, 0.9910, 0.9733, 0.1832, 0.1821, 0.0668],
device='cuda:0')
-Labels
tensor([1, 1, 1, 1, 1, 1, 1], device='cuda:0')
tensor([1, 1, 1, 1, 1, 1, 1], device='cuda:0')
As far as I know, the predictions for each sample should be identical regardless of whether they were produced with batch size 1 or batch size 8. However, there are differences in the scores, the bounding boxes, and even the number of detections…
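For a more quantitative check than eyeballing the printed tensors, this is roughly the comparison I run on the two sets of predictions. It assumes preds1 and preds8 from the cell above, compares detections by index (so it is only meaningful when both runs return the same number of boxes in the same order), and just reports samples where the counts differ:

def report_diffs(preds_a, preds_b):
    """Print the largest per-sample difference between two prediction sets."""
    for i, (a, b) in enumerate(zip(preds_a, preds_b)):
        if len(a["scores"]) != len(b["scores"]):
            print(f"Sample {i}: different number of detections "
                  f"({len(a['scores'])} vs {len(b['scores'])})")
            continue
        box_diff = (a["boxes"] - b["boxes"]).abs().max().item()
        score_diff = (a["scores"] - b["scores"]).abs().max().item()
        print(f"Sample {i}: max box diff {box_diff:.4f}, "
              f"max score diff {score_diff:.4f}")

report_diffs(preds1, preds8)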
Any help would be appreciated.