I have recently been trying to fit a model for cat/dog recognition and noticed some strange behaviour. At the end of every training epoch I ran validation. When the shuffle flag in my validation DataLoader was set to True, the validation loss was close to the training loss. However, when I switched shuffle to False, the validation loss suddenly became much worse (though still better than random guessing). During my investigation I ran into a simpler problem (at least in terms of a minimal example), which I describe below.
I have trained a model and now I’m trying to use it for prediction. The code is below.
import numpy as np
import pandas as pd
from PIL import Image
from typing import Callable, List
from pathlib import Path
from torchvision import transforms
from torch.utils.data import Dataset, DataLoader
from torch.nn.functional import softmax
import torch
from imgrec.models import get_model, LoadParams
from imgrec.utils import ProblemType


class ImageBag(Dataset):
    def __init__(self, paths: List[str], transform: Callable):
        self.paths = paths
        self.transform = transform

    def __getitem__(self, i):
        img = Image.open(str(self.paths[i])).convert('RGB')
        return self.transform(img), self.paths[i]

    def __len__(self):
        return len(self.paths)


def load_model():
    ...  # skipped for brevity


def main():
    transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])
    paths = [
        "/mnt/ml-team/homes/grzegorz.los/cats_and_dogs/valid/cat/cat.10114.jpg",
        "/mnt/ml-team/homes/grzegorz.los/cats_and_dogs/valid/cat/cat.5516.jpg"
    ]
    img_bag = ImageBag(paths, transform=transform)
    loader = DataLoader(img_bag,
                        batch_size=2,
                        shuffle=False,
                        num_workers=0)
    model = load_model()
    model = model.cuda()
    for _ in range(5):  # repeat prediction a few times
        for batch in loader:
            with torch.no_grad():
                im_tensor, paths_tuple = batch
                im_tensor = im_tensor.cuda()
                logits_tensor = model(im_tensor)
                probs = softmax(logits_tensor, dim=1)
                for path, prob in zip(paths_tuple, probs):
                    print(Path(path).stem, prob.cpu().detach().numpy())
        print('-'*50)


if __name__ == '__main__':
    main()
What the code does is essentially:
- prepare a data loader of two images,
- load a model,
- use the model to predict labels of these two images a few times (in a loop).
And this is the output I received:
cat.10114 [0.96862036 0.03137956]
cat.5516 [0.7262352 0.27376482]
--------------------------------------------------
cat.10114 [0.9201531 0.0798469]
cat.5516 [0.8188935 0.18110651]
--------------------------------------------------
cat.10114 [0.97118205 0.02881794]
cat.5516 [0.92866737 0.07133257]
--------------------------------------------------
cat.10114 [0.949634 0.05036597]
cat.5516 [0.8162648 0.1837352]
--------------------------------------------------
cat.10114 [0.95325416 0.04674589]
cat.5516 [0.8434967 0.15650335]
--------------------------------------------------
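To put a number on the variation, here is a small sketch that takes the first-column (cat) probabilities copied from the five runs printed above and computes the spread for each image. The helper name `spread` is only for illustration.

```python
# Cat probabilities (first column) copied from the five printed runs above.
cat_probs = {
    "cat.10114": [0.96862036, 0.9201531, 0.97118205, 0.949634, 0.95325416],
    "cat.5516": [0.7262352, 0.8188935, 0.92866737, 0.8162648, 0.8434967],
}


def spread(values):
    """Difference between the largest and smallest value in a list."""
    return max(values) - min(values)


# Spread of the predicted cat probability per image across the five runs.
spreads = {name: spread(values) for name, values in cat_probs.items()}

for name, values in cat_probs.items():
    print(f"{name}: min={min(values):.4f} max={max(values):.4f} "
          f"spread={spreads[name]:.4f}")
```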
The predicted probability of a cat varies from 0.92 to 0.97 for one image and from 0.73 to 0.93 for the other. I realize that computation on a GPU can involve some nondeterminism, but a difference of ~0.2 suggests a bigger issue.
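One possible cause worth ruling out (an assumption, since `load_model` is not shown above) is that the model is still in training mode during prediction, in which case layers such as dropout remain stochastic. Below is a minimal sketch of how to check for this. The network `toy_model` and the helper `predict_twice` are hypothetical stand-ins, not the actual model, and everything runs on the CPU.

```python
import torch
from torch import nn

torch.manual_seed(0)  # fixed seed so the sketch is reproducible

# Toy stand-in for the real network (hypothetical; the actual model is not shown).
toy_model = nn.Sequential(
    nn.Linear(8, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(64, 2),
)
x = torch.randn(2, 8)  # a fake batch of two inputs


def predict_twice(model, inputs):
    """Run two forward passes on the same inputs and return both outputs."""
    with torch.no_grad():
        return model(inputs), model(inputs)


toy_model.train()  # training mode: dropout randomly zeroes activations
train_a, train_b = predict_twice(toy_model, x)
print("train mode, identical outputs:", torch.equal(train_a, train_b))

toy_model.eval()  # evaluation mode: dropout becomes a no-op
eval_a, eval_b = predict_twice(toy_model, x)
print("eval mode, identical outputs:", torch.equal(eval_a, eval_b))
```

In training mode the two passes should differ, while in evaluation mode they should match. Note that `torch.no_grad()` only disables gradient tracking and does not switch layers to evaluation mode. If the real model behaves like this toy one, calling `model.eval()` before the prediction loop would be the thing to try.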
I will be grateful for any advice.
torch==0.4.0
torchvision==0.2.1