Model inference memory crash

When I try to run class prediction on the test dataset, my laptop starts freezing. I suspect there is a memory leak somewhere in my code.

Code below

test_tr = transforms.Compose([
        transforms.Resize((IMG_SHAPE, IMG_SHAPE)),

class data_test(Dataset):
    def __init__(self, path):
        self.path = path
        self.names = list(os.listdir(path))

    def __len__(self):
        return len(self.names)

    def __getitem__(self, ind):
        img_name = self.names[ind]
        x = imread(os.path.join(self.path, str(img_name)))
        if len(x.shape) == 2 or x.shape[-1] == 1:
            x = gray2rgb(x)
        x = test_tr(x)
        return x, img_name

def infer(model_path, test_img_dir):
    dataset = data_test(test_img_dir)
    loader = DataLoader(dataset, 16, num_workers=1)
    #model = EfficientNet.from_pretrained('efficientnet-b1', num_classes=50)
    model = torch.load(model_path, map_location=torch.device('cpu'))
    res = {}
    for x, y in loader:
        out = model(x)
        for i, bird in enumerate(out):
            res[y[i]] = torch.argmax(bird).item()
    return res

The loaded model is from the EfficientNet library, so I didn’t mess up the model class.
When I run infer, it gets through several batches (1-3 iterations of the loader) and then the laptop freezes. Where is the weak spot in the code?

Have you tried different num_workers for the DataLoader?
num_workers=0 has been the most stable one from personal experience.

Otherwise I don’t see anything wrong with your code. Maybe the model is large and simply performs very slowly on CPU.
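
If memory is the concern, one thing that is generally safe to try (an assumption on my part, not something verified on your model) is running inference in eval mode under torch.no_grad(), so autograd does not record a graph for each batch. A minimal sketch, with a hypothetical tiny model and dummy batches standing in for your EfficientNet and loader:

```python
import torch
from torch import nn


def predict_classes(model, batches):
    """Return the argmax class index for every sample in every batch."""
    model.eval()  # use inference behaviour for dropout / batch norm
    preds = []
    with torch.no_grad():  # no autograd graph is recorded inside this block
        for x in batches:
            out = model(x)
            preds.extend(torch.argmax(out, dim=1).tolist())
    return preds


# Hypothetical stand-in model and data, only to make the sketch runnable.
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 50))
toy_batches = [torch.zeros(4, 3, 8, 8) for _ in range(2)]
toy_preds = predict_classes(toy_model, toy_batches)
```

The same two lines (model.eval() and the torch.no_grad() block) could wrap the loop in infer without changing what it returns.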

Wow, the problem really was num_workers=1. Can you explain why, please?

In my understanding, num_workers=1 loads the data in a separate worker process, while num_workers=0 loads it in the main process.
Using worker processes means that you need to run the majority of your training loop inside an if __name__ == '__main__': block, because of how Python starts child processes (on Windows and macOS the workers re-import the main script).
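
As a rough sketch of that layout: ToyImages and collect_names below are hypothetical names, and the dataset returns dummy tensors instead of reading files, so this only illustrates where the guard goes, not your actual pipeline.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyImages(Dataset):
    """Hypothetical stand-in for data_test: dummy tensors plus fake file names."""

    def __init__(self, n=8):
        self.names = [f"img_{i}.jpg" for i in range(n)]

    def __len__(self):
        return len(self.names)

    def __getitem__(self, ind):
        # A real dataset would read and transform the image file here.
        return torch.zeros(3, 8, 8), self.names[ind]


def collect_names(num_workers):
    # num_workers=0 loads in the main process; num_workers>0 starts worker processes.
    loader = DataLoader(ToyImages(), batch_size=4, num_workers=num_workers)
    seen = []
    for x, names in loader:
        seen.extend(names)
    return seen


if __name__ == "__main__":
    # With the "spawn" start method (the default on Windows and macOS) each worker
    # re-imports this file, so the code that drives the loader sits behind the guard.
    print(len(collect_names(num_workers=1)))
```

Definitions (the dataset class, helper functions) stay at module level so workers can import them; only the code that actually starts the loop goes under the guard.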

The worker handling in DataLoader seems to be a bit sensitive as well. On a 16-thread system, I am only able to run the DataLoader with 6-8 workers. Using more has previously crashed training randomly.