Rookie question: how to speed up data loading in PyTorch

Hi, I was trying some preliminary models on a huge single data file (~150 GB; each row represents a data point) that cannot be loaded into memory all at once. I therefore wrote the data iterator below for batch-wise loading and training, but found that loading is very slow: with a batch size of 1024 it takes ~0.9 s per batch, making data loading the bottleneck of the experiment. It seems that torch.utils.data.DataLoader is preferable for datasets where each data point can be accessed efficiently, and is not suitable for this case.

I was wondering if anyone could recommend ways to improve the loading efficiency. Many thanks in advance.

import json

import numpy as np


def data_iterator(data_path, batch_size):
    """Yield (X, y) batches from a tab-separated file too large for memory."""
    print("Loading data in {}".format(data_path))
    count = 0
    X, y = [], []
    # Open in text mode ('r'), so split('\t') operates on str, not bytes.
    with open(data_path, 'r') as fr:
        for line in fr:
            items = line.strip().split('\t')
            label = int(items[2])
            trip = json.loads(items[5])
            for frame in trip:
                count += 1
                X.append(np.asarray(frame).reshape(1, 200, 50))
                y.append(label)
                if count % batch_size == 0:
                    yield X, y
                    X, y = [], []  # fresh lists for the next batch
    if X:
        yield X, y  # emit the final, possibly smaller batch

Try to (safely) use multiple workers for loading. The torch DataLoader already has built-in support for loading with multiple worker processes and might prove useful. If you train on a GPU, loading and pre-processing the data in main memory on the CPU adds almost no overhead, since it is hidden by the GPU computation (i.e. while an iteration runs on the GPU, the CPU loads the next batch).
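For instance, a minimal sketch (the TensorDataset below is just a toy stand-in for a real dataset, and a GPU is assumed for the .cuda() calls):

import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy data shaped like the frames in the question (1 x 200 x 50).
dataset = TensorDataset(torch.randn(10000, 1, 200, 50),
                        torch.randint(0, 2, (10000,)))

# num_workers > 0 loads batches in background worker processes;
# pin_memory=True speeds up the later host-to-device copies.
loader = DataLoader(dataset, batch_size=1024, shuffle=True,
                    num_workers=4, pin_memory=True)

for X, y in loader:
    X = X.cuda(non_blocking=True)  # copy can overlap GPU compute
    y = y.cuda(non_blocking=True)
    # ... forward / backward / optimizer step ...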

When it comes to large datasets that won't fit in RAM, a memory-mapped database can help. Perhaps try storing your (processed) data in an LMDB (Lightning Memory-Mapped Database) instance.

EDIT: If you save your data in a database, using DataLoader will be much easier. Large text files are hard to work with, since they are read line by line for efficiency and random access to a particular line is slow.
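For example, a rough one-off conversion sketch (untested; the field indices and reshape mirror the code in the question, while the key scheme, the stored __len__ record, the float32 dtype, and the map_size are just one possible choice):

import json
import pickle

import lmdb
import numpy as np


def build_lmdb(data_path, lmdb_path, map_size=200 * 2**30):
    # map_size is an upper bound on the database size (here ~200 GB).
    env = lmdb.open(lmdb_path, map_size=map_size)
    idx = 0
    with env.begin(write=True) as txn, open(data_path, 'r') as fr:
        for line in fr:
            items = line.strip().split('\t')
            label = int(items[2])
            for frame in json.loads(items[5]):
                arr = np.asarray(frame, dtype=np.float32).reshape(1, 200, 50)
                txn.put(str(idx).encode(), pickle.dumps((arr, label)))
                idx += 1
        txn.put(b'__len__', str(idx).encode())  # record count for later use
    env.close()

Once the data is in LMDB, each record can be fetched by key in roughly constant time, which is exactly the access pattern DataLoader needs.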


Hi Demtris,

Thank you very much for your detailed answer!

As you said, we are reading a large text file that won't fit in RAM, and I will definitely try the LMDB you recommended and see what happens.

I understand your point about loading on the CPU adding almost no overhead while training on the GPU, and I am curious whether there is a simple way to write a data loader that loads data independently (e.g., using a queue of pre-loaded batches, as is done in TensorFlow).

This is exactly what DataLoader does if you set num_workers > 0 (the name DataLoader is unfortunate in my opinion, since it is really an iterator).

To use DataLoader in your case, you will need to implement your own dataset and pass it to the DataLoader when you create it. Your dataset will be a class that implements __getitem__(self, index) (and __len__(self)). __getitem__ will load and return a single data point from your database (or from wherever else you choose to load your data). You could even have it read from a text file, but you might run into problems with that when using multiple workers; see the sketch below.
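As a rough sketch (untested), assuming the LMDB layout from the conversion example above, i.e. keys are stringified indices, values are pickled (frame, label) pairs, and a b'__len__' record stores the count:

import pickle

import lmdb
import torch
from torch.utils.data import Dataset


class LMDBDataset(Dataset):
    def __init__(self, lmdb_path):
        # NOTE: with num_workers > 0 you may want to open the environment
        # lazily inside each worker, since LMDB handles are not fork-safe.
        self.env = lmdb.open(lmdb_path, readonly=True, lock=False)
        with self.env.begin() as txn:
            self.length = int(txn.get(b'__len__'))

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        with self.env.begin() as txn:
            frame, label = pickle.loads(txn.get(str(index).encode()))
        return torch.from_numpy(frame), label

You would then create the loader with something like DataLoader(LMDBDataset('trips.lmdb'), batch_size=1024, shuffle=True, num_workers=4), where 'trips.lmdb' is a hypothetical path.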

See the example dataset linked here, which loads images from directories.

Got it~~ Thank you very much for your patient answer. Since a text file is usually read line by line, it seems difficult for a single-threaded DataLoader to do simple operations like shuffling, correct? I will try the LMDB solution and share my feedback soon. Thanks again :)

Hello, have you found an efficient way to load the data? Thank you~