I have a fine-tuned model and want to apply it to unlabeled images. The images are located in one directory with several subfolders, and each subfolder can contain further subfolders, so I have to collect them recursively.
I have around 23,000 images to run a binary classification on. I thought it would be more efficient to load the data with a DataLoader into my network rather than loading the images one after another.
If an image gets classified as True (1), I want to copy the original image into another folder.
To get the data I am using os.walk().
rootpath = <mypath>
paths = []
for subdir, dirs, files in os.walk(rootpath):
    for file in files:
        # print(os.path.join(subdir, file))
        paths.append(os.path.join(subdir, file))
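As an aside, the same recursive walk can be written more compactly with pathlib. This is just a sketch; the extension filter is a placeholder you would adapt to your data:

```python
from pathlib import Path

def collect_image_paths(root):
    """Recursively collect image file paths under root.

    rglob("*") descends into all nested subfolders, like os.walk.
    The extension set below is an example filter, not a fixed list.
    """
    return [str(p) for p in Path(root).rglob("*")
            if p.suffix.lower() in {".png", ".jpg", ".jpeg"}]
```

Called as `paths = collect_image_paths(rootpath)`, it returns the same flat list of file paths.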
How can I bring this into a PyTorch Dataset so I can feed it into a DataLoader?
In the __init__ method you can create any structure that stores the paths to all images, as long as it can be indexed,
e.g. a list of image paths ['/img/1.png', '/img/2.png', ...].
Then in the __getitem__ method you load the image corresponding to index x.
(Add parameters to __init__ if you need them.)
from torch.utils.data import Dataset

class DataClass(Dataset):
    def __init__(self):
        self.list_of_paths = ...  # Here create the list of all image paths

    def __len__(self):
        return len(self.list_of_paths)

    def __getitem__(self, x):
        image_path = self.list_of_paths[x]  # Gives the path to one image
        image = ...  # Here load the image at image_path
        return image
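Putting the pieces together, here is one possible end-to-end sketch that combines the recursive os.walk collection with the Dataset skeleton. The root directory, the output folder, the extension filter, and the model and 0.5 threshold in the usage comment are all placeholders, not part of the original code; returning the path alongside the tensor is one way to know which file to copy after classification:

```python
import os
import shutil

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset, DataLoader


class UnlabeledImageDataset(Dataset):
    """Recursively collects image files under a root directory and returns
    (tensor, path) pairs, so the original path is available for copying."""

    EXTENSIONS = (".png", ".jpg", ".jpeg", ".bmp")  # example filter

    def __init__(self, root):
        self.list_of_paths = []
        for subdir, dirs, files in os.walk(root):
            for file in files:
                if file.lower().endswith(self.EXTENSIONS):
                    self.list_of_paths.append(os.path.join(subdir, file))

    def __len__(self):
        return len(self.list_of_paths)

    def __getitem__(self, x):
        image_path = self.list_of_paths[x]
        image = Image.open(image_path).convert("RGB")
        # HWC uint8 -> CHW float in [0, 1]; swap in your own transforms here
        tensor = torch.from_numpy(np.array(image)).permute(2, 0, 1).float() / 255.0
        return tensor, image_path


# Usage sketch -- `model`, both directories, and the 0.5 threshold
# are placeholders for your own setup:
#
# dataset = UnlabeledImageDataset("/data/unlabeled")
# loader = DataLoader(dataset, batch_size=64, num_workers=4)
# model.eval()
# with torch.no_grad():
#     for images, image_paths in loader:
#         is_positive = model(images).sigmoid().squeeze(1) > 0.5
#         for path, positive in zip(image_paths, is_positive):
#             if positive:
#                 shutil.copy(path, "/data/positives")
```

Note that the default collate function batches the string paths into a list of strings, so they stay aligned with the image tensors in each batch.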