How to get the file name in dataloader

cat-loves-donuts · May 27, 2020, 2:13am

Well, I created a test dataset that contains 13 different objects. After training, I want to use those 13 objects to test my model. I print a confusion matrix for each test sample, so I need the name of each one. However, I load the test set with a DataLoader called data_loader_test that has shuffle enabled. I used data_loader_test.dataset.training_files inside the epoch loop to get the file names in each epoch, but the names come back in the same order as the files are sorted on disk. How can I get the file names in the order they have after shuffling?

ptrblck · May 27, 2020, 4:22am

You could create a custom Dataset and return the image name with the data and target samples.
This would make sure that, even after shuffling, you would still get the corresponding file names for the current data and target batch.

This tutorial gives you a good overview on writing a custom Dataset.
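As a minimal sketch of that suggestion (not the code from the tutorial): the class name `NamedDataset`, the placeholder file names, and the dummy tensors below are all assumptions made for illustration. The idea is only that `__getitem__` returns the file name alongside the data and target, so the default collation keeps the names aligned with each shuffled batch.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class NamedDataset(Dataset):
    """Hypothetical dataset that returns (data, target, file_name) per sample."""

    def __init__(self, samples, targets, file_names):
        self.samples = samples
        self.targets = targets
        self.file_names = file_names

    def __len__(self):
        return len(self.file_names)

    def __getitem__(self, idx):
        # The file name travels with the sample, so shuffling cannot separate them.
        return self.samples[idx], self.targets[idx], self.file_names[idx]


# Placeholder data standing in for real test files (assumed names and values).
file_names = [f"object_{i:02d}.npz" for i in range(13)]
samples = torch.arange(13, dtype=torch.float32).unsqueeze(1)  # sample i holds the value i
targets = torch.arange(13)

data_loader_test = DataLoader(
    NamedDataset(samples, targets, file_names), batch_size=4, shuffle=True
)

seen = []
for data, target, names in data_loader_test:
    # names is a sequence of strings aligned row by row with data and target.
    for row, name in zip(data, names):
        seen.append((int(row.item()), name))
```

With this layout, `names` can be used directly to label the confusion matrix computed for each batch or sample.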

cat-loves-donuts · May 27, 2020, 6:55am

Oh, thank you so much. I will try this one.

zhangyu-python · April 15, 2021, 11:21am

Hello, did you succeed? I am running into the same problem now.

cat-loves-donuts · April 19, 2021, 9:29am

Yeah, it works. When I pre-process my .npz data, I save each sample into a tuple and add the file name to it as well. So when I pass the tuples to the data loader, the file name comes along with the data.
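A rough sketch of what that pre-processing could look like, not the poster's actual code: the array keys `data` and `label`, the file names, and the temporary directory are assumptions. A plain list of tuples is passed to the DataLoader, which accepts any object with `__getitem__` and `__len__`.

```python
import tempfile
from pathlib import Path

import numpy as np
from torch.utils.data import DataLoader

# Write a few placeholder .npz files so the sketch is self-contained.
tmp_dir = Path(tempfile.mkdtemp())
for i in range(5):
    np.savez(tmp_dir / f"sample_{i}.npz",
             data=np.full((3,), i, dtype=np.float32), label=i)

# Pre-processing step: one (data, label, file_name) tuple per file.
records = []
for path in sorted(tmp_dir.glob("*.npz")):
    with np.load(path) as npz:
        records.append((npz["data"], int(npz["label"]), path.name))

# A list of tuples works as a map-style dataset.
loader = DataLoader(records, batch_size=2, shuffle=True)

pairs = []
for data, labels, names in loader:
    # Each name still matches its own row, regardless of the shuffled order.
    for row, label, name in zip(data, labels, names):
        pairs.append((int(row[0]), int(label), name))
```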

zhangyu-python · April 19, 2021, 9:55am

Thank you so much for your reply, and it works on my dataset too.

aemonge · January 20, 2023, 4:47pm

For anyone who’s interested in the code directly, this goes at the end of `__getitem__` in the CustomDataset:

    return image, [class_tensor, img_name]

And loading it as follows:

    for images, labels in loaders:
        class_tensor = labels[0]  # batched class tensor
        img_names = labels[1]     # file names for the samples in this batch