Concatenated dataloader has different image and label order

chanakin · February 15, 2022, 3:44pm

So the story is that I am doing building segmentation for various sites. This means that I need to randomly split each location into train, val, and test sets in the same proportions (e.g. a ratio of 60%, 20%, 10% respectively), and subsequently concatenate them into one big dataset.

I have managed to first split the images and then match the labels to them. However, I’ve found that once I concatenate the datasets, although the resulting PyTorch image and label datasets both contain the correct files, they are not in the same order. I am struggling to find a way to match and sort the order within the PyTorch dataset or dataloader object. Has anyone had a similar issue before?

See attached output of print(Train.png_dir, Train.lbl_dir) in the same Train <pytorch.dataset> object:
The folder structure indicates the location:
KBY = Kalobeyei
DZK = Dzaleka
DZKN = Dzaleka North

ptrblck · February 16, 2022, 5:29am

I assume this means the labels do not correspond to the right sample anymore?
If so, how did you concatenate the data and make sure the correspondence is not broken?
Assuming you are using a custom Dataset, I guess you’ve created the data/label arrays in the __init__?
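
As a rough illustration of what I mean, here is a minimal sketch of a dataset that builds image/label pairs once in the __init__ and checks them, so later concatenation or shuffling cannot separate a sample from its label. The class name, the example paths, and the IMG/LBL naming convention are assumptions on my side, not taken from your code. In a real project the class would subclass torch.utils.data.Dataset and load the files in __getitem__.

```python
class PairedPathDataset:
    """Hypothetical dataset; in practice subclass torch.utils.data.Dataset."""

    def __init__(self, png_dir, lbl_dir):
        # Sort both lists so the order does not depend on how they were built.
        png_dir = sorted(png_dir)
        lbl_dir = sorted(lbl_dir)
        if len(png_dir) != len(lbl_dir):
            raise ValueError("image and label counts differ")
        # Keep each image together with its label as one sample.
        self.samples = list(zip(png_dir, lbl_dir))
        for img, lbl in self.samples:
            # Assumed convention: label path = image path with IMG replaced by LBL.
            if img.replace("IMG", "LBL") != lbl:
                raise ValueError(f"mismatched pair: {img} / {lbl}")

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        # Returns the two paths only; file loading is omitted in this sketch.
        return self.samples[idx]


ds = PairedPathDataset(
    ["KBY/IMG_2.png", "DZK/IMG_1.png"],
    ["DZK/LBL_1.png", "KBY/LBL_2.png"],
)
```

Because every sample is an (image, label) tuple, concatenating several such datasets keeps the pairs intact regardless of the order in which the sites are combined.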

chanakin · February 16, 2022, 10:08am

Hi ptrblck,

Yes, this is correct. The concatenation was done using, would it make sense to add a sorting step inside the __init__:

chanakin · February 17, 2022, 3:24pm

I solved this with a ridiculously simple solution: copy the current png_dir list and then use re.sub to replace IMG with LBL in each path.
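
Roughly, the idea looks like the sketch below. The file names are made up for illustration and assume that each label path differs from its image path only by having LBL in place of IMG.

```python
import re

# Hypothetical image paths, already in the concatenated order.
png_dir = [
    "KBY/IMG_003.png",
    "DZK/IMG_001.png",
    "DZKN/IMG_002.png",
]

# Derive each label path from its image path, so both lists share one order.
lbl_dir = [re.sub("IMG", "LBL", path) for path in png_dir]

for img, lbl in zip(png_dir, lbl_dir):
    print(img, "->", lbl)
```

Note that re.sub replaces every occurrence of IMG in a path, so this only works if IMG does not also appear somewhere it should stay unchanged.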

Solved, sorry for the stupid question