Extract raw data or dataset from shuffled dataloader

Hi,

Do you know how to extract the raw data or Dataset (in the original order it had when the DataLoader was created) from a shuffled DataLoader?

Best regards

Did you mean extracting inputs and targets from the DataLoader?

Yes, that’s what I mean … I want to extract the raw features/target in the same order of the original data, from which the dataset was constructed … and from which the shuffled dataloader was loaded

I’m looking for something like

dl = DataLoader(...., shuffle=True)
dl.data or dl.dataset # ordered and original data, not shuffled

Best regards

Apologies, but I don't know how to revert the shuffling done by a DataLoader.

I think we can extract the raw data like this:


for input, target in dl:
    ...

But in that case, the raw files/data would be shuffled (unordered) …

Thanks anyway :) @mathematics

When building the DataLoader, if the shuffle argument is not passed, I think the samples will come out in the same order as the raw files, because shuffle=False is the default in DataLoader (https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader). Only if you set it to True will the loader draw the samples in a shuffled order.
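
A minimal sketch to check that default, assuming a toy TensorDataset holding the values 0 to 5 (the dataset and the name ordered_loader are made up for illustration):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset, assumed for illustration: 6 samples with values 0..5.
dataset = TensorDataset(torch.arange(6))

# No shuffle argument is passed, so the default shuffle=False applies.
ordered_loader = DataLoader(dataset, batch_size=2)

# Each batch is a list with one tensor; concatenate them back together.
ordered = torch.cat([batch[0] for batch in ordered_loader])
print(ordered)  # tensor([0, 1, 2, 3, 4, 5])
```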

Yes, I know that. That’s why I posted:

I’m looking for something like

dl = DataLoader(...., shuffle=True)
dl.data or dl.dataset # or something like this, to get ordered and original data, not shuffled

Your proposed approach using dl.dataset should work:

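import torch
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self):
        self.data = torch.arange(10)

    def __getitem__(self, index):
        x = self.data[index]
        return x

    def __len__(self):
        return len(self.data)

dataset = MyDataset()
loader = DataLoader(dataset, shuffle=True, batch_size=2)

# Iterating the loader yields shuffled batches
for data in loader:
    print(data)

# Iterating loader.dataset yields the samples in their original order
for data in loader.dataset:
    print(data)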
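
To make the difference checkable rather than just printed, here is a small sketch. It assumes a TensorDataset as a stand-in for MyDataset, and the variable names are only illustrative:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for the toy dataset (assumption): values 0..9.
dataset = TensorDataset(torch.arange(10))
loader = DataLoader(dataset, shuffle=True, batch_size=2)

# loader.dataset is the very object that was passed to the DataLoader.
same_object = loader.dataset is dataset

# Samples as the shuffled loader delivers them (order varies per run).
shuffled = torch.cat([batch[0] for batch in loader])

# Samples read directly from loader.dataset, index by index.
original = torch.stack(
    [loader.dataset[i][0] for i in range(len(loader.dataset))]
)

print(same_object)  # True
print(original)     # tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
print(shuffled)     # same values, in a random order
```

Shuffling only affects the indices the loader's sampler draws, so the underlying dataset is left untouched.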

If you just want the raw data from the dataset, you don't need a DataLoader at all. You can index the dataset directly. For example, if your dataset is called DS and __getitem__() returns an image and a target, you can use a for loop to iterate over the dataset in the order of the raw data.

for i in range(len(DS)):
    img, target = DS[i]
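
If you only have the shuffled loader at hand and still want batches in the original order, one option is to wrap its dataset in a second, unshuffled DataLoader. A rough sketch, assuming a toy TensorDataset of (feature, target) pairs; shuffled_dl and ordered_dl are hypothetical names:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy data, assumed for illustration: 6 samples with 2 features each.
features = torch.arange(12, dtype=torch.float32).reshape(6, 2)
targets = torch.arange(6)
shuffled_dl = DataLoader(
    TensorDataset(features, targets), batch_size=4, shuffle=True
)

# Reuse the same dataset, but iterate it without shuffling.
ordered_dl = DataLoader(
    shuffled_dl.dataset, batch_size=shuffled_dl.batch_size, shuffle=False
)

# Collect all batches to recover features and targets in original order.
xs, ys = [], []
for x, y in ordered_dl:
    xs.append(x)
    ys.append(y)
all_x = torch.cat(xs)
all_y = torch.cat(ys)

print(torch.equal(all_x, features))  # True
print(torch.equal(all_y, targets))   # True
```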