I have a list of data paths, but not all of the files exist in the directory yet. The plan is to copy each file in from another location shortly before the dataloader needs it (e.g. one batch ahead), and remove it again once training on that data is done, because the full dataset is too large to keep on disk at once. However, this raises `OSError: file not found`. How can I fix this, or achieve the same goal in another way?
Below is a toy example that reproduces the problem.
```python
import glob
import os

import numpy as np
from torch.utils.data import DataLoader, Dataset


class CustomDataset(Dataset):
    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        # Raises OSError when the file has not been copied in yet.
        return np.loadtxt(self.data[index])


if __name__ == '__main__':
    if os.path.exists('./3.txt'):
        os.remove('./3.txt')  # start from a clean state

    # 3.txt is listed up front even though it does not exist yet.
    files = glob.glob('./*.txt') + ['./3.txt']
    print(files)

    dataset = CustomDataset(files)
    dataloader = DataLoader(dataset=dataset, batch_size=1, shuffle=False,
                            num_workers=1, drop_last=True, pin_memory=True,
                            persistent_workers=True)

    for i, x in enumerate(dataloader):
        print("before : ", i, x)
        if i == 1:
            # Create 3.txt one batch ahead -- but the worker has already
            # tried to prefetch it and raised by this point.
            with open('./3.txt', 'w') as f:
                f.write('3')
        print("after : ", i, x)

    input(dataset.data)
```