Cannot Load Data Lazily Using Dataloader

Hello,

I have been trying to load data one line at a time from two Excel files into a DataLoader, because the files are larger than the RAM I have available. One file contains the data and the other contains the labels.

I tried following examples such as the ones below, but to no avail. Can anyone please point out what I am doing wrong? Both files are in the “current working data” root folder.

Examples:

pyroomacoustics.readthedocs.io/en/pypi-release/pyroomacoustics.room.html

https://pytorch.org/tutorials/beginner/data_loading_tutorial.html

I get the following error:

File "C:\Users\johnt\anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 1146, in _try_get_data
raise RuntimeError(f'DataLoader worker (pid(s) {pids_str}) exited unexpectedly') from e

RuntimeError: DataLoader worker (pid(s) 113228, 108680, 109552, 119108, 117720, 116756, 114436, 68888) exited unexpectedly

My code is as follows:

class MyDataset(torch.utils.data.Dataset):

    def __init__(self):
        self.data_files = os.listdir('current working data')

    def __getindex__(self, idx):
        return torch.load(self.data_files[idx])

    def __len__(self):
        return len(self.data_files)

training_data = MyDataset()
train_loader = DataLoader(training_data, batch_size=batch_size, shuffle=False, num_workers=8)

Any help is greatly appreciated.

Thank you very much.

Could you use num_workers=0 and rerun your code to see if a better error message might be raised?

Hi Patrick. Thank you for your feedback. When I set the workers to zero and rerun the code, I get the following error:

NotImplementedError: Subclasses of Dataset should implement __getitem__.

Ah great! Replace the __getindex__ function name with __getitem__.

Hi Patrick,

Thank you. I replaced __getindex__ with __getitem__ and now it says the following:

FileNotFoundError: [Errno 2] No such file or directory: 'train_data.xlsx'

The issue appears to stem from the line in __getitem__ that tries to load the data. The odd thing is that this file does exist and is named exactly as shown above. I am not sure what to do from here. Thank you again for all your help! I appreciate it very much!

The full traceback is this:

Traceback (most recent call last):

File "C:\Users\johnt\Documents\SDSU\Thesis\Room Acoustics\PyRoom Acoustics\CNN and Data Generator\CNN Model\CNN_Model_3_NGrid_2592_Lazy_data_loader.py", line 190, in <module>
MUSIC_data, labels = next(dataiter)

File "C:\Users\johnt\anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 631, in __next__
data = self._next_data()

File "C:\Users\johnt\anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 675, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration

File "C:\Users\johnt\anaconda3\lib\site-packages\torch\utils\data\_utils\fetch.py", line 51, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]

File "C:\Users\johnt\anaconda3\lib\site-packages\torch\utils\data\_utils\fetch.py", line 51, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]

File "C:\Users\johnt\Documents\SDSU\Thesis\Room Acoustics\PyRoom Acoustics\CNN and Data Generator\CNN Model\CNN_Model_3_NGrid_2592_Lazy_data_loader.py", line 171, in __getitem__
return torch.load(self.data_files[idx]) # orignal line

File "C:\Users\johnt\anaconda3\lib\site-packages\torch\serialization.py", line 998, in load
with _open_file_like(f, 'rb') as opened_file:

File "C:\Users\johnt\anaconda3\lib\site-packages\torch\serialization.py", line 445, in _open_file_like
return _open_file(name_or_buffer, mode)

File "C:\Users\johnt\anaconda3\lib\site-packages\torch\serialization.py", line 426, in __init__
super().__init__(open(name, mode))

FileNotFoundError: [Errno 2] No such file or directory: 'train_data.xlsx'

os.listdir returns only the file names, not the directory they live in, so depending on the path you passed to it you might need to rebuild the full path when loading the data.
E.g. if you’ve used self.data_files = os.listdir("/home/myusername/data"), you would need to load the data later using the same path: torch.load("/home/myusername/data/" + self.data_files[idx]).
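
A minimal sketch of that idea, using only the standard library so it runs without PyTorch. The temporary directory stands in for the "current working data" folder and the placeholder file is created purely for illustration; os.path.join is used here instead of string concatenation so the separator is right on Windows as well.

```python
import os
import tempfile

# Stand-in for the "current working data" folder (hypothetical contents).
root = tempfile.mkdtemp()
with open(os.path.join(root, "train_data.xlsx"), "wb") as f:
    f.write(b"placeholder")

# os.listdir yields bare names such as "train_data.xlsx", with no directory part.
names = sorted(os.listdir(root))

# A bare name is resolved against the process's working directory, so the
# directory has to be joined back on before the file can be opened reliably.
full_paths = [os.path.join(root, name) for name in names]
```

Inside the dataset this would mean storing the root folder in __init__ and joining it onto self.data_files[idx] in __getitem__.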

With that being said, it seems you are trying to load a Microsoft Excel spreadsheet, which will fail even with the correct path: torch.load reads files that were written with torch.save, so it won’t be able to open an .xlsx file.
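
One possible way around this, sketched under the assumption that each spreadsheet has first been exported to a header-less, purely numeric CSV file (a one-off step not shown here): record the byte offset of every row once, then seek to a single row in __getitem__, so only one line per file is in memory for each sample. The class name LazyCsvDataset and the file names are hypothetical, and the try/except import is only there to keep the sketch runnable where PyTorch is not installed.

```python
import csv
import os
import tempfile

try:
    from torch.utils.data import Dataset
except ImportError:  # fallback so the sketch runs without PyTorch
    Dataset = object


class LazyCsvDataset(Dataset):
    """Hypothetical helper: reads one data row and one label row per index."""

    def __init__(self, data_path, label_path):
        self.data_path = data_path
        self.label_path = label_path
        # Only the row offsets are kept in memory, not the rows themselves.
        self.data_offsets = self._index_lines(data_path)
        self.label_offsets = self._index_lines(label_path)
        if len(self.data_offsets) != len(self.label_offsets):
            raise ValueError("data and label files have different row counts")

    @staticmethod
    def _index_lines(path):
        # Single pass over the file, remembering where each non-empty line starts.
        offsets = []
        with open(path, "rb") as f:
            while True:
                pos = f.tell()
                line = f.readline()
                if not line:
                    break
                if line.strip():
                    offsets.append(pos)
        return offsets

    @staticmethod
    def _read_row(path, offset):
        # Jump straight to the requested row and parse just that line.
        with open(path, "rb") as f:
            f.seek(offset)
            line = f.readline().decode("utf-8")
        return [float(value) for value in next(csv.reader([line]))]

    def __getitem__(self, idx):
        x = self._read_row(self.data_path, self.data_offsets[idx])
        y = self._read_row(self.label_path, self.label_offsets[idx])
        return x, y

    def __len__(self):
        return len(self.data_offsets)


# Tiny made-up files, standing in for the exported spreadsheets.
tmp = tempfile.mkdtemp()
data_csv = os.path.join(tmp, "train_data.csv")
label_csv = os.path.join(tmp, "train_labels.csv")
with open(data_csv, "w", newline="") as f:
    csv.writer(f).writerows([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
with open(label_csv, "w", newline="") as f:
    csv.writer(f).writerows([[0], [1], [0]])

dataset = LazyCsvDataset(data_csv, label_csv)
sample = dataset[1]  # second data row and its label, read on demand
```

Before passing such a dataset to a DataLoader, you would probably want __getitem__ to return tensors (for example via torch.tensor) so that the rows are stacked into batches, and to test with num_workers=0 first.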