I’m working on a project that deals with a huge amount of data. I created a custom Dataset for the DataLoader, but while GPU memory consumption is huge, GPU utilization stays at 0%.
This is the custom Dataset class:
```python
class CustomDataset(Dataset):
    def __init__(self, filenames, batch_size, transform=None):
        self.filenames = filenames
        self.batch_size = batch_size
        self.transform = transform

    def __len__(self):
        return int(np.ceil(len(self.filenames) / float(self.batch_size)))  # Number of chunks.

    def __getitem__(self, idx):  # idx means index of the chunk.
        # In this method, we do all the preprocessing.
        # First read data from files in a chunk. Preprocess it. Extract labels. Then return data and labels.
        batch_x = self.filenames[idx * self.batch_size:(idx + 1) * self.batch_size]  # One batch of file names from the list `filenames`.
        data = []
        labels = []
        for exp in batch_x:
            temp_data = []
            for file in exp:
                temp = np.loadtxt(open(file), delimiter=",")  # Change this line to read any other type of file
                temp = scaler.fit_transform(temp)
                temp_data.append(temp)
            # Convert column data to matrix-like data with one channel
            pattern = exp[16:20]  # Pattern extracted from file_name
            for column in label_file:
                if re.match(str(pattern), str(column)):  # Pattern is matched against different label classes
                    labels.append(np.asarray(label_file[column]))
            data.append(temp_data)
        data = np.asarray(data)  # Because of PyTorch's channel-first convention
        if self.transform:
            data = self.transform(data)
        data = data.reshape(data.shape[0], data.shape[1] * data.shape[2] * data.shape[3])
        labels = np.asarray(labels)
        # The following condition is actually needed in PyTorch. Otherwise, for our particular
        # example, the iterator will be an infinite loop.
        if idx == self.__len__():
            raise IndexError
        return data, labels
```
I decided to batch the data in the Dataset itself, but doing the batching in the DataLoader instead doesn’t change anything. `filenames` is a list of 50k lists, each containing the 48 CSV paths needed for one label, which are then flattened.
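For context, batching in the DataLoader would mean a per-sample Dataset roughly like this (a simplified sketch, not my exact code: `PerSampleDataset`, `label_lookup`, and the placeholder `_load` are stand-ins, and the real class would subclass `torch.utils.data.Dataset`):

```python
import numpy as np

# Hypothetical per-sample dataset: __getitem__ returns ONE example, and the
# DataLoader builds batches itself. Plain Python here so the sketch runs alone;
# in real code this would subclass torch.utils.data.Dataset.
class PerSampleDataset:
    def __init__(self, groups, label_lookup):
        # groups: list of lists, each inner list holds the csv paths for one label
        self.groups = groups
        self.label_lookup = label_lookup  # maps a pattern string to a label array

    def __len__(self):
        return len(self.groups)  # one item per group of files, not per chunk

    def __getitem__(self, idx):
        files = self.groups[idx]
        # Stack the group's csv files into one array (placeholder loader below).
        sample = np.stack([self._load(f) for f in files])
        pattern = files[0][16:20]  # same slicing idea as in my class
        label = self.label_lookup[pattern]
        return sample.astype(np.float32), label

    def _load(self, path):
        # Placeholder: real code would use np.loadtxt(path, delimiter=",")
        return np.zeros((4, 4))
```

The idea is that `__getitem__` returns a single example, so the DataLoader’s own collation and `num_workers` parallelism apply per sample.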
Calling the DataLoader with `num_workers=os.cpu_count()` and `pin_memory=True` doesn’t change anything either.
When I use dummy variables instead, the GPU does work; and when I try to run that function outside the Dataset to save the results as .npy files, the kernel dies. So the bottleneck is definitely the data loading.
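To isolate the CSV parsing cost from the GPU entirely, I can run a micro-benchmark along these lines (all names are mine; it re-parses one small in-memory CSV to see how much time `np.loadtxt` alone takes):

```python
import io
import time
import numpy as np

# Build one small csv in memory so the benchmark has no disk or GPU involved.
rows = np.random.rand(200, 16)
buf = io.StringIO()
np.savetxt(buf, rows, delimiter=",")
csv_text = buf.getvalue()

# Time repeated parsing; with 50k * 48 real files this cost multiplies fast.
start = time.time()
for _ in range(50):
    parsed = np.loadtxt(io.StringIO(csv_text), delimiter=",")
elapsed = time.time() - start

print(f"np.loadtxt: {elapsed:.4f}s for 50 parses")
```

Scaling the per-parse time by the real file count gives a rough lower bound on how long one epoch spends just reading CSVs.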
When I time the data assignment:
```python
for i, data in enumerate(train_dataloader, 0):
    start_time = time.time()
    # Get and prepare inputs
    inputs, targets = data
    inputs, targets = inputs.to(dev), targets.to(dev)
    inputs, targets = inputs.float(), targets.float()
    print(time.time() - start_time)
```
I get:
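One thing I notice: since `start_time` is set after `enumerate` has already fetched the batch, that loop only times the `.to(dev)` copy, while the wait inside the Dataset happens during the fetch itself. A sketch of timing the fetch (with a plain list standing in for `train_dataloader`):

```python
import time

# Hypothetical helper: time the fetch from the loader iterator, which is where
# a slow Dataset.__getitem__ actually blocks, not the .to(dev) copy afterwards.
def timed_batches(loader):
    it = iter(loader)
    while True:
        fetch_start = time.time()
        try:
            batch = next(it)  # slow preprocessing shows up here
        except StopIteration:
            break
        yield time.time() - fetch_start, batch

# Stand-in loader: two (inputs, targets) pairs instead of a real DataLoader.
fake_loader = [([1, 2], [0]), ([3, 4], [1])]
for fetch_time, (inputs, targets) in timed_batches(fake_loader):
    print(f"fetch took {fetch_time:.6f}s")
```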
Does anyone have an idea of how I can deal with this?
Thanks a lot