GPU Memory Consumption but GPU util at 0%, How to deal with data Bottlenecks?

Hey everyone,

I’m working on a project that deals with a huge amount of data. I created a custom Dataset for the dataloader but while there’s a huge GPU memory consumption, GPU util is at 0%

this is the Custom Dataset class:

class CustomDataset(Dataset):
    def __init__(self, filenames, batch_size, transform=None):
        self.filenames= filenames
        self.batch_size = batch_size
        self.transform = transform
    def __len__(self):
        return int(np.ceil(len(self.filenames) / float(self.batch_size)))   # Number of chunks.
    def __getitem__(self, idx): #idx means index of the chunk.
    # In this method, we do all the preprocessing.
    # First read data from files in a chunk. Preprocess it. Extract labels. Then return data and labels.
        batch_x = self.filenames[idx * self.batch_size:(idx + 1) * self.batch_size]   # This extracts one batch of file names from the list `filenames`.
        data = []
        labels = []
        for exp in batch_x:
            temp_data = [] 
            for file in exp: 
                temp =  np.loadtxt(open(file), delimiter=",") # Change this line to read any other type of file
                temp = scaler.fit_transform(temp)
                temp_data.append(temp) # Convert column data to matrix like data with one channel
            pattern = exp[0][16:20]      # Pattern extracted from file_name
            for column in label_file:
                if re.match(str(pattern), str(column)): # Pattern is matched against different label_classes

        data = np.asarray(data) # Because of Pytorch's channel first convention
        if self.transform:
            data = self.transform(data)
        data = data.reshape(data.shape[0], data.shape[1] * data.shape[2] * data.shape[3])
        labels = np.asarray(labels)

        # The following condition is actually needed in Pytorch. Otherwise, for our particular example, the iterator will be an infinite loop.
        # Readers can verify this by removing this condition.
        if idx == self.__len__():  
            raise IndexError

        return data, labels

I decided to batch the data already here but even if I do it in Dataloader it does not change a thing. the filenames is a list of 50k lists each containing 48 csv paths needed for each label that are then flattened.
calling Dataloader with num_workers (os.cpu_count()) and pin_memory=True doesnt change a thing.

when using dummy variables instead the GPU works and when I try to run that function outside to save an .npy files the kernel dies so it’s definitely a bottleneck from the data.

when I try to time the data assignment:

    for i, data in enumerate(train_dataloader, 0):
        start_time = time.time()
        # Get and prepare inputs
        inputs, targets = data
        inputs, targets =,
        inputs, targets = inputs.float(), targets.float()

I get :

anyone has an idea of how I can deal with this?

Thanks a lot :slight_smile: