Dataset load is too slow

I’m using a custom numpy dataset (.npy files).
Here is my Dataset class (abbreviated):

import numpy as np
from torch import FloatTensor
from torch.utils.data import Dataset


class MyDataset(Dataset):
    def __init__(self, PATH):
        super(MyDataset, self).__init__()
        self.PATH = PATH
        # load both arrays fully into memory once, at construction time
        self.x = np.load(PATH['x'])
        self.y = np.load(PATH['y'])
        # convert every sample to a FloatTensor up front
        self.x = [FloatTensor(tensor) for tensor in self.x]
        self.y = [FloatTensor(tensor) for tensor in self.y]
        self.len = len(self.x)

    def __getitem__(self, index):
        x = self.x[index]
        y = self.y[index]
        return x, y

    def __len__(self):
        return self.len

When I train my model, the interval between epochs is too long: it takes about 10 seconds, while the training epoch itself takes only 6 seconds.

batch_size=64
batch shape: (64, 3, 2048)
num_workers=6
pin_memory=True
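
For reference, I create the DataLoader roughly like this (a sketch: the file names in PATH and the shuffle flag are just illustrative, the other arguments are the values above):

from torch.utils.data import DataLoader

PATH = {'x': 'x.npy', 'y': 'y.npy'}   # hypothetical file names for illustration
train_set = MyDataset(PATH)
train_loader = DataLoader(
    train_set,
    batch_size=64,      # each batch comes out as (64, 3, 2048)
    shuffle=True,       # assumption, not part of the settings above
    num_workers=6,
    pin_memory=True,
)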

How can I resolve this?

In your case, loading the dataset means loading two numpy arrays and storing them in self.x and self.y, respectively. Once the dataset is loaded, retrieving the elements is a fast operation, so the reason for the slow training is probably something else in your training process (e.g. the model forward pass). Check each step in your training loop, or post the training loop and the model here so that we can help you further.
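
If you want to verify that the data pipeline itself is not the bottleneck, you can time one plain pass over the DataLoader without touching the model, roughly like this (a sketch; train_loader stands for whatever DataLoader you pass to training):

import time

start = time.time()
for x, y in train_loader:   # fetch batches only, no forward/backward pass
    pass
print(f"one epoch of pure data loading: {time.time() - start:.2f}s")

If that loop is much faster than the 10-second gap you see, the time is being spent elsewhere in the training loop.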

Thank you for the reply.
Actually, I’m using the PyTorch Lightning framework, so this may not be the best place for the question, but the code is very similar to plain PyTorch, so PyTorch users should still be able to give me some advice.

Here is my training code (the validation and test code is the same):


    def on_train_epoch_start(self):
        self.train_epoch_loss = 0
        self.train_epoch_loss_1 = 0
        self.train_epoch_loss_2 = 0
        self.train_step_count = 0
        
    def training_step(self, batch, batch_idx):
        data1, data2, trg = batch
        data_gen1, data_gen2 = self.model(trg)

        # compare each generated output with the corresponding input
        loss_gen1 = self.loss(data_gen1, data1)
        loss_gen2 = self.loss(data_gen2, data2)
        loss = loss_gen1 + loss_gen2

        self.train_step_count += trg.shape[0]
        return {
            'loss_gen1': loss_gen1,
            'loss_gen2': loss_gen2,
            'loss': loss
        }

    def training_step_end(self, loss):
        # accumulate running sums for the epoch-level averages
        self.train_epoch_loss += loss['loss'].item()
        self.train_epoch_loss_1 += loss['loss_gen1'].item()
        self.train_epoch_loss_2 += loss['loss_gen2'].item()
        return super().training_step_end(loss)

    def training_epoch_end(self, outputs):
        self.log_dict({
            'train_epoch_loss': self.train_epoch_loss / self.train_step_count,
            'train_epoch_loss_1': self.train_epoch_loss_1 / self.train_step_count,
            'train_epoch_loss_2': self.train_epoch_loss_2 / self.train_step_count
        }, sync_dist=True)

batch_size: 64
log_every_n_steps: 10
logger: wandb
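
Lightning also has a built-in profiler that can show how the time is split across the hooks; if it helps, I can attach it to the Trainer roughly like this (just a sketch, the rest of my Trainer arguments omitted):

import pytorch_lightning as pl

# "simple" prints a per-hook timing summary (training_step, data fetching,
# epoch-end hooks, ...) when training finishes
trainer = pl.Trainer(profiler="simple")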