An alternative to using a DataLoader in a for loop?

Hello everyone,

I’m facing a speed problem. I’m using a DataLoader as a batch generator to train my network. Here is the relevant piece of code:

train_loader = data.DataLoader(np.concatenate((X, Y), axis=1), batch_size=16, …)
for epoch in range(n_epochs):
    for _, da in enumerate(train_loader, 0):
        inputs = torch.tensor(da[:, :-2].numpy())
        targets = da[:, -2:]
        optimizer.zero_grad()

        optimizer.step()
I find the execution of this code relatively slow. Do you have any suggestions to improve it? Thank you in advance.

Try this:

train_loader = data.DataLoader(data.TensorDataset(X, Y), batch_size=16, …)
for epoch in range(n_epochs):
    for batch in train_loader:
        inputs, targets = batch
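
To round that out, a minimal sketch of the full training step with this loader could look like the following; model, optimizer, loss_fn, and n_epochs are placeholder names assumed to be defined elsewhere, and X, Y are assumed to already be float tensors with the same number of rows:

import torch
from torch.utils import data

# wrap the input and target tensors together so each batch is already split
train_set = data.TensorDataset(X, Y)
train_loader = data.DataLoader(train_set, batch_size=16, shuffle=True)

for epoch in range(n_epochs):
    for inputs, targets in train_loader:       # no manual slicing or numpy round-trips needed
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = loss_fn(outputs, targets)       # loss_fn stands in for whatever criterion you use
        loss.backward()
        optimizer.step()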

Thank you braindotai for your suggestion, but I have a question: isn’t the double loop slowing down the execution?

Not really, this is just the way we train deep learning models: the epochs represent how many times we iterate over the dataset. If you want to finish your training earlier you can set the number of epochs lower, to 10, 5, etc., but if you do so your model might not reach its full predictive potential.
And even if training is slow, that actually has nothing to do with the double for loop. In that case your model would be big, or you are dealing with a massive dataset.
If you want faster training you can always use a GPU if one is available, by sending your model to GPU memory with model = model.cuda() and doing the same for the inputs and targets:

inputs = batch[0].cuda()
targets = batch[1].cuda()
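
For a device-agnostic variant, .to(device) has the same effect as .cuda() when a GPU is present and falls back to the CPU otherwise; a minimal sketch, reusing the model and train_loader from above:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)                  # move the model's parameters to GPU memory once

for inputs, targets in train_loader:
    inputs = inputs.to(device)            # same effect as .cuda() when a GPU is available
    targets = targets.to(device)
    # ... forward pass, loss, backward, and optimizer.step() as before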

Feel free to ask for any further clarifications.

Thank you again, but that is exactly what I have done. Actually, I’m comparing the following two pieces of code:

for epoch in range(n_epochs):
    for _, da in enumerate(train_loader, 0):
        inputs  = torch.tensor(da[:, :-2].numpy())
        targets = da[:, -2:]
        optimizer.zero_grad()
        inputs  = torch.tensor(np.array([[pp for pp in p] for p in inputs]).astype(np.float32))
        outputs = model(inputs.to(device))
        loss = model.loss_f(outputs, targets.to(device))

        loss.backward()
        optimizer.step()

And:

for epoch in range(n_epochs):
    model.train()
    optimizer.zero_grad()
    outputs = model(X.to(device))
    loss    = model.loss_f(outputs, Y.to(device))

    loss.backward()
    optimizer.step()

where:

device = 'cuda' if torch.cuda.is_available() else 'cpu'

The second one seems to be faster. However, in some cases the second one also seems to be less effective than the first (the loss does not converge as well). I need to improve the first one.

Okay, now I understand your doubt.
See, in the first case you are predicting, computing the loss, computing gradients, and applying a gradient update for each batch, so the total number of these operations per epoch equals the number of batches, which is len(X) // batch_size, plus one more if there is a final partial batch.
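
As a quick sanity check with hypothetical numbers:

import math

n_samples, batch_size = 1000, 16               # hypothetical values
n_batches = math.ceil(n_samples / batch_size)  # 63 parameter updates per epoch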

In the second case, by contrast, you apply these operations (predicting, computing the loss, computing gradients, and applying the gradient update) exactly once per epoch.

This is the first reason why the second one is faster.

Another, more relevant reason is parallel computation. Computations performed in parallel can be much faster than sequential ones, especially on GPUs (that’s why we use GPUs in the first place).

In the first case you train by splitting the dataset into batches, so the batches are processed sequentially, which means slower.

In the second case you train on the whole X, Y at once, so the computation is done in parallel, which means faster.

These are the reasons why the second case is faster.

But as you said, in the second case the loss doesn’t converge as easily as in the first case. That’s why we always split the dataset into batches and then train the model: there’s no point in better performance if the loss is not converging.

So don’t worry about the speed you lose by splitting the dataset, because you’d always get better loss convergence in return.

Instead, apply the second case when validating.
In training we split the dataset for better optimization, but when validating there’s really no point in splitting the validation data. The only case where you’d have to do so is when your validation data is so big that it doesn’t fit in memory; only then would you split the validation data and use a DataLoader to validate your model.
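
For example, a minimal sketch of such a full-batch validation pass, assuming hypothetical tensors X_val and Y_val that fit in memory, and reusing model, device, and loss_f from above:

model.eval()                                   # switch layers like dropout to evaluation mode
with torch.no_grad():                          # gradients are not needed for validation
    outputs = model(X_val.to(device))
    val_loss = model.loss_f(outputs, Y_val.to(device))
print('validation loss:', val_loss.item())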

@braindotai understood, I really appreciate your help
Thank you !!