DataLoader vs loop

import numpy as np
from torch.utils.data import Dataset, DataLoader
from time import time
import torch

# Load the saved stock data (a NumPy array) and also keep a tensor copy
dataload2 = np.load('all_np_stock_2y30close.npz')
all_state = dataload2['arr_0']
all_state_torch = torch.from_numpy(all_state)
#print(len(all_state))

class NetDataset(Dataset):
    # Initialize your data, download, etc.
    def __init__(self, state): 
        self.state = state
    def __getitem__(self, index):
        return self.state[index]
    def __len__(self):
        return len(self.state)

batch = 1080

dataset = NetDataset(all_state)
train_loader = DataLoader(dataset=dataset, batch_size=batch, shuffle=False)

TIME_ALL_START = time()
# plain loop that slices one batch per step; swap in the commented line
# to time the DataLoader instead
#for state in train_loader:
for st in range(0, len(all_state), batch):
    state = all_state_torch[st:st+batch]
print("ALL_START - ", time() - TIME_ALL_START)

For a long time I couldn't understand why my program was slow. I was using DataLoader, so I decided to test it against a plain loop that steps through the data one batch at a time. My program ran about 10x faster. Can anyone explain why DataLoader is so slow? I wasn't feeding the data into my network; I was just checking how DataLoader behaves in an empty loop.

@slavavs I am having a similar issue: just iterating with a for loop, training seems to take 4x longer than an equivalent TensorFlow model. I am struggling to find the cause, so it would be great to see a solution if you find one.

Yes, I found the reason: convert your data to a tensor before handing it to the Dataset.
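
For example, something like this (a rough sketch of that fix; NetDataset is the class from the first post):

import numpy as np
import torch
from torch.utils.data import DataLoader

all_state = np.load('all_np_stock_2y30close.npz')['arr_0']
all_state_torch = torch.from_numpy(all_state)  # convert once, up front

# hand the Dataset a tensor rather than a NumPy array, so items come
# out as tensors and no per-item conversion happens during collation
dataset = NetDataset(all_state_torch)
train_loader = DataLoader(dataset=dataset, batch_size=1080, shuffle=False)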

But even then, the speed is still nowhere near the plain loop:

import numpy as np
from torch.utils.data import Dataset, DataLoader
from time import time

x = np.random.randn(1080, 30)  # still a NumPy array; DataLoader converts items to tensors when collating

class NetDataset(Dataset):
    # Initialize your data, download, etc.
    def __init__(self, state): 
        self.state = state
    def __getitem__(self, index):
        return self.state[index]
    def __len__(self):
        return len(self.state)

batch = 180

dataset = NetDataset(x)
train_loader = DataLoader(dataset=dataset, batch_size=batch, shuffle=False)

# Timing 1: iterate the same data through the DataLoader
TIME_START_1 = time()
for i in range(100):
    for i_main, state in enumerate(train_loader):
        pass
print("START_1 - ", time() - TIME_START_1)


# Timing 2: plain loop that slices one batch at a time
TIME_START_2 = time()
for i in range(100):
    for st in range(0, len(x), batch):
        state = x[st:st+batch]
print("START_2 - ", time() - TIME_START_2)

START_1 - 0.47510671615600586
START_2 - 0.0002009868621826172

Look how much faster the plain loop is than DataLoader…
Although maybe I'm doing something wrong…

@slavavs DataLoader is useful when we can't fit the complete dataset in memory. Some overhead is expected with DataLoader, since it has to iterate item by item and collate the results. However, I am quite surprised by the difference in timings.
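
If the per-item overhead matters, one trick is to disable automatic batching and let the sampler pass whole lists of indices to __getitem__, so each batch comes from a single indexing operation instead of per-item collation. A rough sketch (sizes and names are just for illustration):

import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader, BatchSampler, SequentialSampler

class NetDataset(Dataset):
    def __init__(self, state):
        self.state = state
    def __getitem__(self, index):
        # index is a list of indices here, so this returns a whole batch at once
        return self.state[index]
    def __len__(self):
        return len(self.state)

x = torch.from_numpy(np.random.randn(1080, 30))
dataset = NetDataset(x)

# batch_size=None disables automatic batching; the BatchSampler then yields
# lists of indices, and default per-item collation is skipped entirely
loader = DataLoader(dataset,
                    sampler=BatchSampler(SequentialSampler(dataset),
                                         batch_size=180, drop_last=False),
                    batch_size=None)

for state in loader:
    pass  # each `state` is already a (180, 30) tensor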

When you call state = all_state_torch[st:st+batch], it is a simple operation that creates a view over the all_state_torch tensor. That is extremely fast compared to DataLoader, which builds a completely new tensor by collating multiple self.state[index] pieces.
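
You can see this directly (a minimal demo; torch.stack here stands in for what the default collate function does with a batch of tensor items):

import torch

x = torch.randn(1080, 30)

# Slicing returns a view: no data is copied, the result shares storage with x
view = x[0:180]
print(view.data_ptr() == x.data_ptr())  # True: same underlying memory

# DataLoader-style batching: fetch items one by one, then collate them into
# a brand-new tensor, which allocates fresh memory and copies every element
items = [x[i] for i in range(180)]
collated = torch.stack(items)
print(collated.data_ptr() == x.data_ptr())  # False: new allocation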