I am seeing the DataLoader behave differently from what I expected. I have built a simple linear regression with one input variable and one output variable.
Here is my linear regression architecture:
import torch
from torch.autograd import Variable

class linearRegression(torch.nn.Module):
    def __init__(self, inputSize, outputSize):
        super(linearRegression, self).__init__()
        self.linear = torch.nn.Linear(inputSize, outputSize)

    def forward(self, x):
        out = self.linear(x)
        return out
Using this model, I train it in two examples: in the first I do not use the DataLoader, and in the second I do.
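For reproducibility: `train` in both examples is a NumPy array of shape `(N, 2)`, with column 0 the input and column 1 the target. My real data is not shown here, but a minimal synthetic stand-in (assuming a roughly linear relationship) would be:

```python
import numpy as np

# Hypothetical stand-in for my real dataset: y ≈ 2x + noise, shape (700, 2).
rng = np.random.default_rng(0)
N = 700
x = rng.uniform(-1.0, 1.0, size=(N, 1))
y = 2.0 * x + 0.1 * rng.normal(size=(N, 1))
train = np.hstack([x, y]).astype(np.float32)  # column 0 = input, column 1 = target
```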
Example 1:
inputDim = 1
outputDim = 1
learningRate = 0.01
epochs = 100

model = linearRegression(inputDim, outputDim)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learningRate)

for epoch in range(epochs):
    inputs = Variable(torch.from_numpy(train[:, :-1].reshape(train.shape[0], 1))).float()
    labels = Variable(torch.from_numpy(train[:, 1].reshape(train.shape[0], 1))).float()
    optimizer.zero_grad()
    outputs = model(inputs.float())
    loss = criterion(outputs, labels.float())
    loss.backward()
    optimizer.step()
    print('epoch {}, loss {}'.format(epoch, loss.item()))
After running this a few times, you can quickly see that the loss goes down to about 0.2. However, if we use the DataLoader (Example 2), the loss never goes below 0.7 (with the same hyperparameters, of course).
Example 2:
from torch.utils.data import DataLoader

inputDim = 1
outputDim = 1
learningRate = 0.01
epochs = 100

model = linearRegression(inputDim, outputDim)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learningRate)
dataloader = DataLoader(train, batch_size=700, shuffle=True)

for epoch in range(epochs):
    for data in dataloader:
        inputs = data[:, :-1]
        labels = data[:, -1]
        optimizer.zero_grad()
        outputs = model(inputs.float())
        loss = criterion(outputs, labels.float())
        loss.backward()
        optimizer.step()
    print('epoch {}, loss {}'.format(epoch, loss.item()))
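One thing I noticed while debugging (a quick standalone check, reproducing only the indexing from the Example 2 loop) is that the labels come out 1-D there, unlike the reshaped labels in Example 1:

```python
import torch

# A batch exactly as the Example 2 loop sees it: 700 rows, 2 columns.
data = torch.zeros(700, 2)
inputs = data[:, :-1]   # shape [700, 1], same as in Example 1
labels = data[:, -1]    # shape [700] -- 1-D, no trailing dimension

print(inputs.shape)     # torch.Size([700, 1])
print(labels.shape)     # torch.Size([700])
```

I am not sure whether this shape difference matters to MSELoss, which is part of what I am asking.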
Does the DataLoader transform the data somehow before passing each batch to the model?
Why are the losses so different?