I am seeing the DataLoader behave differently from what I expected. I have built a simple linear regression with one input variable and one output variable.
Here is my linear regression architecture:
import torch
from torch.autograd import Variable

class linearRegression(torch.nn.Module):
    def __init__(self, inputSize, outputSize):
        super(linearRegression, self).__init__()
        self.linear = torch.nn.Linear(inputSize, outputSize)

    def forward(self, x):
        out = self.linear(x)
        return out
Using this model, I train it in two examples: in the first I do not use the DataLoader, and in the second I do.
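For reproducibility: `train` in both examples is a NumPy array of shape `(N, 2)`, with column 0 the input and column 1 the target. My real data is not shown here, but a minimal synthetic stand-in (assuming a roughly linear relationship) would be:

```python
import numpy as np

# Hypothetical stand-in for my real dataset: y ≈ 2x + noise, shape (700, 2).
rng = np.random.default_rng(0)
N = 700
x = rng.uniform(-1.0, 1.0, size=(N, 1))
y = 2.0 * x + 0.1 * rng.normal(size=(N, 1))
train = np.hstack([x, y]).astype(np.float32)  # column 0 = input, column 1 = target
```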
Example 1:
inputDim = 1
outputDim = 1
learningRate = 0.01
epochs = 100

model = linearRegression(inputDim, outputDim)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learningRate)

for epoch in range(epochs):
    inputs = Variable(torch.from_numpy(train[:, :-1].reshape(train.shape[0], 1))).float()
    labels = Variable(torch.from_numpy(train[:, 1].reshape(train.shape[0], 1))).float()
    optimizer.zero_grad()
    outputs = model(inputs.float())
    loss = criterion(outputs, labels.float())
    loss.backward()
    optimizer.step()
    print('epoch {}, loss {}'.format(epoch, loss.item()))
After running this a few times, you can quickly see that the loss goes down to about 0.2. However, if we use the DataLoader (Example 2), the loss never goes below 0.7 (with the same hyperparameters, of course).
Example 2:
from torch.utils.data import DataLoader

inputDim = 1
outputDim = 1
learningRate = 0.01
epochs = 100

model = linearRegression(inputDim, outputDim)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learningRate)
dataloader = DataLoader(train, batch_size=700, shuffle=True)

for epoch in range(epochs):
    for data in dataloader:
        inputs = data[:, :-1]
        labels = data[:, -1]
        optimizer.zero_grad()
        outputs = model(inputs.float())
        loss = criterion(outputs, labels.float())
        loss.backward()
        optimizer.step()
    print('epoch {}, loss {}'.format(epoch, loss.item()))
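One thing I noticed while debugging (a quick standalone check, reproducing only the indexing from the Example 2 loop) is that the labels come out 1-D there, unlike the reshaped labels in Example 1:

```python
import torch

# A batch exactly as the Example 2 loop sees it: 700 rows, 2 columns.
data = torch.zeros(700, 2)
inputs = data[:, :-1]   # shape [700, 1], same as in Example 1
labels = data[:, -1]    # shape [700] -- 1-D, no trailing dimension

print(inputs.shape)     # torch.Size([700, 1])
print(labels.shape)     # torch.Size([700])
```

I am not sure whether this shape difference matters to MSELoss, which is part of what I am asking.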
Does the DataLoader transform the data somehow before passing each batch to the model?
Why are the losses so different?