Forward function and data loader

DianaE · October 9, 2018, 2:43pm

I am new to neural networks and to PyTorch. I am working on a simple feed-forward NN to predict groundwater level from daily precipitation and temperature data.
I'm facing some problems and am seeking help:
First problem: data loader
As I understand it, I should use a data loader to feed the input data in batches (for example, I have 300 temperature values and want a batch size of 4). My understanding is that the dataloader takes the first four values, feeds them forward, and then moves on to the next four. Is there a way for the dataloader to take the first four and then move ahead by one reading, so that it reuses three of the temperature values from the previous batch? For example: first batch = temperatures 1, 2, 3 and 4; second batch = temperatures 2, 3, 4 and 5; and so on until it reaches the last reading.

Second problem: data loader and output(target) data

Is there any need to use a dataloader for the output (target) data if it has no batch size, since I want it to take only one reading, i.e. just one output?
If I have many output nodes, with a different batch size from the input data, should I construct a separate dataloader for them, or is it possible to combine the input and output within the same dataloader?

Third problem: forward function
In defining the forward function within the class (nn.Module), I am struggling with the input data: if I am using a data loader and batches, should I use the dataloader as the input, and if so, how? Or should I use the entire data frame?
This is my code, and I'm asking about xin (the input):

def forward(self, xin):
    xinhi = self.fc1(xin)
    xhi = self.Sigmoid(xinhi)
    xhiout = self.fc2(xhi)
    xout = self.Sigmoid(xhiout)
    return xout

kaixin · October 9, 2018, 3:11pm

I am not sure about your problem setting. In general, batching is used for both the input and the target data.
If you want to use 1 day's data to predict another day's data, then batch_size can be 4 (or any number that fits in your memory). In that situation, the number of target samples in a batch will be 4 too.
If you want to use several days' data (say 4 days) to predict another day's data, then 4 is not your batch size. The 4 days of input data together constitute one training sample.
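The difference between a window and a batch can be sketched in plain Python. This is only an illustration: the helper name make_windows and the sample values are made up, and it assumes the target is the water level on the last day of each window.

```python
def make_windows(temps, levels, window=4):
    """Pair each run of `window` consecutive temperatures with the
    water level recorded on the last day of that run."""
    if len(temps) != len(levels):
        raise ValueError("temps and levels must have the same length")
    samples = []
    for start in range(len(temps) - window + 1):
        inputs = temps[start:start + window]   # one training sample's input
        target = levels[start + window - 1]    # reading on the window's last day
        samples.append((inputs, target))
    return samples


# Made-up readings for six days.
temps = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
levels = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5]

samples = make_windows(temps, levels)
print(len(samples))   # 3
print(samples[0])     # ([10.0, 11.0, 12.0, 13.0], 1.3)
```

Each tuple is one training sample; the batch size then only decides how many of these tuples are stacked together per forward pass.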

Best,

DianaE · October 9, 2018, 3:34pm

Thank you, but I got a bit confused.
I have 300 days of temperature measurements and the corresponding 300 groundwater level measurements. I will be using the previous 3 days' temperature measurements plus today's to predict (say) today's water level. So my understanding is that I will have one input node that takes 4 temperature measurements and one output node that gives one measurement. I understand that the batch size will be four, so that it takes every 4 consecutive values and moves ahead, but I want it to take one output (the water level), not four.

kaixin · October 10, 2018, 3:32am

The training samples in a batch all serve the same purpose. In your setting, day 1's temperature influences the model differently from day 2's, so the four days are four features of one sample rather than four samples in a batch.
In your setting, a training sample should be:

input: [day1's temp, day2's temp, day3's temp, day 4's temp]
target: [day4's water level]

then a batch is (say batch_size is 4):

input: [[day1's temp, day2's temp, day3's temp, day 4's temp],
        [day2's temp, day3's temp, day4's temp, day 5's temp],
        [day3's temp, day4's temp, day5's temp, day 6's temp],
        [day4's temp, day5's temp, day6's temp, day 7's temp]]
target: [[day4's water level],
         [day5's water level],
         [day6's water level],
         [day7's water level]]
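One possible way to reproduce the batch layout above is with torch.Tensor.unfold and a TensorDataset. This is a sketch under assumptions: the values are stand-ins (day numbers for temperatures, 100 + day number for water levels), and shuffle is turned off only so the first batch matches the illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

window = 4
temps = torch.arange(1.0, 11.0)       # stand-in temperatures for days 1..10
levels = torch.arange(101.0, 111.0)   # stand-in water levels for days 1..10

# unfold(dimension, size, step) returns every run of `size` consecutive values.
inputs = temps.unfold(0, window, 1)           # shape (7, 4)
targets = levels[window - 1:].unsqueeze(1)    # shape (7, 1), last day of each window

loader = DataLoader(TensorDataset(inputs, targets), batch_size=4, shuffle=False)
first_inputs, first_targets = next(iter(loader))
print(first_inputs)    # rows: days 1-4, 2-5, 3-6, 4-7
print(first_targets)   # levels for days 4, 5, 6, 7
```

With ten days and a window of four there are seven samples, so the second batch here would hold the remaining three.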

You could write a custom Dataset that slides a window over your data. A small and incomplete example is:

from torch.utils.data import Dataset

class WeatherData(Dataset):
    def __getitem__(self, idx):
        # window of 4 consecutive days starting at idx
        return {'inputs': [self.temp[i] for i in range(idx, idx + 4)],
                'target': self.water[idx + 3]}

DianaE · October 10, 2018, 9:42am

Thank you so much. Could I please ask you to have a look at my code? For some reason it's not working:

book = pd.ExcelFile("path.xlsx")
sheet = book.parse("sheetname")
Input = sheet['R']
Target = sheet['MWL']

batch_size = 365
learning_rate=0.01
epochs=5
input_size= 1
hidden_size=10
output_size=1

class LoadData(Dataset):
    def __init__(self, input, target):
        self.input = input
        self.target = target

    def __getitem__(self, idx):
        return {'inputs': [self.input[i] for i in range(idx, idx + 4)], 'target': self.target[idx + 3]}

    def __len__(self):
        return 365

train_dataset = LoadData(Input, Target)
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=False)

kaixin · October 10, 2018, 11:02am

I assume the variables Input and Target are sequences of the same length (365, right?). In that case 365 is not your batch_size (by the way, 365 is not a typical value for batch_size).

from torch.utils.data import Dataset, DataLoader

batch_size = 8
learning_rate = 0.01
epochs = 5
days_elapsed = 4

class LoadData(Dataset):
    def __init__(self, inputs, targets, days_elapsed):
        assert len(inputs) == len(targets)
        self.inputs = inputs
        self.targets = targets
        self.days_elapsed = days_elapsed

    def __len__(self):
        # number of complete windows of days_elapsed consecutive days
        return len(self.targets) - self.days_elapsed + 1

    def __getitem__(self, idx):
        # the window starts at idx; the target is the reading on the window's last day
        return {'inputs': [self.inputs[i] for i in range(idx, idx + self.days_elapsed)],
                'target': self.targets[idx + self.days_elapsed - 1]}

train_dataset = LoadData(Input, Target, days_elapsed)
# usually set shuffle=True
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

This is a minimal example. In practice, you might want to feed the NN an array (or Tensor) rather than a list.
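A tensor-returning variant of the dataset might look like the following. It is a sketch rather than the code from this thread: the class name WindowDataset is hypothetical, the data are placeholder integers, and it returns an (inputs, target) tuple instead of a dict so that a batch unpacks directly in a training loop.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class WindowDataset(Dataset):  # hypothetical name, not from the thread
    def __init__(self, inputs, targets, days_elapsed=4):
        assert len(inputs) == len(targets)
        # Convert once up front so __getitem__ can return tensor slices.
        self.inputs = torch.as_tensor(inputs, dtype=torch.float32)
        self.targets = torch.as_tensor(targets, dtype=torch.float32)
        self.days_elapsed = days_elapsed

    def __len__(self):
        return len(self.targets) - self.days_elapsed + 1

    def __getitem__(self, idx):
        end = idx + self.days_elapsed
        # inputs: shape (days_elapsed,); target: shape (1,)
        return self.inputs[idx:end], self.targets[end - 1:end]


# Placeholder data: 365 readings where the value equals the day index.
dataset = WindowDataset(list(range(365)), list(range(365)))
loader = DataLoader(dataset, batch_size=8, shuffle=True)

batch_inputs, batch_targets = next(iter(loader))
print(len(dataset))                             # 362
print(batch_inputs.shape, batch_targets.shape)  # torch.Size([8, 4]) torch.Size([8, 1])
```

The default collate function stacks the per-sample tensors, so the network receives a float tensor of shape (batch_size, days_elapsed) without any manual conversion.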

DianaE · October 10, 2018, 11:41am

I have 365 temperature measurements with the corresponding water levels. I am taking blocks of four consecutive readings (the sliding window), which leaves me with 362 = 365 - 3 training samples = batch size, right? Just like what you mentioned before:

input: [[day1's temp, day2's temp, day3's temp, day4's temp],
        [day2's temp, day3's temp, day4's temp, day5's temp],
        [day3's temp, day4's temp, day5's temp, day6's temp],
        [day4's temp, day5's temp, day6's temp, day7's temp],
        ...
        [day362's temp, day363's temp, day364's temp, day365's temp]]

target: [[day4's water level],
         [day5's water level],
         [day6's water level],
         [day7's water level],
         ...
         [day365's water level]]

And true, the data is a list, but I convert it like this. Here is my code (the problem I get now is 'RuntimeError: size mismatch, m1: [1 x 365], m2: [1 x 10]'):

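import pandas as pd
import torch
import torch.nn as nn
from torch.autograd import Variable
from torch.utils.data import Dataset, DataLoader

book = pd.ExcelFile("path.xlsx")
sheet = book.parse("sheetname")
print(sheet)

Input = sheet['R']
Target = sheet['MWL']
Input = torch.tensor(Input)
Target = torch.tensor(Target)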

batch_size = 362
days_elapsed = 4
learning_rate=0.01
epochs=5
input_size= 1
hidden_size=10
output_size=1

class LoadData(Dataset):
    def __init__(self, inputs, targets, days_elapsed):
        assert len(inputs) == len(targets)
        self.inputs = inputs
        self.targets = targets
        self.days_elapsed = days_elapsed

    def __len__(self):
        return len(self.targets) - self.days_elapsed + 1

    def __getitem__(self, idx):
        return {'inputs': [self.inputs[i] for i in range(idx, idx + self.days_elapsed)],
                'target': self.targets[idx + self.days_elapsed - 1]}

train_dataset = LoadData(Input, Target, days_elapsed)
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
print(train_loader)
len(train_loader)

class Net(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, output_size)
        self.Sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.fc1(x)
        x = self.Sigmoid(x)
        x = self.fc2(x)
        x = self.Sigmoid(x)
        return x

FFNN = Net(input_size, hidden_size, output_size)
print(FFNN)

criterion = nn.MSELoss()
optimizer = torch.optim.SGD(FFNN.parameters(), lr=learning_rate)

# Training the FNN Model

for epoch in range(epochs):
    for batch_idx, (inputtensor, targettensor) in enumerate(train_loader):
        inputtensor = Variable(Input)
        targettensor = Variable(Target)
        optimizer.zero_grad()
        FFNN_output = FFNN(inputtensor)

        loss = criterion(FFNN_output, targettensor)
        loss.backward()
        optimizer.step()

        # print out some results every time a certain number of iterations is reached
        if (batch_idx + 1) % 100 == 0:
            print('Epoch [%d/%d], Step [%d/%d], Loss: %.4f'
                  % (epoch + 1, epochs, batch_idx + 1, len(train_dataset) // batch_size, loss.item()))

kaixin · October 10, 2018, 12:17pm

In your case, the input shape should be 365 x 4, and the input_size should be 4 because every time you feed the NN with 4 datapoints and expect 1 output.
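The shape requirement can be checked with a small sketch. It reuses the two-layer sigmoid network from this thread with input_size set to 4; the random batch and the batch size of 8 are placeholders. It also shows that feeding a 1 x 365 tensor, as in the reported error, fails because nn.Linear expects the last dimension to equal input_size.

```python
import torch
import torch.nn as nn


class Net(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, output_size)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.sigmoid(self.fc1(x))
        return self.sigmoid(self.fc2(x))


model = Net(input_size=4, hidden_size=10, output_size=1)

# A batch of 8 samples, each holding 4 days of (placeholder) temperatures.
batch = torch.randn(8, 4)
output = model(batch)
print(output.shape)  # torch.Size([8, 1])

# The whole series in one row has 365 features, not 4, so the first layer rejects it.
try:
    model(torch.randn(1, 365))
    mismatch_raised = False
except RuntimeError as err:
    mismatch_raised = True
    print("shape mismatch:", err)
```

Note that with 365 readings and a window of 4 there are 362 complete windows, so stacking every sample would give a 362 x 4 input.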

DianaE · November 8, 2018, 9:26am

Thank you.
If possible, could you please clarify this issue:

If I want the network to update the weights based on the entire year's data (365 days), like a regression problem, shouldn't I feed the whole training set at once, like this:
Input nodes = 4 (feeding the network 4 days of measurements)
Output nodes = 1 (expecting 1 output)
Batch size = 365 - 3 = 362 (I have 362 training examples covering the whole year)
Input shape = 365 x 4 (as you mentioned)
Output shape = 365 x 1

Thank you in advance.
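For reference, feeding the whole training set at once amounts to setting batch_size to the dataset length, which gives one weight update per epoch. The following is a sketch under assumptions: the data are random placeholders scaled to [0, 1) to suit the sigmoid output, and the network is written with nn.Sequential for brevity. With 365 readings and a window of 4, the stacked input comes out as 362 x 4 and the target as 362 x 1.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
window = 4
temps = torch.rand(365)    # placeholder temperatures
levels = torch.rand(365)   # placeholder water levels in [0, 1)

inputs = temps.unfold(0, window, 1)           # shape (362, 4)
targets = levels[window - 1:].unsqueeze(1)    # shape (362, 1)
dataset = TensorDataset(inputs, targets)

# One batch holds every sample, so each epoch performs a single update.
loader = DataLoader(dataset, batch_size=len(dataset), shuffle=False)

model = nn.Sequential(nn.Linear(window, 10), nn.Sigmoid(),
                      nn.Linear(10, 1), nn.Sigmoid())
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(5):
    updates_per_epoch = 0
    for batch_inputs, batch_targets in loader:
        optimizer.zero_grad()
        loss = criterion(model(batch_inputs), batch_targets)
        loss.backward()
        optimizer.step()
        updates_per_epoch += 1
    print(f"epoch {epoch + 1}: loss {loss.item():.4f}, updates {updates_per_epoch}")
```

Whether full-batch or mini-batch updates work better depends on the data; with only 362 samples both are cheap to try.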