I have a multidimensional dataset of shape (1827, 5).
From that dataset I extract the variable I want to predict, and I'm left with my X and y variables as follows:
X has size (1827, 4) and y has size (1827, 4).
Then I further split them into train and test datasets, giving 20% to the test dataset. The shapes I now have are these:
    print(X_train.shape)
    print(y_train.shape)
    print(X_test.shape)
    print(y_test.shape)

    torch.Size([1461, 4])
    torch.Size()
    torch.Size([366, 4])
    torch.Size()
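For reference, a split like the one described can be sketched as follows. This is only an illustration under assumptions: the data is random stand-in data, the target is taken to be the last column (the question does not say which one it is), y is kept one-dimensional, and the 20% share is rounded up.

```python
import math

import torch

torch.manual_seed(0)  # assumption: fixed seed, only for reproducibility

# Stand-in data with the shape from the question: 1827 rows, 5 columns.
data = torch.randn(1827, 5)

# Assumption: the last column is the target; the other 4 are features.
X, y = data[:, :4], data[:, 4]

# Shuffle the row indices and hold out 20% of the rows (rounded up) for testing.
n_test = math.ceil(0.2 * len(data))
perm = torch.randperm(len(data))
test_idx, train_idx = perm[:n_test], perm[n_test:]

X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

print(X_train.shape, X_test.shape)
```

The split is done along the first dimension only, so every row keeps its 4 feature values together.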
The batch size I have to use (I cannot change that) is 50.
My question is: when reshaping a tensor so that it fits my model, shouldn't I take the original dimensions into account? By dividing the train dataset into batches of 50, I will surely put data that belongs to the same observation into different batches, simply because 50 is not exactly divisible by 4.
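To check that concern, here is a minimal sketch of batching with `DataLoader`. It assumes that batching is done with `torch.utils.data.DataLoader` (the question does not say how) and uses random stand-in tensors with the training shapes, with y assumed to be one-dimensional.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in tensors with the training shapes (y assumed one-dimensional).
X_train = torch.randn(1461, 4)
y_train = torch.randn(1461)

# batch_size counts rows (observations), not individual values.
loader = DataLoader(TensorDataset(X_train, y_train), batch_size=50, shuffle=False)

batch_shapes = [tuple(xb.shape) for xb, _ in loader]
first_xb, _ = next(iter(loader))

print(len(batch_shapes))   # number of batches
print(batch_shapes[0])     # shape of a full batch
print(batch_shapes[-1])    # shape of the final, smaller batch

# With shuffle=False the first batch is exactly the first 50 rows, untouched.
print(torch.equal(first_xb, X_train[:50]))
```

In this sketch each batch has shape (50, 4), so an observation's 4 values are never split across batches. Only the final batch is smaller, because 1461 is not a multiple of 50; passing `drop_last=True` would discard it.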
My network architecture is this:
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class NeuralNetwork(nn.Module):
        def __init__(self):
            super(NeuralNetwork, self).__init__()
            self.hidden1 = torch.nn.Linear(50, 25)  # hidden layer
            self.hidden2 = torch.nn.Linear(25, 25)  # hidden layer
            self.out = torch.nn.Linear(25, 1)       # output layer

        def forward(self, x):
            z = F.relu(self.hidden1(x))  # activation function for first hidden layer
            z = F.relu(self.hidden2(z))  # activation function for second hidden layer
            z = self.out(z)              # linear output
            return z
The first layer takes 50 as input because the batch size is 50, and the output layer outputs 1 because I'm doing regression. For the layers in between I chose 25 because I read that a good number is the median between the input and output sizes.
This problem came to be because I tried training the model with the data as is and I got this error:
mat1 and mat2 shapes cannot be multiplied (50x4 and 50x25)
I know that changing the layers to output 4 would solve it, but I'm wondering whether it would be better to just reshape the data to (50, 25) instead.
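The two options can be compared with a small sketch. It rests on assumptions: the batch is random stand-in data of shape (50, 4), and `FeatureSizedNetwork` is a hypothetical variant of the network above whose first layer takes the number of features (4) as `in_features` instead of the batch size. Note that a (50, 4) batch holds only 200 values, so it cannot be reshaped to (50, 25), which would need 1250.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureSizedNetwork(nn.Module):
    """Hypothetical variant: in_features is the feature count, not the batch size."""

    def __init__(self, in_features=4, hidden=25):
        super().__init__()
        self.hidden1 = nn.Linear(in_features, hidden)  # hidden layer
        self.hidden2 = nn.Linear(hidden, hidden)       # hidden layer
        self.out = nn.Linear(hidden, 1)                # output layer (regression)

    def forward(self, x):
        z = F.relu(self.hidden1(x))
        z = F.relu(self.hidden2(z))
        return self.out(z)


batch = torch.randn(50, 4)  # stand-in batch: 50 observations, 4 features each

# A first layer sized by the batch size fails on this batch.
try:
    nn.Linear(50, 25)(batch)
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)
print(error_message)

# Reshaping to (50, 25) is not possible: 200 values cannot fill 1250 slots.
try:
    batch.reshape(50, 25)
    reshape_failed = False
except RuntimeError:
    reshape_failed = True
print(reshape_failed)

# A first layer sized by the feature count accepts the batch as is.
output = FeatureSizedNetwork()(batch)
print(output.shape)
```

`nn.Linear` is applied along the last dimension of its input, so the leading dimension (the batch size) can be anything, including the smaller final batch.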