I have a multidimensional dataset of shape (1827, 5).
From that dataset I extract the variable I want to predict, and I'm left with my X and y variables:
X has shape (1827, 4) and y has shape (1827,).
Then I further split them into train and test datasets, giving 20% to the test dataset. The shapes I now have are these:
> print(X_train.shape)
> print(y_train.shape)
> print(X_test.shape)
> print(y_test.shape)
torch.Size([1461, 4])
torch.Size([1461])
torch.Size([366, 4])
torch.Size([366])
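For reference, this split can be reproduced with a minimal sketch (random stand-in data, a plain `randperm` split; rounding the test size up gives the 366/1461 split shown above):

```python
import math
import torch

# Hypothetical stand-in for the real dataset: 1827 rows, 4 features + 1 target.
data = torch.randn(1827, 5)
X, y = data[:, :4], data[:, 4]              # X: (1827, 4), y: (1827,)

n_test = math.ceil(0.2 * len(X))            # 366 (rounding up matches the shapes above)
perm = torch.randperm(len(X))               # shuffle before splitting
test_idx, train_idx = perm[:n_test], perm[n_test:]

X_train, y_train = X[train_idx], y[train_idx]   # (1461, 4), (1461,)
X_test, y_test = X[test_idx], y[test_idx]       # (366, 4), (366,)
```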
The batch size I have to use (I cannot change that) is 50.
My question is: when reshaping a tensor to fit my model, shouldn't I take the original dimensions into account? By dividing the train dataset into batches of 50, won't I end up putting data that belongs to the same observation into different batches, simply because 50 is not exactly divisible by 4?
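One thing worth checking here: a `DataLoader` (or plain slicing) batches along the first (row) dimension only, so each batch row keeps all 4 features of its observation together. A quick sketch with stand-in tensors of the shapes above:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

X_train = torch.randn(1461, 4)          # stand-in tensors with the shapes printed above
y_train = torch.randn(1461)

loader = DataLoader(TensorDataset(X_train, y_train), batch_size=50)
xb, yb = next(iter(loader))
print(xb.shape, yb.shape)               # torch.Size([50, 4]) torch.Size([50])
```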
My network architecture is this:
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.hidden1 = torch.nn.Linear(50, 25)  # hidden layer
        self.hidden2 = torch.nn.Linear(25, 25)  # hidden layer
        self.out = torch.nn.Linear(25, 1)       # output layer

    def forward(self, x):
        z = F.relu(self.hidden1(x))  # activation function for first hidden layer
        z = F.relu(self.hidden2(z))  # activation function for second hidden layer
        z = self.out(z)              # linear output
        return z
The first layer takes 50 as input because the batch size is 50; the output layer outputs 1 because I'm doing regression. For the in-between layers I chose 25 because I read that a good number is the median between the input and output sizes.
This problem came up because I tried training the model with the data as is and got this error:
mat1 and mat2 shapes cannot be multiplied (50x4 and 50x25)
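The error can be reproduced with a minimal sketch: `nn.Linear(50, 25)` expects 50 features per input row, but each row of a batch from this data has only 4:

```python
import torch
import torch.nn as nn

layer = nn.Linear(50, 25)   # in_features=50, as in the first hidden layer above
xb = torch.randn(50, 4)     # one batch: 50 observations, 4 features each

try:
    layer(xb)
    err = None
except RuntimeError as e:
    err = str(e)
print(err)                  # mat1 and mat2 shapes cannot be multiplied (50x4 and 50x25)
```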
I know that changing the first layer to take 4 input features will solve it, but I'm wondering if it's better to just reshape the data to (50, 25) instead.