Input dimensions for a neural network built with nn.Sequential

Hello,

I have built an NN like this:

import torch
import torch.nn as nn
import numpy as np

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # a small MLP: 1 input feature -> 10 hidden units -> 1 output
        self.fc = nn.Sequential(
            nn.Linear(1, 10),
            nn.ReLU(),
            nn.Linear(10, 1),
            nn.ReLU(),
        )

    def forward(self, x):
        out = self.fc(x)
        return out

To call forward, I do

x = torch.tensor(np.array([1, 2, 3]), dtype=torch.float)
net = Net()

Since the dimension of the first linear layer is 1x10 (please correct me if I am wrong), I use .view to reshape my data. Now I have seen two examples.

one is

outputs = net(x.view(-1, 1))

another is

outputs = net(x.view(-1, 1, 1))

They both seem to work fine.
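For reference, the two views give these shapes:

print(x.view(-1, 1).shape)     # torch.Size([3, 1])
print(x.view(-1, 1, 1).shape)  # torch.Size([3, 1, 1])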

But I am confused: why do both work? Can anyone help me?

When using nn.Linear, the last dimension of your input has to match the in_features of your linear layer.

This means that it does not matter how many dimensions come before, or how big or small they are; the last one has to be equal to in_features of your nn.Linear.

In your example, in_features=1. This means that any tensor whose last dimension is 1 will work.

# Example
in_features = 1
out_features = 10

linear = torch.nn.Linear(in_features, out_features)

inp = torch.rand(1, 2, 3, 4, in_features)
output = linear(inp)

print(output.shape)
# Output
# torch.Size([1, 2, 3, 4, 10])

You can play with this. You can change how many dimensions inp has, and you can change how big in_features is. The important thing is that the LAST dimension of inp and the first argument of nn.Linear (in_features) have the same value.
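For example (a quick sketch reusing linear and inp from above), if the last dimension does not match in_features, PyTorch raises a shape-mismatch error:

inp_bad = torch.rand(1, 2, 3, 4, 2)  # last dim is 2, but in_features is 1
# linear(inp_bad)  # raises RuntimeError: the inner dimensions do not match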

Hope this helps :smile:

(If you want to know why: nn.Linear performs a matrix multiplication, so the inner dimensions must match. There is more information in the docs.)
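As a quick sketch of that last point: nn.Linear stores its weight with shape (out_features, in_features), so the forward pass is effectively x @ weight.T + bias. Reusing linear, inp, and output from the example above:

# manually reproduce nn.Linear: x @ W^T + b
manual = inp @ linear.weight.T + linear.bias
print(torch.allclose(manual, output))
# True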

Thank you. I found that this influences the training results. If I use outputs = net(x.view(-1, 1)), the result is:

[screenshot of the first training result]

If I use outputs = net(x.view(-1, 1, 1)), the result is:

[screenshot of the second training result, hosted at ImgBB]

Sorry, I can only upload one picture, so I put the second one online.

Do you know why?

We can take a look at a little example

a = torch.rand(3, 2)

If we print a we get something like this

print(a)
# tensor([[0.9699, 0.6693],
#         [0.1688, 0.0404],
#         [0.7409, 0.6294]])

If we print it like this, it will just flatten the tensor into a one-dimensional vector.

print(a.view(-1))
# tensor([0.9699, 0.6693, 0.1688, 0.0404, 0.7409, 0.6294])

If we now use the first of your options, we see that we gained a dimension. Now, each value is inside its own vector.

print(a.view(-1, 1))
# tensor([[0.9699],
#         [0.6693],
#         [0.1688],
#         [0.0404],
#         [0.7409],
#         [0.6294]])

If we now do your second option, we gain another dimension. The values are further isolated.

print(a.view(-1, 1, 1))
# tensor([[[0.9699]],
#
#        [[0.6693]],
#
#        [[0.1688]],
#
#        [[0.0404]],
#
#        [[0.7409]],
#
#        [[0.6294]]])

However, the values are always the same.

You are then feeding these values to an nn.Linear layer that might look like this:

[diagram of the linear layer]

But as we mentioned, the input values remain the same. We only added more “isolation” to the values, but in both cases the last dimension has size 1. So the same inputs are going to the same network.

So, my guess for this particular case is that this should not play that much of a role. It would be more significant if you played with the input size of your linear layer (which should match the last dimension of your input).
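One small sketch to check this, reusing the x and net from your first post (the two views should give the same values, just in different shapes):

out1 = net(x.view(-1, 1))      # shape (3, 1)
out2 = net(x.view(-1, 1, 1))   # shape (3, 1, 1)
print(torch.allclose(out1.flatten(), out2.flatten()))
# True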

In order to be sure that one view performs better than the other, you could take two different approaches:

  • Deterministic approach: You can make sure that all of the random variables have the same seed when trying both approaches, to see if there is really a difference in training.
    You can read more about this in the PyTorch notes on reproducibility. For this simple approach, I think it would be enough to set torch.manual_seed(0) before creating your net. This way the weights of your linear layer are the same for both cases (see the sketch after this list).

  • Many repetitions: You could also just try a bunch of times with both views and see if you get significantly better results with one than with the other.
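Here is a minimal sketch of the deterministic approach, assuming the Net class from your first post (the training loop itself is left out):

torch.manual_seed(0)  # same seed before creating each net
net_a = Net()

torch.manual_seed(0)
net_b = Net()

# with the same seed, both nets start from identical weights
print(torch.allclose(net_a.fc[0].weight, net_b.fc[0].weight))
# True

# now train net_a with x.view(-1, 1) and net_b with x.view(-1, 1, 1)
# using the same optimizer settings, and compare the results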

If the results do show that one view is better than the other, then I do not know what might be happening.

Please let me know what you find out.
