What should the shape of weights be here?

I have a fully connected neural network with 78 input features, two hidden layers with 50 nodes each, and an output layer with 7 nodes.

        self.hidden_layer_1 = torch.nn.Linear(78, 50)
        self.hidden_layer_2 = torch.nn.Linear(50, 50)
        self.output = torch.nn.Linear(50, 7)

    def forward(self, input):
        weights = np.load('./weights.npy')
        weight = torch.from_numpy(weights)
        self.hidden_layer_1.weight = torch.nn.Parameter(weight)
        H1 = self.hidden_layer_1(input)
        H1 = self.ReLU(H1)
        H2 = self.hidden_layer_2(H1)
        H2 = self.ReLU(H2)
        final_inputs = self.output(H2)
        # not applying activation on final_inputs because CrossEntropyLoss does that
        return final_inputs

I’m initializing the weights for the first layer from a file, and that gives me an error. The shape of the loaded weights is (50, 7).

Error:
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1024x78 and 7x50)

Batch size is 1024, so that’s where the 1024 value comes from.

I am a bit confused about what to do. What should the shape of my weights be so that I don’t get this error?
I get these weights from running an unsupervised layer, a Restricted Boltzmann Machine.

Hi jpj!

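Defining self.hidden_layer_1 as torch.nn.Linear(78, 50) is fine. At
this point self.hidden_layer_1.weight is a tensor with shape [50, 78].
(The order of the dimensions is correct because Linear stores its
weight as [out_features, in_features] and multiplies the input batch
by the transpose of that weight.)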
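
The shape convention can be checked directly. The following is only a small sketch; the names layer, batch and manual are illustrative, and the batch is random data rather than anything from your dataset.

```python
import torch

# Same layer as in the question: 78 input features, 50 output features.
layer = torch.nn.Linear(78, 50)

# The weight is stored as [out_features, in_features].
print(layer.weight.shape)   # torch.Size([50, 78])

# A batch of 1024 samples with 78 features each (random stand-in data).
batch = torch.randn(1024, 78)
out = layer(batch)
print(out.shape)            # torch.Size([1024, 50])

# Linear computes batch @ weight.T + bias, so [50, 78] is the right order.
manual = batch @ layer.weight.T + layer.bias
print(torch.allclose(out, manual, atol=1e-5))
```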

The assignment self.hidden_layer_1.weight = torch.nn.Parameter(weight)
is your problem. You are replacing hidden_layer_1.weight, whose
shape is consistent with the number of features in your input batch, with
a different tensor with a shape, [50, 7], that is not consistent with your
input batch.
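
To see the failure in isolation, here is a minimal sketch that should raise the same kind of RuntimeError. It assumes nothing about your real file: bad_weight is a random tensor standing in for the contents of weights.npy.

```python
import torch

layer = torch.nn.Linear(78, 50)

# Stand-in for the array loaded from weights.npy: random values, shape [50, 7].
bad_weight = torch.randn(50, 7)

# The assignment itself succeeds; no shape check happens here.
layer.weight = torch.nn.Parameter(bad_weight)

batch = torch.randn(1024, 78)
try:
    layer(batch)            # the mismatch only surfaces in the forward pass
    failed = False
except RuntimeError as err:
    failed = True
    print(err)
```

The point of the sketch is that pytorch does not complain when the weight is replaced, only when the layer is actually applied to a batch.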

Yes, and the error message agrees with the shape you report: presumably
the file weights.npy contains a (numpy) array of shape [50, 7]. The
question is: Is this what you want? An array of that shape would be
appropriate if your input had 7 features (rather than 78).

If your input has 78 features, hidden_layer_1.weight should have
shape [50, 78] – just as it does before you replace the weight tensor.
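
One way to guard against this is to check the shape before copying the pretrained values in. This is a sketch, not the only way to do it: load_first_layer_weights is a hypothetical helper name, and pretrained is a random array standing in for np.load('./weights.npy').

```python
import numpy as np
import torch

def load_first_layer_weights(layer, array):
    """Copy a numpy array into layer.weight after checking its shape.

    Hypothetical helper for illustration; not part of pytorch.
    """
    weight = torch.from_numpy(array).to(layer.weight.dtype)
    expected = tuple(layer.weight.shape)
    if tuple(weight.shape) != expected:
        raise ValueError(
            f"expected weights of shape {expected}, got {tuple(weight.shape)}"
        )
    # Copy in place so the existing Parameter object is kept.
    with torch.no_grad():
        layer.weight.copy_(weight)

layer = torch.nn.Linear(78, 50)

# Stand-in for np.load('./weights.npy'): random values with the required shape.
pretrained = np.random.rand(50, 78)
load_first_layer_weights(layer, pretrained)
```

If you do something like this, it probably belongs in the constructor rather than in forward(). As written in your post, forward() re-reads the file and resets the first layer's weights on every call, which would discard any training updates to that layer.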

Why the mismatch in the number of features? Is your Restricted
Boltzmann Machine processing data with 78 features or 7?


K. Frank
