MLP with one hidden layer: dimensions of the modules?

Bart_Louwers · June 24, 2019, 3:38pm

I’m toying around with PyTorch and MNIST, trying to get the hang of the API. I want to create an MLP with one hidden layer. What should the dimensions of the modules be?

The input is a 784x1 vector, so I’d say two modules: a hidden layer of 784x100 (100 hidden nodes) and an output layer of 100x10 (for classification). However, that gives “size mismatch, m1: [784 x 1], m2: [784 x 100] at /build/python-pytorch/src/”.

My code is below. I’m using the C++ API, but this question should be answerable by those only familiar with the Python API as well. Thanks!

void TorchMLP::init(std::vector<size_t> nodes_per_layer) {
    assert(nodes_per_layer.size() == 3);
    h = register_module("h",torch::nn::Linear(nodes_per_layer[0], nodes_per_layer[1]));
    out = register_module("out",torch::nn::Linear(nodes_per_layer[1],nodes_per_layer[2]));
    optimizer = std::make_shared<torch::optim::SGD>(
        parameters(), torch::optim::SGDOptions(d_learning_rate));
}

torch::Tensor TorchMLP::test(torch::Tensor x) {
    x = torch::relu(h->forward(x));
    x = torch::sigmoid(out->forward(x));
    return x;
}

void TorchMLP::train(std::pair<torch::Tensor, torch::Tensor> example) {
    optimizer->zero_grad();

    auto [input, expected_output] = example;
    auto output = test(input);
    auto loss = torch::mse_loss(output, expected_output);
    loss.backward();
    optimizer->step();
}

KFrank · June 24, 2019, 5:08pm

Hi Bart!

With the PyTorch Python API (i.e., not C++), I think the following does what
you want:

>>> import torch
>>> torch.nn.Sequential (
...     torch.nn.Linear (784, 100),
...     torch.nn.ReLU(),
...     torch.nn.Linear (100, 10)
... )
Sequential(
  (0): Linear(in_features=784, out_features=100, bias=True)
  (1): ReLU()
  (2): Linear(in_features=100, out_features=10, bias=True)
)

(Just to be clear, the input has 784 features, the hidden layer
has 100 “neurons”, the output has 10 values – the logits for
the 10 classes, and the batch size is deduced from the shape
of the input tensor.)
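
As a quick check of the shapes described above, here is a minimal sketch in Python that pushes a batch through the same model. The batch size of 32 and the random inputs are assumptions made for illustration; they stand in for real MNIST images.

```python
import torch

# Same architecture as above: 784 inputs -> 100 hidden units -> 10 logits.
model = torch.nn.Sequential(
    torch.nn.Linear(784, 100),
    torch.nn.ReLU(),
    torch.nn.Linear(100, 10),
)

# Hypothetical batch of 32 flattened 28x28 images (random stand-in data).
batch = torch.randn(32, 784)
logits = model(batch)
print(tuple(logits.shape))  # (32, 10)

# A single example can be given a leading batch dimension of size 1.
single = torch.randn(784).unsqueeze(0)  # shape [1, 784]
single_logits = model(single)
print(tuple(single_logits.shape))  # (1, 10)
```

The first dimension of the input is treated as the batch dimension, and the last dimension has to match in_features of the first Linear layer.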

I’m not familiar with the Torch C++ API, so I can’t say where your
problem might lie, but, on the surface, it looks like you are doing
something sensible, and I don’t have any guess about your
particular “size mismatch, m1: [784 x 1], m2: [784 x 100]” error.

(It would help if you would tell us – or print out – the values in
your nodes_per_layer vector.)
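
One way to do that kind of check from the Python side is to print the parameter shapes of each layer. This is only a sketch: the list nodes_per_layer below is a hypothetical stand-in that mirrors the name in the C++ code, with its values (784, 100, 10) assumed from the question. Note that torch.nn.Linear stores its weight as [out_features, in_features], so the [784 x 100] in the error message would correspond to the transposed weight of the hidden layer.

```python
import torch

# Hypothetical layer sizes, assumed from the question: input, hidden, output.
nodes_per_layer = [784, 100, 10]

h = torch.nn.Linear(nodes_per_layer[0], nodes_per_layer[1])
out = torch.nn.Linear(nodes_per_layer[1], nodes_per_layer[2])

# Linear stores its weight as [out_features, in_features].
print(tuple(h.weight.shape))    # (100, 784)
print(tuple(out.weight.shape))  # (10, 100)
print(tuple(h.bias.shape))      # (100,)
```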

Good luck.

K. Frank

Bart_Louwers · June 24, 2019, 5:27pm

Hey @KFrank, my problem was that my ‘batch’ wasn’t really a batch (it was a single training example), as I’m doing online learning.

The following seems to have fixed that (although I’m not exactly clear on the usage of view and unsqueeze yet):

torch::Tensor TorchMLP::test(torch::Tensor x) {
    x = x.view(784);
    x = torch::relu(h->forward(x));
    x = torch::sigmoid(out->forward(x));
    return x.unsqueeze(-1);
}

But I got that very unhelpful error, so I’ll file a bug report for it. Thanks for your help.
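
For readers who are likewise unsure about view and unsqueeze, the following Python sketch walks through the shapes involved. It is an illustration under assumptions, not the code from this thread: the random tensor stands in for one MNIST example stored as a 784x1 column, and the layer is a fresh torch.nn.Linear(784, 100).

```python
import torch

h = torch.nn.Linear(784, 100)

# Hypothetical single example stored as a column vector, as in the question.
x = torch.randn(784, 1)

# Linear expects the last dimension to be in_features (784), but here it is 1,
# so the forward pass raises a shape-mismatch RuntimeError.
try:
    h(x)
except RuntimeError as err:
    print("forward failed:", type(err).__name__)

# view(784) reinterprets the same 784 values as a 1-D tensor of shape [784].
flat = x.view(784)
hidden = h(flat)
print(tuple(hidden.shape))  # (100,)

# unsqueeze(-1) appends a dimension of size 1: [100] -> [100, 1].
column = hidden.unsqueeze(-1)
print(tuple(column.shape))  # (100, 1)

# unsqueeze(0) instead prepends a batch dimension: [784] -> [1, 784].
batched = h(flat.unsqueeze(0))
print(tuple(batched.shape))  # (1, 100)
```

In other words, view changes how the existing values are arranged into dimensions, while unsqueeze inserts a new dimension of size 1 at the given position.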