I’m toying around with PyTorch and MNIST, trying to get the hang of the API. I want to create an MLP with one hidden layer. What should the dimensions of the modules be?
The input is a 784x1 vector, so I’d say two modules: a hidden layer of 784x100 (100 hidden nodes) and an output layer of 100x10 (for classification). However, that gives “size mismatch, m1: [784 x 1], m2: [784 x 100] at /build/python-pytorch/src/”.
My code is below. I’m using the C++ API, but this question should also be answerable by those familiar only with the Python API. Thanks!
void TorchMLP::init(std::vector<size_t> nodes_per_layer) {
    assert(nodes_per_layer.size() == 3);
    h = register_module("h", torch::nn::Linear(nodes_per_layer[0], nodes_per_layer[1]));
    out = register_module("out", torch::nn::Linear(nodes_per_layer[1], nodes_per_layer[2]));
    optimizer = std::make_shared<torch::optim::SGD>(
        parameters(), torch::optim::SGDOptions(d_learning_rate));
}

torch::Tensor TorchMLP::test(torch::Tensor x) {
    x = torch::relu(h->forward(x));
    x = torch::sigmoid(out->forward(x));
    return x;
}

void TorchMLP::train(std::pair<torch::Tensor, torch::Tensor> example) {
    optimizer->zero_grad();
    auto [input, expected_output] = example;
    auto output = test(input);
    auto loss = torch::mse_loss(output, expected_output);
    loss.backward();
    optimizer->step();
}
(Just to be clear, the input has 784 features, the hidden layer
has 100 “neurons”, the output has 10 values – the logits for
the 10 classes, and the batch size is deduced from the shape
of the input tensor.)
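To make that shape bookkeeping concrete, here is a minimal sketch in plain C++ (no libtorch involved). It assumes the usual convention that a linear layer takes input shaped [batch x in_features] and returns [batch x out_features]. The names `Shape`, `linear_accepts`, `linear_output_shape`, and `mlp_output_shape` are hypothetical helpers for illustration, not part of any Torch API.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical 2-D shape, written [rows x cols]; rows plays the role of the batch size.
struct Shape {
    std::size_t rows;
    std::size_t cols;
};

// True when a layer expecting `in_features` inputs can consume `input`,
// assuming the feature count sits in the last dimension.
bool linear_accepts(Shape input, std::size_t in_features) {
    return input.cols == in_features;
}

// Output shape of a fully connected layer: the batch dimension is kept,
// the feature dimension becomes `out_features`.
Shape linear_output_shape(Shape input, std::size_t in_features, std::size_t out_features) {
    if (!linear_accepts(input, in_features)) {
        throw std::invalid_argument("feature count does not match layer input size");
    }
    return Shape{input.rows, out_features};
}

// Shape bookkeeping for the 784 -> 100 -> 10 network described above.
Shape mlp_output_shape(Shape input) {
    Shape hidden = linear_output_shape(input, 784, 100);
    return linear_output_shape(hidden, 100, 10);
}
```

Under that assumption, the batch size rides along unchanged in the first dimension, and only the last dimension has to match each layer's input size.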
I’m not familiar with the Torch C++ API, so I can’t say where your
problem might lie, but, on the surface, it looks like you are doing
something sensible, and I don’t have any guess about your
particular “size mismatch, m1: [784 x 1], m2: [784 x 100]” error.
(It would help if you would tell us – or print out – the values in
your nodes_per_layer vector.)
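Printing those values could look something like the sketch below. `format_sizes` is a hypothetical helper name, and the code uses only the standard library, so it makes no assumptions about the Torch C++ API.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: render layer sizes as text such as "[784, 100, 10]" for logging.
std::string format_sizes(const std::vector<std::size_t>& sizes) {
    std::ostringstream out;
    out << "[";
    for (std::size_t i = 0; i < sizes.size(); ++i) {
        if (i > 0) out << ", ";  // separator between entries, not before the first
        out << sizes[i];
    }
    out << "]";
    return out.str();
}
```

Writing `format_sizes(nodes_per_layer)` to `std::cerr` at the top of `init`, and printing the sizes of the input tensor at the top of `test`, would show both sides of the mismatch together.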