At this link you can find an equivalent implementation of XOR for three dynamic NN frameworks (DyNet, Chainer, PyTorch).
Although the implementation is parametrised the same way for all frameworks, I'm finding that DyNet converges to a solution in fewer iterations. I'm hoping someone can point out why.
Since the model is very small, I'd walk through it step by step with the same initial weight tensor and the same data; each layer and the optimization step should produce exactly the same output across the three frameworks (but it sounds like they don't).
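A minimal sketch of that debugging approach in PyTorch: fix the RNG seed, run a forward pass, and inspect each layer's output tensor so it can be compared against the other frameworks. The 2-8-1 layer sizes and sigmoid activations here are assumptions for illustration, not taken from the linked implementations.

```python
import torch

torch.manual_seed(0)  # fix the RNG so the initial weights are reproducible

# hypothetical 2-8-1 XOR network (sizes/activations are assumptions)
hidden = torch.nn.Linear(2, 8)
out = torch.nn.Linear(8, 1)

# the four XOR input patterns
x = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])

h = torch.sigmoid(hidden(x))  # inspect this tensor in each framework...
y = torch.sigmoid(out(h))     # ...and this one; they should match exactly
print(h)
print(y)
```

To make the comparison meaningful, you'd also copy the same initial weight tensors into each framework (in PyTorch, e.g. via `hidden.weight.data.copy_(...)`) rather than relying on each library's own initializer.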
Ah, I was hoping not to have to do that.
I'm guessing there is no regularisation/normalisation applied in PyTorch by default.
No, there is no regularization by default; you can configure it via the optimizer options.
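For reference, in PyTorch L2 regularization is opt-in through the optimizer's `weight_decay` argument, which defaults to 0; the model and hyperparameter values below are just placeholders.

```python
import torch

model = torch.nn.Linear(2, 1)  # placeholder model

# weight_decay adds L2 regularization; it is 0.0 unless you set it
opt = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)
print(opt.defaults["weight_decay"])
```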
Found that the hidden layer had a tanh activation in the DyNet version. Sorry for the mistake; now all frameworks learn at the same rate.
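For anyone landing here later, a minimal PyTorch XOR sketch with a tanh hidden layer, matching the activation the DyNet version used. The hidden size, learning rate, and iteration count are assumptions, not the thread's actual parameters.

```python
import torch

torch.manual_seed(0)

# 2-8-1 network; tanh on the hidden layer is the detail that mattered here
model = torch.nn.Sequential(
    torch.nn.Linear(2, 8),
    torch.nn.Tanh(),
    torch.nn.Linear(8, 1),
    torch.nn.Sigmoid(),
)

x = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
t = torch.tensor([[0.], [1.], [1.], [0.]])  # XOR targets

opt = torch.optim.SGD(model.parameters(), lr=0.5)
loss_fn = torch.nn.BCELoss()
for _ in range(2000):
    opt.zero_grad()
    loss = loss_fn(model(x), t)
    loss.backward()
    opt.step()

print(loss.item())
print((model(x) > 0.5).float())
```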