At this link you can find equivalent implementations of XOR in several dynamic NN frameworks (DyNet, Chainer, PyTorch).
Although the implementation is parametrised the same way in all frameworks, I’m finding that DyNet converges to a solution in fewer iterations. I’m hoping someone can point out why.
Since the model is very small, it can be walked through step by step with the same initial weight tensors and the same data; each layer and the optimization step should produce exactly the same output in all three frameworks (but it looks like they don’t).
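
To make that step-by-step comparison concrete, here is a minimal sketch of a framework-independent reference in plain NumPy: one forward pass and one SGD step, computed by hand. Everything specific in it is an assumption for illustration and not taken from the linked code: the 2-2-1 layout, tanh hidden units, sigmoid output, mean binary cross-entropy loss, plain SGD with a learning rate of 0.1, the weight values, and the helper names `init_params`, `forward`, `loss` and `sgd_step`. These would need to be swapped for whatever the actual implementations use.

```python
import numpy as np

# XOR data: four input pairs and their targets.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
Y = np.array([[0.0], [1.0], [1.0], [0.0]])


def init_params():
    # Hypothetical fixed initial weights; copy the same values into each framework.
    return {
        "W1": np.array([[0.5, -0.5], [0.3, 0.8]]),  # shape (inputs, hidden)
        "b1": np.zeros(2),
        "W2": np.array([[0.7], [-0.4]]),            # shape (hidden, outputs)
        "b2": np.zeros(1),
    }


def forward(p, x):
    # Assumed architecture: tanh hidden layer, sigmoid output.
    h = np.tanh(x @ p["W1"] + p["b1"])
    y_hat = 1.0 / (1.0 + np.exp(-(h @ p["W2"] + p["b2"])))
    return h, y_hat


def loss(y_hat, y):
    # Assumed loss: binary cross-entropy averaged over the batch.
    return float(-np.mean(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat)))


def sgd_step(p, x, y, lr=0.1):
    # Manual backpropagation followed by one plain SGD update.
    h, y_hat = forward(p, x)
    dz2 = (y_hat - y) / x.shape[0]         # d(loss)/d(output pre-activation)
    dz1 = (dz2 @ p["W2"].T) * (1.0 - h ** 2)  # back through tanh
    grads = {
        "W1": x.T @ dz1,
        "b1": dz1.sum(axis=0),
        "W2": h.T @ dz2,
        "b2": dz2.sum(axis=0),
    }
    new_p = {k: p[k] - lr * grads[k] for k in p}
    return new_p, grads


if __name__ == "__main__":
    params = init_params()
    hidden, out = forward(params, X)
    print("hidden activations:\n", hidden)
    print("outputs:\n", out)
    print("loss before step:", loss(out, Y))
    params, grads = sgd_step(params, X, Y)
    print("gradient of W1:\n", grads["W1"])
    print("loss after step:", loss(forward(params, X)[1], Y))
```

Printing the same intermediate values (hidden activations, outputs, loss, gradients, updated weights) from each framework and comparing them against this reference should show at which stage the three first disagree. Two things that seem worth checking while doing so: the weight layout (for example, PyTorch's `nn.Linear` stores its weight as (out_features, in_features), the transpose of the layout used here), and whether the loss is summed or averaged over the batch, since that changes the effective learning rate.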