I wrote a post on a similar question here; see if it helps regarding math.sqrt(self.ninp).
In the paper (https://arxiv.org/pdf/1706.03762.pdf), the embedding weights are multiplied by a scaling factor of sqrt(d_model), which corresponds to math.sqrt(self.ninp) in the tutorial. Maybe removing this scaling factor (i.e. not multiplying the embeddings by math.sqrt(self.ninp)) gives better accuracy in the tutorial.
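To make the comparison easy, here is a minimal sketch of where that factor sits: embedding lookup, optional multiplication by sqrt(d_model), then adding the sinusoidal positional encoding from the paper. It uses plain Python lists instead of PyTorch so it runs anywhere. The function names, the `scale_embeddings` flag, and the toy sizes are hypothetical, chosen for illustration only. Whether dropping the factor actually helps accuracy is something you'd have to measure on the tutorial itself.

```python
import math
import random


def positional_encoding(pos: int, d_model: int) -> list[float]:
    """Sinusoidal encoding from the paper: sin on even dims, cos on odd dims."""
    pe = []
    for i in range(d_model):
        # Dims 2k and 2k+1 share the frequency 1 / 10000^(2k / d_model).
        angle = pos / (10000 ** ((i - i % 2) / d_model))
        pe.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return pe


def embed_tokens(tokens, table, d_model, scale_embeddings=True):
    """Embedding lookup plus positional encoding (hypothetical helper).

    scale_embeddings=True multiplies each embedding by sqrt(d_model) before the
    positional encoding is added, the role math.sqrt(self.ninp) plays in the
    tutorial; False skips that factor entirely.
    """
    factor = math.sqrt(d_model) if scale_embeddings else 1.0
    out = []
    for pos, tok in enumerate(tokens):
        pe = positional_encoding(pos, d_model)
        out.append([factor * e + p for e, p in zip(table[tok], pe)])
    return out


# Toy setup: a small random embedding table (sizes are arbitrary).
random.seed(0)
d_model = 8
vocab_size = 5
table = [[random.gauss(0.0, 0.1) for _ in range(d_model)] for _ in range(vocab_size)]
tokens = [1, 3, 2]

scaled = embed_tokens(tokens, table, d_model, scale_embeddings=True)
unscaled = embed_tokens(tokens, table, d_model, scale_embeddings=False)
```

With small initial embedding weights, the unscaled version lets the positional encoding (values in [-1, 1]) dominate the sum, while scaling by sqrt(d_model) makes the embedding part larger relative to it. Training the tutorial model both ways is the only real test of which works better.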