Is the gradient output by .reinforce() normalized by batch size?
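No, it is not. The gradient is accumulated over the samples in the batch without being divided by the batch size, so its magnitude grows with the batch size. If you want a per-sample average, rescale the rewards (or the resulting gradients) by 1/batch_size yourself. See also: "What is action.reinforce(r) doing actually?"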
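
To make the practical consequence concrete, here is a toy sketch in plain Python. It is not the library's implementation: it assumes a one-parameter Bernoulli policy and assumes the batch gradient is a plain sum of per-sample REINFORCE terms, which is what "not normalized" means here. The names grad_log_prob, reinforce_grad, and normalize are hypothetical and exist only for this illustration.

```python
import math


def grad_log_prob(theta, action):
    # Hypothetical Bernoulli policy with p = sigmoid(theta).
    # For this policy, d/dtheta log pi(action) = action - p.
    p = 1.0 / (1.0 + math.exp(-theta))
    return action - p


def reinforce_grad(theta, actions, rewards, normalize=False):
    # Loss-style REINFORCE gradient: sum over the batch of
    # -reward * d/dtheta log pi(action).
    # With normalize=True the sum is divided by the batch size.
    total = sum(-r * grad_log_prob(theta, a) for a, r in zip(actions, rewards))
    return total / len(actions) if normalize else total


theta = 0.0
actions = [1, 0, 1, 1]
rewards = [1.0, 0.5, 2.0, 1.0]

# Unnormalized: repeating the same batch twice doubles the gradient.
g_sum = reinforce_grad(theta, actions, rewards)
g_sum_doubled = reinforce_grad(theta, actions * 2, rewards * 2)

# Normalized: the gradient is unchanged when the batch is repeated.
g_mean = reinforce_grad(theta, actions, rewards, normalize=True)
g_mean_doubled = reinforce_grad(theta, actions * 2, rewards * 2, normalize=True)

print(g_sum, g_sum_doubled)    # the summed gradient scales with batch size
print(g_mean, g_mean_doubled)  # the averaged gradient does not
```

In practice this means that, without normalization, changing the batch size also changes the effective step size, so the learning rate may need to be retuned accordingly.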