First, I'm not sure about your conclusion that “Pytorch architecture doesn’t use the whole train/test idea mentioned by Kingma”: the PyTorch version runs for 10 epochs with a batch size of 128, and it uses the complete dataset for all 10 epochs.
Second, the reparameterization trick should be used only during training, not during testing. The whole goal of an AE or VAE is to learn a latent space that is a good representation of the data distribution (clustering samples of the same class, or keeping them close together). Once training is finished, you can sample any point from the latent space, pass it through the decoder, and get good samples.
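To make the train/test difference concrete, here is a minimal sketch in plain Python (not PyTorch, so it runs without any dependencies). The function names `reparameterize` and `sample_prior` and the example values are my own, just for illustration. I'm also assuming that at test time the encoder mean is used directly as the latent code, and that new samples come from a standard normal prior; your model may do this differently.

```python
import math
import random


def reparameterize(mu, logvar, training, rng):
    """Return a latent vector z for one input (hypothetical helper)."""
    if not training:
        # Assumption: at test time we skip the noise and use the mean directly.
        return list(mu)
    z = []
    for m, lv in zip(mu, logvar):
        std = math.exp(0.5 * lv)   # logvar = log(sigma^2), so sigma = exp(logvar / 2)
        eps = rng.gauss(0.0, 1.0)  # noise drawn independently of the encoder outputs
        z.append(m + eps * std)    # z = mu + sigma * eps
    return z


def sample_prior(latent_dim, rng):
    """Draw a latent vector from N(0, I) to generate a new sample after training."""
    return [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]


rng = random.Random(0)
mu = [0.5, -1.0]       # illustrative encoder outputs, not from a real model
logvar = [0.0, -2.0]

z_train = reparameterize(mu, logvar, training=True, rng=rng)   # noisy code
z_test = reparameterize(mu, logvar, training=False, rng=rng)   # equals mu
z_new = sample_prior(len(mu), rng)                             # feed this to the decoder
```

During training the noise goes through `eps`, so gradients can still flow to `mu` and `logvar`. After training you do not need the encoder at all to generate: you draw `z_new` and pass it to the decoder.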