VQ-Diffusion for face generation

Hello everyone,

This is my first time using this forum, so sorry if I make any mistakes.

I’m working on VQ-Diffusion and would appreciate some suggestions. The task is to use a diffusion model to generate new faces. The input images have size 3x64x64, so I want to encode them and run the diffusion in the latent space. My questions are: what spatial size and how many feature maps (channels) should the latent have when I feed it to the U-Net used for the reverse (denoising) pass of the diffusion process? And is it useful to work in the latent space at all?
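To make the size question concrete, here is a rough sketch of the shape arithmetic, assuming the encoder downsamples height and width by a fixed integer factor. The helper name `latent_grid` and the factors tried are hypothetical and do not come from any library or from my actual setup.

```python
def latent_grid(height, width, downsample_factor):
    """Return (rows, cols, num_positions) of the latent grid.

    Assumption: the encoder shrinks each spatial dimension by
    `downsample_factor`, so the image size must be divisible by it.
    """
    if height % downsample_factor or width % downsample_factor:
        raise ValueError("image size must be divisible by the downsampling factor")
    rows = height // downsample_factor
    cols = width // downsample_factor
    return rows, cols, rows * cols


if __name__ == "__main__":
    # Candidate factors for a 64x64 input; which one is best is the open question.
    for factor in (2, 4, 8):
        rows, cols, positions = latent_grid(64, 64, factor)
        print(f"factor {factor}: {rows}x{cols} latent grid, {positions} positions per image")
```

So for 64x64 inputs the candidates would be a 32x32, 16x16, or 8x8 latent grid. The number of channels at each position is a separate choice that this sketch does not cover.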

Thanks in advance.