Training Data for Denoising Diffusion Probabilistic Model

Dear Community,

I am currently training a denoising diffusion probabilistic model (DDPM), but the samples it generates differ from the training data. Each training image is divided into four quadrants, and each quadrant depicts an independent image with a white frame around it. In other words, I arranged four images in a 2x2 grid.
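
For concreteness, this is roughly how each composite is assembled (a minimal PIL sketch; the tile size, frame width, and file names are placeholders, not my actual preprocessing):

```python
from PIL import Image

TILE = 128   # assumed quadrant size in pixels
FRAME = 4    # assumed white frame width in pixels

def make_grid(paths):
    """Arrange four independent images in a 2x2 grid, each with a white frame."""
    canvas = Image.new("RGB", (2 * TILE, 2 * TILE), "white")
    for i, path in enumerate(paths):
        tile = Image.open(path).resize((TILE - 2 * FRAME, TILE - 2 * FRAME))
        x, y = (i % 2) * TILE + FRAME, (i // 2) * TILE + FRAME
        canvas.paste(tile, (x, y))
    return canvas

# make_grid(["a.png", "b.png", "c.png", "d.png"]).save("composite.png")
```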

The neural network has 250 million trainable parameters, and I have 40,000 images for training (I can obtain more if needed, but that is a time-consuming task).

My questions:

  • How many images do I need to train a DDPM with 250 million trainable parameters? (I read somewhere of a 10:1, or at least 1:1, ratio of training samples to trainable parameters, which taken literally would mean 2.5 billion, or at least 250 million, images here.)
  • How should I picture the influence of training data on the accuracy of the model? Does each training sample increase accuracy by roughly the same amount (say, each sample adds 0.1%, just to name a random number), or is the relationship between training data and accuracy nonlinear (e.g. an S-curve if I plot accuracy over the number of training samples)? I sketch these two hypotheses below.
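
To make the second question concrete, here is a small plot of the two hypothetical relationships (all numbers and curve shapes are made up purely for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical dataset sizes, from 1k to 1M images.
n = np.logspace(3, 6, 200)

# Hypothesis 1: every sample helps equally (accuracy linear in n).
linear = n / n.max()

# Hypothesis 2: an S-curve, modeled here as a logistic in log10(n).
s_curve = 1.0 / (1.0 + np.exp(-3.0 * (np.log10(n) - 4.5)))

plt.plot(n, linear, label="linear: constant gain per sample")
plt.plot(n, s_curve, label="S-curve: logistic in log n")
plt.xscale("log")
plt.xlabel("number of training images")
plt.ylabel("accuracy (arbitrary units)")
plt.legend()
plt.show()
```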

Training DDPMs from scratch typically takes A LOT of compute to get good results, and tons of data, on the order of the LAION-5B dataset (5.85 billion image-text pairs).
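
For context, a single DDPM training step is just noise-prediction regression; here is a minimal sketch with Hugging Face diffusers (the toy U-Net and random batch stand in for a real 250M-parameter setup):

```python
import torch
import torch.nn.functional as F
from diffusers import UNet2DModel, DDPMScheduler

model = UNet2DModel(sample_size=64, in_channels=3, out_channels=3)  # toy U-Net
scheduler = DDPMScheduler(num_train_timesteps=1000)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

clean = torch.randn(4, 3, 64, 64)  # stand-in for a batch of training images
noise = torch.randn_like(clean)
t = torch.randint(0, scheduler.config.num_train_timesteps, (clean.shape[0],))

noisy = scheduler.add_noise(clean, noise, t)  # forward diffusion to timestep t
pred = model(noisy, t).sample                 # U-Net predicts the added noise
loss = F.mse_loss(pred, noise)                # epsilon-prediction objective
loss.backward()
optimizer.step()
optimizer.zero_grad()
```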

For your use case, you might be better off starting with a pre-trained open-source model downloaded from Hugging Face and then just training LoRA weights on top of it.
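
A rough sketch of that workflow with diffusers plus peft (the model id, LoRA rank, and target modules are illustrative choices; recent diffusers versions expose add_adapter through their PEFT integration):

```python
from diffusers import StableDiffusionPipeline
from peft import LoraConfig

pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-base")

# Freeze the base model; only the injected LoRA matrices will be trained.
pipe.unet.requires_grad_(False)

lora_config = LoraConfig(
    r=8,                    # LoRA rank (illustrative)
    lora_alpha=8,
    init_lora_weights="gaussian",
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],  # attention projections
)
pipe.unet.add_adapter(lora_config)

trainable = [p for p in pipe.unet.parameters() if p.requires_grad]
print(sum(p.numel() for p in trainable), "trainable LoRA parameters")
```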

Just by way of example, Stable Diffusion 2.0 took 79,000 A100-hours to train, and that is an 860-million-parameter U-Net plus a 123-million-parameter text encoder.

But training LoRA weights on top of an existing model is very quick and can be done on a 24 GB GPU in less than a day.
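
And once the LoRA weights are trained, loading them for inference is equally lightweight (the checkpoint path and prompt below are placeholders):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-base", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("path/to/lora_output")  # placeholder checkpoint directory

image = pipe("four framed images arranged in a 2x2 grid").images[0]
image.save("sample.png")
```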