Diffusions for pix2pix translation

I am wondering whether diffusion models are suitable for pix2pix or img2img translation without text prompts.

What I am looking for is transforming a 3-channel color image into something that looks like a 1-channel UAV infrared image, as captured by an infrared sensor. CycleGAN does exactly the job I need, but diffusion models seem to produce much better images. So I am planning to train a diffusion model; CycleDiffusion might be a good starting point, I guess. The problem is that all the diffusion models I have come across seem to involve text in some way.
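For what it's worth, diffusion models don't need text at all: the conditioning signal can be any tensor, and for paired image-to-image translation a common trick (used e.g. in Palette-style models) is to simply concatenate the source image channel-wise with the noisy target before feeding the denoiser. Below is a minimal numpy sketch of one training step's data flow under that assumption; the array names, image size, and noise schedule values are all illustrative, and the actual denoising network is left out.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)       # linear noise schedule (illustrative values)
alphas_bar = np.cumprod(1.0 - betas)     # cumulative product \bar{alpha}_t

rgb = rng.random((3, 64, 64))            # source: 3-channel color image
ir = rng.random((1, 64, 64))             # target: 1-channel infrared image

t = int(rng.integers(0, T))              # random training timestep
eps = rng.standard_normal(ir.shape)      # Gaussian noise the model must predict

# Forward diffusion applied to the TARGET only:
#   x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps
noisy_ir = np.sqrt(alphas_bar[t]) * ir + np.sqrt(1.0 - alphas_bar[t]) * eps

# Conditioning without text: stack the RGB source with the noisy IR target,
# so a (hypothetical) denoiser eps_theta(net_in, t) sees 4 input channels
# and regresses the 1-channel noise eps.
net_in = np.concatenate([rgb, noisy_ir], axis=0)
print(net_in.shape)
```

At sampling time you would start from pure noise in the IR channel, keep the RGB condition fixed, and iteratively denoise, so no prompt ever enters the pipeline. This does assume paired training data, unlike CycleGAN's unpaired setup.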