1x1 convolution in PyTorch

Suppose you have a CNN that processes images, and your input images can have 1 (greyscale), 3 (color), or 4 (color + transparency) channels.

Would a 1x1 convolution (maybe with 3 filters) as the first layer be able to handle all of those cases and always produce the same number of output channels (3 in this case), so that we do not have to deal with image transformations elsewhere?

Could you explain the idea a bit more? Why should the conv layer create the same outputs and why would you want that?

Yes. I was reading this paper, https://arxiv.org/pdf/2105.05787.pdf, and I thought that 1x1 convolutions could help deal with images that have different numbers of channels. But I must be wrong, because in my model the depth of the convolution always has to match the number of channels of the image.
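
For what it's worth, here is a minimal sketch of why a single 1x1 convolution cannot absorb a varying channel count: `nn.Conv2d` fixes `in_channels` when the layer is constructed, so its weight has shape `(out_channels, in_channels, 1, 1)` regardless of the kernel size. One possible workaround, shown below, is to keep a separate 1x1 convolution per supported channel count and pick one at run time. The `ChannelAdapter` class, its parameter names, and the tensor sizes are hypothetical and only for illustration; this is not the method from the linked paper.

```python
import torch
import torch.nn as nn

# A 1x1 convolution still has a fixed number of input channels.
conv = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=1)
print(conv.weight.shape)  # torch.Size([3, 3, 1, 1])

rgb = torch.randn(1, 3, 8, 8)   # 3-channel image batch
grey = torch.randn(1, 1, 8, 8)  # 1-channel image batch

out = conv(rgb)  # works: input has the 3 channels the layer expects

# Feeding a 1-channel image into the same layer fails with a RuntimeError,
# because the weight depth (3) does not match the input depth (1).
try:
    conv(grey)
    grey_ok = True
except RuntimeError:
    grey_ok = False


class ChannelAdapter(nn.Module):
    """Hypothetical helper: one 1x1 conv per supported input channel count."""

    def __init__(self, supported_channels=(1, 3, 4), out_channels=3):
        super().__init__()
        # nn.ModuleDict requires string keys, so channel counts are stringified.
        self.adapters = nn.ModuleDict(
            {str(c): nn.Conv2d(c, out_channels, kernel_size=1)
             for c in supported_channels}
        )

    def forward(self, x):
        key = str(x.shape[1])  # channel dimension of an (N, C, H, W) batch
        if key not in self.adapters:
            raise ValueError(f"unsupported channel count: {key}")
        return self.adapters[key](x)
```

With this sketch, `ChannelAdapter()(grey)` and `ChannelAdapter()(rgb)` both return tensors with 3 channels, but note that each channel count gets its own independently trained weights, and every batch must contain images with the same number of channels. Converting all images to a common format in the data loader is usually the simpler option.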