Cross Entropy Loss vs. Mutual Information and Generalized Mutual Information

I am working on a project with binary inputs and outputs and want to apply a loss function. In similar works, cross entropy, mutual information (MI), and generalized mutual information (GMI) are used as cost functions. (MI and GMI are not loss functions per se; I believe some modifications are applied before they can be used for training.) I want to know the mathematical difference between these three, and how to use them as cost functions in PyTorch.
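For context, the cross-entropy case is the one I can already set up. A minimal sketch for binary outputs (the shapes and random tensors here are just placeholders for my model's logits and labels):

```python
import torch
import torch.nn as nn

# Placeholder data: raw logits from some model, and binary targets.
logits = torch.randn(8, 1)                      # unnormalized model outputs
targets = torch.randint(0, 2, (8, 1)).float()   # binary labels in {0, 1}

# Binary cross entropy: BCEWithLogitsLoss applies a sigmoid internally
# and computes BCE in a numerically stable way.
criterion = nn.BCEWithLogitsLoss()
loss = criterion(logits, targets)
print(loss.item())  # scalar loss value
```

My question is how the MI/GMI objectives relate to this mathematically, and what changes are needed to turn them into something I can minimize the same way.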