Best loss function for blockmodeling task

Hi,
I’m looking for the best loss function to use for blockmodeling, where I need the trained output to be compared to an ideal blockmodel.

As a toy example, if my output is a 4 x 4 matrix (say, of decimals), e.g.

0.6868 0.4088 0.3600 0.5119
0.8085 0.8537 0.7839 0.8735
0.8971 0.9433 0.5701 0.6651
0.1267 0.7847 0.7456 0.5535
[torch.FloatTensor of size 4x4]

and I need to compare this to the following blockmodel:

1 1 0 0
1 1 0 0
0 0 1 1
0 0 1 1
[torch.FloatTensor of size 4x4]

What loss function would I use?

If there is no suitable loss function in PyTorch at present, what’s the name of a suitable loss function from other papers/sources/packages that I could explore and try to code into my own custom loss function in PyTorch?

Thank you for your help.

I am not familiar with this problem, but why would it be bad to just use L1 or L2 loss?
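For the toy matrices above, the built-in losses are a one-liner to try; a quick sketch using the values from the original post:

```python
import torch
import torch.nn as nn

# Example output from the original post
output = torch.tensor([[0.6868, 0.4088, 0.3600, 0.5119],
                       [0.8085, 0.8537, 0.7839, 0.8735],
                       [0.8971, 0.9433, 0.5701, 0.6651],
                       [0.1267, 0.7847, 0.7456, 0.5535]])

# Ideal blockmodel to compare against
target = torch.tensor([[1., 1., 0., 0.],
                       [1., 1., 0., 0.],
                       [0., 0., 1., 1.],
                       [0., 0., 1., 1.]])

l1 = nn.L1Loss()(output, target)   # mean absolute error (L1)
l2 = nn.MSELoss()(output, target)  # mean squared error (L2)
print(l1.item(), l2.item())
```

Both reduce to a single scalar averaged over all 16 entries, so they treat every cell independently and have no notion of block structure.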

I don’t know for sure that there is anything wrong with using L1 or L2 loss, but my feeling is that these functions will not capture the essential quality of a blockmodel, where one is looking for, say, ‘complete’ blocks (i.e. all 1’s) or ‘null’ blocks (all 0’s).
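One way to encode that intuition directly, assuming the block partition is known in advance (which may not hold in general), is to score each block against the nearer of the two ideal patterns, all ones or all zeros. This is only a sketch of the idea, not an established loss; the function name and the `(row_slice, col_slice)` partition format are made up for illustration:

```python
import torch

def block_loss(output, blocks):
    """For each block, penalize the squared distance to the nearer of the
    two ideal patterns: all ones (complete) or all zeros (null).
    `blocks` is a list of (row_slice, col_slice) pairs describing a
    known block partition -- a simplifying assumption."""
    total = output.new_zeros(())
    for rs, cs in blocks:
        b = output[rs, cs]
        to_ones = ((b - 1.0) ** 2).mean()   # distance to a complete block
        to_zeros = (b ** 2).mean()          # distance to a null block
        total = total + torch.min(to_ones, to_zeros)
    return total / len(blocks)

# Toy 4x4 example with the 2x2 block partition from the thread
output = torch.rand(4, 4)
blocks = [(slice(0, 2), slice(0, 2)), (slice(0, 2), slice(2, 4)),
          (slice(2, 4), slice(0, 2)), (slice(2, 4), slice(2, 4))]
loss = block_loss(output, blocks)
```

The `min` over the two ideals means a block is only charged for deviating from its best-matching pattern, so a perfect blockmodel scores exactly zero; everything here is differentiable, so it could be used for training.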

I will do some more experimenting and research.

Thank you.

If you have an intuition about which results are better than which, you could come up with a set of examples, order them by goodness, and then try to design a loss function that matches those intuitions. However, my experience is that this kind of methodology doesn’t always yield good results. For instance, in autoregressive models of audio, the best loss seems to be KL divergence, even though it doesn’t take into account how far off one is. I have experimented with losses that do take this into account, such as Wasserstein and Cramér distance, but found them to yield worse results than simple KL. So I guess the best advice is just to experiment, which seems to be what you are already planning to do.
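Since the target blockmodel here is binary, the element-wise analogue of the KL-style loss mentioned above would be binary cross-entropy. A minimal sketch, assuming the model outputs are already squashed into (0, 1):

```python
import torch
import torch.nn as nn

output = torch.rand(4, 4)            # assumed already in (0, 1), e.g. after a sigmoid
target = torch.tensor([[1., 1., 0., 0.],
                       [1., 1., 0., 0.],
                       [0., 0., 1., 1.],
                       [0., 0., 1., 1.]])

# Element-wise binary cross-entropy; for raw (unbounded) scores,
# nn.BCEWithLogitsLoss would be the numerically safer choice.
loss = nn.BCELoss()(output, target)
```

Like L1/L2 it is still per-cell, but it pushes each entry hard toward 0 or 1, which fits the complete-vs-null block structure better than a plain distance.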