How does distributions.entropy() work?


ret_attn_softmax is a [256, 2] tensor representing 256 Bernoulli distributions, one per row.

If I pass it to the code above, I get an output, but it doesn’t look like the correct entropy at all. Shouldn’t entropy() take a dimension argument so it knows which axis indexes the distributions?
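
To make the expected result concrete, here is a minimal hand-computed sketch of what I would expect. It assumes each row of ret_attn_softmax holds the two outcome probabilities of one distribution (summing to 1) and that entropy is measured in nats. The helper name `row_entropies` and the sample values are hypothetical stand-ins, not part of my actual code.

```python
import math

def row_entropies(probs):
    """Entropy in nats of each row, treating every row as one distribution."""
    # Terms with p == 0 are skipped, since p * log(p) tends to 0 as p -> 0.
    return [-sum(p * math.log(p) for p in row if p > 0.0) for row in probs]

# Hypothetical stand-in for ret_attn_softmax: 3 rows instead of 256.
sample = [[0.5, 0.5], [0.9, 0.1], [1.0, 0.0]]

entropies = row_entropies(sample)  # one value per row
print(entropies)
```

So for a [256, 2] input I would expect 256 values, each between 0 and ln 2. If the library in question is torch.distributions (an assumption, since it isn't named above), then as far as I understand `Categorical(probs=ret_attn_softmax).entropy()` treats the last dimension as the outcomes and returns shape [256], while `Bernoulli(probs=ret_attn_softmax).entropy()` treats every element as its own distribution and returns shape [256, 2].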