Hello! For simplicity, let’s focus on the “1- without weight” version. Even though the target is sparse, the predictions shouldn’t all collapse to zero. Could you share a runnable snippet that reproduces the issue, just so we can double-check that nothing unusual is happening in the model? For example, a common mistake is to pass the model output through your own Sigmoid layer, which you don’t need to do because your loss function already applies it for you.
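To make the Sigmoid point concrete, here is a minimal, framework-free sketch of what happens when the Sigmoid is applied twice. It assumes your loss is a "with logits" style binary cross-entropy, i.e. one that applies the Sigmoid internally. The helper names `sigmoid` and `bce_with_logits` and the sample values are made up for illustration and are not taken from your code.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def bce_with_logits(logit: float, target: float) -> float:
    """Hypothetical stand-in for a loss that applies the Sigmoid internally."""
    p = sigmoid(logit)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


# Illustrative raw model outputs (logits), from very negative to very positive.
raw_outputs = [-8.0, -2.0, 0.0, 2.0, 8.0]

# Intended setup: the model emits raw logits and the loss applies the Sigmoid once.
single_probs = [sigmoid(z) for z in raw_outputs]

# Mistake: the model ends in its own Sigmoid, and the loss applies it again.
# The loss then only ever sees inputs in (0, 1), so the probability it works
# with is squeezed into roughly (0.5, 0.73), whatever the model outputs.
double_probs = [sigmoid(sigmoid(z)) for z in raw_outputs]

# Loss for a negative (target = 0) example with a confidently negative logit.
loss_single = bce_with_logits(-8.0, 0.0)           # Sigmoid applied once
loss_double = bce_with_logits(sigmoid(-8.0), 0.0)  # Sigmoid applied twice

print("single:", [round(p, 4) for p in single_probs])
print("double:", [round(p, 4) for p in double_probs])
print("loss, Sigmoid once :", loss_single)
print("loss, Sigmoid twice:", loss_double)
```

With the extra Sigmoid, the loss on a negative example can never drop below log(2), no matter how confident the model is, so training tends to stall. If your snippet shows a Sigmoid at the end of the model together with a loss of this kind, that would be the first thing to remove.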