# A problem with multilabel classification and a custom loss function

I’m looking for some ideas/advice on how to proceed with the problem I have.

I’m having difficulties creating a custom loss function for the problem I’m trying to solve. My inputs are images like (A.).
They look like a total mess, but it is possible to extract information from them: the presence of certain classes. The expected outputs look like (B.), i.e. multi-hot vectors with ones at the indices of the classes present in the image.

In the proof of concept I used the BinaryCrossentropy loss function. Its results were fine, (C.)
(left: the results with BinaryCrossentropy, right: expected output)

But using binary cross-entropy for this problem has a big issue: it reports an accuracy of, let’s say, 99% when the real one can be just 60%. To calculate the correct accuracy I used a NumPy cosine similarity function.
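For concreteness, this is roughly what I mean by the NumPy cosine similarity check (a minimal sketch; the function name and the `eps` constant are illustrative, my actual code may differ):

```python
import numpy as np

def cosine_accuracy(y_true, y_pred, eps=1e-9):
    """Mean cosine similarity between predicted and target vectors.

    A value near 1.0 means the predicted spikes line up with the
    expected ones; orthogonal predictions score near 0.0.
    """
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    num = np.sum(y_true * y_pred, axis=-1)
    den = np.linalg.norm(y_true, axis=-1) * np.linalg.norm(y_pred, axis=-1) + eps
    # Average the per-sample similarities into one "accuracy" number.
    return float(np.mean(num / den))
```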

The problem has now evolved, and binary cross-entropy alone is not enough anymore, so I have to create a custom loss function. I tried using CosineSimilarity inside a custom loss function, but it is not sufficient and, for some reason, actually gives very bad results.

Could you suggest an approach that could help me achieve the outputs I need, that is, sharp, narrow, high spikes?

Could you explain this issue a bit more?
Do you think the accuracy calculation is wrong, or could there be some bugs in the output computation?

Explanation: the model’s loss drops to very small values and its accuracy climbs very high within just a few epochs. But these numbers are illusory; in reality, the results are far from perfect.

Because of that, I added another level of accuracy calculation. Instead of relying on the model’s own metric, I compute accuracy with the NumPy cosine similarity function. Measured this way, the accuracy after a few epochs is just 30-60%, not >95%. In the previous version of the problem I was, fortunately, able to achieve 98% accuracy by cosine similarity after ~25 epochs.

Additional comment on the matter: I’m a newbie in terms of ML. The newest problem I am trying to solve is far more complex than the previous one, and it pushes me to create a custom loss function (LF). I think it should behave similarly to the binary cross-entropy LF, as that has worked so far, but at the same time it should be somewhat similar to cosine similarity, to further punish the model for poor accuracy and to eliminate the need for the separate NumPy cosine similarity calculation. Unfortunately, combining them both in one LF (with sums, multiplications, subtractions) doesn’t give good results, as I am not able to reach even ~75% model accuracy.
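To show the kind of combination I mean, here is a NumPy sketch of a weighted sum of binary cross-entropy and a cosine penalty (the `alpha` mixing weight is purely illustrative, not a value from my experiments; my real Keras loss differs):

```python
import numpy as np

def combined_loss(y_true, y_pred, alpha=0.5, eps=1e-7):
    """Hypothetical mix: alpha * BCE + (1 - alpha) * (1 - cosine similarity)."""
    y_true = np.asarray(y_true, dtype=np.float64)
    # Clip predictions away from 0/1 so the logs stay finite.
    y_pred = np.clip(np.asarray(y_pred, dtype=np.float64), eps, 1 - eps)
    # Mean binary cross-entropy per sample.
    bce = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred), axis=-1)
    # Cosine penalty: 0 for a perfect directional match, up to 2 for opposite.
    num = np.sum(y_true * y_pred, axis=-1)
    den = np.linalg.norm(y_true, axis=-1) * np.linalg.norm(y_pred, axis=-1) + eps
    cos_penalty = 1.0 - num / den
    return alpha * bce + (1 - alpha) * cos_penalty
```

The intent is that the BCE term keeps the per-index probabilities calibrated while the cosine term punishes predictions whose overall shape diverges from the expected spikes, but, as described above, in practice this combination has not given me good results.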

I am not sure what direction to go in anymore. I think I am lacking some basic knowledge of how loss functions operate and how to steer a model in the direction I want; I don’t believe the problem is a computational one.
I’ll be grateful for any advice.