Negative training loss

promach (buttercutter) January 17, 2024, 1:23am

I tried to implement "Mamba: Linear-Time Sequence Modeling with Selective State Spaces", but I am getting negative training loss values. Any ideas on how to get around this?