Hello everyone,
I’m currently working on a deep neural network that tries to locate musical onsets in audio samples. It returns a 1D tensor containing, for each timestamp, the probability (between 0 and 1) of an onset. It’s working pretty well, but sometimes it returns a double onset where there should only be one.
(The double peak on the left should only be a single peak)
While more training remedies the problem somewhat, it seems to be a natural consequence of the model I’m using. To fix this, I believe I need to convolve the outputs with a Hamming or Hann window to smooth these double peaks into single ones, like what was done in this paper: Dance Dance Convolution (arxiv.org).
How would I do this? I have read the PyTorch docs on the hamming window function (torch.hamming_window — PyTorch 1.13 documentation), but for some reason the explanation there isn’t very enlightening for me. Any help is much appreciated.
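For concreteness, here is a minimal sketch of what I understand the smoothing step to be: convolve the probability tensor with a normalized Hamming window, then keep only local maxima above a threshold. The function names `smooth_onsets` and `pick_peaks`, the window length of 5, and the threshold of 0.3 are placeholders I made up for illustration, not values taken from the paper, and I am not sure this matches the paper's exact procedure.

```python
import torch
import torch.nn.functional as F


def smooth_onsets(probs: torch.Tensor, window_length: int = 5) -> torch.Tensor:
    """Convolve a 1D tensor of onset probabilities with a normalized Hamming window.

    Hypothetical helper; window_length is an assumed value to tune per dataset.
    """
    if window_length % 2 == 0:
        raise ValueError("window_length must be odd so the output keeps its length")
    # periodic=False returns a symmetric window, which is the usual choice for filtering.
    window = torch.hamming_window(
        window_length, periodic=False, dtype=probs.dtype, device=probs.device
    )
    # Normalize so the smoothed values stay within [0, 1].
    window = window / window.sum()
    # conv1d expects input of shape (batch, channels, length)
    # and weights of shape (out_channels, in_channels, kernel_size).
    smoothed = F.conv1d(
        probs.view(1, 1, -1), window.view(1, 1, -1), padding=window_length // 2
    )
    return smoothed.view(-1)


def pick_peaks(values: torch.Tensor, threshold: float = 0.3) -> torch.Tensor:
    """Return indices of interior local maxima that are at least `threshold`.

    Hypothetical helper; the threshold is an assumed value.
    """
    center = values[1:-1]
    is_peak = (center > values[:-2]) & (center >= values[2:]) & (center >= threshold)
    # +1 because the comparison skipped the first element.
    return torch.nonzero(is_peak).view(-1) + 1


# Toy example: a double peak at indices 2 and 4.
probs = torch.tensor([0.0, 0.1, 0.8, 0.3, 0.7, 0.1, 0.0])
smoothed = smooth_onsets(probs, window_length=5)
print(pick_peaks(probs))     # peaks in the raw output: indices 2 and 4
print(pick_peaks(smoothed))  # peaks after smoothing: index 3 only
```

In this toy case the two neighboring peaks merge into one after smoothing. A wider window merges peaks that are further apart, but it can also merge onsets that really are distinct, so the window length presumably has to be chosen relative to the frame rate of the model's output.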
Thank you for your time,
-BanBot2