I’m trying to implement a music transcription system. Each of the labels has a shape of [88 x num_frames]. Each element represents the activation of the corresponding piano key on the corresponding frame, so the problem can be considered a binary classification problem. To form a batch, each label gets padded with zeros so that all of them have the same number of frames.
Now my question is: how can I ignore these padded values in the loss function? Also, as the data is heavily unbalanced, how can I use class weights when computing the loss?
I prefer to use binary cross entropy as the loss function.
Binary cross entropy in pytorch accepts a weight argument that is
broadcast elementwise against the loss. So, using this, you could weight
the loss contribution of each frame separately and, in particular, give
the padding frames a weight of zero.
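A minimal sketch of building such a zero/one weight tensor from per-sample frame counts (the batch size, `lengths` values, and `max_frames` here are made-up examples):

```python
import torch

# Assumed toy shapes: batch of 3 samples, labels padded to max_frames = 6.
lengths = torch.tensor([6, 4, 2])  # true number of frames per sample
max_frames = 6
num_keys = 88

# 1 for real frames, 0 for padded frames, then broadcast over the 88 keys.
frame_idx = torch.arange(max_frames)                    # [max_frames]
mask = (frame_idx[None, :] < lengths[:, None]).float()  # [batch, max_frames]
weight = mask[:, None, :].expand(-1, num_keys, -1)      # [batch, 88, max_frames]

print(weight.shape)  # torch.Size([3, 88, 6])
print(weight[1, 0])  # tensor([1., 1., 1., 1., 0., 0.])
```

Passing this tensor as the loss’s `weight` multiplies every padded element’s loss by zero, so padding contributes nothing to the gradient.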
You can also use the
weight argument to reweight your unbalanced
data, at the granularity that is the most logical for your use case.
I imagine that you wouldn’t want to reweight individual notes (but you
could). Maybe it would make sense to reweight the combination of
notes that appear together in a given frame. Anyway, for each batch,
you would go through the batch labels, sample by sample, and create
the per-sample, per-frame, per-note weights, where, again, I imagine
that the per-note weights are all equal within a given frame.
(Also, you will presumably prefer to use the “logits” version,
BCEWithLogitsLoss, for reasons of numerical stability.)
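Putting the pieces together, here is a sketch of one batch’s loss computation with the logits version. The shapes, the `lengths` tensor, and the `pos_weight` value of 5.0 are all placeholder assumptions; in practice you would set `pos_weight` (or a full per-note weight) from the statistics of your own data:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Assumed toy shapes: batch of 2 samples, 88 keys, padded to 5 frames.
batch, keys, max_frames = 2, 88, 5
logits = torch.randn(batch, keys, max_frames)  # raw (pre-sigmoid) model outputs
labels = torch.randint(0, 2, (batch, keys, max_frames)).float()
lengths = torch.tensor([5, 3])                 # true frame counts per sample

# Per-element weights: 1 for real frames, 0 for padding.
frame_idx = torch.arange(max_frames)
mask = (frame_idx[None, :] < lengths[:, None]).float()   # [batch, max_frames]
weight = mask[:, None, :].expand(batch, keys, max_frames)

# Class reweighting: pos_weight > 1 boosts the rare "key pressed" class.
# The value 5.0 is a placeholder, not something derived from real data.
pos_weight = torch.full((keys, 1), 5.0)

loss = F.binary_cross_entropy_with_logits(
    logits, labels,
    weight=weight,          # zeros out padded frames
    pos_weight=pos_weight,  # rebalances positives vs. negatives
    reduction='sum',
) / weight.sum()            # average over real (unpadded) elements only

print(loss)
```

Note the `reduction='sum'` followed by division by `weight.sum()`: averaging with the default `'mean'` would divide by the total element count including padding, which would make the loss scale depend on how much padding a batch happens to contain.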