Sorry for the noob question.
I am doing multi-label classification on audio data. My inputs are features extracted from audio frames. There are 34 classes (phonemes), and each class has several labels (descriptors of that particular phoneme). There are 10 descriptors in total. The classes and labels look like this:
voiced: b, d, g, j, l, v, z, Z
unvoiced: f, k, p, s, t
labial: p, b, m, f, v
fricative: f, s, S, x
(...10)
My understanding is that this is a multi-label classification problem and that I need to map each class/phoneme to its labels/descriptors, so that I get a matrix of shape (classes, labels) in which each row is a class and contains 1 where a label is present and 0 where it is not.
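A minimal sketch of building that multi-hot matrix in plain PyTorch, assuming the descriptor table is available as a dict from descriptor name to phonemes. Only the four descriptors and the phonemes listed above are used, so this toy matrix is (16, 4) rather than the real (34, 10); the dict name and ordering are illustrative assumptions.

```python
import torch

# Hypothetical, partial table: only the descriptors/phonemes shown in the post.
# The real task has 34 phonemes and 10 descriptors.
descriptor_to_phonemes = {
    "voiced": ["b", "d", "g", "j", "l", "v", "z", "Z"],
    "unvoiced": ["f", "k", "p", "s", "t"],
    "labial": ["p", "b", "m", "f", "v"],
    "fricative": ["f", "s", "S", "x"],
}

descriptors = list(descriptor_to_phonemes)
# Every phoneme that appears under any descriptor, in a fixed sorted order.
phonemes = sorted({p for ps in descriptor_to_phonemes.values() for p in ps})

phoneme_idx = {p: i for i, p in enumerate(phonemes)}
descriptor_idx = {d: j for j, d in enumerate(descriptors)}

# Collect (row, col) pairs for every phoneme/descriptor membership.
rows = torch.tensor([phoneme_idx[p] for d, ps in descriptor_to_phonemes.items() for p in ps])
cols = torch.tensor([descriptor_idx[d] for d, ps in descriptor_to_phonemes.items() for p in ps])

# mapping[i, j] == 1.0 if phoneme i has descriptor j, else 0.0.
# Advanced indexing sets all pairs at once instead of a nested Python loop.
mapping = torch.zeros(len(phonemes), len(descriptors))
mapping[rows, cols] = 1.0
```

Row `mapping[phoneme_idx["f"]]` is then `[0., 1., 1., 1.]` (unvoiced, labial, fricative). The matrix only needs to be built once and can be saved alongside the model.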
Internet posts suggest this could be done with
but I would like to know whether there is a more idiomatic PyTorch way to create the mapping matrix, and whether this is even the best approach for this kind of task.
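One common way to use such a matrix, sketched here under assumptions: treat it as a lookup table that turns per-frame phoneme ids into multi-hot descriptor targets, and train with `nn.BCEWithLogitsLoss`, which applies an independent sigmoid to each of the 10 descriptor outputs. The feature size (40), hidden size, and the random placeholder `mapping` are hypothetical; substitute the real (34, 10) matrix.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

num_phonemes, num_descriptors = 34, 10
num_features = 40  # assumption: number of features per audio frame

# Placeholder for the real (34, 10) multi-hot matrix; random only for illustration.
mapping = (torch.rand(num_phonemes, num_descriptors) > 0.5).float()

model = nn.Sequential(
    nn.Linear(num_features, 128),
    nn.ReLU(),
    nn.Linear(128, num_descriptors),  # one logit per descriptor
)
criterion = nn.BCEWithLogitsLoss()  # independent binary decision per descriptor

batch_size = 8
frames = torch.randn(batch_size, num_features)               # (batch, features)
phoneme_ids = torch.randint(0, num_phonemes, (batch_size,))  # one phoneme id per frame
targets = mapping[phoneme_ids]                               # (batch, 10) lookup, float 0/1

logits = model(frames)                                       # (batch, 10)
loss = criterion(logits, targets)
loss.backward()

# At inference, threshold the per-descriptor probabilities.
predicted = (torch.sigmoid(logits) > 0.5).float()            # (batch, 10)
```

An alternative design is to train a 34-way phoneme classifier with `nn.CrossEntropyLoss` and map its predicted class through the same matrix; the multi-label setup above is the better fit if the descriptors themselves are what you want the model to learn.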
Also, what would the input shape look like?
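A sketch of the two usual input layouts, assuming 40 features per frame (a hypothetical number) and a stand-in zero `mapping`. If each example is a single frame, the input is `(batch, num_features)`; if each example is an utterance with one phoneme label per frame, it is `(batch, T, num_features)`. `nn.Linear` acts on the last dimension and `BCEWithLogitsLoss` accepts any shape as long as targets match, so indexing the matrix with the id tensor yields targets of the right shape in both cases.

```python
import torch
import torch.nn as nn

num_phonemes, num_descriptors = 34, 10
num_features = 40  # assumption: features per frame

mapping = torch.zeros(num_phonemes, num_descriptors)  # stand-in for the real matrix
head = nn.Linear(num_features, num_descriptors)
criterion = nn.BCEWithLogitsLoss()

# Case 1: one frame per example -> input (batch, num_features)
x_frames = torch.randn(16, num_features)
ids_frames = torch.randint(0, num_phonemes, (16,))
out_frames = head(x_frames)                          # (16, 10)
loss_frames = criterion(out_frames, mapping[ids_frames])

# Case 2: T frames per example, one phoneme id per frame
# -> input (batch, T, num_features); Linear is applied per frame.
x_seq = torch.randn(4, 100, num_features)
ids_seq = torch.randint(0, num_phonemes, (4, 100))
out_seq = head(x_seq)                                # (4, 100, 10)
loss_seq = criterion(out_seq, mapping[ids_seq])      # targets (4, 100, 10)
```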