Hi Everybody,
I’m training a net to detect which pitch classes are playing in music. I trained it on the large Maestro dataset, which has classical music along with its scores. Looking at the predictions the net makes, I’m pretty disappointed! Instead of outputting values near 1 for pitch classes that are playing and near 0 for those that aren’t, it outputs roughly the same value (about -0.76) for every pitch class, as if the net is saying “maybe?” to every classification. Below is a sample of the predictions, which should be detecting the presence of 3 pitch classes (and the absence of the other 9). Could this be a symptom of the net getting stuck in a local minimum? How can I improve things?
The optimizer is SGD and the loss function is BCEWithLogitsLoss; the model’s code is below, after the prediction output.
```
[-0.7595, -0.7598, -0.7598, -0.7599, -0.7597, -0.7598, -0.7602, -0.7596,
 -0.7596, -0.7594, -0.7601, -0.7602],
[-0.7596, -0.7599, -0.7599, -0.7600, -0.7598, -0.7599, -0.7603, -0.7597,
 -0.7598, -0.7596, -0.7602, -0.7603],
[-0.7594, -0.7596, -0.7596, -0.7599, -0.7595, -0.7598, -0.7601, -0.7594,
 -0.7595, -0.7593, -0.7600, -0.7601],
[-0.7596, -0.7598, -0.7598, -0.7600, -0.7597, -0.7599, -0.7603, -0.7596,
 -0.7597, -0.7595, -0.7602, -0.7603],
[-0.7597, -0.7600, -0.7599, -0.7600, -0.7599, -0.7599, -0.7603, -0.7598,
 -0.7598, -0.7596, -0.7602, -0.7603],
```
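For what it’s worth, since BCEWithLogitsLoss works on raw logits, I believe the numbers above are logits rather than probabilities, so when I inspect predictions I pass them through a sigmoid (the values below are just a few copied from the dump above):

```python
import torch

# The model outputs raw logits (BCEWithLogitsLoss applies the sigmoid
# internally during training), so apply a sigmoid to read probabilities.
logits = torch.tensor([-0.7595, -0.7598, -0.7602])
probs = torch.sigmoid(logits)
print(probs)  # each about 0.32, i.e. the same low "probability" for every class
```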
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AudioNet1(nn.Module):
    def __init__(self, input_size=256,
                 h1_nodes=256,
                 h2_nodes=128,
                 h3_nodes=64,
                 output_size=12,
                 device='cpu'):
        super(AudioNet1, self).__init__()
        # Fully connected stack: 256 -> 256 -> 128 -> 64 -> 12,
        # each layer followed by batch norm and dropout
        self.inputLayer = nn.Linear(input_size, h1_nodes).to(device)
        self.bn1 = nn.BatchNorm1d(h1_nodes).to(device)
        self.do1 = nn.Dropout(p=0.2).to(device)
        self.hiddenOne = nn.Linear(h1_nodes, h2_nodes).to(device)
        self.bn2 = nn.BatchNorm1d(h2_nodes).to(device)
        self.do2 = nn.Dropout(p=0.2).to(device)
        self.hiddenTwo = nn.Linear(h2_nodes, h3_nodes).to(device)
        self.bn3 = nn.BatchNorm1d(h3_nodes).to(device)
        self.do3 = nn.Dropout(p=0.2).to(device)
        self.hiddenThree = nn.Linear(h3_nodes, output_size).to(device)
        self.bn4 = nn.BatchNorm1d(output_size).to(device)
        self.do4 = nn.Dropout(p=0.2).to(device)
        # LSTM over the 12-dim pitch-class vectors
        self.lstm = nn.LSTM(input_size=output_size, hidden_size=12, num_layers=2)

    def forward(self, x):
        x = F.leaky_relu(self.inputLayer(x).float())
        x = self.bn1(x)
        x = self.do1(x)
        x = F.leaky_relu(self.hiddenOne(x))
        x = self.bn2(x)
        x = self.do2(x)
        x = F.leaky_relu(self.hiddenTwo(x))
        x = self.bn3(x)
        x = self.do3(x)
        x = F.leaky_relu(self.hiddenThree(x))
        x = self.bn4(x)
        x = self.do4(x)
        # Reshape to (seq_len=1, batch=bs, features=n) for the LSTM
        # (nn.LSTM defaults to (seq, batch, feature) ordering),
        # then squeeze the sequence dimension back out
        bs, n = x.shape
        x, _ = self.lstm(x.view(1, bs, n))
        x = torch.squeeze(x, 0)
        return x
```
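In case it helps, here’s roughly how I’m wiring up the loss and optimizer (simplified: the feature/target tensors are placeholders, and I’ve swapped in a single linear layer as a stand-in for AudioNet1 so the snippet runs on its own):

```python
import torch
import torch.nn as nn

model = nn.Linear(256, 12)  # stand-in for AudioNet1 in this sketch
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.BCEWithLogitsLoss()  # sigmoid + binary cross-entropy in one op

# Placeholder batch: 8 frames of 256 features, multi-hot pitch-class targets
features = torch.randn(8, 256)
targets = torch.zeros(8, 12)
targets[:, [0, 4, 7]] = 1.0  # e.g. pitch classes C, E, G are playing

optimizer.zero_grad()
logits = model(features)           # raw logits, shape (8, 12)
loss = criterion(logits, targets)  # targets are 0/1, logits unbounded
loss.backward()
optimizer.step()
print(loss.item())
```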