I am fine-tuning a model for sound event detection, taken from https://github.com/qiuqiangkong/audioset_tagging_cnn, on the URBAN-SED dataset. For this task, the model should predict a [batch, n_classes, time_steps] tensor, where a value of one indicates the presence of an event at a given time step.
However, my network does not seem to train: after roughly the first 10 epochs, the loss stops decreasing, and if I inspect the model's predictions, the output consists entirely of 0.5s.
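(In case it helps, a minimal illustration of why I suspect the final layer: a logit of zero maps to exactly 0.5 under sigmoid, so a uniform-0.5 prediction matrix suggests the last layer is emitting all zeros. The shapes below are illustrative, not my exact dimensions.)

```python
import torch

# A zero logit gives exactly 0.5 after sigmoid, so an all-0.5 prediction
# matrix means the final layer is producing zero for every class/time step.
logits = torch.zeros(1, 10, 100)   # [batch, n_classes, time_steps]
probs = torch.sigmoid(logits)
print(probs.unique())              # tensor([0.5000])
```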
I have tried:
- Changing the amount of L2 regularization, even turning it off completely
- Changing the learning rate
- Doing a mock training run with only 2 samples to see if the network could learn this trivial problem. The result was the same matrix of 0.5s.
- Different optimizers (Adam and SGD so far)
- BCEWithLogitsLoss with both reduction='mean' and reduction='sum' (I use this loss because, in theory, multiple classes can be active at the same time)
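For reference, the two-sample overfit check was along these lines. This is a minimal, self-contained sketch: `TinySED` is a stand-in for the actual PANNs-style model, and the shapes are illustrative, not the real URBAN-SED dimensions. A healthy setup should drive the loss well below its starting value on two fixed samples.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Stand-in for the real network (illustrative only; the actual model comes
# from the audioset_tagging_cnn repo). Maps [batch, time, mels] to
# [batch, n_classes, time_steps].
class TinySED(nn.Module):
    def __init__(self, n_mels=64, n_classes=10):
        super().__init__()
        self.fc = nn.Linear(n_mels, n_classes)

    def forward(self, x):                  # x: [B, T, n_mels]
        return self.fc(x).transpose(1, 2)  # -> [B, n_classes, T]

torch.manual_seed(0)
model = TinySED()
criterion = nn.BCEWithLogitsLoss(reduction='sum')
optimizer = optim.SGD(model.parameters(), lr=0.001)

# Two fixed samples; the network should be able to (over)fit them.
inputs = torch.randn(2, 100, 64)                 # [batch, time, mels]
labels = (torch.rand(2, 10, 100) > 0.5).float()  # [batch, classes, time]

initial_loss = criterion(model(inputs), labels).item()
for _ in range(200):
    optimizer.zero_grad()
    loss = criterion(model(inputs), labels)
    loss.backward()
    optimizer.step()

print(initial_loss, loss.item())  # final loss should be clearly lower
```

With my actual model, the loss plateaus instead of dropping like this.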
My loss and optimizer:
```python
criterion = nn.BCEWithLogitsLoss(reduction='sum')
optimizer = optim.SGD(model.parameters(), lr=0.001)
```
My training loop:
```python
for i, data in enumerate(dataloader_train):
    inputs, labels = data
    inputs = inputs.type(torch.FloatTensor)
    optimizer.zero_grad()
    outputs = model(inputs).cpu()
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()
    running_loss += loss.item()
```
I can’t figure out what’s wrong. Any thoughts?