RNN on a Spectrogram - Loss Isn't Going Down

I’m having issues training an RNN on some spectograms on 1 second audio clips. The spectrograms are of shape 271 * 781 * 4, with the 781 being the width (So the time stamp). I turned this to 271 * 780 * 4 so I can divide each data into groups of 4 to feed into the RNN. Currently, I’m having issues getting loss to go down. I’ve debugged an issue where my data wasn’t being broken into groups of 4 properly (it wasn’t being divided by time step), I’ve fixed that so I’m certain that is no longer the issue.

I’m thinking that there are probably issues with my RNN architecture code and my forward, but I"m not too sure how to figure it out. I’ve googled around and haven’t gotten too far (my RNN knowledge in the practical sense isn’t great, but I understand the idea of feeding the hidden output from the previous layer into the next layer in conjunction with the next ‘section’ of the data).

Here is my Model Architecture

I made this account so I can only upload 1 image, will try to upload the other code in the comments


Training Code

Hi @jhan6501, I’m not sure how your data looks like. Maybe you can try to add a sigmoid layer after fc. This is a common setup in binary classification task.

nn.CrossEntropyLoss expects raw logits, so the sigmoid usage of an output of [batch_size, 8] would be wrong.

@jhan6501 I would generally recommend to try to overfit a small dataset first (e.g. just 10 samples) and make sure your model is able to do so. Once this works, you could try to scale up the use case again.

Also, remove the Variable wrappers, since they are deprecated since PyTorch 0.4.

PS: you can post code snippets by wrapping them into three backticks ``` :wink:

1 Like