I am new to PyTorch and still learning. I have created a simple many-to-one LSTM network: I pass in a sequence of audio frames as input and want to use the last output state as the class-probability vector (basically binary audio classification).
I have manually extracted features from 2 wav files and stored them in vectors, but I am having a problem with optimizing and backpropagating: I always get the same loss. I know there are only 2 training examples, but shouldn't the loss still be decreasing?
Here is the relevant portion of the code:
mfcc1 = audio_to_mfcc('/home/saurabh/Documents/audio_classification/data/lizzie.wav')
#print(mfcc1.shape)
mfcc2 = audio_to_mfcc('/home/saurabh/Documents/audio_classification/data/boy.wav')
#print(mfcc2.shape)
temp = mfcc1[ : , np.newaxis , :]
temp2= mfcc2[ : , np.newaxis , :]
#print(temp2.shape)
input_var = Variable(torch.Tensor(temp))
input2_var = Variable(torch.Tensor(temp2))
for epoch in range(num_epochs):
    outputs = rnn(input_var)
    outputs2 = rnn(input2_var)
    #print(outputs[999])
    final_output = outputs[999]
    final_output2 = outputs2[998]
    final_output_numpy = final_output.data.numpy()[np.newaxis, :]
    final_output = torch.from_numpy(final_output_numpy)
    final_output_numpy2 = final_output2.data.numpy()[np.newaxis, :]
    final_output2 = torch.from_numpy(final_output_numpy2)
    #print(final_output_numpy.shape)
    #print(final_output.size())
    label = Variable(torch.LongTensor([0]))
    label2 = Variable(torch.LongTensor([1]))
    #print (label.size())
    loss = criterion(Variable(final_output, requires_grad=True), label)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(loss)
    loss2 = criterion(Variable(final_output2, requires_grad=True), label2)
    optimizer.zero_grad()
    loss2.backward()
    optimizer.step()
    print(loss2)
And here is the output:
torch.Size([1000, 1, 60])
torch.Size([999, 1, 60])
Variable containing:
0.6557
[torch.FloatTensor of size 1]
Variable containing:
0.7708
[torch.FloatTensor of size 1]
torch.Size([1000, 1, 60])
torch.Size([999, 1, 60])
Variable containing:
0.6557
[torch.FloatTensor of size 1]
Variable containing:
0.7708
[torch.FloatTensor of size 1]
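One possible explanation for the constant loss, offered as a sketch rather than a confirmed diagnosis: converting the network output to NumPy and wrapping it in a fresh tensor creates a new leaf that has no connection to the model's parameters, so `backward()` never reaches them and `optimizer.step()` has nothing to update. The snippet below illustrates this in isolation. It assumes a recent PyTorch release (where `Variable` is no longer needed), and the `nn.LSTM` layer, its sizes, and the random input are hypothetical stand-ins for the `rnn` model and MFCC features in the post.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the model and data in the post.
lstm = nn.LSTM(input_size=60, hidden_size=2)
criterion = nn.CrossEntropyLoss()
x = torch.randn(10, 1, 60)       # (seq_len, batch, features)
label = torch.tensor([0])

outputs, _ = lstm(x)             # (seq_len, batch, hidden)
last = outputs[-1]               # (batch, hidden), still attached to the graph

# Round trip through NumPy: the result is a brand-new leaf tensor.
detached = torch.from_numpy(last.detach().numpy()).requires_grad_(True)
criterion(detached, label).backward()
# The gradient stops at `detached`; no LSTM parameter receives one.
grads_after_round_trip = [p.grad for p in lstm.parameters()]

# Passing the output directly keeps the graph intact.
criterion(last, label).backward()
grads_direct = [p.grad for p in lstm.parameters()]

print(all(g is None for g in grads_after_round_trip))
print(all(g is not None for g in grads_direct))
```

If this is indeed the cause, passing the last time step of `outputs` straight into `criterion`, without the NumPy conversion or the extra `Variable(..., requires_grad=True)` wrapper, should let the gradients reach the model.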