RuntimeError: value cannot be converted to type float without overflow: (-2.42042e-22,3.95285e-06)

I am training a speech-to-text model on OpenNMT-py.
I used the MFCC algorithm at the preprocessing step, but I am unable to start training.
python3 train.py -model_type audio \
    -enc_rnn_size 1024 -dec_rnn_size 1024 \
    -audio_enc_pooling 1,1,1,2,2,2 \
    -dropout 0.1 -enc_layers 6 -dec_layers 4 -rnn_type LSTM \
    -data data/speech/demofiles-vctk-mfcc/demo \
    -save_model models/exp-vctk-mfcc/demo-model-vctk-mfcc \
    -global_attention mlp -batch_size 6 \
    -optim sgd -max_grad_norm 100 -decay_method noam -train_steps 10000 \
    -encoder_type brnn -decoder_type rnn -bridge \
    -window_size 0.025 -image_channel_size 1
After running this, I get the following at data-loading time:
[2019-10-29 14:42:49,730 INFO] * tgt vocab size = 9386
[2019-10-29 14:42:49,732 INFO] Building model...
/home/amit/.local/lib/python3.5/site-packages/torch/nn/modules/rnn.py:51: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.1 and num_layers=1
"num_layers={}".format(dropout, num_layers))
[2019-10-29 14:42:50,667 INFO] NMTModel(
  (encoder): AudioEncoder(
    (dropout): Dropout(p=0.1, inplace=False)
    (W): Linear(in_features=1024, out_features=1024, bias=False)
    (batchnorm_0): BatchNorm1d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (rnn_0): LSTM(26, 512, dropout=0.1, bidirectional=True)
    (pool_0): MaxPool1d(kernel_size=1, stride=1, padding=0, dilation=1, ceil_mode=False)
    (rnn_1): LSTM(1024, 512, dropout=0.1, bidirectional=True)
    (pool_1): MaxPool1d(kernel_size=1, stride=1, padding=0, dilation=1, ceil_mode=False)
    (batchnorm_1): BatchNorm1d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (rnn_2): LSTM(1024, 512, dropout=0.1, bidirectional=True)
    (pool_2): MaxPool1d(kernel_size=1, stride=1, padding=0, dilation=1, ceil_mode=False)
    (batchnorm_2): BatchNorm1d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (rnn_3): LSTM(1024, 512, dropout=0.1, bidirectional=True)
    (pool_3): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (batchnorm_3): BatchNorm1d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (rnn_4): LSTM(1024, 512, dropout=0.1, bidirectional=True)
    (pool_4): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (batchnorm_4): BatchNorm1d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (rnn_5): LSTM(1024, 512, dropout=0.1, bidirectional=True)
    (pool_5): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (batchnorm_5): BatchNorm1d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (decoder): InputFeedRNNDecoder(
    (embeddings): Embeddings(
      (make_embedding): Sequential(
        (emb_luts): Elementwise(
          (0): Embedding(9386, 500, padding_idx=1)
        )
      )
    )
    (dropout): Dropout(p=0.1, inplace=False)
    (rnn): StackedLSTM(
      (dropout): Dropout(p=0.1, inplace=False)
      (layers): ModuleList(
        (0): LSTMCell(1524, 1024)
        (1): LSTMCell(1024, 1024)
        (2): LSTMCell(1024, 1024)
        (3): LSTMCell(1024, 1024)
      )
    )
    (attn): GlobalAttention(
      (linear_context): Linear(in_features=1024, out_features=1024, bias=False)
      (linear_query): Linear(in_features=1024, out_features=1024, bias=True)
      (v): Linear(in_features=1024, out_features=1, bias=False)
      (linear_out): Linear(in_features=2048, out_features=1024, bias=True)
    )
  )
  (generator): Sequential(
    (0): Linear(in_features=1024, out_features=9386, bias=True)
    (1): Cast()
    (2): LogSoftmax()
  )
)
[2019-10-29 14:42:50,668 INFO] encoder: 34770944
[2019-10-29 14:42:50,668 INFO] decoder: 54146226
[2019-10-29 14:42:50,668 INFO] * number of parameters: 88917170
[2019-10-29 14:42:50,669 INFO] Starting training on CPU, could be very slow
[2019-10-29 14:42:50,669 INFO] Start training loop and validate every 10000 steps...
[2019-10-29 14:42:50,669 INFO] Loading dataset from data/speech/demofiles-vctk-mfcc/demo.train.0.pt
[2019-10-29 14:42:50,923 INFO] number of examples: 5000
Traceback (most recent call last):
  File "train.py", line 200, in <module>
    main(opt)
  File "train.py", line 88, in main
    single_main(opt, -1)
  File "/home/amit/Desktop/amit/OpenNMT-py/onmt/train_single.py", line 143, in main
    valid_steps=opt.valid_steps)
  File "/home/amit/Desktop/amit/OpenNMT-py/onmt/trainer.py", line 243, in train
    report_stats)
  File "/home/amit/Desktop/amit/OpenNMT-py/onmt/trainer.py", line 392, in _gradient_accumulation
    self.optim.step()
  File "/home/amit/Desktop/amit/OpenNMT-py/onmt/utils/optimizers.py", line 360, in step
    self.optimizer.step()
  File "/home/amit/.local/lib/python3.5/site-packages/torch/optim/sgd.py", line 106, in step
    p.data.add_(-group['lr'], d_p)
RuntimeError: value cannot be converted to type float without overflow: (-2.42042e-22,3.95285e-06)
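
For what it's worth, the (-2.42042e-22,3.95285e-06) pair in the message is the (real,imag) printout of a complex scalar, so the learning rate handed to SGD appears to have become complex rather than merely too large. Python 3 silently produces a complex value when a negative float is raised to a fractional power, e.g.:

# Python 3: a fractional power of a negative float yields a complex number
# instead of raising an error, so a schedule can go complex silently.
print((-4.0) ** 0.5)   # (1.2246467991473532e-16+2j)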

Hi,

Have you seen this warning in your logs?

/home/amit/.local/lib/python3.5/site-packages/torch/nn/modules/rnn.py:51: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.1 and num_layers=1

Also, what is the value of your learning rate?

Hi,
it was the default as set in OpenNMT-py: lr=1.

I have increased the learning rate to 10, but it still doesn't work.

This warning is not about the learning rate. It comes from using dropout with an RNN that has only a single layer.
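
A minimal sketch of what triggers the warning, using the same LSTM(26, 512, ...) shape as the model dump above:

import torch.nn as nn

# PyTorch applies the dropout argument only *between* stacked RNN layers,
# so with num_layers=1 there is nothing to drop and the warning fires:
rnn = nn.LSTM(input_size=26, hidden_size=512, num_layers=1,
              dropout=0.1, bidirectional=True)   # -> UserWarning

# No warning: use num_layers > 1, or dropout=0.0 for a single layer.
rnn_ok = nn.LSTM(input_size=26, hidden_size=512, num_layers=1,
                 dropout=0.0, bidirectional=True)

The warning itself is harmless; it just means the per-layer dropout is a no-op.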

I have tried that, but I get the same result.
When I use "rsqrt" as the decay method it works fine,
but with "noam" as the decay method it fails.
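
For reference, the noam schedule from "Attention Is All You Need" scales the base learning rate roughly as in the sketch below. This is the paper's formula, not a copy of the OpenNMT-py implementation, and the default values shown are assumptions:

def noam_lr(step, model_size=1024, warmup_steps=4000, base_lr=1.0):
    # lr = base_lr * d_model^-0.5 * min(step^-0.5, step * warmup^-1.5)
    return base_lr * model_size ** -0.5 * min(step ** -0.5,
                                              step * warmup_steps ** -1.5)

All of these terms should stay positive, so if the scheduled rate comes out complex (as the error above suggests), my guess is that some input to a fractional power has gone negative somewhere.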

I'm not sure I understand what these arguments mean; it depends on your code.

You can try to flush denormals to avoid such extremely small numbers. It might help if you're doing the computation on the CPU.
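
A minimal sketch, assuming a PyTorch build on an x86 CPU (torch.set_flush_denormal is only supported there):

import torch

# Flush denormal floats to zero; returns True if the CPU supports it.
if not torch.set_flush_denormal(True):
    print('flush-to-zero not supported on this CPU')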

This problem occurred on the GPU.

@ngimel do you know where this error comes from?

The error comes while calculating the loss and the gradient step size.
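
One way to narrow it down is to check the scheduled learning rate and the gradients for complex or non-finite values right before the optimizer step. A rough sketch against a plain torch.optim optimizer, assuming a recent PyTorch where torch.isfinite is available (check_before_step is an illustrative helper, not an OpenNMT-py function):

import math
import torch

def check_before_step(optimizer):
    # Inspect every param group just before optimizer.step().
    for group in optimizer.param_groups:
        lr = group['lr']
        if isinstance(lr, complex) or not math.isfinite(lr):
            raise RuntimeError('bad learning rate: %r' % (lr,))
        for p in group['params']:
            if p.grad is not None and not torch.isfinite(p.grad).all():
                raise RuntimeError('non-finite gradient detected')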