Hello everyone,
I tried to reproduce the streaming ASR (MuST-C) results, whose reported WERs are:
Subset | WER
---|---
dev | 0.190
tst-COMMON | 0.213
tst-HE | 0.186
However, after following the training and evaluation scripts listed in the link above (except for the SLURM part), the loss became `inf` starting from the second epoch:
```
Epoch 2, global step 10770: 'Losses/val_loss' reached inf (best 72.69429), saving model to 'pytorch/audio/examples/asr/emformer_rnnt/experiments_gradClip5.0/checkpoints/epoch=2-step=10770.ckpt' as top 5
Epoch 2, global step 10770: 'Losses/train_loss' reached inf (best 169.92964), saving model to 'pytorch/audio/examples/asr/emformer_rnnt/experiments_gradClip5.0/checkpoints/epoch=2-step=10770-v1.ckpt' as top 5
```
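As a side note, the checkpoint callback keeps saving even after the loss blows up. For my own debugging I used a guard like the one below to find the first step where the loss turns non-finite and skip the update — this is a generic plain-PyTorch sketch I wrote, not code from the example repo (which uses PyTorch Lightning; `max_norm` here plays the role of `--gradient-clip-val`):

```python
import torch

def safe_step(model, optimizer, loss_fn, batch, targets, max_norm=5.0):
    """One training step that skips the update when the loss is non-finite.

    Generic sketch for debugging; not the repo's actual training loop.
    """
    optimizer.zero_grad()
    loss = loss_fn(model(batch), targets)
    if not torch.isfinite(loss):
        # Log and skip instead of poisoning the weights with inf/nan grads.
        print(f"non-finite loss {loss.item()} -- skipping step")
        return None
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()
    return loss.item()

# Tiny usage example with a toy model.
model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 4), torch.randn(8, 1)
safe_step(model, opt, torch.nn.functional.mse_loss, x, y)
```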
Later, I removed `--gradient-clip-val 5.0` from the training script and got a very large WER:
Subset | WER
---|---
dev | 0.385
tst-COMMON | 0.389
tst-HE | 0.352
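For context, my understanding is that Lightning's `--gradient-clip-val 5.0` performs global gradient-norm clipping by default, roughly equivalent to this plain-PyTorch call (a sketch for illustration, not the repo's code):

```python
import torch

# Toy model and a batch engineered to produce large gradients.
model = torch.nn.Linear(10, 1)
x = torch.randn(4, 10) * 100.0  # large inputs -> large gradients
loss = model(x).pow(2).mean()
loss.backward()

# Roughly what --gradient-clip-val 5.0 does (global-norm clipping):
# scales all gradients down so their combined L2 norm is at most 5.0.
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)

clipped_norm = torch.sqrt(sum(p.grad.pow(2).sum() for p in model.parameters()))
```

Clipping caps the update size when a batch produces an exploding gradient, which is why removing it can let a single bad batch push the weights into a region where the loss goes `nan`.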
This time the training log shows the loss becomes `nan` from the 8th epoch, while the streaming ASR (MuST-C) link above indicates the model converges by the 55th epoch (it uses `epoch=55-step=106679.ckpt` for evaluation).
I am quite confused by the `nan` and `inf` losses. Could someone point me to the correct script or environment? Thank you very much.
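In case it helps with the diagnosis: to locate which operation first produces `nan`, I know `torch.autograd.set_detect_anomaly` can name the offending backward function. A toy reproduction of how it reports (not related to the repo's actual model):

```python
import torch

# Anomaly mode makes autograd raise as soon as a backward
# function returns nan, naming the function responsible.
torch.autograd.set_detect_anomaly(True)

# sqrt of a negative number yields nan in both forward and backward.
x = torch.tensor([-1.0], requires_grad=True)
y = torch.sqrt(x)

try:
    y.backward()
except RuntimeError as e:
    # The message names SqrtBackward as the source of the nan.
    print("anomaly detected:", e)

torch.autograd.set_detect_anomaly(False)
```

Anomaly mode slows training down considerably, so it is only practical for a short debugging run around the epoch where the loss degrades.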
================================================
My env:
- GPU: V100, CUDA 10.2, 8 GPUs on 1 node
- Python 3.7.6
- torch 1.11.0+cu102
- torchaudio 0.11.0+cu102
My training script:

```
$PYTHON global_stats.py --model-type mustc --dataset-path $MUSTC_DATA

CUDA_VISIBLE_DEVICES=${CUDA_CARD} \
$PYTHON train.py \
  --model-type mustc \
  --exp-dir ./experiments \
  --dataset-path $MUSTC_DATA \
  --num-nodes 1 \
  --gpus 8 \
  --global-stats-path ./global_stats.json \
  --sp-model-path ./spm_bpe_500.model \
  --debug
```
The MuST-C dataset I used:

```
$ wc -l *_asr.tsv
   1419 dev_asr.tsv
 225278 train_asr.tsv
   2588 tst-COMMON_asr.tsv
    601 tst-HE_asr.tsv
 229886 total
```
Others:
To avoid GPU OOM, I changed `max_token_limit` from 100 to 50 in `lightning.py`:

```
dataset = CustomDataset(MUSTC(self.mustc_path, subset="train"), 50, 20)
```
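My mental model of `max_token_limit` (an assumption based on the parameter name; I have not verified it against the repo's `CustomDataset`) is that it caps the total length packed into one batch, roughly like this greedy sketch:

```python
def batch_by_token_limit(utterance_lengths, max_token_limit):
    """Greedily group utterance indices so each batch's total length
    stays within max_token_limit.

    A sketch of my assumed semantics, not the repo's implementation.
    """
    batches, current, current_tokens = [], [], 0
    for idx, length in enumerate(utterance_lengths):
        if current and current_tokens + length > max_token_limit:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(idx)
        current_tokens += length
    if current:
        batches.append(current)
    return batches

# Halving the limit shrinks batches (and peak GPU memory), but it also
# halves the effective batch size, which may change training dynamics.
lengths = [30, 25, 40, 10, 35, 20]
print(batch_by_token_limit(lengths, 100))  # -> [[0, 1, 2], [3, 4, 5]]
print(batch_by_token_limit(lengths, 50))   # -> [[0], [1], [2, 3], [4], [5]]
```

If this reading is right, lowering the limit from 100 to 50 halves the effective batch size, which could interact with the learning-rate schedule; I would appreciate confirmation on whether that change alone can explain the divergence.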
Looking forward to your kind help, thank you.