I installed the ZipVoice toolkit with the following commands:
1. conda create -n zip python=3.13
2. git clone https://github.com/k2-fsa/ZipVoice.git
3. pip install -r requirements_m.txt
requirements_m.txt contains the following:
# Use the PyTorch CUDA wheel index (CUDA 12.8)
--extra-index-url https://download.pytorch.org/whl/cu128
# k2 and phonemizer wheels
--find-links https://k2-fsa.github.io/icefall/piper_phonemize.html
-f https://k2-fsa.github.io/k2/cuda.html
# Core PyTorch stack (torch 2.7.1 + CUDA 12.8)
torch==2.7.1+cu128
torchaudio==2.7.1+cu128
torchvision==0.22.1+cu128
# k2 build for torch 2.7.1 + CUDA 12.8 + Python 3.13
k2==1.24.4.dev20250715+cuda12.8.torch2.7.1
# Other dependencies
numpy
lhotse
huggingface_hub
safetensors
tensorboard
vocos
pydub
# Normalization
cn2an
inflect
# Tokenization
jieba
piper_phonemize
pypinyin
# Compatibility fixer
setuptools<81
# pip install audioop-lts
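Since a CUDA-tag mismatch between the torch and k2 wheels is a common failure mode, I ran a quick throwaway script (my own helper, not part of ZipVoice) to confirm that every pinned wheel in requirements_m.txt targets the same CUDA release:

```python
import re

# Pinned wheels copied from requirements_m.txt.
pins = {
    "torch": "2.7.1+cu128",
    "torchaudio": "2.7.1+cu128",
    "torchvision": "0.22.1+cu128",
    "k2": "1.24.4.dev20250715+cuda12.8.torch2.7.1",
}

def cuda_tag(version: str) -> str:
    """Extract the CUDA release implied by a wheel tag, normalized to 'major.minor'."""
    m = re.search(r"cu(?:da)?(\d+)\.?(\d+)?", version)
    if m.group(2):                      # 'cuda12.8' style tag
        return f"{m.group(1)}.{m.group(2)}"
    digits = m.group(1)                 # 'cu128' style tag -> '12.8'
    return f"{digits[:-1]}.{digits[-1]}"

tags = {name: cuda_tag(v) for name, v in pins.items()}
assert len(set(tags.values())) == 1, f"CUDA mismatch across wheels: {tags}"
print(tags)  # every package resolves to '12.8'
```

All four wheels resolve to CUDA 12.8, so the pins themselves look consistent.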
I am trying to train a ZipVoice model using the run_custom.sh script, but it fails with the following error:
Stage 1: Prepare manifests for custom dataset from tsv files
2025-12-09 12:16:57,382 INFO [prepare_dataset.py:185] Preparing custom dataset train subset.
2025-12-09 12:16:57,382 INFO [prepare_dataset.py:190] custom_cuts_train.jsonl.gz exists, skipping.
2025-12-09 12:16:59,029 INFO [prepare_dataset.py:185] Preparing custom dataset dev subset.
2025-12-09 12:16:59,030 INFO [prepare_dataset.py:190] custom_cuts_dev.jsonl.gz exists, skipping.
Stage 2: Compute Fbank for custom dataset
/home/ubuntu/anaconda3/envs/zip/lib/python3.13/site-packages/lhotse/audio/utils.py:101: UserWarning: The audio duration mismatch tolerance has been set to a value lower than default (0.5s). We don't recommend this as it might break some data augmentation transforms.
warnings.warn(
2025-12-09 12:17:00,828 INFO [compute_fbank.py:267] {'sampling_rate': 24000, 'type': 'vocos', 'dataset': 'custom', 'subset': 'train', 'source_dir': 'data/manifests', 'dest_dir': 'data/fbank', 'split_cuts': False, 'split_begin': None, 'split_end': None, 'batch_duration': 1000, 'num_jobs': 20}
2025-12-09 12:17:00,828 INFO [compute_fbank.py:210] Computing features for custom dataset train subset
2025-12-09 12:17:00,829 INFO [compute_fbank.py:226] Loading manifests data/manifests/custom_cuts_train.jsonl.gz
2025-12-09 12:17:00,834 INFO [compute_fbank.py:248] custom train already exists - skipping.
2025-12-09 12:17:00,834 INFO [compute_fbank.py:272] Done!
/home/ubuntu/anaconda3/envs/zip/lib/python3.13/site-packages/lhotse/audio/utils.py:101: UserWarning: The audio duration mismatch tolerance has been set to a value lower than default (0.5s). We don't recommend this as it might break some data augmentation transforms.
warnings.warn(
2025-12-09 12:17:02,638 INFO [compute_fbank.py:267] {'sampling_rate': 24000, 'type': 'vocos', 'dataset': 'custom', 'subset': 'dev', 'source_dir': 'data/manifests', 'dest_dir': 'data/fbank', 'split_cuts': False, 'split_begin': None, 'split_end': None, 'batch_duration': 1000, 'num_jobs': 20}
2025-12-09 12:17:02,638 INFO [compute_fbank.py:210] Computing features for custom dataset dev subset
2025-12-09 12:17:02,638 INFO [compute_fbank.py:226] Loading manifests data/manifests/custom_cuts_dev.jsonl.gz
2025-12-09 12:17:02,644 INFO [compute_fbank.py:248] custom dev already exists - skipping.
2025-12-09 12:17:02,644 INFO [compute_fbank.py:272] Done!
Stage 3: Prepare tokens file for custom dataset
Stage 4: Train the ZipVoice model
[I1209 12:18:13.077337241 socket.cpp:946] [c10d] The client socket has connected to [localhost]:12356 on SocketImpl(fd=56, addr=[localhost]:42764, remote=[localhost]:12356).
[I1209 12:18:13.084397209 socket.cpp:946] [c10d] The client socket has connected to [localhost]:12356 on SocketImpl(fd=48, addr=[localhost]:42770, remote=[localhost]:12356).
[I1209 12:18:13.085364205 ProcessGroupNCCL.cpp:978] [PG ID 0 PG GUID 0 Rank 2] ProcessGroupNCCL initialization options: size: 4, global rank: 2, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: -2, PG Name: 0
[I1209 12:18:13.085393416 ProcessGroupNCCL.cpp:987] [PG ID 0 PG GUID 0 Rank 2] ProcessGroupNCCL environments: NCCL version: 2.26.2, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 1, TORCH_NCCL_PROPAGATE_ERROR: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 2000, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 1, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
[I1209 12:18:13.482135435 socket.cpp:946] [c10d] The client socket has connected to [localhost]:12356 on SocketImpl(fd=49, addr=[localhost]:42786, remote=[localhost]:12356).
[I1209 12:18:13.483516486 ProcessGroupNCCL.cpp:978] [PG ID 0 PG GUID 0 Rank 3] ProcessGroupNCCL initialization options: size: 4, global rank: 3, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: -2, PG Name: 0
[I1209 12:18:13.483548755 ProcessGroupNCCL.cpp:987] [PG ID 0 PG GUID 0 Rank 3] ProcessGroupNCCL environments: NCCL version: 2.26.2, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 1, TORCH_NCCL_PROPAGATE_ERROR: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 2000, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 1, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
[I1209 12:18:13.663145693 socket.cpp:946] [c10d] The client socket has connected to [localhost]:12356 on SocketImpl(fd=48, addr=[localhost]:42788, remote=[localhost]:12356).
[I1209 12:18:13.664427385 ProcessGroupNCCL.cpp:978] [PG ID 0 PG GUID 0 Rank 1] ProcessGroupNCCL initialization options: size: 4, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: -2, PG Name: 0
[I1209 12:18:13.664456931 ProcessGroupNCCL.cpp:987] [PG ID 0 PG GUID 0 Rank 1] ProcessGroupNCCL environments: NCCL version: 2.26.2, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 1, TORCH_NCCL_PROPAGATE_ERROR: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 2000, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 1, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
[I1209 12:18:13.669220685 ProcessGroupNCCL.cpp:978] [PG ID 0 PG GUID 0 Rank 0] ProcessGroupNCCL initialization options: size: 4, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: -2, PG Name: 0
[I1209 12:18:13.669260360 ProcessGroupNCCL.cpp:987] [PG ID 0 PG GUID 0 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.26.2, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 1, TORCH_NCCL_PROPAGATE_ERROR: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 2000, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 1, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
2025-12-09 12:18:14,442 INFO [train_zipvoice.py:901] (0/4) Device: cuda:0
2025-12-09 12:18:14,444 INFO [train_zipvoice.py:916] (0/4) {
"average_period": 200,
"base_lr": 0.02,
"batch_idx_train": 0,
"best_train_epoch": -1,
"best_train_loss": Infinity,
"best_valid_epoch": -1,
"best_valid_loss": Infinity,
"bucketing_sampler": true,
"checkpoint": null,
"condition_drop_ratio": 0.2,
"dataset": "custom",
"dev_manifest": "data/fbank/custom_cuts_dev.jsonl.gz",
"device": "cuda:0",
"drop_last": true,
"env_info": {
"IP address": "10.144.144.61",
"hostname": "ioworker-h200x8-48-1",
"python-version": "3.13",
"torch-cuda-available": true,
"torch-cuda-version": "12.8",
"torch-version": "2.7.1+cu128",
"zipvoice-git-branch": "master",
"zipvoice-git-date": "Tue Dec 2 08:58:26 2025",
"zipvoice-git-sha1": "2f7326f-dirty",
"zipvoice-path": "/data0/Sougata/TTS/zipvoice_trial/ZipVoice/zipvoice"
},
"exp_dir": "exp/zipvoice_custom",
"feat_dim": 100,
"feat_scale": 0.1,
"finetune": false,
"fm_decoder_cnn_module_kernel": [
31,
15,
7,
15,
31
],
"fm_decoder_dim": 512,
"fm_decoder_downsampling_factor": [
1,
2,
4,
2,
1
],
"fm_decoder_feedforward_dim": 1536,
"fm_decoder_num_heads": 4,
"fm_decoder_num_layers": [
2,
2,
4,
4,
4
],
"inf_check": false,
"input_strategy": "PrecomputedFeatures",
"keep_last_k": 30,
"lang": "en-us",
"log_interval": 50,
"lr_batches": 7500,
"lr_epochs": 10,
"lr_hours": 6452.0,
"manifest_dir": "data/fbank",
"master_port": 12356,
"max_duration": 500,
"max_len": 20.0,
"min_len": 1.0,
"model_config": "conf/zipvoice_base.json",
"num_buckets": 30,
"num_epochs": 1000000,
"num_iters": 60000,
"num_workers": 8,
"on_the_fly_feats": false,
"pad_id": 0,
"pos_dim": 48,
"pos_head_dim": 4,
"print_diagnostics": false,
"query_head_dim": 32,
"ref_duration": 50,
"reset_interval": 200,
"return_cuts": false,
"sampling_rate": 24000,
"save_every_n": 5000,
"scan_oom": false,
"seed": 42,
"shuffle": true,
"start_epoch": 1,
"tensorboard": true,
"text_embed_dim": 192,
"text_encoder_cnn_module_kernel": 9,
"text_encoder_dim": 192,
"text_encoder_feedforward_dim": 512,
"text_encoder_num_heads": 4,
"text_encoder_num_layers": 4,
"time_embed_dim": 192,
"token_file": "data/tokens_custom.txt",
"tokenizer": "simple",
"train_manifest": "data/fbank/custom_cuts_train.jsonl.gz",
"type": "vocos",
"use_fp16": true,
"valid_by_epoch": false,
"valid_interval": 5000,
"value_head_dim": 12,
"vocab_size": 1036,
"world_size": 4
}
2025-12-09 12:18:14,445 INFO [train_zipvoice.py:918] (0/4) About to create model
2025-12-09 12:18:14,916 INFO [train_zipvoice.py:929] (0/4) Number of parameters : 122794596
2025-12-09 12:18:16,469 INFO [train_zipvoice.py:901] (2/4) Device: cuda:2
2025-12-09 12:18:16,470 INFO [train_zipvoice.py:916] (2/4) { ... same parameter dump as rank 0, except "device": "cuda:2" ... }
2025-12-09 12:18:16,470 INFO [train_zipvoice.py:918] (2/4) About to create model
2025-12-09 12:18:16,471 INFO [train_zipvoice.py:901] (3/4) Device: cuda:3
2025-12-09 12:18:16,472 INFO [train_zipvoice.py:916] (3/4) { ... same parameter dump as rank 0, except "device": "cuda:3" ... }
2025-12-09 12:18:16,472 INFO [train_zipvoice.py:918] (3/4) About to create model
2025-12-09 12:18:16,512 INFO [train_zipvoice.py:901] (1/4) Device: cuda:1
2025-12-09 12:18:16,514 INFO [train_zipvoice.py:916] (1/4) { ... same parameter dump as rank 0, except "device": "cuda:1" ... }
2025-12-09 12:18:16,515 INFO [train_zipvoice.py:918] (1/4) About to create model
2025-12-09 12:18:18,072 INFO [train_zipvoice.py:942] (0/4) Using DDP
[rank0]:[I1209 12:18:18.096753539 ProcessGroupNCCL.cpp:1078] [PG ID 0 PG GUID 0(default_pg) Rank 0] Using non-blocking mode: 0
[rank0]:[I1209 12:18:18.170647894 ProcessGroupNCCL.cpp:2828] [PG ID 0 PG GUID 0(default_pg) Rank 0] ProcessGroupNCCL broadcast unique ID through store took 0.067965 ms
[rank0]:[I1209 12:18:18.170730405 NCCLUtils.cpp:75] Rank 0: creating NCCL communicator with mode: blocking
2025-12-09 12:18:19,820 INFO [train_zipvoice.py:929] (2/4) Number of parameters : 122794596
2025-12-09 12:18:19,943 INFO [train_zipvoice.py:929] (1/4) Number of parameters : 122794596
2025-12-09 12:18:19,983 INFO [train_zipvoice.py:929] (3/4) Number of parameters : 122794596
2025-12-09 12:18:20,382 INFO [train_zipvoice.py:942] (2/4) Using DDP
[rank2]:[I1209 12:18:20.397350102 ProcessGroupNCCL.cpp:1078] [PG ID 0 PG GUID 0(default_pg) Rank 2] Using non-blocking mode: 0
[rank2]:[I1209 12:18:20.397572570 ProcessGroupNCCL.cpp:2828] [PG ID 0 PG GUID 0(default_pg) Rank 2] ProcessGroupNCCL broadcast unique ID through store took 0.192201 ms
[rank2]:[I1209 12:18:20.397602151 NCCLUtils.cpp:75] Rank 2: creating NCCL communicator with mode: blocking
2025-12-09 12:18:20,449 INFO [train_zipvoice.py:942] (1/4) Using DDP
[rank1]:[I1209 12:18:20.461722248 ProcessGroupNCCL.cpp:1078] [PG ID 0 PG GUID 0(default_pg) Rank 1] Using non-blocking mode: 0
[rank1]:[I1209 12:18:20.461877640 ProcessGroupNCCL.cpp:2828] [PG ID 0 PG GUID 0(default_pg) Rank 1] ProcessGroupNCCL broadcast unique ID through store took 0.128095 ms
[rank1]:[I1209 12:18:20.461898202 NCCLUtils.cpp:75] Rank 1: creating NCCL communicator with mode: blocking
2025-12-09 12:18:20,945 INFO [train_zipvoice.py:942] (3/4) Using DDP
[rank3]:[I1209 12:18:20.978068859 ProcessGroupNCCL.cpp:1078] [PG ID 0 PG GUID 0(default_pg) Rank 3] Using non-blocking mode: 0
[rank3]:[I1209 12:18:20.978377453 ProcessGroupNCCL.cpp:2828] [PG ID 0 PG GUID 0(default_pg) Rank 3] ProcessGroupNCCL broadcast unique ID through store took 0.25176 ms
[rank3]:[I1209 12:18:20.978421953 NCCLUtils.cpp:75] Rank 3: creating NCCL communicator with mode: blocking
W1209 12:18:30.647000 2426393 site-packages/torch/multiprocessing/spawn.py:169] Terminating process 2426592 via signal SIGTERM
W1209 12:18:30.648000 2426393 site-packages/torch/multiprocessing/spawn.py:169] Terminating process 2426593 via signal SIGTERM
W1209 12:18:30.650000 2426393 site-packages/torch/multiprocessing/spawn.py:169] Terminating process 2426595 via signal SIGTERM
Traceback (most recent call last):
  File "", line 198, in _run_module_as_main
  File "", line 88, in _run_code
  File "/data0/Sougata/TTS/zipvoice_trial/ZipVoice/zipvoice/bin/train_zipvoice.py", line 1130, in <module>
    main()
    ~~~~^^
  File "/data0/Sougata/TTS/zipvoice_trial/ZipVoice/zipvoice/bin/train_zipvoice.py", line 1122, in main
    mp.spawn(run, args=(world_size, args), nprocs=world_size, join=True)
    ~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/anaconda3/envs/zip/lib/python3.13/site-packages/torch/multiprocessing/spawn.py", line 340, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method="spawn")
  File "/home/ubuntu/anaconda3/envs/zip/lib/python3.13/site-packages/torch/multiprocessing/spawn.py", line 296, in start_processes
    while not context.join():
    ~~~~~~~~~~~~^^
  File "/home/ubuntu/anaconda3/envs/zip/lib/python3.13/site-packages/torch/multiprocessing/spawn.py", line 215, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 2 terminated with the following error:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/zip/lib/python3.13/site-packages/torch/multiprocessing/spawn.py", line 90, in _wrap
    fn(i, *args)
    ~~^^^^^^^^^^
  File "/data0/Sougata/TTS/zipvoice_trial/ZipVoice/zipvoice/bin/train_zipvoice.py", line 943, in run
    model = DDP(model, device_ids=[rank], find_unused_parameters=True)
  File "/home/ubuntu/anaconda3/envs/zip/lib/python3.13/site-packages/torch/nn/parallel/distributed.py", line 835, in __init__
    _verify_param_shape_across_processes(self.process_group, parameters)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/anaconda3/envs/zip/lib/python3.13/site-packages/torch/distributed/utils.py", line 282, in _verify_param_shape_across_processes
    return dist._verify_params_across_processes(process_group, tensors, logger)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.distributed.DistBackendError: NCCL error in: /pytorch/torch/csrc/distributed/c10d/NCCLUtils.cpp:77, unhandled cuda error (run with NCCL_DEBUG=INFO for details), NCCL version 2.26.2
ncclUnhandledCudaError: Call to CUDA function failed.
Last error:
Cuda failure 999 'unknown error'
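The error itself only reports the generic "Cuda failure 999 'unknown error'" and suggests rerunning with NCCL_DEBUG=INFO. My plan for the next run is the sketch below, using standard PyTorch/NCCL environment variables, to surface the actual failing CUDA call:

```shell
# Verbose NCCL logging plus synchronous CUDA launches, so the
# failing CUDA call is reported at its real call site instead of
# being swallowed into a generic "unknown error":
export NCCL_DEBUG=INFO
export NCCL_DEBUG_SUBSYS=INIT,GRAPH
export CUDA_LAUNCH_BLOCKING=1
bash run_custom.sh
```

Has anyone seen DDP initialization fail this way with torch 2.7.1+cu128 and NCCL 2.26.2, and is there anything obviously wrong in my setup above?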