I am trying to run Piper (forked from github/rhasspy/piper) on a Google Compute Engine VM with an L4 GPU. I have CUDA Toolkit 12.6 installed.
This is my error log:
python3 -m piper_train --dataset-dir /home/stevenvana/piper/out-train/ --accelerator 'gpu' --devices 1 --batch-size 32 --validation-split 0.0 --num-test-examples 0 --max_epochs 10000 --resume_from_checkpoint /home/stevenvana/piper/out-train/epoch=2218-step=838782.ckpt?download=true --checkpoint-epochs 1 --precision 32 --quality high
DEBUG:piper_train:Namespace(dataset_dir='/home/stevenvana/piper/out-train/', checkpoint_epochs=1, quality='high', resume_from_single_speaker_checkpoint=None, logger=True, enable_checkpointing=True, default_root_dir=None, gradient_clip_val=None, gradient_clip_algorithm=None, num_nodes=1, num_processes=None, devices='1', gpus=None, auto_select_gpus=False, tpu_cores=None, ipus=None, enable_progress_bar=True, overfit_batches=0.0, track_grad_norm=-1, check_val_every_n_epoch=1, fast_dev_run=False, accumulate_grad_batches=None, max_epochs=10000, min_epochs=None, max_steps=-1, min_steps=None, max_time=None, limit_train_batches=None, limit_val_batches=None, limit_test_batches=None, limit_predict_batches=None, val_check_interval=None, log_every_n_steps=50, accelerator='gpu', strategy=None, sync_batchnorm=False, precision=32, enable_model_summary=True, weights_save_path=None, num_sanity_val_steps=2, resume_from_checkpoint='/home/stevenvana/piper/out-train/epoch=2218-step=838782.ckpt?download=true', profiler=None, benchmark=None, deterministic=None, reload_dataloaders_every_n_epochs=0, auto_lr_find=False, replace_sampler_ddp=True, detect_anomaly=False, auto_scale_batch_size=False, plugins=None, amp_backend='native', amp_level=None, move_metrics_to_cpu=False, multiple_trainloader_mode='max_size_cycle', batch_size=32, validation_split=0.0, num_test_examples=0, max_phoneme_ids=None, hidden_channels=192, inter_channels=192, filter_channels=768, n_layers=6, n_heads=2, seed=1234)
/home/stevenvana/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py:52: LightningDeprecationWarning: Setting Trainer(resume_from_checkpoint=) is deprecated in v1.5 and will be removed in v1.7. Please pass Trainer.fit(ckpt_path=) directly instead.
  rank_zero_deprecation(
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
DEBUG:piper_train:Checkpoints will be saved every 1 epoch(s)
DEBUG:vits.dataset:Loading dataset: /home/stevenvana/piper/out-train/dataset.jsonl
/home/stevenvana/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:731: LightningDeprecationWarning: trainer.resume_from_checkpoint is deprecated in v1.5 and will be removed in v2.0. Specify the fit checkpoint path with trainer.fit(ckpt_path=) instead.
  ckpt_path = ckpt_path or self.resume_from_checkpoint
Restoring states from the checkpoint path at /home/stevenvana/piper/out-train/epoch=2218-step=838782.ckpt?download=true
DEBUG:fsspec.local:open file: /home/stevenvana/piper/out-train/epoch=2218-step=838782.ckpt?download=true
/home/stevenvana/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:1659: UserWarning: Be aware that when using ckpt_path, callbacks used to create the checkpoint need to be provided during Trainer instantiation. Please add the following callbacks: ["ModelCheckpoint{'monitor': None, 'mode': 'min', 'every_n_train_steps': 0, 'every_n_epochs': 1, 'train_time_interval': None}"].
  rank_zero_warn(
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
DEBUG:fsspec.local:open file: /home/stevenvana/piper/out-train/lightning_logs/version_13/hparams.yaml
Restored all states from the checkpoint file at /home/stevenvana/piper/out-train/epoch=2218-step=838782.ckpt?download=true
/home/stevenvana/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/utilities/data.py:153: UserWarning: Total length of DataLoader across ranks is zero. Please make sure this was your intention.
  rank_zero_warn(
/home/stevenvana/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:236: PossibleUserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the num_workers argument (try 8 which is the number of cpus on this machine) in the DataLoader init to improve performance.
  rank_zero_warn(
/home/stevenvana/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:1892: PossibleUserWarning: The number of training batches (2) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
rank_zero_warn(
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/stevenvana/piper/src/python/piper_train/__main__.py", line 147, in <module>
    main()
  File "/home/stevenvana/piper/src/python/piper_train/__main__.py", line 124, in main
    trainer.fit(model)
  File "/home/stevenvana/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 696, in fit
    self._call_and_handle_interrupt(
  File "/home/stevenvana/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 650, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/stevenvana/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 735, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/home/stevenvana/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1166, in _run
    results = self._run_stage()
  File "/home/stevenvana/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1252, in _run_stage
    return self._run_train()
  File "/home/stevenvana/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1283, in _run_train
    self.fit_loop.run()
  File "/home/stevenvana/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/home/stevenvana/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 271, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "/home/stevenvana/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/home/stevenvana/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 203, in advance
    batch_output = self.batch_loop.run(kwargs)
  File "/home/stevenvana/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/home/stevenvana/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 87, in advance
    outputs = self.optimizer_loop.run(optimizers, kwargs)
  File "/home/stevenvana/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/home/stevenvana/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 201, in advance
    result = self._run_optimization(kwargs, self._optimizers[self.optim_progress.optimizer_position])
  File "/home/stevenvana/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 248, in _run_optimization
    self._optimizer_step(optimizer, opt_idx, kwargs.get("batch_idx", 0), closure)
  File "/home/stevenvana/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 358, in _optimizer_step
    self.trainer._call_lightning_module_hook(
  File "/home/stevenvana/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1550, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/home/stevenvana/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/core/module.py", line 1705, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/home/stevenvana/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/core/optimizer.py", line 168, in step
    step_output = self._strategy.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
  File "/home/stevenvana/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 216, in optimizer_step
    return self.precision_plugin.optimizer_step(model, optimizer, opt_idx, closure, **kwargs)
  File "/home/stevenvana/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 153, in optimizer_step
    return optimizer.step(closure=closure, **kwargs)
  File "/home/stevenvana/piper/src/python/.venv/lib/python3.10/site-packages/torch/optim/lr_scheduler.py", line 68, in wrapper
    return wrapped(*args, **kwargs)
  File "/home/stevenvana/piper/src/python/.venv/lib/python3.10/site-packages/torch/optim/optimizer.py", line 140, in wrapper
    out = func(*args, **kwargs)
  File "/home/stevenvana/piper/src/python/.venv/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/stevenvana/piper/src/python/.venv/lib/python3.10/site-packages/torch/optim/adamw.py", line 120, in step
    loss = closure()
  File "/home/stevenvana/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 138, in _wrap_closure
    closure_result = closure()
  File "/home/stevenvana/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 146, in __call__
    self._result = self.closure(*args, **kwargs)
  File "/home/stevenvana/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 132, in closure
    step_output = self._step_fn()
  File "/home/stevenvana/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 407, in _training_step
    training_step_output = self.trainer._call_strategy_hook("training_step", *kwargs.values())
  File "/home/stevenvana/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1704, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/home/stevenvana/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 358, in training_step
    return self.model.training_step(*args, **kwargs)
  File "/home/stevenvana/piper/src/python/piper_train/vits/lightning.py", line 191, in training_step
    return self.training_step_g(batch)
  File "/home/stevenvana/piper/src/python/piper_train/vits/lightning.py", line 230, in training_step_g
    y_hat_mel = mel_spectrogram_torch(
  File "/home/stevenvana/piper/src/python/piper_train/vits/mel_processing.py", line 120, in mel_spectrogram_torch
    torch.stft(
  File "/home/stevenvana/piper/src/python/.venv/lib/python3.10/site-packages/torch/functional.py", line 632, in stft
    return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
RuntimeError: cuFFT error: CUFFT_INTERNAL_ERROR
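The traceback ends inside torch.stft on the GPU, so a small standalone check in the same venv might help tell a PyTorch/CUDA problem apart from a Piper problem. The sketch below is only an assumption about how to isolate it: the function name stft_smoke_test, the random 22050-sample signal, n_fft=1024 and hop_length=256 are placeholder choices, not necessarily what piper_train uses. It also prints the CUDA version the torch wheel was built for, which may be worth comparing with the system toolkit, since the package list in this post shows cu11 (11.7) packages while the VM has CUDA Toolkit 12.6.

```python
import torch


def stft_smoke_test(device: str = "cpu", n_fft: int = 1024, hop_length: int = 256) -> torch.Tensor:
    """Run one torch.stft call on `device` and return the complex spectrogram.

    All sizes here are placeholder values chosen for illustration.
    """
    signal = torch.randn(1, 22050, device=device)     # fake one-second mono signal
    window = torch.hann_window(n_fft, device=device)  # window must live on the same device
    return torch.stft(
        signal,
        n_fft,
        hop_length=hop_length,
        win_length=n_fft,
        window=window,
        center=False,
        return_complex=True,
    )


if __name__ == "__main__":
    # Report which CUDA version this torch build targets (None for CPU-only builds).
    print("torch:", torch.__version__, "| built for CUDA:", torch.version.cuda)

    device = "cuda" if torch.cuda.is_available() else "cpu"
    if device == "cuda":
        print("GPU:", torch.cuda.get_device_name(0),
              "| compute capability:", torch.cuda.get_device_capability(0))

    spec = stft_smoke_test(device)
    print("stft ok on", device, "| shape:", tuple(spec.shape))
```

If this script raises the same cuFFT error on the GPU but works with device="cpu", the problem is probably in the torch/CUDA combination rather than in the dataset or checkpoint.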
These are my installed dependencies:
Package                  Version      Editable project location
------------------------ ------------ ---------------------------------
absl-py                  2.1.0
aiohappyeyeballs         2.4.0
aiohttp                  3.10.5
aiosignal                1.3.1
async-timeout            4.0.3
attrs                    24.2.0
audioread                3.0.1
build                    1.2.1
certifi                  2024.7.4
cffi                     1.17.0
charset-normalizer       3.3.2
coloredlogs              15.0.1
Cython                   0.29.37
decorator                5.1.1
flatbuffers              24.3.25
frozenlist               1.4.1
fsspec                   2024.6.1
grpcio                   1.66.0
humanfriendly            10.0
idna                     3.8
joblib                   1.4.2
lazy_loader              0.4
librosa                  0.10.2.post1
lightning-utilities      0.11.6
llvmlite                 0.43.0
Markdown                 3.7
MarkupSafe               2.1.5
mpmath                   1.3.0
msgpack                  1.0.8
multidict                6.0.5
numba                    0.60.0
numpy                    1.26.4
nvidia-cublas-cu11       11.10.3.66
nvidia-cuda-nvrtc-cu11   11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11        8.5.0.96
onnxruntime              1.19.0
packaging                24.1
pip                      24.0
piper-phonemize          1.1.0
piper_train              1.0.0        /home/stevenvana/piper/src/python
piper-tts                1.2.0
platformdirs             4.2.2
pooch                    1.8.2
protobuf                 5.27.3
pycparser                2.22
pyDeprecate              0.3.2
pyproject_hooks          1.1.0
pytorch-lightning        1.7.7
PyYAML                   6.0.2
requests                 2.32.3
scikit-learn             1.5.1
scipy                    1.14.1
setuptools               73.0.1
six                      1.16.0
soundfile                0.12.1
soxr                     0.4.0
sympy                    1.13.2
tensorboard              2.17.1
tensorboard-data-server  0.7.2
threadpoolctl            3.5.0
tomli                    2.0.1
torch                    1.13.1
torchmetrics             0.11.4
tqdm                     4.66.5
typing_extensions        4.12.2
urllib3                  2.2.2
Werkzeug                 3.0.4
wheel                    0.44.0
Versions:
Ubuntu 22.04
Python 3.10 (venv)