I’ve been running a workflow that combines Whisper transcription with pyannote.audio speaker diarization, and CPU usage keeps spiking, particularly while the diarization model runs. I have tried to rein this in with torch.set_num_threads(1) and with environment variables such as OMP_NUM_THREADS, but these have either had no effect or caused memory usage to spike.
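For reference, the thread-count environment variables mentioned above are generally only honoured by PyTorch and the OpenMP/MKL/OpenBLAS runtimes if they are set before torch (or numpy) is first imported; setting them later in the script usually has no effect. Below is a minimal sketch of applying them at the very top of an entry script, assuming a one-thread budget is the goal. `limit_threads` and `THREAD_ENV_VARS` are hypothetical names, not part of any library.

```python
import os

# Hypothetical list: env vars read by the native thread pools that PyTorch
# and NumPy commonly link against (OpenMP, MKL, OpenBLAS, numexpr).
THREAD_ENV_VARS = (
    "OMP_NUM_THREADS",
    "MKL_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "NUMEXPR_NUM_THREADS",
)


def limit_threads(n: int) -> dict:
    """Hypothetical helper: set every thread-count variable to n and return the values."""
    for var in THREAD_ENV_VARS:
        os.environ[var] = str(n)
    return {var: os.environ[var] for var in THREAD_ENV_VARS}


# Must run BEFORE torch/numpy are imported for the variables to be picked up.
limit_threads(1)

try:
    import torch  # imported only after the environment is configured
except ImportError:  # lets the sketch run where torch is not installed
    torch = None

if torch is not None:
    torch.set_num_threads(1)  # intra-op pool (parallelism inside a single op)
    try:
        # Inter-op pool; raises RuntimeError if parallel work has already started.
        torch.set_num_interop_threads(1)
    except RuntimeError:
        pass
```

`torch.set_num_threads` only caps the intra-op pool; the inter-op pool is configured separately through `torch.set_num_interop_threads`, which has to be called before any inter-op parallel work starts.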
Here is a minimal end-to-end reproduction of the workflow I’ve been running:
```python
import os
import time

import torch
import whisper
from pyannote.audio import Pipeline
from stable_whisper import modify_model

# Limit PyTorch's intra-op thread pool to a single thread
torch.set_num_threads(1)

# Whisper "base" model, patched by stable-ts
model = whisper.load_model("base")
modify_model(model)

diarizationModel = Pipeline.from_pretrained(
    "pyannote/speaker-diarization",
    use_auth_token="<AUTH TOKEN>",
)
# Move the diarization pipeline to the first GPU
diarizationModel = diarizationModel.to(torch.device("cuda", 0))


def transcription(filePath):
    # Transcribe with Whisper (via stable-ts); keep the text and detected language
    modelRes = model.transcribe(filePath).to_dict()
    res = {"text": modelRes["text"], "language": modelRes["language"]}

    # Run speaker diarization and record one entry per speaker turn
    toAdd = []
    diarizationResult = diarizationModel(filePath)
    for turn, _, speaker in diarizationResult.itertracks(yield_label=True):
        toAdd.append({"startTime": turn.start, "stopTime": turn.end, "speaker": speaker})
    res["diarization"] = toAdd
    return res


print(transcription(""))  # audio file path omitted
```
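To pin down which stage is responsible, one option is to compare CPU time with wall-clock time around each call: the ratio approximates how many cores the stage kept busy on average. This is a minimal sketch using only the standard library; `profile_stage` is a hypothetical helper, not part of Whisper, stable-ts or pyannote.audio.

```python
import time


def profile_stage(name, fn, *args, **kwargs):
    """Hypothetical helper: run fn(*args, **kwargs) and report wall time, CPU time and average cores used."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # CPU time of this process, summed over all its threads
    result = fn(*args, **kwargs)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    cores = cpu / wall if wall > 0 else 0.0
    print(f"{name}: wall={wall:.2f}s cpu={cpu:.2f}s avg_cores~{cores:.1f}")
    return result, {"wall": wall, "cpu": cpu, "cores": cores}


if __name__ == "__main__":
    # Stand-in CPU-bound workload; in the script above this would be, e.g.,
    # profile_stage("diarization", diarizationModel, filePath)
    _, stats = profile_stage("busy_loop", sum, range(2_000_000))
```

GPU time is not counted as CPU time, so a ratio well above 1 during the diarization call would point to multithreaded CPU work (for example audio loading or clustering) rather than GPU inference.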
requirements.txt:

```text
asteroid-filterbanks >=0.4
einops >=0.6.0
huggingface_hub >= 0.13.0
lightning >= 2.0.1
omegaconf >=2.1,<3.0
pyannote.core >= 5.0.0
pyannote.database >= 5.0.1
pyannote.metrics >= 3.2
pyannote.pipeline >= 2.3 # 2.4
pytorch_metric_learning >= 2.1.0
rich >= 12.0.0
semver >= 3.0.0
soundfile >= 0.12.1
speechbrain >= 0.5.14
tensorboardX >= 2.6
torch >= 2.0.0
torch_audiomentations >= 0.11.0
torchaudio >= 2.0.0
torchmetrics >= 0.11.0
# replaced torch with the one above
numba
numpy
tqdm
more-itertools
tiktoken==0.3.3
fastapi
uvicorn
librosa
python-multipart
audioread
gunicorn
```
Dockerfile for the environment:

```dockerfile
FROM nvidia/cuda:12.0.0-runtime-ubuntu22.04
EXPOSE 8000
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3.9 \
        python3-pip \
    && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
WORKDIR /
COPY . .
RUN apt-get update
RUN apt-get install --assume-yes git
RUN apt update && apt install -y ffmpeg
RUN pip install --upgrade pip
RUN pip install git+https://github.com/openai/whisper.git
RUN pip install git+https://github.com/jianfch/stable-ts.git
RUN pip install -qq https://github.com/pyannote/pyannote-audio/archive/refs/heads/develop.zip
RUN pip install -r requirements.txt
```
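Because the workload runs in Docker, it may be worth checking whether the container has a CPU quota. A `--cpus` limit is enforced through the cgroup CFS quota and does not change the core count that `os.cpu_count()` reports, so thread pools sized from the visible core count can oversubscribe the quota. This is a minimal sketch that reads the quota, assuming a cgroup v2 host (where the limit is exposed in `/sys/fs/cgroup/cpu.max`); `parse_cpu_max` and `effective_cpus` are hypothetical helper names.

```python
import os
from pathlib import Path
from typing import Optional

CPU_MAX = Path("/sys/fs/cgroup/cpu.max")  # assumption: cgroup v2 layout


def parse_cpu_max(text: str) -> Optional[float]:
    """Parse cgroup v2 'cpu.max' ("<quota> <period>") into a CPU count; None if unlimited."""
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def effective_cpus() -> float:
    """CPU budget for this process: the cgroup quota if one is set, else the visible cores."""
    if hasattr(os, "sched_getaffinity"):
        visible = len(os.sched_getaffinity(0))  # respects --cpuset-cpus, not --cpus
    else:
        visible = os.cpu_count() or 1
    try:
        quota = parse_cpu_max(CPU_MAX.read_text())
    except (OSError, ValueError):
        quota = None  # no readable cgroup v2 file (e.g. cgroup v1, or not Linux)
    return float(min(visible, quota)) if quota is not None else float(visible)


if __name__ == "__main__":
    print("visible cores:", os.cpu_count(), "effective CPU budget:", effective_cpus())
```

If the reported budget is well below the visible core count, deriving the thread counts from it (for example `torch.set_num_threads(max(1, int(effective_cpus())))`) is one option to try.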