Constant/Variable Q Transform

lewiswolf · July 9, 2021, 11:30am

Hi,

I’ve been looking into using a Constant Q Transform in my pipeline, which I’m currently doing with librosa. I would like to rewrite this function so that my application only needs pytorch/torchaudio, and so that it can be written in C++ like torch.stft. However, I am unsure how to get started. Where is the C++ part of torch.stft defined? That would give me a sense of how to proceed with writing a VQT function. (VQT and CQT are essentially the same.)

Thanks!

nateanl · August 31, 2021, 12:32pm

Hi @lewiswolf! Thanks for the proposal. It’ll be great to add the VQT and CQT functions. Here is the reference for torch.stft.
I also recommend adding the functions to the torchaudio repo; it already supports several audio processing functions such as MFCC and PitchShift. Feel free to create an issue for the feature proposal and add your PR there.

yoyololicon · August 31, 2021, 1:28pm

@lewiswolf
You can try nnAudio, which is built upon pytorch and would probably fit your needs.

nateanl · September 1, 2021, 10:14am

@yoyololicon I checked the implementation in nnAudio. I think it can be refactored to use the native torch complex dtype. Do you think it’s worth implementing it in C++ like torch.stft, or is it good enough to implement it in Python, like LFCC/MFCC?
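
As a rough illustration of what the complex-dtype refactor could look like: the sketch below assumes the existing code keeps separate real and imaginary kernel matrices and combines them by hand. `project_split` and `project_complex` are hypothetical names made up for this example, not functions from nnAudio or torchaudio.

```python
import torch


def project_split(frames, k_real, k_imag):
    """Magnitude from two real-valued projections (real and imaginary parts kept apart)."""
    real = frames @ k_real.T
    imag = frames @ k_imag.T
    return torch.sqrt(real ** 2 + imag ** 2)


def project_complex(frames, k_real, k_imag):
    """Same magnitude using a single complex-valued kernel matrix."""
    kernels = torch.complex(k_real, k_imag)    # float32 inputs give complex64
    out = frames.to(kernels.dtype) @ kernels.T  # one complex matmul
    return out.abs()
```

Both functions take `frames` of shape (n_frames, frame_len) and kernels of shape (n_bins, frame_len), and should agree up to floating-point error.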

lewiswolf · September 1, 2021, 10:40am

I’d argue that a C++ implementation would be better, just based on the overall performance of these algorithms. Thanks for sharing though everyone. I’ll take a crack at it shortly!

nateanl · September 1, 2021, 11:19am

Sounds good! Looking forward to your implementation 🙂 Also, I realized you can add your C++ implementation to torchaudio, as we already have several C++ functions such as audio/lfilter.cpp at main · pytorch/audio · GitHub. It will attract more researchers, especially in the audio and speech fields.

yoyololicon · September 1, 2021, 12:49pm

Seems like there’s already a CQT proposal at torchaudio.
@nateanl Yeah, I think implementing CQT in Python is efficient enough, because most of the operations can be done as matrix operations.
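
To make the matrix-operation point concrete, here is a minimal, naive sketch of a direct CQT in PyTorch. It is not the torchaudio proposal or nnAudio's implementation: the function names, the Hann window, and the 1/N normalization are assumptions chosen for illustration, and it skips the downsampling tricks that efficient implementations rely on.

```python
import math

import torch


def cqt_kernels(sr, fmin, n_bins, bins_per_octave=12):
    """Build an (n_bins, max_len) complex matrix of windowed complex exponentials."""
    # Constant Q factor: ratio of center frequency to bandwidth.
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    freqs = fmin * 2.0 ** (torch.arange(n_bins, dtype=torch.float64) / bins_per_octave)
    lengths = torch.ceil(q * sr / freqs).long()  # lower bins get longer windows
    max_len = int(lengths.max())
    kernels = torch.zeros(n_bins, max_len, dtype=torch.complex128)
    for k in range(n_bins):
        n_k = int(lengths[k])
        t = torch.arange(n_k, dtype=torch.float64)
        window = torch.hann_window(n_k, periodic=False, dtype=torch.float64)
        phase = 2.0 * math.pi * freqs[k] * t / sr
        # window / n_k * exp(-i * phase), built directly as a complex tensor
        atom = torch.polar(window / n_k, -phase)
        start = (max_len - n_k) // 2  # center each atom in the frame
        kernels[k, start:start + n_k] = atom
    return kernels, freqs


def naive_cqt(x, sr, fmin, n_bins, bins_per_octave=12, hop_length=256):
    """Return a complex (n_frames, n_bins) tensor for a 1-D signal x.

    x must be at least as long as the longest kernel.
    """
    kernels, _ = cqt_kernels(sr, fmin, n_bins, bins_per_octave)
    # Slice the signal into overlapping frames: (n_frames, max_len).
    frames = x.to(torch.float64).unfold(0, kernels.shape[1], hop_length)
    # The whole transform is then a single matrix product.
    return frames.to(kernels.dtype) @ kernels.T
```

For a 1-D signal `x` sampled at `sr`, `naive_cqt(x, sr, fmin=110.0, n_bins=24)` returns one row per frame and one column per bin; call `.abs()` on the result for magnitudes.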