Normalization/energy conservation on STFT output

JXuan · February 26, 2022, 7:41pm

Hi,

Torchaudio offers an option to normalize the STFT output, which I think is meant to compensate for the energy lost to windowing (Parseval’s theorem).

def spectrogram():
    .....
    if normalized:
        spec_f /= window.pow(2.).sum().sqrt()
(https://github.com/pytorch/audio/blob/0076ab073d1ee6160efbc239e075196b35ed850b/torchaudio/functional.py#L95)
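
To check my reading of this formula, here is a minimal sketch in plain Python (no torch). It is not torchaudio's implementation: the helper names hann, hamming, dft and normalized_frame_power are my own, it processes a single frame with a naive two-sided DFT rather than the one-sided FFT that torch.stft uses by default for real input, and the test signal is an assumed random ±2 sequence so that every sample has power 4. By Parseval's theorem, the sum of |X[k]|^2 over the N bins equals N times the sum of (w[n] * x[n])^2, so after dividing by sqrt(sum(w^2)) the mean power per bin should come out as 4 whatever the window:

```python
import cmath
import math
import random


def hann(n_fft):
    # Periodic Hann window (hypothetical helper, written out by hand).
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]


def hamming(n_fft):
    # Periodic Hamming window (hypothetical helper).
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]


def dft(frame):
    # Naive full (two-sided) DFT, O(N^2); fine for a small demo.
    n_fft = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * n / n_fft) for n, x in enumerate(frame))
        for k in range(n_fft)
    ]


def normalized_frame_power(signal, window):
    # Window one frame, take its DFT, divide by sqrt(sum(w^2)),
    # then return the mean power per frequency bin.
    scale = math.sqrt(sum(w * w for w in window))
    spec = [c / scale for c in dft([w * x for w, x in zip(window, signal)])]
    return sum(abs(c) ** 2 for c in spec) / len(spec)


random.seed(0)
n_fft = 64
# Assumed test signal: random +/-2 sequence, so every sample has power 4.
signal = [random.choice((-2.0, 2.0)) for _ in range(n_fft)]

windows = [("hann", hann(n_fft)), ("hamming", hamming(n_fft)), ("rect", [1.0] * n_fft)]
for name, window in windows:
    print(name, round(normalized_frame_power(signal, window), 6))
```

All three windows print 4.0, which suggests the scaling depends only on the sum of the squared window samples and not on which window is used. For a signal whose power varies within the frame, the result would be a w^2-weighted average of the sample powers rather than an exact match.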

I could not find a more detailed explanation of this formula online. Could anyone briefly explain why we normalize it this way? Is it generic to all window functions, or does it only hold for the window functions available in the torch API?

Besides, I have seen people subsequently perform

S = stft_output.pow(2).sum(-1)
return S

on the real-valued tensor of the STFT output, whose last dimension holds the real part and the imaginary part. What is the operation pow(2).sum(-1) for? And do we need sqrt() after pow(2).sum(-1) and before return S, or not?
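
For reference, here is a small sketch of what I understand this operation to compute, written in plain Python so it runs without torch. The bin values and the helper names power and magnitude are made up for illustration. Squaring both components and summing over the last axis gives re^2 + im^2, that is, the squared magnitude |X|^2 of each complex bin; taking sqrt() afterwards would give the magnitude |X| instead:

```python
import math

# Hypothetical STFT bins stored the "real tensor" way: [real, imag] pairs.
stft_output = [[3.0, 4.0], [0.0, -2.0], [1.0, 1.0]]


def power(bins):
    # Pure-Python analogue of stft_output.pow(2).sum(-1):
    # square both components and sum over the last axis -> re^2 + im^2.
    return [re * re + im * im for re, im in bins]


def magnitude(bins):
    # Same as above followed by sqrt(): the magnitude |X| of each bin.
    return [math.sqrt(p) for p in power(bins)]


power_spec = power(stft_output)
magnitude_spec = magnitude(stft_output)
print(power_spec)
print(magnitude_spec)
```

So, as far as I can tell, the choice comes down to whether S is meant to be a power spectrogram (no sqrt()) or a magnitude spectrogram (with sqrt()).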

Many thanks in advance!