torchaudio.load normalization question

Laurence_J · February 28, 2020, 6:11pm

Hi, I’m new to audio signal processing and to PyTorch, and I’m having trouble understanding this part of the docs for the torchaudio load function:

normalization (bool, number, or callable, optional) – If boolean True, then output is divided by 1 << 31 (assumes signed 32-bit audio), and normalizes to [-1, 1]. If number, then output is divided by that number. If callable, then the output is passed as a parameter to the given function, then the output is divided by the result. (Default: True)

From what I understand, the function assumes the file has a bit depth of 32 bits, but that bit depth is rather rare. Does "32-bit audio" indeed refer to bit depth, or to something else?

Also, I don’t understand what "output is divided by 1 << 31" means. What is meant by "output", and what is meant by "1 << 31"?

Thanks for your help

ptrblck · February 29, 2020, 6:44am

I also assume 32-bit audio corresponds to the bits per sample.

1 << 31 is a left shift of 1 by 31 bit positions, so 1 << 31 == 2**31 == 2147483648, which is the largest magnitude a sample can have.
If I’m not mistaken, 32-bit audio has the range [−2,147,483,648, 2,147,483,647], so the most negative sample maps exactly to -1, while the largest positive sample ends up very slightly below 1.
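
As a quick sanity check of the arithmetic above, here is a minimal sketch in plain Python (no torchaudio needed). The helper name normalize_int32 is made up for illustration; it only mimics the divide-by-1 << 31 step described in the docs and is not torchaudio's actual code.

```python
# Shifting 1 left by 31 bit positions doubles it 31 times.
SCALE = 1 << 31
assert SCALE == 2**31 == 2147483648

# Range of a signed 32-bit integer sample.
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def normalize_int32(sample: int) -> float:
    """Hypothetical helper: divide a raw signed 32-bit sample by 1 << 31."""
    return sample / SCALE


print(normalize_int32(INT32_MIN))  # exactly -1.0
print(normalize_int32(INT32_MAX))  # just below 1.0
```

So the normalized range is really [-1, 1), which is the "minimal error" on the positive side.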

Laurence_J · February 29, 2020, 10:30am

Thank you very much for the reply.

It must indeed be true that

I also assume 32-bit audio corresponds to the bits per sample.

because if I normalize with True, I get a tensor whose max and min values lie within [-1, 1]:
max: 0.0881
min: -0.1289

while if I use normalization=16:
max: 11821056
min: -17301504

and for normalization=False:
max: 1.8914e+08
min: -2.7682e+08

Indeed, the std and avg of the data loaded with normalization=True are between 0 and 1, so it seems to be the correct normalization for the data I’m working with.
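
The three results above are consistent with a plain division of the raw samples. Here is a minimal sketch in plain Python, assuming the raw peak values are the ones printed for normalization=16 multiplied back by 16. The scale function is a hypothetical stand-in for the documented behaviour, not part of torchaudio, and it leaves out the callable case.

```python
# Raw peaks recovered from the normalization=16 output (value * 16).
raw_max = 11821056 * 16    # 189136896, about 1.8914e+08
raw_min = -17301504 * 16   # -276824064, about -2.7682e+08


def scale(sample: int, normalization) -> float:
    """Hypothetical stand-in for the documented `normalization` options."""
    if normalization is True:
        return sample / (1 << 31)   # assumes signed 32-bit samples
    if normalization is False:
        return float(sample)        # samples are left untouched
    return sample / normalization   # plain number: divide by it


for mode in (True, 16, False):
    print(mode, round(scale(raw_max, mode), 4), round(scale(raw_min, mode), 4))
```

Note that normalization=16 divides by the literal number 16, which is why those values are exactly 1/16 of the unnormalized ones.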

Thank you very much again