Mechanism of representing PCM16 as float32

Hello everyone, I am a novice to audio coding. I am simply using torchaudio to load WAV files in PCM16 format (sample rate = 16 kHz, bit depth = 16 bits), and the dtype of the loaded tensor is torch.float32.

My questions are:
(1) What is the mechanism for representing 16-bit PCM samples as float32?
(2) Since the original data has already been quantized to 16 bits, a finer representation (>16 bits) cannot capture any additional detail, yet it consumes more memory. So what are the necessity and the benefit of torchaudio representing PCM16 as float32?

Thanks in advance for your reply!

Hey @Felix_astw, welcome to the forum!

In an audio file containing 16-bit PCM samples, the values are restricted to the range [-32768, 32767]. The conversion from 16-bit integers to float is done by dividing each sample value by 32768, which yields a signal represented as floating-point values in the range [-1.0, 1.0].
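
As a minimal sketch of that arithmetic (torchaudio performs the equivalent normalisation internally when loading with `normalize=True`, which is the default; the sample values here are made up for illustration):

```python
import torch

# Raw 16-bit PCM samples in the range [-32768, 32767].
pcm16 = torch.tensor([-32768, -16384, 0, 16384, 32767], dtype=torch.int16)

# Dividing by 32768 normalizes the samples into [-1.0, 1.0) as float32.
waveform = pcm16.to(torch.float32) / 32768.0

print(waveform)  # tensor([-1.0000, -0.5000,  0.0000,  0.5000,  1.0000])
```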

Using float is the canonical way of representing audio samples, and it has several advantages:

- full 32-bit precision for intermediate processing,
- normalisation: the signal occupies the same [-1.0, 1.0] range regardless of whether the data in the file is 8, 16, 24 or 32 bits wide,
- a simple representation of clipping (any value outside [-1.0, 1.0]),
- easy application of operations, e.g. mixing two signals is as simple as adding their sample values together.

In the end, it is also fast to scale/convert back to the target bit depth: it is just a matter of multiplying by the target range (see the sketch below).
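
To make the mixing and conversion points concrete, here is a small sketch. The two sine tones are made up for illustration; only the add/clamp/multiply steps matter:

```python
import math
import torch

# Two normalized float32 signals, as you would get from torchaudio.load.
t = torch.linspace(0, 1, 16000)
a = 0.6 * torch.sin(2 * math.pi * 440.0 * t)
b = 0.6 * torch.sin(2 * math.pi * 554.4 * t)

# Mixing is just addition; any value outside [-1.0, 1.0] indicates clipping.
mix = (a + b).clamp(-1.0, 1.0)

# Converting back to 16-bit PCM is a multiply by the target range.
pcm16 = (mix * 32767.0).to(torch.int16)
```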
