I am getting confused when I use torchaudio.transforms.DownmixMono.
First, I load my data with sound, sample_rate = torchaudio.load(...). This part works: sound is two-channel data with torch.Size([2, 132300]), and sample_rate is 22050. Then I call soundData = torchaudio.transforms.DownmixMono(sound) to downmix to mono, but the result looks weird: its shape is torch.Size([2, 1]). If I understand correctly, soundData should have only one channel, i.e. shape torch.Size([1, 132300]). What am I doing wrong?
I checked the documentation as well. It says the input format should be: tensor (Tensor): Tensor of audio of size (c x n) or (n x c). What does (c x n) mean?
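In case it helps to reproduce, here is a plain-PyTorch workaround I tried, assuming (c x n) means channels by samples: averaging over the channel dimension myself instead of calling DownmixMono.

```python
import torch

# Simulated stereo waveform with the same shape torchaudio.load()
# gave me: 2 channels x 132300 samples.
sound = torch.randn(2, 132300)

# Downmix to mono by averaging across the channel dimension (dim=0).
# keepdim=True keeps the result as (1, n) instead of flattening to (n,).
mono = sound.mean(dim=0, keepdim=True)

print(mono.shape)  # torch.Size([1, 132300])
```

My guess is that DownmixMono interpreted my (2, 132300) tensor as (n x c), i.e. samples by channels, and averaged over the last dimension, which would explain the torch.Size([2, 1]) result, but I am not sure.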