I am getting confused when I use torchaudio.transforms.DownmixMono.
First, I load my data with sound, sample_rate = torchaudio.load(...). This part works: sound is two-channel data with torch.Size([2, 132300]), and sample_rate is 22050. Then I call soundData = torchaudio.transforms.DownmixMono(sound) to downmix to mono, but the result looks weird: its shape is torch.Size([2, 1]). If I understand correctly, soundData should have only one channel, i.e. shape torch.Size([1, 132300]). What am I doing wrong?
I checked the documentation as well. It says the input format should be: tensor (Tensor): Tensor of audio of size (c x n) or (n x c). What does (c x n) mean?
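In case it helps to reproduce, here is a plain-PyTorch workaround I tried, assuming (c x n) means channels by samples: averaging over the channel dimension myself instead of calling DownmixMono.

```python
import torch

# Simulated stereo waveform with the same shape torchaudio.load()
# gave me: 2 channels x 132300 samples.
sound = torch.randn(2, 132300)

# Downmix to mono by averaging across the channel dimension (dim=0).
# keepdim=True keeps the result as (1, n) instead of flattening to (n,).
mono = sound.mean(dim=0, keepdim=True)

print(mono.shape)  # torch.Size([1, 132300])
```

My guess is that DownmixMono interpreted my (2, 132300) tensor as (n x c), i.e. samples by channels, and averaged over the last dimension, which would explain the torch.Size([2, 1]) result, but I am not sure.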