torch.cuda.make_graphed_callables + torchaudio.functional.lfilter returns zeros

I’m trying to wrap a call to torchaudio.functional.lfilter in a CUDA graph to make it run faster. At first I followed the approach described in this topic, but the example code there doesn’t handle the backward pass.
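For context, lfilter applies the standard IIR difference equation a[0]·y[n] = Σₖ b[k]·x[n−k] − Σₖ₌₁ a[k]·y[n−k]. Here is a pure-Python sketch of that recurrence (illustrative only, not torchaudio’s actual implementation; `lfilter_ref` is a hypothetical helper name):

```python
def lfilter_ref(b, a, x):
    """Reference IIR filter: a[0]*y[n] = sum_k b[k]*x[n-k] - sum_{k>=1} a[k]*y[n-k]."""
    y = []
    for n in range(len(x)):
        # Feed-forward part: weighted sum of current and past inputs
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        # Feedback part: weighted sum of past outputs
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc / a[0])
    return y

# Impulse response of a simple one-pole filter: each output is half the previous one
print(lfilter_ref([0.5], [1.0, -0.5], [1.0, 0.0, 0.0]))  # [0.5, 0.25, 0.125]
```

The sequential feedback term is why lfilter is slow on GPU (it can’t be fully parallelized along time), which is what motivates trying CUDA graphs here.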

Then, after reading the CUDA Graphs section in the PyTorch docs, I switched to torch.cuda.make_graphed_callables. But for me, on a V100, it always returns zeros no matter what the parameters are. Here’s the code:

import scipy.signal
import torch, torchaudio

device = 'cuda'
# Design a first-order Butterworth low-pass filter (cutoff 0.5 Hz at 20 Hz sample rate)
b, a = scipy.signal.butter(N=1, Wn=0.5, fs=20, btype='low', analog=False)
a = torch.tensor(a, dtype=torch.float32, device=device, requires_grad=False)
b = torch.tensor(b, dtype=torch.float32, device=device, requires_grad=False)
sample_input = torch.rand(128, device=device, requires_grad=True)

f = torchaudio.functional.lfilter
# Capture the callable into a CUDA graph, using the tensors above as sample args
f_graphed = torch.cuda.make_graphed_callables(f, (sample_input, a, b))

print('f', f(sample_input, a, b))
print('f_graphed', f_graphed(sample_input, a, b))

print(torch.version.cuda)
print(torch.__version__)
print(torchaudio.__version__)

And this is the output I get:

f tensor([0.0170, 0.0899, 0.1505, 0.1978, 0.2478, 0.2665, 0.3215, 0.4104, 0.4537,
        0.4543, 0.4242, 0.4193, 0.4601, 0.4790, 0.4756, 0.4928, 0.4975, 0.4711,
        0.4534, 0.4222, 0.4201, 0.4693, 0.5068, 0.5351, 0.5278, 0.5039, 0.4987,
        0.4662, 0.4691, 0.4948, 0.5330, 0.5339, 0.4730, 0.4141, 0.3555, 0.3600,
        0.3904, 0.3786, 0.4036, 0.4441, 0.4631, 0.5086, 0.5174, 0.5027, 0.5026,
        0.4702, 0.4331, 0.4619, 0.5341, 0.5852, 0.6287, 0.6269, 0.6300, 0.6489,
        0.5930, 0.5078, 0.4394, 0.4356, 0.4833, 0.5073, 0.4978, 0.5048, 0.5581,
        0.5763, 0.5643, 0.5433, 0.5000, 0.4538, 0.4070, 0.3933, 0.4035, 0.4480,
        0.4974, 0.5308, 0.5798, 0.6177, 0.6273, 0.5867, 0.5171, 0.4893, 0.5026,
        0.5185, 0.4867, 0.4580, 0.4593, 0.4841, 0.4938, 0.4379, 0.4310, 0.4347,
        0.4144, 0.4306, 0.4153, 0.3726, 0.3896, 0.4424, 0.4551, 0.4470, 0.4457,
        0.4123, 0.4017, 0.4501, 0.4729, 0.4474, 0.4026, 0.4149, 0.4834, 0.5366,
        0.5309, 0.5146, 0.5135, 0.4979, 0.5010, 0.5205, 0.5114, 0.4936, 0.5022,
        0.5001, 0.4994, 0.5138, 0.5533, 0.5559, 0.5070, 0.5088, 0.5375, 0.5386,
        0.5319, 0.5268], device='cuda:0', grad_fn=<ReshapeAliasBackward0>)
f_graphed tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0.], device='cuda:0',
       grad_fn=<GraphedBackward>)
11.7
2.0.1+cu117
2.0.2+cu117

What am I doing wrong here?
Thanks


I don’t know if torchaudio is compatible with CUDA Graphs or which ops might be. @hwangjeff do you know what is supported and if it was tested before?