I’m trying to wrap a call to torchaudio.functional.lfilter in a CUDA graph to make it run faster. At first I followed the approach described in this topic, but the example code from there doesn’t handle the backward pass.
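For reference, the forward-only capture from that topic looked roughly like this (a sketch under my assumptions: the helper name, warm-up loop, and buffer handling are mine, and only the forward pass is captured, which is exactly the limitation I ran into):

```python
import torch

def graph_lfilter_forward(a, b, length, device='cuda'):
    # Sketch of forward-only CUDA graph capture for lfilter.
    # Helper name and warm-up details are my own; no backward pass is captured.
    import torchaudio
    static_in = torch.zeros(length, device=device)

    # Warm up on a side stream so lazy kernel initialization isn't captured
    s = torch.cuda.Stream()
    s.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(s):
        for _ in range(3):
            torchaudio.functional.lfilter(static_in, a, b)
    torch.cuda.current_stream().wait_stream(s)

    # Capture a single forward call into the graph
    g = torch.cuda.CUDAGraph()
    with torch.cuda.graph(g):
        static_out = torchaudio.functional.lfilter(static_in, a, b)

    def run(x):
        static_in.copy_(x)  # refill the captured input buffer in place
        g.replay()          # rerun the captured kernels
        return static_out   # output lands in the captured output buffer
    return run
```

Replaying the graph reuses the same input/output buffers, so new data has to be copied into `static_in` before each `g.replay()` — but gradients never flow through the replay, which is why I moved on to make_graphed_callables.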
Then, after reading the CUDA Graphs section of the PyTorch docs, I switched to torch.cuda.make_graphed_callables. But on a V100 it always returns zeros, no matter what the parameters are. Here’s the code:
import scipy.signal
import torch, torchaudio

device = 'cuda'

# first-order Butterworth low-pass filter coefficients
b, a = scipy.signal.butter(N=1, Wn=0.5, fs=20, btype='low', analog=False)
a = torch.tensor(a, dtype=torch.float32, device=device, requires_grad=False)
b = torch.tensor(b, dtype=torch.float32, device=device, requires_grad=False)
sample_input = torch.rand(128, device=device, requires_grad=True)

f = torchaudio.functional.lfilter  # lfilter(waveform, a_coeffs, b_coeffs)
f_graphed = torch.cuda.make_graphed_callables(f, (sample_input, a, b))

print('f', f(sample_input, a, b))
print('f_graphed', f_graphed(sample_input, a, b))
print(torch.version.cuda)
print(torch.__version__)
print(torchaudio.__version__)
And this is the output I get:
f tensor([0.0170, 0.0899, 0.1505, 0.1978, 0.2478, 0.2665, 0.3215, 0.4104, 0.4537,
0.4543, 0.4242, 0.4193, 0.4601, 0.4790, 0.4756, 0.4928, 0.4975, 0.4711,
0.4534, 0.4222, 0.4201, 0.4693, 0.5068, 0.5351, 0.5278, 0.5039, 0.4987,
0.4662, 0.4691, 0.4948, 0.5330, 0.5339, 0.4730, 0.4141, 0.3555, 0.3600,
0.3904, 0.3786, 0.4036, 0.4441, 0.4631, 0.5086, 0.5174, 0.5027, 0.5026,
0.4702, 0.4331, 0.4619, 0.5341, 0.5852, 0.6287, 0.6269, 0.6300, 0.6489,
0.5930, 0.5078, 0.4394, 0.4356, 0.4833, 0.5073, 0.4978, 0.5048, 0.5581,
0.5763, 0.5643, 0.5433, 0.5000, 0.4538, 0.4070, 0.3933, 0.4035, 0.4480,
0.4974, 0.5308, 0.5798, 0.6177, 0.6273, 0.5867, 0.5171, 0.4893, 0.5026,
0.5185, 0.4867, 0.4580, 0.4593, 0.4841, 0.4938, 0.4379, 0.4310, 0.4347,
0.4144, 0.4306, 0.4153, 0.3726, 0.3896, 0.4424, 0.4551, 0.4470, 0.4457,
0.4123, 0.4017, 0.4501, 0.4729, 0.4474, 0.4026, 0.4149, 0.4834, 0.5366,
0.5309, 0.5146, 0.5135, 0.4979, 0.5010, 0.5205, 0.5114, 0.4936, 0.5022,
0.5001, 0.4994, 0.5138, 0.5533, 0.5559, 0.5070, 0.5088, 0.5375, 0.5386,
0.5319, 0.5268], device='cuda:0', grad_fn=<ReshapeAliasBackward0>)
f_graphed tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0.], device='cuda:0',
grad_fn=<GraphedBackward>)
11.7        (torch.version.cuda)
2.0.1+cu117 (torch.__version__)
2.0.2+cu117 (torchaudio.__version__)
What am I doing wrong here?
Thanks