How to use StreamReader on Linux

I’m trying to follow the PyTorch tutorials that explain how to work with audio files and devices. The examples in the StreamReader Advanced Usages tutorial are written for macOS, which I don’t have. I’m on Linux and I’m having a hard time adapting them.

For one thing, the ffmpeg version that works with torchaudio is earlier than 4.4, and when I install torchvision (using conda), ffmpeg 4.3 gets installed. So far, no complaints. The problem is that, for some reason, the installed ffmpeg 4.3 does not list the devices I need:

ffmpeg -devices
ffmpeg version 4.3 Copyright (c) 2000-2020 the FFmpeg developers
  built with gcc 7.3.0 (crosstool-NG 1.23.0.449-a04d0)
  configuration: --prefix=/opt/conda/conda-bld/ffmpeg_1597178665428/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeh --cc=/opt/conda/conda-bld/ffmpeg_1597178665428/_build_env/bin/x86_64-conda_cos6-linux-gnu-cc --disable-doc --disable-openssl --enable-avresample --enable-gnutls --enable-hardcoded-tables --enable-libfreetype --enable-libopenh264 --enable-pic --enable-pthreads --enable-shared --disable-static --enable-version3 --enable-zlib --enable-libmp3lame
  libavutil      56. 51.100 / 56. 51.100
  libavcodec     58. 91.100 / 58. 91.100
  libavformat    58. 45.100 / 58. 45.100
  libavdevice    58. 10.100 / 58. 10.100
  libavfilter     7. 85.100 /  7. 85.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  7.100 /  5.  7.100
  libswresample   3.  7.100 /  3.  7.100
Devices:
 D. = Demuxing supported
 .E = Muxing supported
 --
 DE fbdev           Linux framebuffer
 D  lavfi           Libavfilter virtual input device
 DE oss             OSS (Open Sound System) playback
 DE video4linux2,v4l2 Video4Linux2 output device

Meanwhile, the ffmpeg n5.1.2 that is installed system-wide on my machine works with all of my devices perfectly fine:

ffmpeg -devices
ffmpeg version n5.1.2 Copyright (c) 2000-2022 the FFmpeg developers
  built with gcc 12.2.0 (GCC)
  configuration: --prefix=/usr --disable-debug --disable-static --disable-stripping --enable-amf --enable-avisynth --enable-cuda-llvm --enable-lto --enable-fontconfig --enable-gmp --enable-gnutls --enable-gpl --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libdav1d --enable-libdrm --enable-libfreetype --enable-libfribidi --enable-libgsm --enable-libiec61883 --enable-libjack --enable-libmfx --enable-libmodplug --enable-libmp3lame --enable-libopencore_amrnb --enable-libopencore_amrwb --enable-libopenjpeg --enable-libopus --enable-libpulse --enable-librav1e --enable-librsvg --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtheora --enable-libv4l2 --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxcb --enable-libxml2 --enable-libxvid --enable-libzimg --enable-nvdec --enable-nvenc --enable-opencl --enable-opengl --enable-shared --enable-version3 --enable-vulkan
  libavutil      57. 28.100 / 57. 28.100
  libavcodec     59. 37.100 / 59. 37.100
  libavformat    59. 27.100 / 59. 27.100
  libavdevice    59.  7.100 / 59.  7.100
  libavfilter     8. 44.100 /  8. 44.100
  libswscale      6.  7.100 /  6.  7.100
  libswresample   4.  7.100 /  4.  7.100
  libpostproc    56.  6.100 / 56.  6.100
Devices:
 D. = Demuxing supported
 .E = Muxing supported
 --
 DE alsa            ALSA audio output
 DE fbdev           Linux framebuffer
 D  iec61883        libiec61883 (new DV1394) A/V input device
 D  jack            JACK Audio Connection Kit
 D  kmsgrab         KMS screen capture
 D  lavfi           Libavfilter virtual input device
  E opengl          OpenGL output
 DE oss             OSS (Open Sound System) playback
 DE pulse           Pulse audio output
  E sdl,sdl2        SDL2 output device
 DE video4linux2,v4l2 Video4Linux2 output device
 D  x11grab         X11 screen capture, using XCB
  E xv              XV (XVideo) output device

More precisely, I was hoping to see pulse among the devices of ffmpeg 4.3 so that I could use my microphone to read a live audio stream. As it stands, I can’t get anything to work. I even tested with my own ffmpeg n5.1.2 (without installing torchvision), but then StreamReader does not find ffmpeg at all:

StreamReader(
    src="1",
    format="pulse",
)

RuntimeError: StreamReader requires FFmpeg extension which is not available. Please refer to the stacktrace above for how to resolve this.

I would appreciate it if someone could point me to some examples of how to use StreamReader on Linux.

Thanks.

After some struggle, I managed to compile ffmpeg 4.4.3 on my system with pulse and jack listed as devices it recognizes.

I can even read from my webcam:

StreamReader(
    src="/dev/video0",
    format="v4l2",
)

But when I try to read an audio stream:

StreamReader(
    src="3",
    format="pulse",
)

It errors out:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[5], line 1
----> 1 StreamReader(
      2     src="3",
      3     format="pulse",
      4 )

File ~/.conda/envs/pytorch/lib/python3.10/site-packages/torchaudio/io/_stream_reader.py:467, in StreamReader.__init__(self, src, format, option, buffer_size)
    465 torch._C._log_api_usage_once("torchaudio.io.StreamReader")
    466 if isinstance(src, str):
--> 467     self._be = torch.classes.torchaudio.ffmpeg_StreamReader(src, format, option)
    468 elif isinstance(src, torch.Tensor):
    469     self._be = torch.classes.torchaudio.ffmpeg_StreamReaderTensor(src, format, option, buffer_size)

RuntimeError: Unsupported device/format: "pulse"

This happens even though ffmpeg lists pulse as one of its devices:

$ ffmpeg -devices
ffmpeg version 4.4.3 Copyright (c) 2000-2022 the FFmpeg developers
  built with gcc 12.2.1 (GCC) 20230201
  configuration: --enable-libjack --enable-libpulse --enable-opengl --prefix=/home/mehran/.conda/pkgs/ffmpeg-4.4.3
  libavutil      56. 70.100 / 56. 70.100
  libavcodec     58.134.100 / 58.134.100
  libavformat    58. 76.100 / 58. 76.100
  libavdevice    58. 13.100 / 58. 13.100
  libavfilter     7.110.100 /  7.110.100
  libswscale      5.  9.100 /  5.  9.100
  libswresample   3.  9.100 /  3.  9.100
Devices:
 D. = Demuxing supported
 .E = Muxing supported
 --
 DE alsa            ALSA audio output
 DE fbdev           Linux framebuffer
 D  jack            JACK Audio Connection Kit
 D  lavfi           Libavfilter virtual input device
  E opengl          OpenGL output
 DE oss             OSS (Open Sound System) playback
 DE pulse           Pulse audio output
  E sdl,sdl2        SDL2 output device
 DE video4linux2,v4l2 Video4Linux2 output device
 D  x11grab         X11 screen capture, using XCB
  E xv              XV (XVideo) output device

Has anyone ever managed to work with StreamReader on Linux?

I just learned about get_input_devices():

from torchaudio.utils.ffmpeg_utils import get_input_devices

for k, v in get_input_devices().items():
    print(f"{k}: {v}")

And it is returning:

fbdev: Linux framebuffer
lavfi: Libavfilter virtual input device
oss: OSS (Open Sound System) capture
video4linux2,v4l2: Video4Linux2 device grab

This means that my effort to install ffmpeg 4.4.3 with pulse support was futile: torchaudio still does not see pulse. Back to square one. Does anyone know how to install ffmpeg for torchaudio on Linux with pulse support?
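
For anyone who wants to compare the two lists directly, here is a minimal sketch that parses the output of ffmpeg -devices and reports the input devices that torchaudio does not see. The function names are made up for illustration, and it assumes the ffmpeg binary on PATH is the one you want to compare against. A non-empty result would suggest that torchaudio is loading different FFmpeg libraries than the ones behind the command-line tool.

```python
import re
import subprocess


def parse_cli_input_devices(text):
    """Collect demuxing-capable device names from `ffmpeg -devices` output."""
    names = set()
    in_table = False
    for line in text.splitlines():
        if line.strip() == "--":
            in_table = True  # device rows start after the "--" separator
            continue
        if not in_table:
            continue
        # Row layout: optional space, "D" or blank, "E" or blank, then the name.
        match = re.match(r"\s?([D ])([E ])\s+(\S+)", line)
        if match and match.group(1) == "D":
            # "video4linux2,v4l2" is one device with two aliases.
            names.update(match.group(3).split(","))
    return names


def visible_aliases(devices):
    """Flatten keys such as "video4linux2,v4l2" into individual aliases."""
    return {alias for key in devices for alias in key.split(",")}


def missing_from_torchaudio(cli_text, torchaudio_devices):
    """Return input devices the CLI lists but torchaudio does not."""
    missing = parse_cli_input_devices(cli_text) - visible_aliases(torchaudio_devices)
    return sorted(missing)


def report():
    """Run the comparison on this machine (needs ffmpeg and torchaudio)."""
    from torchaudio.utils.ffmpeg_utils import get_input_devices

    cli_text = subprocess.run(
        ["ffmpeg", "-hide_banner", "-devices"],
        capture_output=True,
        text=True,
    ).stdout
    return missing_from_torchaudio(cli_text, get_input_devices())
```

Calling report() in the same environment where StreamReader fails returns the device names that only the command-line ffmpeg knows about.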

Hi Mehran, I am running into the same issue. Have you found a solution?

Hi all - had this issue as well. First, I ran pacmd list-sources to find my device card. I have a 6-channel array on card 4 that samples at 16000 Hz.

To get the StreamReader started, I did the following:
import torchaudio
streamer = torchaudio.io.StreamReader(src="hw:4", format="alsa", option={"sample_rate": "16000", "channels": "6"})

The option names come directly from the FFmpeg alsa device options, which are listed in the FFmpeg Devices Documentation.
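
Once the reader opens, pulling audio out of it could look roughly like the sketch below. The helper names (alsa_options, read_chunks) and the default chunk sizes are my own choices for illustration; the card number, sample rate and channel count are whatever your hardware reports. I have only assumed the usual add_basic_audio_stream / stream calls from torchaudio.io.StreamReader, so treat it as a starting point rather than a tested recipe.

```python
def alsa_options(sample_rate, channels):
    """Build the option dict for the alsa device; FFmpeg expects string values."""
    return {"sample_rate": str(sample_rate), "channels": str(channels)}


def read_chunks(card, sample_rate, channels, num_chunks=10, frames_per_chunk=1600):
    """Read a few chunks from ALSA card `card` and return them as a list."""
    # Imported lazily so alsa_options() works without torchaudio installed.
    from torchaudio.io import StreamReader

    reader = StreamReader(
        src=f"hw:{card}",
        format="alsa",
        option=alsa_options(sample_rate, channels),
    )
    # One output stream; each chunk holds `frames_per_chunk` frames.
    reader.add_basic_audio_stream(frames_per_chunk=frames_per_chunk)

    chunks = []
    for (chunk,) in reader.stream():
        if chunk is None:
            continue  # nothing buffered yet for this stream
        chunks.append(chunk)  # tensor of shape (frames, channels)
        if len(chunks) >= num_chunks:
            break
    return chunks
```

For the 6-channel array above, that would be read_chunks(card=4, sample_rate=16000, channels=6).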