The same error occurs with multiple torchaudio.datasets

I get an error when trying to use torchaudio's datasets.
I have tried two:
torchaudio.datasets.LIBRISPEECH
torchaudio.datasets.TEDLIUM

Code (I switch between TEDLIUM and LIBRISPEECH by commenting one out with """):

    train_dataset = torchaudio.datasets.TEDLIUM(root='data',
                                                release='release1',
                                                subset='train',
                                                download=True,
                                                audio_ext='.sph')
    """
    train_dataset = torchaudio.datasets.LIBRISPEECH(root='data',
                                                    url='train-clean-100',
                                                    folder_in_archive='LibriSpeech',
                                                    download=True)
    print("test")
    """
    test_dataset = torchaudio.datasets.TEDLIUM(root='data',
                                               release='release1',
                                               subset='test',
                                               download=True,
                                               audio_ext='.sph')
    """
    test_dataset = torchaudio.datasets.LIBRISPEECH(root='data',
                                                   url='test-clean',
                                                   folder_in_archive='LibriSpeech',
                                                   download=True)
    """



    train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)

    test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                           batch_size=batch_size,
                                           shuffle=False)

    print("start")
    for i, (inputs) in enumerate(train_loader):
        print(type(inputs))

LIBRISPEECH

  File "test3.py", line 194, in <module>
    main()
  File "test3.py", line 188, in main
    for i, (inputs) in enumerate(train_loader):
  File "C:\Users\PC_User\Anaconda3\envs\newflan22a08m14d\lib\site-packages\torch\utils\data\dataloader.py", line 681, in __next__
    data = self._next_data()
  File "C:\Users\PC_User\Anaconda3\envs\newflan22a08m14d\lib\site-packages\torch\utils\data\dataloader.py", line 721, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "C:\Users\PC_User\Anaconda3\envs\newflan22a08m14d\lib\site-packages\torch\utils\data\_utils\fetch.py", line 52, in fetch
    return self.collate_fn(data)
  File "C:\Users\PC_User\Anaconda3\envs\newflan22a08m14d\lib\site-packages\torch\utils\data\_utils\collate.py", line 175, in default_collate
    return [default_collate(samples) for samples in transposed]  # Backwards compatibility.
  File "C:\Users\PC_User\Anaconda3\envs\newflan22a08m14d\lib\site-packages\torch\utils\data\_utils\collate.py", line 175, in <listcomp>
    return [default_collate(samples) for samples in transposed]  # Backwards compatibility.
  File "C:\Users\PC_User\Anaconda3\envs\newflan22a08m14d\lib\site-packages\torch\utils\data\_utils\collate.py", line 141, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [1, 27520] at entry 0 and [1, 34240] at entry 1

TEDLIUM

Traceback (most recent call last):
  File "voice_train2.py", line 199, in <module>
    main()
  File "voice_train2.py", line 181, in main
    _train_loss = train_fn(model, train_loader, criterion, optimizer, device=device,batch_size=batch_size)
  File "voice_train2.py", line 34, in train_fn
    for i, (inputs, labels) in enumerate(train_loader):
  File "C:\Users\PC_User\Anaconda3\envs\newflan22a08m14d\lib\site-packages\torch\utils\data\dataloader.py", line 681, in __next__
    data = self._next_data()
  File "C:\Users\PC_User\Anaconda3\envs\newflan22a08m14d\lib\site-packages\torch\utils\data\dataloader.py", line 721, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "C:\Users\PC_User\Anaconda3\envs\newflan22a08m14d\lib\site-packages\torch\utils\data\_utils\fetch.py", line 52, in fetch
    return self.collate_fn(data)
  File "C:\Users\PC_User\Anaconda3\envs\newflan22a08m14d\lib\site-packages\torch\utils\data\_utils\collate.py", line 175, in default_collate
    return [default_collate(samples) for samples in transposed]  # Backwards compatibility.
  File "C:\Users\PC_User\Anaconda3\envs\newflan22a08m14d\lib\site-packages\torch\utils\data\_utils\collate.py", line 175, in <listcomp>
    return [default_collate(samples) for samples in transposed]  # Backwards compatibility.
  File "C:\Users\PC_User\Anaconda3\envs\newflan22a08m14d\lib\site-packages\torch\utils\data\_utils\collate.py", line 141, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [1, 244080] at entry 0 and [1, 251520] at entry 1

I don't know if it's related, but I also get this warning:

C:\Users\PC_User\Anaconda3\envs\newflan22a08m14d\lib\site-packages\torchvision\io\image.py:13: UserWarning: Failed to load image Python extension:
  warn(f"Failed to load image Python extension: {e}")

Based on the error message, the DataLoader isn't able to collate the samples into one batch since their sizes in dim1 differ.
I don't know what this dimension refers to in your use case, but if it's e.g. the sequence length, you might want to cut all signals to the same duration, resample them, etc.
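For instance, a minimal sketch of cutting/padding every waveform to a fixed number of samples so the default collate_fn can stack them (the 16000-sample target and the helper name `fix_length` are arbitrary assumptions, not part of the torchaudio API):

```python
import torch

def fix_length(waveform: torch.Tensor, num_samples: int = 16000) -> torch.Tensor:
    # waveform has shape [channels, time]
    time = waveform.size(-1)
    if time >= num_samples:
        return waveform[..., :num_samples]                  # crop the tail
    pad = num_samples - time
    return torch.nn.functional.pad(waveform, (0, pad))      # zero-pad at the end

wave = torch.randn(1, 27520)
print(fix_length(wave, 16000).shape)  # torch.Size([1, 16000])
```

You would apply this to each sample (e.g. in a small Dataset wrapper) before the DataLoader batches them.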

Thank you for your answer, but

C:\Users\PC_User\Anaconda3\envs\newflan22a08m14d\lib\site-packages\torch\utils\data\_utils\collate.py is not a function I wrote, but one that ships with PyTorch. Since the download was also done with download=True, does that mean there's a problem with torchaudio?
What looks suspicious is the "Failed to load image Python extension: {e}" warning, which appeared during the conda install stage.

About which dim is at issue:
the most striking thing is that the sample lengths of the audio data differ.
Therefore, it fails to stack them into a (batch, seq) array.

So there's a problem in torch.utils.data.DataLoader, but it would be quite annoying to stop using torch.utils.data.DataLoader. Is there a good way to deal with this?

If you explicitly want to use samples with a different length, you could search this forum for solutions using a custom collate_fn and make sure your model is able to process them somehow. E.g. you could return a list instead of a stacked tensor.

On the other hand, you can make sure that each sample has the same shape such that the default collate_fn can create a single batch tensor which you can pass to the model directly.

This behavior is expected and not an error in the DataLoader, as it's the user's responsibility to either provide samples in the same shape or to provide a custom collate_fn otherwise.
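As a hedged sketch of the custom-collate_fn route (assuming each dataset item is a tuple whose first element is a [1, time] waveform, as LIBRISPEECH returns; the names `pad_collate`, `lengths`, and `rest` are made up for illustration):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

def pad_collate(batch):
    # batch is a list of dataset items; item[0] is the waveform [1, time]
    waveforms = [item[0].squeeze(0) for item in batch]       # [time] each
    lengths = torch.tensor([w.size(0) for w in waveforms])   # original lengths
    padded = pad_sequence(waveforms, batch_first=True)       # [batch, max_time]
    rest = [item[1:] for item in batch]                      # sample_rate, transcript, ...
    return padded, lengths, rest

# Quick check with dummy items shaped like the failing samples:
batch = [(torch.randn(1, 27520), 16000, "a"), (torch.randn(1, 34240), 16000, "b")]
padded, lengths, rest = pad_collate(batch)
print(padded.shape)  # torch.Size([2, 34240])
```

You would then pass it to the loader, e.g. `DataLoader(train_dataset, batch_size=batch_size, shuffle=True, collate_fn=pad_collate)`, and use `lengths` in the model to mask or pack the padded positions.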

Thank you. Thanks to you, I may be able to get it working somehow.
However,
TEDLIUM and LIBRISPEECH ship with torchaudio by default, so do you mean that you have to write your own collate_fn to use them?