I’m trying to prepare some audio data for a DataLoader, but I’m struggling with a few points. At the moment my data is organised into two lists, inputs and target, both of length 32, whose elements have different shapes: inputs[0].shape = (8, 3690288) (8 mono audio tracks) and target[0].shape = (2, 3690288) (a single stereo mix).
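In case it helps to reproduce, a dummy version of my data would look like this (with much shorter tracks than the real 3690288 samples, so it runs quickly):

```python
import numpy as np

# Dummy stand-ins for my real data: 32 examples,
# 8 mono input tracks and a 2-channel stereo target each.
inputs = [np.random.randn(8, 1000).astype(np.float32) for _ in range(32)]
target = [np.random.randn(2, 1000).astype(np.float32) for _ in range(32)]

print(len(inputs), inputs[0].shape)  # 32 (8, 1000)
print(len(target), target[0].shape)  # 32 (2, 1000)
```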
I’ve converted each list to a tensor by:
tensor_inputs = torch.Tensor(inputs)
tensor_target = torch.Tensor(target)
which seems to work: tensor_inputs.shape = torch.Size([32, 8, 3690288]). I’ve then tried to convert each of these to a mel spectrogram:
mel_spectrogram = torchaudio.transforms.MelSpectrogram(
    sample_rate=44100,
    n_fft=1024,
    hop_length=512,
    n_mels=64)
tensor_input_specs = []
for i in range(len(tensor_inputs)):
    spec = mel_spectrogram(tensor_inputs[i])
    tensor_input_specs.append(spec)

tensor_target_specs = []
for i in range(len(tensor_target)):
    spec = mel_spectrogram(tensor_target[i])
    tensor_target_specs.append(spec)
and then tried to move these into a DataLoader by doing:
dataset = TensorDataset(tensor_input_specs, tensor_target_specs)
loader = DataLoader(dataset)
However I get the following error: AttributeError: 'list' object has no attribute 'size', which I imagine is due to the fact that I’m appending the spectrograms to a Python list rather than combining them into a single tensor, but I’m not sure how else to achieve this.
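From searching around, I suspect I need torch.stack to turn the two lists into single tensors before building the dataset. With dummy spectrogram-shaped tensors (shapes guessed, since I haven’t got past the error), something like this runs for me, but I’m not sure it’s the right approach:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Dummy spectrograms standing in for the real ones:
# 32 examples, each (channels, n_mels, time_frames).
tensor_input_specs = [torch.randn(8, 64, 100) for _ in range(32)]
tensor_target_specs = [torch.randn(2, 64, 100) for _ in range(32)]

# torch.stack turns a list of equally-shaped tensors into one
# tensor with a new leading (example) dimension.
stacked_inputs = torch.stack(tensor_input_specs)    # (32, 8, 64, 100)
stacked_targets = torch.stack(tensor_target_specs)  # (32, 2, 64, 100)

dataset = TensorDataset(stacked_inputs, stacked_targets)
loader = DataLoader(dataset, batch_size=4)

xb, yb = next(iter(loader))
print(xb.shape, yb.shape)  # torch.Size([4, 8, 64, 100]) torch.Size([4, 2, 64, 100])
```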
Any help is appreciated.