If any of the tasks are independent of the audio file, you can perform them in the __init__ method. Otherwise, instead of doing everything in the __getitem__ method, you can write custom transforms and apply them in whatever order you want. I am not sure whether this is more efficient than doing everything in __getitem__, but you can try it and check.
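To make the custom-transform idea concrete, here is a rough sketch in plain Python. All the names (Compose, PeakNormalize, FixedLength, AudioDataset) are made up for illustration, and the hand-written Compose only stands in for something like torchvision.transforms.Compose. The clips are plain lists of floats instead of tensors so the example runs without any dependencies.

```python
class Compose:
    """Apply a list of callables in the given order (illustrative helper)."""

    def __init__(self, transforms):
        self.transforms = list(transforms)

    def __call__(self, samples):
        for transform in self.transforms:
            samples = transform(samples)
        return samples


class PeakNormalize:
    """Scale a clip so that its largest absolute sample is 1.0."""

    def __call__(self, samples):
        peak = max((abs(s) for s in samples), default=0.0)
        if peak == 0.0:
            return list(samples)  # silent or empty clip: nothing to scale
        return [s / peak for s in samples]


class FixedLength:
    """Truncate or zero-pad a clip to exactly `length` samples."""

    def __init__(self, length):
        self.length = length

    def __call__(self, samples):
        samples = list(samples)[: self.length]
        return samples + [0.0] * (self.length - len(samples))


class AudioDataset:
    """Map-style dataset (hypothetical): __getitem__ only fetches and transforms."""

    def __init__(self, clips, transform=None):
        self.clips = clips
        self.transform = transform

    def __len__(self):
        return len(self.clips)

    def __getitem__(self, idx):
        samples = self.clips[idx]
        if self.transform is not None:
            samples = self.transform(samples)
        return samples


dataset = AudioDataset(
    clips=[[0.5, -2.0, 1.0]],
    transform=Compose([PeakNormalize(), FixedLength(5)]),
)
print(dataset[0])
```

Reordering or swapping steps then only means changing the list passed to Compose, and __getitem__ stays short.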
__getitem__ is what runs in the worker processes that the DataLoader spawns when multiprocessing is enabled (num_workers > 0).
The heavy workload should go there (mainly the spectrograms).
As @hash-ir mentions, any other task can be performed in the __init__ function (or even outside the dataset class).
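A minimal sketch of that split, under some assumptions: SpectrumDataset and magnitude_spectrum are invented names, and the naive DFT is only a stand-in for whatever spectrogram routine you actually use. The one-time, audio-independent work (the label mapping) sits in __init__, and the per-sample heavy work sits in __getitem__, which is the part the DataLoader workers would execute.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive DFT magnitudes for bins 0..n/2 (stand-in for a real spectrogram)."""
    n = len(samples)
    return [
        abs(sum(s * cmath.exp(-2j * math.pi * k * t / n)
                for t, s in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]


class SpectrumDataset:
    """Hypothetical map-style dataset: cheap setup once, heavy work per item."""

    def __init__(self, clips, labels):
        # Audio-independent work, done once: build the class vocabulary
        # and turn the string labels into integer targets.
        self.clips = clips
        self.classes = sorted(set(labels))
        self.targets = [self.classes.index(label) for label in labels]

    def __len__(self):
        return len(self.clips)

    def __getitem__(self, idx):
        # Per-sample heavy work: this is the part that runs in the
        # worker processes when the dataset is wrapped in a DataLoader
        # with num_workers > 0.
        return magnitude_spectrum(self.clips[idx]), self.targets[idx]


ds = SpectrumDataset(
    clips=[[1.0, 0.0, -1.0, 0.0], [0.0, 0.0, 0.0, 0.0]],
    labels=["dog", "cat"],
)
spectrum, target = ds[0]
print(spectrum, target)
```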
It all comes down to how much RAM you have.
Do you need to read audio in __getitem__?
Well, if your dataset fits in your RAM, you can preload it in __init__.
You can call any Python function inside __getitem__, so it can still be readable.
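Something along these lines, as a sketch only: PreloadedAudioDataset, featurize and fake_loader are hypothetical names, and the loader is passed in as a callable, so no real audio library is assumed. With preload=True every clip is decoded once in __init__ and kept in RAM. Otherwise it is read on demand in __getitem__, which stays short because the actual work lives in a plain helper function.

```python
def featurize(samples):
    """Plain helper called from __getitem__: mean absolute amplitude."""
    if not samples:
        return 0.0
    return sum(abs(s) for s in samples) / len(samples)


class PreloadedAudioDataset:
    """Hypothetical dataset that optionally keeps all decoded clips in RAM."""

    def __init__(self, paths, loader, preload=True):
        self.paths = list(paths)
        self.loader = loader
        # If the whole dataset fits in RAM, decode every file once here.
        self.cache = [loader(p) for p in self.paths] if preload else None

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        if self.cache is not None:
            samples = self.cache[idx]              # already in memory
        else:
            samples = self.loader(self.paths[idx])  # read on demand
        return featurize(samples)


# Usage with a fake loader that records how often it is called.
calls = []

def fake_loader(path):
    calls.append(path)
    return [1.0, -3.0]

preloaded = PreloadedAudioDataset(["a.wav", "b.wav"], fake_loader)
value = preloaded[0]
preloaded[0]
preloaded[1]
calls_after_preloaded = len(calls)  # loader ran once per file, in __init__

lazy = PreloadedAudioDataset(["a.wav"], fake_loader, preload=False)
lazy[0]
lazy[0]
calls_after_lazy = len(calls)       # loader ran again on every access
```

The trade-off is the one from the thread: preloading saves repeated disk reads at the cost of holding everything in memory.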