__getitem__ in custom dataset is called twice

antran96 · June 23, 2020, 2:19am

I am building a custom Dataset that reads audio from a path (stored in a pandas DataFrame). When I print the path in __getitem__, the same path is printed twice, which suggests the same item is being processed twice. I'm not sure whether everything is working properly.

Here is my custom Dataset:

import pandas as pd
import torch

# BASE_DIR, build_spectogram, do_random_crop and freq_mask are defined elsewhere.

class BirdDatasetTrain:
    def __init__(self, folds, freq_mask=False, crop=512):
        df = pd.read_csv("./input/train_folds.csv")

        # Keep only the needed columns and the rows belonging to the requested folds
        df = df[["filename", "ebird_code", "ebird_lbl", "duration", "kfold"]]
        df = df[df.kfold.isin(folds)].reset_index(drop=True)
        print(df.head())
        self.filenames = df.filename.values
        self.ebird_lbls = df.ebird_lbl.values
        self.ebird_codes = df.ebird_code.values
        self.freq_mask = freq_mask
        self.crop = crop

    def __len__(self):
        return len(self.filenames)

    def __getitem__(self, item):
        fp = BASE_DIR + self.ebird_codes[item] + "/" + self.filenames[item]
        print(fp)  # This is printed twice with the same value
        mel_spec = build_spectogram(fp)
        print(mel_spec)  # This is printed twice with the same value
        mel_spec = do_random_crop(mel_spec, self.crop)

        # Standardize; the small epsilon avoids division by zero
        mel_spec = (mel_spec - mel_spec.mean()) / (mel_spec.std() + 1e-7)

        if self.freq_mask:
            mel_spec = freq_mask(mel_spec)

        # Add a leading channel dimension: (1, height, width)
        mel_spec = mel_spec.reshape([1, mel_spec.shape[0], mel_spec.shape[1]])
        return {
            "audio": torch.tensor(mel_spec, dtype=torch.float),
            "ebird_lbl": torch.tensor(self.ebird_lbls[item], dtype=torch.long),
        }

BTNug · June 23, 2020, 2:43am

Hi @antran96, could you double-check the following:

- Is len() of your BirdDatasetTrain instance greater than 1?
- Do filenames, ebird_lbls, and ebird_codes contain the values you expect?*
- Are you using a batch size of 2?
- In the spectrogram generation step, you could also check whether your I/O is correct by printing inside your code. Since you suspect the same item is being processed twice, try printing the value of “item” in your __getitem__ function.

*You could instantiate the class and inspect all of those attributes first.
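
The checks above could be sketched roughly as follows. This is only a minimal illustration, not part of the original code: CallCountingDataset and duplicated_values are hypothetical helper names, and the toy list of paths stands in for the real dataset. The wrapper counts how often each index is requested, and the helper reports values (for example file paths) that occur more than once in the source data.

```python
from collections import Counter


class CallCountingDataset:
    """Hypothetical wrapper: counts how often each index is requested."""

    def __init__(self, dataset):
        self.dataset = dataset
        self.calls = Counter()

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, item):
        self.calls[item] += 1  # record the request before delegating
        return self.dataset[item]

    def repeated_indices(self):
        # Indices that were requested more than once
        return {i: n for i, n in self.calls.items() if n > 1}


def duplicated_values(values):
    """Return the values that occur more than once, e.g. repeated file paths."""
    counts = Counter(values)
    return {v: n for v, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Toy stand-in for the real dataset: any object with __len__ and __getitem__
    paths = ["a.mp3", "b.mp3", "a.mp3"]
    wrapped = CallCountingDataset(paths)
    for i in range(len(wrapped)):
        wrapped[i]
    print(wrapped.repeated_indices())  # indices fetched more than once
    print(duplicated_values(paths))    # paths listed more than once
```

If repeated_indices() stays empty while the same path is still printed twice, the duplicate is more likely in the CSV itself (the same file listed in two rows) than in how __getitem__ is being called. When wrapping a dataset that is passed to a DataLoader, the counts are only visible in the main process with num_workers=0, because each worker process gets its own copy of the wrapper.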
