In sound event detection, how do I adjust the labels of synthetic data after augmentation?

Hi, I’m trying to participate in DCASE 2020 Task 4, a sound event detection task.
While modifying the baseline code, I found that augmentation is applied during training.

As you know, the synthetic data have an event label plus onset and offset times.
So when I augment the data with a time transform, I need to adjust the onset and offset labels to match.
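
To make this concrete, here is a minimal sketch of the kind of adjustment I mean. It is not taken from the baseline: the function names, the (label, onset, offset) layout in seconds, and the 10-second clip length in the example are my own assumptions for illustration.

```python
# Hypothetical helpers, not part of the DCASE baseline.
# Each event is assumed to be a (label, onset_s, offset_s) tuple in seconds.

def shift_events(events, shift_s, clip_s):
    """Move every event by shift_s seconds and clip it to [0, clip_s]."""
    shifted = []
    for label, onset, offset in events:
        new_onset = max(0.0, onset + shift_s)
        new_offset = min(clip_s, offset + shift_s)
        # Drop events that end up entirely outside the clip.
        if new_offset > new_onset:
            shifted.append((label, new_onset, new_offset))
    return shifted


def stretch_events(events, rate):
    """Rescale event times for a time stretch; rate > 1 means faster audio."""
    return [(label, onset / rate, offset / rate)
            for label, onset, offset in events]


events = [("Speech", 1.0, 3.0), ("Dog", 8.0, 9.5)]
shifted = shift_events(events, shift_s=2.0, clip_s=10.0)
stretched = stretch_events(events, rate=2.0)
```

With these assumptions, shifting by 2 seconds moves "Speech" to 3.0-5.0 s and drops "Dog" because it falls off the end of the clip, while stretching with rate 2.0 halves every onset and offset.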

This is where I’m stuck.
So my questions are:

  1. When the data have been augmented and the tuple holds two items, how should I modify the labels?

  2. When the tuple holds multiple data features, how does PyTorch link them with the labels?
    (e.g. data items in tuple: 4, labels: 2)

Thank you in advance for your help.