Creating a random chunk of image sequences from a video

Some of us might have worked on the KITTI vision benchmarks.
Dataset description:
22 sequences, of which 11 are provided with ground-truth labeling.

Number of frames in the 11 sequences = [4541, 1101, 4661, 801, 271, 2761, 1101, 1101, 4071, 1591, 1201]

Let my dataloader be:

import os

import numpy as np
import torch
from PIL import Image, ImageOps
from torch.utils.data import Dataset

import liefunctions  # custom helper module providing BinarySearch


class KITTIData(Dataset):
    def __init__(self, basedir, sequences, transform=None):
        self.basedir    = basedir
        self.sequences  = sequences
        self.transform  = transform
        self.seq_lens   = []
        self.cumulative = []

        # Each sample is a frame pair (t, t+1), so a sequence with N frames yields N-1 samples.
        for i in self.sequences:
            frames = os.listdir(os.path.join(self.basedir, "sequences", i, "image_2"))
            self.seq_lens.append(len(frames) - 1)

        # Running total of samples, used to map a flat index back to its sequence.
        temp = 0
        for i in self.seq_lens:
            temp = temp + i
            self.cumulative.append(temp)

    def __len__(self):
        return int(np.sum(self.seq_lens))

    def __getitem__(self, index):
        # Find which sequence the flat index falls into.
        seq_key   = liefunctions.BinarySearch(self.cumulative, index)
        seq_index = self.sequences[seq_key]

        # Offset of the sample within that sequence.
        if seq_key == 0:
            offset = index
        else:
            offset = index - self.cumulative[seq_key - 1]

        t1   = offset
        t2   = offset + 1
        img1 = Image.open(os.path.join(self.basedir, "sequences", seq_index, "image_2", "%06d.png" % t1)).convert('RGB')
        #img1 = ImageOps.grayscale(img1)
        img2 = Image.open(os.path.join(self.basedir, "sequences", seq_index, "image_2", "%06d.png" % t2)).convert('RGB')
        #img2 = ImageOps.grayscale(img2)

        if self.transform is not None:
            img1 = self.transform(img1)
            img2 = self.transform(img2)

        # Stack the two consecutive frames along the channel dimension.
        img = torch.cat((img1, img2), 0)

        return img
  1. How can I modify this to generate randomly cropped sub-sequences, for example frames (10 to 20), (45 to 60), … (1053 to 1080), from each sequence?

Any similar example where small chunks of randomly cropped sequences are handled in the dataloader would help me a lot.

Thank you for your effort in advance!!

@ptrblck @albanD @tom can I get some suggestions on this?

If you have the image sequence as an array-like of shape vid = [w, h, c, n_frames], you can use np.split and specify the indices at which to split in order to extract the cropped sub-sequences:
random_sequence = np.split(vid, indices_or_sections=[10, 20, 45, 60, ...], axis=3)
For this, you have to load the whole sequence into memory, which might be a bit memory-heavy.
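
A minimal runnable sketch of that idea with a dummy array (the shape and the split indices are made up purely for illustration):

import numpy as np

# Dummy "video" of shape width x height x channels x n_frames.
vid = np.random.rand(64, 64, 3, 100)

# Splitting at [10, 20, 45, 60] produces chunks covering frames
# 0-9, 10-19, 20-44, 45-59 and 60-99 along the frame axis.
chunks = np.split(vid, indices_or_sections=[10, 20, 45, 60], axis=3)

for chunk in chunks:
    print(chunk.shape)  # (64, 64, 3, 10), (64, 64, 3, 10), (64, 64, 3, 25), ...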

Otherwise, you can just define t1 and t2 in __getitem__ as:

t1 = random.sample(range(0, len(sequence-max_n_frames_wanted), 1)
t2 = random.sample(range(0, len(max_n_frames_wanted), 1)

and then load the frames sequentially in a for loop from t1 to t2 and use torch.cat in the end as you do.
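
Roughly, that loop could look like the following (a hypothetical sketch; seq_dir, seq_len and max_n_frames_wanted are assumed names, and random.randrange is used here to draw single values):

import os
import random

import torch
from PIL import Image

def load_random_chunk(seq_dir, seq_len, max_n_frames_wanted, transform):
    # Pick a random start t1 and a random chunk length t2,
    # then load frames t1 .. t1 + t2 - 1 sequentially.
    t1 = random.randrange(0, seq_len - max_n_frames_wanted)
    t2 = random.randrange(1, max_n_frames_wanted + 1)

    frames = []
    for t in range(t1, t1 + t2):
        img = Image.open(os.path.join(seq_dir, "%06d.png" % t)).convert('RGB')
        frames.append(transform(img))

    # Concatenate the loaded frames along the channel dimension, as in the original code.
    return torch.cat(frames, 0)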

Hope this helps.

I did not get it.

The variables I am using, t1 and t2, are the frames at timestamps t and t+1.
From the above code:

t1 = random.sample(range(0, len(sequence-max_n_frames_wanted), 1)
t2 = random.sample(range(0, len(max_n_frames_wanted), 1)

I assume the number of samples to be drawn would be, for instance, k = 10 (that argument is missing in random.sample()), or, if we take k = max_n_frames_wanted (say 10) and sequence = 400, then:

t1 = [0, 1, 2, 3, 4,…389]
t2 = [0, 1, 2, 3, 4,…9]

I hope I am getting it correctly.

But my main concern is that, when we use random sampling, the length in def __len__(self) varies, and the index should still point to the right sequence so that the images can be retrieved from disk.

First, you can set __len__(self) to be the number of sequences you have, so in this case 11. So instead of return np.sum(self.seq_lens),

you can specify it as len(self.seq_lens). Sure, an epoch will be smaller, but you can train for more epochs.
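
For example, a minimal version of that change (just restating the suggestion above):

    def __len__(self):
        # One "sample" per sequence; each __getitem__ call then draws a random chunk from it.
        return len(self.seq_lens)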

Then, I assumed that you wanted a variable number of frames (but in sequential order) on each __getitem__(self, idx) call, so that’s what I specified as t1 and t2. At each call, you would get a sequence index and then randomly pick t1 and t2 in that sequence. Then take all the frames between t1 and t1+t2.

If you want to fix the number of frames then you can simply take frames between t1 and t1+k. Then, t1 can be specified as:
t1 = random.sample(range(0, self.seq_lens[idx]-k), 1)

If you don’t want to take sequential frames (t1, t1+1, ..., t1+k), just replace 1 with k in t1 and remove t2.
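
Putting this together, a rough sketch of such a __getitem__ (the fixed chunk length k, the path layout and self.transform are assumptions carried over from the code in the question; random.sample returns a list, hence the [0], and import random is assumed at the top):

    def __getitem__(self, idx):
        # idx now indexes a sequence directly, since __len__ returns the number of sequences.
        seq_index = self.sequences[idx]
        seq_dir   = os.path.join(self.basedir, "sequences", seq_index, "image_2")

        k  = 10  # assumed fixed chunk length
        t1 = random.sample(range(0, self.seq_lens[idx] - k), 1)[0]

        frames = []
        for t in range(t1, t1 + k):
            img = Image.open(os.path.join(seq_dir, "%06d.png" % t)).convert('RGB')
            frames.append(self.transform(img))

        # Concatenate the k consecutive frames along the channel dimension.
        return torch.cat(frames, 0)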

Hope this clears it up.