WeightedRandomSampler Image Sequences list index out of range

I have a dataset of image sequences, so with batch size 1 my __getitem__ method returns a sequence of images and a sequence of labels of shape [9] (sequence length 9). I am trying to use the WeightedRandomSampler, but I am running into an error in __getitem__ that does not occur without the sampler.

I first used this approach:

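import numpy as np
import torch
from torch.utils.data import WeightedRandomSampler

# number of labels per class (class 0, class 1)
class_sample_count = [6935, 805]
# inverse class frequency as the per-class weight
weights = 1 / torch.Tensor(class_sample_count)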

# load labels once to later get weights
targets = []
for _, labels in train_set:
    targets.append(np.array([weights[t] for t in labels]))
    
# transform to numpy array
targets = np.asarray(targets)

samples_weight = torch.from_numpy(targets)
samples_weight = samples_weight.double()
sampler = WeightedRandomSampler(samples_weight, len(samples_weight))

samples_weight had shape torch.Size([860, 9]), and I got the following error:

list indices must be integers or slices, not list

I then did the same, but flattened the 2D list to a 1D list before converting it into a NumPy array:

from itertools import chain

class_sample_count = [6935, 805]
weights = 1 / torch.Tensor(class_sample_count)

# load labels once to later get weights
targets = []
for _, labels in train_set:
    targets.append(np.array([weights[t] for t in labels]))
    
# flatten list (2D to 1D)
targets = list(chain.from_iterable(targets))

# transform to numpy array
targets = np.asarray(targets)

samples_weight = torch.from_numpy(targets)
samples_weight = samples_weight.double()
sampler = WeightedRandomSampler(samples_weight, len(samples_weight))

but then got the following error:

list index out of range

In my __getitem__ method, I am getting a sequence of images with seq = self.data[idx].
How can I handle sequences with the WeightedRandomSampler?

If I understand your use case correctly, each index passed to __getitem__ loads 9 samples and the corresponding 9 targets (9 being the sequence length).
If that’s the case, you would need to create one weight value per sample (i.e. per sequence), not one per element of the sequence.
The first approach creates samples_weight in the shape [nb_samples, seq_len], while the second flattens it to nb_samples * seq_len values.
Note that the sampler yields index values in the range [0, len(weights)), which are passed to __getitem__, so it needs exactly nb_samples weight values. With the flattened weights it can draw indices up to nb_samples * seq_len - 1, which are out of range for your dataset.
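To illustrate the index range, here is a minimal sketch that uses random placeholder weights instead of your real ones (nb_samples = 860 and seq_len = 9 are taken from your shapes; everything else is made up for the demo). It compares the indices drawn from flattened weights with those drawn from one weight per sequence:

```python
import torch
from torch.utils.data import WeightedRandomSampler

nb_samples, seq_len = 860, 9  # shapes taken from the question
torch.manual_seed(0)

# Flattened weights: one value per element of every sequence (860 * 9 values).
# The values are random placeholders, not real class weights.
flat_weights = torch.rand(nb_samples * seq_len, dtype=torch.double)
flat_sampler = WeightedRandomSampler(flat_weights, num_samples=len(flat_weights))
flat_indices = list(flat_sampler)

# One weight per sequence: the sampler can only draw valid dataset indices.
per_seq_weights = torch.rand(nb_samples, dtype=torch.double)
seq_sampler = WeightedRandomSampler(per_seq_weights, num_samples=nb_samples)
seq_indices = list(seq_sampler)

print("max index with flattened weights:", max(flat_indices))
print("max index with per-sequence weights:", max(seq_indices))
```

The first sampler can return indices far beyond 859, which would explain the "list index out of range" raised by self.data[idx], while the second one stays within the dataset length.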

Thanks for your reply! Yes, that’s exactly how my __getitem__ method works.
So the sampler should have 860 weight values (the number of sequences) instead of the 860*9 values I get with the flattening approach?
I am not quite sure how to do that, since this code snippet

targets.append(np.array([weights[t] for t in labels]))

computes a weight for each element of the sequence, and a sequence can contain targets from both classes (0 and 1). So how can I get a weight for the whole sequence?

It’s not straightforward to create weights for samples containing multiple targets. You could take a look at this post, which explains some potentially useful methods.
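As one possible starting point (a heuristic, not necessarily what the linked post recommends), you could reduce the per-element class weights of each sequence to a single value, e.g. by taking their mean. The sketch below assumes the labels of all sequences can be stacked into a tensor of shape [860, 9]; all_labels is a randomly generated stand-in for that tensor, and the mean reduction is an assumption you may want to replace (e.g. with max, or with a weight based on whether the sequence contains class 1 at all):

```python
import torch
from torch.utils.data import WeightedRandomSampler

# Class counts from the question; the weight is the inverse class frequency.
class_sample_count = torch.tensor([6935.0, 805.0])
class_weights = 1.0 / class_sample_count

# Hypothetical stand-in for the real labels: 860 sequences with 9 binary labels each.
torch.manual_seed(0)
all_labels = (torch.rand(860, 9) < 0.1).long()

# Look up the class weight of every element, shape [860, 9].
per_element_weights = class_weights[all_labels]

# Reduce to one weight per sequence (mean is an assumed choice), shape [860].
samples_weight = per_element_weights.mean(dim=1).double()

sampler = WeightedRandomSampler(
    samples_weight, num_samples=len(samples_weight), replacement=True
)
```

With this reduction, sequences containing more elements of the rarer class get a larger weight and are drawn more often, and the sampler only yields indices between 0 and 859. Whether this balances the classes well enough depends on how the labels are distributed within your sequences.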

Thanks, I’ll take a look!