Hi ptrblck,
Thank you for your in-depth responses on the forums! I’m currently working on a sequential image classifier using a CNN-Bi-LSTM architecture. Fixed-length sequences (length ~5) of images are fed into my network, which outputs a single label for classification.
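For reference, the model looks roughly like this (heavily simplified; the CNN here is a stand-in for my real feature extractor, and the sizes are placeholders):

import torch
import torch.nn as nn

class CNNBiLSTM(nn.Module):
    def __init__(self, num_classes, cnn_out=128, hidden=64):
        super().__init__()
        # per-frame feature extractor (stand-in for my real CNN)
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(16 * 4 * 4, cnn_out),
        )
        self.lstm = nn.LSTM(cnn_out, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):                  # x: (B, T, C, H, W)
        b, t = x.shape[:2]
        feats = self.cnn(x.flatten(0, 1))  # run the CNN on all frames at once
        out, _ = self.lstm(feats.view(b, t, -1))
        return self.fc(out[:, -1])         # logits from the last timestep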
My current approach to splitting the data has been to create a Dataset that reads in the image paths and class IDs, and processes the index from __getitem__() to generate sequences in a sliding-window fashion. This works perfectly fine, but is it more standard to implement this sequence-partitioning logic in a Sampler (similar to your reply here)?
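For context, here is a rough sketch of the Sampler variant I have in mind; the Dataset would return single frames instead of windows, and group_bounds is a hypothetical list of per-class (start, end) frame-index ranges I’d compute beforehand:

from torch.utils.data import Sampler

class SlidingWindowSampler(Sampler):
    def __init__(self, group_bounds, seq_length):
        # enumerate every valid window of consecutive frame indices,
        # never crossing a class boundary
        self.windows = [
            list(range(i, i + seq_length))
            for start, end in group_bounds
            for i in range(start, end - seq_length + 1)
        ]

    def __iter__(self):
        return iter(self.windows)

    def __len__(self):
        return len(self.windows)

Passed as batch_sampler to a DataLoader, each “batch” would be one stacked sequence, so I’d still need another level of batching on top of it; I’m not sure that’s actually cleaner than doing it in __getitem__().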
I’m currently using random_split for my train, validation, and test splits, and have since realized there is data leakage due to the sliding-window approach: windows that end up in different splits can share frames. How do you recommend fixing this while keeping the model robust (without breaking the sequence-partitioning logic)? If I don’t allocate my testing data in one large chunk (i.e., if I sample windows randomly), I’ll lose a good number of potential sequences.
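One idea I’ve been considering is to split each class’s frames into contiguous chunks before windowing, so no window can ever span a train/val/test boundary; I’d only lose the seq_length - 1 windows that would have straddled each boundary. A rough sketch (the split fractions are placeholders, and I’d have to change MyDataset below to accept a DataFrame instead of a CSV path):

import pandas as pd

def split_contiguously(df, train_frac=0.8, val_frac=0.1):
    # df: the annotations DataFrame with 'image' and 'class' columns
    train_parts, val_parts, test_parts = [], [], []
    for _, group in df.groupby('class'):
        n = len(group)
        t_end = int(n * train_frac)
        v_end = int(n * (train_frac + val_frac))
        train_parts.append(group.iloc[:t_end])     # first chunk -> train
        val_parts.append(group.iloc[t_end:v_end])  # middle chunk -> val
        test_parts.append(group.iloc[v_end:])      # last chunk -> test
    return pd.concat(train_parts), pd.concat(val_parts), pd.concat(test_parts)

Does something like this seem reasonable, or is there a more standard way?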
Here is my Dataset class below:
import os

import pandas as pd
import torch
from torch.utils.data import Dataset
from torchvision.io import read_image

class MyDataset(Dataset):
    def __init__(self, img_dir, annotations_file, seq_length, transform=None, target_transform=None):
        self.img_dir = img_dir
        self.img_labels = pd.read_csv(annotations_file, header=None, names=['image', 'class'])
        self.seq_length = seq_length
        self.transform = transform
        self.target_transform = target_transform
        # group frames by class so windows never cross a class boundary
        self.class_groups = self.img_labels.groupby('class')

    def __len__(self):
        # one sample per valid window start within each class
        return sum(len(group) - self.seq_length + 1 for _, group in self.class_groups)

    def __getitem__(self, idx):
        # map the flat idx to a class group and a window start within it
        for class_label, group in self.class_groups:
            group_size = len(group) - self.seq_length + 1
            if idx < group_size:
                group_idx = idx
                break
            else:
                idx -= group_size

        # read the seq_length consecutive images of the window
        img_paths = group.iloc[group_idx : group_idx + self.seq_length, 0].tolist()
        images = []
        for img_path in img_paths:
            image = read_image(os.path.join(self.img_dir, img_path)).float()
            if self.transform:
                image = self.transform(image)
            images.append(image)
        images = torch.stack(images)  # (seq_length, C, H, W)

        label = torch.tensor(class_label)
        if self.target_transform:
            label = self.target_transform(label)
        return images, label
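And here is how I’m currently creating the splits and loaders (paths, batch size, and fractions are placeholders; the fractional form of random_split needs a reasonably recent PyTorch):

from torch.utils.data import DataLoader, random_split

dataset = MyDataset('data/images', 'data/annotations.csv', seq_length=5)
train_set, val_set, test_set = random_split(dataset, [0.8, 0.1, 0.1])
train_loader = DataLoader(train_set, batch_size=16, shuffle=True)
# each batch: images of shape (16, 5, C, H, W), labels of shape (16,)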
On another note, will the LSTM learn effectively if I am training on sequences from different classes within the same batch?
Really appreciate your help!