Where to provide composed transforms if images are not loaded using ImageFolder?

I have composed some transforms for my training images in the following way:

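import torch
from torchvision import transforms
data_transform = transforms.Compose([
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.2),
    transforms.RandomAffine(degrees=15, translate=(0.05, 0.05), scale=(0.9, 1.1)),
    transforms.RandomPerspective(distortion_scale=0.5, p=0.5),
    # FiveCrop returns a tuple of 5 images, so it must come last and its crops must be stacked
    transforms.FiveCrop((224, 224)),
    transforms.Lambda(lambda crops: torch.stack([transforms.ToTensor()(c) for c in crops])),
])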

As I understand it, this would normally be passed to torchvision.datasets.ImageFolder, but maybe there are other ways I am not aware of yet.

My dataset is originally a pandas DataFrame. I split it for 5-fold cross-validation and normalize the data in two different ways, because the last column of the DataFrame holds the image arrays and the remaining columns are numerical data.

I am using train = torch.utils.data.TensorDataset(x_train_fold, x_train_NUM_fold, y_train_fold), where x_train_fold is a tensor holding the training images.

How do I insert the transforms so that the model receives them while training?

You could write a custom Dataset as explained here and apply the transformation in the __getitem__ method.
Since you’ve already created the TensorDataset, you could also pass it to your custom Dataset, add transforms.ToPILImage() as the first transformation, and apply it in the __getitem__ method.

Thanks for your reply. However, I am not able to do what you suggested in terms of writing a custom Dataset, because the link you shared and my case seem different to me. Besides, I am not good at writing a custom class to suit my case.

Let me explain in words (and code). I have a sequence like this:
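import numpy as np
import torch
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler

five_splits = list(StratifiedKFold(n_splits=5, shuffle=True).split(X_numeric_img, y_numeric_img))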
where X_numeric_img and y_numeric_img are dataframes of shape (1792, 10) and (1792, 1), respectively.

Then I loop through it in the following way:

for i, (train_idx, valid_idx) in enumerate(five_splits):
    X_train_img = np.array([x for x in X_numeric_img[train_idx, -1].flatten()], dtype=float)
    X_test_img = np.array([x for x in X_numeric_img[valid_idx, -1].flatten()], dtype=float)
    X_train_img /= 255
    X_test_img /= 255

    X_train_img_T = np.transpose(X_train_img, (0, 3, 1, 2))  # (1433, 3, 224, 224)
    X_test_img_T = np.transpose(X_test_img, (0, 3, 1, 2))  # (359, 3, 224, 224)

    x_train_fold = torch.tensor(X_train_img_T).to(device, dtype=torch.float)   
    y_train_fold = torch.tensor(y_numeric_img[train_idx]).to(device, dtype=torch.float)

    x_val_fold = torch.tensor(X_test_img_T).to(device, dtype=torch.float)
    y_val_fold = torch.tensor(y_numeric_img[valid_idx]).to(device, dtype=torch.float) 

    # Fit the scaler on the training fold only and reuse it for the validation fold,
    # so no statistics leak from the validation data
    std_scaler_train = StandardScaler().fit(X_numeric_img[train_idx, :-1])
    X_num_train_normm = std_scaler_train.transform(X_numeric_img[train_idx, :-1])
    X_num_test_normm = std_scaler_train.transform(X_numeric_img[valid_idx, :-1])

    x_train_NUM_fold = torch.tensor(X_num_train_normm).to(device, dtype=torch.float)
    x_val_NUM_fold = torch.tensor(X_num_test_normm).to(device, dtype=torch.float)

    train = torch.utils.data.TensorDataset(x_train_fold, x_train_NUM_fold, y_train_fold)
    valid = torch.utils.data.TensorDataset(x_val_fold, x_val_NUM_fold, y_val_fold)

    train_loader = torch.utils.data.DataLoader(train, batch_size=bs, shuffle=True)
    valid_loader = torch.utils.data.DataLoader(valid, batch_size=bs, shuffle=False)

Although it is a bit lengthy, I hope you can see how I approached it.
After augmenting the images (e.g. 5 times), I also need to assign the numerical data of each original image to its augmented copies, because the network receives two inputs. That part is also making things difficult.
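When the augmentation runs on the fly in a Dataset's __getitem__, the pairing happens automatically. With FiveCrop, however, each sample yields five crops, so once the crops are folded into the batch the numeric features and target of every original image have to be repeated once per crop. A minimal sketch, assuming batches shaped (bs, ncrops, C, H, W) from a FiveCrop pipeline that stacks its crops; flatten_crops and the dummy tensors are hypothetical.

```python
import torch

def flatten_crops(images, numerics, targets):
    """Fold (bs, ncrops, C, H, W) crops into the batch dimension and repeat
    the numeric features and target of each original image once per crop."""
    bs, ncrops, c, h, w = images.shape
    images = images.view(bs * ncrops, c, h, w)
    # repeat_interleave keeps crop k of image i at row i * ncrops + k,
    # matching the order produced by view above
    numerics = numerics.repeat_interleave(ncrops, dim=0)
    targets = targets.repeat_interleave(ncrops, dim=0)
    return images, numerics, targets

images = torch.rand(2, 5, 3, 224, 224)
numerics = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
targets = torch.tensor([[0.0], [1.0]])
imgs, nums, ys = flatten_crops(images, numerics, targets)
print(imgs.shape)   # torch.Size([10, 3, 224, 224])
print(nums[:, 0])   # tensor([1., 1., 1., 1., 1., 3., 3., 3., 3., 3.])
```

Both inputs of the network then have the same batch size. For validation, the torchvision FiveCrop documentation instead averages the outputs over the crops (output.view(bs, ncrops, -1).mean(1)), which would still use the repeated numeric inputs shown here.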

Could you explain it in more detail so that I can use data_transform above?

Thank you once again

Based on your code snippet, I would suggest applying the transformation to the image arrays before normalizing them with the StandardScaler.
I.e., you could permute X_train_img before dividing the data by 255 and pass each image that should be transformed to your transformation. Since PIL.Images are expected, you would need to create them via PIL.Image.fromarray.

I don’t understand the assignment of numerical data to the augmented images and am not sure if this is shown in the code snippet.

StandardScaler is not applied to the arrays (images); I only extract them and divide by 255, and that’s it. As for the numerical part, you can see that StandardScaler produces X_num_train_normm and X_num_test_normm, which are the training and test sets of the numerical part of the dataset.

To understand what I want to do, please refer to this link.

Have I explained it properly? Feel free to ask if something is not clear.