Running multiple instances of training slows down dataloader

I have written a custom dataloader to fetch the data and labels, and I am training my deep learning model with different hyperparameter values. I notice that if I run multiple programs, each with different hyperparameter values, on the same GPU (with the same GPU usage, batch size, etc.), both programs slow down by at least 2x. Is there any particular reason for this? I suspect it might be because of the multiple loader instances. Here is a very general template of what my loader looks like.

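from PIL import Image  # assumption: the original image-loading call was lost in the post

class CustomLoader:
    def __init__(self) -> None:
        self.all_files = []
        self.all_labels = []

    def get_all_files(self):
        # Returns all image file paths and their corresponding labels.
        return self.all_files, self.all_labels

    def __len__(self):
        return len(self.all_files)

    def __getitem__(self, idx):
        # The loading call was garbled in the post; Image.open is assumed here.
        image = Image.open(self.all_files[idx])
        label = self.all_labels[idx]
        return image, label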

Are you comparing only dataloading speed, or benchmarking end-to-end (e.g., including training/inference) on the same GPU? Increased contention from two programs launching kernels on the same GPU would be expected to cause a slowdown. If you are benchmarking only dataloading, I would check whether the underlying storage is the bottleneck (are you saturating the IOPS or bandwidth?) or whether the CPU is the bottleneck (e.g., image decoding/preprocessing/transforms are too slow).
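
One way to tell the two cases apart is to time the wait on the loader separately from the training step. Below is a minimal sketch, not a definitive benchmark: `time_epoch`, `loader`, and `step_fn` are hypothetical names for illustration, and it assumes the step is synchronous (if it launches CUDA kernels, you would need something like `torch.cuda.synchronize()` at the end of the step so the time is attributed correctly).

```python
import time


def time_epoch(loader, step_fn):
    """Split one epoch's wall time into loader wait time and step time.

    `loader` is any iterable of batches and `step_fn(batch)` runs one
    training step. Both are placeholders for this sketch.
    """
    load_time = 0.0
    step_time = 0.0
    n_batches = 0
    it = iter(loader)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time spent waiting for the next batch
        except StopIteration:
            break
        t1 = time.perf_counter()
        # Forward/backward/optimizer step. If this launches asynchronous
        # GPU work, synchronize inside step_fn before it returns.
        step_fn(batch)
        t2 = time.perf_counter()
        load_time += t1 - t0
        step_time += t2 - t1
        n_batches += 1
    return {"load_s": load_time, "step_s": step_time, "batches": n_batches}
```

Run it once with a single program and once with both programs active. If `load_s` grows, the storage or CPU side is contended; if `step_s` grows, the slowdown is coming from the GPU being shared.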

Thanks for the reply.
I am comparing end-to-end, i.e., the time taken to run through one epoch during training. My GPU has 24 GB of memory, and each program occupies ~3 GB.