Self-supervised learning: seeding of double transformations

I want to implement SimCLR/BYOL/MoCo/Barlow Twins-style self-supervised learning.

For this I need to apply two independently sampled random transformations to each individual image.

I am unclear about how the random PyTorch/torchvision transformations are seeded.

I need the random draws to be independent across workers, batches, and epochs.

In a naive approach I would do:

    import torch
    import torch.nn as nn
    import torchvision.transforms as T
    from torchvision.datasets import CIFAR10

    class DoubleTransform:
        """Apply the same random pipeline twice to obtain two views of one image."""
        def __init__(self, transform):
            self.transform = transform

        def __call__(self, x):
            x1 = self.transform(x)  # first draw of the random parameters
            x2 = self.transform(x)  # second draw
            return x1, x2

    transform = T.Compose([
        T.RandomResizedCrop(size=32, scale=(0.2, 1.0)),
        T.RandomHorizontalFlip(),
        T.RandomApply(nn.ModuleList([T.ColorJitter(0.4, 0.4, 0.4, 0.1)]), p=0.8),
        T.RandomGrayscale(p=0.2),
        T.ToTensor(),
        T.Normalize(
            (0.4914, 0.4822, 0.4465),
            (0.247, 0.243, 0.261))
    ])

    double_loader = torch.utils.data.DataLoader(
        CIFAR10(root='/tmp/cifar/', train=True, download=True, transform=DoubleTransform(transform)),
        batch_size=args.batch_size, shuffle=True, num_workers=4, drop_last=True, pin_memory=True
    )

    for epoch in range(start_epoch, args.epochs):
        model.train()
        for (x1, x2), _ in double_loader:  # CIFAR10 also yields labels; unused here
            x1, x2 = x1.to(args.device), x2.to(args.device)
            loss = model(x1, x2)  # backward/optimizer step omitted
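
A quick sanity check (just a sketch; img is the first CIFAR-10 training image, reusing the DoubleTransform and transform defined above) shows whether the two calls re-sample the random parameters:

    img, _ = CIFAR10(root='/tmp/cifar/', train=True, download=True)[0]
    v1, v2 = DoubleTransform(transform)(img)
    # if the parameters are re-sampled on every call, the two views virtually never match
    print(torch.equal(v1, v2))  # expected: False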

From the torchvision.transforms page of the Torchvision master documentation I can tell that:

"Deterministic or random transformations applied on the batch of Tensor Images identically transform all the images of the batch."

From this I expect that if the transform randomly chooses to grayscale, ALL images in the batch will be grayscaled. Am I reading this correctly?
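
One way to check what that sentence means empirically (a minimal sketch of the batched-tensor code path, as opposed to the per-sample Dataset path above; flip/batch are illustrative names):

    flip = T.RandomHorizontalFlip(p=0.5)
    batch = torch.rand(8, 3, 32, 32)        # a batch of Tensor Images
    out = flip(batch)                       # one random draw for the whole batch:
                                            # either all eight images flip or none do
    singles = [flip(img) for img in batch]  # one independent draw per image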

In contrast to the naive approach, the Barlow Twins repository recently went online. They simply define the transformation twice:

    from PIL import Image
    from torchvision import transforms

    # GaussianBlur and Solarization are custom PIL-based transforms
    # defined elsewhere in the Barlow Twins repository.
    class Transform:
        def __init__(self):
            self.transform = transforms.Compose([
                transforms.RandomResizedCrop(224, interpolation=Image.BICUBIC),
                transforms.RandomHorizontalFlip(p=0.5),
                transforms.RandomApply(
                    [transforms.ColorJitter(brightness=0.4, contrast=0.4,
                                            saturation=0.2, hue=0.1)],
                    p=0.8
                ),
                transforms.RandomGrayscale(p=0.2),
                GaussianBlur(p=1.0),
                Solarization(p=0.0),
                transforms.ToTensor(),
                transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])
            ])
            self.transform_prime = transforms.Compose([
                transforms.RandomResizedCrop(224, interpolation=Image.BICUBIC),
                transforms.RandomHorizontalFlip(p=0.5),
                transforms.RandomApply(
                    [transforms.ColorJitter(brightness=0.4, contrast=0.4,
                                            saturation=0.2, hue=0.1)],
                    p=0.8
                ),
                transforms.RandomGrayscale(p=0.2),
                GaussianBlur(p=0.1),
                Solarization(p=0.2),
                transforms.ToTensor(),
                transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])
            ])

        def __call__(self, x):
            y1 = self.transform(x)
            y2 = self.transform_prime(x)
            return y1, y2
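
Note the two pipelines differ only in the GaussianBlur and Solarization probabilities. Just as an illustration (a sketch, not the repo's actual code, and still relying on the repo's GaussianBlur/Solarization helpers; make_pipeline is a hypothetical name), the same thing can be written as one parameterized factory:

    def make_pipeline(blur_p, solarize_p):
        # shared augmentations; only the blur/solarization probabilities differ
        return transforms.Compose([
            transforms.RandomResizedCrop(224, interpolation=Image.BICUBIC),
            transforms.RandomHorizontalFlip(p=0.5),
            transforms.RandomApply(
                [transforms.ColorJitter(brightness=0.4, contrast=0.4,
                                        saturation=0.2, hue=0.1)],
                p=0.8
            ),
            transforms.RandomGrayscale(p=0.2),
            GaussianBlur(p=blur_p),
            Solarization(p=solarize_p),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225]),
        ])

    transform = make_pipeline(blur_p=1.0, solarize_p=0.0)
    transform_prime = make_pipeline(blur_p=0.1, solarize_p=0.2)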

I take this as a reference, but why is it written the way it is?

It does not seem to address the homogeneous batch transformations mentioned earlier, which makes me think that maybe my interpretation of the documentation is wrong and this is not actually happening. Are the transformations indeed drawn independently per image?

Also, my hunch is that, in the naive approach, the views x1 and x2 are the results of consecutive draws from the same random stream. If every epoch starts from the same seed, then even though the images are reshuffled by the data loader, the consecutive parameter draws would pair up the same way epoch after epoch, which would greatly limit the diversity of the view pairs x1, x2.
Is this correct?
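
If that hunch were right, the fix would presumably be explicit per-worker re-seeding, e.g. the worker_init_fn pattern from the PyTorch reproducibility notes (a sketch; seed_worker is my name for the helper, and as far as I can tell torch.initial_seed() inside a worker already differs per worker and per epoch):

    import random
    import numpy as np

    def seed_worker(worker_id):
        # propagate torch's per-worker seed to the python/numpy RNGs,
        # which some transforms may use internally
        worker_seed = torch.initial_seed() % 2**32
        random.seed(worker_seed)
        np.random.seed(worker_seed)

    double_loader = torch.utils.data.DataLoader(
        CIFAR10(root='/tmp/cifar/', train=True, download=True,
                transform=DoubleTransform(transform)),
        batch_size=args.batch_size, shuffle=True, num_workers=4,
        worker_init_fn=seed_worker, drop_last=True, pin_memory=True
    )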

Looking at the official MoCo repository, they seem to implement the naive approach:

    class TwoCropsTransform:
        """Take two random crops of one image as the query and key."""

        def __init__(self, base_transform):
            self.base_transform = base_transform

        def __call__(self, x):
            q = self.base_transform(x)
            k = self.base_transform(x)
            return [q, k]
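
A quick way to observe the per-call re-sampling that both snippets rely on (an illustrative check, assuming img is any PIL image):

    crop = transforms.RandomResizedCrop(224)
    # get_params draws a fresh crop box (i, j, h, w) on every invocation
    print(crop.get_params(img, crop.scale, crop.ratio))
    print(crop.get_params(img, crop.scale, crop.ratio))  # almost surely a different box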