Memory Issues with RandomSampler

I am working with long documents and BERT, so I used the sliding window approach and fit each entire document into a single minibatch. Since the documents have different lengths, the minibatches do too. I had to make my own Dataset class and also rewrite the forward method of DataParallel to scatter the pre-made minibatches across multiple GPUs, and so on.
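For context, the Dataset looks roughly like this (a simplified sketch, not my exact code; the class name, window/stride values, and the assumption that documents are already tokenized are all illustrative):

```python
import torch
from torch.utils.data import Dataset

class SlidingWindowDocDataset(Dataset):
    """One item == one whole document, pre-split into sliding-window chunks,
    so __getitem__ already returns a complete (variable-size) minibatch."""

    def __init__(self, tokenized_docs, window=512, stride=256, pad_id=0):
        self.docs = tokenized_docs      # list of lists of token ids
        self.window = window
        self.stride = stride
        self.pad_id = pad_id

    def __len__(self):
        return len(self.docs)

    def __getitem__(self, idx):
        ids = self.docs[idx]
        # window start positions that cover the whole document
        starts = range(0, max(len(ids) - self.window, 0) + self.stride, self.stride)
        chunks = [ids[s:s + self.window] for s in starts]
        # pad the last chunk(s) so all windows stack into one tensor
        chunks = [c + [self.pad_id] * (self.window - len(c)) for c in chunks]
        return torch.tensor(chunks)     # shape (num_windows, window), varies per doc
```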

Anyway, I am still using the DataLoader, and when I set sampler=None (so it falls back to sequential order, bad, I know), the model trains fully with no issue. When I use RandomSampler, I run into CUDA out-of-memory errors after maybe 25 steps or so.
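The two setups are roughly the following (simplified; batch_size=None because each dataset item is already a full minibatch, and the exact arguments are just how I've reconstructed it here):

```python
from torch.utils.data import DataLoader, RandomSampler

dataset = SlidingWindowDocDataset(tokenized_docs)

# Works fine: sequential order over documents
train_dataloader = DataLoader(dataset, batch_size=None, sampler=None)

# Runs out of CUDA memory after ~25 steps
train_dataloader = DataLoader(
    dataset,
    batch_size=None,                  # items are already full minibatches
    sampler=RandomSampler(dataset),   # shuffles the order of documents
)
```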

I decided to just turn the train_dataloader into a list and call random.shuffle on it after every epoch. The model trains fine, and I specifically do not want the items inside the minibatches to be shuffled across minibatches, so this keeps each document's windows together.
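Something like this (simplified sketch; num_epochs and the training step are placeholders):

```python
import random

# Materialize the pre-made minibatches once, then shuffle their ORDER each
# epoch; the contents of each minibatch stay together.
train_batches = list(train_dataloader)

for epoch in range(num_epochs):
    random.shuffle(train_batches)
    for batch in train_batches:
        ...  # forward/backward as usual
```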

My question, then, is: is using RandomSampler equivalent to my list/shuffle approach? And if so, why does RandomSampler run into memory issues when the shuffled list does not?