Hi, I'm new to this. For most applications I have been using the DataLoader in utils.data to load batches of images. However, I'm now trying to load images with a different batch size at each iteration. For example, my first iteration loads a batch of 10, the second a batch of 20.
Is there a way to do this easily? Thank you.
Same problem here.
Did you manage to do it?
You could implement a custom
collate_fn for your DataLoader and use it to load your batches.
I think the easiest way to achieve this is to change the
batch_size parameter of the DataLoader.
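For example (a minimal sketch; the dataset and the schedule of sizes here are made up), you can simply rebuild the loader whenever the desired batch size changes:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# dummy dataset of 100 samples, purely for illustration
dataset = TensorDataset(torch.randn(100, 3))

for batch_size in [10, 20, 40]:  # per-stage batch sizes (made up)
    # recreate the DataLoader with the new batch_size
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    (batch,) = next(iter(loader))
    print(batch.shape[0])  # 10, then 20, then 40
```

Recreating the DataLoader is cheap as long as the underlying Dataset stays the same, since the Dataset itself is not copied.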
Thank you very much for your answers!
I actually found what I wanted with the sampler in this discussion: 405015099, changing the batch size with a batch_size for each source (here my data_source is the concatenation of datasets, with a specific batch_size for each).
Not very clean, but it seems to work.
r"""Takes a dataset with cluster_indices property, cuts it into batch-sized chunks
Drops the extra items, not fitting into exact batches
data_source (Dataset): a Dataset to sample from. Should have a cluster_indices property
batch_size (int): a batch size that you would like to use later with Dataloader class
shuffle (bool): whether to shuffle the data or not
def __init__(self, data_source, batch_size=None, shuffle=True):
self.data_source = data_source
if batch_size is not None:
assert self.data_source.batch_sizes is None, "do not declare batch size in sampler " \
"if data source already got one"
self.batch_sizes = [batch_size for _ in self.data_source.cluster_indices]
self.batch_sizes = self.data_source.batch_sizes
self.shuffle = shuffle
def flatten_list(self, lst):
return [item for sublist in lst for item in sublist]
batch_lists = 
for j, cluster_indices in enumerate(self.data_source.cluster_indices):
batches = [
cluster_indices[i:i + self.batch_sizes[j]] for i in range(0, len(cluster_indices), self.batch_sizes[j])
# filter our the shorter batches
batches = [_ for _ in batches if len(_) == self.batch_sizes[j]]
# flatten lists and shuffle the batches if necessary
# this works on batch level
lst = self.flatten_list(batch_lists)
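For reference, if all you need is variable batch sizes (without the clustering logic), a more minimal variant of the same idea, sketched here with made-up sizes, is to pass explicit index lists as the DataLoader's batch_sampler argument; each list then becomes one batch of exactly that size:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def variable_batches(n, sizes):
    """Yield consecutive index lists of the given sizes over indices 0..n-1.
    Drops the tail if a requested batch would run past the dataset."""
    start = 0
    for size in sizes:
        if start + size > n:
            break
        yield list(range(start, start + size))
        start += size


# dummy dataset of 60 samples; the size schedule is illustrative
dataset = TensorDataset(torch.randn(60, 3))
loader = DataLoader(dataset,
                    batch_sampler=list(variable_batches(60, [10, 20, 30])))
for (batch,) in loader:
    print(batch.shape[0])  # 10, 20, 30
```

With batch_sampler, the DataLoader's batch_size, shuffle, sampler, and drop_last arguments must be left at their defaults, since batching is now fully controlled by the index lists.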
I have been trying to use
collate_fn for this purpose but haven’t figured out how yet. Can you give any pointers? My problem right now is that the sampler gives
collate_fn 16 samples at a time, but I want the batch size to be 128. Is this possible with this approach?
collate_fn is used to process the batch of samples in a custom way. It doesn’t specify the batch size, which is set in the DataLoader.
Could you explain your issue a bit more, i.e. are you setting a batch size of 128 in the
DataLoader while each batch contains just 16 samples?
I’m trying to replicate the original StyleGAN’s batch size schedule: 128, 128, 128, 64, 32, 16 as the progressive growing is applied. I know I can recreate the DataLoader when I want to switch, but I’m working inside an extant framework that makes that a clunky change to make.
I never did figure out how to use
collate_fn here, so instead I’m initializing my DataLoader with a
batch_size of 16, and in my training loop I collect and concatenate these batches until I reach the actual batch size I want at any given time. This only works because all the batch sizes are divisible by 16. I tried to do this in
collate_fn at first; I thought maybe it received a generator and I could return a different generator, but that wasn’t the case.
I’m still interested to know how
collate_fn can be used to yield variable batch sizes; maybe it would be cleaner than my solution.
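Here is roughly what that accumulation loop looks like (a sketch with a dummy dataset and illustrative names; target_batch_size would come from the growth schedule):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# dummy dataset standing in for the real image dataset
dataset = TensorDataset(torch.randn(256, 3))
# fixed small chunks; every scheduled batch size is a multiple of 16
loader = DataLoader(dataset, batch_size=16, shuffle=True, drop_last=True)

target_batch_size = 128  # current entry in the schedule (illustrative)
chunks = []
for (chunk,) in loader:
    chunks.append(chunk)
    if sum(c.shape[0] for c in chunks) >= target_batch_size:
        batch = torch.cat(chunks)  # concatenate 16-sample chunks into one batch
        chunks = []
        print(batch.shape[0])  # 128
        # the actual training step on `batch` would go here
```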
You could use this code snippet to see an example.
Note that the “variable size” usually refers to the temporal dimension or the spatial dimensions (e.g. images with different resolutions), not the batch size.
That snippet again does not modify the batch size, which is the subject of this thread.
I’ve come across the same issue while trying to implement this functionality of StyleGAN using PyTorch Lightning, which I believe is similar to your use case. Any luck on your end in resolving this issue?