Is it possible to change the batch_size of a dataloader after it was created?

Is this possible?
I’m asking because, for example, today I had a couple of models and I’d like to use a different batch size for each of them. I initially created a dataloader with, let’s say, a batch size of 32, and now I want to increase it to, let’s say, 128, but I don’t want to create a new dataloader.
I would appreciate any kind of help in this regard.

This isn’t allowed anymore in the current version, and you’ll get a ValueError:

ValueError: batch_size attribute should not be set after DataLoader is initialized
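
A minimal reproduction (assuming a simple TensorDataset; the exact wording of the message may vary between versions):

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(256, 10))
loader = DataLoader(dataset, batch_size=32)

# Mutating the attribute after construction raises the ValueError above
loader.batch_size = 128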

Creating a new DataLoader should be cheap, so I would recommend initializing a new one.
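
For example (a sketch, reusing the dataset from the snippet above; construction only stores references, so no data is read until you iterate):

from torch.utils.data import DataLoader

# Both loaders share the same dataset object; nothing is loaded at
# construction time, so creating a second loader is cheap.
loader_small = DataLoader(dataset, batch_size=32)
loader_large = DataLoader(dataset, batch_size=128)

for (batch,) in loader_large:
    print(batch.shape)  # torch.Size([128, 10]) for full batches
    break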

Thanks, but why not?

Why isn’t this possible? How do I know what kind of DataLoader was passed?

Say I have the following:

from typing import Optional

from torch.utils.data import DataLoader


def return_dl(
    dataloader: DataLoader,
    batch_size: Optional[int] = None,
) -> DataLoader:
    # This assignment is exactly what raises the ValueError above
    if batch_size is not None:
        dataloader.batch_size = batch_size

    return dataloader

Here, dataloader could be any DataLoader subclass, e.g. `NeighborLoader` from PyTorch Geometric. Why should this not be allowed?

The ValueError was added here when persistent_workers were introduced, since changing internal attributes would no longer have any effect once the worker processes are kept alive. I’m sure the code owners would like to hear your concerns, and PRs should be welcome in case you want to allow changing these attributes for valid use cases.

Thanks for the reply. But then how would one handle my use case? You receive an already instantiated DataLoader and have to change its batch size while keeping everything else the same. What would be the best way to do so? How can one initialize a new DataLoader that is exactly like the old one but with a different batch size?

Exactly as you’ve described: you could re-initialize a new DataLoader. Usually the creation of the DataLoader is cheap compared to the actual data loading and processing (assuming you are using lazy data loading).
Another complication (besides persistent workers) is that each worker pre-fetches batches in advance, as the sketch below illustrates. If your code manipulated internal state (such as the batch size) mid-iteration, how and when should the change become visible?
Maybe there is a clean API design for this, so your ideas are welcome.
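
To make this concrete (a sketch, reusing a dataset as in the snippets above; num_workers, prefetch_factor, and persistent_workers are real DataLoader arguments, the values here are arbitrary):

from torch.utils.data import DataLoader

# With num_workers=4 and prefetch_factor=2, up to 4 * 2 = 8 batches may
# already be queued with the old batch size at any point in time.
# persistent_workers=True additionally keeps the worker processes (and
# their copies of the loader configuration) alive across epochs, so a
# mutated attribute on the DataLoader object would never reach them.
loader = DataLoader(
    dataset,
    batch_size=32,
    num_workers=4,
    prefetch_factor=2,
    persistent_workers=True,
)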

Thanks for being very active here!

Yep, I see. But re-initializing a new DataLoader is not easy, as I do not know how that DataLoader was initialized! In my design, the user passes a dataloader and I have to change its batch size. Why? Because that dataloader (or rather the data behind it) is used to train two different models, and the second model benefits from huge batch sizes. I see no solution other than forcing the user to create two dataloaders, unless there is an easy way to create a new DataLoader from an existing one.

You should still be able to read the attributes of the passed DataLoader and create the new one based on them; a sketch is below. Note that you might also need to check the internal sampler and its attributes in order to restore it. If the user created a custom sampler and you don’t have its source code (I don’t know how that would work, but it seems you don’t even know how the Dataset is defined), you might not be able to restore it.
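
A possible sketch of such a helper (clone_with_batch_size is a made-up name; the attributes read below all exist on a plain torch.utils.data.DataLoader, but a subclass such as NeighborLoader may take different constructor arguments, so this won’t cover every case):

from torch.utils.data import DataLoader


def clone_with_batch_size(loader: DataLoader, batch_size: int) -> DataLoader:
    """Build a new DataLoader over the same dataset with a new batch size.

    Sketch only: assumes a map-style dataset and no custom batch_sampler.
    Reusing loader.sampler preserves a custom sampling order, but a
    stateful sampler would then be shared between both loaders.
    """
    return DataLoader(
        loader.dataset,
        batch_size=batch_size,
        sampler=loader.sampler,  # reuse the existing sampler object
        num_workers=loader.num_workers,
        collate_fn=loader.collate_fn,
        pin_memory=loader.pin_memory,
        drop_last=loader.drop_last,
        timeout=loader.timeout,
        worker_init_fn=loader.worker_init_fn,
        persistent_workers=loader.persistent_workers,
    )

Your return_dl above could then call clone_with_batch_size(dataloader, batch_size) instead of mutating the attribute.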