How to collect all data from dataloader

Currently, by setting batch_size=dataset_size, I can load all the data from the dataset as a single batch. However, are there other ways to do this? This approach is limited by the amount of available memory.

from torch.utils.data import DataLoader

def get_all_data(dataset, num_workers=30, shuffle=False):
    # One batch that spans the whole dataset.
    data_loader = DataLoader(dataset, batch_size=len(dataset),
                             num_workers=num_workers, shuffle=shuffle)
    # The loader yields exactly one batch, so just take it.
    all_data = next(iter(data_loader))
    return all_data

What do you mean by “get all data” if you are constrained by memory? The purpose of the dataloader is to supply mini-batches of data so that you don’t have to load the entire dataset into memory (which is often infeasible if you are dealing with large image datasets, for example). If you are dealing with a small enough dataset, then you can do what you did above, but otherwise you’ll have to settle for mini-batches.

Unless you are talking about some other representation of the data: for example, if you’re loading images via an image-folder dataset, then your dataset will be composed of image paths, which can be accessed by dataset.imgs, and that IS kept entirely in memory. But it’s the dataloader that actually brings specific items from that image-path list into memory.
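For instance, with torchvision’s ImageFolder (a minimal sketch; the data/train path and its class subfolders are hypothetical):

from torchvision import datasets

# ImageFolder keeps only (path, class_index) pairs in memory,
# not the decoded images themselves.
dataset = datasets.ImageFolder("data/train")  # hypothetical directory
print(dataset.imgs[:3])  # e.g. [('data/train/cat/0.png', 0), ...]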

By “get all data” I mean merging all the mini-batches into one super-large batch, similar to the following:

import numpy as np

all_data = np.empty((0, 2))
all_data = np.append(all_data, np.array([[1, 2], [3, 4]]), axis=0)
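In PyTorch terms, a sketch of that same concatenation over a DataLoader might look like the following (assuming each item is an (input, target) pair of tensors; collect_all is a hypothetical helper, and it still needs enough memory to hold the full tensors):

import torch
from torch.utils.data import DataLoader

def collect_all(dataset, batch_size=256, num_workers=4):
    # Iterate in mini-batches and concatenate along the batch dimension.
    loader = DataLoader(dataset, batch_size=batch_size,
                        num_workers=num_workers, shuffle=False)
    inputs, targets = [], []
    for x, y in loader:
        inputs.append(x)
        targets.append(y)
    # torch.cat still materializes the entire dataset in memory.
    return torch.cat(inputs, dim=0), torch.cat(targets, dim=0)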

What you’re doing is concatenating arrays in memory. If you want to do this with your entire dataset, you’ll need enough memory to hold all of that data, and since you said memory is exactly what limits you, I’m afraid you’re not going to be able to load everything unless you get a machine with more RAM. Why do you want to load everything into memory?