I take a dataset, split it into three subsets, and then configure a DataLoader to access each one, as follows:
full_data_args = {'data_dir': 'penguin_data/data', 'data_file': 'penguin_csv.csv', 'stage': 'full'}
data_batch = dataset.PenguinData(**full_data_args)

train_data_params = {'batch_size': 512, 'shuffle': True, 'num_workers': 0, 'pin_memory': True}
train_dataset = data.DataLoader(data_batch.train_dataset, **train_data_params)

valid_data_params = {'batch_size': 16, 'shuffle': True, 'num_workers': 0, 'pin_memory': True}
test_dataset = data.DataLoader(data_batch.test_dataset, **valid_data_params)  # was data_batch.train_dataset, which pointed the test loader at the training split
valid_dataset = data.DataLoader(data_batch.val_dataset, **valid_data_params)
However, I understand that a better approach is to start from the whole dataset, split it, and build the training, validation, and test loaders from that single source. I can't find an example of how to do this. Can you show me how it can be done? It may be that the batch size ends up the same for all three loaders.
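For reference, here is a minimal sketch of that pattern using torch.utils.data.random_split: hold one full Dataset, split it once, and wrap each split in its own DataLoader with shared parameters. The TensorDataset below is a stand-in for PenguinData (whose internals aren't shown), and the 70/15/15 split ratios and batch size of 64 are illustrative assumptions, not values from the question.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Stand-in for the full PenguinData dataset: 1000 samples, 4 features, 3 classes.
full_dataset = TensorDataset(torch.randn(1000, 4), torch.randint(0, 3, (1000,)))

# Split the single dataset 70/15/15; a seeded generator makes the split reproducible.
n = len(full_dataset)
n_train = int(0.7 * n)
n_val = int(0.15 * n)
n_test = n - n_train - n_val
train_set, val_set, test_set = random_split(
    full_dataset,
    [n_train, n_val, n_test],
    generator=torch.Generator().manual_seed(42),
)

# One shared parameter dict; only shuffle differs (shuffle the training split only).
loader_params = {'batch_size': 64, 'num_workers': 0, 'pin_memory': True}
train_loader = DataLoader(train_set, shuffle=True, **loader_params)
val_loader = DataLoader(val_set, shuffle=False, **loader_params)
test_loader = DataLoader(test_set, shuffle=False, **loader_params)
```

Because random_split returns Subset views of the same underlying dataset, no data is copied, and each loader iterates only over its own indices.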