Create and destroy dataloader during training

Hello, I have a similar question about dataloaders to this question, but from a different angle.

I’m currently writing a training script for a model consisting of 3 submodels, each trained individually. Roughly, the training iteration looks like this:

for epoch in range(n_epochs):
	# train model A
	model_a_best = model_a_step()

	# train model B
	model_b_best = model_b_step()

	# train model C
	model_c_best = model_c_step()

	# validate big model (3 submodels bundled up becoming one big model)

Unfortunately, each model has its own dataset and dataloader configuration, so I need to instantiate both before running the training for each model.

My question is: if I instantiate both the dataset and the dataloader inside each model’s _step function, how do I destroy them once I’ve finished training that model and move on to the next one?

My initial plan was to instantiate all the datasets and dataloaders right before this training loop, but I’m afraid that would consume a lot of resources during the iteration.

Hi,

Datasets and DataLoaders are generally instantiated outside the training/testing loops. Are you sure you want to instantiate them inside the training loop? To delete them, how about del dataset, dataloader?

Cheers!
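For reference, del only removes the name from the current scope; the underlying object is freed only once its last reference is gone. A minimal sketch with a plain list standing in for a dataset (no torch objects involved):

```python
data = [0] * 1_000_000        # stand-in for a large dataset held in memory
alias = data                  # a second name bound to the same object

del data                      # removes the name `data`, not the object
print(len(alias))             # still alive: `alias` keeps a reference

del alias                     # last reference gone; CPython frees it here
```

So del dataloader only actually releases memory if nothing else (e.g. another variable, a closure, an exception traceback) still references the loader.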

Yes, the initial plan is to instantiate (and destroy) each model’s dataloaders whenever I run a training iteration on that model, to minimize my resource usage during training.

As each model has a train and a validation dataset, there would be 6 dataloaders in total (2 for each model) during training, and that doesn’t include the dataloaders for the big model.

My rough implementation would be something like this (with model_a as an example):

def model_a_step():
    # instantiate dataloader and dataset
    a_dataset = Dataset(model_a_data_set)
    dataloader = DataLoader(a_dataset)

    for x in dataloader:
        ...  # run inference and backprop

    # destroy dataloader and dataset when finished
    del dataloader, a_dataset
    return trained_model

Let us know how it goes. BTW, there will only ever be 2 dataloaders active at a time, not 6, because you delete each pair before creating the next one.

def model_a_step():
    # instantiate dataloader and dataset
    a_dataset = Dataset(model_a_data_set)
    dataloader = DataLoader(a_dataset)

    for x in dataloader:
        ...  # run inference and backprop

    # no need as they will be destroyed when function ends.
    # destroy dataloader and dataset when finished
    # del dataloader, a_dataset
    return trained_model

If you create them inside a function, there is no need to delete them explicitly: local variables are destroyed when the function returns, as per the language rules (in CPython, the objects are freed as soon as their reference count drops to zero).
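As a check on this, a minimal sketch using a weak reference (plain Python; FakeDataset is a hypothetical stand-in, not a torch class). The object created inside the function is gone once the function returns:

```python
import gc
import weakref

class FakeDataset:
    """Hypothetical stand-in for a Dataset; used only to observe lifetime."""
    def __init__(self, n):
        self.data = list(range(n))

def train_step():
    ds = FakeDataset(10)
    ref = weakref.ref(ds)   # weak reference: does not keep ds alive
    # ... a DataLoader over ds would be iterated here ...
    return ref              # ds itself goes out of scope on return

ref = train_step()
gc.collect()                # immediate in CPython; collect() helps on other runtimes
print(ref() is None)        # True: the dataset no longer exists
```

One caveat: if the function returns something that still references the loader (or an exception traceback is kept around), the objects survive the return, so be mindful of what you hold on to.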