I have 5 identical, relatively small MLP models that I want to train in parallel on a single GPU. Each has its own dataset, so there is no overlap in data or model parameters.
In a for loop, I build each model together with its dataset, store them in an object, and append that object to a list. During optimization, I iterate over these objects and run a forward/backward pass plus an optimizer step for each model.
In nvidia-smi, I see only the memory footprint of a single model. Is this a bug where I'm actually training just one model, or some PyTorch optimization?
I stepped through the optimization loop with pdb: the model object hashes are all different, and so are the datasets. What else can I check to confirm I'm optimizing the right model with the right dataset (see the sanity check sketched after the code)?
from torch.optim import Adam

# initialize data
dataset_array = []
for i in datasets:
    ds_obj = {}
    ds = Dataset(**kwargs)
    ds_obj['dataset'] = ds
    dataset_array.append(ds_obj)

# initialize models
model_array = []
for i in models:
    model_obj = {}
    model = MyModel(**kwargs)
    # each optimizer only sees its own model's parameters
    optimizer = Adam(model.parameters(), **kwargs)
    model_obj['model'] = model
    model_obj['optimizer'] = optimizer
    model_array.append(model_obj)

# optimize
for i in range(max_iterations):
    for idx, model_obj in enumerate(model_array):
        dataset = dataset_array[idx]['dataset']
        model = model_array[idx]['model']
        optim = model_array[idx]['optimizer']

        model_input, GT = dataset[i]
        model_output = model(model_input)
        loss = loss_fn(model_output, GT)

        optim.zero_grad()
        loss.backward()
        optim.step()
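
For reference, this is the kind of extra sanity check I was planning to run. It's a rough sketch that assumes the model_array built above; the idea is to confirm that every model's parameters live in their own GPU storage, and to compare PyTorch's own memory accounting with what nvidia-smi reports:

import torch

# confirm each model's parameters are backed by distinct GPU buffers
param_ptrs = set()
param_count = 0
for model_obj in model_array:
    for p in model_obj['model'].parameters():
        param_ptrs.add(p.data_ptr())  # storage address of each parameter tensor
        param_count += 1
print(f"{len(param_ptrs)} distinct parameter buffers out of {param_count} parameter tensors")

# compare PyTorch's allocator counters against the nvidia-smi reading
print(f"allocated: {torch.cuda.memory_allocated() / 1e6:.1f} MB, "
      f"reserved: {torch.cuda.memory_reserved() / 1e6:.1f} MB")

If the number of distinct buffers equals the number of parameter tensors, the 5 models really are separate copies; if not, some of them share storage.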