Parallel SGD Steps

I need to perform SGD steps on two independent models at the same time, without aggregation or any other interaction between them. The models have the same architecture but are to be trained on different datasets and stored separately. I also have 4 GPUs available. I tried to use Python's multiprocessing package, but could not get it to work with CUDA. Does PyTorch provide a straightforward solution for this?

@hydroxyl No, PyTorch does not provide any out-of-the-box functionality for this.
The two models are separate, so you need a separate thread to run the SGD steps for each model object.
By the way, I am curious about the reason behind the request to perform the SGD steps at the same time.

You can just call the two backward passes in the same epoch loop, though:

import numpy as np
import torch
from torch import nn


class linearRegressionOLS(nn.Module):
    def __init__(self):
        super().__init__()
        self.linearModel = nn.Linear(10, 1)

    def forward(self, x):
        return self.linearModel(x)


# Hyperparameters were not defined in the original post; these values are placeholders.
learning_rate = 0.01
num_epochs = 100

# Each model gets its own criterion and its own optimizer over its own parameters.
model1 = linearRegressionOLS()
criterion1 = nn.MSELoss()
optimizer1 = torch.optim.SGD(model1.parameters(), lr=learning_rate)

model2 = linearRegressionOLS()
criterion2 = nn.MSELoss()
optimizer2 = torch.optim.SGD(model2.parameters(), lr=learning_rate)

X = np.random.rand(100, 10).astype(np.float32)
Y = np.random.randint(2, size=(100)).reshape(100, 1).astype(np.float32)

# Variable is deprecated; tensors can be passed to the model directly.
inputVal = torch.from_numpy(X)
outputVal = torch.from_numpy(Y)


def backward(model, optimizer, criterion, inputVal, outputVal, manual_seed):
    # One SGD step: forward pass, loss, backward pass, parameter update.
    torch.manual_seed(manual_seed)
    optimizer.zero_grad()
    dataOutput = model(inputVal)
    loss = criterion(dataOutput, outputVal)
    loss.backward()
    optimizer.step()
    return loss.item()


for epoch in range(num_epochs):
    # The two models are stepped one after the other inside the same loop.
    loss1 = backward(model1, optimizer1, criterion1, inputVal, outputVal, 42)
    loss2 = backward(model2, optimizer2, criterion2, inputVal, outputVal, 99)
    if epoch % 10 == 0:
        print('epoch [{}/{}], loss1:{:.4f} loss2:{:.4f}'.format(epoch + 1, num_epochs, loss1, loss2))
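If each model sits on its own GPU, a loop like the one above can already overlap some of the work, because CUDA operations are queued asynchronously and the Python thread moves on to the next model while the first GPU is still busy. The following is only a sketch under assumptions: `pick_devices`, `make_job`, and `step` are hypothetical helpers, the data is random, and it falls back to a single device (or the CPU) when two GPUs are not present. It avoids calling `.item()` inside the loop, since that forces a synchronization.

```python
import torch
from torch import nn


def pick_devices():
    # Use two distinct GPUs when available; otherwise fall back so the sketch still runs.
    if torch.cuda.device_count() >= 2:
        return torch.device("cuda:0"), torch.device("cuda:1")
    if torch.cuda.is_available():
        return torch.device("cuda:0"), torch.device("cuda:0")
    return torch.device("cpu"), torch.device("cpu")


def make_job(device, seed):
    # Hypothetical helper: one model, its optimizer, and its own random dataset.
    gen = torch.Generator().manual_seed(seed)
    x = torch.rand(100, 10, generator=gen).to(device)
    y = torch.randint(0, 2, (100, 1), generator=gen).float().to(device)
    model = nn.Linear(10, 1).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    return model, optimizer, x, y


def step(model, optimizer, x, y):
    # One SGD step; the loss stays on its device so no synchronization is forced.
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.detach()


dev1, dev2 = pick_devices()
job1 = make_job(dev1, 42)
job2 = make_job(dev2, 99)

for _ in range(10):
    loss1 = step(*job1)  # queued on the first device
    loss2 = step(*job2)  # queued on the second device

# Reading the values synchronizes, so it is done once at the end.
print(loss1.item(), loss2.item())
```

For models as small as this one, the Python overhead per step probably dominates, so the gain may be negligible; the overlap matters more when each step keeps the GPU busy for a while.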

Thank you very much for your response. However, this is not what I was aiming for, so I will try to explain more clearly.
In my case, I need to perform a number of SGD steps on two models separately. As of now, what I do is similar to your answer: say, 10 steps are performed on one model, and after they are finished we switch to the second model/node and perform the desired 10 steps on it (using its local data). What I wish to achieve is to make this process parallel, i.e. to perform the mentioned 10 steps on the two models at the same time (possibly on different GPUs), so that the execution time is halved. I have tried PyTorch's DDP example, which spawns multiple learning processes, but I still haven't found a way to access the model weights after the steps are taken.
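One way to sketch this without DDP is to run each model's steps in its own thread and have the thread return the weights. This is a minimal illustration, not a tuned solution: `train_steps` is a hypothetical helper, the data is random stand-in data, and the code falls back to the CPU when fewer than two GPUs are present. Threads can overlap here to the extent that PyTorch releases the GIL inside its operators; if Python overhead dominates, separate processes via `torch.multiprocessing` with the `spawn` start method (which CUDA requires in subprocesses) are the usual alternative.

```python
from concurrent.futures import ThreadPoolExecutor

import torch
from torch import nn


def train_steps(device_name, seed, num_steps=10, lr=0.01):
    """Run num_steps SGD steps on one model and return its weights on the CPU."""
    device = torch.device(device_name)
    gen = torch.Generator().manual_seed(seed)
    # Stand-in for this node's local data (random, for illustration only).
    x = torch.rand(100, 10, generator=gen).to(device)
    y = torch.randint(0, 2, (100, 1), generator=gen).float().to(device)

    model = nn.Linear(10, 1).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    criterion = nn.MSELoss()
    for _ in range(num_steps):
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    # Copy the weights to the CPU so the caller can inspect or save them.
    return {k: v.detach().cpu().clone() for k, v in model.state_dict().items()}


# One device per model when at least two GPUs exist; otherwise both run on the CPU.
if torch.cuda.device_count() >= 2:
    devices = ["cuda:0", "cuda:1"]
else:
    devices = ["cpu", "cpu"]

with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(train_steps, dev, seed)
               for dev, seed in zip(devices, (42, 99))]
    weights1, weights2 = [f.result() for f in futures]

print(weights1["weight"].shape, weights2["weight"].shape)
```

The returned dictionaries are ordinary state dicts, so each can be stored separately with `torch.save` or loaded into a fresh model with `load_state_dict`.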

Hello,
PyTorch's torch.nn.DataParallel module can spread training across multiple GPUs. Define your model architecture in PyTorch and set up a separate data loader for each dataset you want to train the models on. Next, wrap each model instance with torch.nn.DataParallel. Note that DataParallel splits each input batch of a single model across the available GPUs; on its own it does not run the training loops of two different models concurrently. PyTorch manages the CUDA operations and GPU synchronization internally.
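For reference, the wrapping step looks roughly like the sketch below for a single model. The model, batch size, and learning rate are placeholders chosen for illustration. The part relevant to reading the weights afterwards is that the wrapper keeps the original model at `.module` and prefixes its parameter names with `module.`.

```python
import torch
from torch import nn

# DataParallel expects the model on the first visible GPU; use the CPU if none exists.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = nn.Linear(10, 1).to(device)
wrapped = nn.DataParallel(model)  # on a CPU-only machine this just calls the model
optimizer = torch.optim.SGD(wrapped.parameters(), lr=0.01)

# Placeholder batch; with several GPUs it is split along the first dimension.
x = torch.rand(8, 10, device=device)
y = torch.rand(8, 1, device=device)

optimizer.zero_grad()
loss = nn.functional.mse_loss(wrapped(x), y)
loss.backward()
optimizer.step()

# The wrapper's state dict uses a "module." prefix; the original model is at .module.
print(list(wrapped.state_dict().keys()))
print(list(wrapped.module.state_dict().keys()))
```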
