Multiprocessing - torch.multiprocessing.spawn

Hi,
Can somebody please answer the following questions:

  • can I create a model and a custom data iterator inside main_method?
  • will there be 4 data sets loaded into RAM / CPU memory?
  • will each “for batch_data in…” loop iterate independently?
  • will the model be updated, e.g., after every independent batch operation? Obviously I don’t want to end up with four independent models. What’s the process flow in this case, i.e. when are gradients updated, etc.?

I have seen this solution, but it uses a DataLoader (not a custom iterator) and the model is instantiated before the train method is called: https://github.com/pytorch/examples/tree/master/mnist_hogwild

import datetime

import torch
import torch.multiprocessing
import torch.nn as nn


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.l1 = nn.Linear(100, 50)
        self.l2 = nn.Linear(50, 2)

    def forward(self, x):
        return self.l2(self.l1(x))


class CustomDataClassIterator():
    def __init__(self):
        self.data = None
        self.batch_size = 10

    def __iter__(self):
        # placeholder: yield a fixed number of random (input, target) batches
        for _ in range(100):
            inputs = torch.randn(self.batch_size, 100)
            targets = torch.randint(0, 2, (self.batch_size,))
            yield inputs, targets


def main_method(i, args):
    print(i, datetime.datetime.now())
    model = Net()
    data = CustomDataClassIterator()
    for epoch in range(args['epoch_n']):
        for batch_data in data:
            pass  # some stuff


if __name__ == '__main__':
    args = {'epoch_n': 10}
    # args must be passed as a tuple, hence the trailing comma
    torch.multiprocessing.spawn(fn=main_method, args=(args,), nprocs=4)

can I create a model and a custom data iterator inside main_method?

Given the above example, it looks like you are creating a generator to produce the input data? If that is the case, yes, you can certainly do that.

will there be 4 data sets loaded into RAM / CPU memory?

I am assuming the pass # some stuff statement will be replaced by actual forward-backward-step calls? If that is the case, then the 4 data sets won’t all be loaded into memory at the same time. Instead, each batch is no longer needed once it has been consumed and can be garbage-collected at the end of every iteration.
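To make that concrete, here is a minimal sketch (the batch counts and shapes are made up) contrasting an eager iterator, which keeps every batch in RAM, with a lazy generator, where only the current batch is alive at any point:

import torch

def eager_batches(n_batches=1000, batch_size=10):
    # materializes everything up front: all n_batches tensors stay in RAM
    return [torch.randn(batch_size, 100) for _ in range(n_batches)]

def lazy_batches(n_batches=1000, batch_size=10):
    # yields one batch at a time: once the loop moves on, the previous
    # batch has no references left and can be garbage-collected
    for _ in range(n_batches):
        yield torch.randn(batch_size, 100)

for batch in lazy_batches():
    pass  # only `batch` (a single tensor) is alive here at any time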

will each “for batch_data in…” loop iterate independently?

Yes. Each process will run its own forward pass (building the autograd graph), backward pass (generating gradients and syncing them if necessary), and step function (updating the parameters).
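As an illustration of what that looks like inside each spawned process, here is a sketch that fills in the pass # some stuff placeholder from your snippet; the cross-entropy loss and SGD optimizer are assumptions, not something taken from your code:

import torch.nn.functional as F
import torch.optim as optim

def main_method(i, args):
    model = Net()                        # each process builds its own copy
    data = CustomDataClassIterator()
    optimizer = optim.SGD(model.parameters(), lr=0.01)
    for epoch in range(args['epoch_n']):
        for inputs, targets in data:
            optimizer.zero_grad()
            output = model(inputs)       # forward pass: builds the autograd graph
            loss = F.cross_entropy(output, targets)
            loss.backward()              # backward pass: fills each param's .grad
            optimizer.step()             # updates this process's own parameters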

will the model be updated, e.g., after every independent batch operation? Obviously I don’t want to end up with four independent models. What’s the process flow in this case, i.e. when are gradients updated, etc.?

When you call backward(), the gradients are accumulated into Tensor.grad, and it is up to you to decide when to call Optimizer.step() to apply those gradients to the parameters.
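For example, because backward() only accumulates into .grad, you are free to run several batches before applying a single update. A sketch of that, reusing Net and CustomDataClassIterator from above; the SGD optimizer and the accumulation interval of 4 are arbitrary choices for illustration:

import torch
import torch.nn.functional as F

model = Net()                            # Net / CustomDataClassIterator as defined above
data = CustomDataClassIterator()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
accum_steps = 4                          # arbitrary: apply one update every 4 batches

optimizer.zero_grad()
for step, (inputs, targets) in enumerate(data):
    loss = F.cross_entropy(model(inputs), targets)
    loss.backward()                      # adds this batch's grads into each param's .grad
    if (step + 1) % accum_steps == 0:
        optimizer.step()                 # apply the accumulated gradients
        optimizer.zero_grad()            # clear .grad for the next group of batches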

I saw you had a pointer to the hogwild training example. Could you please elaborate on your use case? Are you looking for distributed data parallel training (like nn.parallel.DistributedDataParallel), or specifically asking about hogwild?
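In case hogwild is what you want: the essential difference from your snippet is that the model is built once in the parent, its parameters are moved to shared memory, and the same module object is handed to every worker, so all processes update the same parameters. A simplified sketch of that pattern, reusing Net and CustomDataClassIterator from above and using mp.Process as the linked example does; the SGD optimizer and cross-entropy loss are assumptions:

import torch
import torch.multiprocessing as mp
import torch.nn.functional as F
import torch.optim as optim

def train(rank, args, model):
    # every worker trains the SAME parameters, which live in shared memory
    data = CustomDataClassIterator()
    optimizer = optim.SGD(model.parameters(), lr=0.01)
    for epoch in range(args['epoch_n']):
        for inputs, targets in data:
            optimizer.zero_grad()
            loss = F.cross_entropy(model(inputs), targets)
            loss.backward()
            optimizer.step()  # lock-free update of the shared parameters

if __name__ == '__main__':
    model = Net()
    model.share_memory()  # move parameters to shared memory before starting workers
    args = {'epoch_n': 10}
    processes = []
    for rank in range(4):
        p = mp.Process(target=train, args=(rank, args, model))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()

With DistributedDataParallel, by contrast, each process keeps its own replica of the model and the gradients are averaged across processes during backward(), so the replicas stay in sync.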