Multi-GPU with Custom Backward and Attributes

Hi,

I wrote a custom backward function for my model, and I want to use the DataParallel package. However, I run into a problem. If I use

model = torch.nn.DataParallel(model, device_ids=[0,1])

I get the following error:

“‘DataParallel’ object has no attribute ‘backward’”

I know this can be solved by using model.module.backward, but then it will only use one GPU. Is there a way to use torch.nn.DataParallel with a custom backward and custom attributes?

Would it be possible to return the outputs in your forward method and calculate the loss on the default device?
This would be the vanilla use case, while it seems you’ve implemented backward as a method on your model class?
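
Something along these lines (toy stand-ins for the real model, loss, and data, just to show the usual pattern):

import torch
import torch.nn as nn

# toy stand-ins for the real model, loss, and data
model = nn.Linear(10, 2).cuda()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

model = nn.DataParallel(model, device_ids=[0, 1])

inputs = torch.randn(32, 10, device="cuda:0")
targets = torch.randint(0, 2, (32,), device="cuda:0")

outputs = model(inputs)              # forward is scattered to both GPUs, gathered on cuda:0
loss = criterion(outputs, targets)   # loss is computed on the default device
loss.backward()                      # plain autograd backward, no custom method on the wrapper
optimizer.step()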

Thanks for the reply. No, the backward is not a separate class; it is a method defined inside the model class. Here is how I define it:

class myModel(nn.Module):
    def __init__(self, config):
        super().__init__()
        ....
    def forward(self, ...):
        ....
    def backward(self, ...):
        ....

And I call it this way:

outputs = model(....)
loss = outputs[0]
if args.n_gpu > 1:
    loss = loss.mean()
model.backward(...)

but nn.DataParallel does not recognize backward and some other attributes unless I go through .module.
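
Here is a minimal repro with a toy stand-in for my model (the real layers and config are omitted):

import torch
import torch.nn as nn

class myModel(nn.Module):                # toy stand-in for my actual model
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 2)
    def forward(self, x):
        return (self.fc(x).sum(),)       # returns a tuple, as in my real code
    def backward(self, loss):
        loss.backward()

model = nn.DataParallel(myModel().cuda(), device_ids=[0, 1])
outputs = model(torch.randn(8, 10, device="cuda:0"))
loss = outputs[0].mean()
model.backward(loss)    # AttributeError: 'DataParallel' object has no attribute 'backward'
# model.module.backward(loss) avoids the error, but then only one GPU is used, as described above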

Thanks for the information.

What’s the design decision to put the backward call inside your model?
Are you using some internal parameters?
If so, how are these parameters updated/used inside the model?
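
As a side note: if it is mainly the attribute lookup that fails, one common workaround is to subclass DataParallel and fall back to the wrapped module for unknown attributes. Note that this only makes backward (and the other attributes) reachable through the wrapper; it does not parallelize your custom backward pass:

import torch.nn as nn

class MyDataParallel(nn.DataParallel):
    # delegate attributes that DataParallel itself does not have to the wrapped module
    def __getattr__(self, name):
        try:
            return super().__getattr__(name)
        except AttributeError:
            return getattr(self.module, name)

# usage: model = MyDataParallel(model, device_ids=[0, 1])
# model.backward(...) then resolves to model.module.backward(...)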

I needed to access the activations and the activation gradients in backward. I collect the activations in the forward pass and access them in the backward pass, then use autograd's backward function to compute each layer's backward and make the changes I want along the way.
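
Roughly, the pattern looks like this. In this stripped-down sketch I use retain_grad plus a single loss.backward() call to expose the per-layer activation gradients; in my real code I step through the layers with autograd and do more work, but the idea is the same (layer names and sizes are placeholders):

import torch
import torch.nn as nn

class myModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(10, 10) for _ in range(3))
        self.activations = []              # filled during forward, read during backward

    def forward(self, x):
        self.activations = [x]
        for layer in self.layers:
            x = layer(x)
            x.retain_grad()                # keep .grad on these non-leaf activations
            self.activations.append(x)
        return x

    def backward(self, loss):
        loss.backward()                    # ordinary autograd pass fills act.grad
        for act in self.activations[1:]:
            grad = act.grad                # per-layer activation gradient
            # ... inspect / modify things based on grad here ...

model = myModel()
out = model(torch.randn(4, 10))
model.backward(out.sum())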

I tried DistributedDataParallel instead of DataParallel and it works.
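
For reference, the setup now looks roughly like this, with one process per GPU (myModel is the sketch from my previous post; the backend, address, and port are placeholders for my actual launcher settings):

import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

def run(rank, world_size):
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    model = myModel().to(rank)                # one full replica per process
    ddp_model = DDP(model, device_ids=[rank])

    batch = torch.randn(8, 10, device=rank)   # placeholder batch
    out = ddp_model(batch)                    # forward goes through the DDP wrapper
    ddp_model.module.backward(out.sum())      # custom backward runs on the local replica;
                                              # DDP's hooks still all-reduce the parameter grads

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(run, args=(world_size,), nprocs=world_size)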