Multi-GPU with Custom Backward and Attributes


I wrote a custom backward function for my model and want to run it with torch.nn.DataParallel. However, if I wrap the model with

model = torch.nn.DataParallel(model, device_ids=[0,1])

I get the following error:

'DataParallel' object has no attribute 'backward'

I know this can be solved by calling model.module.backward, but then it will only use one GPU. Is there a way to use torch.nn.DataParallel with a custom backward and custom attributes?

Would it be possible to return the outputs in your forward method and calculate the loss on the default device?
This would be the vanilla use case, whereas it seems you’ve implemented backward as a method of a class?
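The vanilla use case mentioned above can be sketched roughly as follows. This is a minimal illustration, not the poster's code: TinyModel, train_step, the layer sizes, and the MSE loss are all assumptions made for the example. The model returns only its outputs, and the loss is computed on the default device after nn.DataParallel has gathered them.

```python
import torch
import torch.nn as nn


class TinyModel(nn.Module):  # hypothetical stand-in for the real model
    def __init__(self, in_features=4, out_features=2):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)

    def forward(self, x):
        # Return raw outputs only; the loss is computed outside the model.
        return self.linear(x)


def train_step(model, inputs, targets, criterion):
    # With several GPUs, DataParallel scatters the batch, runs the replicas,
    # and gathers the outputs on the default device.
    outputs = model(inputs)
    loss = criterion(outputs, targets)
    # Plain autograd backward; no custom backward method is involved.
    loss.backward()
    return loss


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.DataParallel(TinyModel().to(device))
inputs = torch.randn(8, 4, device=device)
targets = torch.randn(8, 2, device=device)
loss = train_step(model, inputs, targets, nn.MSELoss())
```

On a machine without GPUs the wrapper simply calls the underlying module, so the same script also runs on CPU.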

Thanks for the reply. No, backward is not a separate class. It is a method inside the model class. Here is how I define it:

class myModel(torch.nn.Module):
    def __init__(self, config):
    def forward(...):
    def backward(...):

And I call it this way:

outputs = model(....)
loss = outputs[0]  
if args.n_gpu > 1:
    loss = loss.mean()
model = model.backward(...)

However, once the model is wrapped in nn.DataParallel, backward and some other attributes are no longer found unless I access them through model.module.
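One pattern sometimes used for the missing-attribute part of the problem is a small subclass that falls back to the wrapped module when a lookup fails. The sketch below is only an illustration: AttrForwardingDataParallel, TinyModel, and the note attribute are hypothetical names, and the backward body is a placeholder. It does not parallelize the custom method itself, since a forwarded method runs on the original module rather than on the replicas.

```python
import torch
import torch.nn as nn


class AttrForwardingDataParallel(nn.DataParallel):  # hypothetical helper name
    def __getattr__(self, name):
        try:
            # Normal lookup: parameters, buffers, submodules (including "module").
            return super().__getattr__(name)
        except AttributeError:
            if name == "module":
                raise  # avoid infinite recursion before self.module is set
            # Fall back to the wrapped model's own attributes and methods.
            return getattr(self.module, name)


class TinyModel(nn.Module):  # hypothetical stand-in for the real model
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)
        self.note = "custom attribute"

    def forward(self, x):
        return self.linear(x)

    def backward(self, loss):
        # Placeholder for custom backward logic; here it just defers to autograd.
        loss.backward()


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
wrapped = AttrForwardingDataParallel(TinyModel().to(device))
out = wrapped(torch.randn(8, 4, device=device))
wrapped.backward(out.sum())  # resolved via the fallback to wrapped.module
```

Because the forward pass still went through the wrapper, the autograd call inside backward propagates through whatever replicas were used in that forward pass.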

Thanks for the information.

What’s the design decision to put the backward call inside your model?
Are you using some internal parameters?
If so, how are these parameters updated/used inside the model?

I needed to access the activations and their gradients in backward. I collect the activations in the forward pass and access them in backward. I use the autograd backward function to compute each layer’s backward pass and make the changes I want along the way.
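For comparison, activations and activation gradients can also be captured with hooks while keeping the standard loss.backward() call. This is an alternative sketch, not the method described above: ActivationRecorder is a hypothetical helper, it only records nn.Linear layers, and it has been written for a single device rather than for replicas.

```python
import torch
import torch.nn as nn


class ActivationRecorder:  # hypothetical helper, not part of PyTorch
    def __init__(self, model):
        self.activations = {}
        self.gradients = {}
        self.handles = []
        for name, layer in model.named_modules():
            if isinstance(layer, nn.Linear):
                handle = layer.register_forward_hook(self._make_hook(name))
                self.handles.append(handle)

    def _make_hook(self, name):
        def forward_hook(module, inputs, output):
            # Store the activation produced by this layer.
            self.activations[name] = output.detach()
            if output.requires_grad:
                # Tensor hook: called with d(loss)/d(output) during backward.
                # Returning None leaves the gradient unchanged; returning a
                # tensor here would replace it.
                def grad_hook(grad):
                    self.gradients[name] = grad.detach()
                output.register_hook(grad_hook)
        return forward_hook

    def remove(self):
        for handle in self.handles:
            handle.remove()


model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 1))
recorder = ActivationRecorder(model)
loss = model(torch.randn(8, 4)).sum()
loss.backward()
recorder.remove()
```

After the backward call, recorder.activations and recorder.gradients are keyed by the layer names from named_modules(), which are "0" and "2" for this Sequential.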

I tried DistributedDataParallel instead of DataParallel, and it works.
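The thread does not show how backward was invoked under DistributedDataParallel, so the following is only a sketch under assumptions: it reaches the custom method through .module, uses a hypothetical TinyModel whose backward simply calls loss.backward(), and runs a single CPU process with the gloo backend so it can be tried without GPUs. In a real multi-GPU job there would be one process per GPU, each with its own rank.

```python
import os
import tempfile

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


class TinyModel(nn.Module):  # hypothetical stand-in for the real model
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)

    def forward(self, x):
        return self.linear(x)

    def backward(self, loss):
        # Placeholder for custom backward logic built on autograd.
        loss.backward()


def run_demo():
    with tempfile.TemporaryDirectory() as tmp:
        # Single-process group for illustration; a real job would launch
        # world_size processes, typically via torchrun.
        dist.init_process_group(
            backend="gloo",
            init_method="file://" + os.path.join(tmp, "store"),
            rank=0,
            world_size=1,
        )
        try:
            ddp_model = DDP(TinyModel())
            out = ddp_model(torch.randn(8, 4))    # forward through the wrapper
            ddp_model.module.backward(out.sum())  # custom method via .module
            return ddp_model.module.linear.weight.grad.clone()
        finally:
            dist.destroy_process_group()


grad = run_demo()
```

Since each process owns one full copy of the model, calling a method on .module does not restrict the job to one GPU the way it does under DataParallel. Gradient synchronization relies on the forward pass going through the DDP wrapper and on gradients being accumulated into .grad by a backward call.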