I wrote a custom backward function for my model and I want to use the DataParallel package, but I have run into a problem. If I use
model = torch.nn.DataParallel(model, device_ids=[0, 1])
I get the following error:
“‘DataParallel’ object has no attribute ‘backward’”
I know this can be worked around with model.module.backward, but then only one GPU is used. Is there a way to use torch.nn.DataParallel with a custom backward and other custom attributes?
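For context on where the error comes from: a DataParallel-style wrapper stores the model as .module and only forwards the call itself, so methods you defined on the model are invisible on the wrapper. A torch-free sketch of the mechanism (the class names here are illustrative stand-ins, not the real nn.DataParallel implementation):

```python
class DataParallelLike:
    """Minimal stand-in for a DataParallel-style wrapper: it keeps the
    model as .module and only forwards the call itself, so custom
    methods defined on the model are not visible on the wrapper."""
    def __init__(self, module):
        self.module = module

    def __call__(self, *args, **kwargs):
        return self.module(*args, **kwargs)


class Model:
    def __call__(self, x):
        return x * 2           # stands in for forward

    def backward(self, loss):  # custom method the wrapper knows nothing about
        return loss - 1


wrapped = DataParallelLike(Model())
print(wrapped(3))                   # forward still works: prints 6
try:
    wrapped.backward(5)             # AttributeError: wrapper has no 'backward'
except AttributeError:
    print("no attribute 'backward'")
print(wrapped.module.backward(5))   # the .module workaround: prints 4
```

This is why going through model.module works: the attribute lives on the wrapped model, not on the wrapper.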
Would it be possible to return the outputs in your forward method and calculate the loss on the default device? That would be the vanilla use case, whereas it seems you’ve implemented backward as a class function?
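That vanilla pattern could look roughly like this (a toy model and MSE loss as stand-ins; the names are illustrative, not from the original code):

```python
import torch
import torch.nn as nn

class ToyModel(nn.Module):  # hypothetical stand-in for the real model
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)   # forward returns outputs only; no loss logic here

model = ToyModel()
if torch.cuda.device_count() > 1:
    # replicas run forward in parallel; outputs are gathered on device 0
    model = nn.DataParallel(model)

x = torch.randn(8, 4)
target = torch.randn(8, 2)
outputs = model(x)
loss = nn.functional.mse_loss(outputs, target)  # loss on the default device
loss.backward()  # one ordinary backward through the gathered outputs
```

With this structure there is no need to call a method on the wrapper at all: autograd drives the backward pass through the scattered replicas on its own.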
Thanks for the reply. No, the backward is not a separate class; it is a method inside the model class. Here is how I define it:
def __init__(self, config):
And I call it this way:
outputs = model(....)
loss = outputs
if args.n_gpu > 1:
    loss = loss.mean()
model.backward(...)
but nn.DataParallel does not recognize backward and some other attributes unless I go through .module.
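If you want to keep calling those methods on the wrapped model, a common workaround (a pattern often shared on these forums, not an official API) is to subclass DataParallel so that unknown attribute lookups fall through to the wrapped module:

```python
import torch
import torch.nn as nn

class DataParallelPassthrough(nn.DataParallel):
    """Fall through to the wrapped module for attributes that
    nn.DataParallel itself does not define."""
    def __getattr__(self, name):
        try:
            return super().__getattr__(name)
        except AttributeError:
            return getattr(self.module, name)

class ToyModel(nn.Module):  # hypothetical model with a custom method
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

    def custom_backward(self):  # stands in for the custom backward method
        return "called on the underlying module"

model = DataParallelPassthrough(ToyModel())
print(model.custom_backward())  # reachable without spelling out .module
```

Note this only fixes the attribute lookup: the custom method still executes once on the underlying module, so it is not parallelized across GPUs the way forward is.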
Thanks for the information.
What’s the design decision behind putting the backward call inside your model? Are you using some internal parameters? If so, how are these parameters updated/used inside the model?
I needed access to the activations and activation gradients in backward. I collect the activations during the forward pass and access them in backward. I use autograd’s backward function to compute each layer’s backward pass and make the changes I want in the process.
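That pattern can be sketched roughly like this (a toy two-layer model, assuming retain_grad() is enough to expose the activation gradients; the real layers and the modifications made in backward will differ):

```python
import torch
import torch.nn as nn

class ModelWithCustomBackward(nn.Module):
    """Hypothetical sketch: cache activations in forward, then reuse
    them and their gradients in a custom backward step."""
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 8)
        self.fc2 = nn.Linear(8, 2)
        self.activations = []        # filled during forward

    def forward(self, x):
        self.activations.clear()
        h = torch.relu(self.fc1(x))
        h.retain_grad()              # keep the gradient of this activation
        self.activations.append(h)
        return self.fc2(h)

    def backward(self, loss):
        loss.backward()              # ordinary autograd backward
        # activation gradients are now available to inspect or modify
        return [a.grad for a in self.activations]

model = ModelWithCustomBackward()
loss = model(torch.randn(8, 4)).sum()
grads = model.backward(loss)
```

Incidentally, state stashed on self during forward is exactly what breaks under nn.DataParallel: each forward runs on throwaway replicas of the module, so self.activations on the original module never gets filled.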
I tried DistributedDataParallel instead of DataParallel and it is working.
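For reference, a minimal single-process CPU sketch of the DistributedDataParallel setup (gloo backend; the address and port are placeholders, and real multi-GPU use launches one process per device):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process setup for illustration only
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = DDP(torch.nn.Linear(4, 2))
# Each process holds the full module, so model.module (with any custom
# methods) runs locally per process, instead of on per-forward replicas
# the way nn.DataParallel does it.
loss = model(torch.randn(8, 4)).sum()
loss.backward()

dist.destroy_process_group()
```

This fits the observation above: with DDP, each process owns one complete copy of the model, so per-process state like cached activations survives the forward pass.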