Hey, I have a question about the backward() function.

Let's say I have two computation graphs, g1 and g2, which are unlinked and on separate hosts. The gradients of g2 are computed using a loss function L. However, the output of g1, A (models[0] in the code below), is the input of g2 (models[1]). After running L.backward() I have the gradient of A, DofA. I want to feed this loss backward into g1 by giving DofA to g1 and running backward() from that gradient. Is this possible?

In the example below the computation graph is linked because everything runs on the same host. However, let's imagine the models are on separate hosts and the loss needs to be communicated backward to models[0] from models[1]. Is there a way to compute gradients on models[0] given the gradient of its output?

``````
import torch
from torch import nn, optim

# A Toy Dataset
x = torch.tensor([[0,0,0,0],[1,0,0,0],[0,1,0,0],[0,0,1,0],[1,1,0,0],[1,0,1,0],[0,1,1,0],[1,1,1,0],[0,0,0,1],[1,0,0,1],[0,1,0,1],[0,0,1,1],[1,1,0,1],[1,0,1,1],[0,1,1,1],[1,1,1,1.]])
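# Placeholder labels, assumed for illustration only: one label per row of x.
# Only the final [1.] is given; the 15 zeros are NOT known values.
target = torch.cat([torch.zeros(15, 1), torch.ones(1, 1)])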

#   Variables for performance metrics
epochs = 20
lr = 0.2
counter = 0

# Define 2 chained models
models = [
    nn.Sequential(
        nn.Linear(4, 3),
        nn.Tanh()
    ),
    nn.Sequential(
        nn.Linear(3, 1),
        nn.Sigmoid()
    )
]

# Create optimisers for each segment and link to their segment
optimizers = [
    optim.SGD(params=model.parameters(), lr=lr)
    for model in models
]

def train():
    # Training Logic
    for epoch in range(epochs):

        # 1) erase previous gradients (if they exist)
        for opt in optimizers:
            opt.zero_grad()

        # 2) make a prediction
        a = models[0](x)

        # Janky Pseudocode (not runnable, tensors have no .send())
        # a.send(models[1].location)
        # End Janky Pseudocode

        pred = models[1](a)

        # 3) calculate how much we missed
        loss = ((pred - target)**2).sum()

        # 4) figure out which weights caused us to miss
        loss.backward()

        # Pseudocode for functionality I want (not runnable)
        # DofA.send(models[0].location)
        # a = models[0](x)
        # a.backward()
        # Pseudocode over

        # 5) change the weights
        for opt in optimizers:
            opt.step()

        # 6) print our progress
        print(loss.data)

train()
``````

Hi,

Yes, sure. You just need to do that one step of backprop manually:

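``````
        # 1) erase previous gradients (if they exist)
        for opt in optimizers:
            opt.zero_grad()

        # 2) make a prediction
        a = models[0](x)

        # Janky Pseudocode
        # You stop the autograd here
        a_to_send = a.detach()
        # remote_a = a_to_send.send(models[1].location)
        # Single-host stand-in for the transfer: make remote_a a leaf
        # tensor so that its gradient is recorded in remote_a.grad
        remote_a = a_to_send.requires_grad_()
        # End Janky Pseudo code

        pred = models[1](remote_a)

        # 3) calculate how much we missed
        loss = ((pred - target)**2).sum()

        # 4) figure out which weights caused us to miss
        # This only reaches models[1] and remote_a
        loss.backward()

        # Pseudocode for functionality I want
        # DofA = remote_a.grad.send(models[0].location)
        DofA = remote_a.grad
        # Note that the a here is the output of models[0]
        a.backward(DofA)
        # Pseudocode over

        # 5) change the weights
        for opt in optimizers:
            opt.step()

        # 6) print our progress
        # Do not use .data
        print(loss.detach())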
``````
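To sanity-check the recipe above on a single machine, here is a minimal sketch that splits the backward pass in two and compares the result with an ordinary backward through one linked graph. The names g1, g2, remote_a and grad_a and the random toy data are assumptions made for this illustration. The transfer between hosts is only simulated by detaching the tensor; a real setup would have to serialize and send a and its gradient itself.

```python
import copy

import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical toy data, used only for this check
x = torch.randn(8, 4)
target = torch.rand(8, 1)

# Two chained segments with the same shapes as in the thread
g1 = nn.Sequential(nn.Linear(4, 3), nn.Tanh())
g2 = nn.Sequential(nn.Linear(3, 1), nn.Sigmoid())

# Reference copies that will be trained through one linked graph
g1_ref, g2_ref = copy.deepcopy(g1), copy.deepcopy(g2)

# "Host 1": forward through g1; its graph stays alive on this side
a = g1(x)

# "Host 2": receives a plain tensor with no autograd history
remote_a = a.detach().clone().requires_grad_()
loss = ((g2(remote_a) - target) ** 2).sum()
loss.backward()             # fills g2's gradients and remote_a.grad
grad_a = remote_a.grad      # dLoss/da, the tensor to send back

# "Host 1": resume backprop from the received gradient
a.backward(grad_a)

# Reference: ordinary backward through the linked graph
loss_ref = ((g2_ref(g1_ref(x)) - target) ** 2).sum()
loss_ref.backward()

split_params = list(g1.parameters()) + list(g2.parameters())
linked_params = list(g1_ref.parameters()) + list(g2_ref.parameters())
grads_match = all(
    torch.allclose(p.grad, p_ref.grad, atol=1e-6)
    for p, p_ref in zip(split_params, linked_params)
)
print("split and linked gradients match:", grads_match)
```

The only thing that has to cross the host boundary in the backward direction is grad_a, which has the same shape as a.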

Thanks for the response! If I understand correctly…

When computing backward from a given variable, a:

``````
a.backward()
``````

I can instead supply a gradient for a and backpropagate from it using:

``````
a.backward(grad_a)
``````

Is this correct?

Actually, you are always supposed to give it. It's just that when the tensor contains a single element, it is natural to use a gradient of 1 for it, so backward() fills that in for you and you get the gradients of that scalar.
If your tensor contains more than one element, the call will fail and ask you for the gradient (see the doc).
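As a small illustration of the point about the gradient argument, the sketch below (tensor names and values are made up for the example) shows that a scalar output accepts backward() with no argument, which is equivalent to passing a gradient of 1, while a non-scalar output raises a RuntimeError until a gradient of matching shape is supplied.

```python
import torch

w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Scalar output: no argument means an implicit gradient of 1
(w ** 2).sum().backward()
implicit = w.grad.clone()           # d(sum w^2)/dw = 2w

# Same thing with the gradient of 1 passed explicitly
w.grad = None
(w ** 2).sum().backward(torch.tensor(1.0))
explicit = w.grad.clone()

# Non-scalar output: calling backward() without a gradient fails
w.grad = None
y = w ** 2
try:
    y.backward()
    raised = False
except RuntimeError:
    raised = True

# With a gradient v of the same shape as y, backward() computes the
# vector-Jacobian product, here v * 2w elementwise
y = w ** 2
v = torch.tensor([1.0, 0.5, 0.0])
y.backward(v)
vjp = w.grad.clone()

print(implicit, explicit, raised, vjp)
```

This is the same mechanism as a.backward(grad_a) above: grad_a plays the role of v, the gradient of the loss with respect to a.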
