# Custom Function - Open questions

I have a few questions about implementing custom Functions. We need something other than plain gradient descent, so we are trying to implement it within PyTorch's forward/backward machinery while computing more than just gradients. However, even on a very simple example, PyTorch currently crashes with:

``````
Traceback (most recent call last):
  File "/home/alex/work/python/nn-second-order/bin/test.py", line 94, in <module>
    loss.backward()
  File "/usr/local/lib/python3.4/dist-packages/torch/autograd/variable.py", line 145, in backward
RuntimeError: could not compute gradients for some functions
``````

The code we are using looks like this:

``````
import torch


class Gn_SquareLoss(torch.autograd.Function):
    def forward(self, x, y):
        self.save_for_backward(x, y)
        diff = (x - y)
        return diff.pow(2)

    def backward(self, grad_output):
        # N x D
        x, y = self.saved_tensors
        diff = (x - y) * 2
        # N x D
        # Assumption: g holds the ordinary gradients w.r.t. x and y
        g = [grad_output * diff, -grad_output * diff]
        # N x D
        d = g[0].size()[1]
        # D x D
        m = [torch.eye(d, d) * 2, torch.eye(d, d) * 3]
        # (N+D) x D
        s1 = torch.cat((m[0], g[0]), 0)
        # (N+D) x D
        s2 = torch.cat((m[0], g[0]), 0)
        return s1, s2

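class Gn_Dot(torch.autograd.Function):
    def forward(self, x, w):
        self.save_for_backward(x, w)
        # Assumption: the forward pass is the plain matrix product
        return torch.mm(x, w)

    def backward(self, grad_output):
        # N x H, H x D
        x, w = self.saved_tensors
        # (N + D) x D
        # Assumption: the last N rows are the gradient, the rows above are the extra block
        d = grad_output.size()[0] - x.size()[0]
        # D x D
        m = grad_output[:d]
        # N x D
        g = grad_output[d:]
        # N x H
        dx = torch.mm(g, torch.transpose(w, 0, 1))
        # H x D
        dw = torch.mm(torch.transpose(x, 0, 1), g)
        # D x H
        m = torch.mm(m, torch.transpose(w, 0, 1))
        # (D + N) x H
        dx = torch.cat((m, dx), 0)
        return dx, dw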

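class Gn_Tanh(torch.autograd.Function):
    def forward(self, x):
        self.save_for_backward(x)
        # Assumption: the forward pass is the plain element-wise tanh
        return torch.tanh(x)

    def backward(self, grad_output):
        x, = self.saved_tensors
        # Assumption: the last N rows are the gradient, the rows above are the extra block
        d = grad_output.size()[0] - x.size()[0]
        m = grad_output[:d]
        g = grad_output[d:]
        dx = torch.cat((m, g * (1 - torch.tanh(x).pow(2))), 0)
        return dx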

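dtype = torch.FloatTensor
# dtype = torch.cuda.FloatTensor # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold input and outputs, and wrap them in Variables.
# Assumption: shapes follow from the sizes above and the forward pass below.
x = torch.autograd.Variable(torch.randn(N, D_in).type(dtype), requires_grad=False)
y = torch.autograd.Variable(torch.randn(N, D_out).type(dtype), requires_grad=False)

# Create random Tensors for weights, and wrap them in Variables.
w1 = torch.autograd.Variable(torch.randn(D_in, H).type(dtype), requires_grad=True)
w2 = torch.autograd.Variable(torch.randn(H, D_out).type(dtype), requires_grad=True)

affine = Gn_Dot()
tanh = Gn_Tanh()
square_loss = Gn_SquareLoss()

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y using operations on Variables; the affine
    # layers, tanh and the loss use our custom autograd Functions.
    y1 = affine(x, w1)
    h1 = tanh(y1)
    y2 = affine(h1, w2)
    loss1 = (y - y2).pow(2).sum()
    loss = square_loss(y2, y).sum()
    print(t, loss.data[0], loss1.data[0])

    # Manually zero the gradients before running the backward pass
    # Assumption: the weights' .grad buffers are zeroed in place
    w1.grad.data.zero_()
    w2.grad.data.zero_()

    # Use autograd to compute the backward pass.
    loss.backward()

    # Update weights using gradient descent
    # Assumption: a plain in-place gradient step on both weight matrices
    w1.data -= learning_rate * w1.grad.data
    w2.data -= learning_rate * w2.grad.data
``````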

Note that, to satisfy PyTorch's requirement that `backward` return as many gradients as `forward` has inputs, we concatenate the extra quantity we compute (for testing purposes, an identity matrix) onto the gradient, and then peel it off again in each layer.
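The packing and peeling described above can be sketched in isolation with plain tensors. This is only an illustration, assuming the current `torch.cat(..., dim=0)` spelling; the helper names `pack` and `unpack` are hypothetical and do not appear in the code above.

```python
import torch


def pack(extra, grad):
    """Stack the extra block on top of the N x D gradient (hypothetical helper)."""
    return torch.cat((extra, grad), dim=0)


def unpack(packed, n_rows):
    """Split a packed tensor back into (extra, grad) (hypothetical helper).

    n_rows is the number of gradient rows (the batch size N); every row
    above them belongs to the extra block.
    """
    split = packed.size(0) - n_rows
    return packed[:split], packed[split:]


N, D = 4, 3
grad = torch.randn(N, D)   # stand-in for the ordinary gradient
extra = torch.eye(D) * 2   # stand-in for the extra quantity

packed = pack(extra, grad)  # (D + N) x D
m, g = unpack(packed, N)    # D x D and N x D
```

Splitting by the known number of gradient rows, rather than by the column count, keeps the split correct even when the extra block is not square.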

The main issue is that this should work, yet the error PyTorch gives us is a mystery. My guess is that something is happening in the C API, but I do not know what.

There are plenty of small bugs in your code. If you replace the functions with

``````
affine = torch.mm
tanh = torch.tanh
square_loss = torch.nn.MSELoss()
``````

and switch to your own functions one at a time, you will get better error messages.
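As a reference point for this swap-in approach, the same two-layer network can be run with built-in operations only. This is a sketch assuming a current PyTorch release, where tensors carry `requires_grad` directly and `Variable` is not needed; the smaller sizes are chosen only to keep it quick, and `reduction="sum"` is used so the value matches the `(y - y2).pow(2).sum()` check in the original loop.

```python
import torch

torch.manual_seed(0)
N, D_in, H, D_out = 8, 20, 10, 3

x = torch.randn(N, D_in)
y = torch.randn(N, D_out)
w1 = torch.randn(D_in, H, requires_grad=True)
w2 = torch.randn(H, D_out, requires_grad=True)

# Built-in stand-ins for the three custom Functions.
affine = torch.mm
tanh = torch.tanh
square_loss = torch.nn.MSELoss(reduction="sum")

y2 = affine(tanh(affine(x, w1)), w2)
loss = square_loss(y2, y)
loss.backward()  # fills w1.grad and w2.grad with ordinary gradients
```

Once this baseline runs, a single stand-in can be replaced by its custom counterpart and the run repeated.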

A few points by the way in `Gn_Tanh`:

• in your case, most of the time you want to do `def backward(self, grad_output)` instead of `def backward(self, *grad_output)`
• you should unpack the saved tensors (it is a tuple, not a tensor), so `x, = self.saved_tensors`
• you are getting the size of `grad_input.size()[1]`, but then using it to index the 0th dimension. Is that right?
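For comparison, here is how the first two points look for a plain tanh Function that returns only the ordinary gradient. It is a sketch using the static-method `torch.autograd.Function` API of current PyTorch releases rather than the instance-method style in the post, and the class name `PlainTanh` is hypothetical.

```python
import torch


class PlainTanh(torch.autograd.Function):
    """Hypothetical tanh Function that returns only the ordinary gradient."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.tanh(x)

    @staticmethod
    def backward(ctx, grad_output):  # a single gradient, not *grad_output
        x, = ctx.saved_tensors       # saved_tensors is a tuple, so unpack it
        return grad_output * (1 - torch.tanh(x).pow(2))


# gradcheck compares the analytic backward against finite differences;
# it expects double-precision inputs.
x = torch.randn(5, 4, dtype=torch.double, requires_grad=True)
ok = torch.autograd.gradcheck(PlainTanh.apply, (x,))
```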

It's interesting, though, that the error messages get lost when everything is put together.


Thanks for the fast reply, and I'm always happy to learn!

1. I cannot swap the functions in one at a time, because what each one returns as its "gradient" is the gradient concatenated with an identity matrix. If I plug in only one of my Functions, the graph breaks because the shapes no longer match.

2. Thanks, I've now changed everything to `backward(self, grad_output)` and unpack the saved tensors from the tuple (I updated the code in the post as well).

3. I should have mentioned that the error occurs immediately after the return statement of `Gn_SquareLoss.backward`, so none of the other functions' backward methods get called.

4. To your question: if the standard gradient of the loss is `NxD`, I append an identity matrix along the 0th axis and so return `(N+D)xD`. Then, in the backward of each of the other functions, I unpack that into a `DxD` and an `NxD` tensor via indexing, as you said.

Hope this makes it clear.
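The shapes in point 4 can be traced through one affine layer with plain tensors. This is a sketch outside of autograd, assuming the current `dim=0` keyword; the names `packed`, `m_next` and `packed_dx` are illustrative only.

```python
import torch

N, H, D = 4, 5, 3
x = torch.randn(N, H)  # layer input
w = torch.randn(H, D)  # layer weights

g = torch.randn(N, D)                         # ordinary gradient w.r.t. the layer output
packed = torch.cat((torch.eye(D), g), dim=0)  # (D + N) x D, identity stacked on top

m, g_out = packed[:D], packed[D:]             # D x D and N x D
dx = torch.mm(g_out, w.t())                   # N x H
dw = torch.mm(x.t(), g_out)                   # H x D
m_next = torch.mm(m, w.t())                   # D x H
packed_dx = torch.cat((m_next, dx), dim=0)    # (D + N) x H
```

Note that `packed_dx` has D + N rows while `x` has N rows, so the value handed back for `x` no longer has the shape of `x`.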