I have a bit of an odd question that requires some background information first.

EDIT: Google Colab

I’m currently working on research that uses backprop through discrete optimization algorithms (e.g., A* for shortest path), and I have code that works perfectly for that. I’ll give short code examples below, but can see about uploading a complete example later if required.

Essentially, a network takes in a picture and outputs an NxN grid of values, which signify the cost of traversing into each tile. These tile costs, together with given start/goal location indices, are then used in a shortest-path algorithm. To track gradients through the shortest-path algorithm, the solver is run twice, the second time with some noise added. The output of the shortest-path solver is a binary mask of the path to travel.

```
def a_star(w, start, goal):
    ...

class ShortestPath(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w, start, goal):
        path_mask = a_star(w, start, goal)
        # save ctx values ...
        return path_mask

    @staticmethod
    def backward(ctx, grad_output):
        # get values from ctx ...
        w_prime = w + grad_output * eps  # wiggle w with specific noise
        path_mask_prime = a_star(w_prime, start, goal)
        gradient = -(path_mask - path_mask_prime)
        return gradient, None, None

# running example
predicted_w = network(img)  # NxN tile costs, used as weights in A*
predicted_path = ShortestPath.apply(predicted_w, start, goal)
...
loss = HammingLoss(predicted_path, actual_path_mask)
...
```

This works fine, and the method follows a previous paper; my results match theirs, so I know it works.
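For reference, here is a tiny self-contained sketch of the working setup. The `dummy_solver` (an argmin over tiles, just to have something discrete) is a stand-in for my A*, and the names `net`/`img` are illustrative, not my actual code. Gradients reach the network here because `predicted_w` is computed outside the custom `Function`:

```python
import torch

def dummy_solver(w):
    # stand-in for A*: a binary mask selecting the cheapest tile
    mask = torch.zeros_like(w)
    mask.view(-1)[torch.argmin(w)] = 1.0
    return mask

eps = 1.0

class ShortestPath(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w):
        mask = dummy_solver(w)
        ctx.save_for_backward(w, mask)
        return mask

    @staticmethod
    def backward(ctx, grad_output):
        w, mask = ctx.saved_tensors
        w_prime = w + eps * grad_output      # wiggle costs with the incoming gradient
        mask_prime = dummy_solver(w_prime)
        return -(mask - mask_prime)

net = torch.nn.Linear(4, 16)
img = torch.randn(4)
w = net(img).view(4, 4)                      # tile costs computed OUTSIDE the Function
path = ShortestPath.apply(w)
path.sum().backward()
print(net.weight.grad is not None)           # True: gradients flow back to the network
```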

Now, suppose that we are working with a large pathfinding problem, i.e., computing `predicted_w` for the entire grid might be wasteful, or maybe we want to use a more complicated network. Since the gradient of `ShortestPath` is a difference of binary masks, we really only need to compute the tile costs which are relevant to our search, not for the entire space (the gradient is 0 for tiles which are not part of the solution path in either of the 2 runs of the algorithm).

What I would like to do is initialize `predicted_w` as a zero tensor, use the network to output single cost values on the fly inside my `a_star`, and set the corresponding index of `predicted_w` with that output tensor, so that the gradients from the network are tracked (each index of `predicted_w` would come from a separate output of the network).
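In ordinary autograd (outside a custom `Function`), this kind of in-place assignment of a network output into a zeros tensor does track gradients, which is the behavior I expected. A minimal sketch, with toy names (`net`, `x`) rather than my actual code:

```python
import torch

net = torch.nn.Linear(2, 1)
x = torch.randn(2)

w = torch.zeros(3, 3)            # requires_grad=False, so in-place writes are allowed
w[0, 0] = net(x).squeeze()       # index assignment is recorded by autograd
w.sum().backward()

print(net.weight.grad is not None)  # True: the write into w kept the graph
```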

```
def a_star(w, start, goal, network, noise):
    ...
    # compute tile costs on the fly, only for tiles the search visits
    tile_cost = network(img, ...)
    w[x, y] = tile_cost
    ...

class ShortestPath(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w, start, goal, network):
        path_mask = a_star(w, start, goal, network, None)
        # save ctx values ...
        return path_mask

    @staticmethod
    def backward(ctx, grad_output):
        # get values from ctx ...
        noise = grad_output * eps  # wiggle w with specific noise
        path_mask_prime = a_star(w, start, goal, network, noise)
        gradient = -(path_mask - path_mask_prime)
        return gradient, None, None, None

# running example
predicted_w = torch.zeros(N, N)
predicted_path = ShortestPath.apply(predicted_w, start, goal, network)
...
```

This sets the correct values for `predicted_w`, but when I run `loss.backward()`, I get gradient values of `None` for the layers of my network. Is what I am trying to do possible? Again, I can try to get a running gist together if that would help.
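Here is a minimal repro of the symptom, stripped of the A* details (the names `Probe`/`net` are just for illustration). Calling the network inside `forward` of a custom `autograd.Function` leaves no graph back to the network's weights, which I believe is what I'm running into, since `forward` runs with grad recording disabled:

```python
import torch

net = torch.nn.Linear(2, 2)

class Probe(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        # same pattern as network(...) inside a_star: a network call
        # made inside a custom Function's forward
        return net(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output

x = torch.randn(2, requires_grad=True)
Probe.apply(x).sum().backward()
print(net.weight.grad)  # None: the call inside forward was not tracked
print(x.grad is not None)  # True: only the Function's declared input gets a grad
```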