I’ll try my best to simplify what I am trying to accomplish. While trying to build a decision tree in PyTorch, my gradient is lost. I’d like to know whether there is a way I can still calculate my loss and manually apply a gradient to earlier layers.

I first pass the features of my data set through a linear layer. I have a tensor (requires_grad=True) which is used as a “split_value”. Using this value I split the incoming tensor with .ge() and .lt(). This is where I lose my gradient: after reassembling the output tensor I can calculate a loss, but I’m not sure how to reach back into the chain to apply it to the “split_value” or the linear layer.

Simple example code (this uses random values, so I realize it would never converge):

```
import numpy as np
import torch

class_values = [0.0, 0.5, 1.0]
feature_data = torch.rand((20, 10))
target_data = torch.tensor(np.random.choice(class_values, 20))
# torch.autograd.Variable is deprecated; a tensor with requires_grad=True does the same job
split_value = torch.rand(1, requires_grad=True)
mix_feature_layer = torch.nn.Linear(10, 5)

# Here I am splitting based on the 0th-index feature output from the linear layer.
# This is where I lose access to the gradient.
true_labels = target_data[mix_feature_layer(feature_data)[:, 0].ge(split_value)]
false_labels = target_data[mix_feature_layer(feature_data)[:, 0].lt(split_value)]

# Let the loss be the combined variance from the true and false branches of the split.
# Ideally the split value should be trained to group the labels so that each branch
# has little to no variance.
loss = 0
loss += torch.var(true_labels) if true_labels.size(0) > 1 else 0
loss += torch.var(false_labels) if false_labels.size(0) > 1 else 0
```
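A quick, self-contained way to confirm where the graph breaks (variable names here are stand-ins, not the exact ones above): the .ge() comparison produces a boolean mask with no gradient, and the labels being indexed never required a gradient in the first place, so the resulting loss is completely detached.

```python
import torch

target_data = torch.rand(20)                # stand-in labels; they carry no grad
split_value = torch.rand(1, requires_grad=True)
layer = torch.nn.Linear(10, 5)

scores = layer(torch.rand(20, 10))[:, 0]    # differentiable w.r.t. the layer
mask = scores.ge(split_value)               # comparison: the gradient is dropped here
true_labels = target_data[mask]             # indexing labels with a bool mask

loss = torch.var(true_labels) if true_labels.numel() > 1 else torch.tensor(0.0)
print(mask.dtype)            # torch.bool — comparisons are non-differentiable
print(loss.requires_grad)    # False — no path back to split_value or the layer
```

So calling loss.backward() here would do nothing useful: autograd never sees split_value or the linear layer's weights.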

The question becomes: how do I apply this loss back so that it affects split_value and the weights of the linear layer?
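One common workaround (my suggestion, not something the original code does) is to make the split soft: instead of a hard .ge() mask, weight each sample by a sigmoid of its distance from split_value and compute a weighted variance per branch. Everything stays differentiable, so the loss reaches both split_value and the layer. The sharpness constant below is a free choice.

```python
import torch

torch.manual_seed(0)
feature_data = torch.rand(20, 10)
target_data = torch.tensor([0.0, 0.5, 1.0]).repeat(7)[:20]
split_value = torch.rand(1, requires_grad=True)
layer = torch.nn.Linear(10, 5)

def weighted_var(labels, w, eps=1e-8):
    # variance of labels under soft membership weights w
    w_sum = w.sum() + eps
    mean = (w * labels).sum() / w_sum
    return (w * (labels - mean) ** 2).sum() / w_sum

scores = layer(feature_data)[:, 0]
# soft membership: ~1 when score >> split_value, ~0 when score << split_value
p_true = torch.sigmoid((scores - split_value) * 10.0)  # 10.0 = sharpness (assumption)
loss = weighted_var(target_data, p_true) + weighted_var(target_data, 1 - p_true)
loss.backward()

print(split_value.grad)         # non-None: the gradient now reaches split_value
print(layer.weight.grad.shape)  # the layer's weights get a gradient too
```

At evaluation time the split can be hardened back to scores >= split_value; the sigmoid only needs to be soft during training.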