I’ll try my best to simplify what I am trying to accomplish. While building a decision tree in PyTorch, my gradient is lost at the split step. I’d like to know whether there is a way I can still calculate my loss and manually apply a gradient to earlier layers.
I first pass the features of my data set through a Linear layer. I have a Variable (requires_grad=True) that determines a “split_value”. Using this value I split the incoming tensor with .ge() and .lt(); this is where I lose my gradient. After reassembling the output tensor I can calculate a loss, but I’m not sure how to reach back through the chain to apply it to the split_value or the Linear layer.
Simple example code (this uses random values, so I realize this example would never converge):
```python
import numpy as np
import torch
from torch.autograd import Variable

class_values = [0.0, 0.5, 1.0]
feature_data = torch.rand((20, 10))
target_data = torch.tensor(np.random.choice(class_values, (20,)))
split_value = Variable(torch.rand(1), requires_grad=True)
mix_feature_layer = torch.nn.Linear(10, 5)

# Here I am splitting based on the 0th-index feature output from the Linear layer.
# This is where I lose access to the gradient.
true_labels = target_data[mix_feature_layer(feature_data)[:, 0].ge(split_value)]
false_labels = target_data[mix_feature_layer(feature_data)[:, 0].lt(split_value)]

# Let the loss be the combined amount of variance from both the true and false
# branches of the split. Ideally the split value should be trained to group the
# labels so that each branch has little to no variance.
loss = 0
loss += torch.var(true_labels) if true_labels.size(0) > 1 else 0
loss += torch.var(false_labels) if false_labels.size(0) > 1 else 0
```
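To show concretely where the graph is cut, here is a stripped-down sketch (variable names simplified from the snippet above): the comparison produces a boolean mask, and indexing the targets (which carry no gradient themselves) with that mask yields a tensor that is disconnected from both the Linear layer and split_value, so the resulting loss does not even require grad.

```python
import torch

torch.manual_seed(0)

split_value = torch.rand(1, requires_grad=True)
layer = torch.nn.Linear(10, 5)
features = torch.rand(20, 10)
targets = torch.rand(20)  # labels: no gradient of their own

scores = layer(features)[:, 0]   # still differentiable w.r.t. layer weights
mask = scores.ge(split_value)    # boolean mask: comparison is non-differentiable
true_branch = targets[mask]      # indexing targets drops layer/split_value from the graph

loss = torch.var(true_branch)
print(loss.requires_grad)        # False: the chain back to split_value is broken
```

Calling loss.backward() here would raise a RuntimeError, since no element of the graph requires grad.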
The question becomes: how do I apply this loss back so that it affects split_value and the weights of the Linear layer?