PyTorch Geometric's softmax function seems to break the GAT layer

I have copied the GAT layer into a separate file. I want to get it working first and then experiment with it. However, I cannot get it to run. The problem seems to come from this line:

    alpha = softmax(alpha, index, ptr, size_i)

in the message function. The ptr argument seems to be needed by the softmax function, whose docstring says:

    ptr (LongTensor, optional): If given, computes the softmax based on
        sorted inputs in CSR representation. (default: :obj:`None`)
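
To make sure I read the docstring correctly, here is a minimal pure-Python sketch of what I understand by a grouped softmax driven either by index or by a CSR-style ptr. The helper names softmax_by_index and softmax_by_ptr are my own and are not part of PyTorch Geometric; the sketch assumes the inputs are already sorted by group, which is what the docstring requires for ptr.

```python
import math

def softmax_by_index(values, index):
    """Softmax within groups, where index[k] is the group id of values[k]."""
    totals = {}
    for v, i in zip(values, index):
        totals[i] = totals.get(i, 0.0) + math.exp(v)
    return [math.exp(v) / totals[i] for v, i in zip(values, index)]

def softmax_by_ptr(values, ptr):
    """Softmax within contiguous groups; group k is values[ptr[k]:ptr[k + 1]]."""
    out = []
    for start, end in zip(ptr[:-1], ptr[1:]):
        total = sum(math.exp(v) for v in values[start:end])
        out.extend(math.exp(v) / total for v in values[start:end])
    return out

values = [1.0, 2.0, 3.0, 0.5, 0.5]  # one attention score per edge
index = [0, 0, 0, 1, 1]             # target node of each edge (sorted)
ptr = [0, 3, 5]                     # CSR boundaries for the same grouping

a = softmax_by_index(values, index)
b = softmax_by_ptr(values, ptr)
```

If this reading is right, index and ptr are two descriptions of the same grouping, and either one should be enough to compute the attention coefficients.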

However, when I use my local copy of the GAT layer in a model and run it for node classification, I get:

~/my_test/custom_layers/ in forward(self, x, edge_index, size, return_attention_weights)
    136         # propagate_type: (x: OptPairTensor, alpha: OptPairTensor)
--> 137         out = self.propagate(edge_index, x=(x_l, x_r),
    138                              alpha=(alpha_l, alpha_r), size=size)

~/anaconda3/envs/py38/lib/python3.8/site-packages/torch_geometric/nn/conv/ in propagate(self, edge_index, size, **kwargs)
    255         # Otherwise, run both functions in separation.
    256         if mp_type == 'edge_index' or self.__fuse__ is False:
--> 257             msg_kwargs = self.__distribute__(self.__msg_params__, kwargs)
    258             out = self.message(**msg_kwargs)

~/anaconda3/envs/py38/lib/python3.8/site-packages/torch_geometric/nn/conv/ in __distribute__(self, params, kwargs)
    178             if data is inspect.Parameter.empty:
    179                 if param.default is inspect.Parameter.empty:
--> 180                     raise TypeError(f'Required parameter {key} is empty.')
    181                 data = param.default
    182             out[key] = data

TypeError: Required parameter ptr_i is empty.
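
As far as I can tell from the traceback, the check inspects the signature of message and raises when a parameter has neither a supplied value nor a default. The following is a simplified sketch of that logic, not the library code: distribute, message_required and message_optional are hypothetical names, and the handling of the _i and _j suffixes that the real method performs is left out.

```python
import inspect

def distribute(func, kwargs):
    """Collect the arguments func declares, mimicking the check in the traceback."""
    out = {}
    for key, param in inspect.signature(func).parameters.items():
        data = kwargs.get(key, inspect.Parameter.empty)
        if data is inspect.Parameter.empty:
            # Nothing was supplied: fall back to the default, or fail if none exists.
            if param.default is inspect.Parameter.empty:
                raise TypeError(f'Required parameter {key} is empty.')
            data = param.default
        out[key] = data
    return out

def message_required(x_j, index, ptr):       # ptr has no default value
    return x_j

def message_optional(x_j, index, ptr=None):  # ptr falls back to None
    return x_j

available = {'x_j': [1.0, 2.0], 'index': [0, 0]}  # the caller supplies no ptr

try:
    distribute(message_required, available)
    error = None
except TypeError as exc:
    error = str(exc)

filled = distribute(message_optional, available)
```

In this sketch the first call fails with the same kind of message as above, while the second one quietly fills ptr with None. So the error seems to mean that nothing upstream provides a value for that parameter.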

I don't understand exactly what is going on.

EDIT: It works if I delete the ptr parameter from the message function and use just:

    def message(self, x_j: Tensor, alpha_j: Tensor, alpha_i: OptTensor,
                index: Tensor, size_i: Optional[int]) -> Tensor:
        alpha = alpha_j if alpha_i is None else alpha_j + alpha_i
        alpha = F.leaky_relu(alpha, self.negative_slope)
        alpha = softmax(alpha, index, size_i)
        self._alpha = alpha
        alpha = F.dropout(alpha, p=self.dropout, training=self.training)
        return x_j * alpha.unsqueeze(-1)

However, I am not sure this is the right fix, since I am deleting something I don't understand.