I am trying to understand the code of the Graph Attention Network implementation, but I am stuck at the following chunk of code:
```python
if isinstance(in_channels, int):
    self.lin_l = Linear(in_channels, heads * out_channels, bias=False)
    self.lin_r = self.lin_l
else:
    self.lin_l = Linear(in_channels[0], heads * out_channels, False)
    self.lin_r = Linear(in_channels[1], heads * out_channels, False)
```
From the documentation we understand that:
> in_channels (int or tuple): Size of each input sample. A tuple
> corresponds to the sizes of source and target dimensionalities.
But the attention coefficient is computed between two graph nodes that have equal feature dimensionality, so what exactly are `lin_l` and `lin_r` needed for? Why can't we compute just one linear layer (as in the first branch of the `if`)?
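For context, here is a minimal sketch of how I understand the tuple (bipartite) case: source and target nodes have different raw feature sizes, so two separate projections map them into the same `heads * out_channels` space. The variable names and dimensions here are my own illustration, not taken from the library:

```python
import torch
from torch.nn import Linear

heads, out_channels = 2, 8
in_channels = (16, 32)  # hypothetical (source dim, target dim) for a bipartite graph

# Mirroring the snippet above: two projections because the input dims differ
lin_l = Linear(in_channels[0], heads * out_channels, bias=False)  # source nodes
lin_r = Linear(in_channels[1], heads * out_channels, bias=False)  # target nodes

x_src = torch.randn(5, in_channels[0])  # 5 source nodes with 16 features
x_dst = torch.randn(3, in_channels[1])  # 3 target nodes with 32 features

# Both end up in the same (heads * out_channels)-dimensional space,
# so attention scores can be computed across the two node sets.
h_l = lin_l(x_src)  # shape (5, 16)
h_r = lin_r(x_dst)  # shape (3, 16)
print(h_l.shape, h_r.shape)
```

So I can see that the tuple case needs two layers; my question is why the code keeps two names (`lin_l = lin_r`) even in the homogeneous `int` case.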