How can I perform spatio-temporal link prediction on an unweighted graph with features on the edges?
My graph is a `DynamicGraphTemporalSignal`. I do not want to regress on a signal; instead, I want to perform link prediction, as mentioned in https://pytorch-geometric-temporal.readthedocs.io/en/latest/_modules/torch_geometric_temporal/nn/recurrent/gc_lstm.html?highlight=link
However, even after looking at the example code in https://github.com/benedekrozemberczki/pytorch_geometric_temporal/blob/4fb610326811906eb4f95152a80f5df3f62fb459/test/recurrent_test.py#L305 it is still unclear to me how to create a suitable data loader.
So far, I have looked at all the available loaders in https://github.com/benedekrozemberczki/pytorch_geometric_temporal/tree/master/torch_geometric_temporal/dataset and think that the Twitter tennis data loader https://github.com/benedekrozemberczki/pytorch_geometric_temporal/blob/master/torch_geometric_temporal/dataset/twitter_tennis.py is the most suitable one.
However, even this one seems to be tied to a regression formulation.
> NOTICE: the edges have features such as location, time, and hobby, but I am happy to predict new links from a purely topological perspective. When a predicted link is evaluated against the test dataset, its features (time, location, hobby) would not need to match the ground truth.
Please find the code to produce the dummy data below.
It also contains the transformations I have come up with so far from looking at existing data loaders.
It remains unclear to me, however, how to structure the data loader for a link prediction problem rather than a regression.
<img width="618" alt="JupyterLab" src="https://user-images.githubusercontent.com/1694964/118352217-7c30e880-b560-11eb-8a91-e8d1879b7a23.png">
```
import numpy as np
import pandas as pd
from IPython.display import display  # available by default in Jupyter

n_people = 20
links = 80
seed = 47
time_steps = 10
hobby_categories = 3

np.random.seed(seed)
# Note: the upper bound of np.random.randint is exclusive, so the node ids
# drawn here range from 0 to n_people - 2 (same pattern for time and hobby).
df_dummy = pd.DataFrame({
    'person_1': np.random.randint(0, n_people - 1, links),
    'person_2': np.random.randint(0, n_people - 1, links),
    'time': np.random.randint(0, time_steps - 1, links),
    'lat': np.random.uniform(10, 19, links),
    'lng': np.random.uniform(45, 49, links),
    'hobby': np.random.randint(0, hobby_categories - 1, links),
})
display(df_dummy.head())

# Collapse the node pair and the edge attributes into one list per row.
df_dummy['edge_index'] = df_dummy[['person_1', 'person_2']].values.tolist()
df_dummy['edge_features'] = df_dummy[['lat', 'lng', 'hobby']].values.tolist()
df_dummy = df_dummy[['time', 'edge_index', 'edge_features']]
display(df_dummy.head())

# One row per time step, holding the lists of all edges and their features.
df_dummy = df_dummy.groupby('time').agg(list)
display(df_dummy.head())
```
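To make the question more concrete, here is a rough sketch of how the same dummy data could be turned into one edge array and one edge-feature array per time step, with the edges of the next observed time step used as the link prediction target. The helper `build_snapshots` and the names `df_edges`, `inputs`, and `targets` are hypothetical and not part of the library. Whether these lists can be passed to `DynamicGraphTemporalSignal` as they are is an assumption I have not verified.

```python
import numpy as np
import pandas as pd

# Rebuild the dummy edge list with the same seed and bounds as above.
n_people, links, time_steps, hobby_categories = 20, 80, 10, 3
np.random.seed(47)
df_edges = pd.DataFrame({
    'person_1': np.random.randint(0, n_people - 1, links),
    'person_2': np.random.randint(0, n_people - 1, links),
    'time': np.random.randint(0, time_steps - 1, links),
    'lat': np.random.uniform(10, 19, links),
    'lng': np.random.uniform(45, 49, links),
    'hobby': np.random.randint(0, hobby_categories - 1, links),
})


def build_snapshots(df):
    """Hypothetical helper: split the edge list into per-time-step arrays.

    Returns the sorted time steps, a list of (2, E_t) edge arrays, and a
    list of (E_t, 3) edge-feature arrays.
    """
    times, edge_indices, edge_features = [], [], []
    for t, group in df.groupby('time'):  # groupby iterates in sorted time order
        times.append(t)
        edge_indices.append(group[['person_1', 'person_2']].to_numpy().T)
        edge_features.append(group[['lat', 'lng', 'hobby']].to_numpy(dtype=float))
    return times, edge_indices, edge_features


times, edge_indices, edge_features = build_snapshots(df_edges)

# Assumed link prediction setup: the snapshot at one time step is the input,
# and the edges of the next observed time step are the target.
inputs = list(zip(edge_indices[:-1], edge_features[:-1]))
targets = edge_indices[1:]
```

With this layout the target is a set of edges rather than a node-level signal, which is the part I do not see how to express with the existing loaders.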
## edit
I guess the target must somehow be the edge index of a future time step, or perhaps of any future time step, depending on the forecast horizon?
Perhaps something along these lines would work? In this example, an unbounded prediction horizon is assumed.
```
current_timestep = 3
# Collect the edges of every time step after the current one (unbounded horizon).
future_edges_target = df_dummy.loc[current_timestep + 1:, 'edge_index'].explode()
# As the graph is undirected, sorting the endpoints of each edge is fine.
future_edges_target = future_edges_target.apply(lambda edge: tuple(sorted(edge))).unique()
future_edges_target
# Output:
array([(8, 8), (1, 7), (7, 11), (2, 16), (5, 18), (11, 18), (6, 11),
(2, 5), (14, 16), (6, 6), (16, 18), (9, 17), (1, 2), (0, 6),
(0, 5), (0, 15), (9, 18), (9, 12), (1, 17), (2, 14), (8, 13),
(1, 18), (1, 8), (14, 17), (5, 6), (3, 6), (11, 14), (10, 17),
(4, 14), (7, 12), (0, 18), (13, 15), (9, 15)], dtype=object)
```
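If the horizon should be bounded, and if the model needs negative examples as well, something like the following might work. This is only a sketch on a tiny hand-written edge list. The helpers `canonical`, `future_positive_edges`, and `sample_negative_edges` are hypothetical names, and treating every node pair that does not appear in the future window as a negative is my own assumption, not something taken from the library.

```python
import numpy as np


def canonical(edge):
    """Order the endpoints so that an undirected edge has a single representation."""
    u, v = edge
    return (u, v) if u <= v else (v, u)


def future_positive_edges(edges_by_time, current_timestep, horizon=None):
    """Edges seen in (current_timestep, current_timestep + horizon].

    horizon=None means an unbounded horizon, as in the example above.
    """
    positives = set()
    for t, edges in edges_by_time.items():
        if t <= current_timestep:
            continue
        if horizon is not None and t > current_timestep + horizon:
            continue
        positives.update(canonical(e) for e in edges)
    return positives


def sample_negative_edges(positives, n_nodes, n_samples, rng):
    """Sample node pairs (no self-loops) that are not among the positives.

    Enumerates all pairs, which is fine for a toy graph but quadratic in n_nodes.
    """
    candidates = [(u, v) for u in range(n_nodes) for v in range(u + 1, n_nodes)
                  if (u, v) not in positives]
    n_samples = min(n_samples, len(candidates))
    picked = rng.choice(len(candidates), size=n_samples, replace=False)
    return [candidates[i] for i in picked]


# Toy data: time step -> list of undirected edges between 5 nodes.
edges_by_time = {0: [(0, 1), (2, 1)], 1: [(1, 3)], 2: [(3, 1), (0, 4)], 3: [(2, 4)]}

next_step = future_positive_edges(edges_by_time, current_timestep=0, horizon=1)
unbounded = future_positive_edges(edges_by_time, current_timestep=0)

rng = np.random.default_rng(47)
negatives = sample_negative_edges(unbounded, n_nodes=5, n_samples=len(unbounded), rng=rng)

# Binary classification view: 1 for future edges, 0 for sampled non-edges.
pairs = sorted(unbounded) + negatives
labels = [1] * len(unbounded) + [0] * len(negatives)
```

Here `next_step` contains only `(1, 3)`, while `unbounded` also contains `(0, 4)` and `(2, 4)`. The labels ignore the edge features entirely, which matches the purely topological evaluation described in the notice above.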