Hello everybody!

I’m a medicinal chemistry undergraduate student preparing my dissertation. My idea is to build a classifier that labels anticancer drugs as active or inactive and sorts the active ones into three classes, with each molecule described as a graph. My supervisor suggested that I use a random forest classifier, and to do this I need to convert each graph into a vector while keeping as many of its characteristics as possible.

I start from a molecular graph dataset like this:

```
Data(x=[9, 9], edge_index=[2, 18], edge_attr=[18, 2], y=[0], smiles='COC(=O)C=CN1CC1')
```

where `x`, `edge_index` and `edge_attr` are torch tensors, and `y` is the label (`0` is inactive, `1` is activity of class one, `2` is activity of class two, …). To run a random forest classifier, I think I need to convert each graph into a fixed-length vector such as an `np.array`. To do that, I think I need to compute a kernel for my graphs, but I have no idea how to do it. Has anyone had experience with this task?
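To make the target concrete, here is a minimal sketch of the kind of conversion I mean. It is not a graph kernel: it only pools node and edge features (sum, mean, max) and adds a few size and degree statistics, so a lot of structure is lost. The function name `graph_to_vector` and the toy feature values are hypothetical, and the sketch assumes every graph has at least one node and one edge.

```python
import numpy as np

def graph_to_vector(x, edge_index, edge_attr):
    """Pool a variable-size graph into one fixed-length vector (hypothetical helper)."""
    x = np.asarray(x, dtype=float)                  # [num_nodes, num_node_features]
    edge_index = np.asarray(edge_index)             # [2, num_edges]
    edge_attr = np.asarray(edge_attr, dtype=float)  # [num_edges, num_edge_features]

    num_nodes = x.shape[0]
    num_edges = edge_index.shape[1]
    # Out-degree of each node, counted from the source row of edge_index.
    degree = np.bincount(edge_index[0], minlength=num_nodes)

    # The output length depends only on the feature dimensions,
    # not on how many atoms or bonds the molecule has.
    return np.concatenate([
        x.sum(axis=0), x.mean(axis=0), x.max(axis=0),
        edge_attr.sum(axis=0), edge_attr.mean(axis=0),
        [num_nodes, num_edges, degree.mean(), degree.max()],
    ])

# Toy 3-node graph with made-up features (2 per node, 2 per edge).
x = np.array([[6, 0], [6, 1], [8, 0]])
edge_index = np.array([[0, 1, 1, 2], [1, 0, 2, 1]])
edge_attr = np.array([[1, 0], [1, 0], [2, 0], [2, 0]])

vec = graph_to_vector(x, edge_index, edge_attr)
print(vec.shape)  # (14,)
```

For one of my `Data` objects, the tensors could be passed as `data.x.numpy()`, `data.edge_index.numpy()` and `data.edge_attr.numpy()`. Stacking the vectors of all molecules with `np.stack` would then give a 2D array that something like scikit-learn's `RandomForestClassifier().fit(X, y)` accepts.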