Hello everybody!
I’m a medicinal chemistry undergraduate student preparing my dissertation. My idea is to build a classifier that labels anticancer drugs as active or inactive and further sorts the active ones into three classes, with each molecule represented as a graph. My supervisor suggested that I use a random forest classifier, and for that I need to convert each graph into a vector while keeping as many of its characteristics as possible.
I start from a molecular graph dataset like this:
Data(x=[9, 9], edge_index=[2, 18], edge_attr=[18, 2], y=[0], smiles='COC(=O)C=CN1CC1')
where x, edge_index and edge_attr are torch tensors, and y is the label (0 is inactive, 1 is activity of class one, 2 is activity of class two, …). To run a random forest classifier I think I must convert each graph into a vector such as an np.array, and to do that I think I need to compute a kernel for my graphs, but I have no idea how to do it. Has anyone had experience with this task?
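To make the question concrete, here is a minimal sketch of one possible direction: an explicit feature map in the style of the Weisfeiler-Lehman subtree kernel, hashed into a fixed-length vector. It is only an illustration under assumptions, not a recommended pipeline. The function name wl_feature_vector, the parameters n_iter and dim, and the choice of one label per atom (for example the atomic number, or a whole row of x turned into a tuple) are all hypothetical, and edge_attr is ignored here. Tensors from a Data object would first need to be converted, e.g. with data.edge_index.numpy().

```python
import hashlib

import numpy as np


def wl_feature_vector(node_labels, edge_index, n_iter=3, dim=1024):
    """Hashed Weisfeiler-Lehman subtree counts for a single graph.

    node_labels: one label per node (anything str() can render).
    edge_index:  2 x E array of directed edges, both directions listed.
    Returns a vector of length `dim` that does not depend on node order.
    """
    labels = [str(label) for label in node_labels]
    n = len(labels)

    # Adjacency lists built from the directed edge list.
    neighbors = [[] for _ in range(n)]
    for src, dst in zip(edge_index[0], edge_index[1]):
        neighbors[int(src)].append(int(dst))

    vec = np.zeros(dim, dtype=np.float64)

    def count(label):
        # Hashing trick: map each label to one of `dim` bins.
        digest = hashlib.md5(label.encode()).hexdigest()
        vec[int(digest, 16) % dim] += 1

    for label in labels:
        count(label)

    for _ in range(n_iter):
        # New label = own label plus the sorted multiset of neighbor labels.
        combined = [
            labels[i] + "|" + ",".join(sorted(labels[j] for j in neighbors[i]))
            for i in range(n)
        ]
        # Compress so the label strings do not grow with each iteration.
        labels = [hashlib.md5(c.encode()).hexdigest() for c in combined]
        for label in labels:
            count(label)

    return vec


# Toy example: C-O-C as a 3-node path, with both edge directions listed,
# as in a PyTorch Geometric edge_index. Labels are atomic numbers.
toy_labels = [6, 8, 6]
toy_edges = np.array([[0, 1, 1, 2], [1, 0, 2, 1]])
X = np.stack([wl_feature_vector(toy_labels, toy_edges)])
print(X.shape)  # (1, 1024)
```

Stacking one such vector per molecule gives a matrix X with one row per graph, which together with the labels y could be passed to scikit-learn's RandomForestClassifier via fit(X, y). Since each entry also stores a SMILES string, fingerprints computed from the SMILES would be another way to get fixed-length vectors, although that drops the explicit graph tensors.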