I’m trying to predict whether a mutation in a protein makes the protein go bad. My input looks like ["ABCD", 0], meaning the mutation consisting of the amino acids “ABCD” makes the protein go bad (0). Similarly, ["ABCX", 1] would keep it functioning.
There are 21 possible amino acids, so my input is an array of length 21 with 4 ones and 17 zeros.
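To make the encoding concrete, here is a minimal sketch of how such a length-21 vector could be built. The alphabet string and the `multi_hot` helper are assumptions for illustration (20 standard one-letter codes plus "X" as a 21st symbol), not the code from the paste.

```python
# Hypothetical 21-symbol alphabet: 20 standard amino acids plus "X".
# The real alphabet and its ordering may differ.
ALPHABET = "ACDEFGHIKLMNPQRSTVWYX"
INDEX = {aa: i for i, aa in enumerate(ALPHABET)}

def multi_hot(mutation):
    """Return a length-21 list with a 1 for every amino acid present."""
    vec = [0] * len(ALPHABET)
    for aa in mutation:
        vec[INDEX[aa]] = 1
    return vec

x = multi_hot("ACDE")
print(len(x), sum(x))  # 21 entries, 4 of them set to 1
```

Note that this kind of vector only records which amino acids are present, not their positions, and a repeated letter would yield fewer than 4 ones.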
I build a dataset from this data and standardize the input using scikit-learn’s tools.
Now I try to get the probability of it being 0 or 1, i.e. bad or good, using a sigmoid, but the probabilities “converge” towards zero. They fall in roughly [0.0, 0.2] instead of spanning [0.0, 1.0].
So I use a threshold of t=0.1: if a value is bigger than 0.1 I call it “good”, otherwise “bad”. This gives me an F1 score of about 0.30.
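For reference, the thresholding and F1 computation described above can be sketched in plain Python. The probabilities and labels below are made-up illustration values, `f1_at_threshold` is a hypothetical helper, and treating class 1 ("good") as the positive class is an assumption.

```python
def f1_at_threshold(probs, labels, threshold):
    """F1 for the positive class (1 = "good") after thresholding."""
    preds = [1 if p > threshold else 0 for p in probs]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    if tp == 0:
        return 0.0  # no true positives: precision/recall undefined or zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Made-up sigmoid outputs squeezed into [0.0, 0.2], with made-up labels.
probs = [0.02, 0.05, 0.12, 0.18, 0.08, 0.15]
labels = [0, 0, 1, 1, 1, 0]

# Compare the hand-picked threshold with the usual default of 0.5.
for t in (0.1, 0.5):
    print(t, f1_at_threshold(probs, labels, t))
```

With outputs squeezed this close to zero, a 0.5 threshold predicts “bad” for everything, which is why the threshold had to be lowered by hand.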
I can’t get above 0.30, no matter what I do. I have tried for hours, but something is fundamentally wrong with my approach or code.
My code: View paste JWPQ