Dear community,
I’m trying to build a GPT-2 transformer from scratch (without any pre-train model) with DNA sequences in order to generate DNA sequences on top of smaller ones. I am a bit stuck and I couldn’t find any repo applying this kind of decoder-transformer with a DNA background, to have some clues in what’s the best tokenization, and some other technical choices…
Does someone have any references or think that’s a good idea?
Thank you in advance!