I'm trying to implement a decoder-only model, but the PyTorch implementation of TransformerDecoderLayer requires input from an encoder (`memory`) to perform cross-attention. Is there an implementation that uses only self-attention?
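For context, here's a minimal sketch of the workaround I was considering, assuming that `nn.TransformerEncoderLayer` (which has only self-attention and a feed-forward block, no cross-attention) combined with a causal mask can stand in for a decoder-only layer. Is this the right approach, or is there a more idiomatic way?

```python
import torch
import torch.nn as nn

# Assumption: a decoder-only block can be emulated with nn.TransformerEncoderLayer,
# since it contains only self-attention + feed-forward (no cross-attention),
# by passing a causal mask to forward().
d_model, nhead, num_layers = 512, 8, 6
batch, seq_len = 4, 16

layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
decoder_only = nn.TransformerEncoder(layer, num_layers=num_layers)

x = torch.randn(batch, seq_len, d_model)

# Causal (upper-triangular) mask so each position attends only to itself and earlier positions.
causal_mask = nn.Transformer.generate_square_subsequent_mask(seq_len)

out = decoder_only(x, mask=causal_mask)
print(out.shape)  # torch.Size([4, 16, 512])
```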