I’m looking at the nanoGPT code from https://github.com/karpathy/nanoGPT/blob/master/model.py#L99, which defines a single Transformer block built around causal self-attention:
class Block(nn.Module):

    def __init__(self, config):
        super().__init__()
        self.ln_1 = LayerNorm(config.n_embd, bias=config.bias)
        self.attn = CausalSelfAttention(config)
        self.ln_2 = LayerNorm(config.n_embd, bias=config.bias)
        self.mlp = MLP(config)

    def forward(self, x):
        x = x + self.attn(self.ln_1(x))
        x = x + self.mlp(self.ln_2(x))
        return x
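For context, nanoGPT stacks config.n_layer of these blocks and applies them one after another. Roughly (paraphrasing from memory, not the exact repo code; run_blocks is just an illustrative helper, not a function in the repo):

    # Sketch of how the blocks are stacked in nanoGPT's GPT module (paraphrased).
    blocks = nn.ModuleList([Block(config) for _ in range(config.n_layer)])

    def run_blocks(x):
        # Each block is a pre-norm residual unit: attention, then MLP.
        for block in blocks:
            x = block(x)
        return x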
I’m trying to represent the Transformer model in nanoGPT (decoder only) using one of PyTorch’s built-in layers. I thought of using nn.TransformerDecoder (PyTorch 2.1 documentation) at first, but the docs suggest it’s built to be used as a decoder that consumes an encoder’s output, i.e. the memory argument is required:
- memory (Tensor) – the sequence from the last layer of the encoder (required).
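To make the issue concrete, here is a minimal sketch of what I mean (my own snippet, just using the standard nn.TransformerDecoder API; the shapes are arbitrary):

    import torch
    import torch.nn as nn

    dec_layer = nn.TransformerDecoderLayer(d_model=16, nhead=4, batch_first=True)
    dec = nn.TransformerDecoder(dec_layer, num_layers=2)

    tgt = torch.randn(10, 30, 16)
    memory = torch.randn(10, 30, 16)  # stand-in for an encoder output I don't actually have

    out = dec(tgt, memory)  # works, but only because I fabricated `memory`
    # dec(tgt)              # fails: memory is a required positional argument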
I’m hoping that there’s some simple 1-2 line solution for this w/o me having to re-implement large parts of the code.
My initial attempt is to use TransformerEncoder, passing in is_causal=True. However, when I pass in is_causal=False, I get the same result. I am a bit confused:
    import torch
    import torch.nn as nn

    torch.manual_seed(23)
    x = torch.randn(10, 30, 16)  # (batch, seq_len, d_model)

    ys = {}
    for is_causal in (False, True):
        torch.manual_seed(21)  # same init for both runs
        enc_layer = nn.TransformerEncoderLayer(d_model=16, nhead=4, batch_first=True)
        enc = nn.TransformerEncoder(enc_layer, num_layers=2)
        ys[is_causal] = enc(x, is_causal=is_causal)

    print(torch.allclose(ys[False], ys[True]))  # prints 'True'
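For comparison, if I build the causal mask explicitly with nn.Transformer.generate_square_subsequent_mask and pass it via the mask argument, I would expect the masked and unmasked outputs to differ (again my own experiment, same toy setup as above):

    import torch
    import torch.nn as nn

    torch.manual_seed(23)
    x = torch.randn(10, 30, 16)

    # Upper-triangular -inf mask so position i can only attend to positions <= i.
    causal_mask = nn.Transformer.generate_square_subsequent_mask(30)

    torch.manual_seed(21)
    enc_layer = nn.TransformerEncoderLayer(d_model=16, nhead=4, batch_first=True)
    enc = nn.TransformerEncoder(enc_layer, num_layers=2)

    y_masked = enc(x, mask=causal_mask)  # explicit causal mask
    y_unmasked = enc(x)                  # no mask at all

    print(torch.allclose(y_masked, y_unmasked))  # I would expect False here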
I also found a tutorial on next-word prediction using TransformerEncoder, "Language Modeling with nn.Transformer and torchtext" in the PyTorch tutorials; however, it seems like the attention mask argument is not passed into the model’s forward() method there.