I see. So I’ve definitely been doing the wrong thing then. I padded the sequences and then passed each entire padded sequence into the linear layer, i.e.:

```
def __init__(self, padded_sequence_length, ...):
    self.fc = nn.Linear(padded_sequence_length, 512)
    ...

def forward(self, batch):
    # x is a batch of padded sequences e.g. x = [[32, 6, 88, 542, 56, 34, 511, 0, 0, 0], ...]
    x, y = batch
    input_embedding = self.fc(x)
    pos_encoder_output = self.pos_encoder(input_embedding)
    transformer_output = self.transformer_encoder(pos_encoder_output)
    ...
```

I got this from here. However, I’m realising that I should probably look more closely into the nn.Embedding layer.
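For reference, here’s a minimal sketch of what I understand nn.Embedding does (the vocabulary size of 1000 is just an assumed number for illustration): it’s a lookup table that maps each integer token ID to a learned vector, so the whole padded batch can go in at once.

```python
import torch
import torch.nn as nn

vocab_size = 1000  # assumed for illustration; should cover the largest token ID
d_model = 512

# One learned 512-dim vector per token ID; padding_idx=0 keeps the pad
# token's vector at zero and excludes it from gradient updates.
embedding = nn.Embedding(vocab_size, d_model, padding_idx=0)

# A batch of two padded sequences (0 is the pad token).
x = torch.tensor([
    [32, 6, 88, 542, 56, 34, 511, 0, 0, 0],
    [7, 91, 3, 0, 0, 0, 0, 0, 0, 0],
])

out = embedding(x)
print(out.shape)  # torch.Size([2, 10, 512]) -- one vector per token
```

Note that the input stays an integer (LongTensor) of token IDs, unlike nn.Linear which needs floats.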

Should I instead iterate over the sequence, feed each timestep into a linear layer, and then stack the results into a tensor, i.e.:

```
def __init__(self, ...):
    self.fc = nn.Linear(1, 512)
    ...

def forward(self, batch):
    # x is a batch of padded sequences e.g. x = [[32, 6, 88, 542, 56, 34, 511, 0, 0, 0], ...]
    x, y = batch
    all_input_embeddings = []
    for example in x:
        # each timestep is a scalar, so give it a feature dimension of 1
        single_example_input_embeddings = [self.fc(timestep.unsqueeze(-1)) for timestep in example]
        all_input_embeddings.append(torch.stack(single_example_input_embeddings))
    all_input_embeddings = torch.stack(all_input_embeddings)
    ...
```

Something like that?
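(If that per-timestep projection is ever what’s wanted, I gather the Python loop isn’t needed anyway, since nn.Linear only acts on the last dimension and broadcasts over the leading batch and sequence dimensions. A sketch, using the shapes from my example:)

```python
import torch
import torch.nn as nn

fc = nn.Linear(1, 512)

# One padded sequence of length 10; floats, since nn.Linear expects them.
x = torch.tensor([
    [32, 6, 88, 542, 56, 34, 511, 0, 0, 0],
], dtype=torch.float32)

# unsqueeze(-1) turns each scalar timestep into a 1-dim feature vector:
# (1, 10) -> (1, 10, 1), and the linear layer maps that last dim to 512.
out = fc(x.unsqueeze(-1))
print(out.shape)  # torch.Size([1, 10, 512])
```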

Looking over the notebook more carefully now.