Predicting pregnancy codes with a transformer

I'm trying to predict pregnancy codes with a basic transformer architecture. The codes run from prg001, prg002, ... up to prg030; for example, prg001 is antenatal screening and prg030 is the maternal outcome of delivery.

The source is a person's codes in one year (2022, for example) and the target is their codes in the following year (2023, for example).
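For example, one (source, target) pair looks roughly like this (a simplified sketch, not my exact preprocessing):

```python
# Simplified sketch of one (source, target) pair; the real
# tokenization/preprocessing differs in detail.
source = ["prg008", "prg012"]  # codes observed in 2022
target = ["prg030"]            # codes observed in 2023; empty if the
                               # pregnancy ended in the prior year
```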

My dataset has about 42k rows. About half of them (22k rows) have an empty target (no code, because the pregnancy ended in the previous year).

The model basically predicts the empty target no matter what sequence I test. For example, it doesn't predict that a person with prg008 in the prior year will have prg030 (the code that represents delivery) in the target, even though they would give birth in the following year.

Hyperparameters

max_seq_length = 20
max_gen_length = 20
learning_rate = 0.001
weight_decay = 0.01
dropout = 0.1
d_model = 128
nhead = 4
num_layers = 3
dim_feedforward = 512
batch_size = 256
train_num_samples = 800000
valid_num_samples = 200000
num_epochs = 10
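Roughly, these map onto PyTorch's built-in nn.Transformer like this (a minimal sketch, not my exact code; the vocabulary size is approximate):

```python
import torch.nn as nn

vocab_size = 34  # prg001..prg030 plus <pad>/<bos>/<eos>/<unk> (approximate)

class CodeTransformer(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, 128)       # d_model = 128
        self.transformer = nn.Transformer(
            d_model=128, nhead=4,
            num_encoder_layers=3, num_decoder_layers=3,  # num_layers = 3
            dim_feedforward=512, dropout=0.1,
            batch_first=True,
        )
        self.out = nn.Linear(128, vocab_size)
        # NOTE: positional encodings omitted here for brevity

    def forward(self, src, tgt):
        # causal mask so each target position only sees earlier positions
        tgt_mask = self.transformer.generate_square_subsequent_mask(
            tgt.size(1)).to(tgt.device)
        h = self.transformer(self.embed(src), self.embed(tgt),
                             tgt_mask=tgt_mask)
        return self.out(h)  # logits over the code vocabulary
```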

I'm using CrossEntropyLoss. Any suggestions on how I can improve this?

Dear random42,
If I were you, I would test the following.

  • Check that your implementation is correct and has no bugs, from the DataLoader all the way through loss.backward(); in particular, verify that backpropagation is actually happening (see the first sketch after this list).

  • If everything is correct, then use weighted cross entropy or focal loss so that the minority classes get higher weights in the loss calculation (see the second sketch after this list).
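For the first point, something like this can flag parameters that never receive a gradient (a sketch; call it right after loss.backward() on one batch):

```python
def check_gradients(model):
    # Every trainable parameter should have a non-None gradient with
    # non-zero magnitude after loss.backward(). Silent zeros often point
    # to a detached tensor or a masking bug upstream.
    for name, p in model.named_parameters():
        if p.requires_grad and (p.grad is None or p.grad.abs().sum() == 0):
            print(f"no gradient flowing to {name}")
```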
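For the second point, a minimal sketch of both options (all_target_tokens, vocab_size, and pad_idx are placeholders for whatever your setup uses):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Option 1: weighted cross entropy with "balanced" inverse-frequency
# weights. all_target_tokens is assumed to be a 1-D LongTensor holding
# every target token id in the training set.
counts = torch.bincount(all_target_tokens, minlength=vocab_size).float()
weights = counts.sum() / (vocab_size * counts.clamp(min=1.0))
criterion = nn.CrossEntropyLoss(weight=weights, ignore_index=pad_idx)

# Option 2: a focal loss sketch (Lin et al., 2017) built on top of
# cross entropy, which down-weights tokens the model already gets right.
class FocalLoss(nn.Module):
    def __init__(self, gamma=2.0, ignore_index=-100):
        super().__init__()
        self.gamma = gamma
        self.ignore_index = ignore_index

    def forward(self, logits, target):
        # per-token CE, unreduced, so each term can be reweighted
        ce = F.cross_entropy(logits, target,
                             ignore_index=self.ignore_index,
                             reduction="none")
        pt = torch.exp(-ce)  # probability the model gave the true class
        loss = (1.0 - pt) ** self.gamma * ce
        mask = target != self.ignore_index
        return loss[mask].mean()
```

Either one is a drop-in replacement for a plain CrossEntropyLoss, e.g. loss = criterion(logits.view(-1, vocab_size), target.view(-1)).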

Hi!
I used the same architecture on a simple text-reversal task and it worked correctly, so I think the architecture is implemented correctly.
Does PyTorch have a built-in FocalLoss? Or do you have an example of how to use weighted cross entropy?