How to design a decoder for time series regression with a Transformer?

I am using a Transformer for time series regression (not forecasting). My input has the shape [batch, seq_size, embedding_dim], and my output has the shape [batch, seq_size, 1]. However, the model always overfits and performs worse than an LSTM. I don't want to feed the target information into the decoder. Can anyone tell me how to design the decoder?

import torch.nn as nn

class Transformer(nn.Module):

    def __init__(self, d_input, n_head, d_model, n_layer):
        super().__init__()

        # data_position_embedding is a helper whose definition is not shown in this post.
        self.enc_embedding = self.data_position_embedding(c_in=d_input, d_model=d_model)
        # Note: batch_first defaults to False, so these layers expect [seq_size, batch, d_model].
        self.encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_head)
        self.encoder = nn.TransformerEncoder(self.encoder_layer, num_layers=n_layer)

        # Per-time-step regression head: d_model -> 1
        self.project = nn.Linear(d_model, 1)

    def forward(self, x):
        x_emb = self.enc_embedding(x)
        enc = self.encoder(x_emb)

        # No real decoder yet: the encoder output is passed straight through.
        dec = enc

        output = self.project(dec)
        return output


Loss curve: (image not included)


I have a similar scenario. Did you manage to solve the issue?

I have the exact same problem. Did you find out how this issue should be addressed? Thank you.
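
For a per-time-step regression with input [batch, seq_size, embedding_dim] and output [batch, seq_size, 1], one option that avoids feeding targets anywhere is to drop the decoder and put a linear head on the encoder output. The following is only a minimal sketch under assumptions, not a confirmed fix for the overfitting described above. The class names `SinusoidalPositionalEncoding` and `EncoderOnlyRegressor`, the `dropout` value, and the linear input projection are hypothetical stand-ins, since the original `data_position_embedding` helper is not shown. It also assumes an even `d_model` and a PyTorch version that supports `batch_first=True` (1.9 or later). Without `batch_first=True`, the encoder treats the first dimension as the sequence, so a batch-first input would be attended across the batch instead of across time.

```python
import math

import torch
import torch.nn as nn


class SinusoidalPositionalEncoding(nn.Module):
    """Adds fixed sine/cosine position terms to a [batch, seq, d_model] tensor.

    Hypothetical stand-in for the positional part of the original helper.
    Assumes d_model is even.
    """

    def __init__(self, d_model, max_len=5000):
        super().__init__()
        position = torch.arange(max_len).unsqueeze(1)  # [max_len, 1]
        div_term = torch.exp(
            torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model)
        )
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        # Buffer: moves with the module across devices but is not trained.
        self.register_buffer("pe", pe.unsqueeze(0))  # [1, max_len, d_model]

    def forward(self, x):
        return x + self.pe[:, : x.size(1)]


class EncoderOnlyRegressor(nn.Module):
    """Encoder-only Transformer with one regression output per time step."""

    def __init__(self, d_input, n_head, d_model, n_layer, dropout=0.1):
        super().__init__()
        # Assumption: a plain linear projection maps input features to d_model.
        self.input_proj = nn.Linear(d_input, d_model)
        self.pos_enc = SinusoidalPositionalEncoding(d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model,
            nhead=n_head,
            dropout=dropout,
            batch_first=True,  # inputs are [batch, seq, d_model]
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layer)
        self.head = nn.Linear(d_model, 1)

    def forward(self, x):
        h = self.pos_enc(self.input_proj(x))  # [batch, seq, d_model]
        h = self.encoder(h)                   # [batch, seq, d_model]
        return self.head(h)                   # [batch, seq, 1]


model = EncoderOnlyRegressor(d_input=8, n_head=4, d_model=32, n_layer=2)
x = torch.randn(16, 50, 8)  # [batch, seq_size, embedding_dim]
y = model(x)
print(y.shape)  # torch.Size([16, 50, 1])
```

Because no mask is passed, every time step attends to the whole sequence, which suits regression over a fully observed series. If the output at step t must only depend on steps up to t, a causal mask can be passed to the encoder through its `mask` argument.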

