Using Transformer Module for time series?

Hi, I’m using the PyTorch transformer module (i.e. from torch.nn.modules import Transformer) for time series forecasting, and I have a couple of questions about the tgt sequence as well as a few more general ones. I’m aware that during training we generally feed the transformer the actual target sequence (as opposed to generating the target sequence step by step, as other encoder-decoder methods do). For my first question: before the transformer I have a standard linear layer that transforms my time series sequence, along with positional encodings:

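import torch
from torch.nn import Transformer

class TransformerTimeSeries(torch.nn.Module):
    def __init__(self, n_time_series, d_model=128):
        super().__init__()
        self.dense_shape = torch.nn.Linear(n_time_series, d_model)
        self.pe = SimplePositionalEncoding(d_model)  # attribute name assumed; it was lost in the post
        self.transformer = Transformer(d_model, nhead=8)

As per the transformer module code, the src and tgt sequences need to have the same feature dimension (d_model). So I was wondering: can I simply do something like this, or will it somehow leak information about the target?

    def forward(self, x, t):
        x = self.dense_shape(x)  # project the encoder input to d_model
        x = self.pe(x)           # add positional encodings
        t = self.dense_shape(t)  # same projection for the decoder input
        t = self.pe(t)
        x = self.transformer(x, t)
        return x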
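To make the leakage question concrete, here is a minimal sketch (not from the original post) that checks whether changing a later tgt step alters earlier decoder outputs. It assumes the default (seq_len, batch, features) layout of torch.nn.Transformer, uses a small stand-in projection instead of the model above, and leaves out positional encodings for brevity. The names proj, tgt_changed, masked_same and unmasked_same are made up for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

d_model, n_series, steps = 16, 3, 4
proj = nn.Linear(n_series, d_model)  # stands in for self.dense_shape
model = nn.Transformer(d_model=d_model, nhead=4, num_encoder_layers=1,
                       num_decoder_layers=1, dim_feedforward=32)
model.eval()  # turn dropout off so the two runs are comparable

src = torch.randn(steps, 1, n_series)  # (src_len, batch, features)
tgt = torch.randn(steps, 1, n_series)  # (tgt_len, batch, features)

# Causal mask: -inf above the diagonal, so position i only sees positions <= i.
tgt_mask = torch.triu(torch.full((steps, steps), float("-inf")), diagonal=1)

# Perturb only the last decoder input step.
tgt_changed = tgt.clone()
tgt_changed[-1] += 10.0

with torch.no_grad():
    masked_a = model(proj(src), proj(tgt), tgt_mask=tgt_mask)
    masked_b = model(proj(src), proj(tgt_changed), tgt_mask=tgt_mask)
    plain_a = model(proj(src), proj(tgt))
    plain_b = model(proj(src), proj(tgt_changed))

# With the mask, earlier outputs should not react to the later step.
masked_same = torch.allclose(masked_a[:-1], masked_b[:-1], atol=1e-5)
# Without the mask, earlier outputs can see the perturbed step.
unmasked_same = torch.allclose(plain_a[:-1], plain_b[:-1], atol=1e-5)
print("masked outputs unchanged:", masked_same)
print("unmasked outputs unchanged:", unmasked_same)
```

If the unmasked outputs do change, that suggests calling self.transformer(x, t) without a tgt_mask lets each decoder position attend to later target steps. This only tests attention inside the decoder; whether the values in t themselves give away the answer depends on how t is built.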

Second question: does the target sequence need an offset? For instance, if I have the time series [0,1,2,3,4,5,6,7] and I want to feed in [0,1,2,3] to predict [4,5,6,7] (tgt), would I simply feed it in like that, or is it more complicated? Final question: will I need a mask for the encoder as well? My inclination is yes, since unlike with sentences I would want the current timestep to be formed solely from the previous ones.
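For reference, one common convention in sequence-to-sequence training is to shift the decoder input right by one step relative to the values it should predict. The sketch below applies that convention to the example series. It is an assumption about the setup, not something confirmed in this thread, and split_for_teacher_forcing is a hypothetical helper name.

```python
def split_for_teacher_forcing(series, src_len):
    """Split one window into (src, decoder_input, target) lists."""
    if not 0 < src_len < len(series):
        raise ValueError("src_len must leave at least one step to predict")
    # Encoder input: the observed history.
    src = series[:src_len]
    # Decoder input: last observed value, then the targets shifted right by one.
    decoder_input = series[src_len - 1:-1]
    # Target: the value the decoder should emit at each position.
    target = series[src_len:]
    return src, decoder_input, target


src, dec_in, target = split_for_teacher_forcing([0, 1, 2, 3, 4, 5, 6, 7], 4)
print(src, dec_in, target)
# src = [0, 1, 2, 3], dec_in = [3, 4, 5, 6], target = [4, 5, 6, 7]
```

Under this convention the decoder at position i receives the value from step i - 1 and is trained to output step i, so a causal tgt mask is normally used alongside it.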

Thanks for the help.

Does anyone have a link to an example that uses the full decoder? I’m currently getting a lot of NaN values in the output with this approach.

I’m wondering if you’ve had any luck with this yet?

Yeah, I’ve had it up and running for a while. You can see the implementation at GitHub - AIStream-Peelout/flow-forecast: Deep learning PyTorch library for time series forecasting, classification, and anomaly detection (originally for flood forecasting).