Hi, I'm using the PyTorch `Transformer` module for time series forecasting (i.e. `from torch.nn.modules import Transformer`) and I have a couple of questions related to the `tgt` sequence, as well as a few more general questions. I'm aware that with the transformer we generally feed in the actual target sequence during training (as opposed to generating the target step by step like other encoder-decoder methods). So my first question: prior to the transformer I have a standard linear layer to project my time series sequence to `d_model`, along with positional encodings:
```python
class TransformerTimeSeries(torch.nn.Module):
    def __init__(self, n_time_series, d_model=128):
        super().__init__()
        self.dense_shape = torch.nn.Linear(n_time_series, d_model)
        self.pe = SimplePositionalEncoding(d_model)
        self.transformer = Transformer(d_model, nhead=8)
```
As per the transformer module code, the `src` and `tgt` sequences need to have the same feature dimension (`d_model`). So I was wondering: can I simply do something like this, or will it somehow leak information about the target?
```python
def forward(self, x, t):
    x = self.dense_shape(x)
    x = self.pe(x)
    t = self.dense_shape(t)
    t = self.pe(t)
    return self.transformer(x, t)
```
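For context on the leakage part of the question: my current understanding is that sharing the linear layer itself is fine, and that leakage is instead prevented by passing a causal `tgt_mask` so each decoder position can only attend to earlier target positions. A minimal sketch of what I mean (shapes are hypothetical, using the module's default `(seq_len, batch, d_model)` layout):

```python
import torch
from torch.nn.modules import Transformer

d_model = 128
transformer = Transformer(d_model, nhead=8)

# hypothetical toy batch: (seq_len=4, batch=2, d_model)
src = torch.rand(4, 2, d_model)
tgt = torch.rand(4, 2, d_model)

# causal mask: position i in tgt may only attend to positions <= i
tgt_mask = transformer.generate_square_subsequent_mask(tgt.size(0))

out = transformer(src, tgt, tgt_mask=tgt_mask)
print(out.shape)  # torch.Size([4, 2, 128])
```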
Second question: does the target sequence need an offset? For instance, if I have the time series [0,1,2,3,4,5,6,7] and I want to feed in [0,1,2,3] to predict [4,5,6,7] (`tgt`), would I simply feed it in like that, or is it more complicated? Final question: will I need a mask for the encoder as well? My inclination is yes, since unlike with a sentence I would want the current timestep to be formed solely by the previous ones.
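To make the offset question concrete, here is the one-step-shifted (teacher-forcing) setup I think is standard, where the decoder input is the target shifted right by one so that each position predicts the next value (this split is my assumption, not something from the module's docs):

```python
import torch

# toy series from my question: [0,1,2,3,4,5,6,7]
series = torch.arange(8, dtype=torch.float32)

src = series[:4]           # encoder input:            [0, 1, 2, 3]
tgt_input = series[3:7]    # decoder input (shifted):  [3, 4, 5, 6]
tgt_expected = series[4:]  # values the decoder should predict: [4, 5, 6, 7]
```

With this layout, the causal `tgt_mask` ensures the prediction of 5 only sees [3, 4], and so on.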
Thanks for the help.