Padding in Causal Convs only for seq2seq prediction?


I am trying to predict the value x[t+1] from a time series x[t-L:t] with a stack of causal convolutions, where L is the input sequence length.

My input dataset X (capital X) has two dimensions, i.e. shape C x T:

  • C is the number of features / channels
  • T is the time dimension.

Then, an input window ending at time t is x_t = X[:,t-L:t].
We stack B such windows, x = {x_t | t drawn B times uniformly at random from the range [L, T]}, into a batch of shape B x C x L, where B is the batch size.
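As a sketch, such a batch could be assembled like this (using numpy; the dataset and all names/shapes are illustrative):

```python
import numpy as np

C, T, L, B = 4, 1000, 32, 16          # channels, total length, window length, batch size
X = np.random.randn(C, T)             # illustrative dataset of shape C x T

# sample B window end points t uniformly from [L, T)
ts = np.random.randint(L, T, size=B)

# stack the windows X[:, t-L:t] into a batch of shape B x C x L
batch = np.stack([X[:, t - L:t] for t in ts])
print(batch.shape)                    # (16, 4, 32)
```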

A stack of Causal Convs (with kernel_size=2) will look like this:

class CausalConvStack(nn.Module): 

    def __init__(self, C, L): 
        super().__init__()
        stack = []
        for t in range(L - 1):
            # each causal conv (kernel_size=2) shortens the sequence by 1
            stack.append(nn.Conv1d(C, C, kernel_size=2))
        self.stack = nn.ModuleList(stack)
        self.final_conv = nn.Conv1d(C, 1, kernel_size=1)

If no left-padding is applied, each causal layer shortens the sequence length by kernel_size - 1 = 1, so the L-1 layers of the stack defined above reduce an input of length L to a final output sequence length of 1:

    def forward(self, x): 
        # x: B x C x L
        for layer in self.stack: 
            x = layer(x)
        x = self.final_conv(x) 
        # x: B x 1 x 1 
        return x

If we regress the output of forward (which takes the input x[t-L:t] and returns an output of shape B x 1 x 1) on x[t+1], this should be a valid causal approach to predicting the next value of a time series, right?
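For concreteness, here is a minimal self-contained sketch of that regression setup. The stack below is a condensed stand-in for the class above (nn.Sequential instead of the explicit loop); the targets and hyperparameters are illustrative:

```python
import torch
import torch.nn as nn

C, L, B = 4, 8, 16
# L-1 causal convs with kernel_size=2 shrink the length from L to 1,
# then a 1x1 conv maps C channels to the single predicted value
stack = nn.Sequential(
    *[nn.Conv1d(C, C, kernel_size=2) for _ in range(L - 1)],
    nn.Conv1d(C, 1, kernel_size=1),
)

x = torch.randn(B, C, L)        # batch of windows x[t-L:t]
target = torch.randn(B)         # corresponding next values x[t+1] (illustrative)

pred = stack(x)                 # shape B x 1 x 1
loss = nn.functional.mse_loss(pred.squeeze(), target)
loss.backward()                 # gradients flow as in a standard regression step
print(pred.shape)               # torch.Size([16, 1, 1])
```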

I often read that people apply padding in order to keep the sequence length constant, so that the output sequence length equals the input sequence length. My question: padding is only required if one wants to predict an entire sequence of the same length as the input, i.e. seq2seq, right? E.g. we want to either predict the values y[t-L:t] of another time series from the input time series x[t-L:t], or we maybe even want to predict the next L values of x, i.e. x[t+1:t+L+1], from x[t-L:t].
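Right, that is the seq2seq case. With left-padding of kernel_size - 1 = 1 zeros before each conv, every layer preserves the sequence length while staying causal (the output at time t only sees inputs up to t). A sketch of a single padded layer, with illustrative shapes:

```python
import torch
import torch.nn as nn

C, L, B = 4, 32, 16
conv = nn.Conv1d(C, C, kernel_size=2)

x = torch.randn(B, C, L)
# left-pad the time dimension by kernel_size - 1 so the output keeps
# length L and position t only depends on inputs at positions <= t
x_padded = nn.functional.pad(x, (1, 0))   # pad last dim: (left, right)
y = conv(x_padded)
print(y.shape)                            # torch.Size([16, 4, 32])
```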


Best, JZ