nn.ModuleList vs nn.Parameter[] on Global-Local Model


I’m trying to create an architecture to model multiple time series at the same time. I want to generate a simple Global-Local Model.

Let’s assume for simplicity that I have two time series TS_1 and TS_2 and we assume that both can be modeled as:

y = trend + seasonality + noise

I want to use the same parameters to model the trend for both TS_1 and TS_2, but I want the seasonality parameters of TS_1 and TS_2 to be different, so:

y_i = trend_global + seasonality_i + noise

Again for simplicity, imagine the trend is made of 3 parameters and the seasonality of 6 parameters, which I then combine with simple math operations to create the trend and seasonality components.

I define the trend parameters as nn.Parameter[dim=[3]] (and, as I mentioned, then I do some math ops to create the trend component)

For the seasonality I also need to define the parameters, and I tried three different approaches:

  • Method 1: nn.ParameterDict(TS_1: nn.Parameter[6],TS_2: nn.Parameter[6])
  • Method 2: ModuleList( nn.Parameter[6], nn.Parameter[6])
  • Method 3: nn.Parameter[2, 6]
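
For concreteness, the three containers could be set up roughly as in the sketch below. The names `N_SERIES`, `N_SEASON`, `season_dict`, `season_list` and `season_table` are made up for illustration. For Method 2 the sketch assumes `nn.ParameterList`, since, as far as I know, `nn.ModuleList` only accepts `nn.Module` instances and not bare `nn.Parameter` objects.

```python
import torch
from torch import nn

N_SERIES, N_SEASON = 2, 6  # two series, six seasonality parameters each

# Method 1: one parameter vector per series, looked up by name
season_dict = nn.ParameterDict(
    {f"TS_{i + 1}": nn.Parameter(torch.zeros(N_SEASON)) for i in range(N_SERIES)}
)

# Method 2: one parameter vector per series, looked up by position
# (assumption: nn.ParameterList stands in for the ModuleList in the post)
season_list = nn.ParameterList(
    [nn.Parameter(torch.zeros(N_SEASON)) for _ in range(N_SERIES)]
)

# Method 3: a single table with one row per series
season_table = nn.Parameter(torch.zeros(N_SERIES, N_SEASON))
```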

For a particular batch of data I need to identify which TS each sample belongs to. To do that, the DataLoader yields the tuple (data_inputs, metadata). In the forward method, I then use the metadata to pick the local seasonality parameters for each sample, so that the input data is processed with the right parameters. For each method, I get the seasonality parameters of the batch in the following way:

  • Method 1: I apply a vectorised function over all samples of the batch, using the metadata as the key into the nn.ParameterDict. Then I torch.stack the output.
  • Method 2: I apply a vectorised function over all samples of the batch, using the metadata as the index into the nn.ModuleList. Then I torch.stack the output.
  • Method 3: I one-hot encode the metadata and use torch.multiply; this array multiplication gives me the parameters of each sample in the batch.
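
A minimal sketch of the two kinds of lookup, under some assumptions: the metadata is taken to be an integer series index per sample (`ts_idx`, a made-up name), Method 2 is shown with `nn.ParameterList`, and the one-hot product of Method 3 is written as a matrix multiplication, which is only one way to read the description above.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
n_series, n_season = 2, 6

season_list = nn.ParameterList(
    [nn.Parameter(torch.randn(n_season)) for _ in range(n_series)]
)
# Same values as a single [n_series, n_season] table, for comparison
season_table = nn.Parameter(torch.stack([p.detach() for p in season_list]))

# Metadata for one batch: the series index of each sample (hypothetical format)
ts_idx = torch.tensor([0, 1, 1, 0])

# Methods 1 and 2: Python-level lookup per sample, then stack -> [batch, 6]
stacked = torch.stack([season_list[int(i)] for i in ts_idx])

# Method 3: one-hot encode the metadata and multiply by the table -> [batch, 6]
one_hot = F.one_hot(ts_idx, num_classes=n_series).to(season_table.dtype)
selected = one_hot @ season_table
```

Both results hold the same rows; the difference is that the first builds the batch from one tensor per sample, while the second uses a single tensor operation.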

Methods 1 and 2 have a computational disadvantage: they are slower, apparently because the autograd graph becomes as wide as the batch size (see the attached picture (*)). I guess this is due to how torch.stack behaves?
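
One way to check this guess without a graph picture is to count the incoming edges of the backward node that `torch.stack` creates. The sketch below is illustrative only: the sizes and names are made up, and it assumes `nn.ParameterList` as the container.

```python
import torch
from torch import nn

n_series, n_season, batch_size = 8, 6, 32

season_list = nn.ParameterList(
    [nn.Parameter(torch.randn(n_season)) for _ in range(n_series)]
)
ts_idx = torch.randint(0, n_series, (batch_size,))

# Per-sample lookup followed by a stack, as in Methods 1 and 2
stacked = torch.stack([season_list[int(i)] for i in ts_idx])

# The backward node of the stack has one incoming edge per stacked tensor,
# so its width grows with the batch size rather than with the number of series
n_edges = len(stacked.grad_fn.next_functions)
print(n_edges)
```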

Method 3 has a design disadvantage. If, instead of nn.Parameter[6] (plus simple math operations), I want to model seasonality with a more complex module such as nn.Linear, I have to do complex manipulations to reproduce the behaviour of nn.Linear, and at some point this becomes unfeasible.
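
To illustrate the kind of manipulation this means, here is a rough sketch of a per-series affine layer stored as one weight tensor. `PerSeriesLinear` is a hypothetical class written for this post, not a PyTorch module, and the initialisation is arbitrary.

```python
import torch
from torch import nn


class PerSeriesLinear(nn.Module):
    """Hypothetical layer: a separate affine map per series, kept in one tensor."""

    def __init__(self, n_series, in_features, out_features):
        super().__init__()
        # One [out, in] weight matrix and one bias vector per series
        self.weight = nn.Parameter(
            torch.randn(n_series, out_features, in_features) * 0.1
        )
        self.bias = nn.Parameter(torch.zeros(n_series, out_features))

    def forward(self, x, ts_idx):
        w = self.weight[ts_idx]  # [batch, out, in]
        b = self.bias[ts_idx]    # [batch, out]
        # Batched matrix-vector product: each sample uses its own series' weights
        return torch.bmm(w, x.unsqueeze(-1)).squeeze(-1) + b


layer = PerSeriesLinear(n_series=2, in_features=6, out_features=3)
x = torch.randn(4, 6)
ts_idx = torch.tensor([0, 1, 1, 0])
y = layer(x, ts_idx)  # [4, 3]
```

This works for a single linear map, but every additional layer type would need its own hand-written batched version, which is where the approach stops scaling.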

My question is whether you can think of a solution that brings out the best of each method.
I guess the bottleneck is having to use a vectorised function in the forward method. What I am looking for is a PyTorch operation that allows something like:

“tensor = torch.indexation(Modulelist, tensor_indexes)”
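
For the plain nn.Parameter case only, indexing a parameter table with an integer tensor (or using nn.Embedding, which performs the same row lookup) already behaves like this hypothetical call. The sketch below shows just that case, with made-up names; it does not cover a list of arbitrary modules, which is the part I am still missing.

```python
import torch
from torch import nn

n_series, n_season = 2, 6
ts_idx = torch.tensor([0, 1, 1, 0])  # series index of each sample in the batch

# Row lookup on a parameter table with an index tensor -> [batch, 6]
season_table = nn.Parameter(torch.randn(n_series, n_season))
by_index = season_table[ts_idx]

# The same kind of lookup expressed as an embedding layer -> [batch, 6]
season_emb = nn.Embedding(n_series, n_season)
by_embedding = season_emb(ts_idx)

# Each row's gradient accumulates over the samples that selected it
by_index.sum().backward()
```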

Thanks a lot for your time, and please let me know if you have any questions.

(*) In the attached file we are actually modeling the trend and seasonality globally and an autoregressive component locally, for 8 time series.