I have a regression problem where the input data distribution can shift across days. I’m trying to design a network that has a day-specific linear input layer to account for this shift, where each day learns a different set of weights but feeds into the same core regression network:

Day 1 data → [Linear Layer Day 1] → [Core Network] → output
Day 2 data → [Linear Layer Day 2] → [Core Network] → output

Each linear layer has input 1x100 and output 1x100.

I’d like to train the Core Network using data from all days simultaneously, so each minibatch will have some samples from each day. Also, ideally it would be easy to vary the number of days used during training.

What’s the best way to implement this?

One solution could be to make the input layer size [NumDays x InputFeatures], input all zeros except for the day of interest, then sum along dimension-0 before feeding into the core network.
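One possible reading of this idea, as a rough sketch only: place each sample in the block belonging to its day, leave the other blocks at zero, and let a single linear layer over the flattened input do the summing. The sizes and variable names below are made up for illustration, and note that in this version the bias is shared across days.

```python
import torch as th

# Illustrative sizes (assumptions, not from the original post except 100 features).
num_days, in_features, out_features = 7, 100, 100
batch_size = 32

x = th.randn(batch_size, in_features)
day_index = th.randint(0, num_days, (batch_size,))

# One-hot day mask with shape (batch_size, num_days, 1).
one_hot = th.nn.functional.one_hot(day_index, num_days).to(x.dtype).unsqueeze(-1)
# Zero everywhere except the block for each sample's day:
# shape (batch_size, num_days, in_features).
expanded = one_hot * x.unsqueeze(1)

# A single layer over the flattened input; the matrix product sums over all
# day blocks, and only the non-zero block contributes.
layer = th.nn.Linear(num_days * in_features, out_features)
h = layer(expanded.flatten(1))  # shape (batch_size, out_features)
```

The weight matrix can be viewed as (out_features, num_days, in_features), so each day effectively gets its own weight slice. The cost is an input that is num_days times wider and mostly zeros.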

What about letting the Dataset (__len__(), __getitem__()) take care of loading the data, and letting the DataLoader's sampler select which instances go into each batch? What do you think about this?
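A minimal sketch of what that could look like, assuming each day's data fits in memory as a tensor. The class name MultiDayDataset and the synthetic data are hypothetical. Each item carries its day index, so a shuffled DataLoader naturally mixes days within a minibatch.

```python
import torch as th
from torch.utils.data import Dataset, DataLoader


class MultiDayDataset(Dataset):
    """Hypothetical dataset that pools samples from several days."""

    def __init__(self, features_per_day, targets_per_day):
        # features_per_day[d] has shape (n_d, in_features),
        # targets_per_day[d] has shape (n_d,).
        self.features = th.cat(features_per_day)
        self.targets = th.cat(targets_per_day)
        # Remember which day every pooled sample came from.
        self.day_index = th.cat([
            th.full((len(x),), d, dtype=th.long)
            for d, x in enumerate(features_per_day)
        ])

    def __len__(self):
        return len(self.features)

    def __getitem__(self, i):
        return self.features[i], self.day_index[i], self.targets[i]


# Synthetic stand-in data: 3 days with different sample counts.
features_per_day = [th.randn(n, 100) for n in (50, 80, 30)]
targets_per_day = [th.randn(n) for n in (50, 80, 30)]

dataset = MultiDayDataset(features_per_day, targets_per_day)
loader = DataLoader(dataset, batch_size=32, shuffle=True)
x, day, y = next(iter(loader))  # day holds one day index per sample
```

Changing the number of days used for training then only means passing a shorter or longer list to the dataset.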

Modifying the Dataset as mentioned should work well. However, I’m not sure of the best way to set up the linear layers within the network model. I could explicitly create a different layer for each day and then select which layer to use on the forward pass, but is there a cleaner way to do this?

Assuming the linear layer is cheap compared with the core network, you could do something like this.

import torch as th


class DailyLinearModule(th.nn.Module):
    def __init__(self, in_features, out_features, num_days=7):
        super().__init__()
        # One independent linear layer per day.
        self.layers = th.nn.ModuleList([th.nn.Linear(in_features, out_features)
                                        for _ in range(num_days)])

    def forward(self, features, day_index):
        # Apply every day's layer to the whole batch and stack the results
        # into a tensor with shape (num_days, batch_size, out_features).
        features = th.stack([layer(features) for layer in self.layers])
        # Select each sample's own day; there may be a better way to index.
        sample_index = th.arange(day_index.numel(), device=day_index.device)
        return features[day_index, sample_index]


X = th.randn(100, 3)
day_index = th.randint(0, 7, [100])
model = DailyLinearModule(3, 4)
y = model(X, day_index)
y.shape  # torch.Size([100, 4])

This multiplies the computation time of the linear layer by the number of days. However, if your batch is large-ish, it’s probably still faster to apply each linear layer to all elements of the batch and then select, rather than iterating over the elements of the batch one by one.

A compromise might be to group examples with the same day into distinct tensors, one for each day. Then apply the linear layers to the grouped tensor and concatenate the results (while making sure to restore the ordering to match the labels you’re trying to predict).
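Here is a rough sketch of that grouping idea. The module name GroupedDailyLinear is made up, and I haven't benchmarked it against the version above. Writing each group's result back through a boolean mask keeps the output rows in the original batch order, so no separate re-sorting step is needed.

```python
import torch as th


class GroupedDailyLinear(th.nn.Module):
    """Hypothetical variant: each layer only sees the samples from its day."""

    def __init__(self, in_features, out_features, num_days=7):
        super().__init__()
        self.out_features = out_features
        self.layers = th.nn.ModuleList(
            [th.nn.Linear(in_features, out_features) for _ in range(num_days)]
        )

    def forward(self, features, day_index):
        # Output buffer in the original batch order.
        out = features.new_empty(features.shape[0], self.out_features)
        # Only loop over days that actually occur in this batch.
        for day in th.unique(day_index).tolist():
            mask = day_index == day
            # Apply this day's layer to its group and write the rows back
            # to their original positions.
            out[mask] = self.layers[day](features[mask])
        return out


X = th.randn(100, 3)
day_index = th.arange(100) % 7  # every day appears in the batch
model = GroupedDailyLinear(3, 4)
y = model(X, day_index)  # shape (100, 4), rows aligned with X
```

The Python loop runs once per distinct day in the batch rather than once per sample, so it should stay cheap as long as the number of days is small.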