Train the same neural network with multiple datasets

Username2 · November 17, 2020, 1:29pm

Hey,

I built a CNN-LSTM model to forecast the monthly demand of a product item 1, 3 and 6 months ahead, based on the sales and order history plus some chosen indicators.
(It is essentially a time series, except that for some dates I have multiple entries…)
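If monthly demand is the target, one possible way to turn the multiple entries per date into a regular monthly series is to sum them per item and month before building sequences. This is an assumption; the post doesn't say how the duplicates are handled. `monthly_totals` and the toy records below are hypothetical, plain-Python illustrations:

```python
from collections import defaultdict
from datetime import date


def monthly_totals(records):
    """Hypothetical helper: sum quantities per (item, year, month).

    `records` is assumed to be an iterable of (item_id, date, quantity) tuples.
    Several entries on the same date (or month) collapse into one total.
    """
    totals = defaultdict(float)
    for item_id, day, qty in records:
        totals[(item_id, day.year, day.month)] += qty
    # Sort by key so the months come out in chronological order per item
    return dict(sorted(totals.items()))


records = [
    ("A", date(2020, 1, 3), 5),
    ("A", date(2020, 1, 3), 2),   # second entry on the same date
    ("A", date(2020, 1, 20), 1),
    ("A", date(2020, 2, 7), 4),
]
print(monthly_totals(records))
# {('A', 2020, 1): 8.0, ('A', 2020, 2): 4.0}
```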

The data consists of the sales history of the last 7 years for many, many items from different product groups. It is a mix of categorical and continuous data; I included embedding layers in the data preparation to handle the categorical features.
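A minimal sketch of the kind of architecture described above, assuming PyTorch (the post doesn't name the framework, although the reply's `Dataset`/`DataLoader` suggests it). Categorical columns go through one `nn.Embedding` each, are concatenated with the continuous features, then pass through a `Conv1d`, an `LSTM`, and a linear head with one output per forecast horizon (1, 3 and 6 months). The class name, layer sizes, and kernel size are hypothetical choices, not the author's model:

```python
import torch
from torch import nn


class CNNLSTMForecaster(nn.Module):
    """Hypothetical CNN-LSTM forecaster with embeddings for categorical inputs."""

    def __init__(self, cat_cardinalities, emb_dims, n_continuous, hidden_size=64, n_horizons=3):
        super().__init__()
        # One embedding table per categorical column (e.g. item id, product group)
        self.embeddings = nn.ModuleList(
            [nn.Embedding(card, dim) for card, dim in zip(cat_cardinalities, emb_dims)]
        )
        in_channels = sum(emb_dims) + n_continuous
        # Conv1d expects (batch, channels, time)
        self.conv = nn.Conv1d(in_channels, hidden_size, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        # One output per forecast horizon (here: 1, 3 and 6 months ahead)
        self.head = nn.Linear(hidden_size, n_horizons)

    def forward(self, x_cat, x_cont):
        # x_cat: (batch, time, n_categorical) long; x_cont: (batch, time, n_continuous) float
        embs = [emb(x_cat[..., i]) for i, emb in enumerate(self.embeddings)]
        x = torch.cat(embs + [x_cont], dim=-1)        # (batch, time, in_channels)
        x = torch.relu(self.conv(x.transpose(1, 2)))  # (batch, hidden, time)
        out, _ = self.lstm(x.transpose(1, 2))         # (batch, time, hidden)
        return self.head(out[:, -1])                  # (batch, n_horizons)


# Toy usage: 2 categorical columns (500 items, 20 groups) and 5 continuous features
model = CNNLSTMForecaster(cat_cardinalities=[500, 20], emb_dims=[8, 4], n_continuous=5)
x_cat = torch.randint(0, 20, (2, 12, 2))  # values < 20 are valid for both columns
x_cont = torch.randn(2, 12, 5)
print(model(x_cat, x_cont).shape)  # torch.Size([2, 3])
```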

So far I have trained and tested the NN on subsets of the main dataset, each containing only a selected item and a shortened time frame (3-4 years).
This works quite well for now (there is just a little overfitting that still needs to be addressed).

Is it possible to train the same neural net on multiple different datasets, e.g. different items or different time frames, and combine this “knowledge” into one model?

I can’t just feed the whole dataset to the model and adjust the output, because the data would be too big.

To make all inputs the same length, I had to pad my sequences, which makes the input data even bigger. If the data is too big, my computation crashes when I create the dataset I need for my DataLoaders (all available RAM is used…).
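One way to keep padding from inflating memory (an assumption, not something proposed in the thread) is to store the unpadded sequences and pad each batch only to its own longest sequence inside a custom `collate_fn`. `pad_collate` and the toy data are hypothetical; `pad_sequence` and `DataLoader(collate_fn=...)` are standard PyTorch:

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader


def pad_collate(batch):
    """Hypothetical collate_fn: pad only to the longest sequence in this batch.

    `batch` is assumed to be a list of (sequence, target) pairs, where each
    sequence is a (time, n_features) float tensor of varying length.
    """
    seqs, targets = zip(*batch)
    lengths = torch.tensor([s.shape[0] for s in seqs])
    # (batch, max_len_in_batch, n_features), zero-padded at the end
    padded = pad_sequence(seqs, batch_first=True, padding_value=0.0)
    return padded, lengths, torch.stack(targets)


# Toy data: three unpadded sequences of different lengths (hypothetical)
data = [
    (torch.randn(5, 4), torch.randn(3)),
    (torch.randn(2, 4), torch.randn(3)),
    (torch.randn(7, 4), torch.randn(3)),
]
loader = DataLoader(data, batch_size=2, collate_fn=pad_collate)
for x, lengths, y in loader:
    print(x.shape, lengths.tolist(), y.shape)
# torch.Size([2, 5, 4]) [5, 2] torch.Size([2, 3])
# torch.Size([1, 7, 4]) [7] torch.Size([1, 3])
```

The returned `lengths` could be passed to `torch.nn.utils.rnn.pack_padded_sequence(x, lengths, batch_first=True, enforce_sorted=False)` so that an LSTM skips the padded time steps.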

ptrblck · November 21, 2020, 1:19am

Would it be possible to lazily load the data, i.e. each call into Dataset.__getitem__ would load a single sample and the DataLoader would create the final batch?
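A minimal sketch of that lazy-loading idea, assuming each sample has been written to its own file with `torch.save` beforehand (the file layout, `LazySalesDataset`, and the toy data are hypothetical). Only the file paths live in memory; `__getitem__` reads one sample and the `DataLoader` stacks the samples into a batch. Datasets built per item (or per time frame) can be joined with `torch.utils.data.ConcatDataset`, so a single model is trained on all of them:

```python
import os
import tempfile

import torch
from torch.utils.data import ConcatDataset, DataLoader, Dataset


class LazySalesDataset(Dataset):
    """Hypothetical dataset that reads one sample from disk per __getitem__ call.

    Assumes each sample was saved beforehand with torch.save((sequence, target), path).
    """

    def __init__(self, paths):
        self.paths = list(paths)  # only file paths are kept in RAM

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # Load a single (sequence, target) pair; nothing else is read
        sequence, target = torch.load(self.paths[idx])
        return sequence, target


# Demo: write hypothetical samples for two items to a temporary directory
tmp_dir = tempfile.mkdtemp()
paths = {"item_a": [], "item_b": []}
for item, item_paths in paths.items():
    for i in range(2):
        path = os.path.join(tmp_dir, f"{item}_{i}.pt")
        torch.save((torch.randn(6, 4), torch.randn(3)), path)
        item_paths.append(path)

# One lazy dataset per item, concatenated so a single model trains on both
combined = ConcatDataset([LazySalesDataset(p) for p in paths.values()])
# The DataLoader stacks the individually loaded samples into batches
loader = DataLoader(combined, batch_size=2, shuffle=True)
for x, y in loader:
    print(x.shape, y.shape)  # torch.Size([2, 6, 4]) torch.Size([2, 3])
```

Here all toy sequences share one length, so the default collation works; with variable-length sequences, a per-batch padding `collate_fn` avoids storing padded copies of the whole dataset.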