Training continuously on real-time data

akshayb7 · December 12, 2019, 10:10am

Hi everyone.
I need to build a model that is trained online: each time a batch of data has been collected, the model trains on it and I read off the loss for that batch.

So far I have been saving the model after each batch, reporting the loss for that batch, and then reloading the model before training on the next batch. This seems very inefficient to me, and I think there must be a better way.

Does anyone have any pointers?

Thanks in advance!

akshayb7 · December 13, 2019, 5:11am

I think I should clarify a little more. What I’m after is an autoencoder for anomaly detection, where the reconstruction error (the loss between the input and the output, e.g. MSE or MAE, with the input itself as the target) indicates the presence of an anomaly. The system is to be deployed online and process data one batch at a time, which means the model has to keep training while it is deployed. This is the part I’m not sure how to do. I’ve been able to do this with data from a CSV file, but I can’t figure out the process for the online scenario.
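To make the reconstruction-error idea concrete, here is a minimal sketch in PyTorch. The layer sizes, the helper names `reconstruction_errors` and `flag_anomalies`, and the mean-plus-three-standard-deviations threshold are all assumptions for illustration, not a recommended setup.

```python
import torch
from torch import nn


def reconstruction_errors(model, batch):
    """Return one mean-squared reconstruction error per sample in `batch`."""
    model.eval()
    with torch.no_grad():
        recon = model(batch)
    # Average the squared error over the feature dimension only,
    # so each row (sample) keeps its own score.
    return ((recon - batch) ** 2).mean(dim=1)


def flag_anomalies(errors, threshold):
    """Mark samples whose error exceeds `threshold` (hypothetical rule)."""
    return errors > threshold


torch.manual_seed(0)
# Placeholder autoencoder: 8 input features, 3-dimensional bottleneck.
model = nn.Sequential(nn.Linear(8, 3), nn.ReLU(), nn.Linear(3, 8))
batch = torch.randn(32, 8)  # stand-in for one collected batch

errors = reconstruction_errors(model, batch)
# Assumed threshold: mean + 3 standard deviations of this batch's errors.
threshold = errors.mean() + 3 * errors.std()
flags = flag_anomalies(errors, threshold)
```

Scoring per sample rather than per batch makes it possible to point at the individual records that look anomalous, while the batch mean of `errors` still gives a single loss-like number.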

My idea was to save the model to a file after each training batch, reload it before the next batch, train it on the new batch, and repeat. To get the error, I either take the loss from the model directly (PyTorch) or use a custom history callback (Keras). But saving and reloading the model for every batch seems redundant and inefficient.
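For comparison, a sketch of the alternative: the model and optimizer stay in memory for the lifetime of the process, each incoming batch triggers one gradient step, and a checkpoint is written only when needed. The `AutoEncoder` class, its sizes, the `train_on_batch` and `save_checkpoint` helpers, and the random stand-in data are all hypothetical.

```python
import torch
from torch import nn


class AutoEncoder(nn.Module):
    """Placeholder autoencoder; the layer sizes are assumptions."""

    def __init__(self, n_features=8, n_hidden=3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, n_hidden), nn.ReLU())
        self.decoder = nn.Linear(n_hidden, n_features)

    def forward(self, x):
        return self.decoder(self.encoder(x))


torch.manual_seed(0)
# Created once and kept alive; nothing is reloaded between batches.
model = AutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()


def train_on_batch(batch):
    """Take one gradient step on `batch` and return its loss as a float.

    The loss is computed before the weights are updated, so it reflects
    how well the current model reconstructs data it has not trained on yet.
    """
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(batch), batch)  # target is the input itself
    loss.backward()
    optimizer.step()
    return loss.item()


def save_checkpoint(path):
    """Optional: persist state occasionally (e.g. every N batches)."""
    torch.save(
        {"model": model.state_dict(), "optimizer": optimizer.state_dict()},
        path,
    )


# Stand-in for the real-time source: five random batches.
losses = []
for step in range(5):
    batch = torch.randn(32, 8)
    losses.append(train_on_batch(batch))
```

With this layout the checkpoint is only a safeguard against restarts, not part of the training loop, so the per-batch cost is a single forward and backward pass.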

I would be very thankful for any ideas on how to tackle this problem. I prefer PyTorch because this model is much faster in PyTorch than in Keras.

akshayb7 · December 13, 2019, 5:13am

I’m aware of IterableDataset but am unsure how to use it in this scenario.
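As a rough sketch of how it might fit, an `IterableDataset` can wrap whatever yields incoming records, and a `DataLoader` can then group them into batches. The `StreamDataset` class and the `fake_stream` generator below are hypothetical stand-ins; a real source (socket, queue, message broker) would replace the generator.

```python
import random

import torch
from torch.utils.data import DataLoader, IterableDataset


class StreamDataset(IterableDataset):
    """Wraps a callable that returns an iterator of records (assumed API)."""

    def __init__(self, source):
        self.source = source

    def __iter__(self):
        # Convert each incoming record to a float tensor as it arrives.
        for record in self.source():
            yield torch.as_tensor(record, dtype=torch.float32)


def fake_stream(n_records=100, n_features=8):
    """Hypothetical data source: yields lists of random floats."""
    for _ in range(n_records):
        yield [random.random() for _ in range(n_features)]


# The DataLoader collects records into batches of 32 as they are produced.
loader = DataLoader(StreamDataset(fake_stream), batch_size=32)

batch_shapes = [tuple(batch.shape) for batch in loader]
# 100 records -> three full batches and one partial batch of 4
```

Each `batch` from `loader` could be passed straight to a training step. With the default `num_workers=0` the stream is read in the main process, which avoids every worker reading the same records.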

Please help.