I have two BERT models. I detached the classifier layer from each and kept the pre-classifier and dropout layers. Then I created an ensemble model that contains the classifier layer.
So the outputs of the two BERT models will be concatenated in the ensemble and then fed into the classifier layer.
The problem is that I'm getting out-of-memory errors. How can I save the output of each model to a CSV file, then concatenate the two files and run the ensemble model for classification?
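For context, here is a minimal sketch of the setup described above. The `Encoder` class is a hypothetical stand-in for a headless BERT (classifier removed, pre-classifier and dropout kept), and the hidden size and label count are made-up values:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Stand-in for a BERT model with its classifier layer removed."""
    def __init__(self, hidden=768):
        super().__init__()
        self.pre_classifier = nn.Linear(hidden, hidden)
        self.dropout = nn.Dropout(0.1)

    def forward(self, x):
        return self.dropout(torch.relu(self.pre_classifier(x)))

class Ensemble(nn.Module):
    """Concatenates two encoder outputs and classifies the result."""
    def __init__(self, enc_a, enc_b, hidden=768, num_labels=2):
        super().__init__()
        self.enc_a, self.enc_b = enc_a, enc_b
        self.classifier = nn.Linear(2 * hidden, num_labels)

    def forward(self, x):
        # Concatenate the two feature vectors along the last dimension
        feats = torch.cat([self.enc_a(x), self.enc_b(x)], dim=-1)
        return self.classifier(feats)

model = Ensemble(Encoder(), Encoder())
logits = model(torch.randn(4, 768))
print(logits.shape)  # torch.Size([4, 2])
```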
If you store the outputs, load them afterwards, and train the classifier, this might not be ensemble model training anymore, since you would only be training the classifier (though it also depends on your definition of an ensemble).
If that’s your use case, then you could either
detach() the outputs from both BERT models or use
torch.save to store the activations (with their targets).
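A small sketch of the second option: precompute the features once, detach them, and save them alongside the targets with `torch.save` (which preserves tensor dtypes and shapes, unlike a CSV round-trip). The encoder here is a hypothetical stand-in and the sizes are made up:

```python
import os
import tempfile

import torch
import torch.nn as nn

# Hypothetical stand-in for a headless BERT encoder.
encoder = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Dropout(0.1))
x = torch.randn(8, 16)
y = torch.randint(0, 2, (8,))

# Precompute features without building a graph, then save them
# together with their targets.
with torch.no_grad():
    feats = encoder(x)

path = os.path.join(tempfile.mkdtemp(), "activations.pt")
torch.save({"features": feats, "targets": y}, path)

# Later: load the stored activations and train only the classifier.
blob = torch.load(path)
classifier = nn.Linear(16, 2)
logits = classifier(blob["features"])
print(logits.shape)  # torch.Size([8, 2])
```

You would do this once per BERT model and concatenate the loaded feature tensors with `torch.cat` before feeding the classifier.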
On the other hand, if you want to train the model end-to-end, you would need to keep both BERT models, their computation graphs, and the classifier in memory.
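A tiny illustration of the difference, with a single linear layer standing in for a BERT encoder: without `detach()` the backward pass reaches the encoder's parameters; with `detach()` the gradient stops at the classifier's input.

```python
import torch
import torch.nn as nn

enc = nn.Linear(8, 8)   # stand-in for a BERT encoder
clf = nn.Linear(8, 2)   # classifier head
x = torch.randn(4, 8)

# End-to-end: the encoder stays in the graph, so it receives gradients.
loss = clf(enc(x)).sum()
loss.backward()
end_to_end_grad = enc.weight.grad is not None
print(end_to_end_grad)  # True

# Detached features: backward stops before the encoder.
for p in enc.parameters():
    p.grad = None
loss = clf(enc(x).detach()).sum()
loss.backward()
detached_grad = enc.weight.grad is not None
print(detached_grad)  # False
```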