Hello,
What is an efficient way to do soft ensembling at inference time in PyTorch? Assume I have the models loaded on the CPU in a list named models, and I want to speed inference up with a GPU.
This is my naive implementation (I guess it is not very efficient to move each model on and off GPU memory for every batch). It is also not working: even though I execute model = model.cpu() and torch.cuda.empty_cache(), the model still occupies GPU memory, and after a few iterations the code runs out of memory.
for i, batch in enumerate(dev_iter):
    pred_logits = None
    for model in models:
        model = model.cuda()
        pred_logits_per_model = model(batch)
        if pred_logits is None:
            pred_logits = F.softmax(pred_logits_per_model, -1)
        else:
            pred_logits += F.softmax(pred_logits_per_model, -1)
        model = model.cpu()
        torch.cuda.empty_cache()
    loss = lossfunction(pred_logits, batch.stance_label)
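
For comparison, here is a minimal sketch of the same loop body wrapped in torch.no_grad(). One likely contributor to the memory growth above is that, without no_grad, autograd keeps the forward graph (and the GPU tensors saved in it) alive for as long as pred_logits or loss is referenced, so empty_cache() has nothing it can release. The helper name soft_ensemble, its device argument, and the assumption that each model accepts a plain tensor batch are mine, not part of the code above. It also averages the probabilities instead of only summing them.

```python
import torch
import torch.nn.functional as F


def soft_ensemble(models, inputs, device=None):
    """Average the softmax probabilities of several models for one batch.

    Hypothetical helper: `models` is a list of nn.Module instances kept on
    the CPU, and `inputs` is a tensor batch the models accept directly.
    """
    if device is None:
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    inputs = inputs.to(device)
    probs_sum = None
    # no_grad stops autograd from recording a graph, so intermediate
    # activations can be freed as soon as each forward pass finishes.
    with torch.no_grad():
        for model in models:
            model.eval()          # inference mode for dropout / batch norm
            model.to(device)      # nn.Module.to moves parameters in place
            probs = F.softmax(model(inputs), dim=-1)
            probs_sum = probs if probs_sum is None else probs_sum + probs
            model.to("cpu")       # make room on the device for the next model
    return probs_sum / len(models)


# Usage with toy models (illustrative only).
toy_models = [torch.nn.Linear(4, 3) for _ in range(3)]
toy_batch = torch.randn(8, 4)
ensemble_probs = soft_ensemble(toy_models, toy_batch)
```

If all models fit on the GPU at once, moving them there a single time before the loop and dropping the per-batch transfers should be considerably faster; the shuttling shown here is only needed when they do not fit together.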