Appending to Dataset for Active Learning

I am trying to build an active learning (AL) system. I have split my fully labelled Dataset into two Datasets. The first, which I call the labelled Dataset, is what the model trains on. The second, which I call the unlabelled Dataset, does contain labels, but the model has no access to them; it can only query samples from that set to find out their labels, so that it can learn from them in the next AL iteration. Could somebody please explain an efficient way to ‘append’ a sample to the labelled Dataset and remove it from the unlabelled Dataset? Is the Dataset class a good option for active learning at all? It seems that you have to reconstruct the Dataset from scratch every AL iteration instead of just appending to it. I would really like to keep the convenience of DataLoaders for training, which points towards using Dataset, but I don’t want the large inefficiency that comes from not being able to append to a Dataset easily.

I would really value a small code snippet showing how to efficiently remove a subset from the unlabelled Dataset and move it to the labelled Dataset. Something equivalent and more efficient would of course also be welcome, if it turns out this cannot be done without fully reconstructing the Dataset every time a subset moves from one Dataset to the other.
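One way this could look, as a minimal sketch rather than a definitive answer: keep the full data in a single place and let each pool hold only a list of indices into it, so that "moving" a sample means moving an integer and nothing is rebuilt. The names `IndexPool` and `move_to_labelled` are hypothetical, and a plain Python list stands in for the real dataset. The pool only implements `__len__` and `__getitem__`, which is the map-style dataset protocol, so the assumption is that it can be handed to a `DataLoader` like any other map-style dataset; `torch.utils.data.Subset` is a built-in wrapper built on the same index-list idea.

```python
class IndexPool:
    """View onto a base dataset, restricted to a mutable list of indices.

    Hypothetical helper: it implements only __len__ and __getitem__
    (the map-style dataset protocol) and never copies the samples.
    """

    def __init__(self, base, indices):
        self.base = base              # the full dataset, shared by both pools
        self.indices = list(indices)  # which base samples this pool exposes

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, i):
        return self.base[self.indices[i]]


def move_to_labelled(labelled, unlabelled, positions):
    """Move the samples at `positions` (positions within the unlabelled
    pool, not base indices) over to the labelled pool.

    Only integers are moved; returns the base indices that were moved.
    """
    chosen = set(positions)
    moved = [idx for pos, idx in enumerate(unlabelled.indices) if pos in chosen]
    unlabelled.indices = [
        idx for pos, idx in enumerate(unlabelled.indices) if pos not in chosen
    ]
    labelled.indices.extend(moved)
    return moved


# Toy stand-in for the full labelled dataset: (feature, label) pairs.
full = [(float(i), i % 2) for i in range(10)]

labelled = IndexPool(full, range(3))       # the model may train on these
unlabelled = IndexPool(full, range(3, 10)) # labels hidden until queried

# Pretend the query strategy picked positions 0 and 2 of the unlabelled pool.
moved = move_to_labelled(labelled, unlabelled, [0, 2])
```

Each move costs time proportional to the size of the unlabelled index list, independent of how large the individual samples are. Building a fresh `DataLoader` over the labelled pool at the start of each AL iteration should be cheap, since constructing the loader does not copy the data.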


I would like to know the best way to do this as well!
