How to implement pseudo-labeling?

Hi all,

I recently stumbled upon a post mentioning pseudo-labeling and how it helped increase the accuracy of a model. Can anyone point me towards a PyTorch implementation of it?

Also, I'm still not 100% sure what PL actually is (mostly due to the lack of code to look at).

I found one article here.

I think what they are doing is assigning pseudo labels to the unlabeled dataset,
based on a neural network trained on the labeled dataset.
One example would be:

import torch, torch.nn as nn

dataset = torch.randn(10, 10)            # 10 samples with 10 features each
labeled_dataset = dataset[0:5]           # first five samples have labels
unlabeled_dataset = dataset[5:10]        # last five samples are unlabeled
labels = torch.tensor([[0.], [1.], [3.], [2.], [4.]])  # shape (5, 1) to match the model output

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

optimizer.zero_grad()
model(labeled_dataset)    # make predictions on the labeled dataset
model(unlabeled_dataset)  # make predictions on the unlabeled dataset
# the first time, these predictions on unlabeled_dataset become our pseudo labels
# the first time we only use labeled_dataset when computing the loss;
# for every following epoch, we also use unlabeled_dataset with its pseudo labels

So, the first time, our loss would be:

loss = loss_fn(model(labeled_dataset), labels)  # only the labeled loss for the first update

Then, we do:

pseudo_labels = model(unlabeled_dataset).detach()
# predictions on unlabeled_dataset act as (fixed) pseudo labels for the unlabeled dataset;
# detach() keeps them as constant targets so gradients don't flow through them
loss.backward()
optimizer.step()

Then we carry out training again, and the next time our loss would be:

optimizer.zero_grad()  # clear the gradients from the previous update
loss = loss_fn(model(labeled_dataset), labels) + loss_fn(model(unlabeled_dataset), pseudo_labels)
# (loss on the labeled dataset, with the real labels) + (loss on the unlabeled dataset, with the pseudo labels)

and then, the same again:

pseudo_labels = model(unlabeled_dataset).detach()
# refresh the pseudo labels with the current model's predictions for the next epoch
loss.backward()
optimizer.step()

In the article, they also mention a weight alpha that controls how much we prioritize the labeled_dataset vs the unlabeled_dataset in the loss.

This alpha is based on time (i.e. training progress), so as training goes on we give more and more weight to the unlabeled dataset; a sketch of such a schedule is below.
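
Something like this could work for the alpha schedule (the thresholds T1 and T2 and the maximum weight alpha_f are made-up values here; the exact schedule in the article may differ):

def alpha_weight(epoch, T1=100, T2=600, alpha_f=3.0):
    # hypothetical linear ramp-up: ignore the unlabeled loss before epoch T1,
    # then increase its weight linearly until epoch T2, and keep it constant afterwards
    if epoch < T1:
        return 0.0
    if epoch < T2:
        return (epoch - T1) / (T2 - T1) * alpha_f
    return alpha_f

# the combined loss for a given epoch would then be
# loss = loss_fn(model(labeled_dataset), labels) + alpha_weight(epoch) * loss_fn(model(unlabeled_dataset), pseudo_labels)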

Also, we could train for some epochs (say 500) on our labeled_dataset only, then start giving the unlabeled_dataset pseudo labels and train on the unlabeled_dataset only, or alternate between training on the unlabeled and labeled datasets.

We could also change when and how often we assign pseudo labels to our unlabeled dataset: for example, train for 500 epochs on the labeled data, then assign pseudo labels to the unlabeled dataset, keep those pseudo labels fixed, train for some epochs on labeled+unlabeled, and then repeat; see the sketch below.
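
A minimal sketch of that last schedule, reusing the model, optimizer, loss_fn and datasets defined above (all epoch counts and the number of rounds are made up for illustration):

num_warmup_epochs = 500      # train on the labeled data only
num_combined_epochs = 100    # epochs per round on labeled + pseudo-labeled data
num_rounds = 5               # how many times we regenerate the pseudo labels

# 1) warm up on the labeled dataset only
for epoch in range(num_warmup_epochs):
    optimizer.zero_grad()
    loss = loss_fn(model(labeled_dataset), labels)
    loss.backward()
    optimizer.step()

# 2) repeatedly: assign pseudo labels, keep them fixed, train on both datasets
for _ in range(num_rounds):
    with torch.no_grad():
        pseudo_labels = model(unlabeled_dataset)  # frozen for this round
    for epoch in range(num_combined_epochs):
        optimizer.zero_grad()
        loss = (loss_fn(model(labeled_dataset), labels)
                + loss_fn(model(unlabeled_dataset), pseudo_labels))
        loss.backward()
        optimizer.step()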


Thanks for the detailed answer!

I’ve seen this method used on a dataset where all the labels were known, and they simply used PL to increase the accuracy of their model. Maybe they deleted some labels on purpose? Would that make sense?

Is PL only effective when we’re missing labels in train/test/val?

If I understand this correctly, the model should already have very good accuracy before using this technique, because otherwise the wrong predictions would reinforce the weights that produced them.

I do not know why they would do this when all labels are known; it could lead to wrong predictions being treated as pseudo labels, which would then lead to further wrong predictions.

Maybe they gave the pseudo labels a low priority in the beginning, when the pseudo labels might be incorrect, and with time, as accuracy on the labeled dataset increased, gave them a higher priority. But this sounds a bit confusing to me; there might be some other reason.


I think pseudo labels work well for self-supervised learning!