I found one article here.
I think what they are doing is assigning pseudo labels to the unlabeled dataset,
based on the predictions of a neural network trained on the labeled dataset.
One example would be:
import torch
import torch.nn as nn
dataset = torch.randn(10, 10)
labeled_dataset = dataset[0:5]
unlabeled_dataset = dataset[5:10]
# shape (5, 1) so the labels match the output of nn.Linear(10, 1)
labels = torch.tensor([0., 1., 3., 2., 4.]).unsqueeze(1)
# next five indices are unlabeled
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
optimizer.zero_grad()
model(labeled_dataset) # make predictions on labeled dataset
model(unlabeled_dataset) # make predictions on unlabeled_dataset
# first time these predictions would become our pseudo labels
# for unlabeled_dataset
loss_fn = nn.MSELoss()
# the first time, we only consider labeled_dataset when computing the loss
# on every later epoch, we also consider unlabeled_dataset along with its pseudo labels
So, the first time, our loss would be:
loss = loss_fn(model(labeled_dataset), labels)  # MSELoss already returns a scalar (mean)
Then we do:
pseudo_labels = model(unlabeled_dataset).detach()
# predictions on unlabeled_dataset act as pseudo labels for the unlabeled dataset;
# detach() turns them into fixed targets, so no gradient flows through them
loss.backward()
optimizer.step()
Then we carry out training again, and
next time our loss would be:
optimizer.zero_grad()  # clear the gradients left over from the previous step
loss = loss_fn(model(labeled_dataset), labels) + loss_fn(model(unlabeled_dataset), pseudo_labels)
# (loss for labeled dataset, using actual labels) + (loss for unlabeled dataset, using pseudo_labels)
And then, the same as before:
pseudo_labels = model(unlabeled_dataset).detach()
# fresh predictions on unlabeled_dataset become the pseudo labels for the next step
loss.backward()
optimizer.step()
In the article they also mention alpha, which controls how much weight the unlabeled_dataset loss gets relative to the labeled_dataset loss.
This alpha depends on time, so as training progresses we prioritize the unlabeled dataset more.
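As a rough sketch of what a time-dependent alpha could look like, here is a simple linear ramp. The function name `unlabeled_weight`, the ramp shape, and the numbers are my own assumptions for illustration, not something taken from the article:

```python
def unlabeled_weight(epoch, ramp_start=100, ramp_end=600, alpha_max=3.0):
    """Weight for the pseudo-label loss term at a given epoch.

    Hypothetical schedule: 0 before ramp_start, then a linear increase,
    then constant at alpha_max from ramp_end onwards.
    """
    if epoch < ramp_start:
        return 0.0
    if epoch >= ramp_end:
        return alpha_max
    return alpha_max * (epoch - ramp_start) / (ramp_end - ramp_start)
```

The combined loss would then be something like `labeled_loss + unlabeled_weight(epoch) * unlabeled_loss`, so the pseudo labels are ignored early on (when the model is still poor) and count more later.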
Also, we could train for some epochs (like 500) on labeled_dataset only, then start giving unlabeled_dataset pseudo labels, and then either train on unlabeled_dataset only or alternate between training on the unlabeled and labeled datasets.
We could also change when and how often we assign pseudo labels: for example, train for 500 epochs on the labeled data, assign pseudo labels to the unlabeled dataset, keep those pseudo labels fixed while training for some epochs on labeled + unlabeled, and then repeat.
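A minimal sketch of that last schedule, reusing the toy data from above. The helper `train_step` and the values of `rounds` and `epochs_per_round` are assumptions I picked for illustration; only the 500 pretraining epochs come from the description above:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dataset = torch.randn(10, 10)
labeled_dataset, unlabeled_dataset = dataset[:5], dataset[5:]
labels = torch.tensor([0., 1., 3., 2., 4.]).unsqueeze(1)  # shape (5, 1)

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

def train_step(inputs, targets):
    """One gradient step on the given inputs/targets; returns the loss value."""
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()
    return loss.item()

pretrain_epochs = 500   # labeled data only
rounds = 3              # how many times pseudo labels are refreshed (assumed)
epochs_per_round = 50   # epochs trained with each fixed set of pseudo labels (assumed)

# Phase 1: train on the labeled data only
for _ in range(pretrain_epochs):
    train_step(labeled_dataset, labels)

# Phase 2: assign pseudo labels, keep them fixed for a while, then repeat
for _ in range(rounds):
    with torch.no_grad():  # pseudo labels are plain targets, no gradient needed
        pseudo_labels = model(unlabeled_dataset)
    inputs = torch.cat([labeled_dataset, unlabeled_dataset])
    targets = torch.cat([labels, pseudo_labels])
    for _ in range(epochs_per_round):
        train_step(inputs, targets)
```

Here the labeled and pseudo-labeled samples are weighted equally; an alpha-style weight could be added by computing the two losses separately instead of concatenating.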