Running multiple batches through model before backpropagation

I get somewhat weird behavior. If I train the model with only the labeled data,

# Purely supervised step: forward the labeled batch and
# backpropagate the cross-entropy loss.
out_la = model(input_la)
loss_la = F.cross_entropy(out_la, label_la)

model.zero_grad()
loss_la.backward()
optimizer.step()

it converges pretty fast. If I use my original code instead and set the threshold to 1, it obviously uses no unlabeled data, since the confidence mask is False everywhere. What surprises me is that even in this case it does not really converge and behaves totally differently from the purely supervised code.
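For reference, here is a minimal sketch of the kind of thresholded pseudo-label loss I mean. My original code is not shown here, so the names (`threshold`, `input_ul`, etc.) are assumptions, but the masking logic is the same: with `threshold = 1.0` the mask is False for every sample (softmax probabilities stay strictly below 1 in practice), so the unlabeled term is exactly zero.

```python
import torch
import torch.nn.functional as F

def semi_supervised_loss(model, input_la, label_la, input_ul, threshold=0.95):
    # Supervised term on the labeled batch.
    out_la = model(input_la)
    loss_la = F.cross_entropy(out_la, label_la)

    # Pseudo-label term on the unlabeled batch: take the model's most
    # confident class as the target, but only keep samples whose
    # confidence exceeds the threshold.
    out_ul = model(input_ul)
    probs = F.softmax(out_ul.detach(), dim=1)
    conf, pseudo = probs.max(dim=1)
    mask = conf >= threshold  # with threshold = 1.0: all False

    # Masked-out samples contribute zero to the unlabeled loss.
    loss_ul = (F.cross_entropy(out_ul, pseudo, reduction="none") * mask).mean()
    return loss_la + loss_ul, mask
```

One thing worth noting: even when the mask is empty, the forward pass over the unlabeled batch still executes, so layers with running statistics (e.g. BatchNorm) are still updated by the unlabeled data. That could be one source of the different behavior, if the model contains such layers.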

In my other question I tried to find out whether it's a masking issue, but I'm not sure.