Per-sample weighting leads to no training progress (even when all weights are one)

I am working on a model that classifies images into two classes, 0 and 1, where the class-1 examples come in varying "strength" of evidence. Some of them are so weak that it would be almost acceptable for my model to classify them as 0.

Now I have tried the following:
I use CrossEntropyLoss with reduction='none'.
I split the data into 5 classes, where classes 1 to 4 are all mapped to label "1", each with a weight corresponding to its evidence strength.
Then I weight the examples by multiplying the per-sample losses with a weight vector of length batch_size.

I implement this as follows:

import numpy as np
import torch

for inputs, classes in dataloaders[phase]:
    inputs = inputs.to(device).float()

    # map the 5-way "evidence strength" labels to per-sample loss weights
    prefactor_dict = {0: 1, 1: 1, 2: 0.1, 3: 0.05, 4: 0.01}
    prefactors = np.array([prefactor_dict[cls] for cls in classes.numpy()])
    prefactors = torch.from_numpy(prefactors)
    prefactors = prefactors.to(device)

    # collapse classes 1-4 into the positive class for the actual target
    labels = torch.from_numpy(np.array([0 if cls == 0 else 1 for cls in classes]))
    labels = labels.to(device)

    optimizer.zero_grad()
    with torch.set_grad_enabled(phase == 'train'):
        outputs = model(inputs[:, None, ...])
        _, preds = torch.max(outputs, 1)

        # per-sample loss (reduction='none'), scaled by the weights, then averaged
        loss = criterion(outputs, labels) * prefactors
        loss = loss.mean()

        if phase == 'train':
            loss.backward()
            optimizer.step()

If I don’t use the prefactors and don’t set reduction to 'none', my model manages to overfit the data within a few epochs. However, even if I set all the prefactors to 1, my model doesn’t learn anything; it simply gets stuck at exactly 50% accuracy.
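
For reference, here is a small standalone sanity check (random tensors, not my actual data or model) of the assumption I am relying on: multiplying the unreduced loss by a vector of ones and then taking the mean should give exactly the same value as the default reduction='mean'.

import torch
import torch.nn as nn

torch.manual_seed(0)
outputs = torch.randn(8, 2)            # dummy logits for a batch of 8
labels = torch.randint(0, 2, (8,))     # dummy binary targets

loss_none = nn.CrossEntropyLoss(reduction='none')(outputs, labels)
loss_mean = nn.CrossEntropyLoss(reduction='mean')(outputs, labels)

weights = torch.ones(8)                # all prefactors set to 1
weighted = (loss_none * weights).mean()

print(torch.allclose(weighted, loss_mean))  # prints True

So mathematically the two setups should be identical, which is why the difference in training behaviour confuses me.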