Network not learning when setting all CrossEntropyLoss weights to 1

Unfortunately, I am not allowed to post any details of my CNN here (company rules). It is a very basic CNN with nothing special about it. I use Adam as the optimizer and CrossEntropyLoss as the loss function.

Since I have some unbalanced datasets, I use the weight parameter of CrossEntropyLoss to account for the class imbalance:

# ok: 11200, nok: 2800
weights = torch.Tensor([0.25, 1.0]).to(device)
criterion = nn.CrossEntropyLoss(weight=weights)
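
For reference, these weights are just the inverse class frequencies scaled so that the rarest class gets weight 1.0 (2800 / 11200 = 0.25). A minimal standalone sketch that derives them from the class counts in the comment above:

import torch

# class counts taken from the comment above: [ok, nok]
class_counts = torch.tensor([11200., 2800.])

# inverse-frequency weights, scaled so the rarest class gets 1.0
weights = class_counts.min() / class_counts
print(weights)  # tensor([0.2500, 1.0000])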

With this setup the training is successful. To reuse the same code for other, balanced datasets, I simply set the weights to [1.0, 1.0]. In this case, the network does not train at all. When I don't set the weight parameter, it trains fine again. From the definition in the docs, I don't see what could be wrong, since a weight of 1 for every class should just multiply each loss term by a constant factor of 1:
https://pytorch.org/docs/stable/nn.html#crossentropyloss
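
That formula also explains why ones should be a no-op: with the default reduction='mean', the loss is sum_n(w[y_n] * l_n) / sum_n(w[y_n]), so setting every w to 1 cancels out exactly. A quick standalone check (with dummy logits and labels, not my actual code) confirms it:

import torch
import torch.nn as nn

logits = torch.randn(8, 2)          # dummy predictions for 2 classes
target = torch.randint(0, 2, (8,))  # dummy labels

loss_plain = nn.CrossEntropyLoss()(logits, target)
loss_ones = nn.CrossEntropyLoss(weight=torch.ones(2))(logits, target)

# both reductions divide by the summed weights, so the values are identical
print(torch.allclose(loss_plain, loss_ones))  # True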

It is not a big problem, since I don't have to set the weight parameter at all, but I am still asking myself whether I misunderstand the weight argument or whether my implementation is wrong.

If you pass the weights as ones, you should get the same result, and I couldn't reproduce this issue:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(2809)

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 3, 1, 1)
        self.pool1 = nn.MaxPool2d(2)
        self.conv2 = nn.Conv2d(6, 12, 3, 1, 1)
        self.pool2 = nn.MaxPool2d(2)
        self.fc1 = nn.Linear(12*6*6, 64)
        self.fc_out = nn.Linear(64, 10)
        
    def forward(self, x):
        x = self.pool1(F.relu(self.conv1(x)))
        x = self.pool2(F.relu(self.conv2(x)))
        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        x = self.fc_out(x)
        return x


model = MyModel()
data = torch.randn(100, 3, 24, 24)
target = torch.empty(100, dtype=torch.long).random_(10)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(10):
    optimizer.zero_grad()
    output = model(data)
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()
    print('Epoch {}, loss {}'.format(epoch, loss.item()))
    

# Use weights
torch.manual_seed(2809)
model_weighted = MyModel()

criterion_weighted = nn.CrossEntropyLoss(weight=torch.ones(10))
optimizer_weighted = optim.Adam(model_weighted.parameters(), lr=1e-3)

for epoch in range(10):
    optimizer_weighted.zero_grad()
    output = model_weighted(data)
    loss = criterion_weighted(output, target)
    loss.backward()
    optimizer_weighted.step()
    print('Weighted: Epoch {}, loss {}'.format(epoch, loss.item()))

Both training loops yield the same results.
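
If you want to verify this programmatically rather than by comparing the printed losses, a short sanity check over the trained parameters (a sketch, appended after the two loops above) would be:

# after both loops, the trained parameters should be identical
for p1, p2 in zip(model.parameters(), model_weighted.parameters()):
    assert torch.allclose(p1, p2)
print('All parameters match.')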

I get the same results with your sample code. I will investigate my code further and report back with any findings. I am still not sure what the reason could be, since the weight argument is the only thing I change between my test runs.