Is the printed loss supposed to include the weight decay term?

I built a deep neural net, let's call the class Net, which I train like so:

    import torch.nn as nn
    import torch.optim as optim

    net = Net()                       # Net is the model class described above
    net.cuda()
    criterion = nn.BCEWithLogitsLoss()
    optimizer = optim.Adamax(net.parameters(), lr=0.001, weight_decay=1.0)

    for epoch in range(epochs):
        i = 0
        running_cost = 0.0
        for minibatch in minibatches:
            X, Y = minibatch

            optimizer.zero_grad()     # clear gradients from the previous step
            out = net(X)
            cost = criterion(out, Y)
            cost.backward()
            optimizer.step()

            running_cost += cost.item()
            i += 1
            if i % 1000 == 0:
                print(running_cost / 1000)
                running_cost = 0.0

I find that no matter what I set the weight_decay parameter to, the running_cost does not change. Is that expected behavior? I would have thought that the loss would include the weight_decay term, so that a large weight_decay would produce a correspondingly large printed cost.


The running_cost does not include the weight_decay term. In PyTorch, weight_decay is not added to the loss you compute; the optimizer applies it during optimizer.step() by adding weight_decay * param to each parameter's gradient. So the value you print only reflects the BCEWithLogitsLoss, no matter how large weight_decay is.
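
If you want the logged value to reflect the penalty, one option is to add the L2 term to the loss yourself and set weight_decay=0 in the optimizer so it isn't applied twice. A minimal sketch, assuming a Net class and minibatches like the ones in the question:

    import torch.nn as nn
    import torch.optim as optim

    net = Net().cuda()
    criterion = nn.BCEWithLogitsLoss()
    weight_decay = 1.0

    # weight_decay=0 here, since the penalty is added to the loss manually below.
    optimizer = optim.Adamax(net.parameters(), lr=0.001, weight_decay=0.0)

    for X, Y in minibatches:
        optimizer.zero_grad()
        out = net(X)
        data_cost = criterion(out, Y)

        # L2 penalty over all parameters; the 0.5 factor makes its gradient
        # (weight_decay * p) match what the optimizer's weight_decay would add.
        l2_penalty = sum(p.pow(2).sum() for p in net.parameters())
        cost = data_cost + 0.5 * weight_decay * l2_penalty

        cost.backward()
        optimizer.step()

        print(cost.item())  # now includes the weight-decay term

Since Adamax's weight_decay is implemented as exactly this kind of L2 term on the gradient, the updates should be the same either way; the only difference is that the penalty now shows up in the printed loss.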