I built a deep neural network; let's call its class Net. I train it like so:
```python
import torch.nn as nn
import torch.optim as optim

net = Net()  # Net is my model class; epochs and minibatches are defined elsewhere
net.cuda()
loss = nn.BCEWithLogitsLoss()
optimizer = optim.Adamax(net.parameters(), lr=0.001, weight_decay=1.0)

for epoch in range(epochs):
    running_cost = 0
    for i, minibatch in enumerate(minibatches):
        X, Y = minibatch
        X, Y = X.cuda(), Y.cuda()
        optimizer.zero_grad()  # clear gradients left over from the previous step
        out = net(X)
        cost = loss(out, Y)
        cost.backward()
        optimizer.step()
        running_cost += cost.item()
        if (i + 1) % 1000 == 0:
            print(running_cost / 1000)  # average cost over the last 1000 minibatches
            running_cost = 0
```
I find that no matter what I set the weight_decay parameter to, running_cost does not change. Is that expected behavior? I would have thought that cost would include the weight-decay penalty, so that a large weight_decay would produce a correspondingly large cost. In other words, I expected something like the sketch below.
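To make my expectation concrete, here is a minimal sketch (hypothetical, not what my code actually computes; it reuses net, loss, out, and Y from the training loop above) of the penalized cost I assumed was being reported:

```python
# Hypothetical: the cost I expected to see reported, i.e. the data loss
# plus an L2 penalty over all parameters scaled by weight_decay.
weight_decay = 1.0
l2_penalty = sum(p.pow(2).sum() for p in net.parameters())
expected_cost = loss(out, Y) + weight_decay * l2_penalty
print(expected_cost.item())
```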