My model looks like this:
```python
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4909, 1500)
        self.relu1 = nn.ReLU()
        self.dout = nn.Dropout(0.2)
        self.fc2 = nn.Linear(1500, 300)
        self.prelu = nn.PReLU(1)
        self.out = nn.Linear(300, 1)
        self.out_act = nn.Sigmoid()

    def forward(self, input_):
        a1 = self.fc1(input_)
        h1 = self.relu1(a1)
        dout = self.dout(h1)
        a2 = self.fc2(dout)
        h2 = self.prelu(a2)
        a3 = self.out(h2)
        y = self.out_act(a3)
        return y
```
Then I define the model, the loss function and the optimizer:
```python
import torch.optim as optim

model = Net()
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01, betas=(0.9, 0.999))
```
Now I start training:
```python
def train(epoch):
    model.train()
    for i in range(len(x_train_tensor)):
        optimizer.zero_grad()
        output = model(x_train_tensor[i])
        print(output.data[0], y_train_tensor[i])
        loss = criterion(output.data[0], y_train_tensor[i])
        loss.backward()
        optimizer.step()
        losses.append(loss.data[0])
        print("loss: {}".format(loss.data[0]))
```
But the output looks like this:
```
tensor(0.4984) tensor(1., grad_fn=)
loss: 0.47467532753944397
tensor(0.5021) tensor(1., grad_fn=)
loss: 0.4732956886291504
tensor(0.5000) tensor(0., grad_fn=)
loss: 0.9740557670593262
tensor(0.4942) tensor(1., grad_fn=)
```
When the label is 1, the loss is around 0.47, but when the label is 0, the loss goes up to about 0.97. These values keep repeating and the loss never goes down.
How can I fix this? Thank you!
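`nn.Sigmoid` and `nn.BCEWithLogitsLoss` don't fit together: `nn.BCEWithLogitsLoss` already applies a sigmoid internally, so your outputs go through a sigmoid twice.
Either remove the `nn.Sigmoid` or use `nn.BCELoss`.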
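The numbers in the log above can be reproduced directly, which shows why the loss cannot go down. This is a minimal sketch that feeds the printed sigmoid outputs into `nn.BCEWithLogitsLoss`. Because the model output is already in (0, 1), the second sigmoid inside the loss keeps the effective probability in (0.5, 0.731), so neither label's loss can approach 0.

```python
import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()

# Outputs and labels copied from the log above; the outputs are already sigmoid values.
loss_pos = criterion(torch.tensor(0.4984), torch.tensor(1.0))
loss_neg = criterion(torch.tensor(0.5000), torch.tensor(0.0))
print(loss_pos.item(), loss_neg.item())  # about 0.4747 and 0.9741, matching the log

# nn.Sigmoid maps the network output into (0, 1); the loss applies sigmoid again,
# so the effective probability lies in (sigmoid(0), sigmoid(1)) = (0.5, 0.731).
# These are the lower bounds of the loss (never exactly reached, since nn.Sigmoid
# never outputs exactly 0 or 1):
floor_neg = criterion(torch.tensor(0.0), torch.tensor(0.0))  # label 0: ln(2) ~ 0.693
floor_pos = criterion(torch.tensor(1.0), torch.tensor(1.0))  # label 1: ln(1 + e^-1) ~ 0.313
print(floor_neg.item(), floor_pos.item())
```

With the raw output of `self.out` passed to the loss instead, both bounds disappear.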
If I change `nn.BCEWithLogitsLoss` to `nn.BCELoss`, I get this error:
`RuntimeError: the derivative for 'target' is not implemented`
But if I remove `nn.Sigmoid` and keep `BCEWithLogitsLoss`, no error occurs.
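For reference, here is a minimal sketch of the model from the question with the final `nn.Sigmoid` removed, so that it returns logits for `nn.BCEWithLogitsLoss`. The sigmoid is then applied only when probabilities are needed at inference. The random batch, the 0.5 threshold and the condensed `forward` are illustrative assumptions, not part of the original code.

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4909, 1500)
        self.relu1 = nn.ReLU()
        self.dout = nn.Dropout(0.2)
        self.fc2 = nn.Linear(1500, 300)
        self.prelu = nn.PReLU(1)
        self.out = nn.Linear(300, 1)
        # No final nn.Sigmoid: nn.BCEWithLogitsLoss applies it internally.

    def forward(self, input_):
        h1 = self.relu1(self.fc1(input_))
        h2 = self.prelu(self.fc2(self.dout(h1)))
        return self.out(h2)  # raw logits

model = Net()
criterion = nn.BCEWithLogitsLoss()

x = torch.randn(8, 4909)                 # hypothetical mini-batch of 8 samples
y = torch.randint(0, 2, (8, 1)).float()  # targets with the same shape as the logits
logits = model(x)
loss = criterion(logits, y)

# At inference time, convert logits to probabilities explicitly.
model.eval()
with torch.no_grad():
    probs = torch.sigmoid(model(x))
    preds = (probs > 0.5).float()
```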
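Oh wait, there seems to be another issue.
Could you try to pass `output` directly to your criterion instead of `output.data[0]`? `.data` returns a tensor that is detached from the computation graph, so no gradients can flow back into the model's parameters.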
I’m not sure why you get this error.
Your code runs fine for this dummy input and target:
```python
import torch

# model, criterion and optimizer as defined above
def train(epoch):
    model.train()
    for i in range(len(x)):
        optimizer.zero_grad()
        output = model(x[i])
        print(output, target[i])
        loss = criterion(output, target[i])
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
        print("loss: {}".format(loss.item()))

x = torch.randn(1, 4909)
target = torch.tensor([[1.]])
losses = []
train(0)
```
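A small sketch of why `output.data[0]` breaks training. It uses a hypothetical one-layer stand-in for `Net`. With a plain target, the loss computed from `.data` does not require grad at all, so `backward()` would raise. In the question the targets were created with `requires_grad=True`, so `backward()` runs there, but the gradient only reaches the targets and not the model's weights. That would be consistent with the outputs staying near 0.5.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for Net: one linear layer producing a single logit.
model = nn.Linear(4909, 1)
criterion = nn.BCEWithLogitsLoss()

x = torch.randn(4909)         # one sample, like x_train_tensor[i]
target = torch.tensor([1.0])  # a plain target without requires_grad

# Wrong: .data returns a tensor that is detached from the autograd graph.
output = model(x)
loss_detached = criterion(output.data, target)
print(loss_detached.requires_grad)  # False: calling backward() here would raise an error

# Right: pass the model output itself to the criterion.
output = model(x)
loss = criterion(output, target)
loss.backward()
print(model.weight.grad is not None)  # True: gradients reach the parameters
```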
In theory, a high error on the training data is called (high) bias.
Ways to reduce bias:
- Increase the number of hidden layers
- Change the NN architecture
- Train the NN for longer
- Maybe: the dropout probability you are using might be too high, so too many units/neurons are turned off during training.
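The first and last suggestions amount to changing the depth and the dropout probability. A small configurable builder makes it easy to try such variants. This is a sketch: `make_mlp` and the chosen sizes are hypothetical, and the network ends in a single logit so that it pairs with `nn.BCEWithLogitsLoss`.

```python
import torch
import torch.nn as nn

def make_mlp(in_features, hidden_sizes, p_drop):
    """Hypothetical helper: Linear + ReLU + Dropout blocks, ending in one logit."""
    layers = []
    prev = in_features
    for h in hidden_sizes:
        layers += [nn.Linear(prev, h), nn.ReLU(), nn.Dropout(p_drop)]
        prev = h
    layers.append(nn.Linear(prev, 1))  # raw logits for nn.BCEWithLogitsLoss
    return nn.Sequential(*layers)

# e.g. one extra hidden layer and a lower dropout probability than the 0.2 above
model = make_mlp(4909, [1500, 300, 100], p_drop=0.1)
out = model(torch.randn(4, 4909))
```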
I extracted my data from a CSV file and defined the tensors like this:
```python
x_train_tensor = Variable(torch.FloatTensor(x_train.values))
x_test_tensor = Variable(torch.FloatTensor(x_test.values))
y_train_tensor = Variable(torch.Tensor(y_train.values), requires_grad = True)
y_test_tensor = Variable(torch.Tensor(y_test.values), requires_grad = True)
```
Is there any problem with this?
Could you try to use `torch.from_numpy()` to get the tensors instead of wrapping the numpy arrays directly? It's the recommended way, but I'm not sure if it's related to this issue.
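A sketch of the suggested conversion, with random numpy arrays standing in for `x_train.values` and `y_train.values` (an assumption). `Variable` has been merged into `Tensor` since PyTorch 0.4, so the wrapper is no longer needed. The targets are data rather than parameters and should not get `requires_grad=True`, which is a likely trigger for the `derivative for 'target'` error with `nn.BCELoss` mentioned above. The targets are also reshaped to `(N, 1)` so that `y_train_tensor[i]` has the same `(1,)` shape as the model output for one sample.

```python
import numpy as np
import torch

rng = np.random.default_rng(0)
# Hypothetical stand-ins for x_train.values / y_train.values read from the CSV.
x_train_values = rng.random((8, 4909))              # float64
y_train_values = rng.integers(0, 2, size=8)         # int64 labels

# from_numpy wraps the array without copying; .float() then makes a float32 copy,
# which is the dtype nn.Linear expects by default.
x_train_tensor = torch.from_numpy(x_train_values).float()
# Targets: float32, no requires_grad, shape (N, 1) to match per-sample outputs of shape (1,).
y_train_tensor = torch.from_numpy(y_train_values).float().unsqueeze(1)
```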
My experience with this was different. When I was training a test model to check this exact thing, I noticed that when I applied `nn.Sigmoid` to the model output and used `BCELoss()`, I got very bad results: my loss actually went to `NaN` after some iterations.
Similarly, when I did not use `nn.Sigmoid` at the model output and used `BCEWithLogitsLoss()`, I again got bad results: no more `NaN`s, but the error was stuck at 0.999.
Then I used `nn.Sigmoid` together with `BCEWithLogitsLoss()` and got the expected results: the loss was dropping and so was the error.
So can you please explain why these two don't go well together, when my tests showed them working together?
Thanks
ptrblck
September 4, 2019, 2:58pm
Since `sigmoid` is applied twice in this (wrong) approach, you might have scaled down the gradients and thus stabilized the training, e.g. if your learning rate was too high.
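The scaling can be made explicit for a single logit $z$ with label $y$, where $\sigma$ is the sigmoid (a short derivation added for illustration):

$$
\begin{aligned}
\text{correct:}\quad & \ell = -\big[y\log\sigma(z) + (1-y)\log(1-\sigma(z))\big], & \frac{\partial \ell}{\partial z} &= \sigma(z) - y,\\
\text{sigmoid twice:}\quad & \ell = -\big[y\log\sigma(s) + (1-y)\log(1-\sigma(s))\big],\; s = \sigma(z), & \frac{\partial \ell}{\partial z} &= \big(\sigma(\sigma(z)) - y\big)\,\sigma(z)\big(1-\sigma(z)\big).
\end{aligned}
$$

The extra factor $\sigma(z)(1-\sigma(z))$ is at most $1/4$, so every gradient reaching the network is shrunk, much like a lower learning rate.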
Here is a small example showing this effect:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(
    nn.Linear(10, 10),
    nn.ReLU(),
    nn.Linear(10, 1)
)
data = torch.randn(1, 10)
target = torch.randint(0, 2, (1, 1)).float()

# 1) nn.BCEWithLogitsLoss
output = model(data)
loss = F.binary_cross_entropy_with_logits(output, target)
loss.backward()
print(model[0].weight.grad.norm())
# > tensor(0.1741)
print(model[2].weight.grad.norm())
# > tensor(0.2671)

# 2) nn.BCELoss
model.zero_grad()
output = model(data)
loss = F.binary_cross_entropy(torch.sigmoid(output), target)
loss.backward()
print(model[0].weight.grad.norm())
# > tensor(0.1741)
print(model[2].weight.grad.norm())
# > tensor(0.2671)

# 3) wrong
model.zero_grad()
output = model(data)
loss = F.binary_cross_entropy_with_logits(torch.sigmoid(output), target)
loss.backward()
print(model[0].weight.grad.norm())
# > tensor(0.0595)
print(model[2].weight.grad.norm())
# > tensor(0.0914)
```
Your loss might blow up and eventually become `NaN`, e.g. if the learning rate is set too high, which would also fit my assumption.
While applying `sigmoid` twice might have helped in your use case, I would recommend debugging the exploding loss (or the `NaN` values) instead.
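One possible way to start that debugging is sketched below on a toy setup (the model, data and hyperparameters are assumptions). It uses logits with `BCEWithLogitsLoss`, a smaller learning rate, a check that stops at the first non-finite loss, and gradient-norm logging with clipping via `torch.nn.utils.clip_grad_norm_`. `torch.autograd.set_detect_anomaly(True)` can additionally point to the operation that first produces a `NaN` during backward, at the cost of slower training.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy model and data, similar to the example above (assumption).
model = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 1))  # outputs logits
criterion = nn.BCEWithLogitsLoss()
# A smaller learning rate than the 0.01 used in the question (illustrative choice).
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

data = torch.randn(64, 10)
target = torch.randint(0, 2, (64, 1)).float()

losses, grad_norms = [], []
for step in range(100):
    optimizer.zero_grad()
    loss = criterion(model(data), target)
    # Stop at the first non-finite loss instead of silently training on NaN/inf.
    if not torch.isfinite(loss):
        raise RuntimeError(f"non-finite loss at step {step}: {loss.item()}")
    loss.backward()
    # clip_grad_norm_ returns the total norm before clipping; logging it helps spot explosions.
    grad_norms.append(float(torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)))
    optimizer.step()
    losses.append(loss.item())
```

If the logged gradient norms grow steadily before the loss turns `NaN`, lowering the learning rate is usually the first thing to try.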
Many thanks for giving such a helpful reply to such an old topic.
This really helped me a lot.