How can my net produce negative outputs when I use ReLU?


Can I ask you another question? I found a topic you created:


I am currently working on a segmentation problem, where my target is a segmentation mask with only 2 different classes (0 for background, 1 for object).

Until now I was using the NLLLoss2d, which works just fine, but I would like to add an additional pixelwise weighting to the object’s borders. I thought about creating a weight mask for each individual target, which will be calculated on the fly.

My next task is pretty similar to your segmentation task there. I have 2 classes (0 for background, 1 for object), the same as you had. Could you give me hints about good weighted loss functions for such a task, and share what experience you had? I need a weighted loss because my object class is underrepresented in nearly every image: on average, the object of interest covers 15-20% of the image. With a normal loss function the net only learns to produce an all-black output. That's what I have learned so far.

Do you also have some advice on the network structure? At the moment I am working with this approach.


I used a stronger weighting for border pixels, because I noticed that most of the errors resulted from noisy borders.
You could look at your predictions and, based on the errors, focus on certain parts of the image, e.g. the borders.
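Up-weighting border pixels can be sketched as a per-pixel weight map multiplied with an unreduced cross-entropy loss. This is a minimal sketch, not the code used in the original topic: the border detection via max-pooling (dilation minus erosion of the object mask), the helper names `border_weight_map` and `weighted_ce_loss`, and the default `border_weight=5.0` are all assumptions for illustration.

```python
import torch
import torch.nn.functional as F


def border_weight_map(target, border_weight=5.0, kernel_size=3):
    """Per-pixel weight map that up-weights object borders (hypothetical helper).

    target: LongTensor [N, H, W] with 0 (background) or 1 (object).
    Border = dilated mask minus eroded mask, both computed with max-pooling.
    """
    mask = target.unsqueeze(1).float()                        # [N, 1, H, W]
    pad = kernel_size // 2
    dilated = F.max_pool2d(mask, kernel_size, stride=1, padding=pad)
    eroded = -F.max_pool2d(-mask, kernel_size, stride=1, padding=pad)
    border = (dilated - eroded).squeeze(1)                    # 1 on borders, 0 elsewhere
    return 1.0 + (border_weight - 1.0) * border               # [N, H, W]


def weighted_ce_loss(logits, target, weights):
    # reduction='none' keeps one loss value per pixel so it can be weighted
    pixel_loss = F.cross_entropy(logits, target, reduction='none')  # [N, H, W]
    return (pixel_loss * weights).sum() / weights.sum()


# Usage: a 3x3 object in a 5x5 image; only its center pixel is not a border pixel
target = torch.zeros(1, 5, 5, dtype=torch.long)
target[:, 1:4, 1:4] = 1
weights = border_weight_map(target)
logits = torch.zeros(1, 2, 5, 5)          # uninformative prediction
loss = weighted_ce_loss(logits, target, weights)
```

Normalizing by `weights.sum()` keeps the loss on the same scale as an unweighted mean, so changing `border_weight` does not silently change the effective learning rate.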

How was the performance using the weighted loss?

You could use a UNet if your current model doesn't learn correctly.
I’ve created a small gist here.


Choosing the weights feels like a lottery. I have now trained it with [0.1, 1] as the weight tensor for nn.CrossEntropyLoss(). When I check the predictions after 10 epochs of training, it looks like the net has no clue what I want it to do.
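One less arbitrary starting point for the class weights is inverse pixel frequency, computed from the training targets. A minimal sketch, assuming the targets are index masks (0/1) as used with `nn.CrossEntropyLoss`; the helper name and the normalization to sum to the number of classes are assumptions, and the resulting weights are only a starting value to tune from.

```python
import torch
import torch.nn as nn


def inverse_frequency_weights(targets, num_classes=2):
    """Class weights proportional to 1 / pixel frequency (hypothetical helper).

    targets: LongTensor of class indices, any shape (e.g. [N, H, W]).
    The weights are rescaled so they sum to num_classes.
    """
    counts = torch.bincount(targets.flatten(), minlength=num_classes).float()
    freq = counts / counts.sum()
    weights = 1.0 / freq.clamp(min=1e-6)   # guard against a class that never occurs
    return weights * num_classes / weights.sum()


# Toy targets where the object covers 20% of the pixels
targets = torch.zeros(4, 10, 10, dtype=torch.long)
targets[:, :2, :] = 1
weights = inverse_frequency_weights(targets)   # 4:1 ratio in favor of the object class
criterion = nn.CrossEntropyLoss(weight=weights)
```

For an object coverage of 15-20%, this gives a ratio of roughly 4:1 to 5.7:1, whereas [0.1, 1] corresponds to 10:1.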

Example input:

Target of the example image:

The output of the net:

I want the net to segment the coils on PCBs. Should I just train for more epochs, or can you see other problems here?

I will try your U-Net tomorrow.


I have your U-Net implementation running now, and I have a question: why is a U-Net so much more computationally expensive to run than, for example, a VGG11?

My setup is not the best for machine learning, but for the classification task, where I tried different architectures like AlexNet, VGG11, ResNet and LeNet5, it was good enough. It took some time to get the results, but I could do other things in the meantime. Now I'm running the U-Net, and it feels like my iMac is a bit overwhelmed by the task. The iMac has a 3.2 GHz i5 with 16 GB RAM, and I'm running on the CPU, of course, because no regular iMac has a GeForce graphics card.

If I saw it right, your implementation of the U-Net is even a bit smaller than the original one.


My sample code doesn’t follow the original UNet implementation.
It's just a small demo showing how to implement the general model structure.

I'm not sure why the performance is worse. If you are using the CPU only, you could easily time the code and see which operations take most of the time.
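A quick way to do that on the CPU is to time one forward/backward pass and then use the autograd profiler for a per-operator breakdown. A minimal sketch: the small `nn.Sequential` model is a hypothetical stand-in for the U-Net, and the input (2 images of 3x256x256) is only chosen to match the image size described in this thread.

```python
import time

import torch
import torch.nn as nn

# Hypothetical stand-in model; replace it with the U-Net being profiled
model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 2, 1),
)
x = torch.randn(2, 3, 256, 256)

# Coarse wall-clock timing of one forward + backward pass
start = time.perf_counter()
out = model(x)
out.mean().backward()
elapsed = time.perf_counter() - start
print(f"forward+backward: {elapsed:.3f} s")

# Per-operator breakdown, most expensive operators first
with torch.autograd.profiler.profile() as prof:
    model(x).mean().backward()
table = prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10)
print(table)
```

For a U-Net, one would plausibly expect the convolutions on full-resolution feature maps (including the decoder convolutions after the skip connections are concatenated) to dominate, but the profiler table is the place to confirm that for a given implementation.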

If you don’t have a possibility to use a GPU, you might want to try Google Colab.


I made the model a bit smaller (fewer feature maps) so that it runs better now. But I still have the problem I already had with the VGG + FCN model: the model doesn't produce results that are even close to what I want.

I would like to describe my procedure here; maybe you can find a fundamental mistake.

First, my images are 3x256x256, and I have a total of 157 images without any augmentation. My targets are 3x256x256 images as well: the objects are white and the rest of the image is black. As preprocessing, I normalize the training images with the mean of the whole dataset.

I convert the targets to labels with one-hot encoding. In the end the labels are 256x256: the value one represents the object and the value zero everything else.
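A minimal sketch of that conversion, assuming the target images are loaded as float tensors in [0, 1] (e.g. via `torchvision.transforms.ToTensor()`); the helper name and the 0.5 threshold are assumptions. The result holds one class index per pixel, which is the target format `nn.CrossEntropyLoss` expects (an index mask, not a one-hot encoding).

```python
import torch


def rgb_mask_to_labels(mask_rgb, threshold=0.5):
    """Convert a 3xHxW black/white target image into an HxW index mask (hypothetical helper).

    mask_rgb: float tensor in [0, 1], shape [3, H, W].
    Returns a LongTensor [H, W] with 1 = object (white) and 0 = background (black).
    """
    gray = mask_rgb.mean(dim=0)          # average the three color channels
    return (gray > threshold).long()     # threshold also absorbs compression noise


mask = torch.zeros(3, 256, 256)
mask[:, 100:150, 100:150] = 1.0          # a 50x50 white object
labels = rgb_mask_to_labels(mask)
```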

Then I feed the normalized training images batch-wise into your U-Net model (I have run it with 32 out channels so far). As loss function I use nn.CrossEntropyLoss() with a weight tensor. I keep the first component of the weight tensor fixed at 0.1 and vary the second one; I have tried values from 0.8 to 2 with nearly the same results. As optimizer I use optim.SGD(). The learning rate is set to 0.0001, and I train for 20 epochs with a batch size of 4.

Can you find something that is totally wrong? The task is not that hard, and the results are not even close to what I want. There must be a mistake somewhere.


You could additionally divide by the std of the training set.
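A minimal sketch of per-channel standardization, assuming the training images are stacked into one `[N, 3, H, W]` float tensor; the random tensor is a stand-in for the 157 training images, and the helper names are hypothetical. The statistics should come from the training set only and then be reused unchanged for the test images. `torchvision.transforms.Normalize(mean, std)` applies the same operation per image.

```python
import torch


def channel_mean_std(images):
    """Per-channel mean and std over a dataset tensor of shape [N, 3, H, W]."""
    mean = images.mean(dim=(0, 2, 3))
    std = images.std(dim=(0, 2, 3))
    return mean, std


def normalize(images, mean, std):
    # Reshape [3] -> [1, 3, 1, 1] so each channel is standardized separately
    return (images - mean.view(1, -1, 1, 1)) / std.view(1, -1, 1, 1)


train_images = torch.rand(157, 3, 64, 64)   # stand-in data, not the real images
mean, std = channel_mean_std(train_images)
normed = normalize(train_images, mean, std)
```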

So it’s not actually one-hot encoding, but indices?

Do you have two output units in your last layer?
If you use nn.CrossEntropyLoss, you should remove the F.sigmoid at the last line from my example.
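The shapes `nn.CrossEntropyLoss` expects for segmentation, and why a final sigmoid hurts, can be checked in a few lines. The loss applies `log_softmax` over the class dimension itself, so it needs raw logits `[N, C, H, W]` and an index target `[N, H, W]`. If a sigmoid squashes both logits into [0, 1], the largest possible logit gap is 1, so even a perfect prediction cannot push the per-pixel loss below ln(1 + e^-1), about 0.313. The random tensors below are stand-ins for real model outputs and targets.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

N, C, H, W = 4, 2, 64, 64
logits = torch.randn(N, C, H, W, requires_grad=True)  # raw model output: no sigmoid, no softmax
target = torch.randint(0, C, (N, H, W))                # class indices in [0, C-1], dtype long

criterion = nn.CrossEntropyLoss()   # applies log_softmax over dim=1 internally
loss = criterion(logits, target)
loss.backward()

pred = logits.argmax(dim=1)         # [N, H, W] predicted class per pixel

# Best case after a sigmoid: logits (1, 0) for a pixel whose true class is 0
bound = F.cross_entropy(torch.tensor([[1.0, 0.0]]), torch.tensor([0]))
print(round(bound.item(), 4))       # 0.3133
```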

Are the images you posted from the training or test set?
How is the training error behaving? Is your net overfitting at some point?


I will try that.

I thought this was one-hot encoding, but mainly I did that transformation because nn.CrossEntropyLoss() needs a target tensor of shape NxHxW with values from 0 to C-1.

I'm not 100% sure right now, but the model must output 2 units for every pixel; otherwise it would not work with nn.CrossEntropyLoss(), which needs an input of size NxCxHxW. But I will check it once again when the training has finished.

Oh yes good point.

The 3 images I posted above are test images from the VGG11+FCN structure. So far the U-Net has only learned to produce black images, no matter which weight tensor I choose. For the next run I'm going to remove F.sigmoid(), and I will raise the second component of the weight tensor to 5.

The loss is nearly constant at around 0.69 for all batches.
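A loss stuck at about 0.69 is a useful signal in itself: ln 2 is approximately 0.6931, which is exactly the cross-entropy of a 2-class model that predicts 50/50 for every pixel. With class weights the weighted mean is still ln 2, because every pixel then has the same loss. The sketch below just reproduces that number with all-zero logits (random stand-in targets).

```python
import math

import torch
import torch.nn.functional as F

# Equal logits for both classes -> softmax gives p = 0.5 for every pixel
logits = torch.zeros(1, 2, 8, 8)
target = torch.randint(0, 2, (1, 8, 8))
loss = F.cross_entropy(logits, target)
weighted = F.cross_entropy(logits, target, weight=torch.tensor([0.1, 1.0]))
print(loss.item(), weighted.item(), math.log(2))   # all ~0.6931
```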


If you checked some of these points and the model still doesn’t learn anything, you could try to scale down your problem to a single image and try to overfit your model on it.
If your model can’t learn this single image perfectly, something else might be wrong.
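A minimal sketch of that single-image sanity check. The tiny convolutional net and the toy image/target pair are hypothetical stand-ins (the target is simply "channel 0 is positive", so it is learnable per pixel); with the real U-Net and one real training pair, the loss should likewise approach zero and the pixel accuracy approach 1.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical tiny segmentation net; swap in the U-Net here
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 2, 1),                 # 2 output channels, raw logits
)

image = torch.randn(1, 3, 32, 32)
target = (image[:, 0] > 0).long()        # toy target: [1, H, W] class indices

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-2)

for step in range(200):
    optimizer.zero_grad()
    loss = criterion(model(image), target)
    loss.backward()
    optimizer.step()

with torch.no_grad():
    acc = (model(image).argmax(1) == target).float().mean().item()
print(f"final loss {loss.item():.4f}, pixel accuracy {acc:.3f}")
```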


I stopped the run and started a new one without F.sigmoid() and with normalization by the std as well. It seems that the loss decreases now, but very slowly. Let's see.

My general task is to find out whether that one object is on the PCB or not, and to locate it. Therefore I thought segmentation, followed by some image processing, would be the easiest way if you are only looking for one object class. But gradually I'm considering another approach. Do you think Yolo can solve this problem? I would have to create new targets again, which will be a hell of a lot of work, but if I get better results with that approach, I could save myself the image processing afterwards.

My plan now is to finish the training process first. If the results are still not close to what I want, I'll try to overfit the model. And if that doesn't work out, I'll look for another approach, for example Yolo.

What do you think about that?


Yolo might work, but I would be still wondering, why your approach fails. I think we are really close to a working solution.
On the other hand, Yolo is known for its speed while having a slightly worse accuracy.
So if you care about a high throughput, Yolo should be fine.


High throughput is not my first priority. More important is that I can find and locate more than one object, and I think something like Yolo is better suited for that, especially given the fine structures on PCBs that segmentation models struggle with.

Looking at the loss of the run I started after my last post, it is decreasing somewhat, but it feels a bit random to me. It has finished 8 epochs so far.

epoch0, iter0, loss: 0.6926751732826233
epoch0, iter10, loss: 0.6991615295410156
epoch0, iter20, loss: 0.6919772028923035
epoch0, iter30, loss: 0.6815929412841797

epoch1, iter0, loss: 0.673651933670044
epoch1, iter10, loss: 0.6767787933349609
epoch1, iter20, loss: 0.6822627782821655
epoch1, iter30, loss: 0.6802076101303101

epoch2, iter0, loss: 0.687945544719696
epoch2, iter10, loss: 0.680255115032196
epoch2, iter20, loss: 0.6747005581855774
epoch2, iter30, loss: 0.689960241317749

epoch3, iter0, loss: 0.6714928150177002
epoch3, iter10, loss: 0.6791138648986816
epoch3, iter20, loss: 0.6687366366386414
epoch3, iter30, loss: 0.6746395230293274

epoch4, iter0, loss: 0.6884714365005493
epoch4, iter10, loss: 0.6811267137527466
epoch4, iter20, loss: 0.6705578565597534
epoch4, iter30, loss: 0.6744130849838257

epoch5, iter0, loss: 0.6666390895843506
epoch5, iter10, loss: 0.6794276237487793
epoch5, iter20, loss: 0.6937333941459656
epoch5, iter30, loss: 0.6647709608078003

epoch6, iter0, loss: 0.7014554142951965
epoch6, iter10, loss: 0.6830400824546814
epoch6, iter20, loss: 0.6678677201271057
epoch6, iter30, loss: 0.6608891487121582

epoch7, iter0, loss: 0.6893932223320007
epoch7, iter10, loss: 0.65982985496521
epoch7, iter20, loss: 0.6993864178657532
epoch7, iter30, loss: 0.6719664931297302
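Single-batch losses with a batch size of 4 are noisy, so comparing per-epoch averages gives a clearer trend than individual lines. A minimal sketch using the logged values for epochs 0 and 7 above; note these are only the 4 sampled batches per epoch (every tenth batch), not true full-epoch means, and the helper name is hypothetical.

```python
from collections import defaultdict

# (epoch, loss) pairs copied from the epoch 0 and epoch 7 lines of the log above
log = [
    (0, 0.6926751732826233), (0, 0.6991615295410156),
    (0, 0.6919772028923035), (0, 0.6815929412841797),
    (7, 0.6893932223320007), (7, 0.65982985496521),
    (7, 0.6993864178657532), (7, 0.6719664931297302),
]


def epoch_means(entries):
    """Average the logged per-batch losses of each epoch."""
    sums, counts = defaultdict(float), defaultdict(int)
    for epoch, loss in entries:
        sums[epoch] += loss
        counts[epoch] += 1
    return {e: sums[e] / counts[e] for e in sorted(sums)}


means = epoch_means(log)
print(means)   # roughly {0: 0.6914, 7: 0.6801}
```

Averaged this way, the trend is downward, but still very close to ln 2, i.e. the model is barely better than a 50/50 guess per pixel.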


I'm now trying to overfit the model on only one image.

epoch0, loss: 0.7016939520835876
epoch1, loss: 0.7016094326972961
epoch2, loss: 0.7015239596366882
epoch3, loss: 0.7014367580413818
epoch4, loss: 0.701361894607544
epoch5, loss: 0.701288104057312
epoch6, loss: 0.7012142539024353
epoch7, loss: 0.7011398673057556
epoch8, loss: 0.7010654807090759
epoch9, loss: 0.700985312461853
epoch10, loss: 0.70090252161026
epoch11, loss: 0.700810968875885
epoch12, loss: 0.7007091641426086
epoch13, loss: 0.7006009221076965
epoch14, loss: 0.7004837989807129
epoch15, loss: 0.7003656029701233
epoch16, loss: 0.7002443075180054
epoch17, loss: 0.7001180648803711
epoch18, loss: 0.7000008225440979
epoch19, loss: 0.699885368347168
epoch20, loss: 0.699772834777832
epoch21, loss: 0.6996691823005676
epoch22, loss: 0.6995682120323181
epoch23, loss: 0.6994773745536804
epoch24, loss: 0.6993924379348755
epoch25, loss: 0.6993141174316406
epoch26, loss: 0.6992430686950684
epoch27, loss: 0.6991660594940186
epoch28, loss: 0.6990967392921448
epoch29, loss: 0.6990288496017456
epoch30, loss: 0.6989548206329346
epoch31, loss: 0.6988816857337952
epoch32, loss: 0.6988019347190857
epoch33, loss: 0.6987239122390747
epoch34, loss: 0.6986420750617981
epoch35, loss: 0.6985604763031006
epoch36, loss: 0.6984803676605225
epoch37, loss: 0.6983962059020996
epoch38, loss: 0.6983159184455872
epoch39, loss: 0.6982365846633911
epoch40, loss: 0.698161780834198
epoch41, loss: 0.6980820298194885
epoch42, loss: 0.6980027556419373
epoch43, loss: 0.6979289650917053
epoch44, loss: 0.6978538632392883
epoch45, loss: 0.6977810263633728
epoch46, loss: 0.6977129578590393
epoch47, loss: 0.6976410746574402
epoch48, loss: 0.6975752711296082
epoch49, loss: 0.697505533695221
epoch50, loss: 0.6974354386329651
epoch51, loss: 0.6973608136177063
epoch52, loss: 0.6972773671150208
epoch53, loss: 0.6971907615661621
epoch54, loss: 0.6970956921577454
epoch55, loss: 0.6969901323318481
epoch56, loss: 0.6968814134597778
epoch57, loss: 0.6967665553092957
epoch58, loss: 0.6966500878334045
epoch59, loss: 0.6965349912643433
epoch60, loss: 0.6964192986488342
epoch61, loss: 0.6963112950325012
epoch62, loss: 0.6962102055549622
epoch63, loss: 0.6961128115653992
epoch64, loss: 0.6960229873657227
epoch65, loss: 0.6959385871887207
epoch66, loss: 0.6958605647087097
epoch67, loss: 0.695787250995636
epoch68, loss: 0.6957195401191711
epoch69, loss: 0.6956507563591003
epoch70, loss: 0.6955835819244385
epoch71, loss: 0.6955192685127258
epoch72, loss: 0.6954488754272461
epoch73, loss: 0.695377767086029
epoch74, loss: 0.6953052878379822
epoch75, loss: 0.6952307224273682
epoch76, loss: 0.6951506733894348
epoch77, loss: 0.6950728893280029
epoch78, loss: 0.6949944496154785
epoch79, loss: 0.6949175000190735
epoch80, loss: 0.694835901260376
epoch81, loss: 0.6947645545005798
epoch82, loss: 0.6946881413459778
epoch83, loss: 0.6946176290512085
epoch84, loss: 0.6945491433143616
epoch85, loss: 0.694476842880249
epoch86, loss: 0.6944096088409424
epoch87, loss: 0.6943421959877014
epoch88, loss: 0.6942755579948425
epoch89, loss: 0.6942132711410522
epoch90, loss: 0.6941456198692322
epoch91, loss: 0.69408118724823
epoch92, loss: 0.6940111517906189
epoch93, loss: 0.6939395666122437
epoch94, loss: 0.6938643455505371
epoch95, loss: 0.6937788128852844
epoch96, loss: 0.6936940550804138
epoch97, loss: 0.6935980916023254
epoch98, loss: 0.6934947967529297
epoch99, loss: 0.6933897137641907
epoch100, loss: 0.6932768225669861

I think the model will be able to learn that one image, but judging by how slowly the loss is decreasing, it will take a lot of time. I'm now running it for 1000 epochs as a first try.

Okay, 1000 is not enough:

epoch999, loss: 0.6457356214523315


Could you try to increase the learning rate to ~1e-2?
Also, try momentum=0.9 or the Adam optimizer.
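The effect of these optimizer settings can be compared on a toy single-image problem. A minimal sketch: the tiny network, toy data and step count are hypothetical; only the three optimizer configurations (the original `SGD(lr=1e-4)`, `SGD(lr=1e-2, momentum=0.9)` and Adam) come from this thread, with Adam's `lr=1e-3` chosen here as an assumption.

```python
import torch
import torch.nn as nn
import torch.optim as optim


def make_model():
    torch.manual_seed(0)  # identical initial weights for a fair comparison
    return nn.Sequential(
        nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
        nn.Conv2d(8, 2, 1),
    )


def train(make_optimizer, steps=100):
    model = make_model()
    optimizer = make_optimizer(model.parameters())
    g = torch.Generator().manual_seed(1)
    image = torch.randn(1, 3, 32, 32, generator=g)
    target = (image[:, 0] > 0).long()        # toy per-pixel target
    criterion = nn.CrossEntropyLoss()
    for _ in range(steps):
        optimizer.zero_grad()
        loss = criterion(model(image), target)
        loss.backward()
        optimizer.step()
    return loss.item()


slow = train(lambda p: optim.SGD(p, lr=1e-4))
fast = train(lambda p: optim.SGD(p, lr=1e-2, momentum=0.9))
adam = train(lambda p: optim.Adam(p, lr=1e-3))
print(f"SGD 1e-4: {slow:.4f} | SGD 1e-2 + momentum 0.9: {fast:.4f} | Adam 1e-3: {adam:.4f}")
```

With `lr=1e-4` and no momentum, plain SGD barely moves away from the initial loss in the same number of steps, which matches the very slow decrease in the single-image run above.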
Could you upload the training image and its target?




I'm now running it with lr 0.01, momentum 0.9, no weight tensor and a much smaller U-Net (only 4 out channels). The loss now decreases much faster: after 2000 epochs I'm at a loss of around 0.1.

After 5000 epochs im at 0.0500 and it decreases pretty slow.


This is what it learned. At least it learned something for the first time.


Nice it’s working now.
Since the borders between the segmented parts are not looking that good, you could try to weight them a bit more. But at first, I would try to successfully train using your complete training dataset.


I have a run going now with 16 output channels, lr 0.01, momentum 0.9 and no weight tensor. I want to train for 1000 epochs first and check whether it has learned at least something or just learned to produce all-black outputs again.

Do you think I can train the model on the whole dataset with just 4 output channels, too? That would give me a good boost in training time.

And do you know a function for image segmentation that calculates the IoU? I could compare the output and the target with two loops over every pixel, but I think that's not really efficient, and there must be something smarter. So far I just check whether the model learned to produce white pixels and compare the number of white pixels with the number of white pixels in the target image.
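The IoU can be computed without any Python loops by comparing whole tensors. A minimal sketch, assuming predictions are obtained as class indices (e.g. `logits.argmax(dim=1)`) and the target is the usual index mask; the helper name is hypothetical. If the class appears in neither tensor, the union is 0 and this sketch returns 0.

```python
import torch


def iou(pred, target, cls=1, eps=1e-7):
    """Intersection over union for one class, computed with tensor ops.

    pred, target: LongTensors of class indices with the same shape, e.g. [N, H, W].
    """
    p = pred == cls
    t = target == cls
    intersection = (p & t).sum().float()
    union = (p | t).sum().float()
    return (intersection / (union + eps)).item()


pred = torch.zeros(1, 4, 4, dtype=torch.long)
pred[:, :, :2] = 1           # left half predicted as object (8 pixels)
target = torch.zeros(1, 4, 4, dtype=torch.long)
target[:, :2, :] = 1         # top half is object (8 pixels)
score = iou(pred, target)    # overlap 4, union 12 -> 1/3
```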


You could easily compute the dice loss and even use it for training your model:

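def dice_loss(input, target):
    # input: predicted probabilities (e.g. after sigmoid), not thresholded values
    # target: binary float mask with the same number of elements as input
    smooth = 1.

    iflat = input.reshape(-1)   # reshape also works on non-contiguous tensors
    tflat = target.reshape(-1)
    intersection = (iflat * tflat).sum()
    return 1 - ((2. * intersection + smooth) /
                (iflat.sum() + tflat.sum() + smooth))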

The 4 output channels would correspond to only 4 classes?
If you make sure to remove the other classes from the target, it should be no problem.
Are you observing the training and validation loss/accuracy?
It will give you hints about your model’s performance.
Personally, I like to visualize the loss, accuracy, and some segmentations using Visdom.


Okay, for the next run I'll try the dice_loss. What size should the target and the input have? So far, for nn.CrossEntropyLoss() I had to pass the target in shape NxHxW and the input in shape NxCxHxW. For dice_loss both need to have the same size. Should I convert the target to NxCxHxW or the input to NxHxW? I think it doesn't matter which one I transform. What do you think?
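The usual route is to convert the target to NxCxHxW (one-hot) and turn the input into per-class probabilities with softmax. Reducing the input to NxHxW would require an argmax, which is not differentiable, so the loss could no longer train the model. A minimal multi-class soft Dice sketch under those assumptions; it is a variant of the dice_loss above that averages the Dice score over classes, and the function name is hypothetical (`F.one_hot` requires PyTorch 1.1 or newer).

```python
import torch
import torch.nn.functional as F


def soft_dice_loss(logits, target, smooth=1.0):
    """Multi-class soft Dice loss (hypothetical helper).

    logits: [N, C, H, W] raw network output.
    target: [N, H, W] LongTensor of class indices.
    """
    num_classes = logits.shape[1]
    probs = F.softmax(logits, dim=1)                                      # [N, C, H, W]
    one_hot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float()  # [N, C, H, W]
    dims = (0, 2, 3)                                  # sum over batch and pixels, per class
    intersection = (probs * one_hot).sum(dims)
    cardinality = probs.sum(dims) + one_hot.sum(dims)
    dice = (2.0 * intersection + smooth) / (cardinality + smooth)         # [C]
    return 1.0 - dice.mean()


target = torch.randint(0, 2, (2, 8, 8))
# Near-perfect prediction: large logit for the correct class at every pixel
logits = (F.one_hot(target, 2).permute(0, 3, 1, 2).float() * 20).requires_grad_()
loss = soft_dice_loss(logits, target)
loss.backward()
```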

For the classes there is the variable n_class, and the variable out_channels defines the number of feature maps. 4 out channels would mean that there are 4 feature maps in the first layer, 8 in the next layer, and 16 in the last one. That's how I interpreted the U-Net and your implementation.

So far I observe the loss of every tenth batch during training, and after each epoch I check the white pixels for the test images. (That method was only a temporary solution, because at first I only wanted to see whether the model learns anything other than producing black images.) But I don't save any of that.

Visdom looks pretty powerful. I will have to check how to use it.


Okay, I tried to implement the dice_loss, but your example can't compute the backward pass.

I tried to find another implementation of the dice loss function with a backward pass but couldn't find one. None of them define a backward pass.
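The dice_loss above does not need a hand-written backward: it is built only from differentiable tensor operations, so autograd derives the gradient. The actual error message isn't shown in the thread, so this is a guess, but a common cause of "no backward" is passing thresholded or argmax predictions, which are cut off from the graph. The sketch contrasts both cases, using a single-channel (sigmoid) setup and random stand-in data.

```python
import torch


def dice_loss(input, target):
    smooth = 1.0
    iflat = input.reshape(-1)
    tflat = target.reshape(-1)
    intersection = (iflat * tflat).sum()
    return 1 - ((2.0 * intersection + smooth) / (iflat.sum() + tflat.sum() + smooth))


logits = torch.randn(2, 1, 8, 8, requires_grad=True)
target = torch.randint(0, 2, (2, 1, 8, 8)).float()

# Works: soft probabilities keep the graph connected to `logits`
loss = dice_loss(torch.sigmoid(logits), target)
loss.backward()

# Fails: thresholding produces a tensor without gradient history
hard = (torch.sigmoid(logits) > 0.5).float()
try:
    dice_loss(hard, target).backward()
    hard_failed = False
except RuntimeError as err:
    hard_failed = True
    print("backward failed:", err)
```

Thresholding (or argmax) is still fine for computing metrics such as the IoU, just not inside the loss that is backpropagated.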