[Beginner] What does it mean when loss stays high yet error/accuracy improves?

Hi,

I have a simple network that I'm training to segment a 3D volume, and it is showing some strange behaviour: after some iterations the error drops to a much lower value, but the loss stays close to its maximum:

Iteration:  93  - idx:  92 of  297 - Training Loss:  0.9229209423065186 - Training Error:  0.96309
Iteration:  94  - idx:  93 of  297 - Training Loss:  1.0 - Training Error:  1.0
Iteration:  95  - idx:  94 of  297 - Training Loss:  1.0 - Training Error:  1.0
Iteration:  96  - idx:  95 of  297 - Training Loss:  0.9052178859710693 - Training Error:  0.95456
Iteration:  97  - idx:  96 of  297 - Training Loss:  0.9063018560409546 - Training Error:  0.95752
Iteration:  98  - idx:  97 of  297 - Training Loss:  0.9971625804901123 - Training Error:  0.9988
Iteration:  99  - idx:  98 of  297 - Training Loss:  0.9963698983192444 - Training Error:  0.99843
Iteration:  100  - idx:  99 of  297 - Training Loss:  1.0 - Training Error:  1.0
Iteration:  101  - idx:  100 of  297 - Training Loss:  0.9743770360946655 - Training Error:  0.98899
Iteration:  102  - idx:  101 of  297 - Training Loss:  0.9997477531433105 - Training Error:  0.99989
Iteration:  103  - idx:  102 of  297 - Training Loss:  0.8088855743408203 - Training Error:  0.90434
Iteration:  104  - idx:  103 of  297 - Training Loss:  0.8306054472923279 - Training Error:  0.91758
Iteration:  105  - idx:  104 of  297 - Training Loss:  0.8507647514343262 - Training Error:  0.92999
Iteration:  106  - idx:  105 of  297 - Training Loss:  0.8689086437225342 - Training Error:  0.93817
Iteration:  107  - idx:  106 of  297 - Training Loss:  0.8423064351081848 - Training Error:  0.92586
Iteration:  108  - idx:  107 of  297 - Training Loss:  0.9808754920959473 - Training Error:  0.99174
Iteration:  109  - idx:  108 of  297 - Training Loss:  0.9371287822723389 - Training Error:  0.97213
Iteration:  110  - idx:  109 of  297 - Training Loss:  0.9500914812088013 - Training Error:  0.9782
Iteration:  111  - idx:  110 of  297 - Training Loss:  0.866399884223938 - Training Error:  0.93801
Iteration:  112  - idx:  111 of  297 - Training Loss:  0.9967585802078247 - Training Error:  0.99864
Iteration:  113  - idx:  112 of  297 - Training Loss:  0.9402463436126709 - Training Error:  0.97344
Iteration:  114  - idx:  113 of  297 - Training Loss:  0.9903985261917114 - Training Error:  0.99581
Iteration:  115  - idx:  114 of  297 - Training Loss:  0.9773250222206116 - Training Error:  0.98987
Iteration:  116  - idx:  115 of  297 - Training Loss:  0.9529093503952026 - Training Error:  0.97944
Iteration:  117  - idx:  116 of  297 - Training Loss:  0.8779460787773132 - Training Error:  0.94363
Iteration:  118  - idx:  117 of  297 - Training Loss:  0.968449056148529 - Training Error:  0.98645
Iteration:  119  - idx:  118 of  297 - Training Loss:  1.0 - Training Error:  1.0
Iteration:  120  - idx:  119 of  297 - Training Loss:  0.8482145071029663 - Training Error:  0.92954
Iteration:  121  - idx:  120 of  297 - Training Loss:  0.9723891019821167 - Training Error:  0.24093
Iteration:  122  - idx:  121 of  297 - Training Loss:  1.0 - Training Error:  0.31912
Iteration:  123  - idx:  122 of  297 - Training Loss:  1.0 - Training Error:  0.29264
Iteration:  124  - idx:  123 of  297 - Training Loss:  0.8992700576782227 - Training Error:  0.20317
Iteration:  125  - idx:  124 of  297 - Training Loss:  0.8656968474388123 - Training Error:  0.19998
Iteration:  126  - idx:  125 of  297 - Training Loss:  1.0 - Training Error:  0.27035
Iteration:  127  - idx:  126 of  297 - Training Loss:  0.9026780128479004 - Training Error:  0.20375
Iteration:  128  - idx:  127 of  297 - Training Loss:  0.9268592596054077 - Training Error:  0.23639
Iteration:  129  - idx:  128 of  297 - Training Loss:  0.8402963876724243 - Training Error:  0.15921
Iteration:  130  - idx:  129 of  297 - Training Loss:  1.0 - Training Error:  0.23356
Iteration:  131  - idx:  130 of  297 - Training Loss:  1.0 - Training Error:  0.29319
Iteration:  132  - idx:  131 of  297 - Training Loss:  0.9110627770423889 - Training Error:  0.21088
Iteration:  133  - idx:  132 of  297 - Training Loss:  0.9204986095428467 - Training Error:  0.2349
Iteration:  134  - idx:  133 of  297 - Training Loss:  0.8369404077529907 - Training Error:  0.20368
Iteration:  135  - idx:  134 of  297 - Training Loss:  0.8765052556991577 - Training Error:  0.20096
Iteration:  136  - idx:  135 of  297 - Training Loss:  0.9297358989715576 - Training Error:  0.24956
Iteration:  137  - idx:  136 of  297 - Training Loss:  1.0 - Training Error:  0.25418
Iteration:  138  - idx:  137 of  297 - Training Loss:  0.8197125792503357 - Training Error:  0.14655
Iteration:  139  - idx:  138 of  297 - Training Loss:  0.8560500144958496 - Training Error:  0.1742
Iteration:  140  - idx:  139 of  297 - Training Loss:  0.911932110786438 - Training Error:  0.19162
Iteration:  141  - idx:  140 of  297 - Training Loss:  0.8006452918052673 - Training Error:  0.15542
Iteration:  142  - idx:  141 of  297 - Training Loss:  1.0 - Training Error:  0.31364
Iteration:  143  - idx:  142 of  297 - Training Loss:  1.0 - Training Error:  0.29259
Iteration:  144  - idx:  143 of  297 - Training Loss:  1.0 - Training Error:  0.29525
Iteration:  145  - idx:  144 of  297 - Training Loss:  0.8789593577384949 - Training Error:  0.22947
Iteration:  146  - idx:  145 of  297 - Training Loss:  0.8523666858673096 - Training Error:  0.22481
Iteration:  147  - idx:  146 of  297 - Training Loss:  0.8480030298233032 - Training Error:  0.19208
Iteration:  148  - idx:  147 of  297 - Training Loss:  0.966659426689148 - Training Error:  0.30955
Iteration:  149  - idx:  148 of  297 - Training Loss:  0.9490553736686707 - Training Error:  0.32177
Iteration:  150  - idx:  149 of  297 - Training Loss:  0.991357147693634 - Training Error:  0.25028
Iteration:  151  - idx:  150 of  297 - Training Loss:  0.9695178270339966 - Training Error:  0.32728
Iteration:  152  - idx:  151 of  297 - Training Loss:  0.991864800453186 - Training Error:  0.32936
Iteration:  153  - idx:  152 of  297 - Training Loss:  0.93214350938797 - Training Error:  0.26553
Iteration:  154  - idx:  153 of  297 - Training Loss:  1.0 - Training Error:  0.35379
Iteration:  155  - idx:  154 of  297 - Training Loss:  0.9247879981994629 - Training Error:  0.2841
Iteration:  156  - idx:  155 of  297 - Training Loss:  1.0 - Training Error:  0.26329
Iteration:  157  - idx:  156 of  297 - Training Loss:  0.9641293287277222 - Training Error:  0.3922
Iteration:  158  - idx:  157 of  297 - Training Loss:  0.8773941993713379 - Training Error:  0.29334
Iteration:  159  - idx:  158 of  297 - Training Loss:  0.8569602966308594 - Training Error:  0.23416
Iteration:  160  - idx:  159 of  297 - Training Loss:  0.8664135932922363 - Training Error:  0.2346
Iteration:  161  - idx:  160 of  297 - Training Loss:  0.9659935235977173 - Training Error:  0.30228
Iteration:  162  - idx:  161 of  297 - Training Loss:  0.9180066585540771 - Training Error:  0.22153
Iteration:  163  - idx:  162 of  297 - Training Loss:  1.0 - Training Error:  0.28928
Iteration:  164  - idx:  163 of  297 - Training Loss:  0.8440924882888794 - Training Error:  0.18165
Iteration:  165  - idx:  164 of  297 - Training Loss:  0.9442515969276428 - Training Error:  0.21878
Iteration:  166  - idx:  165 of  297 - Training Loss:  1.0 - Training Error:  0.24745
Iteration:  167  - idx:  166 of  297 - Training Loss:  0.9008297324180603 - Training Error:  0.22211
Iteration:  168  - idx:  167 of  297 - Training Loss:  1.0 - Training Error:  0.26047
Iteration:  169  - idx:  168 of  297 - Training Loss:  1.0 - Training Error:  0.24333

I have a binary segmentation case and am using DiceLoss().

I manually checked the results: when the error was high, the output was nothing, just a black image volume. As the error decreased, the network started trying to predict the shape, i.e. I am getting some output (not correct, but it's localized around the actual ground-truth label). So why is the loss still so high…
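For context, DiceLoss() here is along the lines of a standard soft Dice loss (a rough sketch, not my exact implementation); it also shows why a completely empty prediction gives a loss of almost exactly 1:

import torch
import torch.nn as nn

class SoftDiceLoss(nn.Module):
    # Soft Dice loss for binary segmentation: 1 - Dice coefficient.
    def __init__(self, eps=1e-6):
        super().__init__()
        self.eps = eps

    def forward(self, logits, target):
        probs = torch.sigmoid(logits)                  # probabilities in [0, 1]
        probs = probs.view(probs.size(0), -1)
        target = target.view(target.size(0), -1)
        intersection = (probs * target).sum(dim=1)
        union = probs.sum(dim=1) + target.sum(dim=1)
        dice = (2.0 * intersection + self.eps) / (union + self.eps)
        return 1.0 - dice.mean()

# An all-background prediction ("a black volume") has ~zero intersection with
# the ground truth, so the Dice coefficient is ~0 and the loss is ~1.
loss_fn = SoftDiceLoss()
logits = torch.full((1, 1, 8, 8, 8), -10.0)   # sigmoid(-10) is ~0 everywhere
target = torch.zeros(1, 1, 8, 8, 8)
target[..., 2:6, 2:6, 2:6] = 1.0              # some foreground voxels
print(loss_fn(logits, target))                # tensor close to 1.0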

Does this mean the network is not learning? If so, what can I do to fix it?

Thank you

While searching for a solution I came across this thread: https://github.com/pytorch/pytorch/issues/3358, which suggests adding the code below after loss.backward():

loss.backward()

# Print the sum of each parameter's gradient; all-zero sums mean no gradient is flowing.
for param in net.parameters():
    print(param.grad.data.sum())

Unless I understood it wrong, this checks whether the computed gradients contain any non-zero values. If they are non-zero but very small, the learning rate may be too small; if they are all zero, no gradient is reaching the parameters at all.
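For reference, here is roughly the same check with the parameter names attached, which makes it easier to see which layers receive no gradient (a sketch; net is just my model variable from the snippet above):

for name, param in net.named_parameters():
    if param.grad is None:
        print(name, "has no gradient yet")
    else:
        print(name, param.grad.abs().sum().item())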

But the iterations where I get non-zero gradients are exactly the ones where the network appears to be learning, and whenever the gradients are all zero the loss is 1.0:

tensor(-0.0001, device='cuda:0')
tensor(3.8132e-09, device='cuda:0')
tensor(1.5014e-06, device='cuda:0')
tensor(-0.0006, device='cuda:0')
tensor(1.4931e-09, device='cuda:0')
tensor(2.0285e-06, device='cuda:0')
tensor(-0.0003, device='cuda:0')
tensor(3.4502e-10, device='cuda:0')
tensor(1.4271e-06, device='cuda:0')
tensor(-0.0005, device='cuda:0')
tensor(-0.0000, device='cuda:0')
tensor(1.6241e-06, device='cuda:0')
tensor(-0.0003, device='cuda:0')
tensor(-0.0000, device='cuda:0')
tensor(1.2744e-06, device='cuda:0')
tensor(-0.0003, device='cuda:0')
tensor(-0.0000, device='cuda:0')
tensor(1.5523e-06, device='cuda:0')
tensor(-0.0006, device='cuda:0')
tensor(-0.0000, device='cuda:0')
tensor(2.0451e-06, device='cuda:0')
tensor(-0.0009, device='cuda:0')
tensor(-0.0000, device='cuda:0')
tensor(2.3836e-06, device='cuda:0')
tensor(-0.0011, device='cuda:0')
tensor(-0.0000, device='cuda:0')
tensor(2.6307e-06, device='cuda:0')
tensor(-0.0038, device='cuda:0')
tensor(6.4067e-10, device='cuda:0')
tensor(6.6819e-06, device='cuda:0')
tensor(-0.0074, device='cuda:0')
tensor(-0.0000, device='cuda:0')
tensor(2.7682e-06, device='cuda:0')
tensor(0.0008, device='cuda:0')
tensor(-0.0001, device='cuda:0')
tensor(-0.0000, device='cuda:0')
tensor(-0.0044, device='cuda:0')
tensor(-0.0001, device='cuda:0')
tensor(-0.0052, device='cuda:0')
tensor(-0.0007, device='cuda:0')
tensor(-0.0053, device='cuda:0')
tensor(-0.0004, device='cuda:0')
tensor(-0.0008, device='cuda:0')
tensor(-0.0021, device='cuda:0')
Iteration:  7  - idx:  6 of  297 - Training Loss:  0.83579021692276 - Training Error:  0.90537
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
Iteration:  8  - idx:  7 of  297 - Training Loss:  1.0 - Training Error:  1.0

Sorry for the very beginner question, but I read the GitHub thread as saying that zero gradient values are what you want, yet in my case the loss is high exactly when the gradients are all zero.

I also tried lowering my learning rate from 1e-2 to 1e-3, but that did not fix anything.

What am I doing wrong?

Thank you

If someone comes across this via a Google search, I wanted to add an update on why I was getting this loss behaviour. The reason was that I was applying nn.Sigmoid() at the model output and inside DiceLoss() as well. Removing the sigmoid operation from the model output fixed the weird loss behaviour.
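In code, the mistake looked roughly like this (a simplified sketch; the layer sizes and names are placeholders, my actual network is larger):

import torch.nn as nn

# Before (buggy): sigmoid applied twice -- once at the model output and again
# inside DiceLoss() -- so the loss sees doubly squashed, nearly constant values
# and the gradients flatten out to zero.
model = nn.Sequential(
    nn.Conv3d(1, 16, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv3d(16, 1, kernel_size=3, padding=1),
    nn.Sigmoid(),                      # <- the extra sigmoid
)

# After (fixed): the model returns raw logits and the sigmoid is applied
# exactly once, inside DiceLoss().
model = nn.Sequential(
    nn.Conv3d(1, 16, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv3d(16, 1, kernel_size=3, padding=1),
)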

ptrblck explained this in more detail here: Why my loss function's value doesn't going down?, which helped me fix it.
