Hi,
I have a simple network that I’m trying to train to segment a 3D volume, and it is showing some strange behaviour: after some iterations the training error drops sharply, but the loss stays near its maximum:
Iteration: 93 - idx: 92 of 297 - Training Loss: 0.9229209423065186 - Training Error: 0.96309
Iteration: 94 - idx: 93 of 297 - Training Loss: 1.0 - Training Error: 1.0
Iteration: 95 - idx: 94 of 297 - Training Loss: 1.0 - Training Error: 1.0
Iteration: 96 - idx: 95 of 297 - Training Loss: 0.9052178859710693 - Training Error: 0.95456
Iteration: 97 - idx: 96 of 297 - Training Loss: 0.9063018560409546 - Training Error: 0.95752
Iteration: 98 - idx: 97 of 297 - Training Loss: 0.9971625804901123 - Training Error: 0.9988
Iteration: 99 - idx: 98 of 297 - Training Loss: 0.9963698983192444 - Training Error: 0.99843
Iteration: 100 - idx: 99 of 297 - Training Loss: 1.0 - Training Error: 1.0
Iteration: 101 - idx: 100 of 297 - Training Loss: 0.9743770360946655 - Training Error: 0.98899
Iteration: 102 - idx: 101 of 297 - Training Loss: 0.9997477531433105 - Training Error: 0.99989
Iteration: 103 - idx: 102 of 297 - Training Loss: 0.8088855743408203 - Training Error: 0.90434
Iteration: 104 - idx: 103 of 297 - Training Loss: 0.8306054472923279 - Training Error: 0.91758
Iteration: 105 - idx: 104 of 297 - Training Loss: 0.8507647514343262 - Training Error: 0.92999
Iteration: 106 - idx: 105 of 297 - Training Loss: 0.8689086437225342 - Training Error: 0.93817
Iteration: 107 - idx: 106 of 297 - Training Loss: 0.8423064351081848 - Training Error: 0.92586
Iteration: 108 - idx: 107 of 297 - Training Loss: 0.9808754920959473 - Training Error: 0.99174
Iteration: 109 - idx: 108 of 297 - Training Loss: 0.9371287822723389 - Training Error: 0.97213
Iteration: 110 - idx: 109 of 297 - Training Loss: 0.9500914812088013 - Training Error: 0.9782
Iteration: 111 - idx: 110 of 297 - Training Loss: 0.866399884223938 - Training Error: 0.93801
Iteration: 112 - idx: 111 of 297 - Training Loss: 0.9967585802078247 - Training Error: 0.99864
Iteration: 113 - idx: 112 of 297 - Training Loss: 0.9402463436126709 - Training Error: 0.97344
Iteration: 114 - idx: 113 of 297 - Training Loss: 0.9903985261917114 - Training Error: 0.99581
Iteration: 115 - idx: 114 of 297 - Training Loss: 0.9773250222206116 - Training Error: 0.98987
Iteration: 116 - idx: 115 of 297 - Training Loss: 0.9529093503952026 - Training Error: 0.97944
Iteration: 117 - idx: 116 of 297 - Training Loss: 0.8779460787773132 - Training Error: 0.94363
Iteration: 118 - idx: 117 of 297 - Training Loss: 0.968449056148529 - Training Error: 0.98645
Iteration: 119 - idx: 118 of 297 - Training Loss: 1.0 - Training Error: 1.0
Iteration: 120 - idx: 119 of 297 - Training Loss: 0.8482145071029663 - Training Error: 0.92954
Iteration: 121 - idx: 120 of 297 - Training Loss: 0.9723891019821167 - Training Error: 0.24093
Iteration: 122 - idx: 121 of 297 - Training Loss: 1.0 - Training Error: 0.31912
Iteration: 123 - idx: 122 of 297 - Training Loss: 1.0 - Training Error: 0.29264
Iteration: 124 - idx: 123 of 297 - Training Loss: 0.8992700576782227 - Training Error: 0.20317
Iteration: 125 - idx: 124 of 297 - Training Loss: 0.8656968474388123 - Training Error: 0.19998
Iteration: 126 - idx: 125 of 297 - Training Loss: 1.0 - Training Error: 0.27035
Iteration: 127 - idx: 126 of 297 - Training Loss: 0.9026780128479004 - Training Error: 0.20375
Iteration: 128 - idx: 127 of 297 - Training Loss: 0.9268592596054077 - Training Error: 0.23639
Iteration: 129 - idx: 128 of 297 - Training Loss: 0.8402963876724243 - Training Error: 0.15921
Iteration: 130 - idx: 129 of 297 - Training Loss: 1.0 - Training Error: 0.23356
Iteration: 131 - idx: 130 of 297 - Training Loss: 1.0 - Training Error: 0.29319
Iteration: 132 - idx: 131 of 297 - Training Loss: 0.9110627770423889 - Training Error: 0.21088
Iteration: 133 - idx: 132 of 297 - Training Loss: 0.9204986095428467 - Training Error: 0.2349
Iteration: 134 - idx: 133 of 297 - Training Loss: 0.8369404077529907 - Training Error: 0.20368
Iteration: 135 - idx: 134 of 297 - Training Loss: 0.8765052556991577 - Training Error: 0.20096
Iteration: 136 - idx: 135 of 297 - Training Loss: 0.9297358989715576 - Training Error: 0.24956
Iteration: 137 - idx: 136 of 297 - Training Loss: 1.0 - Training Error: 0.25418
Iteration: 138 - idx: 137 of 297 - Training Loss: 0.8197125792503357 - Training Error: 0.14655
Iteration: 139 - idx: 138 of 297 - Training Loss: 0.8560500144958496 - Training Error: 0.1742
Iteration: 140 - idx: 139 of 297 - Training Loss: 0.911932110786438 - Training Error: 0.19162
Iteration: 141 - idx: 140 of 297 - Training Loss: 0.8006452918052673 - Training Error: 0.15542
Iteration: 142 - idx: 141 of 297 - Training Loss: 1.0 - Training Error: 0.31364
Iteration: 143 - idx: 142 of 297 - Training Loss: 1.0 - Training Error: 0.29259
Iteration: 144 - idx: 143 of 297 - Training Loss: 1.0 - Training Error: 0.29525
Iteration: 145 - idx: 144 of 297 - Training Loss: 0.8789593577384949 - Training Error: 0.22947
Iteration: 146 - idx: 145 of 297 - Training Loss: 0.8523666858673096 - Training Error: 0.22481
Iteration: 147 - idx: 146 of 297 - Training Loss: 0.8480030298233032 - Training Error: 0.19208
Iteration: 148 - idx: 147 of 297 - Training Loss: 0.966659426689148 - Training Error: 0.30955
Iteration: 149 - idx: 148 of 297 - Training Loss: 0.9490553736686707 - Training Error: 0.32177
Iteration: 150 - idx: 149 of 297 - Training Loss: 0.991357147693634 - Training Error: 0.25028
Iteration: 151 - idx: 150 of 297 - Training Loss: 0.9695178270339966 - Training Error: 0.32728
Iteration: 152 - idx: 151 of 297 - Training Loss: 0.991864800453186 - Training Error: 0.32936
Iteration: 153 - idx: 152 of 297 - Training Loss: 0.93214350938797 - Training Error: 0.26553
Iteration: 154 - idx: 153 of 297 - Training Loss: 1.0 - Training Error: 0.35379
Iteration: 155 - idx: 154 of 297 - Training Loss: 0.9247879981994629 - Training Error: 0.2841
Iteration: 156 - idx: 155 of 297 - Training Loss: 1.0 - Training Error: 0.26329
Iteration: 157 - idx: 156 of 297 - Training Loss: 0.9641293287277222 - Training Error: 0.3922
Iteration: 158 - idx: 157 of 297 - Training Loss: 0.8773941993713379 - Training Error: 0.29334
Iteration: 159 - idx: 158 of 297 - Training Loss: 0.8569602966308594 - Training Error: 0.23416
Iteration: 160 - idx: 159 of 297 - Training Loss: 0.8664135932922363 - Training Error: 0.2346
Iteration: 161 - idx: 160 of 297 - Training Loss: 0.9659935235977173 - Training Error: 0.30228
Iteration: 162 - idx: 161 of 297 - Training Loss: 0.9180066585540771 - Training Error: 0.22153
Iteration: 163 - idx: 162 of 297 - Training Loss: 1.0 - Training Error: 0.28928
Iteration: 164 - idx: 163 of 297 - Training Loss: 0.8440924882888794 - Training Error: 0.18165
Iteration: 165 - idx: 164 of 297 - Training Loss: 0.9442515969276428 - Training Error: 0.21878
Iteration: 166 - idx: 165 of 297 - Training Loss: 1.0 - Training Error: 0.24745
Iteration: 167 - idx: 166 of 297 - Training Loss: 0.9008297324180603 - Training Error: 0.22211
Iteration: 168 - idx: 167 of 297 - Training Loss: 1.0 - Training Error: 0.26047
Iteration: 169 - idx: 168 of 297 - Training Loss: 1.0 - Training Error: 0.24333
This is a binary segmentation task and I am using DiceLoss().

I manually checked the results. When the error was high the output was nothing, just a black volume; as the error decreased the network started predicting the shape, i.e. I get some output (not correct, but localized around the actual ground-truth label). So why is the loss still so high?
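To illustrate what I mean, here is a toy NumPy sketch (assumed names and shapes, not my actual training code) of how a thresholded error and a soft Dice loss can disagree: if the soft Dice is accidentally computed against the wrong channel (or a mismatched target), the loss sits near 1 even though the argmax-based error is low:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D "volume": 100 voxels, the middle 20 are foreground.
gt = np.zeros(100)
gt[40:60] = 1.0

# Simulated 2-channel logits (background, foreground) from a network
# that has localized the object reasonably well.
logits = np.stack([1.0 - gt, gt]) * 4.0 + rng.normal(0, 0.5, (2, 100))
probs = np.exp(logits) / np.exp(logits).sum(axis=0)  # softmax over channels

# Hard-label error, like the "Training Error" column above:
hard_pred = probs.argmax(axis=0)
error = (hard_pred != gt).mean()

def soft_dice_loss(p, g, eps=1e-6):
    # 1 - Dice coefficient on soft probabilities.
    inter = (p * g).sum()
    return 1.0 - 2.0 * inter / (p.sum() + g.sum() + eps)

# Dice on the foreground channel: low loss, consistent with low error.
loss_fg = soft_dice_loss(probs[1], gt)
# Dice on the background channel (a plausible channel/target mix-up):
# loss stays near 1 even though the argmax prediction is nearly perfect.
loss_bg = soft_dice_loss(probs[0], gt)

print(error, loss_fg, loss_bg)
```

So I wonder whether something like this channel or target mismatch could explain the pattern in my log, where the error falls but the loss stays pinned near 1.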
Does this mean the network is not learning? If so, what can I do to fix it?
Thank you