UNet implementation

It’s done. It seems I still have some problems though, and they might also be related to CUDA.

I can’t reproduce the error with trivial tensors, but basically the log_softmax function produces NaNs and -inf.
I’m not even sure those are “reasonable” values to obtain for a last layer, but in any case the CUDA function doesn’t seem happy with them.

last_layer

Variable containing:
(0 ,0 ,.,.) = 
    3.5879    3.6678    3.8380  ...     3.1548    3.0576    2.9584
    3.4753    3.7363    3.8944  ...     2.9736    3.0051    2.9889
    3.3298    3.3160    3.5382  ...     2.8276    2.8111    2.8584
              ...                ⋱                ...             
    3.2416    3.1960    3.2502  ...    90.0304   98.9006   98.5473
    3.2719    3.2843    3.2724  ...    39.1980   67.7482   73.4172
    3.2535    3.3880    3.3061  ...    13.5371   25.7164   37.9838

(0 ,1 ,.,.) = 
   -3.7768   -3.8683   -4.0629  ...    -3.2815   -3.1703   -3.0568
   -3.6481   -3.9466   -4.1274  ...    -3.0742   -3.1102   -3.0917
   -3.4816   -3.4659   -3.7200  ...    -2.9072   -2.8883   -2.9424
              ...                ⋱                ...             
   -3.3808   -3.3285   -3.3906  ...  -122.8841 -137.5700 -122.2672
   -3.4154   -3.4296   -3.4160  ...   -48.6680  -82.5093  -91.6747
   -3.3943   -3.5482   -3.4545  ...   -17.6337  -33.1317  -46.8320
[torch.cuda.FloatTensor of size 1x2x68x68 (GPU 0)]

F.log_softmax(last_layer)

Variable containing:
(0 ,0 ,.,.) = 
   -0.0006   -0.0005   -0.0004  ...    -0.0016   -0.0020   -0.0024
   -0.0008   -0.0005   -0.0003  ...    -0.0024   -0.0022   -0.0023
   -0.0011   -0.0011   -0.0007  ...    -0.0032   -0.0033   -0.0030
              ...                ⋱                ...             
   -0.0013   -0.0015   -0.0013  ...        nan       nan       nan
   -0.0012   -0.0012   -0.0012  ...     0.0000    0.0000    0.0000
   -0.0013   -0.0010   -0.0012  ...     0.0000    0.0000    0.0000

(0 ,1 ,.,.) = 
   -7.3653   -7.5366   -7.9013  ...    -6.4379   -6.2299   -6.0176
   -7.1242   -7.6833   -8.0221  ...    -6.0501   -6.1175   -6.0829
   -6.8125   -6.7830   -7.2590  ...    -5.7381   -5.7027   -5.8038
              ...                ⋱                ...             
   -6.6237   -6.5260   -6.6420  ...       -inf      -inf      -inf
   -6.6886   -6.7152   -6.6896  ...   -87.8659      -inf      -inf
   -6.6491   -6.9371   -6.7618  ...   -31.1708  -58.8481  -84.8159
[torch.cuda.FloatTensor of size 1x2x68x68 (GPU 0)]

F.log_softmax(last_layer.cpu())

Variable containing:
(0 ,0 ,.,.) = 
   -0.0006   -0.0005   -0.0004  ...    -0.0016   -0.0020   -0.0024
   -0.0008   -0.0005   -0.0003  ...    -0.0024   -0.0022   -0.0023
   -0.0011   -0.0011   -0.0007  ...    -0.0032   -0.0033   -0.0030
              ...                ⋱                ...             
   -0.0013   -0.0015   -0.0013  ...     0.0000    0.0000    0.0000
   -0.0012   -0.0012   -0.0012  ...     0.0000    0.0000    0.0000
   -0.0013   -0.0010   -0.0012  ...    -0.0000    0.0000    0.0000

(0 ,1 ,.,.) = 
   -7.3653   -7.5366   -7.9013  ...    -6.4379   -6.2299   -6.0176
   -7.1242   -7.6833   -8.0221  ...    -6.0501   -6.1175   -6.0829
   -6.8125   -6.7830   -7.2590  ...    -5.7381   -5.7027   -5.8038
              ...                ⋱                ...             
   -6.6237   -6.5260   -6.6420  ...  -212.9145 -236.4706 -220.8145
   -6.6886   -6.7152   -6.6896  ...   -87.8659 -150.2575 -165.0919
   -6.6491   -6.9371   -6.7618  ...   -31.1708  -58.8481  -84.8159
[torch.FloatTensor of size 1x2x68x68]

It seems that it becomes numerically unstable when the difference between the values gets too large. I’ve opened an issue.
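For reference, the usual way to keep log-softmax stable is the log-sum-exp trick: subtract the maximum before exponentiating. A minimal pure-Python sketch of the idea (not the actual PyTorch internals):

```python
import math

def stable_log_softmax(xs):
    # Subtract the max so exp() never overflows: for widely spread
    # inputs like [90.0, -122.9], naive exp(x) / sum(exp(x)) would
    # overflow/underflow, while exp(x - max) stays in (0, 1].
    m = max(xs)
    log_sum = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - log_sum for x in xs]
```

With values like the ones in the dump above (logits around 90 and -122), this returns roughly [0.0, -212.9], matching the CPU output instead of producing NaN/-inf.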


I also implemented a UNet variant in PyTorch recently, and managed to train it more or less successfully for a Kaggle competition; here it is: https://github.com/lopuhin/kaggle-dstl/blob/292840bf4faf49ecf7c74bed9b6d91982a139090/models.py#L211 - but in my case the classes were not mutually exclusive, so I used sigmoid activations.


@lopuhin I also tried to implement UNet; unfortunately, I don’t get any convergence (neither for image-to-image transforms nor for multi-class pixel-wise segmentation).

Do you have any idea what the problem could be? Do you think it may be because of missing Batch Norm layers?

What is your experience with the different UNet models you provide? How do e.g. SmallNet, OldNet, … compare to each other?

@bodokaiser the lack of any convergence even on the training set might be some bug (maybe not even in the network but in how inputs/outputs are prepared) or a bad learning rate, and it can very much depend on the dataset - sorry, I don’t have any more insights about this.

One thing that is different in my implementation is that I use upsampling instead of transposed convolutions; it worked significantly better in my case. Batch normalization speeds up convergence but is by no means essential; it worked fine without it too. Simpler models also gave okayish results, but UNet was consistently better - in this task the metric was intersection over union, and simple models were giving results in the 0.2-0.3 range (average over 10 classes), while UNet gave 0.4+ without too much tuning.
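The upsampling-plus-convolution decoder block described above can be sketched like this (a hypothetical UpBlock, not the exact code from the linked repo):

```python
import torch
import torch.nn as nn

class UpBlock(nn.Module):
    """Nearest-neighbor upsampling followed by a 3x3 convolution,
    as an alternative to nn.ConvTranspose2d in the decoder path."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode='nearest')
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x):
        # Doubles the spatial resolution, then mixes channels.
        return self.conv(self.up(x))
```

Because the upsampling step is fixed rather than learned, this formulation also avoids the checkerboard artifacts that transposed convolutions can produce.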


@lopuhin after about ~200 iterations (batch size 1) my output images only have one color (= are classified as one homogeneous segmentation label), which does not change with different images. I tried CUDA and CPU mode (same problem). I also get this when using only one U-Net layer (but not with only one standard convolutional layer), so I’m not sure if this can be a bug. I guess I need to try out your implementation to find bugs in my code. So big thanks for sharing!


I also spent a whole week on your code @bodokaiser, and as you mentioned, the blank output problem is so weird! I ran all kinds of checks to find the source of the bad behavior, but nothing!!! Everything works as it should!!
I don’t know why the developers of PyTorch do not pay attention to this weird problem, which I believe is a clear bug in PyTorch!!

Maybe I should mention @apaszke explicitly to grab his attention :slight_smile:

I don’t think it’s a bug in PyTorch, as I’ve commented in the other issue.
Using batch norm with such small batch sizes is probably not a good idea. If you need those layers (because of using a pre-trained network), I’d freeze the mean/std and the parameters of the batch norm.
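Freezing batch norm as suggested could look roughly like this (a sketch; note that a later model.train() call would put the layers back into training mode):

```python
import torch.nn as nn

def freeze_batchnorm(model):
    # Use the stored running mean/var instead of batch statistics,
    # and stop gradient updates to the affine weight/bias.
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.eval()
            for p in m.parameters():
                p.requires_grad = False
```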


The point is that the problem persists even after removing batchnorm. Additionally, I’m using a batch size of 16 in my recent efforts. The same model is working under a Theano/Lasagne implementation!

Hey,

I already commented on this issue on GitHub but got some new ideas today which might be worth checking out:

  1. Check if the loss is correct (correct sign +1, -1)
  2. Use Deconvolution instead of MaxPool

For 2. you could take a look at https://github.com/meetshah1995/pytorch-semseg/blob/master/ptsemseg/models/utils.py#L94-L108

Good luck!

If you share the lasagne and the pytorch code, I can have a look

Hi @bodokaiser

For 1., I think it is not the problem, because I’m normalizing the ground truth to the range {0, 1} and the prediction after the sigmoid is also in the range [0, 1]. BCELoss is also supposed to work with this range.
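As a quick check of those ranges, BCELoss indeed expects probabilities in [0, 1] on both sides; something like:

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()
logits = torch.randn(2, 1, 4, 4)
pred = torch.sigmoid(logits)                        # values in (0, 1)
target = torch.randint(0, 2, (2, 1, 4, 4)).float()  # values in {0, 1}
loss = bce(pred, target)                            # finite, >= 0
```

(In more recent PyTorch versions, nn.BCEWithLogitsLoss fuses the sigmoid into the loss and is more numerically stable.)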

For 2., yeah, maybe there is a problem with ConvTranspose2d. I will replace it with upsampling and update you.

Here is the code for the Theano version, which works like a charm!!

Here’s the PyTorch code that is not working:
https://github.com/saeedizadi/UNET_POLYP

I had a quick look at your code, and one thing you should note is that you need the Sigmoid for BCELoss, and it seems that you commented it out?
Also, it would be great if you could explain what kind of problems you are having: is the network not converging? Are you getting worse results than your Lasagne implementation? Does the network not learn at all?

No, I’ve commented it out in the unet.py file, but it exists in main.py:

outputs = F.sigmoid(model(inputs))

The problem is that the network starts to converge and the loss goes from ~0.7 down to ~0.2 very naturally! So we have convergence, right? However, when I try to evaluate the learned model even on the training images, the output is no better than a blank image!
I was thinking that there might be some problem with loading the learned weights, so I incorporated an evaluation phase into each batch update. It is evident that the prediction goes to a blank map early in the first epoch!
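One sanity check that can catch this “loss falls but output is blank” pattern early is to log the predicted foreground fraction next to the target’s: if predictions collapse to one class while the loss still drops, the network is likely just fitting the class prior. A tiny helper for that (hypothetical, not from the thread’s code):

```python
import torch

def foreground_fraction(prob, threshold=0.5):
    # prob: sigmoid outputs in [0, 1] of shape (N, C, H, W);
    # returns the fraction of pixels predicted as foreground.
    return (prob > threshold).float().mean().item()
```

Comparing foreground_fraction(prediction) against foreground_fraction(target) every few batches makes a collapse to an all-background map visible long before visual inspection would.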

I tested many things! I also replaced my implemented arch with implementations of UNet by others, and the same problem! I changed the input range to [0, 255] instead of [0, 1]! The same problem! I changed the loss function from BCELoss to MSELoss and to CrossEntropyLoss2d; again the same problem!!! With and without batch normalization!! Different gradient descent algorithms!

That’s so weird! This is why I suspect it to be a bug in PyTorch!

Are there any updates on this?
@fmassa

I finally found the problem!!
For the last set of convolutions, that is 128 -> 64 -> 64 -> 1, the activation function should not be used!
The activation function causes the values to vanish!

I just removed the nn.ReLU() modules on top of these convolution layers and now everything works like a charm!
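For illustration, the fixed final stage would look something like this (a sketch of the 128 -> 64 -> 64 -> 1 head; the layer sizes come from the post above, the kernel sizes are assumed):

```python
import torch
import torch.nn as nn

# No activation after the last convolution: the raw logits go
# straight to sigmoid + BCELoss, so they may be negative. A trailing
# ReLU would clamp them to [0, inf) and squash the gradients.
final_block = nn.Sequential(
    nn.Conv2d(128, 64, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 1, kernel_size=1),  # no ReLU here
)
```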

Saeed

Dear all, I went through almost all of your UNet implementations. I cannot find a weight initialization function or syntax in your implementations. I am wondering if I need to initialize the weights myself or if something is missing from the code.

These are only the model definition codes. I do weight initialization as:

    for m in self.modules():
        if isinstance(m, nn.Conv2d):
            n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
            m.weight.data.normal_(0, math.sqrt(2. / n))

Sorry that I am reviving this topic after one year :confused: I am also trying to implement UNet, but I cannot understand how the center_crop function works.
Another question: what is imsize in the UNet class? Any replies would be appreciated.
