Why doesn't UNet allow any backbone?

I can’t speak for the author of the linked question, but my guess is that

> But, where U-net only copy the features and append them, FPN apply a 1x1 convolution layer before adding them. This allows the bottom-up pyramid called “backbone” to be pretty much whatever you want.

means that the FPN implementation could be more flexible, since the skip connections are processed by a 1x1 convolution (so you would be able to change the number of channels, spatial size, etc.), while the (original) UNet implementation might have just concatenated the skip activations and would thus be more shape-dependent.
In any case, that’s just my interpretation of this answer, so you might want to follow up with the author.
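To illustrate the idea, here is a small (hypothetical) sketch of FPN-style lateral 1x1 convolutions: each backbone stage, regardless of its channel count, is projected to a common channel width, which is why the backbone can be swapped freely. The channel counts here are just examples matching a ResNet-like backbone, not taken from any specific implementation.

```python
import torch
from torch import nn

# Example backbone stage channel counts (e.g. ResNet50-like); these are
# assumptions for illustration, not a specific model's values.
stage_channels = (256, 512, 1024, 2048)
out_channels = 256  # common width all lateral convs project to

# One 1x1 conv per backbone stage: maps arbitrary channels -> out_channels,
# so the bottom-up "backbone" can output any channel sizes.
laterals = nn.ModuleList(
    nn.Conv2d(c, out_channels, kernel_size=1) for c in stage_channels
)

# Dummy feature maps with decreasing spatial size, as a backbone would produce.
feats = [
    torch.randn(1, c, 64 // (2 ** i), 64 // (2 ** i))
    for i, c in enumerate(stage_channels)
]

# After the lateral convs, every level has the same channel count and can be
# summed with the upsampled top-down pathway (the "adding" the quote mentions).
outs = [lat(f) for lat, f in zip(laterals, feats)]
print([o.shape[1] for o in outs])  # → [256, 256, 256, 256]
```

A plain UNet-style skip, by contrast, concatenates the encoder activation directly, so the decoder's layer shapes are tied to the encoder's channel counts.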
