Upsampling EfficientNet (in Encoder-decoder ⌛ arch)

I’m trying to re-implement SANET, an encoder-decoder style network for arbitrary image stylization.

The problem is, these guys use 2 freakin’ VGG19s for inference. I’m trying to replace them with much more efficient nets (pun intended) so it can run on mobile devices. I’ve already replaced the encoder VGG with no problems, but what about the decoder?

Is there an equivalent, high-performance, upsampling CNN architecture that performs just as well as the inverted VGG19 decoder, but is a lot faster and smaller? (Sadly, U-Nets are out of the question for this particular project.)

I just need some pointers — related papers, or anything showing an inverted-EfficientNet-style decoder. :confused: The framework used in the paper doesn’t matter either.
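In case it helps frame the question: what I’m imagining is roughly an MBConv block run “in reverse”, i.e. upsample first, then an inverted residual (pointwise expand → depthwise conv → pointwise project), mirroring EfficientNet’s building block. This is just my own illustrative sketch in PyTorch, not anything from the SANET paper — the class name, expansion ratio, and channel counts are all made up:

```python
import torch
import torch.nn as nn


class UpsampleMBConv(nn.Module):
    """Hypothetical decoder block: nearest-neighbor upsample followed by an
    MBConv-style inverted residual (expand -> depthwise -> project)."""

    def __init__(self, in_ch, out_ch, expand=4):
        super().__init__()
        mid = in_ch * expand
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid, 1, bias=False),  # pointwise expansion
            nn.BatchNorm2d(mid),
            nn.SiLU(),
            # depthwise 3x3 (groups == channels)
            nn.Conv2d(mid, mid, 3, padding=1, groups=mid, bias=False),
            nn.BatchNorm2d(mid),
            nn.SiLU(),
            nn.Conv2d(mid, out_ch, 1, bias=False),  # pointwise projection
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        return self.block(self.up(x))


# e.g. a 512-channel relu4_1-sized feature map -> 256 channels at 2x resolution
x = torch.randn(1, 512, 32, 32)
y = UpsampleMBConv(512, 256)(x)
print(tuple(y.shape))  # (1, 256, 64, 64)
```

Basically: does something like this exist in the literature with properly tuned widths/depths, or would I have to design and ablate the decoder myself?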

Thank you so much. :heart: