Too much GPU memory usage for input/model size?

I’m using DistributedDataParallel with 8 P100 GPUs (16 GB each), all on a single node. I’m training a DeepLabV3Plus semantic segmentation model with the efficientnet-b0 encoder (the trained model is 54 MB). The input samples are 1024x1024x3 (25 MB uncompressed TIFF files), and the output is the same size as the input.

The batch size is 4 and num_workers=2. According to the memory snapshot tool, I’m using about 13.4 GB on each GPU. Naively I’d expect the weights (~54 MB) plus a batch of inputs and outputs (roughly 4 x 25 MB each) to add up to only a few hundred MB, so why is the memory usage so high? Is this expected, and can I approximate the GPU memory needed given the model/data/output sizes?
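
For reference, here is a stripped-down sketch of the per-rank setup. The loss, optimizer, and class count are placeholders rather than my exact code, and the DataLoader with num_workers=2 is replaced by random tensors to keep it short, but the model, input size, batch size, and memory-snapshot calls match what I described above:

```python
# Launched on a single node with: torchrun --nproc_per_node=8 train.py
import os
import torch
import torch.distributed as dist
import segmentation_models_pytorch as smp

dist.init_process_group("nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# DeepLabV3Plus with an efficientnet-b0 encoder (from segmentation_models_pytorch).
model = smp.DeepLabV3Plus(
    encoder_name="efficientnet-b0",
    encoder_weights="imagenet",
    in_channels=3,
    classes=3,  # output is the same shape as the input
).cuda()
model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])

# Placeholder loss/optimizer for the sketch.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.MSELoss()

# Record allocator history so a memory snapshot can be dumped afterwards.
torch.cuda.memory._record_memory_history()

# One training step with batch_size=4 of 1024x1024x3 samples
# (random tensors here instead of the real DataLoader with num_workers=2).
images = torch.randn(4, 3, 1024, 1024, device="cuda")
targets = torch.randn(4, 3, 1024, 1024, device="cuda")

optimizer.zero_grad()
loss = criterion(model(images), targets)
loss.backward()
optimizer.step()

# Dump the snapshot to inspect with the memory visualizer.
torch.cuda.memory._dump_snapshot(f"snapshot_rank{local_rank}.pickle")
```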