How is the gradient processed when using image Pyramid?

Hello, my question is, for CNN networks to accept multi-scale fusion technology, that is, the image pyramid, how is the gradient preserved?
Is it the gradient of the network that preserves one image for each scale?
Just like this, J=dt/dx1+dt/dx2+dt/dx3.
x1 x2 x3 represent different scales of images, t represent shared network