Confused about the affine transform in STN

In STN (Spatial Transformer Networks), we need 6 parameters to do an affine transform. Why is the output of the transformation the previous layer (l-1) of the network, while the input is the later layer (l)? I am confused about this. Thank you.
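
To make the question concrete, here is a minimal sketch of the mapping being asked about, under some assumptions: the 6 parameters are stored as a 2x3 matrix `theta`, coordinates are normalized to [-1, 1], and the sampler is nearest-neighbour rather than the bilinear sampler used in the paper. The function names `affine_grid` and `sample_nearest` are hypothetical and chosen only for illustration.

```python
def affine_grid(theta, height, width):
    """For every output (layer l) pixel, compute a coordinate in the input (layer l-1).

    theta is a 2x3 nested list holding the 6 affine parameters (assumed layout).
    Coordinates are normalized to the range [-1, 1].
    """
    grid = []
    for i in range(height):
        y_t = -1.0 + 2.0 * i / (height - 1) if height > 1 else 0.0
        row = []
        for j in range(width):
            x_t = -1.0 + 2.0 * j / (width - 1) if width > 1 else 0.0
            # The transform takes a target coordinate and returns a source coordinate.
            x_s = theta[0][0] * x_t + theta[0][1] * y_t + theta[0][2]
            y_s = theta[1][0] * x_t + theta[1][1] * y_t + theta[1][2]
            row.append((x_s, y_s))
        grid.append(row)
    return grid


def sample_nearest(image, grid):
    """Fill each output pixel by reading the input at its source coordinate."""
    h, w = len(image), len(image[0])
    out = []
    for row in grid:
        out_row = []
        for x_s, y_s in row:
            # Convert normalized coordinates back to pixel indices.
            j = round((x_s + 1.0) * (w - 1) / 2.0)
            i = round((y_s + 1.0) * (h - 1) / 2.0)
            inside = 0 <= i < h and 0 <= j < w
            # Source coordinates outside the input are filled with 0 (an assumption).
            out_row.append(image[i][j] if inside else 0)
        out.append(out_row)
    return out


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
flip_x = [[-1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

same = sample_nearest(image, affine_grid(identity, 3, 3))
flipped = sample_nearest(image, affine_grid(flip_x, 3, 3))
```

Because the loop runs over the output pixels, each one receives exactly one value. That appears to be the practical reason the transform is written from layer l coordinates to layer l-1 coordinates: mapping in the other direction could leave some output pixels unassigned.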