Predict on central slice, with adjacent slices provided as additional channels in input tensor?

I am looking to implement a model using a ‘pseudo-3D approach’, similar to this:

Specifically, I am working with medical data and want to add adjacent imaging slices (above and below the center slice) as contextual information in order to improve 2D segmentations on the center slice, i.e. a pseudo-3D approach (as the paper calls it). The paper in question adds these neighboring slices as additional channels in the input tensor. My question is: how do I organize this data in the input tensor so that the segmentation is only predicted for the center slice? This approach appears to be quite different from a normal multi-channel input image, where the channels may simply be different color channels.

My thought was that if my input is CxDxHxW, the first channel would be my center slice (the image of interest that I want a predicted segmentation for), and any additional channels would be the adjacent slices that provide contextual data. Does this sound correct?

That sounds correct. Reordering the channels should not make a difference, since each conv kernel uses all input channels anyway (the kernels do not slide along the channel dimension).
Just make sure to keep the same slice order for every sample, during both training and inference.
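
As a rough illustration of the point about channels, here is a minimal sketch, assuming PyTorch and made-up sizes (3 slices, 64x64 images, 8 output filters). It shows that each filter of an `nn.Conv2d` layer spans every input channel, and that permuting the slices gives the same output as long as the kernel's input-channel axis is permuted the same way. In practice the network learns whatever order it is trained on, which is why the order only has to stay consistent.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes: 3 slices (below, center, above) stacked as channels.
num_slices, height, width = 3, 64, 64
x = torch.randn(2, num_slices, height, width)  # N x C x H x W

conv = nn.Conv2d(in_channels=num_slices, out_channels=8, kernel_size=3, padding=1)

# Each filter covers all input channels: (out_channels, in_channels, kH, kW).
weight_shape = tuple(conv.weight.shape)
out = conv(x)

# Reorder the slices (e.g. center slice first) and permute the kernel's
# input-channel axis in the same way.
perm = torch.tensor([1, 0, 2])
conv_perm = nn.Conv2d(num_slices, 8, kernel_size=3, padding=1)
with torch.no_grad():
    conv_perm.weight.copy_(conv.weight[:, perm])
    conv_perm.bias.copy_(conv.bias)
out_perm = conv_perm(x[:, perm])

# The two outputs agree up to floating-point rounding.
same = torch.allclose(out, out_perm, atol=1e-5)
print(weight_shape, out.shape, same)
```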

The target would of course only be the segmentation map of the “center slice”.
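
To make the input/target pairing concrete, here is a small sketch, again assuming PyTorch. The helper name `make_sample`, the `context` parameter, the random data, and the single conv layer standing in for a real segmentation network are all hypothetical. Repeating the edge slice at the volume boundary is also just one possible assumption; skipping boundary slices or zero-padding would work too.

```python
import torch
import torch.nn as nn

def make_sample(volume, masks, idx, context=1):
    """Return (input, target) for slice `idx` of a (D, H, W) volume.

    The input stacks the center slice and its neighbors as channels;
    the target is the mask of the center slice only.
    """
    depth = volume.shape[0]
    # Assumption: clamp indices at the boundary, which repeats edge slices.
    indices = [min(max(idx + offset, 0), depth - 1)
               for offset in range(-context, context + 1)]
    x = volume[indices]  # (2 * context + 1, H, W), center slice in the middle
    y = masks[idx]       # (H, W), labels for the center slice only
    return x, y

# Made-up data: 10 slices of 32x32 with binary masks.
volume = torch.randn(10, 32, 32)
masks = torch.randint(0, 2, (10, 32, 32))

x, y = make_sample(volume, masks, idx=0)

# Stand-in for a real 2D segmentation network: 3 input channels, 2 classes.
model = nn.Conv2d(3, 2, kernel_size=3, padding=1)
logits = model(x.unsqueeze(0))                         # (1, 2, 32, 32)
loss = nn.CrossEntropyLoss()(logits, y.unsqueeze(0))   # target: (1, 32, 32)
```

Here the center slice sits in the middle channel rather than the first one; either placement works, provided every sample uses the same layout.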

Let me know if you have more questions.