Unble to understand some lines in dlwpt book

I have been following the “Deep Learning with Pytorch” book to learn pytorch, in Chapter 10 of part 2, there is dsets.py file and it contains the following code:

    def __init__(self, series_uid):
        mhd_path = glob.glob(

        ct_mhd = sitk.ReadImage(mhd_path)
        ct_a = np.array(sitk.GetArrayFromImage(ct_mhd), dtype=np.float32)

        ct_a.clip(-1000, 1000, ct_a)

        self.series_uid = series_uid
        self.hu_a = ct_a

        self.origin_xyz = XyzTuple(*ct_mhd.GetOrigin())
        self.vxSize_xyz = XyzTuple(*ct_mhd.GetSpacing())
        self.direction_a = np.array(ct_mhd.GetDirection()).reshape(3, 3)

I understood all but the last 3 lines where self.origin_xyz, self.vxSize_xyz and self.direction_a got assigned. XyzTuple is XyzTuple = collections.namedtuple('XyzTuple', ['x', 'y', 'z']) (in util.py)

I am having a difficulty to understand what those GetOrigin(), GetSpacing() and GetDirection() mean. Also what is the role of * in front of ct_mhd?

Hope one can explain or refer to some other resources.

Based on the code snippet it seems that CT images are loaded (hu_a should refer to Hounsfield units).
origin_xyz would refer to the origin of the Cartesian coordinate system, vxSize_xyz to the voxel size in each direction, and direction_a should be an axis vector.

The asterisk (*) unpacks the values as described e.g. here.

CC @tom, who is one of the authors of this book to correct me.

Yes, as @ptrblck said, these are the coordinate transformation, and the * is just Python to separate a tuple (returned by GetOrigin/GetSpacing) into separate arguments.

The easiest way to look at the transformation is irc2xyz (i.e. voxel indices to physical coordinates in the patient):
The key here is

coords_xyz = (direction_a @ (cri_a * vxSize_a)) + origin_a

And so starting out with the pixel indices cri_a:

  • The triple vx_Size_a is the size fo a single voxel, i.e. two voxels are exactly that many mm apart.
    So cri_a * vxSize_a has a coordinates of the right size.
  • The matrix direction_a is an affine transformation, typically a orthogonal one, it typically be the identity or a mirroring of the coordingates (-1 on the diagonal) or a rotation (e.g., but not necessarily by 90° to exchange some axes). After this, you have a coordinate system that the directions of figure 10.6.
  • The origin moves where 0 is. I’m never 100% sure where it should be, but the software of the ct scanner gives it to us and we need it to match against the physical coordinates in the pixels.
    Then you have patient coordinates. These are used e.g. for the annotations.

The other direction is the algebraic inverse of this.
This can be more than just a bit messy but so not all patients lie perfectly axis-parallel in the thing and so you have all sorts of weird effects. It is a wonder that we even get a common file format for all these machines across the world.
(And I’ve personally sat next to Luca when we converted some even more arcane format in some other project, so the LUNA dataset is actually quite moderate.)

Best regards


P.S.: As an anecdote: Actually you can get by with only two very simple matrices for the direction in the LUNA dataset and we had the transformations hardcoded at first, but then we thought it might be better to code this up in general to let you use it with other data if you want (also more maths less magic :wink: ).

1 Like

@tom Hey guys,

Thank you very much for this wonderful book!

Could you please explain these lines in util.py:

def irc2xyz(coord_irc, origin_xyz, vxSize_xyz, direction_a):
    cri_a = np.array(coord_irc)[::-1]

cri_a should have dimensions c and i swapped compared to coord_irc, right? But does [::-1] really swap dimensions? Or is it just reversing data along the first dimension?

Hi, thank you!

The thing is that this is for a single tuple of irc coordinates cast to a 3-element array and then flipped to cri.
In general, flipping would be the transformation between irc and cri, but if you had a batch, it would be more likely in the last dimension. (This is - in an abstract way - similar to BGR->RGB, but ignore this if it adds more confusion than it helps.)

If I had a tensor that was in IRC “data layout” and wanted to convert it to CRI data layout (similar to channel-first CHW vs. channel-last HWC layout for images), then I would need to use .permute(2, 1, 0), but this is not the case here because we are talking about indices into tensors.

Best regards


Hi Thomas,

Got it, this function is designed to work with a single point. So the first line just turns tuple of numbers (I,R,C) into 3-element array [C, R, I].

Thank you very much for quick response and detailed explanation!

1 Like