DICOM files in PyTorch?

I ran the PyTorch transfer learning example on DICOM files, but it threw this error:
“RuntimeError: Found 0 files in subfolders of: …/data2/train
Supported extensions are: .jpg,.jpeg,.png,.ppm,.bmp,.pgm,.tif,.tiff,.webp”

Is there any way to keep using the DICOM files without converting them to PNG?

You can create a custom dataset class that opens your files with whichever library you need. PIL is the default loader, but it is not mandatory. The drawback is that you would lose the built-in transformations that expect PIL images.

How would I go about writing a custom dataset class? Do you know of any tutorials that explain how?

You need to subclass the Dataset class as follows:

from torch.utils import data

class custom_dataset(data.Dataset):
    def __init__(self, list_of_paths, load):
        """
        Custom arguments, up to you. Here: a list of file paths and
        a callable that loads a single file.
        """
        self.data = list_of_paths
        self.load = load

    def __getitem__(self, index):
        """
        Args:
            index (int): Index
        Returns:
            whatever you need
        """
        # Open the file and do any preprocessing here.
        # With DataLoader(num_workers > 0) this method runs in worker processes.
        return self.load(self.data[index])

    def __len__(self):
        # Return the number of samples in your dataset.
        return len(self.data)

You need to define a class like the one above, with __init__, __getitem__ and __len__.
__len__ should return the number of samples in your dataset (images, in your case).
In __init__ you define a list of paths, files, or whatever setup you need.

In __getitem__ you load the files listed in __init__ and do all the work there, that is, any preprocessing that has to run on the fly, because this is the function that gets run in the worker processes.
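
To make the outline above concrete for DICOM, here is a minimal sketch. It assumes pydicom is installed and that each file holds a single 2D image. The names DicomDataset, read_dicom, reader and transform are made up for illustration and are not part of PyTorch or pydicom. Passing a transform that works on tensors is one way to get on-the-fly preprocessing back.

```python
import numpy as np
import torch
from torch.utils import data


def read_dicom(path):
    """Read one DICOM file and return its pixel data as a NumPy array."""
    import pydicom  # imported lazily, so other readers can be swapped in
    return pydicom.dcmread(str(path)).pixel_array


class DicomDataset(data.Dataset):
    """Hypothetical map-style dataset over parallel lists of paths and labels."""

    def __init__(self, paths, labels, reader=read_dicom, transform=None):
        if len(paths) != len(labels):
            raise ValueError("paths and labels must have the same length")
        self.paths = list(paths)
        self.labels = list(labels)
        self.reader = reader        # callable: path -> array-like
        self.transform = transform  # optional callable: tensor -> tensor

    def __getitem__(self, index):
        # Copy into float32 so the tensor owns writable memory
        pixels = np.array(self.reader(self.paths[index]), dtype=np.float32)
        image = torch.from_numpy(pixels)
        if image.ndim == 2:
            image = image.unsqueeze(0)  # add a channel dimension: (1, H, W)
        if self.transform is not None:
            image = self.transform(image)
        return image, self.labels[index]

    def __len__(self):
        return len(self.paths)
```

The reader is a parameter so the file format can change without touching the class. Pretrained models from the transfer learning example expect 3-channel input, so a transform that repeats the single channel may be needed; that part is left out here.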

Thank you. Can I still use “torch.utils.data.DataLoader” for the multiprocessing?

Yes you can use that…
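
As a rough illustration of the DataLoader side, the sketch below wraps a custom dataset in a DataLoader. ToyDataset and make_loader are hypothetical stand-ins, not anything from the original posts; ToyDataset just returns fixed-size tensors so the example runs without any image files.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyDataset(Dataset):
    """Stand-in for a DICOM dataset: yields (1, 8, 8) tensors and 0/1 labels."""

    def __init__(self, n):
        self.n = n

    def __getitem__(self, index):
        return torch.full((1, 8, 8), float(index)), index % 2

    def __len__(self):
        return self.n


def make_loader(dataset, batch_size=4, num_workers=0):
    # num_workers=0 loads in the main process; a value > 0 makes the
    # DataLoader call dataset.__getitem__ inside that many worker processes.
    return DataLoader(dataset, batch_size=batch_size,
                      shuffle=False, num_workers=num_workers)


loader = make_loader(ToyDataset(10), batch_size=4)
for images, labels in loader:
    # images is a stacked batch of shape (batch, 1, 8, 8)
    print(images.shape, labels)
```

To load in parallel, pass num_workers=2 (or more). On platforms that start workers by spawning a new interpreter, the loop usually has to sit under an `if __name__ == "__main__":` guard.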

# Convert DICOM to JPG/PNG via OpenCV
from pathlib import Path

import cv2
import dask
import pydicom

# Assumed setup (not part of the original post): adjust both paths to your data
outdir = Path("converted")
all_files = sorted(Path("dicom").glob("*.dcm"))

def convert_images(filename, img_type='jpg'):
    """Reads a dcm file and saves it as png/jpg

    Args:
        filename: pathlib.Path to the dcm file
        img_type: format of the processed file (jpg or png)

    """
    # extract the name of the file
    name = filename.name

    # read the dcm file (dcmread replaces the deprecated read_file)
    ds = pydicom.dcmread(str(filename))
    img = ds.pixel_array

    # save the image as jpg/png
    ext = '.jpg' if img_type == "jpg" else '.png'
    cv2.imwrite(str(outdir / name.replace('.dcm', ext)), img)

# Using dask to run the conversions in parallel
outdir.mkdir(parents=True, exist_ok=True)
all_images = [dask.delayed(convert_images)(f) for f in all_files]
dask.compute(*all_images)
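
One caveat with the conversion above: DICOM pixel data is often stored with more than 8 bits per sample, while JPEG holds 8-bit values. A possible workaround is to rescale the array before writing it. The helper below, to_uint8, is a hypothetical name, and plain min-max scaling is only one simple choice; it ignores DICOM attributes such as the window center and width.

```python
import numpy as np


def to_uint8(img):
    """Linearly rescale an array to the 0-255 range and cast it to uint8."""
    img = np.asarray(img, dtype=np.float64)
    lo, hi = img.min(), img.max()
    if hi == lo:
        # Constant image: avoid dividing by zero, return all zeros
        return np.zeros(img.shape, dtype=np.uint8)
    scaled = (img - lo) / (hi - lo) * 255.0
    return np.round(scaled).astype(np.uint8)
```

It could be applied as `cv2.imwrite(path, to_uint8(ds.pixel_array))`. Keep in mind that this throws away intensity resolution, which is one reason to read the DICOM files directly in a custom dataset instead.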