I ran my program using the PyTorch transfer learning example with DICOM files, but this error was thrown:
“RuntimeError: Found 0 files in subfolders of: …/data2/train
Supported extensions are: .jpg,.jpeg,.png,.ppm,.bmp,.pgm,.tif,.tiff,.webp”
Is there any way to keep using DICOM files without converting them to PNG?
You can create a custom dataset class that opens your files with whichever library you need. PIL is used by default, but it is not mandatory. The drawback is that you would lose the transformations that expect PIL images.
How would I go about writing a custom dataset class? Do you know of any tutorials that explain how to?
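You need to subclass the Dataset class as follows:
from torch.utils import data

class MyDataset(data.Dataset):  # class name is a placeholder
    def __init__(self, list_of_paths):
        # Custom arguments, up to you
        self.data = list_of_paths

    def __getitem__(self, index):
        # index (int): Index
        # Returns: whatever you need
        # Code the workload here: open the file and do any kind of preprocessing.
        # With a DataLoader, this function runs inside the worker processes.
        raise NotImplementedError

    def __len__(self):
        # Return the number of samples in your dataset.
        return len(self.data)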
You need to define a class like the previous one, with __init__, __getitem__ and __len__.
__len__ should return the number of samples in your dataset, images in your case.
In __init__ you define a list of paths, files, or whatever setup you need.
In __getitem__ you load the files previously listed and put all the workload there, that is, any preprocessing you have to run on the fly, since this function is executed by the multiprocessing workers.
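For DICOM files specifically, the three methods could look roughly like the sketch below. It is only an illustration under some assumptions: pydicom is installed, each file holds a single grayscale frame, and labels are supplied as a parallel list. `DicomDataset` and `load_dicom` are hypothetical names, and `transform` is any optional callable applied to the tensor.

```python
import numpy as np
import torch
from torch.utils.data import Dataset


def load_dicom(path):
    """Read one DICOM file and return its pixels as a float32 array."""
    import pydicom  # lazy import: only this loader depends on pydicom

    ds = pydicom.dcmread(str(path))
    return ds.pixel_array.astype(np.float32)


class DicomDataset(Dataset):
    """Map-style dataset over parallel lists of paths and labels."""

    def __init__(self, paths, labels, loader=load_dicom, transform=None):
        if len(paths) != len(labels):
            raise ValueError("paths and labels must have the same length")
        self.paths = list(paths)
        self.labels = list(labels)
        self.loader = loader
        self.transform = transform

    def __len__(self):
        # Number of samples in the dataset.
        return len(self.paths)

    def __getitem__(self, index):
        # All per-sample work happens here: open the file, then preprocess.
        pixels = self.loader(self.paths[index])
        # Add a channel dimension: (H, W) -> (1, H, W).
        image = torch.from_numpy(pixels).unsqueeze(0)
        if self.transform is not None:
            image = self.transform(image)
        return image, self.labels[index]
```

Passing the loader as an argument keeps the file format out of the class, so the same dataset can be exercised with a stub loader when no DICOM files are at hand.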
Thank you. Can I still use “torch.utils.data.DataLoader” for the multiprocessing?
Yes you can use that…
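As a rough illustration, any object with `__getitem__` and `__len__` can be handed to DataLoader, and `num_workers` controls how many worker processes call `__getitem__`. The `RandomImages` dataset and the `make_loader` helper below are hypothetical stand-ins so the sketch runs without any files.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class RandomImages(Dataset):
    """Stand-in dataset that returns fake 1x8x8 images."""

    def __init__(self, n_samples):
        self.n_samples = n_samples

    def __len__(self):
        return self.n_samples

    def __getitem__(self, index):
        # A real dataset would open and preprocess a file here.
        image = torch.full((1, 8, 8), float(index))
        label = index % 2
        return image, label


def make_loader(dataset, batch_size=4, num_workers=0):
    # num_workers > 0 runs __getitem__ in that many worker processes;
    # num_workers = 0 loads everything in the main process.
    return DataLoader(dataset, batch_size=batch_size, shuffle=False,
                      num_workers=num_workers)
```

Calling `make_loader(dataset, num_workers=2)` would spread the loading over two worker processes. On platforms that start workers by spawning (Windows, macOS), the loop over the loader should sit under an `if __name__ == "__main__":` guard.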
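# Convert DICOM to JPG/PNG via OpenCV
import cv2
import dask as dd  # assumption: `dd` is the dask package itself
import pydicom

# Assumed to be defined elsewhere: `outdir` (output directory as a string,
# ending with a path separator) and `all_files` (list of pathlib.Path objects).

def convert_images(filename, img_type='jpg'):
    """Reads a dcm file and saves it as png/jpg

    filename: path to the dcm file
    img_type: format of the processed file (jpg or png)
    """
    # extract the name of the file
    name = filename.parts[-1]

    # read the dcm file
    ds = pydicom.dcmread(str(filename))
    img = ds.pixel_array

    # save the image as jpg/png
    if img_type == 'jpg':
        cv2.imwrite(outdir + name.replace('.dcm', '.jpg'), img)
    else:
        cv2.imwrite(outdir + name.replace('.dcm', '.png'), img)

# Using dask
all_images = [dd.delayed(convert_images)(f) for f in all_files]
dd.compute(*all_images)  # delayed tasks only run once computed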
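One thing to watch when converting: DICOM pixel data is often stored with more than 8 bits per pixel, while JPEG output from cv2.imwrite expects 8-bit data. A minimal sketch of rescaling before saving follows, assuming a plain min-max stretch is acceptable for your images (it ignores any DICOM windowing or rescale metadata). `to_uint8` is a hypothetical helper name.

```python
import numpy as np


def to_uint8(pixels):
    """Linearly rescale an array to the 0-255 range and cast to uint8."""
    pixels = np.asarray(pixels, dtype=np.float32)
    lo, hi = pixels.min(), pixels.max()
    if hi == lo:
        # Constant image: avoid dividing by zero.
        return np.zeros(pixels.shape, dtype=np.uint8)
    scaled = (pixels - lo) / (hi - lo) * 255.0
    return np.round(scaled).astype(np.uint8)
```

The result of `to_uint8(ds.pixel_array)` could then be passed to cv2.imwrite in place of the raw array.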