About large data size, 3D data and patches

Hello All,
I am working on 3D data of 114 images, each of dimensions [180x256x256]. Since such a large image cannot be fed directly to the network, I am using overlapping patches of size [64x64x64]. There are around 22,000 patches in total for the 114 images, which cannot be loaded into the DataLoader as CUDA memory runs out.
Is there a way to load one image at a time, run the patches from that image, and then move on to the next image?

N.B. for each image there is a target 3D mask with 9 different labels.

Often you would not load the data directly onto the GPU in Dataset.__getitem__ or in any DataLoader method, but would instead push it to the device in the training loop.
Assuming your host system has enough memory to load (a batch of) images, you could perform the patch creation in the Dataset.
Depending on your use case, you could e.g. specify the batch size to be N and slice each image into M patches. The returned batch would then contain N*M of the smaller windows.
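
A minimal sketch of the device handling with dummy data (not your actual model or shapes, just to show where the .to(device) calls would go):

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = nn.Conv3d(1, 9, kernel_size=3, padding=1).to(device)

# dummy data just to illustrate the loop; the Dataset itself stays on the CPU
dataset = TensorDataset(torch.randn(8, 1, 16, 64, 64),
                        torch.randint(0, 9, (8, 16, 64, 64)))
loader = DataLoader(dataset, batch_size=2)

for data, target in loader:
    # the batch is moved to the GPU here, inside the training loop,
    # not inside Dataset.__getitem__
    data, target = data.to(device), target.to(device)
    output = model(data)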


Hey @ptrblck,
I wrote the Dataset as below:

class SegSet(data.Dataset):
    def __init__(self, subdict, num_labels):  
        """
        :param subdict: a dictionary of subject and label paths
        :param num_labels: number of segmentation labels (9)
        """
        self.subdict = subdict
        self.img_subs = subdict['img_subs']
        self.img_files = subdict['img_files']
        if checkKey(subdict, 'seg_subs'): # This function checks if segmentation label available or not
            self.seg_subs = subdict['seg_subs']
            self.seg_files = subdict['seg_files']
        else:
            self.seg_subs = None
            self.seg_files = None
        self.num_labels = num_labels

    def __len__(self):
        return len(self.img_subs)

    def __getitem__(self, index):
        num_labels = self.num_labels
        sub_name = self.img_subs[index]
        img_file = self.img_files[index]
        img_3d = nib.load(img_file)
        img = img_3d.get_data()
        img = (img - img.min())/(img.max()-img.min())
        img = img*255.0
        seg_file = self.seg_files[index]
        seg_3d = nib.load(seg_file)
        seg = seg_3d.get_data()
        imgp, segp = generate_patch_32_3(img, seg)
        for i in range(1,num_labels):
            for j in range(len(imgp)):
                seg_one = segp == labels[i]  # labels = list of label values, e.g. [0, 1, 2, 10, 56, ...]
                segp[j, i, :, :, :] = seg_one[0:segp.shape[0], 0:segp.shape[1], 0:segp.shape[2]]
                segp[j, 0, :, :, :] = segp[j, 0, :, :, :] - segp[j, i, :, :, :]
                # print("Here")
        imgp = imgp.astype('float32')
        segp = segp.astype('float32')
        return imgp, segp, sub_name

The generate_patch_32_3 function simply generates 3D patches from the 180x256x256 image and segmentation image, producing arrays of shape 192x16x64x64 and 192x8x32x32 respectively.
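
(For reference, a rough illustration of how patches can be cut out of a volume with unfold; generate_patch_32_3 itself uses different, overlapping patch sizes, so the shapes and strides below are just an example:)

import torch

img = torch.randn(180, 256, 256)
# unfold each spatial dim into windows of size 64 with step 64 (no overlap)
patches = img.unfold(0, 64, 64).unfold(1, 64, 64).unfold(2, 64, 64)  # [2, 4, 4, 64, 64, 64]
patches = patches.contiguous().view(-1, 64, 64, 64)                  # [32, 64, 64, 64]
print(patches.shape)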

The problem is that, after getting the paired source and target from SegSet, I don't understand how to load them into the `DataLoader` and also keep loading the next image for `SegSet`.

    train_set = SegSet(train_dict, num_labels=9)
    print(len(train_set))
    x, y, z = next(iter(train_set))
    print(x.shape, '\n', y.shape, '\n', z)

The output is:
2
(192, 1, 16, 64, 64)
(192, 9, 8, 32, 32)
001_MR2std

So the output of the Dataset class is multiple (192) sources and targets. When I pass it to the DataLoader:

train_loader = data.DataLoader(train_set, batch_size=16, shuffle=False, num_workers=1)
print(len(train_loader))
x, y, z = next(iter(train_loader))
print(x.shape, '\n', y.shape, '\n', z)

The output is:

1
torch.Size([2, 192, 1, 16, 64, 64])
torch.Size([2, 192, 9, 8, 32, 32])
('001_MR2std', '002_MR2std')

It doesn’t change even if I change the batch_size to 8 or 80.

So if I feed x and y to the network, is it taking the whole 192 patches at once? What is the significance of batches here?

I’m not sure what exactly is causing the issue and cannot easily debug it as I don’t have the files.
However, here is a small example of what I had in mind:

import torch
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self):
        self.data = torch.randn(10, 3, 24, 24)
    
    def __getitem__(self, index):
        # get current sample
        x = self.data[index]
        # create patches
        x = x.unfold(1, 12, 12).unfold(2, 12, 12)
        #flatten patches
        x = x.contiguous().view(x.size(0), -1, x.size(3), x.size(4))
        x = x.permute(1, 0, 2, 3)
        return x
        
    def __len__(self):
        return len(self.data)

dataset = MyDataset()
loader = DataLoader(dataset, batch_size=2)
x = next(iter(loader))
print(x.shape)
> torch.Size([2, 4, 3, 12, 12])

Inside the Dataset.__getitem__ you are creating 4 patches of shape 12x12 and returning these 4 patches in the shape [patches=4, channels=3, height=12, width=12].
The DataLoader uses the batch_size to specify how often __getitem__ is called and stacks the returned tensors in a newly created dim0.
For a batch size of 2 the returned batch will thus have the shape [batch_size=2, patches=4, 3, 12, 12], and you could flatten the patches into the batch dimension in order to feed it into the model.
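
A minimal sketch of that flattening, continuing the MyDataset example above:

dataset = MyDataset()
loader = DataLoader(dataset, batch_size=2)

for x in loader:                   # x: [batch_size=2, patches=4, 3, 12, 12]
    x = x.view(-1, *x.shape[2:])   # flatten patches into the batch dim -> [8, 3, 12, 12]
    # output = model(x)            # model is whatever 2D network you are training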


Hey @ptrblck,
I think I have understood the issue. Since I fed in patches from only 2 images, the batch dimension never goes beyond 2. Sorry for the confusion.

Now the questions remain.
I am giving the Dataset one image path and one segmentation path per index, which loads one image [180,256,256] and a segmentation file of the same size from the directory. As you suggested, I am generating the patches inside __getitem__.
After the patch operation:
img: [192, 16, 64, 64]
seg: [192, 8, 32, 32]
Finally, I create segmentation channels for the 9 labels:
img: [192, 1, 16, 64, 64]
seg: [192, 9, 8, 32, 32]

This way I can load the data directly from the directories and don't need to save anything.

Is there a way to iterate over the 192 patches as
src: [batch_size, 1, 16, 64, 64]
trgt: [batch_size, 9, 8, 32, 32]
and feed them to the DataLoader, while also iterating over the 3D images in the Dataset?

Right now, to do the same, I had to save all the patches to an HDF5 file (around 8 GB) and index them one by one.

    import h5py
    import numpy as np

    g = h5py.File(hpath[0], 'r')
    print(g.keys())
    img = np.array(g['MR'])    # (114, 192, 16, 64, 64)
    seg = np.array(g['Mask'])   # (114, 192, 8, 32, 32)


    train_img = img.reshape(-1, 16, 64, 64)
    train_seg = seg.reshape(-1, 8, 32, 32)

    trainSet = SimpleDataset(train_img, train_seg)
    x, y = next(iter(trainSet))
    print(x.shape, '\n', y.shape)
    # (1, 1, 16, 64, 64)
    # (1, 9, 8, 32, 32)

Is there a more efficient way to do that?
I hope I'm making sense and conveying it clearly :stuck_out_tongue:

I’m not sure I understand the issue correctly.
The Dataset approach would create the patches and the DataLoader would then return tensors with batch_size*num_patches in dim0, which you could feed directly into the model.
Why would you have to store the patches again?

So if I generate patches inside __getitem__, the Dataset returns a [192, 1, 64, 64, 64] img. If I pass it through the DataLoader with batch_size 2, the source finally is [2, 192, 1, 1, 64, 64, 64].
Even if I reshape it to [384, 1, 64, 64, 64]

How would I feed it to the model?

[N.B. the input to the Dataset is actually the path to each 3D image.]

You would feed it in the “flattened” shape to the model, i.e. [batch_size*num_patches=384, channels=1, ...].
This would increase the “new batch size” by a factor of num_patches, but would still work.
Let me know if I'm still misunderstanding the issue or what kind of errors you are getting.
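
In code, the flattening would look roughly like this (a sketch, assuming train_loader yields the img/seg/name tuple from your SegSet):

for img, seg, name in train_loader:
    # flatten the batch and patch dimensions into dim0 -> [384, 1, 64, 64, 64]
    img = img.view(-1, 1, 64, 64, 64)
    # output = model(img.to(device))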

In that case, GPU runs out of memory:

>> RuntimeError: CUDA out of memory. Tried to allocate 3.00 GiB (GPU 0; 10.89 GiB total capacity; 4.70 GiB already allocated; 2.16 GiB free; 4.70 GiB reserved in total by PyTorch)

In that case you could lower the batch size to 1 and check if it's still running out of memory.
If that's the case, keep the batch size at 1, split the batch in the DataLoader loop in dim0, and loop over the smaller input tensors, as sketched below.
Alternatively, you could also try to create the patches of each image in several steps inside __getitem__, but I think this would be much more complicated, as you might need a custom sampler for it etc.
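
A rough sketch of the splitting (assuming your SegSet train_set, model, and device from before; the chunk_size of 4 is just an arbitrary value you would tune to your GPU memory):

train_loader = DataLoader(train_set, batch_size=1, shuffle=True)
chunk_size = 4

for img, seg, name in train_loader:             # img: [1, 192, 1, 16, 64, 64]
    img, seg = img.squeeze(0), seg.squeeze(0)   # drop the batch dim -> [192, 1, 16, 64, 64]
    for img_chunk, seg_chunk in zip(img.split(chunk_size), seg.split(chunk_size)):
        img_chunk = img_chunk.to(device)        # [chunk_size, 1, 16, 64, 64]
        seg_chunk = seg_chunk.to(device)        # [chunk_size, 9, 8, 32, 32]
        # output = model(img_chunk)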


Hey @ptrblck
I think the looping works:

    train_loader = DataLoader(train_set, batch_size=1, shuffle=True, num_workers=8)
    print(len(train_loader))
    # x, y, z = next(iter(train_loader))
    # print(x.shape, '\n', y.shape, '\n', z) #torch.Size([1, 192, 1, 16, 64, 64])
                                           #torch.Size([1, 192, 9, 8, 32, 32])
    model = UNet3D(in_channel=1, n_classes=9).to(device)
    bSize = 4
    # the loop below would be within epochs
    for idx, sample in enumerate(train_loader):
        print('idx: {}'.format(idx), 'sub: {}'.format(sample[2]))
        for i in range(0, sample[0].shape[1], bSize):
            img = sample[0][0, i:i+bSize, 0, :, :, :] # creating batches
            img = img.unsqueeze(1)
            # print(img.shape)
            seg = sample[1][0, i:i+bSize, 0:9, :, :, :]
            # print(seg.shape)
            img, seg = img.to(device), seg.to(device)
            out = model(img)
            # print('Out: {}'.format(out.shape))

The same code iterates over the main subject images, creates the patches from each one inside the Dataset, takes samples from the DataLoader, and feeds chunks of each sample to the GPU/model one at a time.
Thanks.

Is there a better way to check that the correct data are sent to the model, apart from looking at the tensor shapes?

Just out of curiosity, is there PyTorch functionality for that? Some of the posts I have seen use words like Torchnet, ConcatDataset, ChunkSize and things like that.

You could create pre-defined tensors as the input to the Dataset and verify the values in the training loop.
E.g. create 10 samples, where each image contains a specific scalar value (you could use torch.arange, unsqueeze the needed dimensions, and expand the tensor). Before passing the chunks to the model, print the data and make sure the samples look correct and are not e.g. interleaved.
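
For example (a toy check, not tied to your actual data):

import torch
from torch.utils.data import DataLoader, TensorDataset

# each "image" is filled with its own sample index, so printing the values in the
# loop shows exactly which samples ended up in each batch/chunk
data = torch.arange(10).float().view(10, 1, 1, 1).expand(10, 3, 24, 24)
loader = DataLoader(TensorDataset(data), batch_size=2)

for (x,) in loader:
    print(x[:, 0, 0, 0])   # e.g. tensor([0., 1.]) for the first batch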

Torchnet was a higher-level API for Torch7 (LuaTorch), if I'm not mistaken. ConcatDataset is used to concatenate different datasets into a single one (which wouldn't help here), and I don't know what ChunkSize refers to.
That being said, I don't think this functionality is part of the PyTorch core modules, but it might be available in a higher-level API such as Ignite, Lightning, or Catalyst.