I am trying to set up a network that trains on fMRI task data (sequential volumes) alongside a T1 image (single volume). The fMRI data has 1200 volumes for a single subject. What is a smart and efficient way to set up the data loader to avoid memory issues? Help me brainstorm on this.
Thank you in advance,
It depends on your resources. The fastest way is usually NumPy arrays with memory mapping, but of course that is raw, uncompressed data, so you need the disk space to store it. If your data is compressed, you will pay the price of reading and decoding it on every access.
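As a toy sketch of the memory-map idea (the file name and shapes below are made up; a real fMRI run would typically be converted once from NIfTI to a raw `.npy` file, e.g. with nibabel):

```python
import numpy as np

# Create a small raw array on disk to stand in for the converted fMRI.
# A real run might be shaped (1200, 91, 109, 91): one volume per time point.
data = np.arange(5 * 2 * 2 * 2, dtype=np.float32).reshape(5, 2, 2, 2)
np.save("fmri_demo.npy", data)

# mmap_mode="r" maps the file without loading it into RAM.
fmri = np.load("fmri_demo.npy", mmap_mode="r")

# Indexing reads only the requested volume from disk.
volume = np.asarray(fmri[3])   # copy a single volume into memory
print(volume.shape)            # (2, 2, 2)
```

Only the indexed volume is ever materialized in RAM, so the full 4D series never has to fit in memory at once.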
If your data is too big to fit in memory, you can use a Dataset/DataLoader similar to what is used for ImageNet: every time a sample is needed, it is read from disk, preprocessed, then returned. The DataLoader's
num_workers argument allows multiple worker processes to do this loading/preprocessing on the side so it does not become a bottleneck.
In particular, you can implement your own torch.utils.data.Dataset, which requires implementing:
__getitem__, which retrieves one sample from your dataset, and
__len__, which returns the length of your dataset.
Do the loading and any required preprocessing in the
__getitem__ method and return one sample ready to be forwarded through your network.
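A minimal sketch of such a Dataset for this use case, combining it with the memory-map idea above. The file paths and the assumption that the data was pre-converted to `.npy` are mine; real fMRI/T1 files would usually be NIfTI loaded with nibabel instead:

```python
import numpy as np
import torch
from torch.utils.data import Dataset

class FMRIDataset(Dataset):
    """Pairs each fMRI volume (time point) with the subject's single T1 volume.

    Hypothetical sketch: assumes both files were converted to raw .npy.
    """

    def __init__(self, fmri_path, t1_path):
        # Memory-map the 4D fMRI so volumes are read lazily, one at a time.
        self.fmri = np.load(fmri_path, mmap_mode="r")
        # The single T1 volume is small enough to keep in RAM.
        self.t1 = torch.from_numpy(np.load(t1_path)).float()

    def __len__(self):
        # One sample per fMRI time point (e.g. 1200 for this subject).
        return self.fmri.shape[0]

    def __getitem__(self, idx):
        # Copy one volume out of the memmap and convert it to a tensor;
        # any preprocessing (normalization, masking, ...) would go here.
        vol = torch.from_numpy(np.asarray(self.fmri[idx])).float()
        return vol, self.t1
```

Each `__getitem__` call touches only one volume on disk, so memory use stays flat regardless of how many time points the run has.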
Then you can use the standard DataLoader with a few workers to make the dataset loading parallel! Note that the best number of workers varies across machines, so find it by experiment.