I have 3D MRI images and I want to classify them, but I don't have many of them, so I don't get good accuracy. I wanted to use pretrained models, but most pretrained models are 2D and I couldn't find any for 3D inputs. What do you recommend? Is there a way to convert my 3D inputs to 2D? I can iterate over the slice dimension, but how can I make a dataloader? What are your ideas for using 2D models when I have 3D inputs? Can you please explain in code?
Hmmm, I don’t know if 2D pretrained models are going to be useful for MRI images.
That being said, there are some 3D models pretrained on datasets such as Kinetics. Here are some alternatives:
- Hybrid networks: [1711.11248v3] A Closer Look at Spatiotemporal Convolutions for Action Recognition. You can apply some 3D convolutions at the beginning and then 2D convolutions. This way you reduce the number of parameters while still exploiting the 3D info. You can adapt the 3D implementation to be initialized with 2D weights and finetune from there.
- A pretrained 3D ResNet: GitHub - kenshohara/3D-ResNets-PyTorch: 3D ResNets for Action Recognition (CVPR 2018)
- Using 2D models by extracting features from each slice of the MRI. Just reshape your batch from B,T,H,W into BxT,H,W, extract the features (let’s say C feats) so you get BxT,C, reshape back to B,T,C, and aggregate the T feats into a general descriptor.
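The hybrid (2+1)D idea from the first link can be sketched in plain PyTorch: a full t x k x k 3D convolution is factorized into a 2D spatial conv (1 x k x k) followed by a 1D temporal conv (t x 1 x 1), and the spatial part is the one you could initialize from pretrained 2D weights. The module and channel sizes below are illustrative, not the paper’s exact architecture:

```python
import torch
import torch.nn as nn

class Conv2Plus1D(nn.Module):
    """Illustrative (2+1)D block: spatial conv over H,W then temporal conv over T."""
    def __init__(self, in_ch, out_ch, mid_ch, k=3):
        super().__init__()
        # spatial convolution: kernel (1, k, k) acts on H, W only
        self.spatial = nn.Conv3d(in_ch, mid_ch, kernel_size=(1, k, k),
                                 padding=(0, k // 2, k // 2))
        self.relu = nn.ReLU(inplace=True)
        # temporal convolution: kernel (k, 1, 1) acts on the slice/time dim only
        self.temporal = nn.Conv3d(mid_ch, out_ch, kernel_size=(k, 1, 1),
                                  padding=(k // 2, 0, 0))

    def forward(self, x):  # x: (B, C, T, H, W)
        return self.temporal(self.relu(self.spatial(x)))

# toy MRI batch: 2 volumes, 1 channel, 16 slices of 32x32
x = torch.randn(2, 1, 16, 32, 32)
y = Conv2Plus1D(1, 8, mid_ch=16)(x)
print(y.shape)  # torch.Size([2, 8, 16, 32, 32])
```

The padding keeps all of T, H, W unchanged, so you can stack several of these blocks before switching to pure 2D layers.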
Thanks for your quick reply.
Can you explain more about the third way? I couldn’t understand it completely.
PyTorch’s layers require one dimension to be “the batch dim”.
This means a 2D convolutional layer expects a tensor of size B,C,H,W.
However, you have an additional dimension (time), thus B,T,H,W. Ideally you would like to extract information from each t slice independently. What you can do is fold the time and batch dimensions into a single one, BxT,H,W, which acts as a fictional batch dim: B’,H,W. Then you extract the features, obtaining something like B’,C.
Since B’ = BxT, you can unfold the features to recover the temporal dimension.
Then you just need to add some layers that combine the temporal information to perform classification.
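Putting the fold/unfold steps above together, here is a minimal runnable sketch. The small Conv2d stack is just a stand-in for a pretrained 2D backbone (e.g. a torchvision ResNet with its final fc removed), and mean pooling over T is one simple way to combine the slice features; an LSTM or attention layer would also work:

```python
import torch
import torch.nn as nn

class SliceClassifier(nn.Module):
    """Fold (B,T,H,W) -> (B*T,1,H,W), extract 2D features per slice, unfold, aggregate."""
    def __init__(self, feat_dim=32, n_classes=2):
        super().__init__()
        # stand-in for a pretrained 2D backbone
        self.backbone = nn.Sequential(
            nn.Conv2d(1, feat_dim, 3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),             # -> (B*T, feat_dim, 1, 1)
        )
        self.head = nn.Linear(feat_dim, n_classes)

    def forward(self, x):                        # x: (B, T, H, W)
        B, T, H, W = x.shape
        x = x.reshape(B * T, 1, H, W)            # fold: fictional batch B' = B*T
        feats = self.backbone(x).flatten(1)      # (B*T, C)
        feats = feats.reshape(B, T, -1)          # unfold: recover the time dim
        pooled = feats.mean(dim=1)               # aggregate the T slice features
        return self.head(pooled)                 # (B, n_classes)

x = torch.randn(4, 16, 32, 32)                   # 4 volumes, 16 slices each
logits = SliceClassifier()(x)
print(logits.shape)  # torch.Size([4, 2])
```

As for the dataloader: a plain Dataset whose `__getitem__` returns each volume as a (T, H, W) tensor works with the default DataLoader collate, which stacks the volumes into exactly the (B, T, H, W) batch this model expects (assuming all volumes have the same number of slices).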