Best approach to classify a SEQUENCE of images: Linear layer or something else?

I am working with CT scans which have a sequence of 150 slices in each scan. I take one slice at a time and run it through a 2D DenseNet (trained from scratch on medical images), and extract last layer features.

So at the end of feature extraction phase I have (num_features X 150) array. What is the best way to put a binary classifier on this array? Is it just putting one / multiple layers of Linear classifiers, LSTM, or are there any interesting Arxiv papers that I should look at?

nn.Linear( num_feature x 150, 2 )

Thanks!

Just put a multi-layer Linear + ReLU neural network on top of these features.

You can read this paper by Karpathy:

http://www.cv-foundation.org/openaccess/content_cvpr_2014/html/Karpathy_Large-scale_Video_Classification_2014_CVPR_paper.html

The early-fusion / late-fusion etc. are relevant.

1 Like