Reading Long et al. (2015, FCN), Huang et al. (2018, DenseNet), Chen et al. (2017, atrous convolution), Badrinarayanan et al. (2017, SegNet), Chen et al. (2017, DeepLab), and Chen et al. (2016, CRFs), I got confused by the use of the term ‘dense’.
From what I understood, in Long et al., ‘dense’ features are the opposite of ‘sparse’ features, i.e. feature maps with few or no zeros rather than many. Convolving a sparse map with a kernel spreads each nonzero value over the kernel’s support, so the features are ‘densified’ after convolution.
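To make my reading concrete, here is a toy numpy sketch (my own illustration, not code from the paper): convolving a mostly-zero 1-D feature map with any kernel of full support spreads each spike over the kernel width, so the output has far fewer zeros.

```python
import numpy as np

# A "sparse" 1-D feature map: mostly zeros, two active positions.
sparse = np.zeros(16)
sparse[[3, 10]] = 1.0

kernel = np.ones(5)  # any kernel without zeros will spread the support

# After convolution, each spike covers 5 output positions:
dense = np.convolve(sparse, kernel, mode="same")

print(np.count_nonzero(sparse))  # 2
print(np.count_nonzero(dense))   # 10 -> the features got 'densified'
```

This is the zeros-vs-nonzeros sense of ‘dense’ as I understand it in the FCN paper.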
At the same time, Chen et al. (atrous convolution) say they ‘extract denser feature maps by removing the downsampling operations from the last few layers and upsampling the corresponding filter kernels, equivalent to inserting holes between filter weights’. I don’t quite understand this statement, as it seems to give ‘dense features’ a different meaning than the one above. Here it seems to relate to ‘feature resolution’, i.e., roughly speaking, how closely the spatial grid of a feature map matches that of the input image.
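Here is my own toy 1-D sketch of how I understand the quoted equivalence (an illustration, not the paper’s code): a kernel with holes (rate 2) applied to the full-resolution signal reproduces, among its outputs, exactly the values the undilated kernel would compute on the downsampled signal — but on a denser, full-resolution output grid.

```python
import numpy as np

def conv1d_valid(x, k):
    # plain 'valid' 1-D correlation
    n = len(x) - len(k) + 1
    return np.array([np.dot(x[i:i + len(k)], k) for i in range(n)])

def dilate(k, rate):
    # insert (rate - 1) zeros ("holes") between the kernel weights
    out = np.zeros((len(k) - 1) * rate + 1)
    out[::rate] = k
    return out

x = np.arange(16, dtype=float)
k = np.array([1.0, 2.0, 1.0])

# Option A: downsample by 2, then convolve -> a coarse output grid.
coarse = conv1d_valid(x[::2], k)

# Option B: atrous convolution (rate 2) on the full-resolution signal
# -> a denser output grid covering the same receptive field.
fine = conv1d_valid(x, dilate(k, 2))

print(len(coarse), len(fine))          # 6 12 -> twice as many outputs
print(np.allclose(fine[::2], coarse))  # True: every coarse value is reproduced
```

So ‘denser feature maps’ here would mean more output positions per input pixel (higher feature resolution), not fewer zeros.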
In Huang et al., Chen et al. (2016) and Chen et al. (DeepLab), ‘dense connections’ seems to simply mean ‘many connections’, e.g. through connections from each layer to the concatenated outputs of all previous layers (DenseNet), or through fully connected layers (CRFs, DeepLab).
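For the DenseNet sense, this is how I picture the connectivity — a toy sketch of my own where a zero array stands in for each conv layer, just to track the channel bookkeeping:

```python
import numpy as np

growth = 4                           # channels each layer adds (DenseNet "growth rate")
features = [np.ones((8, 8, 16))]     # initial 16-channel feature map, laid out (H, W, C)

for _ in range(3):                   # three layers inside one dense block
    # "dense connection": each layer sees the concatenation of ALL earlier maps
    x = np.concatenate(features, axis=-1)
    new = np.zeros((8, 8, growth))   # hypothetical stand-in for conv(x) -> growth channels
    features.append(new)

final = np.concatenate(features, axis=-1)
print(final.shape[-1])               # 16 + 3 * 4 = 28 channels
```

Every layer is wired to every earlier one, hence ‘densely connected’ — a statement about the connection graph, not about zeros or resolution.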
Finally, ‘dense prediction’ / ‘dense problem’ seems to mean either ‘predict something for every pixel’, e.g. semantic segmentation, or simply ‘predict many things’, e.g. object detection.
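For the per-pixel sense, here is a minimal sketch of what I mean (my own illustration; the 1×1 linear classifier is a hypothetical stand-in for a real segmentation head):

```python
import numpy as np

H, W, C, num_classes = 4, 4, 8, 3
rng = np.random.default_rng(0)
feats = rng.standard_normal((H, W, C))      # feature map at input resolution

# A 1x1 "classifier" maps the C features at EVERY pixel to class scores:
w = rng.standard_normal((C, num_classes))
scores = feats @ w                          # shape (H, W, num_classes)
pred = scores.argmax(-1)                    # one label per pixel

print(pred.shape)                           # (4, 4): a prediction at every pixel
```

The output grid has the same spatial size as the input, i.e. the prediction is ‘dense’ in space, unlike image-level classification which emits one label per image.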
Could someone clarify whether these are really different meanings of ‘dense’, or one underlying idea?