Dealing with variable-sized input signals for classification using FCN

Background:
I am working on classifying 1D signals using Fully Convolutional Networks (FCNs) acting directly on the raw signal data.

My dataset consists of signals of various lengths. To deal with this during training, I chose an appropriate signal length, cropped all signals longer than that, and zero-padded all signals shorter than that.
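For reference, the preprocessing is roughly this (a minimal NumPy sketch; the function name is mine):

```python
import numpy as np

def crop_or_pad(signal, target_len):
    """Crop a 1D signal to target_len, or zero-pad it at the end."""
    if len(signal) >= target_len:
        return signal[:target_len]
    return np.pad(signal, (0, target_len - len(signal)))  # constant zeros
```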

Problem:
After training models and inspecting the results, I noticed that flat zero-valued regions in signals tend to bias the classification towards a few specific classes (the cause might be partly the different length distributions of the classes). This is problematic when classifying shorter realizations of other classes, which need to be padded: even though the correct class activation is high over the original part of the signal, the flat region drastically decreases the activation for these classes.

In the end, since an FCN can handle inputs of variable size, I think it would be ideal to train it on data without the padding that causes these problems. But I don't see how to do that efficiently.

Question:
How can I deal with this problem efficiently?

Ideas that I haven’t tried yet:

  • Pad signals by duplicating (tiling) the original signal until the target length is reached, rather than zero-padding
  • Zero-pad all signals excessively, hoping to make the resulting models agnostic to flat regions
  • Train the model using batch_size=1
  • Train the model on batches of variable-sized inputs by accumulating gradients from each input separately (see the sketch after this list)
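
For the last idea, something like the following is what I have in mind (a PyTorch sketch; `model`, `criterion`, and the tensor shapes are placeholders, not my actual setup):

```python
import torch

def train_epoch(model, optimizer, criterion, signals, labels, accum_steps=8):
    # `signals` is a list of unpadded 1D float tensors of different lengths,
    # `labels` holds the class index for each signal.
    optimizer.zero_grad()
    for i, (x, y) in enumerate(zip(signals, labels)):
        # Forward one unpadded signal at a time (effective batch_size=1).
        logits = model(x.unsqueeze(0).unsqueeze(0))  # (1, 1, L) -> (1, n_classes)
        # Scale so the accumulated gradient equals the mean over the
        # virtual batch of `accum_steps` signals.
        loss = criterion(logits, y.view(1)) / accum_steps
        loss.backward()  # gradients accumulate in .grad
        if (i + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```

This trades throughput for correctness: each forward pass sees only real data, while the optimizer still steps on a batch-sized gradient.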

I understand that this is a huge open-ended question, so any ideas or shared experiences are very appreciated. Thank you.

Well, you can multiply intermediate layer outputs with a binary mask that zeroes out values computed from partially available inputs. You would start with this mask shaped like the input, with a single channel, and apply "min pooling" to it whenever the data shape is changed by a convolution or pooling operation. In practice it is easier to max-pool the inverted mask (1 = bad location) and invert it back before the multiplication.

Correction: when propagating the mask through convolutions and pooling, different mask-pooling modes should be used (i.e. a data max-pooling output is good if ANY of its inputs is good, while a convolution output needs ALL of its inputs to be good).
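
A rough sketch of both rules, assuming PyTorch and a mask of shape (N, 1, L) with 1 = valid sample, 0 = padded sample (names and shapes are illustrative):

```python
import torch
import torch.nn.functional as F

def mask_after_pool(mask, kernel_size, stride):
    # A data max-pooling output is usable if ANY contributing input is
    # usable, so the valid mask is max-pooled the same way as the data.
    return F.max_pool1d(mask, kernel_size, stride)

def mask_after_conv(mask, kernel_size, stride=1, padding=0):
    # A convolution output needs ALL of its inputs to be valid, i.e. a
    # "min pooling" of the mask. Equivalently: max-pool the inverted mask
    # (1 = bad location) and invert the result back. Border positions
    # created by conv zero-padding are conservatively treated as bad.
    bad = 1.0 - mask
    bad = F.max_pool1d(F.pad(bad, (padding, padding), value=1.0),
                       kernel_size, stride)
    return 1.0 - bad

# Usage inside a forward pass: after each layer, update the mask with the
# matching rule and zero out activations computed from padded input, e.g.
#   x = conv1(x)
#   mask = mask_after_conv(mask, conv1.kernel_size[0])
#   x = x * mask
```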
