I am working with sequences for which I don't have sufficient data. I aim to train a model to perform binary classification on 30 s-long sequences; however, I do have plenty of 10 s-long sequences.
As a result, I used scalograms to train a CNN, which performed quite well on the 10 s data.
Then I divided the 30 s data into 3×10 s chunks and extracted features using the trained CNN. Keeping the CNN parameters frozen, I trained the LSTM part of a CNN-LSTM architecture.
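To make the setup concrete, here is a minimal sketch of that pipeline, assuming a small toy CNN in place of the actual scalogram network: the pretrained CNN is frozen, each of the three 10 s chunks is embedded separately, and an LSTM aggregates the chunk embeddings into one binary logit.

```python
import torch
import torch.nn as nn

# Toy stand-in for the CNN trained on 10 s scalograms (architecture assumed).
class ChunkCNN(nn.Module):
    def __init__(self, feat_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, feat_dim),
        )

    def forward(self, x):
        return self.net(x)

class CNNLSTM(nn.Module):
    def __init__(self, cnn, feat_dim=64, hidden=32):
        super().__init__()
        self.cnn = cnn
        for p in self.cnn.parameters():   # freeze the pretrained CNN
            p.requires_grad = False
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)  # single binary logit

    def forward(self, x):                 # x: (batch, 3 chunks, 1, H, W)
        b, t = x.shape[:2]
        feats = self.cnn(x.flatten(0, 1)).view(b, t, -1)  # embed each chunk
        _, (h, _) = self.lstm(feats)      # aggregate the 3 chunk embeddings
        return self.head(h[-1])

model = CNNLSTM(ChunkCNN())
logits = model(torch.randn(4, 3, 1, 32, 32))  # 4 recordings, 3 chunks each
print(logits.shape)  # torch.Size([4, 1])
```

Only the LSTM and head receive gradients here, mirroring the frozen-CNN training described above.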
Does it make sense to expect this to perform well?
My intuition is that, since I am extracting features from the 30 s image using the CNN trained on the 10 s sequences, I will be able to train the LSTM and then use the CNN-LSTM model to classify the 30 s sequences.
I couldn't find any reference for this approach on 1D sequences (the idea was inspired by Human Action Recognition projects).
PS: the first time I trained it, it performed poorly. Then I retrained it, and although the training loss and accuracy were approximately constant, the validation/testing accuracy improved a lot (mean accuracy went from 15% to 83%) and the loss decreased slightly (0.69 to 0.51). This is for binary classification.
I am working with ECGs, and unfortunately I have very few well-labelled 30 s ECGs but plenty of well-labelled 10 s ECGs.
Does self-attention allow for variable-size inputs?
You could pad the 10 s clips with zeros at the front and back so that 2/3 of what goes into the model is zeros. Then randomly mask the front and back of the 30 s clips so they also go in 2/3 zeros, with the same input size as the padded 10 s clips. It would have the same effect as masking words for NLP models or blocking out parts of an image for image classification models.
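A minimal sketch of that padding/masking scheme, assuming raw 1D signals and a hypothetical sampling rate of 250 Hz: a 10 s clip is dropped at a random offset inside a zeroed 30 s buffer, and a 30 s clip keeps only a random 10 s window, so both arrive at the model with the same length and the same zero fraction.

```python
import numpy as np

rng = np.random.default_rng(0)
FS = 250                      # assumed sampling rate (Hz)
LEN_10, LEN_30 = 10 * FS, 30 * FS

def pad_10s(x, rng):
    """Place a 10 s clip at a random offset inside a zeroed 30 s buffer."""
    out = np.zeros(LEN_30, dtype=x.dtype)
    start = rng.integers(0, LEN_30 - LEN_10 + 1)
    out[start:start + LEN_10] = x
    return out

def mask_30s(x, rng):
    """Zero everything except a random 10 s window of a 30 s clip."""
    out = np.zeros_like(x)
    start = rng.integers(0, LEN_30 - LEN_10 + 1)
    out[start:start + LEN_10] = x[start:start + LEN_10]
    return out

short = rng.standard_normal(LEN_10)
long_ = rng.standard_normal(LEN_30)
a, b = pad_10s(short, rng), mask_30s(long_, rng)
print(a.shape == b.shape)  # True: both clips now share one input size
```

The random offset acts as augmentation, so the model cannot simply learn where the non-zero region sits.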
Self-attention doesn't change the input/output size. It just helps the model learn to selectively focus on important features and filter out the noise.
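You can verify the shape-preserving part directly with PyTorch's built-in module: self-attention returns a tensor with the same shape as its input (sizes here are arbitrary).

```python
import torch
import torch.nn as nn

# Self-attention preserves shape, so it slots into an existing pipeline
# without changing any downstream layer sizes.
attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
x = torch.randn(2, 30, 64)    # (batch, sequence length, feature dim)
out, weights = attn(x, x, x)  # self-attention: query = key = value = x
print(out.shape)              # torch.Size([2, 30, 64]) — same as x
```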
The size of the inputs should reflect what size they would normally be in real-world use.
Thank you, I will try it.
However, I don't know how well it will perform, because the disease I'm trying to detect might be detectable for only a really short period of time, so masking too much of the 30 s might not be ideal.
Hello, I have been working on some other tests, and now I will work on the self-attention. Since I have never worked with attention before: should I use a CBAM just before the last layer that performs the binary classification, given that I am working with a CNN?
I read that the multi-head attention module in PyTorch is for sequences (such as in NLP), and I assume the extracted features cannot be treated as sequences.
I am working with resnet18, so from what you suggest, I should use a ConvBlock just after the AdaptiveAvgPool2d(output_size=(1, 1)) and before the last fully connected layer, correct?
Also, do I need only the AttentionBlock class, or the other classes as well?
My apologies. That particular example is used at the junctures of the skip connection and the main path. Here is another attention module that takes just one input; it's called LinearAttention, on line 211:
I'm not clear on your question. I just mean that attention is best applied where there is a lot of data involved. It likely wouldn't do anything on your final binary classification output of size 1, so earlier in the forward pass would be more appropriate.
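To illustrate "earlier in the forward pass": here is a generic single-input spatial attention block (a hedged sketch, not the LinearAttention module mentioned above). It takes one feature map, computes a per-location gate, and re-weights the map without changing its shape, so it can sit between resnet18 stages where the spatial maps still carry plenty of information.

```python
import torch
import torch.nn as nn

class SimpleSpatialAttention(nn.Module):
    """Sketch of a single-input attention block: gates each spatial
    location of a CNN feature map; output shape equals input shape."""
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)  # per-pixel score

    def forward(self, x):                       # x: (B, C, H, W)
        gate = torch.sigmoid(self.score(x))     # (B, 1, H, W), in [0, 1]
        return x * gate                         # re-weighted feature map

block = SimpleSpatialAttention(64)
y = block(torch.randn(2, 64, 7, 7))
print(y.shape)  # torch.Size([2, 64, 7, 7]) — unchanged
```

Because the shape is unchanged, the block can be inserted after any intermediate stage; after the final AdaptiveAvgPool2d the map is 1×1, which is why attention there would have nothing left to select over.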