CRNN for sequence classification

Hello Forum

I'm currently building a CRNN (a CNN followed by an RNN) that needs to classify ship types according to their movement/behavior. I'm using AIS data, which is transformed into a [lat, lon, time] data sequence.

The idea is to use the CNN as a feature-extraction network and then use the RNN to classify from the extracted features.

Unfortunately, the network I have is not working. I trained it on 1000 cargo ship tracks, 1000 passenger ship tracks and 1000 fishing ship tracks. The result is an accuracy of 30%, which is essentially what the network would get by guessing the same class over and over.

My net is the following: first I have 3 convolutional layers, then 4 recurrent layers.

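import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

# hidden_size is assumed to be defined globally before this point

class Net(nn.Module):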
    def __init__(self):
        super(Net, self).__init__()
        ### RNN ###
        self.rnn1 = nn.GRU(input_size=32, #input is the output from CNN
                            hidden_size=hidden_size,
                            num_layers=1)
        
        self.rnn2 = nn.GRU(input_size=hidden_size,
                            hidden_size=hidden_size,
                            num_layers=1)
        
        self.rnn3 = nn.GRU(input_size=hidden_size,
                            hidden_size=hidden_size,
                            num_layers=1)
        
        self.rnn4 = nn.GRU(input_size=hidden_size,
                            hidden_size=hidden_size,
                            num_layers=1)
        
        self.activation = nn.ReLU()
        
        ### END ###
        self.dense1 = nn.Linear(hidden_size, 3)
        
        
        ### CNN ###
        self.conv1 = nn.Sequential(         
            nn.Conv1d(
                in_channels=3,              
                out_channels=8,             
                kernel_size=5,              
                stride=1,                   
                padding=2,                  
            ),                              
            nn.ReLU(),                      
            #nn.MaxPool1d(kernel_size=2),    # halve the sequence length
        )
        self.conv2 = nn.Sequential(         
            nn.Conv1d(
                in_channels=8,              
                out_channels=16,             
                kernel_size=5,              
                stride=1,                   
                padding=2,                  
            ),                              
            nn.ReLU(),                      
            #nn.MaxPool1d(kernel_size=2),    # halve the sequence length
        ) 
        self.conv3 = nn.Sequential(         
            nn.Conv1d(
                in_channels=16,              
                out_channels=32,             
                kernel_size=5,              
                stride=1,                   
                padding=2,                  
            ),                              
            nn.ReLU(),                      
            #nn.MaxPool1d(kernel_size=2),    # halve the sequence length
        )    
            
    def forward(self, x, hidden, batch_size):
        
        x = self.conv1(x.double()) #inputs (1,3,batch_size)
        x = self.conv2(x.double())
        x = self.conv3(x.double())
        
        #Reshape batch for RNN training:
        x = x.reshape(batch_size,1,32)
        
        
        x, hidden = self.rnn1(x, hidden) #inputs (seq_len,1,3)
        
        x = self.activation(x)
        x, hidden = self.rnn2(x, hidden)
        x = self.activation(x)
        x, hidden = self.rnn3(x, hidden)
        x = self.activation(x)
        x, hidden = self.rnn4(x, hidden)
        
        
        #x = x.select(0, maxlen-1).contiguous()
        
        x = x.view(-1, hidden_size)
        x = F.relu(self.dense1(x))
        
        
        return x, hidden #Returns prediction for all batch_size timestamps. i.e [batch_size, 3]

    def init_hidden(self):
        weight = next(self.parameters()).data
        return Variable(weight.new(1, 1, hidden_size).zero_())

My optimizer and criterion:

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001, weight_decay=0.5)

And my training phase:

def train():
    print("Training Initiated!")
    model.train()
    hidden = model.init_hidden() #Initiate hidden
    for step, data in enumerate(train_set_all):
        X = data[0] #Entire sequence
        y = data[1] #[1,0,0] or [0,1,0] or [0,0,1]
        y = y.long()
        #print(y.size())
        ### Split sequence into batches:
        batch_size = 50 # split sequence into mini-sequences of size 50
        max_batches = int(X.size(2)/batch_size)
        
        for nbatch in range(max_batches):
             
            model.zero_grad()
            output, hidden = model(X[:,:,nbatch*batch_size:batch_size+nbatch*batch_size], Variable(hidden.data), batch_size)
            
            
            
            loss = criterion(output, torch.max(y[:,nbatch*batch_size:batch_size+nbatch*batch_size,:].reshape(batch_size,3), 1)[1])
            loss.backward()
            optimizer.step()
        
        
        print(step)

My question is: does this make sense? I know this is quite a broad question. At the moment I use batches of 50 time samples, so that the convolutional part of the network has something to convolve over. Ideally I would just feed it a single timestamp at a time, but the result was the same (30% accuracy).

Am I missing something between the networks, i.e. between the CNN and the RNN? Right now I just reshape the data so it fits the RNN's input requirements. Do I need anything else?
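For reference, here is a minimal sketch of one common way to hand Conv1d output to a GRU. It is not a verified fix for the network above, and all sizes (`batch`, `seq_len`, `hidden_size`) are made-up values for illustration. The point it shows: `Conv1d` produces `(batch, channels, seq_len)`, while a GRU with the default `batch_first=False` expects `(seq_len, batch, features)`. `permute` swaps the axes, whereas `reshape` only reinterprets the memory order and so mixes channel and time values.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration.
batch, channels, seq_len, hidden_size = 4, 3, 50, 16

conv = nn.Sequential(
    nn.Conv1d(in_channels=channels, out_channels=32, kernel_size=5, padding=2),
    nn.ReLU(),
)
gru = nn.GRU(input_size=32, hidden_size=hidden_size, num_layers=1)
head = nn.Linear(hidden_size, 3)

x = torch.randn(batch, channels, seq_len)  # Conv1d expects (batch, channels, seq_len)
feats = conv(x)                            # (batch, 32, seq_len)

# Move the time axis to the front: (seq_len, batch, 32).
# Each rnn_in[t, b] is the 32-channel feature vector of track b at time t.
rnn_in = feats.permute(2, 0, 1)

out, h_n = gru(rnn_in)                     # out: (seq_len, batch, hidden_size)

# One prediction per track, taken from the last time step: (batch, 3).
logits = head(out[-1])
```

Taking only the last time step gives one score vector per track rather than one per timestamp, which may be closer to what a per-track label needs, though that is a design choice rather than a requirement.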

I can't seem to find any good tutorials on CRNNs, only a few examples of source code, and I find those hard to adapt to my case when I have no information other than the code.

My labels are simply [0, 1, 0], [1, 0, 0] or [0, 0, 1]. Is this correct? Should I use [1, 2, 3] or something like that? And does it make sense to use the criterion I use when my labels look like that? Is it possible to get a probability as output from the network, such that class 1 might be 20, class 2 might be 30 and class 3 might be 50, with all summing to 100? I think that would be ideal. :slight_smile:
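On the label and probability questions, a small sketch in plain Python may help. `nn.CrossEntropyLoss` takes raw scores plus integer class indices (0, 1, 2), and it applies log-softmax internally, so one-hot labels need converting to an index, as the `torch.max(...)[1]` call in the training loop already does. Probabilities that sum to 1 come from a softmax over the raw scores. The helper names and the example scores below are made up for illustration; in PyTorch, `torch.softmax(output, dim=1)` does the same thing on a batch.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)                           # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def one_hot_to_index(label):
    """Convert a one-hot label such as [0, 1, 0] to its class index (here 1)."""
    return label.index(max(label))

raw_scores = [0.2, 0.6, 1.1]                  # made-up network outputs for one track
probs = softmax(raw_scores)                   # three values in (0, 1) summing to 1
percent = [round(100 * p, 1) for p in probs]  # same values expressed as percentages

target = one_hot_to_index([0, 0, 1])          # integer class index for the loss
```

Because the loss already includes the softmax step, the softmax is only needed when reading off probabilities at prediction time, not before computing the loss.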

Any help on CRNNs is highly appreciated.

Try adding more CNN layers, since feature extraction is essential. Also try using dropout.

Thanks for your suggestion. Where should I place my nn.Dropout() layer? At the very end, after all the RNN layers, or between the CNN and the RNN?

After every CNN layer's activation or batch norm, place a dropout of 0.2. For the RNN, there is a dropout argument you can use.
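A rough sketch of what that placement could look like, with made-up sizes (`hidden_size=16`, a batch of 2 tracks) that are assumptions rather than values from the thread. One detail worth knowing: the `dropout` argument of `nn.GRU` is applied between stacked layers, so it only has an effect when `num_layers` is greater than 1. That is why the sketch uses a single 4-layer GRU instead of four separate 1-layer GRUs.

```python
import torch
import torch.nn as nn

conv_block = nn.Sequential(
    nn.Conv1d(in_channels=3, out_channels=8, kernel_size=5, padding=2),
    nn.ReLU(),
    nn.Dropout(p=0.2),  # dropout right after the activation
)

# hidden_size=16 is an assumed value; dropout=0.2 acts between the 4 stacked layers.
rnn = nn.GRU(input_size=8, hidden_size=16, num_layers=4, dropout=0.2)

x = torch.randn(2, 3, 50)               # (batch, channels, seq_len), random stand-in data
feats = conv_block(x)                   # (2, 8, 50)
out, h_n = rnn(feats.permute(2, 0, 1))  # out: (50, 2, 16), h_n: (4, 2, 16)
```

Dropout is only active in training mode; calling `model.eval()` before validation switches it off.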

Thanks, I'll try that :slight_smile: