I'm currently building a CRNN (a CNN followed by an RNN) that needs to classify ship types according to their movement/behavior. I'm using AIS data, which is transformed into a [lat, lon, time] data sequence.
The idea is to use the CNN as a feature-extraction network and then use the RNN to classify from the extracted features.
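The CNN-then-RNN idea can be sketched as follows. This is a minimal illustration, not a fix for the network in this question. The class name `CRNNSketch`, the layer sizes, and classifying from the GRU's final hidden state (one prediction per window rather than one per timestamp) are all assumptions.

```python
import torch
import torch.nn as nn

class CRNNSketch(nn.Module):
    """Hypothetical CRNN: Conv1d feature extractor followed by a GRU classifier."""

    def __init__(self, in_features=3, conv_channels=32, hidden_size=64, num_classes=3):
        super().__init__()
        # CNN: extracts local features along the time axis; padding=2 keeps the length
        self.cnn = nn.Sequential(
            nn.Conv1d(in_features, 16, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(16, conv_channels, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        # RNN: batch_first=True means input shape (batch, time, features)
        self.rnn = nn.GRU(conv_channels, hidden_size, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x: (batch, 3, time)
        feats = self.cnn(x)              # (batch, conv_channels, time)
        feats = feats.permute(0, 2, 1)   # (batch, time, conv_channels)
        _, h_n = self.rnn(feats)         # h_n: (num_layers, batch, hidden_size)
        return self.head(h_n[-1])        # raw logits from the last layer: (batch, num_classes)

model = CRNNSketch()
logits = model(torch.randn(8, 3, 50))
print(logits.shape)  # torch.Size([8, 3])
```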
Unfortunately, the network I have is not working. I trained it on 1000 Cargo ship tracks, 1000 Passenger ship tracks and 1000 Fishing ship tracks. The result is an accuracy of 30%, which is the same as the network essentially just guessing the same class over and over.
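With 1000 tracks in each of the three classes, a model that always predicts one class is right on exactly one third of the data, so about 30 to 33% accuracy is the chance baseline here. A small check, using hypothetical label lists that mirror the class counts above:

```python
from collections import Counter

# Hypothetical labels mirroring the dataset: 1000 tracks per class
labels = ["cargo"] * 1000 + ["passenger"] * 1000 + ["fishing"] * 1000

# A constant predictor does best by always choosing the most frequent class
most_common_class, count = Counter(labels).most_common(1)[0]
baseline_accuracy = count / len(labels)
print(f"{baseline_accuracy:.3f}")  # 0.333
```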
My network is as follows: first three convolutional layers, then four recurrent layers.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

# hidden_size is assumed to be defined globally elsewhere in the script

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()

        ### RNN ###
        self.rnn1 = nn.GRU(input_size=32,  # input is the output from CNN
                           hidden_size=hidden_size,
                           num_layers=1)
        self.rnn2 = nn.GRU(input_size=hidden_size, hidden_size=hidden_size, num_layers=1)
        self.rnn3 = nn.GRU(input_size=hidden_size, hidden_size=hidden_size, num_layers=1)
        self.rnn4 = nn.GRU(input_size=hidden_size, hidden_size=hidden_size, num_layers=1)
        self.activation = nn.ReLU()
        ### END ###

        self.dense1 = nn.Linear(hidden_size, 3)

        ### CNN ###
        self.conv1 = nn.Sequential(
            nn.Conv1d(in_channels=3, out_channels=8, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            # nn.MaxPool1d(kernel_size=2),  # reduce dimension of sequence by half
        )
        self.conv2 = nn.Sequential(
            nn.Conv1d(in_channels=8, out_channels=16, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            # nn.MaxPool1d(kernel_size=2),  # reduce dimension of sequence by half
        )
        self.conv3 = nn.Sequential(
            nn.Conv1d(in_channels=16, out_channels=32, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            # nn.MaxPool1d(kernel_size=2),  # reduce dimension of sequence by half
        )

    def forward(self, x, hidden, batch_size):
        x = self.conv1(x.double())  # inputs (1, 3, batch_size)
        x = self.conv2(x.double())
        x = self.conv3(x.double())

        # Reshape batch for RNN training:
        x = x.reshape(batch_size, 1, 32)
        x, hidden = self.rnn1(x, hidden)  # inputs (seq_len, 1, 3)
        x = self.activation(x)
        x, hidden = self.rnn2(x, hidden)
        x = self.activation(x)
        x, hidden = self.rnn3(x, hidden)
        x = self.activation(x)
        x, hidden = self.rnn4(x, hidden)

        # x = x.select(0, maxlen-1).contiguous()
        x = x.view(-1, hidden_size)
        x = F.relu(self.dense1(x))
        return x, hidden  # Returns prediction for all batch_size timestamps, i.e. [batch_size, 3]

    def init_hidden(self):
        weight = next(self.parameters()).data
        return Variable(weight.new(1, 1, hidden_size).zero_())
```
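One piece that can be checked in isolation is the shape coming out of the convolutional stack. For `nn.Conv1d` with stride 1, the output length is `L + 2*padding - kernel_size + 1`, so `kernel_size=5, padding=2` keeps the length unchanged, and three such layers turn `(1, 3, 50)` into `(1, 32, 50)`. A minimal sketch reproducing only the conv layers from the code above (float32 instead of double, no GRUs):

```python
import torch
import torch.nn as nn

# Same three conv layers as in Net, reproduced standalone
cnn = nn.Sequential(
    nn.Conv1d(3, 8, kernel_size=5, stride=1, padding=2), nn.ReLU(),
    nn.Conv1d(8, 16, kernel_size=5, stride=1, padding=2), nn.ReLU(),
    nn.Conv1d(16, 32, kernel_size=5, stride=1, padding=2), nn.ReLU(),
)

x = torch.randn(1, 3, 50)   # (batch=1, channels=3 [lat, lon, time], length=50)
out = cnn(x)
print(out.shape)            # torch.Size([1, 32, 50])
```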
My optimizer and criterion:
```python
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001, weight_decay=0.5)
```
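`nn.CrossEntropyLoss` combines `log_softmax` with the negative log-likelihood loss, so it expects raw, unnormalized scores of shape `(N, C)` and integer class indices of shape `(N,)` as targets. The sketch below uses made-up logits and targets (not the model from the question) to show that equivalence.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

criterion = nn.CrossEntropyLoss()

logits = torch.tensor([[2.0, -1.0, 0.5],
                       [0.1, 0.2, 3.0]])   # (N=2, C=3) raw scores, no softmax/ReLU applied
targets = torch.tensor([0, 2])             # class indices, dtype torch.long

loss = criterion(logits, targets)
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)
print(torch.allclose(loss, manual))        # True
```

Because the loss works on raw scores, the final `F.relu(self.dense1(x))` in `forward` clips every negative score to zero before the loss sees it.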
And my training phase:
```python
def train():
    print("Training Initiated!")
    model.train()
    hidden = model.init_hidden()  # Initiate hidden

    for step, data in enumerate(train_set_all):
        X = data  # Entire sequence
        y = data  # [1,0,0] or [0,1,0] or [0,0,1]
        y = y.long()
        # print(y.size())

        ### Split sequence into batches:
        batch_size = 50  # split sequence into mini-sequences of size 50
        max_batches = int(X.size(2) / batch_size)

        for nbatch in range(max_batches):
            model.zero_grad()
            output, hidden = model(X[:, :, nbatch*batch_size:batch_size + nbatch*batch_size],
                                   Variable(hidden.data), batch_size)
            loss = criterion(output,
                             torch.max(y[:, nbatch*batch_size:batch_size + nbatch*batch_size, :]
                                       .reshape(batch_size, 3), 1))
            loss.backward()
            optimizer.step()

        print(step)
```
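In the loss line above, `torch.max(t, 1)` returns a `(values, indices)` pair rather than a single tensor. The class indices that `CrossEntropyLoss` needs are the second element, which is also available as `.indices` or directly via `torch.argmax`. A minimal sketch with a made-up one-hot target window:

```python
import torch

# Hypothetical one-hot targets for a window of 4 timestamps, shape (4, 3)
y_window = torch.tensor([[1, 0, 0],
                         [1, 0, 0],
                         [0, 0, 1],
                         [0, 1, 0]], dtype=torch.float32)

values, indices = torch.max(y_window, 1)   # torch.max with a dim returns two tensors
print(indices)                             # tensor([0, 0, 2, 1])
print(torch.equal(indices, torch.argmax(y_window, dim=1)))  # True
```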
My question is: does this make sense? I know it is a broad question. At the moment I use batches of 50 time samples so that the convolutional part of the network has something to convolve over. Ideally I would feed it a single timestamp at a time, but the result was the same (30% accuracy).
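Splitting one long track into fixed windows of 50 samples can also be done in one step, so that all windows of a track form a single batch instead of being fed one at a time. Here is a sketch using `Tensor.unfold`, assuming a track tensor of shape `(1, 3, T)`. The window length of 50 matches the question. The non-overlapping stride and the track length `T = 230` are assumptions.

```python
import torch

T, window = 230, 50
track = torch.randn(1, 3, T)   # stand-in for one track: (1, channels=3, time=T)

# unfold(dimension, size, step) slides a window along the time axis;
# step=window gives non-overlapping windows and drops the incomplete tail
windows = track.unfold(2, window, window)      # (1, 3, num_windows, window)
batch = windows.squeeze(0).permute(1, 0, 2)    # (num_windows, 3, window)
print(batch.shape)                             # torch.Size([4, 3, 50])
```

Each window can then carry the label of the track it came from.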
Am I missing something between the two networks, i.e. between the CNN and the RNN? Right now I just reshape the data so that it fits the RNN's input requirements. Do I need anything else?
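One detail worth checking between the two parts is the axis order. The conv output has shape `(1, 32, time)`, while a GRU without `batch_first` expects `(time, batch, features)`. `reshape` only reinterprets the memory layout. As a result, `x.reshape(batch_size, 1, 32)` does not move the channel axis; it mixes values from different channels and timestamps. `permute` actually swaps the axes. A small sketch with a toy tensor (sizes chosen for readability):

```python
import torch

channels, time = 2, 3
# Toy conv output (batch=1, channels=2, time=3); values encode (channel, time) as 10*c + t
x = torch.tensor([[[0, 1, 2],
                   [10, 11, 12]]])

reshaped = x.reshape(time, 1, channels)   # reinterprets memory; no axis is moved
permuted = x.permute(2, 0, 1)             # (time, batch, channels), the GRU's default layout

print(reshaped[:, 0, :].tolist())  # [[0, 1], [2, 10], [11, 12]]
print(permuted[:, 0, :].tolist())  # [[0, 10], [1, 11], [2, 12]]
```

In the permuted version each row is one timestamp with all of its channels, which is what the recurrent layer is meant to step over.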
I can't seem to find any good tutorials on CRNNs, only a few source-code examples, and I find those hard to adapt to my case when I have no information other than the code.
My labels are simply one-hot vectors: [1, 0, 0], [0, 1, 0] or [0, 0, 1]. Is this correct, or should I use class numbers such as [1, 2, 3] or something similar? And does the criterion I use make sense with labels like these? Is it also possible to get probabilities as output from the network, e.g. class 1 at 20%, class 2 at 30% and class 3 at 50%, all summing to 100%? I think that would be ideal.
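On the label questions, one-hot rows can be turned into the class-index form that `CrossEntropyLoss` classically expects with `argmax`. PyTorch class indices start at 0, so the classes become 0, 1, 2 rather than 1, 2, 3. PyTorch 1.10 and later also accept float class-probability targets of shape `(N, C)`. Probabilities summing to 1 (i.e. 100%) can be read out at inference time by applying `softmax` to the raw outputs, while the loss itself keeps taking the raw scores. Here is a minimal sketch with made-up labels and logits.

```python
import torch
import torch.nn.functional as F

# Hypothetical one-hot labels, as described in the question
one_hot = torch.tensor([[1, 0, 0],
                        [0, 1, 0],
                        [0, 0, 1]], dtype=torch.float32)
class_idx = one_hot.argmax(dim=1)      # tensor([0, 1, 2]), usable as CrossEntropyLoss targets

# Hypothetical raw model outputs (logits) for one track
logits = torch.tensor([[0.2, 0.6, 1.1]])
probs = F.softmax(logits, dim=1)       # non-negative, each row sums to 1
percent = (probs * 100).round()        # roughly 20, 30 and 50 percent for these logits
print(percent)
```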
Any help on CRNNs is highly appreciated.