Hello Forum

I'm currently building a CRNN (a CNN followed by an RNN) that needs to classify ship types according to their movement/behavior. I'm using AIS data, which is transformed into a [lat, lon, time] data sequence.

The idea is to use the CNN as a feature-extraction network and then use the RNN to classify from the extracted features.

The network I have is unfortunately not working. I trained it on 1000 Cargo ship tracks, 1000 Passenger ship tracks and 1000 Fishing ship tracks. The result is an accuracy of 30%, which, with three equally sized classes, is no better than the network guessing the same class over and over (about 33%).
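To make the comparison concrete, here is a small sketch of the baseline I mean: always predicting the most frequent class. The class names are just labels for my three track types.

```python
from collections import Counter

# 1000 tracks per class, as in my training set.
labels = ["cargo"] * 1000 + ["passenger"] * 1000 + ["fishing"] * 1000

# Accuracy of a "model" that always predicts the most frequent class.
majority_class, majority_count = Counter(labels).most_common(1)[0]
baseline_accuracy = majority_count / len(labels)  # 1000 / 3000

print(f"always-one-class baseline: {baseline_accuracy:.1%}")
```

So anything around 30-33% means the network has learned nothing useful yet.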

My net is the following: first I have 3 convolutional layers, then 4 recurrent layers.

```
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

# hidden_size is a module-level constant defined elsewhere in my script.


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        ### RNN ###
        self.rnn1 = nn.GRU(input_size=32,  # input is the output from the CNN
                           hidden_size=hidden_size,
                           num_layers=1)
        self.rnn2 = nn.GRU(input_size=hidden_size,
                           hidden_size=hidden_size,
                           num_layers=1)
        self.rnn3 = nn.GRU(input_size=hidden_size,
                           hidden_size=hidden_size,
                           num_layers=1)
        self.rnn4 = nn.GRU(input_size=hidden_size,
                           hidden_size=hidden_size,
                           num_layers=1)
        self.activation = nn.ReLU()
        ### END ###
        self.dense1 = nn.Linear(hidden_size, 3)
        ### CNN ###
        self.conv1 = nn.Sequential(
            nn.Conv1d(
                in_channels=3,
                out_channels=8,
                kernel_size=5,
                stride=1,
                padding=2,
            ),
            nn.ReLU(),
            # nn.MaxPool1d(kernel_size=2),  # reduce dimension of sequence by half
        )
        self.conv2 = nn.Sequential(
            nn.Conv1d(
                in_channels=8,
                out_channels=16,
                kernel_size=5,
                stride=1,
                padding=2,
            ),
            nn.ReLU(),
            # nn.MaxPool1d(kernel_size=2),  # reduce dimension of sequence by half
        )
        self.conv3 = nn.Sequential(
            nn.Conv1d(
                in_channels=16,
                out_channels=32,
                kernel_size=5,
                stride=1,
                padding=2,
            ),
            nn.ReLU(),
            # nn.MaxPool1d(kernel_size=2),  # reduce dimension of sequence by half
        )

    def forward(self, x, hidden, batch_size):
        x = self.conv1(x.double())  # input shape: (1, 3, batch_size)
        x = self.conv2(x.double())
        x = self.conv3(x.double())  # output shape: (1, 32, batch_size)
        # Reshape batch for RNN training:
        x = x.reshape(batch_size, 1, 32)
        x, hidden = self.rnn1(x, hidden)  # input shape: (seq_len, 1, 32)
        x = self.activation(x)
        x, hidden = self.rnn2(x, hidden)
        x = self.activation(x)
        x, hidden = self.rnn3(x, hidden)
        x = self.activation(x)
        x, hidden = self.rnn4(x, hidden)
        # x = x.select(0, maxlen-1).contiguous()
        x = x.view(-1, hidden_size)
        x = F.relu(self.dense1(x))
        # Returns a prediction for all batch_size timestamps, i.e. [batch_size, 3]
        return x, hidden

    def init_hidden(self):
        weight = next(self.parameters()).data
        return Variable(weight.new(1, 1, hidden_size).zero_())
```

My optimizer and criterion:

```
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001, weight_decay=0.5)
```

And my training phase:

```
def train():
    print("Training Initiated!")
    model.train()
    hidden = model.init_hidden()  # Initiate hidden
    for step, data in enumerate(train_set_all):
        X = data[0]  # Entire sequence
        y = data[1]  # [1,0,0] or [0,1,0] or [0,0,1]
        y = y.long()
        # print(y.size())
        ### Split sequence into batches:
        batch_size = 50  # split sequence into mini-sequences of size 50
        max_batches = int(X.size(2) / batch_size)
        for nbatch in range(max_batches):
            start = nbatch * batch_size
            end = start + batch_size
            model.zero_grad()
            output, hidden = model(X[:, :, start:end], Variable(hidden.data), batch_size)
            # Convert the one-hot labels of this window to class indices.
            target = torch.max(y[:, start:end, :].reshape(batch_size, 3), 1)[1]
            loss = criterion(output, target)
            loss.backward()
            optimizer.step()
        print(step)
```

My question is: does this make sense? I know this is quite a broad question. At the moment I use batches of 50 time samples, so that the convolutional part of the network has something to convolve over. Ideally I would just feed it a single timestamp at a time, but the result was the same (30% accuracy).
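To make the windowing explicit, this is a stripped-down sketch of how I cut one track into mini-sequences of 50 samples. The track length of 230 and the dummy values are made up for illustration; leftover samples at the end are dropped, as in my training loop.

```python
import torch

window = 50

# Hypothetical track: shape (1, 3, T) with T = 230 time samples.
track = torch.arange(3 * 230, dtype=torch.float32).reshape(1, 3, 230)

# Only full windows are used, so the last 30 samples are dropped here.
n_windows = track.size(2) // window
chunks = [track[:, :, i * window:(i + 1) * window] for i in range(n_windows)]

for i, chunk in enumerate(chunks):
    print(i, tuple(chunk.shape))
```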

Am I missing something between the networks, i.e. between the CNN and the RNN? Right now I just reshape the data so it fits the RNN's input requirements. Do I need anything else?
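One thing I am unsure about is the reshape itself. Below is a minimal sketch of the hand-off, assuming the CNN output has shape (batch, channels, time) and the GRU (with the default `batch_first=False`) expects (time, batch, features). As far as I understand, `permute` reorders the axes, while `reshape` only reinterprets the flat memory, so the two do not give the same tensor. The layer sizes and the hidden size of 16 are made up for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes: 1 track, 3 input channels, 50 time steps.
conv = nn.Conv1d(in_channels=3, out_channels=32, kernel_size=5, padding=2)
gru = nn.GRU(input_size=32, hidden_size=16, num_layers=1)

x = torch.randn(1, 3, 50)           # (batch, channels, time)
features = conv(x)                  # (1, 32, 50)

# Move the time axis first: (batch, 32, time) -> (time, batch, 32).
seq = features.permute(2, 0, 1)     # (50, 1, 32)
out, h_n = gru(seq)                 # out: (50, 1, 16), h_n: (1, 1, 16)

# A plain reshape gives the same shape but mixes channel and time values.
reshaped = features.reshape(50, 1, 32)
same = torch.equal(seq, reshaped)

print(tuple(seq.shape), tuple(out.shape), same)
```

With `permute`, `seq[t, 0]` holds the 32 CNN features of time step `t`, which is what I would expect the RNN to see.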

I can't seem to find any good tutorials on CRNNs, only a few examples of source code. I find those hard to adapt to my case when I have no information other than the code itself.

My labels are simply [0, 1, 0], [1, 0, 0] or [0, 0, 1]. Is this correct, or should I use [1, 2, 3] or something like that? Does it make sense to use the criterion I chose when my labels look like that? And is it possible to get a probability as output from the network, such that class 1 might be 20%, class 2 might be 30% and class 3 might be 50%, all summing to 100%? I think that would be ideal.
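For context, this is a minimal sketch of how I currently understand the label and probability handling, assuming `nn.CrossEntropyLoss` takes raw scores (logits) plus one integer class index per sample, and that a softmax over the logits gives probabilities that sum to 1. The logit values are made up.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# One-hot labels as in my data set: one row per sample.
one_hot = torch.tensor([[0, 1, 0],
                        [1, 0, 0],
                        [0, 0, 1]])

# Convert one-hot rows to class indices (0, 1, 2).
targets = one_hot.argmax(dim=1)

# Made-up raw network outputs (logits), shape (samples, classes).
logits = torch.tensor([[0.2, 1.5, -0.3],
                       [2.0, 0.1, 0.4],
                       [-1.0, 0.0, 1.2]])

# The loss applies log-softmax internally, so it is given the raw logits.
loss = nn.CrossEntropyLoss()(logits, targets)

# For reporting, softmax turns logits into per-class probabilities.
probs = F.softmax(logits, dim=1)
percent = 100 * probs               # each row sums to 100
predicted = probs.argmax(dim=1)

print(targets.tolist(), predicted.tolist(), loss.item())
```

If this is right, the final ReLU in my `forward` would not be needed before the loss, but I am not sure.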

Any help on CRNNs is highly appreciated.