For my binary segmentation model, I have two inputs for each sample: an image and sensor data (a history of numeric readings).
The numeric data are time series.
Assuming I have
70 entries for each day
sequence length = 20 (days)
image batch size = 10
images size = torch.Size([10, 3, 256, 256])
sensors_data size = torch.Size([10, 20, 70])
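To make the setup concrete, the two inputs can be mocked up like this (random data, just for shape checking):

```python
import torch

# Dummy tensors matching the shapes listed above.
images = torch.randn(10, 3, 256, 256)   # batch of 10 RGB images
sensors_data = torch.randn(10, 20, 70)  # 20 days x 70 readings per image

print(images.shape)        # torch.Size([10, 3, 256, 256])
print(sensors_data.shape)  # torch.Size([10, 20, 70])
```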
For the semantic segmentation I am using a U-Net encoder/decoder;
for the numeric data I am using an RNN. The idea is to take the output of the RNN as a weight that modulates the segmentation.
So my forward function is as follows:
def forward(self, images, x):
    encoded = self.encoder(images)
    decoded = self.decoder(encoded)
    fuse = torch.clone(decoded)  # shape: [10, 2, 256, 256]
    h0 = torch.zeros(self.layer_dim, 1, self.hidden_dim, requires_grad=True).to(self.device)
    myoutput = torch.empty(0, device=self.device)
    # loop over the images in the batch
    for j in range(x.size(0)):
        # loop over the days (time steps) for each image
        for i in range(x.size(1)):
            out, h0 = self.rnn(x[j:j+1, i:i+1, :], h0.detach())
        out = out[:, -1, :]
        out = self.fc(out)
        last = torch.clone(out[0])
        myoutput = torch.cat((myoutput, last), 0)
    # fusion part
    weights = torch.clone(myoutput)
    weights = torch.where(weights > 35, 0.7, 0.3)
    weights = weights.view(weights.size(0), 1, 1, 1)
    fuse[:, 1, None] = torch.mul(fuse[:, 1, None], weights)
    return decoded, myoutput, fuse
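As a point of comparison, here is a minimal sketch of running the RNN over the whole 20-step sequence in a single call instead of a per-timestep loop (assuming nn.RNN with batch_first=True and an fc that outputs one value per image; the hidden_dim and layer_dim values here are made up, not taken from the model above):

```python
import torch
import torch.nn as nn

hidden_dim, layer_dim = 64, 1  # assumed values for illustration
rnn = nn.RNN(input_size=70, hidden_size=hidden_dim,
             num_layers=layer_dim, batch_first=True)
fc = nn.Linear(hidden_dim, 1)

x = torch.randn(10, 20, 70)              # [batch, seq_len, features]
h0 = torch.zeros(layer_dim, x.size(0), hidden_dim)
out, hn = rnn(x, h0)                     # out: [10, 20, hidden_dim], all time steps
myoutput = fc(out[:, -1, :]).squeeze(1)  # last time step -> one value per image
print(myoutput.shape)  # torch.Size([10])
```

Feeding the full sequence in one call also keeps the hidden state differentiable across time steps, whereas calling h0.detach() inside a loop cuts the gradient at every step.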
In the training loop:
logits, sensors, fusion = model(images, x)
optimizer1.zero_grad()
optimizer2.zero_grad()
optimizer3.zero_grad()
loss1 = F.cross_entropy(logits, labels)
loss0 = Funcloss0(sensors, labels_sensors)
loss2 = F.cross_entropy(fusion, labels_fusion)
loss0.backward(retain_graph=True)
loss1.backward(retain_graph=True)
loss2.backward(retain_graph=False)
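For what it's worth, a minimal sketch of an alternative backward pass that sums the three losses and calls backward() once, so no retain_graph is needed (toy tensors, and an MSE stand-in for Funcloss0 since its definition is not shown):

```python
import torch
import torch.nn.functional as F

# toy stand-ins for the three model outputs (hypothetical small shapes)
logits = torch.randn(4, 2, 8, 8, requires_grad=True)
fusion = torch.randn(4, 2, 8, 8, requires_grad=True)
sensors = torch.randn(4, 1, requires_grad=True)
labels = torch.randint(0, 2, (4, 8, 8))
labels_fusion = torch.randint(0, 2, (4, 8, 8))
labels_sensors = torch.randn(4, 1)

loss1 = F.cross_entropy(logits, labels)
loss2 = F.cross_entropy(fusion, labels_fusion)
loss0 = F.mse_loss(sensors, labels_sensors)  # stand-in for Funcloss0

loss = loss0 + loss1 + loss2  # one graph, one backward call
loss.backward()
print(logits.grad is not None)  # True
```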
PS: I kept the output without fusion so I can compare results.
The problem is that the RNN's predictions are very bad and I don't know why. Can you spot the mistake? Does this seem logical?