For my binary segmentation model, I have two inputs: images, and sensor data (a history of numeric values) for each image.
The numeric data are time series.
Assuming I have
70 entries for each day
sequence length = 20 (days)
image batch size = 10
images size = torch.Size([10, 3, 256, 256])
sensors_data size = torch.Size([10, 20, 70])
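To make the shapes above concrete, here is a minimal sketch of what a batch-first RNN returns for a tensor of that size. The `hidden_dim` value and the use of `nn.RNN` with `batch_first=True` are assumptions for illustration; the post does not say how `self.rnn` is configured.

```python
import torch
import torch.nn as nn

batch_size, seq_len, n_features = 10, 20, 70
hidden_dim = 32  # hypothetical value, not given in the post

images = torch.randn(batch_size, 3, 256, 256)
sensors_data = torch.randn(batch_size, seq_len, n_features)

# batch_first=True makes the RNN read its input as (batch, seq_len, features)
rnn = nn.RNN(input_size=n_features, hidden_size=hidden_dim,
             num_layers=1, batch_first=True)

out, h_n = rnn(sensors_data)   # h0 defaults to zeros when omitted
print(out.shape)               # torch.Size([10, 20, 32])
print(h_n.shape)               # torch.Size([1, 10, 32])

# Output at the last day of each sequence: one vector per image
last_step = out[:, -1, :]
print(last_step.shape)         # torch.Size([10, 32])
```

With this layout, dimension 0 indexes the images in the batch and dimension 1 indexes the days.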
For the semantic segmentation I am using a U-Net encoder and decoder.
For the numeric data I am using an RNN. The idea is to use the output of the RNN as a weight that steers the segmentation.
My forward function is as follows:
    def forward(self, images, x):
        encoded = self.encoder(images)
        decoded = self.decoder(encoded)
        Fuse = torch.clone(decoded)  # ([10, 2, 256, 256])

        h0 = torch.zeros(self.layer_dim, 1, self.hidden_dim).requires_grad_(True).to(self.device)
        hidden = torch.empty_like(torch.unsqueeze(h0, 0))
        out = torch.empty(1).requires_grad_().to(self.device)

        # loop over image batches
        for i in range(x.size(1)):
            # loop over days in each image batch
            for j in range(x.size(0)):
                out, h0 = self.rnn(x[j:j+1, i:i+1, :], h0.detach())
                out = out[:, -1, :]
                out = self.fc(out)
                last = torch.clone(out)
            myoutput = torch.cat((myoutput, last), 0)

        # fusion part
        Weights = torch.clone(myoutput)
        Weights = torch.where(Weights > 35, 0.7, 0.3)
        Weights = Weights.view(Weights.size(0), 1, 1, 1)
        # print(Weights)
        Fuse[:, 1, None] = torch.mul(Fuse[:, 1, None], Weights)
        return decoded, myoutput, Fuse
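For comparison, here is a minimal sketch of the same idea with the whole batch passed through the RNN in one call, so that each image's 20-day sequence starts from a zero hidden state and yields exactly one prediction. `SensorBranch` and `fuse` are hypothetical names, the layer sizes are assumed, and a batch-first `nn.RNN` stands in for `self.rnn`; this is not the model from the post.

```python
import torch
import torch.nn as nn


class SensorBranch(nn.Module):
    """Hypothetical stand-in for the rnn + fc part of the model."""

    def __init__(self, n_features=70, hidden_dim=32, layer_dim=1):
        super().__init__()
        self.rnn = nn.RNN(n_features, hidden_dim, layer_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, 1)

    def forward(self, x):
        # x: (batch, seq_len, n_features); h0 defaults to zeros per sequence
        out, _ = self.rnn(x)
        # Keep only the last day's output: one prediction per image
        return self.fc(out[:, -1, :])  # (batch, 1)


def fuse(decoded, sensor_pred, threshold=35.0, high=0.7, low=0.3):
    """Scale channel 1 of the segmentation output by a per-image weight."""
    # Same thresholding rule as in the post. The comparison produces a
    # boolean mask, so no gradient flows back to sensor_pred through it.
    weights = torch.where(sensor_pred > threshold,
                          torch.full_like(sensor_pred, high),
                          torch.full_like(sensor_pred, low))
    weights = weights.view(-1, 1, 1, 1)
    # Rebuild the tensor instead of assigning into a slice in place
    return torch.cat(
        [decoded[:, :1], decoded[:, 1:2] * weights, decoded[:, 2:]], dim=1
    )


branch = SensorBranch()
sensor_pred = branch(torch.randn(10, 20, 70))        # (10, 1)
fused = fuse(torch.randn(10, 2, 256, 256), sensor_pred)
print(sensor_pred.shape, fused.shape)
```

Because `sensor_pred` has one row per image, the weights line up with the batch dimension of the decoder output without any manual looping.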
In the training loop:
    logits, sensors, fusion = model(images, x)

    optimizer1.zero_grad()
    optimizer2.zero_grad()
    optimizer3.zero_grad()

    loss1 = F.cross_entropy(logits, labels)
    loss0 = Funcloss0(sensors, labels_sensors)
    loss2 = F.cross_entropy(fusion, labels_fusion)

    loss0.backward(retain_graph=True)
    loss1.backward(retain_graph=True)
    loss2.backward(retain_graph=False)
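The snippet above does not show the optimizer steps, so here is a minimal, self-contained sketch of one full training step with the three losses summed into a single backward pass. `TinyModel` and `train_step` are hypothetical stand-ins, and `F.mse_loss` is only an assumed placeholder for `Funcloss0`, whose definition is not given.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyModel(nn.Module):
    """Hypothetical stand-in returning (logits, sensors, fusion)."""

    def __init__(self):
        super().__init__()
        self.seg = nn.Conv2d(3, 2, kernel_size=1)
        self.rnn = nn.RNN(70, 16, batch_first=True)
        self.fc = nn.Linear(16, 1)

    def forward(self, images, x):
        logits = self.seg(images)
        out, _ = self.rnn(x)
        sensors = self.fc(out[:, -1, :])
        fusion = logits * 1.0  # placeholder for the fusion step
        return logits, sensors, fusion


def train_step(model, optimizers, images, x, labels, labels_sensors, labels_fusion):
    for opt in optimizers:
        opt.zero_grad()

    logits, sensors, fusion = model(images, x)

    loss1 = F.cross_entropy(logits, labels)
    loss0 = F.mse_loss(sensors, labels_sensors)  # assumption: stands in for Funcloss0
    loss2 = F.cross_entropy(fusion, labels_fusion)

    # Summing the losses lets one backward pass cover the whole graph,
    # so retain_graph is not needed
    (loss0 + loss1 + loss2).backward()

    for opt in optimizers:
        opt.step()
    return loss0.item(), loss1.item(), loss2.item()


model = TinyModel()
optimizers = [torch.optim.SGD(model.parameters(), lr=0.1)]
images = torch.randn(4, 3, 8, 8)
x = torch.randn(4, 20, 70)
labels = torch.randint(0, 2, (4, 8, 8))
labels_sensors = torch.randn(4, 1)
losses = train_step(model, optimizers, images, x, labels, labels_sensors, labels)
print(losses)
```

Logging the three loss values separately, as returned here, makes it easier to see whether the sensor branch is learning at all.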
PS: I kept the output without fusion to compare results.
The problem is that the predictions of the RNN are very bad and I don't know why. Can you spot the mistake? Does this approach seem logical?