I am trying to design a halfway fusion technique, where I fuse the features extracted from two images (one visual, one thermal) to build a classifier. The technique requires the feature extraction to take place inside the network itself. Something like the image below.
I’m sorry if this is a bit confusing. I am new to deep learning and still learning to use PyTorch.
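The halfway fusion described above can be sketched as a two-stream CNN: each modality gets its own convolutional feature extractor, the two feature maps are concatenated along the channel dimension partway through the network, and shared layers plus a linear classifier follow. This is a minimal sketch under several assumptions: fusion is done by concatenation followed by a 1×1 convolution, the thermal image is single-channel, both images are already registered to the same resolution, and the names (`conv_block`, `HalfwayFusionNet`), layer widths and number of classes are all hypothetical choices for illustration.

```python
import torch
import torch.nn as nn


def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """Conv -> BatchNorm -> ReLU -> 2x2 max-pool (halves the spatial size)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )


class HalfwayFusionNet(nn.Module):
    """Hypothetical two-stream classifier: separate feature extractors for the
    visual (RGB) and thermal images, fused halfway through the network."""

    def __init__(self, num_classes: int = 2, thermal_channels: int = 1):
        super().__init__()
        # Modality-specific early layers (the two streams do NOT share weights).
        self.visual_stream = nn.Sequential(conv_block(3, 32), conv_block(32, 64))
        self.thermal_stream = nn.Sequential(
            conv_block(thermal_channels, 32), conv_block(32, 64)
        )
        # Fusion: concatenate along channels (64 + 64 = 128), then a 1x1 conv
        # mixes the two modalities and reduces back to 64 channels.
        self.fuse = nn.Sequential(nn.Conv2d(128, 64, kernel_size=1), nn.ReLU(inplace=True))
        # Shared layers that operate on the fused features.
        self.shared = nn.Sequential(conv_block(64, 128), nn.AdaptiveAvgPool2d(1))
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, visual: torch.Tensor, thermal: torch.Tensor) -> torch.Tensor:
        v = self.visual_stream(visual)                # (N, 64, H/4, W/4)
        t = self.thermal_stream(thermal)              # (N, 64, H/4, W/4)
        fused = self.fuse(torch.cat([v, t], dim=1))   # (N, 64, H/4, W/4)
        x = self.shared(fused)                        # (N, 128, 1, 1)
        return self.classifier(torch.flatten(x, 1))   # (N, num_classes) logits


if __name__ == "__main__":
    model = HalfwayFusionNet(num_classes=2)
    visual = torch.randn(4, 3, 64, 64)    # batch of RGB images (random stand-ins)
    thermal = torch.randn(4, 1, 64, 64)   # matching single-channel thermal images
    labels = torch.tensor([0, 1, 1, 0])

    logits = model(visual, thermal)
    print(logits.shape)  # torch.Size([4, 2])

    # One backward pass: gradients flow into BOTH streams, so the feature
    # extractors are learned end to end as part of the classifier.
    loss = nn.CrossEntropyLoss()(logits, labels)
    loss.backward()
```

Because the loss is backpropagated through the fusion layer into both streams, the feature extraction happens within the network rather than as a separate preprocessing step. Concatenating at a different depth (earlier or later) only requires moving blocks between the per-modality streams and the shared trunk; `torch.cat` does need the two feature maps to have the same spatial size, so differently sized inputs would have to be resized first.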
