So I'm trying to implement the following framework:
Co-training on two different datasets. The two datasets share a backbone network (say, a VGG up to the last pooling layer; the fc layers are discarded). There are two fc output layers: one for the predictions of Dataset 1 and the other for the predictions of Dataset 2. Each output layer / dataset has its own loss function.
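To make the architecture concrete, here is a minimal sketch of the two-head idea. The class and layer sizes are hypothetical (a tiny conv stack stands in for the VGG features), but the shape of the design matches what's described: one shared backbone, two fc heads, and a `forward()` that takes the dataset index and returns only the matching head's output.

```python
import torch
import torch.nn as nn

class TwoHeadNet(nn.Module):
    """Shared backbone with one fc head per dataset (toy stand-in for VGG features)."""
    def __init__(self, num_classes_1, num_classes_2):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head1 = nn.Linear(16, num_classes_1)  # predictions for Dataset 1
        self.head2 = nn.Linear(16, num_classes_2)  # predictions for Dataset 2

    def forward(self, x, dataset_idx):
        feats = self.backbone(x)
        # Return ONLY the output of the head matching the current dataset.
        return self.head1(feats) if dataset_idx == 1 else self.head2(feats)
```

With this structure, `loss.backward()` on either head's output populates gradients only for the backbone and that head; the unused head's parameters keep `grad=None` for that batch.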
The approach I tried comprises three basic steps: 1. pass the dataset of the current batch as an input argument to the model's forward(), so that the model computes and returns ONLY the output related to that dataset; in each batch iteration the model outputs the values of EITHER of the fc layers and NOT BOTH. 2. I combine the dataloaders of the two datasets with itertools.zip_longest(), and 3. in each batch iteration, for each dataset separately, I: a) compute the output and the loss, and b) backpropagate ONLY for the relevant dataset/loss/fc layer. Steps a) and b) are performed independently and sequentially for Dataset 1 and Dataset 2, hence in each iteration two optimizer updates of batch_size samples each are performed.
Currently I'm getting very poor results with this approach, and I'm not sure whether my implementation does something wrong in Autograd, or whether the two datasets are simply not suitable to be combined.
Do you see anything wrong with my implementation, or do you have a better approach for something like this? Thank you!