Split a single model across multiple GPUs: RuntimeError: tensors are on different GPUs

I am trying to feed two inputs to my model and run the model twice, in parallel, on two different GPUs. I have read the existing posts and blogs on this topic, but it sounds like it is not easy at all in PyTorch and is quite painful :frowning:

I am almost out of hope, but I will give some details about what I want to do, so maybe someone can help me… The code I am using is pretty big, so I cannot post it all here.

I am using RFCN, from this repository: link

Inside the code, we feed the image data to the base model to obtain the base feature map (the base model is a ResNet), and then do the rest of the work on top of it.
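In toy form (the module, shapes, and devices below are my own stand-ins for illustration, not the repo's actual code), the single-GPU flow is roughly:

```python
import torch
import torch.nn as nn

# Toy stand-in for the ResNet trunk (self.RCNN_base in the repo).
RCNN_base = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU()).cuda(0)

im_data = torch.randn(1, 3, 600, 800, device="cuda:0")  # batch size 1, on GPU 0
base_feat = RCNN_base(im_data)                          # feature map also lives on GPU 0
```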

Now imagine I want to use another input, let's say im_data_2, and run the same `base_feat = self.RCNN_base(im_data)` call on another GPU. So I did:

```python
im_data_2 = im_data.detach().clone()
base_feat_2 = self.RCNN_base(im_data_2).cuda(3)
```

I monitored my GPUs and saw that GPU 3 was doing something, and this part of the code ran without any error on GPUs 0 and 3 (for simplicity, I am just using batch size 1 and not using DataParallel).

The error happens in the next part.

Now I want to take this base_feat_2 and pass it to self.RCNN_conv_new (link). So I did:

```python
base_feat_2 = self.RCNN_conv_new(base_feat_2).cuda(3)
```
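Putting it all together, here is a minimal self-contained version of what I'm effectively doing (again with toy modules standing in for the real ones; only the device handling mirrors my actual code):

```python
import torch
import torch.nn as nn

# Toy stand-ins for self.RCNN_base and self.RCNN_conv_new; both start on GPU 0.
RCNN_base = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU()).cuda(0)
RCNN_conv_new = nn.Conv2d(64, 128, kernel_size=1).cuda(0)

im_data = torch.randn(1, 3, 600, 800, device="cuda:0")
base_feat = RCNN_base(im_data)                    # branch 1: everything on GPU 0, works

im_data_2 = im_data.detach().clone()              # the clone stays on GPU 0
base_feat_2 = RCNN_base(im_data_2).cuda(3)        # forward runs on GPU 0, only the output is copied to GPU 3
base_feat_2 = RCNN_conv_new(base_feat_2).cuda(3)  # input on GPU 3, weights on GPU 0 -> RuntimeError
```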

That last call throws the following error: RuntimeError: tensors are on different GPUs
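My guess is that the weights of RCNN_conv_new are still sitting on GPU 0 while base_feat_2 now lives on GPU 3, so the conv gets a cross-device input. Would keeping a full replica of that branch on GPU 3 be the right way to do this? Something like the following sketch, continuing from the repro above (the deepcopy approach is just my assumption, not something from the repo):

```python
import copy

# Hypothetical fix: replicate the whole branch onto GPU 3 so that weights
# and activations share one device throughout.
RCNN_base_gpu3 = copy.deepcopy(RCNN_base).cuda(3)
RCNN_conv_new_gpu3 = copy.deepcopy(RCNN_conv_new).cuda(3)

im_data_2 = im_data.detach().clone().cuda(3)      # move the input itself to GPU 3
base_feat_2 = RCNN_base_gpu3(im_data_2)           # forward now runs entirely on GPU 3
base_feat_2 = RCNN_conv_new_gpu3(base_feat_2)     # no cross-device mismatch
```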

Can anyone help me figure out what I'm doing wrong? And is what I'm trying to do even possible?