Feeding dictionary of tensors to model on GPU

Hi all,

I am using a DataLoader to feed x and y to my model during training. However, the y object is a built-in Python dict containing 2 types of labels: y = {'target1': Tensor1, 'target2': Tensor2}.

I want to load y onto the GPU. However, this is not possible directly on the dict. I know that I could extract target 1 and 2, move them to CUDA separately, and provide this data to the model like so:

target1 = y['target1'].to(device)
target2 = y['target2'].to(device)
model(x, target1, target2)

But, for several reasons, I have made the design decision to feed the dict to the model directly.


Is there an elegant solution for this? I see 2 options, and for both I have no clue whether they are sound:

  1. Untangle y within the .forward() method of my model, i.e. extract the targets from the y dictionary inside .forward() and send them to the GPU there.

  2. Send the values in my y dictionary to the GPU individually but keep them in the dict structure:

y['target1'] = y['target1'].to(device)
y['target2'] = y['target2'].to(device)
model(x, y)
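To make option 1 concrete, here is a minimal sketch of what I mean (MyModel and its layer sizes are placeholders, not my real model):

```python
import torch
import torch.nn as nn

class MyModel(nn.Module):  # placeholder model for illustration
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)

    def forward(self, x, y):
        # option 1: extract the targets from the dict inside forward
        # and move them to whatever device x already lives on
        target1 = y['target1'].to(x.device)
        target2 = y['target2'].to(x.device)
        out = self.linear(x)
        return out, target1, target2
```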

For both methods I am worried about unwanted inefficiency. Could anyone help me understand whether these methods make sense, and if not, why not and what else I could try?

Thanks !


Both approaches should work, and even nn.DataParallel should create chunks of your dict in case you want to unwrap it inside the forward method.
I don't know of any advantage of one approach over the other, if you need to use dicts.
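If you go with option 2, a small helper keeps the call site clean; a minimal sketch (the name dict_to_device is my own, not a PyTorch API):

```python
import torch

def dict_to_device(y, device):
    """Move every tensor value of a dict to `device`, keeping the dict structure.

    Hypothetical helper, not part of PyTorch. Note that .to() is a no-op
    (returns the same tensor) when the tensor is already on `device`.
    """
    return {k: v.to(device) for k, v in y.items()}
```

You would call it once per batch before the forward pass, e.g. `y = dict_to_device(y, device)` followed by `model(x, y)`. Either way the same per-tensor `.to(device)` copies happen, so neither option should add meaningful overhead.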