I have a list as input. Some of the elements in the list are tensors of shape (batch_size, *), while others are tuples of length batch_size.
The whole list looks like this:
(tensor, tensor, tuple, tuple)
For example, if batch_size is 48, the tuple looks like this:
t = (a, b, c, d, ... , zz)
len(t) = 48
This tuple cannot be moved to CUDA since its elements are strings.
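For concreteness, here is a minimal sketch of such a batch (the field names are made up for illustration):

```python
import torch

batch_size = 48
features = torch.randn(batch_size, 128)       # tensor of shape (batch_size, *)
labels = torch.randint(0, 10, (batch_size,))  # tensor of shape (batch_size,)
ids = tuple(f"id_{i}" for i in range(batch_size))      # strings, so this
names = tuple(f"name_{i}" for i in range(batch_size))  # cannot go to CUDA
batch = (features, labels, ids, names)        # (tensor, tensor, tuple, tuple)
```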
In practice, the tensors can be split across 8 devices with a new batch_size of 6 (48/8 = 6), but the tuple is copied to every device. So is there a clean way to scatter the tuple to each device as well?
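Assuming the copy-to-every-device behavior comes from nn.DataParallel's scatter (which replicates non-tensor arguments), one workaround might be to subclass it and chunk plain sequences by hand. This is an untested sketch, not a definitive implementation; SequenceScatterDataParallel and chunk_sequence are made-up names, it assumes top-level tuples/lists hold per-example items like strings, and kwargs handling is simplified:

```python
import math

import torch
from torch.nn.parallel import DataParallel, scatter


def chunk_sequence(seq, num_chunks):
    # Mirror torch.Tensor.chunk on dim 0: ceil-sized pieces, so the
    # last chunk may be shorter (48 items over 8 devices -> 8 x 6).
    size = math.ceil(len(seq) / num_chunks)
    return [tuple(seq[i:i + size]) for i in range(0, len(seq), size)]


class SequenceScatterDataParallel(DataParallel):
    # DataParallel replicates non-tensor arguments to every replica;
    # this override additionally chunks plain tuples/lists so each
    # replica sees only its slice of the batch.
    def scatter(self, inputs, kwargs, device_ids):
        per_arg = []
        for obj in inputs:
            if isinstance(obj, (tuple, list)):
                per_arg.append(chunk_sequence(obj, len(device_ids)))
            else:
                # Stock scatter: splits tensors along self.dim and
                # copies each chunk to its device.
                per_arg.append(scatter(obj, device_ids, dim=self.dim))
        # Re-zip so each device receives one tuple of arguments.
        scattered_inputs = list(zip(*per_arg))
        # Keyword arguments are simply replicated here for brevity.
        scattered_kwargs = [kwargs] * len(scattered_inputs)
        return scattered_inputs, scattered_kwargs
```

With this, something like model = SequenceScatterDataParallel(model, device_ids=list(range(8))) followed by model(features, labels, ids, names) should hand each replica tensors with batch_size 6 plus a 6-element tuple of strings, instead of the full 48-element tuple.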