Split batch unevenly across multiple GPUs


Is there a way to not split batches evenly between GPUs?

Let’s give an example. Imagine we have a dataset where each sample has one document with one or more query vectors and the position of the related answer.


(Doc1, (q11, p11), (q12, p12)),
(Doc2, (q21, p21), (q22, p22), (q23, p23), (q24, p24))
(Doc3, (q31, p31))
(Doc4, (q41, p41))

Our batches could be:

batch = {
    'doc':     [doc1_tokens, doc2_tokens, doc3_tokens, doc4_tokens],
    'query':   [q11, q12, q21, q22, q23, q24, q31, q41],
    'y_pos':   [p11, p12, p21, p22, p23, p24, p31, p41],
}

repetition_vector = [2, 4, 1, 1]

For efficiency, a first module could compute the representation of each document (an RNN), and a second one could repeat each embedding and combine it with its queries.

doc_embds = model1(batch['doc'])
# repeats must be an int or a Tensor, not a Python list
doc_embds = torch.repeat_interleave(
    doc_embds, torch.tensor(repetition_vector), dim=1)

y_hat_pos = model2(doc_embds, batch['query'])

loss = criterion(y_hat_pos, batch['y_pos'])
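As a side note on the semantics assumed above: torch.repeat_interleave with a repetition vector repeats each entry along the chosen dimension by its own count. A pure-Python sketch of that behaviour over the batch entries (the helper name is hypothetical):

```python
def repeat_interleave(items, repeats):
    """Repeat each items[i] repeats[i] times, preserving order."""
    out = []
    for item, count in zip(items, repeats):
        out.extend([item] * count)
    return out

doc_embds = ["d1", "d2", "d3", "d4"]
repetition_vector = [2, 4, 1, 1]
expanded = repeat_interleave(doc_embds, repetition_vector)
# expanded == ["d1", "d1", "d2", "d2", "d2", "d2", "d3", "d4"]
```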

Let’s imagine we have 2 GPUs. doc_embds will be split evenly into [doc1, doc2] and [doc3, doc4]; after repetition this gives [doc1, doc1, doc2, doc2, doc2, doc2] on the first GPU and [doc3, doc4] on the second, while the queries will be split into [q11, q12, q21, q22] and [q23, q24, q31, q41], and y_pos into [0, 6, 3, 7] and [4, 5, 5, 9].

In this case, the tensors will not match on each GPU, as the query vectors have been split evenly (which shouldn’t be the case).

I don’t know what solution is available for this problem:

  • Write a sub-class of DataParallel for this?
  • Should I reorganize my batches in another way, e.g. using padding (I would like to avoid this approach as it consumes GPU RAM)?
  • Forget the repeat_interleave and redo the computation for each doc_embds (heavy computation).
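On the first option, one possible direction is a scatter that splits the doc list evenly across devices and then slices the query list according to the cumulative repetition counts, so each GPU receives exactly the queries belonging to its documents. A pure-Python sketch of that splitting logic (all names hypothetical; hooking this into DataParallel's scatter is not shown):

```python
from itertools import accumulate

def scatter_uneven(docs, queries, nb_q, num_devices):
    """Split docs into num_devices contiguous chunks, then give each
    chunk the queries that belong to its documents."""
    per_dev = (len(docs) + num_devices - 1) // num_devices  # ceil division
    offsets = [0] + list(accumulate(nb_q))  # query start index per doc
    shards = []
    for dev in range(num_devices):
        lo, hi = dev * per_dev, min((dev + 1) * per_dev, len(docs))
        q_lo, q_hi = offsets[lo], offsets[hi]
        shards.append({
            'doc': docs[lo:hi],
            'query': queries[q_lo:q_hi],
            'nb_q': nb_q[lo:hi],
        })
    return shards

shards = scatter_uneven(
    ['d1', 'd2', 'd3', 'd4'],
    ['q11', 'q12', 'q21', 'q22', 'q23', 'q24', 'q31', 'q41'],
    [2, 4, 1, 1],
    num_devices=2,
)
# shards[0] holds d1, d2 with their 6 queries; shards[1] holds d3, d4 with 2
```

Note this splits the number of *documents* evenly; the per-GPU query load stays unbalanced (6 vs. 2 here), which is a separate load-balancing question.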

The gain from multi-GPU would be quite significant for this use case.

Thank you for your help!

I’m not sure I understand your use case completely, but this description sounds more like model parallel / sharding, where the model itself is split to multiple devices, not like data parallel.

I’m working with documents where, given a query, I have to find the position of the “answer”. The number of queries varies across documents.

I have a document encoder and a query encoder. Normally, we could have

batch = {
    'doc':     [d1,  d1,  d2,  d2,  d2,  d2,  d3,  d4],
    'query':   [q11, q12, q21, q22, q23, q24, q31, q41],
    'y_pos':   [p11, p12, p21, p22, p23, p24, p31, p41],
}

As computing document embedding takes a while with RNN, I thought of doing:

batch = {
    'doc':     [d1,       d2,                 d3,  d4],
    'query':   [q11, q12, q21, q22, q23, q24, q31, q41],
    'y_pos':   [p11, p12, p21, p22, p23, p24, p31, p41],
    'nb_q':    [2,        4,                  1,   1  ],
}

So one model could compute the document embeddings with only 4 computations (d1, d2, d3, d4) and then use torch.repeat_interleave, instead of having 8 computations (2×d1, 4×d2, 1×d3, 1×d4).
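The saving can be made concrete with a toy count of encoder calls (pure Python; `encode` is just a stand-in for the RNN document encoder):

```python
# "encode" stands in for the RNN document encoder; counter tallies calls.
def encode(doc, counter):
    counter[0] += 1
    return f"emb({doc})"

# Layout 1: one row per (doc, query) pair -> one encoder call per row.
calls_flat = [0]
flat_docs = ['d1', 'd1', 'd2', 'd2', 'd2', 'd2', 'd3', 'd4']
flat_embds = [encode(d, calls_flat) for d in flat_docs]

# Layout 2: encode each unique doc once, then repeat the embeddings.
calls_unique = [0]
unique_docs = ['d1', 'd2', 'd3', 'd4']
nb_q = [2, 4, 1, 1]
embds = [encode(d, calls_unique) for d in unique_docs]
expanded = [e for e, n in zip(embds, nb_q) for _ in range(n)]

# calls_flat[0] == 8, calls_unique[0] == 4, and expanded == flat_embds
```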

Is that clearer?

Regarding your answer: I could split the model across 2 GPUs, but the models aren’t big; both fit within a single GPU. Would the gain be worth it?