Hi,
I was going through the tensor parallel examples. After spending a few hours, I still can't get my head around it.
Let's assume that we have the following class and 2 GPUs.
How can we use ColwiseParallel tensor parallelism to store the first half of the entity_embeddings and relation_embeddings on the first GPU and the other halves on the second GPU?
The following RuntimeError occurs consistently:
RuntimeError: Function EmbeddingBackward0 returned an invalid gradient at index 0 - got [135, 32] but expected shape compatible with [46, 32]
The goal is to apply columnwise TP so that the first 16 columns of entity_embeddings and relation_embeddings are kept on the first GPU and the last 16 columns are on the second GPU.
In your code, I didn't see TP being applied. You need to call parallelize_module explicitly with a parallelization plan (e.g. {"tok_embeddings": RowwiseParallel()}) before using the module. Note that for nn.Embedding, RowwiseParallel shards the vocabulary dimension (dim 0), while ColwiseParallel shards the embedding dimension (dim 1) — the column split you describe.