Tensor parallel spawns additional processes on GPU0 and uses additional memory

I think this is the same question as the one asked here: Question about tensor parallel (DTensor, parallelize_module)