Distributed training creates multiple processes on GPU 0

Found the bug. We need to make sure the correct GPU context is set before calling `empty_cache()`; otherwise each worker process initializes a CUDA context on GPU 0, and a fixed chunk of memory stays allocated there for every other GPU's process. Relevant issue here.
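For reference, here is a minimal sketch of the workaround, assuming a torchrun-launched DDP setup (the `setup_and_clear` name and the use of `LOCAL_RANK` are illustrative, not from the original post):

```python
import os

import torch
import torch.distributed as dist


def setup_and_clear(local_rank: int) -> None:
    # Bind this process to its own GPU *before* any CUDA call.
    # Without this, empty_cache() (or any other CUDA op) initializes a
    # context on GPU 0 in every process, leaving a fixed memory block there.
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")

    # The cache is now cleared on this process's own device, not on GPU 0.
    torch.cuda.empty_cache()

    # Alternatively, scope a single call to a specific device:
    with torch.cuda.device(local_rank):
        torch.cuda.empty_cache()


if __name__ == "__main__":
    # torchrun sets LOCAL_RANK for each spawned process.
    setup_and_clear(int(os.environ["LOCAL_RANK"]))
```

The key point is simply that `torch.cuda.set_device(local_rank)` (or a `torch.cuda.device(...)` context) runs before any call that touches CUDA.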
