Two questions:
- Did you divide the epoch size on each process by world_size?
- Will there be any contention on the data loader?
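To make the first question concrete, here is a minimal sketch of what I mean by dividing the epoch size. The helper name `per_process_epoch_size` and the numbers are hypothetical, not taken from your script; it assumes each process should handle roughly `epoch_size / world_size` samples, with any remainder going to the lowest ranks.

```python
def per_process_epoch_size(epoch_size: int, world_size: int, rank: int) -> int:
    """Return how many samples process `rank` handles in one epoch.

    Hypothetical helper for illustration: splits `epoch_size` samples
    across `world_size` processes as evenly as possible.
    """
    if world_size <= 0:
        raise ValueError("world_size must be positive")
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    base, remainder = divmod(epoch_size, world_size)
    # The first `remainder` ranks take one extra sample each.
    return base + (1 if rank < remainder else 0)


# Example with assumed values: 10,000 samples over 4 processes.
sizes = [per_process_epoch_size(10_000, 4, rank) for rank in range(4)]
```

If every process instead iterates over the full epoch, the total work per epoch grows with the number of processes, which can hide any speedup from adding more of them.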
cc @osalpekar
Also cc @zhangguanheng66 for transformer questions