Using torch.compile() with DDP

JamesDickens · January 10, 2023, 10:11pm

I’ve been trying out the new torch.compile() feature on a vision transformer, and it’s working pretty well on one GPU. But when I use DistributedDataParallel (DDP), I get a pickling error as soon as I call mp.spawn with my training loop process.

I get the error:
AttributeError: Can't pickle local object 'convert_frame.<locals>._convert_frame'
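
For context, this kind of error can be reproduced without PyTorch at all: pickle refuses to serialize a function defined inside another function. Below is a minimal sketch, assuming the failure comes from mp.spawn pickling its arguments and one of them holding a reference to such a nested function. The names make_callback, _callback, module_level and try_pickle are made up for illustration and are not part of torch.

```python
import pickle


def make_callback():
    # Hypothetical stand-in for a factory that returns a nested function,
    # similar in shape to the name in the error message above.
    def _callback(x):
        return x + 1

    return _callback


def module_level(x):
    # A plain module-level function, for comparison.
    return x + 1


def try_pickle(obj):
    """Return None if obj pickles cleanly, else the exception that was raised."""
    try:
        pickle.dumps(obj)
    except (AttributeError, pickle.PicklingError) as exc:
        return exc
    return None


if __name__ == "__main__":
    # Module-level functions are pickled by reference to their qualified name.
    print("module-level:", try_pickle(module_level))
    # Nested functions have "<locals>" in their qualified name and cannot be
    # looked up again on import, so pickling fails.
    print("nested:", try_pickle(make_callback()))
```

If that assumption holds, the object passed to mp.spawn is what carries the unpicklable reference, which may help narrow down where the compiled model gets handed across processes.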

Are there any resources or tutorials on using the new compile feature with DDP training?