Using a non-forward function to get the loss in DDP

Hi, I wonder if I can use a non-forward function to get the output and loss when the model is wrapped in DistributedDataParallel. In that case, does gradient synchronization still work?
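
For concreteness, here is a minimal sketch of the pattern I have in mind (`ToyModel` and `compute_loss` are just made-up names for illustration, not code from a real project):

```python
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(10, 1)

    def forward(self, x):
        return self.net(x)

    # the "non-forward" function I mean: it computes the output and loss directly
    def compute_loss(self, x, target):
        return nn.functional.mse_loss(self.net(x), target)

# inside the usual DDP training setup (process group already initialized):
# ddp_model = DDP(ToyModel().to(rank), device_ids=[rank])
# loss = ddp_model.module.compute_loss(x, target)  # not ddp_model(x)
# loss.backward()
# optimizer.step()
```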

I wouldn’t rely on it.
What’s your use case that requires calling a different method instead of forward?
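
For background: DDP prepares gradient synchronization during its own forward call, so if you call a custom method on `model.module` directly, that preparation never happens and the gradients won't be all-reduced in the backward pass. If the goal is just to get the loss back from the model, one common workaround is to keep forward as the single entry point and branch inside it. A rough sketch with a toy model (names are illustrative):

```python
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(10, 1)

    # single entry point: the call goes through DDP, so gradients still sync
    def forward(self, x, target=None):
        out = self.net(x)
        if target is None:
            return out                                   # inference path
        return nn.functional.mse_loss(out, target)       # training path returns the loss

# inside the usual DDP setup (process group already initialized):
# ddp_model = DDP(ToyModel().to(rank), device_ids=[rank])
# loss = ddp_model(x, target)  # goes through DDP's forward, hooks fire
# loss.backward()              # gradients are all-reduced as expected
```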