Using a non-forward function to get the loss in DDP

Hi, I wonder if I can use a non-forward function to get the output and loss when the model is wrapped in DistributedDataParallel. In that case, does gradient synchronization still work?
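
For concreteness, here is a minimal sketch of the pattern I have in mind (`ToyModel` and `compute_loss` are just made-up names for illustration, not code from a real project):

```python
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(10, 1)

    def forward(self, x):
        return self.net(x)

    # the "non-forward" function I mean: it computes the output and loss directly
    def compute_loss(self, x, target):
        return nn.functional.mse_loss(self.net(x), target)

# inside the usual DDP training setup (process group already initialized):
# ddp_model = DDP(ToyModel().to(rank), device_ids=[rank])
# loss = ddp_model.module.compute_loss(x, target)  # not ddp_model(x)
# loss.backward()
# optimizer.step()
```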

I wouldn’t rely on it.
What’s your use case that requires calling a different method instead of forward?
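
For background: DDP prepares gradient synchronization during its own forward call, so if you call a custom method on `model.module` directly, that preparation never happens and the gradients won't be all-reduced in the backward pass. If the goal is just to get the loss back from the model, one common workaround is to keep forward as the single entry point and branch inside it. A rough sketch with a toy model (names are illustrative):

```python
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(10, 1)

    # single entry point: the call goes through DDP, so gradients still sync
    def forward(self, x, target=None):
        out = self.net(x)
        if target is None:
            return out                                   # inference path
        return nn.functional.mse_loss(out, target)       # training path returns the loss

# inside the usual DDP setup (process group already initialized):
# ddp_model = DDP(ToyModel().to(rank), device_ids=[rank])
# loss = ddp_model(x, target)  # goes through DDP's forward, hooks fire
# loss.backward()              # gradients are all-reduced as expected
```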