PiPPy: I can't see the backward pass

Hi!

I’m using the PiPPy package and experimented with the following code.

I only changed the dataset from CIFAR-10 to ImageNet and the world size from 5 to 4 (adjusting the annotate_split_points block accordingly), because I only have 4 GPUs.
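For reference, the adjusted split annotation looks roughly like the sketch below. The layer names are placeholders, not the example's real ones, and it assumes the pippy.IR API with one rank dedicated to the driver, which leaves three stages for four ranks:

    # Sketch of an annotate_split_points block for world size 4
    # (placeholder layer names; API assumed from pippy.IR).
    from pippy.IR import annotate_split_points, PipeSplitWrapper

    annotate_split_points(wrapper, {
        'module.layer2': PipeSplitWrapper.SplitPoint.BEGINNING,
        'module.layer3': PipeSplitWrapper.SplitPoint.BEGINNING,
    })
    # Two split points -> three pipeline stages, matching the
    # submod_0..submod_2 in the trace below.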
But I can’t see the backward pass.
This is the result of print(pipe):

    def forward(self, input, target):
        input_1 = input
        submod_0 = self.submod_0(input_1);  input_1 = None
        submod_1 = self.submod_1(submod_0);  submod_0 = None
        submod_2 = self.submod_2(submod_1, target);  submod_1 = target = None
        getitem = submod_2[0]
        getitem_1 = submod_2[1];  submod_2 = None
        sync_barrier = pippy_backward_sync_barrier((getitem, getitem_1), [], None);  getitem = getitem_1 = None
        return sync_barrier
    

There’s no stage_backward, and there’s only a forward-pass stage in TensorBoard, as you can see…

How can I get the backward stage?
Thank you.

cc: @kwen2501 on PiPPy example question.

Edited: this reply was my mistake.

I changed the LossWrapper block to:


    class OutputLossWrapper(LossWrapper):
        def forward(self, x, target):
            # Return only the loss, so the traced pipe ends in a single
            # value that PiPPy can generate the backward stages from.
            return self.loss_fn(self.module(x), target)

    wrapper = OutputLossWrapper(model, cross_entropy)

The original version from the GitHub example I mentioned is:

    class OutputLossWrapper(LossWrapper):
        def __init__(self, module, loss_fn):
            super().__init__(module, loss_fn)

        def forward(self, input, target):
            # Returns both the model output and the loss as a tuple.
            output = self.module(input)
            return output, self.loss_fn(output, target)

    wrapper = OutputLossWrapper(model, cross_entropy)
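As an aside, Pipe.from_tracing may take an output_loss_value_spec argument that marks which element of a tuple output is the loss (an assumption about the PiPPy version); with the two-output wrapper above, the call would look roughly like this sketch:

    # Hypothetical sketch: mark the second element of (output, loss) as the
    # loss value that PiPPy should generate the backward pass from.
    pipe = Pipe.from_tracing(wrapper, output_loss_value_spec=(False, True))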

Now I get stage_backward calls, like this:

    def forward(self, x, target):
        submod_0 = self.submod_0(x)
        submod_1 = self.submod_1(submod_0)
        submod_2 = self.submod_2(submod_1, target)
        stage_backward = pippy_backward_stage_backward(stage_output = (submod_2,), output_grads = (None,), input_values = [submod_1, target], outputs_with_grads_idxs = [0], stage_info = 'stage_backward for stage %submod_2 : [#users=2] = call_module[target=submod_2](args = (%submod_1, %target), kwargs = {})');  target = None
        getitem = stage_backward[0]
        getitem_1 = stage_backward[1];  stage_backward = None
        getitem_2 = getitem[0]
        getitem_3 = getitem[1];  getitem = None
        stage_backward_1 = pippy_backward_stage_backward(stage_output = (submod_1,), output_grads = (getitem_2,), input_values = [submod_0], outputs_with_grads_idxs = [0], stage_info = 'stage_backward_1 for stage %submod_1 : [#users=3] = call_module[target=submod_1](args = (%submod_0,), kwargs = {})');  submod_1 = getitem_2 = None
        getitem_4 = stage_backward_1[0]
        getitem_5 = stage_backward_1[1];  stage_backward_1 = None
        getitem_6 = getitem_4[0];  getitem_4 = None
        stage_backward_2 = pippy_backward_stage_backward(stage_output = (submod_0,), output_grads = (getitem_6,), input_values = [x], outputs_with_grads_idxs = [0], stage_info = 'stage_backward_2 for stage %submod_0 : [#users=3] = call_module[target=submod_0](args = (%x,), kwargs = {})');  submod_0 = getitem_6 = x = None
        getitem_7 = stage_backward_2[0]
        getitem_8 = stage_backward_2[1];  stage_backward_2 = None
        getitem_9 = getitem_7[0]
        sync_barrier = pippy_backward_sync_barrier(submod_2, [getitem_1, getitem_5, getitem_8], getitem_7);  submod_2 = getitem_1 = getitem_5 = getitem_8 = getitem_7 = None
        return sync_barrier
 

But it doesn’t seem to train well. I printed the loss value (= pipe_driver(x, target)), and it is greater than 1.
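For context, the training step I would expect with this setup looks roughly like the sketch below (instantiate_optimizer follows the PiPPy examples and may differ across versions; with the modified wrapper, the driver call returns the loss directly):

    import torch

    # Minimal sketch of one training step; forward and backward both run
    # inside the pipe when the driver is called.
    optimizer = pipe_driver.instantiate_optimizer(torch.optim.SGD, lr=0.01)
    for x, target in train_loader:
        optimizer.zero_grad()
        loss = pipe_driver(x, target)
        optimizer.step()
        print(loss.item())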

Why doesn’t the original OutputLossWrapper produce a backward stage?

Hi @nomaue, thanks for reporting this issue, and glad you found a workaround.
We will investigate why returning two outputs stops PiPPy from generating the backward pass and report back here.


Thank you @kwen2501!
I have one more question.
I also experimented with the following code, but it doesn’t seem to work for pipelining.

When I set:

    DIMS = [28 * 28, 300, 100, 10]
    DP_LAYERS = 2
    PP_LAYERS = 1
    # nnode=1, nproc_per_node=2

It works well with data parallelism, but when I set it as below:

    DIMS = [28 * 28, 300, 100, 10]
    DP_LAYERS = 1
    PP_LAYERS = 2
    # nnode=1, nproc_per_node=2

or

    DIMS = [28 * 28, 500, 250, 100, 50, 25, 10]
    DP_LAYERS = 2
    PP_LAYERS = 4
    # nnode=2, nproc_per_node=4

It doesn’t work at all. Could this problem be related to the issue I raised at first, or did I set something incorrectly? If I need to post a new topic for this, I’ll do that.
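For reference, a quick sanity check that rules out a plain process-grid mismatch (a sketch; it assumes a torchrun-style launcher that sets the WORLD_SIZE environment variable):

    import os

    # The launched ranks must cover exactly a DP_LAYERS x PP_LAYERS grid.
    world_size = int(os.environ["WORLD_SIZE"])
    assert world_size == DP_LAYERS * PP_LAYERS, (
        f"world size {world_size} != DP {DP_LAYERS} x PP {PP_LAYERS}"
    )

All three configurations above pass this check (2 = 2 x 1, 2 = 1 x 2, 8 = 2 x 4), so the grid sizes themselves look consistent.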
Thank you.