PiPPy: I can't see the backward pass

Hi!

I'm using the PiPPy package and experimented with the following code.

https://github.com/pytorch/tau/blob/main/examples/resnet/pippy_resnet.py

I only changed the dataset from CIFAR-10 to ImageNet, and the world size from 5 to 4 (and the annotate_split_points block accordingly), because I only have 4 GPUs.
But I can't see the backward pass.
This is the result of print(pipe):

def forward(self, input, target):
    input_1 = input
    submod_0 = self.submod_0(input_1);  input_1 = None
    submod_1 = self.submod_1(submod_0);  submod_0 = None
    submod_2 = self.submod_2(submod_1, target);  submod_1 = target = None
    getitem = submod_2[0]
    getitem_1 = submod_2[1];  submod_2 = None
    sync_barrier = pippy_backward_sync_barrier((getitem, getitem_1), [], None);  getitem = getitem_1 = None
    return sync_barrier
    

There's no stage_backward.
And there is only a forward-pass stage in TensorBoard, as you can see…

How can I get the backward stage?
Thank you.

cc: @kwen2501 on PiPPy example question.

Edited: this reply was my mistake.

I changed the LossWrapper block to:


    class OutputLossWrapper(LossWrapper):
        def forward(self, x, target):
            return self.loss_fn(self.module(x), target) 

    wrapper = OutputLossWrapper(model, cross_entropy)

The original version from the GitHub example I mentioned is:

    class OutputLossWrapper(LossWrapper):
        def __init__(self, module, loss_fn):
            super().__init__(module, loss_fn)

        def forward(self, input, target):
            output = self.module(input)
            return output, self.loss_fn(output, target)

    wrapper = OutputLossWrapper(model, cross_entropy)

Now I get stage_backward like this:

def forward(self, x, target):
    submod_0 = self.submod_0(x)
    submod_1 = self.submod_1(submod_0)
    submod_2 = self.submod_2(submod_1, target)
    stage_backward = pippy_backward_stage_backward(stage_output = (submod_2,), output_grads = (None,), input_values = [submod_1, target], outputs_with_grads_idxs = [0], stage_info = 'stage_backward for stage %submod_2 : [#users=2] = call_module[target=submod_2](args = (%submod_1, %target), kwargs = {})');  target = None
    getitem = stage_backward[0]
    getitem_1 = stage_backward[1];  stage_backward = None
    getitem_2 = getitem[0]
    getitem_3 = getitem[1];  getitem = None
    stage_backward_1 = pippy_backward_stage_backward(stage_output = (submod_1,), output_grads = (getitem_2,), input_values = [submod_0], outputs_with_grads_idxs = [0], stage_info = 'stage_backward_1 for stage %submod_1 : [#users=3] = call_module[target=submod_1](args = (%submod_0,), kwargs = {})');  submod_1 = getitem_2 = None
    getitem_4 = stage_backward_1[0]
    getitem_5 = stage_backward_1[1];  stage_backward_1 = None
    getitem_6 = getitem_4[0];  getitem_4 = None
    stage_backward_2 = pippy_backward_stage_backward(stage_output = (submod_0,), output_grads = (getitem_6,), input_values = [x], outputs_with_grads_idxs = [0], stage_info = 'stage_backward_2 for stage %submod_0 : [#users=3] = call_module[target=submod_0](args = (%x,), kwargs = {})');  submod_0 = getitem_6 = x = None
    getitem_7 = stage_backward_2[0]
    getitem_8 = stage_backward_2[1];  stage_backward_2 = None
    getitem_9 = getitem_7[0]
    sync_barrier = pippy_backward_sync_barrier(submod_2, [getitem_1, getitem_5, getitem_8], getitem_7);  submod_2 = getitem_1 = getitem_5 = getitem_8 = getitem_7 = None
    return sync_barrier
 

But it doesn't seem to train well.
I printed the loss value (= pipe_driver(x, target)), and it is greater than 1.
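
For reference, my training step looks roughly like this (a simplified sketch; the pipe and driver are built as in the ResNet example, and optimizer covers the stage parameters):

    # Simplified sketch of my training step; pipe_driver is the pipeline
    # driver built from the traced model as in the ResNet example, and
    # optimizer is the optimizer attached to the stage parameters.
    optimizer.zero_grad()
    loss = pipe_driver(x, target)  # should run forward + backward on every stage
    optimizer.step()
    print(loss)  # stays greater than 1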

Why doesn't the original OutputLossWrapper produce a backward stage?

Hi @nomaue, thanks for reporting this issue, and glad you found a workaround.
We will investigate why returning two outputs stops PiPPy from generating the backward pass and report back here.
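
In the meantime, with the original two-output wrapper, one thing you could try is telling PiPPy explicitly which output is the loss when tracing. Roughly like the sketch below (the output_loss_value_spec keyword is from memory, so please double-check it against the PiPPy version you have installed):

    # Untested sketch: mark the loss position in the (output, loss) tuple so
    # PiPPy knows which output to drive the backward pass from.
    # The keyword name is from memory; verify against Pipe.from_tracing in
    # your installed PiPPy version.
    from pippy.IR import Pipe

    wrapper = OutputLossWrapper(model, cross_entropy)
    pipe = Pipe.from_tracing(
        wrapper,
        output_loss_value_spec=(False, True),  # loss is the second element
    )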


Thank you @kwen2501!
I have one more question.
I also experimented with the following code, but it doesn't seem to work for pipelining.

When I set:

DIMS = [28 * 28, 300, 100, 10]
DP_LAYERS = 2
PP_LAYERS = 1
#nnode=1, nproc_per_node=2

it works well for DataParallel.
But when I set it as below,

DIMS = [28 * 28, 300, 100, 10]
DP_LAYERS = 1
PP_LAYERS = 2
#nnode=1, nproc_per_node=2

or

DIMS = [28 * 28, 500, 250, 100, 50, 25, 10]
DP_LAYERS = 2
PP_LAYERS = 4
#nnode=2, nproc_per_node=4

it doesn't work at all. Could this problem be related to the issue I raised at first, or did I set something incorrectly? If I need to open a new topic for this, I'll do that.
Thank you.

Hi @nomaue, I just came back from PTO.
Regarding the issue of PiPPy failing to generate the backward pass, I submitted a fix here:

Thanks for reporting this issue!

The issue in the ddp2pipe example is a separate one.
The model in that example contains two parts: a traditional DDP part followed by a pipeline, and a special connection is built to join those two parts.
It is not very stable for now, as it only works for the parameters specified in the example. It may also only work on CPU, due to a hang in GPU mode.

Many thanks for your reply, @kwen2501. I'll try running the code.

Hi @kwen2501. So, is "PP + DP" unavailable so far on both single-node and multi-node, for any CNN model as well as transformers?

Hi,
PP + DP is available today through PiPPy’s init_data_parallel API.

Documentation:

Example:
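
At a high level, you build the pipeline driver as usual and then call init_data_parallel on it so that equivalent stages across replicas are grouped for gradient synchronization. A rough sketch is below (the dp_group_size argument is illustrative; please follow the linked documentation and example for the exact signature):

    # Illustrative sketch only; the dp_group_size argument is an assumption.
    # See the linked documentation/example for the exact init_data_parallel
    # signature in your PiPPy version.
    pipe_driver = ...  # build the pipeline driver for this rank as usual
    pipe_driver.init_data_parallel(dp_group_size=2)  # wire up DP across pipeline replicas

    loss = pipe_driver(x, target)  # gradients are all-reduced within each DP group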