PiPPy: I can't see the backward pass

Hi!

I’m using the PiPPy package and experimented with the following code.

I only changed the dataset from CIFAR-10 to ImageNet and the world size from 5 to 4 (adjusting the annotate_split_points block accordingly), because I only have 4 GPUs.
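For reference, the adjusted split annotation looks roughly like the sketch below. The layer names are placeholders, not the example's real ones, and it assumes the pippy.IR API with one rank dedicated to the driver, which leaves three stages for four ranks:

    # Sketch of an annotate_split_points block for world size 4
    # (placeholder layer names; API assumed from pippy.IR).
    from pippy.IR import annotate_split_points, PipeSplitWrapper

    annotate_split_points(wrapper, {
        'module.layer2': PipeSplitWrapper.SplitPoint.BEGINNING,
        'module.layer3': PipeSplitWrapper.SplitPoint.BEGINNING,
    })
    # Two split points -> three pipeline stages, matching the
    # submod_0..submod_2 in the trace below.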
But I can’t see the backward pass.
This is the result of print(pipe):

    def forward(self, input, target):
        input_1 = input
        submod_0 = self.submod_0(input_1);  input_1 = None
        submod_1 = self.submod_1(submod_0);  submod_0 = None
        submod_2 = self.submod_2(submod_1, target);  submod_1 = target = None
        getitem = submod_2[0]
        getitem_1 = submod_2[1];  submod_2 = None
        sync_barrier = pippy_backward_sync_barrier((getitem, getitem_1), [], None);  getitem = getitem_1 = None
        return sync_barrier
    

There’s no stage_backward, and there’s only a forward-pass stage in TensorBoard, as you can see…

How can I get the backward stage?
Thank you.

cc: @kwen2501 on PiPPy example question.

Edited: this reply was my mistake.

I changed the LossWrapper block to:


    class OutputLossWrapper(LossWrapper):
        def forward(self, x, target):
            # Return only the loss, so the traced pipe ends in a single
            # value that PiPPy can generate the backward stages from.
            return self.loss_fn(self.module(x), target)

    wrapper = OutputLossWrapper(model, cross_entropy)

The original version from the GitHub example I mentioned is:

    class OutputLossWrapper(LossWrapper):
        def __init__(self, module, loss_fn):
            super().__init__(module, loss_fn)

        def forward(self, input, target):
            # Returns both the model output and the loss as a tuple.
            output = self.module(input)
            return output, self.loss_fn(output, target)

    wrapper = OutputLossWrapper(model, cross_entropy)
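As an aside, Pipe.from_tracing may take an output_loss_value_spec argument that marks which element of a tuple output is the loss (an assumption about the PiPPy version); with the two-output wrapper above, the call would look roughly like this sketch:

    # Hypothetical sketch: mark the second element of (output, loss) as the
    # loss value that PiPPy should generate the backward pass from.
    pipe = Pipe.from_tracing(wrapper, output_loss_value_spec=(False, True))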

Now I get stage_backward calls, like this:

    def forward(self, x, target):
        submod_0 = self.submod_0(x)
        submod_1 = self.submod_1(submod_0)
        submod_2 = self.submod_2(submod_1, target)
        stage_backward = pippy_backward_stage_backward(stage_output = (submod_2,), output_grads = (None,), input_values = [submod_1, target], outputs_with_grads_idxs = [0], stage_info = 'stage_backward for stage %submod_2 : [#users=2] = call_module[target=submod_2](args = (%submod_1, %target), kwargs = {})');  target = None
        getitem = stage_backward[0]
        getitem_1 = stage_backward[1];  stage_backward = None
        getitem_2 = getitem[0]
        getitem_3 = getitem[1];  getitem = None
        stage_backward_1 = pippy_backward_stage_backward(stage_output = (submod_1,), output_grads = (getitem_2,), input_values = [submod_0], outputs_with_grads_idxs = [0], stage_info = 'stage_backward_1 for stage %submod_1 : [#users=3] = call_module[target=submod_1](args = (%submod_0,), kwargs = {})');  submod_1 = getitem_2 = None
        getitem_4 = stage_backward_1[0]
        getitem_5 = stage_backward_1[1];  stage_backward_1 = None
        getitem_6 = getitem_4[0];  getitem_4 = None
        stage_backward_2 = pippy_backward_stage_backward(stage_output = (submod_0,), output_grads = (getitem_6,), input_values = [x], outputs_with_grads_idxs = [0], stage_info = 'stage_backward_2 for stage %submod_0 : [#users=3] = call_module[target=submod_0](args = (%x,), kwargs = {})');  submod_0 = getitem_6 = x = None
        getitem_7 = stage_backward_2[0]
        getitem_8 = stage_backward_2[1];  stage_backward_2 = None
        getitem_9 = getitem_7[0]
        sync_barrier = pippy_backward_sync_barrier(submod_2, [getitem_1, getitem_5, getitem_8], getitem_7);  submod_2 = getitem_1 = getitem_5 = getitem_8 = getitem_7 = None
        return sync_barrier
 

But it doesn’t seem to train well. I printed the loss value (= pipe_driver(x, target)), and it is greater than 1.
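For context, the training step I would expect with this setup looks roughly like the sketch below (instantiate_optimizer follows the PiPPy examples and may differ across versions; with the modified wrapper, the driver call returns the loss directly):

    import torch

    # Minimal sketch of one training step; forward and backward both run
    # inside the pipe when the driver is called.
    optimizer = pipe_driver.instantiate_optimizer(torch.optim.SGD, lr=0.01)
    for x, target in train_loader:
        optimizer.zero_grad()
        loss = pipe_driver(x, target)
        optimizer.step()
        print(loss.item())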

Why doesn’t the original OutputLossWrapper produce a backward stage?

Hi @nomaue, thanks for reporting this issue, and glad you found a workaround.
We will investigate why returning two outputs stops PiPPy from generating the backward pass and report back here.


Thank you @kwen2501!
I have one more question.
I also experimented with the following code, but it doesn’t seem to work for pipelining.

When I set:

    DIMS = [28 * 28, 300, 100, 10]
    DP_LAYERS = 2
    PP_LAYERS = 1
    # nnode=1, nproc_per_node=2

It works well with data parallelism, but when I set it as below:

    DIMS = [28 * 28, 300, 100, 10]
    DP_LAYERS = 1
    PP_LAYERS = 2
    # nnode=1, nproc_per_node=2

or

    DIMS = [28 * 28, 500, 250, 100, 50, 25, 10]
    DP_LAYERS = 2
    PP_LAYERS = 4
    # nnode=2, nproc_per_node=4

It doesn’t work at all. Could this problem be related to the issue I raised at first, or did I set something incorrectly? If I need to post a new topic for this, I’ll do that.
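For reference, a quick sanity check that rules out a plain process-grid mismatch (a sketch; it assumes a torchrun-style launcher that sets the WORLD_SIZE environment variable):

    import os

    # The launched ranks must cover exactly a DP_LAYERS x PP_LAYERS grid.
    world_size = int(os.environ["WORLD_SIZE"])
    assert world_size == DP_LAYERS * PP_LAYERS, (
        f"world size {world_size} != DP {DP_LAYERS} x PP {PP_LAYERS}"
    )

All three configurations above pass this check (2 = 2 x 1, 2 = 1 x 2, 8 = 2 x 4), so the grid sizes themselves look consistent.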
Thank you.