nomaue
January 25, 2023, 2:00am
Thank you @kwen2501!
I have one more question.
I also experimented with the following code, but it does not seem to work for pipelining.
# Imports for the combined DDP + PiPPy (pipeline parallel) experiment.
import argparse
# import logging
import os
import socket
import torch
import torch.distributed
import torch.distributed.rpc as rpc
import torch.multiprocessing as mp
import torch.nn.functional as F
from torch import nn, optim
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DistributedSampler
from torchvision import datasets
from torchvision.transforms import transforms
from tqdm import tqdm
from pippy import Pipe, PipelineDriverFillDrain, annotate_split_points, PipeSplitWrapper
from pippy.microbatch import TensorChunkSpec, CustomReducer
from pippy.utils import tp_transports
(file truncated; only the import section is shown above)
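In case it is useful, this is my understanding of how those PiPPy imports fit together (a minimal sketch of the tracing frontend; the toy MLP and the split-point key '2' below are illustrations, not the truncated script itself):

from torch import nn
from pippy import Pipe, annotate_split_points, PipeSplitWrapper

DIMS = [28 * 28, 300, 100, 10]

# A plain MLP; nn.Sequential names its submodules '0', '1', '2', ...
model = nn.Sequential(
    nn.Linear(DIMS[0], DIMS[1]), nn.ReLU(),
    nn.Linear(DIMS[1], DIMS[2]), nn.ReLU(),
    nn.Linear(DIMS[2], DIMS[3]),
)

# Mark one split so the model is cut into two stages (PP_LAYERS = 2);
# the key must be the qualified name of a submodule.
annotate_split_points(model, {'2': PipeSplitWrapper.SplitPoint.BEGINNING})

# Trace the annotated model into a Pipe with one submodule per stage.
pipe = Pipe.from_tracing(model)
print(pipe.split_gm)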
When I set
DIMS = [28 * 28, 300, 100, 10]
DP_LAYERS = 2
PP_LAYERS = 1
#nnode=1, nproc_per_node=2
it works well for data-parallel (DDP) training,
but when I set it as below,
DIMS = [28 * 28, 300, 100, 10]
DP_LAYERS = 1
PP_LAYERS = 2
#nnode=1, nproc_per_node=2
or
DIMS = [28 * 28, 500, 250, 100, 50, 25, 10]
DP_LAYERS = 2
PP_LAYERS = 4
#nnode=2, nproc_per_node=4
it doesn't work at all. Could this problem be related to the issue I raised at first, or did I set something incorrectly? If I need to post a new topic for this, I'll do that.
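For reference, this is the sanity check I ran on the launch sizes (just a sketch of my assumption that the ranks form a DP-by-PP grid; check_grid is my own helper, not a PiPPy API):

# The world size that torchrun launches (nnode * nproc_per_node) should
# equal the DP x PP grid size, assuming ranks are laid out as a grid.
def check_grid(dp_layers, pp_layers, nnode, nproc_per_node):
    world_size = nnode * nproc_per_node
    assert world_size == dp_layers * pp_layers, (
        f"world size {world_size} != DP {dp_layers} x PP {pp_layers}")

check_grid(dp_layers=1, pp_layers=2, nnode=1, nproc_per_node=2)  # failing case 1
check_grid(dp_layers=2, pp_layers=4, nnode=2, nproc_per_node=4)  # failing case 2

Both failing configurations pass this check, so the world size itself seems consistent with the grid.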
Thank you.