Pipe module throws CUDA error

I want to use the `torch.distributed.pipeline.sync` module to split my model across 2 different GPUs. I was trying to adapt the documentation example (Training Transformer models using Pipeline Parallelism — PyTorch Tutorials 2.1.0+cu121 documentation) to my case, but unfortunately PyTorch throws the following error.

“ValueError: nn.Module: Sequential(#skipped#), should have all parameters on a single device, please use .to() to place the module on a single device.”

The following code is what I am running in my case:

```python
import torch
from torch import nn
from torch.distributed.pipeline.sync import Pipe

# Build 11 Transformer blocks: the first one is never moved (it stays on the
# CPU), the next five go to cuda:0 and the last five to cuda:1.
temp_list = [Transformer(1, 2, (192, 448, 480), 16, 12, 12, 0.1, [3, 6, 9, 12])]
module_list = []

for i in range(10):
    block = Transformer(1, 2, (192, 448, 480), 16, 12, 12, 0.1, [3, 6, 9, 12])
    device = 0 if i < 5 else 1
    temp_list.append(block.to(device))

# All 11 blocks end up inside one nn.Sequential, which becomes the single
# child of the Sequential passed to Pipe.
module_list.append(nn.Sequential(*temp_list))

model = Pipe(torch.nn.Sequential(*module_list), chunks=2)
```

Since the main idea of Pipe is to distribute the stages of a model pipeline across GPUs, why does it throw an error telling me to use .to()?
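For reference, this is roughly the arrangement I understood the tutorial to build, adapted to my own Transformer blocks. It is a simplified sketch, not the tutorial's exact code: the constructor arguments come from my model, and the two-partition split plus the `make_block` / `stage0` / `stage1` names are mine. The idea is that each pipeline stage is its own nn.Sequential whose parameters all live on a single device, and only those stages are children of the Sequential handed to Pipe. Is this the structure Pipe actually expects?

```python
import os
import torch
from torch import nn
from torch.distributed import rpc
from torch.distributed.pipeline.sync import Pipe

# As in the tutorial, Pipe needs the RPC framework initialized, even when
# everything runs in a single process.
os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")
rpc.init_rpc("worker", rank=0, world_size=1)

def make_block():
    # Hypothetical helper wrapping my own Transformer constructor.
    return Transformer(1, 2, (192, 448, 480), 16, 12, 12, 0.1, [3, 6, 9, 12])

# Two partitions of five blocks each; every parameter of a partition sits on
# exactly one device, and only the partitions are children of the outer
# Sequential given to Pipe.
stage0 = nn.Sequential(*[make_block() for _ in range(5)]).to(0)  # all on cuda:0
stage1 = nn.Sequential(*[make_block() for _ in range(5)]).to(1)  # all on cuda:1

model = Pipe(nn.Sequential(stage0, stage1), chunks=2)
```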