Federated - IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1) - Python

Hello, I have a problem when training a federated network. I get an error when passing a noise vector to one of my sub-networks. The error occurs only during the federated training step; when I train locally on the host with the same code, training runs normally. The code that raises the error is below:

    ## Train with all-fake batch
    # Generate batch of latent vectors
    noise = torch.zeros(10,128)
    # Generate fake image batch with G
    fake = model[0](noise)
    label = torch.zeros(len(noise))
    # Classify all fake batch with D
    output = model[1](fake.detach()).view(-1)
    # Calculate D's loss on the all-fake batch

The error is:

---------------------------------------------------------------------------
PureFrameworkTensorFoundError             Traceback (most recent call last)
~/.local/lib/python3.8/site-packages/syft/frameworks/torch/tensors/interpreters/native.py in handle_func_command(cls, command)
    327             # Note that we return also args_type which helps handling case 3 in the docstring
--> 328             new_args, new_kwargs, new_type, args_type = hook_args.unwrap_args_from_function(
    329                 cmd, args_, kwargs_, return_args_type=True

~/.local/lib/python3.8/site-packages/syft/generic/frameworks/hook/hook_args.py in unwrap_args_from_function(attr, args_, kwargs_, return_args_type)
    157         # Try running it
--> 158         new_args = hook_args(args_)
    159 

~/.local/lib/python3.8/site-packages/syft/generic/frameworks/hook/hook_args.py in <lambda>(x)
    356 
--> 357     return lambda x: f(lambdas, x)
    358 

~/.local/lib/python3.8/site-packages/syft/generic/frameworks/hook/hook_args.py in three_fold(lambdas, args_, **kwargs)
    535         lambdas[0](args_[0], **kwargs),
--> 536         lambdas[1](args_[1], **kwargs),
    537         lambdas[2](args_[2], **kwargs),

~/.local/lib/python3.8/site-packages/syft/generic/frameworks/hook/hook_args.py in <lambda>(i)
    331         # Last if not, rule is probably == 1 so use type to return the right transformation.
--> 332         else lambda i: forward_func[type(i)](i)
    333         for a, r in zip(args_, rules)  # And do this for all the args / rules provided

~/.local/lib/python3.8/site-packages/syft/frameworks/torch/hook/hook_args.py in <lambda>(i)
     29     if hasattr(i, "child")
---> 30     else (_ for _ in ()).throw(PureFrameworkTensorFoundError),
     31     torch.nn.Parameter: lambda i: i.child

~/.local/lib/python3.8/site-packages/syft/frameworks/torch/hook/hook_args.py in <genexpr>(.0)
     29     if hasattr(i, "child")
---> 30     else (_ for _ in ()).throw(PureFrameworkTensorFoundError),
     31     torch.nn.Parameter: lambda i: i.child

PureFrameworkTensorFoundError: 

During handling of the above exception, another exception occurred:

IndexError                                Traceback (most recent call last)
<ipython-input-24-eb14fd9fab4b> in <module>
      2     start_time = time.time()
      3     print(f"Epoch Number {epoch + 1}")
----> 4     federated_model = train()
      5     model = federated_model
      6 #     test(federated_model)

<ipython-input-23-7027cb8c9ab0> in train()
     57         for remote_index in range(len(compute_nodes)):
     58             data, target = remote_dataset[remote_index][data_index]
---> 59             models[remote_index] = update(data, target, models[remote_index], optimizers[remote_index])
     60         for model in models:
     61             model.get()

<ipython-input-23-7027cb8c9ab0> in update(data, target, model, optimizer)
     24     label = torch.zeros(len(noise))
     25     # Classify all fake batch with D
---> 26     output = model[1](fake.detach()).view(-1)
     27     # Calculate D's loss on the all-fake batch
     28 

~/.local/lib/python3.8/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    530             result = self._slow_forward(*input, **kwargs)
    531         else:
--> 532             result = self.forward(*input, **kwargs)
    533         for hook in self._forward_hooks.values():
    534             hook_result = hook(self, input, result)

~/Desktop/DE/Pytorch_ADAGAN.py in forward(self, input)
    167 
    168         def forward(self, input):
--> 169             return self.main(input)
    170 
    171     def parameters(self):

~/.local/lib/python3.8/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    530             result = self._slow_forward(*input, **kwargs)
    531         else:
--> 532             result = self.forward(*input, **kwargs)
    533         for hook in self._forward_hooks.values():
    534             hook_result = hook(self, input, result)

~/.local/lib/python3.8/site-packages/torch/nn/modules/container.py in forward(self, input)
     98     def forward(self, input):
     99         for module in self:
--> 100             input = module(input)
    101         return input
    102 

~/.local/lib/python3.8/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    530             result = self._slow_forward(*input, **kwargs)
    531         else:
--> 532             result = self.forward(*input, **kwargs)
    533         for hook in self._forward_hooks.values():
    534             hook_result = hook(self, input, result)

~/.local/lib/python3.8/site-packages/torch/nn/modules/linear.py in forward(self, input)
     85 
     86     def forward(self, input):
---> 87         return F.linear(input, self.weight, self.bias)
     88 
     89     def extra_repr(self):

~/.local/lib/python3.8/site-packages/syft/generic/frameworks/hook/hook.py in overloaded_func(*args, **kwargs)
    591                 handle_func_command = syft.framework.Tensor.handle_func_command
    592 
--> 593             response = handle_func_command(command)
    594 
    595             return response

~/.local/lib/python3.8/site-packages/syft/frameworks/torch/tensors/interpreters/native.py in handle_func_command(cls, command)
    361             # in the execute_command function
    362             try:
--> 363                 response = cls._get_response(cmd, args_, kwargs_)
    364             except AttributeError:
    365                 # Change the library path to avoid errors on layers like AvgPooling

~/.local/lib/python3.8/site-packages/syft/frameworks/torch/tensors/interpreters/native.py in _get_response(cmd, args_, kwargs_)
    395 
    396         if isinstance(args_, tuple):
--> 397             response = command_method(*args_, **kwargs_)
    398         else:
    399             response = command_method(args_, **kwargs_)

~/.local/lib/python3.8/site-packages/torch/nn/functional.py in linear(input, weight, bias)
   1368     if input.dim() == 2 and bias is not None:
   1369         # fused op is marginally faster
-> 1370         ret = torch.addmm(bias, input, weight.t())
   1371     else:
   1372         output = input.matmul(weight.t())

~/.local/lib/python3.8/site-packages/syft/generic/frameworks/hook/hook.py in overloaded_func(*args, **kwargs)
    591                 handle_func_command = syft.framework.Tensor.handle_func_command
    592 
--> 593             response = handle_func_command(command)
    594 
    595             return response

~/.local/lib/python3.8/site-packages/syft/frameworks/torch/tensors/interpreters/native.py in handle_func_command(cls, command)
    361             # in the execute_command function
    362             try:
--> 363                 response = cls._get_response(cmd, args_, kwargs_)
    364             except AttributeError:
    365                 # Change the library path to avoid errors on layers like AvgPooling

~/.local/lib/python3.8/site-packages/syft/frameworks/torch/tensors/interpreters/native.py in _get_response(cmd, args_, kwargs_)
    395 
    396         if isinstance(args_, tuple):
--> 397             response = command_method(*args_, **kwargs_)
    398         else:
    399             response = command_method(args_, **kwargs_)

IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)

Thank you in advance!

Hey, can you share the code for your loss calculation (or the whole training process)? It seems the problem might be there.

It is hard to tell just by looking at the given code.
Thank you.

Yes, of course. The full training code is:

def update(data, target, model, optimizer):
    model[0].send(data.location)
    model[1].send(data.location)
    
    model[1].zero_grad()
    
    real_cpu = data.float()
    b_size = real_cpu.size(0)

    label = torch.full((b_size,), 1)
    # Forward pass real batch through D
    print(model[0])
    output = model[1](data.float()).view(-1).float()
    # Calculate loss on all-real batch
    errD_real = F.mse_loss(output, target.float())
    # Calculate gradients for D in backward pass
    errD_real.backward()
    
    ## Train with all-fake batch
    # Generate batch of latent vectors
    noise = torch.randn(128,10)
    noise.send(data.location)
#     print(noise)
    # Generate fake image batch with G
    fake = model[0](noise)
    label = torch.zeros(len(noise))
    # Classify all fake batch with D
    output = model[1](fake.detach()).view(-1)
    # Calculate D's loss on the all-fake batch

    errD_fake = F.mse_loss(output, label)
    # Calculate the gradients for this batch
    errD_fake.backward()
    # Add the gradients from the all-real and all-fake batches
    errD = errD_real + errD_fake
    # Update D
    optimizer[1].step()
    
    return model

Try:

fake.detach().view(1,-1)

Torch tensors can be reshaped in many ways; you just need to experiment with that.

Let me know if that works :)

I tried it and I still get the same error. The error actually occurs before fake is defined: it is raised when passing the noise vector through the generator, fake = model[0](noise). This does not happen during normal (non-federated) training.

If that is the case, I think there might be something wrong with the generator model.
Can you provide that code?
Thank you.

Unfortunately, I cannot share the Generator model’s code at this time. I do understand the problem that poses, but I am confident that the generator’s structure is not at fault. That is because the network trains normally without error when I don’t federate it, under the same circumstances.
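
One thing that may be worth checking, offered as a guess since the generator code is not shown: in the update function posted earlier, the models are moved with model[0].send(data.location) and model[1].send(data.location), but the result of noise.send(data.location) is discarded, and label is created with torch.zeros(len(noise)) and never sent. If tensor .send() in this PySyft version returns a pointer instead of converting the tensor in place (the PySyft 0.2.x tutorials write x = x.send(worker)), then noise and label would remain plain local tensors while the model weights are remote. That would be consistent with the PureFrameworkTensorFoundError at the top of the traceback. The sketch below is a dependency-free toy that only models this rebinding pitfall; LocalTensor, Pointer, and forward are hypothetical stand-ins invented for illustration, not PySyft classes.

```python
# Toy model of the rebinding pitfall. LocalTensor, Pointer and forward are
# hypothetical stand-ins written for this sketch; they are NOT PySyft classes.

class Pointer:
    """Handle to data that lives on a named remote worker."""

    def __init__(self, location):
        self.location = location


class LocalTensor:
    """Data that lives in the current process."""

    location = "local"

    def __init__(self, values):
        self.values = values

    def send(self, location):
        # Returns a new handle; the object it was called on is left unchanged.
        return Pointer(location)


def forward(model_location, x):
    """Refuse to mix a remote model with an input held somewhere else."""
    if x.location != model_location:
        raise ValueError(
            f"model is on {model_location!r}, input is on {x.location!r}"
        )
    return x


noise = LocalTensor([0.0] * 10)
noise.send("worker_1")          # return value discarded: noise is still local
try:
    forward("worker_1", noise)
    mixed = False
except ValueError:
    mixed = True                # local input met a remote model

noise = noise.send("worker_1")  # rebinding keeps the remote handle
result = forward("worker_1", noise)
```

If this is indeed the cause, the corresponding change in update would be to write noise = noise.send(data.location) and to make sure label also lives on data.location before it is passed to F.mse_loss. This is an assumption based on the posted code, not a confirmed fix.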