RuntimeError: matrices expected, got 1D, 2D tensors

I'm working on my first project with PyTorch and Syft, and I'm running into a "matrices expected" error. The same dataset trains to completion with no issue, but as soon as I run the test loop, it raises the error below.

The batch size is 1024, and there are 30 input features.

class MLPUpdated(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(30, 32),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(32,16),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(16, 8),
            nn.ReLU(),
            nn.Linear(8, 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        return self.layers(x)

Error:
Traceback (most recent call last):
  File "C:\Users\het\AppData\Local\Programs\Python\Python38\lib\site-packages\syft\frameworks\torch\tensors\interpreters\native.py", line 336, in handle_func_command
    new_args, new_kwargs, new_type, args_type = hook_args.unwrap_args_from_function(
  File "C:\Users\het\AppData\Local\Programs\Python\Python38\lib\site-packages\syft\generic\frameworks\hook\hook_args.py", line 167, in unwrap_args_from_function
    new_args = args_hook_function(args_)
  File "C:\Users\het\AppData\Local\Programs\Python\Python38\lib\site-packages\syft\generic\frameworks\hook\hook_args.py", line 356, in <lambda>
    return lambda x: f(lambdas, x)
  File "C:\Users\het\AppData\Local\Programs\Python\Python38\lib\site-packages\syft\generic\frameworks\hook\hook_args.py", line 534, in three_fold
    lambdas[0](args_[0], **kwargs),
  File "C:\Users\het\AppData\Local\Programs\Python\Python38\lib\site-packages\syft\generic\frameworks\hook\hook_args.py", line 331, in <lambda>
    else lambda i: forward_functype(i)
  File "C:\Users\het\AppData\Local\Programs\Python\Python38\lib\site-packages\syft\frameworks\torch\hook\hook_args.py", line 27, in <lambda>
    else (_ for _ in ()).throw(PureFrameworkTensorFoundError),
  File "C:\Users\het\AppData\Local\Programs\Python\Python38\lib\site-packages\syft\frameworks\torch\hook\hook_args.py", line 27, in <genexpr>
    else (_ for _ in ()).throw(PureFrameworkTensorFoundError),
syft.exceptions.PureFrameworkTensorFoundError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "federated_learning.py", line 274, in <module>
    test_loss, test_acc = test(args, model, device, test_loader, epoch)
  File "federated_learning.py", line 145, in test
    output = model(data)
  File "C:\Users\het\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\internship\Indo_Aus\Federated Learning\Split-Learning-and-Federated-Learning\src\models.py", line 213, in forward
    return self.layers(x)
  File "C:\Users\het\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\het\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\nn\modules\container.py", line 100, in forward
    input = module(input)
  File "C:\Users\het\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\het\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\nn\modules\linear.py", line 87, in forward
    return F.linear(input, self.weight, self.bias)
  File "C:\Users\het\AppData\Local\Programs\Python\Python38\lib\site-packages\syft\generic\frameworks\hook\hook.py", line 335, in overloaded_func
    response = handle_func_command(command)
  File "C:\Users\het\AppData\Local\Programs\Python\Python38\lib\site-packages\syft\frameworks\torch\tensors\interpreters\native.py", line 380, in handle_func_command
    response = cls.get_response(cmd, args_, kwargs_)
  File "C:\Users\het\AppData\Local\Programs\Python\Python38\lib\site-packages\syft\frameworks\torch\tensors\interpreters\native.py", line 414, in get_response
    response = command_method(*args_, **kwargs_)
  File "C:\Users\het\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\nn\functional.py", line 1370, in linear
    ret = torch.addmm(bias, input, weight.t())
  File "C:\Users\het\AppData\Local\Programs\Python\Python38\lib\site-packages\syft\generic\frameworks\hook\hook.py", line 335, in overloaded_func
    response = handle_func_command(command)
  File "C:\Users\het\AppData\Local\Programs\Python\Python38\lib\site-packages\syft\frameworks\torch\tensors\interpreters\native.py", line 380, in handle_func_command
    response = cls.get_response(cmd, args_, kwargs_)
  File "C:\Users\het\AppData\Local\Programs\Python\Python38\lib\site-packages\syft\frameworks\torch\tensors\interpreters\native.py", line 414, in get_response
    response = command_method(*args_, **kwargs_)
RuntimeError: matrices expected, got 1D, 2D tensors at C:\w\1\s\windows\pytorch\aten\src\TH/generic/THTensorMath.cpp:131

Hi Hell!

You must not be calling your model correctly in your test loop; your
code works for me:

>>> import torch
>>> import torch.nn as nn
>>> torch.__version__
'1.9.0'
>>> class MLPUpdated(nn.Module):
...     def __init__(self):
...         super().__init__()
...         self.layers = nn.Sequential(
...             nn.Linear(30, 32),
...             nn.ReLU(),
...             nn.Dropout(0.2),
...             nn.Linear(32,16),
...             nn.ReLU(),
...             nn.Dropout(0.1),
...             nn.Linear(16, 8),
...             nn.ReLU(),
...             nn.Linear(8, 1),
...             nn.Sigmoid()
...         )
...
...     def forward(self, x):
...         return self.layers(x)
...
>>> _ = torch.manual_seed (2021)
>>> MLPUpdated() (torch.randn (1024, 30))
tensor([[0.4777],
        [0.4868],
        [0.4884],
        ...,
        [0.4624],
        [0.4619],
        [0.4893]], grad_fn=<SigmoidBackward>)

Best.

K. Frank

Hey KFrank,
Thanks for the reply.
The model is working perfectly, and the data passed into the forward function is also a tensor. But I'm still getting this error.

Traceback (most recent call last):
  File "federated_learning.py", line 283, in <module>
    test_loss, test_acc =  test(args, model, device, test_loader, epoch)
  File "federated_learning.py", line 145, in test
    output = model(data)
  File "C:\Users\het\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\internship\Indo_Aus\Federated Learning\Split-Learning-and-Federated-Learning\src\models.py", line 206, in forward
    return self.layers(x)
  File "C:\Users\het\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\het\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\nn\modules\container.py", line 100, in forward
    input = module(input)
  File "C:\Users\het\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\het\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\nn\modules\linear.py", line 87, in forward
    return F.linear(input, self.weight, self.bias)
  File "C:\Users\het\AppData\Local\Programs\Python\Python38\lib\site-packages\syft\generic\frameworks\hook\hook.py", line 335, in overloaded_func
    response = handle_func_command(command)

This error indicates that something is wrong inside self.layers.
Given below is (a truncated printout of) the tensor being passed into self.layers.

[ 3.9251e+00,  6.7676e-01,  1.3711e+00,  ...,  4.6753e-01,
         -8.6226e-02, -3.1110e-02],
        [ 8.0766e+00, -1.7748e-01,  1.2341e-01,  ..., -2.9790e-01,
         -9.5155e-02,  1.1550e-01],
        [-2.8701e-01, -9.5842e-02, -3.6565e+00,  ...,  3.8841e-01,
         -2.5426e+00, -2.0138e+00]])

The test function is given below.

def test(args, model, device, test_loader, epoch):
    """Evaluate the model

    Args:
        args: Hyper-parameters for testing
        model: The model to be evaluated
        device: Training device (e.g. cpu/gpu)
        test_loader: Federated data loader for testing
        epoch: Current epoch

    Returns:
        test_loss: Test loss for current epoch
        test_acc: Test accuracy for current epoch
    """
    model.eval()
    test_loss, correct = 0.0, 0

    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)

            # Feed-forward data through the model
            # print("Test: ", data.location.size)
            # print("Test Type: ", type(target))
            output = model(data)

            # Sum up the batch loss
            # (reduction is a constructor argument, not a call argument)
            binary_loss = torch.nn.BCEWithLogitsLoss(reduction="sum")
            target = target.unsqueeze(1)
            target = target.float()
            test_loss += binary_loss(output, target).item()
            # test_loss += F.nll_loss(output, target, reduction="sum").item()

            # Threshold the single sigmoid output to get the predicted class
            # (argmax over a size-1 dimension would always return 0)
            pred = (output > 0.5).long()

            # Get the number of correct classifications
            correct += pred.eq(target.view_as(pred)).sum().item()

    # Average the test_loss
    test_loss /= len(test_loader.dataset)

    # Test accuracy
    test_acc = correct / len(test_loader.dataset)

    print("\nTest set - Epoch: {} - Loss: {:.4f}, Acc: {:.2f}%\n".format(
        epoch, test_loss, 100 * test_acc))

    return test_loss, test_acc

Test Loader is given below.

test_ds = PeopleDataset(test_file)
# train_loader = torch.utils.data.DataLoader(train_ds, batch_size=args.batch_size, shuffle=True)

test_loader = sy.FederatedDataLoader(test_ds.federate(workers),
    batch_size=args.batch_size, shuffle=True, **kwargs)

Hi Hell!

From what I understand, calling your model in your training loop
works, but it fails in your test loop. The error message you quote
suggests that you are applying your model to an input tensor of
the wrong shape.
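
To make this concrete, here is a plain-pytorch sketch (no Syft
involved); nn.Linear (30, 32) requires the last dimension of its
input to be 30:

import torch
import torch.nn as nn

lin = nn.Linear(30, 32)
print(lin(torch.randn(1024, 30)).shape)   # torch.Size([1024, 32]) -- (batch, features) is fine
print(lin(torch.randn(30)).shape)         # torch.Size([32]) -- a single 1-d sample also works
lin(torch.randn(1024, 29))                # raises a size-mismatch RuntimeError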

As an aside, this line (quoted from your test()) looks fishy:

# print("Test: ", data.location.size)

It's only a comment, of course, but if the data you pass to your
model really has a location property, it probably isn't what your
model is expecting.

I assume that your training loop also has an equivalent
model (data) call in it somewhere.

Print out the shape of data just before your training-loop
model (data) call and your test-loop model (data) call.
I expect that they will not be the same.
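
Something like this (a sketch that reuses the variable names from
your posted loops):

# in the training loop, just before output = model(data)
print("train data shape:", data.shape)

# in the test loop, just before output = model(data)
print("test data shape:", data.shape)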

If they are the same, capture the training-loop and test-loop
data tensors, and post a complete, self-contained, runnable
script that reproduces your error that uses those example
data tensors.

That is, get rid of the data loaders and loops – just have
something like:

train_data = torch.tensor (...)
model (train_data)
test_data = torch.tensor (...)
model.eval()
with torch.no_grad():
    model (test_data)

And also post the complete results of your self-contained,
runnable script.

Best.

K. Frank

Hey KFrank,
Thanks again for the reply.

Here’s the code snippet which you asked for.

import torch
import torch.nn as nn

class MLPUpdated(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(30, 32),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(32,16),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(16, 8),
            nn.ReLU(),
            nn.Linear(8, 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        return self.layers(x)

device = "cuda" if torch.cuda.is_available() else "cpu"
device = torch.device(device)
model = MLPUpdated().to(device)

train_data = torch.tensor([ 0.3018, -0.3064,  1.1054, -0.0751,  0.4352,  0.9034, -0.4978, -0.4954,
         0.0315, -0.0492,  0.4202, -0.2261, -0.2327,  0.4352, -0.5217,  0.1191,
        -0.0143, -0.7541,  0.4915, -1.1997, -0.2744, -0.1156, -0.0913, -0.1326,
        -0.0158,  0.4527,  0.4410,  0.3742, -0.0228,  0.0137])
print(train_data.shape)

test_data = torch.tensor([ 0.5229,  0.8923, -1.6663, -1.8197,  1.1027, -1.2481,  0.3701,  1.5603,
        -1.6727,  1.3396,  0.1646, -0.6536, -0.4564,  1.0487, -0.0249, -0.4200,
        -1.7048, -1.9665,  0.4084,  0.8154, -1.8285, -0.3640, -0.0078,  0.1991,
         0.2119, -1.6663, -1.0810, -0.4766,  0.2329, -0.0569])
print(test_data.shape)

model(train_data)
model.eval()
with torch.no_grad():
    model(test_data)
print("SUCCESS")

Output:

2021-06-26 12:06:47.330716: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2021-06-26 12:06:47.340872: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.        
torch.Size([30])
torch.Size([30])
SUCCESS

This runs successfully.

But when I run the code below, I get the matrices expected error.

#!/usr/bin/python3

"""federated.py Contains an implementation of federated learning with ten
                workers applied to the Fashion MNIST data set for image
                classification using a slightly modified version of the LeNet5
                CNN architecture.

For the ID2223 Scalable Machine Learning course at KTH Royal Institute of
Technology"""

__author__ = "Xenia Ioannidou and Bas Straathof"


import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms

from sys import argv
from time import time
from argparse import ArgumentParser, Namespace
from syft.federated.floptimizer import Optims

# The Pysyft Federated Learning library
import syft as sy

from models import LeNetComplete, MLP, MLPUpdated
from plotting import generate_simple_plot

from load_data import PeopleDataset

torch.autograd.set_detect_anomaly(True)

def parse_args() -> Namespace:
    """Parses CL arguments

    Returns:
        Namespace object containing all arguments"""
    parser = ArgumentParser()

    parser.add_argument("-bs", "--batch_size", type=int, default=64)
    parser.add_argument("-tbs", "--test_batch_size", type=int, default=1000)
    parser.add_argument("-ls", "--log_steps", type=int, default=50)
    parser.add_argument("-lr", "--learning_rate", type=float, default=0.01)
    parser.add_argument("-e", "--epochs", type=int, default=25)

    return parser.parse_args(argv[1:])


def train(args, model, device, train_loader, optimizer, epoch):
    """Train the model

    Args:
        args: Hyper-parameters for training
        model: The model to be trained
        device: Training device (e.g. cpu/gpu)
        train_loader: Federated data loader for training
        optimizer: Training optimizer (e.g. SGD)
        epoch: Current epoch

    Returns:
        training_time: Time it took for one epoch of training
    """
    model.train()
    start = time()
    # data, target = next(iter(train_loader))
    # batch_idx = 0
    for batch_idx, (data, target) in enumerate(train_loader):
        # Send the model to the worker that holds the current batch
        # print("Train: " ,data.shape)
        model.send(data.location)

        # Write the data and targets to the compute device
        data, target = data.to(device), target.to(device)

        # Fetch this worker's optimizer using get_optim
        # print(data.location)
        # print(optimizer.optimizers)
        opt = optimizer.get_optim(data.location.id)

        # Reset the optimizer's gradients to zero
        # optimizer.zero_grad()
        opt.zero_grad()

        # Pass the data to the model
        print("Train: ", data.shape)
        # print("Train Type: ", type(data.location))
        output = model(data)

        # Compute the loss
        # binary_loss = nn.BCELoss()
        binary_loss = torch.nn.BCEWithLogitsLoss()
        target = target.unsqueeze(1)
        target = target.float()
        loss = binary_loss(output, target)
        # loss = F.nll_loss(output, target)

        # Execute the backpropagation step
        loss.backward()

        # Apply the optimizer
        opt.step()

        # Retrieve the model (PySyft feature)
        model.get()

        if batch_idx % args.log_steps == 0:
            # Retrieve the loss (PySyft feature)
            loss = loss.get()
            print(' {} - Epoch: {} [{}/{} ({:.0f}%)] Loss: {:.6f}'.format(
                data.location.id, epoch, batch_idx * args.batch_size,
                len(train_loader) * args.batch_size,
                100. * batch_idx / len(train_loader), loss.item()))

    end = time()
    training_time = end-start

    return training_time

def test(args, model, device, test_loader, epoch):
    """Evaluate the model

    Args:
        args: Hyper-parameters for testing
        model: The model to be evaluated
        device: Training device (e.g. cpu/gpu)
        test_loader: Federated data loader for testing
        epoch: Current epoch

    Returns:
        test_loss: Test loss for current epoch
        test_acc: Test accuracy for current epoch
    """
    model.eval()
    test_loss, correct = 0.0, 0

    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)

            # Feed-forward data through the model
            print("Test: ", data.shape)
            # print("Test Type: ", type(target))
            output = model(data)

            # Sum up the batch loss
            # (reduction is a constructor argument, not a call argument)
            binary_loss = torch.nn.BCEWithLogitsLoss(reduction="sum")
            target = target.unsqueeze(1)
            target = target.float()
            test_loss += binary_loss(output, target).item()
            # test_loss += F.nll_loss(output, target, reduction="sum").item()

            # Threshold the single sigmoid output to get the predicted class
            # (argmax over a size-1 dimension would always return 0)
            pred = (output > 0.5).long()

            # Get the number of correct classifications
            correct += pred.eq(target.view_as(pred)).sum().item()

    # Average the test_loss
    test_loss /= len(test_loader.dataset)

    # Test accuracy
    test_acc = correct / len(test_loader.dataset)

    print("\nTest set - Epoch: {} - Loss: {:.4f}, Acc: {:.2f}%\n".format(
        epoch, test_loss, 100 * test_acc))

    return test_loss, test_acc

if __name__ == "__main__":
    args = parse_args()

    # Pysyft needs to be hooked to PyTorch to enable its features
    hook = sy.TorchHook(torch)

    # Define the workers
    alfa    = sy.VirtualWorker(hook, id="alfa")
    bravo   = sy.VirtualWorker(hook, id="bravo")

    workers = (alfa, bravo)
    workers_id = (alfa.id, bravo.id)

    # Check if a GPU is available
    device = "cuda" if torch.cuda.is_available() else "cpu"
    device = torch.device(device)
    # device is a torch.device here, so compare device.type rather than a string
    kwargs = {'num_workers': 1, 'pin_memory': True} if device.type == "cuda" else {}


    # Federated learning needs a special train loader to distribute the data
    # over the various workers
    print("Loading data...")
    train_file = '../dataset/creditcard_train.csv'
    train_ds = PeopleDataset(train_file)

    train_loader = sy.FederatedDataLoader(train_ds.federate(workers),
        batch_size=args.batch_size, shuffle=True, **kwargs)


    test_file = '../dataset/creditcard_test.csv'
    test_ds = PeopleDataset(test_file)
    
    test_loader = sy.FederatedDataLoader(test_ds.federate(workers),
        batch_size=args.batch_size, shuffle=True, **kwargs)


    print("Data loaded...")

    # Instantiate the MLP model
    model = MLPUpdated().to(device)

    # Load the optimizer
    optimizer = Optims(workers_id, optim=optim.Adam(params=model.parameters(),lr=0.1))

    # Train and evaluate for a number of epochs
    total_train_time, test_losses, test_accs = 0.0, [], []
    for epoch in range(1, args.epochs + 1):
        train_time = train(args, model, device, train_loader, optimizer, epoch)
        total_train_time += train_time

        test_loss, test_acc =  test(args, model, device, test_loader, epoch)
        test_losses.append(test_loss)
        test_accs.append(test_acc)



    print("Total training time: {:.2f}s".format(total_train_time))
    print("Final test accuracy: {:.4f}".format(test_acc))
    print("Final test loss: {:.4f}".format(test_loss))


This gives the following output:

Train:  torch.Size([1024, 30])
Train:  torch.Size([1024, 30])
Train:  torch.Size([1024, 30])
 alfa - Epoch: 1 [102400/257024 (40%)] Loss: 0.693148
Train:  torch.Size([1024, 30])
Train:  torch.Size([1024, 30])
Train:  torch.Size([1024, 30])
.
.
.
Train:  torch.Size([1024, 30])
Train:  torch.Size([1024, 30])
 bravo - Epoch: 1 [256000/257024 (100%)] Loss: 0.693148
Train:  torch.Size([163, 30])
Test:  torch.Size([1024, 30])
Traceback (most recent call last):
.
.
.
Traceback (most recent call last):
  File "federated_learning.py", line 284, in <module>
    test_loss, test_acc =  test(args, model, device, test_loader, epoch)
  File "federated_learning.py", line 146, in test
    output = model(data)
  File "C:\Users\het\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "\Federated Learning\Split-Learning-and-Federated-Learning\src\models.py", line 206, in forward
    return self.layers(x)
.
.
.
RuntimeError: matrices expected, got 1D, 2D tensors at C:\w\1\s\windows\pytorch\aten\src\TH/generic/THTensorMath.cpp:131

Hi Hell!

You are posting misleading code.

This version of MLPUpdated:

class MLPUpdated(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            ...

comes from self-contained code you posted (which would be good
if it were relevant, but it isn’t), while this version:

from models import LeNetComplete, MLP, MLPUpdated

comes from some package called “models.”

And we know that these two versions of MLPUpdated differ
substantively because, among other reasons, your self-contained
version does not contain a .send() method.

You are probably using your “federated learning” Syft framework
incorrectly. (But I don’t know …)

Note, you call your model differently in your train and test loops.
In the training loop you send the model first:

model.send(data.location)
...
output = model(data)
...
model.get()

vs. in the test loop:

output = model(data)

I have two suggestions for you:

  1. Debugging 101: Trim down your code to the simplest version that
    still shows the error. Can you get rid of the ArgumentParser()
    stuff? The plotting stuff? Can you get rid of .to (device)?
    Can you get rid of the data loaders? Can you get rid of the loops
    and just make a single train and test call to model (data)? Can
    you get rid of the syft stuff and still reproduce the error? (I doubt it.)

  2. Try printing out type (model), type (model).mro(),
    type (data), and type (data).mro() just before your train
    and test calls to model (data) to see what Syft is doing under
    the hood. Maybe you’re trying to call a “syft-hooked” model with
    an ordinary pytorch tensor, or vice versa. (A sketch of this
    instrumentation follows this list.)
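
For example (a sketch reusing the names model and data from your
script; the comments state what I would expect, not verified
against your setup):

# placed just before each model (data) call, in both loops
print("model:", type(model), type(model).mro())
print("data: ", type(data), type(data).mro())
# with syft hooked, a batch living on a remote worker shows up as a
# torch.Tensor wrapper whose payload is a PointerTensor; a mismatch
# between the two loops here would explain a lot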

(As an aside, let me note that the code you posted that reproduces
your error is not a “complete, self-contained, runnable script.”)

Best.

K. Frank

Hey K. Frank,
Thanks for the reply.
The MLPUpdated in models is the same as the MLPUpdated I posted; I just wrote it inline for your convenience. Sorry for the confusion.

Thanks for the help.
I still don't understand the difference between the train and test calls to the model. Can you please elaborate on that part?

Hi Hell!

No it’s not:

>>> import torch
>>> import torch.nn as nn
>>> torch.__version__
'1.9.0'
>>> class MLPUpdated(nn.Module):
...     def __init__(self):
...         super().__init__()
...         self.layers = nn.Sequential(
...             nn.Linear(30, 32),
...             nn.ReLU(),
...             nn.Dropout(0.2),
...             nn.Linear(32,16),
...             nn.ReLU(),
...             nn.Dropout(0.1),
...             nn.Linear(16, 8),
...             nn.ReLU(),
...             nn.Linear(8, 1),
...             nn.Sigmoid()
...         )
...
...     def forward(self, x):
...         return self.layers(x)
...
>>> model = MLPUpdated()
>>> type (model)
<class '__main__.MLPUpdated'>
>>> model.send
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<...>\torch\nn\modules\module.py", line 1131, in __getattr__
    type(self).__name__, name))
AttributeError: 'MLPUpdated' object has no attribute 'send'

I guarantee that your “models” version of MLPUpdated does not have
type <class '__main__.MLPUpdated'>. Also, in your code that is
not self-contained you call model.send(). Your free-standing version
of MLPUpdated does not have a .send() method.

Perhaps you copy-pasted some lines of code out of some more
elaborate context and thought things would be the same, but lacking
that larger context, there is no reason they should be.

In my previous reply I excerpted the relevant portions of the code you
posted.

Among other things you call model.send() / model.get() in your
training loop, but not in your test loop. Also, you get your data
object from different data loaders, so there is no reason to expect
them to be of the same or similar type.
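
For what it's worth, here is a minimal sketch (assuming pysyft-0.2.x
semantics, and not tested against your setup) of a test loop that
mirrors the training loop's send / get pattern:

model.eval()
test_loss = 0.0
loss_fn = torch.nn.BCEWithLogitsLoss(reduction="sum")

with torch.no_grad():
    for data, target in test_loader:
        model.send(data.location)       # send the model to the worker holding this batch
        data, target = data.to(device), target.to(device)

        output = model(data)            # model and data now live on the same worker
        loss = loss_fn(output, target.unsqueeze(1).float())
        test_loss += loss.get().item()  # fetch the scalar loss back from the worker

        model.get()                     # retrieve the model before the next batch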

Did you try any of the things I suggested in my previous post?

Best.

K. Frank