Can't replicate pytorch linear operation

Hi !

I need to build a small deep learning project for school, and I am using PyTorch as a reference to check all of my modules.
Most modules return the same gradients and outputs as PyTorch, and I am happy with that.
But all of my tests involving my linear module (wx + b) fail.
Here is my test:

import unittest

import numpy as np
import torch


class TestLinearModule(unittest.TestCase):

    def __init__(self, *args, **kwargs):
        super(TestLinearModule, self).__init__(*args, **kwargs)
        self.batch = 2
        self.feature_in = 3
        self.feature_out = 3

    def test_linear_forward_random_init(self):
        """
            test the linear module forward pass with specific weights
        """
        x = np.random.randn(self.batch, self.feature_in)
        _x = torch.tensor(x, dtype=torch.float32)
        weights = np.random.randn(self.feature_in, self.feature_out).astype(np.float32)
        _weights = torch.tensor(weights, dtype=torch.float32)
        bias = np.zeros((self.feature_out, ))
        _bias = torch.tensor(bias, dtype=torch.float32)
        linear = Linear(self.feature_in, self.feature_out, weights, bias)
        oracle_linear = torch.nn.Linear(self.feature_in, self.feature_out, bias=False)
        oracle_linear.weight = torch.nn.parameter.Parameter(_weights, requires_grad=False)
        oracle_linear.bias = torch.nn.parameter.Parameter(_bias, requires_grad=False)
        print("weights : ", weights)
        print("_weights : ", oracle_linear.weight)
        print("bias : ", bias)
        print("_bias : ", oracle_linear.bias)

        output = linear.forward(x)
        oracle_output = oracle_linear(_x)

        # test output shape
        self.assertEqual(output.shape, (self.batch, self.feature_out))

        # test output value
        np.testing.assert_array_equal(output, oracle_output.detach())

Here is the code of my Linear class:

class Linear(Module):
    """
        A class implementing a linear module
        linear operation : f(x) = XW
    """

    def __init__(self, features_in, features_out, weights=None, bias=None):
        """
            Args:
                features_in:  features_in size
                features_out: features_out size
                weights: the initialized weights
                bias: the initialized bias
        """
        super(Linear, self).__init__()
        self.features_in = features_in
        self.features_out = features_out

        if not weights is None:
            assert weights.shape[0] == features_in
            assert weights.shape[1] == features_out
            self._parameters = weights

        if not bias is None:
            assert bias.shape[0] == features_out
            self._bias = bias
        else:
            # TODO : add initialization strategies
            # using random initialization by default
            self._parameters = np.random.randn(features_in, features_out)
            # zero-init for bias
            self._bias = np.zeros((features_out, ))

        self.zero_grad()

    def forward(self, data):
        """
            Linear operation : f_w(x) = XW
            Args:
                data: (batch, features_in)
            Out:
                (batch, features_out)
        """
        assert self._parameters.shape[0] == data.shape[1]
        return np.matmul(data, self._parameters) + self._bias

Please, if you have any idea, let me know 🙂
Thx!

Hi Valentin!

I haven’t looked at your code, so I won’t comment on its correctness.

But:

You are testing for exact equality of the two results. Before we dig any
deeper, please rule out round-off error as the cause of the discrepancy.
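
For example, you could compare with a tolerance instead of exact equality. A rough sketch, using the two results from your test:

import numpy as np

# output and oracle_output are the two results computed in your test.
# float32 arithmetic typically differs from a float64 reference at around
# the 1e-6 level, so compare with a tolerance rather than exactly.
np.testing.assert_allclose(
    output,
    oracle_output.detach().numpy(),
    rtol=1e-5,
    atol=1e-6,
)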

Best.

K. Frank


Dear Frank,

Thank you for your reply !
I have changed :

np.testing.assert_array_equal

to :

np.testing.assert_almost_equal

Here is the output of the test, where x is my computed forward pass and y is PyTorch's:

 x: array([[ 1.860081 , -0.9923667,  1.6458035],
       [-0.2241226,  0.1235049,  0.0995825]])
 y: array([[ 1.5756445, -1.8553152,  1.1914932],
       [ 0.0713167,  0.1563707, -0.0121961]], dtype=float32)

Hi again,

After setting the weights to simple values (all 1s or all 0s), I do get the same result as PyTorch, so you may well be right that this is round-off error. Could the dtype difference (my NumPy arrays are float64, the torch tensors float32) be the cause?
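
One quick check I could run (a sketch using the variables from the test above) is to redo both forward passes in float64, so that any remaining gap cannot come from the float32/float64 mix:

import numpy as np
import torch

# x, weights, bias, oracle_linear are the objects built in the test above.
# Redo both forward passes in float64 so that any remaining difference
# cannot come from dtype promotion or float32 round-off.
x64 = x.astype(np.float64)
w64 = weights.astype(np.float64)
mine = x64 @ w64 + bias
theirs = oracle_linear.double()(torch.tensor(x64)).detach().numpy()
print(np.abs(mine - theirs).max())  # if this stays large, dtype is not the cause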

Hi Valentin!

These two results are indeed different, and not just because of
round-off error.

Your problem is that the weight property of torch.nn.Linear is
stored as the transpose of how you store the _parameters property
of your Linear class. That is, pytorch’s weight has shape
[out_features, in_features], rather than the other way around.
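
Here is a small illustration of that layout, independent of your code:

import torch

lin = torch.nn.Linear(3, 5, bias=False)
print(lin.weight.shape)   # torch.Size([5, 3]) -- [out_features, in_features]

x = torch.randn(2, 3)
# nn.Linear computes x @ weight.T (+ bias), so to reproduce a NumPy forward
# pass written as x @ W, with W of shape [in, out], the torch weight has to
# be set to W.T.
print(torch.allclose(lin(x), x @ lin.weight.T))   # True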

Changing your initialization of pytorch’s weight to:

oracle_linear.weight = torch.nn.parameter.Parameter(_weights.T, requires_grad=False)

resolves the issue. (Note the transpose, _weights.T.)

As an aside, in your testing class you chose to have
feature_in = feature_out = 3. Because your weight matrix was
therefore square, this masked the transpose issue, which for a
non-square weight matrix would have shown up as a
matrix-multiplication shape-mismatch error in torch.nn.Linear.
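
For instance (a small sketch, separate from your code), assigning a weight laid out as [in_features, out_features] to a non-square Linear fails at call time:

import torch

lin = torch.nn.Linear(3, 5, bias=False)
# weight laid out as [in_features, out_features] instead of the expected
# [out_features, in_features]
lin.weight = torch.nn.Parameter(torch.randn(3, 5), requires_grad=False)
try:
    lin(torch.randn(2, 3))
except RuntimeError as err:
    # a shape-mismatch error along the lines of
    # "mat1 and mat2 shapes cannot be multiplied"
    print(err)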

I modified your code to make a script that illustrates these points:

import torch
print (torch.__version__)

_ = torch.manual_seed (2022)

import numpy as np
print (np.__version__)

np.random.seed (2022)

class Linear():
    """
        A class implementing a linear module
        linear operation : f(x) = XW
    """
    
    def __init__(self,features_in,features_out,weights=None, bias=None):
        """
            Args:
                features_in:  features_in size
                features_out: features_out size
                weights: the initialized weights
                bias: the initialized bias
        """
        
        # super(Linear, self).__init__()
        self.features_in = features_in
        self.features_out = features_out
        
        if not weights is None:
            assert weights.shape[0] == features_in
            assert weights.shape[1] == features_out
            self._parameters = weights
        
        if not bias is None:
            assert bias.shape[0] == features_out
            self._bias = bias
        else:
            # TODO : add initialization strategies
            # using random initialization by default
            self._parameters = np.random.randn(features_in, features_out)
            # zero-init for bias
            self._bias = np.zeros((features_out, ))
        
        # self.zero_grad()
    
    def forward(self, data):
        """
            Linear operation : f_w(x) = XW
            Args:
                data: (batch,features_in)
            Out:
                (batch,features_out)
        """
        
        assert self._parameters.shape[0] == data.shape[1]
        return np.matmul(data,self._parameters) + self._bias

class TestLinearModule():
    
    def __init__(self):
        self.batch = 2
        self.feature_in = 3
        # self.feature_out = 3
        self.feature_out = 5   # make feature_out != feature_in so that transpose changes shape
    
    def test_linear_forward_random_init(self):
        """
            test the linear module forward pass with specific weights
        """
        
        x = np.random.randn(self.batch, self.feature_in)
        _x = torch.tensor(x, dtype=torch.float32)
        weights = np.random.randn(self.feature_in, self.feature_out).astype(np.float32)
        _weights = torch.tensor(weights, dtype=torch.float32)
        bias = np.zeros((self.feature_out, ))
        _bias = torch.tensor(bias, dtype=torch.float32)
        linear = Linear(self.feature_in,self.feature_out, weights, bias)
        oracle_linear = torch.nn.Linear(self.feature_in, self.feature_out, bias=False)
        # oracle_linear.weight = torch.nn.parameter.Parameter(_weights, requires_grad=False)
        # set weight for torch.nn.Linear to the transpose of _parameters of Linear
        oracle_linear.weight = torch.nn.parameter.Parameter(_weights.T, requires_grad=False)
        oracle_linear.bias = torch.nn.parameter.Parameter(_bias, requires_grad=False)
        print("weights : ", weights)
        print("_weights : ", oracle_linear.weight)
        print("bias : ", bias)
        print("_bias : ", oracle_linear.bias)
        
        output = linear.forward(x)
        oracle_output = oracle_linear(_x)
        
        # test output shape
        # self.assertEqual(output.shape, (self.batch, self.feature_out))
        np.testing.assert_equal(output.shape, (self.batch, self.feature_out))
        
        # test output value
        # np.testing.assert_array_equal(output, oracle_output.detach())
        np.testing.assert_array_almost_equal(output, oracle_output.detach())
        
        print ("output - oracle_output.numpy() :\n", output - oracle_output.numpy())

print ("torch.nn.Linear (3, 5).weight.shape : ", torch.nn.Linear (3, 5).weight.shape)

tstr = TestLinearModule()
tstr.test_linear_forward_random_init()

Here is its output:

1.10.2
1.16.4
torch.nn.Linear (3, 5).weight.shape :  torch.Size([5, 3])
weights :  [[ 0.3009816   0.54029727  0.37349728  0.3778134  -0.09021319]
 [-2.3059433   1.14276    -1.5356543  -0.863752    1.0165449 ]
 [ 1.0339639  -0.8244922   0.01890486 -0.38334355 -0.30418548]]
_weights :  Parameter containing:
tensor([[ 0.3010, -2.3059,  1.0340],
        [ 0.5403,  1.1428, -0.8245],
        [ 0.3735, -1.5357,  0.0189],
        [ 0.3778, -0.8638, -0.3833],
        [-0.0902,  1.0165, -0.3042]])
bias :  [0. 0. 0. 0. 0.]
_bias :  Parameter containing:
tensor([0., 0., 0., 0., 0.])
output - oracle_output.numpy() :
 [[ 4.44273004e-08 -1.27371908e-08  8.48134096e-09  4.54972782e-09
  -1.59171025e-08]
 [-2.92024305e-08  4.55501836e-08 -4.35077616e-08 -7.90099125e-09
  -1.69252692e-08]]

Best.

K. Frank


Hi, it worked perfectly !
Thank you very much for your help !

Valentin