How to add an orthogonality constraint to a weight?

I’m trying to add an orthogonality constraint to the weight of a Linear layer. This seemed similar to torch.nn.utils.weight_norm, so I wrote an analogous class, WeightOrtho, to orthogonalize the weight: before every forward, a pre-hook assigns a freshly orthogonalized weight to the layer. But this doesn’t seem to work — the weight never updates during training. Is there a good way to add an orthogonality constraint to a weight?

import torch
from torch.nn import Parameter


class WeightOrtho(object):
    def __init__(self, name, dim):
        if dim is None:
            dim = -1
        self.name = name
        self.dim = dim

    def compute_weight(self, module):
        w = getattr(module, self.name + '_orig')
        # orthogonalize the rows of w via a QR decomposition of its transpose
        return torch.nn.Parameter(torch.transpose(torch.qr(torch.transpose(w, 0, 1))[0], 0, 1))

    @staticmethod
    def apply(module, name, dim):
        for k, hook in module._forward_pre_hooks.items():
            if isinstance(hook, WeightOrtho) and hook.name == name:
                raise RuntimeError("Cannot register two weight_ortho hooks on "
                                   "the same parameter {}".format(name))

        if dim is None:
            dim = -1

        fn = WeightOrtho(name, dim)

        weight = getattr(module, name)

        # remove w from parameter list
        del module._parameters[name]

        # keep the unconstrained weight around as the trainable parameter
        module.register_parameter(name + '_orig', Parameter(weight.data))
        setattr(module, name, fn.compute_weight(module))

        # recompute weight before every forward()
        module.register_forward_pre_hook(fn)

        return fn

    def remove(self, module):
        weight = self.compute_weight(module)
        delattr(module, self.name)
        del module._parameters[self.name + '_orig']
        module.register_parameter(self.name, Parameter(weight.data))

    def __call__(self, module, inputs):
        setattr(module, self.name, self.compute_weight(module))


def weight_ortho(module, name='weight', dim=0):
    WeightOrtho.apply(module, name, dim)
    return module


weight_ortho(torch.nn.Linear(6, 2, bias=True), name='weight', dim=0)
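One possible explanation for the frozen weight (my assumption, not stated in the post): compute_weight wraps its result in a brand-new torch.nn.Parameter, and a fresh Parameter is a new autograd leaf with no grad_fn, so gradients from the loss never reach the underlying parameter. A minimal sketch of the effect:

```python
import torch

w = torch.randn(3, 3, requires_grad=True)
q = torch.linalg.qr(w).Q    # a differentiable function of w
p = torch.nn.Parameter(q)   # wrapping in a fresh Parameter cuts the graph here

p.sum().backward()
print(w.grad is None)       # True: no gradient ever flows back to w
```

Returning the computed tensor directly (without the Parameter wrapper) would keep the QR step inside the autograd graph.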

Have a look at this link.

It’s an iterative orthogonalization procedure: you call it repeatedly until the linear layer it acts on converges to (approximate) orthogonality. If you are wondering about their implementation, self.params.map_beta is just a scalar hyperparameter which defaults to 0.001.

A bit simplified, their procedure looks as follows:

beta = 0.001

def orthogonalize(self):
    """Orthogonalize the mapping."""
    W = self.mapping.weight.data
    W.copy_((1 + beta) * W - beta * W.mm(W.transpose(0, 1).mm(W)))
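As a standalone sketch of why this converges (the variable names and iteration count here are my own, not theirs): each singular value σ of W is mapped to σ(1 + β(1 − σ²)), whose stable fixed point is σ = 1, so repeating the update drives W toward a semi-orthogonal matrix with W Wᵀ ≈ I:

```python
import torch

torch.manual_seed(0)
beta = 0.001
W = 0.1 * torch.randn(2, 6)  # small init keeps the iteration in the stable regime

for _ in range(10000):
    # one step of the update above: W <- (1 + beta) * W - beta * W W^T W
    W = (1 + beta) * W - beta * W.mm(W.transpose(0, 1).mm(W))

print(torch.allclose(W.mm(W.t()), torch.eye(2), atol=1e-3))  # True
```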

A bit late to the party, but as of 1.10 this feature is supported natively in PyTorch via torch.nn.utils.parametrizations.orthogonal.
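A minimal example of the built-in API on the same layer shape as the original question:

```python
import torch
from torch import nn
from torch.nn.utils import parametrizations

layer = nn.Linear(6, 2, bias=True)
parametrizations.orthogonal(layer, name="weight")

# The weight now has orthonormal rows, and the parametrization
# keeps it that way as the layer trains.
W = layer.weight
print(torch.allclose(W @ W.T, torch.eye(2), atol=1e-5))  # True
```

Unlike the pre-hook approach above, the parametrization is differentiable, so gradients flow through the orthogonalization to the underlying unconstrained parameter.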

Leaving this here in case someone else bumps into this post in the future.