Manual Implementation of weight_norm() in C++ API

Hello PyTorch Community,

There is currently no weight normalization available in LibTorch, but I still need it to port a model over to the C++ API, so I tried to implement it manually in C++. I used the source code of the Python implementation (https://pytorch.cn/docs/master/_modules/torch/nn/utils/weight_norm.html) as inspiration.

But I’m just not sure if I can actually use this in real projects or if this approach is incorrect.
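
For context, the Python implementation reparameterizes a weight tensor as w = g * v / ||v||, so that the direction v and the magnitude g are trained as separate parameters, and it recomputes w from them before every forward pass. A minimal sketch of that recomputation, assuming dim = 0 and a 2-D weight (the helper name reparameterized_weight is only for illustration, and I assume the usual #include <torch/torch.h>):

torch::Tensor reparameterized_weight(const torch::Tensor &v, const torch::Tensor &g)
{
	//L2 norm of each output unit's weight vector, shaped so it broadcasts against v
	auto v_norm = v.contiguous().view({ v.size(0), -1 }).norm(2, 1).view({ v.size(0), 1 });
	return v * (g / v_norm);
}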

First, in the constructor of the network, I call init_pre_forward_normalize(*layer) after initialising and registering the layer with my network:

void init_pre_forward_normalize(torch::nn::Module & module)
{
	//Since we cannot delete the old parameter, we treat it as our "weight_v"
	//We assume that we only want to apply this normalization to the "weight" parameter
	auto old_weight = module.named_parameters().find("weight")->data();
	torch::Tensor new_weight = _norm(old_weight);
	module.register_parameter("weight_g", new_weight.data());
}
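
To show where this is called, here is a sketch of the constructor of the small test network I use further below (the name XorNet and the layer sizes are only for illustration):

struct XorNet : torch::nn::Module
{
	torch::nn::Linear fc1{ nullptr };
	torch::nn::Linear fc2{ nullptr };

	XorNet()
	{
		fc1 = register_module("fc1", torch::nn::Linear(2, 4));
		fc2 = register_module("fc2", torch::nn::Linear(4, 1));
		//adds the extra "weight_g" parameter next to fc1's existing "weight"
		init_pre_forward_normalize(*fc1);
	}
};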

Using a simplified _norm function:

torch::Tensor _norm(torch::Tensor &old_weight)
{
	//We assume dim = 0: take the L2 norm of each slice along the first dimension,
	//then reshape the result so it broadcasts against the original weight
	//(e.g. a weight of shape {4, 2} yields norms of shape {4, 1})
	std::vector<int64_t> norm_shape(old_weight.dim(), 1);
	norm_shape[0] = old_weight.size(0);
	return old_weight.contiguous().view({ old_weight.size(0), -1 }).norm(2, 1)
		.view(norm_shape);
}
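
To make the broadcasting concrete: for a Linear layer's weight of shape {4, 2}, _norm returns the four per-row L2 norms as a {4, 1} tensor, which can then be multiplied with or divided into the weight element-wise. A quick, purely illustrative check:

auto w = torch::randn({ 4, 2 });
std::cout << _norm(w).sizes() << std::endl; //prints [4, 1]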

Then, in the forward method of my network, I call pre_forward_normalize(*layer) every time before using that specific layer's forward function:

void pre_forward_normalize(torch::nn::Module & module)
{
	auto v = *(module.named_parameters().find("weight")); //this is the part that gets normalized and stays between -1.0 and 1.0
	auto g = *(module.named_parameters().find("weight_g")); //this stays constant throughout the learning process
	module.named_parameters().find("weight")->data() = v * (g / _norm(v));
}
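
Inside the XorNet sketch from above, the forward method would then look roughly like this:

torch::Tensor forward(torch::Tensor x)
{
	//recompute fc1's "weight" from the stored "weight" (direction) and "weight_g" (magnitude)
	pre_forward_normalize(*fc1);
	x = torch::sigmoid(fc1->forward(x));
	x = torch::sigmoid(fc2->forward(x));
	return x;
}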

I tested it with a simple network of two linear layers on the XOR dataset, and it seems to be working, since the “weight” parameters of the layers are still changing and the loss is decreasing (though the network’s performance gets worse, especially when applying the normalization to both layers).
I also noticed that the “weight” values always stay between -1.0 and 1.0 for the normalized layers, while the “weight_g” values do not change during the training process.

I hope that somebody can look at my code and tell me if there’s something wrong here, because I’m not sure if I can rely on this implementation or not. Any help or suggestions are welcome.

With kind regards, Florian Korotschenko

So I compared the result to the same network implemented in Python, with the same seed and everything. It turns out the resulting weights are not the same between this naive C++ implementation and PyTorch. I guess there’s more happening inside PyTorch than I thought, especially in the backward pass.
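
My guess (not verified against the PyTorch internals) is that writing the normalized weight into the parameter's .data() detaches the computation from autograd, so the backward pass never goes through g and the norm, whereas the Python pre-forward hook rebuilds the weight as a differentiable expression of weight_g and weight_v. A small sketch of the difference in gradient flow:

//differentiable reparameterization: gradients reach both v and g
auto v = torch::randn({ 3, 2 }, torch::requires_grad());
auto g = torch::randn({ 3, 1 }, torch::requires_grad());
auto w = v * (g / v.norm(2, 1, /*keepdim=*/true));
w.sum().backward();
std::cout << g.grad() << std::endl; //defined, because the backward pass went through the norm
//by contrast, assigning v * (g / _norm(v)) into an existing parameter's .data() makes the
//optimizer update "weight" directly, and g never receives a gradient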

Has somebody figured out how to implement weight_norm? Or is it best to just wait for official support in upcoming LibTorch releases?

Wrapping the desired layer into a custom module solves the issue here. Using _norm() from above, you can implement a module that combines, for example, a linear layer with weight normalization this way:

struct Linear_WN_Impl : torch::nn::Module
{
	torch::Tensor v;
	torch::Tensor g;
	torch::Tensor bias;
	torch::nn::LinearOptions _options;	//Holds additional layer information (in_features, out_features, bias)

	Linear_WN_Impl(torch::nn::LinearOptions options) : _options(options)
	{
		torch::manual_seed(187); //Only for repeatability and comparison
		//Initialize v, g and bias
		auto default_layer = torch::nn::Linear(_options);
		v = default_layer->weight.data();
		g = _norm(v);
		bias = default_layer->bias.data();
		//Register as Parameters, make trainable
		v = this->register_parameter("v", v);
		g = this->register_parameter("g", g);
		bias = this->register_parameter("bias", bias);
	}
	torch::Tensor forward(torch::Tensor & input)
	{
		//Forward Pass, use v and g to form normalized weight
		return torch::linear(input, v * (g / _norm(v)), bias);
	}
};
TORCH_MODULE(Linear_WN_);

Then, you can use the module like this, for example:

struct Net : torch::nn::Module
{
	Linear_WN_ fc1{ nullptr };
	torch::nn::Linear fc2{ nullptr };

	Net()
	{
		fc1 = register_module("fc1", Linear_WN_(torch::nn::LinearOptions(2, 4)));
		fc2 = register_module("fc2", torch::nn::Linear(4, 1));
	}

	torch::Tensor forward(torch::Tensor x) 
	{
		auto y = torch::sigmoid(fc1->forward(x));
		y = torch::sigmoid(fc2->forward(y));
		return y;
	}
};
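
For completeness, here is a rough sketch of how such a network can be trained on the XOR data (the optimizer, loss, learning rate and iteration count are only illustrative, not necessarily what I used for the comparison):

auto net = std::make_shared<Net>();
//the four XOR input/target pairs
auto inputs = torch::tensor({ 0.f, 0.f, 0.f, 1.f, 1.f, 0.f, 1.f, 1.f }).view({ 4, 2 });
auto targets = torch::tensor({ 0.f, 1.f, 1.f, 0.f }).view({ 4, 1 });
torch::optim::SGD optimizer(net->parameters(), /*lr=*/0.1);

for (int epoch = 0; epoch < 1000; ++epoch)
{
	optimizer.zero_grad();
	auto prediction = net->forward(inputs);
	auto loss = torch::binary_cross_entropy(prediction, targets);
	loss.backward();
	optimizer.step();
}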

I tested it and compared it against equivalent PyTorch code with the built-in weight_norm applied to the first layer, checking the printed weights of both programs before and after the training loop: the resulting weights are the same.