LIBTORCH/C++ How to initialize weights (Xavier) in a Sequential module with Apply function?

I tried implementing the approach described here: https://stackoverflow.com/questions/49433936/how-to-initialize-weights-in-pytorch

i.e. creating a function that can be passed to apply() so that it runs on every module that is part of the sequential net. Here is the code:

=============================================

void Init_Weights(torch::nn::Module& m)
{
	if (typeid(m) == typeid(torch::nn::Linear))
	{
		torch::nn::init::xavier_normal_(m->weight);
		torch::nn::init::constant_(m->bias, 0.01);
	}
}

int main() {

	torch::nn::Sequential XORModel(
		torch::nn::Linear(2, 3),
		torch::nn::Functional(torch::tanh),
		torch::nn::Linear(3, 1),
		torch::nn::Functional(torch::sigmoid));

	XORModel->apply(Init_Weights);

}

=================================================
The problem is that the code will not compile. Torch requires that Init_Weights take `torch::nn::Module& m` as its parameter, but then `m->weight` cannot be resolved, because the base Module type has no `weight` member.
If I change the definition of Init_Weights so that its parameter is `torch::nn::Linear& m`, then Init_Weights can no longer be passed to apply().
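
For reference, the apply() overload in question takes its callback over the base class, roughly like this (paraphrased from torch/nn/module.h):

using ModuleApplyFunction = std::function<void(Module&)>;
void apply(const ModuleApplyFunction& function);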

Is there a way to initialize (Xavier normal) all the weights of the Linear modules that are part of a Sequential module?

Just to update - I found a solution. See code below:

void Init_Weights(torch::nn::Module& m)
{
	// The dynamic type of a Linear layer inside a Sequential is LinearImpl
	// (torch::nn::Linear is only the module holder).
	if ((typeid(m) == typeid(torch::nn::LinearImpl)) || (typeid(m) == typeid(torch::nn::Linear))) {
		auto p = m.named_parameters(false);
		auto w = p.find("weight");
		auto b = p.find("bias");

		if (w != nullptr) torch::nn::init::xavier_uniform_(*w);
		if (b != nullptr) torch::nn::init::constant_(*b, 0.01);
	}
}
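
With that in place, wiring it up to the Sequential model from the original post is just a call to apply (sketch, assuming the same XORModel as above):

torch::nn::Sequential XORModel(
	torch::nn::Linear(2, 3),
	torch::nn::Functional(torch::tanh),
	torch::nn::Linear(3, 1),
	torch::nn::Functional(torch::sigmoid));

// apply() walks the whole module tree, so every LinearImpl submodule is visited.
XORModel->apply(Init_Weights);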

An even better solution is in the manual: https://pytorch.org/cppdocs/api/classtorch_1_1nn_1_1_module.html#exhale-class-classtorch-1-1nn-1-1-module

template <typename ModuleType, typename = torch::detail::disable_if_module_holder_t<ModuleType>>
ModuleType* as()

Attempts to cast this Module to the given ModuleType.

This method is useful when calling apply().

void initialize_weights(nn::Module& module) {
	torch::NoGradGuard no_grad;
	if (auto* linear = module.as<nn::Linear>()) {
		linear->weight.normal_(0.0, 0.02);
	}
}
MyModule module;
module.apply(initialize_weights);

I am not sure I follow this solution. Could you kindly elaborate on how you achieved the initialization?

Thanks!

In case anyone runs across this particular question again, the following should work as a simple solution:

void xavier_init(torch::nn::Module& module) {
	torch::NoGradGuard noGrad;
	if (auto* linear = module.as<torch::nn::Linear>()) {
		torch::nn::init::xavier_normal_(linear->weight);
		torch::nn::init::constant_(linear->bias, 0.01);
	}
}

Then for any Linear module or module with Linear submodules, you can just initialize via module->apply(xavier_init). I think this is basically what @cheggars was suggesting with their second response.
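
And to illustrate the "module with Linear submodules" case, here is a minimal sketch that reuses the xavier_init function above (the Net/NetImpl names are just placeholders for your own module):

#include <torch/torch.h>

struct NetImpl : torch::nn::Module {
	NetImpl() {
		fc1 = register_module("fc1", torch::nn::Linear(2, 3));
		fc2 = register_module("fc2", torch::nn::Linear(3, 1));
	}
	torch::Tensor forward(torch::Tensor x) {
		return torch::sigmoid(fc2->forward(torch::tanh(fc1->forward(x))));
	}
	torch::nn::Linear fc1{nullptr}, fc2{nullptr};
};
TORCH_MODULE(Net);

int main() {
	Net net;
	// apply() visits NetImpl itself plus fc1 and fc2; only the Linear layers
	// pass the as<torch::nn::Linear>() check inside xavier_init.
	net->apply(xavier_init);
}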

What does the torch::NoGradGuard noGrad; line do?

It’s the C++ equivalent of PyTorch’s `with torch.no_grad():`. There is more discussion of layer initialization and grad guards in this thread: Initialization in-place and `tensor.data`, and it’s also explained in the documentation here: Autograd mechanics — PyTorch 2.1 documentation

The implementations in torch.nn.init also rely on no-grad mode when initializing the parameters, so as to avoid autograd tracking when updating the initialized parameters in-place.
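
A minimal sketch of the effect (the standalone tensor w is just for illustration; a module parameter behaves the same way):

#include <torch/torch.h>

int main() {
	// A leaf tensor that requires grad, just like a module parameter.
	auto w = torch::empty({3, 2}, torch::requires_grad());

	{
		torch::NoGradGuard no_grad;  // C++ counterpart of `with torch.no_grad():`
		w.normal_(0.0, 0.02);        // in-place update is fine here: autograd is not recording
	}
	// Outside the guard, the same in-place call on a leaf tensor that requires grad
	// would throw ("a leaf Variable that requires grad is being used in an in-place operation").
}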