Hello guys, I’m now training an DDPG model with libtorch c++ api
before the network training, I’ve called
int seed = SEED;
srand(seed);
torch::manual_seed(SEED)
in the main function,
but I still get different result every time.
Here are my network structures:
Critic:
net = register_module("Sequential", nn::Sequential(
nn::Linear(ops.state_dim + ops.action_dim + ops.max_num_contour, 32),
nn::ReLU(),
nn::Linear(32, 64),
nn::ReLU(),
nn::Linear(64, 32),
nn::ReLU(),
nn::Linear(32, 1)
));
Actor:
net = register_module("Sequential",
nn::Sequential(
nn::Linear(ops.state_dim + ops.max_num_contour, 64),
nn::ReLU(),
nn::Linear(64, 32),
nn::ReLU(),
nn::Linear(32, (ops.action_range.second - 1) + 2)
)
);
Optimizers:
optimizer_q = std::make_shared<torch::optim::Adam>(net->q->parameters(), lr_q);
optimizer_pi = std::make_shared<torch::optim::Adam>(net->pi->parameters(), lr_pi);
Weight Initializations
{
torch::NoGradGuard no_grad;
// weight init
auto initialize_weights_norm = [](nn::Module& module) {
torch::NoGradGuard no_grad;
if (auto* linear = module.as<nn::Linear>()) {
torch::nn::init::kaiming_uniform_(linear->weight);
torch::nn::init::constant_(linear->bias, 0.01);
}
};
this->apply(initialize_weights_norm);
}
And the c-library rand() is used for exploration.
Is it possible that there are other function or algorithm that has indeterministic behaviors that I didn’t notice?