Expected more than 1 value per channel when training

I tested on LibTorch version 1.12.0.
I am converting code from Python to C++.
I have a pre-trained model in Python: I save its state_dict to a .pt file, load the weights into the C++ model, and run inference in C++.

// model.h
#pragma once
#include <torch/torch.h>

class Model : public torch::nn::Module {
public:
    explicit Model(int64_t k = 64);
    torch::Tensor forward(torch::Tensor x);

private:
    int64_t k;

    torch::nn::Conv1d conv1 = torch::nn::Conv1d(torch::nn::Conv1dOptions(k, 64, 1));
    torch::nn::Conv1d conv2 = torch::nn::Conv1d(torch::nn::Conv1dOptions(64, 128, 1));
    torch::nn::Conv1d conv3 = torch::nn::Conv1d(torch::nn::Conv1dOptions(128, 512, 1));
    torch::nn::Linear fc1 = torch::nn::Linear(torch::nn::LinearOptions(512, 256));
    torch::nn::Linear fc2 = torch::nn::Linear(256, 128);
    torch::nn::Linear fc3 = torch::nn::Linear(128, k * k);
    torch::nn::ReLU relu = torch::nn::ReLU();

    torch::nn::BatchNorm1d bn1 = torch::nn::BatchNorm1d(torch::nn::BatchNorm1dOptions(64).track_running_stats(false));
    torch::nn::BatchNorm1d bn2 = torch::nn::BatchNorm1d(torch::nn::BatchNorm1dOptions(128).track_running_stats(false));
    torch::nn::BatchNorm1d bn3 = torch::nn::BatchNorm1d(torch::nn::BatchNorm1dOptions(512).track_running_stats(false));
    torch::nn::BatchNorm1d bn4 = torch::nn::BatchNorm1d(torch::nn::BatchNorm1dOptions(256).track_running_stats(false));
    torch::nn::BatchNorm1d bn5 = torch::nn::BatchNorm1d(torch::nn::BatchNorm1dOptions(128).track_running_stats(false));
};
// model.cpp
Model::Model(int64_t k) : k(k)
{
    register_module("conv1", conv1);
    register_module("conv2", conv2);
    register_module("conv3", conv3);
    register_module("fc1", fc1);
    register_module("fc2", fc2);
    register_module("fc3", fc3);
    register_module("relu", relu);
    register_module("bn1", bn1);
    register_module("bn2", bn2);
    register_module("bn3", bn3);
    register_module("bn4", bn4);
    register_module("bn5", bn5);
}
torch::Tensor Model::forward(torch::Tensor x)
{
    x = conv1->forward(x);
    x = bn1->forward(x);
    x = torch::nn::functional::relu(x);
    x = conv2->forward(x);
    x = bn2->forward(x);
    x = torch::nn::functional::relu(x);
    x = conv3->forward(x);
    x = bn3->forward(x);                            // Matched Python result...
    x = torch::nn::functional::relu(x);
    x = std::get<0>(x.max(2, true));
    x = x.view({ -1, 512 });
    x = fc1->forward(x);
    x = bn4->forward(x);                          // Error : Expected more than 1 value per channel when training
    x = relu->forward(x);
    x = fc2->forward(x);
    x = bn5->forward(x);
    x = relu->forward(x);
    x = fc3->forward(x);

    return x;
}
// main.cpp
#include <torch/script.h>
#include <iostream>
#include "model.h"

auto container = torch::jit::load("python_model.pt");

try {
	auto model_ = std::make_shared<Model >(64);
	for (auto p : model_->named_parameters().keys())
		model_->named_parameters()[p].data() = container.attr(p).toTensor();
	model_->eval();
	auto input = torch::randn({ 1, 64, 9999 });
	auto output = model_->forward(input);
}
catch (const c10::Error& e)
{
	std::cout << e.msg() << std::endl;
}

With track_running_stats(true), no error occurs, but the output does not match the Python result.
I compared the results of each layer in Python and C++…
With track_running_stats(false), the error occurs in bn4; for the layers before bn4 the results match Python.
So I want to use track_running_stats(false) in bn4.
Is there a way to solve this problem?
Or please give me another idea…

Double post from here.
As already described, you need to provide more than a single element to batchnorm layers in training mode so that stats can be calculated. Calling eval() was a suggested workaround, which might work if you are using a pretrained model, since the already trained running stats are used.
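Note that for the trained running stats to actually be used on the C++ side, they also have to be copied over: the loop in main.cpp only transfers named_parameters(), while running_mean and running_var are buffers. A minimal sketch of the extra step, assuming the saved container also exposes the buffer tensors under the same dotted names as the Python state_dict:

// sketch only: in addition to the parameter loop, also copy the BatchNorm
// buffers (running_mean, running_var, num_batches_tracked) into the C++ model
for (auto b : model_->named_buffers().keys())
	model_->named_buffers()[b].data() = container.attr(b).toTensor();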

Hmm, I don’t use training mode in C++; I only consider evaluation mode.
I would like to evaluate one sample in C++ using a pre-trained model made in Python.

So I changed input_1 [1, 15, 9999] → input_2 [2, 15, 9999] using the expand method (x.expand(2, -1, -1)).
And it worked…!
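For reference, a minimal sketch of that workaround (shapes taken from the post, variable names are just illustrative):

// sketch of the expand workaround: duplicate the single sample along the batch
// dimension so BatchNorm can compute per-channel statistics, then keep one row
auto input   = torch::randn({ 1, 15, 9999 });
auto batched = input.expand({ 2, -1, -1 });   // [2, 15, 9999], no data copy
auto output  = model_->forward(batched);
auto result  = output[0];                     // row belonging to the original sample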
I have question…
If the same batch norm layer is passed, are result_1 [1, 256] and result_2 [2, 256] the same normalized result in bn4?
I mean, would result_1 and result_2[0] be the same in the process above?

No, these results won’t be the same since you are disabling the running stats. The batchnorm layers will then even in eval() mode use the batch statistics to normalize the activation and the result will thus differ (if the calculation is even possible, which might not be the case as seen in your example).
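A minimal standalone sketch (not from the original thread) that illustrates this behavior of track_running_stats(false):

// bn_stats_demo.cpp — with track_running_stats(false), BatchNorm1d normalizes
// with the statistics of the current batch even after eval() has been called
#include <torch/torch.h>
#include <iostream>

int main() {
    torch::nn::BatchNorm1d bn(torch::nn::BatchNorm1dOptions(4).track_running_stats(false));
    bn->eval();

    auto single = torch::randn({ 1, 4 });
    auto pair = single.expand({ 2, 4 });   // two identical copies of the same sample

    try {
        bn->forward(single);               // batch of 1: per-channel variance undefined
    } catch (const c10::Error& e) {
        std::cout << e.msg() << std::endl; // "Expected more than 1 value per channel when training"
    }

    // With two identical copies the batch variance is zero, so the normalized
    // activation collapses towards the bias term and will not match a result
    // computed with trained running statistics.
    std::cout << bn->forward(pair) << std::endl;
    return 0;
}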

Thank you for your reply.
I’ll try more !