LibTorch C++ possible problem in torch::data::transforms::Normalize<>

Hi, I recently started using the C++ API and need to standardize my tabular data, similar to Python’s “sklearn.preprocessing.StandardScaler”. I did figure out the “map” utility that takes a “torch::data::transforms” object, and at least on paper torch::data::transforms::Normalize<> seems to be what I want. But applying it gives me unwanted dimensions, which seem to arise from the two extra unsqueeze operations I found in the implementation here:

template <typename Target = Tensor>
struct Normalize : public TensorTransform<Target> {
  Normalize(ArrayRef<double> mean, ArrayRef<double> stddev)
      : mean(torch::tensor(mean, torch::kFloat32)
                 .unsqueeze(/*dim=*/1)
                 .unsqueeze(/*dim=*/2)),
        stddev(torch::tensor(stddev, torch::kFloat32)
                   .unsqueeze(/*dim=*/1)
                   .unsqueeze(/*dim=*/2)) {}

  torch::Tensor operator()(Tensor input) override {
    return input.sub(mean).div(stddev);
  }

  torch::Tensor mean, stddev;
};

I assume this is because it expects scalars for “mean” and “stddev” that are meant to be applied to an image? Is this intentional or a bug? In its present form, using this for tabular data with 1-D rows becomes quite convoluted. Would it be possible to change it so that it accepts plain torch::Tensor types for mean and stddev, with each element of the tensor corresponding to the mean and stddev of a feature column, and the onus of matching dimensions left to the user?
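
To illustrate the unwanted dimensions, here is a minimal sketch (arbitrary values, assuming <torch/torch.h> and <iostream> are included) of what the two unsqueeze calls do to a single 1-D row of four features:

auto row  = torch::rand({4});  // one tabular example with 4 features
auto mean = torch::tensor({0.1, 0.2, 0.3, 0.4}, torch::kFloat32)
                .unsqueeze(/*dim=*/1)
                .unsqueeze(/*dim=*/2);       // shape [4, 1, 1], as in Normalize<>
std::cout << row.sub(mean).sizes() << '\n';  // broadcasts to [4, 1, 4] instead of [4]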

I believe your guess is correct and the expected use case was images. Note that this transformation was added ~6 years ago in this PR and I don’t know if the C++ API is still actively maintained, so you might want to implement a custom transformation without the unsqueeze calls.

Thanks @ptrblck for the reply. For anyone looking for something similar, this is the sort of ad-hoc solution I use for my particular use case:

struct Normalize2 : public torch::data::transforms::TensorTransform<torch::Tensor> {
  Normalize2(std::vector<double> means, std::vector<double> stddevs)
      : mean(torch::tensor(means)), stddev(torch::tensor(stddevs)) {}

  torch::Tensor operator()(torch::Tensor input) override {
    // mean and stddev hold one value per feature column, so plain broadcasting
    // standardizes each column without any extra unsqueeze calls.
    return input.sub(mean).div(stddev);
  }

  torch::Tensor mean, stddev;
};
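
And a minimal usage sketch, in case it helps (the dataset name MyTabularDataset below is just a placeholder):

// Standardize a single 1-D row directly:
Normalize2 norm({0.0, 10.0}, {1.0, 2.0});
auto standardized = norm(torch::tensor({1.0, 12.0}));  // per feature: (x - mean) / stddev -> [1.0, 1.0]

// Or compose with a dataset via map, like the built-in transforms:
// auto dataset = MyTabularDataset(/*...*/)
//                    .map(Normalize2(means, stddevs))
//                    .map(torch::data::transforms::Stack<>());

One thing to keep in mind: torch::tensor(std::vector<double>) produces a kDouble tensor, so float32 inputs get promoted to double; pass torch::kFloat32 (as the built-in Normalize<> does) if you want to stay in float32.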

I was hoping someone maintaining libtorch could implement a more generalized version of this, but I guess that won’t be the case now :sweat_smile:.

Btw, maybe a newbie question, but what C++ ML library would you recommend instead of libtorch (since I get the vibe that it isn’t maintained and the documentation is very sparse)? For more context, I am a physics grad student who was using TensorFlow in Python until a week ago. I want to migrate to C++ simply because I like it better than Python.

This is an interesting take, as usually it’s the other way around. :wink:
I honestly don’t know which framework I would recommend and would hope the libtorch C++ API will be maintained (even if not further developed). @albanD might have more insight here.

Yeah, I would probably have felt the same if our community didn’t use the ROOT C++ data analysis framework, which is optimized for fast and efficient analysis of millions of collider events and already has us writing almost all of our analyses in C++. I use Python almost exclusively for TensorFlow.

Hey!

I think it depends a little bit on what you are looking for.
libtorch will most likely be your best bet for very wide coverage and good performance. You will give up the ML compiler tools like torch.compile and the cool Python-level extension points, though, and the nn/optim/distributed APIs there are not really worked on either. But fundamental components like kernels, autograd, etc. that are used from Python will definitely continue to work as expected.
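
For example, a minimal autograd snippet like this (just a sketch, assuming <torch/torch.h> and <iostream> are included) runs fine through the C++ frontend:

auto x = torch::ones({2, 2}, torch::requires_grad());
auto y = (x * x + 3 * x).sum();  // y = sum(x^2 + 3x)
y.backward();
std::cout << x.grad() << '\n';   // dy/dx = 2x + 3 = 5 everywhere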

If you’re looking for a more C++-centric and more research-y framework, GitHub - flashlight/flashlight: A C++ standalone library for machine learning is an actively developed one that I know of. There might be others as well, but I don’t know off the top of my head.
