Is it possible to script this module?

Hello everyone, hope you are all having a great day.
I was wondering if it's possible to convert the following module into TorchScript:

from itertools import product
from math import ceil

import torch

class PriorBox(torch.nn.Module):
    def __init__(self):
        super(PriorBox, self).__init__()
    # @torch.jit.script
    def forward(self, minSizes, steps, clip, image_size):
        anchors = []
        feature_maps = [[ceil(image_size[0]/step), ceil(image_size[1]/step)] for step in steps]

        for k, f in enumerate(feature_maps):
            min_sizes = minSizes[k]
            for i, j in product(range(f[0]), range(f[1])):
                for min_size in min_sizes:
                    s_kx = min_size / image_size[1]
                    s_ky = min_size / image_size[0]
                    dense_cx = [x * steps[k] / image_size[1] for x in [j + 0.5]]
                    dense_cy = [y * steps[k] / image_size[0] for y in [i + 0.5]]
                    for cy, cx in product(dense_cy, dense_cx):
                        anchors += [cx, cy, s_kx, s_ky]

        # back to torch land
        output = torch.Tensor(anchors).view(-1, 4)
        if clip:
            output.clamp_(max=1, min=0)
        return output
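For reference, here is a sketch of a scriptable variant (my own rewrite, assuming integer min sizes; untested against the original training code). `itertools.product` is replaced by nested `range` loops, and the container arguments get explicit type annotations, since TorchScript assumes untyped arguments are Tensors:

```python
import math
from typing import List

import torch

class PriorBoxScriptable(torch.nn.Module):
    def forward(self, min_sizes: List[List[int]], steps: List[int],
                clip: bool, image_size: List[int]) -> torch.Tensor:
        anchors: List[float] = []
        for k in range(len(steps)):
            step = steps[k]
            # feature-map size for this stride
            f0 = int(math.ceil(image_size[0] / step))
            f1 = int(math.ceil(image_size[1] / step))
            # nested loops instead of itertools.product, which TorchScript rejects
            for i in range(f0):
                for j in range(f1):
                    for min_size in min_sizes[k]:
                        anchors.append((j + 0.5) * step / image_size[1])  # cx
                        anchors.append((i + 0.5) * step / image_size[0])  # cy
                        anchors.append(min_size / image_size[1])          # s_kx
                        anchors.append(min_size / image_size[0])          # s_ky
        output = torch.tensor(anchors).view(-1, 4)
        if clip:
            output = output.clamp(0.0, 1.0)
        return output

scripted = torch.jit.script(PriorBoxScriptable())
```

The single-element `dense_cx`/`dense_cy` comprehensions and the inner `product` collapse to direct appends here, since each cell only ever produces one (cx, cy) pair.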

I have reimplemented this in libtorch, but it's slow as a snail!
Here it is in case it's needed:

struct PriorBox : torch::nn::Module {
    PriorBox(const std::vector<std::pair<torch::Tensor, torch::Tensor>>& min_sizes,
             const std::vector<int>& steps, bool clip,
             const std::pair<int, int>& image_size,
             const std::string& phase = "train") {
        this->min_sizes = min_sizes;
        this->steps = steps;
        this->clip = clip;
        this->image_size = image_size;
        this->phase = phase;
        // precompute the feature-map sizes once
        for (auto& step : this->steps) {
            auto height = torch::tensor(float(this->image_size.first) / step);
            auto width = torch::tensor(float(this->image_size.second) / step);
            this->feature_maps.emplace_back(std::make_pair(torch::ceil(height), torch::ceil(width)));
        }
    }

    torch::Tensor forward() {
        std::vector<torch::Tensor> anchors;
        int i = -1;
        for (auto& fmap : this->feature_maps) {
            auto min_sizes = this->min_sizes[++i];
            auto result = torch::cartesian_prod(
                { torch::arange(c10::Scalar(0), c10::Scalar(fmap.first.item().toInt()), c10::Scalar(1)),
                  torch::arange(c10::Scalar(0), c10::Scalar(fmap.second.item().toInt()), c10::Scalar(1)) });
            for (int idx = 0; idx < result.sizes()[0]; idx++) {
                // takes around 0.006 ms
                auto i_ = result[idx][0];
                auto j_ = result[idx][1];
                // takes around 0.20 ms to 0.30 ms
                for (auto& min_size : { min_sizes.first, min_sizes.second }) {
                    auto s_kx = min_size / float(this->image_size.second);
                    auto s_ky = min_size / float(this->image_size.first);
                    // takes around 0.037 ms
                    torch::Tensor dense_cx = (j_ + 0.5) * this->steps[i] / this->image_size.second;
                    torch::Tensor dense_cy = (i_ + 0.5) * this->steps[i] / this->image_size.first;
                    // takes around 0.02 ms
                    auto result_cy_cx = torch::cartesian_prod({ dense_cy.unsqueeze(0), dense_cx.unsqueeze(0) });
                    // this takes around 0.010 ms
                    for (int l = 0; l < result_cy_cx.sizes()[0]; l++) {
                        auto cy = result_cy_cx[l][0].unsqueeze(0);
                        auto cx = result_cy_cx[l][1].unsqueeze(0);
                        anchors.emplace_back(torch::cat({ cx, cy, s_kx, s_ky }));
                    }
                }
            }
        }
        // takes around 5 ms!
        auto output = torch::stack(anchors).view({ -1, 4 });
        // takes around 0 ms!
        if (this->clip)
            output.clamp_(0, 1);
        return output;
    }

    std::vector<std::pair<torch::Tensor, torch::Tensor>> min_sizes;
    std::vector<int> steps;
    bool clip;
    std::pair<int, int> image_size;
    std::string phase;
    std::string name;
    std::vector<std::pair<torch::Tensor, torch::Tensor>> feature_maps;
};

So I thought converting this into TorchScript and loading it in C++ might be a better idea.
Since we are dealing with loops, etc., we can't simply JIT-trace the module, so I guess I'm stuck with scripting.
When I tried scripting, I kept getting different errors, such as `range cannot be used as a value`:

range cannot be used as a value:
  File "P:\ligen\layers\functions\", line 19
        for k, f in enumerate(feature_maps):
            min_sizes = minSizes[k]
            for i, j in product(range(int(f[0])), range(int(f[1]))):
                                ~~~~~~~~~~~~~~ <--- HERE
                for min_size in min_sizes:
                    s_kx = min_size / image_size[1]

So my question is: considering this, is this module portable to TorchScript? If so, what am I missing here, and how should I go about it?
If not, what are my other options?
Thanks a lot in advance.

In general, scripting requires you to modify your input program so that it only uses the features that torch.jit.script supports. Take a look at the TorchScript Language Reference for more information.
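For example (a minimal sketch of my own, not from your module): `torch.jit.script` assumes every unannotated argument is a Tensor, so container arguments need explicit type annotations before the loops over them will compile:

```python
from typing import List

import torch

# Without the List[int] annotation, TorchScript treats `sizes` as a Tensor,
# and the declared int return type would no longer match what the loop builds.
@torch.jit.script
def sum_sizes(sizes: List[int]) -> int:
    total = 0
    for k in range(len(sizes)):
        total += sizes[k]
    return total
```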

Thanks, I appreciate your kind help, but I already know about that.
The issue is that I'm not sure how to go about it: how can I substitute range? I can't find it in the supported list, and I'm at a complete loss for ideas on how to proceed.

TorchScript allows range. I think it's just your product(range, range) that's the problem. Maybe just nest the for loops.

Thanks a lot.
I did try range outside of the product, but it nevertheless failed with the same error.
I also substituted product with PyTorch's built-in cartesian_prod, if I remember correctly, but that did not seem to work either.

TorchScript code won't be much faster; TorchScript doesn't speed up your code by a factor of 2.
If you reimplemented it in libtorch and the code is slow, it will also be slow with TorchScript.

Finally, I think you could avoid some of the list comprehensions, such as those for dense_cx and dense_cy, which only iterate over single-element lists.

Thanks a lot. I already stripped out much of the libtorch machinery, resorted back to plain old for loops, and got around a 3x speed-up!
Having that part JIT-scripted should, at least in theory, help performance, as I believed (and expected) the instructions would also be optimized by the engine; at least that is what I hoped for.
I guess I'll hopefully be able to extract a bit more performance out of it (making it around 4x faster), but that is still nowhere near the performance I get in Python!
I'm not sure whether something is wrong with libtorch's cartesian_prod implementation that makes it take so much longer than its Python counterpart (itertools.product), or whether I'm missing something else completely. I know the implementation posted above is not efficient by any means, but I really didn't expect it to be this slow!
To give you an impression, the Python code runs in around 20 to 30 ms, while this takes around 350 to 450 ms!
(After my last changes, it now runs at 70 to 100 ms, but that is still way slower than Python.)
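For what it's worth, the per-anchor tensor operations are the likely bottleneck: every `result[idx][0]`, `unsqueeze`, and `cat` in the inner loops allocates a small tensor and pays dispatch overhead, thousands of times, whereas the Python version loops over plain floats. Here is a sketch of a batched rewrite (my own, the name `prior_box_vectorized` is made up) that does one set of tensor ops per feature map instead of per anchor; the same idea translates directly to libtorch:

```python
import math

import torch

def prior_box_vectorized(min_sizes, steps, clip, image_size):
    per_map = []
    for k, step in enumerate(steps):
        f0 = math.ceil(image_size[0] / step)
        f1 = math.ceil(image_size[1] / step)
        # all (i, j) cell indices at once, shape (f0 * f1, 2), in i-major order
        grid = torch.cartesian_prod(torch.arange(f0, dtype=torch.float32),
                                    torch.arange(f1, dtype=torch.float32))
        cx = (grid[:, 1] + 0.5) * step / image_size[1]
        cy = (grid[:, 0] + 0.5) * step / image_size[0]
        centres = torch.stack([cx, cy], dim=1)              # (N, 2)
        # one (s_kx, s_ky) row per min_size
        sizes = torch.tensor([[m / image_size[1], m / image_size[0]]
                              for m in min_sizes[k]], dtype=torch.float32)
        n, s = centres.size(0), sizes.size(0)
        # pair every cell with every min_size, preserving the original loop order
        per_map.append(torch.cat([centres.repeat_interleave(s, dim=0),
                                  sizes.repeat(n, 1)], dim=1))
    output = torch.cat(per_map, dim=0)
    if clip:
        output = output.clamp(0.0, 1.0)
    return output
```

With this shape of code there are no Python-level inner loops left, so whether it runs eagerly, scripted, or ported to C++ should make little difference to its speed.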