Why does the C++ frontend use so much memory?

Hi, I wrote a function that encodes text IDs with ALBERT, but it uses far more memory than the Python version. For 1.6k sentences it needs more than 60 GB of memory, even though every sentence is shorter than 10 words. Here is my code:

#include <torch/script.h>
#include <cassert>
#include <vector>

torch::Tensor encode(torch::Tensor texts, torch::Tensor attention_masks, torch::jit::script::Module model) {
    assert(texts.size(0) == attention_masks.size(0) && texts.size(1) == attention_masks.size(1));

    int n = texts.size(0);
    int _batch_size = 128;
    int nums = n / _batch_size;
    if (n % _batch_size != 0) {
        nums++;  // one extra, smaller batch for the remainder
    }

    // Split the inputs into batches along dim 0 (slice clamps `end` to the tensor size).
    std::vector<torch::jit::IValue> inputs_texts;
    std::vector<torch::jit::IValue> inputs_attention_masks;
    for (int i = 0; i < nums; i++) {
        int start = i * _batch_size;
        int end = start + _batch_size;
        inputs_texts.push_back(texts.slice(0, start, end));
        inputs_attention_masks.push_back(attention_masks.slice(0, start, end));
    }

    // Run the model batch by batch and concatenate the outputs.
    torch::Tensor rst;
    for (int i = 0; i < nums; i++) {
        std::vector<torch::jit::IValue> inputs;
        inputs.push_back(inputs_texts[i]);
        inputs.push_back(inputs_attention_masks[i]);
        torch::Tensor temp_rst = model.forward(inputs).toTensor();

        if (i == 0) {
            rst = temp_rst;
        } else {
            rst = torch::cat({rst, temp_rst}, 1);
        }
    }

    return rst;
}

The model is a TorchScript export of ALBERT small, which I modified to output the hidden state.

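Given that you're only doing inference, you can add torch::NoGradGuard guard; at the beginning of your function. This disables autograd for the rest of that scope, so the graph and the intermediate buffers it would keep for a backward pass are no longer stored.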
Does that help?