Forward method engages only 1 CPU core when written in C++, while it engages multiple cores when written in Python

kovanostra · June 7, 2020, 2:48pm

Hello

I have a custom PyTorch model whose forward method I moved from Python to C++ to speed it up. However, I notice that the pure Python version uses multiple cores during training, while the C++ version uses only one core.

Is there a setting that controls this behaviour which I need to add to my code, or is my C++ implementation simply inefficient?

Code for both Python and C++ versions can be seen below:

Model in Python:

github.com

kovanostra/message-passing-neural-network/blob/master/message_passing_nn/model/graph_rnn_encoder.py

from typing import List

import torch as to
import torch.nn as nn

from message_passing_nn.data.data_preprocessor import DataPreprocessor
from message_passing_nn.model.node import Node


class GraphRNNEncoder(nn.Module):
    def __init__(self,
                 time_steps: int,
                 number_of_nodes: int,
                 number_of_node_features: int,
                 fully_connected_layer_input_size: int,
                 fully_connected_layer_output_size: int,
                 device: str) -> None:
        super(GraphRNNEncoder, self).__init__()
        node_features_tensor_shape = [number_of_node_features, number_of_node_features]
        nodes_tensor_shape = [number_of_nodes, number_of_nodes]

This file has been truncated.

Forward/Backward methods in C++:

github.com

kovanostra/message-passing-neural-network/blob/v1.5.0/message_passing_nn/model/graph_rnn_encoder.cpp

#include <torch/extension.h>
#include <iostream>

std::vector<int> find_nonzero_elements(const torch::Tensor& tensor){
  std::vector<int> vector;
  for(int index=0; index<tensor.sizes()[0]; index++){
    if(tensor[index].item<int>()!=0) {
      vector.push_back(index);
    }
  }
  return vector;
}

int find_index_by_value(const std::vector<int>& vector, const int& value){
  auto vector_size = static_cast<int>(vector.size());
  for (int index = 0; index < vector_size; ++index) {
    if (vector[index]==value) return index;
  }
  return -1;
}

This file has been truncated.
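
For context on the single-core behaviour: a hand-written element-by-element loop such as find_nonzero_elements runs only on the thread that calls it, whereas PyTorch's built-in tensor operations can spread their work over an intra-op thread pool. The following is a minimal sketch, using only the C++ standard library, of what splitting such a scan across threads could look like. It uses a plain std::vector<int> as a stand-in for the tensor, and find_nonzero_serial, find_nonzero_parallel and num_threads are hypothetical names for illustration, not part of the repository or of PyTorch.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Serial scan: same idea as find_nonzero_elements, but over a plain
// std::vector<int> (a stand-in for the tensor). Runs on the calling thread only.
std::vector<int> find_nonzero_serial(const std::vector<int>& values) {
  std::vector<int> indices;
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (values[i] != 0) indices.push_back(static_cast<int>(i));
  }
  return indices;
}

// Hypothetical parallel variant: each thread scans one contiguous chunk and
// writes into its own bucket, so no locking is needed.
std::vector<int> find_nonzero_parallel(const std::vector<int>& values,
                                       unsigned num_threads) {
  num_threads = std::max(1u, num_threads);
  const std::size_t chunk = (values.size() + num_threads - 1) / num_threads;
  std::vector<std::vector<int>> buckets(num_threads);
  std::vector<std::thread> workers;

  for (unsigned t = 0; t < num_threads; ++t) {
    workers.emplace_back([&values, &buckets, chunk, t] {
      const std::size_t begin = std::min(values.size(), t * chunk);
      const std::size_t end = std::min(values.size(), begin + chunk);
      for (std::size_t i = begin; i < end; ++i) {
        if (values[i] != 0) buckets[t].push_back(static_cast<int>(i));
      }
    });
  }
  for (auto& worker : workers) worker.join();

  // Concatenate the buckets in thread order so the indices stay sorted.
  std::vector<int> indices;
  for (const auto& bucket : buckets) {
    indices.insert(indices.end(), bucket.begin(), bucket.end());
  }
  return indices;
}
```

Calling find_nonzero_parallel(values, 4) should return the same indices as find_nonzero_serial(values), just computed on up to four threads. Inside a PyTorch extension it may well be simpler to replace the loop with a built-in tensor operation such as torch::nonzero, which also avoids the per-element item<int>() calls; whether that helps in this model is an assumption that would need profiling.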