How is tensor data iterated in the `serial_for_each` function of TensorIterator.cpp?

void TensorIterator::serial_for_each(loop2d_t loop, Range range) const {
  if (range.size() == 0) {
    return;
  }
  auto strides = get_strides();
  while (strides.size() < 2 * ntensors()) {  // why is this multiplied by 2?
    strides.push_back(0);
  }

  auto base_ptrs = get_base_ptrs();
  if (ndim() <= 1) {
    auto ptrs = get_data_ptrs(base_ptrs, { range.begin });
    loop(ptrs.data(), strides.data(), range.size(), 1);
  } else {
    auto counter = DimCounter(shape_, range);
    while (!counter.is_done()) {
      auto ptrs = get_data_ptrs(base_ptrs, counter.values);
      auto step = counter.max_2d_step();
      loop(ptrs.data(), strides.data(), step[0], step[1]);
      counter.increment(step);
    }
  }
}
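
On the `2 * ntensors()` question in the comment: as far as I can tell, a `loop2d_t` callback reads two byte strides per operand, an inner one at `strides[i]` and an outer one at `strides[ntensors + i]`, so the vector is zero-padded whenever the iterator has fewer than two dimensions. Below is a minimal standalone sketch of that layout under this assumption. It is not the PyTorch source, and `copy_loop2d` and `copy_1d` are hypothetical names made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 2D loop body with a loop2d_t-like signature.
// data[i] is the start pointer of operand i; strides are in bytes.
//   strides[0 .. ntensors)            -> inner (dimension 0) stride per operand
//   strides[ntensors .. 2 * ntensors) -> outer (dimension 1) stride per operand
void copy_loop2d(char** data, const int64_t* strides, int64_t size0, int64_t size1) {
  const int ntensors = 2;  // operand 0 = output, operand 1 = input
  const int64_t* outer_strides = strides + ntensors;
  for (int64_t j = 0; j < size1; ++j) {
    char* out = data[0] + j * outer_strides[0];
    const char* in = data[1] + j * outer_strides[1];
    for (int64_t i = 0; i < size0; ++i) {
      *reinterpret_cast<float*>(out + i * strides[0]) =
          *reinterpret_cast<const float*>(in + i * strides[1]);
    }
  }
}

// Copies a 1-D float buffer by calling the 2D loop with size1 == 1,
// padding the stride vector with zeros in the same way as the snippet above.
std::vector<float> copy_1d(std::vector<float> in) {
  const std::size_t ntensors = 2;
  std::vector<float> out(in.size(), 0.0f);
  // One dimension, two operands: only the inner strides exist so far.
  std::vector<int64_t> strides = {static_cast<int64_t>(sizeof(float)),
                                  static_cast<int64_t>(sizeof(float))};
  while (strides.size() < 2 * ntensors) {
    strides.push_back(0);  // outer strides: never used, because size1 == 1
  }
  char* ptrs[2] = {reinterpret_cast<char*>(out.data()),
                   reinterpret_cast<char*>(in.data())};
  copy_loop2d(ptrs, strides.data(), static_cast<int64_t>(in.size()), 1);
  return out;
}
```

With this layout the loop body can always index `strides + ntensors` safely, even in the `ndim() <= 1` branch where the outer size is 1.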

I can't figure out how DimCounter works. How is the iteration carried out? I am really new to the PyTorch source code. Are there any resources that cover TensorIterator in detail? Thanks.
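
To make the question concrete, here is a simplified standalone sketch of how a counter like `DimCounter` could drive the `while` loop above. It is based on my reading of the snippet, not copied from PyTorch: `SketchDimCounter` and `walk` are made-up names, dimension 0 is assumed to be the fastest-moving one, `range` is assumed to be a half-open interval of linear element offsets, and all dimension sizes are assumed to be nonzero.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Simplified stand-in for DimCounter (a sketch under the assumptions stated above).
struct SketchDimCounter {
  std::vector<int64_t> shape;   // shape[0] is the fastest-moving dimension
  int64_t end;                  // one past the last linear offset to process
  int64_t offset;               // linear offset of the next element to process
  std::vector<int64_t> values;  // the same position as a per-dimension index

  SketchDimCounter(std::vector<int64_t> shape_in, int64_t begin, int64_t end_in)
      : shape(std::move(shape_in)), end(end_in), offset(begin), values(shape.size(), 0) {
    int64_t linear = begin;  // unravel `begin` into a multi-dimensional index
    for (std::size_t d = 0; d < shape.size(); ++d) {
      values[d] = linear % shape[d];
      linear /= shape[d];
    }
  }

  bool is_done() const { return offset >= end; }

  // Largest {size0, size1} block that can be handed to one 2D loop call.
  std::array<int64_t, 2> max_2d_step() const {
    int64_t step0 = std::min(shape[0] - values[0], end - offset);
    int64_t step1 = 1;
    if (step0 == shape[0] && shape.size() >= 2) {  // a full row fits, so try several rows
      step1 = std::min(shape[1] - values[1], (end - offset) / shape[0]);
    }
    return {step0, step1};
  }

  // Advance past the block just processed, carrying into higher dimensions.
  void increment(const std::array<int64_t, 2>& step) {
    offset += step[0] * step[1];
    // A multi-row block starts at values[0] == 0, so only dimension 1 and up change.
    std::size_t d = (step[1] != 1) ? 1 : 0;
    int64_t carry = (step[1] != 1) ? step[1] : step[0];
    for (; d < values.size() && carry > 0; ++d) {
      values[d] += carry;
      carry = 0;
      if (values[d] >= shape[d]) {
        values[d] -= shape[d];
        carry = 1;
      }
    }
  }
};

// Collects the {size0, size1} blocks visited for a given shape and range.
std::vector<std::array<int64_t, 2>> walk(std::vector<int64_t> shape, int64_t begin, int64_t end) {
  std::vector<std::array<int64_t, 2>> steps;
  SketchDimCounter counter(std::move(shape), begin, end);
  while (!counter.is_done()) {
    auto step = counter.max_2d_step();
    steps.push_back(step);
    counter.increment(step);
  }
  return steps;
}
```

For example, `walk({3, 4}, 0, 12)` yields a single block `{3, 4}`, while `walk({3, 4}, 1, 11)` yields `{2, 1}`, `{3, 2}`, `{2, 1}`: a partial first row, two full rows in one call, then a partial last row. Under these assumptions, each loop call covers at most a 2D slab of the two fastest dimensions, and the counter supplies the starting index for the next slab.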

No one seems interested in this. OK, I have found some articles on PyTorch internals that cover TensorIterator: