Question about THTensorDimApply

In the code of the macro TH_TENSOR_DIM_APPLY2(TYPE1, TENSOR1, TYPE2, TENSOR2, DIMENSION, CODE)

I am confused about the following part:

      if(TH_TENSOR_DIM_APPLY_counter[TH_TENSOR_DIM_APPLY_i] == TENSOR1->size[TH_TENSOR_DIM_APPLY_i]) \
      { \
        if(TH_TENSOR_DIM_APPLY_i == TENSOR1->nDimension-1) \
        { \
          TH_TENSOR_DIM_APPLY_hasFinished = 1; \
          break; \
        } \
        else \
        { \
          TENSOR1##_data -= TH_TENSOR_DIM_APPLY_counter[TH_TENSOR_DIM_APPLY_i]*TENSOR1->stride[TH_TENSOR_DIM_APPLY_i]; \
          TENSOR2##_data -= TH_TENSOR_DIM_APPLY_counter[TH_TENSOR_DIM_APPLY_i]*TENSOR2->stride[TH_TENSOR_DIM_APPLY_i]; \
          TH_TENSOR_DIM_APPLY_counter[TH_TENSOR_DIM_APPLY_i] = 0; \
        } \
      } \
      else \
        break; \

For example, take the tensor [[1,1,1;1,1,1],[1,1,1;1,1,1]] (2x3x2) and apply the THTensor_(cumsum) operation along dimension 1 (whose size is 3). After dimension 0 is finished, this code sets TH_TENSOR_DIM_APPLY_counter[TH_TENSOR_DIM_APPLY_i] = 0. Why?
And when we then step along dimension 2, does dimension 0 need to be run again? (That would increase the number of loop steps.)
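
To make the question concrete, here is a small standalone sketch of the carry logic as I currently read it. It is not the TH code: count_slices and MAX_DIMS are hypothetical names of my own, and the sketch only counts how many 1-D slices the loop would hand to CODE, without touching any data pointers.

```c
#define MAX_DIMS 8  /* hypothetical limit, chosen only for this sketch */

/* Count the 1-D slices along `dim` visited by an odometer-style walk.
 * Every dimension except `dim` has a counter; the first one is incremented,
 * and when it reaches its size it wraps to 0 and the next one is incremented
 * (a carry), in the same way as the macro excerpt above. */
long count_slices(const long *size, int ndim, int dim)
{
    long counter[MAX_DIMS] = {0};
    long visited = 0;
    int finished = 0;
    int i;

    if (ndim < 1 || ndim > MAX_DIMS)
        return -1;

    while (!finished) {
        visited++;                      /* CODE would run on one slice here */

        for (i = 0; i < ndim; i++) {
            if (i == dim) {             /* the apply dimension has no counter */
                if (i == ndim - 1) {
                    finished = 1;
                    break;
                }
                continue;
            }

            counter[i]++;
            if (counter[i] == size[i]) {
                if (i == ndim - 1) {    /* last dimension overflowed: done */
                    finished = 1;
                    break;
                }
                counter[i] = 0;         /* wrap, then carry into dimension i+1 */
            } else {
                break;                  /* no carry needed */
            }
        }
    }
    return visited;
}
```

If this reading is right, sizes {2, 3, 2} with dimension 1 give 2 x 2 = 4 slices, one per (dimension 0, dimension 2) index pair. That would mean the reset to 0 is just the carry step, and walking dimension 0 again for each index of dimension 2 is required rather than wasted work.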

Maybe I misunderstand something. Thank you in advance.

Extra: (can this code be changed as follows?)

      if(TH_TENSOR_DIM_APPLY_counter[TH_TENSOR_DIM_APPLY_i] > TENSOR1->size[TH_TENSOR_DIM_APPLY_i]) \
        continue; \
      TH_TENSOR_DIM_APPLY_counter[TH_TENSOR_DIM_APPLY_i]++; \
      TENSOR1##_data += TENSOR1->stride[TH_TENSOR_DIM_APPLY_i]; \
      TENSOR2##_data += TENSOR2->stride[TH_TENSOR_DIM_APPLY_i]; \
      \
      if(TH_TENSOR_DIM_APPLY_counter[TH_TENSOR_DIM_APPLY_i] == TENSOR1->size[TH_TENSOR_DIM_APPLY_i]) \
      { \
        if(TH_TENSOR_DIM_APPLY_i == TENSOR1->nDimension-1) \
        { \
          TH_TENSOR_DIM_APPLY_hasFinished = 1; \
          break; \
        } \
        else \
        { \
          TENSOR1##_data -= TH_TENSOR_DIM_APPLY_counter[TH_TENSOR_DIM_APPLY_i]*TENSOR1->stride[TH_TENSOR_DIM_APPLY_i]; \
          TENSOR2##_data -= TH_TENSOR_DIM_APPLY_counter[TH_TENSOR_DIM_APPLY_i]*TENSOR2->stride[TH_TENSOR_DIM_APPLY_i]; \
          TH_TENSOR_DIM_APPLY_counter[TH_TENSOR_DIM_APPLY_i] = TENSOR1->size[TH_TENSOR_DIM_APPLY_i]+1; \
        } \
      } \
      else \
        break; \

After rebuilding, I found that both versions give the same answer (tested on cumsum); however, there is no “speed increase”. :persevere: