Some Questions About NVIDIA's Implementation of the Correlation Layer

Recently I have been reading the source code of NVIDIA's PyTorch implementation of FlowNet2. I am confused about how the tensors and CUDA variables are organized and how indices are computed from the block and thread indices. Could anyone familiar with this code help clear up my confusion?
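To make the question concrete, here is a minimal pure-Python sketch of the kind of index arithmetic I mean. It is not the repository's code. It assumes a hypothetical launch with one block per (batch, row, column) position and the threads of each block striding over channels, copying a contiguous NCHW buffer into NHWC order. The names `THREADS_PER_BLOCK`, `nchw_offset`, `nhwc_offset`, and `emulate_kernel` are made up for illustration.

```python
# Pure-Python emulation of a common CUDA indexing pattern (illustrative only).
# Assumed launch configuration: grid = (N, H, W) blocks, THREADS_PER_BLOCK threads each.
THREADS_PER_BLOCK = 4  # hypothetical value, chosen small for the example


def nchw_offset(n, c, y, x, C, H, W):
    """Flat offset of element (n, c, y, x) in a contiguous NCHW buffer."""
    return ((n * C + c) * H + y) * W + x


def nhwc_offset(n, y, x, c, C, H, W):
    """Flat offset of element (n, y, x, c) in a contiguous NHWC buffer."""
    return ((n * H + y) * W + x) * C + c


def emulate_kernel(src, N, C, H, W):
    """Copy a flat NCHW buffer into a flat NHWC buffer, one 'thread' at a time."""
    dst = [None] * (N * C * H * W)
    for block_x in range(N):              # plays the role of blockIdx.x (batch)
        for block_y in range(H):          # plays the role of blockIdx.y (row)
            for block_z in range(W):      # plays the role of blockIdx.z (column)
                for thread_x in range(THREADS_PER_BLOCK):  # threadIdx.x
                    # Each thread handles channels thread_x, thread_x + T, ...
                    for c in range(thread_x, C, THREADS_PER_BLOCK):
                        i = nchw_offset(block_x, c, block_y, block_z, C, H, W)
                        o = nhwc_offset(block_x, block_y, block_z, c, C, H, W)
                        dst[o] = src[i]
    return dst


N, C, H, W = 2, 6, 3, 5
src = list(range(N * C * H * W))  # each value equals its own NCHW offset
dst = emulate_kernel(src, N, C, H, W)
```

In this sketch every element is written exactly once because each (block, channel) pair is visited by exactly one thread. Is this roughly the scheme the correlation kernels follow, or do they map blocks and threads differently?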