When indices and offset are both 2D,
CPU:
GPU:
Could you post a minimal, executable code snippet wrapped in three backticks ``` so that we can try to reproduce and debug the issue?