I agree with @googlebot that capturing the profiling information would be confusing and bad.
Note that your current code snippet uses a BoolTensor to index a, which yields a variable-sized output tensor (in your example the mask is created via torch.ones_like, so all values would be returned).
This would call into nonzero, which needs to synchronize as seen here. Besides that, the origin of the to() op could be found in a profiler as already explained.
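To illustrate the point about mask indexing, here is a minimal sketch. The tensor values and the names `all_true` and `partial` are made up for illustration and are not taken from your snippet. It only shows that the output size depends on the mask contents and that the result matches indexing with the output of nonzero; it does not measure the synchronization itself, and it falls back to the CPU if no GPU is available.

```python
import torch

# Use the GPU if one is available; the shapes are the same on the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

a = torch.arange(6, device=device)  # values 0..5

# Mask built like in the original example: every entry is True.
all_true = torch.ones_like(a, dtype=torch.bool)
# Hypothetical second mask that selects only the even values.
partial = a % 2 == 0

# The output size depends on the mask *values*, not only on its shape.
print(a[all_true].shape)  # torch.Size([6])
print(a[partial].shape)   # torch.Size([3])

# Boolean-mask indexing gives the same result as indexing with the
# indices returned by nonzero().
idx = partial.nonzero(as_tuple=True)
assert torch.equal(a[partial], a[idx])
```

Since the number of selected elements is only known once the mask values have been inspected, the host has to wait for the device before it can allocate the result on a GPU.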
If you are using Nsight Systems, you could have a look at this post to see how to enable backtraces.