Slowdown in CUDA graph execution using cudaMemsetAsync (vs. a fill kernel)

ptrblck · October 4, 2024, 12:25pm

This is an interesting observation, as this topic claims the opposite, at least for eager execution.

Could you post your full extension code to reproduce the issue?