Hi,
As far as I know, PyTorch's CUDA implementation calls the CUDA runtime API.
I installed CUDA 10.2 and torch-1.8.1+cu102-cp39-cp39 on a CentOS 7 host, and it works well.
I hijacked the CUDA runtime function __cudaRegisterFatBinary via LD_PRELOAD, cast its argument (void *fatCubin) to __fatBinC_Wrapper_t, and then extracted the ELF image from it. However, elf.e_ident starts with "A2 7F 45 4C 46 ...", while a valid elf.e_ident should start with "7F 45 4C 46 ...". May I ask where I went wrong?
extern "C" __host__ void **__cudaRegisterFatBinary(void *fatCubin) {
__fatBinC_Wrapper_t *bin = (__fatBinC_Wrapper_t *)fatCubin;
char *data = (char *)bin->data;
NvFatCubin *pFatCubin = (NvFatCubin *)data;
Elf64_Ehdr *eh = &(pFatCubin->elf);
for (int i = 0; i < 16; ++i) {
printf("%X ", eh->e_ident[i]);
}
}
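
For reference, this is a minimal diagnostic sketch I could drop into the hook (find_elf_offsets is a hypothetical helper of mine; it assumes nothing about the fatbin layout and just scans bin->data byte by byte for the standard ELF magic):

#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper: scan the first `limit` bytes of the fatbin data for the
// standard ELF magic "\x7F" "ELF" and print every offset where it appears.
static void find_elf_offsets(const unsigned char *data, size_t limit) {
    static const unsigned char magic[4] = {0x7F, 'E', 'L', 'F'};
    for (size_t off = 0; off + sizeof(magic) <= limit; ++off) {
        if (memcmp(data + off, magic, sizeof(magic)) == 0) {
            printf("ELF magic found at offset %zu\n", off);
        }
    }
}

Calling something like find_elf_offsets((const unsigned char *)bin->data, 4096) inside the hook should show whether the ELF really begins where NvFatCubin->elf points, or at some other offset.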