PyTorch crashes when running with OpenACC

I already opened an issue (PyTorch crashes when running with OpenACC · Issue #82627 · pytorch/pytorch · GitHub), but I’m hoping the community can also give me some pointers on how to start debugging this.


:bug: Describe the bug

I’m binding OpenACC code with ctypes, and on its own it works fine. However, merely importing the torch package makes the application crash.

module_c.cpp:

#include "module_c.h"

int addvector_cab(void)
{
    
    int i;
    float a[50];
    float b[50];
    float c[50];
    int n=50;

    for( i=0; i<n; i++)
    {
        a[i] = 1;
        b[i] = 1;
        c[i] = 0;
    }
    
    printf("ENTERED C FUNCTION!\n");

    if( n == 0 ){
        printf("DUMMY ERROR!\n");
        printf("EXITING C FUNCTION!\n");
        return(1);
    }
    
    #pragma acc parallel loop present_or_copyin(a,b) present_or_copyout(c)
    for(i = 0; i < n; i++){
        c[i] = a[i] + b[i];
    }

    printf("EXITING C FUNCTION!\n");
    return(0);
}

module_c.h:

#pragma once

#ifndef __MODULE_C_H_INCLUDED__
#define __MODULE_C_H_INCLUDED__

#include <iostream>
#include <string>
#include "openacc.h"
#include <cstdlib>


extern "C" {

    int addvector_cab(void);
}

#endif

Compile and link commands:

nvc++ -c -std=c++11 -acc -ta=multicore -fPIC -o module_c.o module_c.cpp
nvc++ -shared -Minfo=acc -std=c++11 -mp -acc:gpu -gpu=pinned -o mylib.so module_c.o

bind.py:

import ctypes
# import torch   # uncommenting this single line is enough to trigger the crash

so_file = "./mylib.so"

# Load the OpenACC shared library built above.
my_functions = ctypes.CDLL(so_file)

my_functions.addvector_cab.restype = ctypes.c_int

if my_functions.addvector_cab() == 0:
    print("Returned OKAY!")

Expected output

One should expect:

ENTERED C FUNCTION!
EXITING C FUNCTION!
Returned OKAY!

However, after importing PyTorch in bind.py (uncommenting line 2 and changing nothing else) and running again, the output is:

ENTERED C FUNCTION!

libgomp: TODO

Not sure if it is related, but I tried a similar approach with libtorch in C++, and whenever I ran code mixing OpenACC and libtorch, the same thing happened: it just crashed and printed ‘libgomp: TODO’.
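
To narrow this down, one thing I plan to try (just a rough sketch, assuming Linux, where /proc/self/maps lists every shared object mapped into the process) is to print which OpenMP/OpenACC runtime libraries are actually loaded before and after importing torch, since the ‘libgomp: TODO’ message makes me suspect the wrong OpenMP runtime is being picked up:

check_runtimes.py (diagnostic sketch):

import ctypes

def mapped_runtimes(tag):
    # Print every mapped shared object whose path looks like an OpenMP/OpenACC runtime.
    with open("/proc/self/maps") as f:
        libs = {line.split()[-1] for line in f
                if "gomp" in line or "omp" in line or "nvhpc" in line}
    print(f"--- {tag} ---")
    for lib in sorted(libs):
        print(lib)

my_functions = ctypes.CDLL("./mylib.so")
mapped_runtimes("after CDLL(mylib.so)")

import torch  # the import that triggers the crash
mapped_runtimes("after import torch")

my_functions.addvector_cab.restype = ctypes.c_int
print("Returned OKAY!" if my_functions.addvector_cab() == 0 else "Returned an error")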

What I’m ultimately trying to do is allocate a tensor via torch, share it with CuPy via the CUDA Array Interface, and then use it from OpenACC (I’m already doing this last part without errors when I allocate the memory via CuPy). But the error I’m hitting is far more basic than that: just importing torch crashes the run.
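
For reference, this is roughly the flow I’m aiming for once the import problem is solved (just a sketch: addvector_device and its signature are hypothetical and do not exist in mylib.so yet; the idea is an OpenACC routine that accepts a raw device pointer, e.g. via #pragma acc deviceptr):

import ctypes
import cupy
import torch

t = torch.ones(50, dtype=torch.float32, device="cuda")  # GPU allocation owned by torch
c = cupy.asarray(t)     # zero-copy CuPy view through __cuda_array_interface__
ptr = c.data.ptr        # raw CUDA device pointer, shared by torch and CuPy

lib = ctypes.CDLL("./mylib.so")
# Hypothetical OpenACC entry point taking a device pointer and an element count.
lib.addvector_device.argtypes = [ctypes.c_void_p, ctypes.c_int]
lib.addvector_device(ctypes.c_void_p(ptr), t.numel())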

Any help, hints, or pointers are appreciated. =]

EDIT: Due to space constraints, I’ve simplified some parts; better documentation and a fuller example can be found here: https://github.com/estojoverde/Torch_OpenACC/blob/pytorch_openacc

Versions

Collecting environment information…
PyTorch version: 1.12.0
Is debug build: False
CUDA used to build PyTorch: 11.6
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.4 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.22.3
Libc version: glibc-2.31

Python version: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:18) [GCC 10.3.0] (64-bit runtime)
Python platform: Linux-3.10.0-1160.49.1.el7.x86_64-x86_64-with-glibc2.10
Is CUDA available: True
CUDA runtime version: 11.6.124
GPU models and configuration:
GPU 0: Tesla V100-PCIE-32GB
GPU 1: Tesla V100-PCIE-32GB
GPU 2: Tesla V100-PCIE-32GB

Nvidia driver version: 510.47.03
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.4.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.4.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.4.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.4.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.4.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.4.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.4.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.22.3
[pip3] torch==1.12.0
[pip3] torchaudio==0.12.0
[pip3] torchvision==0.13.0
[conda] blas 1.0 mkl anaconda
[conda] cudatoolkit 11.6.0 hecad31d_10 conda-forge
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2021.4.0 h06a4308_640 anaconda
[conda] mkl-service 2.4.0 py38h95df7f1_0 conda-forge
[conda] mkl_fft 1.3.1 py38h8666266_1 conda-forge
[conda] mkl_random 1.2.2 py38h1abd341_0 conda-forge
[conda] numpy 1.22.3 py38he7a7128_0 anaconda
[conda] numpy-base 1.22.3 py38hf524024_0 anaconda
[conda] pytorch 1.12.0 py3.8_cuda11.6_cudnn8.3.2_0 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torchaudio 0.12.0 py38_cu116 pytorch
[conda] torchvision 0.13.0 py38_cu116 pytorch