PyTorch crashes when running with OpenACC

I already opened an issue (PyTorch crashes when running with OpenACC · Issue #82627 · pytorch/pytorch · GitHub), but I’m hoping the community can also give me some pointers on how to start debugging this.


:bug: Describe the bug

I’m binding OpenACC code with ctypes, and on its own it works fine. However, merely importing the torch package makes the application crash.

module_c.cpp:

#include "module_c.h"

int addvector_cab(void)
{
    
    int i;
    float a[50];
    float b[50];
    float c[50];
    int n=50;

    for( i=0; i<n; i++)
    {
        a[i] = 1;
        b[i] = 1;
        c[i] = 0;
    }
    
    printf("ENTERED C FUNCTION!\n");

    if( n == 0 ){
        printf("DUMMY ERROR!\n");
        printf("EXITING C FUNCTION!\n");
        return(1);
    }
    
    #pragma acc parallel loop present_or_copyin(a,b) present_or_copyout(c)
    for(i = 0; i < n; i++){
        c[i] = a[i] + b[i];
    }

    printf("EXITING C FUNCTION!\n");
    return(0);
}

module_c.h:

#pragma once

#ifndef __MODULE_C_H_INCLUDED__
#define __MODULE_C_H_INCLUDED__

#include <iostream>
#include <string>
#include "openacc.h"
#include <cstdlib>


extern "C" {

    int addvector_cab(void);
}

#endif

Compile and link commands:

nvc++ -c -std=c++11 -acc -ta=multicore -fPIC -o module_c.o module_c.cpp
nvc++ -shared -Minfo=acc -std=c++11 -mp -acc:gpu -gpu=pinned -o mylib.so module_c.o

bind.py:

import ctypes
# import torch   # uncommenting this single line is enough to trigger the crash

so_file = "./mylib.so"

# Load the OpenACC shared library built above.
my_functions = ctypes.CDLL(so_file)

my_functions.addvector_cab.restype = ctypes.c_int

if my_functions.addvector_cab() == 0:
    print("Returned OKAY!")

Expected output

One should expect:

ENTERED C FUNCTION!
EXITING C FUNCTION!
Returned OKAY!

However, after importing PyTorch in bind.py (uncommenting line 2 and changing nothing else) and running again, the output is:

ENTERED C FUNCTION!

libgomp: TODO

Not sure if it is related, but I tried a similar approach with libtorch in C++, and whenever I ran code mixing OpenACC and libtorch, the same thing happened: it just crashed and printed ‘libgomp: TODO’.
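
To narrow this down, one thing I plan to try (just a rough sketch, assuming Linux, where /proc/self/maps lists every shared object mapped into the process) is to print which OpenMP/OpenACC runtime libraries are actually loaded before and after importing torch, since the ‘libgomp: TODO’ message makes me suspect the wrong OpenMP runtime is being picked up:

check_runtimes.py (diagnostic sketch):

import ctypes

def mapped_runtimes(tag):
    # Print every mapped shared object whose path looks like an OpenMP/OpenACC runtime.
    with open("/proc/self/maps") as f:
        libs = {line.split()[-1] for line in f
                if "gomp" in line or "omp" in line or "nvhpc" in line}
    print(f"--- {tag} ---")
    for lib in sorted(libs):
        print(lib)

my_functions = ctypes.CDLL("./mylib.so")
mapped_runtimes("after CDLL(mylib.so)")

import torch  # the import that triggers the crash
mapped_runtimes("after import torch")

my_functions.addvector_cab.restype = ctypes.c_int
print("Returned OKAY!" if my_functions.addvector_cab() == 0 else "Returned an error")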

What I’m ultimately trying to do is allocate a tensor via torch, share it with CuPy via the CUDA Array Interface, and then use it from OpenACC (I’m already doing this last part without errors when I allocate the memory via CuPy). But the error I’m hitting is far more basic than that: just importing torch crashes the run.
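
For reference, this is roughly the flow I’m aiming for once the import problem is solved (just a sketch: addvector_device and its signature are hypothetical and do not exist in mylib.so yet; the idea is an OpenACC routine that accepts a raw device pointer, e.g. via #pragma acc deviceptr):

import ctypes
import cupy
import torch

t = torch.ones(50, dtype=torch.float32, device="cuda")  # GPU allocation owned by torch
c = cupy.asarray(t)     # zero-copy CuPy view through __cuda_array_interface__
ptr = c.data.ptr        # raw CUDA device pointer, shared by torch and CuPy

lib = ctypes.CDLL("./mylib.so")
# Hypothetical OpenACC entry point taking a device pointer and an element count.
lib.addvector_device.argtypes = [ctypes.c_void_p, ctypes.c_int]
lib.addvector_device(ctypes.c_void_p(ptr), t.numel())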

Any help, hints, or pointers are appreciated. =]

EDIT: Due to space constraints, I’ve simplified some parts; better documentation and a fuller example can be found here: https://github.com/estojoverde/Torch_OpenACC/blob/pytorch_openacc

Versions

Collecting environment information…
PyTorch version: 1.12.0
Is debug build: False
CUDA used to build PyTorch: 11.6
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.4 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.22.3
Libc version: glibc-2.31

Python version: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:18) [GCC 10.3.0] (64-bit runtime)
Python platform: Linux-3.10.0-1160.49.1.el7.x86_64-x86_64-with-glibc2.10
Is CUDA available: True
CUDA runtime version: 11.6.124
GPU models and configuration:
GPU 0: Tesla V100-PCIE-32GB
GPU 1: Tesla V100-PCIE-32GB
GPU 2: Tesla V100-PCIE-32GB

Nvidia driver version: 510.47.03
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.4.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.4.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.4.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.4.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.4.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.4.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.4.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.22.3
[pip3] torch==1.12.0
[pip3] torchaudio==0.12.0
[pip3] torchvision==0.13.0
[conda] blas 1.0 mkl anaconda
[conda] cudatoolkit 11.6.0 hecad31d_10 conda-forge
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2021.4.0 h06a4308_640 anaconda
[conda] mkl-service 2.4.0 py38h95df7f1_0 conda-forge
[conda] mkl_fft 1.3.1 py38h8666266_1 conda-forge
[conda] mkl_random 1.2.2 py38h1abd341_0 conda-forge
[conda] numpy 1.22.3 py38he7a7128_0 anaconda
[conda] numpy-base 1.22.3 py38hf524024_0 anaconda
[conda] pytorch 1.12.0 py3.8_cuda11.6_cudnn8.3.2_0 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torchaudio 0.12.0 py38_cu116 pytorch
[conda] torchvision 0.13.0 py38_cu116 pytorch