About run_test.py

thnkim · February 10, 2019, 6:29am

When I build the source code, running run_test.py sometimes fails to report ‘PASSED’.
(Some builds are ok, but some builds are not.)

Do I need to roll back to a previous commit, or can I just ignore the failures/errors?
(I assume that commits on the master branch have passed the unit tests.)

albanD · February 11, 2019, 10:07am

Hi,

Which tests exactly are causing issues? Some of them may be flaky, but we would like to fix them if possible.
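
One way to narrow down which test files are involved is to run each file in its own interpreter and record how it exited. The following is only a minimal sketch using the standard library; the helper names `classify` and `run_test_files` are hypothetical and are not part of run_test.py. It assumes a POSIX system, where a negative return code means the process was killed by a signal (for example an abort).

```python
import signal
import subprocess
import sys


def classify(returncode):
    """Map a subprocess return code to a short status string."""
    if returncode == 0:
        return "passed"
    if returncode > 0:
        return "failed"
    # On POSIX, a negative return code means the process died from a signal.
    try:
        name = signal.Signals(-returncode).name
    except ValueError:
        name = "signal {}".format(-returncode)
    return "crashed ({})".format(name)


def run_test_files(paths):
    """Run each test file in a separate interpreter and collect its status."""
    results = {}
    for path in paths:
        proc = subprocess.run([sys.executable, path])
        results[path] = classify(proc.returncode)
    return results


if __name__ == "__main__":
    # Test file paths are taken from the command line.
    for path, status in run_test_files(sys.argv[1:]).items():
        print("{}: {}".format(path, status))
```

Saved under a name of your choice and called with a list of test files as arguments, this prints one status line per file, which separates ordinary test failures from hard crashes.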

thnkim · February 12, 2019, 9:56am

Hello,
I tested using commit a9f1d2e3711476ba4189ea804488e5264a4229a8 (and some previous versions).

I’m using

Ubuntu 16.04
conda 4.6.2
python 3.7
cuda 10.0.130_410.48 (driver 410.48)
cudnn 7.4.2.24
magma for cuda10 (installed by conda install -c cpbotha magma-cuda10)

The error message I got is:

python test_cuda.py 
......ssss........ssssssssssssssssssssssss..................................................................................................................................................................ssss........ssssssssssssssssssssssss........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................ssss........ssssssssssssssssssssssss........................................................................................................................................................................ssss........ssssssssssssssssssssssss........................................................................................................................................................................ssss........ssssssssssssssssssssssss.....................................................................................................................................................................................EE.
python: /home/ubuntu/miniconda3/conda-bld/magma-cuda10_1543042134558/work/interface_cuda/interface.cpp:732: void magma_queue_create_internal(magma_device_t, magma_queue**, const char*, const char*, int): Assertion `queue->dBarray__ != NULL' failed.
Aborted (core dumped)

If you need more information, please let me know.
Thank you.

albanD · February 12, 2019, 1:20pm

The error here apparently comes from magma.
I’m afraid I’m not very familiar with this part of the code, but it is used only for linear algebra on the GPU. So if you don’t use these functions, you will be safe.
Also, master should always pass the tests (it might have been broken briefly at some point, but that should be rare).
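
To see whether a given build was compiled with magma at all, something like the sketch below may help. It relies on the `torch.cuda.has_magma` flag; the function name `magma_status` is hypothetical. Note that the flag only reports whether magma support was built in, not whether magma works correctly at runtime, so it would not by itself explain the assertion failure above.

```python
def magma_status():
    """Report whether this PyTorch build has CUDA and magma support."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if not torch.cuda.is_available():
        return "CUDA not available"
    # getattr guards against builds where the flag is missing (assumption).
    if getattr(torch.cuda, "has_magma", False):
        return "built with magma"
    return "built without magma"


if __name__ == "__main__":
    print(magma_status())
```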

thnkim · February 12, 2019, 2:45pm

I see.
BTW, some other test files like test_cpp_extensions.py seem to be outdated. It outputs errors like:

======================================================================
ERROR: test_backward (__main__.TestCppExtension)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_cpp_extensions.py", line 72, in test_backward
    mm = cpp_extension.MatrixMultiplier(4, 8)
NameError: name 'cpp_extension' is not defined

======================================================================
ERROR: test_cuda_extension (__main__.TestCppExtension)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_cpp_extensions.py", line 113, in test_cuda_extension
    import torch_test_cpp_extension.cuda as cuda_extension
ModuleNotFoundError: No module named 'torch_test_cpp_extension'

I guess the sub-directory cpp_extension was recently renamed to cpp_extensions.
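
A quick way to check whether the module named in the traceback can be found in the current environment is sketched below. It uses only `importlib` from the standard library; the helper name `is_importable` is hypothetical. This only tells you whether the module is visible to the interpreter, not why it is missing.

```python
import importlib.util


def is_importable(module_name):
    """Return True if the interpreter can locate the given module."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised, for example, when the parent package of a dotted name is missing.
        return False


if __name__ == "__main__":
    for name in ("torch", "torch_test_cpp_extension"):
        print("{}: {}".format(name, "found" if is_importable(name) else "not found"))
```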

Thank you