Running all source unit tests again

Hello, this is my first time posting here, I think.

We are installing the latest version of PyTorch on our supercomputers here. We have a couple of hundred NVIDIA V100 16 GB GPUs, around 5000 NVIDIA A100 40 GB GPUs, and a couple of dozen other, less important GPUs. Therefore it's important to get this right.

We compile all our software from source, and building PyTorch on our shared filesystems, with all unit tests running (and some failing), takes up to 19 hours.

To avoid that, I would like to compile it once per system and run the tests separately, so that I can investigate what is going on with each failing test without going through the whole build again.

We install our software with EasyBuild, which means we don't necessarily keep the source/build directories after a package is installed. I can keep them if needed, though.
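If I understand EasyBuild's options correctly, the build directory can be kept after a successful install with the `--disable-cleanup-builddir` flag (a sketch; the easyconfig file name below is a placeholder, and the flag should be double-checked with `eb --help` on your system):

```shell
# Assumption: --disable-cleanup-builddir tells EasyBuild not to remove the
# build directory after installation, so the compiled test tree survives.
# "PyTorch-2.x.y-foss-2023a.eb" is a placeholder easyconfig name.
eb PyTorch-2.x.y-foss-2023a.eb --disable-cleanup-builddir
```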

My question is: how do I run the unit tests again? There's a `test` directory in PyTorch, but that seems to contain a much smaller subset of the tests.

Currently, this is the set of all failed tests on our systems:

        # This test seems to take too long on NVIDIA Ampere at least.
        'distributed/test_distributed_spawn',
        'dynamo/test_misc',
        'dynamo/test_dynamic_shapes',
        'test_ops_fwd_gradients',
        'test_proxy_tensor',
        'distributed/_tensor/test_device_mesh',
        'distributed/elastic/multiprocessing/api_test',
        'distributed/test_c10d_common',
        'distributed/test_c10d_gloo',
        'distributed/test_c10d_nccl',
        'distributed/test_dynamo_distributed',
        'distributed/test_inductor_collectives',
        'test_sparse_csr',
        # no xdoctest
        'doctests',
        'inductor/test_compiled_autograd',
        'distributed/_composable/fully_shard/test_fully_shard_compile',
        'distributed/_composable/fully_shard/test_fully_shard_init',
        'distributed/_composable/fully_shard/test_fully_shard_mixed_precision',
        'distributed/_composable/fully_shard/test_fully_shard_model_checkpoint',
        'distributed/_composable/fully_shard/test_fully_shard_optim_checkpoint',
        'distributed/_composable/fully_shard/test_fully_shard_runtime',
        'distributed/_composable/fully_shard/test_fully_shard_util',
        'distributed/_composable/test_compose',
        'distributed/_shard/sharded_optim/test_sharded_optim',
        'distributed/_shard/sharded_tensor/ops/test_binary_cmp',
        'distributed/_shard/sharded_tensor/ops/test_embedding',
        'distributed/_shard/sharded_tensor/ops/test_embedding_bag',
        'distributed/_shard/sharded_tensor/ops/test_init',
        'distributed/_shard/sharded_tensor/ops/test_tensor_ops',
        'distributed/_shard/sharded_tensor/test_sharded_tensor',
        'distributed/_shard/sharded_tensor/test_sharded_tensor_reshard',
        'distributed/_shard/sharding_plan/test_sharding_plan',
        'distributed/_shard/sharding_spec/test_sharding_spec',
        'distributed/_tensor/test_api',
        'distributed/_tensor/test_common_rules',
        'distributed/_tensor/test_dtensor',
        'distributed/_tensor/test_dtensor_compile',
        'distributed/_tensor/test_dtensor_ops',
        'distributed/_tensor/test_embedding_ops',
        'distributed/_tensor/test_init',
        'distributed/_tensor/test_math_ops',
        'distributed/_tensor/test_matrix_ops',
        'distributed/_tensor/test_random_ops',
        'distributed/_tensor/test_redistribute',
        'distributed/_tensor/test_tensor_ops',
        'distributed/algorithms/ddp_comm_hooks/test_ddp_hooks',
        'distributed/algorithms/test_join',
        'distributed/checkpoint/test_checkpoint',
        'distributed/checkpoint/test_dtensor_checkpoint',
        'distributed/checkpoint/test_file_system_checkpoint',
        'distributed/checkpoint/test_fsdp_model_state',
        'distributed/checkpoint/test_fsdp_optim_state',
        'distributed/checkpoint/test_fsspec',
        'distributed/fsdp/test_distributed_checkpoint',
        'distributed/fsdp/test_fsdp_apply',
        'distributed/fsdp/test_fsdp_backward_prefetch',
        'distributed/fsdp/test_fsdp_checkpoint',
        'distributed/fsdp/test_fsdp_clip_grad_norm',
        'distributed/fsdp/test_fsdp_comm',
        'distributed/fsdp/test_fsdp_comm_hooks',
        'distributed/fsdp/test_fsdp_core',
        'distributed/fsdp/test_fsdp_dtensor_state_dict',
        'distributed/fsdp/test_fsdp_exec_order',
        'distributed/fsdp/test_fsdp_fine_tune',
        'distributed/fsdp/test_fsdp_flatten_params',
        'distributed/fsdp/test_fsdp_freezing_weights',
        'distributed/fsdp/test_fsdp_grad_acc',
        'distributed/fsdp/test_fsdp_hybrid_shard',
        'distributed/fsdp/test_fsdp_ignored_modules',
        'distributed/fsdp/test_fsdp_input',
        'distributed/fsdp/test_fsdp_memory',
        'distributed/fsdp/test_fsdp_meta',
        'distributed/fsdp/test_fsdp_misc',
        'distributed/fsdp/test_fsdp_mixed_precision',
        'distributed/fsdp/test_fsdp_multiple_forward',
        'distributed/fsdp/test_fsdp_multiple_wrapping',
        'distributed/fsdp/test_fsdp_optim_state',
        'distributed/fsdp/test_fsdp_overlap',
        'distributed/fsdp/test_fsdp_pure_fp16',
        'distributed/fsdp/test_fsdp_sharded_grad_scaler',
        'distributed/fsdp/test_fsdp_state_dict',
        'distributed/fsdp/test_fsdp_tp_integration',
        'distributed/fsdp/test_fsdp_traversal',
        'distributed/fsdp/test_fsdp_uneven',
        'distributed/fsdp/test_fsdp_unshard_params',
        'distributed/fsdp/test_fsdp_use_orig_params',
        'distributed/fsdp/test_shard_utils',
        'distributed/fsdp/test_wrap',
        'distributed/optim/test_zero_redundancy_optimizer',
        'distributed/tensor/parallel/test_ddp_2d_parallel',
        'distributed/tensor/parallel/test_fsdp_2d_parallel',
        'distributed/tensor/parallel/test_parallelize_api',
        'distributed/tensor/parallel/test_tp_examples',
        'distributed/tensor/parallel/test_tp_random_state',
        'distributed/tensor/parallel/test_tp_style',
        'distributed/tensor/parallel/test_view_sharding_dim_change',
        'distributed/test_c10d_logger',
        'distributed/test_c10d_object_collectives',
        'test_cuda',
        'test_cuda_expandable_segments',
        'test_cuda_multigpu',
        'test_jit',
        'test_jit_legacy',
        'test_jit_profiling',
        'test_nn',
        'test_ops',
        'test_quantization',
        'distributed/algorithms/quantization/test_quantization',
        'distributed/rpc/cuda/test_tensorpipe_agent',
        'distributed/test_c10d_spawn_nccl',
        'distributed/test_data_parallel',
        'distributed/test_functional_api',
        'distributed/test_pg_wrapper',
        'nn/test_parametrization',
        'test_autograd',

Answering myself in case someone needs this info again (probably me, in a couple of months, when I forget):

The PyTorch source has a `test/run_test.py` script, which seems to be the same driver that runs the tests at build time.
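To rerun the failing modules from the list above one at a time, a small driver can generate the per-module `run_test.py` invocations (a sketch: the module paths are examples taken from this post, and `--include` is `run_test.py`'s module-selection flag, which is worth verifying with `python test/run_test.py --help` on your checkout):

```python
# Sketch: build per-module rerun commands for PyTorch's test/run_test.py,
# so each previously failing test can be investigated on its own.
import shlex

# Example modules, taken from the failing-test list in this post.
FAILING = [
    "test_nn",
    "distributed/test_c10d_nccl",
    "dynamo/test_misc",
]

def rerun_command(module, python="python", runner="test/run_test.py"):
    """Return a safely quoted shell command that reruns a single test module."""
    # Assumption: run_test.py's --include flag selects individual modules.
    return shlex.join([python, runner, "--include", module])

commands = [rerun_command(m) for m in FAILING]
for cmd in commands:
    print(cmd)
```

Each printed command can then be submitted as its own batch job, so a single flaky module no longer forces a 19-hour rebuild.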