The statistics for Workstation 1 (the problematic one), which has three A5000 GPUs, are listed below:
Workstation 1, 1 GPU:
Use Cuda GPU!
Let's use 1 GPUs!
Epoch: 0
0.0009272070497193106
0.0003752760953701497
0.00031495944450507437
0.0002910588679051464
0.0002672624948959456
0.0002367020626337095
0.00022315971894642215
0.00020790642670759333
1.533455 minutes passed!
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
CudnnConvolutionBackward 0.00% 381.000us 2.22% 2.108s 52.692ms 0.000us 0.00% 54.972s 1.374s 40
aten::cudnn_convolution_backward 0.00% 705.000us 2.22% 2.107s 52.683ms 0.000us 0.00% 54.972s 1.374s 40
aten::cudnn_convolution_backward_input 0.01% 5.448ms 2.21% 2.103s 52.568ms 50.136s 58.59% 50.136s 1.253s 40
void cudnn::detail::dgrad_alg1_engine<512, 6, 5, 3, ... 0.00% 0.000us 0.00% 0.000us 0.000us 44.616s 52.14% 44.616s 44.616s 1
model_inference 5.70% 5.414s 95.87% 91.039s 91.039s 464.128ms 0.54% 21.950s 21.950s 1
aten::convolution 0.00% 335.000us 0.49% 462.566ms 11.564ms 0.000us 0.00% 16.317s 407.926ms 40
aten::_convolution 0.00% 708.000us 0.49% 462.231ms 11.556ms 0.000us 0.00% 16.317s 407.926ms 40
aten::cudnn_convolution 0.01% 6.173ms 0.48% 459.872ms 11.497ms 16.310s 19.06% 16.310s 407.752ms 40
aten::conv3d 0.00% 219.000us 0.00% 4.495ms 187.292us 0.000us 0.00% 16.239s 676.625ms 24
void implicit_convolveNd_dgemm<3, 128, 6, 7, 3, 3, 5... 0.00% 0.000us 0.00% 0.000us 0.000us 15.896s 18.58% 15.896s 993.531ms 16
aten::mm 0.01% 9.218ms 2.98% 2.834s 70.847ms 13.538s 15.82% 13.538s 338.457ms 40
MmBackward 0.00% 292.000us 1.90% 1.802s 112.642ms 0.000us 0.00% 8.605s 537.802ms 16
void convolveNd_dgrad_engine<3, 256, false, true, 6,... 0.00% 0.000us 0.00% 0.000us 0.000us 5.461s 6.38% 5.461s 227.562ms 24
void cutlass::Kernel<cutlass_80_tensorop_d884gemm_64... 0.00% 0.000us 0.00% 0.000us 0.000us 5.058s 5.91% 5.058s 316.115ms 16
aten::linear 0.00% 350.000us 4.21% 3.995s 249.676ms 0.000us 0.00% 4.945s 309.063ms 16
aten::matmul 0.00% 332.000us 1.09% 1.033s 64.563ms 0.000us 0.00% 4.933s 308.341ms 16
aten::cudnn_convolution_backward_weight 0.00% 2.785ms 0.00% 3.894ms 97.350us 4.837s 5.65% 4.837s 120.921ms 40
void convolve_wgradNd_engine<3, 128, 5, 5, 3, 3, 3, ... 0.00% 0.000us 0.00% 0.000us 0.000us 4.772s 5.58% 4.772s 198.844ms 24
void cutlass::Kernel<cutlass_80_tensorop_d884gemm_64... 0.00% 0.000us 0.00% 0.000us 0.000us 3.547s 4.15% 3.547s 443.373ms 8
void cutlass::Kernel<cutlass_80_tensorop_d884gemm_12... 0.00% 0.000us 0.00% 0.000us 0.000us 3.281s 3.83% 3.281s 468.684ms 7
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 94.957s
Self CUDA time total: 85.568s
Epoch: 1
0.00018144507594206833
0.0001743677954530191
0.0001600232324589636
0.00014879743250654224
0.0001384801070815437
0.0001225694448793762
0.00012302444233999212
0.0001217971922400595
1.427343 minutes passed!
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
CudnnConvolutionBackward 0.00% 326.000us 0.01% 7.464ms 186.600us 0.000us 0.00% 54.125s 1.353s 40
aten::cudnn_convolution_backward 0.00% 643.000us 0.01% 7.138ms 178.450us 0.000us 0.00% 54.125s 1.353s 40
aten::cudnn_convolution_backward_input 0.00% 2.242ms 0.00% 3.131ms 78.275us 49.441s 59.84% 49.441s 1.236s 40
void cudnn::detail::dgrad_alg1_engine<512, 6, 5, 3, ... 0.00% 0.000us 0.00% 0.000us 0.000us 45.787s 55.42% 45.787s 45.787s 1
model_inference 0.30% 251.531ms 99.98% 85.017s 85.017s 58.326ms 0.07% 19.903s 19.903s 1
aten::convolution 0.00% 254.000us 0.01% 5.951ms 148.775us 0.000us 0.00% 16.243s 406.067ms 40
aten::_convolution 0.00% 612.000us 0.01% 5.697ms 142.425us 0.000us 0.00% 16.243s 406.067ms 40
aten::cudnn_convolution 0.00% 2.923ms 0.00% 3.794ms 94.850us 16.236s 19.65% 16.236s 405.899ms 40
aten::conv3d 0.00% 178.000us 0.00% 3.000ms 125.000us 0.000us 0.00% 16.165s 673.540ms 24
void implicit_convolveNd_dgemm<3, 128, 6, 7, 3, 3, 5... 0.00% 0.000us 0.00% 0.000us 0.000us 15.819s 19.15% 15.819s 988.664ms 16
aten::mm 0.00% 1.834ms 0.00% 2.973ms 74.325us 12.097s 14.64% 12.097s 302.414ms 40
MmBackward 0.00% 269.000us 0.00% 2.053ms 128.312us 0.000us 0.00% 8.552s 534.500ms 16
void cutlass::Kernel<cutlass_80_tensorop_d884gemm_64... 0.00% 0.000us 0.00% 0.000us 0.000us 5.003s 6.06% 5.003s 312.702ms 16
aten::cudnn_convolution_backward_weight 0.00% 2.285ms 0.00% 3.364ms 84.100us 4.685s 5.67% 4.685s 117.115ms 40
void convolve_wgradNd_engine<3, 128, 5, 5, 3, 3, 3, ... 0.00% 0.000us 0.00% 0.000us 0.000us 4.621s 5.59% 4.621s 192.548ms 24
void convolveNd_dgrad_engine<3, 256, false, true, 6,... 0.00% 0.000us 0.00% 0.000us 0.000us 3.597s 4.35% 3.597s 149.855ms 24
aten::linear 0.00% 250.000us 0.00% 3.172ms 198.250us 0.000us 0.00% 3.556s 222.244ms 16
void cutlass::Kernel<cutlass_80_tensorop_d884gemm_64... 0.00% 0.000us 0.00% 0.000us 0.000us 3.549s 4.30% 3.549s 443.595ms 8
aten::matmul 0.00% 232.000us 0.00% 2.092ms 130.750us 0.000us 0.00% 3.545s 221.534ms 16
void cutlass::Kernel<cutlass_80_tensorop_d884gemm_12... 0.00% 0.000us 0.00% 0.000us 0.000us 3.314s 4.01% 3.314s 473.412ms 7
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 85.034s
Self CUDA time total: 82.620s
Workstation 1, 2 GPUs working in parallel:
Use Cuda GPU!
Let's use 2 GPUs!
Epoch: 0
0.0040544034369015755
0.0008194306166093237
0.0004000130152196939
0.0003876064496647676
0.000310091377404055
0.00019129871352052222
0.00016230525331223535
0.00019453547816230253
6.192227 minutes passed!
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
CudnnConvolutionBackward 0.00% 1.817ms 0.01% 52.025ms 650.312us 0.000us 0.00% 679.478s 8.493s 80
aten::cudnn_convolution_backward 0.00% 6.669ms 0.01% 50.208ms 627.600us 0.000us 0.00% 679.478s 8.493s 80
aten::cudnn_convolution_backward_input 0.00% 12.998ms 0.00% 25.172ms 314.650us 674.194s 93.78% 674.194s 8.427s 80
void cudnn::detail::dgrad_alg1_engine<512, 6, 5, 3, ... 0.00% 0.000us 0.00% 0.000us 0.000us 621.258s 86.42% 621.258s 44.376s 14
void cudnn::detail::dgrad_alg1_engine<128, 5, 5, 3, ... 0.00% 0.000us 0.00% 0.000us 0.000us 44.641s 6.21% 44.641s 22.320s 2
void implicit_convolveNd_dgemm<3, 128, 6, 7, 3, 3, 5... 0.00% 0.000us 0.00% 0.000us 0.000us 15.373s 2.14% 15.373s 480.420ms 32
BroadcastBackward 0.00% 564.000us 9.14% 46.517s 5.815s 0.000us 0.00% 9.480s 1.185s 8
ReduceAddCoalesced 9.14% 46.501s 9.14% 46.517s 5.815s 9.479s 1.32% 9.480s 1.185s 8
ncclReduceRingLLKernel_sum_f64(ncclColl) 0.00% 0.000us 0.00% 0.000us 0.000us 9.479s 1.32% 9.479s 148.117ms 64
MmBackward 0.00% 3.753ms 0.00% 15.253ms 476.656us 0.000us 0.00% 8.403s 262.587ms 32
aten::mm 0.00% 3.920ms 0.00% 8.917ms 185.771us 8.403s 1.17% 8.403s 175.058ms 48
void convolveNd_dgrad_engine<3, 256, false, true, 6,... 0.00% 0.000us 0.00% 0.000us 0.000us 8.283s 1.15% 8.283s 172.556ms 48
aten::cudnn_convolution_backward_weight 0.00% 11.219ms 0.00% 18.367ms 229.588us 5.284s 0.74% 5.284s 66.052ms 80
void convolve_wgradNd_engine<3, 128, 5, 5, 3, 3, 3, ... 0.00% 0.000us 0.00% 0.000us 0.000us 5.218s 0.73% 5.218s 108.698ms 48
void cutlass::Kernel<cutlass_80_tensorop_d884gemm_64... 0.00% 0.000us 0.00% 0.000us 0.000us 4.937s 0.69% 4.937s 154.270ms 32
void cutlass::Kernel<cutlass_80_tensorop_d884gemm_64... 0.00% 0.000us 0.00% 0.000us 0.000us 3.466s 0.48% 3.466s 216.633ms 16
void cutlass::Kernel<cutlass_80_tensorop_d884gemm_32... 0.00% 0.000us 0.00% 0.000us 0.000us 3.426s 0.48% 3.426s 214.122ms 16
void cutlass::Kernel<cutlass_80_tensorop_d884gemm_64... 0.00% 0.000us 0.00% 0.000us 0.000us 1.420s 0.20% 1.420s 88.773ms 16
model_inference 9.82% 49.989s 72.65% 369.789s 369.789s 421.474ms 0.06% 850.378ms 850.378ms 1
Memcpy DtoH (Device -> Pageable) 0.00% 0.000us 0.00% 0.000us 0.000us 421.496ms 0.06% 421.496ms 3.099ms 136
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 509.012s
Self CUDA time total: 718.905s
Epoch: 1
0.00021934110920382598
0.00023363136428638785
0.00021870171749141336
0.00020551554557354
0.00017651321924490616
0.00015204309430480085
0.0001667849194120618
0.00017974811397844192
6.040423 minutes passed!
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
CudnnConvolutionBackward 0.00% 2.109ms 0.01% 50.901ms 636.263us 0.000us 0.00% 682.018s 8.525s 80
aten::cudnn_convolution_backward 0.00% 5.480ms 0.01% 48.792ms 609.900us 0.000us 0.00% 682.018s 8.525s 80
aten::cudnn_convolution_backward_input 0.00% 14.127ms 0.01% 22.035ms 275.438us 676.651s 93.99% 676.651s 8.458s 80
void cudnn::detail::dgrad_alg1_engine<512, 6, 5, 3, ... 0.00% 0.000us 0.00% 0.000us 0.000us 623.678s 86.63% 623.678s 44.548s 14
void cudnn::detail::dgrad_alg1_engine<128, 5, 5, 3, ... 0.00% 0.000us 0.00% 0.000us 0.000us 44.453s 6.17% 44.453s 22.227s 2
void implicit_convolveNd_dgemm<3, 128, 6, 7, 3, 3, 5... 0.00% 0.000us 0.00% 0.000us 0.000us 15.740s 2.19% 15.740s 491.883ms 32
void convolveNd_dgrad_engine<3, 256, false, true, 6,... 0.00% 0.000us 0.00% 0.000us 0.000us 8.507s 1.18% 8.507s 177.226ms 48
MmBackward 0.00% 1.960ms 0.00% 14.646ms 457.688us 0.000us 0.00% 8.357s 261.151ms 32
aten::mm 0.00% 4.850ms 0.00% 9.045ms 188.438us 8.357s 1.16% 8.357s 174.101ms 48
BroadcastBackward 0.00% 536.000us 0.01% 30.782ms 3.848ms 0.000us 0.00% 7.875s 984.383ms 8
ReduceAddCoalesced 0.00% 15.328ms 0.01% 30.246ms 3.781ms 7.875s 1.09% 7.875s 984.383ms 8
ncclReduceRingLLKernel_sum_f64(ncclColl) 0.00% 0.000us 0.00% 0.000us 0.000us 7.875s 1.09% 7.875s 123.047ms 64
aten::cudnn_convolution_backward_weight 0.00% 12.964ms 0.01% 21.277ms 265.962us 5.367s 0.75% 5.367s 67.090ms 80
void convolve_wgradNd_engine<3, 128, 5, 5, 3, 3, 3, ... 0.00% 0.000us 0.00% 0.000us 0.000us 5.300s 0.74% 5.300s 110.426ms 48
void cutlass::Kernel<cutlass_80_tensorop_d884gemm_64... 0.00% 0.000us 0.00% 0.000us 0.000us 4.900s 0.68% 4.900s 153.116ms 32
void cutlass::Kernel<cutlass_80_tensorop_d884gemm_32... 0.00% 0.000us 0.00% 0.000us 0.000us 3.533s 0.49% 3.533s 220.821ms 16
void cutlass::Kernel<cutlass_80_tensorop_d884gemm_64... 0.00% 0.000us 0.00% 0.000us 0.000us 3.457s 0.48% 3.457s 216.070ms 16
void cutlass::Kernel<cutlass_80_tensorop_d884gemm_64... 0.00% 0.000us 0.00% 0.000us 0.000us 1.438s 0.20% 1.438s 89.852ms 16
model_inference 0.11% 398.925ms 99.96% 361.230s 361.230s 56.032ms 0.01% 494.470ms 494.470ms 1
aten::copy_ 0.00% 2.888ms 0.06% 233.377ms 3.241ms 416.612ms 0.06% 416.612ms 5.786ms 72
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 361.389s
Self CUDA time total: 719.940s
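To make the comparison easier to read, here is a small sketch in plain Python that only redoes the arithmetic on the Epoch 0 numbers above for the `dgrad_alg1_engine<512, ...>` kernel. The dictionary keys and the helper name `avg_per_call` are made up for illustration; the values are copied from the tables.

```python
# Values copied from the Epoch 0 profiler tables above
# (row: void cudnn::detail::dgrad_alg1_engine<512, 6, 5, 3, ...).
one_gpu = {"calls": 1, "cuda_total_s": 44.616, "epoch_min": 1.533455}
two_gpu = {"calls": 14, "cuda_total_s": 621.258, "epoch_min": 6.192227}


def avg_per_call(run):
    """Average CUDA time per kernel launch, in seconds."""
    return run["cuda_total_s"] / run["calls"]


avg_1 = avg_per_call(one_gpu)
avg_2 = avg_per_call(two_gpu)

# How many more times the kernel is launched in the 2-GPU run.
call_ratio = two_gpu["calls"] / one_gpu["calls"]

# Wall-clock slowdown of one epoch when going from 1 GPU to 2 GPUs.
epoch_slowdown = two_gpu["epoch_min"] / one_gpu["epoch_min"]

print(f"avg per call, 1 GPU : {avg_1:.3f} s")
print(f"avg per call, 2 GPUs: {avg_2:.3f} s")
print(f"launch count ratio  : {call_ratio:.0f}x")
print(f"epoch slowdown      : {epoch_slowdown:.2f}x")
```

So the time per launch of this kernel is about the same in both runs. What changes is how many times it is launched (1 vs. 14), and that accounts for most of the extra CUDA time in the 2-GPU run.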
Workstation 1 is currently in use, so I would rather not stop it to change the cuDNN version. Do these statistics mean that I need to try other cuDNN versions (it is currently cuDNN 8.2.1 for CUDA 11.3)? Or do they indicate some other problem?
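In case it helps with the answer, the versions can be read from a separate Python process without touching the running job. This is only a sketch: it assumes PyTorch is installed in the same environment as the training script, and `describe_backend` is a made-up helper name. It reports what this new process sees, not the settings of the job that is already running.

```python
import importlib.util


def describe_backend():
    """Collect the PyTorch / CUDA / cuDNN versions visible to this process."""
    # Return early if PyTorch is not importable in this environment.
    if importlib.util.find_spec("torch") is None:
        return {"torch": None}

    import torch

    return {
        "torch": torch.__version__,
        # CUDA version PyTorch was built against (None for CPU-only builds).
        "cuda": torch.version.cuda,
        # cuDNN version as an integer, or None if cuDNN is unavailable.
        "cudnn": torch.backends.cudnn.version(),
        # Number of GPUs visible to this process.
        "gpu_count": torch.cuda.device_count(),
    }


if __name__ == "__main__":
    for key, value in describe_backend().items():
        print(f"{key}: {value}")
```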
Thanks!