Question about the ImageNet example with multiple T4 GPUs

I am trying to run the single-node, multi-GPU (T4) case of
examples/imagenet on an AWS g4dn.12xlarge instance. (The options are taken from the README.md, with --epochs 1 added.)

python main.py -a resnet50 --dist-url 'tcp://127.0.0.1:6007' --epochs 1 --dist-backend 'nccl' --multiprocessing-distributed --world-size 1 --rank 0 ~/imagenette-320

I get the following output log (with NCCL_DEBUG=INFO).
I suspect all 4 GPU workers are doing the same operation (i.e., this is not data parallel).
Is my understanding incorrect?
If this belongs in the issue tracker, I will file it against the examples repository.
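For reference, examples/imagenet with --multiprocessing-distributed uses torch.utils.data.distributed.DistributedSampler, which hands each rank a disjoint, strided slice of the dataset indices. A minimal pure-Python sketch of that partitioning (shard_indices is a hypothetical helper mirroring the sampler's logic; the per-epoch shuffle is omitted for clarity):

```python
# Sketch of the index sharding done by DistributedSampler: pad the index
# list so it divides evenly across ranks, then give each rank a strided,
# mutually disjoint slice. Pure Python, no torch required.
import math

def shard_indices(dataset_len, rank, world_size):
    indices = list(range(dataset_len))
    # pad with wrapped-around indices so every rank gets an equal share
    total = math.ceil(dataset_len / world_size) * world_size
    indices += indices[: total - len(indices)]
    # rank r takes elements r, r+world_size, r+2*world_size, ...
    return indices[rank:total:world_size]

# with 4 ranks, each GPU sees a different quarter of the (padded) dataset
shards = [shard_indices(10, r, 4) for r in range(4)]
```

If the ranks were sharded this way, their batches (and hence their Loss values) should differ; each process also prints its own progress, so one epoch appears four times in the combined output.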

P.S.
I tried to upload the log file separately, but that is not available here, so I paste the log inline.

===
ip-172-31-35-240:19495:19495 [0] NCCL INFO Bootstrap : Using [0]ens5:172.31.35.240<0>

ip-172-31-35-240:19495:19495 [0] ofi_init:700 NCCL WARN NET/OFI Only EFA provider is supported
ip-172-31-35-240:19495:19495 [0] NCCL INFO NET/IB : No device found.
ip-172-31-35-240:19495:19495 [0] NCCL INFO NET/Socket : Using [0]ens5:172.31.35.240<0>
NCCL version 2.4.8+cuda10.1
ip-172-31-35-240:19497:19497 [2] NCCL INFO Bootstrap : Using [0]ens5:172.31.35.240<0>
ip-172-31-35-240:19496:19496 [1] NCCL INFO Bootstrap : Using [0]ens5:172.31.35.240<0>
ip-172-31-35-240:19498:19498 [3] NCCL INFO Bootstrap : Using [0]ens5:172.31.35.240<0>

ip-172-31-35-240:19498:19498 [3] ofi_init:700 NCCL WARN NET/OFI Only EFA provider is supported

ip-172-31-35-240:19497:19497 [2] ofi_init:700 NCCL WARN NET/OFI Only EFA provider is supported

ip-172-31-35-240:19496:19496 [1] ofi_init:700 NCCL WARN NET/OFI Only EFA provider is supported
ip-172-31-35-240:19497:19497 [2] NCCL INFO NET/IB : No device found.
ip-172-31-35-240:19496:19496 [1] NCCL INFO NET/IB : No device found.
ip-172-31-35-240:19498:19498 [3] NCCL INFO NET/IB : No device found.
ip-172-31-35-240:19497:19497 [2] NCCL INFO NET/Socket : Using [0]ens5:172.31.35.240<0>
ip-172-31-35-240:19496:19496 [1] NCCL INFO NET/Socket : Using [0]ens5:172.31.35.240<0>
ip-172-31-35-240:19498:19498 [3] NCCL INFO NET/Socket : Using [0]ens5:172.31.35.240<0>
ip-172-31-35-240:19495:19521 [0] NCCL INFO Setting affinity for GPU 0 to ffff,ffffffff
ip-172-31-35-240:19497:19522 [2] NCCL INFO Setting affinity for GPU 2 to ffff,ffffffff
ip-172-31-35-240:19496:19523 [1] NCCL INFO Setting affinity for GPU 1 to ffff,ffffffff
ip-172-31-35-240:19498:19524 [3] NCCL INFO Setting affinity for GPU 3 to ffff,ffffffff
ip-172-31-35-240:19495:19521 [0] NCCL INFO Channel 00 : 0 1 2 3
ip-172-31-35-240:19498:19524 [3] NCCL INFO Ring 00 : 3[3] -> 0[0] via direct shared memory
ip-172-31-35-240:19496:19523 [1] NCCL INFO Ring 00 : 1[1] -> 2[2] via direct shared memory
ip-172-31-35-240:19497:19522 [2] NCCL INFO Ring 00 : 2[2] -> 3[3] via direct shared memory
ip-172-31-35-240:19495:19521 [0] NCCL INFO Ring 00 : 0[0] -> 1[1] via direct shared memory
ip-172-31-35-240:19495:19521 [0] NCCL INFO Using 256 threads, Min Comp Cap 7, Trees disabled
ip-172-31-35-240:19496:19523 [1] NCCL INFO comm 0x7efc6c001b60 rank 1 nranks 4 cudaDev 1 nvmlDev 1 - Init COMPLETE
ip-172-31-35-240:19498:19524 [3] NCCL INFO comm 0x7f536c001b60 rank 3 nranks 4 cudaDev 3 nvmlDev 3 - Init COMPLETE
ip-172-31-35-240:19497:19522 [2] NCCL INFO comm 0x7fda80001b60 rank 2 nranks 4 cudaDev 2 nvmlDev 2 - Init COMPLETE
ip-172-31-35-240:19495:19521 [0] NCCL INFO comm 0x7fce48001b60 rank 0 nranks 4 cudaDev 0 nvmlDev 0 - Init COMPLETE
ip-172-31-35-240:19495:19495 [0] NCCL INFO Launch mode Parallel
Use GPU: 1 for training
=> creating model 'resnet50'
Use GPU: 2 for training
=> creating model 'resnet50'
Use GPU: 0 for training
=> creating model 'resnet50'
Use GPU: 3 for training
=> creating model 'resnet50'
Epoch: [0][ 0/51] Time 4.175 ( 4.175) Data 0.817 ( 0.817) Loss 7.1020e+00 (7.1020e+00) Acc@1 0.00 ( 0.00) Acc@5 0.00 ( 0.00)
Epoch: [0][10/51] Time 0.568 ( 0.884) Data 0.000 ( 0.075) Loss 6.2389e+00 (2.7609e+01) Acc@1 12.50 ( 10.37) Acc@5 51.56 ( 50.28)
Epoch: [0][20/51] Time 0.573 ( 0.735) Data 0.000 ( 0.047) Loss 4.8649e+00 (1.9667e+01) Acc@1 12.50 ( 10.04) Acc@5 64.06 ( 49.93)
Epoch: [0][30/51] Time 0.576 ( 0.682) Data 0.000 ( 0.037) Loss 2.6044e+00 (1.4510e+01) Acc@1 9.38 ( 10.08) Acc@5 51.56 ( 50.30)
Epoch: [0][40/51] Time 0.578 ( 0.656) Data 0.000 ( 0.032) Loss 2.3226e+00 (1.1761e+01) Acc@1 7.81 ( 10.18) Acc@5 57.81 ( 49.85)
Epoch: [0][50/51] Time 1.896 ( 0.666) Data 0.000 ( 0.029) Loss 2.3349e+00 (1.0023e+01) Acc@1 8.33 ( 10.14) Acc@5 50.00 ( 50.00)
Epoch: [0][ 0/51] Time 4.166 ( 4.166) Data 0.808 ( 0.808) Loss 7.1076e+00 (7.1076e+00) Acc@1 0.00 ( 0.00) Acc@5 0.00 ( 0.00)
Epoch: [0][10/51] Time 0.569 ( 0.886) Data 0.000 ( 0.076) Loss 9.1219e+00 (1.8244e+01) Acc@1 9.38 ( 9.09) Acc@5 48.44 ( 46.02)
Epoch: [0][20/51] Time 0.573 ( 0.736) Data 0.000 ( 0.047) Loss 4.3179e+00 (1.2560e+01) Acc@1 12.50 ( 9.52) Acc@5 45.31 ( 47.77)
Epoch: [0][30/51] Time 0.575 ( 0.683) Data 0.000 ( 0.037) Loss 2.7694e+00 (9.7450e+00) Acc@1 17.19 ( 10.13) Acc@5 59.38 ( 49.04)
Epoch: [0][40/51] Time 0.581 ( 0.657) Data 0.000 ( 0.032) Loss 2.2670e+00 (7.9916e+00) Acc@1 23.44 ( 11.09) Acc@5 62.50 ( 49.92)
Epoch: [0][50/51] Time 1.879 ( 0.666) Data 0.000 ( 0.028) Loss 2.2895e+00 (6.9589e+00) Acc@1 12.50 ( 10.83) Acc@5 41.67 ( 49.44)
Epoch: [0][ 0/51] Time 4.183 ( 4.183) Data 0.815 ( 0.815) Loss 7.1518e+00 (7.1518e+00) Acc@1 0.00 ( 0.00) Acc@5 0.00 ( 0.00)
Epoch: [0][10/51] Time 0.571 ( 0.885) Data 0.000 ( 0.075) Loss 5.8699e+00 (2.7128e+01) Acc@1 10.94 ( 9.38) Acc@5 51.56 ( 48.15)
Epoch: [0][20/51] Time 0.571 ( 0.735) Data 0.000 ( 0.047) Loss 4.1080e+00 (1.7798e+01) Acc@1 6.25 ( 9.30) Acc@5 45.31 ( 50.22)
Epoch: [0][30/51] Time 0.574 ( 0.682) Data 0.000 ( 0.037) Loss 3.8085e+00 (1.3301e+01) Acc@1 7.81 ( 9.68) Acc@5 46.88 ( 50.15)
Epoch: [0][40/51] Time 0.582 ( 0.656) Data 0.000 ( 0.032) Loss 2.4562e+00 (1.0682e+01) Acc@1 10.94 ( 10.25) Acc@5 51.56 ( 50.61)
Epoch: [0][50/51] Time 1.902 ( 0.666) Data 0.000 ( 0.029) Loss 2.3628e+00 (9.2144e+00) Acc@1 0.00 ( 9.99) Acc@5 45.83 ( 50.43)
Epoch: [0][ 0/51] Time 4.189 ( 4.189) Data 0.823 ( 0.823) Loss 7.0857e+00 (7.0857e+00) Acc@1 0.00 ( 0.00) Acc@5 0.00 ( 0.00)
Epoch: [0][10/51] Time 0.568 ( 0.885) Data 0.000 ( 0.075) Loss 5.9996e+00 (1.6526e+01) Acc@1 10.94 ( 9.66) Acc@5 51.56 ( 45.03)
Epoch: [0][20/51] Time 0.570 ( 0.735) Data 0.000 ( 0.047) Loss 4.8403e+00 (1.1769e+01) Acc@1 12.50 ( 10.49) Acc@5 50.00 ( 47.54)
Epoch: [0][30/51] Time 0.569 ( 0.682) Data 0.000 ( 0.037) Loss 2.8636e+00 (9.2034e+00) Acc@1 7.81 ( 10.13) Acc@5 59.38 ( 49.90)
Epoch: [0][40/51] Time 0.572 ( 0.656) Data 0.000 ( 0.032) Loss 2.3001e+00 (7.5971e+00) Acc@1 6.25 ( 10.75) Acc@5 51.56 ( 49.54)
Epoch: [0][50/51] Time 1.898 ( 0.666) Data 0.000 ( 0.029) Loss 6.2036e+00 (6.7050e+00) Acc@1 4.17 ( 10.61) Acc@5 66.67 ( 49.81)
Test: [0/8] Time 0.997 ( 0.997) Loss 2.5230e+00 (2.5230e+00) Acc@1 9.38 ( 9.38) Acc@5 100.00 (100.00)
 * Acc@1 12.400 Acc@5 54.200
Test: [0/8] Time 0.997 ( 0.997) Loss 2.5230e+00 (2.5230e+00) Acc@1 9.38 ( 9.38) Acc@5 100.00 (100.00)
 * Acc@1 12.400 Acc@5 54.200
Test: [0/8] Time 0.992 ( 0.992) Loss 2.5230e+00 (2.5230e+00) Acc@1 9.38 ( 9.38) Acc@5 100.00 (100.00)
 * Acc@1 12.400 Acc@5 54.200
Test: [0/8] Time 0.988 ( 0.988) Loss 2.5230e+00 (2.5230e+00) Acc@1 9.38 ( 9.38) Acc@5 100.00 (100.00)
 * Acc@1 12.400 Acc@5 54.200
===