PyTorch cannot access all 4 GPUs

I am new to running model training with PyTorch on an Azure VM (NC16as_T4_v3). According to the documentation, this instance has 16 GB per GPU (x 4 = 64 GB). I am using PyTorch 2.3.1+cu121, so I believe the installed torch build is the correct version to use GPUs. However, when I run torch.cuda.device_count(), the result is 1, although it should be 4. How do I access all 4 GPUs and enable distributed training?
Any help would be greatly appreciated.

This table seems to indicate that NC16as_T4_v3 has only one GPU:
NCas T4 v3-series - Azure Virtual Machines | Microsoft Learn
while Standard_NC64as_T4_v3 would have 4 GPUs.
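Aside from the VM SKU, another common reason torch.cuda.device_count() reports fewer GPUs than expected is the CUDA_VISIBLE_DEVICES environment variable masking devices. Here is a minimal sketch of that masking logic, using a hypothetical helper (visible_gpu_count is not a real torch or CUDA API, just an illustration of how the mask is interpreted):

```python
import os

def visible_gpu_count(physical_gpus, env=None):
    """Illustrative helper: how many GPUs a process would see,
    given the number of physical GPUs and CUDA_VISIBLE_DEVICES.
    (Hypothetical function for illustration, not a torch API.)"""
    env = os.environ if env is None else env
    mask = env.get("CUDA_VISIBLE_DEVICES")
    if mask is None:
        # No mask set: all physical GPUs are visible.
        return physical_gpus
    # Mask set: only the listed device indices are visible.
    ids = [x for x in mask.split(",") if x.strip()]
    return len(ids)

# Example: a 4-GPU machine with CUDA_VISIBLE_DEVICES=0,1 exposes 2 devices.
print(visible_gpu_count(4, env={"CUDA_VISIBLE_DEVICES": "0,1"}))
```

So before resizing the VM, it is also worth checking that CUDA_VISIBLE_DEVICES is unset (or lists all devices) in the shell where training runs.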

Oops! How did I miss that? Thanks for pointing that out.