D refers to the “depth” of the volume you are using, since nn.Conv3d expects volumes as inputs instead of 2D planes/images. The filter kernel of the Conv3d layer also has an additional depth dimension, and the convolution is applied along all 3 dimensions (i.e. the filter moves in all 3 dims).
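As a minimal sketch of the shapes involved (the batch size, channel counts, and spatial sizes below are arbitrary values picked for illustration), the input to nn.Conv3d is laid out as [batch, channels, depth, height, width], and the kernel gets its own depth dimension as well:

```python
import torch
import torch.nn as nn

# Input layout expected by nn.Conv3d: [batch, channels, depth, height, width].
# All sizes here are arbitrary example values.
x = torch.randn(2, 1, 16, 64, 64)

# kernel_size=3 is expanded to a 3x3x3 kernel (depth, height, width).
conv = nn.Conv3d(in_channels=1, out_channels=8, kernel_size=3, padding=1)

out = conv(x)

# Weight layout: [out_channels, in_channels, kernel_depth, kernel_height, kernel_width]
print(conv.weight.shape)  # torch.Size([8, 1, 3, 3, 3])
# padding=1 with a 3x3x3 kernel and stride 1 keeps depth, height, and width unchanged
print(out.shape)          # torch.Size([2, 8, 16, 64, 64])
```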
No, the depth can be any value, just like the height and width dimensions.
Think of the depth as e.g. a stack of medical images (e.g. CT scans).
The height and width define the “image” dimensions of each CT slice, while the depth defines the number of slices. In this case the volume contains “voxels” instead of pixels, since you are now using a volume to represent the scan.
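To make the CT analogy concrete, here is a rough sketch that stacks 2D slices into a volume and passes it through a Conv3d layer. The slice count and resolution are made-up values, and random tensors stand in for real scan data:

```python
import torch
import torch.nn as nn

# Hypothetical scan: 40 slices of 128x128 pixels each (random data as a stand-in).
num_slices, height, width = 40, 128, 128
slices = [torch.randn(height, width) for _ in range(num_slices)]

# Stacking the slices along a new first dimension gives a volume of shape [D, H, W].
volume = torch.stack(slices, dim=0)

# Add batch and channel dimensions to get [N, C, D, H, W] for nn.Conv3d.
x = volume.unsqueeze(0).unsqueeze(0)

conv = nn.Conv3d(in_channels=1, out_channels=4, kernel_size=3, padding=1)
out = conv(x)

print(volume.shape)  # torch.Size([40, 128, 128])
print(out.shape)     # torch.Size([1, 4, 40, 128, 128])
```

The depth of 40 is not special here: a scan with a different number of slices would pass through the same layer, with the output depth changing accordingly.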