How does Conv2d channel reduction work behind the scenes?

From my current understanding, the following operation:

import torch
import torch.nn as nn

input = torch.rand(size=(3, 5, 10, 10))
conv = nn.Conv2d(in_channels=5, out_channels=1, kernel_size=(3, 3))
output = conv(input)

produces a tensor of size (3, 1, 8, 8). To get from 5 channels to 1 channel, PyTorch uses a kernel of shape (5, 3, 3). Is this understanding correct so far?

If so, what if we apply the following operation:

input = torch.rand(size=(3, 5, 10, 10))
conv = nn.Conv2d(in_channels=5, out_channels=3, kernel_size=(3, 3))
output = conv(input)

Do we simply use 3 different kernels of size (5, 3, 3)?

Yes. In the default setup (groups=1), each output channel is produced by its own kernel of shape (in_channels, height, width): the kernel performs a 2D cross-correlation on each input channel, and the per-channel results are summed (plus a bias) to form that output channel. The full .weight tensor therefore has shape [out_channels, in_channels, height, width], which here is [3, 5, 3, 3].
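
You can verify both the weight shape and the per-channel summation directly. The sketch below (using your example sizes) rebuilds each output channel by summing single-channel cross-correlations with torch.nn.functional.conv2d and checks that it matches the layer's output:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv = nn.Conv2d(in_channels=5, out_channels=3, kernel_size=(3, 3))

# One (5, 3, 3) kernel per output channel, stacked into the weight tensor.
print(conv.weight.shape)  # torch.Size([3, 5, 3, 3])
print(conv.bias.shape)    # torch.Size([3])

x = torch.rand(3, 5, 10, 10)
print(conv(x).shape)      # torch.Size([3, 3, 8, 8])

# Rebuild the output manually: for each input channel c, correlate it with
# slice [:, c] of the weights, then sum over channels and add the bias.
manual = sum(
    F.conv2d(x[:, c:c + 1], conv.weight[:, c:c + 1])
    for c in range(5)
) + conv.bias.view(1, -1, 1, 1)

# Matches the layer's own output up to floating-point tolerance.
print(torch.allclose(manual, conv(x), atol=1e-5))  # True
```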