A question on how batches are passed through a PyTorch network

So when a batch of data is passed through a network, is each sample in the batch passed through its own instance of the network in parallel? My assumption is that it is. How does PyTorch go about this? Is it a multithreaded process? How does it work on a GPU?

The concept is called data parallelism: https://en.wikipedia.org/wiki/Data_parallelism. The batch dimension of tensors is assumed to be independent, meaning that operations which transform the other dimensions can be executed in parallel across the samples. There is no separate "instance" of the network per sample; all samples share the same weights, and the operation is simply vectorized over the batch dimension. Whether threading is used to do this is an implementation detail, i.e. it is backend specific and also operation specific (e.g. vector/matrix operations treat some dimensions as special). On a GPU, the parallelism comes from the hardware's many cores rather than from CPU threads.
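
A quick way to convince yourself of this batch-independence is a minimal sketch with `torch.nn.Linear` (layer sizes and batch size here are arbitrary choices for illustration): pushing a whole batch through the layer in one call gives the same result as pushing each sample through individually, because both use the same shared weights.

```python
import torch

torch.manual_seed(0)
layer = torch.nn.Linear(4, 3)      # one set of weights, shared by all samples

batch = torch.randn(8, 4)          # a batch of 8 samples, 4 features each

# one vectorized forward pass over the whole batch
out_batched = layer(batch)         # shape: (8, 3)

# the same samples pushed through one at a time, then stacked
out_single = torch.stack([layer(x) for x in batch])

# results match: the batch dimension is independent
print(torch.allclose(out_batched, out_single, atol=1e-6))
```

The vectorized call is what actually happens in training: the framework hands the whole batched tensor to a single matrix-multiply kernel, and how that kernel parallelizes internally is up to the backend.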