Batching Data with PyTorch's DataLoader

Hi, I’m somewhat new to PyTorch, so I’d like to check whether I understand something about the DataLoader correctly.

I’m using PyTorch’s DataLoader to wrap my training data. Do I understand correctly that the batch size defines the number of samples processed before the model is updated (i.e. number of samples on which a forward pass is made before doing a backward pass)?

That is, if I define the batch size to be, let’s say, 25, then I will sequentially do a forward pass with samples 0 → 1 → … → 24, then do a backward pass. I just want to confirm that the batched samples are not being fed in parallel to the network I’m training.

Thanks a lot in advance.

No, the entire batch is processed in a single forward pass; the samples are not used sequentially.
PyTorch layers usually accept batched input in the shape [batch_size, *] (with a few exceptions, e.g. the default layout in RNNs).
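For example, a plain linear layer consumes all 25 samples in one call (a toy example, not your model; the sizes are arbitrary):

import torch
import torch.nn as nn

layer = nn.Linear(10, 4)
batch = torch.randn(25, 10)   # [batch_size, features]
out = layer(batch)            # one forward pass for the whole batch
print(out.shape)              # torch.Size([25, 4])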

In a custom layer I defined, the connection from unit i in layer L to unit j in layer L+1 has an extra component h_ij that is added to the weight w_ij. h_ij is initialized to 0, and after the activation of unit j is computed it is set to the product of the activations of unit i and unit j, so that in the next forward pass the total connection “strength” is w_ij + h_ij (with h_ij being the product of the two activations from the previous forward pass). This is the dynamic I want the model to have, and it is the dynamic I would get if I forwarded the samples one by one. Do I still get this within a batch? I’m confused.

It depends on the actual implementation, i.e. which operations and shapes you are using.
Could you post the code and an example input?

My data consists of state vectors representing the state of a dynamical system in a simulated environment. One entire simulation consists of 1k such vectors, and the vector at index 1 represents the simulation step coming after the vector at index 0. I’m loading the data like this:

import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

inputs, outputs = [], []

for i in range(len(training_data['X'])):            # each training_data['X'][i] holds one full simulation (1k time steps)
    for j in range(len(training_data['X'][i])):     # each training_data['X'][i][j] is a single time step
        inputs.append(training_data['X'][i][j])
        outputs.append(training_data['Y'][i][j])

inputs = torch.tensor(np.array(inputs), dtype=torch.float32)
outputs = torch.tensor(np.array(outputs), dtype=torch.float32)

outputs = outputs.unsqueeze(1)    # [N] -> [N, 1]

dataset = TensorDataset(inputs, outputs)
dataloader = DataLoader(dataset, batch_size=25, shuffle=False)
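Each iteration of the dataloader then yields one batch of 25 consecutive time steps, which I can check like this:

xb, yb = next(iter(dataloader))
print(xb.shape)   # torch.Size([25, state_dim]), state_dim being the length of one state vector
print(yb.shape)   # torch.Size([25, 1]) because of the unsqueeze above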

The model consists of this custom layer (followed by a simple linear layer):

import math
import torch
import torch.nn as nn

class MyCustomLayer(nn.Module):
    def __init__(self, size_in, size_out):
        super().__init__()
        self.size_in, self.size_out = size_in, size_out

        self.weights = nn.Parameter(torch.empty(size_out, size_in))
        self.alphas = nn.Parameter(torch.empty(size_out, size_in))
        self.bias = nn.Parameter(torch.empty(size_out))

        # Hebbian-like component of the connections; not a learnable parameter
        self.h_ij = torch.zeros((size_out, size_in), requires_grad=False)

        nn.init.kaiming_uniform_(self.weights, a=math.sqrt(5))
        nn.init.kaiming_uniform_(self.alphas, a=math.sqrt(5))

        fan_in, _ = nn.init._calculate_fan_in_and_fan_out(self.weights)
        bound = 1 / math.sqrt(fan_in)
        nn.init.uniform_(self.bias, -bound, bound)

    def forward(self, x):
        # keep alphas within [-1, 1]; clamp is applied in place on the parameter, outside autograd
        with torch.no_grad():
            self.alphas.clamp_(min=-1.0, max=1.0)

        # effective connection strength: w_ij + alpha_ij * h_ij
        w_times_x = torch.mm(x, (self.weights + self.alphas * self.h_ij).t())
        yout = torch.add(w_times_x, self.bias)

        # outer product of the current activations, used in the next forward pass
        self.h_ij = 0.1 * torch.matmul(yout.t(), x)
        self.h_ij.detach_()

        return yout
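Put together, the full model looks roughly like this (the sizes here are just illustrative, not my exact ones):

state_dim = 12    # illustrative: length of one state vector
hidden = 64       # illustrative: width of the custom layer

model = nn.Sequential(
    MyCustomLayer(state_dim, hidden),
    nn.Linear(hidden, 1),    # the simple linear layer producing the target value
)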

The training is being performed like this:

for epoch in range(num_epochs):
    for batch_inputs, batch_outputs in dataloader:

        optimizer.zero_grad()

        predictions = model(batch_inputs)
        loss = loss_function(predictions, batch_outputs)

        loss.backward()
        optimizer.step()

So the dynamic I wish for is that when sample training_data['X'][0][1] is forwarded, self.h_ij inside my custom layer holds the value that was computed while sample training_data['X'][0][0] was forwarded.

In this case it seems you have a direct sequential dependency between samples, so you would either have to feed single samples or check whether self.h_ij could also use the batch size (similar to the hidden states of RNNs).
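A rough sketch of what that could look like: h_ij gets a leading batch dimension so every batch element keeps its own state. Note that this assumes each batch element belongs to a different simulation and the time steps are still fed in order, which is not how your current DataLoader builds the batches:

import math
import torch
import torch.nn as nn

class BatchedCustomLayer(nn.Module):
    def __init__(self, size_in, size_out):
        super().__init__()
        self.weights = nn.Parameter(torch.empty(size_out, size_in))
        self.alphas = nn.Parameter(torch.empty(size_out, size_in))
        self.bias = nn.Parameter(torch.zeros(size_out))
        nn.init.kaiming_uniform_(self.weights, a=math.sqrt(5))
        nn.init.kaiming_uniform_(self.alphas, a=math.sqrt(5))
        self.h_ij = None   # created lazily as [batch_size, size_out, size_in]

    def forward(self, x):  # x: [batch_size, size_in]
        if self.h_ij is None or self.h_ij.size(0) != x.size(0):
            self.h_ij = x.new_zeros(x.size(0), self.weights.size(0), self.weights.size(1))
        alphas = self.alphas.clamp(min=-1.0, max=1.0)
        # per-sample effective weight matrix: [batch_size, size_out, size_in]
        eff_weight = self.weights + alphas * self.h_ij
        yout = torch.einsum('boi,bi->bo', eff_weight, x) + self.bias
        # one outer product per batch element, detached to cut the graph
        self.h_ij = (0.1 * torch.einsum('bo,bi->boi', yout, x)).detach()
        return yout

You would then also have to build each batch so that position k always contains the time steps of simulation k, instead of 25 consecutive steps of the same simulation.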

Thanks a lot. I verified that with a batch size of 1 I get the proper sequential dependency between samples in the self.h_ij values, so that's one solution (though the simulation becomes quite slow). Since I'm new to PyTorch: how exactly would I make self.h_ij use the batch size in my case?

I also have a question regarding the call to self.h_ij.detach_(). I'm doing it because otherwise PyTorch throws an error about trying to backpropagate through the graph a second time. If I understand correctly, this means that during the backward pass self.h_ij is not part of the chain rule (it is treated as a constant, maybe?). Is there a way I could avoid detaching this tensor? I know that if I simply rebuild the tensor as in the code below I don't get the error, but I'm not sure whether this also breaks the computational graph:

# calling model[layer].rebuild_h() instead of .detach_()

    def rebuild_h(self):
        _aux = self.h_ij

        self.h_ij = torch.zeros((self.size_out, self.size_in), requires_grad=False)

        # copies the values into a fresh leaf tensor without any grad history
        # (PyTorch recommends _aux.detach().clone() over torch.tensor(_aux))
        self.h_ij = torch.tensor(_aux, requires_grad=False, dtype=torch.float32)
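For completeness, here is a tiny toy example of what I mean by the detached values being treated as a constant (the names and numbers are just made up for illustration):

import torch

a = torch.ones(1, requires_grad=True)
b = (2 * a).detach()      # b is cut from the graph
c = (a * b).sum()
c.backward()
print(a.grad)             # tensor([2.]) -> b only contributed its value, not a gradient path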