Batching Data with PyTorch's DataLoader

Hi, I’m somewhat new to PyTorch, so I’d like to check whether I understand something about the DataLoader correctly.

I’m using PyTorch’s DataLoader to wrap my training data. Do I understand correctly that the batch size defines the number of samples processed before the model is updated (i.e. number of samples on which a forward pass is made before doing a backward pass)?

That is, if I define the batch size to be, let’s say, 25, then I will sequentially do a forward pass with samples 0 → 1 → … → 24, then do a backward pass. I just want to confirm that the batched samples are not being fed in parallel to the network I’m training.

Thanks a lot in advance.

No, the entire batch is processed in a single forward pass; the samples are not used sequentially.
PyTorch layers usually accept batched input in the shape [batch_size, *] (with a few exceptions, e.g. the default layout in RNNs).
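For example, a plain linear layer consumes all 25 samples in one call (a toy example, not your model; the sizes are arbitrary):

import torch
import torch.nn as nn

layer = nn.Linear(10, 4)
batch = torch.randn(25, 10)   # [batch_size, features]
out = layer(batch)            # one forward pass for the whole batch
print(out.shape)              # torch.Size([25, 4])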

In a custom layer I defined, the connection from unit i in layer L to unit j in layer L+1 has an extra component h_ij that is added to the weight w_ij. h_ij is initialized to 0, and after the activation of unit j is computed it is set to the product of the activations of unit i and unit j, so that in the next forward pass the total connection “strength” is w_ij + h_ij (with h_ij being the product of the two activations from the previous forward pass). This is the dynamic I want the model to have, and it is the dynamic I would get if I forwarded the samples one by one. Do I still get this within a batch? I’m confused.

It depends on the actual implementation, i.e. which operations and shapes you are using.
Could you post the code and an example input?

My data consists of state vectors representing the state of a dynamical system in a simulated environment. One entire simulation consists of 1k such vectors, and the vector at index 1 represents the simulation step coming after the vector at index 0. I’m loading the data like this:

import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

inputs, outputs = [], []

for i in range(len(training_data['X'])):            # each training_data['X'][i] holds one full simulation (1k time steps)
    for j in range(len(training_data['X'][i])):     # each training_data['X'][i][j] is a single time step
        inputs.append(training_data['X'][i][j])
        outputs.append(training_data['Y'][i][j])

inputs = torch.tensor(np.array(inputs), dtype=torch.float32)
outputs = torch.tensor(np.array(outputs), dtype=torch.float32)

outputs = outputs.unsqueeze(1)    # [N] -> [N, 1]

dataset = TensorDataset(inputs, outputs)
dataloader = DataLoader(dataset, batch_size=25, shuffle=False)
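Each iteration of the dataloader then yields one batch of 25 consecutive time steps, which I can check like this:

xb, yb = next(iter(dataloader))
print(xb.shape)   # torch.Size([25, state_dim]), state_dim being the length of one state vector
print(yb.shape)   # torch.Size([25, 1]) because of the unsqueeze above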

The model consists of this custom layer (followed by a simple linear layer):

import math
import torch
import torch.nn as nn

class MyCustomLayer(nn.Module):
    def __init__(self, size_in, size_out):
        super().__init__()
        self.size_in, self.size_out = size_in, size_out

        self.weights = nn.Parameter(torch.empty(size_out, size_in))
        self.alphas = nn.Parameter(torch.empty(size_out, size_in))
        self.bias = nn.Parameter(torch.empty(size_out))

        # Hebbian-like component of the connections; not a learnable parameter
        self.h_ij = torch.zeros((size_out, size_in), requires_grad=False)

        nn.init.kaiming_uniform_(self.weights, a=math.sqrt(5))
        nn.init.kaiming_uniform_(self.alphas, a=math.sqrt(5))

        fan_in, _ = nn.init._calculate_fan_in_and_fan_out(self.weights)
        bound = 1 / math.sqrt(fan_in)
        nn.init.uniform_(self.bias, -bound, bound)

    def forward(self, x):
        # keep alphas within [-1, 1]; clamp is applied in place on the parameter, outside autograd
        with torch.no_grad():
            self.alphas.clamp_(min=-1.0, max=1.0)

        # effective connection strength: w_ij + alpha_ij * h_ij
        w_times_x = torch.mm(x, (self.weights + self.alphas * self.h_ij).t())
        yout = torch.add(w_times_x, self.bias)

        # outer product of the current activations, used in the next forward pass
        self.h_ij = 0.1 * torch.matmul(yout.t(), x)
        self.h_ij.detach_()

        return yout
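Put together, the full model looks roughly like this (the sizes here are just illustrative, not my exact ones):

state_dim = 12    # illustrative: length of one state vector
hidden = 64       # illustrative: width of the custom layer

model = nn.Sequential(
    MyCustomLayer(state_dim, hidden),
    nn.Linear(hidden, 1),    # the simple linear layer producing the target value
)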

The training is being performed like this:

for epoch in range(num_epochs):
    for batch_inputs, batch_outputs in dataloader:

        optimizer.zero_grad()

        predictions = model(batch_inputs)
        loss = loss_function(predictions, batch_outputs)

        loss.backward()
        optimizer.step()

So the dynamic I wish for is that when sample training_data['X'][0][1] is forwarded, self.h_ij inside my custom layer holds the value that was computed while sample training_data['X'][0][0] was forwarded.

In this case it seems you have a direct sequential dependency between samples, so you would either have to feed single samples or check whether self.h_ij could also use the batch size (similar to the hidden states of RNNs).
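A rough sketch of what that could look like: h_ij gets a leading batch dimension so every batch element keeps its own state. Note that this assumes each batch element belongs to a different simulation and the time steps are still fed in order, which is not how your current DataLoader builds the batches:

import math
import torch
import torch.nn as nn

class BatchedCustomLayer(nn.Module):
    def __init__(self, size_in, size_out):
        super().__init__()
        self.weights = nn.Parameter(torch.empty(size_out, size_in))
        self.alphas = nn.Parameter(torch.empty(size_out, size_in))
        self.bias = nn.Parameter(torch.zeros(size_out))
        nn.init.kaiming_uniform_(self.weights, a=math.sqrt(5))
        nn.init.kaiming_uniform_(self.alphas, a=math.sqrt(5))
        self.h_ij = None   # created lazily as [batch_size, size_out, size_in]

    def forward(self, x):  # x: [batch_size, size_in]
        if self.h_ij is None or self.h_ij.size(0) != x.size(0):
            self.h_ij = x.new_zeros(x.size(0), self.weights.size(0), self.weights.size(1))
        alphas = self.alphas.clamp(min=-1.0, max=1.0)
        # per-sample effective weight matrix: [batch_size, size_out, size_in]
        eff_weight = self.weights + alphas * self.h_ij
        yout = torch.einsum('boi,bi->bo', eff_weight, x) + self.bias
        # one outer product per batch element, detached to cut the graph
        self.h_ij = (0.1 * torch.einsum('bo,bi->boi', yout, x)).detach()
        return yout

You would then also have to build each batch so that position k always contains the time steps of simulation k, instead of 25 consecutive steps of the same simulation.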

Thanks a lot. I verified that with a batch size of 1 I get the proper sequential dependency between samples in the self.h_ij values, so that's one solution (though the simulation becomes quite slow). Since I'm new to PyTorch: how exactly would I make self.h_ij use the batch size in my case?

I also have a question regarding the call to self.h_ij.detach_(). I'm doing it because otherwise PyTorch throws an error about trying to backpropagate through the graph a second time. If I understand correctly, this means that during the backward pass self.h_ij is not part of the chain rule (it is treated as a constant, maybe?). Is there a way I could avoid detaching this tensor? I know that if I simply rebuild the tensor as in the code below I don't get the error, but I'm not sure whether this also breaks the computational graph:

# calling model[layer].rebuild_h() instead of .detach_()

    def rebuild_h(self):
        _aux = self.h_ij

        self.h_ij = torch.zeros((self.size_out, self.size_in), requires_grad=False)

        # copies the values into a fresh leaf tensor without any grad history
        # (PyTorch recommends _aux.detach().clone() over torch.tensor(_aux))
        self.h_ij = torch.tensor(_aux, requires_grad=False, dtype=torch.float32)
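For completeness, here is a tiny toy example of what I mean by the detached values being treated as a constant (the names and numbers are just made up for illustration):

import torch

a = torch.ones(1, requires_grad=True)
b = (2 * a).detach()      # b is cut from the graph
c = (a * b).sum()
c.backward()
print(a.grad)             # tensor([2.]) -> b only contributed its value, not a gradient path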