My data consists of state vectors representing the state of a dynamical system in a simulated environment. One entire simulation consists of 1k such vectors, and the vector at index 1 represents the simulation step that comes right after the vector at index 0. I'm loading the data like this:

```
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

# training_data is assumed to be loaded already (not shown in this snippet)
inputs, outputs = [], []
for i in range(len(training_data['X'])):  # each training_data['X'][i] holds 1k vectors (one whole simulation)
    for j in range(len(training_data['X'][i])):  # each training_data['X'][i][j] is one time step of that simulation
        inputs.append(training_data['X'][i][j])
        outputs.append(training_data['Y'][i][j])
inputs = torch.tensor(np.array(inputs), dtype=torch.float32)
outputs = torch.tensor(np.array(outputs), dtype=torch.float32)
outputs = outputs.unsqueeze(1)
dataset = TensorDataset(inputs, outputs)
dataloader = DataLoader(dataset, batch_size=25, shuffle=False)  # shuffle=False keeps the time steps in order
```
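
To make the ordering concrete, here is a small sketch of which time steps end up in which batch. It assumes every simulation has exactly 1000 vectors and that `shuffle=False` is kept; `locate` and `batch_span` are hypothetical helper names used only for illustration.

```python
SIM_LEN = 1000    # vectors per simulation (1k, as described above)
BATCH_SIZE = 25   # batch size passed to the DataLoader


def locate(flat_index, sim_len=SIM_LEN):
    """Map a row of the flattened `inputs` tensor back to (simulation, step)."""
    return divmod(flat_index, sim_len)


def batch_span(batch_index, batch_size=BATCH_SIZE):
    """(simulation, step) of the first and last sample of a batch, assuming shuffle=False."""
    first = batch_index * batch_size
    last = first + batch_size - 1
    return locate(first), locate(last)


print(batch_span(0))   # ((0, 0), (0, 24))
print(batch_span(39))  # ((0, 975), (0, 999))
print(batch_span(40))  # ((1, 0), (1, 24))
```

Since 25 divides 1000, no batch mixes two simulations: batches 0 to 39 cover simulation 0, and batch 40 starts simulation 1.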

The model consists of this custom layer (followed by a simple linear layer):

```
import math

import torch
import torch.nn as nn


class MyCustomLayer(nn.Module):
    def __init__(self, size_in, size_out):
        super().__init__()
        self.size_in, self.size_out = size_in, size_out
        self.weights = nn.Parameter(torch.empty(size_out, size_in))
        self.alphas = nn.Parameter(torch.empty(size_out, size_in))
        self.bias = nn.Parameter(torch.empty(size_out))
        # State carried from one forward call to the next; a buffer follows the module across devices
        self.register_buffer('h_ij', torch.zeros(size_out, size_in))
        nn.init.kaiming_uniform_(self.weights, a=math.sqrt(5))
        nn.init.kaiming_uniform_(self.alphas, a=math.sqrt(5))
        fan_in, _ = nn.init._calculate_fan_in_and_fan_out(self.weights)
        bound = 1 / math.sqrt(fan_in)
        nn.init.uniform_(self.bias, -bound, bound)

    def forward(self, x):
        # torch.clamp is not in-place, so its result has to be used
        alphas = torch.clamp(self.alphas, min=-1.0, max=1.0)
        # Effective weights: static weights plus alphas scaled by the state from the previous call
        w_times_x = torch.mm(x, (self.weights + alphas * self.h_ij).t())
        yout = torch.add(w_times_x, self.bias)
        # New state, shape (size_out, size_in); detached so no gradient flows through it
        self.h_ij = (0.1 * torch.matmul(yout.t(), x)).detach()
        return yout
```
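
Note that `forward` updates `h_ij` once per call, so with a batch of 25 samples the stored matrix summarizes the whole batch. The sketch below checks that `0.1 * torch.matmul(yout.t(), x)` equals the sum of the per-sample outer products; the sizes and the random `yout` are illustrative stand-ins, not values from my model.

```python
import torch

torch.manual_seed(0)
batch, size_in, size_out = 25, 4, 3  # illustrative sizes only
x = torch.randn(batch, size_in)
yout = torch.randn(batch, size_out)  # stand-in for the layer output

# What forward() stores: a single matrix for the whole batch.
h_batch = 0.1 * torch.matmul(yout.t(), x)

# The same quantity written as a sum of per-sample outer products.
h_sum = 0.1 * sum(torch.outer(yout[b], x[b]) for b in range(batch))

print(h_batch.shape)                              # torch.Size([3, 4])
print(torch.allclose(h_batch, h_sum, atol=1e-5))  # True
```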

Training is performed like this:

```
for epoch in range(num_epochs):
    for batch_inputs, batch_outputs in dataloader:
        optimizer.zero_grad()
        predictions = model(batch_inputs)
        loss = loss_function(predictions, batch_outputs)
        loss.backward()
        optimizer.step()
```

The behavior I want is that when sample **training_data['X'][0][1]** is forwarded, **self.h_ij** within my custom layer holds the value that was computed when sample **training_data['X'][0][0]** was forwarded.
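
To illustrate the per-step behavior described above, here is a minimal sketch, not a definitive implementation. `StatefulLinear` and `reset_state` are hypothetical names; the class is a simplified stand-in for `MyCustomLayer` (no clamping, simplified initialization) that walks through a batch one time step at a time, so step t sees the state produced by step t-1. It assumes the rows of a batch are in time order.

```python
import torch
import torch.nn as nn


class StatefulLinear(nn.Module):
    """Hypothetical, simplified stand-in for MyCustomLayer with a per-step state update."""

    def __init__(self, size_in, size_out):
        super().__init__()
        self.weights = nn.Parameter(torch.randn(size_out, size_in) * 0.1)
        self.alphas = nn.Parameter(torch.randn(size_out, size_in) * 0.1)
        self.bias = nn.Parameter(torch.zeros(size_out))
        self.register_buffer("h_ij", torch.zeros(size_out, size_in))

    def reset_state(self):
        # Meant to be called at the start of each simulation so state does not leak across simulations.
        self.h_ij = torch.zeros_like(self.h_ij)

    def forward(self, x):
        # x: (batch, size_in), rows ordered by time step.
        outputs = []
        for x_t in x:  # process one time step at a time
            w_eff = self.weights + self.alphas * self.h_ij
            y_t = w_eff @ x_t + self.bias
            # The state for the next step depends only on the current step.
            self.h_ij = (0.1 * torch.outer(y_t, x_t)).detach()
            outputs.append(y_t)
        return torch.stack(outputs)


torch.manual_seed(0)
layer = StatefulLinear(size_in=3, size_out=2)
x = torch.randn(4, 3)  # four consecutive time steps (illustrative data)

layer.reset_state()
y_batched = layer(x)  # one call with a batch of 4

layer.reset_state()
y_single = torch.cat([layer(x[t:t + 1]) for t in range(4)])  # four calls with batch size 1

print(torch.allclose(y_batched, y_single, atol=1e-6))  # True
```

With this structure the batch size no longer changes the result, at the cost of a Python loop over the time steps in each batch.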