I can’t find a clear name for this type of network/training, which makes it hard to see what else has been done in the field. It looks like an RNN, but the recurrent loop spans the entire network rather than a single layer.

The idea is to perform classification iteratively instead of in a single pass, on the assumption that a prior probability over the expected classes can be used to help classify the input (somewhat like attention). The potential benefit is a “shorter” network.

I realize memory usage is one pitfall of doing this, but I’m mainly trying to find similar theoretical and practical work.
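To make the setup concrete, here is a minimal sketch of what such a feedback model might look like. Everything here is hypothetical (the class name `FeedbackClassifier`, the concatenation strategy, and the layer sizes are my assumptions, not an established architecture): the model takes the static inputs alongside the previous pass's predictions and produces a new softmax distribution.

```python
import torch
import torch.nn as nn


class FeedbackClassifier(nn.Module):
    """Hypothetical sketch: concatenate the static inputs with the
    previous prediction vector, then run a shared trunk ending in softmax."""

    def __init__(self, n_features, n_classes, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(n_features + n_classes, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, static_inputs, prior_predictions):
        # The prior (or previous pass's output) is just an extra input feature.
        x = torch.cat([static_inputs, prior_predictions], dim=1)
        return torch.softmax(self.trunk(x), dim=1)
```

Because the same weights are reused on every pass, this is structurally like an RNN unrolled over "refinement steps" rather than over a sequence, which may be the closest existing framing.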

```
# assuming an equal prior over all classes (probably not a good assumption)
initial_predictions = torch.full((batch_size, n_classes), 1.0 / n_classes)
# first pass, assuming the network's final operation is a softmax
model_output = model(static_inputs, initial_predictions)
# could be a fixed number of passes, or could use some measure of the output
# to determine when the predictions have leveled off
for i in range(4):
    # for the 2nd+ passes, feed the previous model_output back in
    model_output = model(static_inputs, model_output)
# class weights, if any, would be set when constructing the criterion
loss = criterion(model_output, targets)
loss.backward()
```
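For the "predictions have leveled off" variant mentioned in the comments, one possible stopping rule is to iterate until the maximum change between consecutive prediction vectors falls below a tolerance. This is only a sketch; the helper name `iterate_until_stable` and the thresholds are mine, and convergence is not guaranteed in general, hence the `max_passes` cap.

```python
import torch


def iterate_until_stable(model, static_inputs, n_classes,
                         tol=1e-4, max_passes=10):
    """Hypothetical helper: start from a uniform prior and stop once the
    predictions change by less than `tol` between passes (or after
    `max_passes`, since convergence is not guaranteed)."""
    preds = torch.full((static_inputs.shape[0], n_classes),
                       1.0 / n_classes)
    for _ in range(max_passes):
        new_preds = model(static_inputs, preds)
        # stop when the largest per-class change is below tolerance
        if (new_preds - preds).abs().max() < tol:
            return new_preds
        preds = new_preds
    return preds
```

Note that backpropagating through a variable number of passes stores activations for every pass, which is exactly the memory pitfall mentioned above.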