I mean that the network has different parameters for each element of the batch. For example, if I set the batch size to 3 (as in my example), the network produces an output for each element along the batch dimension (the second dimension; in PyTorch the input and output dimensions are (sequence, batch, feature)).
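For reference, here is a minimal sketch of that layout with a plain nn.LSTM and hypothetical sizes matching my setup:

import torch
from torch import nn, autograd

lstm = nn.LSTM(input_size=6, hidden_size=6)    # 6 features in, 6 features out
inp = autograd.Variable(torch.randn(5, 3, 6))  # (sequence=5, batch=3, feature=6)
out, (h_n, c_n) = lstm(inp)                    # initial hidden state defaults to zeros
print(out.size())                              # torch.Size([5, 3, 6]): one output per batch element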
The training set:
training_set = [
    [[1, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1], [0, 0, 1, 0, 0, 0]],
    [[0, 1, 0, 0, 0, 0], [0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 1, 0]],
    [[0, 0, 1, 0, 0, 0], [0, 0, 0, 1, 0, 0], [1, 0, 0, 0, 0, 0]],
    [[0, 0, 0, 1, 0, 0], [0, 0, 1, 0, 0, 0], [0, 1, 0, 0, 0, 0]],
    [[0, 0, 0, 0, 1, 0], [0, 1, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0]],
    [[0, 0, 0, 0, 0, 1], [1, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1]]]
The training step:
for epoch in range(100):
    # Step 1. Remember that PyTorch accumulates gradients.
    # We need to clear them out before each instance.
    model.zero_grad()

    # Starting each batch, we detach the hidden state from how it was
    # previously produced. If we didn't, the model would try backpropagating
    # all the way to the start of the dataset.
    model.h_t, model.c_t = repackage_hidden((model.h_t, model.c_t))

    # Step 2. Get our inputs ready for the network, that is, turn them into
    # Variables of word indices.
    batch_input, batch_targets = prepare_sequences(training_set, labels,
                                                   batch_size)

    # Step 3. Run our forward pass to get the predicted target vectors.
    batch_outputs = model(batch_input)

    # Step 4. Compute the loss, gradients, and update the parameters by
    # calling optimizer.step().
    loss = loss_function(batch_outputs, batch_targets)
    loss.backward(retain_graph=True)
    optimizer.step()
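repackage_hidden is not shown above; a typical implementation (along the lines of the one in the PyTorch word_language_model example, which is what I assume here) recursively wraps the hidden tensors in fresh Variables so their graph history is dropped:

def repackage_hidden(h):
    # Wrap hidden states in new Variables to detach them from their history.
    if isinstance(h, autograd.Variable):
        # A new Variable around the same data carries no graph history
        return autograd.Variable(h.data)
    else:
        return tuple(repackage_hidden(v) for v in h)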
The loss function:
def loss_function(preds: autograd.Variable,
                  _batch_targets: autograd.Variable):
    # Flatten (sequence, batch, classes) to (sequence * batch, classes),
    # and the targets to a flat vector of class indices, so NLLLoss
    # averages over every prediction in the batch at once.
    nllloss = nn.NLLLoss()
    loss_seq = nllloss(preds.contiguous().view(-1, 6),
                       _batch_targets.view(-1))
    return loss_seq
I compute the loss function over all the elements of the batch.
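To make the reshaping concrete, here is a sketch (with random stand-in data) of what the two view calls do to the sizes in my setup:

import torch
from torch import autograd

preds = autograd.Variable(torch.randn(5, 3, 6))                    # (sequence, batch, classes)
targets = autograd.Variable(torch.LongTensor(5, 3).random_(0, 6))  # class indices in [0, 6)
print(preds.contiguous().view(-1, 6).size())  # torch.Size([15, 6])
print(targets.view(-1).size())                # torch.Size([15])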
Finally, I train the LSTM with a batch size of 3, that is to say on the whole dataset. The input then has dimensions (5, 3, 6). When training is over, I want to use the LSTM on just one sequence (and not on a batch of 3 elements at a time). That is why I wrote the following function:
def one_input(model, seq: autograd.Variable, batch_size):
    if isinstance(seq, autograd.Variable):
        if len(seq.size()) == 2:
            # Add a batch dimension: (sequence, feature) -> (sequence, 1, feature)
            seq = seq.view(len(seq), 1, -1)
        sizes = seq.size()
        if sizes[1] == 1:
            # Duplicate the single sequence along the batch dimension
            seq = seq.expand(sizes[0], batch_size, sizes[2])
    else:
        raise TypeError("seq must be an autograd.Variable")
    return model(seq)
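For example, to run a single sequence through the trained model (random stand-in data; model is the trained LSTM from above):

import torch
from torch import autograd

single_seq = autograd.Variable(torch.randn(5, 6))   # one sequence, no batch dimension
output = one_input(model, single_seq, batch_size=3)
print(output.size())                                # torch.Size([5, 3, 6])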
And here is the output after passing a batch containing the first sequence duplicated 3 times, so as to have a batch of size 3:
Variable containing:
(0 ,.,.) =
-1.8790 -1.7101 -1.8548 -1.7101 -1.7329 -1.8819
-1.8773 -1.7029 -1.8542 -1.7156 -1.7324 -1.8867
-1.8914 -1.7042 -1.8518 -1.7058 -1.7318 -1.8860
(1 ,.,.) =
-1.8776 -1.6937 -1.8465 -1.7505 -1.7217 -1.8775
-1.8767 -1.6895 -1.8472 -1.7532 -1.7217 -1.8797
-1.8821 -1.6903 -1.8464 -1.7500 -1.7195 -1.8803
(2 ,.,.) =
-1.8620 -1.7102 -1.8386 -1.7112 -1.7629 -1.8800
-1.8614 -1.7081 -1.8395 -1.7123 -1.7631 -1.8807
-1.8638 -1.7086 -1.8385 -1.7114 -1.7622 -1.8810
(3 ,.,.) =
-1.8820 -1.7209 -1.8325 -1.7252 -1.7310 -1.8736
-1.8816 -1.7199 -1.8329 -1.7256 -1.7314 -1.8740
-1.8827 -1.7201 -1.8327 -1.7256 -1.7302 -1.8742
(4 ,.,.) =
-1.8532 -1.7232 -1.8492 -1.7212 -1.7408 -1.8761
-1.8529 -1.7229 -1.8494 -1.7213 -1.7410 -1.8762
-1.8536 -1.7226 -1.8494 -1.7216 -1.7404 -1.8763
[torch.FloatTensor of size 5x3x6]
Even though the sequences in the batch are identical, the outputs differ across the batch dimension.