Hi, I am currently considering a special case of training a model in which only part of the data is used to compute the loss, for example:

```python
def train(epoch):
    model.train()
    total_loss = total_correct = 0
    for batch in train_loader:
        optimizer.zero_grad()
        # keep only the first 100 output rows for the loss
        out = model(batch.x, batch.edge_index.to(device))[:100]
        y = batch.y[:100].squeeze()
        loss = F.cross_entropy(out, y)
        loss.backward()
        optimizer.step()
        total_loss += float(loss)
        total_correct += int(out.argmax(dim=-1).eq(y).sum())
```

I have two questions about the backward propagation:

(1) What are the gradients for those data points that do not participate in the loss computation? For instance, in the given example, what are the gradients for data points beyond the first 100? Are they zero?
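To make this concrete, here is a toy sketch of what I mean, using a plain `nn.Linear` in place of my actual model (the names `lin`, `x`, and `y` are just placeholders for this illustration), where I inspect the gradient on the output rows that are sliced out of the loss:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
lin = torch.nn.Linear(4, 3)   # stand-in for the real model
x = torch.randn(10, 4)
y = torch.randint(0, 3, (10,))

out = lin(x)        # (10, 3)
out.retain_grad()   # keep the gradient on this non-leaf tensor for inspection

# loss computed on the first 5 rows only
loss = F.cross_entropy(out[:5], y[:5])
loss.backward()

# rows beyond the slice receive zero gradient...
print(out.grad[5:].abs().sum())        # tensor(0.)
# ...while the weight gradient is nonzero (driven by the first 5 rows)
print(lin.weight.grad.abs().sum() > 0)  # tensor(True)
```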

(2) In this case, which part of the data participates in updating the model? Do the data points that are not part of the loss computation (beyond the first 100 in the example above) contribute to the update of the model parameters?
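For this second question, here is a toy check (again with a plain `nn.Linear` as a stand-in) of whether slicing the output for the loss gives the same parameter gradients as truncating the batch up front. I realize a GNN's output for the first nodes can also depend on the other nodes through `edge_index`, so this comparison may not carry over directly to my actual model:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
lin = torch.nn.Linear(4, 3)   # stand-in for the real model
x = torch.randn(10, 4)
y = torch.randint(0, 3, (10,))

# full batch through the model, loss on the first 5 rows only
F.cross_entropy(lin(x)[:5], y[:5]).backward()
grad_sliced = lin.weight.grad.clone()

# same model, batch truncated to the first 5 rows before the forward pass
lin.zero_grad()
F.cross_entropy(lin(x[:5]), y[:5]).backward()
grad_truncated = lin.weight.grad.clone()

# for this row-independent model the parameter gradients match exactly
print(torch.allclose(grad_sliced, grad_truncated))  # True
```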

Thanks.