Hello, I have a question about `with torch.no_grad()`.

`torch.no_grad()` disables gradient calculation, which is useful for inference.
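
To make sure I understand that correctly, here is a minimal sketch of what I mean. The `layer` and `x` below are hypothetical toy objects, not part of my real network; the point is only that a tensor produced inside the context has no autograd graph attached.

```python
import torch

# Hypothetical toy layer and input, used only for illustration.
layer = torch.nn.Linear(4, 2)
x = torch.randn(3, 4)

out_tracked = layer(x)            # normal forward pass: autograd records the graph
with torch.no_grad():
    out_untracked = layer(x)      # same computation, but no graph is recorded

print(out_tracked.requires_grad)    # True
print(out_untracked.requires_grad)  # False
print(out_untracked.grad_fn)        # None
```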

Given that, are the following two code snippets equivalent? Is it true that in both snippets the model does not learn from the test data? Does it matter where `with torch.no_grad()` is placed in the following case?

(1)

```
import numpy as np
import torch
import torch.nn.functional as F

# `device`, `network`, `train`, `trainloader` and `testloader` are defined elsewhere.

def dcn(x):  # detach, cpu, numpy
    if isinstance(x, np.ndarray):
        return x
    return x.detach().cpu().numpy()

def predict(dataloader, network):
    Y_new, Y_hat, Y_hat_pb = np.array([]), np.array([]), np.empty((0, 5))
    for iteration, batch in enumerate(zip(dataloader)):
        x, y = batch[0]
        x, y = x.to(device), y.flatten().to(device)
        with torch.no_grad():  # entered once per batch, inside the loop
            x = network.FE(x)
            x_att, _ = network.sce(x)
            h = network.bilstm(x_att)
            x = x.flatten(start_dim=2)
            h = network.dropout(network.project_f(x) + h)
            l_2 = network.cls(h)
            l_2 = l_2.flatten(end_dim=1)
            y_hat = dcn(l_2.detach().argmax(-1))
            y_hat_pb = dcn(F.softmax(l_2, dim=-1))
        Y_new = np.concatenate([Y_new, dcn(y)])
        Y_hat = np.concatenate([Y_hat, y_hat])
        Y_hat_pb = np.concatenate([Y_hat_pb, y_hat_pb])
    return Y_hat_pb, Y_hat, Y_new

for epoch in range(10):
    network.train()
    loss = train(trainloader, network)
    network.eval()
    Yts_hat_pb, Yts_hat, Yts_new = predict(testloader, network)
```

(2)

```
import numpy as np
import torch
import torch.nn.functional as F

# `device`, `network`, `train`, `trainloader` and `testloader` are defined elsewhere.

def dcn(x):  # detach, cpu, numpy
    if isinstance(x, np.ndarray):
        return x
    return x.detach().cpu().numpy()

def predict2(dataloader, network):
    Y_new, Y_hat, Y_hat_pb = np.array([]), np.array([]), np.empty((0, 5))
    with torch.no_grad():  # entered once, wrapping the whole loop
        for iteration, batch in enumerate(zip(dataloader)):
            x, y = batch[0]
            x, y = x.to(device), y.flatten().to(device)
            x = network.FE(x)
            x_att, _ = network.sce(x)
            h = network.bilstm(x_att)
            x = x.flatten(start_dim=2)
            h = network.dropout(network.project_f(x) + h)
            l_2 = network.cls(h)
            l_2 = l_2.flatten(end_dim=1)
            y_hat = dcn(l_2.detach().argmax(-1))
            y_hat_pb = dcn(F.softmax(l_2, dim=-1))
            Y_new = np.concatenate([Y_new, dcn(y)])
            Y_hat = np.concatenate([Y_hat, y_hat])
            Y_hat_pb = np.concatenate([Y_hat_pb, y_hat_pb])
    return Y_hat_pb, Y_hat, Y_new

for epoch in range(10):
    network.train()
    loss = train(trainloader, network)
    network.eval()
    Yts_hat_pb, Yts_hat, Yts_new = predict2(testloader, network)
```

I am using (1) for the test data (for inference/evaluation). Is it correct to use code (1)? Is using code (1) the same as using code (2)?

A simplified version of the above looks like this:

(a)

```
network.eval()
for iteration, batch in enumerate(zip(dataloader)):
    x, y = batch[0]
    with torch.no_grad():
        y_hat = network(x)
```

(b)

```
network.eval()
with torch.no_grad():
    for iteration, batch in enumerate(zip(dataloader)):
        x, y = batch[0]
        y_hat = network(x)
```