Hi, I am new to PyTorch and have written a basic custom network. With the training code below, one call to `update_model` (one iteration) takes about 5 minutes on the CPU, but approximately 7 minutes on the GPU.

Is there a better way of accumulating the loss values? I want the loss from each layer, averaged over the entire batch. Is there a way to do batch training with a custom-written forward function? Also, while the model is training, RAM usage keeps piling up, and I assume this is because of accumulating the loss for each batch. Is that assumption right?

The code below is for one layer of the network:

```
import torch
import torch.nn as nn
import torch.nn.functional as F

# K is a global constant defined elsewhere.


class DetNet(nn.Module):
    """One layer of the network."""

    def __init__(self):
        super().__init__()
        # nn.Parameter already sets requires_grad=True.
        self.W1 = nn.Parameter(torch.randn(8*K, 5*K, dtype=torch.float64))
        self.b1 = nn.Parameter(torch.randn(8*K, 1, dtype=torch.float64))
        self.W2 = nn.Parameter(torch.randn(K, 8*K, dtype=torch.float64))
        self.b2 = nn.Parameter(torch.randn(K, 1, dtype=torch.float64))
        self.W3 = nn.Parameter(torch.randn(2*K, 8*K, dtype=torch.float64))
        self.b3 = nn.Parameter(torch.randn(2*K, 1, dtype=torch.float64))
        self.t = nn.Parameter(torch.randn(1, 1, dtype=torch.float64))

    def forward(self, x, v, y, H):
        M1 = torch.matmul(torch.transpose(H, 0, 1), y)
        M2 = torch.matmul(torch.transpose(H, 0, 1), torch.matmul(H, x))
        con = torch.cat((M1, x, M2, v))
        z = F.relu(torch.matmul(self.W1, con) + self.b1)
        # Note: from here on, y is the layer's intermediate output, not the input y.
        y = torch.matmul(self.W2, z) + self.b2
        one_K = torch.ones([K, 1], dtype=torch.float64, device=y.device)
        # Piecewise-linear soft sign with learnable slope parameter t.
        x_k = (F.relu(y + (one_K*self.t))/abs(self.t)
               - F.relu(y - (one_K*self.t))/abs(self.t) - one_K)
        v_k = torch.matmul(self.W3, z) + self.b3
        return (x_k, v_k)
```
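On the batch-training question, one possible direction is to give every input a leading batch dimension and let `torch.matmul` broadcast the weight matrices over it. The following is only a sketch under assumptions not stated in the post: the class name `BatchedDetLayer`, the sizes `K = 4`, `N = 6`, `B = 5`, and the idea that `H` is shared by all samples in the batch are all hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BatchedDetLayer(nn.Module):
    """Hypothetical batched variant of the layer; K is passed in explicitly."""

    def __init__(self, K):
        super().__init__()
        self.W1 = nn.Parameter(torch.randn(8*K, 5*K, dtype=torch.float64))
        self.b1 = nn.Parameter(torch.randn(8*K, 1, dtype=torch.float64))
        self.W2 = nn.Parameter(torch.randn(K, 8*K, dtype=torch.float64))
        self.b2 = nn.Parameter(torch.randn(K, 1, dtype=torch.float64))
        self.W3 = nn.Parameter(torch.randn(2*K, 8*K, dtype=torch.float64))
        self.b3 = nn.Parameter(torch.randn(2*K, 1, dtype=torch.float64))
        self.t = nn.Parameter(torch.randn(1, 1, dtype=torch.float64))

    def forward(self, x, v, y, H):
        # Shapes: x (B, K, 1), v (B, 2K, 1), y (B, N, 1), H (N, K) shared by the batch.
        Ht = H.transpose(0, 1)
        M1 = torch.matmul(Ht, y)                    # (B, K, 1)
        M2 = torch.matmul(Ht, torch.matmul(H, x))   # (B, K, 1)
        con = torch.cat((M1, x, M2, v), dim=1)      # (B, 5K, 1)
        z = F.relu(torch.matmul(self.W1, con) + self.b1)
        out = torch.matmul(self.W2, z) + self.b2    # (B, K, 1)
        t_abs = self.t.abs()
        # self.t has shape (1, 1) and broadcasts, so no explicit ones vector is needed.
        x_k = F.relu(out + self.t)/t_abs - F.relu(out - self.t)/t_abs - 1
        v_k = torch.matmul(self.W3, z) + self.b3    # (B, 2K, 1)
        return x_k, v_k


# Example call with assumed sizes.
K, N, B = 4, 6, 5
layer = BatchedDetLayer(K)
H = torch.randn(N, K, dtype=torch.float64)
x = torch.randn(B, K, 1, dtype=torch.float64)
v = torch.zeros(B, 2*K, 1, dtype=torch.float64)
y = torch.randn(B, N, 1, dtype=torch.float64)
x_k, v_k = layer(x, v, y, H)
```

With this layout the per-sample Python loop would go away, and the loss could be averaged with a single `mean` over the batch dimension. Very small per-sample matrix products tend to be dominated by per-operation overhead on a GPU, which may be part of why the GPU run is slower here.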

The code below updates the network for one epoch:

```
import math

import numpy as np
import torch


def update_model(optimizer):
    num_samples = 3500  # number of samples for each iteration
    H = varying_channel()
    loss = torch.tensor(0.0, dtype=torch.float64)
    for _ in range(num_samples):
        # The next four lines generate one random data sample.
        x_main = torch.DoubleTensor([[2*round(np.random.rand()) - 1] for _ in range(K)])
        v = torch.zeros([2*K, 1], dtype=torch.float64)
        y = received_signal(x_main, H)
        x_tilde = ZF_decoder(H, y)
        x = x_main
        # Pass through each layer, accumulating the weighted loss.
        for cnt in range(3*K):
            x, v = DetLayers[cnt](x, v, y, H)
            # log is taken to be the natural logarithm (math.log).
            curr_loss = (math.log(cnt + 1) * torch.sum((x_main - x)**2)
                         / torch.sum((x_main - x_tilde)**2))
            loss = loss + curr_loss
    loss = loss / num_samples
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```
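On the RAM question: because `loss` is a tensor that sums every sample's loss before `backward()` is called, autograd has to keep the computation graph of all 3500 samples alive until then, which would explain the growth. One common alternative is to call `backward()` once per sample, so that gradients add up in `.grad` and each sample's graph can be released right away. The sketch below illustrates this on a toy model; `accumulate_and_step`, the `nn.Linear` model, and the MSE loss are hypothetical stand-ins, not the code from this post.

```python
import torch
import torch.nn as nn


def accumulate_and_step(model, optimizer, samples, loss_fn):
    """Average the loss over `samples` while calling backward() per sample."""
    optimizer.zero_grad()
    running = 0.0  # plain Python float, holds no graph
    n = len(samples)
    for inp, target in samples:
        loss = loss_fn(model(inp), target) / n
        loss.backward()          # gradients accumulate in .grad
        running += loss.item()   # .item() detaches the value from the graph
    optimizer.step()
    return running


# Toy usage with an assumed model and random data.
torch.manual_seed(0)
model = nn.Linear(3, 1).double()
data = [(torch.randn(3, dtype=torch.float64), torch.randn(1, dtype=torch.float64))
        for _ in range(8)]
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
initial_state = {k: p.clone() for k, p in model.state_dict().items()}
avg_loss = accumulate_and_step(model, optimizer, data, nn.functional.mse_loss)
```

The accumulated gradient is the same as the gradient of the averaged loss, so the optimizer step should be unchanged; only the peak memory differs.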

The code below creates the specified number of layers and runs the model for the specified number of iterations:

```
import torch.optim as optim

# Create 3*K layers and collect their parameters.
DetLayers = []
ParamLayers = []
for cnt in range(3*K):
    curr_layer = DetNet()
    DetLayers.append(curr_layer)
    ParamLayers.extend(curr_layer.parameters())
# Create the optimizer.
optimizer = optim.Adam(ParamLayers, lr=0.01)
# Number of iterations
for iterations in range(1):
    val = update_model(optimizer)
    print(val)
```
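Since the layers are kept in a plain Python list, each one has to be moved to the GPU and handed to the optimizer individually. A possible alternative is to hold them in an `nn.ModuleList` inside one parent module, so that a single `.to(device)` and a single `.parameters()` call cover every layer. This is a rough sketch only: `TinyLayer` and `LayerStack` are hypothetical names, and `TinyLayer` is a simplified stand-in rather than the `DetNet` layer itself.

```python
import torch
import torch.nn as nn


class TinyLayer(nn.Module):
    """Hypothetical, simplified stand-in for one layer of the network."""

    def __init__(self, K):
        super().__init__()
        self.W = nn.Parameter(torch.randn(K, K, dtype=torch.float64))
        self.b = nn.Parameter(torch.randn(K, 1, dtype=torch.float64))

    def forward(self, x):
        return torch.relu(torch.matmul(self.W, x) + self.b)


class LayerStack(nn.Module):
    """Registers all layers so parameters() and to() see each of them."""

    def __init__(self, K, num_layers):
        super().__init__()
        self.layers = nn.ModuleList([TinyLayer(K) for _ in range(num_layers)])

    def forward(self, x):
        outputs = []  # keep every layer's output for a per-layer loss
        for layer in self.layers:
            x = layer(x)
            outputs.append(x)
        return outputs


K = 4  # assumed size
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
stack = LayerStack(K, 3*K).to(device)
optimizer = torch.optim.Adam(stack.parameters(), lr=0.01)
outs = stack(torch.randn(K, 1, dtype=torch.float64, device=device))
```

Inputs such as `H`, `y`, and `x` would also need to be created on the same device, otherwise data is copied between CPU and GPU on every operation.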

Thanks in advance.