Hello,

I have a problem training a neural net for a deep reinforcement learning task.

My net looks as follows:

```
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.emb1 = nn.Embedding(41, 3).to(device)
        self.emb2 = nn.Embedding(4, 3).to(device)
        self.fc1 = nn.Linear(140, 150).to(device)
        self.fc2 = nn.Linear(150, 150).to(device)
        self.fc3 = nn.Linear(150, num_action).to(device)
        self.enc1 = nn.Linear(num_last_actions * 3, 10).to(device)
        self.enc2 = nn.Linear(num_edge_elements, 10).to(device)
        self.dropout = nn.Dropout(0.25).to(device)

    def forward(self, x):
        x1 = x[:, :40]
        x2 = x[:, 40:79]
        x3 = x[:, 79:]
        x11 = self.putInEmbeddingForm_SNR(x1)
        x22 = self.putInEmbeddingForm_action(x2)
        x1 = self.emb1(x11).view((-1, num_SNRs * 3))
        x2 = self.emb2(x22).view((-1, num_last_actions * 3))
        x2 = self.enc1(x2)
        x3 = self.enc2(x3)
        x = torch.cat((x1, x2, x3), dim=1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        action_value = self.fc3(x)
        return action_value
```

So what I do is take an input tensor of length 118. The tensor consists of integer values (although they are passed to the net as floats). The first 40 values (x1) are fed to an embedding emb1 to get a better representation. The next 39 values (x2) go to a second embedding, emb2. Here are the functions that bring the content into the proper form for an embedding layer:

```
def putInEmbeddingForm_SNR(self, x):
    with torch.no_grad():
        x = torch.add(x, 20).long()
    return x

def putInEmbeddingForm_action(self, x):
    with torch.no_grad():
        x = torch.add(x, 0).long()
    return x
```
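Since nn.Embedding(41, 3) only accepts indices 0 to 40 and nn.Embedding(4, 3) only indices 0 to 3, a range check before the lookups could rule out bad indices as the cause. The following is only a sketch: check_embedding_indices is a hypothetical helper that is not part of my code, and the example tensors are made up for illustration.

```python
import torch

def check_embedding_indices(idx, num_embeddings, name):
    """Raise a readable error if any index falls outside [0, num_embeddings)."""
    lo, hi = int(idx.min()), int(idx.max())
    if lo < 0 or hi >= num_embeddings:
        raise ValueError(
            f"{name}: indices span [{lo}, {hi}], expected [0, {num_embeddings - 1}]"
        )
    return idx

# SNR values in [-20, 20], shifted by 20, fit nn.Embedding(41, 3).
snr = torch.tensor([[-20.0, 0.0, 20.0]])
snr_idx = check_embedding_indices(torch.add(snr, 20).long(), 41, "emb1")

# A value such as -1 in the action slice would not fit nn.Embedding(4, 3).
bad_actions = torch.tensor([[0.0, 3.0, -1.0]])
try:
    check_embedding_indices(bad_actions.long(), 4, "emb2")
    out_of_range_detected = False
except ValueError:
    out_of_range_detected = True
```

Calling such a check on x11 and x22 in forward would turn an out-of-range index into a Python exception with the offending range in the message.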

Then x2 (after going through the embedding layer) and x3 are each compressed by a linear layer (enc1 and enc2, respectively).
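The input width of fc1 follows from these pieces. The arithmetic below is a sketch: the values of num_SNRs, num_last_actions and num_edge_elements are not shown in my post, so they are assumed here from the slicing in forward and the input length of 118.

```python
# Assumed values, inferred from the slices in forward() and an input length of 118.
input_len = 118
num_SNRs = 40                        # x[:, :40]
num_last_actions = 79 - 40           # x[:, 40:79]
num_edge_elements = input_len - 79   # x[:, 79:]

emb_dim = 3    # embedding size of emb1 and emb2
enc_out = 10   # output size of enc1 and enc2

# Flattened emb1 output, plus the two 10-dimensional encodings.
fc1_in = num_SNRs * emb_dim + enc_out + enc_out
print(num_last_actions, num_edge_elements, fc1_in)
```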

I hope the rest is clear from the code.

Okay, then the update step:

```
def update(self, i_ep):
    self.memory = self.memory2.copy()
    state = torch.tensor([t.state for t in self.memory]).float().to(device)
    action = torch.LongTensor([t.action for t in self.memory]).view(-1, 1).long().to(device)
    reward = torch.tensor([t.reward for t in self.memory]).float().to(device)
    next_state = torch.tensor([t.next_state for t in self.memory]).float().to(device)
    for index in BatchSampler(SubsetRandomSampler(range(len(self.memory))), batch_size=self.batch_size, drop_last=False):
        target_v = (reward + self.gamma * self.target_net(next_state).gather(1, torch.argmax(self.act_net(next_state), dim=1).unsqueeze(-1)).squeeze())[index].unsqueeze(1)
        v = (self.act_net(state).gather(1, action))[index]
        loss = self.loss_func(target_v, v)
        print(loss)
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
        self.update_count += 1
        if self.update_count % 100 == 0:
            self.target_net.load_state_dict(self.act_net.state_dict())
```
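For reference, here is the same Double DQN style target written out for a single mini-batch. This is a sketch and not my actual code: the two nn.Linear layers are stand-ins for act_net and target_net, the sizes and the index list are made up, and MSE is only assumed for loss_func. It also differs from the update above in two ways: only the sampled rows are evaluated, and the target is computed under torch.no_grad().

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
num_states, state_dim, num_action, gamma = 8, 5, 4, 0.9  # illustrative sizes only

act_net = nn.Linear(state_dim, num_action)     # stand-in for the online net
target_net = nn.Linear(state_dim, num_action)  # stand-in for the target net

state = torch.randn(num_states, state_dim)
next_state = torch.randn(num_states, state_dim)
action = torch.randint(0, num_action, (num_states, 1))
reward = torch.randn(num_states)
index = [0, 3, 5]  # one sampled mini-batch of row indices

# Target: the online net picks the next action, the target net evaluates it.
with torch.no_grad():
    best_next = act_net(next_state[index]).argmax(dim=1, keepdim=True)
    next_q = target_net(next_state[index]).gather(1, best_next)
    target_v = reward[index].unsqueeze(1) + gamma * next_q

# Prediction: Q-value of the action that was actually taken.
v = act_net(state[index]).gather(1, action[index])

loss = nn.functional.mse_loss(v, target_v)  # assumed loss function
loss.backward()
```

Both target_v and v have shape (batch, 1) here, and only act_net receives gradients.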

I hope this is quite clear: from self.memory2 we take the relevant data (state, action, reward, next_state),

and then we perform SGD. At a random point in time I get an error from the loss.backward() call. It says **CUDA error: an illegal memory access was encountered**
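One note on the traceback: CUDA kernels are launched asynchronously, so an illegal memory access is often reported at a later call (here loss.backward()) than the operation that caused it. A minimal debugging sketch, which is not part of my original code: setting the environment variable CUDA_LAUNCH_BLOCKING to 1 before the first CUDA call makes kernel launches synchronous, so the traceback should point closer to the failing operation.

```python
import os

# Must be set before the first CUDA call; the simplest place is the very top
# of the script, before torch is imported.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

launch_blocking = os.environ["CUDA_LAUNCH_BLOCKING"]
```

This slows training down, so it is only meant for the debugging run.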

I assumed that it could have to do with the BatchSampler(SubsetRandomSampler(...)) sampling, because this is a random operation (like my random error). But I still couldn't find out why it fails.

My training loop looks as follows:

```
for i_ep in range(num_episodes):
    for n in range(len(seed_vec_train)):
        reward_train[n, i_ep] = agent.train_episode(
            max_steps_per_episode,
            i_ep,
            state_list_allSeeds_train[n][i_ep],
            init_pos_list_allSeeds_train[n][i_ep],
            v_x_allSeeds_train[n, i_ep],
            v_y_allSeeds_train[n, i_ep],
        )
    agent.update(i_ep)
```

Here seed_vec_train is a vector of length 15.

The number of training episodes num_episodes is equal to 4000.

So it can be seen that for each episode we go through all elements of seed_vec_train and then perform one update step of the net. train_episode performs 300 steps; at each step a transition is saved, which is a tuple containing the respective state, next_state, reward and action.

So 300*15 = 4500 transitions are saved in self.memory2 before each update.
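To put numbers on one call to update, here is a small sketch. The batch size of 256 is an assumption for illustration only (my self.batch_size is not shown above), and it assumes self.memory2 holds exactly the transitions of one episode round.

```python
import math

steps_per_episode = 300
num_seeds = 15        # len(seed_vec_train)
batch_size = 256      # assumed value, not taken from my code

transitions = steps_per_episode * num_seeds

# drop_last=False keeps the final, smaller batch.
num_batches = math.ceil(transitions / batch_size)
last_batch_size = transitions - (num_batches - 1) * batch_size

print(transitions, num_batches, last_batch_size)
```

With these numbers the last mini-batch is smaller than the others, and each mini-batch in my update still runs the nets on all stored states before indexing.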

State and next_state are vectors of length 118 each; they consist of integer values from -20 to 20.

Action is an integer between 0 and 3; the reward is also an integer.

I hope I have made my problem clear.

I am not really experienced with PyTorch and learned it more or less by doing. So if you have any tips to further improve my net etc., I would be happy to hear them.

If you need more of my code to understand the problem, I can post it.

Thanks a lot

seller_basti