Hello, I am currently implementing an automated machine learning algorithm with weight inheritance. I implemented my NNs as networkx graphs. Since I mutate the graphs from one generation to the next, I add new layers and hence need to move those to the CUDA device again. I do this in a for loop over the nodes of the graph in the forward pass: in each iteration of the loop, I first move the node's nn.Module to the CUDA device and then forward the data through it. My issue is that at some point I get the above-mentioned error message. Is there any way to avoid this issue?
def forward(self, inputs):
    # Evaluate the graph in topological ordering
    topological_order = nx.algorithms.dag.topological_sort(self)
    self.nodes[self.get_input_nodes()]['output'] = inputs
    for node in topological_order:
        node_info = self.nodes[node]
        node_info['op'].to(device)
        preds = list(self.predecessors(node))
        if len(preds) > 0:
            cell_input = [self.nodes[pred]['output'] for pred in preds]
            if node_info['type'] == 'merge':
                node_info['output'] = node_info['op'](cell_input)
            else:
                node_info['output'] = node_info['op'](cell_input[0])
            node_info['params']['output_dim'] = node_info['output'].size()
    return [self.nodes[node]['output'] for node in self.get_output_nodes()][0]
I don't fully follow the setup, but since you are adding new layers I imagine you have to pass them to the optimizer at some point, right?
The leaf tensors are the parameters of the original nn.Module. Once you allocate a tensor on CUDA, the output of that allocation is a non-leaf tensor whose backward step is essentially "copy the gradient back to the CPU leaf tensor".
So I would suggest either passing the CPU module's parameters to the optimizer, or making the CUDA copies leaf tensors themselves.
Anyway, it would help if you posted code that reproduces the issue.
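To illustrate the leaf vs. non-leaf point with plain tensors (a minimal, self-contained sketch, not your graph code; it assumes a CUDA device is available):

import torch

# A tensor created directly by the user with requires_grad=True is a leaf.
w_cpu = torch.randn(3, requires_grad=True)
print(w_cpu.is_leaf)    # True

# Moving it to another device returns a new, non-leaf tensor whose grad_fn
# copies gradients back to the CPU leaf during backward().
w_cuda = w_cpu.to('cuda')
print(w_cuda.is_leaf)   # False
print(w_cuda.grad_fn)   # a copy-backward node, e.g. <ToCopyBackward0 ...>

# Only the leaf (w_cpu) gets its .grad populated by backward(); reading
# w_cuda.grad instead is what triggers the warning quoted in this thread.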
First I create some networks by hand and train them (this works perfectly fine).
Then I create a deep copy of the networks that I want to mutate by adding layers.
Finally I create an evaluator, which trains the mutated child networks (and this is where I get the error about leaf tensors).
This is the code of the evaluator:
class Evaluator:
    def __init__(self, graph: NodeOpGraph, train_loader, *args, **kwargs):
        self.graph = graph
        self.train_loader = train_loader
        self.optimizer = torch.optim.Adam(self.graph.parameters())
        self.criterion = torch.nn.BCELoss(reduction='none')

    def train(self, n_samples_per_epoch, epochs=1, log_interval=10, verbose=True):
        self.graph.train()
        print('Device is {}'.format(device))
        batch_size = next(self.train_loader)[0].shape[0]
        n_steps_per_epoch = int(np.ceil(n_samples_per_epoch / batch_size))
        print('BATCH SIZE:', batch_size)
        print('N_STEPS:', n_steps_per_epoch)
        for epoch in range(epochs):
            if not verbose:
                old_stdout = sys.stdout
                sys.stdout = open(os.devnull, 'w')
            print('EPOCH #', epoch)
            total_loss = 0.
            total_epoch_loss = 0.
            total_size = 0
            for step, (inputs, labels, sample_weights) in enumerate(self.train_loader):
                s = inputs.shape
                # NHWC -> NCHW
                inputs = np.reshape(inputs, (s[0], s[3], s[1], s[2]))
                self.optimizer.zero_grad()
                inputs = torch.Tensor(inputs)
                labels = torch.Tensor(labels)
                sample_weights = torch.Tensor(sample_weights).to(device)
                preds = self.graph(inputs.to(device))
                preds = torch.reshape(preds, (preds.shape[0],))
                loss = self.criterion(preds, labels.to(device))
                loss = loss * sample_weights
                loss = loss.mean()
                loss.backward()
                self.optimizer.step()
                # Todo float(loss.item()) otherwise maybe memory issues
                total_loss += loss.item()
                total_epoch_loss += loss.item()
                total_size += labels.size(0)
                # if step % log_interval == log_interval-1:
                print('Step {} Avg loss: {}'.format(str(step), str(total_loss / total_size)))
                total_loss = 0.
                if step >= n_steps_per_epoch:
                    break
            if not verbose:
                sys.stdout = old_stdout
            print('*' * 25, '\nEpoch {} Avg loss: {}\n'.format(str(epoch), str(total_epoch_loss / total_size)), '*' * 25)

    def eval(self):
        pass
The problem is that, because of the newly added layer, part of the network is not on the GPU at that point. Therefore I move each node of the network graph to the GPU first, inside the forward pass. This works fine for the forward passes, but the optimizer step fails.
Is it because I move the graph to the GPU inside the forward pass?
UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations.
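For what it's worth, a common way around this (a hedged sketch, not a drop-in fix for your codebase) is to move every node's module to the device once, before the optimizer is constructed, rather than calling .to(device) inside forward(). Assuming NodeOpGraph is a networkx graph whose node attribute dicts hold the 'op' modules, as in the forward() above, the Evaluator constructor could look roughly like this:

class Evaluator:
    def __init__(self, graph: NodeOpGraph, train_loader, *args, **kwargs):
        self.graph = graph
        self.train_loader = train_loader
        # Move each node's nn.Module to the device *before* building the
        # optimizer, so the optimizer holds references to the GPU parameters
        # (nn.Module.to() modifies the module's parameters in place).
        for node in self.graph.nodes:
            op = self.graph.nodes[node].get('op')
            if op is not None:
                op.to(device)
        self.optimizer = torch.optim.Adam(self.graph.parameters())
        self.criterion = torch.nn.BCELoss(reduction='none')

With the modules already on the device, the per-node .to(device) call inside forward() should become unnecessary (it is effectively a no-op for parameters that are already on the target device).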