Element 0 of tensors does not require grad and does not have a grad_fn. How can I fix this?

train

device = get_device()
print(device)
def train(train_set):
    epochs = 1000

    learning_rate = 0.01
    optimizer = torch.optim.SGD(model.parameters(),lr=learning_rate)
    loss_f = nn.MSELoss()
    epoch = 0
    while epoch < epochs:
        for input,label in train_set:
            optimizer.zero_grad()
            input = input.to(device)
            label = label.to(device)
            print(input.shape)
            print(label.shape)
            output = model(input)
            print(output.shape)
            loss = loss_f(output,label)

            loss.backward()
            optimizer.step()

        print("epoch:{},loss:{}".format(epoch,loss))
        epoch+=1

output

cuda
torch.Size([270, 93])
torch.Size([270])
torch.Size([270])

How can I fix this?


This error is raised if the model output or loss has been detached from the computation graph e.g. via:

  • using another library such as numpy
  • using non-differentiable operations such as torch.argmax
  • explicitly detaching the tensor via tensor = tensor.detach()
  • rewrapping the tensor via x = torch.tensor(x)

or if the gradient calculation was disabled in the current context or globally such that no computation graph was created at all.

To debug this issue, check the .grad_fn attribute of the loss, model output, and then the intermediate activations created in the forward method of your model and make sure they are returning a valid function name. If None is returned it means that this tensor is not attached to any computation graph.
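For illustration, a minimal sketch (using a toy nn.Linear rather than any model from this thread) showing how each of these cases ends up with a grad_fn of None and eventually triggers the error:

import torch
import torch.nn as nn

model = nn.Linear(4, 2)
x = torch.randn(8, 4)

out = model(x)
print(out.grad_fn)                         # e.g. <AddmmBackward0 ...> -> still attached

print(out.detach().grad_fn)                # None: explicitly detached
print(torch.tensor(out.tolist()).grad_fn)  # None: rewrapped via Python lists (same with numpy)
print(out.argmax(dim=1).grad_fn)           # None: argmax is not differentiable

loss = nn.MSELoss()(out.detach(), torch.randn(8, 2))
print(loss.grad_fn)                        # None
loss.backward()                            # RuntimeError: element 0 of tensors does not require grad ...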

x = torch.tensor([[1., -1.], [1., 1.]], requires_grad=True)
out = x.pow(2).sum()
out.backward()
x.grad

This is an example from torch.Tensor — PyTorch 1.10.1 documentation.

The documented output is tensor([[ 2.0000, -2.0000], [ 2.0000, 2.0000]]),

but my PyTorch also raises the same error:
element 0 of tensors does not require grad and does not have a grad_fn


I've found the problem, thanks!!!

If the posted code snippet is still raising the issue, I guess you've disabled autograd globally. Otherwise, what was the issue?
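For reference, globally disabling gradient mode reproduces this error even on the documentation example above (a minimal sketch, not your actual script):

import torch

torch.set_grad_enabled(False)   # e.g. left over from an inference-only section

x = torch.tensor([[1., -1.], [1., 1.]], requires_grad=True)
out = x.pow(2).sum()
print(out.grad_fn)              # None: no graph was recorded
out.backward()                  # RuntimeError: element 0 of tensors does not require grad ...

torch.is_grad_enabled() shows the current state, and torch.set_grad_enabled(True) turns graph recording back on.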

Hi Lynn,
I'm trying to run the same piece of code but I'm getting the "element 0 …" error.
How did you solve this issue?

Hello @ptrblck ,

Is there a way at all to keep the loss attached to the computation graph even when the tensor is rewrapped?

For instance, if I want to do the following:

loss = criterion(model(data), target)
n = torch.tensor(target.size(0))
t = torch.tensor([loss, n])

if use_cuda:
   t = t.cuda(non_blocking=True)

dist.all_reduce(t, op=dist.ReduceOp.SUM)
loss = t[0]
loss = loss / t[1]

No, that’s not possible and t will be detached from loss.
It seems you would like to concatenate loss with n so you might want to use torch.cat instead.
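For example, a minimal sketch (with a toy model, data, and criterion standing in for the real ones) showing that torch.cat on the unsqueezed scalars, or torch.stack, keeps the loss attached to the graph:

import torch
import torch.nn as nn

model = nn.Linear(4, 1)                       # toy stand-ins, not the real training setup
data, target = torch.randn(8, 4), torch.randn(8, 1)
criterion = nn.MSELoss()

loss = criterion(model(data), target)
n = torch.tensor(float(target.size(0)))

t = torch.cat([loss.unsqueeze(0), n.unsqueeze(0)])  # cat needs 1-d tensors, hence unsqueeze
# t = torch.stack([loss, n])                        # equivalent for scalars
print(t.grad_fn)                                    # a valid backward function, not None

loss = t[0] / t[1]
loss.backward()                                     # gradients still reach model.parameters()
print(model.weight.grad is not None)                # True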


Hi all, I have a similar error. Could someone help me out?


---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[60], line 22
     20 images=images.to(device)
     21 targets=targets.to(device)
---> 22 batch_loss = train_batch(images, targets, model, optimizer, loss_fn)
     23 is_correct = accuracy(images, targets, model)
     24 train_epoch_accuracies.extend(is_correct)

Cell In[55], line 5, in train_batch(images, labels, model, opt, loss_fn)
      3 #     print(f"type of output - {type(output)}")
      4     batch_loss = loss_fn(output, labels)
----> 5     batch_loss.backward()
      6     optimizer.step()
      7     optimizer.zero_grad()

File /opt/anaconda3/lib/python3.9/site-packages/torch/_tensor.py:396, in Tensor.backward(self, gradient, retain_graph, create_graph, inputs)
    387 if has_torch_function_unary(self):
    388     return handle_torch_function(
    389         Tensor.backward,
    390         (self,),
   (...)
    394         create_graph=create_graph,
    395         inputs=inputs)
--> 396 torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)

File /opt/anaconda3/lib/python3.9/site-packages/torch/autograd/__init__.py:173, in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    168     retain_graph = create_graph
    170 # The reason we repeat same the comment below is that
    171 # some Python versions print out the first line of a multi-line function
    172 # calls in the traceback and some print out the last line
--> 173 Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    174     tensors, grad_tensors_, retain_graph, create_graph, inputs,
    175     allow_unreachable=True, accumulate_grad=True)

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

You are most likely detaching a tensor used to create the loss from the computation graph, which causes the issue. Could you post a minimal and executable code snippet to reproduce the issue so that we can debug it, please?


Hi @ptrblck

Thank you for your response. I changed the values in some of the layers, and that fixed the issue.

Hi, I am facing a similar issue while trying to run my image processing code:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[63], line 5
      2 for epoch in range(num_epochs):  # loop over the dataset multiple times
      3     print("------------------ Training Epoch {} ------------------".format(epoch+1))
----> 5     train_one_epoch(model, optimizer,'train' , device)
      7     val_model(model, 'val')
      9 print('Finished Training')

Cell In[62], line 42, in train_one_epoch(model, optimizer, data_loader, device)
     38 print("Labels shape:", labels.shape)
     41 loss = criterion(labels, torch.argmax(outputs, dim=1))
---> 42 loss.backward()
     43 optimizer.step()
     45 _, predicted = torch.max(outputs.data, 1)

File ~\anaconda3\envs\withGPU\lib\site-packages\torch\_tensor.py:492, in Tensor.backward(self, gradient, retain_graph, create_graph, inputs)
    482 if has_torch_function_unary(self):
    483     return handle_torch_function(
    484         Tensor.backward,
    485         (self,),
   (...)
    490         inputs=inputs,
    491     )
--> 492 torch.autograd.backward(
    493     self, gradient, retain_graph, create_graph, inputs=inputs
    494 )

File ~\anaconda3\envs\withGPU\lib\site-packages\torch\autograd\__init__.py:251, in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    246     retain_graph = create_graph
    248 # The reason we repeat the same comment below is that
    249 # some Python versions print out the first line of a multi-line function
    250 # calls in the traceback and some print out the last line
--> 251 Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    252     tensors,
    253     grad_tensors_,
    254     retain_graph,
    255     create_graph,
    256     inputs,
    257     allow_unreachable=True,
    258     accumulate_grad=True,
    259 )

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Any advice on how I should solve this issue?

Check if and where the computation graph was detached, or if you are running your forward pass in a no_grad() context. If you are stuck, feel free to post a minimal and executable code snippet reproducing the issue.
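To illustrate both cases with a toy classifier (a sketch with made-up shapes, not your actual model): argmax detaches the output since it is not differentiable, a forward pass inside no_grad() records no graph at all, and passing the raw logits to the criterion keeps everything attached:

import torch
import torch.nn as nn

model = nn.Linear(10, 5)                            # toy stand-ins
images = torch.randn(4, 10)
labels = torch.randint(0, 5, (4,))
criterion = nn.CrossEntropyLoss()

print(torch.argmax(model(images), dim=1).grad_fn)   # None: argmax is not differentiable

with torch.no_grad():
    outputs = model(images)
print(outputs.grad_fn)                              # None: no graph was recorded

loss = criterion(model(images), labels)             # raw logits first, targets second
print(loss.grad_fn)                                 # a valid backward function
loss.backward()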

Same comment from above applies here too.

Hello, I also have the same issue. I don't know how to fix it anymore.

Here is the code:

from functools import lru_cache
from pathlib import Path


from easse.sari import corpus_sari
from torch.nn import functional as F
from source.helper import log_stdout, tokenize, yield_sentence_pair, yield_lines, load_preprocessor, read_lines, \
    count_line
import argparse
import os
import logging
import random
import nltk
from source.resources import NEWSELA_DATASET, get_data_filepath, WIKILARGE_DATASET, TURKCORPUS_DATASET, \
    WIKILARGE_WIKIAUTO_DATASET

nltk.download('punkt')

import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader
import pytorch_lightning as pl
from pytorch_lightning.trainer import seed_everything
from transformers import (
    AdamW,
    T5ForConditionalGeneration,
    T5TokenizerFast,
    get_linear_schedule_with_warmup, AutoConfig, AutoModel
)

torch.set_grad_enabled(True)
print("START_____________________________")

class T5FineTuner(pl.LightningModule):
    def __init__(self, model_name, learning_rate, adam_epsilon, custom_loss, weight_decay, dataset,
                 train_batch_size, valid_batch_size, train_sample_size, valid_sample_size, max_seq_length,
                 n_gpu, gradient_accumulation_steps, num_train_epochs, warmup_steps, nb_sanity_val_steps,
                 *args, **kwargs):
        super(T5FineTuner, self).__init__()
        self.save_hyperparameters()
        self.model = T5ForConditionalGeneration.from_pretrained(self.hparams.model_name)
        self.tokenizer = T5TokenizerFast.from_pretrained(self.hparams.model_name)
        self.model = self.model.to(self.device)
        self.preprocessor = load_preprocessor()


    def is_logger(self):
        return self.trainer.global_rank <= 0

    def forward(
        self, input_ids, attention_mask=None, decoder_input_ids=None, decoder_attention_mask=None, labels=None
    ):
        outputs = self.model(
            input_ids,
            attention_mask=attention_mask,
            decoder_input_ids=decoder_input_ids,
            decoder_attention_mask=decoder_attention_mask,
            labels=labels
        )
        return outputs

    def generate(self, sentence):
        sentence = self.preprocessor.encode_sentence(sentence)
        text = "simplify: " + sentence

        encoding = self.tokenizer(
            text,
            truncation=True,
            max_length=self.hparams.max_seq_length,
            padding='max_length',
            return_tensors="pt"
        )

        input_ids = encoding["input_ids"].to(self.device)
        attention_masks = encoding["attention_mask"].to(self.device)

        beam_outputs = self.model.generate(
            input_ids=input_ids,
            attention_mask=attention_masks,
            do_sample=False,
            max_length=self.hparams.max_seq_length,
            num_beams=8,
            early_stopping=True,
            num_return_sequences=1
        )
        pred_sent = self.tokenizer.decode(beam_outputs[0], skip_special_tokens=True, clean_up_tokenization_spaces=True)
        return pred_sent

    def training_step(self, batch, batch_idx):
        labels = batch["target_ids"]
        # Huggingface’s loss functions are defined to exclude the ID -100 during loss calculations. Therefore, we need to convert all padding token IDs in labels to -100.
        labels[labels[:, :] == self.tokenizer.pad_token_id] = -100

        self.opt.zero_grad()
        outputs = self(
            input_ids=batch["source_ids"],
            attention_mask=batch["source_mask"],
            labels=labels,
            decoder_attention_mask=batch['target_mask'],
        )

        if self.hparams.custom_loss:
            print("______________EnteredIf!______________")
            loss = outputs.loss
            complexity_score = torch.tensor(random.randint(0, 100) * 0.01, requires_grad=True, device=self.device)
            complexity_score.requires_grad = True
            # complexity_score = self._custom_step(outputs['logits'])
            # loss = loss * complexity_score
            lambda_ = 0.7
            # loss = lambda_ * loss + (1-lambda_)*complexity_score
            # loss = torch.sqrt(loss + lambda_ * complexity_score)
            print("Before custom loss calculation - loss shape:", loss.shape, "complexity_score shape:", complexity_score.shape)
            loss = loss + complexity_score + lambda_ * (complexity_score - loss)
            print("After custom loss calculation - loss shape:", loss.shape)
            
            print(complexity_score)
            self.log('train_loss', loss, on_step=True, prog_bar=True, logger=True)
            # print(loss)
            loss.requires_grad = True
            return loss
        else:
            print("______________Entered Else!______________")
            loss = outputs.loss
            self.log('train_loss', loss, on_step=True, prog_bar=True, logger=True)
            loss.requires_grad = True
            return loss

        # loss = outputs.loss
        # logs = {"train_loss": loss}
        # self.logger.experiment.add_scalars('loss', logs, global_step=self.global_step)
        # return {"loss": loss, "log": logs}

    def validation_step(self, batch, batch_idx):
        loss = self.sari_validation_step(batch)
        # loss = self._step(batch)
        print("Val_loss", loss)
        logs = {"val_loss": loss}
        # self.logger.experiment.add_scalars('loss', logs, global_step=self.global_step)
        # return {"val_loss": torch.tensor(loss)}
        self.log('val_loss', loss, batch_size=self.hparams.valid_batch_size)
        t = torch.tensor(loss, dtype=float, requires_grad=True)
        print(t)
        return t

    def sari_validation_step(self, batch):
        def generate(sentence):
            sentence = self.preprocessor.encode_sentence(sentence)
            text = "simplify: " + sentence
            # print("Simplifying: ", text)

            encoding = self.tokenizer(
                text,
                truncation=True,
                max_length=self.hparams.max_seq_length,
                padding='max_length',
                return_tensors="pt"
            )

            input_ids = encoding["input_ids"].to(self.device)
            attention_masks = encoding["attention_mask"].to(self.device)

            beam_outputs = self.model.generate(
                input_ids=input_ids,
                attention_mask=attention_masks,
                do_sample=False,
                max_length=self.hparams.max_seq_length,
                num_beams=8,
                early_stopping=True,
                num_return_sequences=1
            ).to(self.device)
            # final_outputs = []
            # for beam_output in beam_outputs:
            sent = self.tokenizer.decode(beam_outputs[0], skip_special_tokens=True, clean_up_tokenization_spaces=True)
            # if sent.lower() != sentence.lower() and sent not in final_outputs:
                # final_outputs.append(sent)
            
            return sent
            # return final_outputs[0]

        pred_sents = []
        for source in batch["source"]:
            pred_sent = generate(source)
            pred_sents.append(pred_sent)

        score = corpus_sari(batch["source"], pred_sents, batch["targets"])
        print("Sari score: ", score)

        return 1 - score / 100

    def configure_optimizers(self):
        "Prepare optimizer and schedule (linear warmup and decay)"

        model = self.model
        no_decay = ["bias", "LayerNorm.weight"]
        optimizer_grouped_parameters = [
            {
                "params": [p for n, p in model.named_parameters() if not any(nd in n for nd in no_decay)],
                "weight_decay": self.hparams.weight_decay,
            },
            {
                "params": [p for n, p in model.named_parameters() if any(nd in n for nd in no_decay)],
                "weight_decay": 0.0,
            },
        ]
        optimizer = AdamW(optimizer_grouped_parameters, lr=self.hparams.learning_rate, eps=self.hparams.adam_epsilon)
        # optimizer = SAM(optimizer_grouped_parameters, base_optimizer, lr=self.hparams.learning_rate, momentum=0.9)
        self.opt = optimizer
        return [optimizer]

    def optimizer_step(self, epoch=None, batch_idx=None, optimizer=None, optimizer_idx=None, optimizer_closure=None,
                       on_tpu=None, using_native_amp=None, using_lbfgs=None):
        optimizer.step(closure=optimizer_closure)

        optimizer.zero_grad()
        self.lr_scheduler.step()


    def train_dataloader(self):
        train_dataset = TrainDataset(dataset=self.hparams.dataset,
                                     tokenizer=self.tokenizer,
                                     max_len=self.hparams.max_seq_length,
                                     sample_size=self.hparams.train_sample_size)

        dataloader = DataLoader(train_dataset,
                                batch_size=self.hparams.train_batch_size,
                                drop_last=True,
                                shuffle=True,
                                pin_memory=True,
                                num_workers=4)
        t_total = ((len(dataloader.dataset) // (self.hparams.train_batch_size * max(1, self.hparams.n_gpu)))
                   // self.hparams.gradient_accumulation_steps
                   * float(self.hparams.num_train_epochs)
                   )
        scheduler = get_linear_schedule_with_warmup(
            self.opt, num_warmup_steps=self.hparams.warmup_steps, num_training_steps=t_total
        )
        self.lr_scheduler = scheduler
        return dataloader

    def val_dataloader(self):
        val_dataset = ValDataset(dataset=self.hparams.dataset,
                                 tokenizer=self.tokenizer,
                                 max_len=self.hparams.max_seq_length,
                                 sample_size=self.hparams.valid_sample_size)
        return DataLoader(val_dataset,
                          batch_size=self.hparams.valid_batch_size,
                          num_workers=2)


logger = logging.getLogger(__name__)


class LoggingCallback(pl.Callback):
    def on_validation_end(self, trainer, pl_module):
        logger.info("***** Validation results *****")
        if pl_module.is_logger():
            metrics = trainer.callback_metrics
            # Log results
            for key in sorted(metrics):
                if key not in ["log", "progress_bar"]:
                    logger.info("{} = {}\n".format(key, str(metrics[key])))

    def on_test_end(self, trainer, pl_module):
        logger.info("***** Test results *****")

        if pl_module.is_logger():
            metrics = trainer.callback_metrics

            # Log and save results to file
            output_test_results_file = os.path.join(pl_module.hparams.output_dir, "test_results.txt")
            with open(output_test_results_file, "w") as writer:
                for key in sorted(metrics):
                    if key not in ["log", "progress_bar"]:
                        logger.info("{} = {}\n".format(key, str(metrics[key])))
                        writer.write("{} = {}\n".format(key, str(metrics[key])))


class TrainDataset(Dataset):
    def __init__(self, dataset, tokenizer, max_len=256, sample_size=1):
        self.sample_size = sample_size
        # print("init TrainDataset ...")
        preprocessor = load_preprocessor()
        self.source_filepath = preprocessor.get_preprocessed_filepath(dataset, 'train', 'complex')
        self.target_filepath = preprocessor.get_preprocessed_filepath(dataset, 'train', 'simple')

        self.max_len = max_len
        self.tokenizer = tokenizer

        self._load_data()

    def _load_data(self):
        self.inputs = read_lines(self.source_filepath)
        self.targets = read_lines(self.target_filepath)

    def __len__(self):
        return int(len(self.inputs) * self.sample_size)

    def __getitem__(self, index):
        source = "simplify: " + self.inputs[index]
        target = self.targets[index]

        tokenized_inputs = self.tokenizer(
            [source],
            truncation=True,
            max_length=self.max_len,
            padding='max_length',
            return_tensors="pt"
        )
        tokenized_targets = self.tokenizer(
            [target],
            truncation=True,
            max_length=self.max_len,
            padding='max_length',
            return_tensors="pt"
        )
        source_ids = tokenized_inputs["input_ids"].squeeze()
        target_ids = tokenized_targets["input_ids"].squeeze()

        src_mask = tokenized_inputs["attention_mask"].squeeze()  # might need to squeeze
        target_mask = tokenized_targets["attention_mask"].squeeze()  # might need to squeeze

        return {"source_ids": source_ids, "source_mask": src_mask, "target_ids": target_ids, "target_mask": target_mask,
                'sources': self.inputs[index], 'targets': [self.targets[index]]}


class ValDataset(Dataset):
    def __init__(self, dataset, tokenizer, max_len=256, sample_size=1):
        self.sample_size = sample_size
        self.source_filepath = get_data_filepath(dataset, 'valid', 'complex')
        if dataset == NEWSELA_DATASET:
            self.target_filepaths = [get_data_filepath(dataset, 'valid', 'simple')]

        else:  # TURKCORPUS_DATASET as default
            self.target_filepaths = [get_data_filepath(TURKCORPUS_DATASET, 'valid', 'simple.turk', i) for i in range(8)]

        self.max_len = max_len
        self.tokenizer = tokenizer

        self._build()

    def __len__(self):
        return int(len(self.inputs) * self.sample_size)

    def __getitem__(self, index):
        return {"source": self.inputs[index], "targets": self.targets[index]}

    def _build(self):
        self.inputs = []
        self.targets = []

        for source in yield_lines(self.source_filepath):
            self.inputs.append(source)

        self.targets = [[] for _ in range(count_line(self.target_filepaths[0]))]
        for filepath in self.target_filepaths:
            for idx, line in enumerate(yield_lines(filepath)):
                self.targets[idx].append(line)


def train(train_args):
    args = argparse.Namespace(**train_args)
    seed_everything(args.seed, workers=True)

    print(train_args)
    checkpoint_callback = pl.callbacks.ModelCheckpoint(
        dirpath=args.output_dir,
        filename="checkpoint-{epoch}",
        monitor="val_loss",
        verbose=True,
        mode="min",
        save_top_k=5
    )

    train_params = dict(
        accumulate_grad_batches=args.gradient_accumulation_steps,
        #gpus=args.n_gpu,
        max_epochs=args.num_train_epochs,
        # early_stop_callback=False,
        precision=16 if args.fp_16 else 32,
        amp_level=args.opt_level,
        amp_backend='apex',
        # gradient_clip_val=args.max_grad_norm,
        # checkpoint_callback=checkpoint_callback,
        callbacks=[LoggingCallback(), checkpoint_callback],
        # logger=TensorBoardLogger(f'{args.output_dir}/logs'),
        num_sanity_val_steps=args.nb_sanity_val_steps,  # skip sanity check to save time for debugging purpose
        # plugins='ddp_sharded',
        # progress_bar_refresh_rate=1,

    )

    print("Initialize model")
    model = T5FineTuner(**train_args)

    trainer = pl.Trainer(**train_params, accelerator="auto")
    print(" Training model")
    trainer.fit(model)

    print("training finished")

    # print("Saving model")
    # model.model.save_pretrained(args.output_dir)

    # print("Saved model")

Here is the error message:

Traceback (most recent call last):
  File "/content/drive/My Drive/TS_T5-main/scripts/train.py", line 44, in <module>
    run_training(args_dict, dataset)
  File "/content/drive/My Drive/TS_T5-main/source/train.py", line 33, in run_training
    train(args_dict)
  File "/content/drive/My Drive/TS_T5-main/source/model.py", line 398, in train
    trainer.fit(model)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 696, in fit
    self._call_and_handle_interrupt(
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 650, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 737, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 1168, in _run
    results = self._run_stage()
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 1254, in _run_stage
    return self._run_train()
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 1285, in _run_train
    self.fit_loop.run()
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/fit_loop.py", line 270, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 203, in advance
    batch_output = self.batch_loop.run(kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 87, in advance
    outputs = self.optimizer_loop.run(optimizers, kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 201, in advance
    result = self._run_optimization(kwargs, self._optimizers[self.optim_progress.optimizer_position])
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 248, in _run_optimization
    self._optimizer_step(optimizer, opt_idx, kwargs.get("batch_idx", 0), closure)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 358, in _optimizer_step
    self.trainer._call_lightning_module_hook(
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 1552, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/content/drive/My Drive/TS_T5-main/source/model.py", line 213, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/core/optimizer.py", line 168, in step
    step_output = self._strategy.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/strategies/strategy.py", line 216, in optimizer_step
    return self.precision_plugin.optimizer_step(model, optimizer, opt_idx, closure, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 153, in optimizer_step
    return optimizer.step(closure=closure, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/optim/lr_scheduler.py", line 75, in wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/optim/optimizer.py", line 385, in wrapper
    out = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/optimization.py", line 457, in step
    loss = closure()
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 138, in _wrap_closure
    closure_result = closure()
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 146, in __call__
    self._result = self.closure(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 141, in closure
    self._backward_fn(step_output.closure_loss)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 304, in backward_fn
    self.trainer._call_strategy_hook("backward", loss, optimizer, opt_idx)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 1706, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/strategies/strategy.py", line 191, in backward
    self.precision_plugin.backward(self.lightning_module, closure_loss, optimizer, optimizer_idx, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 80, in backward
    model.backward(closure_loss, optimizer, optimizer_idx, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/core/module.py", line 1418, in backward
    loss.backward(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/_tensor.py", line 522, in backward
    torch.autograd.backward(
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 266, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
Epoch 0:   0%|          | 0/16832 [00:19<?, ?it/s]

Any help will be much appreciated.

I am transcribing the makemore 5 code from Andrej K’s YouTube channel:

==============================================
The code works up until:
x = Xb
for layer in layers:
    x = layer(x)

loss = F.cross_entropy(x, Yb)  # loss function

# Backward pass
for p in parameters:
    p.grad = None

loss.backward()

The call to loss.backward() crashes and I get the following error at the bottom of the stack:

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

If I debug and print loss.grad_fn, it says it is None.

What is breaking loss.backward()?

Could you post a minimal and executable code snippet reproducing the issue?
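In the meantime, one common cause with manually created parameters is worth checking: a leaf tensor only participates in the graph if requires_grad is actually set to True on it. A minimal sketch with made-up shapes (the names below are hypothetical):

import torch
import torch.nn.functional as F

W = torch.randn(3, 5)                 # leaf tensor, requires_grad defaults to False
x = torch.randn(4, 3)
y = torch.randint(0, 5, (4,))

loss = F.cross_entropy(x @ W, y)
print(loss.grad_fn)                   # None -> loss.backward() would raise this exact error

W.requires_grad_(True)                # enable gradient tracking on the leaf
loss = F.cross_entropy(x @ W, y)
print(loss.grad_fn)                   # a valid backward function
loss.backward()
print(W.grad.shape)                   # torch.Size([3, 5])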

import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt
%matplotlib inline

words = open('names.txt', 'r').read().splitlines()

# Build the vocabulary of characters and mappings to/from integers
chars = sorted(list(set(''.join(words))))
stoi = {s: i + 1 for i, s in enumerate(chars)}
stoi['.'] = 0
itos = {i: s for s, i in stoi.items()}
vocab_size = len(itos)
print(itos)
print(vocab_size)
# output: {1: 'a', 2: 'b', 3: 'c', ..., 25: 'y', 26: 'z', 0: '.'}
# output: 27

import random
random.seed(42)
random.shuffle(words)

block_size = 3
def build_dataset(words):
    X, Y = [], []
    for w in words:
        context = [0] * block_size
        for ch in w + '.':
            ix = stoi[ch]
            X.append(context)
            Y.append(ix)
            context = context[1:] + [ix]
    X = torch.tensor(X)
    Y = torch.tensor(Y)
    print(X.shape, Y.shape)
    return X, Y

n1 = int(0.8 * len(words))
n2 = int(0.9 * len(words))

Xtr, Ytr = build_dataset(words[:n1])
Xdev, Ydev = build_dataset(words[n1:n2])
Xte, Yte = build_dataset(words[n2:])
# output: torch.Size([182625, 3]) torch.Size([182625])
#         torch.Size([22655, 3]) torch.Size([22655])
#         torch.Size([22866, 3]) torch.Size([22866])

for x, y in zip(Xtr[:20], Ytr[:20]):
    print(''.join(itos[ix.item()] for ix in x), '-->', itos[y.item()])
# output: ... --> y
#         ..y --> u
#         .yu --> h
#         (and so on for the first 20 context/target pairs)

class Linear:
    def __init__(self, fan_in, fan_out, bias=True):
        self.weight = torch.randn(fan_in, fan_out) / fan_in**0.5  # note: kaiming init
        self.bias = torch.zeros(fan_out) if bias else None

    def __call__(self, x):
        self.out = x @ self.weight
        if self.bias is not None:
            self.out += self.bias
        return self.out

    def parameters(self):
        return [self.weight] + ([] if self.bias is None else [self.bias])


class BatchNorm1d:

    def __init__(self, dim, eps=1e-5, momentum=0.1):
        self.eps = eps
        self.momentum = momentum
        self.training = True
        # Parameters (trained with backprop)
        self.gamma = torch.ones(dim)
        self.beta = torch.zeros(dim)
        # buffers (trained with a running momentum update)
        self.running_mean = torch.zeros(dim)
        self.running_var = torch.ones(dim)

    def __call__(self, x):
        # calculate forward pass
        if self.training:
            xmean = x.mean(0, keepdim=True)  # batch mean
            xvar = x.var(0, keepdim=True)    # batch variance
        else:
            xmean = self.running_mean
            xvar = self.running_var
        xhat = (x - xmean) / torch.sqrt(xvar + self.eps)  # normalize the variance
        self.out = self.gamma * xhat + self.beta
        # update the buffers
        if self.training:
            with torch.no_grad():
                self.running_mean = (1 - self.momentum) * self.running_mean + self.momentum * xmean
                self.running_var = (1 - self.momentum) * self.running_var + self.momentum * xvar
        return self.out

    def parameters(self):
        return [self.gamma, self.beta]

# -----------------------------------------------------------------------------------------------

class Tanh:

    def __call__(self, x):
        self.out = torch.tanh(x)
        return self.out

    def parameters(self):
        return []

# -----------------------------------------------------------------------------------------------

class Embedding:

    def __init__(self, num_embeddings, embedding_dim):
        self.weight = torch.randn((num_embeddings, embedding_dim))

    def __call__(self, IX):
        self.out = self.weight[IX]
        return self.out

    def parameters(self):
        return [self.weight]

# -----------------------------------------------------------------------------------------------

class Flatten:

    # def __init__(self, n):
    #     self.n = n

    def __call__(self, x):
        # B, T, C = x.shape
        # x = x.view(B, T//self.n, C*self.n)
        # if x.shape[1] == 1:
        #     x = x.squeeze(1)
        # self.out = x
        self.out = x.view(x.shape[0], -1)
        return self.out

    def parameters(self):
        return []


torch.manual_seed(42)  # seed rng for reproducibility

n_embd = 10     # the dimensionality of the character embedding vectors
n_hidden = 200  # the number of neurons in the hidden layer
C = torch.randn((vocab_size, n_embd))
layers = [
    Embedding(vocab_size, n_embd),
    Flatten(),
    Linear(n_embd * block_size, n_hidden, bias=False),
    BatchNorm1d(n_hidden),
    Tanh(),
    Linear(n_hidden, vocab_size),
]
# Parameter init
with torch.no_grad():
    layers[-1].weight *= 0.1  # last layer: make less confident

# parameters = [C] + [p for layer in layers for p in layer.parameters()]
parameters = [p for layer in layers for p in layer.parameters()]

print(sum(p.nelement() for p in parameters))  # number of parameters in total
# output: 12097

for p in parameters:
    p.requiers_grad = True

max_steps = 200000
batch_size = 32
lossi = []
for i in range(max_steps):
    # minibatch construct
    ix = torch.randint(0, Xtr.shape[0], (batch_size,))
    Xb, Yb = Xtr[ix], Ytr[ix]
    # print(Yb)
    # break

    # forward pass
    # emb = C[Xb]  # embed the characters into vectors
    # x = emb.view(emb.shape[0], -1)  # concatenate the vectors
    x = Xb

    for layer in layers:
        x = layer(x)

    loss = F.cross_entropy(x, Yb)  # loss function

    # print(loss.grad_fn)
    # break

    # Backwards pass
    for p in parameters:
        p.grad = None

    loss.backward()

    # update
    lr = 0.1 if i < 10000 else 0.01
    for p in parameters:
        p.data += -lr * p.grad

    # track stats
    if i % 10000 == 0:
        print(f'{i:7d}/{max_steps:7d}: {loss.item():.4f}')
    lossi.append(loss.log10().item())
    break

The training cell fails with:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[12], line 29
     26 for p in parameters:
     27     p.grad = None
---> 29 loss.backward()
     31 #update
     32 lr = 0.1 if i < 10000 else 0.01

File ~\AppData\Local\Programs\Python\Python312\Lib\site-packages\torch\_tensor.py:522, in Tensor.backward(self, gradient, retain_graph, create_graph, inputs)
--> 522 torch.autograd.backward(
    523     self, gradient, retain_graph, create_graph, inputs=inputs
    524 )

File ~\AppData\Local\Programs\Python\Python312\Lib\site-packages\torch\autograd\__init__.py:266, in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
--> 266 Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    267     tensors,
    268     grad_tensors_,
    269     retain_graph,
    270     create_graph,
    271     inputs,
    272     allow_unreachable=True,
    273     accumulate_grad=True,
    274 )

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

The remaining notebook cells:

plt.plot(lossi)

for layer in layers:
    layer.training = False

@torch.no_grad()  # this decorator disables gradient tracking
def split_loss(split):
    x, y = {
        'train': (Xtr, Ytr),
        'val': (Xdev, Ydev),
        'test': (Xte, Yte),
    }[split]
    emb = C[x]  # (N, block_size, n_embd)
    x = emb.view(emb.shape[0], -1)  # concat into (N, block_size * n_embd)
    for layer in layers:
        x = layer(x)
    loss = F.cross_entropy(x, y)
    print(split, loss.item())

split_loss('train')
split_loss('val')

for _ in range(20):
    out = []
    context = [0] * block_size
    while True:
        emb = C[torch.tensor([context])]  # (1, block_size, n_embd)
        x = emb.view(emb.shape[0], -1)    # concatenate the vectors
        for layer in layers:
            x = layer(x)
        logits = x
        probs = F.softmax(logits, dim=1)
        ix = torch.multinomial(probs, num_samples=1).item()
        context = context[1:] + [ix]
        out.append(ix)
        if ix == 0:
            break
    print(''.join(itos[i] for i in out))

Here is the link to the GitHub repository for the finished makemore:


---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[17], line 20
      8 #print(Yb)
      9 #break
     10
   (...)
     15 #for layer in layers:
     16 #    x = layer(x)
     18 logits = model(Xb)
---> 20 loss = F.cross_entropy(logits, Xb) # loss function
     22 #print(loss.grad_fn)
     23 #break
     24
     25
     26 # Backwards pass
     27 for p in parameters:

File ~\AppData\Local\Programs\Python\Python312\Lib\site-packages\torch\nn\functional.py:3059, in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction, label_smoothing)
   3057 if size_average is not None or reduce is not None:
   3058     reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 3059 return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)

RuntimeError: 0D or 1D target tensor expected, multi-target not supported