RuntimeError: Expected a 'mps:0' generator device but found 'cpu'

Hi all,

I am new to LLM programming in Python and I am trying to fine-tune the instructlab/merlinite-7b-lab model (see Hugging Face) on my Mac M1. My goal is to teach this model about a new music composer, Xenobi Amilen, whom I have invented.

Using the new ilab CLI from Red Hat, I created this training set for the model. It is a JSONL file with 100 question/answer pairs about the invented composer.

I wrote this Python script to train the model. I tested all the parts related to the tokenizer and the datasets, and they seem to work. However, the final training step fails with this error:

Traceback (most recent call last):
  File "/Users/sasadangelo/", line 99, in <module>
  File "/Users/sasadangelo/", line 1932, in train
    return inner_training_loop(
  File "/Users/sasadangelo/", line 2230, in _inner_training_loop
    for step, inputs in enumerate(epoch_iterator):
  File "/Users/sasadangelo/", line 454, in __iter__
    current_batch = next(dataloader_iter)
  File "/Users/sasadangelo/", line 701, in __next__
    data = self._next_data()
  File "/Users/sasadangelo/", line 756, in _next_data
    index = self._next_index()  # may raise StopIteration
  File "/Users/sasadangelo/", line 691, in _next_index
    return next(self._sampler_iter)  # may raise StopIteration
  File "/Users/sasadangelo/", line 347, in __iter__
    for idx in self.sampler:
  File "/Users/sasadangelo/", line 92, in __iter__
    yield from super().__iter__()
  File "/Users/sasadangelo/", line 197, in __iter__
    yield from torch.randperm(n, generator=generator).tolist()
  File "/Users/sasadangelo/", line 79, in __torch_function__
    return func(*args, **kwargs)
RuntimeError: Expected a 'mps:0' generator device but found 'cpu'

However, even after I patched the PyTorch code in the file torch/utils/data/ locally to force it to use a Generator on the mps device, I hit a different error:

RuntimeError: Placeholder storage has not been allocated on MPS device!
  0%|          | 0/75 [00:00<?, ?it/s]
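
For reference, the first error comes from the sampler's call to torch.randperm(n, generator=generator): the generator has to live on the same device as the tensor being created. Below is a minimal sketch of a device-matched call. The helper names pick_device and shuffled_indices are hypothetical, this is not my actual local patch, and the sketch falls back to cpu when mps is not available.

```python
import torch


def pick_device():
    # Use mps when the backend is available, otherwise fall back to cpu.
    return torch.device("mps" if torch.backends.mps.is_available() else "cpu")


def shuffled_indices(n, device, seed=0):
    # Hypothetical helper: create the generator on the SAME device as the
    # output tensor, so randperm does not see a cpu generator with an
    # mps output (the mismatch reported in the first error).
    generator = torch.Generator(device=device)
    generator.manual_seed(seed)
    return torch.randperm(n, generator=generator, device=device).tolist()


device = pick_device()
indices = shuffled_indices(10, device)
print(device, indices)
```

The result is a permutation of 0..n-1 whatever the device is. The point is only that the generator and the output are created on the same device.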

I found a lot of articles about this error on Google and also on StackOverflow. This second error seems to be related to moving the model and the input data to mps. I am sure both the model and the input data are on mps; I checked it.
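
To make the kind of check I mean concrete, here is a minimal sketch of a device check. It assumes a generic nn.Module and a dict-style batch; the helper devices_in_use and the tiny nn.Linear model are stand-ins for illustration, not my real script or the merlinite model.

```python
import torch
from torch import nn


def devices_in_use(model, batch):
    # Collect the device type of every parameter, every buffer and every
    # tensor in the batch. A single-element set means everything agrees.
    found = {p.device.type for p in model.parameters()}
    found |= {b.device.type for b in model.buffers()}
    found |= {v.device.type for v in batch.values() if torch.is_tensor(v)}
    return found


# Stand-in model and batch (assumptions for illustration only).
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
model = nn.Linear(4, 2).to(device)
batch = {"input_ids": torch.zeros(1, 4, device=device)}

print(devices_in_use(model, batch))
```

If the printed set contains both "cpu" and "mps", some tensor was left behind on the cpu. Note that this only inspects the model and the batch that are passed in, not tensors created internally during training.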

I don’t know how to fix these issues. I tried both the latest stable PyTorch release and today's nightly build.

Can anyone help?

Hi all again,
Can anyone help with this problem?