torch.embedding fails with RuntimeError: Placeholder storage has not been allocated on MPS device!

I am trying to use the PyTorch-based library “transformers”.
When I set the device to “mps”, I get the error in the title:

Traceback (most recent call last):
  File "/Users/raam/code/pytorch_accl/t2v-transformers-models/./app.py", line 50, in read_item
    vector = await vec.vectorize(item.text, item.config)
  File "/Users/raam/code/pytorch_accl/t2v-transformers-models/./vectorizer.py", line 71, in vectorize
    batch_results = self.get_batch_results(tokens, sentences[start_index:end_index])
  File "/Users/raam/code/pytorch_accl/t2v-transformers-models/./vectorizer.py", line 52, in get_batch_results
    return self.model_delegate.get_batch_results(tokens, text)
  File "/Users/raam/code/pytorch_accl/t2v-transformers-models/./vectorizer.py", line 95, in get_batch_results
    return self.model(**tokens)
  File "/Users/raam/code/pytorch_accl/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/raam/code/pytorch_accl/.venv/lib/python3.10/site-packages/transformers/models/bert/modeling_bert.py", line 1010, in forward
    embedding_output = self.embeddings(
  File "/Users/raam/code/pytorch_accl/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/raam/code/pytorch_accl/.venv/lib/python3.10/site-packages/transformers/models/bert/modeling_bert.py", line 235, in forward
    inputs_embeds = self.word_embeddings(input_ids)
  File "/Users/raam/code/pytorch_accl/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/raam/code/pytorch_accl/.venv/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 158, in forward
    return F.embedding(
  File "/Users/raam/code/pytorch_accl/.venv/lib/python3.10/site-packages/torch/nn/functional.py", line 2148, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Placeholder storage has not been allocated on MPS device!

Looking at the code, this line seems to always fail when the device is MPS (GitHub permalink: https://github.com/pytorch/pytorch/blob/e011a8e18bf469a6a612fd1e7647159c353730a9/aten/src/ATen/native/mps/OperationUtils.mm#L331):

TORCH_CHECK(self.is_mps(), "Placeholder storage has not been allocated on MPS device!");

Any advice is appreciated.
Thanks!

Hi,

Did you make sure that you moved both your model and your input to the “mps” device?
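
A minimal sketch of what moving both sides to the device looks like. It assumes a toy `nn.Embedding` as a stand-in for the model's word embeddings (the layer sizes and token ids are made up for illustration), and it falls back to CPU when MPS is not available:

```python
import torch
import torch.nn as nn

# Use MPS when this build and machine support it, otherwise fall back to CPU.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

# Hypothetical stand-in for the word embedding layer of a transformers model.
embedding = nn.Embedding(num_embeddings=100, embedding_dim=8).to(device)

# Token ids are created on the CPU by default, as a tokenizer would return them.
input_ids = torch.tensor([[1, 5, 42, 7]])

# Move the inputs to the same device as the weights before the forward pass.
input_ids = input_ids.to(device)

output = embedding(input_ids)
print(output.shape, output.device)
```

If only the model or only the inputs are moved, the embedding lookup receives tensors on two different devices, which is the situation the question above asks about.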

Same problem here. After fixing it as you suggested with:

inputs = tokenizer.encode(sentence, return_tensors="pt").to("mps")

I’ve got:

Input In [38], in generate(sentence, max_length)
      3 inputs = tokenizer.encode(sentence, return_tensors="pt").to(device)
      4 # generated_ids = model.generate(input_ids)
----> 5 outputs = model.generate(inputs, use_cache=True, max_length=max_length)
      6 return tokenizer.decode(outputs[0], skip_special_tokens=True)

File ~/miniforge3/envs/tf/lib/python3.9/site-packages/torch/autograd/grad_mode.py:27, in _DecoratorContextManager.__call__.<locals>.decorate_context(*args, **kwargs)
     24 @functools.wraps(func)
     25 def decorate_context(*args, **kwargs):
     26     with self.clone():
---> 27         return func(*args, **kwargs)

File ~/miniforge3/envs/tf/lib/python3.9/site-packages/transformers/generation_utils.py:1278, in GenerationMixin.generate(self, inputs, max_length, min_length, do_sample, early_stopping, num_beams, temperature, top_k, top_p, typical_p, repetition_penalty, bad_words_ids, force_words_ids, bos_token_id, pad_token_id, eos_token_id, length_penalty, no_repeat_ngram_size, encoder_no_repeat_ngram_size, num_return_sequences, max_time, max_new_tokens, decoder_start_token_id, use_cache, num_beam_groups, diversity_penalty, prefix_allowed_tokens_fn, logits_processor, renormalize_logits, stopping_criteria, constraints, output_attentions, output_hidden_states, output_scores, return_dict_in_generate, forced_bos_token_id, forced_eos_token_id, remove_invalid_values, synced_gpus, exponential_decay_length_penalty, **model_kwargs)
   1273         raise ValueError(
   1274             f"num_return_sequences has to be 1, but is {num_return_sequences} when doing greedy search."
   1275         )
   1277     # 10. run greedy search
-> 1278     return self.greedy_search(
   1279         input_ids,
   1280         logits_processor=logits_processor,
   1281         stopping_criteria=stopping_criteria,
   1282         pad_token_id=pad_token_id,
   1283         eos_token_id=eos_token_id,
   1284         output_scores=output_scores,
   1285         return_dict_in_generate=return_dict_in_generate,
   1286         synced_gpus=synced_gpus,
   1287         **model_kwargs,
   1288     )
   1290 elif is_sample_gen_mode:
   1291     # 10. prepare logits warper
   1292     logits_warper = self._get_logits_warper(
   1293         top_k=top_k,
   1294         top_p=top_p,
   (...)
   1298         renormalize_logits=renormalize_logits,
   1299     )

File ~/miniforge3/envs/tf/lib/python3.9/site-packages/transformers/generation_utils.py:1652, in GenerationMixin.greedy_search(self, input_ids, logits_processor, stopping_criteria, max_length, pad_token_id, eos_token_id, output_attentions, output_hidden_states, output_scores, return_dict_in_generate, synced_gpus, **model_kwargs)
   1647     encoder_hidden_states = (
   1648         model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
   1649     )
   1651 # keep track of which sequences are already finished
-> 1652 unfinished_sequences = input_ids.new(input_ids.shape[0]).fill_(1)
   1653 cur_len = input_ids.shape[-1]
   1655 this_peer_finished = False  # used by synced_gpus only

RuntimeError: new(): expected key in DispatchKeySet(CPU, CUDA, HIP, XLA, IPU, XPU, HPU, Lazy) but got: MPS

Thanks, the fix was to also call .to on a second model.
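
A rough sketch of that situation, assuming a pipeline made of two separate modules (the names `encoder` and `head` and the helper `devices_of` are hypothetical, not taken from the code above). Each module has to be moved on its own, and a quick check of the parameter devices shows whether one was missed:

```python
import torch
import torch.nn as nn

# Use MPS when available, otherwise fall back to CPU.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

# Two hypothetical modules used in one pipeline; each holds its own weights.
encoder = nn.Embedding(100, 8)
head = nn.Linear(8, 2)

# Each module must be moved separately; moving one does not move the other.
encoder.to(device)
head.to(device)


def devices_of(module):
    """Return the set of device types that hold the module's parameters."""
    return {p.device.type for p in module.parameters()}


input_ids = torch.tensor([[3, 14, 15]]).to(device)
logits = head(encoder(input_ids))
print(devices_of(encoder), devices_of(head), logits.shape)
```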

@Willian, you want to call .to on the model and the inputs before you evaluate them. See the basic example on the “MPS backend” page of the PyTorch master documentation.

Yes, .to(device) is called on both before they are evaluated.
I’ve created an issue on GitHub detailing this.

It was indeed something else. Fixed in master now.
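
For reference, the line that fails in the second traceback uses the legacy `Tensor.new` constructor. The sketch below shows an equivalent that takes the dtype and device from `input_ids`, assuming all that is needed is a vector of ones with one entry per sequence. The token ids are made up, and this is an illustration only, not the change that landed in master; the thread does not confirm whether it avoids the error on a given PyTorch version.

```python
import torch

# Use MPS when available, otherwise fall back to CPU.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

# Made-up batch of two token id sequences, already on the target device.
input_ids = torch.tensor([[101, 2023, 102], [101, 2003, 102]], device=device)

# new_ones inherits dtype and device from input_ids.
unfinished_sequences = input_ids.new_ones(input_ids.shape[0])

# The same tensor built with an explicit dtype and device.
explicit = torch.ones(
    input_ids.shape[0], dtype=input_ids.dtype, device=input_ids.device
)

print(unfinished_sequences, torch.equal(unfinished_sequences, explicit))
```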

“RuntimeError: Placeholder storage has not been allocated on MPS device!” This is the message I get when I try to run ControlNet on Stable Diffusion. I’m not really tech savvy. I’m using Pinokio, and I don’t know where to enter the fix described above.