torch.embedding fails with RuntimeError: Placeholder storage has not been allocated on MPS device!

I am trying to use the PyTorch-based library “transformers”.
When I set the device to “mps”, I get the error in the title:

Traceback (most recent call last):
  File "/Users/raam/code/pytorch_accl/t2v-transformers-models/./app.py", line 50, in read_item
    vector = await vec.vectorize(item.text, item.config)
  File "/Users/raam/code/pytorch_accl/t2v-transformers-models/./vectorizer.py", line 71, in vectorize
    batch_results = self.get_batch_results(tokens, sentences[start_index:end_index])
  File "/Users/raam/code/pytorch_accl/t2v-transformers-models/./vectorizer.py", line 52, in get_batch_results
    return self.model_delegate.get_batch_results(tokens, text)
  File "/Users/raam/code/pytorch_accl/t2v-transformers-models/./vectorizer.py", line 95, in get_batch_results
    return self.model(**tokens)
  File "/Users/raam/code/pytorch_accl/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/raam/code/pytorch_accl/.venv/lib/python3.10/site-packages/transformers/models/bert/modeling_bert.py", line 1010, in forward
    embedding_output = self.embeddings(
  File "/Users/raam/code/pytorch_accl/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/raam/code/pytorch_accl/.venv/lib/python3.10/site-packages/transformers/models/bert/modeling_bert.py", line 235, in forward
    inputs_embeds = self.word_embeddings(input_ids)
  File "/Users/raam/code/pytorch_accl/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/raam/code/pytorch_accl/.venv/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 158, in forward
    return F.embedding(
  File "/Users/raam/code/pytorch_accl/.venv/lib/python3.10/site-packages/torch/nn/functional.py", line 2148, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Placeholder storage has not been allocated on MPS device!

Looking at the code, this check seems to always fail when using MPS:
GitHub permalink (pytorch/OperationUtils.mm at e011a8e18bf469a6a612fd1e7647159c353730a9 · pytorch/pytorch · GitHub)

TORCH_CHECK(self.is_mps(), "Placeholder storage has not been allocated on MPS device!");

Any advice is appreciated.
Thanks!

Hi,

Did you make sure that you moved both your model and your input to the “mps” device?
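
In case it helps, here is a minimal sketch of what that looks like. It uses a plain nn.Embedding as a stand-in for the actual transformers model, and pick_device is a hypothetical helper name, so treat it as an illustration rather than the exact fix for your code:

```python
import torch
import torch.nn as nn


def pick_device() -> torch.device:
    """Use MPS when this PyTorch build and machine support it, else CPU."""
    mps = getattr(torch.backends, "mps", None)  # absent on older PyTorch builds
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


device = pick_device()

# Stand-in for the real model: the embedding weights live on `device`.
model = nn.Embedding(num_embeddings=100, embedding_dim=8).to(device)

# The input ids must be moved to the same device as the weights.
# Note that Tensor.to() returns a new tensor, so keep the result.
input_ids = torch.tensor([[1, 2, 3]]).to(device)

with torch.no_grad():
    output = model(input_ids)

print(output.shape, output.device)
```

If either .to(device) call is left out while the other side is on MPS, the embedding lookup sees tensors on two different devices, which is the situation the error in the title complains about.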

Same problem here. After applying the fix you suggested:

inputs = tokenizer.encode(sentence, return_tensors="pt").to("mps")

I now get a different error:

Input In [38], in generate(sentence, max_length)
      3 inputs = tokenizer.encode(sentence, return_tensors="pt").to(device)
      4 # generated_ids = model.generate(input_ids)
----> 5 outputs = model.generate(inputs, use_cache=True, max_length=max_length)
      6 return tokenizer.decode(outputs[0], skip_special_tokens=True)

File ~/miniforge3/envs/tf/lib/python3.9/site-packages/torch/autograd/grad_mode.py:27, in _DecoratorContextManager.__call__.<locals>.decorate_context(*args, **kwargs)
     24 @functools.wraps(func)
     25 def decorate_context(*args, **kwargs):
     26     with self.clone():
---> 27         return func(*args, **kwargs)

File ~/miniforge3/envs/tf/lib/python3.9/site-packages/transformers/generation_utils.py:1278, in GenerationMixin.generate(self, inputs, max_length, min_length, do_sample, early_stopping, num_beams, temperature, top_k, top_p, typical_p, repetition_penalty, bad_words_ids, force_words_ids, bos_token_id, pad_token_id, eos_token_id, length_penalty, no_repeat_ngram_size, encoder_no_repeat_ngram_size, num_return_sequences, max_time, max_new_tokens, decoder_start_token_id, use_cache, num_beam_groups, diversity_penalty, prefix_allowed_tokens_fn, logits_processor, renormalize_logits, stopping_criteria, constraints, output_attentions, output_hidden_states, output_scores, return_dict_in_generate, forced_bos_token_id, forced_eos_token_id, remove_invalid_values, synced_gpus, exponential_decay_length_penalty, **model_kwargs)
   1273         raise ValueError(
   1274             f"num_return_sequences has to be 1, but is {num_return_sequences} when doing greedy search."
   1275         )
   1277     # 10. run greedy search
-> 1278     return self.greedy_search(
   1279         input_ids,
   1280         logits_processor=logits_processor,
   1281         stopping_criteria=stopping_criteria,
   1282         pad_token_id=pad_token_id,
   1283         eos_token_id=eos_token_id,
   1284         output_scores=output_scores,
   1285         return_dict_in_generate=return_dict_in_generate,
   1286         synced_gpus=synced_gpus,
   1287         **model_kwargs,
   1288     )
   1290 elif is_sample_gen_mode:
   1291     # 10. prepare logits warper
   1292     logits_warper = self._get_logits_warper(
   1293         top_k=top_k,
   1294         top_p=top_p,
   (...)
   1298         renormalize_logits=renormalize_logits,
   1299     )

File ~/miniforge3/envs/tf/lib/python3.9/site-packages/transformers/generation_utils.py:1652, in GenerationMixin.greedy_search(self, input_ids, logits_processor, stopping_criteria, max_length, pad_token_id, eos_token_id, output_attentions, output_hidden_states, output_scores, return_dict_in_generate, synced_gpus, **model_kwargs)
   1647     encoder_hidden_states = (
   1648         model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
   1649     )
   1651 # keep track of which sequences are already finished
-> 1652 unfinished_sequences = input_ids.new(input_ids.shape[0]).fill_(1)
   1653 cur_len = input_ids.shape[-1]
   1655 this_peer_finished = False  # used by synced_gpus only

RuntimeError: new(): expected key in DispatchKeySet(CPU, CUDA, HIP, XLA, IPU, XPU, HPU, Lazy) but got: MPS

Thanks, the problem was that I also needed to call .to on a second model.
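
For anyone hitting the same thing with more than one model: a quick way to spot a module or tensor that was left behind on the CPU is to collect the device of every parameter, buffer and input. This is only a rough sketch; devices_in_use is a hypothetical helper name, and the two nn modules stand in for the real models:

```python
import torch
import torch.nn as nn


def devices_in_use(*modules: nn.Module, **tensors: torch.Tensor) -> set:
    """Return the set of device types used by the given modules and tensors."""
    found = set()
    for module in modules:
        for param in module.parameters():
            found.add(param.device.type)
        for buf in module.buffers():
            found.add(buf.device.type)
    for tensor in tensors.values():
        found.add(tensor.device.type)
    return found


# Stand-ins for two separate models and one input tensor.
encoder = nn.Embedding(num_embeddings=10, embedding_dim=4)
head = nn.Linear(4, 2)
input_ids = torch.tensor([[1, 2]])

used = devices_in_use(encoder, head, input_ids=input_ids)
if len(used) > 1:
    print("Device mismatch:", used)
else:
    print("Everything is on:", used)
```

More than one entry in the returned set means something still needs a .to(device) call before the forward pass.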

@Willian you want to call .to() on both the model and the inputs before you evaluate them. See the basic example on the "MPS backend" page of the PyTorch master documentation.

Yes, to(device) is called on both of them before they are evaluated.
I’ve created an issue on GitHub detailing this.

It was indeed something else. Fixed in master now.
