TorchScript: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe0 in position 0: unexpected end of data

Hi, I have the following Tokenizer class, which I'm trying to JIT-script so I can use it from C++:

import torch
from torch import jit
from typing import Dict, List

class Tokenizer(jit.ScriptModule):
  def __init__(self):
    super().__init__()
    self.tokens_to_idx : Dict[str, int] = {...}
    self.idx_to_tokens : Dict[int, str] = {...}

  @jit.script_method
  def encode(self, word : str) -> List[int]:
    word_idx : List[int] = []

    for char in word.lower():
        word_idx.append(self.tokens_to_idx[char])

    return word_idx

I am passing a Unicode string to the encode() method like this:

tokenizer_to_jit = Tokenizer()
tokenizer_jitted = torch.jit.script(tokenizer_to_jit)
tokenizer_jitted.encode("নমস্কাৰ")

This raises the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe0 in position 0: unexpected end of data

The same code works when I pass ASCII (English) strings. What could be the issue, and how can I resolve it?
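For reference, here is a plain-Python version of the same encode logic (no TorchScript), which handles the multi-byte string fine on my end. The vocabulary below is a stand-in built from the input string itself, since the real tokens_to_idx mapping isn't shown above:

```python
from typing import Dict, List

word = "নমস্কাৰ"

# Stand-in vocabulary: in the real class this dict is predefined.
tokens_to_idx: Dict[str, int] = {ch: i for i, ch in enumerate(sorted(set(word.lower())))}

def encode(word: str) -> List[int]:
    word_idx: List[int] = []
    for char in word.lower():  # plain Python iterates by code point, not by byte
        word_idx.append(tokens_to_idx[char])
    return word_idx

print(encode(word))  # one index per code point, no decode error
```

So the lookup itself seems fine in eager mode; the error only appears after scripting.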