PyTorch UnicodeEncodeError During Debugging


(adent) #1

I am trying to debug a PyTorch model with pdb, but I cannot seem to get around this particular Unicode encoding error when trying to view the contents of tensors or net parameters, and none of the usual tricks work to strip or reformat Unicode:

(Pdb) dir(params)
['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']
(Pdb) params
*** UnicodeEncodeError: 'ascii' codec can't encode character '\u22f1' in position 294: ordinal not in range(128)
(Pdb) print(params).encode('ascii', 'ignore').decode('ascii')
*** UnicodeEncodeError: 'ascii' codec can't encode character '\u22f1' in position 294: ordinal not in range(128)
(Pdb) print(params).encode('utf-8')
*** UnicodeEncodeError: 'ascii' codec can't encode character '\u22f1' in position 294: ordinal not in range(128)
(Pdb) print(params).encode('utf-8').strip()
*** UnicodeEncodeError: 'ascii' codec can't encode character '\u22f1' in position 294: ordinal not in range(128)

I can’t even use locals() or globals() in pdb to dump everything:

(Pdb) locals()
*** UnicodeEncodeError: 'ascii' codec can't encode character '\u22f1' in position 335: ordinal not in range(128)
(Pdb) globals()
*** UnicodeEncodeError: 'ascii' codec can't encode character '\u22f1' in position 335: ordinal not in range(128)

Any ideas about a way around this? TIA


#2

I’ve seen this before when I ran a buggy tmux installation that didn’t have Unicode support. Are you using a terminal emulator (or something else) that supports Unicode?


(adent) #3

Just a standard bash console, nothing fancy.


#4

Does print(params) also give you that?


(adent) #5

Yeah, exact same error, even with utf-8 and ascii encoding and strip().

Makes debugging almost useless.


#6

What version of python are you using?


(adent) #7

Whatever ships with Docker pytorch:latest (0.2.0), which I think is Python 2.7. I will check later this evening to see.


(adent) #8

Ok I am actually running Python 3.5.2 :: Continuum Analytics, Inc.

Bash shell on Konsole version 2.10.5. Everything is fairly recent. The only thing I can attribute it to is the bash shell being executed inside the Docker pytorch:latest container. Maybe I need to spawn Konsole from inside the Docker container. I will try that next.


(Sidak Pal Singh) #9

You can try something like:

export LC_ALL="en_US.UTF-8"

If it works for you (as it did for me), put it in your .bashrc.
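
If changing the locale is not an option, here is a minimal sketch of a workaround from the Python side. It rests on my reading of the traceback above: the error is raised inside print() while the text is being written to an ASCII stdout, so chaining .encode(...) after print(...) never runs (and print() returns None anyway). Escaping the repr before printing sidesteps that. The helper names describe_encodings and ascii_safe are made up for illustration and are not part of PyTorch or pdb.

```python
import locale
import sys


def describe_encodings():
    """Report the encodings Python picked up from the environment."""
    return {
        "stdout": sys.stdout.encoding,
        "preferred": locale.getpreferredencoding(False),
    }


def ascii_safe(obj):
    """Return repr(obj) with every non-ASCII character backslash-escaped.

    For example, the character U+22F1 from the error message becomes the
    six ASCII characters \\u22f1, which an ASCII stdout can print.
    """
    return repr(obj).encode("ascii", "backslashreplace").decode("ascii")
```

Inside pdb this would be used as print(ascii_safe(params)). In Python 3 the built-in ascii(params) should give the same escaped text without a helper. If describe_encodings() reports an ASCII-style stdout encoding such as ANSI_X3.4-1968, that points to the locale inside the container; setting the PYTHONIOENCODING=utf-8 environment variable before starting Python is another option, though I have not tested it in this container.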