PyDev, Python and system default Unicode encoding problem

Python 2 has a thing called “default encoding”, which it uses to automagically encode Unicode strings whenever they have to be turned into byte strings. This is evil and has been discussed many times before.
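Python 3 no longer does this, so the sketch below spells out what Python 2 does behind your back when a unicode object has to become a byte string (for example `str(u"\xe4")`): it encodes the text with the default encoding, which is "ascii" in a stock Python 2. The helper name `coerce_to_bytes()` is made up for illustration.

```python
def coerce_to_bytes(text, default_encoding="ascii"):
    # Hypothetical helper emulating Python 2's implicit unicode -> str
    # conversion, which encodes with the default encoding and strict errors.
    return text.encode(default_encoding)


print(coerce_to_bytes(u"hello"))  # pure ASCII text converts fine

try:
    coerce_to_bytes(u"\u00e4")  # "ä" cannot be represented in ASCII
except UnicodeEncodeError as exc:
    print("UnicodeEncodeError: %s" % exc.reason)

# The very same call succeeds if the default has been switched to UTF-8.
print(coerce_to_bytes(u"\u00e4", "utf-8"))
```

Which of the last two outcomes you get depends on a process-wide setting, not on the code itself.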

What could be even more evil? Something in your development environment changes this setting for you, without telling you. This way you never encounter Unicode problems on your development computer, and when you roll out your seemingly working code to production, the world goes haywire.

Evil. Evil. Evil. Thousands of curses and hours of overtime to fix the problems.
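One way to notice such tampering early is to check the default encoding when your program or test suite starts. A minimal sketch, assuming your production interpreter reports "ascii" (the stock Python 2 value); `check_default_encoding()` and `EXPECTED_DEFAULT_ENCODING` are hypothetical names:

```python
import sys

# Assumption: the value production reports. Stock Python 2 says "ascii";
# Python 3 says "utf-8".
EXPECTED_DEFAULT_ENCODING = "ascii"


def check_default_encoding(expected=EXPECTED_DEFAULT_ENCODING):
    """Raise if something (an IDE, a sitecustomize.py) changed the default."""
    actual = sys.getdefaultencoding()
    if actual != expected:
        raise RuntimeError(
            "default encoding is %r, expected %r: something in this "
            "environment has changed it" % (actual, expected)
        )
    return actual
```

Calling `check_default_encoding()` at the top of the main script or in the test runner makes a development environment like the one described here fail loudly instead of silently masking Unicode bugs.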

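I ran into this problem, and this is the monkey patch I added to site.py to track it down:

```python
# Trap the bastard messing with the default encoding
# using a monkey patch (Python 2 only: site.py deletes
# sys.setdefaultencoding once it has finished running)
import sys

old_set_default_encoding = sys.setdefaultencoding

def aargh(encoding):
    # Let the original call through so behaviour is unchanged...
    old_set_default_encoding(encoding)
    # ...then stop in the debugger; type "bt" to see who called us
    import pdb ; pdb.set_trace()

sys.setdefaultencoding = aargh
```

The result was surprising: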

```
--Return--
> /home/moo/py24/lib/python2.4/site.py(485)aargh()->None
-> import pdb ; pdb.set_trace()
(Pdb) bt
  /home/moo/py24/lib/python2.4/site.py(613)?()
-> main()
  /home/moo/py24/lib/python2.4/site.py(604)main()
-> execsitecustomize()
  /home/moo/py24/lib/python2.4/site.py(514)execsitecustomize()
-> import sitecustomize
  /home/moo/Desktop/Aptana Studio 2.0/plugins/org.python.pydev_1.5.3.1260479439/PySrc/pydev_sitecustomize/sitecustomize.py(99)?()
-> sys.setdefaultencoding(encoding) #@UndefinedVariable (it's deleted after the site.py is executed -- so, it's undefined for code-analysis)
> /home/moo/py24/lib/python2.4/site.py(485)aargh()->None
-> import pdb ; pdb.set_trace()
```

The culprit turned out to be PyDev (the Eclipse Python plug-in): the call comes from PyDev's own sitecustomize.py, as the traceback above shows. The apparent reason is to cooperate with the Eclipse console. However, it has been done incorrectly. Instead of setting the encoding of the console only, PyDev sets the default encoding of the whole Python runtime, which changes the behaviour of the very program you are developing.
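For comparison, here is roughly what setting only the console encoding could look like. This is a sketch, not PyDev's actual code: the hypothetical `wrap_console()` helper wraps a byte stream with `codecs.getwriter()`, so only text written through it is encoded with the console's encoding, and `sys.getdefaultencoding()` is left alone. An `io.BytesIO` object stands in for the console so the example is self-contained; in a Python 2 program you would wrap `sys.stdout` instead.

```python
import codecs
import io
import sys


def wrap_console(byte_stream, console_encoding):
    # Hypothetical helper: only text written through the returned writer is
    # encoded with console_encoding; the interpreter-wide default is untouched.
    return codecs.getwriter(console_encoding)(byte_stream)


fake_console = io.BytesIO()  # stand-in for the console's byte stream
console = wrap_console(fake_console, "utf-8")
console.write(u"\u00e4")

print(repr(fake_console.getvalue()))  # the UTF-8 bytes for "ä"
print(sys.getdefaultencoding())       # not changed by the wrapper
```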

There is a possible fix for this problem. In the Eclipse Run… dialog, you can choose Console Encoding on the Common tab. One of the offered values is US-ASCII. I am not sure how Python 2 interprets the encoding name “US-ASCII”, though, since its default encoding is called “ascii”.
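To see which codec Python actually resolves a name to, you can ask the codec registry. A minimal check, assuming Python 2.5 or later (where `codecs.lookup()` returns a `CodecInfo` object with a `.name` attribute; Python 2.4 returns a plain tuple) or Python 3:

```python
import codecs

for name in ("US-ASCII", "ascii"):
    # codecs.lookup() normalizes the name and follows the alias table
    print("%s -> %s" % (name, codecs.lookup(name).name))
```

If both lines report the same codec, the name shown in the Eclipse dialog and Python's own default refer to the same encoding.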
