Setting the correct encoding when piping stdout in Python
When the output of a Python program is piped, the Python interpreter cannot determine the encoding and sets sys.stdout.encoding to None. This means a program that prints a non-ASCII Unicode string
works fine under normal execution but fails with:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 0: ordinal not in range(128)
when used in a pipe sequence.
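The failure can be reproduced without a pipe. When sys.stdout.encoding is None, Python 2 falls back to the ASCII codec, so a minimal sketch only needs to encode the character from the error message with that codec. The helper name try_encode is made up for this illustration and is not part of the question.

```python
# Illustrative sketch; try_encode is a hypothetical helper.

def try_encode(text, encoding):
    """Return the encoded bytes, or the error message if encoding fails."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        return str(exc)


# U+00A0 (no-break space) is outside ASCII, so the ASCII codec rejects it,
# while UTF-8 encodes it as two bytes.
print(try_encode(u"\xa0", "ascii"))
print(repr(try_encode(u"\xa0", "utf-8")))
```

The first call reports the same "ordinal not in range(128)" complaint as the traceback above; the second succeeds because UTF-8 can represent the character.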
What is the best way to make this work when piping? Can I just tell it to use whatever encoding the shell / filesystem / whatever is using?
The suggestions I've seen so far are to modify your site.py directly or to hardcode the default encoding with a sys.setdefaultencoding() hack.
Is there a better way to make piping work?
Your code works when run in a script because Python encodes the output into the encoding used by your terminal application. If you are piping, you must encode it yourself.
As a rule of thumb, always use Unicode internally. Decode what you receive and encode what you send.
Another didactic example is a Python program that converts from ISO-8859-1 to UTF-8 and uppercases everything in between.
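A minimal sketch of such a converter, following the decode / work in Unicode / encode rule. The function name convert_upper and its parameters are assumptions made for this illustration; in-memory byte streams stand in for stdin and stdout so the example is self-contained.

```python
import io


def convert_upper(src, dst, src_encoding="iso-8859-1", dst_encoding="utf-8"):
    """Copy binary stream src to binary stream dst, uppercasing the text."""
    for raw_line in src:
        text = raw_line.decode(src_encoding)   # decode what you receive
        text = text.upper()                    # work with Unicode internally
        dst.write(text.encode(dst_encoding))   # encode what you send


# Input: the characters U+00E5, U+00E4, U+00F6 encoded as ISO-8859-1.
source = io.BytesIO(u"\xe5\xe4\xf6\n".encode("iso-8859-1"))
target = io.BytesIO()
convert_upper(source, target)
print(repr(target.getvalue()))
```

In a real filter the two streams would be the binary stdin and stdout (sys.stdin.buffer and sys.stdout.buffer on Python 3, sys.stdin and sys.stdout on Python 2).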
Setting the system's default encoding is a bad idea because some modules and libraries you use may rely on it being ASCII. Do not do it.
First, regarding the solution of encoding explicitly on every print:
It is not practical to print with a specific encoding each time. That would be repetitive and error-prone.
A better solution is to change sys.stdout at the beginning of your program so that it encodes with a chosen encoding. Here is a solution I found in "Python: How is sys.stdout.encoding selected?", in particular a comment from "toka":
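One way to do this, sketched under the assumption that UTF-8 is the desired output encoding, is to wrap the byte stream with a writer from the standard codecs module. The helper name utf8_writer is hypothetical, and the demonstration writes to an in-memory stream rather than the real stdout.

```python
import codecs
import io


def utf8_writer(byte_stream):
    """Wrap a byte stream so that Unicode text written to it is UTF-8 encoded."""
    return codecs.getwriter("utf-8")(byte_stream)


# Demonstration on an in-memory stream instead of the real stdout.
buffer = io.BytesIO()
out = utf8_writer(buffer)
out.write(u"\xe5\xe4\xf6\n")
print(repr(buffer.getvalue()))

# At program start, the same wrapper could replace sys.stdout:
#   Python 2: sys.stdout = utf8_writer(sys.stdout)
#   Python 3: sys.stdout = utf8_writer(sys.stdout.buffer)
```

After the replacement, every print goes through the encoder, so individual print calls no longer need an explicit encode.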
You may want to try changing the PYTHONIOENCODING environment variable to utf_8. I wrote a page about my ordeal with this problem.
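A quick way to see the effect of the variable is to start a child interpreter whose stdout is a pipe and compare the bytes it emits under different settings. This is only a sketch; run_child and CHILD_CODE are names invented for the illustration.

```python
import os
import subprocess
import sys

# The child prints a single non-ASCII character (U+00E5).
CHILD_CODE = "print(u'\\xe5')"


def run_child(io_encoding):
    """Run a child interpreter with PYTHONIOENCODING set; return its raw stdout."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    # The child's stdout is a pipe here, which is the case the question is about.
    output = subprocess.check_output([sys.executable, "-c", CHILD_CODE], env=env)
    return output.strip()  # drop the trailing newline


print(repr(run_child("utf_8")))
print(repr(run_child("latin-1")))
```

The same character comes out as two bytes under utf_8 and as one byte under latin-1, which shows that the variable decides how piped output is encoded.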
TL;DR of the blog post:
Setting PYTHONIOENCODING in the environment does the job, but it cannot be set from within Python itself.
What we can do is check whether it is set and, if not, tell the user to set it before calling the script:
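A minimal sketch of such a check. It assumes that a missing encoding shows up as sys.stdout.encoding being None, as described in the question; the function name encoding_hint and the wording of the message are made up for this illustration.

```python
import sys


def encoding_hint(stream):
    """Return a message for the user if the stream has no encoding, else None."""
    if getattr(stream, "encoding", None) is None:
        return ("Output encoding is unknown; set PYTHONIOENCODING first, "
                "for example: export PYTHONIOENCODING=UTF-8")
    return None


def main():
    """Refuse to run when stdout has no encoding (hypothetical entry point)."""
    hint = encoding_hint(sys.stdout)
    if hint is not None:
        sys.stderr.write(hint + "\n")
        sys.exit(1)
    # ... the actual program would continue here ...
```

Calling main() at the start of the script makes it stop with a readable message instead of a UnicodeEncodeError somewhere in the middle of the output.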
Update to reply to the comment: the problem only exists when piping stdout. I tested with Python 2.7.13 on Fedora 25,
running ./b.py | less
I had a similar problem last week. It was easy to fix in my IDE (PyCharm).
Here was my fix:
Starting from the PyCharm menu bar: File -> Settings... -> Editor -> File Encodings, then set "IDE Encoding", "Project Encoding" and "Default encoding for properties files" ALL to UTF-8, and it now works like magic.
Hope that helps!
An adjusted version of Craig McQueen's answer.
I could "automate" it with one call:
Yes, it is possible to get an infinite loop here if this "Setenv" fails.
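One way such automation might look, as a sketch only: restart the interpreter with PYTHONIOENCODING set when stdout has no encoding. The function names are hypothetical, and the extra check on the environment variable is an assumption added here to avoid the infinite loop mentioned above.

```python
import os
import sys


def should_reexec(stdout_encoding, environ):
    """True if stdout has no encoding and PYTHONIOENCODING is not yet set."""
    # Checking the variable as well guards against the infinite loop:
    # after one restart it is set, so a second restart never happens.
    return stdout_encoding is None and "PYTHONIOENCODING" not in environ


def reexec_with_utf8():
    """Restart the current script with PYTHONIOENCODING=UTF-8 if needed."""
    if should_reexec(sys.stdout.encoding, os.environ):
        env = dict(os.environ, PYTHONIOENCODING="UTF-8")
        os.execve(sys.executable, [sys.executable] + sys.argv, env)


# Intended use: call reexec_with_utf8() as the first statement of the script.
```

Passing the environment explicitly to os.execve means the restart does not depend on a separate call that modifies the environment having succeeded.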
I thought I would mention something here that I had to experiment with for a long time before I finally realized what was going on. It may be so obvious to everyone here that nobody bothered to mention it, but it would have helped me if they had, so on that principle ...!
NB: I am using Jython specifically, version 2.7, so this may not apply to CPython ...
NB2: The first two lines of my .py file here are:
The string construction mechanism "%" (AKA "interpolation operator") causes ADDITIONAL problems too ... when the default encoding of the "environment" is ASCII and you try to print a %-formatted string that contains accented characters (call this print A).
You will have no trouble running this in Eclipse ... In a Windows CLI (DOS window) you will find that the encoding is code page 850 (on my Windows 7 operating system) or something similar, which can at least handle European accented characters, so it will work.
The same print with a Unicode string prefixed with "u" (call this print B) will work too.
When you, OTOH, redirect to a file from the CLI, the stdout encoding is None. This defaults to ASCII (on my operating system anyway), which cannot handle either of the above prints ... (dreaded encoding error).
Then you might think of redirecting your stdout through an encoder
and try the CLI redirect to a file again ... Very oddly, print A above works ... but print B above triggers the encoding error! However, an equivalent print that does not use the % operator works fine.
The conclusion I've come to (tentatively) is that when a string specified as a Unicode string with the "u" prefix is passed to the % handling mechanism, the default environment encoding appears to be used, regardless of whether you have set stdout to redirect!
How people deal with this is a matter of choice. I would welcome a Unicode expert who can explain why this happens, whether I have somehow got it wrong, what the preferred solution is, whether it also applies to CPython, whether it happens in Python 3, etc.
I ran into this problem in a legacy application, and it was difficult to identify what was printed where. I helped myself with this hack:
At the top of my test.py script:
Note that this changes ALL calls to print to use an encoding, so your console will print:
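A sketch of what such a hack might look like, assuming Python 3: a decorator that makes a print-like function encode its argument as UTF-8. The names encode_utf8 and utf8_print and the sample text are chosen for this illustration; the sketch leaves the built-in print untouched and only notes in a comment how it could be replaced globally.

```python
import io


def encode_utf8(print_fn):
    """Return a print-like function that UTF-8 encodes its first argument."""
    def wrapper(text, **kwargs):
        return print_fn(str(text).encode("utf-8"), **kwargs)
    return wrapper


utf8_print = encode_utf8(print)

# Capture the output in memory to show what reaches the console.
captured = io.StringIO()
utf8_print(u"Axwell \u039b Ingrosso", file=captured)
print(captured.getvalue(), end="")

# To affect ALL print calls, the wrapper would be installed globally:
#   import builtins
#   builtins.print = encode_utf8(print)
```

Because the argument is now a bytes object, Python 3 prints its repr, so the console shows b'Axwell \xce\x9b Ingrosso' rather than the original text. That is the side effect the note above warns about.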
On Windows I had this problem very often when executing Python code from an editor (like Sublime Text), but not when I ran it from the command line.
In this case, check the settings of your editor. In the case of Sublime Text, this resolved it: