
Setting the correct encoding when piping stdout in Python


When the output of a Python program is piped, the Python interpreter cannot determine the encoding and sets it to None. This means that a program printing a non-ASCII character such as u'\xa0' works fine when run normally, but fails with:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 0: ordinal not in range(128)

when used in a pipeline.
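A minimal sketch of the failure, assuming the program prints a string containing u'\xa0' (the character named in the error). When stdout has no known encoding, Python 2 falls back to ASCII, which is roughly what the explicit encode below does. The helper name try_encode is hypothetical.

```python
# Sketch: encoding a non-ASCII character with the ASCII codec fails,
# which is what happens implicitly when stdout has no known encoding.
def try_encode(text, encoding):
    """Return the encoded bytes, or the error message if encoding fails."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        return str(exc)

text = u"\xa0"  # NO-BREAK SPACE, the character from the error message
print(repr(try_encode(text, "utf-8")))  # succeeds with an explicit UTF-8 encode
print(try_encode(text, "ascii"))        # reports the codec failure
```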

What is the best way to make this work when piping? Can I just tell it to use whatever encoding the shell/filesystem/whatever is using?

The suggestions I have seen so far are to modify your site.py directly, or to hardcode the default encoding using a hack around sys.setdefaultencoding().

Is there a better way to make piping work?






Reply:


Your code works when run in a script because Python encodes the output to whatever encoding your terminal application is using. If you are piping, you must encode it yourself.

As a rule of thumb, always use Unicode internally. Decode what you receive and encode what you send.
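A small sketch of "encode what you send", assuming UTF-8 is the encoding the receiving side expects. The function name write_text is hypothetical, and an in-memory stream stands in for the real output so the example is self-contained.

```python
import io

def write_text(byte_stream, text, encoding="utf-8"):
    """Encode text explicitly and write the resulting bytes to a binary stream."""
    byte_stream.write(text.encode(encoding))

# Demonstrated on an in-memory stream; for real output one could pass
# sys.stdout.buffer (Python 3) or sys.stdout (Python 2) instead.
demo = io.BytesIO()
write_text(demo, u"\xe5\xe4\xf6\n")  # "åäö" plus a newline
print(repr(demo.getvalue()))
```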

Another didactic example is a Python program that converts between ISO-8859-1 and UTF-8, uppercasing everything in between.
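Such a converter might be sketched as follows. The function name latin1_to_utf8_upper and the sample bytes are illustrative assumptions, not the answer's original code.

```python
def latin1_to_utf8_upper(data):
    """Decode ISO-8859-1 bytes, uppercase as Unicode, re-encode as UTF-8."""
    text = data.decode("iso-8859-1")  # decode what you receive
    text = text.upper()               # work on Unicode internally
    return text.encode("utf-8")       # encode what you send

sample = b"caf\xe9\n"  # "café" in ISO-8859-1
print(repr(latin1_to_utf8_upper(sample)))
```

Used as a filter, the same function could be applied line by line to a binary input stream, with the result written to a binary output stream.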

Setting the system's default encoding is a bad idea, because some modules and libraries you use may rely on it being ASCII. Do not do it.







First, regarding the solution of encoding explicitly on every print:

It is not practical to explicitly print with a given encoding each time. That would be repetitive and error-prone.

A better solution is to change sys.stdout at the start of your program so that it encodes with a chosen encoding. Here is one solution I found in "Python: How is sys.stdout.encoding selected?", in particular a comment by "toka":
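A sketch of that approach using codecs.getwriter, assuming UTF-8 as the chosen encoding. The helper name utf8_writer is hypothetical, and an in-memory stream is used so the example runs anywhere.

```python
import codecs
import io

def utf8_writer(byte_stream):
    """Wrap a binary stream so that text written to it is encoded as UTF-8."""
    return codecs.getwriter("utf-8")(byte_stream)

# Demonstrated on an in-memory stream. At program start one would instead
# assign the wrapper to sys.stdout, wrapping sys.stdout itself on Python 2
# or sys.stdout.buffer on Python 3.
raw = io.BytesIO()
out = utf8_writer(raw)
out.write(u"\xe5\xe4\xf6\n")  # "åäö" plus a newline
print(repr(raw.getvalue()))
```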







You may want to try setting the PYTHONIOENCODING environment variable to utf_8. I wrote a page about my ordeal with this problem.
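To see what the interpreter has actually picked up, a diagnostic along these lines can help. This is a sketch: the function name io_encoding_report and the dictionary keys are made up for illustration, while the calls themselves are standard library.

```python
import locale
import os
import sys

def io_encoding_report():
    """Collect the settings that influence how stdout text is encoded."""
    return {
        "stdout.encoding": getattr(sys.stdout, "encoding", None),
        "stdout.isatty": sys.stdout.isatty(),
        "preferred": locale.getpreferredencoding(),
        "filesystem": sys.getfilesystemencoding(),
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING"),
    }

for name, value in sorted(io_encoding_report().items()):
    print("%s = %s" % (name, value))
```

Running it once directly and once through a pipe shows which of these values change.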

TL;DR of the blog post:

gives you:







Exporting PYTHONIOENCODING in the shell does the job, but it cannot be set from within Python itself ...

What we can do is check whether it is set and, if not, tell the user to set it before calling the script:
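Such a check might look like the sketch below. The function names and the wording of the message are assumptions; the test itself (a stream encoding of None) follows the problem described in the question.

```python
import io
import sys

def missing_encoding_message(stream):
    """Return an advisory message if the stream has no encoding, else None."""
    if getattr(stream, "encoding", None) is None:
        return ("Please set PYTHONIOENCODING before running this script, "
                "for example: export PYTHONIOENCODING=UTF-8")
    return None

def require_stdout_encoding():
    """Exit with a hint if stdout has no encoding; call this at script start."""
    message = missing_encoding_message(sys.stdout)
    if message is not None:
        sys.stderr.write(message + "\n")
        sys.exit(1)

# Demonstration with stand-in streams: a raw byte stream has no encoding,
# a text wrapper around it does.
print(missing_encoding_message(io.BytesIO()))
print(missing_encoding_message(io.TextIOWrapper(io.BytesIO(), encoding="utf-8")))
```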

Update in reply to the comment: the problem only exists when piping stdout. I tested with Python 2.7.13 on Fedora 25.

cat b.py

running ./b.py

running ./b.py | less





I had a similar problem last week. It was easy to fix in my IDE (PyCharm).

Here was my fix:

Starting from the PyCharm menu bar: File -> Settings... -> Editor -> File Encodings, then set "IDE Encoding", "Project Encoding" and "Default encoding for properties files" ALL to UTF-8, and it now works like a charm.

Hope that helps!


An adjusted version of Craig McQueen's answer.
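One way such a variant could look, assuming it temporarily wraps sys.stdout in a codec writer. The context-manager shape and the name encoded_stdout are assumptions for illustration, not the answer's actual code.

```python
import codecs
import contextlib
import io
import sys

@contextlib.contextmanager
def encoded_stdout(byte_stream, encoding="utf-8"):
    """Temporarily route print output through a codec writer."""
    original = sys.stdout
    sys.stdout = codecs.getwriter(encoding)(byte_stream)
    try:
        yield
    finally:
        sys.stdout = original  # always restore, even on error

raw = io.BytesIO()
with encoded_stdout(raw):
    print(u"\xc5\xc4\xd6")  # "ÅÄÖ", encoded on its way into raw
print(repr(raw.getvalue()))
```

Restoring sys.stdout in the finally clause keeps the change scoped to the with block, which avoids surprising other code that prints later.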

Usage:


I could "automate" it with a single call:

Yes, it is possible to get an infinite loop here if this "setenv" fails.
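A sketch of the idea with an explicit guard against that loop: the interpreter is re-executed with PYTHONIOENCODING set only if the variable is not already present. The function names and the UTF-8 default are assumptions, not the original code.

```python
import os
import sys

def env_for_reexec(environ, stream_encodings, default="UTF-8"):
    """Return a new environment for re-execution, or None if none is needed.

    Returning None when PYTHONIOENCODING is already set is the guard
    against re-executing forever.
    """
    if all(enc is not None for enc in stream_encodings):
        return None
    if "PYTHONIOENCODING" in environ:
        return None  # already tried once; do not loop
    new_env = dict(environ)
    new_env["PYTHONIOENCODING"] = default
    return new_env

def fix_io_encoding():
    """Re-execute the interpreter with PYTHONIOENCODING set, if needed."""
    encodings = [getattr(s, "encoding", None)
                 for s in (sys.stdin, sys.stdout, sys.stderr)]
    new_env = env_for_reexec(os.environ, encodings)
    if new_env is not None:
        os.execvpe(sys.executable, [sys.executable] + sys.argv, new_env)

# The decision logic can be exercised without re-executing anything.
print(env_for_reexec({}, ["utf-8", None, "utf-8"]))
print(env_for_reexec({"PYTHONIOENCODING": "UTF-8"}, [None, None, None]))
```

Calling fix_io_encoding() once at the very top of a script would be the "single call"; it is left uncalled here so the example has no side effects.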



I just thought I would mention something here that I had to experiment with for a long time before I finally realized what was going on. It may be so obvious to everyone here that they did not bother mentioning it. But it would have helped me if they had, so on that principle ...!

NB: I am using Jython specifically, version 2.7, so this may not apply to CPython ...

NB2: The first two lines of my .py file here are:

The string construction mechanism "%" (AKA "interpolation operator") causes ADDITIONAL problems too ... when the default encoding of the "environment" is ASCII and you try to do something like this:

You will have no trouble running in Eclipse ... In a Windows CLI (DOS window) you will find that the encoding is code page 850 (on my Windows 7 OS) or something similar, which can at least handle European accented characters, so it will work.

will also work.

If, OTOH, you redirect to a file from the CLI, the stdout encoding will be None, which defaults to ASCII (on my OS anyway), and that cannot handle either of the above prints ... (the dreaded encoding error).

Then you might think of redirecting your stdout by using

and try running in the CLI, piping to a file ... Very oddly, print A above will work ... but print B above will throw the encoding error! The following, however, works fine:

The conclusion I have come to (provisionally) is that when a string specified as a Unicode string with the "u" prefix is passed to the %-handling mechanism, the default environment encoding appears to be used, regardless of whether you have set stdout to redirect!
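A possible explanation, offered as an assumption about Python 2 semantics rather than a verified diagnosis of the Jython case: when a u'' format string is combined with a byte-string argument, the byte string is first decoded with the default encoding (ASCII), independently of how stdout is set up. The sketch below makes that decode explicit; the function name and sample strings are illustrative.

```python
def interpolate(template, raw_argument, encoding):
    """Decode a byte-string argument explicitly, then interpolate it."""
    return template % (raw_argument.decode(encoding),)

raw = u"\xe9norme".encode("utf-8")  # "énorme" as UTF-8 bytes

# Decoding with the right encoding before interpolating works.
print(repr(interpolate(u"R\xe9alit\xe9 %s", raw, "utf-8")))

# Decoding as ASCII mimics the implicit decode and fails.
try:
    interpolate(u"R\xe9alit\xe9 %s", raw, "ascii")
except UnicodeDecodeError as exc:
    print("decode failed: %s" % exc)
```

Under that assumption, decoding every byte string at the point of input means the interpolation only ever sees Unicode.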

How people deal with this is a matter of choice. I would welcome a Unicode expert explaining why this happens, whether I have got it wrong in some way, what the preferred solution is, whether it also applies to CPython, whether it happens in Python 3, etc.




I ran into this problem in a legacy application, and it was difficult to identify what was printed where. I helped myself with this hack:

At the top of my script, test.py:

Note that this changes ALL calls to print to use an encoding, so your console will print:
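A sketch of how such a hack could be built, assuming Python 3 and UTF-8. The names make_utf8_print and utf8_print and the sample string are hypothetical. The wrapper is kept local here instead of being assigned to builtins.print, so nothing else is affected.

```python
def make_utf8_print(print_fn):
    """Return a print-like function that encodes its text as UTF-8 first."""
    def utf8_print(*args, **kwargs):
        text = " ".join(str(arg) for arg in args)
        return print_fn(text.encode("utf-8"), **kwargs)
    return utf8_print

# The hack described above would assign the wrapper to builtins.print;
# here it is only bound to a local name.
utf8_print = make_utf8_print(print)
utf8_print(u"Axwell \u039b Ingrosso")  # prints the bytes literal, not the text
```

Because the wrapped function receives bytes, the console shows the bytes representation of each string, which makes it easy to spot which calls emit non-ASCII data.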


On Windows I had this problem very often when running Python code from an editor (such as Sublime Text), but not when running it from the command line.

In this case, check your editor's settings. In the case of Sublime Text, this resolved it:
