bag.more_codecs module

What do you do when Python does not know about some exotic encoder or decoder that exists out there? Suppose you have some text, perhaps an e-mail message, that Python won’t decode, saying:

LookupError: unknown encoding: ansi_x3.110-1983

What you do is tell Python to call some UNIX command that does the encoding/decoding for you. This module sets that up using the iconv program.
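To make the idea concrete, here is a minimal sketch of piping bytes through iconv from Python. It is not the module's actual code: the helper name iconv_convert and its parameters are made up for illustration, and it assumes an iconv executable that accepts the usual -f and -t options.

```python
import shutil
import subprocess


def iconv_convert(data, from_enc, to_enc, command="iconv"):
    """Pipe bytes through iconv and return the converted bytes.

    Hypothetical helper for illustration; not part of bag.more_codecs.
    """
    result = subprocess.run(
        [command, "-f", from_enc, "-t", to_enc],
        input=data,               # bytes sent to iconv's stdin
        stdout=subprocess.PIPE,   # converted bytes come back on stdout
        stderr=subprocess.PIPE,
        check=True,               # raise CalledProcessError if iconv fails
    )
    return result.stdout


# Only try the conversion when iconv can actually be found on the PATH.
if shutil.which("iconv"):
    print(iconv_convert("café".encode("utf-8"), "UTF-8", "ISO-8859-1"))
```

Note that every call starts a new process, which is the performance cost mentioned further down.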

Usage:

import bag.more_codecs

That is it. Importing the module registers a codec. It will convert to and from anything in codecs_dict (if iconv supports it).

However, the only possible error modes are ‘strict’ and ‘ignore’. Therefore, this raises an exception:

u'hi'.encode('utf32', errors='replace')

The module will look for iconv in the path. You may set a specific location:

bag.more_codecs.COMMAND = '/usr/bin/iconv'

Unfortunately, performance suffers: a new process is started for every iconv call. Contributions that call the iconv library directly, for example through ctypes, would be welcome.

Both Python and iconv change over time, so the list of codecs that each supports is probably going to change. The registered codecs are in a dictionary:

from bag.more_codecs import codecs_dict
print(list(codecs_dict.keys()))

You may add codecs to codecs_dict and remove codecs from it.

A reasonable, but perhaps not bug-free, method has been used to try to determine which codecs iconv has that Python 2.5 does not, and this has resulted in the default value of codecs_dict, with 894 codecs.

bag.more_codecs.discover_interesting_codecs()
bag.more_codecs.get_supported_codecs()

Return a list of the codec names that iconv supports.
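One plausible way to build such a list is to parse the output of iconv -l. The sketch below is not the module's implementation. The function names are hypothetical, and it assumes the output is a series of whitespace-separated names that may carry a trailing "//" or ",". Some iconv builds also print an explanatory header, which this sketch does not filter out.

```python
import subprocess


def parse_iconv_list(output):
    """Split the text printed by `iconv -l` into bare codec names."""
    names = []
    for token in output.split():
        name = token.rstrip("/,")  # drop a trailing "//" or ","
        if name:
            names.append(name)
    return names


def list_iconv_codecs(command="iconv"):
    """Run `iconv -l` and return the codec names it reports."""
    done = subprocess.run([command, "-l"], stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE, check=True)
    return parse_iconv_list(done.stdout.decode("ascii", "replace"))
```

Keeping the parsing separate from the process call makes it possible to check the parser against sample output on a machine without iconv.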