2006-02-19-0347Z


Hmph. I finally got the packing code in docpack.py working under Cygwin, but docunpack fails. Here's why:

jcomeau@USER ~/src
$ python
Python 2.4 (#1, Dec  4 2004, 20:10:33)
[GCC 3.3.3 (cygwin special)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> a=u''
>>> a += u'\U0001d133'
>>> a += 'b'
>>> a
u'\U0001d133b'
>>> map(None, a)
[u'\ud834', u'\udd33', u'b']
>>> a[0]
u'\ud834'
>>> a
u'\U0001d133b'
>>>

The UTF-8 decoder recognizes the character, but this Python build stores it internally as a UTF-16 surrogate pair, so indexing and iteration treat it as two separate characters. So not only does the unpacker fetch the wrong tokens, the line isn't the expected length, and the decoder barfs trying to detokenize a None object instead of a character. Bleah. I don't see any easy workaround for that.
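
One possible direction, sketched here and not something docunpack actually does: walk the string by hand and glue each high surrogate to the low surrogate that follows it, so every list element is one real character. The name split_chars is made up for illustration, and I'm assuming the input looks like the session above, i.e. a string whose non-BMP characters show up as surrogate pairs.

```python
def split_chars(text):
    """Split text into characters, keeping each surrogate pair together.

    Hypothetical helper, not part of docpack.py.
    """
    chars = []
    i = 0
    while i < len(text):
        unit = text[i]
        # A high surrogate (D800-DBFF) followed by a low surrogate
        # (DC00-DFFF) encodes a single character outside the BMP.
        if (i + 1 < len(text)
                and 0xD800 <= ord(unit) <= 0xDBFF
                and 0xDC00 <= ord(text[i + 1]) <= 0xDFFF):
            chars.append(text[i:i + 2])
            i += 2
        else:
            # Ordinary character, or a lone surrogate passed through as-is.
            chars.append(unit)
            i += 1
    return chars


# The surrogates are written out explicitly to mirror what the
# session above shows for u'\U0001d133b'.
line = u'\ud834\udd33' + u'b'
print(split_chars(line) == [u'\ud834\udd33', u'b'])
```

Anything that counts or indexes tokens would then have to go through the resulting list instead of the raw string, which is probably where the real work is.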


last updated 2013-01-10 20:45:42. served from tektonic.jcomeau.com