porges

Tag: Unicode

Unicode as she is broke

Do you think your string-handling code is robust? Are there any problems with the following snippets?

Symbols used to represent functions

I was looking for some standard symbols to represent the Control key and the Alt key, and couldn’t find any until I came across ISO/IEC 9995-7. Because I had a lot of trouble finding a free copy of the document on the ’Net, I have made a table of the symbols and their functions below. I have [...]

What can we fit in 140 characters?

This refers to the ‘Twitter image encoding challenge’ currently running on StackOverflow. If we restrict ourselves to assigned, non-control, non-private Unicode characters, then by my reckoning that gives us 129,775 available characters.

    wget http://unicode.org/Public/UNIDATA/UnicodeData.txt
    awk -F ';' -f countUnichars.awk UnicodeData.txt | bc

countUnichars.awk source:

    BEGIN { print "ibase=16" } # set [...]
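For comparison, here is a minimal Python sketch of the same kind of count. It is not the awk script above, whose source is cut off in this excerpt. It assumes that "assigned, non-control, non-private" means excluding the general categories Cn (unassigned), Cc (control), Co (private use) and Cs (surrogates); the names `EXCLUDED`, `is_available` and `count_available` are made up for illustration. The result depends on the Unicode version bundled with the Python interpreter, so it will not necessarily match the figure quoted above.

```python
import sys
import unicodedata

# Assumption: these general categories are the ones left out of the count.
# Cn = unassigned, Cc = control, Co = private use, Cs = surrogate.
EXCLUDED = {"Cn", "Cc", "Co", "Cs"}


def is_available(ch: str) -> bool:
    """Return True if the character's general category is not excluded."""
    return unicodedata.category(ch) not in EXCLUDED


def count_available() -> int:
    """Count available code points across the whole Unicode range."""
    return sum(is_available(chr(cp)) for cp in range(sys.maxunicode + 1))


if __name__ == "__main__":
    # Print the Unicode version next to the count, since the count changes
    # from one version of the standard to the next.
    print(unicodedata.unidata_version, count_available())
```

Note that this version still counts format characters (category Cf) and combining marks, which may or may not be usable in a tweet; narrowing the set further only means adding categories to `EXCLUDED`.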

Unicode breaks Google search

A search for the phrase "It’s like a light of a new day," breaks in more than one way. Not only does Google search fail to recognize that “it’s” is a word, it also ignores the quote marks, searching for the phrase as individual words.