Tag Archives: Unicode

Symbols used to represent functions

I was looking for some standard symbols to represent the Control key and the Alt key, and couldn’t find any until I came across ISO/IEC 9995-7. Because I had a lot of trouble finding a free copy of the document on the ’Net, I have made a table of the symbols and their functions below. I have [...]

What can we fit in 140 characters?

This is in reference to the current ‘Twitter image encoding challenge’ running on StackOverflow. If we want to restrict ourselves to assigned, non-control, non-private Unicode characters, then by my reckoning that gives us 129,775 available characters.

wget http://unicode.org/Public/UNIDATA/UnicodeData.txt
awk -F ';' -f countUnichars.awk UnicodeData.txt | bc

countUnichars.awk source:

BEGIN { print "ibase=16" } # set [...]
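As a rough cross-check of that kind of count, here is a minimal Python sketch that uses the standard `unicodedata` module instead of parsing UnicodeData.txt. It is not the awk script from the post. The helper names (`is_usable`, `count_usable`, `capacity_bits`) are made up for illustration, and the set of excluded general categories is an assumption about what "assigned, non-control, non-private" means. The result depends on the Unicode version bundled with the Python build, so it will probably not match 129,775 exactly.

```python
import math
import sys
import unicodedata

# Assumption: "assigned, non-control, non-private" is read as excluding these
# general categories. Surrogates (Cs) are dropped too, because they cannot
# stand alone in well-formed text; the original post may have counted differently.
EXCLUDED_CATEGORIES = {"Cn", "Cc", "Co", "Cs"}


def is_usable(ch: str) -> bool:
    """Return True if the character falls outside the excluded categories."""
    return unicodedata.category(ch) not in EXCLUDED_CATEGORIES


def count_usable() -> int:
    """Count usable code points across the whole Unicode code space."""
    return sum(is_usable(chr(cp)) for cp in range(sys.maxunicode + 1))


def capacity_bits(alphabet_size: int, length: int = 140) -> float:
    """Information capacity, in bits, of `length` symbols from the alphabet."""
    return length * math.log2(alphabet_size)


if __name__ == "__main__":
    print("Unicode version:", unicodedata.unidata_version)
    print("Usable characters:", count_usable())
    print("Bits in 140 characters:", capacity_bits(129_775))
```

With the post's figure of 129,775 characters, each character carries a little under 17 bits, so 140 characters hold just under 2,378 bits, or roughly 297 bytes.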

Unicode breaks Google search

A search for the phrase "It’s like a light of a new day," breaks in more than one way. Not only does Google search fail to recognize that “it’s” is a word, it also ignores the quote marks, searching for the phrase as individual words.

Baleegal!

XML 1.0 allows you to insert characters from the C1 control code range, whilst those from the C0 range (other than tab, line feed and carriage return) are outright forbidden. XML 1.1 allows you to insert characters from the C0 range (other than NUL) as long as they are escaped as character references, and mandates that you do the same for those from the C1 [...]
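The difference between the two versions can be sketched as a small Python classifier. The ranges follow my reading of the `Char` and `RestrictedChar` productions in the XML 1.0 and XML 1.1 recommendations. The function names and the three status labels are hypothetical, chosen only for this illustration.

```python
def _in_common_ranges(cp: int) -> bool:
    """Ranges allowed by both versions above the control-code area."""
    return (0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def xml10_allowed(cp: int) -> bool:
    """XML 1.0 Char: tab, LF, CR, plus everything from U+0020 upwards
    except surrogates, U+FFFE and U+FFFF. C1 controls are included."""
    return cp in (0x09, 0x0A, 0x0D) or _in_common_ranges(cp)


def xml11_status(cp: int) -> str:
    """Classify a code point under XML 1.1.

    Returns "forbidden", "escaped-only" (legal only as a character
    reference such as &#x1;), or "literal" (may appear directly).
    """
    if not (0x01 <= cp <= 0x1F or _in_common_ranges(cp)):
        return "forbidden"
    restricted = (0x01 <= cp <= 0x08
                  or cp in (0x0B, 0x0C)
                  or 0x0E <= cp <= 0x1F
                  or 0x7F <= cp <= 0x84
                  or 0x86 <= cp <= 0x9F)
    return "escaped-only" if restricted else "literal"


if __name__ == "__main__":
    for cp in (0x00, 0x01, 0x09, 0x80, 0x85, 0x41):
        print(f"U+{cp:04X}", xml10_allowed(cp), xml11_status(cp))
```

Under this reading, U+0001 is rejected by `xml10_allowed` but is "escaped-only" in XML 1.1, while U+0080 is accepted by `xml10_allowed` and must be written as a character reference in XML 1.1.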