What can we fit in 140 characters?
This is in reference to the ‘Twitter image encoding challenge’ currently running on Stack Overflow.
If we want to restrict ourselves to assigned, non-control, non-private Unicode characters, then by my reckoning that gives us 129,775 available characters.
    wget http://unicode.org/Public/UNIDATA/UnicodeData.txt
    awk -F ';' -f countUnichars.awk UnicodeData.txt | bc
countUnichars.awk source:
    BEGIN { print "ibase=16" }   # set bc to hex mode

    $2 ~ /Private/ {             # skip any lines with "private" in the description
        getline;
    }

    n {                          # if n is set, then print the range for bc to calculate
        printf("(%s-%s+1)+",$1,n);
        n="";
    }

    $2 ~ /First>/ {              # set n if the start of a range
        n=$1;
        getline;
    }

    $3 !~ "C." {                 # otherwise count anything that isn't some kind of a control character
        i++;
    }

    END {                        # print out the count of everything else
        printf("%X\n",i)
    }
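As a rough cross-check, the same kind of count can be sketched in Python with the standard `unicodedata` module instead of parsing UnicodeData.txt. This is not the script above: the function name `count_usable_characters` is made up for illustration, and it assumes that "usable" means every code point whose general category is outside the C group (control, format, surrogate, private use, unassigned). The result depends on the Unicode version bundled with your Python, so it will not necessarily match the figure quoted above.

```python
import sys
import unicodedata


def count_usable_characters() -> int:
    """Count code points whose general category is not in the C* group.

    Assumption: C* covers Cc (control), Cf (format), Cs (surrogate),
    Co (private use) and Cn (unassigned), so everything else is
    treated as an assigned, non-control, non-private character.
    """
    return sum(
        1
        for code_point in range(sys.maxunicode + 1)
        if not unicodedata.category(chr(code_point)).startswith("C")
    )


if __name__ == "__main__":
    # Print the Unicode version too, since the count changes with it.
    print(unicodedata.unidata_version, count_usable_characters())
```

Printing `unidata_version` alongside the count makes it clear which edition of the standard the number refers to.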
This means we can store 2377 whole bits (297 bytes) per message, since $\lfloor 140 \log_2 129775 \rfloor = 2377$, so if we use a 16-colour palette (4 bits per pixel) we can store about 594 pixels ($\lfloor 2377 / 4 \rfloor = 594$), which can almost reproduce the Mona Lisa thumbnail on the contest page.
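The arithmetic is easy to check. Here is a minimal Python sketch, assuming a 140-character message, the 129,775-character alphabet from above, and 4 bits per pixel for a 16-colour palette; the function name `capacity` is just for illustration. It uses integer arithmetic rather than a floating-point logarithm so the bit count is exact.

```python
def capacity(alphabet_size: int, message_length: int = 140, bits_per_pixel: int = 4):
    """Return (whole bits, whole bytes, whole pixels) that fit in one message."""
    # A message is one of alphabet_size ** message_length possible strings.
    # floor(log2(x)) for a positive integer x is x.bit_length() - 1.
    bits = (alphabet_size ** message_length).bit_length() - 1
    return bits, bits // 8, bits // bits_per_pixel


if __name__ == "__main__":
    bits, whole_bytes, pixels = capacity(129_775)
    print(bits, whole_bytes, pixels)  # 2377 297 594
```

Changing `bits_per_pixel` shows the trade-off between palette size and pixel count, for example 2 bits per pixel for a 4-colour palette.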
