This is in reference to the current ‘Twitter image encoding challenge’ running on StackOverflow.
If we want to restrict ourselves to assigned, non-control, non-private Unicode characters, then by my reckoning that gives us 129,775 available characters.
wget http://unicode.org/Public/UNIDATA/UnicodeData.txt
awk -F ';' -f countUnichars.awk UnicodeData.txt | bc
countUnichars.awk source:
BEGIN { print "ibase=16" } # set bc to hex mode

$2 ~ /Private/ { # skip any lines with "private" in the description
    getline;
}

n { # if n is set, then print the range for bc to calculate
    printf("(%s-%s+1)+",$1,n);
    n="";
}

$2 ~ /First>/ { # set n if the start of a range
    n=$1;
    getline;
}

$3 !~ "C." { # otherwise count anything that isn't some kind of a control character
    i++;
}

END { # print out the count of everything else
    printf("%X\n",i)
}

This means we can store 2377 whole bits (297 bytes) per 140-character message (this is floor(140 × log2(129775)) = 2377), so if we use a 16-colour palette at 4 bits per pixel we can store about 594 pixels (floor(2377 / 4)), which can almost reproduce the Mona Lisa thumbnail on the contest page.
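
As a rough check of the arithmetic above, here is a minimal sketch in the same shell-and-awk style. The function name `capacity` and its three arguments (character-set size, message length, bits per pixel) are made up for illustration and are not part of the contest. The 140-character limit and the 4 bits per pixel of a 16-colour palette are assumptions passed in as arguments.

```shell
# Hypothetical helper: capacity CHARSET_SIZE MESSAGE_LENGTH BITS_PER_PIXEL
# Prints whole bits, whole bytes and whole pixels that fit in one message.
capacity() {
    awk -v n="$1" -v len="$2" -v bpp="$3" 'BEGIN {
        # awk log() is the natural log, so divide by log(2) to get log2
        bits = int(len * log(n) / log(2))   # floor(len * log2(n))
        printf("%d %d %d\n", bits, int(bits / 8), int(bits / bpp))
    }'
}

# 129,775 usable characters, 140 characters per message, 4 bits per pixel
capacity 129775 140 4
# prints: 2377 297 594
```

Changing the last argument shows the trade-off between palette size and pixel count: a 4-colour palette (2 bits per pixel) would give twice as many pixels from the same 2377 bits.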