<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>porges &#187; stackoverflow</title>
	<atom:link href="http://porg.es/blog/tag/stackoverflow/feed" rel="self" type="application/rss+xml" />
	<link>http://porg.es/blog</link>
	<description></description>
	<lastBuildDate>Thu, 12 Jan 2012 23:45:56 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>What can we fit in 140 characters?</title>
		<link>http://porg.es/blog/what-can-we-fit-in-140-characters</link>
		<comments>http://porg.es/blog/what-can-we-fit-in-140-characters#comments</comments>
		<pubDate>Wed, 27 May 2009 07:09:25 +0000</pubDate>
		<dc:creator>Porges</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[awk]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[silly]]></category>
		<category><![CDATA[stackoverflow]]></category>
		<category><![CDATA[twitter]]></category>
		<category><![CDATA[Unicode]]></category>

		<guid isPermaLink="false">http://porg.es/blog/?p=327</guid>
		<description><![CDATA[This is in reference to the current ‘Twitter image encoding challenge’ running on StackOverflow. If we want to restrict ourselves to assigned, non-control, non-private Unicode characters, then by my reckoning that gives us 129,775 available characters. wget http://unicode.org/Public/UNIDATA/UnicodeData.txt awk -F ';' -f countUnichars.awk UnicodeData.txt &#124; bc countUnichars.awk source: BEGIN &#123; print &#34;ibase=16&#34; &#125; # set [...]]]></description>
			<content:encoded><![CDATA[<p>This is in reference to the current <a href="http://stackoverflow.com/questions/891643/twitter-image-encoding-challenge">‘Twitter image encoding challenge’ running on StackOverflow</a>.</p>
<p>If we want to restrict ourselves to assigned, non-control, non-private Unicode characters, then by my reckoning that gives us 129,775 available characters.</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;"><span style="color: #c20cb9; font-weight: bold;">wget</span> http:<span style="color: #000000; font-weight: bold;">//</span>unicode.org<span style="color: #000000; font-weight: bold;">/</span>Public<span style="color: #000000; font-weight: bold;">/</span>UNIDATA<span style="color: #000000; font-weight: bold;">/</span>UnicodeData.txt
<span style="color: #c20cb9; font-weight: bold;">awk</span> <span style="color: #660033;">-F</span> <span style="color: #ff0000;">';'</span> <span style="color: #660033;">-f</span> countUnichars.awk UnicodeData.txt <span style="color: #000000; font-weight: bold;">|</span> <span style="color: #c20cb9; font-weight: bold;">bc</span></pre></div></div>

<p><tt>countUnichars.awk</tt> source:</p>

<div class="wp_syntax"><div class="code"><pre class="awk" style="font-family:monospace;"><span style="color: #C20CB9; font-weight: bold;">BEGIN</span> <span style="color: #7a0874; font-weight: bold;">&#123;</span> <span style="color: #0BD507; font-weight: bold;">print</span> <span style="color: #ff0000;">&quot;ibase=16&quot;</span> <span style="color: #7a0874; font-weight: bold;">&#125;</span> <span style="color:#808080;"># set bc to hex mode</span>
&nbsp;
<span style="color:#000088;">$2</span> <span style="color:#C4C364;">~</span> <span style="color:black;">/</span>Private<span style="color:black;">/</span> <span style="color: #7a0874; font-weight: bold;">&#123;</span> <span style="color:#808080;"># skip any line with &quot;Private&quot; in the character name</span>
    <span style="color: #0BD507; font-weight: bold;">getline</span>;
<span style="color: #7a0874; font-weight: bold;">&#125;</span>
&nbsp;
n <span style="color: #7a0874; font-weight: bold;">&#123;</span> <span style="color:#808080;"># if n is set, then print the range for bc to calculate</span>
    <span style="color: #0BD507; font-weight: bold;">printf</span><span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color: #ff0000;">&quot;(%s-%s+1)+&quot;</span>,<span style="color:#000088;">$1</span>,n<span style="color: #7a0874; font-weight: bold;">&#41;</span>;
    n=<span style="color: #ff0000;">&quot;&quot;</span>;
<span style="color: #7a0874; font-weight: bold;">&#125;</span>
&nbsp;
<span style="color:#000088;">$2</span> <span style="color:#C4C364;">~</span> <span style="color:black;">/</span>First<span style="color:black;">&gt;</span><span style="color:black;">/</span> <span style="color: #7a0874; font-weight: bold;">&#123;</span> <span style="color:#808080;"># set n if the start of a range</span>
    n=<span style="color:#000088;">$1</span>;
    <span style="color: #0BD507; font-weight: bold;">getline</span>;
<span style="color: #7a0874; font-weight: bold;">&#125;</span>
&nbsp;
<span style="color:#000088;">$3</span> <span style="color:#C4C364;">!~</span> <span style="color: #ff0000;">&quot;C.&quot;</span> <span style="color: #7a0874; font-weight: bold;">&#123;</span> <span style="color:#808080;"># otherwise count anything outside the C* categories (control, format, surrogate, private use)</span>
    i<span style="color:black;">++</span>;
<span style="color: #7a0874; font-weight: bold;">&#125;</span>
&nbsp;
<span style="color: #C20CB9; font-weight: bold;">END</span> <span style="color: #7a0874; font-weight: bold;">&#123;</span> <span style="color:#808080;"># print out the count of everything else</span>
    <span style="color: #0BD507; font-weight: bold;">printf</span><span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color: #ff0000;">&quot;%X<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span>,i<span style="color: #7a0874; font-weight: bold;">&#41;</span>
<span style="color: #7a0874; font-weight: bold;">&#125;</span></pre></div></div>
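
<p>For anyone unfamiliar with the file format: the large blocks in <tt>UnicodeData.txt</tt> are not listed one character at a time, but as a pair of lines whose names end in <tt>, First&gt;</tt> and <tt>, Last&gt;</tt>. The following is a minimal sketch of counting over that layout, not the script above: it pairs each <tt>First</tt> line with its own <tt>Last</tt> line, filters on the general category field alone, and converts the hex code points itself instead of piping to <tt>bc</tt>. The names <tt>count_chars</tt> and <tt>hex</tt> are hypothetical, and the sample lines are cut down to their first three fields.</p>

```sh
# Sketch only: count_chars and hex are illustrative names, not part of any tool.
# Reads UnicodeData-style lines on stdin and prints a decimal count.
count_chars() {
    awk -F ';' '
        # Convert an uppercase hex string to a number (plain awk has no strtonum).
        function hex(s,    v) {
            v = 0
            while (s != "") {
                v = v * 16 + index("0123456789ABCDEF", substr(s, 1, 1)) - 1
                s = substr(s, 2)
            }
            return v
        }
        $3 ~ /^C/        { next }                           # drop every C* category
        $2 ~ /, First>$/ { first = hex($1); next }          # remember where a range starts
        $2 ~ /, Last>$/  { total += hex($1) - first + 1; next }  # add the whole range
                         { total++ }                        # ordinary single-character line
        END              { print total + 0 }
    '
}

# Abbreviated sample: one control, one letter, one range, one private-use range.
printf '%s\n' \
    '0000;<control>;Cc' \
    '0041;LATIN CAPITAL LETTER A;Lu' \
    '3400;<CJK Ideograph Extension A, First>;Lo' \
    '4DB5;<CJK Ideograph Extension A, Last>;Lo' \
    'E000;<Private Use, First>;Co' \
    'F8FF;<Private Use, Last>;Co' | count_chars
# prints 6583 (1 letter + 6582 code points in the range)
```

<p>To try it on the real file, pipe it in the same way: <tt>count_chars &lt; UnicodeData.txt</tt>.</p>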

<p>This means we can store 2377 whole bits (297 whole bytes) per message, that is <img src='/blog/wp-content/plugins/latexrender/pictures/3d148faa2b961edebbfea91810c1ab28.gif' title='\lfloor\log_2(129775) \times 140\rfloor' alt='\lfloor\log_2(129775) \times 140\rfloor' align=absmiddle>, so with a 16-colour palette (4 bits per pixel) we can store 594 pixels (<img src='/blog/wp-content/plugins/latexrender/pictures/cba7624bee8dfdd5dae1931dda7f495d.gif' title='2377/\log_2(16)' alt='2377/\log_2(16)' align=absmiddle>, rounded down), which can <em>almost</em> reproduce the <i>Mona Lisa</i> thumbnail on the contest page.</p>
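
<p>The arithmetic can be checked with a few lines of <tt>awk</tt>. This is only a sketch: <tt>capacity</tt> and its three arguments (character count, message length, palette size) are hypothetical names, and <tt>log(x)/log(2)</tt> stands in for the base-2 logarithm, which <tt>awk</tt> does not have built in.</p>

```sh
# Sketch only: capacity is an illustrative helper, not part of any tool.
# Usage: capacity CHARS LENGTH COLOURS
capacity() {
    awk -v chars="$1" -v len="$2" -v colours="$3" 'BEGIN {
        bits   = int(len * log(chars) / log(2))          # floor(log2(chars) * len)
        bytes  = int(bits / 8)                           # whole bytes only
        pixels = int(bits / (log(colours) / log(2)))     # log2(colours) bits per pixel
        printf "%d bits, %d bytes, %d pixels\n", bits, bytes, pixels
    }'
}

capacity 129775 140 16
# prints: 2377 bits, 297 bytes, 594 pixels
```

<p>Changing the first argument shows how sensitive the result is to the size of the alphabet: each character carries about 17 bits here, so the total moves slowly as characters are added or removed.</p>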
]]></content:encoded>
			<wfw:commentRss>http://porg.es/blog/what-can-we-fit-in-140-characters/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

