<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>porges &#187; replies</title>
	<atom:link href="http://porg.es/blog/category/replies/feed" rel="self" type="application/rss+xml" />
	<link>http://porg.es/blog</link>
	<description></description>
	<lastBuildDate>Thu, 12 Jan 2012 23:45:56 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Cleaning up a set of tags with Awk</title>
		<link>http://porg.es/blog/cleaning-up-a-set-of-tags-with-awk</link>
		<comments>http://porg.es/blog/cleaning-up-a-set-of-tags-with-awk#comments</comments>
		<pubDate>Wed, 28 Jan 2009 02:08:05 +0000</pubDate>
		<dc:creator>Porges</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[commentary]]></category>
		<category><![CDATA[replies]]></category>
		<category><![CDATA[utility]]></category>
		<category><![CDATA[awk]]></category>
		<category><![CDATA[inflammatory]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Ruby]]></category>
		<category><![CDATA[tutorial]]></category>
		<category><![CDATA[Unix]]></category>

		<guid isPermaLink="false">http://porg.es/blog/?p=266</guid>
		<description><![CDATA[Introduction David R. MacIver has recently written this blog post about cleaning up a set of tags. This blog post, on the other hand, is about a nice old Unix tool called ‘awk’. Awk is one of those programs that is often overlooked. It is really a small domain-specific language for processing text. In some [...]]]></description>
			<content:encoded><![CDATA[<h3>Introduction</h3>
<p><a href="http://www.drmaciver.com/about/">David R. MacIver</a> has recently written <a href="http://www.drmaciver.com/2009/01/cleaning-up-a-set-of-tags-part-1/">this blog post about cleaning up a set of tags</a>. <i>This</i> blog post, on the other hand, is about a nice old Unix tool called ‘awk’.</p>
<p>Awk is one of those programs that is often overlooked. It is really a small domain-specific language for processing text. In some ways it resembles sed, but it is more powerful, and it especially excels at processing line- and field-structured input.</p>
<h3>Processing cite-u-like&#8217;s data</h3>
<p>First of all, before we begin I want to say that this isn&#8217;t a criticism of David&#8217;s work. Using a general-purpose language like Ruby to process data comes with several benefits. The aim of this post is to show what awk offers and where it excels.</p>
<p>Now, cite-u-like&#8217;s tag data comes in a pipe-separated format, so we have input like this:</p>
<pre style="overflow:auto">42|61baaeba8de136d9c1aa9c18ec3860e8|2004-11-04 02:25:05.373798+00|ecoli
42|61baaeba8de136d9c1aa9c18ec3860e8|2004-11-04 02:25:05.373798+00|metabolism
42|61baaeba8de136d9c1aa9c18ec3860e8|2004-11-04 02:25:05.373798+00|barabasi
42|61baaeba8de136d9c1aa9c18ec3860e8|2004-11-04 02:25:05.373798+00|networks
43|61baaeba8de136d9c1aa9c18ec3860e8|2004-11-04 02:25:51.839281+00|control
43|61baaeba8de136d9c1aa9c18ec3860e8|2004-11-04 02:25:51.839281+00|engineering
43|61baaeba8de136d9c1aa9c18ec3860e8|2004-11-04 02:25:51.839281+00|robustness
44|61baaeba8de136d9c1aa9c18ec3860e8|2004-11-04 02:26:33.156319+00|networks
44|61baaeba8de136d9c1aa9c18ec3860e8|2004-11-04 02:26:33.156319+00|strogatz
44|61baaeba8de136d9c1aa9c18ec3860e8|2004-11-04 02:26:33.156319+00|survey</pre>
<p>From now on, whenever you see <i>x</i>-separated data, I want you to scream ‘USE AWK!’ <img src="http://porg.es/blog/wp-content/plugins/wp-smiley-switcher/noktahhitam/icon_razz.gif" alt="" /></p>
<p>Awk scripts look something like this:</p>

<div class="wp_syntax"><div class="code"><pre class="awk" style="font-family:monospace;">pattern <span style="color: #7a0874; font-weight: bold;">&#123;</span> expression <span style="color: #7a0874; font-weight: bold;">&#125;</span></pre></div></div>

<p><i>pattern</i> is matched against each record, and if it matches, the action in the braces (<i>expression</i>) is carried out. But what is a record? Awk lets you define what a record is using the variable <code>RS</code> (short for ‘record separator’). By default it is set to <code>\n</code> (so that each line is a record), which is what we want here.</p>
<p>Within the expression you can refer to <i>fields</i> of the current record, using the syntax <code>$<i>n</i></code>. <code>$0</code> refers to the whole record, while <code>$1</code>,<code>$2</code>&#8230; refer to individual fields.</p>
<p>Awk also allows you to define what the <i>field separator</i> will be via the variable <code>FS</code>.</p>
<p>So how do we set these variables? Awk has two special patterns, <code>BEGIN</code> and <code>END</code>, whose actions run before any input is read and after all of it has been processed. In this case, we want the fields to be separated by <code>|</code>, so we write:</p>

<div class="wp_syntax"><div class="code"><pre class="awk" style="font-family:monospace;"><span style="color: #C20CB9; font-weight: bold;">BEGIN</span> <span style="color: #7a0874; font-weight: bold;">&#123;</span> <span style="color: #4107D5; font-weight: bold;">FS</span> = <span style="color: #ff0000;">&quot;|&quot;</span> <span style="color: #7a0874; font-weight: bold;">&#125;</span></pre></div></div>

<p>This is rather long-winded, so awk also lets you set <code>FS</code> with the command-line option <code>-F</code>.</p>
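<p>To see when each kind of rule fires, here is a minimal sketch on made-up two-field input. The sample data and the shell variable <code>summary</code> are invented for the example, and <code>NR</code> (awk&#8217;s built-in count of records read so far) is not otherwise used in this post:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;"># Hypothetical input: two records with two pipe-separated fields each.
summary=$(printf '%s\n' 'a|b' 'c|d' |
    awk '
        BEGIN { FS = "|" }            # runs once, before any input is read
        { print $2 }                  # runs for every record
        END { print NR " records" }   # runs once, after the last record
    ')
printf '%s\n' "$summary"</pre></div></div>

<p>This should print <code>b</code>, <code>d</code> and then <code>2 records</code>, each on its own line.</p>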
<p>For this application, we want the last field of the record (we could hard-code it as <code>$4</code>, but we&#8217;re exploring awk!). Awk provides the number of fields in the current record as the variable <code>NF</code>, so the last field is field number <code>NF</code>. We get at it by combining the field syntax <code>$</code> with the variable: <code>$NF</code>.</p>
<p>So to put all this together, what we want is the command:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;"><span style="color: #c20cb9; font-weight: bold;">awk</span> <span style="color: #660033;">-F</span> <span style="color: #ff0000;">&quot;|&quot;</span> <span style="color: #ff0000;">'{ print $NF }'</span></pre></div></div>

<p>We set the field separator to &#8220;|&#8221;, and write an awk expression to print the last field of each record. Since we left the <i>pattern</i> empty, this expression is evaluated on every record.</p>
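<p>As a quick check, here is a sketch that runs the same command on two records taken from the sample above. The shell variable <code>last_fields</code> is only there for illustration:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;"># Two sample records, pipe-separated; the tag is the last field.
last_fields=$(printf '%s\n' \
    '42|61baaeba8de136d9c1aa9c18ec3860e8|2004-11-04 02:25:05.373798+00|ecoli' \
    '43|61baaeba8de136d9c1aa9c18ec3860e8|2004-11-04 02:25:51.839281+00|control' |
    awk -F "|" '{ print $NF }')
printf '%s\n' "$last_fields"</pre></div></div>

<p>This should print <code>ecoli</code> and <code>control</code>, one per line. Because the script asks for <code>$NF</code> rather than <code>$4</code>, it would keep working if a field were added in front of the tag.</p>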
<h3>Awk vs. Ruby</h3>
<p>So what good is learning all this, anyway? There are a few reasons:</p>
<ul>
<li>awk is standard on *nix operating systems. In order to use David&#8217;s code I had to install Ruby; with awk you can generally count on it being there.</li>
<li>awk is <em>fast</em> (at least with the ‘mawk’ interpreter, which is the default on Ubuntu). On my machine the awk version completes in under a third of the time that David&#8217;s Ruby version takes. Interestingly, the awk version didn&#8217;t even max out the CPU, which suggests that it is IO-bound and would go faster if I had faster disks (I&#8217;m currently on a laptop).</li>
<li>Awk is ideal for record- and field-based input, as I hope this post will show you <img src="http://porg.es/blog/wp-content/plugins/wp-smiley-switcher/noktahhitam/icon_smile.gif" alt="" /></li>
</ul>
<h3>Filtering data with awk</h3>
<p>After the above we use <code>sort</code> and <code>uniq</code> the same as David does to get the results in the following form:</p>
<pre> 212595 bibtex-import
 157136 no-tag
  27926 elegans
  27887 celegans
  27825 c_elegans
  27795 nematode
  27738 wormbase
  27736 caenorhabditis_elegans
  18933 review
  15280 all-articles</pre>
<p>David uses the following to filter out lines with no alphabetical content:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">ruby <span style="color: #660033;">-ne</span> <span style="color: #ff0000;">'puts $_ if !($_ =~ /^[^a-z]+$/)'</span></pre></div></div>

<p>We can use awk&#8217;s patterns to do the same thing:</p>

<div class="wp_syntax"><div class="code"><pre class="awk" style="font-family:monospace;"><span style="color:#000088;">$0</span> <span style="color:#C4C364;">~</span> <span style="color:black;">/</span><span style="color: #7a0874; font-weight: bold;">&#91;</span>a<span style="color:black;">-</span>zA<span style="color:black;">-</span>Z<span style="color: #7a0874; font-weight: bold;">&#93;</span><span style="color:black;">/</span></pre></div></div>

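<p>Here we use the <code>~</code> (match) operator to write a pattern that matches only the records with an alphabetical character in them. (Remember that <code>$0</code> refers to the entire record.) Notice also that we can leave off the expression after the pattern, because it defaults to <code>{ print }</code>, which is exactly what we want. (Strictly speaking, David&#8217;s regex only looks for lowercase letters, so <code>/[a-z]/</code> would be the exact equivalent; the pattern here also keeps tags written entirely in uppercase.)</p>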
<p>In this case, awk really shines. On my machine it outperforms the Ruby version by a factor of 8–9.</p>
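<p>To see the pattern at work, here is a small sketch on count lines in the same shape as the output above. The <code>120 2008</code> line and the shell variable <code>kept</code> are made up for the example:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;"># Sample "uniq -c" style lines; the second one has no letters at all.
kept=$(printf '%s\n' ' 27926 elegans' '   120 2008' ' 18933 review' |
    awk '$0 ~ /[a-zA-Z]/')
printf '%s\n' "$kept"</pre></div></div>

<p>Only the <code>elegans</code> and <code>review</code> lines should come out the other end, unchanged.</p>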
<h3>Programming with awk</h3>
<p>The third task that David does is to consolidate all tags which differ only by hyphens or underscores. That is, ‘a-tag’, ‘atag’, and ‘a_tag’ should all be considered the same. Of the variants, the one that is used most often goes into the output (and then we normalize it by replacing ‘-’ with ‘_’ so there are only underscores in the output).</p>
<p>Here is David&#8217;s code to do the job:</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;">tag_counts = <span style="color:#006600; font-weight:bold;">&#123;</span><span style="color:#006600; font-weight:bold;">&#125;</span>
STDIN.<span style="color:#9900CC;">lines</span>.<span style="color:#9900CC;">each</span><span style="color:#006600; font-weight:bold;">&#123;</span><span style="color:#006600; font-weight:bold;">|</span>l<span style="color:#006600; font-weight:bold;">|</span> c, t = l.<span style="color:#CC0066; font-weight:bold;">split</span>; tag_counts<span style="color:#006600; font-weight:bold;">&#91;</span>t.<span style="color:#9900CC;">strip</span><span style="color:#006600; font-weight:bold;">&#93;</span> = c.<span style="color:#9900CC;">to_i</span> <span style="color:#006600; font-weight:bold;">&#125;</span>
duplicates = <span style="color:#CC00FF; font-weight:bold;">Hash</span>.<span style="color:#9900CC;">new</span><span style="color:#006600; font-weight:bold;">&#123;</span><span style="color:#006600; font-weight:bold;">|</span>h, k<span style="color:#006600; font-weight:bold;">|</span> h<span style="color:#006600; font-weight:bold;">&#91;</span>k<span style="color:#006600; font-weight:bold;">&#93;</span> = <span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#006600; font-weight:bold;">&#93;</span><span style="color:#006600; font-weight:bold;">&#125;</span>
tag_counts.<span style="color:#9900CC;">keys</span>.<span style="color:#9900CC;">each</span><span style="color:#006600; font-weight:bold;">&#123;</span><span style="color:#006600; font-weight:bold;">|</span>k<span style="color:#006600; font-weight:bold;">|</span> duplicates<span style="color:#006600; font-weight:bold;">&#91;</span>k.<span style="color:#CC0066; font-weight:bold;">gsub</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">/-|</span>_<span style="color:#006600; font-weight:bold;">/</span>, <span style="color:#996600;">&quot;&quot;</span><span style="color:#006600; font-weight:bold;">&#41;</span><span style="color:#006600; font-weight:bold;">&#93;</span> <span style="color:#006600; font-weight:bold;">&lt;&lt;</span> k <span style="color:#006600; font-weight:bold;">&#125;</span>
duplicates.<span style="color:#9900CC;">values</span>.<span style="color:#9900CC;">each</span><span style="color:#006600; font-weight:bold;">&#123;</span><span style="color:#006600; font-weight:bold;">|</span>vs<span style="color:#006600; font-weight:bold;">|</span> vs.<span style="color:#9900CC;">sort</span>!<span style="color:#006600; font-weight:bold;">&#123;</span><span style="color:#006600; font-weight:bold;">|</span>x, y<span style="color:#006600; font-weight:bold;">|</span> tag_counts<span style="color:#006600; font-weight:bold;">&#91;</span>y<span style="color:#006600; font-weight:bold;">&#93;</span> <span style="color:#006600; font-weight:bold;">&lt;=&gt;</span> tag_counts<span style="color:#006600; font-weight:bold;">&#91;</span>x<span style="color:#006600; font-weight:bold;">&#93;</span><span style="color:#006600; font-weight:bold;">&#125;</span> <span style="color:#006600; font-weight:bold;">&#125;</span>
&nbsp;
new_tag_counts = <span style="color:#006600; font-weight:bold;">&#123;</span><span style="color:#006600; font-weight:bold;">&#125;</span>
duplicates.<span style="color:#9900CC;">values</span>.<span style="color:#9900CC;">each</span><span style="color:#006600; font-weight:bold;">&#123;</span><span style="color:#006600; font-weight:bold;">|</span>vs<span style="color:#006600; font-weight:bold;">|</span> new_tag_counts<span style="color:#006600; font-weight:bold;">&#91;</span>vs<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#006666;">0</span><span style="color:#006600; font-weight:bold;">&#93;</span>.<span style="color:#CC0066; font-weight:bold;">gsub</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">/</span><span style="color:#006600; font-weight:bold;">&#40;</span>_<span style="color:#006600; font-weight:bold;">|-</span><span style="color:#006600; font-weight:bold;">&#41;</span><span style="color:#006600; font-weight:bold;">+/</span>, <span style="color:#996600;">&quot;_&quot;</span><span style="color:#006600; font-weight:bold;">&#41;</span><span style="color:#006600; font-weight:bold;">&#93;</span> = vs.<span style="color:#9900CC;">map</span><span style="color:#006600; font-weight:bold;">&#123;</span><span style="color:#006600; font-weight:bold;">|</span>v<span style="color:#006600; font-weight:bold;">|</span> tag_counts<span style="color:#006600; font-weight:bold;">&#91;</span>v<span style="color:#006600; font-weight:bold;">&#93;</span><span style="color:#006600; font-weight:bold;">&#125;</span>.<span style="color:#9900CC;">inject</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006666;">0</span>, <span style="color:#006600; font-weight:bold;">&amp;</span>:<span style="color:#006600; font-weight:bold;">+</span><span style="color:#006600; font-weight:bold;">&#41;</span><span style="color:#006600; font-weight:bold;">&#125;</span>
<span style="color:#CC0066; font-weight:bold;">puts</span> new_tag_counts.<span style="color:#9900CC;">to_a</span>.<span style="color:#9900CC;">sort</span><span style="color:#006600; font-weight:bold;">&#123;</span><span style="color:#006600; font-weight:bold;">|</span>x, y<span style="color:#006600; font-weight:bold;">|</span> y<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#006666;">1</span><span style="color:#006600; font-weight:bold;">&#93;</span> <span style="color:#006600; font-weight:bold;">&lt;=&gt;</span> x<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#006666;">1</span><span style="color:#006600; font-weight:bold;">&#93;</span><span style="color:#006600; font-weight:bold;">&#125;</span>.<span style="color:#9900CC;">map</span><span style="color:#006600; font-weight:bold;">&#123;</span><span style="color:#006600; font-weight:bold;">|</span>t, c<span style="color:#006600; font-weight:bold;">|</span> <span style="color:#996600;">&quot; #{c} #{t}&quot;</span> <span style="color:#006600; font-weight:bold;">&#125;</span></pre></div></div>

<p>I&#8217;m not going to explain it here, because that&#8217;s not the point of the post <img src="http://porg.es/blog/wp-content/plugins/wp-smiley-switcher/noktahhitam/icon_smile.gif" alt="" /></p>
<p>Here&#8217;s my awk script:</p>

<div class="wp_syntax"><div class="code"><pre class="awk" style="font-family:monospace;"><span style="color: #7a0874; font-weight: bold;">&#123;</span> tag_counts<span style="color: #7a0874; font-weight: bold;">&#91;</span><span style="color:#000088;">$2</span><span style="color: #7a0874; font-weight: bold;">&#93;</span> = <span style="color:#000088;">$1</span> <span style="color: #7a0874; font-weight: bold;">&#125;</span>
<span style="color: #C20CB9; font-weight: bold;">END</span> <span style="color: #7a0874; font-weight: bold;">&#123;</span>
	<span style="color: #000000; font-weight: bold;">for</span> <span style="color: #7a0874; font-weight: bold;">&#40;</span>tag <span style="color: #000000; font-weight: bold;">in</span> tag_counts<span style="color: #7a0874; font-weight: bold;">&#41;</span>
	<span style="color: #7a0874; font-weight: bold;">&#123;</span>
		normtag=tag
		<span style="color: #07D589; font-weight: bold;">gsub</span><span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color:black;">/-</span>|_<span style="color:black;">/</span>,<span style="color: #ff0000;">&quot;&quot;</span>,normtag<span style="color: #7a0874; font-weight: bold;">&#41;</span>
&nbsp;
		count=tag_counts<span style="color: #7a0874; font-weight: bold;">&#91;</span>tag<span style="color: #7a0874; font-weight: bold;">&#93;</span>
		sum<span style="color: #7a0874; font-weight: bold;">&#91;</span>normtag<span style="color: #7a0874; font-weight: bold;">&#93;</span><span style="color:black;">+</span>=count
&nbsp;
		<span style="color: #000000; font-weight: bold;">if</span> <span style="color: #7a0874; font-weight: bold;">&#40;</span>count <span style="color:black;">&gt;</span> max<span style="color: #7a0874; font-weight: bold;">&#91;</span>normtag<span style="color: #7a0874; font-weight: bold;">&#93;</span><span style="color: #7a0874; font-weight: bold;">&#41;</span>
		<span style="color: #7a0874; font-weight: bold;">&#123;</span>
			names<span style="color: #7a0874; font-weight: bold;">&#91;</span>normtag<span style="color: #7a0874; font-weight: bold;">&#93;</span>=tag
			max<span style="color: #7a0874; font-weight: bold;">&#91;</span>normtag<span style="color: #7a0874; font-weight: bold;">&#93;</span>=count
		<span style="color: #7a0874; font-weight: bold;">&#125;</span>
	<span style="color: #7a0874; font-weight: bold;">&#125;</span>
&nbsp;
	<span style="color: #000000; font-weight: bold;">for</span> <span style="color: #7a0874; font-weight: bold;">&#40;</span>tag <span style="color: #000000; font-weight: bold;">in</span> names<span style="color: #7a0874; font-weight: bold;">&#41;</span>
	<span style="color: #7a0874; font-weight: bold;">&#123;</span>
		finaltag=names<span style="color: #7a0874; font-weight: bold;">&#91;</span>tag<span style="color: #7a0874; font-weight: bold;">&#93;</span>
		<span style="color: #07D589; font-weight: bold;">gsub</span><span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color:black;">/</span><span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color:black;">-</span>|_<span style="color: #7a0874; font-weight: bold;">&#41;</span><span style="color:black;">+/</span>,<span style="color: #ff0000;">&quot;_&quot;</span>,finaltag<span style="color: #7a0874; font-weight: bold;">&#41;</span>
		<span style="color: #0BD507; font-weight: bold;">print</span> <span style="color: #ff0000;">&quot; &quot;</span> sum<span style="color: #7a0874; font-weight: bold;">&#91;</span>tag<span style="color: #7a0874; font-weight: bold;">&#93;</span> <span style="color: #ff0000;">&quot; &quot;</span> finaltag
	<span style="color: #7a0874; font-weight: bold;">&#125;</span>
<span style="color: #7a0874; font-weight: bold;">&#125;</span></pre></div></div>

<p>That&#8217;s right, you can use awk to do some ordinary programming tasks! Arrays use the usual bracket syntax (they are associative, so the keys can be strings), and awk even has a foreach-style loop for looping over their keys. I&#8217;ll walk through the rest of the script slowly.</p>
<p>First we apply an expression to each record, creating an array of tags and their counts:</p>

<div class="wp_syntax"><div class="code"><pre class="awk" style="font-family:monospace;"><span style="color: #7a0874; font-weight: bold;">&#123;</span> tag_counts<span style="color: #7a0874; font-weight: bold;">&#91;</span><span style="color:#000088;">$2</span><span style="color: #7a0874; font-weight: bold;">&#93;</span> = <span style="color:#000088;">$1</span> <span style="color: #7a0874; font-weight: bold;">&#125;</span></pre></div></div>

<p>Then, once everything has finished (the <code>END</code> pattern), we process this array. For each tag, we do the following:</p>
<ol>
<li>
<p>Normalize the tag (the gsub function overwrites the variable, so we have to make a copy):</p>

<div class="wp_syntax"><div class="code"><pre class="awk" style="font-family:monospace;">normtag=tag
<span style="color: #07D589; font-weight: bold;">gsub</span><span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color:black;">/-</span>|_<span style="color:black;">/</span>,<span style="color: #ff0000;">&quot;&quot;</span>,normtag<span style="color: #7a0874; font-weight: bold;">&#41;</span></pre></div></div>

<p>(Notice the similarity between this and Ruby&#8217;s equivalent <code>normtag.gsub!(/-|_/, "")</code>!)</p>
</li>
<li>
<p>Get the count for that tag and add it to the count for the normalized version:</p>

<div class="wp_syntax"><div class="code"><pre class="awk" style="font-family:monospace;">count=tag_counts<span style="color: #7a0874; font-weight: bold;">&#91;</span>tag<span style="color: #7a0874; font-weight: bold;">&#93;</span>
sum<span style="color: #7a0874; font-weight: bold;">&#91;</span>normtag<span style="color: #7a0874; font-weight: bold;">&#93;</span><span style="color:black;">+</span>=count</pre></div></div>

<p>As in PHP and Perl, a key that is not yet present in an array is created automatically on first use, with an empty default value that counts as 0 in arithmetic.</p>
</li>
<li>
<p>Next we check to see if the current tag is the commonest version of the normalized tag, and if so we save its name and count in two other arrays:</p>

<div class="wp_syntax"><div class="code"><pre class="awk" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">if</span> <span style="color: #7a0874; font-weight: bold;">&#40;</span>count <span style="color:black;">&gt;</span> max<span style="color: #7a0874; font-weight: bold;">&#91;</span>normtag<span style="color: #7a0874; font-weight: bold;">&#93;</span><span style="color: #7a0874; font-weight: bold;">&#41;</span>
<span style="color: #7a0874; font-weight: bold;">&#123;</span>
	names<span style="color: #7a0874; font-weight: bold;">&#91;</span>normtag<span style="color: #7a0874; font-weight: bold;">&#93;</span>=tag
	max<span style="color: #7a0874; font-weight: bold;">&#91;</span>normtag<span style="color: #7a0874; font-weight: bold;">&#93;</span>=count
<span style="color: #7a0874; font-weight: bold;">&#125;</span></pre></div></div>

<p>Notice again the usefulness of a default value for nonexistent keys: a missing <code>max[normtag]</code> is treated as 0, so <code>count &gt; max[normtag]</code> is true for any positive count the first time a normalized tag is seen.</p>
</li>
</ol>
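<p>The steps above can be tried in isolation. This is only a sketch: the tag and the two counts are borrowed from the sample output earlier, everything runs inside a <code>BEGIN</code> block so no input is needed, and the shell variable <code>demo</code> is invented for the example:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">demo=$(awk 'BEGIN {
    tag = "c_elegans"
    normtag = tag                 # copy first: gsub edits its third argument in place
    gsub(/-|_/, "", normtag)
    print tag, normtag            # the original is untouched

    sum[normtag] += 27825         # first use creates the key, starting from 0
    sum[normtag] += 27887
    print sum[normtag]

    print max[normtag] + 0        # never assigned, so it acts as 0 in arithmetic
}')
printf '%s\n' "$demo"</pre></div></div>

<p>This should print <code>c_elegans celegans</code>, then <code>55712</code>, then <code>0</code>.</p>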
<p>Now we have all we need to print out the answer. For each tag we normalize it to the final version (remembering to make a copy):</p>

<div class="wp_syntax"><div class="code"><pre class="awk" style="font-family:monospace;">finaltag=names<span style="color: #7a0874; font-weight: bold;">&#91;</span>tag<span style="color: #7a0874; font-weight: bold;">&#93;</span>
<span style="color: #07D589; font-weight: bold;">gsub</span><span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color:black;">/</span><span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color:black;">-</span>|_<span style="color: #7a0874; font-weight: bold;">&#41;</span><span style="color:black;">+/</span>,<span style="color: #ff0000;">&quot;_&quot;</span>,finaltag<span style="color: #7a0874; font-weight: bold;">&#41;</span></pre></div></div>

<p>Then we print out the line (concatenation is done by simply juxtaposing variables or strings):</p>

<div class="wp_syntax"><div class="code"><pre class="awk" style="font-family:monospace;"><span style="color: #0BD507; font-weight: bold;">print</span> <span style="color: #ff0000;">&quot; &quot;</span> sum<span style="color: #7a0874; font-weight: bold;">&#91;</span>tag<span style="color: #7a0874; font-weight: bold;">&#93;</span> <span style="color: #ff0000;">&quot; &quot;</span> finaltag</pre></div></div>

<p>If you&#8217;ve been watching closely you&#8217;ll notice there is a small difference between the awk and the Ruby scripts: Ruby sorts before outputting, while the awk version&#8217;s output comes out in an unspecified order. This is fine! We can use the standard *nix tool ‘sort’ to sort the lines:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;"><span style="color: #c20cb9; font-weight: bold;">awk</span> <span style="color: #660033;">-f</span> consolidate_tags.awk <span style="color: #000000; font-weight: bold;">&lt;</span> tags <span style="color: #000000; font-weight: bold;">|</span> <span style="color: #c20cb9; font-weight: bold;">sort</span> <span style="color: #660033;">-nr</span> <span style="color: #000000; font-weight: bold;">&gt;</span> fixed_tags</pre></div></div>

<p>(Note that this fits in with the *nix philosophy of ‘do one thing well’ and reusing small components.)</p>
<p>Again, the awk version outperforms the Ruby version by a factor of 3–4.</p>
<h3>Reading external commands and files</h3>
<p>Unfortunately there doesn&#8217;t seem to be a command-line stemming program (a quick Perl script would suffice, but that isn&#8217;t what we&#8217;re here for!), so I&#8217;ll skip that stage. This is one of the weaknesses of a non-general-purpose language. Instead we&#8217;ll go straight to implementing stopwords.</p>
<p>Here&#8217;s David&#8217;s Ruby code again:</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;"><span style="color:#CC0066; font-weight:bold;">require</span> <span style="color:#996600;">&quot;set&quot;</span>
&nbsp;
stopwords = <span style="color:#CC00FF; font-weight:bold;">Set</span><span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#006600; font-weight:bold;">*</span>
  <span style="color:#CC00FF; font-weight:bold;">IO</span>.<span style="color:#9900CC;">read</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">&quot;smart.txt&quot;</span><span style="color:#006600; font-weight:bold;">&#41;</span>.<span style="color:#9900CC;">lines</span>.<span style="color:#9900CC;">reject</span><span style="color:#006600; font-weight:bold;">&#123;</span><span style="color:#006600; font-weight:bold;">|</span>x<span style="color:#006600; font-weight:bold;">|</span> x =~ <span style="color:#006600; font-weight:bold;">/</span>^<span style="color:#008000; font-style:italic;">#/}.map(&amp;:strip)</span>
<span style="color:#006600; font-weight:bold;">&#93;</span>
&nbsp;
STDIN.<span style="color:#9900CC;">lines</span>.<span style="color:#9900CC;">each</span> <span style="color:#9966CC; font-weight:bold;">do</span> <span style="color:#006600; font-weight:bold;">|</span>l<span style="color:#006600; font-weight:bold;">|</span>
  c, t = l.<span style="color:#CC0066; font-weight:bold;">split</span>
  <span style="color:#CC0066; font-weight:bold;">puts</span> l <span style="color:#9966CC; font-weight:bold;">unless</span> stopwords.<span style="color:#9966CC; font-weight:bold;">include</span>? t.<span style="color:#9900CC;">strip</span>
<span style="color:#9966CC; font-weight:bold;">end</span></pre></div></div>

<p>And here&#8217;s my equivalent in awk:</p>

<div class="wp_syntax"><div class="code"><pre class="awk" style="font-family:monospace;"><span style="color: #C20CB9; font-weight: bold;">BEGIN</span> <span style="color: #7a0874; font-weight: bold;">&#123;</span>
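    <span style="color: #000000; font-weight: bold;">while</span> <span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color: #0BD507; font-weight: bold;">getline</span> <span style="color:black;">&lt;</span> <span style="color: #ff0000;">&quot;smart.txt&quot;</span><span style="color: #7a0874; font-weight: bold;">&#41;</span> <span style="color:black;">&gt;</span> <span style="color: #000000;">0</span><span style="color: #7a0874; font-weight: bold;">&#41;</span>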
    <span style="color: #7a0874; font-weight: bold;">&#123;</span> stopwords<span style="color: #7a0874; font-weight: bold;">&#91;</span><span style="color:#000088;">$0</span><span style="color: #7a0874; font-weight: bold;">&#93;</span> = <span style="color: #000000;">1</span> <span style="color: #7a0874; font-weight: bold;">&#125;</span>
<span style="color: #7a0874; font-weight: bold;">&#125;</span>
<span style="color:black;">!</span><span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color:#000088;">$2</span> <span style="color: #000000; font-weight: bold;">in</span> stopwords<span style="color: #7a0874; font-weight: bold;">&#41;</span></pre></div></div>

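<p>Here I use the ‘getline’ function, which reads the next line of the named file into <code>$0</code>. We build an array keyed by stopword, with 1 as a placeholder value. The pattern is then short and simple: print every record whose tag isn&#8217;t in the stopwords (again, we can leave off the expression to print the whole record). Note that, unlike the Ruby, this does not skip lines beginning with <code>#</code>; they simply become extra entries in the array.</p>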
<p><i>Note: There is a discrepancy here: David claims this eliminated 46 tags, while I get a value of 368 for both my awk code and his Ruby code.</i></p>
<p>Again, the Ruby version takes about 8 times as long to execute.</p>
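<p>The membership test can be tried without the stopword file. In this sketch the two-entry stopword list, the sample counts and the shell variable <code>filtered</code> are all made up; the real script fills the array from <code>smart.txt</code> instead:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;"># Hypothetical stopwords, hard-coded in BEGIN instead of read from a file.
filtered=$(printf '%s\n' ' 10 the' ' 7 networks' ' 3 of' |
    awk 'BEGIN { stopwords["the"] = 1; stopwords["of"] = 1 }
         !($2 in stopwords)')
printf '%s\n' "$filtered"</pre></div></div>

<p>Only the <code>networks</code> line should survive. Using <code>in</code> rather than looking at <code>stopwords[$2]</code> matters here: merely referencing a missing key would create it, whereas <code>in</code> leaves the array alone.</p>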
<h3>Conclusion</h3>
<p>Here are a few points to take away:</p>
<ul>
<li>
<p>Record- or field-oriented data? Think awk.</p>
</li>
<li>
<p>Don&#8217;t discount it just because it&#8217;s <em>venerable</em>. It is very well-suited to its task.</p>
</li>
<li>
<p><code>pattern { expression }</code> syntax is extremely flexible.</p>
</li>
<li>
<p>Awk&#8217;s regular expressions are fast.</p>
</li>
</ul>
<blockquote><p><i>Nowadays everybody wanna talk<br />
&nbsp;&nbsp;&nbsp;&nbsp;like they got something to say<br />
But nothing comes out<br />
&nbsp;&nbsp;&nbsp;&nbsp;when they move their lips<br />
Just a bunch of gibberish<br />
And motherf—kers act<br />
&nbsp;&nbsp;&nbsp;&nbsp;like they forgot about Awk</i></p>
</blockquote>
<p><img src="http://porg.es/blog/wp-content/plugins/wp-smiley-switcher/noktahhitam/icon_biggrin.gif" alt="" /></p>
]]></content:encoded>
			<wfw:commentRss>http://porg.es/blog/cleaning-up-a-set-of-tags-with-awk/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Overengineering</title>
		<link>http://porg.es/blog/overengineering</link>
		<comments>http://porg.es/blog/overengineering#comments</comments>
		<pubDate>Wed, 27 Feb 2008 00:55:09 +0000</pubDate>
		<dc:creator>Porges</dc:creator>
				<category><![CDATA[replies]]></category>
		<category><![CDATA[fizzbuzz]]></category>
		<category><![CDATA[Haskell]]></category>
		<category><![CDATA[horrid]]></category>
		<category><![CDATA[humour]]></category>
		<category><![CDATA[overengineered]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://porg.es/blog/overengineering</guid>
		<description><![CDATA[Douglas, you&#8217;re not alone. import Data.List &#40;sortBy&#41; import Data.Function &#40;on&#41; import Data.Maybe &#40;mapMaybe&#41; import Control.Monad.Instances &#160; gizzabuzz pairs combiner = zipWith &#40;$&#41; &#40;cycle funcs&#41; &#91;1..&#93; where sortedPairs = sortBy &#40;compare `on` fst&#41; pairs funcs = map &#40;\n -&#62; display $ mapMaybe &#40;filterOut n&#41; sortedPairs&#41; &#91;1..foldr1 lcm $ map fst $ sortedPairs&#93; display &#91;&#93; = show [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.dougalstanton.net/blog/index.php/2008/02/26/my-shame-is-complete">Douglas</a>, you&#8217;re not alone.</p>

<div class="wp_syntax"><div class="code"><pre class="haskell" style="font-family:monospace;"><span style="color: #06c; font-weight: bold;">import</span> Data<span style="color: #339933; font-weight: bold;">.</span>List <span style="color: green;">&#40;</span>sortBy<span style="color: green;">&#41;</span>
<span style="color: #06c; font-weight: bold;">import</span> Data<span style="color: #339933; font-weight: bold;">.</span>Function <span style="color: green;">&#40;</span>on<span style="color: green;">&#41;</span>
<span style="color: #06c; font-weight: bold;">import</span> Data<span style="color: #339933; font-weight: bold;">.</span><span style="color: #cccc00; font-weight: bold;">Maybe</span> <span style="color: green;">&#40;</span>mapMaybe<span style="color: green;">&#41;</span>
<span style="color: #06c; font-weight: bold;">import</span> Control<span style="color: #339933; font-weight: bold;">.</span><span style="color: #cccc00; font-weight: bold;">Monad</span><span style="color: #339933; font-weight: bold;">.</span>Instances
&nbsp;
gizzabuzz pairs combiner <span style="color: #339933; font-weight: bold;">=</span> <span style="font-weight: bold;">zipWith</span> <span style="color: green;">&#40;</span><span style="color: #339933; font-weight: bold;">$</span><span style="color: green;">&#41;</span> <span style="color: green;">&#40;</span><span style="font-weight: bold;">cycle</span> funcs<span style="color: green;">&#41;</span> <span style="color: green;">&#91;</span><span style="color: red;">1</span><span style="color: #339933; font-weight: bold;">..</span><span style="color: green;">&#93;</span>
	<span style="color: #06c; font-weight: bold;">where</span> 
	sortedPairs <span style="color: #339933; font-weight: bold;">=</span> sortBy <span style="color: green;">&#40;</span><span style="font-weight: bold;">compare</span> `on` <span style="font-weight: bold;">fst</span><span style="color: green;">&#41;</span> pairs
	funcs <span style="color: #339933; font-weight: bold;">=</span> <span style="font-weight: bold;">map</span> <span style="color: green;">&#40;</span>\n <span style="color: #339933; font-weight: bold;">-&gt;</span> display <span style="color: #339933; font-weight: bold;">$</span> mapMaybe <span style="color: green;">&#40;</span>filterOut n<span style="color: green;">&#41;</span> sortedPairs<span style="color: green;">&#41;</span> <span style="color: green;">&#91;</span><span style="color: red;">1</span><span style="color: #339933; font-weight: bold;">..</span><span style="font-weight: bold;">foldr1</span> <span style="font-weight: bold;">lcm</span> <span style="color: #339933; font-weight: bold;">$</span> <span style="font-weight: bold;">map</span> <span style="font-weight: bold;">fst</span> <span style="color: #339933; font-weight: bold;">$</span> sortedPairs<span style="color: green;">&#93;</span>
	display <span style="color: green;">&#91;</span><span style="color: green;">&#93;</span> <span style="color: #339933; font-weight: bold;">=</span> <span style="font-weight: bold;">show</span>
	display xs <span style="color: #339933; font-weight: bold;">=</span> <span style="font-weight: bold;">foldr1</span> combiner <span style="color: #339933; font-weight: bold;">.</span> <span style="font-weight: bold;">sequence</span> <span style="color: green;">&#40;</span><span style="font-weight: bold;">map</span> <span style="font-weight: bold;">const</span> xs<span style="color: green;">&#41;</span>
	filterOut n <span style="color: green;">&#40;</span>x<span style="color: #339933; font-weight: bold;">,</span>y<span style="color: green;">&#41;</span>
		<span style="color: #339933; font-weight: bold;">|</span> n `<span style="font-weight: bold;">mod</span>` x <span style="color: #339933; font-weight: bold;">==</span> <span style="color: red;">0</span> <span style="color: #339933; font-weight: bold;">=</span> Just y
		<span style="color: #339933; font-weight: bold;">|</span> <span style="font-weight: bold;">otherwise</span>      <span style="color: #339933; font-weight: bold;">=</span> Nothing
&nbsp;
fizzbuzz <span style="color: #339933; font-weight: bold;">=</span> gizzabuzz <span style="color: green;">&#91;</span><span style="color: green;">&#40;</span><span style="color: red;">3</span><span style="color: #339933; font-weight: bold;">,</span><span style="background-color: #3cb371;">&quot;Fizz&quot;</span><span style="color: green;">&#41;</span><span style="color: #339933; font-weight: bold;">,</span><span style="color: green;">&#40;</span><span style="color: red;">5</span><span style="color: #339933; font-weight: bold;">,</span><span style="background-color: #3cb371;">&quot;Buzz&quot;</span><span style="color: green;">&#41;</span><span style="color: green;">&#93;</span> <span style="color: green;">&#40;</span><span style="color: #339933; font-weight: bold;">++</span><span style="color: green;">&#41;</span></pre></div></div>
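<p>(If the <code>Control.Monad.Instances</code> import looks mysterious: it is there for the <code>Monad</code> instance of functions, which is what lets <code>sequence</code> turn a list of functions into one function returning a list. A minimal sketch of just that step follows; <code>applyAll</code> and <code>labels</code> are names invented for this example, and on a recent GHC the instance should already be in scope without the import.)</p>

<div class="wp_syntax"><div class="code"><pre class="haskell" style="font-family:monospace;">-- sequence, specialised to the function monad: hand one argument
-- to every function in the list and collect the results.
applyAll :: [a -&gt; b] -&gt; a -&gt; [b]
applyAll = sequence
&nbsp;
-- The shape used by display above: every function is a const,
-- so the argument is ignored and the labels come back unchanged.
labels :: Integer -&gt; [String]
labels = applyAll (map const [&quot;Fizz&quot;, &quot;Buzz&quot;])</pre></div></div>

<p>So <code>labels 15</code> is just the list of both strings, ready for <code>foldr1 combiner</code> to squash together.</p>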

]]></content:encoded>
			<wfw:commentRss>http://porg.es/blog/overengineering/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Matching checklists using Haskell</title>
		<link>http://porg.es/blog/matching-checklists-using-haskell</link>
		<comments>http://porg.es/blog/matching-checklists-using-haskell#comments</comments>
		<pubDate>Wed, 23 Jan 2008 08:52:55 +0000</pubDate>
		<dc:creator>Porges</dc:creator>
				<category><![CDATA[replies]]></category>
		<category><![CDATA[comparison]]></category>
		<category><![CDATA[Haskell]]></category>
		<category><![CDATA[Links]]></category>
		<category><![CDATA[lisp]]></category>

		<guid isPermaLink="false">http://porg.es/blog/matching-checklists-using-haskell</guid>
		<description><![CDATA[Our target for this exercise is “Things that other languages should take from Lisp”. Bignum support In Scheme and Common Lisp, by default you can&#8217;t overflow an integer&#8230; Prelude&#62; fac n = product &#91;2..n&#93; Prelude&#62; fac 100 933262154439441526816992388562667004907159682643816214685929638952175999932299156089414639761565182862536979208272237582 51185210916864000000000000000000000000 In Common Lisp, you can force your code to use fixed-size numbers (fixnums) for efficiency&#8230; Prelude&#62; [...]]]></description>
			<content:encoded><![CDATA[<p>Our target for this exercise is “<a href="http://repinvariant.blogspot.com/2008/01/thoughts-on-lisp-things-that-other.html">Things that other languages should take from Lisp</a>”.</p>
<h3>Bignum support</h3>
<blockquote><p>In Scheme and Common Lisp, by default you can&#8217;t overflow an integer&#8230;</p>
</blockquote>

<div class="wp_syntax"><div class="code"><pre class="haskell" style="font-family:monospace;">Prelude<span style="color: #339933; font-weight: bold;">&gt;</span> <span style="color: #06c; font-weight: bold;">let</span> fac n <span style="color: #339933; font-weight: bold;">=</span> <span style="font-weight: bold;">product</span> <span style="color: green;">&#91;</span><span style="color: red;">2</span><span style="color: #339933; font-weight: bold;">..</span>n<span style="color: green;">&#93;</span>
Prelude<span style="color: #339933; font-weight: bold;">&gt;</span> fac <span style="color: red;">100</span>
<span style="color: red;">933262154439441526816992388562667004907159682643816214685929638952175999932299156089414639761565182862536979208272237582</span>
<span style="color: red;">51185210916864000000000000000000000000</span></pre></div></div>

<blockquote><p>In Common Lisp, you can force your code to use fixed-size numbers (fixnums) for efficiency&#8230;</p>
</blockquote>

<div class="wp_syntax"><div class="code"><pre class="haskell" style="font-family:monospace;">Prelude<span style="color: #339933; font-weight: bold;">&gt;</span> <span style="color: #06c; font-weight: bold;">let</span> fac n <span style="color: #339933; font-weight: bold;">=</span> <span style="font-weight: bold;">product</span> <span style="color: green;">&#91;</span><span style="color: red;">2</span><span style="color: #339933; font-weight: bold;">..</span>n<span style="color: green;">&#93;</span> <span style="color: #339933; font-weight: bold;">::</span> <span style="color: #cccc00; font-weight: bold;">Int</span>
Prelude<span style="color: #339933; font-weight: bold;">&gt;</span> fac <span style="color: red;">17</span>
<span style="color: #339933; font-weight: bold;">-</span><span style="color: red;">288522240</span></pre></div></div>

<blockquote><p>Ruby and Python, by default, treat language-integers as logical-integers. Java, C, C++, and Perl don&#8217;t.</p>
</blockquote>

<div class="wp_syntax"><div class="code"><pre class="haskell" style="font-family:monospace;">Prelude<span style="color: #339933; font-weight: bold;">&gt;</span> :t <span style="color: red;">15</span>
<span style="color: red;">15</span> <span style="color: #339933; font-weight: bold;">::</span> <span style="color: green;">&#40;</span><span style="color: #cccc00; font-weight: bold;">Num</span> t<span style="color: green;">&#41;</span> <span style="color: #339933; font-weight: bold;">=&gt;</span> t</pre></div></div>
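<p>In other words, the literal takes whatever numeric type the context asks for, so the choice between bignum and fixnum is made by the type and not by the literal. A small sketch (the names <code>big</code> and <code>small</code> are invented here, and the wrap-around value assumes GHC, where <code>Int</code> is a fixed-width machine word):</p>

<div class="wp_syntax"><div class="code"><pre class="haskell" style="font-family:monospace;">-- The same expression at two different types.
big :: Integer
big = 2 ^ 64      -- arbitrary precision: 18446744073709551616
&nbsp;
small :: Int
small = 2 ^ 64    -- fixed width: wraps around to 0 under GHC</pre></div></div>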

<h3>Optional type declarations</h3>
<blockquote><p>[Common Lisp] allows, but does not require, type declarations.</p>
</blockquote>

<div class="wp_syntax"><div class="code"><pre class="haskell" style="font-family:monospace;">Prelude<span style="color: #339933; font-weight: bold;">&gt;</span> <span style="color: #06c; font-weight: bold;">let</span> doubleApply f x <span style="color: #339933; font-weight: bold;">=</span> f <span style="color: green;">&#40;</span>f x<span style="color: green;">&#41;</span>
Prelude<span style="color: #339933; font-weight: bold;">&gt;</span> :t doubleApply
doubleApply <span style="color: #339933; font-weight: bold;">::</span> <span style="color: green;">&#40;</span>t <span style="color: #339933; font-weight: bold;">-&gt;</span> t<span style="color: green;">&#41;</span> <span style="color: #339933; font-weight: bold;">-&gt;</span> t <span style="color: #339933; font-weight: bold;">-&gt;</span> t</pre></div></div>

<blockquote><p>The compiler can also use type declarations to perform compile-time typechecking. (Sadly, it isn&#8217;t <em>required</em> to do this.)</p>
</blockquote>

<div class="wp_syntax"><div class="code"><pre class="haskell" style="font-family:monospace;">Prelude<span style="color: #339933; font-weight: bold;">&gt;</span> doubleApply <span style="color: red;">3</span> doubleApply
&nbsp;
<span style="color: #339933; font-weight: bold;">&lt;</span>interactive<span style="color: #339933; font-weight: bold;">&gt;</span>:<span style="color: red;">1</span>:<span style="color: red;">12</span>:
    No <span style="color: #06c; font-weight: bold;">instance</span> for <span style="color: green;">&#40;</span><span style="color: #cccc00; font-weight: bold;">Num</span> <span style="color: green;">&#40;</span><span style="color: green;">&#40;</span><span style="color: green;">&#40;</span>t <span style="color: #339933; font-weight: bold;">-&gt;</span> t<span style="color: green;">&#41;</span> <span style="color: #339933; font-weight: bold;">-&gt;</span> t <span style="color: #339933; font-weight: bold;">-&gt;</span> t<span style="color: green;">&#41;</span> <span style="color: #339933; font-weight: bold;">-&gt;</span> <span style="color: green;">&#40;</span>t <span style="color: #339933; font-weight: bold;">-&gt;</span> t<span style="color: green;">&#41;</span> <span style="color: #339933; font-weight: bold;">-&gt;</span> t <span style="color: #339933; font-weight: bold;">-&gt;</span> t<span style="color: green;">&#41;</span><span style="color: green;">&#41;</span>
      arising from the literal `<span style="color: red;">3</span>' at <span style="color: #339933; font-weight: bold;">&lt;</span>interactive<span style="color: #339933; font-weight: bold;">&gt;</span>:<span style="color: red;">1</span>:<span style="color: red;">12</span></pre></div></div>

<h3>Tail recursion</h3>
<blockquote><p>Proper (unbounded) tail-recursion (Scheme.)</p>
</blockquote>

<div class="wp_syntax"><div class="code"><pre class="haskell" style="font-family:monospace;">Prelude<span style="color: #339933; font-weight: bold;">&gt;</span> <span style="color: #06c; font-weight: bold;">let</span> loop <span style="color: #339933; font-weight: bold;">=</span> <span style="color: #06c; font-weight: bold;">do</span> <span style="color: green;">&#123;</span> <span style="font-weight: bold;">putStr</span> <span style="background-color: #3cb371;">&quot;.&quot;</span>; loop <span style="color: green;">&#125;</span>
Prelude<span style="color: #339933; font-weight: bold;">&gt;</span> loop
<span style="color: #339933; font-weight: bold;">......................................</span>
<span style="color: #339933; font-weight: bold;">......................................</span>
<span style="color: green;">&#91;</span>etc<span style="color: green;">&#93;</span>
<span style="color: #339933; font-weight: bold;">......................................</span>
<span style="color: #339933; font-weight: bold;">..........................</span>Interrupted<span style="color: #339933; font-weight: bold;">.</span></pre></div></div>

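<p>(But see <a href="http://www.haskell.org/haskellwiki/Stack_overflow">the Haskell wiki</a>. Laziness introduces a small catch here: a tail call on its own does not guarantee constant space, because a lazy accumulator can still pile up unevaluated thunks.)</p>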
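<p>A sketch of the sort of thing that page covers, assuming unoptimised code (say, in GHCi); <code>sumLazy</code> and <code>sumStrict</code> are names made up for the example:</p>

<div class="wp_syntax"><div class="code"><pre class="haskell" style="font-family:monospace;">import Data.List (foldl')
&nbsp;
-- Tail-recursive, but the accumulator is never forced along the way,
-- so it may grow into a long chain of suspended additions.
sumLazy :: [Integer] -&gt; Integer
sumLazy = go 0
  where
    go acc []     = acc
    go acc (x:xs) = go (acc + x) xs
&nbsp;
-- foldl' forces the accumulator at every step, so it stays a plain number.
sumStrict :: [Integer] -&gt; Integer
sumStrict = foldl' (+) 0</pre></div></div>

<p>Both give the same answer; the difference only shows up in memory use on long lists, and GHC&#8217;s optimiser will often rescue the lazy version anyway.</p>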
<h3>Functions which depend upon the type of multiple arguments</h3>
<blockquote><p>Methods with multiple-dispatch (Common Lisp.)</p>
</blockquote>
<p>(This doesn’t <em>exactly</em> match conceptually because Haskell doesn’t have nominal overloading.)</p>

<div class="wp_syntax"><div class="code"><pre class="haskell" style="font-family:monospace;">multi Nothing Nothing <span style="color: #339933; font-weight: bold;">=</span> <span style="color: red;">0</span>
multi <span style="color: green;">&#40;</span>Just x<span style="color: green;">&#41;</span> <span style="color: green;">&#40;</span>Just y<span style="color: green;">&#41;</span> <span style="color: #339933; font-weight: bold;">=</span> x <span style="color: #339933; font-weight: bold;">+</span> y
multi Nothing <span style="color: green;">&#40;</span>Just y<span style="color: green;">&#41;</span> <span style="color: #339933; font-weight: bold;">=</span> y
multi <span style="color: green;">&#40;</span>Just x<span style="color: green;">&#41;</span> Nothing <span style="color: #339933; font-weight: bold;">=</span> x</pre></div></div>
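<p>For dispatch on the <em>types</em> of both arguments, rather than on their constructors, a multi-parameter type class gets a bit closer. This is only a sketch: it needs a GHC extension, it is resolved at compile time rather than at run time as in CLOS, and <code>Asteroid</code>, <code>Ship</code> and <code>collide</code> are invented for the example.</p>

<div class="wp_syntax"><div class="code"><pre class="haskell" style="font-family:monospace;">{-# LANGUAGE MultiParamTypeClasses #-}
&nbsp;
data Asteroid = Asteroid
data Ship = Ship
&nbsp;
-- One class, indexed by the types of both arguments.
class Collide a b where
  collide :: a -&gt; b -&gt; String
&nbsp;
-- One instance per pair of argument types.
instance Collide Asteroid Asteroid where
  collide _ _ = &quot;asteroids bounce&quot;
instance Collide Asteroid Ship where
  collide _ _ = &quot;ship destroyed&quot;
instance Collide Ship Asteroid where
  collide _ _ = &quot;ship destroyed&quot;
instance Collide Ship Ship where
  collide _ _ = &quot;ships dock&quot;</pre></div></div>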

<h3>Fast implementations</h3>
<blockquote><p>Some Lisp implementations are fast.</p>
</blockquote>
<p><a href="http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&amp;lang=all&amp;xfullcpu=1&amp;xmem=0&amp;xloc=0&amp;binarytrees=1&amp;chameneosredux=1&amp;fannkuch=1&amp;fasta=1&amp;knucleotide=1&amp;mandelbrot=1&amp;meteor=1&amp;nbody=1&amp;nsieve=1&amp;nsievebits=1&amp;partialsums=1&amp;pidigits=1&amp;recursive=1&amp;regexdna=1&amp;revcomp=1&amp;spectralnorm=1&amp;hello=1&amp;sumcol=1&amp;threadring=1&amp;calc=Calculate">Haskell is #7 on the Great Language Shootout</a> (in terms of speed), and getting faster by the day as the backend of GHC is rewritten.</p>
<h3>Syntactic simplicity</h3>
<blockquote><p>Syntactic simplicity (Scheme.)</p>
</blockquote>
<p><a href="http://haskell.org/onlinereport/syntax-iso.html#sect9.5">The Haskell <abbr title="Context-Free Grammar">CFG</abbr>.</a></p>
<blockquote><p>Or perhaps Python is another incarnation of syntactic simplicity.</p>
</blockquote>
<p><a href="http://haskell.org/onlinereport/lexemes.html#lexemes-layout">The Layout Rule</a></p>
]]></content:encoded>
			<wfw:commentRss>http://porg.es/blog/matching-checklists-using-haskell/feed</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
	</channel>
</rss>

