<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>porges &#187; address</title>
	<atom:link href="http://porg.es/blog/tag/address/feed" rel="self" type="application/rss+xml" />
	<link>http://porg.es/blog</link>
	<description></description>
	<lastBuildDate>Sun, 06 May 2012 22:13:57 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Email address validation: Simpler, Faster, More Correct</title>
		<link>http://porg.es/blog/email-address-validation-simpler-faster-more-correct</link>
		<comments>http://porg.es/blog/email-address-validation-simpler-faster-more-correct#comments</comments>
		<pubDate>Wed, 11 Mar 2009 11:10:31 +0000</pubDate>
		<dc:creator>Porges</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[address]]></category>
		<category><![CDATA[email]]></category>
		<category><![CDATA[Haskell]]></category>
		<category><![CDATA[rfc]]></category>
		<category><![CDATA[rfc5322]]></category>
		<category><![CDATA[validation]]></category>

		<guid isPermaLink="false">http://porg.es/blog/?p=317</guid>
		<description><![CDATA[So, I have merged the obsolete syntax into the code from the last post. The result is shorter, cleaner, faster validation that is also more correct. I didn’t like the fact that in the old code there were places where explicit try points needed to be included. It seems that these arose because the ‘obsolete’ [...]]]></description>
			<content:encoded><![CDATA[<p>So, I have merged the obsolete syntax into the code from the last post. The result is shorter, cleaner, faster validation that is <em>also</em> more correct.</p>
<p>I didn’t like the fact that the old code needed explicit <code>try</code> points in places. It seems that these arose because the ‘obsolete’ syntax was tacked on to the ABNF for the normal syntax, creating a lot of overlap between alternatives. Since I merged the two syntaxes, <em>no</em> explicit <code>try</code> points are needed (combinators such as <code>optional</code> still choose between alternatives, but only when the failing branch has consumed no input). This makes the code both faster and easier to understand.</p>
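<p>To see in isolation why overlapping alternatives call for <code>try</code>, and why merging them removes the need, here is a minimal sketch on a toy grammar. The parsers <code>overlapping</code>, <code>withTry</code> and <code>factored</code> and the helper <code>accepts</code> are made-up names for illustration only; they are not part of the validator below.</p>

<div class="wp_syntax"><div class="code"><pre class="haskell" style="font-family:monospace;">import Text.Parsec
import Text.Parsec.String (Parser)
import Data.Either (isRight)

-- Hypothetical toy grammar: two alternatives sharing the prefix "ab".
-- On "abd" the first branch consumes 'a' and 'b' before failing on 'd',
-- so &lt;|&gt; never gets to run the second branch.
overlapping :: Parser Char
overlapping = (char 'a' &gt;&gt; char 'b' &gt;&gt; char 'c')
          &lt;|&gt; (char 'a' &gt;&gt; char 'b' &gt;&gt; char 'd')

-- Same grammar with an explicit try: when the first branch fails it
-- rewinds the input, at the cost of parsing "ab" a second time.
withTry :: Parser Char
withTry = try (char 'a' &gt;&gt; char 'b' &gt;&gt; char 'c')
      &lt;|&gt; (char 'a' &gt;&gt; char 'b' &gt;&gt; char 'd')

-- Merged grammar: the shared prefix is parsed once and the choice is
-- made on a single character, so no try is needed.
factored :: Parser Char
factored = char 'a' &gt;&gt; char 'b' &gt;&gt; (char 'c' &lt;|&gt; char 'd')

-- True when the parser accepts the whole input.
accepts :: Parser a -&gt; String -&gt; Bool
accepts p = isRight . parse (p &gt;&gt; eof) ""

main :: IO ()
main = do
  print (accepts overlapping "abd")  -- False
  print (accepts withTry "abd")      -- True
  print (accepts factored "abd")     -- True</pre></div></div>

<p>The third form is the shape that merging two grammars aims for: each choice is decided by the next character, so nothing has to be parsed twice.</p>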

<div class="wp_syntax"><div class="code"><pre class="haskell" style="font-family:monospace;"><span style="color: #06c; font-weight: bold;">module</span> Text<span style="color: #339933; font-weight: bold;">.</span>Email<span style="color: #339933; font-weight: bold;">.</span>Validation <span style="color: green;">&#40;</span>isValid<span style="color: green;">&#41;</span>
<span style="color: #06c; font-weight: bold;">where</span>
&nbsp;
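<span style="color: #5d478b; font-style: italic;">-- newer parsec releases export their own crlf, which would clash with the one defined below</span>
<span style="color: #06c; font-weight: bold;">import</span> Text<span style="color: #339933; font-weight: bold;">.</span>Parsec <span style="color: #06c; font-weight: bold;">hiding</span> <span style="color: green;">&#40;</span>crlf<span style="color: green;">&#41;</span>
<span style="color: #06c; font-weight: bold;">import</span> Text<span style="color: #339933; font-weight: bold;">.</span>Parsec<span style="color: #339933; font-weight: bold;">.</span><span style="color: #cccc00; font-weight: bold;">Char</span> <span style="color: #06c; font-weight: bold;">hiding</span> <span style="color: green;">&#40;</span>crlf<span style="color: green;">&#41;</span>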
<span style="color: #06c; font-weight: bold;">import</span> Data<span style="color: #339933; font-weight: bold;">.</span><span style="color: #cccc00; font-weight: bold;">Char</span> <span style="color: green;">&#40;</span>chr<span style="color: green;">&#41;</span>
&nbsp;
isValid <span style="color: #339933; font-weight: bold;">::</span> <span style="color: #cccc00; font-weight: bold;">String</span> <span style="color: #339933; font-weight: bold;">-&gt;</span> <span style="color: #cccc00; font-weight: bold;">Bool</span>
isValid x <span style="color: #339933; font-weight: bold;">=</span> 	<span style="font-weight: bold;">either</span> <span style="color: green;">&#40;</span><span style="font-weight: bold;">const</span> False<span style="color: green;">&#41;</span> <span style="color: green;">&#40;</span><span style="font-weight: bold;">const</span> True<span style="color: green;">&#41;</span> <span style="color: green;">&#40;</span>valid x<span style="color: green;">&#41;</span>
&nbsp;
simply <span style="color: #339933; font-weight: bold;">=</span> <span style="color: green;">&#40;</span><span style="color: #339933; font-weight: bold;">&gt;&gt;</span> <span style="font-weight: bold;">return</span> <span style="color: green;">&#40;</span><span style="color: green;">&#41;</span><span style="color: green;">&#41;</span>
<span style="color: #5d478b; font-style: italic;">-- simply converts a parser returning something to a parser returning nothing</span>
&nbsp;
valid <span style="color: #339933; font-weight: bold;">::</span> <span style="color: #cccc00; font-weight: bold;">String</span> <span style="color: #339933; font-weight: bold;">-&gt;</span> <span style="color: #cccc00; font-weight: bold;">Either</span> ParseError <span style="color: green;">&#40;</span><span style="color: green;">&#41;</span>
valid <span style="color: #339933; font-weight: bold;">=</span> parse addrSpec <span style="background-color: #3cb371;">&quot;&quot;</span>
&nbsp;
addrSpec <span style="color: #339933; font-weight: bold;">=</span> localPart <span style="color: #339933; font-weight: bold;">&gt;&gt;</span> char '<span style="color: #339933; font-weight: bold;">@</span>' <span style="color: #339933; font-weight: bold;">&gt;&gt;</span> domain <span style="color: #339933; font-weight: bold;">&gt;&gt;</span> eof
&nbsp;
localPart <span style="color: #339933; font-weight: bold;">=</span> dottedAtoms
domain <span style="color: #339933; font-weight: bold;">=</span> dottedAtoms <span style="color: #339933; font-weight: bold;">&lt;|&gt;</span> domainLiteral 
&nbsp;
dottedAtoms <span style="color: #339933; font-weight: bold;">=</span> simply <span style="color: #339933; font-weight: bold;">$</span> <span style="color: green;">&#40;</span>optional cfws <span style="color: #339933; font-weight: bold;">&gt;&gt;</span> <span style="color: green;">&#40;</span>atom <span style="color: #339933; font-weight: bold;">&lt;|&gt;</span> quotedString<span style="color: green;">&#41;</span> <span style="color: #339933; font-weight: bold;">&gt;&gt;</span> optional cfws<span style="color: green;">&#41;</span>
	`sepBy1` <span style="color: green;">&#40;</span>char '<span style="color: #339933; font-weight: bold;">.</span>'<span style="color: green;">&#41;</span>
atom <span style="color: #339933; font-weight: bold;">=</span> simply <span style="color: #339933; font-weight: bold;">$</span> many1 atomText
atomText <span style="color: #339933; font-weight: bold;">=</span> simply <span style="color: #339933; font-weight: bold;">$</span> alphaNum <span style="color: #339933; font-weight: bold;">&lt;|&gt;</span> oneOf <span style="background-color: #3cb371;">&quot;!#$%&amp;'*+-/=?^_`{|}~&quot;</span>
&nbsp;
domainLiteral <span style="color: #339933; font-weight: bold;">=</span>  between <span style="color: green;">&#40;</span>optional cfws <span style="color: #339933; font-weight: bold;">&gt;&gt;</span> char '<span style="color: green;">&#91;</span>'<span style="color: green;">&#41;</span> <span style="color: green;">&#40;</span>char '<span style="color: green;">&#93;</span>' <span style="color: #339933; font-weight: bold;">&gt;&gt;</span> optional cfws<span style="color: green;">&#41;</span> <span style="color: #339933; font-weight: bold;">$</span>
	many <span style="color: green;">&#40;</span>optional fws <span style="color: #339933; font-weight: bold;">&gt;&gt;</span> domainText<span style="color: green;">&#41;</span> <span style="color: #339933; font-weight: bold;">&gt;&gt;</span> optional fws
domainText <span style="color: #339933; font-weight: bold;">=</span> ranges <span style="color: green;">&#91;</span><span style="color: green;">&#91;</span><span style="color: red;">33</span><span style="color: #339933; font-weight: bold;">..</span><span style="color: red;">90</span><span style="color: green;">&#93;</span><span style="color: #339933; font-weight: bold;">,</span><span style="color: green;">&#91;</span><span style="color: red;">94</span><span style="color: #339933; font-weight: bold;">..</span><span style="color: red;">126</span><span style="color: green;">&#93;</span><span style="color: green;">&#93;</span> <span style="color: #339933; font-weight: bold;">&lt;|&gt;</span> obsNoWsCtl
&nbsp;
quotedString <span style="color: #339933; font-weight: bold;">=</span> between <span style="color: green;">&#40;</span>char '<span style="background-color: #3cb371;">&quot;') (char '&quot;</span>'<span style="color: green;">&#41;</span> <span style="color: #339933; font-weight: bold;">$</span>
	many <span style="color: green;">&#40;</span>optional fws <span style="color: #339933; font-weight: bold;">&gt;&gt;</span> quotedContent<span style="color: green;">&#41;</span> <span style="color: #339933; font-weight: bold;">&gt;&gt;</span> optional fws
quotedContent <span style="color: #339933; font-weight: bold;">=</span> quotedText <span style="color: #339933; font-weight: bold;">&lt;|&gt;</span> quotedPair
quotedText <span style="color: #339933; font-weight: bold;">=</span> ranges <span style="color: green;">&#91;</span><span style="color: green;">&#91;</span><span style="color: red;">33</span><span style="color: green;">&#93;</span><span style="color: #339933; font-weight: bold;">,</span><span style="color: green;">&#91;</span><span style="color: red;">35</span><span style="color: #339933; font-weight: bold;">..</span><span style="color: red;">91</span><span style="color: green;">&#93;</span><span style="color: #339933; font-weight: bold;">,</span><span style="color: green;">&#91;</span><span style="color: red;">93</span><span style="color: #339933; font-weight: bold;">..</span><span style="color: red;">126</span><span style="color: green;">&#93;</span><span style="color: green;">&#93;</span> <span style="color: #339933; font-weight: bold;">&lt;|&gt;</span> obsNoWsCtl
quotedPair <span style="color: #339933; font-weight: bold;">=</span> char '\\' <span style="color: #339933; font-weight: bold;">&gt;&gt;</span> <span style="color: green;">&#40;</span>vchar <span style="color: #339933; font-weight: bold;">&lt;|&gt;</span> wsp <span style="color: #339933; font-weight: bold;">&lt;|&gt;</span> lf <span style="color: #339933; font-weight: bold;">&lt;|&gt;</span> cr <span style="color: #339933; font-weight: bold;">&lt;|&gt;</span> obsNoWsCtl <span style="color: #339933; font-weight: bold;">&lt;|&gt;</span> nullChar<span style="color: green;">&#41;</span>
&nbsp;
cfws <span style="color: #339933; font-weight: bold;">=</span> simply <span style="color: #339933; font-weight: bold;">$</span> many <span style="color: green;">&#40;</span>comment <span style="color: #339933; font-weight: bold;">&lt;|&gt;</span> fws<span style="color: green;">&#41;</span>
fws <span style="color: #339933; font-weight: bold;">=</span> <span style="color: green;">&#40;</span>many1 wsp <span style="color: #339933; font-weight: bold;">&gt;&gt;</span> optional <span style="color: green;">&#40;</span>crlf <span style="color: #339933; font-weight: bold;">&gt;&gt;</span> many1 wsp<span style="color: green;">&#41;</span><span style="color: green;">&#41;</span>
	<span style="color: #339933; font-weight: bold;">&lt;|&gt;</span> <span style="color: green;">&#40;</span>many1 <span style="color: green;">&#40;</span>crlf <span style="color: #339933; font-weight: bold;">&gt;&gt;</span> many1 wsp<span style="color: green;">&#41;</span> <span style="color: #339933; font-weight: bold;">&gt;&gt;</span> <span style="font-weight: bold;">return</span> <span style="color: green;">&#40;</span><span style="color: green;">&#41;</span><span style="color: green;">&#41;</span>
&nbsp;
comment <span style="color: #339933; font-weight: bold;">=</span> simply <span style="color: #339933; font-weight: bold;">$</span> between <span style="color: green;">&#40;</span>char '<span style="color: green;">&#40;</span>'<span style="color: green;">&#41;</span> <span style="color: green;">&#40;</span>char '<span style="color: green;">&#41;</span>'<span style="color: green;">&#41;</span> <span style="color: #339933; font-weight: bold;">$</span>
	many <span style="color: green;">&#40;</span>commentContent <span style="color: #339933; font-weight: bold;">&lt;|&gt;</span> fws<span style="color: green;">&#41;</span>
commentContent <span style="color: #339933; font-weight: bold;">=</span> commentText <span style="color: #339933; font-weight: bold;">&lt;|&gt;</span> quotedPair <span style="color: #339933; font-weight: bold;">&lt;|&gt;</span> comment
commentText <span style="color: #339933; font-weight: bold;">=</span> ranges <span style="color: green;">&#91;</span><span style="color: green;">&#91;</span><span style="color: red;">33</span><span style="color: #339933; font-weight: bold;">..</span><span style="color: red;">39</span><span style="color: green;">&#93;</span><span style="color: #339933; font-weight: bold;">,</span><span style="color: green;">&#91;</span><span style="color: red;">42</span><span style="color: #339933; font-weight: bold;">..</span><span style="color: red;">91</span><span style="color: green;">&#93;</span><span style="color: #339933; font-weight: bold;">,</span><span style="color: green;">&#91;</span><span style="color: red;">93</span><span style="color: #339933; font-weight: bold;">..</span><span style="color: red;">126</span><span style="color: green;">&#93;</span><span style="color: green;">&#93;</span> <span style="color: #339933; font-weight: bold;">&lt;|&gt;</span> obsNoWsCtl
&nbsp;
nullChar <span style="color: #339933; font-weight: bold;">=</span> simply <span style="color: #339933; font-weight: bold;">$</span> char '\<span style="color: red;">0</span>'
wsp <span style="color: #339933; font-weight: bold;">=</span> simply <span style="color: #339933; font-weight: bold;">$</span> oneOf <span style="background-color: #3cb371;">&quot; <span style="background-color: #3cb371; font-weight: bold;">\t</span>&quot;</span>
cr <span style="color: #339933; font-weight: bold;">=</span> simply <span style="color: #339933; font-weight: bold;">$</span> char '\r'
lf <span style="color: #339933; font-weight: bold;">=</span> simply <span style="color: #339933; font-weight: bold;">$</span> char '\n'
crlf <span style="color: #339933; font-weight: bold;">=</span> simply <span style="color: #339933; font-weight: bold;">$</span> cr <span style="color: #339933; font-weight: bold;">&gt;&gt;</span> lf
vchar <span style="color: #339933; font-weight: bold;">=</span> ranges <span style="color: green;">&#91;</span><span style="color: green;">&#91;</span>0x21<span style="color: #339933; font-weight: bold;">..</span>0x7e<span style="color: green;">&#93;</span><span style="color: green;">&#93;</span>
obsNoWsCtl <span style="color: #339933; font-weight: bold;">=</span> ranges <span style="color: green;">&#91;</span><span style="color: green;">&#91;</span><span style="color: red;">1</span><span style="color: #339933; font-weight: bold;">..</span><span style="color: red;">8</span><span style="color: green;">&#93;</span><span style="color: #339933; font-weight: bold;">,</span><span style="color: green;">&#91;</span><span style="color: red;">11</span><span style="color: #339933; font-weight: bold;">,</span><span style="color: red;">12</span><span style="color: green;">&#93;</span><span style="color: #339933; font-weight: bold;">,</span><span style="color: green;">&#91;</span><span style="color: red;">14</span><span style="color: #339933; font-weight: bold;">..</span><span style="color: red;">31</span><span style="color: green;">&#93;</span><span style="color: #339933; font-weight: bold;">,</span><span style="color: green;">&#91;</span><span style="color: red;">127</span><span style="color: green;">&#93;</span><span style="color: green;">&#93;</span>
ranges <span style="color: #339933; font-weight: bold;">=</span> simply <span style="color: #339933; font-weight: bold;">.</span> oneOf <span style="color: #339933; font-weight: bold;">.</span> <span style="font-weight: bold;">map</span> chr <span style="color: #339933; font-weight: bold;">.</span> <span style="font-weight: bold;">concat</span></pre></div></div>
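<p>A quick way to try the module is a small driver like the one below. It is only a sketch: it assumes the code above is saved as <code>Text/Email/Validation.hs</code> and compiles in your setup, and the sample addresses and the helper <code>check</code> are my own picks for illustration. Reading the grammar, I would expect the first three to be accepted and the last two rejected.</p>

<div class="wp_syntax"><div class="code"><pre class="haskell" style="font-family:monospace;">import Text.Email.Validation (isValid)

-- Print each sample address next to the verdict from isValid.
main :: IO ()
main = mapM_ check
  [ "user@example.com"               -- plain dotted atoms
  , "\"quoted string\"@example.com"  -- quoted local part
  , "user@[127.0.0.1]"               -- domain literal
  , "missing-at.example.com"         -- no '@' at all
  , "user@"                          -- empty domain
  ]
  where
    check addr = putStrLn (show addr ++ " -&gt; " ++ show (isValid addr))</pre></div></div>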

<p>This now passes all of Dominic Sayers&#8217; tests that it is meant to pass. The domain validation used in those tests is stricter than RFC 5322 specifies, so the tests that depend on it still fail. Expect this to change!</p>
<p>For those who’d like to know, addresses that now parse but didn’t before include this often-used one (each ‘|’ merely marks the end of the trailing whitespace on its line and is not part of the address):</p>
<pre>I.                        |
 am.                  |
 a.      |
 nice.|
 guy@(yeah)you.com</pre>
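<p>Spelled out as a Haskell string, that address might look like the sketch below. Two assumptions of mine: every line break is a CRLF followed by a single space (which is what folding whitespace in RFC 5322 requires), and the exact number of trailing spaces does not matter. The name <code>folded</code> is made up, and the import assumes the module above is available as <code>Text.Email.Validation</code>. If the claim above holds, this prints <code>True</code>.</p>

<div class="wp_syntax"><div class="code"><pre class="haskell" style="font-family:monospace;">import Text.Email.Validation (isValid)

-- The folded address with the '|' markers removed.
-- Assumption: each line ends in CRLF and the next one starts with a space.
folded :: String
folded = concat
  [ "I.        \r\n"
  , " am.    \r\n"
  , " a.  \r\n"
  , " nice.\r\n"
  , " guy@(yeah)you.com"
  ]

main :: IO ()
main = print (isValid folded)</pre></div></div>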
]]></content:encoded>
			<wfw:commentRss>http://porg.es/blog/email-address-validation-simpler-faster-more-correct/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>

