<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>porges &#187; validation</title>
	<atom:link href="http://porg.es/blog/tag/validation/feed" rel="self" type="application/rss+xml" />
	<link>http://porg.es/blog</link>
	<description>... master of none</description>
	<lastBuildDate>Sat, 12 Sep 2009 07:57:51 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Email address validation: Simpler, Faster, More Correct</title>
		<link>http://porg.es/blog/email-address-validation-simpler-faster-more-correct</link>
		<comments>http://porg.es/blog/email-address-validation-simpler-faster-more-correct#comments</comments>
		<pubDate>Wed, 11 Mar 2009 11:10:31 +0000</pubDate>
		<dc:creator>Porges</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[address]]></category>
		<category><![CDATA[email]]></category>
		<category><![CDATA[Haskell]]></category>
		<category><![CDATA[rfc]]></category>
		<category><![CDATA[rfc5322]]></category>
		<category><![CDATA[validation]]></category>

		<guid isPermaLink="false">http://porg.es/blog/?p=317</guid>
		<description><![CDATA[So, I have merged the obsolete syntax into the code from the last post. This has resulted in shorter, cleaner, faster validation which is also more correct. I didn’t like the fact that in the old code there were places where explicit try points needed to be included. It seems that these arose because the ‘obsolete’ [...]]]></description>
			<content:encoded><![CDATA[<p>So, I have merged the obsolete syntax into the code from the last post. This has resulted in shorter, cleaner, faster validation which is <em>also</em> more correct.</p>
<p>I didn’t like the fact that the old code needed explicit <code>try</code> points in several places. These seem to have arisen because the ‘obsolete’ syntax was tacked on to the EBNF for the normal syntax, creating a lot of overlap between alternatives. Now that the two syntaxes are merged, <em>no</em> explicit <code>try</code> points are needed. (Note that <code>optional</code> does not add an implicit one: in Parsec it fails if its argument fails after consuming input.) This makes the code both faster and easier to understand.</p>

<div class="wp_syntax"><div class="code"><pre class="haskell" style="font-family:monospace;"><span style="color: #06c; font-weight: bold;">module</span> Text<span style="color: #339933; font-weight: bold;">.</span>Email<span style="color: #339933; font-weight: bold;">.</span>Validation <span style="color: green;">&#40;</span>isValid<span style="color: green;">&#41;</span>
<span style="color: #06c; font-weight: bold;">where</span>
&nbsp;
<span style="color: #06c; font-weight: bold;">import</span> Text<span style="color: #339933; font-weight: bold;">.</span>Parsec
<span style="color: #06c; font-weight: bold;">import</span> Text<span style="color: #339933; font-weight: bold;">.</span>Parsec<span style="color: #339933; font-weight: bold;">.</span><span style="color: #cccc00; font-weight: bold;">Char</span>
<span style="color: #06c; font-weight: bold;">import</span> Data<span style="color: #339933; font-weight: bold;">.</span><span style="color: #cccc00; font-weight: bold;">Char</span> <span style="color: green;">&#40;</span>chr<span style="color: green;">&#41;</span>
&nbsp;
isValid <span style="color: #339933; font-weight: bold;">::</span> <span style="color: #cccc00; font-weight: bold;">String</span> <span style="color: #339933; font-weight: bold;">-&gt;</span> <span style="color: #cccc00; font-weight: bold;">Bool</span>
isValid x <span style="color: #339933; font-weight: bold;">=</span> 	<span style="font-weight: bold;">either</span> <span style="color: green;">&#40;</span><span style="font-weight: bold;">const</span> False<span style="color: green;">&#41;</span> <span style="color: green;">&#40;</span><span style="font-weight: bold;">const</span> True<span style="color: green;">&#41;</span> <span style="color: green;">&#40;</span>valid x<span style="color: green;">&#41;</span>
&nbsp;
simply <span style="color: #339933; font-weight: bold;">=</span> <span style="color: green;">&#40;</span><span style="color: #339933; font-weight: bold;">&gt;&gt;</span> <span style="font-weight: bold;">return</span> <span style="color: green;">&#40;</span><span style="color: green;">&#41;</span><span style="color: green;">&#41;</span>
<span style="color: #5d478b; font-style: italic;">-- simply converts a parser returning something to a parser returning nothing</span>
&nbsp;
valid <span style="color: #339933; font-weight: bold;">::</span> <span style="color: #cccc00; font-weight: bold;">String</span> <span style="color: #339933; font-weight: bold;">-&gt;</span> <span style="color: #cccc00; font-weight: bold;">Either</span> ParseError <span style="color: green;">&#40;</span><span style="color: green;">&#41;</span>
valid <span style="color: #339933; font-weight: bold;">=</span> parse addrSpec <span style="background-color: #3cb371;">&quot;&quot;</span>
&nbsp;
addrSpec <span style="color: #339933; font-weight: bold;">=</span> localPart <span style="color: #339933; font-weight: bold;">&gt;&gt;</span> char '<span style="color: #339933; font-weight: bold;">@</span>' <span style="color: #339933; font-weight: bold;">&gt;&gt;</span> domain <span style="color: #339933; font-weight: bold;">&gt;&gt;</span> eof
&nbsp;
localPart <span style="color: #339933; font-weight: bold;">=</span> dottedAtoms
domain <span style="color: #339933; font-weight: bold;">=</span> dottedAtoms <span style="color: #339933; font-weight: bold;">&lt;|&gt;</span> domainLiteral 
&nbsp;
dottedAtoms <span style="color: #339933; font-weight: bold;">=</span> simply <span style="color: #339933; font-weight: bold;">$</span> <span style="color: green;">&#40;</span>optional cfws <span style="color: #339933; font-weight: bold;">&gt;&gt;</span> <span style="color: green;">&#40;</span>atom <span style="color: #339933; font-weight: bold;">&lt;|&gt;</span> quotedString<span style="color: green;">&#41;</span> <span style="color: #339933; font-weight: bold;">&gt;&gt;</span> optional cfws<span style="color: green;">&#41;</span>
	`sepBy1` <span style="color: green;">&#40;</span>char '<span style="color: #339933; font-weight: bold;">.</span>'<span style="color: green;">&#41;</span>
atom <span style="color: #339933; font-weight: bold;">=</span> simply <span style="color: #339933; font-weight: bold;">$</span> many1 atomText
atomText <span style="color: #339933; font-weight: bold;">=</span> simply <span style="color: #339933; font-weight: bold;">$</span> alphaNum <span style="color: #339933; font-weight: bold;">&lt;|&gt;</span> oneOf <span style="background-color: #3cb371;">&quot;!#$%&amp;'*+-/=?^_`{|}~&quot;</span>
&nbsp;
domainLiteral <span style="color: #339933; font-weight: bold;">=</span>  between <span style="color: green;">&#40;</span>optional cfws <span style="color: #339933; font-weight: bold;">&gt;&gt;</span> char '<span style="color: green;">&#91;</span>'<span style="color: green;">&#41;</span> <span style="color: green;">&#40;</span>char '<span style="color: green;">&#93;</span>' <span style="color: #339933; font-weight: bold;">&gt;&gt;</span> optional cfws<span style="color: green;">&#41;</span> <span style="color: #339933; font-weight: bold;">$</span>
	many <span style="color: green;">&#40;</span>optional fws <span style="color: #339933; font-weight: bold;">&gt;&gt;</span> domainText<span style="color: green;">&#41;</span> <span style="color: #339933; font-weight: bold;">&gt;&gt;</span> optional fws
domainText <span style="color: #339933; font-weight: bold;">=</span> ranges <span style="color: green;">&#91;</span><span style="color: green;">&#91;</span>33<span style="color: #339933; font-weight: bold;">..</span>90<span style="color: green;">&#93;</span><span style="color: #339933; font-weight: bold;">,</span><span style="color: green;">&#91;</span>94<span style="color: #339933; font-weight: bold;">..</span>126<span style="color: green;">&#93;</span><span style="color: green;">&#93;</span> <span style="color: #339933; font-weight: bold;">&lt;|&gt;</span> obsNoWsCtl
&nbsp;
quotedString <span style="color: #339933; font-weight: bold;">=</span> between <span style="color: green;">&#40;</span>char '<span style="background-color: #3cb371;">&quot;') (char '&quot;</span>'<span style="color: green;">&#41;</span> <span style="color: #339933; font-weight: bold;">$</span>
	many <span style="color: green;">&#40;</span>optional fws <span style="color: #339933; font-weight: bold;">&gt;&gt;</span> quotedContent<span style="color: green;">&#41;</span> <span style="color: #339933; font-weight: bold;">&gt;&gt;</span> optional fws
quotedContent <span style="color: #339933; font-weight: bold;">=</span> quotedText <span style="color: #339933; font-weight: bold;">&lt;|&gt;</span> quotedPair
quotedText <span style="color: #339933; font-weight: bold;">=</span> ranges <span style="color: green;">&#91;</span><span style="color: green;">&#91;</span><span style="color: red;">33</span><span style="color: green;">&#93;</span><span style="color: #339933; font-weight: bold;">,</span><span style="color: green;">&#91;</span>35<span style="color: #339933; font-weight: bold;">..</span>91<span style="color: green;">&#93;</span><span style="color: #339933; font-weight: bold;">,</span><span style="color: green;">&#91;</span>93<span style="color: #339933; font-weight: bold;">..</span>126<span style="color: green;">&#93;</span><span style="color: green;">&#93;</span> <span style="color: #339933; font-weight: bold;">&lt;|&gt;</span> obsNoWsCtl
quotedPair <span style="color: #339933; font-weight: bold;">=</span> char '\\' <span style="color: #339933; font-weight: bold;">&gt;&gt;</span> <span style="color: green;">&#40;</span>vchar <span style="color: #339933; font-weight: bold;">&lt;|&gt;</span> wsp <span style="color: #339933; font-weight: bold;">&lt;|&gt;</span> lf <span style="color: #339933; font-weight: bold;">&lt;|&gt;</span> cr <span style="color: #339933; font-weight: bold;">&lt;|&gt;</span> obsNoWsCtl <span style="color: #339933; font-weight: bold;">&lt;|&gt;</span> nullChar<span style="color: green;">&#41;</span>
&nbsp;
cfws <span style="color: #339933; font-weight: bold;">=</span> simply <span style="color: #339933; font-weight: bold;">$</span> many <span style="color: green;">&#40;</span>comment <span style="color: #339933; font-weight: bold;">&lt;|&gt;</span> fws<span style="color: green;">&#41;</span>
fws <span style="color: #339933; font-weight: bold;">=</span> <span style="color: green;">&#40;</span>many1 wsp <span style="color: #339933; font-weight: bold;">&gt;&gt;</span> optional <span style="color: green;">&#40;</span>crlf <span style="color: #339933; font-weight: bold;">&gt;&gt;</span> many1 wsp<span style="color: green;">&#41;</span><span style="color: green;">&#41;</span>
	<span style="color: #339933; font-weight: bold;">&lt;|&gt;</span> <span style="color: green;">&#40;</span>many1 <span style="color: green;">&#40;</span>crlf <span style="color: #339933; font-weight: bold;">&gt;&gt;</span> many1 wsp<span style="color: green;">&#41;</span> <span style="color: #339933; font-weight: bold;">&gt;&gt;</span> <span style="font-weight: bold;">return</span> <span style="color: green;">&#40;</span><span style="color: green;">&#41;</span><span style="color: green;">&#41;</span>
&nbsp;
comment <span style="color: #339933; font-weight: bold;">=</span> simply <span style="color: #339933; font-weight: bold;">$</span> between <span style="color: green;">&#40;</span>char '<span style="color: green;">&#40;</span>'<span style="color: green;">&#41;</span> <span style="color: green;">&#40;</span>char '<span style="color: green;">&#41;</span>'<span style="color: green;">&#41;</span> <span style="color: #339933; font-weight: bold;">$</span>
	many <span style="color: green;">&#40;</span>commentContent <span style="color: #339933; font-weight: bold;">&lt;|&gt;</span> fws<span style="color: green;">&#41;</span>
commentContent <span style="color: #339933; font-weight: bold;">=</span> commentText <span style="color: #339933; font-weight: bold;">&lt;|&gt;</span> quotedPair <span style="color: #339933; font-weight: bold;">&lt;|&gt;</span> comment
commentText <span style="color: #339933; font-weight: bold;">=</span> ranges <span style="color: green;">&#91;</span><span style="color: green;">&#91;</span>33<span style="color: #339933; font-weight: bold;">..</span>39<span style="color: green;">&#93;</span><span style="color: #339933; font-weight: bold;">,</span><span style="color: green;">&#91;</span>42<span style="color: #339933; font-weight: bold;">..</span>91<span style="color: green;">&#93;</span><span style="color: #339933; font-weight: bold;">,</span><span style="color: green;">&#91;</span>93<span style="color: #339933; font-weight: bold;">..</span>126<span style="color: green;">&#93;</span><span style="color: green;">&#93;</span> <span style="color: #339933; font-weight: bold;">&lt;|&gt;</span> obsNoWsCtl
&nbsp;
nullChar <span style="color: #339933; font-weight: bold;">=</span> simply <span style="color: #339933; font-weight: bold;">$</span> char '\<span style="color: red;">0</span>'
wsp <span style="color: #339933; font-weight: bold;">=</span> simply <span style="color: #339933; font-weight: bold;">$</span> oneOf <span style="background-color: #3cb371;">&quot; <span style="background-color: #3cb371; font-weight: bold;">\t</span>&quot;</span>
cr <span style="color: #339933; font-weight: bold;">=</span> simply <span style="color: #339933; font-weight: bold;">$</span> char '\r'
lf <span style="color: #339933; font-weight: bold;">=</span> simply <span style="color: #339933; font-weight: bold;">$</span> char '\n'
crlf <span style="color: #339933; font-weight: bold;">=</span> simply <span style="color: #339933; font-weight: bold;">$</span> cr <span style="color: #339933; font-weight: bold;">&gt;&gt;</span> lf
vchar <span style="color: #339933; font-weight: bold;">=</span> ranges <span style="color: green;">&#91;</span><span style="color: green;">&#91;</span>0x21<span style="color: #339933; font-weight: bold;">..</span>0x7e<span style="color: green;">&#93;</span><span style="color: green;">&#93;</span>
obsNoWsCtl <span style="color: #339933; font-weight: bold;">=</span> ranges <span style="color: green;">&#91;</span><span style="color: green;">&#91;</span>1<span style="color: #339933; font-weight: bold;">..</span>8<span style="color: green;">&#93;</span><span style="color: #339933; font-weight: bold;">,</span><span style="color: green;">&#91;</span><span style="color: red;">11</span><span style="color: #339933; font-weight: bold;">,</span><span style="color: red;">12</span><span style="color: green;">&#93;</span><span style="color: #339933; font-weight: bold;">,</span><span style="color: green;">&#91;</span>14<span style="color: #339933; font-weight: bold;">..</span>31<span style="color: green;">&#93;</span><span style="color: #339933; font-weight: bold;">,</span><span style="color: green;">&#91;</span><span style="color: red;">127</span><span style="color: green;">&#93;</span><span style="color: green;">&#93;</span>
ranges <span style="color: #339933; font-weight: bold;">=</span> simply <span style="color: #339933; font-weight: bold;">.</span> oneOf <span style="color: #339933; font-weight: bold;">.</span> <span style="font-weight: bold;">map</span> chr <span style="color: #339933; font-weight: bold;">.</span> <span style="font-weight: bold;">concat</span></pre></div></div>

<p>This now passes all of Dominic Sayers&#8217;s tests that it is meant to pass; the domain validation used in those tests is stricter than RFC 5322 specifies. Expect this to change!</p>
<p>For those who’d like to know, email addresses that now parse but didn’t before include the often-used (‘|’ merely marks the end of the whitespace on each line):</p>
<pre>I.                        |
 am.                  |
 a.      |
 nice.|
 guy@(yeah)you.com</pre>
]]></content:encoded>
			<wfw:commentRss>http://porg.es/blog/email-address-validation-simpler-faster-more-correct/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Properly validating e-mail addresses (or converting EBNF to Parsec)</title>
		<link>http://porg.es/blog/properly-validating-e-mail-addresses</link>
		<comments>http://porg.es/blog/properly-validating-e-mail-addresses#comments</comments>
		<pubDate>Sun, 08 Mar 2009 12:49:52 +0000</pubDate>
		<dc:creator>Porges</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[address]]></category>
		<category><![CDATA[email]]></category>
		<category><![CDATA[Haskell]]></category>
		<category><![CDATA[parsec]]></category>
		<category><![CDATA[rfc]]></category>
		<category><![CDATA[validation]]></category>

		<guid isPermaLink="false">http://porg.es/blog/?p=307</guid>
		<description><![CDATA[Update: See the better code in the next post. In recent times there have been several calls for websites to properly validate email addresses. Invariably, the compiled regex from Perl’s RFC822 is pasted up as The Way To Do It. The problem with this (as the source code from the Perl module notes) is [...]]]></description>
			<content:encoded><![CDATA[<p><i>Update</i>: See the better code in <a href="http://porg.es/blog/email-address-validation-simpler-faster-more-correct">the next post</a>.</p>
<p>In recent times there have been <a href="http://www.reddit.com/r/programming/search?q=email+valid">several calls for websites to properly validate email addresses</a>. Invariably, the compiled <a href="http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html">regex from Perl’s Mail::RFC822::Address module</a> is pasted up as The Way To Do It. The problem with this (as the source code of the Perl module notes) is that email addresses <em>cannot</em> be validated by a simple regex, because comments can nest and so require parenthesis matching. The Perl code works around this by first stripping out all comments and then matching the remainder against the regex.</p>
<p>With this in mind, I thought that implementing the <code>addr-spec</code> rule from <a href="http://tools.ietf.org/html/rfc5322#section-3.4.1">RFC 5322</a> (released less than six months ago) might be a good test of the Haskell library Parsec. So, without further ado, I went ahead and translated the grammar from RFC 5322 (strictly speaking ABNF rather than EBNF) directly into Parsec.</p>
<p>The mapping is something like this:</p>
<dl>
<dt>juxtaposition</dt>
<dd><code>&gt;&gt;</code></dd>
<dt>/</dt>
<dd><code>&lt;|&gt;</code></dd>
<dt>*</dt>
<dd><code>many</code></dd>
<dt>1*</dt>
<dd><code>many1</code></dd>
<dt>[]</dt>
<dd><code>optional</code></dd>
</dl>
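<p>To make the mapping concrete, here is a minimal sketch that applies each row of it to a small made-up grammar. The rules <code>sign</code> and <code>number</code> and the helper <code>matches</code> are hypothetical; they are not part of RFC 5322 or of the validator:</p>

<div class="wp_syntax"><div class="code"><pre class="haskell" style="font-family:monospace;">import Text.Parsec
import Text.Parsec.String (Parser)

-- sign = "+" / "-"
-- "/" becomes &lt;|&gt;
sign :: Parser Char
sign = char '+' &lt;|&gt; char '-'

-- number = [sign] 1*DIGIT *("." 1*DIGIT)
-- [] becomes optional, 1* becomes many1, * becomes many,
-- and juxtaposition becomes &gt;&gt;
number :: Parser ()
number = optional sign &gt;&gt; many1 digit &gt;&gt; many (char '.' &gt;&gt; many1 digit) &gt;&gt; return ()

-- True when the parser accepts the whole input (hypothetical helper)
matches :: Parser a -&gt; String -&gt; Bool
matches p = either (const False) (const True) . parse (p &gt;&gt; eof) ""</pre></div></div>

<p>With these definitions, <code>matches number "-3.14"</code> should be <code>True</code>, while <code>matches number "3."</code> should be <code>False</code>, because a dot must be followed by at least one digit.</p>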
<p>Here is the result:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">import</span> Text.<span style="color: black;">Parsec</span>
<span style="color: #ff7700;font-weight:bold;">import</span> Text.<span style="color: black;">Parsec</span>.<span style="color: black;">Char</span>
<span style="color: #ff7700;font-weight:bold;">import</span> Data.<span style="color: black;">Char</span> <span style="color: black;">&#40;</span><span style="color: #008000;">chr</span><span style="color: black;">&#41;</span>
&nbsp;
isValid :: String -<span style="color: #66cc66;">&gt;</span> Bool
isValid x = let result = valid x <span style="color: #ff7700;font-weight:bold;">in</span>
	either <span style="color: black;">&#40;</span>const <span style="color: #008000;">False</span><span style="color: black;">&#41;</span> <span style="color: black;">&#40;</span>const <span style="color: #008000;">True</span><span style="color: black;">&#41;</span> result
&nbsp;
valid :: String -<span style="color: #66cc66;">&gt;</span> Either ParseError <span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
valid x = parse addrSpec <span style="color: #483d8b;">&quot;&quot;</span> x
&nbsp;
ignore x = x <span style="color: #66cc66;">&gt;&gt;</span> <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
addrSpec = localPart <span style="color: #66cc66;">&gt;&gt;</span> char <span style="color: #483d8b;">'@'</span> <span style="color: #66cc66;">&gt;&gt;</span> domain <span style="color: #66cc66;">&gt;&gt;</span> eof
&nbsp;
localPart = dotAtom <span style="color: #66cc66;">&lt;</span>|<span style="color: #66cc66;">&gt;</span> quotedString <span style="color: #66cc66;">&lt;</span>|<span style="color: #66cc66;">&gt;</span> obsLocalPart <span style="color: #66cc66;">&lt;?&gt;</span> <span style="color: #483d8b;">&quot;local part&quot;</span>
domain = dotAtom <span style="color: #66cc66;">&lt;</span>|<span style="color: #66cc66;">&gt;</span> domainLiteral <span style="color: #66cc66;">&lt;</span>|<span style="color: #66cc66;">&gt;</span> obsDomain <span style="color: #66cc66;">&lt;?&gt;</span> <span style="color: #483d8b;">&quot;domain&quot;</span>
&nbsp;
domainLiteral = optional cfws <span style="color: #66cc66;">&gt;&gt;</span> char <span style="color: #483d8b;">'['</span> <span style="color: #66cc66;">&gt;&gt;</span>
		many <span style="color: black;">&#40;</span> optional fws <span style="color: #66cc66;">&gt;&gt;</span> dtext<span style="color: black;">&#41;</span> <span style="color: #66cc66;">&gt;&gt;</span>
		optional fws  <span style="color: #66cc66;">&gt;&gt;</span> char <span style="color: #483d8b;">']'</span> <span style="color: #66cc66;">&gt;&gt;</span> optional cfws
		<span style="color: #66cc66;">&lt;?&gt;</span> <span style="color: #483d8b;">&quot;domain literal&quot;</span>
&nbsp;
ranges = oneOf . <span style="color: #008000;">map</span> <span style="color: #008000;">chr</span> . <span style="color: black;">concat</span>
vchar = ranges <span style="color: black;">&#91;</span><span style="color: black;">&#91;</span>0x21..0x7E<span style="color: black;">&#93;</span><span style="color: black;">&#93;</span> -- <span style="color: #ff7700;font-weight:bold;">from</span> Backus-Naur RFC
dtext = ranges <span style="color: black;">&#91;</span><span style="color: black;">&#91;</span>33..90<span style="color: black;">&#93;</span>,<span style="color: black;">&#91;</span>94..126<span style="color: black;">&#93;</span><span style="color: black;">&#93;</span> <span style="color: #66cc66;">&lt;</span>|<span style="color: #66cc66;">&gt;</span> obsDtext
qtext = ranges <span style="color: black;">&#91;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">33</span><span style="color: black;">&#93;</span>,<span style="color: black;">&#91;</span>35..91<span style="color: black;">&#93;</span>,<span style="color: black;">&#91;</span>93..126<span style="color: black;">&#93;</span><span style="color: black;">&#93;</span> <span style="color: #66cc66;">&lt;</span>|<span style="color: #66cc66;">&gt;</span> obsQtext
atext = alphaNum <span style="color: #66cc66;">&lt;</span>|<span style="color: #66cc66;">&gt;</span> oneOf <span style="color: #483d8b;">&quot;!#$%&amp;'*+-/=?^_`{|}~&quot;</span>
ctext = ranges <span style="color: black;">&#91;</span><span style="color: black;">&#91;</span>33..39<span style="color: black;">&#93;</span>,<span style="color: black;">&#91;</span>42..91<span style="color: black;">&#93;</span>,<span style="color: black;">&#91;</span>93..126<span style="color: black;">&#93;</span><span style="color: black;">&#93;</span> <span style="color: #66cc66;">&lt;</span>|<span style="color: #66cc66;">&gt;</span> obsCtext
wsp = char <span style="color: #483d8b;">' '</span>
	<span style="color: #66cc66;">&lt;</span>|<span style="color: #66cc66;">&gt;</span> char <span style="color: #483d8b;">'<span style="color: #000099; font-weight: bold;">\t</span>'</span>
	<span style="color: #66cc66;">&lt;?&gt;</span> <span style="color: #483d8b;">&quot;space or tab&quot;</span>
&nbsp;
cr = char <span style="color: #483d8b;">'<span style="color: #000099; font-weight: bold;">\r</span>'</span> <span style="color: #66cc66;">&lt;?&gt;</span> <span style="color: #483d8b;">&quot;carriage return&quot;</span>
lf = char <span style="color: #483d8b;">'<span style="color: #000099; font-weight: bold;">\n</span>'</span> <span style="color: #66cc66;">&lt;?&gt;</span> <span style="color: #483d8b;">&quot;line feed&quot;</span>
crlf = cr <span style="color: #66cc66;">&gt;&gt;</span> lf <span style="color: #66cc66;">&lt;?&gt;</span> <span style="color: #483d8b;">&quot;CRLF line ending&quot;</span>
&nbsp;
-- <span style="color: #808080; font-style: italic;"># modification: added try</span>
cfws = <span style="color: #ff7700;font-weight:bold;">try</span> <span style="color: black;">&#40;</span>many1 <span style="color: black;">&#40;</span>optional fws <span style="color: #66cc66;">&gt;&gt;</span> comment<span style="color: black;">&#41;</span> <span style="color: #66cc66;">&gt;&gt;</span> optional fws<span style="color: black;">&#41;</span> <span style="color: #66cc66;">&lt;</span>|<span style="color: #66cc66;">&gt;</span> ignore fws
-- <span style="color: #808080; font-style: italic;"># modification from RFC: adding try because of overlap</span>
fws = <span style="color: #ff7700;font-weight:bold;">try</span> <span style="color: black;">&#40;</span>optional <span style="color: black;">&#40;</span>many wsp <span style="color: #66cc66;">&gt;&gt;</span> crlf<span style="color: black;">&#41;</span> <span style="color: #66cc66;">&gt;&gt;</span> many1 wsp<span style="color: black;">&#41;</span>
	<span style="color: #66cc66;">&lt;</span>|<span style="color: #66cc66;">&gt;</span> many1 wsp
	<span style="color: #66cc66;">&lt;</span>|<span style="color: #66cc66;">&gt;</span> obsFws
&nbsp;
-- <span style="color: #808080; font-style: italic;"># modification: added try</span>
comment = between <span style="color: black;">&#40;</span>char <span style="color: #483d8b;">'('</span><span style="color: black;">&#41;</span> <span style="color: black;">&#40;</span>char <span style="color: #483d8b;">')'</span><span style="color: black;">&#41;</span> <span style="color: black;">&#40;</span>many <span style="color: black;">&#40;</span><span style="color: #ff7700;font-weight:bold;">try</span> <span style="color: black;">&#40;</span>optional fws <span style="color: #66cc66;">&gt;&gt;</span> ccontent<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span> <span style="color: #66cc66;">&gt;&gt;</span> optional fws<span style="color: black;">&#41;</span>
	<span style="color: #66cc66;">&lt;?&gt;</span> <span style="color: #483d8b;">&quot;comment&quot;</span>
ccontent = ignore ctext
	<span style="color: #66cc66;">&lt;</span>|<span style="color: #66cc66;">&gt;</span> ignore quotedPair
	<span style="color: #66cc66;">&lt;</span>|<span style="color: #66cc66;">&gt;</span> comment
&nbsp;
atom = optional cfws <span style="color: #66cc66;">&gt;&gt;</span> many1 atext <span style="color: #66cc66;">&gt;&gt;</span> optional cfws
dotAtomText = many1 atext <span style="color: #66cc66;">&gt;&gt;</span> many <span style="color: black;">&#40;</span>char <span style="color: #483d8b;">'.'</span> <span style="color: #66cc66;">&gt;&gt;</span> many1 atext<span style="color: black;">&#41;</span>
dotAtom = optional cfws <span style="color: #66cc66;">&gt;&gt;</span> dotAtomText <span style="color: #66cc66;">&gt;&gt;</span> optional cfws
&nbsp;
-- <span style="color: #808080; font-style: italic;"># other change from RFC -- merge prefix</span>
quotedPair = char <span style="color: #483d8b;">'<span style="color: #000099; font-weight: bold;">\\</span>'</span> <span style="color: #66cc66;">&gt;&gt;</span> <span style="color: black;">&#40;</span><span style="color: black;">&#40;</span>vchar <span style="color: #66cc66;">&lt;</span>|<span style="color: #66cc66;">&gt;</span> wsp<span style="color: black;">&#41;</span> <span style="color: #66cc66;">&lt;</span>|<span style="color: #66cc66;">&gt;</span> obsQp<span style="color: black;">&#41;</span>
qcontent = qtext <span style="color: #66cc66;">&lt;</span>|<span style="color: #66cc66;">&gt;</span> quotedPair
quotedString = optional cfws <span style="color: #66cc66;">&gt;&gt;</span>	char <span style="color: #483d8b;">'<span style="color: #000099; font-weight: bold;">\&quot;</span>'</span> <span style="color: #66cc66;">&gt;&gt;</span> many <span style="color: black;">&#40;</span>optional fws <span style="color: #66cc66;">&gt;&gt;</span> qcontent<span style="color: black;">&#41;</span> <span style="color: #66cc66;">&gt;&gt;</span>
	optional fws <span style="color: #66cc66;">&gt;&gt;</span>	char <span style="color: #483d8b;">'<span style="color: #000099; font-weight: bold;">\&quot;</span>'</span> <span style="color: #66cc66;">&gt;&gt;</span> optional cfws
	<span style="color: #66cc66;">&lt;?&gt;</span> <span style="color: #483d8b;">&quot;quoted string&quot;</span>
&nbsp;
-- <span style="color: #808080; font-style: italic;"># Obsolete syntax</span>
obsNoWsCtl = ranges <span style="color: black;">&#91;</span><span style="color: black;">&#91;</span>1..8<span style="color: black;">&#93;</span>,<span style="color: black;">&#91;</span>11..12<span style="color: black;">&#93;</span>,<span style="color: black;">&#91;</span>14..31<span style="color: black;">&#93;</span>,<span style="color: black;">&#91;</span><span style="color: #ff4500;">127</span><span style="color: black;">&#93;</span><span style="color: black;">&#93;</span>
obsCtext = obsNoWsCtl
obsDtext = obsNoWsCtl <span style="color: #66cc66;">&lt;</span>|<span style="color: #66cc66;">&gt;</span> quotedPair
obsQtext = obsNoWsCtl
-- <span style="color: #808080; font-style: italic;"># change: see above</span>
obsQp = <span style="color: black;">&#40;</span>char <span style="color: black;">&#40;</span><span style="color: #008000;">chr</span> <span style="color: #ff4500;">0</span><span style="color: black;">&#41;</span> <span style="color: #66cc66;">&lt;</span>|<span style="color: #66cc66;">&gt;</span> obsNoWsCtl <span style="color: #66cc66;">&lt;</span>|<span style="color: #66cc66;">&gt;</span> lf <span style="color: #66cc66;">&lt;</span>|<span style="color: #66cc66;">&gt;</span> cr<span style="color: black;">&#41;</span>
obsLocalPart = word <span style="color: #66cc66;">&gt;&gt;</span> many <span style="color: black;">&#40;</span>char <span style="color: #483d8b;">'.'</span> <span style="color: #66cc66;">&gt;&gt;</span> word<span style="color: black;">&#41;</span> <span style="color: #66cc66;">&gt;&gt;</span> <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
obsDomain = atom <span style="color: #66cc66;">&gt;&gt;</span> many <span style="color: black;">&#40;</span>char <span style="color: #483d8b;">'.'</span> <span style="color: #66cc66;">&gt;&gt;</span> atom<span style="color: black;">&#41;</span> <span style="color: #66cc66;">&gt;&gt;</span> <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
obsFws = many1 wsp <span style="color: #66cc66;">&gt;&gt;</span> many <span style="color: black;">&#40;</span>crlf <span style="color: #66cc66;">&gt;&gt;</span> many1 wsp<span style="color: black;">&#41;</span> <span style="color: #66cc66;">&gt;&gt;</span> <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
word = atom <span style="color: #66cc66;">&lt;</span>|<span style="color: #66cc66;">&gt;</span> quotedString</pre></div></div>

<p><i>A note before I continue:</i> since Parsec by default does no backtracking (in order to remain efficient), there are a couple of places (two that I&#8217;ve found so far) where the original EBNF needs to be changed slightly. I have noted these in the source above. It is possible there are a couple more places that need fixing, but I haven’t run this against a large test suite yet to find them. (They are most likely to be in the ‘obsolete syntax’ section.)</p>
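<p>The effect of <code>try</code> is easiest to see on a tiny example. The following is a minimal sketch and not part of the validator; the parsers <code>abc</code>, <code>abd</code>, <code>noBacktrack</code> and <code>withBacktrack</code> and the helper <code>accepts</code> are made up for illustration:</p>

<div class="wp_syntax"><div class="code"><pre class="haskell" style="font-family:monospace;">import Text.Parsec
import Text.Parsec.String (Parser)

-- Two hypothetical alternatives that share the prefix "ab"
abc, abd :: Parser String
abc = string "abc"
abd = string "abd"

-- Without try: on "abd", abc consumes "ab" before failing,
-- so &lt;|&gt; never gets to try abd
noBacktrack :: Parser String
noBacktrack = abc &lt;|&gt; abd

-- With try: a failure of abc rewinds the input, so abd gets its chance
withBacktrack :: Parser String
withBacktrack = try abc &lt;|&gt; abd

-- True when the parser accepts the whole input (hypothetical helper)
accepts :: Parser a -&gt; String -&gt; Bool
accepts p = either (const False) (const True) . parse (p &gt;&gt; eof) ""</pre></div></div>

<p>Here <code>accepts noBacktrack "abd"</code> should be <code>False</code> and <code>accepts withBacktrack "abd"</code> should be <code>True</code>; both accept <code>"abc"</code>. The cost of <code>try</code> is that the shared prefix may be scanned twice, which is why it is worth removing the overlap from the grammar where possible.</p>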
<p>And of course, some demonstrations (keep in mind that there is an extra level of escaping operating here&#8230; where relevant I&#8217;ve included the unescaped email address in a comment):</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">isValid <span style="color: #483d8b;">&quot;porges@example.com&quot;</span> == <span style="color: #008000;">True</span>
isValid <span style="color: #483d8b;">&quot;porges@@example.com&quot;</span> == <span style="color: #008000;">False</span>
isValid <span style="color: #483d8b;">&quot;<span style="color: #000099; font-weight: bold;">\&quot;</span>porges@<span style="color: #000099; font-weight: bold;">\&quot;</span>@example.com&quot;</span> == <span style="color: #008000;">True</span> -- <span style="color: #808080; font-style: italic;"># &quot;porges@&quot;@example.com</span>
isValid <span style="color: #483d8b;">&quot;<span style="color: #000099; font-weight: bold;">\&quot;</span>por(g)es@<span style="color: #000099; font-weight: bold;">\&quot;</span>@example.com&quot;</span> == <span style="color: #008000;">True</span> -- <span style="color: #808080; font-style: italic;"># &quot;por(g)es@&quot;@example.com</span>
isValid <span style="color: #483d8b;">&quot;porges(comment)@example.com&quot;</span> == <span style="color: #008000;">True</span>
isValid <span style="color: #483d8b;">&quot;porges(comme(nests)nt)@example.com&quot;</span> == <span style="color: #008000;">True</span>
isValid <span style="color: #483d8b;">&quot;porges(comme(nests)nt())@example.com&quot;</span> == <span style="color: #008000;">True</span>
isValid <span style="color: #483d8b;">&quot;porges(()comme(nests)nt())@example.com&quot;</span> == <span style="color: #008000;">True</span>
isValid <span style="color: #483d8b;">&quot;()porges(()comme(nests)nt())@example.com&quot;</span> == <span style="color: #008000;">True</span>
isValid <span style="color: #483d8b;">&quot;((lol)porges(()comme(nests)nt())@example.com&quot;</span> == <span style="color: #008000;">False</span>
isValid <span style="color: #483d8b;">&quot;((lol))porges(()comme(nests)nt())@example.com&quot;</span> == <span style="color: #008000;">True</span>
isValid <span style="color: #483d8b;">&quot;(lol))porges(()comme(nests)nt())@example.com&quot;</span> == <span style="color: #008000;">False</span>
isValid <span style="color: #483d8b;">&quot;((lol))porges(()comme(nests)nt())@example.com&quot;</span> == <span style="color: #008000;">True</span>
isValid <span style="color: #483d8b;">&quot;<span style="color: #000099; font-weight: bold;">\&quot;</span>s<span style="color: #000099; font-weight: bold;">\\</span><span style="color: #000099; font-weight: bold;">\0</span><span style="color: #000099; font-weight: bold;">\&quot;</span>@example.com&quot;</span> == <span style="color: #008000;">True</span>
-- <span style="color: #808080; font-style: italic;"># &quot;s\NUL&quot;@example.com, where NUL is actually the</span>
-- <span style="color: #808080; font-style: italic;"># null character! Yep, can't strlen() on email addresses...</span></pre></div></div>

<p>I managed to find <a href="http://haacked.com/archive/2007/08/21/i-knew-how-to-validate-an-email-address-until-i.aspx">this post on email addresses</a> by Phil Haack which has the following tests:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: black;">&#91;</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;<span style="color: #000099; font-weight: bold;">\&quot;</span>Abc<span style="color: #000099; font-weight: bold;">\\</span>@def<span style="color: #000099; font-weight: bold;">\&quot;</span>@example.com&quot;</span>,<span style="color: #008000;">True</span><span style="color: black;">&#41;</span>,
<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;<span style="color: #000099; font-weight: bold;">\&quot;</span>Fred Bloggs<span style="color: #000099; font-weight: bold;">\&quot;</span>@example.com&quot;</span>,<span style="color: #008000;">True</span><span style="color: black;">&#41;</span>,
<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;<span style="color: #000099; font-weight: bold;">\&quot;</span>Joe<span style="color: #000099; font-weight: bold;">\\</span>Blow<span style="color: #000099; font-weight: bold;">\&quot;</span>@example.com&quot;</span>,<span style="color: #008000;">True</span><span style="color: black;">&#41;</span>,
<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;<span style="color: #000099; font-weight: bold;">\&quot;</span>Abc@def<span style="color: #000099; font-weight: bold;">\&quot;</span>@example.com&quot;</span>,<span style="color: #008000;">True</span><span style="color: black;">&#41;</span>,
<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;customer/department=shipping@example.com&quot;</span>,<span style="color: #008000;">True</span><span style="color: black;">&#41;</span>,
<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;$A12345@example.com&quot;</span>,<span style="color: #008000;">True</span><span style="color: black;">&#41;</span>,
<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;!def!xyz%abc@example.com&quot;</span>,<span style="color: #008000;">True</span><span style="color: black;">&#41;</span>,
<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;_somename@example.com&quot;</span>,<span style="color: #008000;">True</span><span style="color: black;">&#41;</span>,
<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;NotAnEmail&quot;</span>,<span style="color: #008000;">False</span><span style="color: black;">&#41;</span>,
<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;@NotAnEmail&quot;</span>,<span style="color: #008000;">False</span><span style="color: black;">&#41;</span>,
<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;<span style="color: #000099; font-weight: bold;">\&quot;</span>test<span style="color: #000099; font-weight: bold;">\\</span><span style="color: #000099; font-weight: bold;">\\</span>blah<span style="color: #000099; font-weight: bold;">\&quot;</span>@example.com&quot;</span>,<span style="color: #008000;">True</span><span style="color: black;">&#41;</span>,
<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;<span style="color: #000099; font-weight: bold;">\&quot;</span>test<span style="color: #000099; font-weight: bold;">\\</span>blah<span style="color: #000099; font-weight: bold;">\&quot;</span>@example.com&quot;</span>,<span style="color: #008000;">True</span><span style="color: black;">&#41;</span>,
-- <span style="color: #808080; font-style: italic;"># Phil gets false for this, which I think is wrong</span>
-- <span style="color: #808080; font-style: italic;"># (Dominic Sayers notes the same at the end of the comment thread)</span>
<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;<span style="color: #000099; font-weight: bold;">\&quot;</span>test<span style="color: #000099; font-weight: bold;">\\</span><span style="color: #000099; font-weight: bold;">\r</span>blah<span style="color: #000099; font-weight: bold;">\&quot;</span>@example.com&quot;</span>,<span style="color: #008000;">True</span><span style="color: black;">&#41;</span>,
<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;<span style="color: #000099; font-weight: bold;">\&quot;</span>test<span style="color: #000099; font-weight: bold;">\r</span>blah<span style="color: #000099; font-weight: bold;">\&quot;</span>@example.com&quot;</span>,<span style="color: #008000;">False</span><span style="color: black;">&#41;</span>,
<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;<span style="color: #000099; font-weight: bold;">\&quot;</span>test<span style="color: #000099; font-weight: bold;">\\</span><span style="color: #000099; font-weight: bold;">\&quot;</span>blah<span style="color: #000099; font-weight: bold;">\&quot;</span>@example.com&quot;</span>,<span style="color: #008000;">True</span><span style="color: black;">&#41;</span>,
<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;<span style="color: #000099; font-weight: bold;">\&quot;</span>test<span style="color: #000099; font-weight: bold;">\&quot;</span>blah<span style="color: #000099; font-weight: bold;">\&quot;</span>@example.com&quot;</span>,<span style="color: #008000;">False</span><span style="color: black;">&#41;</span>,
<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;customer/department@example.com&quot;</span>,<span style="color: #008000;">True</span><span style="color: black;">&#41;</span>,
<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;$A12345@example.com&quot;</span>,<span style="color: #008000;">True</span><span style="color: black;">&#41;</span>,
<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;!def!xyz%abc@example.com&quot;</span>,<span style="color: #008000;">True</span><span style="color: black;">&#41;</span>,
<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;_Yosemite.Sam@example.com&quot;</span>,<span style="color: #008000;">True</span><span style="color: black;">&#41;</span>,
<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;~@example.com&quot;</span>,<span style="color: #008000;">True</span><span style="color: black;">&#41;</span>,
<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;.wooly@example.com&quot;</span>,<span style="color: #008000;">False</span><span style="color: black;">&#41;</span>,
<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;wo..oly@example.com&quot;</span>,<span style="color: #008000;">False</span><span style="color: black;">&#41;</span>,
<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;pootietang.@example.com&quot;</span>,<span style="color: #008000;">False</span><span style="color: black;">&#41;</span>,
<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;.@example.com&quot;</span>,<span style="color: #008000;">False</span><span style="color: black;">&#41;</span>,
<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;<span style="color: #000099; font-weight: bold;">\&quot;</span>Austin@Powers<span style="color: #000099; font-weight: bold;">\&quot;</span>@example.com&quot;</span>,<span style="color: #008000;">True</span><span style="color: black;">&#41;</span>,
<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;Ima.Fool@example.com&quot;</span>,<span style="color: #008000;">True</span><span style="color: black;">&#41;</span>,
<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;<span style="color: #000099; font-weight: bold;">\&quot;</span>Ima.Fool<span style="color: #000099; font-weight: bold;">\&quot;</span>@example.com&quot;</span>,<span style="color: #008000;">True</span><span style="color: black;">&#41;</span>,
<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;<span style="color: #000099; font-weight: bold;">\&quot;</span>Ima Fool<span style="color: #000099; font-weight: bold;">\&quot;</span>@example.com&quot;</span>,<span style="color: #008000;">True</span><span style="color: black;">&#41;</span>,
<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;Ima Fool@example.com&quot;</span>,<span style="color: #008000;">False</span><span style="color: black;">&#41;</span><span style="color: black;">&#93;</span></pre></div></div>
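
<p>A table like this is easy to check mechanically. The following is only a sketch: <code>failures</code> and <code>report</code> are hypothetical helpers, and the validator is passed in as an argument so that nothing is assumed about it beyond the <code>String -&gt; Bool</code> shape that <code>isValid</code> is used with in the demonstrations above.</p>

<div class="wp_syntax"><div class="code"><pre class="haskell" style="font-family:monospace;">import Control.Monad (forM_)

-- Keep only the cases where the validator disagrees with the expected result.
failures :: (String -&gt; Bool) -&gt; [(String, Bool)] -&gt; [(String, Bool)]
failures validate cases =
  [ c | c@(input, expected) &lt;- cases, validate input /= expected ]

-- Print each disagreement, then a summary line. 'show' re-escapes the
-- address, so quotes, backslashes and control characters stay visible.
report :: (String -&gt; Bool) -&gt; [(String, Bool)] -&gt; IO ()
report validate cases = do
  let bad = failures validate cases
  forM_ bad $ \(input, expected) -&gt;
    putStrLn (show input ++ ": expected " ++ show expected)
  putStrLn (show (length bad) ++ " of " ++ show (length cases) ++ " cases failed")</pre></div></div>

<p>Usage would be something like <code>report isValid haackTests</code>, where <code>haackTests</code> is a made-up name bound to the list above.</p>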

<p>Next job is to test it against <a href="http://www.dominicsayers.com/isemail/">this batch&#8230;</a></p>
<p><i>Update:</i> Yep, it fails in several areas involving the obsolete syntax. I&#8217;ve fixed one above. (Note that I&#8217;m not concerned with the failures in the domain part of the address, as the RFC 5322 ABNF for this is more liberal than the tests require.)</p>
<p>I might have to refactor the grammar&#8230; there is a large overlap between the current syntax and the obsolete syntax. (Or just use &#8216;try&#8217;, but that&#8217;s not as efficient.)</p>
]]></content:encoded>
			<wfw:commentRss>http://porg.es/blog/properly-validating-e-mail-addresses/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>
