<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>porges &#187; horrid</title>
	<atom:link href="http://porg.es/blog/tag/horrid/feed" rel="self" type="application/rss+xml" />
	<link>http://porg.es/blog</link>
	<description></description>
	<lastBuildDate>Thu, 12 Jan 2012 23:45:56 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Casting in .NET via object mutation</title>
		<link>http://porg.es/blog/casting-in-dot-net-via-object-mutation</link>
		<comments>http://porg.es/blog/casting-in-dot-net-via-object-mutation#comments</comments>
		<pubDate>Fri, 13 May 2011 10:37:03 +0000</pubDate>
		<dc:creator>Porges</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[.net]]></category>
		<category><![CDATA[C]]></category>
		<category><![CDATA[horrid]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://porg.es/blog/?p=624</guid>
		<description><![CDATA[In this post, we will see how to make the following code fail: object it = new SomeStruct &#123; Item = 1 &#125;; &#160; Floatify&#40;it&#41;; &#160; Console.WriteLine&#40;&#40;&#40;SomeStruct&#41;it&#41;.Item&#41;; At runtime, it will throw an InvalidCastException! Here&#8217;s how. In .NET, each object also has a value associated with it that determines the type of the object. In [...]]]></description>
			<content:encoded><![CDATA[<p>In this post, we will see how to make the following code fail:</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;">	<span style="color: #6666cc; font-weight: bold;">object</span> it <span style="color: #008000;">=</span> <span style="color: #008000;">new</span> SomeStruct <span style="color: #008000;">&#123;</span> Item <span style="color: #008000;">=</span> <span style="color: #FF0000;">1</span> <span style="color: #008000;">&#125;</span><span style="color: #008000;">;</span>
&nbsp;
	Floatify<span style="color: #008000;">&#40;</span>it<span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
&nbsp;
	Console<span style="color: #008000;">.</span><span style="color: #0000FF;">WriteLine</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#40;</span>SomeStruct<span style="color: #008000;">&#41;</span>it<span style="color: #008000;">&#41;</span><span style="color: #008000;">.</span><span style="color: #0000FF;">Item</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span></pre></div></div>

<p>At runtime, it will throw an <code>InvalidCastException</code>!</p>
<p><span id="more-624"></span></p>
<p>Here&#8217;s how. In .NET, each object also has a value associated with it that determines the type of the object (its type handle). In memory, this is stored immediately before the object&#8217;s data, like so (I got this information from the MSDN article <a href="http://msdn.microsoft.com/en-us/magazine/cc163791.aspx">Drill Into .NET Framework Internals to See How the CLR Creates Runtime Objects</a>):</p>
<p><center><a href="http://porg.es/blog/wp-content/uploads/2011/05/layout2.png"><img src="http://porg.es/blog/wp-content/uploads/2011/05/layout2.png" alt="" title="layout" width="191" height="176" class="aligncenter size-full wp-image-634" /></a></center></p>
<p>Now, if we can write to that, we can set the type of the object to whatever we want!</p>
<p>There&#8217;s one small problem: in .NET we can&#8217;t take the address of a managed object (which we need in order to write to the object in memory). There are various reasons for this, one of them being that the garbage collector likes to be able to move objects around. If we could take arbitrary pointers to objects, those pointers could be invalidated whenever an object moved.</p>
<p>What we <em>can</em> do is pin a struct, which lets us retrieve its address (this facility exists so that users can pass pinned managed structs into unmanaged code as pointers). Here&#8217;s how to pin a struct:</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;">    var handle <span style="color: #008000;">=</span> GCHandle<span style="color: #008000;">.</span><span style="color: #0000FF;">Alloc</span><span style="color: #008000;">&#40;</span>o, GCHandleType<span style="color: #008000;">.</span><span style="color: #0000FF;">Pinned</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
    var addr <span style="color: #008000;">=</span> handle<span style="color: #008000;">.</span><span style="color: #0000FF;">AddrOfPinnedObject</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span></pre></div></div>
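
<p>As a minimal sketch of what the pinned address gives us (the class and method names here are made up for illustration, and it assumes the object is a boxed <code>int</code>), the payload can be read back through <code>Marshal</code> without any unsafe code. The <code>try</code>/<code>finally</code> makes sure the handle is freed even if something throws:</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;">using System;
using System.Runtime.InteropServices;

static class PinSketch
{
    // Hypothetical helper: reads the 4-byte payload of a boxed int
    // through the address of the pinned box.
    public static int ReadBoxedInt(object boxed)
    {
        var handle = GCHandle.Alloc(boxed, GCHandleType.Pinned);
        try
        {
            // For a boxed value type this points at the data itself,
            // just past the object header.
            IntPtr addr = handle.AddrOfPinnedObject();
            return Marshal.ReadInt32(addr);
        }
        finally
        {
            handle.Free(); // always release the pin
        }
    }
}</pre></div></div>

<p>Calling <code>PinSketch.ReadBoxedInt(42)</code> boxes the argument, pins the box, and reads the same four bytes back.</p>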

<p>We can then write to the type handle like so (on a 32-bit runtime, both the sync block and type handle are 32 bits wide, so the handle sits one <code>int</code> before the data):</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;">    <span style="color: #008000;">*</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#40;</span><span style="color: #6666cc; font-weight: bold;">int</span><span style="color: #008000;">*</span><span style="color: #008000;">&#41;</span>addr<span style="color: #008000;">&#41;</span><span style="color: #008000;">-</span><span style="color: #FF0000;">1</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">=</span> someValue<span style="color: #008000;">;</span></pre></div></div>

<p>Another problem is that all of this is only true for values on the heap. As far as I can tell, the static type of the variable is what .NET uses to identify values on the stack. So in order for this to work, we must use a boxed copy of a struct.</p>
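
<p>To see why the boxed copy is a separate thing from the original value, here is a small sketch (the names are illustrative only). Boxing copies the struct onto the heap, so later changes to the local don&#8217;t touch the box:</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;">using System;

static class BoxingSketch
{
    struct SomeStruct { public int Item; }

    // Returns the Item stored in the box after the local has been changed.
    public static int BoxedItemAfterChange()
    {
        var local = new SomeStruct { Item = 1 };
        object boxed = local;   // boxing copies the value into a heap object
        local.Item = 2;         // modifies only the local copy
        return ((SomeStruct)boxed).Item; // unboxing reads the heap copy
    }
}</pre></div></div>

<p>The heap copy is the only one with an object header, which is why it is the only one whose type handle can be overwritten.</p>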
<hr/>
<p>Finally, here&#8217;s some demo code that changes the type at runtime for both a primitive and a custom struct:</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;"><span style="color: #0600FF; font-weight: bold;">using</span> <span style="color: #008080;">System</span><span style="color: #008000;">;</span>
<span style="color: #0600FF; font-weight: bold;">using</span> <span style="color: #008080;">System.Runtime.InteropServices</span><span style="color: #008000;">;</span>
&nbsp;
<span style="color: #0600FF; font-weight: bold;">unsafe</span> <span style="color: #6666cc; font-weight: bold;">class</span> Program
<span style="color: #008000;">&#123;</span>
    <span style="color: #6666cc; font-weight: bold;">struct</span> SomeStruct <span style="color: #008000;">&#123;</span> <span style="color: #0600FF; font-weight: bold;">public</span> <span style="color: #6666cc; font-weight: bold;">int</span> Item<span style="color: #008000;">;</span> <span style="color: #008000;">&#125;</span>
&nbsp;
    <span style="color: #0600FF; font-weight: bold;">public</span> <span style="color: #0600FF; font-weight: bold;">static</span> <span style="color: #6666cc; font-weight: bold;">void</span> Main<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>
    <span style="color: #008000;">&#123;</span>
        <span style="color: #6666cc; font-weight: bold;">object</span> notAFloat <span style="color: #008000;">=</span> <span style="color: #FF0000;">1</span><span style="color: #008000;">;</span>
        <span style="color: #6666cc; font-weight: bold;">object</span> me <span style="color: #008000;">=</span> <span style="color: #008000;">new</span> SomeStruct <span style="color: #008000;">&#123;</span>Item <span style="color: #008000;">=</span> <span style="color: #FF0000;">1</span><span style="color: #008000;">&#125;</span><span style="color: #008000;">;</span>
&nbsp;
        Console<span style="color: #008000;">.</span><span style="color: #0000FF;">WriteLine</span><span style="color: #008000;">&#40;</span><span style="color: #666666;">&quot;{0} ({1})&quot;</span>, notAFloat, notAFloat<span style="color: #008000;">.</span><span style="color: #0000FF;">GetType</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
        Floatify<span style="color: #008000;">&#40;</span>notAFloat<span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
        Console<span style="color: #008000;">.</span><span style="color: #0000FF;">WriteLine</span><span style="color: #008000;">&#40;</span><span style="color: #666666;">&quot;{0} ({1})&quot;</span>, notAFloat, notAFloat<span style="color: #008000;">.</span><span style="color: #0000FF;">GetType</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
        Console<span style="color: #008000;">.</span><span style="color: #0000FF;">WriteLine</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
        Console<span style="color: #008000;">.</span><span style="color: #0000FF;">WriteLine</span><span style="color: #008000;">&#40;</span>me<span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
        Floatify<span style="color: #008000;">&#40;</span>me<span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
        Console<span style="color: #008000;">.</span><span style="color: #0000FF;">WriteLine</span><span style="color: #008000;">&#40;</span>me<span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
    <span style="color: #008000;">&#125;</span>
&nbsp;
    <span style="color: #0600FF; font-weight: bold;">static</span> <span style="color: #6666cc; font-weight: bold;">void</span> Floatify<span style="color: #008000;">&lt;</span>T<span style="color: #008000;">&gt;</span><span style="color: #008000;">&#40;</span>T o<span style="color: #008000;">&#41;</span>
    <span style="color: #008000;">&#123;</span>
        var handle <span style="color: #008000;">=</span> GCHandle<span style="color: #008000;">.</span><span style="color: #0000FF;">Alloc</span><span style="color: #008000;">&#40;</span>o, GCHandleType<span style="color: #008000;">.</span><span style="color: #0000FF;">Pinned</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
        var addr <span style="color: #008000;">=</span> handle<span style="color: #008000;">.</span><span style="color: #0000FF;">AddrOfPinnedObject</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
&nbsp;
        <span style="color: #6666cc; font-weight: bold;">object</span> f <span style="color: #008000;">=</span> <span style="color: #008000;">&#40;</span><span style="color: #6666cc; font-weight: bold;">float</span><span style="color: #008000;">&#41;</span><span style="color: #FF0000;">1.0</span><span style="color: #008000;">;</span>
        var handle2 <span style="color: #008000;">=</span> GCHandle<span style="color: #008000;">.</span><span style="color: #0000FF;">Alloc</span><span style="color: #008000;">&#40;</span>f, GCHandleType<span style="color: #008000;">.</span><span style="color: #0000FF;">Pinned</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
        var addr2 <span style="color: #008000;">=</span> handle2<span style="color: #008000;">.</span><span style="color: #0000FF;">AddrOfPinnedObject</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
&nbsp;
        <span style="color: #008080; font-style: italic;">// copy type handle of a float to the object (assumes a 32-bit runtime)</span>
        <span style="color: #008000;">*</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#40;</span><span style="color: #6666cc; font-weight: bold;">int</span><span style="color: #008000;">*</span><span style="color: #008000;">&#41;</span>addr<span style="color: #008000;">&#41;</span><span style="color: #008000;">-</span><span style="color: #FF0000;">1</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">=</span> <span style="color: #008000;">*</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#40;</span><span style="color: #6666cc; font-weight: bold;">int</span><span style="color: #008000;">*</span><span style="color: #008000;">&#41;</span>addr2<span style="color: #008000;">&#41;</span><span style="color: #008000;">-</span><span style="color: #FF0000;">1</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
&nbsp;
        handle2<span style="color: #008000;">.</span><span style="color: #0000FF;">Free</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
        handle<span style="color: #008000;">.</span><span style="color: #0000FF;">Free</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
    <span style="color: #008000;">&#125;</span>
<span style="color: #008000;">&#125;</span></pre></div></div>

<p>The output is:</p>
<p><center>
<pre>1 (System.Int32)
1.401298E-45 (System.Single)

Program+SomeStruct
1.401298E-45</pre>
</center></p>
]]></content:encoded>
			<wfw:commentRss>http://porg.es/blog/casting-in-dot-net-via-object-mutation/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>So, it turns out that .NET&#8217;s Regex are more powerful t̖̱̍ͭ͊h̟ͨͨa̞̖̙̔̇n͇̝͚̤̒́ͨ̐ ̯͖̏̌̔Ị̟̮̱̥̇̐̎͂ͬ͗̒ ̪̹̱͙̘ͦ̉ͪͪͣ̉͊o͕̥̝͇͙ͪ͊ͤ̑̂̽́r͔̭̪̮̟͗̍ͨ͗͛ͣḭ̝̜͈ͫ́g̥̹̥̜̦̓̇̓i̪͕̭̞͛ͯ̓͛̔̾ͫn̘̗a̰̜ͨͪ͊l̩͑̐̐́ͥ̚l̜ͨ͋̈ẙͦ́ ̟̬̬̫͙̤ͭ̚t̳͎̱̗̲́h͔͙̰̬̊̈́͊̾o͉ͫ̌̄u͉̲̥g̏ͥ̑̅̽̇h̻͇̥̰̯ͥͯṱ̯̏̄̒͒ͫ̃.͖̟͍̘̼̼̍̐̀͊̓́&#8230;</title>
		<link>http://porg.es/blog/so-it-turns-out-that-dot-nets-regex-are-more-powerful-than-i-originally-thought</link>
		<comments>http://porg.es/blog/so-it-turns-out-that-dot-nets-regex-are-more-powerful-than-i-originally-thought#comments</comments>
		<pubDate>Mon, 09 May 2011 14:09:19 +0000</pubDate>
		<dc:creator>Porges</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[C]]></category>
		<category><![CDATA[horrid]]></category>
		<category><![CDATA[regex]]></category>
		<category><![CDATA[silly]]></category>
		<category><![CDATA[snippet]]></category>
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://porg.es/blog/?p=575</guid>
		<description><![CDATA[Today, thanks to user Lucero on StackOverflow, I learned about .NET&#8217;s &#8220;Balancing Groups&#8221; Regex feature. Basically, any time you use a named capturing group, it actually pushes the capture onto a named stack. You can then pop this stack by using the same group name prefixed with a hyphen, like (?&lt;-stackToPop&gt;). Of course, anyone who [...]]]></description>
			<content:encoded><![CDATA[<p>Today, thanks to user <a href="http://stackoverflow.com/users/88558/lucero">Lucero</a> on StackOverflow, I learned about .NET&#8217;s &#8220;Balancing Groups&#8221; Regex feature.</p>
<p>Basically, any time you use a named capturing group, it actually pushes the capture onto a named stack. You can then pop this stack by using the same group name prefixed with a hyphen, like <code>(?&lt;-stackToPop&gt;)</code>. If the stack is empty, the pop fails to match.</p>
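
<p>The usual small demonstration of this is matching balanced parentheses. This is a sketch of that well-known pattern rather than anything from the XML expression later in this post, and the class and group names are my own:</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;">using System.Text.RegularExpressions;

static class BalancedSketch
{
    // (?&lt;open&gt;\()   pushes a capture onto the "open" stack
    // (?&lt;-open&gt;\))  pops one, and fails if the stack is empty
    // (?(open)(?!)) fails at the end if anything is still on the stack
    static readonly Regex Balanced = new Regex(
        @"^(?:[^()]|(?&lt;open&gt;\()|(?&lt;-open&gt;\)))*(?(open)(?!))$");

    public static bool IsBalanced(string text)
    {
        return Balanced.IsMatch(text);
    }
}</pre></div></div>

<p>With this, <code>IsBalanced("(a(b)c)")</code> matches, while <code>"(a(b)c"</code> fails on the leftover push and <code>"a)b("</code> fails on the pop from an empty stack.</p>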
<hr/>
<p>Of course, anyone who finds themselves in this situation is going to ask: <em>can it match XML?</em></p>
<p><span id="more-575"></span></p>
<p>It&#8217;s possible that I am missing something completely (it is rather late at night), but &#8230; <em>very nearly</em>. I haven&#8217;t quite figured out a nested section in the local DTD subset, but no one uses that feature anyway. (Can you spot it?)</p>
<p>Aside from that, most of the well-formedness criteria are handled (the obvious one being element nesting). Things that require non-local information such as entities aren&#8217;t handled. I think it is possible to handle duplicate attribute names in this form as well (via a lookahead for duplicate names).</p>
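
<p>For the duplicate attribute idea, one possible shape for the lookahead is sketched below. This is a rough, standalone illustration and not part of the expression in this post: it uses a simplified notion of a name (<code>[^\s=]+</code>), only looks at a single start tag, and can be fooled by attribute-like text inside a quoted value.</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;">using System.Text.RegularExpressions;

static class DuplicateAttributeSketch
{
    // Matches an attribute whose name (captured as "n") appears again
    // later in the same tag. The lookahead skips over any number of
    // intervening attributes, then requires the same name via \k&lt;n&gt;.
    static readonly Regex Duplicate = new Regex(
        @"\s(?&lt;n&gt;[^\s=]+)\s*=\s*(?:""[^""]*""|'[^']*')" +
        @"(?=(?:\s+[^\s=]+\s*=\s*(?:""[^""]*""|'[^']*'))*?\s+\k&lt;n&gt;\s*=)");

    public static bool HasDuplicate(string startTag)
    {
        return Duplicate.IsMatch(startTag);
    }
}</pre></div></div>

<p>A well-formedness check would use this negatively, that is, wrap the same idea in <code>(?!...)</code> so that a tag with a repeated name is rejected.</p>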
<p>Here is the code, along with a test file that exercises some of what it catches. Breaking any of the elements should make it fail:</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;">var surrogate <span style="color: #008000;">=</span> <span style="color: #666666;">@&quot;([\ud800-\udbff][\udc00-\udfff])&quot;</span><span style="color: #008000;">;</span><span style="color: #008080; font-style: italic;">// .NET can't handle \U10000-\u10FFFF</span>
var c <span style="color: #008000;">=</span> <span style="color: #666666;">@&quot;([\u0009\u000a\u000d\u0020-\ud7ff\ue000-\ufffd]|&quot;</span><span style="color: #008000;">+</span>surrogate <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;)&quot;</span><span style="color: #008000;">;</span> 
var s <span style="color: #008000;">=</span> <span style="color: #666666;">@&quot;([\u0020\u0009\u000d\u000a]+)&quot;</span><span style="color: #008000;">;</span>
var nameStartChar <span style="color: #008000;">=</span> <span style="color: #666666;">@&quot;([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|&quot;</span> <span style="color: #008000;">+</span> surrogate <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;)&quot;</span><span style="color: #008000;">;</span>
var nameChar <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;(&quot;</span> <span style="color: #008000;">+</span> nameStartChar <span style="color: #008000;">+</span> <span style="color: #666666;">@&quot;|[-.0-9\u00B7\u0300-\u036F\u203F-\u2040])&quot;</span><span style="color: #008000;">;</span>
var name <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;(?'name'&quot;</span> <span style="color: #008000;">+</span> nameStartChar <span style="color: #008000;">+</span> nameChar <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;*)&quot;</span><span style="color: #008000;">;</span>
var names <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;(?'names'&quot;</span> <span style="color: #008000;">+</span> name <span style="color: #008000;">+</span> <span style="color: #666666;">@&quot;(\u0020&quot;</span> <span style="color: #008000;">+</span> name <span style="color: #008000;">+</span><span style="color: #666666;">&quot;)*)&quot;</span><span style="color: #008000;">;</span>
var nmtoken <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;(?'nmtoken'&quot;</span> <span style="color: #008000;">+</span> nameChar <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;+)&quot;</span><span style="color: #008000;">;</span>
var nmtokens <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;(?'nmtokens'&quot;</span> <span style="color: #008000;">+</span> nmtoken <span style="color: #008000;">+</span> <span style="color: #666666;">@&quot;(\u0020&quot;</span> <span style="color: #008000;">+</span> nmtoken <span style="color: #008000;">+</span><span style="color: #666666;">&quot;)*)&quot;</span><span style="color: #008000;">;</span>
var pereference <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;%&quot;</span> <span style="color: #008000;">+</span> name <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;;&quot;</span><span style="color: #008000;">;</span>
var entityReference<span style="color: #008000;">=</span> <span style="color: #666666;">&quot;(?'entityRef'&amp;&quot;</span> <span style="color: #008000;">+</span> name <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;;)&quot;</span><span style="color: #008000;">;</span>
var charref <span style="color: #008000;">=</span> <span style="color: #666666;">@&quot;&amp;\#([0-9]+|x[0-9a-fA-F]+);&quot;</span><span style="color: #008000;">;</span>
var reference <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;(?'reference'&quot;</span><span style="color: #008000;">+</span> entityReference <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;|&quot;</span> <span style="color: #008000;">+</span> charref <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;)&quot;</span><span style="color: #008000;">;</span>
var entityValue <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;(?'entityValue'<span style="color: #008080; font-weight: bold;">\&quot;</span>([^%&amp;<span style="color: #008080; font-weight: bold;">\&quot;</span>]|&quot;</span> <span style="color: #008000;">+</span> pereference <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;|&quot;</span> <span style="color: #008000;">+</span> reference <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;)*<span style="color: #008080; font-weight: bold;">\&quot;</span>|'([^%&amp;']|&quot;</span> <span style="color: #008000;">+</span> pereference <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;|&quot;</span> <span style="color: #008000;">+</span> reference <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;)*')&quot;</span><span style="color: #008000;">;</span>
var eq <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;(?'eq'&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;?=&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;?)&quot;</span><span style="color: #008000;">;</span>
var versionNum  <span style="color: #008000;">=</span> <span style="color: #666666;">@&quot;1\.[0-9]+&quot;</span><span style="color: #008000;">;</span>
var comment <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;(?'comment'&lt;!--((?!--)&quot;</span> <span style="color: #008000;">+</span> c <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;)*--&gt;)&quot;</span><span style="color: #008000;">;</span>
var PITarget <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;(?'pitarget'(?![xX][mM][lL])&quot;</span><span style="color: #008000;">+</span>name<span style="color: #008000;">+</span><span style="color: #666666;">&quot;)&quot;</span><span style="color: #008000;">;</span>
var PI<span style="color: #008000;">=</span> <span style="color: #666666;">@&quot;(?'PI'&lt;\?&quot;</span> <span style="color: #008000;">+</span> PITarget <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;(&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> <span style="color: #666666;">@&quot;((?!\?&gt;)&quot;</span> <span style="color: #008000;">+</span> c <span style="color: #008000;">+</span> <span style="color: #666666;">@&quot;)*)?\?&gt;)&quot;</span><span style="color: #008000;">;</span>
var misc <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;(?'misc'&quot;</span> <span style="color: #008000;">+</span> comment <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;|&quot;</span> <span style="color: #008000;">+</span> PI <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;|&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;)&quot;</span><span style="color: #008000;">;</span>
var versionInfo <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;(?'versionInfo'&quot;</span><span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;version&quot;</span> <span style="color: #008000;">+</span> eq <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;('&quot;</span> <span style="color: #008000;">+</span> versionNum <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;'|<span style="color: #008080; font-weight: bold;">\&quot;</span>&quot;</span> <span style="color: #008000;">+</span> versionNum <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;<span style="color: #008080; font-weight: bold;">\&quot;</span>))&quot;</span><span style="color: #008000;">;</span>
var encName <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;(?'encName'[A-Za-z][A-Za-z0-9._-]*)&quot;</span><span style="color: #008000;">;</span>
var encodingDecl <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;(?'encodingDecl'&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;encoding&quot;</span> <span style="color: #008000;">+</span> eq <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;(<span style="color: #008080; font-weight: bold;">\&quot;</span>&quot;</span> <span style="color: #008000;">+</span> encName <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;<span style="color: #008080; font-weight: bold;">\&quot;</span>|'&quot;</span><span style="color: #008000;">+</span> encName <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;'))&quot;</span><span style="color: #008000;">;</span>
var sddecl <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;(?'sddecl'&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;standalone&quot;</span> <span style="color: #008000;">+</span> eq <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;(<span style="color: #008080; font-weight: bold;">\&quot;</span>(yes|no)<span style="color: #008080; font-weight: bold;">\&quot;</span>|'(yes|no)'))&quot;</span><span style="color: #008000;">;</span>
var xmlDecl <span style="color: #008000;">=</span> <span style="color: #666666;">@&quot;(?'xmlDecl'&lt;\?xml&quot;</span> <span style="color: #008000;">+</span> versionInfo <span style="color: #008000;">+</span> encodingDecl <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;?&quot;</span> <span style="color: #008000;">+</span> sddecl <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;?&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> <span style="color: #666666;">@&quot;?\?&gt;)&quot;</span><span style="color: #008000;">;</span> 
var mixed <span style="color: #008000;">=</span> <span style="color: #666666;">@&quot;(?'mixed'\(&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> <span style="color: #666666;">@&quot;?\#PCDATA&quot;</span> <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;(&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> <span style="color: #666666;">@&quot;?\|&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;?&quot;</span> <span style="color: #008000;">+</span> name <span style="color: #008000;">+</span><span style="color: #666666;">&quot;)*&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;?&quot;</span> <span style="color: #008000;">+</span> <span style="color: #666666;">@&quot;\)\*|\(&quot;</span> <span style="color: #008000;">+</span>s <span style="color: #008000;">+</span> <span style="color: #666666;">@&quot;?\#PCDATA&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> <span style="color: #666666;">@&quot;?\))&quot;</span><span style="color: #008000;">;</span>
var children <span style="color: #008000;">=</span> <span style="color: #666666;">@&quot;(?'children'unsureifpossible)&quot;</span><span style="color: #008000;">;</span>
var contentSpec <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;(?'contentspec'EMPTY|ANY|&quot;</span><span style="color: #008000;">+</span>mixed<span style="color: #008000;">+</span><span style="color: #666666;">&quot;|&quot;</span><span style="color: #008000;">+</span>children<span style="color: #008000;">+</span><span style="color: #666666;">&quot;)&quot;</span><span style="color: #008000;">;</span>
var elementDecl <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;(?'elementdecl'&lt;!ELEMENT&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> name <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> contentSpec <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;?&gt;)&quot;</span><span style="color: #008000;">;</span>
var stringType <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;CDATA&quot;</span><span style="color: #008000;">;</span>
var tokenizedType <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;(ID(REF(S)?)?|ENTIT(Y|IES)|NMTOKENS?)&quot;</span><span style="color: #008000;">;</span>
var notationType <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;(?'notation'NOTATION&quot;</span> <span style="color: #008000;">+</span>s <span style="color: #008000;">+</span> <span style="color: #666666;">@&quot;\(&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;?&quot;</span> <span style="color: #008000;">+</span> name <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;(&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> <span style="color: #666666;">@&quot;?\|&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;?&quot;</span> <span style="color: #008000;">+</span> name <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;)*&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> <span style="color: #666666;">@&quot;?\))&quot;</span><span style="color: #008000;">;</span>
var enumeration <span style="color: #008000;">=</span> <span style="color: #666666;">@&quot;(?'enumeration'\(&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;?&quot;</span> <span style="color: #008000;">+</span> nmtoken <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;(&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> <span style="color: #666666;">@&quot;?\|&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;?&quot;</span> <span style="color: #008000;">+</span> nmtoken <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;)*&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> <span style="color: #666666;">@&quot;?\))&quot;</span><span style="color: #008000;">;</span>
var enumeratedType <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;(?'enumType'&quot;</span> <span style="color: #008000;">+</span> notationType <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;|&quot;</span> <span style="color: #008000;">+</span> enumeration <span style="color: #008000;">+</span><span style="color: #666666;">&quot;)&quot;</span><span style="color: #008000;">;</span>
var attType <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;(?'attType'&quot;</span> <span style="color: #008000;">+</span> stringType <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;|&quot;</span> <span style="color: #008000;">+</span> tokenizedType <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;|&quot;</span> <span style="color: #008000;">+</span> enumeratedType <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;)&quot;</span><span style="color: #008000;">;</span>
var attValue <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;(?'attValue'<span style="color: #008080; font-weight: bold;">\&quot;</span>([^&lt;&amp;<span style="color: #008080; font-weight: bold;">\&quot;</span>]|&quot;</span> <span style="color: #008000;">+</span>reference<span style="color: #008000;">+</span> <span style="color: #666666;">&quot;)*<span style="color: #008080; font-weight: bold;">\&quot;</span>|'([^&lt;&amp;']|&quot;</span> <span style="color: #008000;">+</span> reference <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;)*')&quot;</span><span style="color: #008000;">;</span>
var defaultDecl <span style="color: #008000;">=</span> <span style="color: #666666;">@&quot;(?'defaultDecl'\#REQUIRED|\#IMPLIED|(\#FIXED&quot;</span><span style="color: #008000;">+</span>s<span style="color: #008000;">+</span><span style="color: #666666;">&quot;)?&quot;</span> <span style="color: #008000;">+</span> attValue <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;)&quot;</span><span style="color: #008000;">;</span>
var attDef <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;(?'attDef'&quot;</span><span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> name <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> attType <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> defaultDecl <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;)&quot;</span><span style="color: #008000;">;</span>
var attListDecl <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;(?'attlist'&lt;!ATTLIST&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> name <span style="color: #008000;">+</span> attDef <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;*&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;?&gt;)&quot;</span><span style="color: #008000;">;</span>
var systemLiteral <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;(?'systemLiteral'<span style="color: #008080; font-weight: bold;">\&quot;</span>[^<span style="color: #008080; font-weight: bold;">\&quot;</span>]*<span style="color: #008080; font-weight: bold;">\&quot;</span>|'[^']*')&quot;</span><span style="color: #008000;">;</span>
var pubIdChar <span style="color: #008000;">=</span> <span style="color: #666666;">@&quot;[a-zA-Z0-9'()+,./:=?;!*#@$_%\u0020\u000d\u000a-]&quot;</span><span style="color: #008000;">;</span>
var pubidLiteral <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;(?'pubIdLiteral'<span style="color: #008080; font-weight: bold;">\&quot;</span>&quot;</span> <span style="color: #008000;">+</span> pubIdChar <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;*<span style="color: #008080; font-weight: bold;">\&quot;</span>|'((?!')&quot;</span> <span style="color: #008000;">+</span> pubIdChar <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;)*')&quot;</span><span style="color: #008000;">;</span>
var externalID <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;(?'externalID'SYSTEM&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> systemLiteral <span style="color: #008000;">+</span><span style="color: #666666;">&quot;|PUBLIC&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> pubidLiteral <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> systemLiteral<span style="color: #008000;">+</span><span style="color: #666666;">&quot;)&quot;</span><span style="color: #008000;">;</span>
var nDataDecl <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;(?'ndatadecl'&quot;</span><span style="color: #008000;">+</span>s <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;NDATA&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> name <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;)&quot;</span><span style="color: #008000;">;</span>
var entityDef  <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;(?'entityDef'&quot;</span> <span style="color: #008000;">+</span> entityValue <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;|(&quot;</span> <span style="color: #008000;">+</span>externalID <span style="color: #008000;">+</span> nDataDecl <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;?))&quot;</span><span style="color: #008000;">;</span>
var peDef <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;(?'pedef'&quot;</span> <span style="color: #008000;">+</span> entityValue <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;|&quot;</span>  <span style="color: #008000;">+</span> externalID <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;)&quot;</span><span style="color: #008000;">;</span>
var GEDecl <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;(?'gedecl'&lt;!ENTITY&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> name <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> entityDef <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;?&gt;)&quot;</span><span style="color: #008000;">;</span>
var PEDecl <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;(?'pedecl'&lt;!ENTITY&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;%&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> name <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> peDef <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;?&gt;)&quot;</span><span style="color: #008000;">;</span>
var entityDecl <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;(?'entityDecl'&quot;</span><span style="color: #008000;">+</span> GEDecl <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;|&quot;</span> <span style="color: #008000;">+</span> PEDecl <span style="color: #008000;">+</span><span style="color: #666666;">&quot;)&quot;</span><span style="color: #008000;">;</span>
var publicID <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;(?'publicID'PUBLIC&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> pubidLiteral <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;)&quot;</span><span style="color: #008000;">;</span>
var notationDecl <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;(?'notationDecl'&lt;!NOTATION&quot;</span> <span style="color: #008000;">+</span>  s <span style="color: #008000;">+</span> name <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;(&quot;</span> <span style="color: #008000;">+</span> externalID <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;|&quot;</span> <span style="color: #008000;">+</span> publicID <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;)&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;?&gt;)&quot;</span><span style="color: #008000;">;</span>
var markupDecl <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;(?'markupdecl'&quot;</span> <span style="color: #008000;">+</span> elementDecl <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;|&quot;</span> <span style="color: #008000;">+</span> attListDecl <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;|&quot;</span> <span style="color: #008000;">+</span> entityDecl <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;|&quot;</span> <span style="color: #008000;">+</span> notationDecl <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;|&quot;</span> <span style="color: #008000;">+</span> PI <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;|&quot;</span> <span style="color: #008000;">+</span> comment <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;)&quot;</span><span style="color: #008000;">;</span>
var DeclSep <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;(?'declSep'&quot;</span> <span style="color: #008000;">+</span> pereference <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;|&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;)&quot;</span><span style="color: #008000;">;</span>
var intSubSet <span style="color: #008000;">=</span> <span style="color: #666666;">@&quot;(?'intSubSet'(&quot;</span> <span style="color: #008000;">+</span> markupDecl <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;|&quot;</span> <span style="color: #008000;">+</span> DeclSep <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;)*)&quot;</span><span style="color: #008000;">;</span> 
var docTypeDecl <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;(?'doctypedecl'&lt;!DOCTYPE&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> name <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;(&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> externalID<span style="color: #008000;">+</span> <span style="color: #666666;">&quot;)?&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> <span style="color: #666666;">@&quot;?(\[&quot;</span> <span style="color: #008000;">+</span> intSubSet <span style="color: #008000;">+</span> <span style="color: #666666;">@&quot;\]&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;?)?&gt;)&quot;</span><span style="color: #008000;">;</span> 
var prolog <span style="color: #008000;">=</span> xmlDecl <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;?&quot;</span> <span style="color: #008000;">+</span> misc <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;*(&quot;</span> <span style="color: #008000;">+</span> docTypeDecl <span style="color: #008000;">+</span> misc <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;*)?&quot;</span><span style="color: #008000;">;</span> 
var attribute <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;(?'attribute'&quot;</span> <span style="color: #008000;">+</span>name <span style="color: #008000;">+</span> eq <span style="color: #008000;">+</span> attValue <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;)&quot;</span><span style="color: #008000;">;</span>
var CDSect <span style="color: #008000;">=</span> <span style="color: #666666;">@&quot;(?'CDSect'&lt;!\[CDATA\[((?!\]\]&gt;)&quot;</span><span style="color: #008000;">+</span>c<span style="color: #008000;">+</span><span style="color: #666666;">@&quot;)*\]\]&gt;)&quot;</span><span style="color: #008000;">;</span>
var charData <span style="color: #008000;">=</span> <span style="color: #666666;">@&quot;(((?!\]\]&gt;)[^&lt;&amp;])*)&quot;</span><span style="color: #008000;">;</span>
var content <span style="color: #008000;">=</span> <span style="color: #666666;">@&quot;(?&gt;&quot;</span> <span style="color: #008000;">+</span> <span style="color: #008080; font-style: italic;">// minor optimization: an atomic group, so the engine never backtracks into it (makes failing faster)</span>
		<span style="color: #666666;">@&quot;&lt;(?'openclose'&quot;</span> <span style="color: #008000;">+</span> name <span style="color: #008000;">+</span> <span style="color: #666666;">@&quot;)(&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> attribute <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;)*&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> <span style="color: #666666;">@&quot;?/&gt;|&quot;</span><span style="color: #008000;">+</span>
		<span style="color: #666666;">@&quot;&lt;(?'open'&quot;</span><span style="color: #008000;">+</span> name <span style="color: #008000;">+</span><span style="color: #666666;">@&quot;)(&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> attribute <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;)*&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> <span style="color: #666666;">@&quot;?&gt;|&quot;</span><span style="color: #008000;">+</span>
		<span style="color: #666666;">@&quot;&lt;/(?=\k'open'&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> <span style="color: #666666;">@&quot;?&gt;)(?'close-open'&quot;</span> <span style="color: #008000;">+</span> name <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;)&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span><span style="color: #666666;">@&quot;?&gt;|&quot;</span> <span style="color: #008080; font-style: italic;">// a close tag must repeat the innermost open name; 'close-open' then pops it</span>
		<span style="color: #008000;">+</span>reference<span style="color: #008000;">+</span><span style="color: #666666;">@&quot;|&quot;</span>
		<span style="color: #008000;">+</span>PI<span style="color: #008000;">+</span><span style="color: #666666;">@&quot;|&quot;</span>
		<span style="color: #008000;">+</span>comment<span style="color: #008000;">+</span><span style="color: #666666;">@&quot;|&quot;</span>
		<span style="color: #008000;">+</span>CDSect<span style="color: #008000;">+</span><span style="color: #666666;">@&quot;|&quot;</span>
		<span style="color: #008000;">+</span>charData<span style="color: #008000;">+</span><span style="color: #666666;">@&quot;)*&quot;</span> <span style="color: #008000;">+</span> 
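	<span style="color: #666666;">&quot;(?(open)(?!))&quot;</span><span style="color: #008000;">;</span> <span style="color: #008080; font-style: italic;">// fail if any 'open' capture remains, i.e. a tag was never closed</span>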
var rootElement <span style="color: #008000;">=</span> <span style="color: #666666;">@&quot;(?'root'(&lt;(?'rootName'&quot;</span> <span style="color: #008000;">+</span> name <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;)(&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> attribute <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;)*&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> <span style="color: #666666;">@&quot;?&gt;&quot;</span> <span style="color: #008000;">+</span> content <span style="color: #008000;">+</span> <span style="color: #666666;">@&quot;&lt;/\k'rootName'&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;?&gt;)|(&lt;(?'rootName'&quot;</span> <span style="color: #008000;">+</span> name <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;)(&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> attribute <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;)*&quot;</span> <span style="color: #008000;">+</span> s <span style="color: #008000;">+</span> <span style="color: #666666;">@&quot;?/&gt;))&quot;</span><span style="color: #008000;">;</span>
&nbsp;
var document <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;^&quot;</span> <span style="color: #008000;">+</span> prolog <span style="color: #008000;">+</span> rootElement <span style="color: #008000;">+</span> misc <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;*&quot;</span> <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;$&quot;</span><span style="color: #008000;">;</span>
&nbsp;
var testDoc <span style="color: #008000;">=</span> <span style="color: #666666;">@&quot;&lt;?xml version='1.0' encoding=&quot;</span><span style="color: #666666;">&quot;utf-8&quot;</span><span style="color: #666666;">&quot;?&gt;&lt;!DOCTYPE nothtml []&gt;&lt;items&gt;
	&lt;item available=&quot;</span><span style="color: #666666;">&quot;yes&quot;</span><span style="color: #666666;">&quot; &gt;
		&lt;name&gt; laptop  &lt;/name&gt;
		&lt;![CDATA[something14!$]] 1412]]&gt;
		&lt;&quot;</span><span style="color: #008000;">+</span><span style="color: #666666;">&quot;<span style="color: #008080; font-weight: bold;">\U</span>00010000&quot;</span><span style="color: #008000;">+</span><span style="color: #666666;">@&quot;quantity&gt;  2 &amp;amp; &amp;#121; &amp;#x234f; &lt;/&quot;</span><span style="color: #008000;">+</span><span style="color: #666666;">&quot;<span style="color: #008080; font-weight: bold;">\U</span>00010000&quot;</span><span style="color: #008000;">+</span><span style="color: #666666;">@&quot;quantity&gt;
	&lt;/item&gt;&lt;?notxml?&gt;&quot;</span> <span style="color: #008080; font-style: italic;">/* or &lt;?xml?&gt; here */</span> <span style="color: #008000;">+</span><span style="color: #666666;">@&quot;
	&lt;item available=&quot;</span><span style="color: #666666;">&quot;yes&quot;</span><span style="color: #666666;">&quot; x='' y=&quot;</span><span style="color: #666666;">&quot;&amp;amp;&quot;</span><span style="color: #666666;">&quot;&gt;
		&lt;name&gt; mouse &lt;/name &gt;
		&lt;quantity&gt; 1 &quot;</span> <span style="color: #008000;">+</span> <span style="color: #008080; font-style: italic;">/* or ]]&gt; invalid here */</span>  <span style="color: #666666;">@&quot; &lt;/quantity&gt;
	&lt;/item&gt;
	&lt;item available=&quot;</span><span style="color: #666666;">&quot;no&quot;</span><span style="color: #666666;">&quot; &gt;
		&lt;!----&gt; &lt;!-- --&gt; &lt;!-- - --&gt;&quot;</span> <span style="color: #008000;">+</span> <span style="color: #008080; font-style: italic;">/* or &lt;!-- -- --&gt; here */</span> <span style="color: #666666;">@&quot;
		&lt;name&gt; keyboard &lt;/name&gt;
		&lt;quantity&gt; 0&lt;/quantity&gt;
	&lt;/item&gt;
&lt;/items&gt;&lt;!-- stuff can go here --&gt; &lt;!-- yup --&gt; &lt;?pi aasd as!@*&amp;$^!*@&amp;$!@ ?&gt;&quot;</span><span style="color: #008000;">;</span>
&nbsp;
<span style="color: #008080; font-style: italic;">//Console.WriteLine(document);</span>
Console<span style="color: #008000;">.</span><span style="color: #0000FF;">WriteLine</span><span style="color: #008000;">&#40;</span>Regex<span style="color: #008000;">.</span><span style="color: #0000FF;">Match</span><span style="color: #008000;">&#40;</span>testDoc, document, RegexOptions<span style="color: #008000;">.</span><span style="color: #0000FF;">IgnorePatternWhitespace</span><span style="color: #008000;">|</span>RegexOptions<span style="color: #008000;">.</span><span style="color: #0000FF;">Singleline</span><span style="color: #008000;">|</span>RegexOptions<span style="color: #008000;">.</span><span style="color: #0000FF;">ExplicitCapture</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span></pre></div></div>
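
<p>The element-matching part of <code>content</code> relies on .NET balancing groups, which are easy to lose among the rest of the XML grammar. Here is a minimal sketch of just that idea, assuming a drastically simplified tag name (ASCII letters only) and no attributes, whitespace, or references. <code>BalancedTagsSketch</code>, <code>Name</code> and <code>IsBalanced</code> are names made up for this illustration and are not part of the code above:</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;">using System;
using System.Text.RegularExpressions;

static class BalancedTagsSketch
{
    // Assumption: a tag name is just ASCII letters (the real `name` is far broader).
    const string Name = &quot;[A-Za-z]+&quot;;

    static readonly Regex Balanced = new Regex(
        // Atomic group: once an item has matched, never backtrack into it.
        &quot;^(?&gt;&quot; +
            // Open tag: push its name onto the 'open' stack.
            &quot;&lt;(?'open'&quot; + Name + &quot;)&gt;|&quot; +
            // Close tag: the lookahead requires the innermost open name,
            // and the balancing group 'close-open' then pops it.
            @&quot;&lt;/(?=\k'open'&gt;)(?'close-open'&quot; + Name + &quot;)&gt;|&quot; +
            // Plain character data.
            &quot;[^&lt;]+&quot; +
        &quot;)*&quot; +
        // Fail if anything is still left on the 'open' stack.
        &quot;(?(open)(?!))$&quot;,
        RegexOptions.ExplicitCapture);

    public static bool IsBalanced(string input)
    {
        return Balanced.IsMatch(input);
    }

    static void Main()
    {
        Console.WriteLine(IsBalanced(&quot;&lt;a&gt;&lt;b&gt;text&lt;/b&gt;&lt;/a&gt;&quot;)); // True
        Console.WriteLine(IsBalanced(&quot;&lt;a&gt;&lt;b&gt;text&lt;/a&gt;&lt;/b&gt;&quot;)); // False: tags closed in the wrong order
        Console.WriteLine(IsBalanced(&quot;&lt;a&gt;text&quot;));               // False: 'a' is never closed
    }
}</pre></div></div>

<p>The lookahead is what ties a close tag to the <em>innermost</em> open tag: the balancing group on its own would pop the stack for any name at all.</p>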

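<p><code>CDSect</code> uses another recurring trick: a negative lookahead in front of every consumed character, so the loop can never run past the terminator. A small sketch of that pattern in isolation, assuming <code>.</code> with <code>RegexOptions.Singleline</code> as a stand-in for the <code>c</code> pattern used above (<code>CDataSketch</code> and <code>IsCData</code> are hypothetical names):</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;">using System;
using System.Text.RegularExpressions;

static class CDataSketch
{
    // Each step consumes one character, but only at positions where
    // the terminator ]]&gt; does not begin.
    static readonly Regex CData = new Regex(
        @&quot;^&lt;!\[CDATA\[((?!\]\]&gt;).)*\]\]&gt;$&quot;,
        RegexOptions.Singleline);

    public static bool IsCData(string input)
    {
        return CData.IsMatch(input);
    }

    static void Main()
    {
        // True: the inner &quot;]] &quot; is not the terminator.
        Console.WriteLine(IsCData(&quot;&lt;![CDATA[something14!$]] 1412]]&gt;&quot;));
        // False: the section ends at the first terminator, leaving text behind.
        Console.WriteLine(IsCData(&quot;&lt;![CDATA[a]]&gt;b]]&gt;&quot;));
    }
}</pre></div></div>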
<hr/>
<p>And here&#8217;s the regex (with apologies to <a href="http://www.ex-parrot.com/pdw/Mail-RFC822-Address.html">Mail::RFC822::Address</a>):</p>
<p><code style="white-space:nowrap">^(?'xmlDecl'&lt;\?xml(?'versionInfo'([\u0020\u0009\u000d\u000a]+)version(?'eq'([\u0<br />
020\u0009\u000d\u000a]+)?=([\u0020\u0009\u000d\u000a]+)?)('1\.[0-9]+'|"1\.[0-9]+<br />
"))(?'encodingDecl'([\u0020\u0009\u000d\u000a]+)encoding(?'eq'([\u0020\u0009\u00<br />
0d\u000a]+)?=([\u0020\u0009\u000d\u000a]+)?)("(?'encName'[A-Za-z][A-Za-z0-9._-]*<br />
)"|'(?'encName'[A-Za-z][A-Za-z0-9._-]*)'))?(?'sddecl'([\u0020\u0009\u000d\u000a]<br />
+)standalone(?'eq'([\u0020\u0009\u000d\u000a]+)?=([\u0020\u0009\u000d\u000a]+)?)<br />
("(yes|no)"|'(yes|no)'))?([\u0020\u0009\u000d\u000a]+)?\?>)?(?'misc'(?'comment'&lt;<br />
!--((?!--)([\u0009\u000a\u000d\u0020-\ud7ff\ue000-\ufffd]|([\ud800-\udbff][\udc0<br />
0-\udfff])))*-->)|(?'PI'&lt;\?(?'pitarget'(?![xX][mM][lL])(?'name'([:A-Z_a-z\u00C0-<br />
\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u<br />
218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc0<br />
0-\udfff]))(([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F<br />
-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\<br />
uFFFD]|([\ud800-\udbff][\udc00-\udfff]))|[-.0-9\u00B7\u0300-\u036F\u203F-\u2040]<br />
)*))(([\u0020\u0009\u000d\u000a]+)((?!\?>)([\u0009\u000a\u000d\u0020-\ud7ff\ue00<br />
0-\ufffd]|([\ud800-\udbff][\udc00-\udfff])))*)?\?>)|([\u0020\u0009\u000d\u000a]+<br />
))*((?'doctypedecl'&lt;!DOCTYPE([\u0020\u0009\u000d\u000a]+)(?'name'([:A-Z_a-z\u00C<br />
0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-<br />
\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\ud<br />
c00-\udfff]))(([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u03<br />
7F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0<br />
-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))|[-.0-9\u00B7\u0300-\u036F\u203F-\u204<br />
0])*)(([\u0020\u0009\u000d\u000a]+)(?'externalID'SYSTEM([\u0020\u0009\u000d\u000<br />
a]+)(?'systemLiteral'"[^"]*"|'[^']*')|PUBLIC([\u0020\u0009\u000d\u000a]+)(?'pubI<br />
dLiteral'"[a-zA-Z0-9'()+,./:=?;!*#@$_%\u0020\u000d\u000a-]*"|'((?!')[a-zA-Z0-9'(<br />
)+,./:=?;!*#@$_%\u0020\u000d\u000a-])*')([\u0020\u0009\u000d\u000a]+)(?'systemLi<br />
teral'"[^"]*"|'[^']*')))?([\u0020\u0009\u000d\u000a]+)?(\[(?'intSubSet'((?'marku<br />
pdecl'(?'elementdecl'&lt;!ELEMENT([\u0020\u0009\u000d\u000a]+)(?'name'([:A-Z_a-z\u0<br />
0C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u207<br />
0-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\<br />
udc00-\udfff]))(([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u<br />
037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFD<br />
F0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))|[-.0-9\u00B7\u0300-\u036F\u203F-\u2<br />
040])*)([\u0020\u0009\u000d\u000a]+)(?'contentspec'EMPTY|ANY|(?'mixed'\(([\u0020<br />
\u0009\u000d\u000a]+)?\#PCDATA(([\u0020\u0009\u000d\u000a]+)?\|([\u0020\u0009\u0<br />
00d\u000a]+)?(?'name'([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u0<br />
37D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDC<br />
F\uFDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))(([:A-Z_a-z\u00C0-\u00D6\u00D8-<br />
\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u<br />
2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))|[<br />
-.0-9\u00B7\u0300-\u036F\u203F-\u2040])*))*([\u0020\u0009\u000d\u000a]+)?\)\*|\(<br />
([\u0020\u0009\u000d\u000a]+)?\#PCDATA([\u0020\u0009\u000d\u000a]+)?\))|(?'child<br />
ren'unsureifpossible))([\u0020\u0009\u000d\u000a]+)?>)|(?'attlist'&lt;!ATTLIST([\u0<br />
020\u0009\u000d\u000a]+)(?'name'([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02F<br />
F\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\<br />
uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))(([:A-Z_a-z\u00C0-\u<br />
00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u21<br />
8F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc00-<br />
\udfff]))|[-.0-9\u00B7\u0300-\u036F\u203F-\u2040])*)(?'attDef'([\u0020\u0009\u00<br />
0d\u000a]+)(?'name'([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037<br />
D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\<br />
uFDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))(([:A-Z_a-z\u00C0-\u00D6\u00D8-\u<br />
00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2F<br />
EF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))|[-.<br />
0-9\u00B7\u0300-\u036F\u203F-\u2040])*)([\u0020\u0009\u000d\u000a]+)(?'attType'C<br />
DATA|(ID(REF(S)?)?|ENTIT(Y|IES)|NMTOKENS?)|(?'enumType'(?'notation'NOTATION([\u0<br />
020\u0009\u000d\u000a]+)\(([\u0020\u0009\u000d\u000a]+)?(?'name'([:A-Z_a-z\u00C0<br />
-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\<br />
u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc<br />
00-\udfff]))(([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037<br />
F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-<br />
\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))|[-.0-9\u00B7\u0300-\u036F\u203F-\u2040<br />
])*)(([\u0020\u0009\u000d\u000a]+)?\|([\u0020\u0009\u000d\u000a]+)?(?'name'([:A-<br />
Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u2<br />
00D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\<br />
udbff][\udc00-\udfff]))(([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-<br />
\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\u<br />
FDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))|[-.0-9\u00B7\u0300-\u036F\u<br />
203F-\u2040])*))*([\u0020\u0009\u000d\u000a]+)?\))|(?'enumeration'\(([\u0020\u00<br />
09\u000d\u000a]+)?(?'nmtoken'(([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\<br />
u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF<br />
900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))|[-.0-9\u00B7\u0300-\u<br />
036F\u203F-\u2040])+)(([\u0020\u0009\u000d\u000a]+)?\|([\u0020\u0009\u000d\u000a<br />
]+)?(?'nmtoken'(([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u<br />
037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFD<br />
F0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))|[-.0-9\u00B7\u0300-\u036F\u203F-\u2<br />
040])+))*([\u0020\u0009\u000d\u000a]+)?\))))([\u0020\u0009\u000d\u000a]+)(?'defa<br />
ultDecl'\#REQUIRED|\#IMPLIED|(\#FIXED([\u0020\u0009\u000d\u000a]+))?(?'attValue'<br />
"([^&lt;&amp;"]|(?'reference'(?'entityRef'&amp;(?'name'([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6<br />
\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u<br />
3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))(([:A-Z_<br />
a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200<br />
D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\ud<br />
bff][\udc00-\udfff]))|[-.0-9\u00B7\u0300-\u036F\u203F-\u2040])*);())|&amp;\#([0-9]+|<br />
x[0-9a-fA-F]+);()))*"|'([^&lt;&amp;']|(?'reference'(?'entityRef'&amp;(?'name'([:A-Z_a-z\u00<br />
C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070<br />
-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\u<br />
dc00-\udfff]))(([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u0<br />
37F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF<br />
0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))|[-.0-9\u00B7\u0300-\u036F\u203F-\u20<br />
40])*);())|&amp;\#([0-9]+|x[0-9a-fA-F]+);()))*')))*([\u0020\u0009\u000d\u000a]+)?>)|<br />
(?'entityDecl'(?'gedecl'&lt;!ENTITY([\u0020\u0009\u000d\u000a]+)(?'name'([:A-Z_a-z\<br />
u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2<br />
070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff]<br />
[\udc00-\udfff]))(([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D<br />
\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\u<br />
FDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))|[-.0-9\u00B7\u0300-\u036F\u203F-\<br />
u2040])*)([\u0020\u0009\u000d\u000a]+)(?'entityDef'(?'entityValue'"([^%&amp;"]|%(?'n<br />
ame'([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\<br />
u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|(<br />
[\ud800-\udbff][\udc00-\udfff]))(([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02<br />
FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF<br />
\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))|[-.0-9\u00B7\u0300<br />
-\u036F\u203F-\u2040])*);|(?'reference'(?'entityRef'&amp;(?'name'([:A-Z_a-z\u00C0-\u<br />
00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u21<br />
8F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc00-<br />
\udfff]))(([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\<br />
u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uF<br />
FFD]|([\ud800-\udbff][\udc00-\udfff]))|[-.0-9\u00B7\u0300-\u036F\u203F-\u2040])*<br />
);())|&amp;\#([0-9]+|x[0-9a-fA-F]+);()))*"|'([^%&amp;']|%(?'name'([:A-Z_a-z\u00C0-\u00D6<br />
\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u<br />
2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udf<br />
ff]))(([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FF<br />
F\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]<br />
|([\ud800-\udbff][\udc00-\udfff]))|[-.0-9\u00B7\u0300-\u036F\u203F-\u2040])*);|(<br />
?'reference'(?'entityRef'&amp;(?'name'([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u0<br />
2FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7F<br />
F\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))(([:A-Z_a-z\u00C0-<br />
\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u<br />
218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc0<br />
0-\udfff]))|[-.0-9\u00B7\u0300-\u036F\u203F-\u2040])*);())|&amp;\#([0-9]+|x[0-9a-fA-<br />
F]+);()))*')|((?'externalID'SYSTEM([\u0020\u0009\u000d\u000a]+)(?'systemLiteral'<br />
"[^"]*"|'[^']*')|PUBLIC([\u0020\u0009\u000d\u000a]+)(?'pubIdLiteral'"[a-zA-Z0-9'<br />
()+,./:=?;!*#@$_%\u0020\u000d\u000a-]*"|'((?!')[a-zA-Z0-9'()+,./:=?;!*#@$_%\u002<br />
0\u000d\u000a-])*')([\u0020\u0009\u000d\u000a]+)(?'systemLiteral'"[^"]*"|'[^']*'<br />
))(?'ndatadecl'([\u0020\u0009\u000d\u000a]+)NDATA([\u0020\u0009\u000d\u000a]+)(?<br />
'name'([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FF<br />
F\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]<br />
|([\ud800-\udbff][\udc00-\udfff]))(([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u<br />
02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7<br />
FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))|[-.0-9\u00B7\u03<br />
00-\u036F\u203F-\u2040])*))?))([\u0020\u0009\u000d\u000a]+)?>)|(?'gedecl'&lt;!ENTIT<br />
Y([\u0020\u0009\u000d\u000a]+)%([\u0020\u0009\u000d\u000a]+)(?'name'([:A-Z_a-z\u<br />
00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u20<br />
70-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff][<br />
\udc00-\udfff]))(([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\<br />
u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uF<br />
DF0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))|[-.0-9\u00B7\u0300-\u036F\u203F-\u<br />
2040])*)([\u0020\u0009\u000d\u000a]+)(?'pedef'(?'entityValue'"([^%&amp;"]|%(?'name'(<br />
[:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C<br />
-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud8<br />
00-\udbff][\udc00-\udfff]))(([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0<br />
370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF90<br />
0-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))|[-.0-9\u00B7\u0300-\u03<br />
6F\u203F-\u2040])*);|(?'reference'(?'entityRef'&amp;(?'name'([:A-Z_a-z\u00C0-\u00D6\<br />
u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2<br />
C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udff<br />
f]))(([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF<br />
\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|<br />
([\ud800-\udbff][\udc00-\udfff]))|[-.0-9\u00B7\u0300-\u036F\u203F-\u2040])*);())<br />
|&amp;\#([0-9]+|x[0-9a-fA-F]+);()))*"|'([^%&amp;']|%(?'name'([:A-Z_a-z\u00C0-\u00D6\u00D<br />
8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-<br />
\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))<br />
(([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u20<br />
0C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\u<br />
d800-\udbff][\udc00-\udfff]))|[-.0-9\u00B7\u0300-\u036F\u203F-\u2040])*);|(?'ref<br />
erence'(?'entityRef'&amp;(?'name'([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u<br />
0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF9<br />
00-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))(([:A-Z_a-z\u00C0-\u00D<br />
6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\<br />
u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc00-\ud<br />
fff]))|[-.0-9\u00B7\u0300-\u036F\u203F-\u2040])*);())|&amp;\#([0-9]+|x[0-9a-fA-F]+);<br />
()))*')|(?'externalID'SYSTEM([\u0020\u0009\u000d\u000a]+)(?'systemLiteral'"[^"]*<br />
"|'[^']*')|PUBLIC([\u0020\u0009\u000d\u000a]+)(?'pubIdLiteral'"[a-zA-Z0-9'()+,./<br />
:=?;!*#@$_%\u0020\u000d\u000a-]*"|'((?!')[a-zA-Z0-9'()+,./:=?;!*#@$_%\u0020\u000<br />
d\u000a-])*')([\u0020\u0009\u000d\u000a]+)(?'systemLiteral'"[^"]*"|'[^']*')))([\<br />
u0020\u0009\u000d\u000a]+)?>))|(?'notationDecl'&lt;!NOTATION([\u0020\u0009\u000d\u0<br />
00a]+)(?'name'([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u03<br />
7F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0<br />
-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))(([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\<br />
u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3<br />
001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))|[-.0-9\u<br />
00B7\u0300-\u036F\u203F-\u2040])*)([\u0020\u0009\u000d\u000a]+)((?'externalID'SY<br />
STEM([\u0020\u0009\u000d\u000a]+)(?'systemLiteral'"[^"]*"|'[^']*')|PUBLIC([\u002<br />
0\u0009\u000d\u000a]+)(?'pubIdLiteral'"[a-zA-Z0-9'()+,./:=?;!*#@$_%\u0020\u000d\<br />
u000a-]*"|'((?!')[a-zA-Z0-9'()+,./:=?;!*#@$_%\u0020\u000d\u000a-])*')([\u0020\u0<br />
009\u000d\u000a]+)(?'systemLiteral'"[^"]*"|'[^']*'))|(?'publicID'PUBLIC([\u0020\<br />
u0009\u000d\u000a]+)(?'pubIdLiteral'"[a-zA-Z0-9'()+,./:=?;!*#@$_%\u0020\u000d\u0<br />
00a-]*"|'((?!')[a-zA-Z0-9'()+,./:=?;!*#@$_%\u0020\u000d\u000a-])*')))([\u0020\u0<br />
009\u000d\u000a]+)?>)|(?'PI'&lt;\?(?'pitarget'(?![xX][mM][lL])(?'name'([:A-Z_a-z\u0<br />
0C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u207<br />
0-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\<br />
udc00-\udfff]))(([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u<br />
037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFD<br />
F0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))|[-.0-9\u00B7\u0300-\u036F\u203F-\u2<br />
040])*))(([\u0020\u0009\u000d\u000a]+)((?!\?>)([\u0009\u000a\u000d\u0020-\ud7ff\<br />
ue000-\ufffd]|([\ud800-\udbff][\udc00-\udfff])))*)?\?>)|(?'comment'&lt;!--((?!--)([<br />
\u0009\u000a\u000d\u0020-\ud7ff\ue000-\ufffd]|([\ud800-\udbff][\udc00-\udfff])))<br />
*-->))|(?'declSep'%(?'name'([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u03<br />
70-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900<br />
-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))(([:A-Z_a-z\u00C0-\u00D6\<br />
u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2<br />
C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udff<br />
f]))|[-.0-9\u00B7\u0300-\u036F\u203F-\u2040])*);|([\u0020\u0009\u000d\u000a]+)))<br />
*)\]([\u0020\u0009\u000d\u000a]+)?)?>)(?'misc'(?'comment'&lt;!--((?!--)([\u0009\u00<br />
0a\u000d\u0020-\ud7ff\ue000-\ufffd]|([\ud800-\udbff][\udc00-\udfff])))*-->)|(?'P<br />
I'&lt;\?(?'pitarget'(?![xX][mM][lL])(?'name'([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u0<br />
0F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u300<br />
1-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))(([:A-Z_a-z<br />
\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u<br />
2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff<br />
][\udc00-\udfff]))|[-.0-9\u00B7\u0300-\u036F\u203F-\u2040])*))(([\u0020\u0009\u0<br />
00d\u000a]+)((?!\?>)([\u0009\u000a\u000d\u0020-\ud7ff\ue000-\ufffd]|([\ud800-\ud<br />
bff][\udc00-\udfff])))*)?\?>)|([\u0020\u0009\u000d\u000a]+))*)?(?'root'(&lt;(?'root<br />
Name'(?'name'([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037<br />
F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-<br />
\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))(([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u<br />
00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u30<br />
01-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))|[-.0-9\u0<br />
0B7\u0300-\u036F\u203F-\u2040])*))(([\u0020\u0009\u000d\u000a]+)(?'attribute'(?'<br />
name'([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF<br />
\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|<br />
([\ud800-\udbff][\udc00-\udfff]))(([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u0<br />
2FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7F<br />
F\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))|[-.0-9\u00B7\u030<br />
0-\u036F\u203F-\u2040])*)(?'eq'([\u0020\u0009\u000d\u000a]+)?=([\u0020\u0009\u00<br />
0d\u000a]+)?)(?'attValue'"([^&lt;&amp;"]|(?'reference'(?'entityRef'&amp;(?'name'([:A-Z_a-z\<br />
u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2<br />
070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff]<br />
[\udc00-\udfff]))(([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D<br />
\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\u<br />
FDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))|[-.0-9\u00B7\u0300-\u036F\u203F-\<br />
u2040])*);())|&amp;\#([0-9]+|x[0-9a-fA-F]+);()))*"|'([^&lt;&amp;']|(?'reference'(?'entityRe<br />
f'&amp;(?'name'([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-<br />
\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\u<br />
FFFD]|([\ud800-\udbff][\udc00-\udfff]))(([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00<br />
F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001<br />
-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))|[-.0-9\u00B<br />
7\u0300-\u036F\u203F-\u2040])*);())|&amp;\#([0-9]+|x[0-9a-fA-F]+);()))*')))*([\u0020<br />
\u0009\u000d\u000a]+)?>(?>&lt;(?'openclose'(?'name'([:A-Z_a-z\u00C0-\u00D6\u00D8-\u<br />
00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2F<br />
EF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))(([:<br />
A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\<br />
u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800<br />
-\udbff][\udc00-\udfff]))|[-.0-9\u00B7\u0300-\u036F\u203F-\u2040])*))(([\u0020\u<br />
0009\u000d\u000a]+)(?'attribute'(?'name'([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00<br />
F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001<br />
-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))(([:A-Z_a-z\<br />
u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2<br />
070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff]<br />
[\udc00-\udfff]))|[-.0-9\u00B7\u0300-\u036F\u203F-\u2040])*)(?'eq'([\u0020\u0009<br />
\u000d\u000a]+)?=([\u0020\u0009\u000d\u000a]+)?)(?'attValue'"([^&lt;&amp;"]|(?'referenc<br />
e'(?'entityRef'&amp;(?'name'([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-<br />
\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\u<br />
FDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))(([:A-Z_a-z\u00C0-\u00D6\u00<br />
D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00<br />
-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff])<br />
)|[-.0-9\u00B7\u0300-\u036F\u203F-\u2040])*);())|&amp;\#([0-9]+|x[0-9a-fA-F]+);()))*<br />
"|'([^&lt;&amp;']|(?'reference'(?'entityRef'&amp;(?'name'([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00<br />
F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF<br />
\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))(([:A-<br />
Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u2<br />
00D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\<br />
udbff][\udc00-\udfff]))|[-.0-9\u00B7\u0300-\u036F\u203F-\u2040])*);())|&amp;\#([0-9]<br />
+|x[0-9a-fA-F]+);()))*')))*([\u0020\u0009\u000d\u000a]+)?/>|&lt;(?'open'(?'name'([:<br />
A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\<br />
u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800<br />
-\udbff][\udc00-\udfff]))(([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u037<br />
0-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-<br />
\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))|[-.0-9\u00B7\u0300-\u036F<br />
\u203F-\u2040])*))(([\u0020\u0009\u000d\u000a]+)(?'attribute'(?'name'([:A-Z_a-z\<br />
u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2<br />
070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff]<br />
[\udc00-\udfff]))(([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D<br />
\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\u<br />
FDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))|[-.0-9\u00B7\u0300-\u036F\u203F-\<br />
u2040])*)(?'eq'([\u0020\u0009\u000d\u000a]+)?=([\u0020\u0009\u000d\u000a]+)?)(?'<br />
attValue'"([^&lt;&amp;"]|(?'reference'(?'entityRef'&amp;(?'name'([:A-Z_a-z\u00C0-\u00D6\u00<br />
D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00<br />
-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff])<br />
)(([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u2<br />
00C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\<br />
ud800-\udbff][\udc00-\udfff]))|[-.0-9\u00B7\u0300-\u036F\u203F-\u2040])*);())|&amp;\<br />
#([0-9]+|x[0-9a-fA-F]+);()))*"|'([^&lt;&amp;']|(?'reference'(?'entityRef'&amp;(?'name'([:A-<br />
Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u2<br />
00D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\<br />
udbff][\udc00-\udfff]))(([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-<br />
\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\u<br />
FDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))|[-.0-9\u00B7\u0300-\u036F\u<br />
203F-\u2040])*);())|&amp;\#([0-9]+|x[0-9a-fA-F]+);()))*')))*([\u0020\u0009\u000d\u00<br />
0a]+)?>|&lt;/(?=\k'open'([\u0020\u0009\u000d\u000a]+)?>)(?'close-open'(?'name'([:A-<br />
Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u2<br />
00D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\<br />
udbff][\udc00-\udfff]))(([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-<br />
\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\u<br />
FDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))|[-.0-9\u00B7\u0300-\u036F\u<br />
203F-\u2040])*))([\u0020\u0009\u000d\u000a]+)?>|(?'reference'(?'entityRef'&amp;(?'na<br />
me'([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u<br />
200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([<br />
\ud800-\udbff][\udc00-\udfff]))(([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02F<br />
F\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\<br />
uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))|[-.0-9\u00B7\u0300-<br />
\u036F\u203F-\u2040])*);())|&amp;\#([0-9]+|x[0-9a-fA-F]+);())|(?'PI'&lt;\?(?'pitarget'(<br />
?![xX][mM][lL])(?'name'([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\<br />
u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uF<br />
DCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))(([:A-Z_a-z\u00C0-\u00D6\u00D<br />
8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-<br />
\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))<br />
|[-.0-9\u00B7\u0300-\u036F\u203F-\u2040])*))(([\u0020\u0009\u000d\u000a]+)((?!\?<br />
>)([\u0009\u000a\u000d\u0020-\ud7ff\ue000-\ufffd]|([\ud800-\udbff][\udc00-\udfff<br />
])))*)?\?>)|(?'comment'&lt;!--((?!--)([\u0009\u000a\u000d\u0020-\ud7ff\ue000-\ufffd<br />
]|([\ud800-\udbff][\udc00-\udfff])))*-->)|(?'CDSect'&lt;!\[CDATA\[((?!\]\]>)([\u000<br />
9\u000a\u000d\u0020-\ud7ff\ue000-\ufffd]|([\ud800-\udbff][\udc00-\udfff])))*\]\]<br />
>)|(((?!\]\]>)[^&lt;&amp;])*))*(?(open)(?!))&lt;/\k'rootName'([\u0020\u0009\u000d\u000a]+)<br />
?>)|(&lt;(?'rootName'(?'name'([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u037<br />
0-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-<br />
\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))(([:A-Z_a-z\u00C0-\u00D6\u<br />
00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C<br />
00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff<br />
]))|[-.0-9\u00B7\u0300-\u036F\u203F-\u2040])*))(([\u0020\u0009\u000d\u000a]+)(?'<br />
attribute'(?'name'([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D<br />
\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\u<br />
FDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))(([:A-Z_a-z\u00C0-\u00D6\u00D8-\u0<br />
0F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FE<br />
F\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))|[-.0<br />
-9\u00B7\u0300-\u036F\u203F-\u2040])*)(?'eq'([\u0020\u0009\u000d\u000a]+)?=([\u0<br />
020\u0009\u000d\u000a]+)?)(?'attValue'"([^&lt;&amp;"]|(?'reference'(?'entityRef'&amp;(?'nam<br />
e'([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u2<br />
00C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\<br />
ud800-\udbff][\udc00-\udfff]))(([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF<br />
\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\u<br />
F900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))|[-.0-9\u00B7\u0300-\<br />
u036F\u203F-\u2040])*);())|&amp;\#([0-9]+|x[0-9a-fA-F]+);()))*"|'([^&lt;&amp;']|(?'referenc<br />
e'(?'entityRef'&amp;(?'name'([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-<br />
\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\u<br />
FDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))(([:A-Z_a-z\u00C0-\u00D6\u00<br />
D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00<br />
-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff])<br />
)|[-.0-9\u00B7\u0300-\u036F\u203F-\u2040])*);())|&amp;\#([0-9]+|x[0-9a-fA-F]+);()))*<br />
')))*([\u0020\u0009\u000d\u000a]+)?/>))(?'misc'(?'comment'&lt;!--((?!--)([\u0009\u0<br />
00a\u000d\u0020-\ud7ff\ue000-\ufffd]|([\ud800-\udbff][\udc00-\udfff])))*-->)|(?'<br />
PI'&lt;\?(?'pitarget'(?![xX][mM][lL])(?'name'([:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u<br />
00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u30<br />
01-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbff][\udc00-\udfff]))(([:A-Z_a-<br />
z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\<br />
u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]|([\ud800-\udbf<br />
f][\udc00-\udfff]))|[-.0-9\u00B7\u0300-\u036F\u203F-\u2040])*))(([\u0020\u0009\u<br />
000d\u000a]+)((?!\?>)([\u0009\u000a\u000d\u0020-\ud7ff\ue000-\ufffd]|([\ud800-\u<br />
dbff][\udc00-\udfff])))*)?\?>)|([\u0020\u0009\u000d\u000a]+))*$</code></p>
]]></content:encoded>
			<wfw:commentRss>http://porg.es/blog/so-it-turns-out-that-dot-nets-regex-are-more-powerful-than-i-originally-thought/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Ridiculous UTF-8 character counting</title>
		<link>http://porg.es/blog/ridiculous-utf-8-character-counting</link>
		<comments>http://porg.es/blog/ridiculous-utf-8-character-counting#comments</comments>
		<pubDate>Thu, 05 Jun 2008 14:46:39 +0000</pubDate>
		<dc:creator>Porges</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[C]]></category>
		<category><![CDATA[fast]]></category>
		<category><![CDATA[horrid]]></category>
		<category><![CDATA[optimization]]></category>
		<category><![CDATA[overengineered]]></category>
		<category><![CDATA[silly]]></category>
		<category><![CDATA[simd]]></category>
		<category><![CDATA[sse]]></category>
		<category><![CDATA[strlen]]></category>
		<category><![CDATA[stupid]]></category>

		<guid isPermaLink="false">http://porg.es/blog/?p=131</guid>
		<description><![CDATA[So, Colin Percival has posted a UTF-8 strlen which improves on my previous post. While his code runs slightly slower than mine on my PC, I assume that’s because his code is aimed at a 64-bit architecture. With 32-bits (reading 4 bytes at a time, instead of 8 ) it doesn’t quite get the same [...]]]></description>
			<content:encoded><![CDATA[<p>So, Colin Percival has <a href="http://www.daemonology.net/blog/2008-06-05-faster-utf8-strlen.html">posted a UTF-8 <code>strlen</code></a> which improves on my previous post. While his code runs slightly slower than mine on my PC, I assume that’s because his code is aimed at a 64-bit architecture. With 32 bits (reading 4 bytes at a time rather than 8 bytes) it doesn’t quite get the same speedup.</p>
<p>That said, the vectorization code is <i>clearly</i> an improvement on mine, so let’s take that ball and run with it!</p>
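<p>For reference, here is a minimal scalar sketch of what all of these routines compute, assuming the usual approach of counting continuation bytes (those of the form <code>10xxxxxx</code>) and subtracting them from the byte length. The name <code>utf8_strlen_scalar</code> is made up for illustration; this is not the code from either post, and it assumes valid, NUL-terminated UTF-8.</p>

```c
#include <stddef.h>

/* Hypothetical helper (name is an assumption, not from the post):
   counts UTF-8 characters one byte at a time. */
size_t utf8_strlen_scalar(const char *str)
{
    const unsigned char *s = (const unsigned char *)str;
    size_t bytes = 0;        /* total bytes before the terminating NUL */
    size_t continuation = 0; /* bytes of the form 10xxxxxx */

    for (; s[bytes] != '\0'; bytes++) {
        /* Keep the top two bits; a continuation byte has them as 10. */
        if ((s[bytes] & 0xc0) == 0x80)
            continuation++;
    }

    /* Every byte that is not a continuation byte starts a character. */
    return bytes - continuation;
}
```

<p>Called on <code>"na\xc3\xafve"</code> (six bytes, one of them a continuation byte) this returns 5. The vectorized versions do the same test on many bytes at once.</p>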
<h3>The Code</h3>
<p>Now we use SIMD instructions to vectorize the counting of characters. I modified this from Colin’s routine, and I’m sure he has some bit-fiddling up his sleeve that would make this run even faster <img src="http://porg.es/blog/wp-content/plugins/wp-smiley-switcher/noktahhitam/icon_razz.gif" alt="" /></p>
<p>As it is, I used a straightforward algorithm to extract the information.</p>
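<p>The listing that follows indexes a 65536-entry <code>HammingWeight</code> table which it notes is initialized elsewhere. The table is that large because the byte mask produced for a 16-byte row has 16 bits, one per byte. A minimal sketch of one way to fill such a table, assuming each entry should hold the number of set bits in its index; the function name <code>InitHammingWeight</code> is an assumption and not from the original code:</p>

```c
#include <stddef.h>

/* Same name and size as the table declared in the listing. */
unsigned char HammingWeight[65536];

/* Hypothetical initializer (the name is an assumption): stores in each
   entry the number of set bits in its index. */
void InitHammingWeight(void)
{
    size_t i;

    HammingWeight[0] = 0;
    for (i = 1; i < 65536; i++) {
        /* popcount(i) = popcount(i / 2) + lowest bit of i */
        HammingWeight[i] = (unsigned char)(HammingWeight[i >> 1] + (i & 1));
    }
}
```

<p>This would need to be called once before the first call to the counting routine.</p>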

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;"><span style="color: #339933;">#define GetMask(x) __builtin_ia32_pmovmskb128(x)</span>
<span style="color: #339933;">#define LoadBytes(x) __builtin_ia32_loaddqu(x)</span>
<span style="color: #339933;">#define CompareEquality(x,y) __builtin_ia32_pcmpeqb128((x),(y))</span>
<span style="color: #339933;">#define Or(x,y) __builtin_ia32_por128((x),(y))</span>
<span style="color: #339933;">#define NotExpected(x) __builtin_expect((x),0)</span>
<span style="color: #339933;">#define And(x,y) __builtin_ia32_pand128((x),(y))</span>
&nbsp;
<span style="color: #993333;">typedef</span> <span style="color: #993333;">unsigned</span> <span style="color: #993333;">char</span> v16qi __attribute__ <span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span>vector_size<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">16</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #993333;">const</span> <span style="color: #993333;">char</span> mask<span style="color: #009900;">&#91;</span><span style="color: #0000dd;">16</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #208080;">0xc0</span><span style="color: #339933;">,</span> <span style="color: #208080;">0xc0</span><span style="color: #339933;">,</span> <span style="color: #208080;">0xc0</span><span style="color: #339933;">,</span> <span style="color: #208080;">0xc0</span><span style="color: #339933;">,</span>
    <span style="color: #208080;">0xc0</span><span style="color: #339933;">,</span> <span style="color: #208080;">0xc0</span><span style="color: #339933;">,</span> <span style="color: #208080;">0xc0</span><span style="color: #339933;">,</span> <span style="color: #208080;">0xc0</span><span style="color: #339933;">,</span>
    <span style="color: #208080;">0xc0</span><span style="color: #339933;">,</span> <span style="color: #208080;">0xc0</span><span style="color: #339933;">,</span> <span style="color: #208080;">0xc0</span><span style="color: #339933;">,</span> <span style="color: #208080;">0xc0</span><span style="color: #339933;">,</span>
    <span style="color: #208080;">0xc0</span><span style="color: #339933;">,</span> <span style="color: #208080;">0xc0</span><span style="color: #339933;">,</span> <span style="color: #208080;">0xc0</span><span style="color: #339933;">,</span> <span style="color: #208080;">0xc0</span>
<span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span>
<span style="color: #993333;">const</span> <span style="color: #993333;">char</span> match<span style="color: #009900;">&#91;</span><span style="color: #0000dd;">16</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #208080;">0x80</span><span style="color: #339933;">,</span> <span style="color: #208080;">0x80</span><span style="color: #339933;">,</span> <span style="color: #208080;">0x80</span><span style="color: #339933;">,</span> <span style="color: #208080;">0x80</span><span style="color: #339933;">,</span>
    <span style="color: #208080;">0x80</span><span style="color: #339933;">,</span> <span style="color: #208080;">0x80</span><span style="color: #339933;">,</span> <span style="color: #208080;">0x80</span><span style="color: #339933;">,</span> <span style="color: #208080;">0x80</span><span style="color: #339933;">,</span>
    <span style="color: #208080;">0x80</span><span style="color: #339933;">,</span> <span style="color: #208080;">0x80</span><span style="color: #339933;">,</span> <span style="color: #208080;">0x80</span><span style="color: #339933;">,</span> <span style="color: #208080;">0x80</span><span style="color: #339933;">,</span>
    <span style="color: #208080;">0x80</span><span style="color: #339933;">,</span> <span style="color: #208080;">0x80</span><span style="color: #339933;">,</span> <span style="color: #208080;">0x80</span><span style="color: #339933;">,</span> <span style="color: #208080;">0x80</span>
<span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span>
<span style="color: #993333;">const</span> <span style="color: #993333;">char</span> zero<span style="color: #009900;">&#91;</span><span style="color: #0000dd;">16</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#123;</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span> <span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span>
<span style="color: #993333;">unsigned</span> <span style="color: #993333;">char</span> HammingWeight<span style="color: #009900;">&#91;</span><span style="color: #0000dd;">65536</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span> <span style="color: #666666; font-style: italic;">//initialized elsewhere</span>
&nbsp;
<span style="color: #993333;">static</span> size_t cp_strlen_utf8_sse2<span style="color: #009900;">&#40;</span><span style="color: #993333;">const</span> <span style="color: #993333;">char</span> <span style="color: #339933;">*</span>_s<span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
    <span style="color: #993333;">const</span> <span style="color: #993333;">char</span> <span style="color: #339933;">*</span>s<span style="color: #339933;">;</span>
    <span style="color: #993333;">const</span> v16qi allZero <span style="color: #339933;">=</span> LoadBytes<span style="color: #009900;">&#40;</span>zero<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #993333;">const</span> v16qi masking <span style="color: #339933;">=</span> LoadBytes<span style="color: #009900;">&#40;</span>mask<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #993333;">const</span> v16qi matching <span style="color: #339933;">=</span> LoadBytes<span style="color: #009900;">&#40;</span>match<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    v16qi row<span style="color: #339933;">;</span>
    size_t count <span style="color: #339933;">=</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
    <span style="color: #993333;">unsigned</span> <span style="color: #993333;">char</span> b<span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #666666; font-style: italic;">// handle leading bytes one at a time until s is 16-byte aligned</span>
    <span style="color: #b1b100;">for</span> <span style="color: #009900;">&#40;</span>s <span style="color: #339933;">=</span> _s<span style="color: #339933;">;</span> <span style="color: #009900;">&#40;</span><span style="color: #993333;">uintptr_t</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#40;</span>s<span style="color: #009900;">&#41;</span> <span style="color: #339933;">&amp;</span> <span style="color: #009900;">&#40;</span><span style="color: #993333;">sizeof</span><span style="color: #009900;">&#40;</span>v16qi<span style="color: #009900;">&#41;</span> <span style="color: #339933;">-</span> <span style="color: #0000dd;">1</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> s<span style="color: #339933;">++</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        b <span style="color: #339933;">=</span> <span style="color: #339933;">*</span>s<span style="color: #339933;">;</span>
        <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>b <span style="color: #339933;">==</span> <span style="color: #ff0000;">'<span style="color: #006699; font-weight: bold;">\0</span>'</span><span style="color: #009900;">&#41;</span>
            <span style="color: #b1b100;">goto</span> done<span style="color: #339933;">;</span>
        <span style="color: #808080; font-style: italic;">/* Add 1 for a continuation byte (10xxxxxx): bit 7 set, bit 6 clear. */</span>
        count <span style="color: #339933;">+=</span> <span style="color: #009900;">&#40;</span>b <span style="color: #339933;">&gt;&gt;</span> <span style="color: #0000dd;">7</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">&amp;</span> <span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span>~b<span style="color: #009900;">&#41;</span> <span style="color: #339933;">&gt;&gt;</span> <span style="color: #0000dd;">6</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
&nbsp;
    <span style="color: #808080; font-style: italic;">/* Handle complete blocks. */</span>
    <span style="color: #b1b100;">for</span> <span style="color: #009900;">&#40;</span><span style="color: #339933;">;;</span> s <span style="color: #339933;">+=</span> <span style="color: #993333;">sizeof</span><span style="color: #009900;">&#40;</span>v16qi<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        <span style="color: #808080; font-style: italic;">/* Prefetch */</span>
        __builtin_prefetch<span style="color: #009900;">&#40;</span><span style="color: #339933;">&amp;</span>s<span style="color: #009900;">&#91;</span><span style="color: #0000dd;">256</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
        <span style="color: #808080; font-style: italic;">/* Load Bytes */</span>
        row <span style="color: #339933;">=</span> LoadBytes<span style="color: #009900;">&#40;</span>s<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
        <span style="color: #808080; font-style: italic;">/* Expect this to be false :) */</span>
        <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>NotExpected<span style="color: #009900;">&#40;</span>GetMask<span style="color: #009900;">&#40;</span>
                                   <span style="color: #808080; font-style: italic;">/* Check for zero bytes */</span>
                                      CompareEquality<span style="color: #009900;">&#40;</span>allZero<span style="color: #339933;">,</span> row<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>
            <span style="color: #000000; font-weight: bold;">break</span><span style="color: #339933;">;</span>
&nbsp;
        <span style="color: #808080; font-style: italic;">/* Count number of non-starter bytes */</span>
&nbsp;
        row <span style="color: #339933;">=</span> And<span style="color: #009900;">&#40;</span>row<span style="color: #339933;">,</span> masking<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        row <span style="color: #339933;">=</span> CompareEquality<span style="color: #009900;">&#40;</span>row<span style="color: #339933;">,</span> matching<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        count <span style="color: #339933;">+=</span> HammingWeight<span style="color: #009900;">&#91;</span>GetMask<span style="color: #009900;">&#40;</span>row<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
&nbsp;
    <span style="color: #808080; font-style: italic;">/* Handle leftover bytes. */</span>
    <span style="color: #b1b100;">for</span> <span style="color: #009900;">&#40;</span><span style="color: #339933;">;;</span> s<span style="color: #339933;">++</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        b <span style="color: #339933;">=</span> <span style="color: #339933;">*</span>s<span style="color: #339933;">;</span>
        <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>b <span style="color: #339933;">==</span> <span style="color: #ff0000;">'<span style="color: #006699; font-weight: bold;">\0</span>'</span><span style="color: #009900;">&#41;</span>
            <span style="color: #000000; font-weight: bold;">break</span><span style="color: #339933;">;</span>
        count <span style="color: #339933;">+=</span> <span style="color: #009900;">&#40;</span>b <span style="color: #339933;">&gt;&gt;</span> <span style="color: #0000dd;">7</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">&amp;</span> <span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span>~b<span style="color: #009900;">&#41;</span> <span style="color: #339933;">&gt;&gt;</span> <span style="color: #0000dd;">6</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
&nbsp;
  done<span style="color: #339933;">:</span>
    <span style="color: #b1b100;">return</span> <span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span>s <span style="color: #339933;">-</span> _s<span style="color: #009900;">&#41;</span> <span style="color: #339933;">-</span> count<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<h3>Results</h3>
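<p>This counts about twice as fast as GCC/libc’s standard, non-UTF-8 <code>strlen</code>: compare <code>cp_strlen_utf8_sse2</code> with <code>gcc_strlen</code> in the timings below. The first five lines show each function’s result for a short test string, in the same order as the functions appear in the timings. Note the discrepancies between my timings of Colin’s code and his own tests. Damn thee, 32 bits!</p>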
<pre><code>"": 0 0 0 0 0 0 0
"hello, world": 12 12 12 12 12 12 12
"naïve": 6 6 6 5 5 5 5
"こんにちは": 15 15 15 5 5 5 5
"abcdefghijklmnopqrstuvwxyzβ": 28 28 28 27 27 27 27
testing 33554424 bytes of repeated "hello, world":
                      gcc_strlen =   33554424: 0.019331 +/- 0.001076
                      kjs_strlen =   33554424: 0.035095 +/- 0.000530
                       cp_strlen =   33554424: 0.021472 +/- 0.000310
                 kjs_strlen_utf8 =   33554424: 0.070260 +/- 0.000240
                  gp_strlen_utf8 =   33554424: 0.035144 +/- 0.000471
                  cp_strlen_utf8 =   33554424: 0.050539 +/- 0.000342
             cp_strlen_utf8_sse2 =   33554424: 0.010297 +/- 0.001551
testing 33554430 bytes of repeated "naïve":
                      gcc_strlen =   33554430: 0.019176 +/- 0.000824
                      kjs_strlen =   33554430: 0.035090 +/- 0.000478
                       cp_strlen =   33554430: 0.021472 +/- 0.000323
                 kjs_strlen_utf8 =   27962025: 0.070347 +/- 0.000354
                  gp_strlen_utf8 =   27962025: 0.054802 +/- 0.000299
                  cp_strlen_utf8 =   27962025: 0.050595 +/- 0.000602
             cp_strlen_utf8_sse2 =   27962025: 0.010011 +/- 0.001453
testing 33554430 bytes of repeated "こんにちは":
                      gcc_strlen =   33554430: 0.019331 +/- 0.000836
                      kjs_strlen =   33554430: 0.035225 +/- 0.000411
                       cp_strlen =   33554430: 0.021429 +/- 0.000309
                 kjs_strlen_utf8 =   11184810: 0.070249 +/- 0.000312
                  gp_strlen_utf8 =   11184810: 0.026545 +/- 0.000621
                  cp_strlen_utf8 =   11184810: 0.050512 +/- 0.000273
             cp_strlen_utf8_sse2 =   11184810: 0.010246 +/- 0.001466
testing 33554416 bytes of repeated "abcdefghijklmnopqrstuvwxyzβ":
                      gcc_strlen =   33554416: 0.019308 +/- 0.001091
                      kjs_strlen =   33554416: 0.035070 +/- 0.000486
                       cp_strlen =   33554416: 0.021441 +/- 0.000289
                 kjs_strlen_utf8 =   32356044: 0.070287 +/- 0.000297
                  gp_strlen_utf8 =   32356044: 0.043681 +/- 0.000429
                  cp_strlen_utf8 =   32356044: 0.050402 +/- 0.000204
             cp_strlen_utf8_sse2 =   32356044: 0.010407 +/- 0.001371</code></pre>
]]></content:encoded>
			<wfw:commentRss>http://porg.es/blog/ridiculous-utf-8-character-counting/feed</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Overengineering</title>
		<link>http://porg.es/blog/overengineering</link>
		<comments>http://porg.es/blog/overengineering#comments</comments>
		<pubDate>Wed, 27 Feb 2008 00:55:09 +0000</pubDate>
		<dc:creator>Porges</dc:creator>
				<category><![CDATA[replies]]></category>
		<category><![CDATA[fizzbuzz]]></category>
		<category><![CDATA[Haskell]]></category>
		<category><![CDATA[horrid]]></category>
		<category><![CDATA[humour]]></category>
		<category><![CDATA[overengineered]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://porg.es/blog/overengineering</guid>
		<description><![CDATA[Dougal, you&#8217;re not alone. import Data.List &#40;sortBy&#41; import Data.Function &#40;on&#41; import Data.Maybe &#40;mapMaybe&#41; import Control.Monad.Instances &#160; gizzabuzz pairs combiner = zipWith &#40;$&#41; &#40;cycle funcs&#41; &#91;1..&#93; where sortedPairs = sortBy &#40;compare `on` fst&#41; pairs funcs = map &#40;\n -&#62; display $ mapMaybe &#40;filterOut n&#41; sortedPairs&#41; &#91;1..foldr1 lcm $ map fst $ sortedPairs&#93; display &#91;&#93; = show [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.dougalstanton.net/blog/index.php/2008/02/26/my-shame-is-complete">Dougal</a>, you&#8217;re not alone.</p>

<div class="wp_syntax"><div class="code"><pre class="haskell" style="font-family:monospace;"><span style="color: #06c; font-weight: bold;">import</span> Data<span style="color: #339933; font-weight: bold;">.</span>List <span style="color: green;">&#40;</span>sortBy<span style="color: green;">&#41;</span>
<span style="color: #06c; font-weight: bold;">import</span> Data<span style="color: #339933; font-weight: bold;">.</span>Function <span style="color: green;">&#40;</span>on<span style="color: green;">&#41;</span>
<span style="color: #06c; font-weight: bold;">import</span> Data<span style="color: #339933; font-weight: bold;">.</span><span style="color: #cccc00; font-weight: bold;">Maybe</span> <span style="color: green;">&#40;</span>mapMaybe<span style="color: green;">&#41;</span>
<span style="color: #06c; font-weight: bold;">import</span> Control<span style="color: #339933; font-weight: bold;">.</span><span style="color: #cccc00; font-weight: bold;">Monad</span><span style="color: #339933; font-weight: bold;">.</span>Instances
&nbsp;
gizzabuzz pairs combiner <span style="color: #339933; font-weight: bold;">=</span> <span style="font-weight: bold;">zipWith</span> <span style="color: green;">&#40;</span><span style="color: #339933; font-weight: bold;">$</span><span style="color: green;">&#41;</span> <span style="color: green;">&#40;</span><span style="font-weight: bold;">cycle</span> funcs<span style="color: green;">&#41;</span> <span style="color: green;">&#91;</span><span style="color: red;">1</span><span style="color: #339933; font-weight: bold;">..</span><span style="color: green;">&#93;</span>
	<span style="color: #06c; font-weight: bold;">where</span> 
	sortedPairs <span style="color: #339933; font-weight: bold;">=</span> sortBy <span style="color: green;">&#40;</span><span style="font-weight: bold;">compare</span> `on` <span style="font-weight: bold;">fst</span><span style="color: green;">&#41;</span> pairs
	funcs <span style="color: #339933; font-weight: bold;">=</span> <span style="font-weight: bold;">map</span> <span style="color: green;">&#40;</span>\n <span style="color: #339933; font-weight: bold;">-&gt;</span> display <span style="color: #339933; font-weight: bold;">$</span> mapMaybe <span style="color: green;">&#40;</span>filterOut n<span style="color: green;">&#41;</span> sortedPairs<span style="color: green;">&#41;</span> <span style="color: green;">&#91;</span><span style="color: red;">1</span><span style="color: #339933; font-weight: bold;">..</span><span style="font-weight: bold;">foldr1</span> <span style="font-weight: bold;">lcm</span> <span style="color: #339933; font-weight: bold;">$</span> <span style="font-weight: bold;">map</span> <span style="font-weight: bold;">fst</span> <span style="color: #339933; font-weight: bold;">$</span> sortedPairs<span style="color: green;">&#93;</span>
	display <span style="color: green;">&#91;</span><span style="color: green;">&#93;</span> <span style="color: #339933; font-weight: bold;">=</span> <span style="font-weight: bold;">show</span>
	display xs <span style="color: #339933; font-weight: bold;">=</span> <span style="font-weight: bold;">foldr1</span> combiner <span style="color: #339933; font-weight: bold;">.</span> <span style="font-weight: bold;">sequence</span> <span style="color: green;">&#40;</span><span style="font-weight: bold;">map</span> <span style="font-weight: bold;">const</span> xs<span style="color: green;">&#41;</span>
	filterOut n <span style="color: green;">&#40;</span>x<span style="color: #339933; font-weight: bold;">,</span>y<span style="color: green;">&#41;</span>
		<span style="color: #339933; font-weight: bold;">|</span> n `<span style="font-weight: bold;">mod</span>` x <span style="color: #339933; font-weight: bold;">==</span> <span style="color: red;">0</span> <span style="color: #339933; font-weight: bold;">=</span> Just y
		<span style="color: #339933; font-weight: bold;">|</span> <span style="font-weight: bold;">otherwise</span>      <span style="color: #339933; font-weight: bold;">=</span> Nothing
&nbsp;
fizzbuzz <span style="color: #339933; font-weight: bold;">=</span> gizzabuzz <span style="color: green;">&#91;</span><span style="color: green;">&#40;</span><span style="color: red;">3</span><span style="color: #339933; font-weight: bold;">,</span><span style="background-color: #3cb371;">&quot;Fizz&quot;</span><span style="color: green;">&#41;</span><span style="color: #339933; font-weight: bold;">,</span><span style="color: green;">&#40;</span><span style="color: red;">5</span><span style="color: #339933; font-weight: bold;">,</span><span style="background-color: #3cb371;">&quot;Buzz&quot;</span><span style="color: green;">&#41;</span><span style="color: green;">&#93;</span> <span style="color: green;">&#40;</span><span style="color: #339933; font-weight: bold;">++</span><span style="color: green;">&#41;</span></pre></div></div>

]]></content:encoded>
			<wfw:commentRss>http://porg.es/blog/overengineering/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

