<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>porges &#187; C#</title>
	<atom:link href="http://porg.es/blog/tag/c/feed" rel="self" type="application/rss+xml" />
	<link>http://porg.es/blog</link>
	<description>... master of none</description>
	<lastBuildDate>Sun, 22 Aug 2010 21:36:08 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Sometimes it&#8217;s all too much&#8230;</title>
		<link>http://porg.es/blog/sometimes-its-all-too-much</link>
		<comments>http://porg.es/blog/sometimes-its-all-too-much#comments</comments>
		<pubDate>Mon, 22 Jun 2009 13:19:54 +0000</pubDate>
		<dc:creator>Porges</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[annoyances]]></category>
		<category><![CDATA[builtins]]></category>
		<category><![CDATA[C#]]></category>
		<category><![CDATA[floating-point]]></category>
		<category><![CDATA[GCC]]></category>
		<category><![CDATA[processor]]></category>

		<guid isPermaLink="false">http://porg.es/blog/?p=365</guid>
		<description><![CDATA[Argh #include &#60;stdio.h&#62; #include &#60;math.h&#62; #include &#60;fenv.h&#62; &#160; int main &#40;&#41; &#123; // don't set rounding here double ten0 = sin&#40;pow&#40;10.0,22&#41;&#41;; &#160; fesetround&#40;FE_DOWNWARD&#41;; double ten1 = sin&#40;pow&#40;10.0,22&#41;&#41;; &#160; fesetround&#40;FE_UPWARD&#41;; double ten2 = sin&#40;pow&#40;10.0,22&#41;&#41;; &#160; fesetround&#40;FE_TONEAREST&#41;; double ten3 = sin&#40;pow&#40;10.0,22&#41;&#41;; &#160; fesetround&#40;FE_TOWARDZERO&#41;; double ten4 = sin&#40;pow&#40;10.0,22&#41;&#41;; &#160; printf&#40; &#34;Default: %f\n&#34; &#34;Downward: %f\n&#34; &#34;Upward: %f\n&#34; &#34;ToNearest: %f\n&#34; [...]]]></description>
			<content:encoded><![CDATA[<p>Argh <img src="http://porg.es/blog/wp-content/plugins/wp-smiley-switcher/noktahhitam/icon_sad.gif" alt="" /></p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;"><span style="color: #339933;">#include &lt;stdio.h&gt;</span>
<span style="color: #339933;">#include &lt;math.h&gt;</span>
<span style="color: #339933;">#include &lt;fenv.h&gt;</span>
&nbsp;
<span style="color: #993333;">int</span> main <span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
	<span style="color: #666666; font-style: italic;">// don't set rounding here</span>
	<span style="color: #993333;">double</span> ten0 <span style="color: #339933;">=</span> sin<span style="color: #009900;">&#40;</span>pow<span style="color: #009900;">&#40;</span><span style="color:#800080;">10.0</span><span style="color: #339933;">,</span><span style="color: #0000dd;">22</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
	fesetround<span style="color: #009900;">&#40;</span>FE_DOWNWARD<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #993333;">double</span> ten1 <span style="color: #339933;">=</span> sin<span style="color: #009900;">&#40;</span>pow<span style="color: #009900;">&#40;</span><span style="color:#800080;">10.0</span><span style="color: #339933;">,</span><span style="color: #0000dd;">22</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
	fesetround<span style="color: #009900;">&#40;</span>FE_UPWARD<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #993333;">double</span> ten2 <span style="color: #339933;">=</span> sin<span style="color: #009900;">&#40;</span>pow<span style="color: #009900;">&#40;</span><span style="color:#800080;">10.0</span><span style="color: #339933;">,</span><span style="color: #0000dd;">22</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
	fesetround<span style="color: #009900;">&#40;</span>FE_TONEAREST<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #993333;">double</span> ten3 <span style="color: #339933;">=</span> sin<span style="color: #009900;">&#40;</span>pow<span style="color: #009900;">&#40;</span><span style="color:#800080;">10.0</span><span style="color: #339933;">,</span><span style="color: #0000dd;">22</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
	fesetround<span style="color: #009900;">&#40;</span>FE_TOWARDZERO<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #993333;">double</span> ten4 <span style="color: #339933;">=</span> sin<span style="color: #009900;">&#40;</span>pow<span style="color: #009900;">&#40;</span><span style="color:#800080;">10.0</span><span style="color: #339933;">,</span><span style="color: #0000dd;">22</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
	<span style="color: #000066;">printf</span><span style="color: #009900;">&#40;</span>
		<span style="color: #ff0000;">&quot;Default:    %f<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span>
		<span style="color: #ff0000;">&quot;Downward:   %f<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span>
		<span style="color: #ff0000;">&quot;Upward:     %f<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span>
		<span style="color: #ff0000;">&quot;ToNearest:  %f<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span>
		<span style="color: #ff0000;">&quot;TowardZero: %f<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #339933;">,</span>
		ten0<span style="color: #339933;">,</span> ten1<span style="color: #339933;">,</span>
		ten2<span style="color: #339933;">,</span> ten3<span style="color: #339933;">,</span> ten4<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
	<span style="color: #b1b100;">return</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>


<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">$ <span style="color: #c20cb9; font-weight: bold;">gcc</span> test.c <span style="color: #660033;">-lm</span> <span style="color: #660033;">-fno-builtin</span> <span style="color: #000000; font-weight: bold;">&amp;&amp;</span> .<span style="color: #000000; font-weight: bold;">/</span>a.out
Default:    <span style="color: #000000;">0.462613</span>
Downward:   <span style="color: #000000;">0.986580</span>
Upward:     <span style="color: #000000;">0.462613</span>
ToNearest:  <span style="color: #000000;">0.462613</span>
TowardZero: <span style="color: #000000;">0.986580</span>
$ <span style="color: #c20cb9; font-weight: bold;">gcc</span> test.c <span style="color: #660033;">-lm</span>  <span style="color: #000000; font-weight: bold;">&amp;&amp;</span> .<span style="color: #000000; font-weight: bold;">/</span>a.out
Default:    -<span style="color: #000000;">0.852201</span>
Downward:   -<span style="color: #000000;">0.852201</span>
Upward:     -<span style="color: #000000;">0.852201</span>
ToNearest:  -<span style="color: #000000;">0.852201</span>
TowardZero: -<span style="color: #000000;">0.852201</span></pre></div></div>

]]></content:encoded>
			<wfw:commentRss>http://porg.es/blog/sometimes-its-all-too-much/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Things I&#8217;d like to see in C#: Conditional interface implementation</title>
		<link>http://porg.es/blog/things-id-like-to-see-in-c-conditional-interface-implementation</link>
		<comments>http://porg.es/blog/things-id-like-to-see-in-c-conditional-interface-implementation#comments</comments>
		<pubDate>Sat, 20 Sep 2008 02:54:29 +0000</pubDate>
		<dc:creator>Porges</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[C#]]></category>
		<category><![CDATA[csharp]]></category>
		<category><![CDATA[Haskell]]></category>
		<category><![CDATA[idea]]></category>

		<guid isPermaLink="false">http://porg.es/blog/?p=171</guid>
		<description><![CDATA[To explain this, imagine you are designing a type Wrapper&#60;T&#62;, which is just a simple wrapper around a type T. class Wrapper&#60;T&#62; &#123; T value; public Wrapper&#40;T theObject&#41; &#123; value = theObject; &#125; // Some other methods... &#125; Now, since this is just a wrapper around some type T, we would like to implement [...]]]></description>
			<content:encoded><![CDATA[<p>To explain this, imagine you are designing a type <code>Wrapper&lt;T&gt;</code>, which is just a simple wrapper around a type <code>T</code>.</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;"><span style="color: #FF0000;">class</span> Wrapper<span style="color: #008000;">&lt;</span>T<span style="color: #008000;">&gt;</span>
<span style="color: #000000;">&#123;</span>
    T value<span style="color: #008000;">;</span>
    <span style="color: #0600FF;">public</span> Wrapper<span style="color: #000000;">&#40;</span>T theObject<span style="color: #000000;">&#41;</span>
    <span style="color: #000000;">&#123;</span>
        value <span style="color: #008000;">=</span> theObject<span style="color: #008000;">;</span>
    <span style="color: #000000;">&#125;</span>
    <span style="color: #008080; font-style: italic;">// Some other methods...</span>
<span style="color: #000000;">&#125;</span></pre></div></div>

<p>Now, since this is just a wrapper around some type <code>T</code>, we would like it to implement some simple interfaces whenever the wrapped type does. For example, if <code>T</code> is comparable, we would like the wrapper class to implement <code>IComparable&lt;Wrapper&lt;T&gt;&gt;</code>. However, this is not possible in C#. In Haskell we would have something like this:</p>

<div class="wp_syntax"><div class="code"><pre class="haskell" style="font-family:monospace;"><span style="color: #06c; font-weight: bold;">instance</span> <span style="color: green;">&#40;</span>Disposable a<span style="color: green;">&#41;</span> <span style="color: #339933; font-weight: bold;">=&gt;</span> Disposable <span style="color: green;">&#40;</span>Wrapper a<span style="color: green;">&#41;</span> <span style="color: #06c; font-weight: bold;">where</span> <span style="color: #339933; font-weight: bold;">...</span></pre></div></div>

<p>This says &#8220;if <code>a</code> is a type that is an instance of <code>Disposable</code>, then the type <code>Wrapper a</code> is also an instance of <code>Disposable</code>”. In C#, I&#8217;d like to see something like this:</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;"><span style="color: #FF0000;">class</span> Wrapper<span style="color: #008000;">&lt;</span>T<span style="color: #008000;">&gt;</span>
    <span style="color: #008000;">:</span> IComparable<span style="color: #008000;">&lt;</span>Wrapper<span style="color: #008000;">&lt;</span>T<span style="color: #008000;">&gt;&gt;</span>
        when T <span style="color: #008000;">:</span> IComparable<span style="color: #008000;">&lt;</span>T<span style="color: #008000;">&gt;</span> <span style="color: #008080; font-style: italic;">//this is the syntax extension</span>
<span style="color: #000000;">&#123;</span>
    <span style="color: #008080; font-style: italic;">//...</span>
<span style="color: #000000;">&#125;</span></pre></div></div>

<p>Then, in any methods that the interface specifies (in this case <code>int CompareTo(Wrapper&lt;T&gt; other)</code>), it is assumed that for any object of type <code>T</code> you have access to all the members that <code>T</code> has when it implements <code>IComparable&lt;T&gt;</code>.</p>
<p>At the moment the best you can do is to <em>always</em> implement <code>IComparable&lt;Wrapper&lt;T&gt;&gt;</code> and just throw a runtime exception when <code>T</code> doesn&#8217;t implement <code>IComparable&lt;T&gt;</code>, which isn&#8217;t very nice.</p>
<p>A simple idea, but it would add great power to C#.</p>
<p>After some (very quick) research I haven&#8217;t found anything that suggests anyone else has attempted to get this into C#, but there is <a href="http://homepages.cwi.nl/~ralf/JavaGI/paper.pdf">JavaGI</a> (see section 2.4 for the equivalent of what this post talks about) where the authors have extended Java to deal with what they term ‘generalized interfaces’—a concept which alters Java (rather radically) to make interface implementations similar to Haskell&#8217;s typeclasses.</p>
]]></content:encoded>
			<wfw:commentRss>http://porg.es/blog/things-id-like-to-see-in-c-conditional-interface-implementation/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>An Anachronistic Aphorism</title>
		<link>http://porg.es/blog/n2550</link>
		<comments>http://porg.es/blog/n2550#comments</comments>
		<pubDate>Sat, 07 Jun 2008 05:19:09 +0000</pubDate>
		<dc:creator>Porges</dc:creator>
				<category><![CDATA[commentary]]></category>
		<category><![CDATA[C#]]></category>
		<category><![CDATA[inflamatory]]></category>
		<category><![CDATA[short]]></category>

		<guid isPermaLink="false">http://porg.es/blog/?p=132</guid>
		<description><![CDATA[Some programming languages manage to absorb change, but withstand progress. —‘Epigrams in Programming’, Alan J. Perlis, ACM SIGPLAN Sept. 1982]]></description>
			<content:encoded><![CDATA[<blockquote><p><a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2550.pdf">Some programming languages manage to absorb change, but withstand progress.</a></p></blockquote>
<p>—‘Epigrams in Programming’, Alan J. Perlis, ACM SIGPLAN Sept. 1982</p>
]]></content:encoded>
			<wfw:commentRss>http://porg.es/blog/n2550/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Ridiculous UTF-8 character counting</title>
		<link>http://porg.es/blog/ridiculous-utf-8-character-counting</link>
		<comments>http://porg.es/blog/ridiculous-utf-8-character-counting#comments</comments>
		<pubDate>Thu, 05 Jun 2008 14:46:39 +0000</pubDate>
		<dc:creator>Porges</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[C#]]></category>
		<category><![CDATA[fast]]></category>
		<category><![CDATA[horrid]]></category>
		<category><![CDATA[optimization]]></category>
		<category><![CDATA[overengineered]]></category>
		<category><![CDATA[silly]]></category>
		<category><![CDATA[simd]]></category>
		<category><![CDATA[sse]]></category>
		<category><![CDATA[strlen]]></category>
		<category><![CDATA[stupid]]></category>

		<guid isPermaLink="false">http://porg.es/blog/?p=131</guid>
		<description><![CDATA[So, Colin Percival has posted a UTF-8 strlen which improves on my previous post. While his code runs slightly slower than mine on my PC, I assume that’s because his code is aimed at a 64-bit architecture. With 32 bits (reading 4 bytes at a time instead of 8) it doesn’t quite get the same [...]]]></description>
			<content:encoded><![CDATA[<p>So, Colin Percival has <a href="http://www.daemonology.net/blog/2008-06-05-faster-utf8-strlen.html">posted a UTF-8 <code>strlen</code></a> which improves on my previous post. While his code runs slightly slower than mine on my PC, I assume that’s because his code is aimed at a 64-bit architecture. With 32-bits (reading 4 bytes at a time, instead of 8 ) it doesn’t quite get the same speed up.</p>
<p>That said, the vectorization code is <i>clearly</i> an improvement on mine, so let’s take that ball and run with it!</p>
<h3>The Code</h3>
<p>Now we use SIMD instructions to vectorize the counting of characters. I modified this from Colin’s routine, and I’m sure he has some bit-fiddling up his sleeves that would make this run even faster <img src="http://porg.es/blog/wp-content/plugins/wp-smiley-switcher/noktahhitam/icon_razz.gif" alt="" /></p>
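<p>As it is, I used a straightforward algorithm to extract the information: mask each byte with <code>0xc0</code>, compare the result against <code>0x80</code> to flag continuation bytes, then count the flagged bytes by looking up the resulting 16-bit mask in a precomputed bit-count table.</p>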
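<p>The vectorized routine counts continuation bytes (those of the form <code>10xxxxxx</code>) and subtracts them from the byte length. Here is a minimal scalar sketch of that rule, assuming valid UTF-8 input. The name <code>utf8_strlen_scalar</code> is hypothetical and not part of the original code:</p>

```c
#include <stddef.h>

/* Scalar sketch (hypothetical helper, not from the original post):
 * code points = total bytes - continuation bytes.
 * A continuation byte has its top two bits equal to 10. */
static size_t utf8_strlen_scalar(const char *s)
{
    size_t bytes = 0;
    size_t continuation = 0;

    for (; s[bytes] != '\0'; bytes++) {
        unsigned char b = (unsigned char)s[bytes];
        /* (b & 0xC0) == 0x80 is the same test the SSE2 code
         * performs on 16 bytes at once. */
        continuation += (b & 0xC0) == 0x80;
    }
    return bytes - continuation;
}
```

<p>Called on the 6-byte UTF-8 encoding of "naïve", this returns 5.</p>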

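<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;"><span style="color: #339933;">#include &lt;stddef.h&gt;</span> <span style="color: #808080; font-style: italic;">/* size_t */</span>
<span style="color: #339933;">#include &lt;stdint.h&gt;</span> <span style="color: #808080; font-style: italic;">/* uintptr_t */</span>
&nbsp;
<span style="color: #339933;">#define GetMask(x) __builtin_ia32_pmovmskb128(x)</span>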
<span style="color: #339933;">#define LoadBytes(x) __builtin_ia32_loaddqu(x)</span>
<span style="color: #339933;">#define CompareEquality(x,y) __builtin_ia32_pcmpeqb128((x),(y))</span>
<span style="color: #339933;">#define Or(x,y) __builtin_ia32_por128((x),(y))</span>
<span style="color: #339933;">#define NotExpected(x) __builtin_expect((x),0)</span>
<span style="color: #339933;">#define And(x,y) __builtin_ia32_pand128((x),(y))</span>
&nbsp;
<span style="color: #993333;">typedef</span> <span style="color: #993333;">unsigned</span> <span style="color: #993333;">char</span> v16qi __attribute__ <span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span>vector_size<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">16</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #993333;">const</span> <span style="color: #993333;">char</span> mask<span style="color: #009900;">&#91;</span><span style="color: #0000dd;">16</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #208080;">0xc0</span><span style="color: #339933;">,</span> <span style="color: #208080;">0xc0</span><span style="color: #339933;">,</span> <span style="color: #208080;">0xc0</span><span style="color: #339933;">,</span> <span style="color: #208080;">0xc0</span><span style="color: #339933;">,</span>
    <span style="color: #208080;">0xc0</span><span style="color: #339933;">,</span> <span style="color: #208080;">0xc0</span><span style="color: #339933;">,</span> <span style="color: #208080;">0xc0</span><span style="color: #339933;">,</span> <span style="color: #208080;">0xc0</span><span style="color: #339933;">,</span>
    <span style="color: #208080;">0xc0</span><span style="color: #339933;">,</span> <span style="color: #208080;">0xc0</span><span style="color: #339933;">,</span> <span style="color: #208080;">0xc0</span><span style="color: #339933;">,</span> <span style="color: #208080;">0xc0</span><span style="color: #339933;">,</span>
    <span style="color: #208080;">0xc0</span><span style="color: #339933;">,</span> <span style="color: #208080;">0xc0</span><span style="color: #339933;">,</span> <span style="color: #208080;">0xc0</span><span style="color: #339933;">,</span> <span style="color: #208080;">0xc0</span>
<span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span>
<span style="color: #993333;">const</span> <span style="color: #993333;">char</span> match<span style="color: #009900;">&#91;</span><span style="color: #0000dd;">16</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #208080;">0x80</span><span style="color: #339933;">,</span> <span style="color: #208080;">0x80</span><span style="color: #339933;">,</span> <span style="color: #208080;">0x80</span><span style="color: #339933;">,</span> <span style="color: #208080;">0x80</span><span style="color: #339933;">,</span>
    <span style="color: #208080;">0x80</span><span style="color: #339933;">,</span> <span style="color: #208080;">0x80</span><span style="color: #339933;">,</span> <span style="color: #208080;">0x80</span><span style="color: #339933;">,</span> <span style="color: #208080;">0x80</span><span style="color: #339933;">,</span>
    <span style="color: #208080;">0x80</span><span style="color: #339933;">,</span> <span style="color: #208080;">0x80</span><span style="color: #339933;">,</span> <span style="color: #208080;">0x80</span><span style="color: #339933;">,</span> <span style="color: #208080;">0x80</span><span style="color: #339933;">,</span>
    <span style="color: #208080;">0x80</span><span style="color: #339933;">,</span> <span style="color: #208080;">0x80</span><span style="color: #339933;">,</span> <span style="color: #208080;">0x80</span><span style="color: #339933;">,</span> <span style="color: #208080;">0x80</span>
<span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span>
<span style="color: #993333;">const</span> <span style="color: #993333;">char</span> zero<span style="color: #009900;">&#91;</span><span style="color: #0000dd;">16</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#123;</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span> <span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span>
<span style="color: #993333;">unsigned</span> <span style="color: #993333;">char</span> HammingWeight<span style="color: #009900;">&#91;</span><span style="color: #0000dd;">65536</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span> <span style="color: #666666; font-style: italic;">//initialized elsewhere</span>
&nbsp;
<span style="color: #993333;">static</span> size_t cp_strlen_utf8_sse2<span style="color: #009900;">&#40;</span><span style="color: #993333;">const</span> <span style="color: #993333;">char</span> <span style="color: #339933;">*</span>_s<span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
    <span style="color: #993333;">const</span> <span style="color: #993333;">char</span> <span style="color: #339933;">*</span>s<span style="color: #339933;">;</span>
    <span style="color: #993333;">const</span> v16qi allZero <span style="color: #339933;">=</span> LoadBytes<span style="color: #009900;">&#40;</span>zero<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #993333;">const</span> v16qi masking <span style="color: #339933;">=</span> LoadBytes<span style="color: #009900;">&#40;</span>mask<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #993333;">const</span> v16qi matching <span style="color: #339933;">=</span> LoadBytes<span style="color: #009900;">&#40;</span>match<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    v16qi row<span style="color: #339933;">;</span>
    size_t count <span style="color: #339933;">=</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
    <span style="color: #993333;">unsigned</span> <span style="color: #993333;">char</span> b<span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #666666; font-style: italic;">// unaligned bytes</span>
    <span style="color: #b1b100;">for</span> <span style="color: #009900;">&#40;</span>s <span style="color: #339933;">=</span> _s<span style="color: #339933;">;</span> <span style="color: #009900;">&#40;</span>uintptr_t<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#40;</span>s<span style="color: #009900;">&#41;</span> <span style="color: #339933;">&amp;</span> <span style="color: #009900;">&#40;</span><span style="color: #993333;">sizeof</span><span style="color: #009900;">&#40;</span>v16qi<span style="color: #009900;">&#41;</span> <span style="color: #339933;">-</span> <span style="color: #0000dd;">1</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> s<span style="color: #339933;">++</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        b <span style="color: #339933;">=</span> <span style="color: #339933;">*</span>s<span style="color: #339933;">;</span>
        <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>b <span style="color: #339933;">==</span> <span style="color: #ff0000;">'<span style="color: #006699; font-weight: bold;">\0</span>'</span><span style="color: #009900;">&#41;</span>
            <span style="color: #b1b100;">goto</span> done<span style="color: #339933;">;</span>
        count <span style="color: #339933;">+=</span> <span style="color: #009900;">&#40;</span>b <span style="color: #339933;">&gt;&gt;</span> <span style="color: #0000dd;">7</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">&amp;</span> <span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span>~b<span style="color: #009900;">&#41;</span> <span style="color: #339933;">&gt;&gt;</span> <span style="color: #0000dd;">6</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
&nbsp;
    <span style="color: #808080; font-style: italic;">/* Handle complete blocks. */</span>
    <span style="color: #b1b100;">for</span> <span style="color: #009900;">&#40;</span><span style="color: #339933;">;;</span> s <span style="color: #339933;">+=</span> <span style="color: #993333;">sizeof</span><span style="color: #009900;">&#40;</span>v16qi<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        <span style="color: #808080; font-style: italic;">/* Prefetch */</span>
        __builtin_prefetch<span style="color: #009900;">&#40;</span><span style="color: #339933;">&amp;</span>s<span style="color: #009900;">&#91;</span><span style="color: #0000dd;">256</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
        <span style="color: #808080; font-style: italic;">/* Load Bytes */</span>
        row <span style="color: #339933;">=</span> LoadBytes<span style="color: #009900;">&#40;</span>s<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
        <span style="color: #808080; font-style: italic;">/* Expect this to be false :) */</span>
        <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>NotExpected<span style="color: #009900;">&#40;</span>GetMask<span style="color: #009900;">&#40;</span>
                                   <span style="color: #808080; font-style: italic;">/* Check for zero bytes */</span>
                                      CompareEquality<span style="color: #009900;">&#40;</span>allZero<span style="color: #339933;">,</span> row<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>
            <span style="color: #000000; font-weight: bold;">break</span><span style="color: #339933;">;</span>
&nbsp;
        <span style="color: #808080; font-style: italic;">/* Count number of non-starter bytes */</span>
&nbsp;
        row <span style="color: #339933;">=</span> And<span style="color: #009900;">&#40;</span>row<span style="color: #339933;">,</span> masking<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        row <span style="color: #339933;">=</span> CompareEquality<span style="color: #009900;">&#40;</span>row<span style="color: #339933;">,</span> matching<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        count <span style="color: #339933;">+=</span> HammingWeight<span style="color: #009900;">&#91;</span>GetMask<span style="color: #009900;">&#40;</span>row<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
&nbsp;
    <span style="color: #666666; font-style: italic;">//leftover bytes</span>
    <span style="color: #b1b100;">for</span> <span style="color: #009900;">&#40;</span><span style="color: #339933;">;;</span> s<span style="color: #339933;">++</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        b <span style="color: #339933;">=</span> <span style="color: #339933;">*</span>s<span style="color: #339933;">;</span>
        <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>b <span style="color: #339933;">==</span> <span style="color: #ff0000;">'<span style="color: #006699; font-weight: bold;">\0</span>'</span><span style="color: #009900;">&#41;</span>
            <span style="color: #000000; font-weight: bold;">break</span><span style="color: #339933;">;</span>
        count <span style="color: #339933;">+=</span> <span style="color: #009900;">&#40;</span>b <span style="color: #339933;">&gt;&gt;</span> <span style="color: #0000dd;">7</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">&amp;</span> <span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span>~b<span style="color: #009900;">&#41;</span> <span style="color: #339933;">&gt;&gt;</span> <span style="color: #0000dd;">6</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
&nbsp;
  done<span style="color: #339933;">:</span>
    <span style="color: #b1b100;">return</span> <span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span>s <span style="color: #339933;">-</span> _s<span style="color: #009900;">&#41;</span> <span style="color: #339933;">-</span> count<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>
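<p>The <code>HammingWeight</code> table is only declared in the listing, with its initialization left out. Here is one possible way to fill it, as a sketch rather than the code actually used for the timings. The function name <code>init_hamming_weight</code> is an assumption:</p>

```c
#include <stddef.h>

unsigned char HammingWeight[65536];

/* Hypothetical initializer: HammingWeight[i] = number of set bits in i.
 * Uses the recurrence popcount(i) = popcount(i >> 1) + (i & 1);
 * the entry for i >> 1 is always filled before the entry for i. */
static void init_hamming_weight(void)
{
    HammingWeight[0] = 0;
    for (size_t i = 1; i < 65536; i++)
        HammingWeight[i] = (unsigned char)(HammingWeight[i >> 1] + (i & 1));
}
```

<p>It would need to be called once before the first use of <code>cp_strlen_utf8_sse2</code>. Each entry is at most 16, so <code>unsigned char</code> is wide enough.</p>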

<h3>Results</h3>
<p>This counts about twice as fast as GCC/libc’s standard, non-UTF-8 <code>strlen</code>. Note the discrepancies between my timings of Colin’s code and his own tests. Damn thou, 32-bits!</p>
<pre><code>"": 0 0 0 0 0 0 0
"hello, world": 12 12 12 12 12 12 12
"naïve": 6 6 6 5 5 5 5
"こんにちは": 15 15 15 5 5 5 5
"abcdefghijklmnopqrstuvwxyzβ": 28 28 28 27 27 27 27
testing 33554424 bytes of repeated "hello, world":
                      gcc_strlen =   33554424: 0.019331 +/- 0.001076
                      kjs_strlen =   33554424: 0.035095 +/- 0.000530
                       cp_strlen =   33554424: 0.021472 +/- 0.000310
                 kjs_strlen_utf8 =   33554424: 0.070260 +/- 0.000240
                  gp_strlen_utf8 =   33554424: 0.035144 +/- 0.000471
                  cp_strlen_utf8 =   33554424: 0.050539 +/- 0.000342
             cp_strlen_utf8_sse2 =   33554424: 0.010297 +/- 0.001551
testing 33554430 bytes of repeated "naïve":
                      gcc_strlen =   33554430: 0.019176 +/- 0.000824
                      kjs_strlen =   33554430: 0.035090 +/- 0.000478
                       cp_strlen =   33554430: 0.021472 +/- 0.000323
                 kjs_strlen_utf8 =   27962025: 0.070347 +/- 0.000354
                  gp_strlen_utf8 =   27962025: 0.054802 +/- 0.000299
                  cp_strlen_utf8 =   27962025: 0.050595 +/- 0.000602
             cp_strlen_utf8_sse2 =   27962025: 0.010011 +/- 0.001453
testing 33554430 bytes of repeated "こんにちは":
                      gcc_strlen =   33554430: 0.019331 +/- 0.000836
                      kjs_strlen =   33554430: 0.035225 +/- 0.000411
                       cp_strlen =   33554430: 0.021429 +/- 0.000309
                 kjs_strlen_utf8 =   11184810: 0.070249 +/- 0.000312
                  gp_strlen_utf8 =   11184810: 0.026545 +/- 0.000621
                  cp_strlen_utf8 =   11184810: 0.050512 +/- 0.000273
             cp_strlen_utf8_sse2 =   11184810: 0.010246 +/- 0.001466
testing 33554416 bytes of repeated "abcdefghijklmnopqrstuvwxyzβ":
                      gcc_strlen =   33554416: 0.019308 +/- 0.001091
                      kjs_strlen =   33554416: 0.035070 +/- 0.000486
                       cp_strlen =   33554416: 0.021441 +/- 0.000289
                 kjs_strlen_utf8 =   32356044: 0.070287 +/- 0.000297
                  gp_strlen_utf8 =   32356044: 0.043681 +/- 0.000429
                  cp_strlen_utf8 =   32356044: 0.050402 +/- 0.000204
             cp_strlen_utf8_sse2 =   32356044: 0.010407 +/- 0.001371</code></pre>
]]></content:encoded>
			<wfw:commentRss>http://porg.es/blog/ridiculous-utf-8-character-counting/feed</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Counting Characters in UTF-8 Strings Is Fast(er)</title>
		<link>http://porg.es/blog/counting-characters-in-utf-8-strings-is-faster</link>
		<comments>http://porg.es/blog/counting-characters-in-utf-8-strings-is-faster#comments</comments>
		<pubDate>Wed, 04 Jun 2008 05:34:57 +0000</pubDate>
		<dc:creator>Porges</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[C#]]></category>
		<category><![CDATA[fast]]></category>
		<category><![CDATA[speed]]></category>
		<category><![CDATA[strings]]></category>
		<category><![CDATA[strlen]]></category>
		<category><![CDATA[utf8]]></category>

		<guid isPermaLink="false">http://porg.es/blog/?p=130</guid>
		<description><![CDATA[‘Counting Characters in UTF-8 Strings Is Fast’ by Kragen Sitaker shows several ways to count characters in UTF-8 strings, using both assembly and C. But, with a few assumptions, we can go faster. Assumption One: We are dealing with a valid UTF-8 string Making this assumption means that once we hit the start of a multi-byte character [...]]]></description>
			<content:encoded><![CDATA[<p>‘<a href="http://canonical.org/~kragen/strlen-utf8.html">Counting Characters in UTF-8 Strings Is Fast</a>’ by Kragen Sitaker shows several ways to count characters in UTF-8 strings, using both assembly and C. But, with a few assumptions, we can go faster.</p>
<h3>Assumption One: We are dealing with a valid UTF-8 string</h3>
<p>Making this assumption means that once we hit the start of a multi-byte character we can skip forward a few places. It also means we don&#8217;t check for hitting invalid characters (<s>this sends the algorithm into an infinite loop if run on non-valid input</s> it is possible to make the algorithm run past the end of the buffer by supplying malformed data).</p>
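<p>If the input can’t be trusted, one way around the overrun is to pass the buffer length and refuse any skip that would leave the buffer. This is only a sketch under that assumption: <code>utf8_seq_len</code> and <code>utf8_count_bounded</code> are hypothetical names, the function counts the first <code>len</code> bytes rather than stopping at a NUL, and it still doesn’t validate the follower bytes themselves.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Length of the sequence introduced by lead byte b, judged from its
   high bits only. Continuation bytes (10xxxxxx) yield 0. */
size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;   /* 0xxxxxxx: ASCII */
    if (b < 0xC0) return 0;   /* 10xxxxxx: cannot start a character */
    if (b < 0xE0) return 2;   /* 110xxxxx */
    if (b < 0xF0) return 3;   /* 1110xxxx */
    return 4;                 /* 1111xxxx: treated as a 4-byte starter */
}

/* Counts characters in the first len bytes of s, skipping whole
   sequences. Returns -1 instead of reading past the buffer when a
   sequence is truncated or starts with a continuation byte. */
long utf8_count_bounded(const char *s, size_t len)
{
    assert(s != NULL);
    size_t i = 0;
    long count = 0;
    while (i < len) {
        size_t n = utf8_seq_len((unsigned char)s[i]);
        if (n == 0 || n > len - i)
            return -1;
        i += n;
        count++;
    }
    return count;
}
```

<p>The extra comparison per character is the price of the bounds check; the code in this post skips it on purpose.</p>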
<h3>Assumption Two: Most strings are ASCII</h3>
<p>Therefore, run a simple ASCII count routine beforehand. As soon as we hit a non-ASCII character, switch into counting UTF-8.</p>
<h3>The code</h3>
<p>Note: The current code relies on <code>char</code> being a signed 8-bit type, so that bytes with the highest bit set compare as negative.</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;"><span style="color: #993333;">int</span> porges_strlen2<span style="color: #009900;">&#40;</span><span style="color: #993333;">char</span> <span style="color: #339933;">*</span>s<span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
        <span style="color: #993333;">int</span> i <span style="color: #339933;">=</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
&nbsp;
        <span style="color: #666666; font-style: italic;">//Go fast if string is only ASCII.</span>
        <span style="color: #666666; font-style: italic;">//Loop while not at end of string,</span>
        <span style="color: #666666; font-style: italic;">// and not reading anything with highest bit set.</span>
        <span style="color: #666666; font-style: italic;">//If highest bit is set, number is negative.</span>
        <span style="color: #b1b100;">while</span> <span style="color: #009900;">&#40;</span>s<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span> <span style="color: #339933;">&gt;</span> <span style="color: #0000dd;">0</span><span style="color: #009900;">&#41;</span>
                i<span style="color: #339933;">++;</span>
&nbsp;
        <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>s<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span> <span style="color: #339933;">&lt;=</span> <span style="color: #339933;">-</span><span style="color: #0000dd;">65</span><span style="color: #009900;">&#41;</span> <span style="color: #666666; font-style: italic;">// follower bytes (0x80..0xBF) are -128..-65 as signed chars</span>
                <span style="color: #b1b100;">return</span> <span style="color: #339933;">-</span><span style="color: #0000dd;">1</span><span style="color: #339933;">;</span> <span style="color: #666666; font-style: italic;">// invalid</span>
&nbsp;
        <span style="color: #666666; font-style: italic;">//Note, however, that the following code does *not*</span>
        <span style="color: #666666; font-style: italic;">// check for invalid characters.</span>
        <span style="color: #666666; font-style: italic;">//The above is just included to bail out on the tests :)</span>
&nbsp;
        <span style="color: #993333;">int</span> count <span style="color: #339933;">=</span> i<span style="color: #339933;">;</span>
        <span style="color: #b1b100;">while</span> <span style="color: #009900;">&#40;</span>s<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span>
        <span style="color: #009900;">&#123;</span>
                <span style="color: #666666; font-style: italic;">//if ASCII just go to next character</span>
                <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>s<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span> <span style="color: #339933;">&gt;</span> <span style="color: #0000dd;">0</span><span style="color: #009900;">&#41;</span>      i <span style="color: #339933;">+=</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">;</span>
                <span style="color: #b1b100;">else</span>
                <span style="color: #666666; font-style: italic;">//select amongst multi-byte starters</span>
                <span style="color: #b1b100;">switch</span> <span style="color: #009900;">&#40;</span><span style="color: #208080;">0xF0</span> <span style="color: #339933;">&amp;</span> s<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span>
                <span style="color: #009900;">&#123;</span>
                        <span style="color: #b1b100;">case</span> <span style="color: #208080;">0xE0</span><span style="color: #339933;">:</span> i <span style="color: #339933;">+=</span> <span style="color: #0000dd;">3</span><span style="color: #339933;">;</span> <span style="color: #000000; font-weight: bold;">break</span><span style="color: #339933;">;</span>
                        <span style="color: #b1b100;">case</span> <span style="color: #208080;">0xF0</span><span style="color: #339933;">:</span> i <span style="color: #339933;">+=</span> <span style="color: #0000dd;">4</span><span style="color: #339933;">;</span> <span style="color: #000000; font-weight: bold;">break</span><span style="color: #339933;">;</span>
                        <span style="color: #b1b100;">default</span><span style="color: #339933;">:</span>   i <span style="color: #339933;">+=</span> <span style="color: #0000dd;">2</span><span style="color: #339933;">;</span> <span style="color: #000000; font-weight: bold;">break</span><span style="color: #339933;">;</span>
                <span style="color: #009900;">&#125;</span>
                <span style="color: #339933;">++</span>count<span style="color: #339933;">;</span>
        <span style="color: #009900;">&#125;</span>
        <span style="color: #b1b100;">return</span> count<span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>
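
<p>Whether plain <code>char</code> is signed is up to the compiler, so the comparisons above won’t work everywhere. Here is a sketch of the same skipping strategy with every byte read as <code>unsigned char</code>. The name <code>utf8_strlen_unsigned</code> is made up for this example, and it still assumes valid, NUL-terminated UTF-8.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Same skipping strategy, but bytes are read as unsigned char, so the
   result does not depend on the signedness of plain char.
   Still assumes valid, NUL-terminated UTF-8. */
long utf8_strlen_unsigned(const char *str)
{
    assert(str != NULL);
    const unsigned char *s = (const unsigned char *)str;
    size_t i = 0;

    /* ASCII fast path: bytes 0x01..0x7F */
    while (s[i] != 0 && s[i] < 0x80)
        i++;

    /* a follower byte (0x80..0xBF) cannot start a character */
    if (s[i] >= 0x80 && s[i] < 0xC0)
        return -1;

    long count = (long)i;
    while (s[i] != 0) {
        if (s[i] < 0x80)      i += 1;   /* ASCII */
        else if (s[i] < 0xE0) i += 2;   /* everything below 0xE0, like the default case */
        else if (s[i] < 0xF0) i += 3;   /* 1110xxxx */
        else                  i += 4;   /* 1111xxxx */
        count++;
    }
    return count;
}
```

<p>The range checks select the same cases as the <code>switch</code> on <code>0xF0 &amp; s[i]</code>, just written as comparisons.</p>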

<h3>Results</h3>
<p>I used Kragen’s testing code, but removed all <code>strlen</code>s that didn’t do UTF-8 counting, and added one test for valid UTF-8 text (just the phrase ‘こんにちは’ repeated). The result is about twice as fast on both the ASCII-only and UTF-8 tests: the improvement on ASCII is due to the ASCII-only routine, and the improvement on UTF-8 is due to skipping bytes.</p>
<pre><code>"": 0 0 0 0 0
"hello, world": 12 12 12 12 12
"naïve": 5 5 5 5 5
"こんにちは": 5 5 5 5 5
1: all 'a':
1:           porges_strlen2(string) =   33554431: 0.034672
1:         ap_strlen_utf8_s(string) =   33554431: 0.068210
1:         my_strlen_utf8_c(string) =   33554431: 0.071038
1:         my_strlen_utf8_s(string) =   33554431: 0.135856
2: all '\xe3':
2:           porges_strlen2(string) =   11184811: 0.032115
2:         ap_strlen_utf8_s(string) =   33554431: 0.068228
2:         my_strlen_utf8_c(string) =   33554431: 0.071050
2:         my_strlen_utf8_s(string) =   33554431: 0.152513
3: all '\x81':
3:           porges_strlen2(string) =         -1: 0.000001
3:         my_strlen_utf8_s(string) =          0: 0.068339
3:         ap_strlen_utf8_s(string) =          0: 0.068547
3:         my_strlen_utf8_c(string) =          0: 0.071039
4: all konichiwa:
4:           porges_strlen2(string) =   11184810: 0.032143
4:         ap_strlen_utf8_s(string) =   11184810: 0.068271
4:         my_strlen_utf8_c(string) =   11184810: 0.071036
4:         my_strlen_utf8_s(string) =   11184810: 0.089478
</code></pre>
<p>Note also that the invalid UTF-8 gives strange results; this is because the algorithm isn’t meant to work on it! (The first invalid sequence, test 2, is a list of 3-byte starters, so the result is divided by 3 due to skipping, and the second, test 3, is a list of follower bytes, so the code bails out.)</p>
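<p>The division by 3 can be checked by hand. Walking <code>n</code> bytes in fixed skips of <code>width</code> takes <code>ceil(n / width)</code> steps, and with the 33554431-byte test string that gives the 11184811 reported above. A small sketch of the arithmetic (both helper names are hypothetical):</p>

```c
#include <assert.h>
#include <stddef.h>

/* Number of steps taken when walking n bytes in fixed skips of width
   bytes: ceil(n / width). */
size_t skip_steps(size_t n, size_t width)
{
    assert(width > 0);
    return (n + width - 1) / width;
}

/* Index reached after those steps; a value above n means the final
   skip jumped past the terminator at index n. */
size_t skip_end_index(size_t n, size_t width)
{
    return skip_steps(n, width) * width;
}
```

<p>For <code>n = 33554431</code> the end index comes out as 33554433, two bytes past the terminator, which is the kind of overrun Assumption One warns about.</p>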
<h3>Going faster</h3>
<p>By dropping back to the ASCII counter whenever we hit ASCII again, we go even faster. This will handle the cases (such as in English) where there are many ASCII characters and only a few multibyte ones.</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;"><span style="color: #993333;">int</span> porges_strlen2<span style="color: #009900;">&#40;</span><span style="color: #993333;">char</span> <span style="color: #339933;">*</span>s<span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
        <span style="color: #993333;">int</span> i <span style="color: #339933;">=</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
        <span style="color: #993333;">int</span> iBefore <span style="color: #339933;">=</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
        <span style="color: #993333;">int</span> count <span style="color: #339933;">=</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
&nbsp;
        <span style="color: #b1b100;">while</span> <span style="color: #009900;">&#40;</span>s<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span> <span style="color: #339933;">&gt;</span> <span style="color: #0000dd;">0</span><span style="color: #009900;">&#41;</span>
                ascii<span style="color: #339933;">:</span>  i<span style="color: #339933;">++;</span>
&nbsp;
        count <span style="color: #339933;">+=</span> i<span style="color: #339933;">-</span>iBefore<span style="color: #339933;">;</span>
        <span style="color: #b1b100;">while</span> <span style="color: #009900;">&#40;</span>s<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span>
        <span style="color: #009900;">&#123;</span>
                <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>s<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span> <span style="color: #339933;">&gt;</span> <span style="color: #0000dd;">0</span><span style="color: #009900;">&#41;</span>
                <span style="color: #009900;">&#123;</span>
                        iBefore <span style="color: #339933;">=</span> i<span style="color: #339933;">;</span>
                        <span style="color: #b1b100;">goto</span> ascii<span style="color: #339933;">;</span>
                <span style="color: #009900;">&#125;</span>
                <span style="color: #b1b100;">else</span>
                <span style="color: #b1b100;">switch</span> <span style="color: #009900;">&#40;</span><span style="color: #208080;">0xF0</span> <span style="color: #339933;">&amp;</span> s<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span>
                <span style="color: #009900;">&#123;</span>
                        <span style="color: #b1b100;">case</span> <span style="color: #208080;">0xE0</span><span style="color: #339933;">:</span> i <span style="color: #339933;">+=</span> <span style="color: #0000dd;">3</span><span style="color: #339933;">;</span> <span style="color: #000000; font-weight: bold;">break</span><span style="color: #339933;">;</span>
                        <span style="color: #b1b100;">case</span> <span style="color: #208080;">0xF0</span><span style="color: #339933;">:</span> i <span style="color: #339933;">+=</span> <span style="color: #0000dd;">4</span><span style="color: #339933;">;</span> <span style="color: #000000; font-weight: bold;">break</span><span style="color: #339933;">;</span>
                        <span style="color: #b1b100;">default</span><span style="color: #339933;">:</span>   i <span style="color: #339933;">+=</span> <span style="color: #0000dd;">2</span><span style="color: #339933;">;</span> <span style="color: #000000; font-weight: bold;">break</span><span style="color: #339933;">;</span>
                <span style="color: #009900;">&#125;</span>
                <span style="color: #339933;">++</span>count<span style="color: #339933;">;</span>
        <span style="color: #009900;">&#125;</span>
        <span style="color: #b1b100;">return</span> count<span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>
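
<p>The <code>goto</code> into the middle of the first loop can also be written as two nested loops. This is a sketch of that restructuring, not the code that was timed: <code>utf8_strlen_dropback</code> is a made-up name, bytes are read as <code>unsigned char</code>, and valid, NUL-terminated UTF-8 is still assumed.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Alternates between an ASCII run (one byte per character) and a
   multi-byte run (skip whole sequences), dropping back to the ASCII
   loop whenever an ASCII byte shows up again. */
long utf8_strlen_dropback(const char *str)
{
    assert(str != NULL);
    const unsigned char *s = (const unsigned char *)str;
    size_t i = 0;
    long count = 0;

    for (;;) {
        /* ASCII run: count it in one go from the index difference */
        size_t start = i;
        while (s[i] != 0 && s[i] < 0x80)
            i++;
        count += (long)(i - start);

        if (s[i] == 0)
            return count;

        /* multi-byte run: continue until ASCII or the terminator */
        while (s[i] >= 0x80) {
            if (s[i] < 0xE0)      i += 2;
            else if (s[i] < 0xF0) i += 3;
            else                  i += 4;
            count++;
        }
    }
}
```

<p>Whether a compiler generates the same code for this as for the <code>goto</code> version is something I haven’t measured.</p>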

<p>But on the ‘konichiwa’ test the speed improvement happens even though we’re counting pure multibyte, and I’m not sure exactly why&#8230; probably something to do with branch prediction or another arcane CPU topic I don’t understand. <img src="http://porg.es/blog/wp-content/plugins/wp-smiley-switcher/noktahhitam/icon_smile.gif" alt="" /></p>
<pre><code>4: all konichiwa:
4:           porges_strlen2(string) =   11184810: 0.026017
4:         ap_strlen_utf8_s(string) =   11184810: 0.068320
4:         my_strlen_utf8_c(string) =   11184810: 0.071035
4:         my_strlen_utf8_s(string) =   11184810: 0.089464
5: mixed english:
5:           porges_strlen2(string) =   32435949: 0.040342
5:         my_strlen_utf8_c(string) =   32435949: 0.071035
5:         ap_strlen_utf8_s(string) =   32435949: 0.078233
5:         my_strlen_utf8_s(string) =   32435949: 0.160676</code></pre>
<p>Without the drop-back-to-ASCII modification:</p>
<pre><code>5: mixed english:
5:           porges_strlen2(string) =   32435949: 0.067753</code></pre>
]]></content:encoded>
			<wfw:commentRss>http://porg.es/blog/counting-characters-in-utf-8-strings-is-faster/feed</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Implementing IEnumerable easily</title>
		<link>http://porg.es/blog/implementing-ienumerable-easily</link>
		<comments>http://porg.es/blog/implementing-ienumerable-easily#comments</comments>
		<pubDate>Fri, 04 Apr 2008 09:43:02 +0000</pubDate>
		<dc:creator>Porges</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[.net]]></category>
		<category><![CDATA[C#]]></category>
		<category><![CDATA[csharp]]></category>
		<category><![CDATA[IEnumerable]]></category>

		<guid isPermaLink="false">http://porg.es/blog/?p=125</guid>
		<description><![CDATA[Say that you’re implementing a linked list, and you want an enumerator: public IEnumerator&#60;T&#62; GetEnumerator&#40;&#41; &#123; return new Stream&#60;T,Node&#62;&#40;first, node =&#62; node.next == null ? null : Tuple.Of&#40;node.next, node.datum&#41;.AsNullable&#40;&#41;&#41;; &#125; This uses the following utility class to implement the enumerator in one line (along with some code for Tuples and an extension method for structs): [...]]]></description>
			<content:encoded><![CDATA[<p>Say that you’re implementing a linked list, and you want an enumerator:</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;"><span style="color: #0600FF;">public</span> IEnumerator<span style="color: #008000;">&lt;</span>T<span style="color: #008000;">&gt;</span> GetEnumerator<span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span>
<span style="color: #000000;">&#123;</span>
    <span style="color: #0600FF;">return</span> <span style="color: #008000;">new</span> Stream<span style="color: #008000;">&lt;</span>T,Node<span style="color: #008000;">&gt;</span><span style="color: #000000;">&#40;</span>first,
        node <span style="color: #008000;">=&gt;</span> node.<span style="color: #0000FF;">next</span> <span style="color: #008000;">==</span> <span style="color: #0600FF;">null</span> <span style="color: #008000;">?</span> <span style="color: #0600FF;">null</span> <span style="color: #008000;">:</span> Tuple.<span style="color: #0000FF;">Of</span><span style="color: #000000;">&#40;</span>node.<span style="color: #0000FF;">next</span>, node.<span style="color: #0000FF;">datum</span><span style="color: #000000;">&#41;</span>.<span style="color: #0000FF;">AsNullable</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
<span style="color: #000000;">&#125;</span></pre></div></div>

<p>This uses the following utility class to implement the enumerator in one line (along with some code for Tuples and an extension method for structs):</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;"><span style="color: #0600FF;">public</span> <span style="color: #FF0000;">class</span> Stream<span style="color: #008000;">&lt;</span>Tdata, Tstate<span style="color: #008000;">&gt;</span> <span style="color: #008000;">:</span> IEnumerator<span style="color: #008000;">&lt;</span>Tdata<span style="color: #008000;">&gt;</span>
<span style="color: #000000;">&#123;</span>
    <span style="color: #0600FF;">private</span> Tdata current<span style="color: #008000;">;</span>
    <span style="color: #0600FF;">private</span> Tstate initialState<span style="color: #008000;">;</span>
    <span style="color: #0600FF;">private</span> Tstate state<span style="color: #008000;">;</span>
    <span style="color: #0600FF;">private</span> Func<span style="color: #008000;">&lt;</span>Tstate, Pair<span style="color: #008000;">&lt;</span>Tstate, Tdata<span style="color: #008000;">&gt;?&gt;</span> moveNext<span style="color: #008000;">;</span>
&nbsp;
    <span style="color: #0600FF;">public</span> Stream<span style="color: #000000;">&#40;</span>Tstate initialState, Func<span style="color: #008000;">&lt;</span>Tstate, Pair<span style="color: #008000;">&lt;</span>Tstate, Tdata<span style="color: #008000;">&gt;?&gt;</span> calcNextValue<span style="color: #000000;">&#41;</span>
    <span style="color: #000000;">&#123;</span>
        moveNext <span style="color: #008000;">=</span> calcNextValue<span style="color: #008000;">;</span>
        <span style="color: #0600FF;">this</span>.<span style="color: #0000FF;">initialState</span> <span style="color: #008000;">=</span> initialState<span style="color: #008000;">;</span>
        Reset<span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    <span style="color: #000000;">&#125;</span>
&nbsp;
    <span style="color: #008080;">#region IEnumerator&lt;Tdata&gt; Members</span>
&nbsp;
    <span style="color: #0600FF;">public</span> Tdata Current
    <span style="color: #000000;">&#123;</span>
        get <span style="color: #000000;">&#123;</span> <span style="color: #0600FF;">return</span> current<span style="color: #008000;">;</span> <span style="color: #000000;">&#125;</span>
    <span style="color: #000000;">&#125;</span>
&nbsp;
    <span style="color: #008080;">#endregion</span>
&nbsp;
    <span style="color: #008080;">#region IDisposable Members</span>
&nbsp;
    <span style="color: #0600FF;">public</span> <span style="color: #0600FF;">void</span> Dispose<span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span>
    <span style="color: #000000;">&#123;</span>
        Dispose<span style="color: #000000;">&#40;</span><span style="color: #0600FF;">true</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
        GC.<span style="color: #0000FF;">SuppressFinalize</span><span style="color: #000000;">&#40;</span><span style="color: #0600FF;">this</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    <span style="color: #000000;">&#125;</span>
&nbsp;
    <span style="color: #0600FF;">protected</span> <span style="color: #0600FF;">void</span> Dispose<span style="color: #000000;">&#40;</span><span style="color: #FF0000;">bool</span> disposing<span style="color: #000000;">&#41;</span>
    <span style="color: #000000;">&#123;</span>
        <span style="color: #0600FF;">if</span> <span style="color: #000000;">&#40;</span>disposing<span style="color: #000000;">&#41;</span>
        <span style="color: #000000;">&#123;</span>
            var disposeCurrent <span style="color: #008000;">=</span> current <span style="color: #0600FF;">as</span> IDisposable<span style="color: #008000;">;</span>
            <span style="color: #0600FF;">if</span> <span style="color: #000000;">&#40;</span>disposeCurrent <span style="color: #008000;">!=</span> <span style="color: #0600FF;">null</span><span style="color: #000000;">&#41;</span>
                disposeCurrent.<span style="color: #0000FF;">Dispose</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
&nbsp;
            var disposeState <span style="color: #008000;">=</span> state <span style="color: #0600FF;">as</span> IDisposable<span style="color: #008000;">;</span>
            <span style="color: #0600FF;">if</span> <span style="color: #000000;">&#40;</span>disposeState <span style="color: #008000;">!=</span> <span style="color: #0600FF;">null</span><span style="color: #000000;">&#41;</span>
                disposeState.<span style="color: #0000FF;">Dispose</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
&nbsp;
            <span style="color: #008080; font-style: italic;">//initialState may be null, or the same object as state,</span>
            <span style="color: #008080; font-style: italic;">// so check it separately instead of casting.</span>
            var disposeInitial <span style="color: #008000;">=</span> initialState <span style="color: #0600FF;">as</span> IDisposable<span style="color: #008000;">;</span>
            <span style="color: #0600FF;">if</span> <span style="color: #000000;">&#40;</span>disposeInitial <span style="color: #008000;">!=</span> <span style="color: #0600FF;">null</span> <span style="color: #008000;">&amp;&amp;</span> <span style="color: #008000;">!</span>ReferenceEquals<span style="color: #000000;">&#40;</span>disposeInitial, disposeState<span style="color: #000000;">&#41;</span><span style="color: #000000;">&#41;</span>
                disposeInitial.<span style="color: #0000FF;">Dispose</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
        <span style="color: #000000;">&#125;</span>
    <span style="color: #000000;">&#125;</span>
&nbsp;
    <span style="color: #008080;">#endregion</span>
&nbsp;
    <span style="color: #008080;">#region IEnumerator Members</span>
&nbsp;
    <span style="color: #FF0000;">object</span> <span style="color: #000000;">System.<span style="color: #0000FF;">Collections</span></span>.<span style="color: #0000FF;">IEnumerator</span>.<span style="color: #0000FF;">Current</span>
    <span style="color: #000000;">&#123;</span>
        get <span style="color: #000000;">&#123;</span> <span style="color: #0600FF;">return</span> current<span style="color: #008000;">;</span> <span style="color: #000000;">&#125;</span>
    <span style="color: #000000;">&#125;</span>
&nbsp;
    <span style="color: #0600FF;">public</span> <span style="color: #FF0000;">bool</span> MoveNext<span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span>
    <span style="color: #000000;">&#123;</span>
        var result <span style="color: #008000;">=</span> moveNext<span style="color: #000000;">&#40;</span>state<span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
&nbsp;
        <span style="color: #0600FF;">if</span> <span style="color: #000000;">&#40;</span>result.<span style="color: #0000FF;">HasValue</span><span style="color: #000000;">&#41;</span>
        <span style="color: #000000;">&#123;</span>
            current <span style="color: #008000;">=</span> result.<span style="color: #0000FF;">Value</span>.<span style="color: #0000FF;">Right</span><span style="color: #008000;">;</span>
            state <span style="color: #008000;">=</span> result.<span style="color: #0000FF;">Value</span>.<span style="color: #0000FF;">Left</span><span style="color: #008000;">;</span>
            <span style="color: #0600FF;">return</span> true<span style="color: #008000;">;</span>
        <span style="color: #000000;">&#125;</span>
        <span style="color: #0600FF;">else</span>
            <span style="color: #0600FF;">return</span> false<span style="color: #008000;">;</span>
    <span style="color: #000000;">&#125;</span>
&nbsp;
    <span style="color: #0600FF;">public</span> <span style="color: #0600FF;">void</span> Reset<span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span>
    <span style="color: #000000;">&#123;</span>
        state <span style="color: #008000;">=</span> initialState<span style="color: #008000;">;</span>
    <span style="color: #000000;">&#125;</span>
&nbsp;
    <span style="color: #008080;">#endregion</span>
<span style="color: #000000;">&#125;</span></pre></div></div>

]]></content:encoded>
			<wfw:commentRss>http://porg.es/blog/implementing-ienumerable-easily/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>object.Equals handles null values correctly</title>
		<link>http://porg.es/blog/objectequals-handles-null-values-correctly</link>
		<comments>http://porg.es/blog/objectequals-handles-null-values-correctly#comments</comments>
		<pubDate>Thu, 27 Mar 2008 20:58:12 +0000</pubDate>
		<dc:creator>Porges</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[.NET]]></category>
		<category><![CDATA[.net]]></category>
		<category><![CDATA[C#]]></category>
		<category><![CDATA[csharp]]></category>
		<category><![CDATA[gotcha]]></category>
		<category><![CDATA[snippets]]></category>

		<guid isPermaLink="false">http://porg.es/blog/objectequals-handles-null-values-correctly</guid>
		<description><![CDATA[Here’s the source code, as disassembled by Reflector: public static bool Equals&#40;object objA, object objB&#41; &#123; return &#40;&#40;objA == objB&#41; &#124;&#124; &#40;&#40;&#40;objA != null&#41; &#38;&#38; &#40;objB != null&#41;&#41; &#38;&#38; objA.Equals&#40;objB&#41;&#41;&#41;; &#125; It seems that not even Microsoft knows this! I spotted this code, from ASP.NET’s MVC implementation, on Scott Hanselman’s blog: return &#40;other != null&#41; [...]]]></description>
			<content:encoded><![CDATA[<p>Here’s the source code, as disassembled by Reflector:</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;"><span style="color: #0600FF;">public</span> <span style="color: #0600FF;">static</span> <span style="color: #FF0000;">bool</span> Equals<span style="color: #000000;">&#40;</span><span style="color: #FF0000;">object</span> objA, <span style="color: #FF0000;">object</span> objB<span style="color: #000000;">&#41;</span>
<span style="color: #000000;">&#123;</span>
    <span style="color: #0600FF;">return</span> <span style="color: #000000;">&#40;</span><span style="color: #000000;">&#40;</span>objA <span style="color: #008000;">==</span> objB<span style="color: #000000;">&#41;</span> <span style="color: #008000;">||</span> <span style="color: #000000;">&#40;</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#40;</span>objA <span style="color: #008000;">!=</span> <span style="color: #0600FF;">null</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">&amp;&amp;</span> <span style="color: #000000;">&#40;</span>objB <span style="color: #008000;">!=</span> <span style="color: #0600FF;">null</span><span style="color: #000000;">&#41;</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">&amp;&amp;</span> objA.<span style="color: #0000FF;">Equals</span><span style="color: #000000;">&#40;</span>objB<span style="color: #000000;">&#41;</span><span style="color: #000000;">&#41;</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
<span style="color: #000000;">&#125;</span></pre></div></div>

<p>It seems that not even Microsoft knows this! I spotted this code, from ASP.NET’s MVC implementation, on <a href="http://www.hanselman.com/blog/TheWeeklySourceCode21ASPNETMVCPreview2SourceCode.aspx">Scott Hanselman’s blog</a>:</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;"><span style="color: #0600FF;">return</span> <span style="color: #000000;">&#40;</span>other <span style="color: #008000;">!=</span> <span style="color: #0600FF;">null</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">&amp;&amp;</span>
  <span style="color: #000000;">&#40;</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#40;</span>other._first <span style="color: #008000;">==</span> <span style="color: #0600FF;">null</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">&amp;&amp;</span> <span style="color: #000000;">&#40;</span>_first <span style="color: #008000;">==</span> <span style="color: #0600FF;">null</span><span style="color: #000000;">&#41;</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">||</span>
    <span style="color: #000000;">&#40;</span><span style="color: #000000;">&#40;</span>other._first <span style="color: #008000;">!=</span> <span style="color: #0600FF;">null</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">&amp;&amp;</span> other._first.<span style="color: #0000FF;">Equals</span><span style="color: #000000;">&#40;</span>_first<span style="color: #000000;">&#41;</span><span style="color: #000000;">&#41;</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">&amp;&amp;</span>
  <span style="color: #000000;">&#40;</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#40;</span>other._second <span style="color: #008000;">==</span> <span style="color: #0600FF;">null</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">&amp;&amp;</span> <span style="color: #000000;">&#40;</span>_second <span style="color: #008000;">==</span> <span style="color: #0600FF;">null</span><span style="color: #000000;">&#41;</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">||</span>
    <span style="color: #000000;">&#40;</span><span style="color: #000000;">&#40;</span>other._second <span style="color: #008000;">!=</span> <span style="color: #0600FF;">null</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">&amp;&amp;</span> other._second.<span style="color: #0000FF;">Equals</span><span style="color: #000000;">&#40;</span>_second<span style="color: #000000;">&#41;</span><span style="color: #000000;">&#41;</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span></pre></div></div>

<p>This can be rewritten as:</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;"><span style="color: #0600FF;">return</span> <span style="color: #000000;">&#40;</span>other <span style="color: #008000;">!=</span> <span style="color: #0600FF;">null</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">&amp;&amp;</span>
  <span style="color: #FF0000;">object</span>.<span style="color: #0000FF;">Equals</span><span style="color: #000000;">&#40;</span>other._first, _first<span style="color: #000000;">&#41;</span> <span style="color: #008000;">&amp;&amp;</span>
  <span style="color: #FF0000;">object</span>.<span style="color: #0000FF;">Equals</span><span style="color: #000000;">&#40;</span>other._second, _second<span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span></pre></div></div>

<p>Much nicer!</p>
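
<p>As a quick check of the null handling, here is a minimal sketch. The Pair class and its field types are hypothetical stand-ins for the MVC type quoted above (which isn’t shown in full), so treat it as an illustration rather than the real implementation:</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;">using System;

// Hypothetical stand-in for the pair type in the MVC snippet; names are assumptions.
sealed class Pair
{
    private readonly object _first;
    private readonly object _second;

    public Pair(object first, object second)
    {
        _first = first;
        _second = second;
    }

    public override bool Equals(object obj)
    {
        Pair other = obj as Pair;
        if (other == null) return false;
        // The static object.Equals does the null checks before it
        // ever calls the instance Equals method.
        if (!object.Equals(other._first, _first)) return false;
        return object.Equals(other._second, _second);
    }

    public override int GetHashCode()
    {
        int a = _first == null ? 0 : _first.GetHashCode();
        int b = _second == null ? 0 : _second.GetHashCode();
        return unchecked(a * 31 + b);
    }
}

static class Demo
{
    static void Main()
    {
        Console.WriteLine(object.Equals(null, null)); // True
        Console.WriteLine(object.Equals(null, "a"));  // False
        Console.WriteLine(object.Equals("a", null));  // False
        Console.WriteLine(new Pair("a", null).Equals(new Pair("a", null))); // True
        Console.WriteLine(new Pair("a", null).Equals(new Pair("a", "b")));  // False
    }
}</pre></div></div>

<p>None of the calls throws a NullReferenceException, even though several of the arguments are null.</p>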
]]></content:encoded>
			<wfw:commentRss>http://porg.es/blog/objectequals-handles-null-values-correctly/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A simple BigNum library for .NET</title>
		<link>http://porg.es/blog/a-simple-bignum-library-for-dot-net</link>
		<comments>http://porg.es/blog/a-simple-bignum-library-for-dot-net#comments</comments>
		<pubDate>Sat, 13 Oct 2007 08:05:51 +0000</pubDate>
		<dc:creator>Porges</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[.net]]></category>
		<category><![CDATA[C#]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Work]]></category>

		<guid isPermaLink="false">http://porg.es/blog/a-simple-bignum-library-for-net</guid>
		<description><![CDATA[Update Due to minor demand, the code is also available: BigNum source. Please note that I haven’t actually touched the code since it was first written. I’m sure there are some things that don’t work properly. If you’re doing anything big with this, you probably want to write some ‘destructive’ update functions for adding, etc. At [...]]]></description>
			<content:encoded><![CDATA[<h4>Update</h4>
<p>Due to minor demand, the code is also available: <a href='http://porg.es/blog/wp-content/uploads/2008/04/bignum.zip'>BigNum source</a>.</p>
<p>Please note that I haven’t actually touched the code since it was first written. I’m sure there are some things that don’t work properly. If you’re doing anything big with this, you probably want to write some ‘destructive’ update functions for adding, etc. At the moment, every time you add, subtract, etc., a completely new BigInt is returned. To make it faster, you’d want to update the number in place.</p>
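<p>To show the difference between the two styles, here is a rough sketch using a toy number class. ToyBig, Plus and AddInPlace are all made-up names for illustration; this is not the library’s BigInt and it doesn’t touch GMP at all:</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;">using System;

// Toy unsigned big number stored as little-endian base-2^32 limbs.
// Hypothetical: it only exists to contrast the two update styles.
sealed class ToyBig
{
    private uint[] limbs;

    public ToyBig(uint value)
    {
        limbs = new uint[] { value };
    }

    // Non-destructive: copies this number, adds to the copy, returns the copy.
    public ToyBig Plus(ToyBig other)
    {
        ToyBig result = new ToyBig(0);
        result.limbs = (uint[])limbs.Clone();
        result.AddInPlace(other);
        return result;
    }

    // Destructive: updates this object's own limbs, growing the array only when needed.
    public void AddInPlace(ToyBig other)
    {
        if (other.limbs.Length &gt; limbs.Length)
            Array.Resize(ref limbs, other.limbs.Length);

        ulong carry = 0;
        for (int i = 0; i &lt; limbs.Length; i++)
        {
            ulong sum = (ulong)limbs[i] + carry;
            if (i &lt; other.limbs.Length) sum += other.limbs[i];
            limbs[i] = (uint)sum;   // keep the low 32 bits
            carry = sum &gt;&gt; 32;      // the rest carries into the next limb
        }

        if (carry != 0)
        {
            Array.Resize(ref limbs, limbs.Length + 1);
            limbs[limbs.Length - 1] = (uint)carry;
        }
    }

    // Hexadecimal output, most significant limb first.
    public override string ToString()
    {
        string s = "0x";
        for (int i = limbs.Length - 1; i &gt;= 0; i--)
            s += limbs[i].ToString("X8");
        return s;
    }
}

static class Demo
{
    static void Main()
    {
        ToyBig a = new ToyBig(uint.MaxValue);
        ToyBig b = a.Plus(new ToyBig(1)); // allocates a new object; a is unchanged
        Console.WriteLine(a);             // 0xFFFFFFFF
        Console.WriteLine(b);             // 0x0000000100000000
        a.AddInPlace(new ToyBig(1));      // mutates a itself
        Console.WriteLine(a);             // 0x0000000100000000
    }
}</pre></div></div>

<p>In a tight loop, the in-place version avoids creating a fresh object on every iteration, which is the saving described above.</p>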
<h4>Original content&#8230;</h4>
<p>I’ve created a simple wrapper for <a href="http://www.gmplib.org">GMP</a>. At the moment only BigInts are implemented, but I expect to have BigRationals and BigFloats coming along soon enough. (<i>Edit: this never happened.</i>)</p>
<h4>Usage</h4>
<p>I think usage is very close to what you’d expect. The only minor thing that might come as a surprise is that BigInts aren’t value types; since they need destructors (which C# structs can’t have), they can’t be. The rest is fairly straightforward:</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;"><span style="color: #0600FF;">static</span> <span style="color: #0600FF;">void</span> Main<span style="color: #000000;">&#40;</span><span style="color: #FF0000;">string</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> args<span style="color: #000000;">&#41;</span> <span style="color: #000000;">&#123;</span>
    BigInt n<span style="color: #008000;">;</span>
    <span style="color: #0600FF;">for</span> <span style="color: #000000;">&#40;</span><span style="color: #FF0000;">ulong</span> i <span style="color: #008000;">=</span> <span style="color: #FF0000;">100</span><span style="color: #008000;">;</span> i <span style="color: #008000;">&lt;=</span> <span style="color: #FF0000;">999</span><span style="color: #008000;">;</span> i<span style="color: #008000;">++</span><span style="color: #000000;">&#41;</span> <span style="color: #000000;">&#123;</span>
        n <span style="color: #008000;">=</span> BigInt.<span style="color: #0000FF;">Factorial</span><span style="color: #000000;">&#40;</span>i<span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
        Console.<span style="color: #0000FF;">WriteLine</span><span style="color: #000000;">&#40;</span>n<span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    <span style="color: #000000;">&#125;</span>
    Console.<span style="color: #0000FF;">ReadLine</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
<span style="color: #000000;">&#125;</span></pre></div></div>

<p>This code runs in under 5 seconds, which I’m pretty pleased with. I’m not sure how much of a performance penalty you pay for calling GMP from managed code.</p>
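<p>On the earlier point about destructors: a type that owns unmanaged memory needs a finalizer to release it, and C# only allows finalizers on classes. The sketch below shows the general shape, assuming a made-up NativeBuffer class that uses Marshal.AllocHGlobal in place of a real GMP handle; it is not the library’s actual code:</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;">using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper around a block of unmanaged memory,
// standing in for whatever native handle a GMP wrapper would hold.
sealed class NativeBuffer : IDisposable
{
    private IntPtr handle;

    public NativeBuffer(int bytes)
    {
        handle = Marshal.AllocHGlobal(bytes);
    }

    // Finalizer ("destructor" in C# syntax). Declaring one inside
    // a struct is a compile-time error, hence the class.
    ~NativeBuffer()
    {
        Release();
    }

    // Lets callers free the memory early instead of waiting for the GC.
    public void Dispose()
    {
        Release();
        GC.SuppressFinalize(this);
    }

    public bool IsReleased
    {
        get { return handle == IntPtr.Zero; }
    }

    // Safe to call more than once.
    private void Release()
    {
        if (handle != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(handle);
            handle = IntPtr.Zero;
        }
    }
}

static class Demo
{
    static void Main()
    {
        NativeBuffer buffer = new NativeBuffer(64);
        Console.WriteLine(buffer.IsReleased); // False
        buffer.Dispose();
        Console.WriteLine(buffer.IsReleased); // True
    }
}</pre></div></div>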
<h4>Caveats</h4>
<ul>
<li>Consider this alpha software <img src="http://porg.es/blog/wp-content/plugins/wp-smiley-switcher/noktahhitam/icon_smile.gif" alt="" /></li>
<li>The included GMP DLL is compiled without any of the magical hand-written assembly that GMP provides. If you want to, you can compile it yourself with this enabled, and you should be able to use the result as a drop-in replacement.</li>
</ul>
<h4>Download</h4>
<p>Available here: <a href='http://porg.es/blog/wp-content/uploads/2007/10/bignum.zip' title='BigNum library'>BigNum library</a>. Documentation <a href="http://porg.es/Help/Index.html">is available</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://porg.es/blog/a-simple-bignum-library-for-dot-net/feed</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
	</channel>
</rss>
