<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>porges &#187; stupid</title>
	<atom:link href="http://porg.es/blog/tag/stupid/feed" rel="self" type="application/rss+xml" />
	<link>http://porg.es/blog</link>
	<description></description>
	<lastBuildDate>Sun, 06 May 2012 22:13:57 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>What Unicode sounds like</title>
		<link>http://porg.es/blog/what-unicode-sounds-like</link>
		<comments>http://porg.es/blog/what-unicode-sounds-like#comments</comments>
		<pubDate>Mon, 03 Nov 2008 05:21:01 +0000</pubDate>
		<dc:creator>Porges</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[music]]></category>
		<category><![CDATA[random]]></category>
		<category><![CDATA[silly]]></category>
		<category><![CDATA[stupid]]></category>
		<category><![CDATA[weird]]></category>

		<guid isPermaLink="false">http://porg.es/blog/?p=233</guid>
		<description><![CDATA[lame -r -m m -s 16 --bitwidth 8 ~/Downloads/UnicodeData-5.2.0d2.txt unicodedata-520d2txt]]></description>
			<content:encoded><![CDATA[<pre>lame -r -m m -s 16 --bitwidth 8 ~/Downloads/UnicodeData-5.2.0d2.txt</pre>
<p><a href='http://porg.es/blog/wp-content/uploads/2008/11/unicodedata-520d2txt.mp3'>unicodedata-520d2txt</a></p>
]]></content:encoded>
			<wfw:commentRss>http://porg.es/blog/what-unicode-sounds-like/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://porg.es/blog/wp-content/uploads/2008/11/unicodedata-520d2txt.mp3" length="222696" type="audio/mpeg" />
		</item>
		<item>
		<title>Ridiculous UTF-8 character counting</title>
		<link>http://porg.es/blog/ridiculous-utf-8-character-counting</link>
		<comments>http://porg.es/blog/ridiculous-utf-8-character-counting#comments</comments>
		<pubDate>Thu, 05 Jun 2008 14:46:39 +0000</pubDate>
		<dc:creator>Porges</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[C]]></category>
		<category><![CDATA[fast]]></category>
		<category><![CDATA[horrid]]></category>
		<category><![CDATA[optimization]]></category>
		<category><![CDATA[overengineered]]></category>
		<category><![CDATA[silly]]></category>
		<category><![CDATA[simd]]></category>
		<category><![CDATA[sse]]></category>
		<category><![CDATA[strlen]]></category>
		<category><![CDATA[stupid]]></category>

		<guid isPermaLink="false">http://porg.es/blog/?p=131</guid>
		<description><![CDATA[So, Colin Percival has posted a UTF-8 strlen which improves on my previous post. While his code runs slightly slower than mine on my PC, I assume that’s because his code is aimed at a 64-bit architecture. With 32-bits (reading 4 bytes at a time, instead of 8 ) it doesn’t quite get the same [...]]]></description>
			<content:encoded><![CDATA[<p>So, Colin Percival has <a href="http://www.daemonology.net/blog/2008-06-05-faster-utf8-strlen.html">posted a UTF-8 <code>strlen</code></a> that improves on my previous post. While his code runs slightly slower than mine on my PC, I assume that’s because it is aimed at a 64-bit architecture. On 32 bits (reading 4 bytes at a time instead of 8), it doesn’t get quite the same speedup.</p>
<p>That said, the vectorization code is <i>clearly</i> an improvement on mine, so let’s take that ball and run with it!</p>
<h3>The Code</h3>
<p>Now we use SIMD instructions to vectorize the counting of characters. I modified this from Colin’s routine, and I’m sure he has some bit-fiddling up his sleeve that would make this run even faster :P</p>
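<p>As it is, I used a straightforward algorithm: mask each byte with <code>0xc0</code>, compare the result against <code>0x80</code> to flag continuation (non-starter) bytes, and count the flagged bytes in each 16-byte block with a bit-count lookup table. The final length is the byte length minus that count.</p>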

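<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;"><span style="color: #339933;">#include &lt;stddef.h&gt; /* size_t */</span>
<span style="color: #339933;">#include &lt;stdint.h&gt; /* uintptr_t */</span>
&nbsp;
<span style="color: #339933;">#define GetMask(x) __builtin_ia32_pmovmskb128(x)</span>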
<span style="color: #339933;">#define LoadBytes(x) __builtin_ia32_loaddqu(x)</span>
<span style="color: #339933;">#define CompareEquality(x,y) __builtin_ia32_pcmpeqb128((x),(y))</span>
<span style="color: #339933;">#define Or(x,y) __builtin_ia32_por128((x),(y))</span>
<span style="color: #339933;">#define NotExpected(x) __builtin_expect((x),0)</span>
<span style="color: #339933;">#define And(x,y) __builtin_ia32_pand128((x),(y))</span>
&nbsp;
<span style="color: #993333;">typedef</span> <span style="color: #993333;">unsigned</span> <span style="color: #993333;">char</span> v16qi __attribute__ <span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span>vector_size<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">16</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #993333;">const</span> <span style="color: #993333;">char</span> mask<span style="color: #009900;">&#91;</span><span style="color: #0000dd;">16</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #208080;">0xc0</span><span style="color: #339933;">,</span> <span style="color: #208080;">0xc0</span><span style="color: #339933;">,</span> <span style="color: #208080;">0xc0</span><span style="color: #339933;">,</span> <span style="color: #208080;">0xc0</span><span style="color: #339933;">,</span>
    <span style="color: #208080;">0xc0</span><span style="color: #339933;">,</span> <span style="color: #208080;">0xc0</span><span style="color: #339933;">,</span> <span style="color: #208080;">0xc0</span><span style="color: #339933;">,</span> <span style="color: #208080;">0xc0</span><span style="color: #339933;">,</span>
    <span style="color: #208080;">0xc0</span><span style="color: #339933;">,</span> <span style="color: #208080;">0xc0</span><span style="color: #339933;">,</span> <span style="color: #208080;">0xc0</span><span style="color: #339933;">,</span> <span style="color: #208080;">0xc0</span><span style="color: #339933;">,</span>
    <span style="color: #208080;">0xc0</span><span style="color: #339933;">,</span> <span style="color: #208080;">0xc0</span><span style="color: #339933;">,</span> <span style="color: #208080;">0xc0</span><span style="color: #339933;">,</span> <span style="color: #208080;">0xc0</span>
<span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span>
<span style="color: #993333;">const</span> <span style="color: #993333;">char</span> match<span style="color: #009900;">&#91;</span><span style="color: #0000dd;">16</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #208080;">0x80</span><span style="color: #339933;">,</span> <span style="color: #208080;">0x80</span><span style="color: #339933;">,</span> <span style="color: #208080;">0x80</span><span style="color: #339933;">,</span> <span style="color: #208080;">0x80</span><span style="color: #339933;">,</span>
    <span style="color: #208080;">0x80</span><span style="color: #339933;">,</span> <span style="color: #208080;">0x80</span><span style="color: #339933;">,</span> <span style="color: #208080;">0x80</span><span style="color: #339933;">,</span> <span style="color: #208080;">0x80</span><span style="color: #339933;">,</span>
    <span style="color: #208080;">0x80</span><span style="color: #339933;">,</span> <span style="color: #208080;">0x80</span><span style="color: #339933;">,</span> <span style="color: #208080;">0x80</span><span style="color: #339933;">,</span> <span style="color: #208080;">0x80</span><span style="color: #339933;">,</span>
    <span style="color: #208080;">0x80</span><span style="color: #339933;">,</span> <span style="color: #208080;">0x80</span><span style="color: #339933;">,</span> <span style="color: #208080;">0x80</span><span style="color: #339933;">,</span> <span style="color: #208080;">0x80</span>
<span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span>
<span style="color: #993333;">const</span> <span style="color: #993333;">char</span> zero<span style="color: #009900;">&#91;</span><span style="color: #0000dd;">16</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#123;</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span> <span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span>
<span style="color: #993333;">unsigned</span> <span style="color: #993333;">char</span> HammingWeight<span style="color: #009900;">&#91;</span><span style="color: #0000dd;">65536</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span> <span style="color: #666666; font-style: italic;">//initialized elsewhere</span>
&nbsp;
<span style="color: #993333;">static</span> size_t cp_strlen_utf8_sse2<span style="color: #009900;">&#40;</span><span style="color: #993333;">const</span> <span style="color: #993333;">char</span> <span style="color: #339933;">*</span>_s<span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
    <span style="color: #993333;">const</span> <span style="color: #993333;">char</span> <span style="color: #339933;">*</span>s<span style="color: #339933;">;</span>
    <span style="color: #993333;">const</span> v16qi allZero <span style="color: #339933;">=</span> LoadBytes<span style="color: #009900;">&#40;</span>zero<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #993333;">const</span> v16qi masking <span style="color: #339933;">=</span> LoadBytes<span style="color: #009900;">&#40;</span>mask<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #993333;">const</span> v16qi matching <span style="color: #339933;">=</span> LoadBytes<span style="color: #009900;">&#40;</span>match<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    v16qi row<span style="color: #339933;">;</span>
    size_t count <span style="color: #339933;">=</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
    <span style="color: #993333;">unsigned</span> <span style="color: #993333;">char</span> b<span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #666666; font-style: italic;">// unaligned bytes</span>
    <span style="color: #b1b100;">for</span> <span style="color: #009900;">&#40;</span>s <span style="color: #339933;">=</span> _s<span style="color: #339933;">;</span> <span style="color: #009900;">&#40;</span><span style="color: #993333;">uintptr_t</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#40;</span>s<span style="color: #009900;">&#41;</span> <span style="color: #339933;">&amp;</span> <span style="color: #009900;">&#40;</span><span style="color: #993333;">sizeof</span><span style="color: #009900;">&#40;</span>v16qi<span style="color: #009900;">&#41;</span> <span style="color: #339933;">-</span> <span style="color: #0000dd;">1</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> s<span style="color: #339933;">++</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        b <span style="color: #339933;">=</span> <span style="color: #339933;">*</span>s<span style="color: #339933;">;</span>
        <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>b <span style="color: #339933;">==</span> <span style="color: #ff0000;">'<span style="color: #006699; font-weight: bold;">\0</span>'</span><span style="color: #009900;">&#41;</span>
            <span style="color: #b1b100;">goto</span> done<span style="color: #339933;">;</span>
        count <span style="color: #339933;">+=</span> <span style="color: #009900;">&#40;</span>b <span style="color: #339933;">&gt;&gt;</span> <span style="color: #0000dd;">7</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">&amp;</span> <span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span>~b<span style="color: #009900;">&#41;</span> <span style="color: #339933;">&gt;&gt;</span> <span style="color: #0000dd;">6</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
&nbsp;
    <span style="color: #808080; font-style: italic;">/* Handle complete blocks. */</span>
    <span style="color: #b1b100;">for</span> <span style="color: #009900;">&#40;</span><span style="color: #339933;">;;</span> s <span style="color: #339933;">+=</span> <span style="color: #993333;">sizeof</span><span style="color: #009900;">&#40;</span>v16qi<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        <span style="color: #808080; font-style: italic;">/* Prefetch */</span>
        __builtin_prefetch<span style="color: #009900;">&#40;</span><span style="color: #339933;">&amp;</span>s<span style="color: #009900;">&#91;</span><span style="color: #0000dd;">256</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
        <span style="color: #808080; font-style: italic;">/* Load Bytes */</span>
        row <span style="color: #339933;">=</span> LoadBytes<span style="color: #009900;">&#40;</span>s<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
        <span style="color: #808080; font-style: italic;">/* Expect this to be false :) */</span>
        <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>NotExpected<span style="color: #009900;">&#40;</span>GetMask<span style="color: #009900;">&#40;</span>
                                   <span style="color: #808080; font-style: italic;">/* Check for zero bytes */</span>
                                      CompareEquality<span style="color: #009900;">&#40;</span>allZero<span style="color: #339933;">,</span> row<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>
            <span style="color: #000000; font-weight: bold;">break</span><span style="color: #339933;">;</span>
&nbsp;
        <span style="color: #808080; font-style: italic;">/* Count number of non-starter bytes */</span>
&nbsp;
        row <span style="color: #339933;">=</span> And<span style="color: #009900;">&#40;</span>row<span style="color: #339933;">,</span> masking<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        row <span style="color: #339933;">=</span> CompareEquality<span style="color: #009900;">&#40;</span>row<span style="color: #339933;">,</span> matching<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        count <span style="color: #339933;">+=</span> HammingWeight<span style="color: #009900;">&#91;</span>GetMask<span style="color: #009900;">&#40;</span>row<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
&nbsp;
    <span style="color: #666666; font-style: italic;">//leftover bytes</span>
    <span style="color: #b1b100;">for</span> <span style="color: #009900;">&#40;</span><span style="color: #339933;">;;</span> s<span style="color: #339933;">++</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        b <span style="color: #339933;">=</span> <span style="color: #339933;">*</span>s<span style="color: #339933;">;</span>
        <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>b <span style="color: #339933;">==</span> <span style="color: #ff0000;">'<span style="color: #006699; font-weight: bold;">\0</span>'</span><span style="color: #009900;">&#41;</span>
            <span style="color: #000000; font-weight: bold;">break</span><span style="color: #339933;">;</span>
        count <span style="color: #339933;">+=</span> <span style="color: #009900;">&#40;</span>b <span style="color: #339933;">&gt;&gt;</span> <span style="color: #0000dd;">7</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">&amp;</span> <span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span>~b<span style="color: #009900;">&#41;</span> <span style="color: #339933;">&gt;&gt;</span> <span style="color: #0000dd;">6</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
&nbsp;
  done<span style="color: #339933;">:</span>
    <span style="color: #b1b100;">return</span> <span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span>s <span style="color: #339933;">-</span> _s<span style="color: #009900;">&#41;</span> <span style="color: #339933;">-</span> count<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<h3>Results</h3>
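<p>This counts about twice as fast as GCC/libc’s standard, non-UTF-8 <code>strlen</code>. Note the discrepancies between my timings of Colin’s code and his own tests. Damn thou, 32-bits! In the first five lines, the seven numbers for each string appear in the same order as the functions in the timing lists: byte counts from the three plain <code>strlen</code>s, then character counts from the four UTF-8 versions.</p>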
<pre><code>"": 0 0 0 0 0 0 0
"hello, world": 12 12 12 12 12 12 12
"naïve": 6 6 6 5 5 5 5
"こんにちは": 15 15 15 5 5 5 5
"abcdefghijklmnopqrstuvwxyzβ": 28 28 28 27 27 27 27
testing 33554424 bytes of repeated "hello, world":
                      gcc_strlen =   33554424: 0.019331 +/- 0.001076
                      kjs_strlen =   33554424: 0.035095 +/- 0.000530
                       cp_strlen =   33554424: 0.021472 +/- 0.000310
                 kjs_strlen_utf8 =   33554424: 0.070260 +/- 0.000240
                  gp_strlen_utf8 =   33554424: 0.035144 +/- 0.000471
                  cp_strlen_utf8 =   33554424: 0.050539 +/- 0.000342
             cp_strlen_utf8_sse2 =   33554424: 0.010297 +/- 0.001551
testing 33554430 bytes of repeated "naïve":
                      gcc_strlen =   33554430: 0.019176 +/- 0.000824
                      kjs_strlen =   33554430: 0.035090 +/- 0.000478
                       cp_strlen =   33554430: 0.021472 +/- 0.000323
                 kjs_strlen_utf8 =   27962025: 0.070347 +/- 0.000354
                  gp_strlen_utf8 =   27962025: 0.054802 +/- 0.000299
                  cp_strlen_utf8 =   27962025: 0.050595 +/- 0.000602
             cp_strlen_utf8_sse2 =   27962025: 0.010011 +/- 0.001453
testing 33554430 bytes of repeated "こんにちは":
                      gcc_strlen =   33554430: 0.019331 +/- 0.000836
                      kjs_strlen =   33554430: 0.035225 +/- 0.000411
                       cp_strlen =   33554430: 0.021429 +/- 0.000309
                 kjs_strlen_utf8 =   11184810: 0.070249 +/- 0.000312
                  gp_strlen_utf8 =   11184810: 0.026545 +/- 0.000621
                  cp_strlen_utf8 =   11184810: 0.050512 +/- 0.000273
             cp_strlen_utf8_sse2 =   11184810: 0.010246 +/- 0.001466
testing 33554416 bytes of repeated "abcdefghijklmnopqrstuvwxyzβ":
                      gcc_strlen =   33554416: 0.019308 +/- 0.001091
                      kjs_strlen =   33554416: 0.035070 +/- 0.000486
                       cp_strlen =   33554416: 0.021441 +/- 0.000289
                 kjs_strlen_utf8 =   32356044: 0.070287 +/- 0.000297
                  gp_strlen_utf8 =   32356044: 0.043681 +/- 0.000429
                  cp_strlen_utf8 =   32356044: 0.050402 +/- 0.000204
             cp_strlen_utf8_sse2 =   32356044: 0.010407 +/- 0.001371</code></pre>
]]></content:encoded>
			<wfw:commentRss>http://porg.es/blog/ridiculous-utf-8-character-counting/feed</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
	</channel>
</rss>

