<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Counting Characters in UTF-8 Strings Is Fast(er)</title>
	<atom:link href="http://porg.es/blog/counting-characters-in-utf-8-strings-is-faster/feed" rel="self" type="application/rss+xml" />
	<link>http://porg.es/blog/counting-characters-in-utf-8-strings-is-faster</link>
	<description></description>
	<lastBuildDate>Tue, 27 Dec 2011 14:16:55 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<item>
		<title>By: A few posts on high-performance UTF-8 processing in assembly/C</title>
		<link>http://porg.es/blog/counting-characters-in-utf-8-strings-is-faster/comment-page-1#comment-85580</link>
		<dc:creator>A few posts on high-performance UTF-8 processing in assembly/C</dc:creator>
		<pubDate>Mon, 17 Nov 2008 11:40:19 +0000</pubDate>
		<guid isPermaLink="false">http://porg.es/blog/?p=130#comment-85580</guid>
		<description>[...] COUNTING CHARACTERS IN UTF-8 STRINGS IS FAST(ER) http://porg.es/blog/counting-characters-in-utf-8-strings-is-faster [...]</description>
		<content:encoded><![CDATA[<p>[...] COUNTING CHARACTERS IN UTF-8 STRINGS IS FAST(ER) <a href="http://porg.es/blog/counting-characters-in-utf-8-strings-is-faster">http://porg.es/blog/counting-characters-in-utf-8-strings-is-faster</a> [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Colin Percival</title>
		<link>http://porg.es/blog/counting-characters-in-utf-8-strings-is-faster/comment-page-1#comment-71853</link>
		<dc:creator>Colin Percival</dc:creator>
		<pubDate>Thu, 05 Jun 2008 09:24:36 +0000</pubDate>
		<guid isPermaLink="false">http://porg.es/blog/?p=130#comment-71853</guid>
		<description>I&#039;ve done even better. :-)

Vectorization yields a 2-4x speedup over your code: http://www.daemonology.net/blog/2008-06-05-faster-utf8-strlen.html</description>
		<content:encoded><![CDATA[<p>I&#8217;ve done even better. <img src="http://porg.es/blog/wp-content/plugins/wp-smiley-switcher/noktahhitam/icon_smile.gif" alt="" /></p>
<p>Vectorization yields a 2-4x speedup over your code: <a href="http://www.daemonology.net/blog/2008-06-05-faster-utf8-strlen.html">http://www.daemonology.net/blog/2008-06-05-faster-utf8-strlen.html</a></p>
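<p>For readers who want a feel for the word-at-a-time idea, here is a minimal sketch. It is not taken from the linked post: the name swar_utf8_len is hypothetical, and it assumes the caller already knows the byte length instead of scanning for a NUL terminator. It counts continuation bytes (bit pattern 10xxxxxx) eight at a time and subtracts them from the byte count.</p>

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: count code points in n bytes of UTF-8 by counting
   continuation bytes (10xxxxxx) eight at a time and subtracting from n. */
static size_t swar_utf8_len(const char *s, size_t n)
{
    const uint64_t ones = 0x0101010101010101ULL;
    size_t cont = 0;
    size_t i = 0;

    for (; i + 8 <= n; i += 8) {
        uint64_t w;
        memcpy(&w, s + i, 8);               /* alignment-safe load */
        /* Low bit of each byte is 1 iff bit 7 is set and bit 6 is clear. */
        uint64_t m = (w >> 7) & (~w >> 6) & ones;
        cont += (size_t)((m * ones) >> 56); /* sum the eight 0/1 flags */
    }
    for (; i < n; ++i)                      /* leftover tail bytes */
        cont += ((unsigned char)s[i] & 0xC0) == 0x80;

    return n - cont;
}
```

<p>The multiply by 0x0101010101010101 adds the eight per-byte flags into the top byte, so no loop over individual bytes is needed inside a word. Like the table-based versions, this assumes well-formed input.</p>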
]]></content:encoded>
	</item>
	<item>
		<title>By: Porges</title>
		<link>http://porg.es/blog/counting-characters-in-utf-8-strings-is-faster/comment-page-1#comment-71817</link>
		<dc:creator>Porges</dc:creator>
		<pubDate>Thu, 05 Jun 2008 00:47:14 +0000</pubDate>
		<guid isPermaLink="false">http://porg.es/blog/?p=130#comment-71817</guid>
		<description>Hi Savvu, I implemented this as:

&lt;pre lang=&quot;c&quot;&gt;int tbl[] = {
    1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1
};

int savvu_strlen(char *s)
{ 
    int cnt = 0;
    while(*s) cnt += tbl[(*s++ &gt;&gt; 4) &amp; 0xF];
    return cnt;
}&lt;/pre&gt;

It is consistently the slowest or second-slowest.

I tried implementing it with byte-skipping:

&lt;pre lang=&quot;c&quot;&gt;int tbl[] = {
1,1,1,1,1,1,1,1, //one-byte
1,1,1,1, //continuation byte, invalid as a starter; step 1 to avoid an infinite loop
2,2, //two-byte starter
3, //three-byte starter
4 //four-byte starter
};

int porges_strlen(char *s)
{
        int cnt = 0;
        int i = 0;
        while(s[i]) { i += tbl[(s[i] &gt;&gt; 4) &amp; 0x0f]; ++cnt; }
        return cnt;
}&lt;/pre&gt;

This version is only faster on the byte-skipping tests, and is still about half the speed of what I posted.</description>
		<content:encoded><![CDATA[<p>Hi Savvu, I implemented this as:</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;"><span style="color: #993333;">int</span> tbl<span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #0000dd;">1</span><span style="color: #339933;">,</span><span style="color: #0000dd;">1</span><span style="color: #339933;">,</span><span style="color: #0000dd;">1</span><span style="color: #339933;">,</span><span style="color: #0000dd;">1</span><span style="color: #339933;">,</span><span style="color: #0000dd;">1</span><span style="color: #339933;">,</span><span style="color: #0000dd;">1</span><span style="color: #339933;">,</span><span style="color: #0000dd;">1</span><span style="color: #339933;">,</span><span style="color: #0000dd;">1</span><span style="color: #339933;">,</span><span style="color: #0000dd;">0</span><span style="color: #339933;">,</span><span style="color: #0000dd;">0</span><span style="color: #339933;">,</span><span style="color: #0000dd;">0</span><span style="color: #339933;">,</span><span style="color: #0000dd;">0</span><span style="color: #339933;">,</span><span style="color: #0000dd;">1</span><span style="color: #339933;">,</span><span style="color: #0000dd;">1</span><span style="color: #339933;">,</span><span style="color: #0000dd;">1</span><span style="color: #339933;">,</span><span style="color: #0000dd;">1</span>
<span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #993333;">int</span> savvu_strlen<span style="color: #009900;">&#40;</span><span style="color: #993333;">char</span> <span style="color: #339933;">*</span>s<span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span> 
    <span style="color: #993333;">int</span> cnt <span style="color: #339933;">=</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
    <span style="color: #b1b100;">while</span><span style="color: #009900;">&#40;</span><span style="color: #339933;">*</span>s<span style="color: #009900;">&#41;</span> cnt <span style="color: #339933;">+=</span> tbl<span style="color: #009900;">&#91;</span><span style="color: #009900;">&#40;</span><span style="color: #339933;">*</span>s<span style="color: #339933;">++</span> <span style="color: #339933;">&gt;&gt;</span> <span style="color: #0000dd;">4</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">&amp;</span> <span style="color: #208080;">0xF</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
    <span style="color: #b1b100;">return</span> cnt<span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>It is consistently the slowest or second-slowest.</p>
<p>I tried implementing it with byte-skipping:</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;"><span style="color: #993333;">int</span> tbl<span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#123;</span>
<span style="color: #0000dd;">1</span><span style="color: #339933;">,</span><span style="color: #0000dd;">1</span><span style="color: #339933;">,</span><span style="color: #0000dd;">1</span><span style="color: #339933;">,</span><span style="color: #0000dd;">1</span><span style="color: #339933;">,</span><span style="color: #0000dd;">1</span><span style="color: #339933;">,</span><span style="color: #0000dd;">1</span><span style="color: #339933;">,</span><span style="color: #0000dd;">1</span><span style="color: #339933;">,</span><span style="color: #0000dd;">1</span><span style="color: #339933;">,</span> <span style="color: #666666; font-style: italic;">//one-byte</span>
<span style="color: #0000dd;">1</span><span style="color: #339933;">,</span><span style="color: #0000dd;">1</span><span style="color: #339933;">,</span><span style="color: #0000dd;">1</span><span style="color: #339933;">,</span><span style="color: #0000dd;">1</span><span style="color: #339933;">,</span> <span style="color: #666666; font-style: italic;">//continuation byte, invalid as a starter; step 1 to avoid an infinite loop</span>
<span style="color: #0000dd;">2</span><span style="color: #339933;">,</span><span style="color: #0000dd;">2</span><span style="color: #339933;">,</span> <span style="color: #666666; font-style: italic;">//two-byte starter</span>
<span style="color: #0000dd;">3</span><span style="color: #339933;">,</span> <span style="color: #666666; font-style: italic;">//three-byte starter</span>
<span style="color: #0000dd;">4</span> <span style="color: #666666; font-style: italic;">//four-byte starter</span>
<span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #993333;">int</span> porges_strlen<span style="color: #009900;">&#40;</span><span style="color: #993333;">char</span> <span style="color: #339933;">*</span>s<span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
        <span style="color: #993333;">int</span> cnt <span style="color: #339933;">=</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
        <span style="color: #993333;">int</span> i <span style="color: #339933;">=</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
        <span style="color: #b1b100;">while</span><span style="color: #009900;">&#40;</span>s<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span> i <span style="color: #339933;">+=</span> tbl<span style="color: #009900;">&#91;</span><span style="color: #009900;">&#40;</span>s<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span> <span style="color: #339933;">&gt;&gt;</span> <span style="color: #0000dd;">4</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">&amp;</span> <span style="color: #208080;">0x0f</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span> <span style="color: #339933;">++</span>cnt<span style="color: #339933;">;</span> <span style="color: #009900;">&#125;</span>
        <span style="color: #b1b100;">return</span> cnt<span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>This version is only faster on the byte-skipping tests, and is still about half the speed of what I posted.</p>
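<p>One caveat with byte-skipping: if the input ends in a truncated sequence, adding the table value to the index can step over the NUL terminator. A minimal sketch of a guarded variant follows. The names skip_tbl and guarded_strlen are hypothetical, and the extra per-byte check probably gives back much of the speed that skipping was meant to buy.</p>

```c
#include <stddef.h>

/* Sequence length implied by the high nibble of a lead byte. Nibbles
   8..11 (continuation bytes) map to 1 so the scan always advances. */
static const unsigned char skip_tbl[16] = {
    1,1,1,1,1,1,1,1, 1,1,1,1, 2,2, 3, 4
};

/* Hypothetical guarded variant of the byte-skipping counter. */
static size_t guarded_strlen(const char *s)
{
    size_t cnt = 0;
    while (*s) {
        int n = skip_tbl[(unsigned char)*s >> 4];
        /* Advance one byte at a time and stop at NUL, so a truncated
           sequence at the end cannot carry the scan past the terminator. */
        while (n-- > 0 && *s)
            ++s;
        ++cnt;
    }
    return cnt;
}
```

<p>A truncated trailing sequence is counted as one character here; that choice is an assumption of this sketch, not something the benchmark defines.</p>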
]]></content:encoded>
	</item>
	<item>
		<title>By: Savvu</title>
		<link>http://porg.es/blog/counting-characters-in-utf-8-strings-is-faster/comment-page-1#comment-71757</link>
		<dc:creator>Savvu</dc:creator>
		<pubDate>Wed, 04 Jun 2008 14:00:44 +0000</pubDate>
		<guid isPermaLink="false">http://porg.es/blog/?p=130#comment-71757</guid>
		<description>while(*s) cnt += tbl[*s++ &gt;&gt; 4]; return cnt;

Setting up tbl is left as an exercise to the reader. If your chars are signed you also need an AND mask.
		<content:encoded><![CDATA[<p>while(*s) cnt += tbl[*s++ &gt;&gt; 4]; return cnt;</p>
<p>Setting up tbl is left as an exercise to the reader. If your chars are signed you also need an AND mask.</p>
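<p>As a rough sketch of that exercise, the table can be filled in a loop instead of typed by hand: an entry is 0 when the high nibble has the form 10xx (a continuation byte) and 1 otherwise. The names init_tbl and count_chars are hypothetical, and this version casts to unsigned char as an alternative to the AND mask.</p>

```c
#include <stddef.h>

/* Hypothetical names: tbl, init_tbl and count_chars are illustrative only. */
static unsigned char tbl[16];

/* Fill the table: entry i is 0 when the high nibble i is 10xx (a
   continuation byte, 0x80..0xBF) and 1 for every other nibble. */
static void init_tbl(void)
{
    for (int i = 0; i < 16; ++i)
        tbl[i] = (i & 0xC) != 0x8;
}

/* Count bytes that are not continuation bytes. Casting to unsigned char
   keeps the shifted index in 0..15, so no AND mask is needed. */
static size_t count_chars(const char *s)
{
    size_t cnt = 0;
    while (*s)
        cnt += tbl[(unsigned char)*s++ >> 4];
    return cnt;
}
```

<p>init_tbl must run once before the first call to count_chars. The result is only meaningful for well-formed UTF-8.</p>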
]]></content:encoded>
	</item>
	<item>
		<title>By: Porges</title>
		<link>http://porg.es/blog/counting-characters-in-utf-8-strings-is-faster/comment-page-1#comment-71727</link>
		<dc:creator>Porges</dc:creator>
		<pubDate>Wed, 04 Jun 2008 07:55:23 +0000</pubDate>
		<guid isPermaLink="false">http://porg.es/blog/?p=130#comment-71727</guid>
		<description>Whoops :)

I think the URL must have tripped me up; I’m so used to Bob Smith being /~bsmith/...</description>
		<content:encoded><![CDATA[<p>Whoops <img src="http://porg.es/blog/wp-content/plugins/wp-smiley-switcher/noktahhitam/icon_smile.gif" alt="" /></p>
<p>I think the URL must have tripped me up; I’m so used to Bob Smith being /~bsmith/&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: matthew</title>
		<link>http://porg.es/blog/counting-characters-in-utf-8-strings-is-faster/comment-page-1#comment-71725</link>
		<dc:creator>matthew</dc:creator>
		<pubDate>Wed, 04 Jun 2008 07:15:59 +0000</pubDate>
		<guid isPermaLink="false">http://porg.es/blog/?p=130#comment-71725</guid>
		<description>BTW, his name is Kragen, not Ragen.</description>
		<content:encoded><![CDATA[<p>BTW, his name is Kragen, not Ragen.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

