<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Ridiculous UTF-8 character counting</title>
	<atom:link href="http://porg.es/blog/ridiculous-utf-8-character-counting/feed" rel="self" type="application/rss+xml" />
	<link>http://porg.es/blog/ridiculous-utf-8-character-counting</link>
	<description>... master of none</description>
	<lastBuildDate>Sun, 21 Feb 2010 11:01:09 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Porges</title>
		<link>http://porg.es/blog/ridiculous-utf-8-character-counting/comment-page-1#comment-71928</link>
		<dc:creator>Porges</dc:creator>
		<pubDate>Thu, 05 Jun 2008 23:40:28 +0000</pubDate>
		<guid isPermaLink="false">http://porg.es/blog/?p=131#comment-71928</guid>
		<description>Joshua,

Unfortunately, AFAICT, there is no way to do this from the C code. If I edit the assembly manually I lose about 0.001 s :)</description>
		<content:encoded><![CDATA[<p>Joshua,</p>
<p>Unfortunately, AFAICT, there is no way to do this from the C code. If I edit the assembly manually I lose about 0.001 s <img src="http://porg.es/blog/wp-content/plugins/wp-smiley-switcher/noktahhitam/icon_smile.gif" alt="" /></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Wade</title>
		<link>http://porg.es/blog/ridiculous-utf-8-character-counting/comment-page-1#comment-71920</link>
		<dc:creator>Wade</dc:creator>
		<pubDate>Thu, 05 Jun 2008 22:09:26 +0000</pubDate>
		<guid isPermaLink="false">http://porg.es/blog/?p=131#comment-71920</guid>
		<description>Shaneal,

It is, and it isn&#039;t masturbation.  Somewhere, someone will need this code because they need to make their strlens _really_ fast. ;)</description>
		<content:encoded><![CDATA[<p>Shaneal,</p>
<p>It is, and it isn&#8217;t masturbation.  Somewhere, someone will need this code because they need to make their strlens _really_ fast. <img src="http://porg.es/blog/wp-content/plugins/wp-smiley-switcher/noktahhitam/icon_wink.gif" alt="" /></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: ast_tree</title>
		<link>http://porg.es/blog/ridiculous-utf-8-character-counting/comment-page-1#comment-71912</link>
		<dc:creator>ast_tree</dc:creator>
		<pubDate>Thu, 05 Jun 2008 20:08:45 +0000</pubDate>
		<guid isPermaLink="false">http://porg.es/blog/?p=131#comment-71912</guid>
		<description>Are you running an Intel© viral ad? What&#039;s next, multithread/multiprocess/MPI Cluster version?

Maybe if you try the profile mode of icc you can speed up vectorization or reduce the cache misses (if any remain).</description>
		<content:encoded><![CDATA[<p>Are you running an Intel© viral ad? What&#8217;s next, multithread/multiprocess/MPI Cluster version?</p>
<p>Maybe if you try the profile mode of icc you can speed up vectorization or reduce the cache misses (if any remain).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joshua Haberman</title>
		<link>http://porg.es/blog/ridiculous-utf-8-character-counting/comment-page-1#comment-71907</link>
		<dc:creator>Joshua Haberman</dc:creator>
		<pubDate>Thu, 05 Jun 2008 18:30:39 +0000</pubDate>
		<guid isPermaLink="false">http://porg.es/blog/?p=131#comment-71907</guid>
		<description>Inside the loop, you should load with movdqa instead of movdqu, since you know the addresses will be aligned.</description>
		<content:encoded><![CDATA[<p>Inside the loop, you should load with movdqa instead of movdqu, since you know the addresses will be aligned.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: mark++</title>
		<link>http://porg.es/blog/ridiculous-utf-8-character-counting/comment-page-1#comment-71904</link>
		<dc:creator>mark++</dc:creator>
		<pubDate>Thu, 05 Jun 2008 18:05:59 +0000</pubDate>
		<guid isPermaLink="false">http://porg.es/blog/?p=131#comment-71904</guid>
		<description>Core2 and AMD&#039;s Quad Core parts have 128-bit wide SSE execution units.  Prior architectures from both companies, including classic Opteron, used double-pumped 64-bit execution units.  This will likely make a difference.</description>
		<content:encoded><![CDATA[<p>Core2 and AMD&#8217;s Quad Core parts have 128-bit wide SSE execution units.  Prior architectures from both companies, including classic Opteron, used double-pumped 64-bit execution units.  This will likely make a difference.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Matthew Hartman</title>
		<link>http://porg.es/blog/ridiculous-utf-8-character-counting/comment-page-1#comment-71893</link>
		<dc:creator>Matthew Hartman</dc:creator>
		<pubDate>Thu, 05 Jun 2008 16:59:31 +0000</pubDate>
		<guid isPermaLink="false">http://porg.es/blog/?p=131#comment-71893</guid>
		<description>Keep masturbating guys (sorry, I couldn&#039;t resist). I love reading this kind of constructive banter.</description>
		<content:encoded><![CDATA[<p>Keep masturbating guys (sorry, I couldn&#8217;t resist). I love reading this kind of constructive banter.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Stephen</title>
		<link>http://porg.es/blog/ridiculous-utf-8-character-counting/comment-page-1#comment-71890</link>
		<dc:creator>Stephen</dc:creator>
		<pubDate>Thu, 05 Jun 2008 16:49:11 +0000</pubDate>
		<guid isPermaLink="false">http://porg.es/blog/?p=131#comment-71890</guid>
		<description>I see your code has way more variance than the others, probably due to the hamming lookup table.

You&#039;d be better off accumulating the counters and doing a horizontal add at the end.

e.g. put a nested loop into the block loop which executes at most 255 times, then accumulate the bytes to the sum and continue.</description>
		<content:encoded><![CDATA[<p>I see your code has way more variance than the others, probably due to the hamming lookup table.</p>
<p>You&#8217;d be better off accumulating the counters and doing a horizontal add at the end.</p>
<p>e.g. put a nested loop into the block loop which executes at most 255 times, then accumulate the bytes to the sum and continue.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Shaneal Manek</title>
		<link>http://porg.es/blog/ridiculous-utf-8-character-counting/comment-page-1#comment-71882</link>
		<dc:creator>Shaneal Manek</dc:creator>
		<pubDate>Thu, 05 Jun 2008 15:38:56 +0000</pubDate>
		<guid isPermaLink="false">http://porg.es/blog/?p=131#comment-71882</guid>
		<description>Isn&#039;t this just mental masturbation at some point?

But boy it is fun :-D 

I really need to learn x86 assembler - I only know a bit of MIPS. The instruction set seems huge ...</description>
		<content:encoded><![CDATA[<p>Isn&#8217;t this just mental masturbation at some point?</p>
<p>But boy it is fun <img src="http://porg.es/blog/wp-content/plugins/wp-smiley-switcher/noktahhitam/icon_biggrin.gif" alt="" /> </p>
<p>I really need to learn x86 assembler &#8211; I only know a bit of MIPS. The instruction set seems huge &#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Vineet Kumar</title>
		<link>http://porg.es/blog/ridiculous-utf-8-character-counting/comment-page-1#comment-71881</link>
		<dc:creator>Vineet Kumar</dc:creator>
		<pubDate>Thu, 05 Jun 2008 15:37:56 +0000</pubDate>
		<guid isPermaLink="false">http://porg.es/blog/?p=131#comment-71881</guid>
		<description>That oughta be &quot;damn thee&quot;. =)

(Also, the required name and email fields on this comment form are hard to decipher.  Icons too small and similar, with not enough meaning.)</description>
		<content:encoded><![CDATA[<p>That oughta be &#8220;damn thee&#8221;. =)</p>
<p>(Also, the required name and email fields on this comment form are hard to decipher.  Icons too small and similar, with not enough meaning.)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Colin Percival</title>
		<link>http://porg.es/blog/ridiculous-utf-8-character-counting/comment-page-1#comment-71879</link>
		<dc:creator>Colin Percival</dc:creator>
		<pubDate>Thu, 05 Jun 2008 15:29:55 +0000</pubDate>
		<guid isPermaLink="false">http://porg.es/blog/?p=131#comment-71879</guid>
		<description>On my Opteron, I get your SSE2 code taking 18.7ms (compared to 14.1ms for my non-sse2 version).  Clearly your CPU, whatever it is, has faster SSE2 support than my Opteron.

I&#039;m going to see if I can coax some more performance out of the SSE2 code, though...</description>
		<content:encoded><![CDATA[<p>On my Opteron, I get your SSE2 code taking 18.7ms (compared to 14.1ms for my non-sse2 version).  Clearly your CPU, whatever it is, has faster SSE2 support than my Opteron.</p>
<p>I&#8217;m going to see if I can coax some more performance out of the SSE2 code, though&#8230;</p>
]]></content:encoded>
	</item>
</channel>
</rss>
