Tag Archives: fast

Ridiculous UTF-8 character counting

So, Colin Percival has posted a UTF-8 strlen which improves on my previous post. While his code runs slightly slower than mine on my PC, I assume that’s because his code is aimed at a 64-bit architecture. With 32 bits (reading 4 bytes at a time, instead of 8) it doesn’t quite get the same [...]
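
To make "reading 8 bytes at a time" concrete, here is a minimal sketch of word-at-a-time counting in C. It is not Colin Percival's code: the function name `utf8_count_wordwise` is made up for illustration, and it simplifies matters by calling `strlen` first, where a tuned version would look for the terminating NUL in the same pass. The idea is that a UTF-8 continuation byte always has the bit pattern `10xxxxxx`, so the character count is the byte length minus the number of continuation bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A 64-bit word with the lowest bit of every byte set. */
#define ONES 0x0101010101010101ULL

/* Hypothetical helper: counts characters in a NUL-terminated string,
 * assuming the input is valid UTF-8. */
size_t utf8_count_wordwise(const char *s)
{
    size_t len = strlen(s);      /* simplification: a separate pass for the length */
    size_t continuation = 0;
    size_t i = 0;

    /* Main loop: examine 8 bytes per iteration. */
    for (; i + 8 <= len; i += 8) {
        uint64_t w;
        memcpy(&w, s + i, sizeof w);   /* avoids unaligned-access problems */

        /* For each byte, set its lowest bit iff bit 7 is set and bit 6 is clear. */
        uint64_t m = (w >> 7) & (~w >> 6) & ONES;

        /* Multiplying by ONES sums the eight flags into the top byte (at most 8). */
        continuation += (size_t)((m * ONES) >> 56);
    }

    /* Tail: handle the remaining 0 to 7 bytes one at a time. */
    for (; i < len; i++)
        continuation += (((unsigned char)s[i] & 0xC0) == 0x80);

    return len - continuation;
}
```

A 32-bit variant would load a `uint32_t`, use the mask `0x01010101` and shift the product right by 24 instead of 56. Because each byte is tested independently, the result does not depend on byte order.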

Counting Characters in UTF-8 Strings Is Fast(er)

‘Counting Characters in UTF-8 Strings Is Fast’ by Kragen Sitaker shows several ways to count characters in UTF-8 strings, using both assembly and C. But, with a few assumptions, we can go faster.
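
For reference, a simple byte-at-a-time counter in C might look like the sketch below. The name `utf8_count_simple` is made up for illustration and the code is not taken from either article; it only shows the kind of baseline that faster versions try to beat. It counts every byte that is not a continuation byte (`10xxxxxx`), which equals the number of characters as long as the input is valid UTF-8.

```c
#include <stddef.h>

/* Hypothetical baseline: counts characters in a NUL-terminated string,
 * assuming valid UTF-8, by examining one byte per iteration. */
size_t utf8_count_simple(const char *s)
{
    size_t count = 0;

    for (; *s != '\0'; s++) {
        /* Continuation bytes match 10xxxxxx; every other byte starts a character. */
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For example, `utf8_count_simple("h\xc3\xa9llo")` sees six bytes but counts five characters, because the byte `0xA9` is a continuation byte.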
Assumption One: We are dealing with a valid UTF-8 string
Making this assumption means that once we hit the start of a multi-byte character we can [...]