Ridiculous UTF-8 character counting

code 6 June 2008 | 10 Comments

So, Colin Percival has posted a UTF-8 strlen which improves on the code in my previous post. His code runs slightly slower than mine on my PC, but I assume that’s because it is aimed at a 64-bit architecture. On a 32-bit machine (reading 4 bytes at a time instead of 8) it doesn’t quite get the same [...]
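To make the "bytes at a time" idea concrete, here is a minimal sketch in C of counting a whole word per step. It is not Colin Percival's code or the code from the earlier post. The function name `utf8_count_words` and the explicit `len` parameter are assumptions made for illustration (a true strlen replacement would also have to find the terminating NUL), and the input is assumed to be valid UTF-8. It counts continuation bytes (`10xxxxxx`) eight at a time and subtracts them from the byte length; a 32-bit variant would use `uint32_t`, a four-byte mask and a shift of 24.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: counts characters in len bytes of valid UTF-8. */
size_t utf8_count_words(const char *s, size_t len)
{
    const uint64_t ones = 0x0101010101010101ULL; /* 0x01 in every byte */
    size_t continuation = 0;
    size_t i = 0;

    /* Process 8 bytes per iteration; memcpy avoids unaligned reads. */
    for (; i + 8 <= len; i += 8) {
        uint64_t w;
        memcpy(&w, s + i, 8);
        /* Low bit of each byte becomes 1 when bit 7 is set and bit 6 is clear. */
        uint64_t m = (w >> 7) & (~w >> 6) & ones;
        /* Multiplying by ones sums the eight flags into the top byte. */
        continuation += (size_t)((m * ones) >> 56);
    }

    /* Handle the remaining tail one byte at a time. */
    for (; i < len; i++)
        continuation += (((unsigned char)s[i] & 0xC0) == 0x80);

    return len - continuation;
}
```

Because the flags are summed with a multiply rather than read out byte by byte, the result does not depend on the machine's byte order.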


Counting Characters in UTF-8 Strings Is Fast(er)

code 4 June 2008 | 6 Comments

‘Counting Characters in UTF-8 Strings Is Fast’ by Kragen Sitaker shows several ways to count the characters in a UTF-8 string, using both assembly and C. But, with a few assumptions, we can go faster. Assumption One: We are dealing with a valid UTF-8 string. Making this assumption means that once we hit the start of a multi-byte character [...]
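For reference, a plain byte-at-a-time baseline looks something like the sketch below. It is not taken from either post; the name `utf8_count` is made up for illustration, and it relies on the same assumption of valid UTF-8. In a valid string every character contributes exactly one byte that is not a continuation byte (`10xxxxxx`), so counting those bytes gives the character count.

```c
#include <stddef.h>

/* Hypothetical baseline: counts characters in a NUL-terminated,
 * valid UTF-8 string by counting non-continuation bytes. */
size_t utf8_count(const char *s)
{
    size_t n = 0;
    for (; *s; s++) {
        /* Continuation bytes have the top two bits equal to 10. */
        n += (((unsigned char)*s & 0xC0) != 0x80);
    }
    return n;
}
```

For example, `utf8_count("h\xc3\xa9llo")` returns 5 even though the string is 6 bytes long.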
