Counting Characters in UTF-8 Strings Is Fast(er)

code 4 June 2008 | 6 Comments

‘Counting Characters in UTF-8 Strings Is Fast’ by Kragen Sitaker shows several ways to count the characters in a UTF-8 string, using both assembly and C. But, with a few assumptions, we can go faster.

Assumption One: We are dealing with a valid UTF-8 string

Making this assumption means that once we hit the start of a multi-byte character [...]
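
For orientation, here is a minimal C sketch of the straightforward way to count characters in a string that is assumed to be valid UTF-8. It is an illustrative baseline only, not the code from either post, and the name `utf8_count` is hypothetical. It relies on one property of the encoding: every continuation byte has the bit pattern 10xxxxxx, so each character contributes exactly one byte that is not a continuation byte.

```c
#include <stddef.h>

/* Hypothetical helper: count the characters (code points) in a
   NUL-terminated string that is ASSUMED to be valid UTF-8.
   No validation is performed. */
static size_t utf8_count(const char *s)
{
    size_t count = 0;
    const unsigned char *p = (const unsigned char *)s;

    for (; *p != '\0'; ++p) {
        /* Continuation bytes look like 10xxxxxx; every other byte
           starts a new character, so count only those. */
        if ((*p & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For example, `utf8_count("na\xc3\xafve")` returns 5 for a string that is six bytes long, because the two bytes encoding ï count as one character.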
