How we made Haskell search strings as fast as Rust

How we made Haskell search strings as fast as Rust

Because we use the type for strings in Haskell, which is an array of UTF-16 code units in native endianness, the benchmarks would take UTF-16 data as input. We packed the 16-bit input code unit and the 32-bit go-to state together in a 64-bit integer, and now the transition table was a cache-friendly array of 64-bit integers. To stick with the earlier example, suppose we had the automaton set up like so, in pseudo-Python to illustrate the idea without too much noise:

Then we could step the automaton with a triply nested loop: one over the input characters, one to follow transitions, and one for the linear scan through the transition table.

Source: tech.channable.com