Why to normalize Unicode strings
In the first “Zoë”, the ë character (e with umlaut) was represented by a single Unicode code point, while in the second case it was in the decomposed form: the letter e followed by a combining diaeresis. Depending on the character encoding used, the dog emoji can also be mapped to multiple different byte sequences:
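As a quick sketch (assuming Node.js, where `Buffer.from` exposes the raw bytes of a string), the same emoji produces different byte sequences under UTF-8 and UTF-16:

```javascript
// The dog emoji is a single code point, U+1F436,
// but the bytes on the wire depend on the encoding chosen.
const dog = '🐶'

// UTF-8 encodes this code point as four bytes
console.log(Buffer.from(dog, 'utf8'))    // <Buffer f0 9f 90 b6>

// UTF-16 (little-endian) encodes it as a surrogate pair: also four bytes,
// but completely different ones
console.log(Buffer.from(dog, 'utf16le')) // <Buffer 3d d8 36 dc>
```

Both sequences decode back to the same character; only the storage format differs.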
In a JavaScript source file, the following three statements print the same result, filling your console with lots of puppies:
Most JavaScript interpreters (including Node.js and modern browsers) use UTF-16 internally. For example, the letter ë could be represented using either:

- the single code point U+00EB (the composed form), or
- the code point U+0065 (e) followed by U+0308 (combining diaeresis), the decomposed form.
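A minimal sketch of those two representations side by side:

```javascript
// Composed form: one code point, one UTF-16 code unit
const composed = '\u00EB'
// Decomposed form: base letter plus combining diaeresis, two code units
const decomposed = '\u0065\u0308'

console.log(composed, decomposed)               // ë ë
console.log(composed === decomposed)            // false
console.log(composed.length, decomposed.length) // 1 2
```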
The two characters look the same, but do not compare as equal, and the strings have different lengths.
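The standard remedy is `String.prototype.normalize()` (part of ES2015), which converts a string to a canonical Unicode form; a sketch using the "Zoë" example, assuming the default NFC (composed) form:

```javascript
const zoe1 = 'Zo\u00EB'   // ends with the composed ë
const zoe2 = 'Zoe\u0308'  // ends with e + combining diaeresis

console.log(zoe1 === zoe2)                          // false
console.log(zoe1.length, zoe2.length)               // 3 4

// After normalizing both to the same form, they compare as equal
console.log(zoe1.normalize() === zoe2.normalize())  // true
```

Normalizing to NFD (the decomposed form) works just as well, as long as both sides use the same form.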
Source: withblue.ink