Unicode programming, with examples in C
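The Unicode standard and its accompanying specifications describe the proper way to divide words and break lines, sort text, format numbers, display text in different directions, split/combine/reorder vowels in South Asian languages, and determine when characters may be visually confusable. The same visible character can also be encoded in more than one way. For instance, ộ can be specified in five ways: as a single precomposed character, or as a base letter followed by one or two combining marks.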
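To make the five spellings of ộ concrete, here is a minimal sketch that writes each one out as UTF-8 bytes and checks them against each other. The names `o_forms`, `O_FORMS_COUNT`, and `forms_all_differ` are illustrative and not taken from any library.

```c
#include <stddef.h>
#include <string.h>

/* Five codepoint sequences that all render as ộ, stored as UTF-8 bytes. */
static const char *const o_forms[] = {
    "\xE1\xBB\x99",          /* U+1ED9: precomposed letter               */
    "\xE1\xBB\x8D\xCC\x82",  /* U+1ECD U+0302: ọ + combining circumflex  */
    "\xC3\xB4\xCC\xA3",      /* U+00F4 U+0323: ô + combining dot below   */
    "o\xCC\x82\xCC\xA3",     /* U+006F U+0302 U+0323: o, circumflex, dot */
    "o\xCC\xA3\xCC\x82",     /* U+006F U+0323 U+0302: o, dot, circumflex */
};

#define O_FORMS_COUNT (sizeof o_forms / sizeof o_forms[0])

/* Returns 1 if no two spellings are byte-for-byte identical, 0 otherwise. */
int forms_all_differ(void)
{
    for (size_t i = 0; i < O_FORMS_COUNT; i++)
        for (size_t j = i + 1; j < O_FORMS_COUNT; j++)
            if (strcmp(o_forms[i], o_forms[j]) == 0)
                return 0;
    return 1;
}
```

`forms_all_differ()` returns 1 here: `strcmp` sees five different strings where a reader sees one character. Treating them as equal requires Unicode normalization rather than a byte comparison.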
The numbers (written U+xxxx) for each abstract character and each combining symbol are called “codepoints.” The Unicode Transformation Formats (UTF) describe different ways to map between codepoints and code units, the fixed-size integers (8, 16, or 32 bits wide) that are actually stored in memory or on disk.
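As a rough illustration of that mapping, the sketch below encodes a single codepoint into UTF-8 code units (bytes) and into UTF-16 code units (16-bit words). The function names `utf8_encode` and `utf16_encode` are hypothetical helpers written for this example, not a standard API. Real programs would more likely rely on a library.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if cp is a codepoint that may be encoded (not a surrogate,
   not above U+10FFFF), 0 otherwise. */
static int is_scalar_value(uint32_t cp)
{
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

/* Encode cp as UTF-8. Writes 1 to 4 bytes into out and returns the count,
   or returns 0 if cp cannot be encoded. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (!is_scalar_value(cp))
        return 0;
    if (cp <= 0x7F) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp <= 0xFFFF) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Encode cp as UTF-16. Writes 1 or 2 code units into out and returns the
   count, or returns 0 if cp cannot be encoded. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (!is_scalar_value(cp))
        return 0;
    if (cp <= 0xFFFF) {
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                             /* leaves a 20-bit value  */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));  /* high (lead) surrogate  */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF)); /* low (trail) surrogate */
    return 2;
}
```

For the precomposed ộ (U+1ED9), `utf8_encode` produces the three bytes E1 BB 99, while `utf16_encode` produces the single unit 0x1ED9. A codepoint outside the Basic Multilingual Plane such as U+1F600 takes four UTF-8 bytes (F0 9F 98 80) and two UTF-16 units (0xD83D 0xDE00).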
Source: begriffs.com