+2010-06-04 Damien Diederen <dd@crosstwine.com>
+
+ * create-category-table.cs: Utility to generate reasonably-packed
+ Unicode tables.
+
+ This program generates (partially) bi-level tables encoding the
+ contents of the Unicode character category database.
+
+ Mono embeds a linear table with category codes for the Unicode BMP
+ (first 65536 codepoints), and lacks information about characters
+ in the astral planes, leading to requests such as bug 480178.
+ Extending the linear table to cover the full codespace is not an
+ ideal solution, as that would expand the embedded "blob" by a
+ factor of 17.
+
+ The new tables generated by this program can be used to support
+ the full range of characters. An additional level of indirection
+ used for characters outside the U+0000..U+FFFF range enables
+ "page" sharing, so that the total amount of embedded data only
+ grows by 13.5kB.
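
A minimal Python sketch of the layout described above. It assumes a linear byte table for the BMP, plus a page index and deduplicated pages for U+10000..U+10FFFF. The 256-entry page size, the function names and the toy category codes are hypothetical and are not taken from create-category-table.cs.

```python
from typing import Callable, Dict, List, Tuple

BMP_SIZE = 0x10000   # U+0000..U+FFFF, stored linearly
MAX_CP = 0x110000    # one past U+10FFFF

Tables = Tuple[bytes, List[int], List[bytes]]


def build_tables(category: Callable[[int], int], page_bits: int = 8) -> Tables:
    """Linear table for the BMP, page index + shared pages above it."""
    page_size = 1 << page_bits
    bmp = bytes(category(cp) for cp in range(BMP_SIZE))
    pages: List[bytes] = []          # each distinct page is stored once
    page_id: Dict[bytes, int] = {}   # page contents -> position in `pages`
    index: List[int] = []            # one entry per astral page
    for start in range(BMP_SIZE, MAX_CP, page_size):
        page = bytes(category(cp) for cp in range(start, start + page_size))
        if page not in page_id:
            page_id[page] = len(pages)
            pages.append(page)
        index.append(page_id[page])
    return bmp, index, pages


def lookup(tables: Tables, cp: int, page_bits: int = 8) -> int:
    bmp, index, pages = tables
    if cp < BMP_SIZE:
        return bmp[cp]                            # single level for the BMP
    offset = cp - BMP_SIZE
    page = pages[index[offset >> page_bits]]      # the extra indirection
    return page[offset & ((1 << page_bits) - 1)]
```

Most astral code points are unassigned, so most astral pages are identical and collapse into one stored page. That sharing is what keeps the added data small compared with a 17x larger linear table.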
+
+ Cf. in-file comments for usage instructions.
+
+2010-05-17 Atsushi Enomoto <atsushi@ximian.com>
+
+ * SimpleCollator.cs : fix extender search index for LastIndexOf().
+ Fixed bug #605094.
+
+2010-04-20 Damien Diederen <dd@crosstwine.com>
+
+ * Normalization.cs: Really apply canonical reordering "recursively."
+
+ Before this, a sequence of code points with the combining
+ classes (22, 33, 11) would be reordered to (22, 11, 33) instead of
+ the correct (11, 22, 33). This is because the 'i--' would be
+ directly cancelled by the 'i++' in the for loop.
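
A minimal Python sketch of the bug and of one way to fix it, working on a list of combining classes only. The loop shapes below are a hypothetical model of the behaviour described above, not the actual Normalization.cs code. Starters (class 0) never move, and equal classes keep their relative order.

```python
from typing import List, Sequence


def reorder_buggy(ccc: Sequence[int]) -> List[int]:
    """Models the bug: the step back is undone by the loop increment."""
    a = list(ccc)
    i = 1
    while i < len(a):
        if a[i] != 0 and a[i - 1] > a[i]:
            a[i - 1], a[i] = a[i], a[i - 1]
            i -= 1          # cancelled by the increment below
        i += 1
    return a


def reorder(ccc: Sequence[int]) -> List[int]:
    """Stable canonical ordering of each run of non-starters."""
    a = list(ccc)
    i = 1
    while i < len(a):
        if a[i] != 0 and a[i - 1] > a[i]:
            a[i - 1], a[i] = a[i], a[i - 1]
            if i > 1:
                i -= 1      # really step back: re-check the previous pair
                continue
        i += 1
    return a
```

With (22, 33, 11), `reorder_buggy` yields (22, 11, 33) and `reorder` yields (11, 22, 33).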
+
+2010-04-20 Damien Diederen <dd@crosstwine.com>
+
+ * Normalization.cs: The correct "checkType" argument to
+ Decompose() is NFD or NFKD when normalizing to NFC resp. NFKC.
+
+ * StringTest.cs: More NFC test cases.
+
+2010-04-20 Damien Diederen <dd@crosstwine.com>
+
+ * Normalization.cs: Implement algorithmic Hangul composition.
+ Calling Normalize(NormalizationForm.FormC) on Korean characters
+ now works properly (bnc#480152).
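
Hangul syllables are composed arithmetically rather than through the pair table. A minimal Python sketch using the constants from the Unicode Standard's Hangul syllable composition algorithm; the function name `compose_hangul` is hypothetical, and the sketch handles only the L+V and LV+T steps.

```python
from typing import List, Sequence

S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT        # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT        # 11172 precomposed syllables


def compose_hangul(cps: Sequence[int]) -> List[int]:
    out: List[int] = []
    for cp in cps:
        if out:
            last = out[-1]
            l_index = last - L_BASE
            v_index = cp - V_BASE
            if 0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT:
                # leading consonant + vowel -> LV syllable
                out[-1] = S_BASE + (l_index * V_COUNT + v_index) * T_COUNT
                continue
            s_index = last - S_BASE
            t_index = cp - T_BASE
            if (0 <= s_index < S_COUNT and s_index % T_COUNT == 0
                    and 0 < t_index < T_COUNT):
                # LV syllable (no trailing consonant yet) + trailing consonant
                out[-1] = last + t_index
                continue
        out.append(cp)
    return out
```

For example, U+1112 U+1161 U+11AB composes to U+D55C.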
+
+ * StringTest.cs: Add test cases for Hangul composition.
+
+2010-04-20 Damien Diederen <dd@crosstwine.com>
+
+ * Normalization.cs: Follow the spec when checking composition pairs.
+
+ Figure 7 in section 1.3 of http://unicode.org/reports/tr15/ shows
+ that, when doing composition, one has to examine the successive
+ (starter, candidate) pairs, and combine if a matching canonical
+ decomposition exists.
+
+ The original algorithm was, instead, iterating on canonical
+ decompositions, and, for each one, trying to match a sequence
+ of (starter, non-starter, ...). This, however, does not produce
+ the same results, as it violates some implicit ordering
+ constraints in the Unicode tables.
+
+ E.g., when composing the following sequence of codepoints, the
+ original algorithm was picking:
+
+ 03B7 0313 0300 0345
+ ^^^^      ^^^^
+ 1F74 0313 0345
+ ^^^^      ^^^^
+ 1FC2 0313
+
+ and would stop at 1FC2 0313 as there is no decomposition matching
+ it. The new algorithm, which follows the guidance of the pretty
+ figure 7, ends up doing:
+
+ 03B7 0313 0300 0345
+ ^^^^ ^^^^
+ 1F20 0300 0345
+ ^^^^ ^^^^
+ 1F22 0345
+ ^^^^ ^^^^
+ 1F92
+
+ resulting in the correct 1F92.
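
A minimal Python sketch of the figure 7 loop, assuming fully decomposed, canonically ordered input. PAIRS is a hand-copied excerpt of five canonical pairs from UnicodeData.txt, just enough for the example above; `compose` is a hypothetical name, and a real implementation would build the pair table from the full database and skip composition exclusions. Combining classes come from Python's `unicodedata.combining()`.

```python
import unicodedata
from typing import List, Optional, Sequence

# Excerpt: (first, second) -> primary composite.
PAIRS = {
    (0x03B7, 0x0313): 0x1F20,
    (0x1F20, 0x0300): 0x1F22,
    (0x1F22, 0x0345): 0x1F92,
    (0x03B7, 0x0300): 0x1F74,
    (0x1F74, 0x0345): 0x1FC2,
}


def compose(cps: Sequence[int]) -> List[int]:
    out: List[int] = []
    starter: Optional[int] = None  # index in `out` of the last starter
    last_ccc = 0                   # class of the last code point kept after it
    for cp in cps:
        ccc = unicodedata.combining(chr(cp))
        if starter is not None:
            composite = PAIRS.get((out[starter], cp))
            # cp is not blocked if it directly follows the starter
            # (last_ccc == 0) or every kept mark between them has a
            # strictly lower class (input is canonically ordered).
            if composite is not None and (last_ccc == 0 or last_ccc < ccc):
                out[starter] = composite
                continue
        if ccc == 0:
            starter = len(out)
        last_ccc = ccc
        out.append(cp)
    return out
```

Because 0313 and 0300 share class 230, 0300 can only combine with whatever the starter has become after 0313 was absorbed, which rules out the 1F74 path taken by the original algorithm.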
+
+2010-04-19 Damien Diederen <dd@crosstwine.com>
+
+ * Normalization.cs: Recursively apply the Unicode decomposition mapping.
+
+ According to http://www.unicode.org/reports/tr15/tr15-31.html,
+ section 1.3:
+
+ "To transform a Unicode string into a given Unicode Normalization
+ Form, the first step is to fully decompose the string. [...] Full
+ decomposition involves recursive application of the
+ Decomposition_Mapping values, because in some cases a complex
+ composite character may have a Decomposition_Mapping into a
+ sequence of characters, one of which may also have its own
+ non-trivial Decomposition_Mapping value."
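
A minimal Python sketch of full canonical decomposition, reading Decomposition_Mapping through Python's `unicodedata.decomposition()`. The helper names are hypothetical. Compatibility mappings (tagged `<...>`) are ignored, Hangul syllables (decomposed algorithmically) are not handled, and canonical reordering is a separate step.

```python
import unicodedata
from typing import List


def canonical_mapping(cp: int) -> List[int]:
    """Canonical Decomposition_Mapping of cp, or [] if it has none."""
    field = unicodedata.decomposition(chr(cp))
    if not field or field.startswith("<"):  # '<tag>' = compatibility mapping
        return []
    return [int(h, 16) for h in field.split()]


def full_decompose(cp: int) -> List[int]:
    mapping = canonical_mapping(cp)
    if not mapping:
        return [cp]
    out: List[int] = []
    for part in mapping:
        out.extend(full_decompose(part))  # a part may decompose further
    return out
```

For instance, U+1F92 maps to 1F22 0345, and 1F22 in turn maps to 1F20 0300; fully expanded, this gives 03B7 0313 0300 0345.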
+
2010-02-18 Gabriel Burt <gabriel.burt@gmail.com>
* Normalization.cs: Implement algorithmic Hangul decomposition; Calling