2010-06-04 Damien Diederen <dd@crosstwine.com>

	* create-category-table.cs: Utility to generate reasonably-packed

	This program generates (partially) bi-level tables encoding the
	contents of the Unicode character category database.

	Mono embeds a linear table with category codes for the Unicode BMP
	(first 65536 codepoints), and lacks information about characters
	in the astral planes--leading to requests such as bug 480178.
	Extending the linear table to cover the full codespace is not an
	ideal solution, as that would expand the embedded "blob" by a

	The new tables generated by this program can be used to support
	the full range of characters. An additional level of indirection
	used for characters outside the U+0000..U+FFFF range enables
	"page" sharing, so that the total amount of embedded data only

	Cf. in-file comments for usage instructions.
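
The page-sharing idea in the entry above can be sketched in Python as follows. This is a minimal illustration, not the layout used by create-category-table.cs: it assumes a fixed page size of 256 code points, and the names `build_two_level` and `lookup` are hypothetical. A flat per-codepoint category array is cut into pages, each distinct page is stored once, and a small index records which stored page covers each range.

```python
PAGE_BITS = 8                  # assumed page size: 2**8 = 256 code points
PAGE_SIZE = 1 << PAGE_BITS


def build_two_level(categories, default=0):
    """Split a flat list of category codes into shared pages.

    Returns (index, pages): index[n] is the id of the stored page that
    covers code points n*PAGE_SIZE .. (n+1)*PAGE_SIZE - 1. Identical
    pages are stored only once in `pages`.
    """
    pages = []      # distinct pages, each a tuple of PAGE_SIZE codes
    page_ids = {}   # page contents -> position in `pages`
    index = []
    for start in range(0, len(categories), PAGE_SIZE):
        page = list(categories[start:start + PAGE_SIZE])
        page += [default] * (PAGE_SIZE - len(page))   # pad a short last page
        key = tuple(page)
        if key not in page_ids:
            page_ids[key] = len(pages)
            pages.append(key)
        index.append(page_ids[key])
    return index, pages


def lookup(index, pages, cp):
    """Two array reads: the page id first, then the offset inside that page."""
    return pages[index[cp >> PAGE_BITS]][cp & (PAGE_SIZE - 1)]
```

Because every run of identical pages (for example, a long unassigned stretch) maps to a single stored page, the storage cost grows with the number of distinct pages rather than with the size of the codespace.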

2010-05-17 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : fix extender search index for LastIndexOf().

2010-04-20 Damien Diederen <dd@crosstwine.com>

	* Normalization.cs: Really apply canonical reordering "recursively."

	Before this, a sequence of code points with the combining
	classes (22, 33, 11) would be reordered to (22, 11, 33) instead of
	the correct (11, 22, 33). This is because the 'i--' would be
	directly cancelled by the 'i++' in the for loop.
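
To make the bug above concrete, here is a Python sketch of the canonical ordering step (a stable bubble sort of adjacent non-starters by combining class), next to an emulation of the faulty for loop. It operates on combining classes through a caller-supplied `ccc` function; both function names are hypothetical, and this is not the Normalization.cs code itself.

```python
def canonical_reorder(cps, ccc):
    """Swap adjacent non-starters until their combining classes are non-decreasing."""
    out = list(cps)
    i = 1
    while i < len(out):
        prev_cc, cur_cc = ccc(out[i - 1]), ccc(out[i])
        if cur_cc != 0 and prev_cc > cur_cc:
            out[i - 1], out[i] = out[i], out[i - 1]
            i = max(i - 1, 1)   # step back: the moved item may need to go further left
        else:
            i += 1
    return out


def buggy_reorder(cps, ccc):
    """Emulates `for (i = 1; i < n; i++) { if (swap) i--; }`: the loop's i++
    undoes the i--, so a moved item never travels more than one slot."""
    out = list(cps)
    i = 1
    while i < len(out):
        if ccc(out[i]) != 0 and ccc(out[i - 1]) > ccc(out[i]):
            out[i - 1], out[i] = out[i], out[i - 1]
            i -= 1
        i += 1
    return out
```

With the identity function as `ccc`, `canonical_reorder([22, 33, 11], ...)` yields `[11, 22, 33]`, while `buggy_reorder` stops at `[22, 11, 33]`, the wrong result described in the entry.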

2010-04-20 Damien Diederen <dd@crosstwine.com>

	* Normalization.cs: The correct "checkType" argument to
	Decompose() is NFD or NFKD when normalizing to NFC or NFKC,
	respectively.

	* StringTest.cs: More NFC test cases.

2010-04-20 Damien Diederen <dd@crosstwine.com>

	* Normalization.cs: Implement algorithmic Hangul composition.
	Calling Normalize(NormalizationForm.FormC) on Korean characters
	now works properly (bnc#480152).

	* StringTest.cs: Add test cases for Hangul composition.
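
Algorithmic Hangul composition, and the inverse decomposition, rely on the arithmetic given in the "Conjoining Jamo Behavior" section of the Unicode Standard (chapter 3). The constants below come from that section. The function names are hypothetical, and this Python sketch is not the Normalization.cs code.

```python
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT     # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT     # 11172 precomposed syllables


def decompose_hangul(s):
    """Split a precomposed syllable into L V (T) jamo; other code points pass through."""
    s_index = s - S_BASE
    if not 0 <= s_index < S_COUNT:
        return [s]
    l = L_BASE + s_index // N_COUNT
    v = V_BASE + (s_index % N_COUNT) // T_COUNT
    t = T_BASE + s_index % T_COUNT
    return [l, v] if t == T_BASE else [l, v, t]


def compose_hangul_pair(a, b):
    """Return the composite of a + b (L+V or LV+T), or None if they don't combine."""
    l_index, v_index = a - L_BASE, b - V_BASE
    if 0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT:
        return S_BASE + (l_index * V_COUNT + v_index) * T_COUNT
    s_index, t_index = a - S_BASE, b - T_BASE
    if 0 <= s_index < S_COUNT and s_index % T_COUNT == 0 and 0 < t_index < T_COUNT:
        return a + t_index
    return None


def compose_hangul(cps):
    """Greedily fold jamo into syllables, left to right."""
    out = []
    for c in cps:
        if out:
            composite = compose_hangul_pair(out[-1], c)
            if composite is not None:
                out[-1] = composite
                continue
        out.append(c)
    return out
```

For example, U+D55C decomposes to U+1112 U+1161 U+11AB, and composing those three jamo gives U+D55C back.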

2010-04-20 Damien Diederen <dd@crosstwine.com>

	* Normalization.cs: Follow the spec when checking composition pairs.

	Figure 7 in section 1.3 of http://unicode.org/reports/tr15/ shows
	how, when doing composition, one has to examine the successive
	(starter, candidate) pairs, and combine if a matching canonical

	The original algorithm was, instead, iterating on canonical
	decompositions, and, for each one, trying to match a sequence
	of (starter, non-starter, ...). This, however, does not produce
	the same results, as it violates some implicit ordering
	constraints in the Unicode tables.

	E.g., when composing the following sequence of codepoints, the
	original algorithm was picking:

	and would stop at 1FC2 0313 as there is no decomposition matching
	it. The new algorithm, which follows the guidance of the pretty
	figure 7, ends up doing:

	resulting in the correct 1F92.
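
A Python sketch of the pairwise (starter, candidate) composition loop, modelled on the sample code in UAX #15. It uses a small hand-written excerpt of combining classes and primary composites for the Greek characters involved. The excerpt and the function name are illustrative assumptions; the real tables are generated from UnicodeData.txt. The input sequences in the test are chosen for illustration and do not reproduce the elided example above.

```python
# Hypothetical excerpt of the Unicode data needed for this example.
CCC = {0x0313: 230, 0x0300: 230, 0x0345: 240}      # canonical combining classes
PAIRS = {                                           # (starter, mark) -> primary composite
    (0x03B7, 0x0313): 0x1F20,
    (0x1F20, 0x0300): 0x1F22,
    (0x1F22, 0x0345): 0x1F92,
    (0x03B7, 0x0300): 0x1F74,
    (0x1F74, 0x0345): 0x1FC2,
}


def compose(cps, ccc=CCC, pairs=PAIRS):
    """Canonically compose an already decomposed and reordered sequence."""
    if not cps:
        return []
    out = [cps[0]]
    starter_pos = 0
    last_cc = ccc.get(cps[0], 0)
    if last_cc != 0:
        last_cc = 256          # leading non-starter: nothing may compose with it
    for ch in cps[1:]:
        cc = ccc.get(ch, 0)
        composite = pairs.get((out[starter_pos], ch))
        # ch is unblocked if it directly follows the starter (last_cc == 0)
        # or if the last uncomposed mark has a lower combining class.
        if composite is not None and (last_cc < cc or last_cc == 0):
            out[starter_pos] = composite   # the starter absorbs ch
            continue
        if cc == 0:
            starter_pos = len(out)         # ch becomes the new starter
        last_cc = cc
        out.append(ch)
    return out
```

Starting from the fully decomposed form of U+1F92 (03B7 0313 0300 0345), each mark is tested against the current starter in turn, giving 1F20, then 1F22, then 1F92. Nothing ever tries to match a whole decomposition, so the dead end at 1FC2 0313 cannot occur.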

2010-04-19 Damien Diederen <dd@crosstwine.com>

	* Normalization.cs: Recursively apply the Unicode decomposition mapping.

	According to http://www.unicode.org/reports/tr15/tr15-31.html,

	"To transform a Unicode string into a given Unicode Normalization
	Form, the first step is to fully decompose the string. [...] Full
	decomposition involves recursive application of the
	Decomposition_Mapping values, because in some cases a complex
	composite character may have a Decomposition_Mapping into a
	sequence of characters, one of which may also have its own
	non-trivial Decomposition_Mapping value."
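
The recursion described in the quotation can be sketched in Python as shown below. The three-entry mapping table is a hand-written excerpt for illustration, and `full_decompose` is a hypothetical name. A single, non-recursive lookup of U+1F92 would stop at 1F22 0345, because 1F22 has its own mapping.

```python
# Hypothetical excerpt of canonical Decomposition_Mapping values.
DECOMP = {
    0x1F92: [0x1F22, 0x0345],
    0x1F22: [0x1F20, 0x0300],
    0x1F20: [0x03B7, 0x0313],
}


def full_decompose(cps, decomp=DECOMP):
    """Apply Decomposition_Mapping recursively until no mapping applies."""
    out = []
    for cp in cps:
        mapping = decomp.get(cp)
        if mapping is None:
            out.append(cp)
        else:
            out.extend(full_decompose(mapping, decomp))   # decompose each piece too
    return out
```

Canonical reordering of the combining marks still has to follow full decomposition.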

2010-02-18 Gabriel Burt <gabriel.burt@gmail.com>

	* Normalization.cs: Implement algorithmic Hangul decomposition. Calling
	string.Normalize on Korean characters now works properly (bnc#480152).
	This reduces the number of errors in 'make test' from 27k to 4.8k.

	* StringNormalizationTestSource.cs:
	* Makefile: Use the local, working copy of Normalization etc., so as to make
	modifying Normalization.cs and then testing your changes with 'make test'
	possible. Also, fix building/running of tests, patch by Alexander

2009-09-18 Atsushi Enomoto <atsushi@ximian.com>

	* Normalization.cs : Handle blocked characters which are not
	immediately next to the primary composite character. This fixes
	some Arabic string sequence normalization.
	* Makefile : fix test build.

2009-09-17 Atsushi Enomoto <atsushi@ximian.com>

	* Normalization.cs : some renaming for disambiguation.
	* NormalizationTableUtil.cs : fix some wrong ranges in
	mapIdxToComposite. This fixes some Arabic normalization (and more).
	* normalization-notes.txt : added some notes on the implementation.

2008-06-19 Atsushi Enomoto <atsushi@ximian.com>

	- reverted the previous index calculation change. It was correctly
	  implemented and I rather broke it.
	- fix index calculation on combining.
	- NFKD was incorrectly directed to the combining path. It should not be.
	- Simplify quick check.

2008-06-15 Atsushi Enomoto <atsushi@ximian.com>

	* Normalization.cs : For NFC and NFKC, IsNormalized() was not
	sufficient to check composed characters. That is not possible without
	the actual composition, so just call Normalize() and compare the results.
	In Normalize(), the mapping helper did not pick the correct map index,
	since the index table stores indexes for "uncompressed" numbers.
	* NormalizationTableUtil.cs : updated to the latest UCD.
	* Makefile : to build the tests, the source file must be downloaded too.
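
The fallback described above for NFC and NFKC (normalize, then compare) can be sketched in a few lines. Here Python's `unicodedata.normalize` stands in for Mono's Normalization class; `is_normalized` is a hypothetical helper name. Python 3.8 and later also provide `unicodedata.is_normalized` directly.

```python
import unicodedata


def is_normalized(s, form="NFC"):
    """Round-trip check: s is in `form` exactly when normalizing it changes nothing.

    Slower than a per-character quick check, but it also handles the
    NFC/NFKC case where a decomposed sequence must be recomposed.
    """
    return unicodedata.normalize(form, s) == s
```

For example, "e\u0301" (e + COMBINING ACUTE ACCENT) is in NFD but not in NFC, since NFC recomposes it to "\u00e9".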

2008-11-05 Atsushi Enomoto <atsushi@ximian.com>

	* ucd.cs : Write type for *_count. Add notice to not edit
	unicode-data.h directly.

2008-11-04 Atsushi Enomoto <atsushi@ximian.com>

	* ucd.cs : new code to generate unicode table for eglib.

2008-07-04 Andreas Nahr <ClassDevelopment@A-SoftTech.com>

	* SortKey: Fix parameter names, add attribute, small formatting

2008-06-27 Rodrigo Kumpera <rkumpera@novell.com>

	* CodePointIndexer.cs : Make TableRange a struct instead
	of a class so we save 2 memory ops per ToIndex loop.

2008-04-02 Atsushi Enomoto <atsushi@ximian.com>

	* SortKey.cs : check null arguments. Fixed bug #376171.

2007-07-20 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : I wonder how long its build

2007-03-06 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : disable QuickCheckPossible(), which is
	inaccurate and inefficient. Fixed bug #79714.

2007-02-15 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : character filtering is needed for
	OrdinalIgnoreCase in 2.0 profile. Fixed bug #80865.

2007-01-25 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : GetTailContraction() failed to pick out the
	correct contraction/special sortkey, and thus LastIndexOf() failed
	when one was involved. Fixed bug #80612.

2007-01-22 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : for non-StringSort comparison, level5 (- and ')
	should still be skipped after the initial level5 check is done
	(previously they were simply treated as normal characters).
	Fixed bug #78748.
	* SortKeyBuffer.cs : Fixed NRE in French sort.

2006-12-25 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : added IndexOf() implementation for Ordinal
	and OrdinalIgnoreCase, though the Ordinal version is not used (since
	it is slower than the icall).

2006-05-30 Miguel de Icaza <miguel@novell.com>

	* MSCompatUnicodeTable.cs: Remove the fixed loading and compute it
	just when we actually consume it. This only fixes the

2006-04-14 Atsushi Enomoto <atsushi@ximian.com>

	* README: removed obsolete info.
	* Normalization.cs : canonical reordering should participate in the
	decomposition step. In reordering, string append was incomplete.
	Combining class check is required in NFD check. Icall is written

2005-12-07 Zoltan Varga <vargaz@gmail.com>

	* SimpleCollator.cs: Fix a warning.

2005-11-30 Sebastien Pouliot <sebastien@ximian.com>

	* SimpleCollator.cs: Fix CAS support. The static ctor/var try to get
	the environment variable MUCH too soon (i.e. the security manager

2005-11-29 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : direct fast-path optimization for IndexOf().

2005-11-29 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs :
	- CompareQuick(): added immediateBreakup to avoid extraneous sortkey
	- QuickCheckPossible(): index used for s1 was incorrect.

2005-11-29 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : added another quick check for CompareInternal()
	that does almost ordinal comparison for quick-checkable strings.
	(It also affects Compare(), IndexOf(), IsSuffix() etc.)

2005-11-14 Atsushi Enomoto <atsushi@ximian.com>

	* MSCompatUnicodeTable.cs : (IsIgnorable) \0 is not ignorable.

2005-11-14 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs :
	Created another struct to reduce method arguments. Created another
	set of flags that keeps the "once-matched" state (counterpart of
	checkedFlags, now neverMatchFlags).

2005-11-14 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs :
	- Added CompareOrdinalIgnoreCase() for NET_2_0 RTM.
	- Removed an extra parameter from LastIndexOfSortKey().
	- LastIndexOf() should use GetTailContraction for the source string.
	  Also, the target could match in the middle of a possible
	  "replacement contraction" of the source string, so use
	  LastIndexOfSortKey() to catch those.
	- Fixed GetTailContraction() that caused index out of range.

2005-11-11 Atsushi Enomoto <atsushi@ximian.com>

	* Makefile : Now use MONO_DISABLE_MANAGED_COLLATION.
	* SortKey.cs : made some members virtual.

2005-10-14 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : modified to use stackalloc for byte array.

2005-09-27 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : in CompareInternal(), there was a possibility of
	an infinite loop. Fixed bug #76243.

2005-09-20 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : In IsPrefix/IsSuffix, if target is an empty string,
	immediately return true.

2005-09-09 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : IsSuffix() optimization logic was buggy, so just
	use a pretty simple approach with LastIndexOf() (no significant perf.

2005-09-01 Atsushi Enomoto <atsushi@ximian.com>

	* README, Collation-notes.txt, CollationDataStructures.txt :
	removed obsolete info and added some notes.

2005-08-10 Atsushi Enomoto <atsushi@ximian.com>

	* Normalization.cs : removed code that caused warnings.
	* managed-collation.patch : no longer required.

2005-08-10 Atsushi Enomoto <atsushi@ximian.com>

	* MSCompatUnicodeTable.cs : added IsSortable(string).

2005-08-10 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : Now all collator methods are thread safe.
	All non-readonly instance fields were turned into arguments of
	every method that uses them.
	(Sadly it is the end of the no-memory-cost collator era. mcs bootstrap
	now needs +100KB memory consumption.)

2005-08-09 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : made "checkedFlags" nullable and made it
	an argument of every index method (to make it thread safe).

2005-08-09 Atsushi Enomoto <atsushi@ximian.com>

	MSCompatUnicodeTable.cs :
	- Now IsIgnorable() is aggregated into one invocation that checks
	  completely ignorable characters, nonspacing marks and symbols.
	- Introduced "already checked" flags for IndexOf() and LastIndexOf()
	  to skip the sortkey binary check on the same characters. Significant
	  perf. improvement for such cases as IndexOf("AABCBABC...Z",'Z').
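
The "already checked" idea can be sketched in Python as shown below. This is an assumption-laden illustration: the entry does not describe how Mono stores its flags, so the sketch uses a `set`, and `index_of_char` and `sort_key` are hypothetical names. Each distinct source character that fails to match is remembered, so repeats skip the costly sort-key comparison.

```python
def index_of_char(source, target, sort_key):
    """Return the first i with sort_key(source[i]) == sort_key(target), else -1.

    never_match records characters already compared and found unequal,
    so repeated characters in `source` skip the sort-key comparison.
    """
    target_key = sort_key(target)
    never_match = set()
    for i, ch in enumerate(source):
        if ch in never_match:
            continue
        if sort_key(ch) == target_key:
            return i
        never_match.add(ch)
    return -1
```

On a source like "AABCBABCZ", `sort_key` runs once for the target and once for each distinct character (A, B, C, Z) rather than once per position.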

2005-08-08 Gert Driesen <drieseng@users.sourceforge.net>

	* SortKey.cs: Marked Serializable to match MS.NET.

2005-08-08 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs,
	Makefile : changed resources output directory.

2005-08-04 Atsushi Enomoto <atsushi@ximian.com>

	* create-normalization-tests.cs,
	StringNormalizationTestSource.cs : new files for Unicode
	Normalization test generator.
	* Makefile : added support for above.

2005-08-03 Atsushi Enomoto <atsushi@ximian.com>

	* NormalizationTableUtil.cs : oops, it did not compile.
	* managed-collation.patch : I guess having a managed resource would be
	better for collation. At least the current code has such a #define, so
	the Makefile should be in sync with it.

2005-08-03 Atsushi Enomoto <atsushi@ximian.com>

	* create-normalization-source.cs : Fixed CharMapComparer, which
	incorrectly returned 0 when the second arg is shorter. Reduced
	extraneous helperIndex map. Other minor fixes and code removal.
	* Normalization.cs : several fixes to support blocked combine handling.
	* NormalizationTableUtil.cs : tiny member renaming.

2005-08-03 Atsushi Enomoto <atsushi@ximian.com>

	* create-normalization-source.cs,
	NormalizationTableUtil.cs,
	Normalization.cs : several bugfixes on index miscomputation.
	Renamed using aliases (csc will bork). Primary combine safety is now
	computed during UnicodeData.txt parse.
	Maximum NFKD length was 18, not 4 (U+FDFA).

2005-08-02 Atsushi Enomoto <atsushi@ximian.com>

	* managed-collation.patch : added Normalization support.
	* managed-collation-icall.patch : added, including normalization stuff.

	BTW, when will the collation code be checked in?

2005-08-02 Atsushi Enomoto <atsushi@ximian.com>

	* create-normalization-source.cs : Unified three normalization source
	generators, to compute IsUnsafe flag. Fixed helperIndex array type
	* create-char-mapping-source.cs,
	create-combining-class-source.cs : thus removed.
	* Makefile : thus modified for the above integration.
	* NormalizationTableUtil.cs : Extended to contain IsUnsafe flag.
	* Normalization.cs : Several fixes to make Normalize() actually work.

2005-07-29 Atsushi Enomoto <atsushi@ximian.com>

	* create-normalization-source.cs,
	create-char-mapping-source.cs,
	create-combining-class-source.cs,
	Makefile : converted managed arrays to pointers (like collation stuff).

2005-07-29 Atsushi Enomoto <atsushi@ximian.com>

	* NormalizationTableUtil.cs : further table range optimization.
	* create-normalization-source.cs,
	create-char-mapping-source.cs,
	create-combining-class-source.cs : added C header output support.

2005-07-29 Atsushi Enomoto <atsushi@ximian.com>

	* create-normalization-source.cs, Normalization.cs :
	Now property size is < 256, so directly embed value in "props" array.
	Add QuickCheck(c,checkType) and remove IsNFD/C/KD/KC and delegates.

2005-07-29 Atsushi Enomoto <atsushi@ximian.com>

	* create-combining-class-source.cs,
	create-char-mapping-source.cs,
	create-normalization-source.cs,
	NormalizationTableUtil.cs,
	Normalization.cs : String.Normalize() does not handle surrogate
	characters. Mapping information in DerivedNormalizationProps.txt
	is not used in the code (the mappings from UnicodeData.txt are used).
	Hangul syllables are computed instead of embedded in the tables.
	* managed-collation.patch : removed IntPtrStream and Makefile patches.

2005-07-29 Atsushi Enomoto <atsushi@ximian.com>

	* MSCompatUnicodeTable.cs : IsSortable() was broken.

2005-07-29 Atsushi Enomoto <atsushi@ximian.com>

	* MSCompatUnicodeTable.cs : added helper for CompareInfo.IsSortable().

2005-07-28 Atsushi Enomoto <atsushi@ximian.com>

	* create-tailoring.cfg : added for convenience of contraction check.

2005-07-28 Atsushi Enomoto <atsushi@ximian.com>

	* create-normalization-source.cs,
	create-mscompat-collation-table.cs,
	MSCompatUnicodeTableUtil.cs,
	create-collation-element-table.cs,
	MSCompatUnicodeTable.cs,
	create-combining-class-source.cs : added copyright lines.

2005-07-28 Atsushi Enomoto <atsushi@ximian.com>

	* MSCompatUnicodeTable.cs : removed extraneous definition.

2005-07-28 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs,
	MSCompatUnicodeTable.cs : full C header support, finally.

2005-07-28 Atsushi Enomoto <atsushi@ximian.com>

	NormalizationTableUtil.cs,
	create-char-mapping-source.cs : more aggressive data compression.
	It now ignores characters that are >= U+10000.

2005-07-28 Atsushi Enomoto <atsushi@ximian.com>

	Normalization.template,
	Normalization.cs : renamed existing file.

2005-07-28 Atsushi Enomoto <atsushi@ximian.com>

	* NormalizationTableUtil.cs,
	Normalization.template,
	create-combining-class-source.cs : GetCombiningClass is now
	implemented as an indexer-based array.
	* Makefile : renamed output filename.
	* create-mscompat-collation-table.cs : removed comments that do not
	* create-tailoring.cs : use utf-8 output (and fixed filename).

2005-07-27 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : hacked safer IPA extensions.
	* Collation-notes.txt : status of sortkey table.

2005-07-27 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : some Greek mapping fixes.

2005-07-27 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : diacritical weights were not
	treated correctly when picked from letter names as flags.

2005-07-27 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : fixed culture-dependent
	nonspacing mark weight.

2005-07-27 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : some Hebrew case letter fixes.
	Some diacritical fixes on symbols.

2005-07-27 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : Fixed level 3 weight of
	Arabic presentation forms.

2005-07-27 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : Fixed some diacritical weights
	of Arabic presentation forms.

2005-07-27 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : more status updates. It's almost complete,
	except for sortkey values.

2005-07-27 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : similar optimization also for LastIndexOf().

2005-07-27 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : the previous patch was missing IgnoreNonSpace

2005-07-27 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : reduced extra sortkey value computation in
	MatchesForward(). It makes IndexOf() roughly 30% faster.

2005-07-26 Atsushi Enomoto <atsushi@ximian.com>

	* SortKey.cs : GetHashCode() returns a value based on its byte data.

2005-07-26 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : consider extractions in invariant culture.

2005-07-26 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : (unsafeFlags) be compact ;-)

2005-07-26 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : When the tail of the target fails to match more
	than 3 times, IsSuffix() will never be true (3 is the max
	length of an expansion; \uFB03 -> ffi). It brings a significant
	performance boost when the "source" string is very long.
	* MSCompatUnicodeTable.cs : added MaxExpansionLength constant.
	Reordered code lines.

2005-07-26 Atsushi Enomoto <atsushi@ximian.com>

	* Collation-notes.txt : updated implementation status.

2005-07-26 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : Implemented quick codepoint comparison in
	Compare(). Comparison became 125x faster.
	* mono-tailoring-source.txt : added tiny comment.

2005-07-26 Atsushi Enomoto <atsushi@ximian.com>

	* mono-tailoring-source.txt : Added all single sortkey remapping to
	all cultures (still need to fill contractions and annotate possibly
	buggy mappings with references to CLDR).
	* SimpleCollator.cs : removed unused code.
	* MSCompatUnicodeTable.cs : tiny cast removal.

2005-07-25 Atsushi Enomoto <atsushi@ximian.com>

	create-mscompat-collation-table.cs,
	MSCompatUnicodeTableUtil.cs,
	MSCompatUnicodeTable.cs : Now CJK mapping data is stored as byte
	arrays. Thus SimpleCollator does not need to use bitwise and shift
	operations to get sortkey values, and they could be managed resources.

2005-07-25 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs,
	MSCompatUnicodeTable.cs,
	MSCompatUnicodeTableUtil.cs : From the result of sortkey comparison
	between None and IgnoreWidth, the width compat table could be computed
	in a somewhat simple way. So removed that table and all related code.
	Increased the collation resource version.

2005-07-25 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : Added C header output support.

2005-07-25 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : FillLetterNFKD() could also be
	applied to Cyrillic letters. Saved some of them.

2005-07-24 Atsushi Enomoto <atsushi@ximian.com>

	* MSCompatUnicodeTable.cs : oh, ok, so we already have
	GetManifestResourceInternal() ;-)
	* managed-collation.patch : in Assembly.cs made that method internal.

2005-07-24 Atsushi Enomoto <atsushi@ximian.com>

	* MSCompatUnicodeTable.cs : the pointer-based icall code is
	also applicable to USE_MANAGED_RESOURCE mode.

2005-07-23 Atsushi Enomoto <atsushi@ximian.com>

	* MSCompatUnicodeTable.cs : added icall support code (not enabled
	unless the first line is commented out).

2005-07-22 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs,
	MSCompatUnicodeTableUtil.cs,
	MSCompatUnicodeTable.cs : Added resource version output (and ignore
	resources in case of version mismatch). Removed obsolete, commented out code.

2005-07-22 Atsushi Enomoto <atsushi@ximian.com>

	MSCompatUnicodeTable.cs,
	create-mscompat-collation-table.cs : Now they use unmanaged pointers
	instead of managed arrays.
	* managed-collation.patch : Now it contains patches for IntPtrStream.cs
	and Assembly.cs as well.

2005-07-22 Atsushi Enomoto <atsushi@ximian.com>

	* MSCompatUnicodeTable.cs,
	SimpleCollator.cs : Moved tailoring support classes to
	MSCompatUnicodeTable.cs, out of SimpleCollator.
	Now that cjk and tailoring support are filled inside
	MSCompatUnicodeTable, no managed array is exposed.

2005-07-22 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs,
	MSCompatUnicodeTable.cs : Now it's not exposing collation table
	internals as managed arrays (to switch to unmanaged pointers).

2005-07-22 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : tiny nonspacing mark fix.

2005-07-21 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : Fixed most of Greek mappings.
	* MSCompatUnicodeTable.cs : don't lock string.

2005-07-21 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : More Cyrillic diacritical fixes.

2005-07-21 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : More Latin diacritical fixes.

2005-07-21 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : There were still missing
	math symbol mappings. Added several hacky diacritical weights for

2005-07-21 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : fixed a few diacritical weights
	on Cyrillic characters. Fixed ParseTailoringSource() to handle
	non-heading escape sequences (\uXXXX) as expected.

2005-07-21 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs,
	MSCompatUnicodeTableUtil.cs,
	MSCompatUnicodeTable.cs : added more aggressive index limits to
	optimize table data size, at the cost of speed.

2005-07-20 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : fixed Arabic tertiary weight.

2005-07-20 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : Mappings for hyphens and
	punctuation are kinda finished. Rewrote batch mapping method to
	collect all NFKD. Required modification on mapping is done.

2005-07-20 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : minor mapping fixes on accent
	marks and punctuation.

2005-07-20 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : Fixed some MathSymbol mapping
	and Box drawing mapping.

2005-07-19 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : Fixed almost all numbers.

2005-07-19 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : Symbol mappings are almost done.
	Removed hack that gave dummy mappings to blank symbols.

2005-07-19 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : more fixes on arrows. Fixes on
	box drawings. Some code refactoring to eliminate hacks.

2005-07-19 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : Fixed some secondary weights
	in Devanagari and arrows.

2005-07-19 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : a set of tiny mapping fixes.

2005-07-19 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : some diacritical fixes for
	Latin. Added batch mapping method that considers computed
	diacritical weight (for numbers).

2005-07-15 Atsushi Enomoto <atsushi@ximian.com>

	* managed-collation.patch : forgot to add System.String patch.

2005-07-15 Atsushi Enomoto <atsushi@ximian.com>

	* MSCompatUnicodeTable.cs : added resource existence check (required
	for mscorlib transient time from the one without resources to the

2005-07-15 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : fixed punctuation and hyphen
	(shift) primary weight.

2005-07-15 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : more nonspacing mark fixes.
	Some non-basic Cyrillic diacritical weight fixes.

2005-07-15 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : some Gurmukhi fixes on level 1
	and level 3. Tiny Hangul weight fixes.
	* MSCompatUnicodeTable.cs : U+30F5 and U+30F6 are small Japanese.

2005-07-15 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : some normal characters that have
	a "narrow" NFKD mapping were regarded as "wide", and thus level 3 weight
	values were different. Handle U+30FB as category A.
	* MSCompatUnicodeTable.cs : U+30FB does not have special weight.

2005-07-15 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : more diacritical weight fixes.
	Removed some unused code.

2005-07-15 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : Fixed some Thai and Arabic

2005-07-15 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : Fixed Syriac nonspacing marks.

2005-07-15 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : Fixed nonspacing marks in
	Malayalam, Thai and Lao. Removed extraneous hack.

2005-07-15 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : rewrote LastIndexOf() to handle source extenders.
	Some refactoring on IndexOf() code. Removed unused Matches().
	* Collation-notes.txt : some methods needed to be reimplemented, so
	rewrote the description.

2005-07-14 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : rewrote IsSuffix() to use CompareInternal().
	Thus supported extenders in IsSuffix().

2005-07-14 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : more IsSuffix() simplification, but it will be
	stopped here since it cannot handle extenders (implementing new

2005-07-14 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : simplified IsSuffix() code.

2005-07-14 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : Fixed IndexOf() and LastIndexOf() to search the
	entire replacement string if char target was an expansion.
	IsSuffix() was using a method for IsPrefix(), which was incorrect.
	Removed old IsPrefix() code.

2005-07-14 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : IndexOf() was incorrectly sharing the same
	byte[] field in different areas of code. Now extenders in both
	source and target really work in IndexOf().

2005-07-14 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : fixed U+FF9F diacritical weight.
	* SimpleCollator.cs : handle U+FF9E and U+FF9F as extenders.

2005-07-14 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : Now FilterExtender() handles all extender
	support. IndexOf() and LastIndexOf() now support extenders.
	IndexOf() and LastIndexOf() did not advance by the contraction source
	length as expected. Tiny refactoring on private IsPrefix() to take

2005-07-13 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : when restoring from expansion, go back to the
	top of the loop (to avoid index out of range).
	Now IsPrefix() is implemented to reuse Compare() and thus it now
	supports extenders as well.
	* Collation-notes.txt : status update. Deleted optimization part in
	status section (it is duplicate).

2005-07-13 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : some code reordering.
	* create-mscompat-collation-table.cs : it was still missing U+3094.

2005-07-13 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : Compare() now supports extenders (e.g. U+39FC).

2005-07-13 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : In GetSortKey(), don't update previousChar when
	it is not primary (e.g. don't "extend" diacritical mark).

2005-07-13 Atsushi Enomoto <atsushi@ximian.com>

	* managed-collation.patch : CompareInfo.Compare() should consider
	the possibility that a non-empty string might actually be empty
	in culture-sensitive context.

2005-07-13 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : IndexOf() and LastIndexOf() return start when
	target is "empty" (in culture-sensitive context).

2005-07-13 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : In IndexOf() and LastIndexOf(), skip ignorable
	characters in target string.

2005-07-13 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : When IgnoreWidth is specified, all Kana
	characters are regarded as half-width.
	Even though IgnoreWidth is specified, it should not ignore case.
	For special weight comparison, the default values (E4) are bigger
	than non-default values.
	* SortKeyBuffer.cs : It should save LCID and original string.
	* create-mscompat-collation-table.cs : Japanese half-width kana
	should not be counted in the widthCompat map, since IgnoreWidth does
	not really ignore those differences.
886 2005-07-13 Atsushi Enomoto <atsushi@ximian.com>
888 * create-mscompat-collation-table.cs : Fixed missing Japanese bits.
890 2005-07-13 Atsushi Enomoto <atsushi@ximian.com>
892 * create-mscompat-collation-table.cs :
893 tiny diacritical weight fix for U+20D0-U+20E1.
895 2005-07-13 Atsushi Enomoto <atsushi@ximian.com>
897 * create-mscompat-collation-table.cs : ja CJK ideograph got completed.
899 2005-07-13 Atsushi Enomoto <atsushi@ximian.com>
901 * create-mscompat-collation-table.cs : Fixed CJK custom Japanese
902 mapping. It (maybe as well as other CJK tables) mixes NFKD. For
903 Japanese, modified NFKD table (because of Windows lame design).
905 2005-07-13 Atsushi Enomoto <atsushi@ximian.com>
907 * Makefile : added MONO_USE_MANAGED_COLLATION=no almost everywhere.
908 * MSCompatUnicodeTable.cs : FillCJK() was not invoked. Now it is
909 invoked whenever it is required.
910 * SimpleCollator.cs : call FillCJK() above in .ctor().
911 * MSCompatUnicodeTableUtil.cs : CJK range was wider.
912 * create-mscompat-collation-table.cs : CJK binary was missing the
913 length. CJK remapping is being moved to ModifyUnidata().
914 For cjk-ja mapping, we have to consider compat characters to be
915 added to the map, besides the raw UCA table.
917 2005-07-12 Atsushi Enomoto <atsushi@ximian.com>
919 * SortKeyBuffer.cs : Fixed shift level computation to match w/ Windows.
921 2005-07-12 Atsushi Enomoto <atsushi@ximian.com>
923 * SimpleCollator.cs : fixed LastIndexOf() to handle _target's_
924 contraction as expected. Fixed Compare() to save s2's contraction
926 * TestDriver.cs : added LastIndexOf() tester w/ indexes.
928 2005-07-12 Atsushi Enomoto <atsushi@ximian.com>
930 * managed-collation.patch : Fixed IsPrefix() and IsSuffix(). They
931 incorrectly used Compare().
932 * TestDriver.cs : more tests moved to NUnit tests.
934 2005-07-12 Atsushi Enomoto <atsushi@ximian.com>
936 * SimpleCollator.cs : several fixes on Compare().
937 - Ignorable characters are skipped at the top of the loop.
938 - IgnoreNonSpace is checked to avoid extraneous level 2 comparison.
939 - When the s1 index was advanced while s2's contraction was being
940 replaced, s1 advanced inconsistently (bug).
941 - IsIgnorable() now also checks IgnoreNonSpace.
942 - Fixed FilterOptions(), which did not work for IgnoreWidth at all.
943 * TestDriver.cs : now some tests are moved to NUnit tests.
944 * Collation-notes.txt : minor todo update.
946 2005-07-11 Atsushi Enomoto <atsushi@ximian.com>
948 * SimpleCollator.cs : Compare() was ignoring the case where both
949 strings, compared in their entirety, contain '-'.
950 * Collation-notes.txt : more status updates.
951 * TestDriver.cs : added '-' use cases.
953 2005-07-08 Atsushi Enomoto <atsushi@ximian.com>
955 * SimpleCollator.cs : to be consistent with the other buggy parts, it
956 now handles U+3005, U+3031 and U+3032 as buggily as Windows does. It just repeats
958 Fixed GetSortKey(): if the repeater is U+3005, second weight is 5.
959 * create-mscompat-collation-table.cs : dummy values for extenders.
961 2005-07-08 Atsushi Enomoto <atsushi@ximian.com>
963 * SimpleCollator.cs : Special weight fixes on GetSortKey(). Dash type
964 should be computed from ExtenderType, and voice mark weight should
966 * MSCompatUnicodeTable.cs : added tiny comment.
968 2005-07-08 Atsushi Enomoto <atsushi@ximian.com>
970 * SortKey.cs : It broke when MONO_USE_MANAGED_COLLATION was not "yes".
971 * SimpleCollator.cs : support for extender (U+309D etc.).
973 2005-07-08 Atsushi Enomoto <atsushi@ximian.com>
975 * create-mscompat-collation-table.cs : some punct/symbols fix.
976 * managed-collation.patch : new (and temporary) file to support
977 managed collation in mscorlib.
978 * README : described how to use managed collation.
980 2005-07-08 Atsushi Enomoto <atsushi@ximian.com>
982 * create-mscompat-collation-table.cs : Further Cyrillic fixes. Handle
983 U+0482-U+04C8 (though it still needs diacritical fixes).
984 * MSCompatUnicodeTable.cs : tiny comment for alternative impl.
986 2005-07-08 Atsushi Enomoto <atsushi@ximian.com>
988 * create-mscompat-collation-table.cs : Reimplemented Cyrillic weight
989 computation code, since Cyrillic letters appear to be weighted the same
990 way as Latin letters. Thus removed all other approaches (UCA, by letter name).
992 2005-07-07 Atsushi Enomoto <atsushi@ximian.com>
994 * create-mscompat-collation-table.cs : diacritical fix for "double-
995 struck". Syriac nonspacing fixes.
997 2005-07-07 Atsushi Enomoto <atsushi@ximian.com>
999 * create-mscompat-collation-table.cs : more math symbol weight fixes.
1001 2005-07-07 Atsushi Enomoto <atsushi@ximian.com>
1003 * create-mscompat-collation-table.cs : fixed Hebrew character sortkeys.
1005 2005-07-07 Atsushi Enomoto <atsushi@ximian.com>
1007 * create-mscompat-collation-table.cs : math symbols U+25A0-U+2600 are
1008 implemented (no stub). Some other fixes on category 8-A.
1010 2005-07-07 Atsushi Enomoto <atsushi@ximian.com>
1012 * create-mscompat-collation-table.cs : some minor fixes on Arabic,
1013 Korean and Japanese sortkey weights.
1015 2005-07-07 Atsushi Enomoto <atsushi@ximian.com>
1017 * create-mscompat-collation-table.cs : More diacritical fixes.
1018 Georgian characters do not have level 2 weights but level 3.
1020 2005-07-07 Atsushi Enomoto <atsushi@ximian.com>
1022 * create-mscompat-collation-table.cs : Roman numeral characters
1023 have diacritical weight. quick hack for control signs (U+2400..)
1026 2005-07-06 Atsushi Enomoto <atsushi@ximian.com>
1028 * create-mscompat-collation-table.cs : improving Latin mappings.
1029 Setting non-ASCII Latin characters' primary weight between those
1030 of ASCII characters, and setting diacritical weight (hacky).
1031 * MSCompatUnicodeTable.cs :
1032 Kanatype check: fixed (voice marks) and improved (comparison order).
1034 2005-07-06 Atsushi Enomoto <atsushi@ximian.com>
1036 * create-mscompat-collation-table.cs : more diacritical fixes.
1037 primary weight fixes on punctuation in category 07.
1039 2005-07-06 Atsushi Enomoto <atsushi@ximian.com>
1041 * create-mscompat-collation-table.cs : several diacritical fixes.
1042 * TestDriver.cs : sortkey dumper should use StringSort.
1044 2005-07-05 Atsushi Enomoto <atsushi@ximian.com>
1046 * SimpleCollator.cs : fixed incorrect indexer setup. Optimized
1047 GetContraction() call a bit.
1049 2005-07-05 Atsushi Enomoto <atsushi@ximian.com>
1051 * create-mscompat-collation-table.cs : fixed incorrect level 2
1053 * MSCompatUnicodeTable.cs : remove debug line.
1055 2005-07-05 Atsushi Enomoto <atsushi@ximian.com>
1057 * MSCompatUnicodeTableUtil.cs,
1058 MSCompatUnicodeTable.cs,
1059 CodePointIndexer.cs,
1060 create-mscompat-collation-table.cs : made some members internal and
1061 accessible from other classes. Many indexes could be 0 by default.
1062 * SimpleCollator.cs : optimizations: avoid a method call.
1064 2005-07-05 Atsushi Enomoto <atsushi@ximian.com>
1066 * Collation-notes.txt : more updates.
1067 * SimpleCollator.cs : Added quick check for Ordinal comparison.
1068 Fixed special weight comparison. It is not customizable in this
1069 implementation (which should not be harmful).
1070 * mono-tailoring-source.txt : thus updated comment.
1072 2005-07-05 Atsushi Enomoto <atsushi@ximian.com>
1074 * SimpleCollator.cs : Compare() was missing French sort support.
1075 * TestDriver.cs : added example case.
1077 2005-07-05 Atsushi Enomoto <atsushi@ximian.com>
1079 * Collation-notes.txt : updated status. Eliminated descriptions on
1080 "iterator" (I avoided it for performance concern). Fixed misc.
1081 incorrect descriptions.
1083 2005-07-05 Atsushi Enomoto <atsushi@ximian.com>
1085 * Collator.cs : Now that SimpleCollator became feature complete, it is
1088 2005-07-05 Atsushi Enomoto <atsushi@ximian.com>
1090 * SimpleCollator.cs : implemented decent Compare() that immediately
1091 stops at first primary difference.
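A Compare() that "immediately stops at first primary difference" makes a single pass: a primary difference decides the result at once, while secondary and tertiary differences are only remembered in case no primary difference turns up. The sketch below illustrates that control flow with toy weights derived from NFD decomposition; the weights() function is a hypothetical stand-in and has nothing to do with the real MSCompat tables.

```python
import unicodedata


def weights(ch: str) -> tuple[int, int, int]:
    """Toy (primary, secondary, tertiary) weights, NOT the MSCompat table.

    primary   = the lowercased base character
    secondary = number of combining marks after canonical decomposition
    tertiary  = 1 for an uppercase base character, else 0
    """
    decomposed = unicodedata.normalize("NFD", ch)
    base = decomposed[0]
    return ord(base.lower()[0]), len(decomposed) - 1, int(base.isupper())


def compare(s1: str, s2: str) -> int:
    pending = 0        # result implied by the strongest lower-level difference
    pending_level = 3  # 1 = secondary, 2 = tertiary, 3 = none seen yet
    for c1, c2 in zip(s1, s2):
        a, b = weights(c1), weights(c2)
        if a[0] != b[0]:
            # First primary difference decides immediately.
            return -1 if a[0] < b[0] else 1
        for level in (1, 2):
            if a[level] != b[level]:
                # Keep the first difference unless a stronger level appears later.
                if level < pending_level:
                    pending_level = level
                    pending = -1 if a[level] < b[level] else 1
                break
    if len(s1) != len(s2):
        # Every toy character has a primary weight, so extra length is primary.
        return -1 if len(s1) < len(s2) else 1
    return pending
```

With these toy weights, compare("\u00e9a", "eb") returns -1: the later primary difference (a < b) outranks the earlier accent difference.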
1093 2005-07-04 Atsushi Enomoto <atsushi@ximian.com>
1095 * SimpleCollator.cs : indexers might return -1.
1097 2005-07-04 Atsushi Enomoto <atsushi@ximian.com>
1099 * SimpleCollator.cs : IsPrefix() and IsSuffix() optimization code was
1100 buggy (length check for source was missing).
1102 2005-07-04 Atsushi Enomoto <atsushi@ximian.com>
1104 * create-mscompat-collation-table.cs : Fixed tailoring table output
1105 to be in correct and countable order. If a tailoring alias is not
1106 found, the build now just stops.
1107 * MSCompatUnicodeTable.cs : several build fixes. Now it works to read
1109 * mono-tailoring-source.txt : commented out CJK aliases that miss
1111 * Makefile : needed further filename fixes.
1113 2005-07-04 Atsushi Enomoto <atsushi@ximian.com>
1115 * MSCompatUnicodeTable.cs : renamed from MSCompatUnicodeTable.template
1116 (now it is working as a standalone file).
1117 * Makefile : renamed generated file as MSCompatUnicodeTableGenerated.cs
1118 (the generator now creates both binary resources and C# source).
1120 2005-07-04 Atsushi Enomoto <atsushi@ximian.com>
1122 * create-mscompat-collation-table.cs : Now it generates binary
1123 resources (to parent directory).
1124 * MSCompatUnicodeTable.template : added conditional code that fills
1125 collation tables from manifest resources.
1126 * Makefile : remove collation table binaries as well on "make clean".
1127 Removed extraneous dependency.
1129 2005-07-01 Atsushi Enomoto <atsushi@ximian.com>
1131 * MSCompatUnicodeTable.template,
1132 SimpleCollator.cs : removed extraneous GetExpansion().
1134 2005-07-01 Atsushi Enomoto <atsushi@ximian.com>
1136 * SimpleCollator.cs : IsSuffix() also supports contractions.
1137 * TestDriver.cs : IsSuffix() example contraction cases.
1139 2005-07-01 Atsushi Enomoto <atsushi@ximian.com>
1141 * SimpleCollator.cs : reverted IsSuffix() to return bool (to match w/
1142 what current IsPrefix() does). For expansion of target, IsPrefix()
1143 should check the no-match case that expansion is longer than input.
1144 Some refactoring of IsPrefix().
1145 Added GetContractionTal() for IsSuffix() (not used yet).
1147 2005-07-01 Atsushi Enomoto <atsushi@ximian.com>
1149 * TestDriver.cs : added IsPrefix() expansion cases.
1150 * SimpleCollator.cs : IsPrefix() now supports contractions (with much
1151 complexity), and it now returns bool again.
1152 IndexOf() for replacement should make use of IndexOfPrimitiveChar()
1153 since expansions won't be expanded recursively.
1155 2005-07-01 Atsushi Enomoto <atsushi@ximian.com>
1157 * SimpleCollator.cs : commonized character comparison in IsPrefix()
1158 and IsSuffix(). csc compile fix.
1159 * CompareInfoImpl.cs : deleted.
1161 2005-06-30 Atsushi Enomoto <atsushi@ximian.com>
1163 * TestDriver.cs : added SimpleCollator.ctor() sanity check.
1164 Added replacement contraction example.
1165 * SimpleCollator.cs : Now IndexOf() and LastIndexOf() support
1166 contraction in source string. Extracted matching code to Matches().
1167 Replacement contraction was including extraneous '\x0'.
1169 2005-06-30 Atsushi Enomoto <atsushi@ximian.com>
1171 * Collation-notes.txt : updated status.
1172 * CollationDataStructures.txt : tiny fixes.
1173 * SimpleCollator.cs :
1174 Renamed alias Util to UUtil (MS sys.enterprisesvc has sucky global
1175 namespace Util and csc borked).
1176 GetContraction was incorrectly returning first item.
1177 Private IsPrefix() now returns int (but it might not be in real use).
1178 Extracted simple char comparison to CompareCharSimple().
1179 IndexOf() and LastIndexOf() now fully handle contractions (both
1180 binary key and string replacement) in "target" (for "s" not yet).
1181 * TestDriver.cs : be more verbose.
1182 * mono-tailoring-source.txt : added comment.
1183 * MSCompatUnicodeTable.template :
1184 Renamed alias Util to UUtil (MS sys.enterprisesvc has sucky global
1186 2005-06-30 Atsushi Enomoto <atsushi@ximian.com>
1188 * create-mscompat-collation-table.cs : compute COMBINING blah marks as
1189 well as those characters WITH blah.
1190 * TestDriver.cs : added combining sortkey cases.
1192 2005-06-30 Atsushi Enomoto <atsushi@ximian.com>
1194 * mono-tailoring-source.txt : fixed description on '*' in sortkeys.
1195 * SimpleCollator.cs : Now it fully uses tailoring info. Fixed
1196 contraction search that worked only when the string was a contraction.
1197 Removed commented code. Minor refactoring.
1198 * TestDriver.cs : added example that uses "ZS" in Hungarian sorting.
1200 2005-06-29 Atsushi Enomoto <atsushi@ximian.com>
1202 * create-mscompat-collation-table.cs,
1203 * mono-tailoring-source.txt : removed extraneous level 4 sortkey
1204 which cannot be supported.
1205 * SimpleCollator.cs : added GetContraction() and used in some places.
1206 Now CompareOptions is set only once. Reordered some code (e.g.
1207 ignorable check -> get compat char -> compare).
1209 2005-06-29 Atsushi Enomoto <atsushi@ximian.com>
1211 * SimpleCollator.cs : sort tailoring tables before actual usage.
1212 Support diacritical remappings (a customized collation rule
1213 that does not exist in UCA).
1215 2005-06-29 Atsushi Enomoto <atsushi@ximian.com>
1217 * SimpleCollator.cs : build culture specific tailoring table from
1218 TailoringInfo and unified data array.
1219 * create-mscompat-collation-table.cs : Added null termination to
1220 sortkey map tailorings (mostly to save my eyes).
1221 * MSCompatUnicodeTable.template : added public TailoringValues.
1223 2005-06-29 Atsushi Enomoto <atsushi@ximian.com>
1225 * SortKeyBuffer.cs : handle special weight (category 06) characters.
1226 * Collation-notes.txt : Updated description on special weight (it was
1228 * TestDriver.cs : added special weight cases.
1230 2005-06-29 Atsushi Enomoto <atsushi@ximian.com>
1232 * MSCompatUnicodeTable.template : added GetTailoringInfo().
1233 * SimpleCollator.cs : Now tailoring information is acquired and used.
1234 (FrenchSort is supported but Compare() won't work as expected since
1235 the table is still incomplete for those diacritical marks).
1236 * SortKeyBuffer.cs : On reversing diacritical weights, it should
1237 ignore zeros. Reset() should reset frenchSorted flag.
1239 2005-06-28 Atsushi Enomoto <atsushi@ximian.com>
1241 * create-mscompat-collation-table.cs : Further fixes on Jamo,
1242 diacritical weights by character name, and *Numbers primary weights.
1244 2005-06-28 Atsushi Enomoto <atsushi@ximian.com>
1246 * create-mscompat-collation-table.cs : More fixes on Devanagari,
1247 Gujarati, Oriya, Tamil and Lao sortkeys.
1249 2005-06-28 Atsushi Enomoto <atsushi@ximian.com>
1251 * create-mscompat-collation-table.cs : Fixed Georgian, Thai, Gurmukhi
1254 2005-06-28 Atsushi Enomoto <atsushi@ximian.com>
1256 * create-mscompat-collation-table.cs : Fixed Thai character primary
1257 and secondary values. Fixed Thaana letters. Added more LAMESPEC
1258 CJK compat. Fixed some circled CJK secondary weight.
1259 Hacked some nonspacing mark sortkey value adjustment.
1261 2005-06-28 Atsushi Enomoto <atsushi@ximian.com>
1263 * create-mscompat-collation-table.cs : CP932.TXT was not parsed as
1264 expected. JIS ordering was incorrect. The offset for OtherNumbers that
1265 represent values of 10 or more was computed incorrectly. Some Hangul
1266 compat characters have a different offset.
1268 2005-06-28 Atsushi Enomoto <atsushi@ximian.com>
1270 * create-mscompat-collation-table.cs : Fixed 0x8 category characters.
1271 Added hack for need-to-be-fixed characters to fall into 0xA category.
1272 * create-collation-element-table.cs : the previous checkin seems to have failed :(
1273 * README: updated a bit.
1275 2005-06-24 Atsushi Enomoto <atsushi@ximian.com>
1277 * CodePointIndexer.cs :
1278 removed extraneous switch (I could use empty array for that need).
1279 * CollationElementTableUtil.cs : primary weight type became ushort.
1280 * create-collation-element-table.cs : several bugfixes.
1281 collElem should be int. It was skipping most of entries because of
1282 incorrect string tokenization.
1284 2005-06-23 Atsushi Enomoto <atsushi@ximian.com>
1286 * create-mscompat-collation-table.cs : handle some Jamo NKFD.
1288 2005-06-23 Atsushi Enomoto <atsushi@ximian.com>
1290 * SimpleCollator.cs : forgot to commit in the last checkin.
1291 * create-mscompat-collation-table.cs : fixed arabic shift weight chars.
1292 * TestDriver.cs : switch table dumper and collator testing.
1293 * SortKey.cs : for now comment out internal indexes (not in use).
1295 2005-06-23 Atsushi Enomoto <atsushi@ximian.com>
1297 * MSCompatUnicodeTable.template,
1298 SimpleCollator.cs : support for culture dependent CJK table.
1300 2005-06-23 Atsushi Enomoto <atsushi@ximian.com>
1302 * create-mscompat-collation-table.cs,
1303 MSCompatUnicodeTableUtil.cs : make CJK table more compact.
1305 2005-06-22 Atsushi Enomoto <atsushi@ximian.com>
1307 * SimpleCollator.cs : Fixed stupid index search when start != 0.
1309 2005-06-21 Atsushi Enomoto <atsushi@ximian.com>
1311 * SimpleCollator.cs : fixed my misunderstanding on LastIndexOf(). It
1312 now starts from "start" and proceeds backward by "length".
1313 * TestDriver.cs : fix warning.
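The corrected LastIndexOf() semantics (start at "start", proceed backward by "length") mean the search window is source[start - length + 1 .. start], and a match must lie entirely inside it. The sketch below shows that window arithmetic with an ordinal comparison; the function name and the error handling are assumptions made for illustration, and returning start for an empty target simply follows the 2005-07-13 entry about "empty" targets.

```python
def last_index_of(source: str, target: str, start: int, length: int) -> int:
    """Backward search over the window source[start - length + 1 : start + 1].

    The search begins at `start` and walks backward over `length` characters;
    a match must lie entirely inside that window. Comparison is ordinal here.
    """
    if not 0 <= start < len(source):
        raise ValueError("start out of range")
    if not 0 <= length <= start + 1:
        raise ValueError("length out of range")
    if not target:
        return start
    window_begin = start - length + 1
    # The last possible match begins len(target) - 1 characters before start.
    for i in range(start - len(target) + 1, window_begin - 1, -1):
        if source[i:i + len(target)] == target:
            return i
    return -1
```

For example, last_index_of("abcabc", "bc", 3, 4) looks only at "abca" and returns 1, while a window of length 1 can never contain the two-character target and yields -1.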
1315 2005-06-21 Atsushi Enomoto <atsushi@ximian.com>
1317 * TestDriver.cs : more tests.
1318 * SimpleCollator.cs : LastIndexOf() was not setting the search length
1319 on each iteration. Quick workaround for a (possible) String.LastIndexOf() bug.
1321 2005-06-21 Atsushi Enomoto <atsushi@ximian.com>
1323 * create-normalization-source.cs : output propValue as uint.
1325 2005-06-21 Atsushi Enomoto <atsushi@ximian.com>
1327 * SortKey.cs : Now it is System.Globalization.SortKey.
1328 To replace existing implementation, it now requires lcid and
1329 CompareOptions. Added required members.
1330 * SortKeyBuffer.cs : thus .ctor() requires LCID.
1331 * SimpleCollator.cs : made required changes above.
1333 2005-06-21 Atsushi Enomoto <atsushi@ximian.com>
1335 * CodePointIndexer.cs : added CompressArray(). Now it requires two more
1336 parameters for default index and codepoint.
1337 * CollationElementTableUtil.cs,
1338 NormalizationTableUtil.cs : required changes wrt above change.
1339 * MSCompatUnicodeTableUtil.cs : added for several codepoint indexers.
1340 * MSCompatUnicodeTable.template : Now it uses codepoint indexer.
1341 * create-mscompat-collation-table.cs : Now it outputs compressed array.
1342 * Makefile : now collation requires MSCompatUnicodeTableUtil.cs
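The idea behind CompressArray() with a default index and a default codepoint can be illustrated with a range-based indexer: only code points inside a few ranges get slots in a packed array, and everything else falls back to the defaults. Everything in this sketch (class name, method names, the [start, end) range convention) is hypothetical; the real CodePointIndexer's internals may differ.

```python
class RangeIndexer:
    """Hypothetical sketch of a range-based code point indexer.

    Code points inside the given [start, end) ranges are packed back to back
    into a compact array; anything outside maps to default_index, and
    indexes outside the packed array map back to default_code_point.
    """

    def __init__(self, starts, ends, default_index, default_code_point):
        if len(starts) != len(ends):
            raise ValueError("starts and ends must have the same length")
        self.ranges = list(zip(starts, ends))
        self.default_index = default_index
        self.default_code_point = default_code_point
        # Offset of each range inside the packed array.
        self.offsets = []
        total = 0
        for start, end in self.ranges:
            self.offsets.append(total)
            total += end - start
        self.total = total

    def to_index(self, cp):
        for (start, end), offset in zip(self.ranges, self.offsets):
            if start <= cp < end:
                return offset + (cp - start)
        return self.default_index

    def to_code_point(self, index):
        for (start, end), offset in zip(self.ranges, self.offsets):
            if offset <= index < offset + (end - start):
                return start + (index - offset)
        return self.default_code_point

    def compress_array(self, full):
        """Keep only the entries of a full-size table that fall in the ranges."""
        return [full[cp] for start, end in self.ranges for cp in range(start, end)]
```

With ranges U+0000..U+007F and U+3000..U+30FF, a full table indexed by code point shrinks to 0x180 entries, and U+3005 lands at packed index 0x85.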
1344 2005-06-21 Atsushi Enomoto <atsushi@ximian.com>
1346 * SimpleCollator.cs :
1347 Implemented IsSuffix() and LastIndexOf().
1348 Several fixes on index > 0 cases.
1349 * TestDriver.cs : sample IsSuffix() and LastIndexOf() usage and more.
1351 2005-06-21 Atsushi Enomoto <atsushi@ximian.com>
1353 * Collation-notes.txt : updated (status, impl. classes).
1354 * MSCompatUnicodeTable.cs : Korean Jamo are not really expansions.
1356 2005-06-21 Atsushi Enomoto <atsushi@ximian.com>
1358 * SimpleCollator.cs : implemented IndexOf(string,string,CompareOptions)
1359 and IsPrefix(). Tiny code refactoring.
1360 * TestDriver.cs : sample IsPrefix() and IndexOf() usage.
1361 * MSCompatUnicodeTable.cs : tiny refactoring for CodePointIndexer use.
1363 2005-06-20 Atsushi Enomoto <atsushi@ximian.com>
1365 * SimpleCollator.cs :
1366 IndexOf(string, char, CompareOptions) implementation.
1367 * TestDriver.cs : sample IndexOf() usage.
1369 2005-06-20 Atsushi Enomoto <atsushi@ximian.com>
1371 * create-mscompat-collation-table.cs : was missing the most important
1372 kind of block - equivalent expansions (e.g. invariant mappings).
1373 More readable mappings.
1375 2005-06-20 Atsushi Enomoto <atsushi@ximian.com>
1377 * mono-tailoring-source.txt : new file. It describes tailoring
1378 information. Basically examined under .NET 1.x.
1379 * create-mscompat-collation-table.cs : consume the file above.
1380 * MSCompatUnicodeTable.template : now tailorings is not a stub.
1381 * CollationDataStructures.txt : minor fixes.
1383 SimpleCollator.cs : added FrenchSort support.
1384 * Collation-notes.txt : added description on Latin primary weights.
1385 * ldml-limited.rng : added note.
1386 * create-tailorings.cs : added note. more serialization (but won't be
1389 2005-06-17 Atsushi Enomoto <atsushi@ximian.com>
1391 * SortKeyBuffer.cs : non-primary character is added to previous
1393 * TestDriver.cs : added example case of above.
1395 2005-06-17 Atsushi Enomoto <atsushi@ximian.com>
1397 * SimpleCollator.cs : IgnoreSymbols support.
1398 * TestDriver.cs : compilation fix. IgnoreSymbols example.
1399 * create-mscompat-collation-table.cs : more Hangul fixes.
1401 2005-06-17 Atsushi Enomoto <atsushi@ximian.com>
1403 * create-mscompat-collation-table.cs : more Hangul fixes.
1404 * SortKey.cs : it will replace sys.globalization.SortKey. It has
1405 some internal members.
1406 * SortKeyBuffer.cs : now it uses SortKey instead of byte[].
1407 * SimpleCollator.cs : CompareOptions support. However I don't think
1408 it will be developed anymore since SortKey never enables IndexOf().
1409 * TestDriver.cs : a few CompareOptions cases.
1411 2005-06-16 Atsushi Enomoto <atsushi@ximian.com>
1413 * SimpleCollator.cs : simple collator implementation that just
1414 uses GetSortKey() as the basis for everything.
1415 * TestDriver.cs : sample code that uses this collator set.
1416 * MSCompatUnicodeTable.template : removed test driver from here.
1418 2005-06-16 Atsushi Enomoto <atsushi@ximian.com>
1420 * create-mscompat-collation-table.cs : Hangul fixes.
1421 Now fewer than 300 characters lack sortkey weights.
1422 * MSCompatUnicodeTable.template : added FIXME info for Hangul Jamo.
1424 2005-06-16 Atsushi Enomoto <atsushi@ximian.com>
1426 * create-mscompat-collation-table.cs : Added control picture mappings.
1427 Minor primary weight fixes.
1429 2005-06-16 Atsushi Enomoto <atsushi@ximian.com>
1431 * create-mscompat-collation-table.cs : Added mappings for box
1432 drawings and blocks.
1434 2005-06-16 Atsushi Enomoto <atsushi@ximian.com>
1436 * create-mscompat-collation-table.cs : Added mappings for arrows.
1438 2005-06-15 Atsushi Enomoto <atsushi@ximian.com>
1440 * create-mscompat-collation-table.cs : added support for letterlike
1441 characters and squared CJK compatibility characters, ordered by
1442 character names (0x0E category).
1443 * Collation-notes.txt : added description on that.
1445 2005-06-15 Atsushi Enomoto <atsushi@ximian.com>
1447 * MSCompatUnicodeTable.template : Now expansions are simulated.
1448 * create-mscompat-collation-table.cs : filled Korean number level2.
1449 Reordered some code blocks to fill correct diacritical differences.
1450 * Collation-notes.txt : some corrections and minor additions.
1452 2005-06-15 Atsushi Enomoto <atsushi@ximian.com>
1454 * MSCompatUnicodeTable.template :
1455 Now dumper test driver uses SortKeyBuffer for dogfooding.
1456 * create-mscompat-collation-table.cs : some diacritical level fixes
1457 (with non-working extra latin check).
1458 * SortKeyBuffer.cs : several fixes to get working as a practical code.
1459 * Collator.cs : make it compilable, leaving things as NotImplemented.
1461 2005-06-15 Atsushi Enomoto <atsushi@ximian.com>
1463 * create-mscompat-collation-table.cs : some fixes on primary category
1464 07 (miscellaneous symbols and punctuations).
1466 2005-06-14 Atsushi Enomoto <atsushi@ximian.com>
1468 * create-mscompat-collation-table.cs : more mapping fix on numbers,
1469 letters, variable weight characters, circled Japanese and CJK.
1470 * MSCompatUnicodeTable.template : fixed HasSpecialWeight() to be more
1471 inclusive. Simplified dumper code.
1473 2005-06-14 Atsushi Enomoto <atsushi@ximian.com>
1475 * create-mscompat-collation-table.cs : finished Hangul (both Jamo
1476 and Syllables). sortkey dumper diff lines went down from 30000 to 8000.
1478 2005-06-14 Atsushi Enomoto <atsushi@ximian.com>
1480 * create-mscompat-collation-table.cs : added some nonspacing marks in
1481 either correct or hacky way.
1483 2005-06-13 Atsushi Enomoto <atsushi@ximian.com>
1485 * create-mscompat-collation-table.cs : several improvements. Japanese
1486 Kana support, Hebrew accents, Bengali nonspacing marks, sorting of
1487 numeric characters, diacritically decorated latin alphabets. Fixed
1488 some diacritical weights detection.
1489 * MSCompatUnicodeTable.cs : tiny Japanese fix. Handle nonspacing
1490 marks' primary weight as empty.
1491 * Collation-notes.txt : some updates.
1493 2005-06-13 Atsushi Enomoto <atsushi@ximian.com>
1495 * create-mscompat-collation-table.cs : don't process nonexact NFKD
1496 mapping as equivalent, however store CJK extensions into NFKD map
1497 even if one does not strictly match.
1498 Now I am going to fill Hangul into the tables (unlike UCA, it does not
1499 look possible to calculate the sortkey value).
1500 Fixed Cyrillic and Georgian UCA based orderings.
1501 * MSCompatUnicodeTable.template : added CJK extension sortkey
1504 2005-06-10 Atsushi Enomoto <atsushi@ximian.com>
1506 * create-mscompat-collation-table.cs : Fixed latin alphabet support.
1507 Added latin with diacritical and CJK extension.
1508 * MSCompatUnicodeTable.cs : modified dumper code a bit (for my purpose).
1510 2005-06-10 Atsushi Enomoto <atsushi@ximian.com>
1512 * create-mscompat-collation-table.cs : now parses DerivedAge.txt (though
1513 not used yet). Filled CJK ideographs, still not perfect.
1514 Fixed number primary keys. NFKD numbers and CJK ideographs are now
1515 considered, including brackets elimination.
1516 * Makefile : now it downloads DerivedAge.txt.
1517 * MSCompatUnicodeTable.template : added dummy code dumper. It computes
1518 PrivateUse, Surrogate and Hangul Syllables.
1519 * Collation-notes.txt : Noted that Hangul Syllables need more love.
1521 2005-06-09 Atsushi Enomoto <atsushi@ximian.com>
1523 * create-tailorings.cs : added configuration support. sort them.
1524 I wonder if it is really usable. Having own format might be better.
1525 * create-mscompat-collation-table.cs : fixing some sortkey numbers,
1526 making closer to windows. Now it handles NFKD in some places.
1527 * MSCompatUnicodeTable.template : Added dummy sortkey dumper driver.
1528 * CollationDataStructures.txt : added description on tailoring
1529 fields, though they are subject to change.
1531 2005-06-07 Atsushi Enomoto <atsushi@ximian.com>
1533 * create-tailorings.cs, ldml-limited.rng : new files.
1534 * LdmlReader.cs : removed old file.
1536 2005-06-07 Atsushi Enomoto <atsushi@ximian.com>
1538 * SortKeyBuffer.cs : split from Collator.cs. Now it considers
1539 practical use, reflecting updated sortkey constant design.
1540 Especially level 4 weight is split to 4 arrays that are merged in
1541 the last stage of GetSortKey().
1542 * Collator.cs : thus SortKeyBuffer is removed from here.
1543 Additionally, removed some extraneous bits in other classes.
1544 * Collation-notes.txt : Some editorial fixes. Added information on
1545 Korean matters (how to compute Hangul Syllables; Hangul Jamo cannot
1546 be stored in simple byte arrays).
1547 * CodePointIndexer.cs,
1548 create-collation-element-table.cs,
1549 CollationElementTable.template,
1550 NormalizationTableUtil.cs : short CodePointIndexer method names.
1551 * create-mscompat-collation-table.cs : Additional info on why some
1552 meaningful characters are ignored in Windows (Unicode version
1553 difference). Removed U+070F from special check (was extraneous).
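The SortKeyBuffer design described above (level 4 weights accumulated in four separate arrays that are only merged when GetSortKey() runs) can be sketched as below. The separator and terminator byte values are assumptions chosen for illustration, not taken from this log, and the four level 4 sub-arrays are anonymous because the log does not name them.

```python
from dataclasses import dataclass, field

LEVEL_SEPARATOR = 0x01  # assumption: byte between levels
SUB_SEPARATOR = 0x01    # assumption: byte between the level 4 sub-arrays
TERMINATOR = 0x00       # assumption: end-of-key byte


@dataclass
class SortKeyBufferSketch:
    level1: list[int] = field(default_factory=list)
    level2: list[int] = field(default_factory=list)
    level3: list[int] = field(default_factory=list)
    # Level 4 is accumulated in four independent sub-arrays (hypothetical split).
    level4: list[list[int]] = field(default_factory=lambda: [[], [], [], []])

    def get_sort_key(self) -> bytes:
        out: list[int] = []
        for part in (self.level1, self.level2, self.level3):
            out.extend(part)
            out.append(LEVEL_SEPARATOR)
        # The four level 4 sub-arrays are merged only at this last stage.
        for sub in self.level4:
            out.extend(sub)
            out.append(SUB_SEPARATOR)
        out.append(TERMINATOR)
        return bytes(out)
```

Keeping the sub-arrays apart lets a single pass over the string append each kind of level 4 weight to its own list without re-scanning; the ordering is fixed only once, at the end.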
1555 2005-06-06 Atsushi Enomoto <atsushi@ximian.com>
1557 * MSCompatUnicodeTable.template:
1558 Moved body implementation to table creator and put those bool
1559 results into an array.
1560 * create-mscompat-collation-table.cs :
1561 So imported those methods. Modified array output to emit "0x"
1562 only for values greater than 9.
1563 * create-normalization-source.cs : ditto on "0x" output matter.
1564 * CollationDataStructures.txt : so now it holds ignorableFlags.
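Emitting "0x" only above 9 works because decimal and hexadecimal spell 0..9 identically, so small values, which dominate sparse tables, stay short. The sketch below renders a C# array initializer that way; the function names, the upper-case hex digits and the "static readonly byte []" layout are illustrative assumptions, not the generators' actual output format.

```python
def format_value(v: int) -> str:
    """Decimal and hex agree for 0..9, so only larger values need "0x"."""
    if v < 0:
        raise ValueError("negative values are not handled in this sketch")
    return str(v) if v <= 9 else f"0x{v:X}"


def format_array(name: str, values: list[int], ctype: str = "byte",
                 per_line: int = 16) -> str:
    """Render values as a C# array initializer, per_line items per line."""
    lines = [f"static readonly {ctype} [] {name} = new {ctype} [] {{"]
    for i in range(0, len(values), per_line):
        chunk = values[i:i + per_line]
        lines.append("\t" + ", ".join(format_value(v) for v in chunk) + ",")
    lines.append("};")
    return "\n".join(lines)
```

The trailing comma after the last element is legal in a C# array initializer, which keeps the emitting loop free of special cases.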
1566 2005-06-03 Atsushi Enomoto <atsushi@ximian.com>
1568 * Collation-notes.txt, CollationDataStructures.txt :
1569 separate document for data structure design.
1571 2005-06-03 Atsushi Enomoto <atsushi@ximian.com>
1573 * create-mscompat-collation-table.cs : added culture-dependent CJK
1574 table creation. It uses CLDR as its basis. (Culture independent CJK
1576 * Makefile : added CLDR archive downloading support.
1577 * MSCompatUnicodeTable.template : tiny renamings.
1578 * Collation-notes.txt : additional CJK info.
1580 2005-06-02 Atsushi Enomoto <atsushi@ximian.com>
1582 * Collation-notes.txt, create-mscompat-collation-table.cs :
1583 added secondary weight support for BlahNumber characters.
1585 2005-06-01 Atsushi Enomoto <atsushi@ximian.com>
1587 * downloaded : added directory. All downloaded files are stored here.
1588 * Makefile : use "downloaded" directory.
1589 Added more auto-download stuff.
1590 * create-mscompat-collation-table.cs :
1591 Added Japanese square kana support.
1593 2005-06-01 Atsushi Enomoto <atsushi@ximian.com>
1595 * Collation-notes.txt : added Estrangela (ancient Syriac) and Thaana.
1596 * create-mscompat-collation-table.cs : added support for Arabic abjad,
1597 Estrangela and Thaana.
1598 * MSCompatUnicodeTable.template : removed BOM.
1600 2005-05-31 Atsushi Enomoto <atsushi@ximian.com>
1602 * Collation-notes.txt : wrong comment cleanup and spelling fixes.
1603 * create-mscompat-collation-table.cs : added diacritic support for
1604 Latin letters (as long as covered in primary weight).
1606 2005-05-31 Atsushi Enomoto <atsushi@ximian.com>
1608 * Makefile : minor fixes. Added warning lines to generated sources.
1610 2005-05-31 Atsushi Enomoto <atsushi@ximian.com>
1612 * create-char-mapping-source.cs :
1613 Removed ToWidthInsensitive() generation.
1615 2005-05-31 Atsushi Enomoto <atsushi@ximian.com>
1617 * create-mscompat-collation-table.cs : Now it dumps level1 to 3 values.
1618 ToWidthInsensitive() is implemented here, using an array (which is
1619 to be optimized using CodePointIndexer).
1620 * MSCompatUnicodeTable.cs : renamed as MSCompatUnicodeTable.template
1621 * MSCompatUnicodeTable.template : now it is used to generate
1622 MSCompatUnicodeTable.cs which got ready to be used.
1623 * Makefile : added MSCompatUnicodeTable.cs build support. Now it
1624 supports "make normalization" and "make collation".
1626 2005-05-30 Atsushi Enomoto <atsushi@ximian.com>
1628 * Collation-notes.txt : Description on ICU is very incorrect. Now it
1629 became more rational and sane.
1630 * create-mscompat-collation-table.cs : fixed some indexes.
1631 * Makefile : added "mstablegen" target.
1632 * MSCompatUnicodeTable.cs : removed GetPrimaryWeight(). Minor fix.
1634 2005-05-26 Atsushi Enomoto <atsushi@ximian.com>
1636 * Collation-notes.txt : more analysis on "letters".
1637 * create-mscompat-collation-table.cs : more proof of concepts.
1639 2005-05-25 Atsushi Enomoto <atsushi@ximian.com>
1641 * Collation-notes.txt : more info. Started letter sortkey analysis
1642 (some of the other stuff is really not understandable right now.)
1643 * create-mscompat-collation-table.cs : table generator proof-of-
1644 concept source (not compilable).
1645 * MSCompatUnicodeTable.cs : moved some code to the new source.
1648 2005-05-20 Atsushi Enomoto <atsushi@ximian.com>
1650 * Collation-notes.txt : started level 2 weight analysis.
1652 2005-05-19 Atsushi Enomoto <atsushi@ximian.com>
1654 * Collation-notes.txt : Additional information on how to create
1656 * MSCompatUnicodeTable.cs : implemented part of GetLevel3Weight().
1658 2005-05-19 Atsushi Enomoto <atsushi@ximian.com>
1660 * Collation-notes.txt : More case weight (level 3) analysis. I'm
1661 likely to just write table generator.
1663 2005-05-18 Atsushi Enomoto <atsushi@ximian.com>
1665 * MSCompatUnicodeTable.cs : part of level 4 weight implementation.
1667 2005-05-18 Atsushi Enomoto <atsushi@ximian.com>
1669 * Collation-notes.txt :
1671 Revised comparison methods; backward iteration is possible.
1672 More on char-by-char comparison.
1673 Level 4 comparison is actually a bit more complex.
1675 * Collator.cs : some conceptual updates wrt above.
1677 2005-05-17 Atsushi Enomoto <atsushi@ximian.com>
1679 * Collation-notes.txt : Japanese voice mark is level 2, and Hangul
1680 properties are level 3.
1682 2005-05-17 Atsushi Enomoto <atsushi@ximian.com>
1684 * Collation-notes.txt : Make it more readable. More analysis on
1685 level 3 and 4 sortkey structures.
1686 * Collator.cs : some compilation fixes (not compilable yet).
1688 2005-05-16 Atsushi Enomoto <atsushi@ximian.com>
1690 * Collation-notes.txt : Analysis on variable-weighting (level 5)
1692 * Collator.cs : updated corresponding part of level 5, and more.
1694 2005-05-13 Atsushi Enomoto <atsushi@ximian.com>
1696 * Collation-notes.txt : more updates.
1697 * Collator.cs : rewrote from scratch. Some rough sketch for sortkey
1698 buffer, character iterator and collator methods. Not compiling.
1700 2005-05-13 Atsushi Enomoto <atsushi@ximian.com>
1702 * Collator.cs : Am going to replace it with new one. No need for
1703 CompareOptions-dependent Comparer.
1705 2005-05-13 Atsushi Enomoto <atsushi@ximian.com>
1707 * Collation-notes.txt : There seems to be a bit more complexity.
1709 2005-05-10 Atsushi Enomoto <atsushi@ximian.com>
1711 * Collation-notes.txt : more updates, being close to write sortkey
1714 2005-05-09 Atsushi Enomoto <atsushi@ximian.com>
1716 * CompareInfoImpl.cs, Collator.cs : conceptual update
1717 * Collation-notes.txt : some corrections and additions.
1718 * Makefile : added LDML input (but it won't be used at all).
1720 2005-04-28 Atsushi Enomoto <atsushi@ximian.com>
1722 * Collation-notes.txt : more updates.
1724 2005-04-26 Atsushi Enomoto <atsushi@ximian.com>
1726 * Collation-notes.txt : more updates.
1728 2005-04-26 Atsushi Enomoto <atsushi@ximian.com>
1730 * Collation-notes.txt : some updates.
1731 * create-mapping-char-source.cs : superscripts and subscripts are also
1732 ignored in IgnoreWidth comparison.
1733 * Makefile : tiny touch fix.
1735 2005-04-25 Atsushi Enomoto <atsushi@ximian.com>
1737 * CompareInfoImpl.cs, Collator.cs : conceptual stuff (not working).
1739 2005-04-25 Atsushi Enomoto <atsushi@ximian.com>
1741 * create-char-mapping-source.cs : Now it generates
1742 ToWidthInsensitive() from the <wide> and <narrow> decomposition tags.
1743 * MSCompatUnicodeTable.cs : added ToKanaTypeInsensitive() and
1744 ToWidthInsensitive() for IgnoreKanaType and IgnoreWidth.
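Deriving a width-folding map from UnicodeData amounts to taking every character whose compatibility decomposition is tagged <wide> or <narrow> and mapping it to its single target. Kana-type folding is simpler because hiragana U+3041..U+3096 sits exactly 0x60 below the corresponding katakana. The sketch below uses Python's unicodedata module in place of parsing UnicodeData.txt. The function names are hypothetical, and the direction of kana folding (hiragana onto katakana) is an assumption, since the entry does not state which way MSCompatUnicodeTable maps.

```python
import unicodedata


def build_width_map(tags=("<wide>", "<narrow>")) -> dict[int, int]:
    """Map each BMP character whose compatibility decomposition carries one of
    `tags` and has a single target code point to that target."""
    mapping = {}
    for cp in range(0x10000):
        parts = unicodedata.decomposition(chr(cp)).split()
        if len(parts) == 2 and parts[0] in tags:
            mapping[cp] = int(parts[1], 16)
    return mapping


def to_width_insensitive(s: str, width_map: dict[int, int]) -> str:
    return "".join(chr(width_map.get(ord(ch), ord(ch))) for ch in s)


def to_kana_type_insensitive(s: str) -> str:
    """Fold hiragana U+3041..U+3096 onto katakana, which sits 0x60 higher."""
    return "".join(
        chr(ord(ch) + 0x60) if 0x3041 <= ord(ch) <= 0x3096 else ch for ch in s
    )
```

Passing tags=("<wide>", "<narrow>", "<super>", "<sub>") would also fold superscripts and subscripts, in line with the 2005-04-26 entry about IgnoreWidth.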
1746 2005-04-25 Atsushi Enomoto <atsushi@ximian.com>
1748 * README, LdmlReader.cs, DataStructures.txt : new files.
1750 2005-04-25 Atsushi Enomoto <atsushi@ximian.com>
1752 * CodePointIndexer.cs,
1753 Collation-notes.txt,
1754 CollationElementTable.template,
1755 CollationElementTableUtil.cs,
1756 create-char-mapping-source.cs,
1757 create-collation-element-table.cs,
1758 create-combining-class-source.cs,
1759 create-normalization-source.cs,
1761 MSCompatUnicodeTable.cs,
1762 Normalization.template,
1763 NormalizationTableUtil.cs : initial checkin (to private branch).