2010-05-17 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : fix extender search index for LastIndexOf().

2010-04-20 Damien Diederen <dd@crosstwine.com>

	* Normalization.cs: Really apply canonical reordering "recursively."

	Before this, a sequence of code points with the combining
	classes (22, 33, 11) would be reordered to (22, 11, 33) instead of
	the correct (11, 22, 33). This is because the 'i--' would be
	directly cancelled by the 'i++' in the for loop.
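
	The corrected reordering can be illustrated with a minimal Python sketch. This is not the Mono C# code. The `ccc` callback is an assumption that stands in for the combining-class table. Starters (class 0) never move, and each out-of-order mark is bubbled backwards. A while loop is used so that stepping the index back is not undone by a loop increment, which is the 'i--'/'i++' cancellation described above.

```python
from typing import Callable, List


def canonical_reorder(cps: List[int], ccc: Callable[[int], int]) -> List[int]:
    """Stable-sort every run of non-starters by canonical combining class."""
    out = list(cps)
    i = 1
    while i < len(out):
        prev_cc = ccc(out[i - 1])
        cur_cc = ccc(out[i])
        if cur_cc != 0 and prev_cc > cur_cc:
            # Swap, then step back so the moved mark is compared with its new
            # left neighbour. No increment follows, so the step back sticks.
            out[i - 1], out[i] = out[i], out[i - 1]
            if i > 1:
                i -= 1
            continue
        i += 1
    return out


# Hypothetical marks with combining classes 22, 33 and 11 after a starter 'a'.
classes = {0x10: 22, 0x11: 33, 0x12: 11}
result = canonical_reorder([0x61, 0x10, 0x11, 0x12], lambda c: classes.get(c, 0))
print([hex(c) for c in result])  # ['0x61', '0x12', '0x10', '0x11'], i.e. (0, 11, 22, 33)
```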

2010-04-20 Damien Diederen <dd@crosstwine.com>

	* Normalization.cs: The correct "checkType" argument to
	Decompose() is NFD or NFKD when normalizing to NFC or NFKC,
	respectively.

	* StringTest.cs: More NFC test cases.

2010-04-20 Damien Diederen <dd@crosstwine.com>

	* Normalization.cs: Implement algorithmic Hangul composition.
	Calling Normalize(NormalizationForm.FormC) on Korean characters
	now works properly (bnc#480152).

	* StringTest.cs: Add test cases for Hangul composition.
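
	The arithmetic behind algorithmic Hangul composition, together with the decomposition from the 2010-02-18 entry, can be sketched in Python. The sketch uses the constants defined in the Unicode Standard, section 3.12 (Conjoining Jamo Behavior). The function names are illustrative rather than Mono's API, and the sketch handles only jamo and precomposed syllables.

```python
from typing import List

# Constants from the Unicode Standard, section 3.12 "Conjoining Jamo Behavior".
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT  # 11172 precomposed syllables


def decompose_hangul(cp: int) -> List[int]:
    """Split a precomposed syllable into L V (T) jamo; other code points pass through."""
    s_index = cp - S_BASE
    if not 0 <= s_index < S_COUNT:
        return [cp]
    lead = L_BASE + s_index // N_COUNT
    vowel = V_BASE + (s_index % N_COUNT) // T_COUNT
    trail = T_BASE + s_index % T_COUNT
    return [lead, vowel] if trail == T_BASE else [lead, vowel, trail]


def compose_hangul(cps: List[int]) -> List[int]:
    """Combine adjacent L+V into an LV syllable and LV+T into an LVT syllable."""
    out: List[int] = []
    for cp in cps:
        if out:
            last = out[-1]
            l_index = last - L_BASE
            v_index = cp - V_BASE
            if 0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT:
                out[-1] = S_BASE + (l_index * V_COUNT + v_index) * T_COUNT
                continue
            s_index = last - S_BASE
            t_index = cp - T_BASE
            # Only an LV syllable (no trailing consonant yet) may absorb a T jamo.
            if 0 <= s_index < S_COUNT and s_index % T_COUNT == 0 and 0 < t_index < T_COUNT:
                out[-1] = last + t_index
                continue
        out.append(cp)
    return out


jamo = decompose_hangul(0xD55C)  # U+D55C HANGUL SYLLABLE HAN
print([hex(c) for c in jamo])  # ['0x1112', '0x1161', '0x11ab']
print(hex(compose_hangul(jamo)[0]))  # 0xd55c
```

	No table lookup is involved in either direction. This is why the 2005-07-29 entry below can say that Hangul syllables are computed instead of embedded in the tables.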

2010-04-20 Damien Diederen <dd@crosstwine.com>

	* Normalization.cs: Follow the spec when checking composition pairs.

	Figure 7 in section 1.3 of http://unicode.org/reports/tr15/ shows
	how, when doing composition, one has to examine the successive
	(starter, candidate) pairs, and combine if a matching canonical

	The original algorithm was, instead, iterating on canonical
	decompositions, and, for each one, trying to match a sequence
	of (starter, non-starter, ...). This, however, does not produce
	the same results, as it violates some implicit ordering
	constraints in the Unicode tables.

	E.g., when composing the following sequence of codepoints, the
	original algorithm was picking:
	and would stop at 1FC2 0313 as there is no decomposition matching
	it. The new algorithm, which follows the guidance of the pretty
	figure 7, ends up doing:
	resulting in the correct 1F92.
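
	The pair-wise procedure of Figure 7 can be sketched in Python. This is an illustration under stated assumptions, not the Mono implementation. The input is assumed to be fully decomposed and canonically ordered. The `pairs` and class tables are hypothetical stand-ins for the full Unicode data and hold only a few real Greek entries (U+1F20 = U+03B7 U+0313, U+1F22 = U+1F20 U+0300, U+1F92 = U+1F22 U+0345, and so on).

```python
from typing import Callable, Dict, List, Tuple


def compose(
    cps: List[int],
    ccc: Callable[[int], int],
    pairs: Dict[Tuple[int, int], int],
) -> List[int]:
    """Canonical composition by examining successive (starter, candidate) pairs.

    `cps` must already be fully decomposed and canonically reordered.
    `pairs` maps (starter, candidate) to a primary composite.
    """
    if not cps:
        return []
    out = [cps[0]]
    starter_pos = 0
    # A leading non-starter gets a sentinel class so nothing combines with it.
    last_cc = 0 if ccc(cps[0]) == 0 else 256
    for cp in cps[1:]:
        cc = ccc(cp)
        composite = pairs.get((out[starter_pos], cp))
        # last_cc is the class of the last character left uncombined: 0 means
        # the candidate is adjacent to the starter; otherwise the candidate
        # needs a strictly higher class to be unblocked.
        if composite is not None and (last_cc == 0 or last_cc < cc):
            out[starter_pos] = composite  # merged: cp vanishes, last_cc unchanged
            continue
        if cc == 0:
            starter_pos = len(out)  # cp becomes the new starter
        last_cc = cc
        out.append(cp)
    return out


# A few real Greek canonical pairs as a stand-in for the full table.
GREEK_PAIRS = {
    (0x03B7, 0x0313): 0x1F20,  # eta + psili
    (0x1F20, 0x0300): 0x1F22,  # ... + varia
    (0x1F22, 0x0345): 0x1F92,  # ... + ypogegrammeni
    (0x03B7, 0x0300): 0x1F74,
    (0x1F74, 0x0345): 0x1FC2,
    (0x03B7, 0x0345): 0x1FC3,
}
GREEK_CCC = {0x0313: 230, 0x0300: 230, 0x0345: 240}

decomposed_1f92 = [0x03B7, 0x0313, 0x0300, 0x0345]
print(hex(compose(decomposed_1f92, lambda c: GREEK_CCC.get(c, 0), GREEK_PAIRS)[0]))  # 0x1f92
```

	Starting from U+03B7 U+0313 U+0300 U+0345, the fully decomposed form of U+1F92, each candidate is checked only against the current starter. The composite therefore builds up step by step back to U+1F92.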

2010-04-19 Damien Diederen <dd@crosstwine.com>

	* Normalization.cs: Recursively apply the Unicode decomposition mapping.

	According to http://www.unicode.org/reports/tr15/tr15-31.html,
	"To transform a Unicode string into a given Unicode Normalization
	Form, the first step is to fully decompose the string. [...] Full
	decomposition involves recursive application of the
	Decomposition_Mapping values, because in some cases a complex
	composite character may have a Decomposition_Mapping into a
	sequence of characters, one of which may also have its own
	non-trivial Decomposition_Mapping value."
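
	The recursion described in the quotation can be sketched with Python's standard `unicodedata` module, which exposes each character's single-level Decomposition_Mapping. This is an illustration rather than Mono's table-driven code. Two things are outside its scope: Hangul syllables, whose decompositions are algorithmic, and the canonical reordering step.

```python
import unicodedata
from typing import List


def full_canonical_decompose(cp: int) -> List[int]:
    """Recursively apply canonical Decomposition_Mapping values to one code point.

    unicodedata.decomposition() gives only one level of mapping, so each
    resulting code point is decomposed again. Compatibility mappings (tagged
    like "<compat>") are left alone, and canonical reordering is not done here.
    """
    mapping = unicodedata.decomposition(chr(cp))
    if not mapping or mapping.startswith("<"):
        return [cp]
    result: List[int] = []
    for field in mapping.split():
        result.extend(full_canonical_decompose(int(field, 16)))
    return result


print(unicodedata.decomposition("\u1f92"))  # one level only: 1F22 0345
print([hex(c) for c in full_canonical_decompose(0x1F92)])  # ['0x3b7', '0x313', '0x300', '0x345']
```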

2010-02-18 Gabriel Burt <gabriel.burt@gmail.com>

	* Normalization.cs: Implement algorithmic Hangul decomposition; calling
	string.Normalize on Korean characters now works properly (bnc#480152).
	This reduces the number of errors in 'make test' from 27k to 4.8k.

	* StringNormalizationTestSource.cs:
	* Makefile: Use the local, working copy of Normalization etc., so as to make
	modifying Normalization.cs and then testing your changes with 'make test'
	possible. Also, fix building/running of tests, patch by Alexander

2009-09-18 Atsushi Enomoto <atsushi@ximian.com>

	* Normalization.cs : Handle blocked characters which are not
	immediately next to the primary composite character. This fixes
	some Arabic string sequence normalization.
	* Makefile : fix test build.

2009-09-17 Atsushi Enomoto <atsushi@ximian.com>

	* Normalization.cs : some renaming for disambiguation.
	* NormalizationTableUtil.cs : fix some wrong ranges in
	mapIdxToComposite. This fixes some Arabic normalization (and more).
	* normalization-notes.txt : added some notes on the implementation.

2008-06-19 Atsushi Enomoto <atsushi@ximian.com>

	- reverted the previous index calculation change. It was
	  implemented correctly and I broke it instead.
	- fix index calculation on combining.
	- NFKD was incorrectly routed to the combining path. It should not be.
	- Simplify quick check.

2008-06-15 Atsushi Enomoto <atsushi@ximian.com>

	* Normalization.cs : For NFC and NFKC, IsNormalized() did not
	correctly check composed characters. That is not possible without
	the actual composition, so just call Normalize() and compare the results.
	In Normalize(), the mapping helper didn't pick the correct map index,
	since the index table stores indexes for "uncompressed" numbers.
	* NormalizationTableUtil.cs : updated to the latest UCD.
	* Makefile : to build the tests, the source file must be downloaded too.
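
	The normalize-and-compare fallback can be sketched in Python with `unicodedata.normalize`. The ASCII fast path is an added assumption, not something this entry describes. Recent Python versions also ship `unicodedata.is_normalized`, which performs this check natively.

```python
import unicodedata


def is_normalized(s: str, form: str = "NFC") -> bool:
    """Decide whether `s` is already in `form` by normalizing and comparing.

    A per-character quick check can answer "maybe" for NFC and NFKC, so a
    definite answer needs the actual composition.
    """
    if s.isascii():
        # Every normalization form leaves ASCII text unchanged.
        return True
    return unicodedata.normalize(form, s) == s


print(is_normalized("\u00e9"), is_normalized("e\u0301"))  # True False
```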

2008-11-05 Atsushi Enomoto <atsushi@ximian.com>

	* ucd.cs : Write type for *_count. Add notice to not edit
	unicode-data.h directly.

2008-11-04 Atsushi Enomoto <atsushi@ximian.com>

	* ucd.cs : new code to generate unicode table for eglib.

2008-07-04 Andreas Nahr <ClassDevelopment@A-SoftTech.com>

	* SortKey: Fix parameter names, add attribute, small formatting

2008-06-27 Rodrigo Kumpera <rkumpera@novell.com>

	* CodePointIndexer.cs : Make TableRange a struct instead
	of a class so we save 2 memory ops per ToIndex loop.

2008-04-02 Atsushi Enomoto <atsushi@ximian.com>

	* SortKey.cs : check null arguments. Fixed bug #376171.

2007-07-20 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : I wonder how long its build

2007-03-06 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : disable QuickCheckPossible(), which is
	inaccurate and inefficient. Fixed bug #79714.

2007-02-15 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : character filtering is needed for
	OrdinalIgnoreCase in 2.0 profile. Fixed bug #80865.

2007-01-25 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : GetTailContraction() failed to pick out the
	correct contraction/special sortkey, and thus LastIndexOf() failed
	when one was involved. Fixed bug #80612.

2007-01-22 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : for non-StringSort comparison, level 5
	characters (- and ') should still be skipped after the initial
	level 5 check is done (previously they were simply treated as
	normal characters). Fixed bug #78748.
	* SortKeyBuffer.cs : Fixed NRE in French sort.

2006-12-25 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : added IndexOf() implementation for Ordinal
	and OrdinalIgnoreCase, though the Ordinal version is not used (since
	it is slower than the icall).

2006-05-30 Miguel de Icaza <miguel@novell.com>

	* MSCompatUnicodeTable.cs: Remove the fixed loading and compute it
	just when we actually consume it. This only fixes the

2006-04-14 Atsushi Enomoto <atsushi@ximian.com>

	* README: removed obsolete info.
	* Normalization.cs : canonical reordering should participate in the
	decomposition step. In reordering, string append was incomplete.
	Combining class check is required in NFD check. Icall is written

2005-12-07 Zoltan Varga <vargaz@gmail.com>

	* SimpleCollator.cs: Fix a warning.

2005-11-30 Sebastien Pouliot <sebastien@ximian.com>

	* SimpleCollator.cs: Fix CAS support. The static ctor/var try to get
	the environment variable MUCH too soon (i.e. the security manager

2005-11-29 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : direct fast-path optimization for IndexOf().

2005-11-29 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs :
	- CompareQuick(): added immediateBreakup to avoid extraneous sortkey
	- QuickCheckPossible(): index used for s1 was incorrect.

2005-11-29 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : added another quick check for CompareInternal()
	that does almost ordinal comparison for quick-checkable strings.
	(It affects Compare(), IndexOf(), IsSuffix() etc. as well.)

2005-11-14 Atsushi Enomoto <atsushi@ximian.com>

	* MSCompatUnicodeTable.cs : (IsIgnorable) \0 is not ignorable.

2005-11-14 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs :
	Created another struct to reduce method arguments. Created another
	set of flags that keeps the "once-matched" state (the counterpart of
	checkedFlags, now neverMatchFlags).

2005-11-14 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs :
	- Added CompareOrdinalIgnoreCase() for NET_2_0 RTM.
	- Removed an extra parameter from LastIndexOfSortKey().
	- LastIndexOf() should use GetTailContraction for the source string.
	  The target could then match in the middle of a possible
	  "replacement contraction" of the source string, so use
	  LastIndexOfSortKey() to catch such matches.
	- Fixed GetTailContraction(), which caused an index out of range.

2005-11-11 Atsushi Enomoto <atsushi@ximian.com>

	* Makefile : Now use MONO_DISABLE_MANAGED_COLLATION.
	* SortKey.cs : some members are virtual.

2005-10-14 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : modified to use stackalloc for byte array.

2005-09-27 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : in CompareInternal(), there was a possibility of
	an infinite loop. Fixed bug #76243.

2005-09-20 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : In IsPrefix/IsSuffix, if target is an empty string,
	immediately return true.

2005-09-09 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : IsSuffix() optimization logic was buggy, so just
	use a pretty simple approach with LastIndexOf() (no significant perf.

2005-09-01 Atsushi Enomoto <atsushi@ximian.com>

	* README, Collation-notes.txt, CollationDataStructures.txt :
	removed obsolete info and added some notes.

2005-08-10 Atsushi Enomoto <atsushi@ximian.com>

	* Normalization.cs : remove code that caused warnings.
	* managed-collation.patch : now it's not required anymore.

2005-08-10 Atsushi Enomoto <atsushi@ximian.com>

	* MSCompatUnicodeTable.cs : added IsSortable(string).

2005-08-10 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : Now all collator methods are thread safe.

	All instance non-readonly fields turned into arguments of every
	method that uses those fields.
	(Sadly it is the end of the no-memory-cost collator era. mcs bootstrap
	now needs +100KB memory consumption.)

2005-08-09 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : made "checkedFlags" nullable and made it an
	argument of every index method (to make it thread safe).

2005-08-09 Atsushi Enomoto <atsushi@ximian.com>

	MSCompatUnicodeTable.cs :
	- Now IsIgnorable() is aggregated into a single invocation that checks
	  for completely ignorable characters, nonspacing marks and symbols.
	- Introduced "already checked" flags for IndexOf() and LastIndexOf()
	  to skip the sortkey binary check on the same characters. Significant
	  perf. improvement for such cases as IndexOf("AABCBABC...Z",'Z').
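
	The idea behind the "already checked" flags can be sketched in Python. In this illustration a set stands in for the flag storage, and the hypothetical `matches` callback stands in for the sortkey comparison. The sketch is only valid when a character's comparison result does not depend on its neighbours, and contractions and extenders can break that assumption. Treat it as a simplification.

```python
from typing import Callable, List, Set


def index_of_char(source: str, target: str, matches: Callable[[str, str], bool]) -> int:
    """Return the index of the first character in `source` that matches `target`, or -1.

    `matches` stands in for an expensive culture-sensitive comparison. A
    character that failed once is remembered and skipped afterwards.
    """
    never_match: Set[str] = set()
    for i, ch in enumerate(source):
        if ch in never_match:
            continue  # known not to match: skip the costly comparison
        if matches(ch, target):
            return i
        never_match.add(ch)
    return -1


calls: List[str] = []


def counting_match(a: str, b: str) -> bool:
    calls.append(a)  # record each comparison that actually runs
    return a == b


print(index_of_char("AABCBABCZ", "Z", counting_match), calls)  # 8 ['A', 'B', 'C', 'Z']
```

	Only four comparisons run for nine source characters, because the repeated A, B and C are skipped.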

2005-08-08 Gert Driesen <drieseng@users.sourceforge.net>

	* SortKey.cs: Marked Serializable to match MS.NET.

2005-08-08 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs,
	Makefile : changed resources output directory.

2005-08-04 Atsushi Enomoto <atsushi@ximian.com>

	* create-normalization-tests.cs,
	StringNormalizationTestSource.cs : new files for Unicode
	Normalization test generator.
	* Makefile : added support for above.

2005-08-03 Atsushi Enomoto <atsushi@ximian.com>

	* NormalizationTableUtil.cs : oops, it does not compile.
	* managed-collation.patch : I guess having managed resources would be
	better for collation. At least the current code has such a #define, so
	the Makefile should be in sync with it.

2005-08-03 Atsushi Enomoto <atsushi@ximian.com>

	* create-normalization-source.cs : Fixed CharMapComparer, which
	incorrectly returned 0 when the second arg is shorter. Reduced
	extraneous helperIndex map. Other minor fixes and code removal.
	* Normalization.cs : several fixes to support blocked combine handling.
	* NormalizationTableUtil.cs : tiny member renaming.

2005-08-03 Atsushi Enomoto <atsushi@ximian.com>

	* create-normalization-source.cs,
	NormalizationTableUtil.cs,
	Normalization.cs : several bugfixes on index miscomputation.
	Renamed using aliases (csc will bork). Primary combine safety is now
	computed during UnicodeData.txt parse.
	Maximum NFKD length was 18, not 4 (U+FDFA).

2005-08-02 Atsushi Enomoto <atsushi@ximian.com>

	* managed-collation.patch : added Normalization support.
	* managed-collation-icall.patch : added, including normalization stuff.

	BTW, when will the collation code be checked in?

2005-08-02 Atsushi Enomoto <atsushi@ximian.com>

	* create-normalization-source.cs : Unified three normalization source
	generators, to compute IsUnsafe flag. Fixed helperIndex array type
	* create-char-mapping-source.cs,
	create-combining-class-source.cs : thus removed.
	* Makefile : thus modified for the above integration.
	* NormalizationTableUtil.cs : Extended to contain IsUnsafe flag.
	* Normalization.cs : Several fixes to make Normalize() actually work.

2005-07-29 Atsushi Enomoto <atsushi@ximian.com>

	* create-normalization-source.cs,
	create-char-mapping-source.cs,
	create-combining-class-source.cs,
	Makefile : converted managed array to pointers (like collation stuff).

2005-07-29 Atsushi Enomoto <atsushi@ximian.com>

	* NormalizationTableUtil.cs : further table range optimization.
	* create-normalization-source.cs,
	create-char-mapping-source.cs,
	create-combining-class-source.cs : added C header output support.

2005-07-29 Atsushi Enomoto <atsushi@ximian.com>

	* create-normalization-source.cs, Normalization.cs :
	Now property size is < 256, so directly embed value in "props" array.
	Add QuickCheck(c,checkType) and remove IsNFD/C/KD/KC and delegates.

2005-07-29 Atsushi Enomoto <atsushi@ximian.com>

	* create-combining-class-source.cs,
	create-char-mapping-source.cs,
	create-normalization-source.cs,
	NormalizationTableUtil.cs,
	Normalization.cs : String.Normalize() does not handle surrogate
	characters. Mapping information in DerivedNormalizationProps.txt
	is not used in the code (the mappings from UnicodeData.txt are used).
	Hangul syllables are computed instead of embedded in the tables.
	* managed-collation.patch : removed IntPtrStream and Makefile patches.

2005-07-29 Atsushi Enomoto <atsushi@ximian.com>

	* MSCompatUnicodeTable.cs : IsSortable() was broken.

2005-07-29 Atsushi Enomoto <atsushi@ximian.com>

	* MSCompatUnicodeTable.cs : added helper for CompareInfo.IsSortable().

2005-07-28 Atsushi Enomoto <atsushi@ximian.com>

	* create-tailoring.cfg : added for convenience of contraction check.

2005-07-28 Atsushi Enomoto <atsushi@ximian.com>

	* create-normalization-source.cs,
	create-mscompat-collation-table.cs,
	MSCompatUnicodeTableUtil.cs,
	create-collation-element-table.cs,
	MSCompatUnicodeTable.cs,
	create-combining-class-source.cs : added copyright lines.

2005-07-28 Atsushi Enomoto <atsushi@ximian.com>

	* MSCompatUnicodeTable.cs : removed extraneous definition.

2005-07-28 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs,
	MSCompatUnicodeTable.cs : full C header support, finally.

2005-07-28 Atsushi Enomoto <atsushi@ximian.com>

	NormalizationTableUtil.cs,
	create-char-mapping-source.cs : more aggressive data compression.
	It now ignores characters that are >= U+10000.

2005-07-28 Atsushi Enomoto <atsushi@ximian.com>

	Normalization.template,
	Normalization.cs : renamed existing file.

2005-07-28 Atsushi Enomoto <atsushi@ximian.com>

	* NormalizationTableUtil.cs,
	Normalization.template,
	create-combining-class-source.cs : GetCombiningClass is now
	implemented as an indexer-based array.
	* Makefile : renamed output filename.
	* create-mscompat-collation-table.cs : removed comments that do not
	* create-tailoring.cs : use utf-8 output (and fixed filename).

2005-07-27 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : hacked safer IPA extensions.
	* Collation-notes.txt : status of sortkey table.

2005-07-27 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : some Greek mapping fixes.

2005-07-27 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : diacritical weights were not
	treated correctly when they were picked from letter names as flags.

2005-07-27 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : fixed culture-dependent
	nonspacing mark weight.

2005-07-27 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : some Hebrew case letter fixes.
	Some diacritical fixes on symbols.

2005-07-27 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : Fixed level 3 weight of
	Arabic presentation forms.

2005-07-27 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : Fixed some diacritical weights
	of Arabic presentation forms.

2005-07-27 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : more status updates. It's almost complete,
	except for sortkey values.

2005-07-27 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : similar optimization also for LastIndexOf().

2005-07-27 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : the previous patch was missing IgnoreNonSpace

2005-07-27 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : reduced extra sortkey value computation in
	MatchesForward(). It makes IndexOf() roughly 30% faster.

2005-07-26 Atsushi Enomoto <atsushi@ximian.com>

	* SortKey.cs : GetHashCode() returns a value based on its byte data.

2005-07-26 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : consider extractions in invariant culture.

2005-07-26 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : (unsafeFlags) be compact ;-)

2005-07-26 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : When the tail of the target does not match more
	than 3 times, then IsSuffix() will never be true (3 is the max
	length of an expansion; \uFB03 -> ffi). It brings a significant
	performance boost when the "source" string is very long.
	* MSCompatUnicodeTable.cs : added MaxExpansionLength constant.
	Reordered code lines.

2005-07-26 Atsushi Enomoto <atsushi@ximian.com>

	* Collation-notes.txt : updated implementation status.

2005-07-26 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : Implemented quick codepoint comparison in
	Compare(). Comparison became 125x faster.
	* mono-tailoring-source.txt : added tiny comment.

2005-07-26 Atsushi Enomoto <atsushi@ximian.com>

	* mono-tailoring-source.txt : Added all single sortkey remappings to
	all cultures (still need to fill contractions and annotate possibly
	buggy mappings with reference to CLDR).
	* SimpleCollator.cs : removed unused code.
	* MSCompatUnicodeTable.cs : tiny cast removal.

2005-07-25 Atsushi Enomoto <atsushi@ximian.com>

	create-mscompat-collation-table.cs,
	MSCompatUnicodeTableUtil.cs,
	MSCompatUnicodeTable.cs : Now CJK mapping data is stored as byte
	arrays. Thus SimpleCollator does not need to use bitwise and shift
	operations to get sortkey values, and they could be managed resources.

2005-07-25 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs,
	MSCompatUnicodeTable.cs,
	MSCompatUnicodeTableUtil.cs : From the result of sortkey comparison
	between None and IgnoreWidth, the width compat table could be computed
	in a somewhat simple way. So removed that table and all related code.
	Increased the collation resource version.

2005-07-25 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : Added C header output support.

2005-07-25 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : FillLetterNFKD() could also be
	applied to Cyrillic letters. Saved some of them.

2005-07-24 Atsushi Enomoto <atsushi@ximian.com>

	* MSCompatUnicodeTable.cs : oh, ok, so we already have
	GetManifestResourceInternal() ;-)
	* managed-collation.patch : in Assembly.cs made that method internal.

2005-07-24 Atsushi Enomoto <atsushi@ximian.com>

	* MSCompatUnicodeTable.cs : the pointer-based icall code could
	also be applied in USE_MANAGED_RESOURCE mode.

2005-07-23 Atsushi Enomoto <atsushi@ximian.com>

	* MSCompatUnicodeTable.cs : added icall support code (not enabled
	unless the first line is commented out).

2005-07-22 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs,
	MSCompatUnicodeTableUtil.cs,
	MSCompatUnicodeTable.cs : Added resource version output (and ignore
	in case of version mismatch). Removed obsolete, commented out code.

2005-07-22 Atsushi Enomoto <atsushi@ximian.com>

	MSCompatUnicodeTable.cs,
	create-mscompat-collation-table.cs : Now they use unmanaged pointers
	instead of managed arrays.
	* managed-collation.patch : Now it contains patches for IntPtrStream.cs
	and Assembly.cs as well.

2005-07-22 Atsushi Enomoto <atsushi@ximian.com>

	* MSCompatUnicodeTable.cs,
	SimpleCollator.cs : Moved tailoring support classes out of
	SimpleCollator and into MSCompatUnicodeTable.cs.
	Now that cjk and tailoring support are filled inside
	MSCompatUnicodeTable, no managed array is exposed.

2005-07-22 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs,
	MSCompatUnicodeTable.cs : Now it's not exposing collation table
	internals as managed arrays (to switch to unmanaged pointers).

2005-07-22 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : tiny nonspacing mark fix.

2005-07-21 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : Fixed most of the Greek mappings.
	* MSCompatUnicodeTable.cs : don't lock string.

2005-07-21 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : More Cyrillic diacritical fixes.

2005-07-21 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : More Latin diacritical fixes.

2005-07-21 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : There were still missing
	math symbol mappings. Added several hacky diacritical weights for

2005-07-21 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : fixed a few diacritical weights
	on Cyrillic characters. Fixed ParseTailoringSource() to handle
	non-heading escape sequences (\uXXXX) as expected.

2005-07-21 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs,
	MSCompatUnicodeTableUtil.cs,
	MSCompatUnicodeTable.cs : added more aggressive index limits to
	optimize table data size, at the cost of speed.

2005-07-20 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : fixed Arabic tertiary weight.

2005-07-20 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : Mappings for hyphens and
	punctuation are kinda finished. Rewrote batch mapping method to
	collect all NFKD. Required modification on mapping is done.

2005-07-20 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : minor mapping fixes on accent
	marks and punctuation.

2005-07-20 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : Fixed some MathSymbol mapping
	and Box drawing mapping.

2005-07-19 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : Fixed almost all numbers.

2005-07-19 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : Symbol mappings are almost done.
	Removed hack that gave dummy mappings to blank symbols.

2005-07-19 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : more fixes on arrows. Fixes on box
	drawings. Some code refactoring to eliminate hack.

2005-07-19 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : Fixed some secondary weights
	in Devanagari and arrows.

2005-07-19 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : a set of tiny mapping fixes.

2005-07-19 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : some diacritical fixes for
	Latin. Added batch mapping method that considers computed
	diacritical weight (for numbers).

2005-07-15 Atsushi Enomoto <atsushi@ximian.com>

	* managed-collation.patch : forgot to add System.String patch.

2005-07-15 Atsushi Enomoto <atsushi@ximian.com>

	* MSCompatUnicodeTable.cs : added resource existence check (required
	for mscorlib transient time from the one without resources to the

2005-07-15 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : fixed punctuation and hyphen
	(shift) primary weight.

2005-07-15 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : more nonspacing mark fixes.
	Some non-basic Cyrillic diacritical weight fixes.

2005-07-15 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : some Gurmukhi fixes on level 1
	and level 3. Tiny Hangul weight fixes.
	* MSCompatUnicodeTable.cs : U+30F5 and U+30F6 are small Japanese kana.

2005-07-15 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : some normal characters that have
	"narrow" NFKD mapping are regarded as "wide" and thus level 3 weight
	values were different. Handle U+30FB as category A.
	* MSCompatUnicodeTable.cs : U+30FB does not have special weight.

2005-07-15 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : more diacritical weight fixes.
	Removed some unused code.

2005-07-15 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : Fixed some Thai and Arabic

2005-07-15 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : Fixed Syriac nonspacing marks.

2005-07-15 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : Fixed nonspacing marks in
	Malayalam, Thai and Lao. Removed extraneous hack.

2005-07-15 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : rewrote LastIndexOf() to handle source extenders.
	Some refactoring on IndexOf() code. Removed unused Matches().
	* Collation-notes.txt : some methods needed to be reimplemented, so
	rewrote the description.

2005-07-14 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : rewrote IsSuffix() to use CompareInternal().
	Thus supported extenders in IsSuffix().

2005-07-14 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : more IsSuffix() simplification, but it will be
	stopped here since it cannot handle extenders (implementing new

2005-07-14 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : simplified IsSuffix() code.

2005-07-14 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : Fixed IndexOf() and LastIndexOf() to search the
	entire replacement string if char target was an expansion.
	IsSuffix() was using a method for IsPrefix(), which was incorrect.
	Removed old IsPrefix() code.

2005-07-14 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : IndexOf() was incorrectly sharing the same
	byte[] field in different areas of code. Now extenders in both
	source and target really work in IndexOf().

2005-07-14 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : fixed U+FF9F diacritical weight.
	* SimpleCollator.cs : handle U+FF9E and U+FF9F as extenders.

2005-07-14 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : Now FilterExtender() handles all extender
	support. IndexOf() and LastIndexOf() now support extenders.
	IndexOf() and LastIndexOf() did not advance by the contraction source
	length as expected. Tiny refactoring on private IsPrefix() to take

2005-07-13 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : when restoring from expansion, go back to the
	top of the loop (to avoid index out of range).
	Now IsPrefix() is implemented to reuse Compare() and thus it now
	supports extenders as well.
	* Collation-notes.txt : status update. Deleted optimization part in
	status section (it was a duplicate).

2005-07-13 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : some code reordering.
	* create-mscompat-collation-table.cs : it was still missing U+3094.

2005-07-13 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : Compare() now supports extenders (e.g. U+39FC).

2005-07-13 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : In GetSortKey(), don't update previousChar when
	it is not primary (e.g. don't "extend" a diacritical mark).

2005-07-13 Atsushi Enomoto <atsushi@ximian.com>

	* managed-collation.patch : CompareInfo.Compare() should consider
	the possibility that a non-empty string might actually be empty
	in a culture-sensitive context.

2005-07-13 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : IndexOf() and LastIndexOf() return start when
	target is "empty" (in culture-sensitive context).

2005-07-13 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : In IndexOf() and LastIndexOf(), skip ignorable
	characters in target string.

2005-07-13 Atsushi Enomoto <atsushi@ximian.com>

	* SimpleCollator.cs : When IgnoreWidth is specified, all Kana
	characters are regarded as half-width.
	Even though IgnoreWidth is specified, it should not ignore case.
	For special weight comparison, the default values (E4) are bigger
	than non-default values.
	* SortKeyBuffer.cs : It should save LCID and original string.
	* create-mscompat-collation-table.cs : Japanese half-width kana
	should not be counted in the widthCompat map, since IgnoreWidth does
	not really ignore those differences.

2005-07-13 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : Fixed missing Japanese bits.

2005-07-13 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs :
	tiny diacritical weight fix for U+20D0-U+20E1.

2005-07-13 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : ja CJK ideographs are completed.

2005-07-13 Atsushi Enomoto <atsushi@ximian.com>

	* create-mscompat-collation-table.cs : Fixed CJK custom Japanese
	mapping. It (maybe as well as other CJK tables) mixes NFKD. For
	Japanese, modified NFKD table (because of Windows lame design).

2005-07-13 Atsushi Enomoto <atsushi@ximian.com>

	* Makefile : added MONO_USE_MANAGED_COLLATION=no almost everywhere.
	* MSCompatUnicodeTable.cs : FillCJK() was not invoked. Now it is
	invoked whenever it is required.
	* SimpleCollator.cs : call FillCJK() above in .ctor().
	* MSCompatUnicodeTableUtil.cs : CJK range was wider.
	* create-mscompat-collation-table.cs : CJK binary was missing the
	length. CJK remapping is being moved to ModifyUnidata().
	For cjk-ja mapping, we have to consider compat characters to be
	added to the map, besides the raw UCA table.
894 2005-07-12 Atsushi Enomoto <atsushi@ximian.com>
896 * SortKeyBuffer.cs : Fixed shift level computation to match w/ Windows.
898 2005-07-12 Atsushi Enomoto <atsushi@ximian.com>
900 * SimpleCollator.cs : fixed LastIndexOf() to handle _target's_
901 contraction as expected. Fixed Compare() to save s2's contraction
903 * TestDriver.cs :added LastIndexOf() tester w/ indexes.
905 2005-07-12 Atsushi Enomoto <atsushi@ximian.com>
907 * managed-collation.patch : Fixed IsPrefix() and IsSuffix(). They
908 incorrectly used Compare().
909 * TestDriver.cs : more tests moved to NUnit tests.
911 2005-07-12 Atsushi Enomoto <atsushi@ximian.com>
913 * SimpleCollator.cs : several fixes on Compare().
914 - Ignorable characters are skipped at the top of the loop.
915 - IgnoreNonSpace is checked to avoid extraneous level 2 comparison.
916 - When the s1 index was increased while an s2 contraction was
917 replaced, s1 was advanced inconsistently (bug).
918 - IsIgnorable() now also checks IgnoreNonSpace.
919 - Fixed FilterOptions(), which did not work for IgnoreWidth at all.
920 * TestDriver.cs : now some are moved to nunit tests.
921 * Collation-notes.txt : minor todo update.
923 2005-07-11 Atsushi Enomoto <atsushi@ximian.com>
925 * SimpleCollator.cs : Compare() was ignoring the case where both
926 entire strings have '-' to be compared.
927 * Collation-notes.txt : more status updates.
928 * TestDriver.cs : added '-' use cases.
930 2005-07-08 Atsushi Enomoto <atsushi@ximian.com>
932 * SimpleCollator.cs : to be consistent with other buggy parts, it now handles
933 U+3005, U+3031 and U+3032 in the same buggy way as Windows. It just repeats
935 Fixed GetSortKey(): if the repeater is U+3005, second weight is 5.
936 * create-mscompat-collation-table.cs : dummy values for extenders.
938 2005-07-08 Atsushi Enomoto <atsushi@ximian.com>
940 * SimpleCollator.cs : Special weight fixes on GetSortKey(). Dash type
941 should be computed from ExtenderType, and voice mark weight should
943 * MSCompatUnicodeTable.cs : added tiny comment.
945 2005-07-08 Atsushi Enomoto <atsushi@ximian.com>
947 * SortKey.cs : It broke when MONO_USE_MANAGED_COLLATION is not yes.
948 * SimpleCollator.cs : support for extender (U+309D etc.).
950 2005-07-08 Atsushi Enomoto <atsushi@ximian.com>
952 * create-mscompat-collation-table.cs : some punctuation/symbol fixes.
953 * managed-collation.patch : new (and temporary) file to support
954 managed collation in mscorlib.
955 * README : described how to use managed collation.
957 2005-07-08 Atsushi Enomoto <atsushi@ximian.com>
959 * create-mscompat-collation-table.cs : Further Cyrillic fixes. Handle
960 U+0482-U+04C8 (though it still needs diacritical fixes).
961 * MSCompatUnicodeTable.cs : tiny comment for alternative impl.
963 2005-07-08 Atsushi Enomoto <atsushi@ximian.com>
965 * create-mscompat-collation-table.cs : Reimplemented Cyrillic weight
966 computation code, since it appears to work the same way as for Latin
967 letters. Thus removed all other approaches (UCA, by letter name).
969 2005-07-07 Atsushi Enomoto <atsushi@ximian.com>
971 * create-mscompat-collation-table.cs : diacritical fix for "double-
972 struck". Syriac nonspacing fixes.
974 2005-07-07 Atsushi Enomoto <atsushi@ximian.com>
976 * create-mscompat-collation-table.cs : more math symbol weight fixes.
978 2005-07-07 Atsushi Enomoto <atsushi@ximian.com>
980 * create-mscompat-collation-table.cs : fixed Hebrew character sortkeys.
982 2005-07-07 Atsushi Enomoto <atsushi@ximian.com>
984 * create-mscompat-collation-table.cs : math symbols U+25A0-U+2600 are
985 implemented (no stub). Some other fixes on category 8-A.
987 2005-07-07 Atsushi Enomoto <atsushi@ximian.com>
989 * create-mscompat-collation-table.cs : some minor fixes on Arabic,
990 Korean and Japanese sortkey weights.
992 2005-07-07 Atsushi Enomoto <atsushi@ximian.com>
994 * create-mscompat-collation-table.cs : More diacritical fixes.
995 Georgian characters do not have level 2 weights but level 3.
997 2005-07-07 Atsushi Enomoto <atsushi@ximian.com>
999 * create-mscompat-collation-table.cs : Roman numeral characters
1000 have diacritical weight. Quick hack for control signs (U+2400..)
1003 2005-07-06 Atsushi Enomoto <atsushi@ximian.com>
1005 * create-mscompat-collation-table.cs : improving Latin mappings.
1006 Setting non-ASCII Latin characters' primary weight between those
1007 ASCII characters, and setting diacritical weight (hacky).
1008 * MSCompatUnicodeTable.cs :
1009 Kanatype check: fixed (voice marks) and improved (comparison order).
1011 2005-07-06 Atsushi Enomoto <atsushi@ximian.com>
1013 * create-mscompat-collation-table.cs : more diacritical fixes.
1014 primary weight fixes on punctuation in category 07.
1016 2005-07-06 Atsushi Enomoto <atsushi@ximian.com>
1018 * create-mscompat-collation-table.cs : several diacritical fixes.
1019 * TestDriver.cs : sortkey dumper should use StringSort.
1021 2005-07-05 Atsushi Enomoto <atsushi@ximian.com>
1023 * SimpleCollator.cs : fixed incorrect indexer setup. Optimized
1024 GetContraction() call a bit.
1026 2005-07-05 Atsushi Enomoto <atsushi@ximian.com>
1028 * create-mscompat-collation-table.cs : fixed incorrect level 2
1030 * MSCompatUnicodeTable.cs : remove debug line.
1032 2005-07-05 Atsushi Enomoto <atsushi@ximian.com>
1034 * MSCompatUnicodeTableUtil.cs,
1035 MSCompatUnicodeTable.cs,
1036 CodePointIndexer.cs,
1037 create-mscompat-collation-table.cs : made some members internal and
1038 accessible from other classes. Many indexes could be 0 by default.
1039 * SimpleCollator.cs : optimizations: avoid method calls.
1041 2005-07-05 Atsushi Enomoto <atsushi@ximian.com>
1043 * Collation-notes.txt : more updates.
1044 * SimpleCollator.cs : Added quick check for Ordinal comparison.
1045 Fixed special weight comparison. It is not customizable in this
1046 implementation (and that won't be harmful).
1047 * mono-tailoring-source.txt : updated comment accordingly.
1049 2005-07-05 Atsushi Enomoto <atsushi@ximian.com>
1051 * SimpleCollator.cs : Compare() was missing French sort support.
1052 * TestDriver.cs : added example case.
1054 2005-07-05 Atsushi Enomoto <atsushi@ximian.com>
1056 * Collation-notes.txt : updated status. Removed descriptions of the
1057 "iterator" (I avoided it for performance reasons). Fixed miscellaneous
1058 incorrect descriptions.
1060 2005-07-05 Atsushi Enomoto <atsushi@ximian.com>
1062 * Collator.cs : Now that SimpleCollator became feature complete, it is
1065 2005-07-05 Atsushi Enomoto <atsushi@ximian.com>
1067 * SimpleCollator.cs : implemented a proper Compare() that stops
1068 immediately at the first primary difference.
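A rough sketch of the early-exit idea in the entry above, written in Python rather than the C# of SimpleCollator.cs and not taken from Mono's code. It assumes each string has already been turned into a hypothetical list of (primary, secondary, tertiary) weight tuples with no ignorable (zero-primary) elements. A primary difference ends the scan at once; the first secondary and tertiary differences are only remembered in case no primary difference turns up.

```python
from typing import List, Tuple

# Hypothetical collation element: (primary, secondary, tertiary) weights.
Weights = Tuple[int, int, int]

def compare(a: List[Weights], b: List[Weights]) -> int:
    """Return -1, 0 or 1, stopping at the first primary difference.

    Assumes every element has a non-zero primary weight (no ignorables).
    """
    secondary = 0  # sign of the first secondary difference seen, if any
    tertiary = 0   # sign of the first tertiary difference seen, if any
    for (p1, s1, t1), (p2, s2, t2) in zip(a, b):
        if p1 != p2:
            return -1 if p1 < p2 else 1  # a primary difference decides at once
        if secondary == 0 and s1 != s2:
            secondary = -1 if s1 < s2 else 1
        if tertiary == 0 and t1 != t2:
            tertiary = -1 if t1 < t2 else 1
    if len(a) != len(b):
        # Equal primaries so far: the shorter string has fewer primary weights.
        return -1 if len(a) < len(b) else 1
    return secondary or tertiary
```

Compared with building two complete sort keys and comparing them, this stops as soon as the strings differ at level 1.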
1070 2005-07-04 Atsushi Enomoto <atsushi@ximian.com>
1072 * SimpleCollator.cs : indexers might return -1.
1074 2005-07-04 Atsushi Enomoto <atsushi@ximian.com>
1076 * SimpleCollator.cs : IsPrefix() and IsSuffix() optimization code was
1077 buggy (length check for source was missing).
1079 2005-07-04 Atsushi Enomoto <atsushi@ximian.com>
1081 * create-mscompat-collation-table.cs : Fixed tailoring table output
1082 to be in correct and countable order. Now, if a tailoring alias is not
1083 found, the build just stops.
1084 * MSCompatUnicodeTable.cs : several build fixes. Now it works to read
1086 * mono-tailoring-source.txt : commented out CJK aliases that miss
1088 * Makefile : needed further filename fixes.
1090 2005-07-04 Atsushi Enomoto <atsushi@ximian.com>
1092 * MSCompatUnicodeTable.cs : renamed from MSCompatUnicodeTable.template
1093 (now it is working as a standalone file).
1094 * Makefile : renamed generated file as MSCompatUnicodeTableGenerated.cs
1095 (the generator now creates both binary resources and C# source).
1097 2005-07-04 Atsushi Enomoto <atsushi@ximian.com>
1099 * create-mscompat-collation-table.cs : Now it generates binary
1100 resources (to parent directory).
1101 * MSCompatUnicodeTable.template : added conditional code that fills
1102 collation tables from manifest resources.
1103 * Makefile : remove collation table binaries as well on "make clean".
1104 Removed extraneous dependency.
1106 2005-07-01 Atsushi Enomoto <atsushi@ximian.com>
1108 * MSCompatUnicodeTable.template,
1109 SimpleCollator.cs : removed extraneous GetExpansion().
1111 2005-07-01 Atsushi Enomoto <atsushi@ximian.com>
1113 * SimpleCollator.cs : IsSuffix() also supports contractions.
1114 * TestDriver.cs : IsSuffix() example contraction cases.
1116 2005-07-01 Atsushi Enomoto <atsushi@ximian.com>
1118 * SimpleCollator.cs : reverted IsSuffix() to return bool (to match w/
1119 what current IsPrefix() does). For expansion of target, IsPrefix()
1120 should check the no-match case that expansion is longer than input.
1121 Some refactoring of IsPrefix().
1122 Added GetContractionTal() for IsSuffix() (not used yet).
1124 2005-07-01 Atsushi Enomoto <atsushi@ximian.com>
1126 * TestDriver.cs : added IsPrefix() expansion cases.
1127 * SimpleCollator.cs : IsPrefix() now supports contractions (with a great deal
1128 of complexity), and it now returns bool again.
1129 IndexOf() for replacement should make use of IndexOfPrimitiveChar()
1130 since expansions won't be expanded recursively.
1132 2005-07-01 Atsushi Enomoto <atsushi@ximian.com>
1134 * SimpleCollator.cs : factored out common character comparison in IsPrefix()
1135 and IsSuffix(). csc compile fix.
1136 * CompareInfoImpl.cs : deleted.
1138 2005-06-30 Atsushi Enomoto <atsushi@ximian.com>
1140 * TestDriver.cs : added SimpleCollator.ctor() sanity check.
1141 Added replacement contraction example.
1142 * SimpleCollator.cs : Now IndexOf() and LastIndexOf() support
1143 contraction in source string. Extracted matching code to Matches().
1144 Replacement contraction was including extraneous '\x0'.
1146 2005-06-30 Atsushi Enomoto <atsushi@ximian.com>
1148 * Collation-notes.txt : updated status.
1149 * CollationDataStructures.txt : tiny fixes.
1150 * SimpleCollator.cs :
1151 Renamed alias Util to UUtil (MS sys.enterprisesvc has sucky global
1152 namespace Util and csc borked).
1153 GetContraction was incorrectly returning first item.
1154 Private IsPrefix() now returns int (but it might not be in real use).
1155 Extracted simple char comparison to CompareCharSimple().
1156 IndexOf() and LastIndexOf() now fully handle contractions (both
1157 binary key and string replacement) in "target" (for "s" not yet).
1158 * TestDriver.cs : be more verbose.
1159 * mono-tailoring-source.txt : added comment.
1160 * MSCompatUnicodeTable.template :
1161 Renamed alias Util to UUtil (MS sys.enterprisesvc has sucky global namespace Util and csc borked).
1163 2005-06-30 Atsushi Enomoto <atsushi@ximian.com>
1165 * create-mscompat-collation-table.cs : compute COMBINING blah marks as
1166 well as those characters WITH blah.
1167 * TestDriver.cs : added combining sortkey cases.
1169 2005-06-30 Atsushi Enomoto <atsushi@ximian.com>
1171 * mono-tailoring-source.txt : fixed description on '*' in sortkeys.
1172 * SimpleCollator.cs : Now it fully uses tailoring info. Fixed
1173 contraction search that worked only when string is contraction.
1174 Removed commented code. Minor refactoring.
1175 * TestDriver.cs : added example that uses "ZS" in Hungarian sorting.
1177 2005-06-29 Atsushi Enomoto <atsushi@ximian.com>
1179 * create-mscompat-collation-table.cs,
1180 * mono-tailoring-source.txt : removed extraneous level 4 sortkey
1181 which cannot be supported.
1182 * SimpleCollator.cs : added GetContraction() and used in some places.
1183 Now CompareOptions is set only once. Reordered some code (e.g.
1184 ignorable check -> get compat char -> compare).
1186 2005-06-29 Atsushi Enomoto <atsushi@ximian.com>
1188 * SimpleCollator.cs : sort tailoring tables before actual usage.
1189 Support diacritical remappings (a customized collation rule
1190 that does not exist in UCA).
1192 2005-06-29 Atsushi Enomoto <atsushi@ximian.com>
1194 * SimpleCollator.cs : build culture specific tailoring table from
1195 TailoringInfo and unified data array.
1196 * create-mscompat-collation-table.cs : Added null termination to
1197 sortkey map tailorings (mostly to save my eyes).
1198 * MSCompatUnicodeTable.template : added public TailoringValues.
1200 2005-06-29 Atsushi Enomoto <atsushi@ximian.com>
1202 * SortKeyBuffer.cs : handle special weight (category 06) characters.
1203 * Collation-notes.txt : Updated description on special weight (it was
1205 * TestDriver.cs : added special weight cases.
1207 2005-06-29 Atsushi Enomoto <atsushi@ximian.com>
1209 * MSCompatUnicodeTable.template : added GetTailoringInfo().
1210 * SimpleCollator.cs : Now tailoring information is acquired and used.
1211 (FrenchSort is supported but Compare() won't work as expected since
1212 the table is still incomplete for those diacritical marks).
1213 * SortKeyBuffer.cs : On reversing diacritical weights, it should
1214 ignore zeros. Reset() should reset frenchSorted flag.
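A minimal sketch of the reversal the SortKeyBuffer.cs item describes, in Python and with a hypothetical function name. It assumes that a 0 in the level 2 buffer means no diacritical weight was recorded at that position; the entry does not spell out the buffer layout. French sort compares accents from the end of the string, so the level 2 weights are reversed, and dropping zeros first keeps empty slots from moving to the front.

```python
from typing import List

def french_level2(weights: List[int]) -> List[int]:
    """Level 2 (diacritical) weights for a French-sorted key: a sketch.

    Assumption: a 0 entry means "no diacritical weight recorded" for that
    position, so it is skipped instead of being carried into the reversed key.
    """
    return [w for w in reversed(weights) if w != 0]
```

Python compares lists element by element, so comparing two such lists lets the last diacritic in each string decide first.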
1216 2005-06-28 Atsushi Enomoto <atsushi@ximian.com>
1218 * create-mscompat-collation-table.cs : Further fixes on Jamo,
1219 diacritical weights by character name, and *Numbers primary weights.
1221 2005-06-28 Atsushi Enomoto <atsushi@ximian.com>
1223 * create-mscompat-collation-table.cs : More fix on Devanagari,
1224 Gujarati, Oriya, Tamil and Lao sortkeys.
1226 2005-06-28 Atsushi Enomoto <atsushi@ximian.com>
1228 * create-mscompat-collation-table.cs : Fixed Georgian, Thai, Gurmukhi
1231 2005-06-28 Atsushi Enomoto <atsushi@ximian.com>
1233 * create-mscompat-collation-table.cs : Fixed Thai character primary
1234 and secondary values. Fixed Thaana letters. Added more LAMESPEC
1235 CJK compat. Fixed some circled CJK secondary weights.
1236 Hacked some nonspacing mark sortkey value adjustment.
1238 2005-06-28 Atsushi Enomoto <atsushi@ximian.com>
1240 * create-mscompat-collation-table.cs : CP932.TXT was not parsed as
1241 expected. JIS ordering was incorrect. The offset for OtherNumbers
1242 representing values of 10 or more was computed incorrectly. Some Hangul
1243 compat characters have a different offset.
1245 2005-06-28 Atsushi Enomoto <atsushi@ximian.com>
1247 * create-mscompat-collation-table.cs : Fixed 0x8 category characters.
1248 Added hack for need-to-be-fixed characters to fall into 0xA category.
1249 * create-collation-element-table.cs : previous checkin seems to have failed :(
1250 * README: updated a bit.
1252 2005-06-24 Atsushi Enomoto <atsushi@ximian.com>
1254 * CodePointIndexer.cs :
1255 removed extraneous switch (I could use empty array for that need).
1256 * CollationElementTableUtil.cs : primary weight type became ushort.
1257 * create-collation-element-table.cs : several bugfixes.
1258 collElem should be int. It was skipping most of the entries because of
1259 incorrect string tokenization.
1261 2005-06-23 Atsushi Enomoto <atsushi@ximian.com>
1263 * create-mscompat-collation-table.cs : handle some Jamo NKFD.
1265 2005-06-23 Atsushi Enomoto <atsushi@ximian.com>
1267 * SimpleCollator.cs : forgot to commit in the last checkin.
1268 * create-mscompat-collation-table.cs : fixed Arabic shift weight chars.
1269 * TestDriver.cs : switch table dumper and collator testing.
1270 * SortKey.cs : for now comment out internal indexes (not in use).
1272 2005-06-23 Atsushi Enomoto <atsushi@ximian.com>
1274 * MSCompatUnicodeTable.template,
1275 SimpleCollator.cs : support for culture dependent CJK table.
1277 2005-06-23 Atsushi Enomoto <atsushi@ximian.com>
1279 * create-mscompat-collation-table.cs,
1280 MSCompatUnicodeTableUtil.cs : make CJK table more compact.
1282 2005-06-22 Atsushi Enomoto <atsushi@ximian.com>
1284 * SimpleCollator.cs : Fixed stupid index search when start != 0.
1286 2005-06-21 Atsushi Enomoto <atsushi@ximian.com>
1288 * SimpleCollator.cs : fixed my misunderstanding of LastIndexOf(). It
1289 now starts from "start" and proceeds backward by "length".
1290 * TestDriver.cs : fix warning.
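The corrected LastIndexOf() semantics in the first item can be illustrated with an ordinal (plain character) sketch in Python. It ignores collation, contractions and CompareOptions, and the function name is hypothetical. The search window is the `length` characters ending at `start`, candidates are tried from the right end of that window leftward, and a match must lie entirely inside the window.

```python
def last_index_of(source: str, target: str, start: int, length: int) -> int:
    """Ordinal sketch of LastIndexOf(source, target, start, length).

    The window is source[start - length + 1 .. start] (inclusive): the search
    begins at `start` and proceeds backward by `length` characters.
    """
    if not target:
        raise ValueError("target must be non-empty in this sketch")
    if not (0 <= start < len(source)) or not (0 <= length <= start + 1):
        raise ValueError("window out of range")
    lo = start - length + 1  # leftmost index inside the window
    hi = start + 1           # one past the rightmost index
    # Try candidate positions from the right end of the window leftward.
    for i in range(hi - len(target), lo - 1, -1):
        if source[i:i + len(target)] == target:
            return i
    return -1
```

This follows the startIndex/count convention documented for .NET's String.LastIndexOf(String, Int32, Int32).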
1292 2005-06-21 Atsushi Enomoto <atsushi@ximian.com>
1294 * TestDriver.cs : more tests.
1295 * SimpleCollator.cs : LastIndexOf() was not setting the search length
1296 on each iteration. Quick workaround for a String.LastIndexOf() bug (maybe).
1298 2005-06-21 Atsushi Enomoto <atsushi@ximian.com>
1300 * create-normalization-source.cs : output propValue as uint.
1302 2005-06-21 Atsushi Enomoto <atsushi@ximian.com>
1304 * SortKey.cs : Now it is System.Globalization.SortKey.
1305 To replace existing implementation, it now requires lcid and
1306 CompareOptions. Added required members.
1307 * SortKeyBuffer.cs : thus .ctor() requires LCID.
1308 * SimpleCollator.cs : made required changes above.
1310 2005-06-21 Atsushi Enomoto <atsushi@ximian.com>
1312 * CodePointIndexer.cs : added CompressArray(). Now it requires two more
1313 parameters for default index and codepoint.
1314 * CollationElementTableUtil.cs,
1315 NormalizationTableUtil.cs : required changes wrt above change.
1316 * MSCompatUnicodeTableUtil.cs : added for several codepoint indexers.
1317 * MSCompatUnicodeTable.template : Now it uses codepoint indexer.
1318 * create-mscompat-collation-table.cs : Now it outputs compressed array.
1319 * Makefile : now collation requires MSCompatUnicodeTableUtil.cs
1321 2005-06-21 Atsushi Enomoto <atsushi@ximian.com>
1323 * SimpleCollator.cs :
1324 Implemented IsSuffix() and LastIndexOf().
1325 Several fixes on index > 0 cases.
1326 * TestDriver.cs : sample IsSuffix() and LastIndexOf() usage and more.
1328 2005-06-21 Atsushi Enomoto <atsushi@ximian.com>
1330 * Collation-notes.txt : updated (status, impl. classes).
1331 * MSCompatUnicodeTable.cs : Korean Jamo are not really expansions.
1333 2005-06-21 Atsushi Enomoto <atsushi@ximian.com>
1335 * SimpleCollator.cs : implemented IndexOf(string,string,CompareOptions)
1336 and IsPrefix(). Tiny code refactoring.
1337 * TestDriver.cs : sample IsPrefix() and IndexOf() usage.
1338 * MSCompatUnicodeTable.cs : tiny refactoring for CodePointIndexer use.
1340 2005-06-20 Atsushi Enomoto <atsushi@ximian.com>
1342 * SimpleCollator.cs :
1343 IndexOf(string, char, CompareOptions) implementation.
1344 * TestDriver.cs : sample IndexOf() usage.
1346 2005-06-20 Atsushi Enomoto <atsushi@ximian.com>
1348 * create-mscompat-collation-table.cs : was missing the most important
1349 kind of block - equivalent expansions (e.g. invariant mappings).
1350 More readable mappings.
1352 2005-06-20 Atsushi Enomoto <atsushi@ximian.com>
1354 * mono-tailoring-source.txt : new file. It describes tailoring
1355 information. Basically examined under .NET 1.x.
1356 * create-mscompat-collation-table.cs : consume the file above.
1357 * MSCompatUnicodeTable.template : tailorings are no longer a stub.
1358 * CollationDataStructures.txt : minor fixes.
1360 * SimpleCollator.cs : added FrenchSort support.
1361 * Collation-notes.txt : added description on Latin primary weights.
1362 * ldml-limited.rng : added note.
1363 * create-tailorings.cs : added note. more serialization (but won't be
1366 2005-06-17 Atsushi Enomoto <atsushi@ximian.com>
1368 * SortKeyBuffer.cs : non-primary character is added to previous
1370 * TestDriver.cs : added example case of above.
1372 2005-06-17 Atsushi Enomoto <atsushi@ximian.com>
1374 * SimpleCollator.cs : IgnoreSymbols support.
1375 * TestDriver.cs : compilation fix. IgnoreSymbols example.
1376 * create-mscompat-collation-table.cs : more Hangul fixes.
1378 2005-06-17 Atsushi Enomoto <atsushi@ximian.com>
1380 * create-mscompat-collation-table.cs : more Hangul fixes.
1381 * SortKey.cs : it will replace sys.globalization.SortKey. It has
1382 some internal members.
1383 * SortKeyBuffer.cs : now it uses SortKey instead of byte[].
1384 * SimpleCollator.cs : CompareOptions support. However, I don't think
1385 it will be developed further since SortKey can never support IndexOf().
1386 * TestDriver.cs : a few CompareOptions cases.
1388 2005-06-16 Atsushi Enomoto <atsushi@ximian.com>
1390 * SimpleCollator.cs : simple collator implementation that simply
1391 uses GetSortKey() as the basis for everything.
1392 * TestDriver.cs : sample code that uses this collator set.
1393 * MSCompatUnicodeTable.template : removed test driver from here.
1395 2005-06-16 Atsushi Enomoto <atsushi@ximian.com>
1397 * create-mscompat-collation-table.cs : Hangul fixes.
1398 Now fewer than 300 characters lack sortkey weights.
1399 * MSCompatUnicodeTable.template : added FIXME info for Hangul Jamo.
1401 2005-06-16 Atsushi Enomoto <atsushi@ximian.com>
1403 * create-mscompat-collation-table.cs : Added control picture mappings.
1404 Minor primary weight fixes.
1406 2005-06-16 Atsushi Enomoto <atsushi@ximian.com>
1408 * create-mscompat-collation-table.cs : Added mappings for box
1409 drawings and blocks.
1411 2005-06-16 Atsushi Enomoto <atsushi@ximian.com>
1413 * create-mscompat-collation-table.cs : Added mappings for arrows.
1415 2005-06-15 Atsushi Enomoto <atsushi@ximian.com>
1417 * create-mscompat-collation-table.cs : added support for letterlike
1418 characters and squared CJK compatibility characters, ordered by
1419 character names (0x0E category).
1420 * Collation-notes.txt : added description on that.
1422 2005-06-15 Atsushi Enomoto <atsushi@ximian.com>
1424 * MSCompatUnicodeTable.template : Now expansions are simulated.
1425 * create-mscompat-collation-table.cs : filled Korean number level2.
1426 Reordered some code blocks to fill correct diacritical differences.
1427 * Collation-notes.txt : some corrections and minor additions.
1429 2005-06-15 Atsushi Enomoto <atsushi@ximian.com>
1431 * MSCompatUnicodeTable.template :
1432 Now dumper test driver uses SortKeyBuffer for dogfooding.
1433 * create-mscompat-collation-table.cs : some diacritical level fixes
1434 (with non-working extra Latin check).
1435 * SortKeyBuffer.cs : several fixes to make it work as practical code.
1436 * Collator.cs : make it compilable, leaving things as NotImplemented.
1438 2005-06-15 Atsushi Enomoto <atsushi@ximian.com>
1440 * create-mscompat-collation-table.cs : some fixes on primary category
1441 07 (miscellaneous symbols and punctuations).
1443 2005-06-14 Atsushi Enomoto <atsushi@ximian.com>
1445 * create-mscompat-collation-table.cs : more mapping fixes for numbers,
1446 letters, variable weight characters, circled Japanese and CJK.
1447 * MSCompatUnicodeTable.template : fixed HasSpecialWeight() to be more
1448 inclusive. Simplified dumper code.
1450 2005-06-14 Atsushi Enomoto <atsushi@ximian.com>
1452 * create-mscompat-collation-table.cs : finished Hangul (both Jamo
1453 and Syllables). Sortkey dumper diff lines dropped from 30000 to 8000.
1455 2005-06-14 Atsushi Enomoto <atsushi@ximian.com>
1457 * create-mscompat-collation-table.cs : added some nonspacing marks in
1458 either correct or hacky way.
1460 2005-06-13 Atsushi Enomoto <atsushi@ximian.com>
1462 * create-mscompat-collation-table.cs : several improvements. Japanese
1463 Kana support, Hebrew accents, Bengali nonspacing marks, sorting of
1464 numeric characters, diacritically decorated Latin alphabets. Fixed
1465 some diacritical weight detection.
1466 * MSCompatUnicodeTable.cs : tiny Japanese fix. Handle nonspacing
1467 marks' primary weight as empty.
1468 * Collation-notes.txt : some updates.
1470 2005-06-13 Atsushi Enomoto <atsushi@ximian.com>
1472 * create-mscompat-collation-table.cs : don't process nonexact NFKD
1473 mapping as equivalent, however store CJK extensions into NFKD map
1474 even if one does not strictly match.
1475 Now I am going to fill Hangul into the tables (unlike UCA, it does not look
1476 possible to calculate the sortkey values).
1477 Fixed Cyrillic and Georgian UCA based orderings.
1478 * MSCompatUnicodeTable.template : added CJK extension sortkey
1481 2005-06-10 Atsushi Enomoto <atsushi@ximian.com>
1483 * create-mscompat-collation-table.cs : Fixed latin alphabet support.
1484 Added Latin letters with diacritics, and CJK extensions.
1485 * MSCompatUnicodeTable.cs : modified dumper code a bit (for my purpose).
1487 2005-06-10 Atsushi Enomoto <atsushi@ximian.com>
1489 * create-mscompat-collation-table.cs : now parses DerivedAge.txt (though
1490 not used right now). Filled CJK ideographs, still not perfect.
1491 Fixed number primary keys. NFKD numbers and CJK ideographs are now
1492 considered, including brackets elimination.
1493 * Makefile : now it downloads DerivedAge.txt.
1494 * MSCompatUnicodeTable.template : added dummy code dumper. It computes
1495 PrivateUse, Surrogate and Hangul Syllables.
1496 * Collation-notes.txt : Noted that Hangul Syllables need more love.
1498 2005-06-09 Atsushi Enomoto <atsushi@ximian.com>
1500 * create-tailorings.cs : added configuration support. Sort them.
1501 I wonder if it is really usable. Having own format might be better.
1502 * create-mscompat-collation-table.cs : fixing some sortkey numbers,
1503 making them closer to Windows. Now it handles NFKD in some places.
1504 * MSCompatUnicodeTable.template : Added dummy sortkey dumper driver.
1505 * CollationDataStructures.txt : added description on tailoring
1506 fields, though they are subject to change.
1508 2005-06-07 Atsushi Enomoto <atsushi@ximian.com>
1510 * create-tailorings.cs, ldml-limited.rng : new files.
1511 * LdmlReader.cs : removed old file.
1513 2005-06-07 Atsushi Enomoto <atsushi@ximian.com>
1515 * SortKeyBuffer.cs : split from Collator.cs. Now it considers
1516 practical use, reflecting updated sortkey constant design.
1517 Especially level 4 weight is split to 4 arrays that are merged in
1518 the last stage of GetSortKey().
1519 * Collator.cs : thus SortKeyBuffer is removed from here.
1520 Additionally, removed some extraneous bits in other classes.
1521 * Collation-notes.txt : Some editorial fixes. Added information on
1522 Korean matter (how to compute Hangul Syllables / Hangul Jamo cannot
1523 be stored in simple byte arrays).
1524 * CodePointIndexer.cs,
1525 create-collation-element-table.cs,
1526 CollationElementTable.template,
1527 NormalizationTableUtil.cs : shortened CodePointIndexer method names.
1528 * create-mscompat-collation-table.cs : Additional info on why some
1529 meaningful characters are ignored in Windows (Unicode version
1530 difference). Removed U+070F from special check (was extraneous).
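The Collation-notes item in this entry mentions how Hangul Syllables are computed, but the note itself is not reproduced here. For reference, the standard arithmetic from chapter 3 of the Unicode Standard (Hangul syllable decomposition) that maps a precomposed syllable to its conjoining Jamo is sketched here in Python. Whether the Windows-compatible sortkeys are derived through exactly this step is an assumption, and `decompose_hangul` is a hypothetical name.

```python
# Constants from the Unicode Standard's Hangul syllable algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT  # 11172 precomposed syllables

def decompose_hangul(ch: str) -> str:
    """Split one precomposed Hangul syllable into its conjoining Jamo."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch  # not a precomposed syllable
    l = L_BASE + s_index // N_COUNT
    v = V_BASE + (s_index % N_COUNT) // T_COUNT
    t = T_BASE + s_index % T_COUNT
    jamo = chr(l) + chr(v)
    if t != T_BASE:  # T_BASE itself means "no trailing consonant"
        jamo += chr(t)
    return jamo
```

Every syllable is an arithmetic function of its index, so the syllables themselves need no per-character table; the Jamo they decompose to are what remain to be weighted.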
1532 2005-06-06 Atsushi Enomoto <atsushi@ximian.com>
1534 * MSCompatUnicodeTable.template:
1535 Moved body implementation to table creator and put those bool
1536 results into an array.
1537 * create-mscompat-collation-table.cs :
1538 So imported those methods. Modified array output to emit "0x"
1539 only for values greater than 9.
1540 * create-normalization-source.cs : ditto on "0x" output matter.
1541 * CollationDataStructures.txt : so now it holds ignorableFlags.
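The "0x" change in the create-mscompat-collation-table.cs item can be illustrated with a small formatter: values 0 to 9 read the same in decimal and hexadecimal, so the prefix only matters above 9, and leaving it off shrinks the generated source. This is a Python sketch with hypothetical names; the entry does not say whether the real generator prints upper- or lower-case hex digits, so upper case is an assumption here.

```python
from typing import Iterable

def format_value(value: int) -> str:
    """Emit decimal for 0-9 (same digits in hex) and "0x"-prefixed hex above 9."""
    return str(value) if value <= 9 else "0x%X" % value

def format_array(values: Iterable[int]) -> str:
    """Join values into the body of an array initializer for generated source."""
    return ", ".join(format_value(v) for v in values)
```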
1543 2005-06-03 Atsushi Enomoto <atsushi@ximian.com>
1545 * Collation-notes.txt, CollationDataStructures.txt :
1546 separate document for data structure design.
1548 2005-06-03 Atsushi Enomoto <atsushi@ximian.com>
1550 * create-mscompat-collation-table.cs : added culture-dependent CJK
1551 table creation. It uses CLDR as its basis. (Culture independent CJK
1553 * Makefile : added CLDR archive downloading support.
1554 * MSCompatUnicodeTable.template : tiny renamings.
1555 * Collation-notes.txt : additional CJK info.
1557 2005-06-02 Atsushi Enomoto <atsushi@ximian.com>
1559 * Collation-notes.txt, create-mscompat-collation-table.cs :
1560 added secondary weight support for BlahNumber characters.
1562 2005-06-01 Atsushi Enomoto <atsushi@ximian.com>
1564 * downloaded : added directory. All downloaded files are stored here.
1565 * Makefile : use "downloaded" directory.
1566 Added more auto-download stuff.
1567 * create-mscompat-collation-table.cs :
1568 Added Japanese square kana support.
1570 2005-06-01 Atsushi Enomoto <atsushi@ximian.com>
1572 * Collation-notes.txt : added Estrangela (ancient Syriac) and Thaana.
1573 * create-mscompat-collation-table.cs : added support for Arabic abjad,
1574 Estrangela and Thaana.
1575 * MSCompatUnicodeTable.template : removed BOM.
1577 2005-05-31 Atsushi Enomoto <atsushi@ximian.com>
1579 * Collation-notes.txt : wrong comment cleanup and spelling fixes.
1580 * create-mscompat-collation-table.cs : added diacritic support for
1581 Latin letters (as long as covered in primary weight).
1583 2005-05-31 Atsushi Enomoto <atsushi@ximian.com>
1585 * Makefile : minor fixes. Added warning lines to generated sources.
1587 2005-05-31 Atsushi Enomoto <atsushi@ximian.com>
1589 * create-char-mapping-source.cs :
1590 Removed ToWidthInsensitive() generation.
1592 2005-05-31 Atsushi Enomoto <atsushi@ximian.com>
1594 * create-mscompat-collation-table.cs : Now it dumps level1 to 3 values.
1595 ToWidthInsensitive() is implemented here, using an array (which is
1596 to be optimized using CodePointIndexer).
1597 * MSCompatUnicodeTable.cs : renamed as MSCompatUnicodeTable.template
1598 * MSCompatUnicodeTable.template : now it is used to generate
1599 MSCompatUnicodeTable.cs which got ready to be used.
1600 * Makefile : added MSCompatUnicodeTable.cs build support. Now it
1601 supports "make normalization" and "make collation".
1603 2005-05-30 Atsushi Enomoto <atsushi@ximian.com>
1605 * Collation-notes.txt : The description of ICU was very incorrect. Now it
1606 is more rational and sane.
1607 * create-mscompat-collation-table.cs : fixed some indexes.
1608 * Makefile : added "mstablegen" target.
1609 * MSCompatUnicodeTable.cs : removed GetPrimaryWeight(). Minor fix.
1611 2005-05-26 Atsushi Enomoto <atsushi@ximian.com>
1613 * Collation-notes.txt : more analysis on "letters".
1614 * create-mscompat-collation-table.cs : more proof of concept.
1616 2005-05-25 Atsushi Enomoto <atsushi@ximian.com>
1618 * Collation-notes.txt : more info. Started letter sortkey analysis
1619 (some of the other stuff is really incomprehensible right now.)
1620 * create-mscompat-collation-table.cs : table generator proof-of-
1621 concept source (not compilable).
1622 * MSCompatUnicodeTable.cs : moved some code to the new source.
1625 2005-05-20 Atsushi Enomoto <atsushi@ximian.com>
1627 * Collation-notes.txt : started level 2 weight analysis.
1629 2005-05-19 Atsushi Enomoto <atsushi@ximian.com>
1631 * Collation-notes.txt : Additional information on how to create
1633 * MSCompatUnicodeTable.cs : implemented part of GetLevel3Weight().
1635 2005-05-19 Atsushi Enomoto <atsushi@ximian.com>
1637 * Collation-notes.txt : More case weight (level 3) analysis. I'm
1638 likely to just write a table generator.
1640 2005-05-18 Atsushi Enomoto <atsushi@ximian.com>
1642 * MSCompatUnicodeTable.cs : part of level 4 weight implementation.
1644 2005-05-18 Atsushi Enomoto <atsushi@ximian.com>
1646 * Collation-notes.txt :
1648 Revised comparison methods; backward iteration is possible.
1649 More on char-by-char comparison.
1650 Level 4 comparison is actually a bit more complex.
1652 * Collator.cs : some conceptual updates wrt above.
1654 2005-05-17 Atsushi Enomoto <atsushi@ximian.com>
1656 * Collation-notes.txt : Japanese voice mark is level 2, and Hangul
1657 properties are level 3.
1659 2005-05-17 Atsushi Enomoto <atsushi@ximian.com>
1661 * Collation-notes.txt : Make it more readable. More analysis on
1662 level 3 and 4 sortkey structures.
1663 * Collator.cs : some compilation fixes (not compilable yet).
1665 2005-05-16 Atsushi Enomoto <atsushi@ximian.com>
1667 * Collation-notes.txt : Analysis on variable-weighting (level 5)
1669 * Collator.cs : updated corresponding part of level 5, and more.
1671 2005-05-13 Atsushi Enomoto <atsushi@ximian.com>
1673 * Collation-notes.txt : more updates.
1674 * Collator.cs : rewrote from scratch. Some rough sketch for sortkey
1675 buffer, character iterator and collator methods. Not compiling.
1677 2005-05-13 Atsushi Enomoto <atsushi@ximian.com>
1679 * Collator.cs : I am going to replace it with a new one. No need for
1680 a CompareOptions-dependent Comparer.
1682 2005-05-13 Atsushi Enomoto <atsushi@ximian.com>
1684 * Collation-notes.txt : There seems to be a bit more complexity.
1686 2005-05-10 Atsushi Enomoto <atsushi@ximian.com>
1688 * Collation-notes.txt : more updates, getting close to writing sortkey
1691 2005-05-09 Atsushi Enomoto <atsushi@ximian.com>
1693 * CompareInfoImpl.cs, Collator.cs : conceptual update
1694 * Collation-notes.txt : some corrections and additions.
1695 * Makefile : added LDML input (but it won't be used at all).
1697 2005-04-28 Atsushi Enomoto <atsushi@ximian.com>
1699 * Collation-notes.txt : more updates.
1701 2005-04-26 Atsushi Enomoto <atsushi@ximian.com>
1703 * Collation-notes.txt : more updates.
1705 2005-04-26 Atsushi Enomoto <atsushi@ximian.com>
1707 * Collation-notes.txt : some updates.
1708 * create-mapping-char-source.cs : superscripts and subscripts are also
1709 ignored in IgnoreWidth comparison.
1710 * Makefile : tiny touch fix.
1712 2005-04-25 Atsushi Enomoto <atsushi@ximian.com>
1714 * CompareInfoImpl.cs, Collator.cs : conceptual stuff (not working).
1716 2005-04-25 Atsushi Enomoto <atsushi@ximian.com>
1718 * create-char-mapping-source.cs : Now it generates
1719 ToWidthInsensitive() from combining category <wide> and <narrow>.
1720 * MSCompatUnicodeTable.cs : added ToKanaTypeInsensitive() and
1721 ToWidthInsensitive() for IgnoreKanaType and IgnoreWidth.
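The width folding in the first item can be sketched with Python's unicodedata module. This is an illustration only: the original generator is C# and reads UnicodeData.txt itself, and `build_width_map` / `to_width_insensitive` are hypothetical names. The <wide> and <narrow> tags the entry mentions appear in the decomposition field of UnicodeData.txt, and each tagged character decomposes to a single ordinary code point, so folding width becomes a table lookup. Python's Unicode database version may differ from the one the generator used.

```python
import unicodedata
from typing import Dict

def build_width_map(max_cp: int = 0xFFFF) -> Dict[int, int]:
    """Map each <wide>/<narrow> character to the code point it decomposes to."""
    table: Dict[int, int] = {}
    for cp in range(max_cp + 1):
        decomp = unicodedata.decomposition(chr(cp))  # e.g. "<wide> 0041"
        if decomp.startswith(("<wide>", "<narrow>")):
            parts = decomp.split()
            if len(parts) == 2:  # tag plus exactly one target code point
                table[cp] = int(parts[1], 16)
    return table

def to_width_insensitive(ch: str, table: Dict[int, int]) -> str:
    """Fold a full-width or half-width character to its ordinary form."""
    return chr(table.get(ord(ch), ord(ch)))
```

Usage: with `m = build_width_map()`, full-width "Ａ" (U+FF21) folds to "A" and half-width katakana "ｱ" (U+FF71) folds to "ア" (U+30A2).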
1723 2005-04-25 Atsushi Enomoto <atsushi@ximian.com>
1725 * README, LdmlReader.cs, DataStructures.txt : new files.
1727 2005-04-25 Atsushi Enomoto <atsushi@ximian.com>
1729 * CodePointIndexer.cs,
1730 Collation-notes.txt,
1731 CollationElementTable.template,
1732 CollationElementTableUtil.cs,
1733 create-char-mapping-source.cs,
1734 create-collation-element-table.cs,
1735 create-combining-class-source.cs,
1736 create-normalization-source.cs,
1738 MSCompatUnicodeTable.cs,
1739 Normalization.template,
1740 NormalizationTableUtil.cs : initial checkin (to private branch).