2005-09-01  Atsushi Enomoto

	* README, Collation-notes.txt, CollationDataStructures.txt :
	  removed obsolete info and added some notes.

2005-08-10  Atsushi Enomoto

	* Normalization.cs : remove warned code.
	* managed-collation.patch : now it's not required anymore.

2005-08-10  Atsushi Enomoto

	* MSCompatUnicodeTable.cs : added IsSortable(string).

2005-08-10  Atsushi Enomoto

	* SimpleCollator.cs : Now all collator methods are thread safe.
	  All instance non-readonly fields turned into arguments of every
	  method that uses those fields. (Sadly it is the end of
	  no-memory-cost collator era. mcs bootstrap now needs +100KB
	  memory consumption.)

2005-08-09  Atsushi Enomoto

	* SimpleCollator.cs : made "checkedFlags" nullable and made it
	  an argument of every index method (to make it thread safe).

2005-08-09  Atsushi Enomoto

	* SimpleCollator.cs, MSCompatUnicodeTable.cs :
	  - Now IsIgnorable() is aggregated to be one invocation to check
	    completely ignorable, nonspacing and symbols.
	  - Introduced "already checked" flags for IndexOf() and
	    LastIndexOf() to skip sortkey binary check on the same
	    characters. Significant perf. improvement for such case as
	    IndexOf("AABCBABC...Z",'Z').

2005-08-08  Gert Driesen

	* SortKey.cs: Marked Serializable to match MS.NET.

2005-08-08  Atsushi Enomoto

	* create-mscompat-collation-table.cs, Makefile : changed resources
	  output directory.

2005-08-04  Atsushi Enomoto

	* create-normalization-tests.cs, StringNormalizationTestSource.cs :
	  new files for Unicode Normalization test generator.
	* Makefile : added support for above.

2005-08-03  Atsushi Enomoto

	* NormalizationTableUtil.cs : oops, it does not compile.
	* managed-collation.patch : I guess having managed resource would
	  be better for collation. At least current code has such #define
	  so Makefile should be in sync with it.

2005-08-03  Atsushi Enomoto

	* create-normalization-source.cs : Fixed CharMapComparer which
	  incorrectly returned 0 when the second arg is shorter.
	  Reduced extraneous helperIndex map.
	  Other minor fixes and code removal.
	* Normalization.cs : several fixes to support blocked combine
	  handling.
	* NormalizationTableUtil.cs : tiny member renaming.

2005-08-03  Atsushi Enomoto

	* create-normalization-source.cs, NormalizationTableUtil.cs,
	  Normalization.cs : several bugfixes on index miscomputation.
	  Renamed using aliases (csc will bork). Primary combine safety is
	  now computed during UnicodeData.txt parse. Maximum NFKD length
	  was 18, not 4 (U+FDFA).

2005-08-02  Atsushi Enomoto

	* managed-collation.patch : added Normalization support.
	* managed-collation-icall.patch : added, including normalization
	  stuff. BTW when will collation code be checked in?

2005-08-02  Atsushi Enomoto

	* create-normalization-source.cs : Unified three normalization
	  source generators, to compute IsUnsafe flag. Fixed helperIndex
	  array type in C header output.
	* create-char-mapping-source.cs, create-combining-class-source.cs :
	  thus removed.
	* Makefile : thus modified for the above integration.
	* NormalizationTableUtil.cs : Extended to contain IsUnsafe flag.
	* Normalization.cs : Several fixes to make Normalize() actually
	  work.

2005-07-29  Atsushi Enomoto

	* create-normalization-source.cs, Normalization.cs,
	  create-char-mapping-source.cs, create-combining-class-source.cs,
	  Makefile : converted managed array to pointers (like collation
	  stuff).

2005-07-29  Atsushi Enomoto

	* NormalizationTableUtil.cs : further table range optimization.
	* create-normalization-source.cs, create-char-mapping-source.cs,
	  create-combining-class-source.cs : added C header output support.

2005-07-29  Atsushi Enomoto

	* create-normalization-source.cs, Normalization.cs : Now property
	  size is < 256, so directly embed value in "props" array.
	  Add QuickCheck(c,checkType) and remove IsNFD/C/KD/KC and
	  delegates.

2005-07-29  Atsushi Enomoto

	* create-combining-class-source.cs, create-char-mapping-source.cs,
	  create-normalization-source.cs, NormalizationTableUtil.cs,
	  Normalization.cs : String.Normalize() does not handle surrogate
	  characters.
	  Mapping information in DerivedNormalizationProps.txt is not used
	  in the code (that from UnicodeData.txt is used). Hangul
	  syllables are computed instead of embedded in the tables.
	* managed-collation.patch : removed IntPtrStream and Makefile
	  patches.

2005-07-29  Atsushi Enomoto

	* MSCompatUnicodeTable.cs : IsSortable() was broken.

2005-07-29  Atsushi Enomoto

	* MSCompatUnicodeTable.cs : added helper for
	  CompareInfo.IsSortable().

2005-07-28  Atsushi Enomoto

	* create-tailoring.cfg : added for convenience of contraction
	  check.

2005-07-28  Atsushi Enomoto

	* create-normalization-source.cs, SimpleCollator.cs,
	  SortKeyBuffer.cs, create-mscompat-collation-table.cs,
	  MSCompatUnicodeTableUtil.cs, SortKey.cs,
	  create-collation-element-table.cs, MSCompatUnicodeTable.cs,
	  CodePointIndexer.cs, create-combining-class-source.cs :
	  added copyright lines.

2005-07-28  Atsushi Enomoto

	* MSCompatUnicodeTable.cs : removed extraneous definition.

2005-07-28  Atsushi Enomoto

	* create-mscompat-collation-table.cs, MSCompatUnicodeTable.cs :
	  full C header support, finally.

2005-07-28  Atsushi Enomoto

	* Normalization.cs, NormalizationTableUtil.cs,
	  create-char-mapping-source.cs : more aggressive data compression.
	  It now ignores characters that are >= U+10000.

2005-07-28  Atsushi Enomoto

	* Makefile, Normalization.template, Normalization.cs : renamed
	  existing file.

2005-07-28  Atsushi Enomoto

	* NormalizationTableUtil.cs, Normalization.template,
	  create-combining-class-source.cs : GetCombiningClass is now
	  implemented as indexer based array.
	* Makefile : renamed output filename.
	* create-mscompat-collation-table.cs : removed comments that do
	  not make sense now.
	* create-tailoring.cs : use utf-8 output (and fixed filename).

2005-07-27  Atsushi Enomoto

	* create-mscompat-collation-table.cs : hacked safer IPA extensions.
	* Collation-notes.txt : status of sortkey table.

2005-07-27  Atsushi Enomoto

	* create-mscompat-collation-table.cs : some Greek mapping fix.

2005-07-27  Atsushi Enomoto

	* create-mscompat-collation-table.cs : diacritical weight is not
	  treated correctly when they are picked from letter names, as
	  flags.

2005-07-27  Atsushi Enomoto

	* create-mscompat-collation-table.cs : fixed culture-dependent
	  nonspacing mark weight.

2005-07-27  Atsushi Enomoto

	* create-mscompat-collation-table.cs : some Hebrew case letter
	  fixes. Some diacritical fixes on symbols.

2005-07-27  Atsushi Enomoto

	* create-mscompat-collation-table.cs : Fixed level 3 weight of
	  Arabic presentation forms.

2005-07-27  Atsushi Enomoto

	* create-mscompat-collation-table.cs : Fixed some diacritical
	  weight of Arabic presentation forms.

2005-07-27  Atsushi Enomoto

	* SimpleCollator.cs : more status updates. It's almost complete,
	  except for sortkey values.

2005-07-27  Atsushi Enomoto

	* SimpleCollator.cs : similar optimization also for LastIndexOf().

2005-07-27  Atsushi Enomoto

	* SimpleCollator.cs : the previous patch was missing
	  IgnoreNonSpace case.

2005-07-27  Atsushi Enomoto

	* SimpleCollator.cs : reduced extra sortkey value computation in
	  MatchesForward(). It makes IndexOf() roughly 30% faster.

2005-07-26  Atsushi Enomoto

	* SortKey.cs : GetHashCode() returns a value based on its byte
	  data. Removed unused code.

2005-07-26  Atsushi Enomoto

	* SimpleCollator.cs : consider extractions in invariant culture.

2005-07-26  Atsushi Enomoto

	* SimpleCollator.cs : (unsafeFlags) be compact ;-)

2005-07-26  Atsushi Enomoto

	* SimpleCollator.cs : When the tail of the target does not match
	  more than 3 times, then IsSuffix() will never be true (3 is the
	  max length of an expansion; \uFB03 -> ffi). It brings
	  significant performance boost when "source" string is very long.
	* MSCompatUnicodeTable.cs : added MaxExpansionLength constant.
	  Reordered code lines.

2005-07-26  Atsushi Enomoto

	* Collation-notes.txt : updated implementation status.

2005-07-26  Atsushi Enomoto

	* SimpleCollator.cs : Implemented quick codepoint comparison in
	  Compare(). Comparison became 125x faster.
	* mono-tailoring-source.txt : added tiny comment.

2005-07-26  Atsushi Enomoto

	* mono-tailoring-source.txt : Added all single sortkey remapping
	  to all cultures (still need to fill contractions and annotate
	  possible buggy mapping referencing to CLDR).
	* SimpleCollator.cs : removed unused code.
	* MSCompatUnicodeTable.cs : tiny cast removal.

2005-07-25  Atsushi Enomoto

	* SimpleCollator.cs, create-mscompat-collation-table.cs,
	  MSCompatUnicodeTableUtil.cs, MSCompatUnicodeTable.cs :
	  Now CJK mapping data is stored as byte arrays. Thus
	  SimpleCollator does not need to use bitwise and shift operations
	  to get sortkey value and they could be managed resources.

2005-07-25  Atsushi Enomoto

	* create-mscompat-collation-table.cs, MSCompatUnicodeTable.cs,
	  MSCompatUnicodeTableUtil.cs : From the result of sortkey
	  comparison between None and IgnoreWidth, width compat table could
	  be computed in somewhat simple way. So removed that table and
	  all related code. Increased the collation resource version.

2005-07-25  Atsushi Enomoto

	* create-mscompat-collation-table.cs : Added C header output
	  support.

2005-07-25  Atsushi Enomoto

	* create-mscompat-collation-table.cs : FillLetterNFKD() could also
	  be applied to Cyrillic letters. Saved some of them.

2005-07-24  Atsushi Enomoto

	* MSCompatUnicodeTable.cs : oh, ok, so we already have
	  GetManifestResourceInternal() ;-)
	* managed-collation.patch : in Assembly.cs made that method
	  internal.

2005-07-24  Atsushi Enomoto

	* MSCompatUnicodeTable.cs : the pointer based icall code could be
	  also applicable for USE_MANAGED_RESOURCE mode.

2005-07-23  Atsushi Enomoto

	* MSCompatUnicodeTable.cs : added icall support code (not enabled
	  unless the first line is commented out).

2005-07-22  Atsushi Enomoto

	* create-mscompat-collation-table.cs, MSCompatUnicodeTableUtil.cs,
	  MSCompatUnicodeTable.cs : Added resource version output (and
	  ignore in case of version mismatch). Removed obsolete, commented
	  out code.

2005-07-22  Atsushi Enomoto

	* SimpleCollator.cs, MSCompatUnicodeTable.cs,
	  create-mscompat-collation-table.cs : Now they use unmanaged
	  pointers instead of managed arrays.
	* managed-collation.patch : Now it contains patch for
	  IntPtrStream.cs and Assembly.cs as well.

2005-07-22  Atsushi Enomoto

	* MSCompatUnicodeTable.cs, SimpleCollator.cs : Moved tailoring
	  support classes to MSCompatUnicodeTable.cs and drawn out from
	  SimpleCollator. Now that cjk and tailoring support are filled
	  inside MSCompatUnicodeTable, no managed array is exposed.

2005-07-22  Atsushi Enomoto

	* create-mscompat-collation-table.cs, SimpleCollator.cs,
	  MSCompatUnicodeTable.cs : Now it's not exposing collation table
	  internals as managed arrays (to switch to unmanaged pointers).

2005-07-22  Atsushi Enomoto

	* create-mscompat-collation-table.cs : tiny nonspacing mark fix.

2005-07-21  Atsushi Enomoto

	* create-mscompat-collation-table.cs : Fixed most of Greek
	  mappings.
	* MSCompatUnicodeTable.cs : don't lock string.

2005-07-21  Atsushi Enomoto

	* create-mscompat-collation-table.cs : More Cyrillic diacritical
	  fixes.

2005-07-21  Atsushi Enomoto

	* create-mscompat-collation-table.cs : More Latin diacritical
	  fixes.

2005-07-21  Atsushi Enomoto

	* create-mscompat-collation-table.cs : There were still missing
	  math symbol mappings. Added several hacky diacritical weight for
	  Latin characters.

2005-07-21  Atsushi Enomoto

	* create-mscompat-collation-table.cs : fixed a few diacritical
	  weight on Cyrillic characters. Fixed ParseTailoringSource() to
	  handle non-heading escape sequence (\uXXXX) as expected.

2005-07-21  Atsushi Enomoto

	* create-mscompat-collation-table.cs, MSCompatUnicodeTableUtil.cs,
	  MSCompatUnicodeTable.cs : added more aggressive index limits for
	  table optimization at data size, in cost of speed.

2005-07-20  Atsushi Enomoto

	* create-mscompat-collation-table.cs : fixed Arabic tertiary
	  weight.

2005-07-20  Atsushi Enomoto

	* create-mscompat-collation-table.cs : Mapping for hyphens and
	  punctuation are kinda finished.
	  Rewrote batch mapping method to collect all NFKD. Required
	  modification on mapping is done.

2005-07-20  Atsushi Enomoto

	* create-mscompat-collation-table.cs : minor mapping fixes on
	  accent marks and punctuations.

2005-07-20  Atsushi Enomoto

	* create-mscompat-collation-table.cs : Fixed some MathSymbol
	  mapping and Box drawing mapping.

2005-07-19  Atsushi Enomoto

	* create-mscompat-collation-table.cs : Fixed almost all numbers.

2005-07-19  Atsushi Enomoto

	* create-mscompat-collation-table.cs : Symbol mappings are almost
	  done. Removed hack that gave dummy mappings to blank symbols.

2005-07-19  Atsushi Enomoto

	* create-mscompat-collation-table.cs : more fix on arrows. Fix on
	  box drawings. Some code refactoring to eliminate hack.

2005-07-19  Atsushi Enomoto

	* create-mscompat-collation-table.cs : Fixed some secondary weight
	  in Devanagari and arrows.

2005-07-19  Atsushi Enomoto

	* create-mscompat-collation-table.cs : a set of tiny mapping fixes.

2005-07-19  Atsushi Enomoto

	* create-mscompat-collation-table.cs : some diacritical fixes for
	  Latin. Added batch mapping method that considers computed
	  diacritical weight (for numbers).

2005-07-15  Atsushi Enomoto

	* managed-collation.patch : forgot to add System.String patch.

2005-07-15  Atsushi Enomoto

	* MSCompatUnicodeTable.cs : added resource existence check
	  (required for mscorlib transition time from the one without
	  resources to the one with resources).

2005-07-15  Atsushi Enomoto

	* create-mscompat-collation-table.cs : fixed punctuations and
	  hyphen (shift) primary weight.

2005-07-15  Atsushi Enomoto

	* create-mscompat-collation-table.cs : more nonspacing mark fixes.
	  Some non-basic Cyrillic diacritical weight fixes.

2005-07-15  Atsushi Enomoto

	* create-mscompat-collation-table.cs : some Gurmukhi fixes on
	  level 1 and level 3. Tiny Hangul weight fixes.
	* MSCompatUnicodeTable.cs : U+30F5 and U+30F6 are small Japanese.

2005-07-15  Atsushi Enomoto

	* create-mscompat-collation-table.cs : some normal characters that
	  have "narrow" NFKD mapping are regarded as "wide" and thus
	  level 3 weight values were different. Handle U+30FB as
	  category A.
	* MSCompatUnicodeTable.cs : U+30FB does not have special weight.

2005-07-15  Atsushi Enomoto

	* create-mscompat-collation-table.cs : more diacritical weight
	  fixes. Removed some unused code.

2005-07-15  Atsushi Enomoto

	* create-mscompat-collation-table.cs : Fixed some Thai and Arabic
	  level 2 weight.

2005-07-15  Atsushi Enomoto

	* create-mscompat-collation-table.cs : Fixed Syriac nonspacing
	  marks.

2005-07-15  Atsushi Enomoto

	* create-mscompat-collation-table.cs : Fixed nonspacing marks in
	  Malayalam, Thai and Lao. Removed extraneous hack.

2005-07-15  Atsushi Enomoto

	* SimpleCollator.cs : rewrote LastIndexOf() to handle source
	  extenders. Some refactoring on IndexOf() code. Removed unused
	  Matches().
	* Collation-notes.txt : some methods needed to be reimplemented,
	  so rewrote the description.

2005-07-14  Atsushi Enomoto

	* SimpleCollator.cs : rewrote IsSuffix() to use CompareInternal().
	  Thus supported extenders in IsSuffix().

2005-07-14  Atsushi Enomoto

	* SimpleCollator.cs : more IsSuffix() simplification, but it will
	  be stopped here since it cannot handle extenders (implementing
	  new approach one).

2005-07-14  Atsushi Enomoto

	* SimpleCollator.cs : simplified IsSuffix() code.

2005-07-14  Atsushi Enomoto

	* SimpleCollator.cs : Fixed IndexOf() and LastIndexOf() to search
	  the entire replacement string if char target was an expansion.
	  IsSuffix() was using a method for IsPrefix() which was incorrect.
	  Removed old IsPrefix() code.

2005-07-14  Atsushi Enomoto

	* SimpleCollator.cs : IndexOf() was incorrectly sharing the same
	  byte[] field in different areas of code. Now extenders in both
	  source and target really work in IndexOf().

2005-07-14  Atsushi Enomoto

	* create-mscompat-collation-table.cs : fixed U+FF9F diacritical
	  weight.
	* SimpleCollator.cs : handle U+FF9E and U+FF9F as extenders.

2005-07-14  Atsushi Enomoto

	* SimpleCollator.cs : Now FilterExtender() handles all extender
	  support. IndexOf() and LastIndexOf() now support extenders.
	  IndexOf() and LastIndexOf() did not proceed contraction source
	  length as expected. Tiny refactoring on private IsPrefix() to
	  take stringSort argument.

2005-07-13  Atsushi Enomoto

	* SimpleCollator.cs : when restoring from expansion, go back to
	  the top of the loop (to avoid index out of range).
	  Now IsPrefix() is implemented to reuse Compare() and thus it now
	  supports extender as well.
	* Collation-notes.txt : status update. Deleted optimization part
	  in status section (it is duplicate).

2005-07-13  Atsushi Enomoto

	* SimpleCollator.cs : some code reordering.
	* create-mscompat-collation-table.cs : it was still missing U+3094.

2005-07-13  Atsushi Enomoto

	* SimpleCollator.cs : Compare() now supports extender
	  (e.g. U+39FC).

2005-07-13  Atsushi Enomoto

	* SimpleCollator.cs : In GetSortKey(), don't update previousChar
	  when it is not primary (e.g. don't "extend" diacritical mark).

2005-07-13  Atsushi Enomoto

	* managed-collation.patch : CompareInfo.Compare() should consider
	  the possibilities that non-empty string might be actually empty
	  in culture-sensitive context.

2005-07-13  Atsushi Enomoto

	* SimpleCollator.cs : IndexOf() and LastIndexOf() return start
	  when target is "empty" (in culture-sensitive context).

2005-07-13  Atsushi Enomoto

	* SimpleCollator.cs : In IndexOf() and LastIndexOf(), skip
	  ignorable characters in target string.

2005-07-13  Atsushi Enomoto

	* SimpleCollator.cs : When IgnoreWidth is specified, all Kana
	  characters are regarded as half-width. Even though IgnoreWidth
	  is specified, it should not ignore case. For special weight
	  comparison, the default values (E4) are bigger than non-default
	  values.
	* SortKeyBuffer.cs : It should save LCID and original string.
	* create-mscompat-collation-table.cs : For Japanese half-width
	  kana, it should not be counted in widthCompat map since
	  IgnoreWidth does not really ignore those differences.

2005-07-13  Atsushi Enomoto

	* create-mscompat-collation-table.cs : Fixed missing Japanese bits.

2005-07-13  Atsushi Enomoto

	* create-mscompat-collation-table.cs : tiny diacritical weight fix
	  for U+20D0-U+20E1.

2005-07-13  Atsushi Enomoto

	* create-mscompat-collation-table.cs : ja CJK ideograph got
	  completed.

2005-07-13  Atsushi Enomoto

	* create-mscompat-collation-table.cs : Fixed CJK custom Japanese
	  mapping. It (maybe as well as other CJK tables) mixes NFKD.
	  For Japanese, modified NFKD table (because of Windows lame
	  design).

2005-07-13  Atsushi Enomoto

	* Makefile : added MONO_USE_MANAGED_COLLATION=no almost everywhere.
	* MSCompatUnicodeTable.cs : FillCJK() was not invoked. Now it is
	  invoked at any time it is required.
	* SimpleCollator.cs : call FillCJK() above in .ctor().
	* MSCompatUnicodeTableUtil.cs : CJK range was wider.
	* create-mscompat-collation-table.cs : CJK binary was missing the
	  length. CJK remapping is being moved to ModifyUnidata().
	  For cjk-ja mapping, we have to consider compat characters to be
	  added to the map, besides the raw UCA table.

2005-07-12  Atsushi Enomoto

	* SortKeyBuffer.cs : Fixed shift level computation to match w/
	  Windows.

2005-07-12  Atsushi Enomoto

	* SimpleCollator.cs : fixed LastIndexOf() to handle _target's_
	  contraction as expected. Fixed Compare() to save s2's
	  contraction as expected.
	* TestDriver.cs : added LastIndexOf() tester w/ indexes.

2005-07-12  Atsushi Enomoto

	* managed-collation.patch : Fixed IsPrefix() and IsSuffix(). They
	  incorrectly use Compare().
	* TestDriver.cs : more moved to nunit tests.

2005-07-12  Atsushi Enomoto

	* SimpleCollator.cs : several fixes on Compare().
	  - Ignorable characters are skipped at the top of the loop.
	  - IgnoreNonSpace is checked to avoid extraneous level 2
	    comparison.
	  - In such case that s1 index is increased while s2 contraction
	    is replaced, s1 is inconsistently proceeded (bug).
	  - IsIgnorable() now also checks IgnoreNonSpace.
	  - Fixed FilterOptions() that does not work for IgnoreWidth at
	    all.
	* TestDriver.cs : now some are moved to nunit tests.
	* Collation-notes.txt : minor todo update.

2005-07-11  Atsushi Enomoto

	* SimpleCollator.cs : Compare() was ignoring such case that both
	  entire strings have '-' to be compared.
	* Collation-notes.txt : more status updates.
	* TestDriver.cs : added '-' use cases.

2005-07-08  Atsushi Enomoto

	* SimpleCollator.cs : to be same as other buggy part, it now
	  handles U+3005, U+3031 and U+3032 as buggy as Windows. It just
	  repeats previous character. Fixed GetSortKey(): if the repeater
	  is U+3005, second weight is 5.
	* create-mscompat-collation-table.cs : dummy values for extenders.

2005-07-08  Atsushi Enomoto

	* SimpleCollator.cs : Special weight fixes on GetSortKey(). Dash
	  type should be computed from ExtenderType, and voice mark weight
	  should be considered.
	* MSCompatUnicodeTable.cs : added tiny comment.

2005-07-08  Atsushi Enomoto

	* SortKey.cs : It borked when MONO_USE_MANAGED_COLLATION is not
	  yes.
	* SimpleCollator.cs : support for extender (U+309D etc.).

2005-07-08  Atsushi Enomoto

	* create-mscompat-collation-table.cs : some punct/symbols fix.
	* managed-collation.patch : new (and temporary) file to support
	  managed collation in mscorlib.
	* README : described how to use managed collation.

2005-07-08  Atsushi Enomoto

	* create-mscompat-collation-table.cs : Further Cyrillic fixes.
	  Handle U+482-4C8 (though needs diacritical fixes).
	* MSCompatUnicodeTable.cs : tiny comment for alternative impl.

2005-07-08  Atsushi Enomoto

	* create-mscompat-collation-table.cs : Reimplemented Cyrillic
	  weight computation code, since it looks like the same way as
	  Latin letters have. Thus removed all other approach (UCA, by
	  letter name).

2005-07-07  Atsushi Enomoto

	* create-mscompat-collation-table.cs : diacritical fix for
	  "double-struck". Syriac nonspacing fixes.

2005-07-07  Atsushi Enomoto

	* create-mscompat-collation-table.cs : more math symbol weight
	  fixes.

2005-07-07  Atsushi Enomoto

	* create-mscompat-collation-table.cs : fixed Hebrew character
	  sortkeys.

2005-07-07  Atsushi Enomoto

	* create-mscompat-collation-table.cs : math symbols U+25A0-U+2600
	  are implemented (no stub). Some other fixes on category 8-A.

2005-07-07  Atsushi Enomoto

	* create-mscompat-collation-table.cs : some minor fixes on Arabic,
	  Korean and Japanese sortkey weights.

2005-07-07  Atsushi Enomoto

	* create-mscompat-collation-table.cs : More diacritical fixes.
	  Georgian characters do not have level 2 weights but level 3.

2005-07-07  Atsushi Enomoto

	* create-mscompat-collation-table.cs : Roman numeral characters
	  have diacritical weight. quick hack for control signs (U+2400..)
	  and box drawings.

2005-07-06  Atsushi Enomoto

	* create-mscompat-collation-table.cs : improving Latin mappings.
	  Setting non-ASCII Latin characters' primary weight between those
	  ASCII characters, and setting diacritical weight (hacky).
	* MSCompatUnicodeTable.cs : Kanatype check: fixed (voice marks)
	  and improved (comparison order).

2005-07-06  Atsushi Enomoto

	* create-mscompat-collation-table.cs : more diacritical fixes.
	  primary weight fixes on punctuations in category 07.

2005-07-06  Atsushi Enomoto

	* create-mscompat-collation-table.cs : several diacritical fixes.
	* TestDriver.cs : sortkey dumper should use StringSort.

2005-07-05  Atsushi Enomoto

	* SimpleCollator.cs : fixed incorrect indexer setup. Optimized
	  GetContraction() call a bit.

2005-07-05  Atsushi Enomoto

	* create-mscompat-collation-table.cs : fixed incorrect level 2
	  output type.
	* MSCompatUnicodeTable.cs : remove debug line.

2005-07-05  Atsushi Enomoto

	* MSCompatUnicodeTableUtil.cs, MSCompatUnicodeTable.cs,
	  CodePointIndexer.cs, create-mscompat-collation-table.cs :
	  made some members internal and accessible from other classes.
	  Many indexes could be 0 by default.
	* SimpleCollator.cs : optimizations. avoid method call.

2005-07-05  Atsushi Enomoto

	* Collation-notes.txt : more updates.
	* SimpleCollator.cs : Added quick check for Ordinal comparison.
	  Fixed special weight comparison. It cannot be customizable in
	  the implementation (and it won't be harmful).
	* mono-tailoring-source.txt : thus updated comment.

2005-07-05  Atsushi Enomoto

	* SimpleCollator.cs : Compare() was missing French sort support.
	* TestDriver.cs : added example case.

2005-07-05  Atsushi Enomoto

	* Collation-notes.txt : updated status. Eliminated descriptions on
	  "iterator" (I avoided it for performance concern). Fixed misc.
	  incorrect descriptions.

2005-07-05  Atsushi Enomoto

	* Collator.cs : Now that SimpleCollator became feature complete,
	  it is not useful anymore.

2005-07-05  Atsushi Enomoto

	* SimpleCollator.cs : implemented decent Compare() that
	  immediately stops at first primary difference.

2005-07-04  Atsushi Enomoto

	* SimpleCollator.cs : indexers might return -1.

2005-07-04  Atsushi Enomoto

	* SimpleCollator.cs : IsPrefix() and IsSuffix() optimization code
	  was buggy (length check for source was missing).

2005-07-04  Atsushi Enomoto

	* create-mscompat-collation-table.cs : Fixed tailoring table
	  output to be in correct and countable order. Now if tailoring
	  alias was not found, just stop the build.
	* MSCompatUnicodeTable.cs : several build fixes. Now it works to
	  read assembly resources.
	* mono-tailoring-source.txt : commented out CJK aliases that miss
	  target.
	* Makefile : needed further filename fixes.

2005-07-04  Atsushi Enomoto

	* MSCompatUnicodeTable.cs : renamed from
	  MSCompatUnicodeTable.template (now it is working as a standalone
	  file).
	* Makefile : renamed generated file as
	  MSCompatUnicodeTableGenerated.cs (the generator now creates both
	  binary resources and C# source).

2005-07-04  Atsushi Enomoto

	* create-mscompat-collation-table.cs : Now it generates binary
	  resources (to parent directory).
	* MSCompatUnicodeTable.template : added conditional code that
	  fills collation tables from manifest resources.
	* Makefile : remove collation table binaries as well on
	  "make clean". Removed extraneous dependency.

2005-07-01  Atsushi Enomoto

	* MSCompatUnicodeTable.template, SimpleCollator.cs :
	  removed extraneous GetExpansion().

2005-07-01  Atsushi Enomoto

	* SimpleCollator.cs : IsSuffix() also supports contractions.
	* TestDriver.cs : IsSuffix() example contraction cases.

2005-07-01  Atsushi Enomoto

	* SimpleCollator.cs : reverted IsSuffix() to return bool (to match
	  w/ what current IsPrefix() does). For expansion of target,
	  IsPrefix() should check the no-match case that expansion is
	  longer than input. Some refactoring on IsPrefix().
	  Added GetContractionTal() for IsSuffix() (not used yet).

2005-07-01  Atsushi Enomoto

	* TestDriver.cs : added IsPrefix() expansion cases.
	* SimpleCollator.cs : IsPrefix() now supports contractions (with
	  much of complexity), and it now returns bool again. IndexOf()
	  for replacement should make use of IndexOfPrimitiveChar() since
	  expansions won't be expanded recursively.

2005-07-01  Atsushi Enomoto

	* SimpleCollator.cs : commonized character comparison in
	  IsPrefix() and IsSuffix(). csc compile fix.
	* CompareInfoImpl.cs : deleted.

2005-06-30  Atsushi Enomoto

	* TestDriver.cs : added SimpleCollator.ctor() sanity check. Added
	  replacement contraction example.
	* SimpleCollator.cs : Now IndexOf() and LastIndexOf() support
	  contraction in source string. Extracted matching code to
	  Matches(). Replacement contraction was including extraneous
	  '\x0'.

2005-06-30  Atsushi Enomoto

	* Collation-notes.txt : updated status.
	* CollationDataStructures.txt : tiny fixes.
	* SimpleCollator.cs : Renamed alias Util to UUtil (MS
	  sys.enterprisesvc has sucky global namespace Util and csc
	  borked). GetContraction was incorrectly returning first item.
	  Private IsPrefix() now returns int (but it might not be in real
	  use). Extracted simple char comparison to CompareCharSimple().
	  IndexOf() and LastIndexOf() now fully handle contractions (both
	  binary key and string replacement) in "target" (for "s" not
	  yet).
	* TestDriver.cs : be more verbose.
	* mono-tailoring-source.txt : added comment.
	* MSCompatUnicodeTable.template : Renamed alias Util to UUtil (MS
	  sys.enterprisesvc has sucky global

2005-06-30  Atsushi Enomoto

	* create-mscompat-collation-table.cs : compute COMBINING blah
	  marks as well as those characters WITH blah.
	* TestDriver.cs : added combining sortkey cases.

2005-06-30  Atsushi Enomoto

	* mono-tailoring-source.txt : fixed description on '*' in
	  sortkeys.
	* SimpleCollator.cs : Now it fully uses tailoring info. Fixed
	  contraction search that worked only when string is contraction.
	  Removed commented code. Minor refactoring.
	* TestDriver.cs : added example that uses "ZS" in Hungarian
	  sorting.

2005-06-29  Atsushi Enomoto

	* create-mscompat-collation-table.cs, mono-tailoring-source.txt :
	  removed extraneous level 4 sortkey which cannot be supported.
	* SimpleCollator.cs : added GetContraction() and used in some
	  places. Now CompareOptions is set only once. Reordered some
	  code (e.g. ignorable check -> get compat char -> compare).

2005-06-29  Atsushi Enomoto

	* SimpleCollator.cs : sort tailoring tables before actual usage.
	  Support diacritical remappings (it is customized collation rule
	  which does not exist in UCA).

2005-06-29  Atsushi Enomoto

	* SimpleCollator.cs : build culture specific tailoring table from
	  TailoringInfo and unified data array.
	* create-mscompat-collation-table.cs : Added null termination to
	  sortkey map tailorings (mostly to save my eyes).
	* MSCompatUnicodeTable.template : added public TailoringValues.

2005-06-29  Atsushi Enomoto

	* SortKeyBuffer.cs : handle special weight (category 06)
	  characters.
	* Collation-notes.txt : Updated description on special weight (it
	  was incorrect).
	* TestDriver.cs : added special weight cases.

2005-06-29  Atsushi Enomoto

	* MSCompatUnicodeTable.template : added GetTailoringInfo().
	* SimpleCollator.cs : Now tailoring information is acquired and
	  used. (FrenchSort is supported but Compare() won't work
	  expectedly since the table is still incomplete for those
	  diacritical marks).
	* SortKeyBuffer.cs : On reversing diacritical weights, it should
	  ignore zeros. Reset() should reset frenchSorted flag.

2005-06-28  Atsushi Enomoto

	* create-mscompat-collation-table.cs : Further fixes on Jamo,
	  diacritical weights by character name, and *Numbers primary
	  weights.

2005-06-28  Atsushi Enomoto

	* create-mscompat-collation-table.cs : More fix on Devanagari,
	  Gujarati, Oriya, Tamil and Lao sortkeys.

2005-06-28  Atsushi Enomoto

	* create-mscompat-collation-table.cs : Fixed Georgian, Thai,
	  Gurmukhi sortkey values.

2005-06-28  Atsushi Enomoto

	* create-mscompat-collation-table.cs : Fixed Thai character
	  primary and secondary values. Fixed Thaana letters. Added more
	  LAMESPEC CJK compat. Fixed some circled CJK secondary weight.
	  Hacked some nonspacing mark sortkey value adjustment.

2005-06-28  Atsushi Enomoto

	* create-mscompat-collation-table.cs : CP932.TXT was not parsed as
	  expected. JIS ordering was incorrect. OtherNumbers that
	  represent 10 or more values had their offset incorrectly
	  computed. Some Hangul compat characters have different offset.

2005-06-28  Atsushi Enomoto

	* create-mscompat-collation-table.cs : Fixed 0x8 category
	  characters. Added hack for need-to-be-fixed characters to fall
	  into 0xA category.
	* create-collation-element-table.cs : previous checkin seems to
	  have failed :(
	* README: updated a bit.

2005-06-24  Atsushi Enomoto

	* CodePointIndexer.cs : removed extraneous switch (I could use
	  empty array for that need).
	* CollationElementTableUtil.cs : primary weight type became
	  ushort.
	* create-collation-element-table.cs : several bugfixes. collElem
	  should be int. It was skipping most of entries because of
	  incorrect string tokenization.

2005-06-23  Atsushi Enomoto

	* create-mscompat-collation-table.cs : handle some Jamo NFKD.

2005-06-23  Atsushi Enomoto

	* SimpleCollator.cs : forgot to commit in the last checkin.
	* create-mscompat-collation-table.cs : fixed arabic shift weight
	  chars.
	* TestDriver.cs : switch table dumper and collator testing.
	* SortKey.cs : for now comment out internal indexes (not in use).

2005-06-23  Atsushi Enomoto

	* MSCompatUnicodeTable.template, SimpleCollator.cs :
	  support for culture dependent CJK table.

2005-06-23  Atsushi Enomoto

	* create-mscompat-collation-table.cs, MSCompatUnicodeTableUtil.cs :
	  make CJK table more compact.

2005-06-22  Atsushi Enomoto

	* SimpleCollator.cs : Fixed stupid index search when start != 0.

2005-06-21  Atsushi Enomoto

	* SimpleCollator.cs : fixed my misunderstanding on LastIndexOf().
	  It now starts from "start" and proceeds backward by "length".
	* TestDriver.cs : fix warning.

2005-06-21  Atsushi Enomoto

	* TestDriver.cs : more tests.
	* SimpleCollator.cs : LastIndexOf() is not setting search length
	  on iteration. Quick workaround for String.LastIndexOf() bug
	  (maybe).

2005-06-21  Atsushi Enomoto

	* create-normalization-source.cs : output propValue as uint.

2005-06-21  Atsushi Enomoto

	* SortKey.cs : Now it is System.Globalization.SortKey. To replace
	  existing implementation, it now requires lcid and
	  CompareOptions. Added required members.
	* SortKeyBuffer.cs : thus .ctor() requires LCID.
	* SimpleCollator.cs : made required changes above.

2005-06-21  Atsushi Enomoto

	* CodePointIndexer.cs : added CompressArray(). Now it requires two
	  more parameters for default index and codepoint.
	* CollationElementTableUtil.cs, NormalizationTableUtil.cs :
	  required changes wrt above change.
	* MSCompatUnicodeTableUtil.cs : added for several codepoint
	  indexers.
* MSCompatUnicodeTable.template : Now it uses codepoint indexer. * create-mscompat-collation-table.cs : Now it outputs compressed array. * Makefile : now collation requires MSCompatUnicodeTableUtil.cs 2005-06-21 Atsushi Enomoto * SimpleCollator.cs : Implemented IsSuffix() and LastIndexOf(). Several fixes on index > 0 cases. * TestDriver.cs : sample IsSuffix() and LastIndexOf() usage and more. 2005-06-21 Atsushi Enomoto * Collation-notes.txt : updated (status, impl. classes). * MSCompatUnicodeTable.cs : Korean Jamo are not really expansions. 2005-06-21 Atsushi Enomoto * SimpleCollator.cs : implemented IndexOf(string,string,CompareOptions) and IsPrefix(). Tiny code refactoring. * TestDriver.cs : sample IsPrefix() and IndexOf() usage. * MSCompatUnicodeTable.cs : tiny refactoring for CodePointIndexer use. 2005-06-20 Atsushi Enomoto * SimpleCollator.cs : IndexOf(string, char, CompareOptions) implementation. * TestDriver.cs : sample IndexOf() usage. 2005-06-20 Atsushi Enomoto * create-mscompat-collation-table.cs : was missing the most important kind of blocks - equivalent expansions (e.g. invariant mappings). More readable mappings. 2005-06-20 Atsushi Enomoto * mono-tailoring-source.txt : new file. It describes tailoring information. Basically examined under .NET 1.x. * create-mscompat-collation-table.cs : consume the file above. * MSCompatUnicodeTable.template : now tailorings are no longer a stub. * CollationDataStructures.txt : minor fixes. * SortKeyBuffer.cs, SimpleCollator.cs : added FrenchSort support. * Collation-notes.txt : added description on Latin primary weights. * ldml-limited.rng : added note. * create-tailorings.cs : added note. more serialization (but won't be used anyways). 2005-06-17 Atsushi Enomoto * SortKeyBuffer.cs : non-primary character is added to previous diacritical weight. * TestDriver.cs : added example case of above. 2005-06-17 Atsushi Enomoto * SimpleCollator.cs : IgnoreSymbols support. * TestDriver.cs : compilation fix. IgnoreSymbols example.
* create-mscompat-collation-table.cs : more Hangul fixes. 2005-06-17 Atsushi Enomoto * create-mscompat-collation-table.cs : more Hangul fixes. * SortKey.cs : it will replace sys.globalization.SortKey. It has some internal members. * SortKeyBuffer.cs : now it uses SortKey instead of byte[]. * SimpleCollator.cs : CompareOptions support. However I don't think it will be developed anymore since SortKey never enables IndexOf(). * TestDriver.cs : a few CompareOptions cases. 2005-06-16 Atsushi Enomoto * SimpleCollator.cs : simple collator implementation that just uses GetSortKey() as its basis. * TestDriver.cs : sample code that uses this collator set. * MSCompatUnicodeTable.template : removed test driver from here. 2005-06-16 Atsushi Enomoto * create-mscompat-collation-table.cs : Hangul fixes. Now fewer than 300 characters lack sortkey weights. * MSCompatUnicodeTable.template : added FIXME info for Hangul Jamo. 2005-06-16 Atsushi Enomoto * create-mscompat-collation-table.cs : Added control picture mappings. Minor primary weight fixes. 2005-06-16 Atsushi Enomoto * create-mscompat-collation-table.cs : Added mappings for box drawings and blocks. 2005-06-16 Atsushi Enomoto * create-mscompat-collation-table.cs : Added mappings for arrows. 2005-06-15 Atsushi Enomoto * create-mscompat-collation-table.cs : added support for letterlike characters and squared CJK compatibility characters, ordered by character names (0x0E category). * Collation-notes.txt : added description on that. 2005-06-15 Atsushi Enomoto * MSCompatUnicodeTable.template : Now expansions are simulated. * create-mscompat-collation-table.cs : filled Korean number level2. Reordered some code blocks to fill correct diacritical differences. * Collation-notes.txt : some corrections and minor additions. 2005-06-15 Atsushi Enomoto * MSCompatUnicodeTable.template : Now dumper test driver uses SortKeyBuffer for dogfooding.
* create-mscompat-collation-table.cs : some diacritical level fixes (with non-working extra latin check). * SortKeyBuffer.cs : several fixes to get working as a practical code. * Collator.cs : make it compilable, leaving things as NotImplemented. 2005-06-15 Atsushi Enomoto * create-mscompat-collation-table.cs : some fixes on primary category 07 (miscellaneous symbols and punctuations). 2005-06-14 Atsushi Enomoto * create-mscompat-collation-table.cs : more mapping fix on numbers, letters, variable weight characters, circled Japanese and CJK. * MSCompatUnicodeTable.template : fixed HasSpecialWeight() to be more inclusive. Simplified dumper code. 2005-06-14 Atsushi Enomoto * create-mscompat-collation-table.cs : finished Hangul (both Jamo and Syllables). sortkey dumper diff lines became 8000 from 30000. 2005-06-14 Atsushi Enomoto * create-mscompat-collation-table.cs : added some nonspacing marks in either correct or hacky way. 2005-06-13 Atsushi Enomoto * create-mscompat-collation-table.cs : several improvements. Japanese Kana support, Hebrew accents, Bengali nonspacing marks, sorting of numeric characters, diacritically decorated latin alphabets. Fixed some diacritical weights detection. * MSCompatUnicodeTable.cs : tiny Japanese fix. Handle nonspacing marks' primary weight as empty. * Collation-notes.txt : some updates. 2005-06-13 Atsushi Enomoto * create-mscompat-collation-table.cs : don't process nonexact NFKD mapping as equivalent, however store CJK extensions into NFKD map even if one does not strictly match. Now am going to fill Hangul into tables (unlike UCA it does not look possible to calculate sortkey value). Fixed Cyrillic and Georgian UCA based orderings. * MSCompatUnicodeTable.template : added CJK extension sortkey calculation. 2005-06-10 Atsushi Enomoto * create-mscompat-collation-table.cs : Fixed latin alphabet support. Added latin with diacritical and CJK extension. * MSCompatUnicodeTable.cs : modified dumper code a bit (for my purpose). 
2005-06-10 Atsushi Enomoto * create-mscompat-collation-table.cs : now parses DerivedAge.txt (right now not used though). Filled CJK ideograph, still not perfect. Fixed number primary keys. NFKD numbers and CJK ideographs are now considered, including brackets elimination. * Makefile : now it downloads DerivedAge.txt. * MSCompatUnicodeTable.template : added dummy code dumper. It computes PrivateUse, Surrogate and Hangul Syllables. * Collation-notes.txt : Noted that Hangul Syllables need more love. 2005-06-09 Atsushi Enomoto * create-tailorings.cs : added configuration support. sort them. I wonder if it is really usable. Having own format might be better. * create-mscompat-collation-table.cs : fixing some sortkey numbers, making closer to windows. Now it handles NFKD in some places. * MSCompatUnicodeTable.template : Added dummy sortkey dumper driver. * CollationDataStructures.txt : added description on tailoring fields, though they are subject to change. 2005-06-07 Atsushi Enomoto * create-tailorings.cs, ldml-limited.rng : new files. * LdmlReader.cs : removed old file. 2005-06-07 Atsushi Enomoto * SortKeyBuffer.cs : split from Collator.cs. Now it considers practical use, reflecting updated sortkey constant design. Especially level 4 weight is split to 4 arrays that are merged in the last stage of GetSortKey(). * Collator.cs : thus SortKeyBuffer is removed from here. Additionally, removed some extraneous bits in other classes. * Collation-notes.txt : Some editorial fixes. Added information on Korean matter (how to compute Hangul Syllables / Hangul Jamo cannot be stored in simple byte arrays). * CodePointIndexer.cs, create-collation-element-table.cs, CollationElementTable.template, NormalizationTableUtil.cs : short CodePointIndexer method names. * create-mscompat-collation-table.cs : Additional info on why some meaningful characters are ignored in Windows (Unicode version difference). Removed U+070F from special check (was extraneous).
2005-06-06 Atsushi Enomoto * MSCompatUnicodeTable.template: Moved body implementation to table creator and put those bool results into an array. * create-mscompat-collation-table.cs : So imported those methods. Modified array output to emit "0x" only for more than 9. * create-normalization-source.cs : ditto on "0x" output matter. * CollationDataStructures.txt : so now it holds ignorableFlags. 2005-06-03 Atsushi Enomoto * Collation-notes.txt, CollationDataStructures.txt : separate document for data structure design. 2005-06-03 Atsushi Enomoto * create-mscompat-collation-table.cs : added culture-dependent CJK table creation. It uses CLDR as its basis. (Culture independent CJK is not ready BTW). * Makefile : added CLDR archive downloading support. * MSCompatUnicodeTable.template : tiny renamings. * Collation-notes.txt : additional CJK info. 2005-06-02 Atsushi Enomoto * Collation-notes.txt, create-mscompat-collation-table.cs : added secondary weight support for BlahNumber characters. 2005-06-01 Atsushi Enomoto * downloaded : added directory. All downloaded files are stored here. * Makefile : use "downloaded" directory. Added more auto-download stuff. * create-mscompat-collation-table.cs : Added Japanese square kana support. 2005-06-01 Atsushi Enomoto * Collation-notes.txt : added Estrangela (ancient Syriac) and Thaana. * create-mscompat-collation-table.cs : added support for Arabic abjad, Estrangela and Thaana. * MSCompatUnicodeTable.template : removed BOM. 2005-05-31 Atsushi Enomoto * Collation-notes.txt : wrong comment cleanup and spelling fixes. * create-mscompat-collation-table.cs : added diacritic support for Latin letters (as long as covered in primary weight). 2005-05-31 Atsushi Enomoto * Makefile : minor fixes. Added warning lines to generated sources. 2005-05-31 Atsushi Enomoto * create-char-mapping-source.cs : Removed ToWidthInsensitive() generation. 2005-05-31 Atsushi Enomoto * create-mscompat-collation-table.cs : Now it dumps level1 to 3 values. 
ToWidthInsensitive() is implemented here, using an array (which is to be optimized using CodePointIndexer). * MSCompatUnicodeTable.cs : renamed as MSCompatUnicodeTable.template * MSCompatUnicodeTable.template : now it is used to generate MSCompatUnicodeTable.cs which got ready to be used. * Makefile : added MSCompatUnicodeTable.cs build support. Now it supports "make normalization" and "make collation". 2005-05-30 Atsushi Enomoto * Collation-notes.txt : Description on ICU is very incorrect. Now it became more rational and sane. * create-mscompat-collation-table.cs : fixed some indexes. * Makefile : added "mstablegen" target. * MSCompatUnicodeTable.cs : removed GetPrimaryWeight(). Minor fix. 2005-05-26 Atsushi Enomoto * Collation-notes.txt : more analysis on "letters". * create-mscompat-collation-table.cs : more proof of concepts. 2005-05-25 Atsushi Enomoto * Collation-notes.txt : more info. Started letter sortkey analysis (some of the other stuff is really not understandable right now.) * create-mscompat-collation-table.cs : table generator proof-of-concept source (not compilable). * MSCompatUnicodeTable.cs : moved some code to the new source. Some more fixes. 2005-05-20 Atsushi Enomoto * Collation-notes.txt : started level 2 weight analysis. 2005-05-19 Atsushi Enomoto * Collation-notes.txt : Additional information on how to create level 3 tables. * MSCompatUnicodeTable.cs : implemented part of GetLevel3Weight(). 2005-05-19 Atsushi Enomoto * Collation-notes.txt : More case weight (level 3) analysis. I'm likely to just write table generator. 2005-05-18 Atsushi Enomoto * MSCompatUnicodeTable.cs : part of level 4 weight implementation. 2005-05-18 Atsushi Enomoto * Collation-notes.txt : Added task list. Revised comparison methods; backward iteration is possible. More on char-by-char comparison. Level 4 comparison is actually a bit more complex. Misc corrections. * Collator.cs : some conceptual updates wrt above.
2005-05-17 Atsushi Enomoto * Collation-notes.txt : Japanese voice mark is level 2, and Hangul properties are level 3. 2005-05-17 Atsushi Enomoto * Collation-notes.txt : Make it more readable. More analysis on level 3 and 4 sortkey structures. * Collator.cs : some compilation fixes (not compilable yet). 2005-05-16 Atsushi Enomoto * Collation-notes.txt : Analysis on variable-weighting (level 5) sortkey format. * Collator.cs : updated corresponding part of level 5, and more. 2005-05-13 Atsushi Enomoto * Collation-notes.txt : more updates. * Collator.cs : rewrote from scratch. Some rough sketch for sortkey buffer, character iterator and collator methods. Not compiling. 2005-05-13 Atsushi Enomoto * Collator.cs : Am going to replace it with new one. No need for CompareOptions-dependent Comparer. 2005-05-13 Atsushi Enomoto * Collation-notes.txt : There seems a bit more complexity. 2005-05-10 Atsushi Enomoto * Collation-notes.txt : more updates, being close to write sortkey generator code. 2005-05-09 Atsushi Enomoto * CompareInfoImpl.cs, Collator.cs : conceptual update * Collation-notes.txt : some corrections and additions. * Makefile : added LDML input (but it won't be used at all). 2005-04-28 Atsushi Enomoto * Collation-notes.txt : more updates. 2005-04-26 Atsushi Enomoto * Collation-notes.txt : more updates. 2005-04-26 Atsushi Enomoto * Collation-notes.txt : some updates. * create-mapping-char-source.cs : superscripts and subscripts are also ignored in IgnoreWidth comparison. * Makefile : tiny touch fix. 2005-04-25 Atsushi Enomoto * CompareInfoImpl.cs, Collator.cs : conceptual stuff (not working). 2005-04-25 Atsushi Enomoto * create-char-mapping-source.cs : Now it generates ToWidthInsensitive() from combining category and . * MSCompatUnicodeTable.cs : added ToKanaTypeInsensitive() and ToWidthInsensitive() for IgnoreKanaType and IgnoreWidth. 2005-04-25 Atsushi Enomoto * README, LdmlReader.cs, DataStructures.txt : new files. 
2005-04-25 Atsushi Enomoto * CodePointIndexer.cs, Collation-notes.txt, CollationElementTable.template, CollationElementTableUtil.cs, create-char-mapping-source.cs, create-collation-element-table.cs, create-combining-class-source.cs, create-normalization-source.cs, Makefile, MSCompatUnicodeTable.cs, Normalization.template, NormalizationTableUtil.cs : initial checkin (to private branch).