* XML Schema Inference Rules
- that does not expose EntityReference.
- that does not contain xsd:* elements.
XmlSchemaSet: only a schema set that was generated by this utility
class is accepted. See the particle inference section later in this
document.

Actually, the MS implementation does not check this input
sufficiently, so it accepts more than it expects.
*** Allowed schema components
Before inferring particles to be merged with the existing particles
in the XmlSchemaSet, we have to know what is expected and what is not:
- facets are not supported. [a014.xsd]
- xs:all is not supported. [a003.xsd]
- xs:group (ref) is not supported. [a004.xsd]
- xs:choice that does not contain xs:sequence is not
- xs:any is not supported. Only xs:element particles are expected
  to be contained in xs:sequence. [a011.xsd]
- same-name particles that are not ambiguous are nonetheless
  computed into invalid particles. This looks like an unintended
  bug in MS's implementation. [a010.xsd]
- attributeGroup does not seem to be supported (MS has a
  bug around here). [a006.xsd]
- anyAttribute is not regarded as a valid particle, and
  the output complexType definition simply strips it out.
- but substitutionGroup is not rejected, and it remains
  in the output. [a001.xsd]
  -> It must be rejected, because it breaks choice compatibility.
First, the XmlSchemaSet parameter is compiled[*1] and interpreted into
an internal schema representation that is used to examine the
XmlReader input. The resulting XmlSchemaSet is the same as the input
XmlSchemaSet.
[*1] FIXME: this design might change.
The XmlSchemaSet is compiled because it might contain XmlSchemaInclude
items, and so it won't be possible to process inference directly
inside the input schema set. However, reusing the input avoids some
annoyance, such as having to preserve elementFormDefault etc.
Second, the XmlReader is moved to the content (the document element),
and "element inference" (described later) starts from there.
The resulting XmlSchemaSet keeps the original XmlSchemas in itself.
For example, it keeps their elementFormDefault and attributeFormDefault.
Basically, it processes the XmlReader against the existing XmlSchemaSet
and does not "merge" two XmlSchemaSets, one of which would be newly
inferred from this XmlReader, because the sequential nodes (siblings)
from the XmlReader have to be inferred anyway.
Once the element definition is determined (or created), any other
branches in the schema are ignored.
*** attribute component definitions and references

**** ignored attributes

xsi:type, xsi:schemaLocation and xsi:noNamespaceSchemaLocation
attributes are ignored.

**** special attributes
If xsi:nil exists, the element's content is not handled, while its
attributes are handled.

xml:* attributes are predetermined: there is a fixed schema for that
namespace.
**** namespaced attributes

Miscellaneous attributes that reside in a certain namespace are
referenced as <attribute ref="qualified-name" />

Miscellaneous attributes are represented as <attribute name="blah" />
*** attribute occurrence

When defining a complexType for a newly-created element, an attribute
can be set as "required". Otherwise, it must be set as "optional".

For every element instance occurrence, all attribute definitions are
tested for existence, and any attribute that does not exist on the
instance must be set as "optional".
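The occurrence rule above can be illustrated with a small sketch. This is not the engine's code: attribute uses are modeled as a hypothetical dict from attribute name to "required"/"optional", the function name `infer_attribute_uses` is invented for illustration, and attribute names in Clark `{namespace}local` notation are an assumption. The ignored xsi:* attributes listed earlier are dropped before the rule is applied.

```python
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# xsi:* attributes that the inference ignores (Clark notation, an assumption of this sketch).
IGNORED_ATTRIBUTES = {
    "{%s}type" % XSI_NS,
    "{%s}schemaLocation" % XSI_NS,
    "{%s}noNamespaceSchemaLocation" % XSI_NS,
}


def infer_attribute_uses(uses, instance_attributes, element_is_new):
    """Update a hypothetical {attribute name: "required" | "optional"} map
    from one element instance and return it."""
    present = set(instance_attributes) - IGNORED_ATTRIBUTES
    for name in present:
        if name not in uses:
            # Only a newly created element definition may start with "required";
            # earlier instances of an existing element lacked this attribute.
            uses[name] = "required" if element_is_new else "optional"
    for name in uses:
        if name not in present:
            # Missing on this instance, so it can no longer be required.
            uses[name] = "optional"
    return uses
```

For example, a first instance with `id` and `lang` yields both as "required"; a second instance carrying only `id` and `note` leaves `id` required and makes `lang` and `note` optional.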
*** attribute value types

FIXME: need to describe the relaxation of attribute value types.

** Content model inference

*** inference processing model

A content model consists of two parts:

- content type : empty | elementOnly | textOnly | mixed
- particle : sequence | choice | all | groupRef
When processing reader.Read(), the node is first "tested" against the
current schema content model. If the current node on the XmlReader
is not acceptable, then "content model expansion" happens.

- If the current node is text content, then process the
  text node according to "evaluating text content".
- If the current node is an element, then process it
  in accordance with "evaluating element".
*** evaluating element

When an element occurs, it must be accepted as a particle.
First, the content type must be examined:

- If the content type was simpleType, then it is changed
  into a complexType with complexContent and mixed='true'.
  The inferred content particle must be optional.
- If the content type was empty, then it is changed into a
  complexType with complexContent (unlike above, it is not
  mixed). The inferred content particle must be optional.
- If the content type was elementOnly or mixed, no need
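The content-type transitions triggered by a child element can be summarized as a small function. This is a sketch only: `ContentType` and `on_child_element` are hypothetical names, and treating the "simpleType" case as the textOnly content type from the inference processing model is an assumption. `None` in the second slot means optionality is left to particle inference.

```python
from enum import Enum
from typing import Optional, Tuple


class ContentType(Enum):
    EMPTY = "empty"
    ELEMENT_ONLY = "elementOnly"
    TEXT_ONLY = "textOnly"
    MIXED = "mixed"


def on_child_element(current: ContentType) -> Tuple[ContentType, Optional[bool]]:
    """Return the content type after a child element is seen, and whether the
    inferred particle is forced to be optional (None: decided by particle inference)."""
    if current is ContentType.TEXT_ONLY:
        # Simple content becomes complexContent with mixed='true'.
        return ContentType.MIXED, True
    if current is ContentType.EMPTY:
        # Empty content becomes (non-mixed) complexContent.
        return ContentType.ELEMENT_ONLY, True
    # elementOnly and mixed stay as they are.
    return current, None
```

The particle is forced optional in the first two cases because earlier instances of the element had no child elements at all.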
Next, the content particle must be evaluated.

Given the input XmlSchemaSet limitations, only the patterns listed
here can occur:

- sequence (of element particles)
- choice of sequences
Every element is tested against the current element candidates.

- When the target element is a document element, then all
  the global elements in the XmlSchemaSet are the candidates.
  - If there is a matching name, then that element
    definition is used as the context element for
    the node's content, and the current particle position
    is in front of the first particle.
  - If there isn't, then the inference engine creates
    a new element definition, and content is none
- When the target element is inferred in a new element
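The document-element lookup can be sketched as below. The dict keyed by qualified name, the `{"name", "particles"}` shape of a definition, and the function name are all hypothetical; the returned flag plays the role of the isNew flag used in the particle inference rules.

```python
from typing import Dict, Tuple


def find_document_element(global_elements: Dict[str, dict], qname: str) -> Tuple[dict, bool]:
    """Return (element definition, is_new) for the document element."""
    existing = global_elements.get(qname)
    if existing is not None:
        # Matching name: reuse it; particle evaluation starts in front of its first particle.
        return existing, False
    # No candidate: create a new global element definition with no content yet.
    created = {"name": qname, "particles": []}
    global_elements[qname] = created
    return created, True
```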
**** Particle inference

IMPORTANT: Here I tried to formalize the inference, but it is
Target {particle} to add:
    isNew  -> <xs:element name={name}> ... </xs:element>
    !isNew -> <xs:element name={name} minOccurs="0"> ... </xs:element>
    // define complexType and add {particle} to .Particle
    processContent(ct.Particle, isNew)

empty definition (no content model, no particle)
    // -> add xs:element name={name} minOccurs="0" to .Particle
    -> processContent(ct.Particle, isNew)

    -> makeComplexContent()

complex content / extension
    -> processContent(cce.Particle, isNew)

complex content / restriction
    -> processContent(ccr.Particle, isNew)

    -> processContent(ct.Particle, isNew)

Change to a complexType which has complex content with mixed="true"
and an extension. Discard the simple type information. Add {particle}
to the extension's .Particle.
processContent(Particle particle, bool isNew)
    if particle is either empty or sequence
        processSequential(particle, 0, false, isNew)
    else if particle is sequence of choices
        processLax(particle, 0)
processSequential(Sequence particle, int index, bool consumed, bool isNew)
    if particle.Count <= index
        -> appendSequential(particle, isNew)

    if (particle[index] has the same name)
        -> if (consumed) then particle[index].maxOccurs = inf.
           InferElement(particle[index])
           processSequential(particle, index, true, isNew)

        particle[index].minOccurs = 0.
        processSequential(particle, index + 1, false, isNew)

        particle = toSequenceOfChoice(particle)
        processLax(particle, index)
processLax(choice, index)
    foreach (element el in choice.Items)
        if (el has the same name)

    processLax(choice, index + 1)
appendSequential(particle, isNew)
    if (particle is empty)
        make particle a sequence
        sequence.Items.Add(InferElement(null))

    choice.Items.Add(InferElement(null))
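The sequential/lax pseudocode above can be turned into a runnable sketch, with several assumptions stated up front. Some branch conditions are not visible in the pseudocode, so this sketch assumes that a non-matching name skips forward (making unconsumed particles optional) and that a name matching an *earlier* particle (an out-of-order sibling) triggers the switch to lax mode. The "sequence of choices" is simplified to a single choice assumed to be `minOccurs="0" maxOccurs="unbounded"`. `ElementParticle`, `Group`, and the function names are hypothetical, children are plain names, and recursive InferElement on each child's own content is omitted.

```python
from typing import List, Optional

UNBOUNDED = None  # stands for maxOccurs="unbounded" in this sketch


class ElementParticle:
    def __init__(self, name: str, min_occurs: int = 1, max_occurs: Optional[int] = 1):
        self.name = name
        self.min_occurs = min_occurs
        self.max_occurs = max_occurs


class Group:
    """A sequence, or a lax choice assumed to repeat without bound."""

    def __init__(self, kind: str, items: List[ElementParticle]):
        self.kind = kind  # "sequence" or "choice"
        self.items = items


def process_content(group: Group, children: List[str], is_new: bool) -> Group:
    if group.kind == "sequence":
        return process_sequential(group, children, is_new)
    return process_lax(group, children)


def process_sequential(seq: Group, children: List[str], is_new: bool) -> Group:
    index, consumed = 0, False
    for pos, name in enumerate(children):
        while True:
            if index >= len(seq.items):
                # appendSequential: required only when the parent element is new.
                seq.items.append(ElementParticle(name, 1 if is_new else 0))
                consumed = True
                break
            current = seq.items[index]
            if current.name == name:
                if consumed:
                    current.max_occurs = UNBOUNDED  # repeated sibling
                consumed = True
                break
            if any(p.name == name for p in seq.items[:index]):
                # Out-of-order sibling (assumed trigger): fall back to lax mode.
                return process_lax(to_choice(seq), children[pos:])
            if not consumed:
                current.min_occurs = 0  # skipped by this instance
            index, consumed = index + 1, False
    # Particles this instance never reached are optional as well.
    for i, particle in enumerate(seq.items):
        if i > index or (i == index and not consumed):
            particle.min_occurs = 0
    return seq


def to_choice(seq: Group) -> Group:
    """Simplified toSequenceOfChoice: one repeatable choice of the same elements."""
    return Group("choice", [ElementParticle(p.name) for p in seq.items])


def process_lax(choice: Group, children: List[str]) -> Group:
    for name in children:
        if not any(p.name == name for p in choice.items):
            choice.items.append(ElementParticle(name))  # add a new choice item
    return choice
```

Feeding `["a", "b"]` to an empty sequence of a new element gives `a, b`; a later instance `["a", "a", "c"]` turns this into `a{1,unbounded}, b?, c?`, and an instance `["c", "a"]` then forces the lax choice of `a | b | c`.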
*** evaluating text content

When text content occurs, it must be accepted as simple content.

- If the content type was textOnly, then "type relaxation"
  happens (described later).
- If the content type was already mixed, then the text is skipped.
- If the content type was elementOnly, then the content type
  becomes mixed and the text is then skipped.
- If the content type was empty, then its content type
  becomes textOnly and the text is then skipped. The type is
  xs:string (no type promotion happens, since an empty value
  cannot be accepted by any other type handled in this design).
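The text-content rules map onto a similar transition function. This sketch is hypothetical in its names (`ContentType`, `on_text`) and leaves type relaxation to a caller-supplied `relax(existing_type, text)` callable, since relaxation is described separately; representing "no simple type" as `None` is also an assumption.

```python
from enum import Enum
from typing import Callable, Optional, Tuple


class ContentType(Enum):
    EMPTY = "empty"
    ELEMENT_ONLY = "elementOnly"
    TEXT_ONLY = "textOnly"
    MIXED = "mixed"


def on_text(current: ContentType, simple_type: Optional[str], text: str,
            relax: Callable[[Optional[str], str], str]) -> Tuple[ContentType, Optional[str]]:
    """Return (content type, simple type) after a text node is seen."""
    if current is ContentType.TEXT_ONLY:
        # Only textOnly content triggers type relaxation, and it always does.
        return current, relax(simple_type, text)
    if current is ContentType.ELEMENT_ONLY:
        # Becomes mixed; the text value itself is not typed.
        return ContentType.MIXED, None
    if current is ContentType.EMPTY:
        # Becomes textOnly with xs:string; no type promotion from empty.
        return ContentType.TEXT_ONLY, "xs:string"
    # Already mixed: the text is skipped.
    return current, simple_type
```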
(Actually, inference works from the schema information as written,
not from post-compilation information.)

Note that type relaxation happens only when the content is inferred as
textOnly, and in that case it always occurs.

All data types are inferred from string values; either element content
*** primitive type inference

When a string is being evaluated as a typed (xs:*) value, it is
tried against several types:

- First, it is evaluated as xs:boolean; only "true" and "false" are
  accepted ("1" and "0" are not treated as boolean).
- Next, it is parsed as an integer. If that succeeds, its value
  range is examined against unsignedByte, byte, unsignedShort,
  short, unsignedInt, int, unsignedLong, long, and integer, in
  that order.
- If it was not an integer, then it is evaluated as a float
  number, as a double number, and then as a decimal number
- Next, it is examined as xs:dateTime, xs:duration and
  related schema types.
- If it did not match any kind of predefined type, then
  xs:string is inferred. No other string-based types (such
  as xs:token) are inferred.
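The evaluation order above can be sketched as follows. Assumptions: the lexical checks use hand-written regular expressions rather than any .NET parser, "float" versus "double" is decided by value range only (precision is ignored), and the xs:dateTime/xs:duration step is omitted. `infer_primitive_type` is a hypothetical name.

```python
import math
import re

# Integer ranges, tried in the order the text lists them (inclusive bounds).
INTEGER_TYPES = [
    ("xs:unsignedByte", 0, 2**8 - 1),
    ("xs:byte", -2**7, 2**7 - 1),
    ("xs:unsignedShort", 0, 2**16 - 1),
    ("xs:short", -2**15, 2**15 - 1),
    ("xs:unsignedInt", 0, 2**32 - 1),
    ("xs:int", -2**31, 2**31 - 1),
    ("xs:unsignedLong", 0, 2**64 - 1),
    ("xs:long", -2**63, 2**63 - 1),
]

INTEGER_RE = re.compile(r"[+-]?[0-9]+")
DECIMAL_RE = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")
FLOAT_RE = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)([eE][+-]?[0-9]+)?|-?INF|NaN")
FLOAT32_MAX = 3.4028234663852886e38


def infer_primitive_type(value: str) -> str:
    s = value.strip()
    if s in ("true", "false"):  # "1" and "0" are deliberately not boolean
        return "xs:boolean"
    if INTEGER_RE.fullmatch(s):
        n = int(s)
        for name, low, high in INTEGER_TYPES:
            if low <= n <= high:
                return name
        return "xs:integer"
    if FLOAT_RE.fullmatch(s):
        x = float(s)  # Python returns +/-inf on overflow instead of raising
        if s in ("INF", "-INF", "NaN") or abs(x) <= FLOAT32_MAX:
            return "xs:float"
        if not math.isinf(x):
            return "xs:double"
        if DECIMAL_RE.fullmatch(s):
            return "xs:decimal"
    # xs:dateTime / xs:duration checks are omitted in this sketch.
    return "xs:string"
```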
When a string value has to be accepted by an existing type, the type
might have to change to accept it.

- xs:int cannot accept "abc".
- Facet violations (such as a string with maxLength="3" rejecting
  "abcd") do not occur: facets are not created anyway and thus are
  not supported by this inference engine.
- 12345 is not acceptable for xs:unsignedByte, but acceptable
Here, the new string value is inferred into a simpleType, and then
the processor computes the most specific common type between
the existing type and the newly inferred type.
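A "most specific common type" computation for the integer family might look like the sketch below. It is an assumption-laden illustration, not the engine's algorithm: "common type" is taken to mean the first type, in the inference order, whose value range covers both input ranges. Widening across numeric families (for example int to xs:double) and date/time types are not modeled; every other combination falls back to xs:string. `common_type` is a hypothetical name, and the sketch relies on dicts preserving insertion order (Python 3.7+).

```python
# Integer value ranges, in the order used by primitive type inference.
INTEGER_RANGES = {
    "xs:unsignedByte": (0, 2**8 - 1),
    "xs:byte": (-2**7, 2**7 - 1),
    "xs:unsignedShort": (0, 2**16 - 1),
    "xs:short": (-2**15, 2**15 - 1),
    "xs:unsignedInt": (0, 2**32 - 1),
    "xs:int": (-2**31, 2**31 - 1),
    "xs:unsignedLong": (0, 2**64 - 1),
    "xs:long": (-2**63, 2**63 - 1),
}


def common_type(existing: str, inferred: str) -> str:
    """Most specific type accepting every value of both types (sketch)."""
    if existing == inferred:
        return existing
    family = set(INTEGER_RANGES) | {"xs:integer"}
    if existing in family and inferred in family:
        if "xs:integer" in (existing, inferred):
            return "xs:integer"
        low = min(INTEGER_RANGES[existing][0], INTEGER_RANGES[inferred][0])
        high = max(INTEGER_RANGES[existing][1], INTEGER_RANGES[inferred][1])
        for name, (lo, hi) in INTEGER_RANGES.items():
            if lo <= low and high <= hi:
                return name
        return "xs:integer"
    # Not modeled: numeric widening across families, date/time types, etc.
    return "xs:string"
```

With this reading, an existing xs:unsignedByte meeting "12345" (inferred as xs:unsignedShort) relaxes to xs:unsignedShort, while xs:unsignedByte and xs:byte meet at xs:short because neither byte type covers both ranges.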