* INCOMPLETE *

* XML Schema Inference Rules

** Requirements

    XmlReader:
    XmlSchemaSet: only one that was generated by this utility class.
        See the particle inference section described later.
        Actually, the MS implementation does not check this input
        sufficiently, so it accepts more than it expects.

*** Allowed schema components

    Before inferring particles that are merged with the premised
    particles in the XmlSchemaSet, we have to know what is expected
    and what is not:

** Processing model

    First, the parameter XmlSchemaSet is compiled[*1] and interpreted
    into its internal schema representation, which is then used to
    examine the XmlReader input. The resulting XmlSchemaSet is the
    same as the input XmlSchemaSet.

    [*1] FIXME: this design might change. The XmlSchemaSet is compiled
    and , because 1) it might contain XmlSchemaInclude items. So it
    won't be possible to process inference inside the input schema set.
    However, reusing the input avoids some annoyances, such as having
    to preserve elementFormDefault etc.

    Second, the XmlReader is moved to content (the document element),
    and "element inference" starts from there (described later).

    The resulting XmlSchemaSet keeps the original XmlSchemas in itself.
    For example, it keeps elementFormDefault and attributeFormDefault.

    Basically, it processes the XmlReader against the existing
    XmlSchemaSet; it does not "merge" two XmlSchemaSets, one of which
    is newly inferred from this XmlReader, because the XmlReader has
    to infer sequential nodes (siblings) anyway.

    Once the element definition is determined (or created), any other
    branches in the schema are ignored.

** Attributes

*** attribute component definitions and references.

**** ignored attributes

    xsi:type, xsi:schemaLocation and xsi:noNamespaceSchemaLocation
    attributes are ignored.

**** special attributes

    If xsi:nil exists on an element, the element's content is not
    handled, while its attributes are handled.

    The schema for xml:* attributes is predetermined; that namespace
    has a fixed schema.
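The ignored and special attribute rules above can be sketched as a small classifier. This is only an illustration in Python using the standard `xml.etree.ElementTree` module, not the utility class itself; the function name `classify_attributes` and the category labels ("ignored", "nil", "xml", "ordinary") are hypothetical names introduced here.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# The three xsi attributes that the notes say are ignored.
IGNORED_XSI = {"type", "schemaLocation", "noNamespaceSchemaLocation"}


def split_name(key):
    """Split an ElementTree attribute key '{ns}local' into (ns, local)."""
    if key.startswith("{"):
        namespace, local = key[1:].split("}", 1)
        return namespace, local
    return "", key


def classify_attributes(element):
    """Return (namespace, local name, category) for each attribute.

    Categories are hypothetical labels for this sketch:
      "ignored"  - xsi:type, xsi:schemaLocation, xsi:noNamespaceSchemaLocation
      "nil"      - xsi:nil (content is then skipped, attributes still handled)
      "xml"      - xml:* attributes, covered by a fixed schema
      "ordinary" - everything else (assumption: handled by the later rules)
    """
    result = []
    for key in element.attrib:
        namespace, local = split_name(key)
        if namespace == XSI_NS and local in IGNORED_XSI:
            category = "ignored"
        elif namespace == XSI_NS and local == "nil":
            category = "nil"
        elif namespace == XML_NS:
            category = "xml"
        else:
            category = "ordinary"
        result.append((namespace, local, category))
    return result
```

A caller would drop the "ignored" entries, skip the element content when a "nil" entry is present, and pass only the "ordinary" entries on to attribute inference.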
**** namespaced attributes

    Miscellaneous attributes that reside in a certain namespace are
    referenced as

**** local attributes

    Miscellaneous attributes are represented as

*** attribute occurrence

    When defining a complexType for a newly created element, the
    attribute can be set as "required". Otherwise, it must be set as
    "optional".

    For every occurrence of an element instance, each known attribute
    is tested for existence; if it does not exist, it must be set as
    "optional".

*** attribute value types

    FIXME: need to describe the relaxation of attribute value types.

** Content model inference

*** inference processing model

    The content model consists of two parts:

    - content type : empty | elementOnly | textOnly | mixed
    - particle : sequence | choice | all | groupRef

    On each reader.Read(), the node is first "tested" against the
    current schema content model. If the current node on the XmlReader
    is not acceptable, then "content model expansion" happens.

*** evaluating element

    When an element occurs, it must be accepted as a particle.

    First, the content type must be examined:

    Next, the content particle must be evaluated. Given the limitations
    on the input XmlSchemaSet, only the patterns listed here can occur:

    - empty content
    - simple content
    - sequence (of element particles)
    - choice of sequences

**** Reader progress

    Every element is tested against the current element candidates.

**** Particle inference

    IMPORTANT: Here I tried to formalize the inference, but these
    notes are incomplete.

    Target
        {particle} to add:
            isNew -> ...
            !isNew -> ...
    no definition
        // define complexType and add {particle} to .Particle
        toComplexType()
        processContent(ct.Particle, isNew)
    simpleType
        makeComplexContent()
    complexType
        empty definition (no content model, no particle)
            // -> add xs:element name={name} minOccurs="0" to .Particle
            -> processContent(ct.Particle, isNew)
        simple content
            -> makeComplexContent()
        complex content / extension
            -> processContent(cce.Particle, isNew)
        complex content / restriction
            -> processContent(ccr.Particle, isNew)
        .Particle
            -> processContent(ct.Particle, isNew)

    makeComplexContent()
        change to complexType which has complex content mixed="true"
        and extension. Discard simple type information.
        Add {particle} to extension's .Particle.

    processContent(Particle particle, isNew)
        if particle is either empty or sequence
            processSequential(particle, 0, false, isNew)
        else if particle is sequence of choices
            processLax(particle, 0)
        else
            error.

    processSequential(Sequence particle, int index, bool consumed, bool isNew)
        particle.Count <= index
            -> appendSequential(particle, isNew)
        sequence
            if (particle[index] has the same name)
                -> if (consumed) then sequence[index].maxOccurs = inf.
                   InferElement (sequence[index])
                   processParticles(particle, index, true)
            else
                -> if (!consumed) sequence[index].minOccurs = 0.
                   processParticle(particle, index+1, false)
        else
            particle = toSequenceOfChoice(particle)
            processLax(particle, index)

    processLax(choice, index)
        foreach (element el in choice.Items)
            if (el has the same name)
                InferElement (el)
                processLax(choice, index + 1)
                return;
        appendLax(particle)

    appendSequential(particle)
        if (particle is empty)
            make particle as sequence
        sequence.Items.Add(InferElement(null))

    appendLax(choice)
        choice.Items.Add(InferElement(null))

*** evaluating text content

    When text content occurs, it must be accepted as simple content.
    (Actually, inference is done from non-post-compilation
    information.)

    Note that type relaxation happens only when the content is inferred
    as textOnly, and in that case it always occurs.
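The sequential branch of the particle inference notes above (processSequential and appendSequential) can be approximated with a short Python sketch. Since the notes are explicitly incomplete, several points here are assumptions rather than documented behavior: `ElementParticle` and `infer_sequence` are hypothetical names, an appended particle is given minOccurs=1 only when the definition is new, particles left unvisited at the end of the children are made optional, and an order violation is merely reported (the notes would then switch to the choice-based "lax" model, which is not sketched).

```python
import math
from dataclasses import dataclass


@dataclass
class ElementParticle:
    name: str
    min_occurs: int = 1
    max_occurs: float = 1  # math.inf stands for maxOccurs="unbounded"


def infer_sequence(sequence, child_names, is_new):
    """Relax `sequence` in place so it accepts `child_names` in order.

    Returns False when a child appears after a later particle was
    already reached, i.e. when a plain sequence cannot accept the input.
    """
    index, consumed = 0, False
    for name in child_names:
        # Skip particles that do not match the current child.
        while index < len(sequence) and sequence[index].name != name:
            if any(p.name == name for p in sequence[:index]):
                return False  # order violation: needs the lax model
            if not consumed:
                sequence[index].min_occurs = 0  # skipped without a match
            index += 1
            consumed = False
        if index == len(sequence):
            # appendSequential: a new particle at the end (assumption:
            # required only when the whole definition is new).
            sequence.append(ElementParticle(name, 1 if is_new else 0))
        elif consumed:
            sequence[index].max_occurs = math.inf  # repeated element
        consumed = True
    # Assumption: particles never reached by this instance become optional.
    for particle in sequence[index + 1 if consumed else index:]:
        particle.min_occurs = 0
    return True
```

Calling `infer_sequence` once per element instance, with `is_new=True` only for the first instance, accumulates the relaxations across instances.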
** Type inference

    All data types are inferred from a string value: either element
    content or an attribute value.

*** primitive type inference

    When a string is being evaluated as an xs:blahblah typed value, it
    is tried against several types.

*** type relaxation

    When a string value has to be accepted by an existing type, the
    type might have to change in order to accept it. For example:

    Here, the new string value is inferred into a simpleType, and then
    the processor computes the most specific common type between the
    existing type and the newly inferred type.
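The idea of "most specific common type" can be illustrated with a toy Python sketch. The notes do not list which types are tried or how they relate, so the four-type hierarchy below, the lexical patterns, and the names `infer_primitive` and `relax` are all assumptions made for illustration. As a simplification, xs:boolean here accepts only "true" and "false".

```python
import re

# Hypothetical, reduced type hierarchy: each type maps to its parent.
PARENT = {
    "xs:integer": "xs:decimal",
    "xs:decimal": "xs:string",
    "xs:boolean": "xs:string",
    "xs:string": None,
}


def infer_primitive(value):
    """Try the string against several types, most specific first."""
    text = value.strip()
    if re.fullmatch(r"[+-]?[0-9]+", text):
        return "xs:integer"
    if re.fullmatch(r"[+-]?([0-9]+\.[0-9]*|\.[0-9]+)", text):
        return "xs:decimal"
    if text in ("true", "false"):
        return "xs:boolean"
    return "xs:string"


def ancestors(type_name):
    """Return the type followed by its parents up to xs:string."""
    chain = []
    while type_name is not None:
        chain.append(type_name)
        type_name = PARENT[type_name]
    return chain


def relax(existing, value):
    """Return the most specific type accepting both `existing` and `value`."""
    candidates = ancestors(infer_primitive(value))
    # xs:string is the root of every chain, so a match always exists.
    return next(t for t in ancestors(existing) if t in candidates)
```

For instance, an existing xs:integer that meets the value "3.14" relaxes to xs:decimal, while meeting "true" relaxes it all the way to xs:string.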