Friday, June 26, 2009

Exercise 5 (a)

Explain the terms schema component, ur-type, particle, initial value, normalized value, assessment, strict assessment, lax assessment, valid restriction, and PSVI in the context of XML Schema.

Schema components are the building blocks that make up a schema. There are 13 kinds: element and attribute declarations, simple and complex type definitions (primary components); attribute group definitions, model group definitions, identity-constraint definitions and notation declarations (secondary components); and annotations, wildcards, model groups, particles and attribute uses (helper components).

An information item is an abstract representation of some part of an XML document (for instance, an element or an attribute). Together, the information items form the XML document's information set, or infoset. Validation is a relation between information items and schema components.
Ur-type
is either (1) a complex type definition that is present in every XML schema, serves as the root of the type definition hierarchy and is named "anyType" in the XML Schema namespace, or (2) a simple type definition that is a restriction of the ur-type definition (named "anySimpleType"). The simple ur-type has an unconstrained lexical space, and its value space includes the union of the value spaces of all built-in primitive datatypes. The mapping from lexical space to value space is unspecified for items whose type is the simple ur-type.
Particle
is a term in the grammar for element content, consisting of either an element declaration, a wildcard or a model group, together with occurrence constraints (minimum and maximum number of occurrences).
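
The pairing of a term with occurrence constraints can be sketched as a small data structure. This is only an illustrative model: the class Particle and its method allows are hypothetical names, not part of any schema API, and the term is reduced to a plain string.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Particle:
    """A term plus occurrence constraints (hypothetical model, not a real schema API)."""
    term: str                       # stands in for an element declaration, wildcard or model group
    min_occurs: int = 1             # XML Schema default for minOccurs
    max_occurs: Optional[int] = 1   # XML Schema default for maxOccurs; None models "unbounded"

    def allows(self, count: int) -> bool:
        """Return True if `count` occurrences satisfy the occurrence constraints."""
        if count < self.min_occurs:
            return False
        return self.max_occurs is None or count <= self.max_occurs


# Corresponds to <xs:element name="lehrveranstaltung" minOccurs="0" maxOccurs="unbounded"/>
courses = Particle("lehrveranstaltung", min_occurs=0, max_occurs=None)
# Corresponds to <xs:element name="name"/> (both constraints default to 1)
name = Particle("name")
```

With these definitions, courses.allows(0) holds while name.allows(0) does not, because the second particle requires exactly one occurrence.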


Initial value, normalized value:

The initial value of an element information item is the string composed of the [character code] of each character information item in the [children] of that element information item.

The normalized value of an element or attribute information item is an initial value whose white space, if any, has been normalized according to the value of the whiteSpace facet of the simple type definition used in its validation.
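
The step from initial value to normalized value can be sketched for the three values of the whiteSpace facet (preserve, replace, collapse). The function name normalize_whitespace is an assumption made for this illustration; a real schema processor performs this step internally.

```python
import re


def normalize_whitespace(initial_value: str, white_space: str) -> str:
    """Normalize an initial value according to a whiteSpace facet value."""
    if white_space == "preserve":
        # preserve: the value is left unchanged
        return initial_value
    # replace: every tab, line feed and carriage return becomes a space
    replaced = re.sub(r"[\t\n\r]", " ", initial_value)
    if white_space == "replace":
        return replaced
    if white_space == "collapse":
        # collapse: after "replace", runs of spaces shrink to one space
        # and leading and trailing spaces are removed
        return re.sub(r" +", " ", replaced).strip(" ")
    raise ValueError("unknown whiteSpace value: " + white_space)


initial = "  Semantic\t\nWeb  "
for mode in ("preserve", "replace", "collapse"):
    print(mode, repr(normalize_whitespace(initial, mode)))
```

For a type such as xs:token the facet is collapse, so the example value would be normalized to "Semantic Web" before it is checked against the type.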


The term absent is used as a distinguished property value denoting absence; in other words, a property whose value is absent has no value at all.

Context-determined declarations are the associations established during validation between the element and attribute information items among the [children] and [attributes] on the one hand, and element and attribute declarations on the other. The term [children] here denotes the ordered list of information items consisting of element, processing instruction, unexpanded entity reference, character, and comment information items, one for each element, processing instruction, reference to an unprocessed external entity, data character, and comment appearing immediately within the current element. If the element is empty, this list has no members. The term [attributes] denotes the unordered set of all attribute information items of a specific element (namespace declarations are not included).

Assessment is the process of determining whether the information items comply with the constraints stated in the corresponding schema components (local schema-validity), synthesizing an overall validation outcome for each item, and augmenting the XML document infoset with the results (the augmented infoset is the PSVI).

Strict assessment
an element information item is said to be strictly assessed if (1) either (1.1) a non-absent element declaration is known for it and its validity with respect to that declaration has been evaluated, or (1.2) a non-absent type definition is known for it and its validity with respect to that type definition has been evaluated; and (2) the schema-validity of its attribute and child element information items has been assessed.
Lax assessment
if neither clause 1.1 nor clause 1.2 is satisfied and the context-determined declaration of the element information item is not skip, the item is said to be laxly assessed by validating it with respect to the ur-type definition.
PSVI stands for post-schema-validation infoset. It is obtained from the original XML document infoset by augmenting it during assessment with the results required by the corresponding schema, for example validity outcomes, type information, default values for attributes and elements, and normalized element and attribute values.

A valid restriction is a restriction of either a simple or a complex type for which the corresponding derivation constraint holds.


Derivation constraint for simple type

For a simple type definition (call it D, for derived) to be validly derived from a type definition (call it B, for base) given a subset of {extension, restriction, list, union} (of which only restriction is actually relevant), one of the following must hold:
(1) D and B are the same type definition; or
(2) both of the following hold:
(2.1) restriction is not in the subset (nor in the [final] of D's own base type definition), and
(2.2) one of the following holds: D's base type definition is B; D's base type definition is not the ur-type definition and is validly derived from B given the subset; D's [variety] is list or union and B is the simple ur-type definition; or B's [variety] is union and D is validly derived from a type definition in B's [member type definitions] given the subset.


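The property [variety] is one of atomic, list or union. Depending on its value, the simple type definition additionally has a primitive type definition (a built-in primitive simple type definition), an item type definition (a simple type definition) or member type definitions (a non-empty sequence of simple type definitions).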


Derivation constraint for complex type

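In simplified form: the base type must be a complex type definition whose [final] does not contain restriction; attributes that are required in the base type must also be required in the derived complex type definition; and for the content one of the following must hold: the base type is the ur-type definition; or the content type of the derived type is empty and the content type of the base type is either empty, or element-only or mixed with an emptiable particle; or both content types are element-only or mixed and the particle of the derived complex type definition is a valid restriction of the particle of the base type.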

Sunday, May 10, 2009

b. Find 2 documents on the HPI web server that are well-formed but not valid XML. Again, give a fragment of each document as well as the validity constraint that the document violates.

According to the XML specification: "An XML document is valid if it has an associated document type declaration and if the document complies with the constraints expressed in it."

1. http://kolleg.hpi.uni-potsdam.de/index.php?id=3927

2. http://kolleg.hpi.uni-potsdam.de/index.php?id=3723

Line 73, Column 104: ID "js-menu2" already defined
…:160px;VISIBILITY:hidden;"><div id="js-menu2"><

Line 72, Column 103: ID "js-menu2" first defined here

Both documents are well-formed but not valid. The validity constraint "ID" is violated: "A name must not appear more than once in an XML document as a value of this type; i.e., ID values must uniquely identify the elements which bear them."
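
The ID constraint quoted above can be approximated with a short check. This is a minimal sketch under two assumptions: the attribute named id is declared with type ID (as in the XHTML DTDs), and the fragment below is a made-up stand-in for the real pages, reusing only the value js-menu2 from the validator output. The function name duplicate_ids is hypothetical. ElementTree does not read the DTD, so this is not a real validation.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import List


def duplicate_ids(xml_text: str, id_attribute: str = "id") -> List[str]:
    """Return the values of `id_attribute` that occur on more than one element."""
    root = ET.fromstring(xml_text)
    counts = Counter(
        element.get(id_attribute)
        for element in root.iter()
        if element.get(id_attribute) is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)


# Hypothetical fragment: well-formed, but the ID value js-menu2 is used twice.
fragment = """
<div>
  <div id="js-menu1"><div id="js-menu2"/></div>
  <div id="js-menu2"/>
</div>
"""

print(duplicate_ids(fragment))
```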

Sunday, May 3, 2009

Exercise 1

a. Find 3 HTML documents on an HPI web server (*.hpi.uni-potsdam.de) that are not well-formed XML. For each document, give a fragment that violates the XML syntax as well as the XML syntax rule that is violated.

1. www.hpi.uni-potsdam.de/forschung/fachgebiete.html

<a href="personen/hpi_fellows_friends.html" onfocus="blurLink(this);">HPI-Fellows & Friends</a>

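Not well-formed. Violates production [14] (CharData): a literal ampersand in character data.
The & character is reserved for markup (it starts entity and character references).
It may appear literally only inside comments, processing instructions, or CDATA sections; elsewhere it has to be escaped as &amp;.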

2. www.dcl.hpi.uni-potsdam.de/teaching/ftsem

Line 70 Col 20
<p class="MsoHeader" style="text-align: center;" align="center"><span >Dipl.
Inf. Andreas Rasche<br> - no end tag for br

Production [39]: element ::= EmptyElemTag | STag content ETag

3. www.dcl.hpi.uni-potsdam.de/papers/

Line 159 Col 33
<a class="c4" href="http://www.dcl.hpi.uni-potsdam.de/papers/papers/Jahresbericht2007.pdf">
Jahresbericht 2007 der Gruppe Betriebssysteme und Middleware am
HPI</a> Potsdam, June 2008.</p> - no start tag for p

Production [39]: element ::= EmptyElemTag | STag content ETag
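
The three kinds of violations listed above can be reproduced with a non-validating parser. The following is a minimal sketch using Python's standard library; the helper name is_well_formed is an assumption, and the fragments are shortened versions of the ones quoted above, wrapped in a single root element where needed.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as a well-formed XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


fragments = {
    "unescaped ampersand": "<a>HPI-Fellows & Friends</a>",
    "escaped ampersand": "<a>HPI-Fellows &amp; Friends</a>",
    "br without end tag": "<p>Dipl. Inf. Andreas Rasche<br></p>",
    "empty-element br": "<p>Dipl. Inf. Andreas Rasche<br/></p>",
    "end tag without start tag": "<div><a>HPI</a> Potsdam, June 2008.</p></div>",
}

for label, text in fragments.items():
    print(label, "->", "well-formed" if is_well_formed(text) else "not well-formed")
```

The two corrected variants (escaped ampersand, empty-element br) show the smallest change that makes each fragment well-formed.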


Validated with http://validator.w3.org/

Exercise 1

c. Define a document type that can be used to describe a set of students for whom the name, the matriculation number, and the set of courses they attend are known. Also create a document that conforms to this type.

DTD

<?xml version="1.0" encoding="UTF-8"?>
<!-- A set of students; may be empty -->
<!ELEMENT studenten (student*)>
<!-- Each student has exactly one list of courses -->
<!ELEMENT student (lehrveranstaltungen)>
<!ATTLIST student
  matrikel-nr CDATA #REQUIRED
  name        CDATA #REQUIRED
>
<!ELEMENT lehrveranstaltungen (lehrveranstaltung*)>
<!ELEMENT lehrveranstaltung (#PCDATA)>

XML

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE studenten SYSTEM "studenten.dtd">
<studenten>
  <student name="Student 1" matrikel-nr="111111">
    <lehrveranstaltungen>
      <lehrveranstaltung>XML</lehrveranstaltung>
    </lehrveranstaltungen>
  </student>
  <student name="Student 2" matrikel-nr="222222">
    <lehrveranstaltungen>
      <lehrveranstaltung>XML</lehrveranstaltung>
      <lehrveranstaltung>Semantic Web</lehrveranstaltung>
    </lehrveranstaltungen>
  </student>
</studenten>
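
A document of this type can be read with a few lines of Python. This is only a sketch: the function name courses_by_student is hypothetical, the document is embedded as a string without its XML and DOCTYPE declarations, and ElementTree does not validate against the DTD, so the code checks nothing beyond well-formedness.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """\
<studenten>
  <student name="Student 1" matrikel-nr="111111">
    <lehrveranstaltungen>
      <lehrveranstaltung>XML</lehrveranstaltung>
    </lehrveranstaltungen>
  </student>
  <student name="Student 2" matrikel-nr="222222">
    <lehrveranstaltungen>
      <lehrveranstaltung>XML</lehrveranstaltung>
      <lehrveranstaltung>Semantic Web</lehrveranstaltung>
    </lehrveranstaltungen>
  </student>
</studenten>
"""


def courses_by_student(xml_text: str) -> dict:
    """Map each matriculation number to the list of attended courses."""
    root = ET.fromstring(xml_text)
    result = {}
    for student in root.findall("student"):
        # An empty lehrveranstaltung element has text None, hence the `or ""`
        courses = [(c.text or "").strip() for c in student.iter("lehrveranstaltung")]
        result[student.get("matrikel-nr")] = courses
    return result


print(courses_by_student(DOCUMENT))
```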


Sunday, April 26, 2009

This blog is used by a group of HPI students (Lars Blumberg, Vitaliy Kats and Daryna Bronnykova) to publish the results of the seminar exercises completed in the course "Datenorientiertes XML" at HPI (summer semester 2009).