“XML is text, but isn’t meant to be read”
Number three in the World Wide Web Consortium’s (W3C) “XML in 10 points” [W3C, 2001] pinpoints a main feature, and a main problem, of XML languages: they describe the contained information only in terms of its structure and keep it separate from the design. To present this information, for example as a web page or in print, XML documents have to be transformed into an appropriate form. XSL is a concept for expressing stylesheets that can transform XML documents. This paper provides an overview of the main concepts of transforming XML with XSL, describes another representative of generic markup, LaTeX, and shows how these concepts can be brought together in practice.
Contents
1 “XML is text, but isn’t meant to be read”
2 XSL
2.1 XSLT
2.2 XPath
2.3 XSL-FO
2.4 CSS - An Alternative To XSL?
3 LaTeX
4 Joining All Together
5 Conclusion
A Appendix
1 “XML is text, but isn’t meant to be read”
Number three in the World Wide Web Consortium’s (W3C) “XML in 10 points” [W3C, 2001] pinpoints a main feature, and a main problem, of XML languages: they describe the contained information only in terms of its structure and keep it separate from the design. To present this information, for example as a web page or in print, XML documents have to be transformed into an appropriate form.
XSL is a concept for expressing stylesheets that can transform XML documents. This term paper provides an overview of the main concepts of transforming XML with XSL, describes another representative of generic markup, LaTeX, and shows how these concepts can be brought together in practice.
2 XSL
The Extensible Stylesheet Language (XSL) is a family of languages, specified by the W3C, for defining XML document transformations. It consists of three parts:
- XSL Transformations (XSLT): a language for transforming XML documents
- XML Path Language (XPath): an expression language used to access or refer to nodes of XML documents
- XSL Formatting Objects (XSL-FO): a language for defining formatting semantics
2.1 XSLT
XSL Transformations (XSLT) is a language for transforming XML documents into another form such as HTML, plain text, LaTeX, or a different XML language. Every transformation process involves three documents:
- the source document, which has to be an XML document and contains the information
- the stylesheet, which contains the instructions describing how the source document is to be transformed, and
- the result document, which is generated in the transformation process.
illustration not visible in this excerpt
Figure 1: Interrelation in XSLT process
The interrelation between these three documents is shown in figure 1. An XSLT processor uses the rules of the stylesheet to transform the source document into the resulting document.
Stylesheets contain various commands, sorted into three groups according to the stylesheet hierarchy (cf. [Bongers, 2004], p. 32):
- Root elements act as the root of the stylesheet. There are only two possible root elements, xsl:stylesheet and xsl:transform, which are used synonymously, and a stylesheet contains exactly one of them.
- Top-level elements are directly subordinate to the root element. XSLT 2.0 contains 16 top-level elements such as templates, function declarations, parameters, and formatting rules.
- Instructions are subordinate, non-global commands such as template calls, conditions, and loops. XSLT 2.0 knows about 31 elements of this group.
The following example illustrates this approach.
First, we assume a very simple source document (see figure 2). As one can see, it is sufficient for the source to be well-formed; it does not have to conform to a specific DTD or XML Schema.
illustration not visible in this excerpt
Figure 3: example.xsl
The second document needed is the stylesheet (see figure 3).
It consists of two kinds of tags: XSL commands, marked by the XML namespace prefix xsl:, and literal result elements, which are the tags without xsl: in the example. While XSL commands define actions to be executed by the processor, the literal result elements are passed on unchanged into the result document. The following passage takes a closer look at the important parts of the example.
<xsl:stylesheet (...)> represents the root element of the stylesheet. Its attributes are the XSLT version (here: 2.0) and the XML namespace declaration for XSL.
<xsl:template match="/"> is the most important top-level element. Normally there is more than one template per stylesheet, but that is not necessary in this case. The attribute match="/" is an XPath pattern that declares when the template should be applied (see section 2.2). Here it matches “/”, the document root of the source (which is the top level of every XML document and encloses the whole document). This gives the processor its starting point. It instantiates the template in the result document and writes literal result elements such as <html> without changes.
When it reaches the instruction <xsl:value-of select="shout"/>, the processor executes it at exactly this point. xsl:value-of generates text taken from the source. This text results from evaluating the XPath expression in the select attribute, which tells the processor to take the value of the element called <shout> and to put it into the result. The processor then writes out the remaining literal result elements and finishes the transformation. You can see the result in figure 4.
illustration not visible in this excerpt
Figure 4: example.html
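To make the processing steps concrete, the following Python sketch imitates by hand what an XSLT processor does for a template like the one described. It is not an XSLT processor and does not reproduce the figures of this paper: the source document, the <body> and <h1> elements, and the function name transform are assumptions chosen purely for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical source document (an assumption, not the paper's figure 2).
SOURCE = "<shout>Hello World!</shout>"


def transform(source_xml: str) -> str:
    """Imitate a template with match="/" containing <xsl:value-of select="shout"/>."""
    document_element = ET.fromstring(source_xml)

    # Seen from the document root "/", select="shout" addresses a child
    # element named <shout>, which here can only be the document element.
    value = ""
    if document_element.tag == "shout":
        # The value of an element is the concatenation of all text inside it.
        value = "".join(document_element.itertext())

    # Literal result elements are written out unchanged around the value;
    # <body> and <h1> are assumed here for the sake of the example.
    return f"<html><body><h1>{escape(value)}</h1></body></html>"


print(transform(SOURCE))  # <html><body><h1>Hello World!</h1></body></html>
```

If the source contains no <shout> element at the expected place, the sketch writes the literal result elements with an empty value, which mirrors how xsl:value-of yields an empty string for an empty selection.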
As one can see, the example stylesheet contains a representative of each of the XSLT command groups: the root element xsl:stylesheet, the top-level element xsl:template, and the instruction xsl:value-of. Although this is a very simple example, the reader should not underestimate XSLT. It is very powerful and can, for example, generate whole tables of contents, SVG graphics, or even LaTeX files.
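The grouping into root element, top-level elements, instructions, and literal result elements can also be checked mechanically. The Python sketch below sorts the elements of a stylesheet by their position in the hierarchy. The stylesheet string is a hypothetical stand-in written for this sketch, not the paper's figure 3, and the function name classify is likewise an assumption. The sketch simplifies matters by treating every direct child of the root as a top-level element.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
XSL_PREFIX = "{" + XSL_NS + "}"

# Hypothetical stylesheet (an assumption, not the paper's figure 3).
STYLESHEET = """\
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <body><xsl:value-of select="shout"/></body>
    </html>
  </xsl:template>
</xsl:stylesheet>
"""


def display_name(element):
    """Return 'xsl:name' for XSL commands and the plain tag otherwise."""
    if element.tag.startswith(XSL_PREFIX):
        return "xsl:" + element.tag[len(XSL_PREFIX):]
    return element.tag


def classify(stylesheet_xml):
    """Sort stylesheet elements into the groups described in the text."""
    root = ET.fromstring(stylesheet_xml)
    groups = {"root": [display_name(root)], "top-level": [],
              "instruction": [], "literal": []}
    for top in root:  # direct children of the root element
        groups["top-level"].append(display_name(top))
        for element in top.iter():  # iter() starts with top itself
            if element is top:
                continue
            is_xsl = element.tag.startswith(XSL_PREFIX)
            groups["instruction" if is_xsl else "literal"].append(
                display_name(element))
    return groups


for group, names in classify(STYLESHEET).items():
    print(group, names)
```

For the stand-in stylesheet, each of the three command groups receives exactly one entry, and <html> and <body> are reported as literal result elements.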
2.2 XPath
XPath is a subset of the XML Query Language (XQuery) and provides the vocabulary for addressing parts of an XML document. XPath uses a tree-based view of the document, similar to a UNIX file system. Its syntax resembles that of URIs, with elements or attribute qualifiers separated by slashes (/). So-called XPath location steps consist of two components, axis specifier and node test: axisspecifier::nodetest.
The axis specifier indicates the direction and range of movement in the document tree; these directions are called axes. XPath knows 13 axes with various directions and ranges, for example child::, which is the specifier for the children of an element, or descendant::, which selects all descendants of an element down to the leaves. More axis specifiers are shown in table 1 in appendix A.
The node test selects a subset of the nodes defined by the axis specifier, keeping only the nodes that satisfy its criteria. Examples of node tests are text(), which selects text nodes, comment(), which selects comment nodes, and an explicit name such as nodename, which selects every node on the axis that has this name.
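The difference between the child and descendant axes can be tried out with Python's standard library, as sketched below. Note the assumptions: xml.etree.ElementTree supports only a limited subset of XPath in abbreviated syntax, so the full axisspecifier::nodetest form and node tests such as text() are not available there, and the sample document is hypothetical. The comments name the unabbreviated location step that each abbreviated path corresponds to in this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document used only for illustration.
BOOK = ET.fromstring(
    "<book>"
    "<chapter><title>One</title>"
    "<section><title>One.A</title></section></chapter>"
    "<chapter><title>Two</title></chapter>"
    "</book>"
)

# child::chapter, abbreviated "chapter": only direct children named chapter.
chapters = BOOK.findall("chapter")

# child::*, abbreviated "*": every element child, whatever its name.
any_children = BOOK.findall("*")

# child::chapter/child::title: two location steps separated by a slash.
chapter_titles = [t.text for t in BOOK.findall("chapter/title")]

# ".//title" selects the same elements as descendant::title here:
# every <title> below the context node, at any depth.
all_titles = [t.text for t in BOOK.findall(".//title")]

print(len(chapters))    # 2
print(chapter_titles)   # ['One', 'Two']
print(all_titles)       # ['One', 'One.A', 'Two']
```

The title of the nested section appears only in the descendant query, because the child axis stops one level below the context node.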
[...]