Ashley Sheridan.co.uk

Using XML as a source document for other formats

At TMW I produced a PHP coding standards document, as such documents existed only for .NET and the front-end technologies.

Initially, this was only available on the internal wiki, but it then needed to be added to the TMW Tech Blog, which used Markdown. Not wanting to maintain two sets of documentation, and anticipating a need for other formats in the future, I set about putting the coding standards into XML and then writing some XSLT files to transform this into the required formats.

The source XML is fairly simple, just to meet the initial needs of the document, which was a series of sections containing small code excerpts and paragraphs of text. At the time of writing there is no DTD for this, but one will be added.
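As a rough illustration of the kind of structure described above, here is a hypothetical sketch of the source XML. The section element with its title attribute, the h2 and h3 headings, and the code element with its style attribute are taken from the templates later in this post; the root element name, the p element and all of the text content are placeholders I have assumed for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sketch only: "standards" and "p" are assumed names -->
<standards title="Example Title">
  <section title="Example Section">
    <h2>Example Heading</h2>
    <p>A paragraph of text with an inline <code>$example</code> snippet.</p>
    <!-- style="block" marks a multi-line excerpt rather than an inline one -->
    <code style="block">$example = 'placeholder';</code>
    <h3>Example Subheading</h3>
    <p>Another paragraph of text.</p>
  </section>
</standards>
```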

The first transformation document I put together was for the MediaWiki format. The bulk of the transformation here is done with XSLT mode attributes, which allow multiple templates to be defined for the same element in the source XML, and give control over which one is used when generating the rendered output.

<xsl:apply-templates mode="content"/>
...
<xsl:template match="h2" mode="content">
    <xsl:text>
=== </xsl:text><xsl:value-of select="text()"/><xsl:text> ===</xsl:text>
</xsl:template>
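To make the role of modes a little more concrete, here is a minimal, self-contained sketch rather than the original stylesheet. It assumes a source document containing section elements with a title attribute, and the outline mode name is made up for the example. The same element is matched by two templates, and the mode given on each apply-templates call decides which one runs.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Entry point: visit every section twice, once in each mode -->
  <xsl:template match="/">
    <xsl:apply-templates select="//section" mode="outline"/>
    <xsl:text>&#10;</xsl:text>
    <xsl:apply-templates select="//section" mode="content"/>
  </xsl:template>

  <!-- "outline" is a hypothetical mode: one bullet per section -->
  <xsl:template match="section" mode="outline">
    <xsl:text>* </xsl:text>
    <xsl:value-of select="@title"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Same element in content mode: rendered as a wiki-style heading -->
  <xsl:template match="section" mode="content">
    <xsl:text>== </xsl:text>
    <xsl:value-of select="@title"/>
    <xsl:text> ==&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Without the mode attributes, the processor would treat the two section templates as a conflict instead of as two separate renderings of the same element.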

One of the more interesting constructs in this XSL is the template responsible for rendering <code> sections. Within the source there are two types of these: block and inline. The XSL checks the style attribute on the node: if it is set to block it uses a <pre> tag, otherwise it uses an inline <code> tag, as both of these HTML tags are allowed within a MediaWiki page.

<xsl:template match="code" mode="content">
    <xsl:choose>
        <xsl:when test="@style = 'block'">
            <xsl:text>
</xsl:text>
            <pre>
                <xsl:value-of select="text()" disable-output-escaping="yes"/>
            </pre>
            <xsl:text>
</xsl:text>
        </xsl:when>
        <xsl:otherwise>
            <code><xsl:value-of select="text()" disable-output-escaping="yes"/></code>
        </xsl:otherwise>
    </xsl:choose>
</xsl:template>

The next two XSL documents I created were for Markdown and the GitHub flavour of Markdown. It's the latter I'll be explaining here; the only difference between the two is that the GitHub version specifies the language of each code snippet.

The first noticeable difference with this transformation is the addition of a table of contents at the top. MediaWiki doesn't need this, as the wiki software generates one automatically, but it was required for the Markdown. Each template for the table of contents is identified by the toc mode, which means the template is only applied in the table of contents context during rendering.

<xsl:template match="section" mode="toc">
    <xsl:text>
1. [</xsl:text>
    <xsl:value-of select="@title"/>](#<xsl:value-of select="translate(@title, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ ', 'abcdefghijklmnopqrstuvwxyz-')"/><xsl:text>)</xsl:text>
    <xsl:apply-templates select="h2" mode="toc"/>
    <xsl:apply-templates select="h3" mode="toc"/>
</xsl:template>

First, this matches every <section> in the source XML. The first <xsl:text> node outputs a new line followed by a non-indented 1. [, which opens the link text. The next line outputs the value of the title attribute followed by a closing ], then outputs the same title attribute within a pair of parentheses to create the Markdown link URL. The translate() call here converts the value of the attribute to lowercase and replaces spaces with hyphens. There are better functions for this, but limitations in the XSLT processor I was using meant that I could only use XSLT 1.0 functions.
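For comparison, here is a sketch of how the same link could be built with an XSLT 2.0 processor, where lower-case() is available. This is not something I could use at the time: xsltproc only supports XSLT 1.0, so this assumes a 2.0-capable processor such as Saxon, and the root template is only there to make the example self-contained.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Assumed entry point, added so the toc template is actually invoked -->
  <xsl:template match="/">
    <xsl:apply-templates select="//section" mode="toc"/>
  </xsl:template>

  <xsl:template match="section" mode="toc">
    <xsl:text>&#10;1. [</xsl:text>
    <xsl:value-of select="@title"/>
    <xsl:text>](#</xsl:text>
    <!-- lower-case() handles the case change; translate() only swaps spaces -->
    <xsl:value-of select="translate(lower-case(@title), ' ', '-')"/>
    <xsl:text>)</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

The main gain is that the alphabet no longer has to be spelled out twice, and lower-case() is not limited to the 26 unaccented letters listed in the translate() call.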

Because the table of contents links to anchor points in the document, it's necessary for the headings within the documents to be anchors, which they weren't in the MediaWiki rendering. This is done with templates like the following:

<xsl:template match="h2" mode="content">
    <xsl:text>
### </xsl:text>
    <xsl:variable name="h2"><xsl:value-of select="translate(text(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ ', 'abcdefghijklmnopqrstuvwxyz-')"/></xsl:variable>
    <a name="{$h2}"></a>
    <xsl:value-of select="text()"/>
</xsl:template>

Again, the same lowercase transformation is applied to the heading, this time to produce the name of an <a> element (Markdown doesn't allow linking to sections bearing an ID, so it still has to use old HTML anchors).

The other slightly odd part of the two Markdown XSL files is the templates for the main title and section headings. These use strings of = and - respectively on a new line immediately following the heading text. Because the strings need to be the same length as the heading text, and I was limited to XSLT 1.0 functions, I had to use the substring() function on an extremely long string of the appropriate character, with string-length() to limit the size. This is a cumbersome method, and it does mean the source string needs to be at least as long as the longest heading in the XML.
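The underline trick can be sketched roughly as follows. This is a simplified illustration rather than the exact template from my files: the dashes variable name and the root template are assumptions made for the example, and it only handles the - underline for section headings.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Padding source: has to be at least as long as the longest heading -->
  <xsl:variable name="dashes"
    select="'--------------------------------------------------------------------------------'"/>

  <!-- Assumed entry point so the example runs on its own -->
  <xsl:template match="/">
    <xsl:apply-templates select="//section" mode="content"/>
  </xsl:template>

  <xsl:template match="section" mode="content">
    <xsl:text>&#10;</xsl:text>
    <xsl:value-of select="@title"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Take one dash for each character in the title -->
    <xsl:value-of select="substring($dashes, 1, string-length(@title))"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

If a title is longer than the padding string, substring() simply returns the whole string, so the underline ends up shorter than the heading.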

Each of the XSL transforms can be run against the source XML with a terminal call such as the following (I ran this in a Bash terminal, but others will work if you have xsltproc installed and on your path):

xsltproc github_md.xsl php\ coding\ standards.xml

This will just write to the standard output stream (typically the terminal window), so to save it to a file, you can redirect it:

xsltproc github_md.xsl php\ coding\ standards.xml > php\ coding\ standards.md