www.voj-tech.net


Markdown in vtCompose

In this article we look at how the vtCompose library classes can be used to translate Markdown-formatted text into HTML for the front end of your website. The vtCompose Markdown parser aims to follow the CommonMark spec. Note, however, that the implementation does not conform to the spec fully; for the known deviations, refer to the list of Issues filed against the VTCompose\Markdown namespace on the GitLab project page.

The Boilerplate

We start by including the vtCompose class loader, importing a few classes and registering the class loader. Registering the vtCompose error handler as well is optional, but it can be considered good practice: should a PHP error occur, the handler throws a VTCompose\ErrorHandling\ErrorException while respecting the error_reporting php.ini directive (more on error handling on the Error Handling page).

<?php

require_once 'VTCompose/Autoloading/Autoloader.php';

use VTCompose\Autoloading\Autoloader;
use VTCompose\ErrorHandling\ErrorHandler;
use VTCompose\Markdown\DomGenerator;
use VTCompose\Markdown\Parser\Parser;
use VTCompose\Xml\Xsl\Transform;

(new Autoloader())->register();
(new ErrorHandler())->register();
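
To make the behaviour of such an error handler more concrete, the following minimal sketch shows the general technique using only PHP's built-in set_error_handler() function and ErrorException class. It is not the vtCompose implementation, whose internals are not shown in this article; the error messages and the chosen error levels are assumptions made purely for illustration.

```php
<?php

// Convert PHP errors into exceptions, but only for levels enabled by error_reporting.
set_error_handler(function (int $severity, string $message, string $file, int $line): bool {
    if ((error_reporting() & $severity) === 0) {
        // This level is disabled: return false to fall back to PHP's standard handling.
        return false;
    }
    throw new ErrorException($message, 0, $severity, $file, $line);
});

// For the demonstration, report everything except user-level notices.
error_reporting(E_ALL & ~E_USER_NOTICE);

$caught = null;
try {
    trigger_error('Something went wrong', E_USER_WARNING);
} catch (ErrorException $exception) {
    $caught = $exception->getMessage();
}

// A disabled level does not throw, so execution simply continues.
trigger_error('Just a hint', E_USER_NOTICE);

echo $caught, "\n";
```

The check against error_reporting() is what lets the php.ini directive (or a runtime call to error_reporting()) decide which errors become exceptions.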

Parsing Markdown Text

This is where the real fun begins. Let us use a very simple text containing just a heading. To parse our text and to obtain its abstract syntax tree (AST) we instantiate the VTCompose\Markdown\Parser\Parser class and call the Parser::parse() method.

$text = '# Hello World!';

$parser = new Parser();
$document = $parser->parse($text);

Generating DOM

The $document variable now holds a reference to the AST of the text, and it is our choice what to do with it. We could traverse it and produce HTML output along the way. vtCompose does not currently implement this, but it should be fairly straightforward to write custom code that does.

What we do instead is to translate the AST into an XML document object model (DOM) using the VTCompose\Markdown\DomGenerator class. We do this so that we can later apply XSL transformations on the document.

$domGenerator = new DomGenerator();
$domDocument = $domGenerator->generateDom($document);

Out of curiosity, let us take a look at the generated XML document. We can do this by saving it to a file and then outputting the file contents. In the following code snippet we also delete the temporary file to clean up.

$xmlFilename = 'markdown-xsl-test.xml';
$domDocument->save($xmlFilename);
readfile($xmlFilename);
unlink($xmlFilename);

For our example text the output of the previous piece of code is:

<?xml version="1.0" encoding="UTF-8"?>
<document><heading level="1"><text>Hello World!</text></heading></document>
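
Returning briefly to the idea of producing HTML by direct traversal: the sketch below walks the generated XML rather than the AST, because the XML structure is visible in the output above while the AST API is not covered in this article. It uses PHP's built-in DOMDocument class, and renderNode() is a hypothetical helper that handles only the element names document, heading, paragraph and text.

```php
<?php

// The XML printed above; in practice it would come from the generated document.
$xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
    . '<document><heading level="1"><text>Hello World!</text></heading></document>';

// Hypothetical helper: recursively turn a small subset of elements into HTML.
function renderNode(DOMNode $node): string
{
    if (!$node instanceof DOMElement) {
        return '';
    }

    // Render the child elements first so that they can be wrapped below.
    $inner = '';
    foreach ($node->childNodes as $child) {
        $inner .= renderNode($child);
    }

    switch ($node->tagName) {
        case 'heading':
            $level = (int) $node->getAttribute('level');
            return "<h$level>$inner</h$level>";
        case 'paragraph':
            return "<p>$inner</p>";
        case 'text':
            // Escape the literal text so that it is safe to embed in HTML.
            return htmlspecialchars($node->textContent, ENT_QUOTES | ENT_HTML5, 'UTF-8');
        default:
            // The document element and unknown elements contribute only their children.
            return $inner;
    }
}

$dom = new DOMDocument();
$dom->loadXML($xml);

$htmlFragment = renderNode($dom->documentElement);
echo $htmlFragment, "\n";
```

Every further element type needs another case in the switch, which is one reason the article takes the XSLT route instead.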

XSL Transformations

Before doing any XSL transformations (XSLT), one might want to import the document element of $domDocument into a higher-level XML document, perhaps one representing the whole web page with its header, footer and other surrounding elements, so that the HTML of the whole page can be generated using XSLT. This is not necessary, however, and we can just transform our original document.
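
For readers who do want a higher-level document, the import step could look roughly like the sketch below. It uses PHP's built-in DOM extension on a serialised copy of the Markdown XML; whether the vtCompose DOM classes expose the same methods is not confirmed here. The page, title and content element names are hypothetical and chosen only for the example.

```php
<?php

// Serialised Markdown document, as produced earlier in the article.
$markdownXml = '<document><heading level="1"><text>Hello World!</text></heading></document>';

$markdownDom = new DOMDocument();
$markdownDom->loadXML($markdownXml);

// Hypothetical page-level document with a slot for the Markdown content.
$page = new DOMDocument('1.0', 'UTF-8');
$pageElement = $page->appendChild($page->createElement('page'));
$pageElement->appendChild($page->createElement('title', 'My Page'));
$content = $pageElement->appendChild($page->createElement('content'));

// A node belongs to the document that created it, so it has to be imported
// (deep copy, hence the second argument) before it can be appended elsewhere.
$imported = $page->importNode($markdownDom->documentElement, true);
$content->appendChild($imported);

$pageXml = $page->saveXML();
echo $pageXml;
```

A style sheet for such a page would then match the page-level elements itself and leave everything below document to the Markdown templates.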

The following block of code is a complete XSL style sheet which can be used to transform a vtCompose-generated XML document representing Markdown-formatted text. The rest of the article assumes that the file is named markdown.xsl.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:template match="paragraph">
        <p><xsl:apply-templates /></p>
    </xsl:template>

    <xsl:template match="heading[@level = '1']">
        <h1><xsl:apply-templates /></h1>
    </xsl:template>

    <xsl:template match="heading[@level = '2']">
        <h2><xsl:apply-templates /></h2>
    </xsl:template>

    <xsl:template match="heading[@level = '3']">
        <h3><xsl:apply-templates /></h3>
    </xsl:template>

    <xsl:template match="heading[@level = '4']">
        <h4><xsl:apply-templates /></h4>
    </xsl:template>

    <xsl:template match="heading[@level = '5']">
        <h5><xsl:apply-templates /></h5>
    </xsl:template>

    <xsl:template match="heading[@level = '6']">
        <h6><xsl:apply-templates /></h6>
    </xsl:template>

    <xsl:template match="list[@ordered = 'true']">
        <ol><xsl:apply-templates select="@* | *" /></ol>
    </xsl:template>

    <xsl:template match="@start-number[. != '1']">
        <xsl:attribute name="start"><xsl:value-of select="." /></xsl:attribute>
    </xsl:template>

    <xsl:template match="list[@ordered = 'false']">
        <ul><xsl:apply-templates /></ul>
    </xsl:template>

    <xsl:template match="item">
        <li><xsl:apply-templates /></li>
    </xsl:template>

    <xsl:template match="list[@tight = 'true']/item/paragraph">
        <xsl:apply-templates />
    </xsl:template>

    <xsl:template match="code-block">
        <pre><code><xsl:apply-templates select="@*" /><xsl:value-of select="." /></code></pre>
    </xsl:template>

    <xsl:template match="@info">
        <xsl:attribute name="class">language-<xsl:value-of select="." /></xsl:attribute>
    </xsl:template>

    <xsl:template match="table[column]">
        <table>
            <thead><tr><xsl:apply-templates select="column" /></tr></thead>
            <tbody><xsl:apply-templates select="row" /></tbody>
        </table>
    </xsl:template>

    <xsl:template match="table[not(column)]">
        <table><tbody><xsl:apply-templates /></tbody></table>
    </xsl:template>

    <xsl:template match="column">
        <th><xsl:apply-templates /></th>
    </xsl:template>

    <xsl:template match="column[@text-alignment = 'left']">
        <th style="text-align: left"><xsl:apply-templates /></th>
    </xsl:template>

    <xsl:template match="column[@text-alignment = 'right']">
        <th style="text-align: right"><xsl:apply-templates /></th>
    </xsl:template>

    <xsl:template match="column[@text-alignment = 'center']">
        <th style="text-align: center"><xsl:apply-templates /></th>
    </xsl:template>

    <xsl:template match="row">
        <tr><xsl:apply-templates /></tr>
    </xsl:template>

    <xsl:template match="cell">
        <td><xsl:apply-templates /></td>
    </xsl:template>

    <xsl:template match="cell[../../column[position() = count(current()/preceding-sibling::cell) + 1 and
            @text-alignment = 'left']]">
        <td style="text-align: left"><xsl:apply-templates /></td>
    </xsl:template>

    <xsl:template match="cell[../../column[position() = count(current()/preceding-sibling::cell) + 1 and
            @text-alignment = 'right']]">
        <td style="text-align: right"><xsl:apply-templates /></td>
    </xsl:template>

    <xsl:template match="cell[../../column[position() = count(current()/preceding-sibling::cell) + 1 and
            @text-alignment = 'center']]">
        <td style="text-align: center"><xsl:apply-templates /></td>
    </xsl:template>

    <xsl:template match="block-quote">
        <blockquote><xsl:apply-templates /></blockquote>
    </xsl:template>

    <xsl:template match="thematic-break">
        <hr />
    </xsl:template>

    <xsl:template match="html-block">
        <xsl:value-of select="." disable-output-escaping="yes" />
    </xsl:template>

    <xsl:template match="text">
        <xsl:value-of select="." />
    </xsl:template>

    <xsl:template match="emphasis">
        <em><xsl:apply-templates /></em>
    </xsl:template>

    <xsl:template match="strong-emphasis">
        <strong><xsl:apply-templates /></strong>
    </xsl:template>

    <xsl:template match="link">
        <a href="{@destination}"><xsl:apply-templates select="@* | *" /></a>
    </xsl:template>

    <xsl:template match="image">
        <img src="{@destination}" alt="{.}"><xsl:apply-templates select="@*" /></img>
    </xsl:template>

    <xsl:template match="@title[. != '']">
        <xsl:attribute name="title"><xsl:value-of select="." /></xsl:attribute>
    </xsl:template>

    <xsl:template match="code-span">
        <code><xsl:value-of select="." /></code>
    </xsl:template>

    <xsl:template match="hard-line-break">
        <br />
    </xsl:template>

    <xsl:template match="soft-line-break">
        <xsl:text>&#10;</xsl:text>
    </xsl:template>

    <xsl:template match="html-inline">
        <xsl:value-of select="." disable-output-escaping="yes" />
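    </xsl:template>

    <!-- Suppress attributes that have no dedicated template above; otherwise the
         built-in XSLT rule would copy their values into the output as text. -->
    <xsl:template match="@*" />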

</xsl:stylesheet>
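
One detail of the style sheet worth isolating is the handling of tight lists: the more specific pattern list[@tight = 'true']/item/paragraph takes precedence over the plain paragraph pattern, so paragraphs inside tight list items lose their p wrapper. The sketch below tries this out with an excerpt of the templates and PHP's built-in XSLTProcessor class (the xsl extension) rather than the vtCompose Transform class. The two input documents are assumptions about how the generator serialises lists, inferred from the match patterns; the xsl:output element and the applyStyleSheet() helper are additions made for the demonstration.

```php
<?php

// Excerpt of markdown.xsl with only the templates needed for an unordered list.
$xsl = <<<'XSL'
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" omit-xml-declaration="yes" />
    <xsl:template match="paragraph"><p><xsl:apply-templates /></p></xsl:template>
    <xsl:template match="list[@ordered = 'false']"><ul><xsl:apply-templates /></ul></xsl:template>
    <xsl:template match="item"><li><xsl:apply-templates /></li></xsl:template>
    <xsl:template match="list[@tight = 'true']/item/paragraph"><xsl:apply-templates /></xsl:template>
    <xsl:template match="text"><xsl:value-of select="." /></xsl:template>
</xsl:stylesheet>
XSL;

// Hypothetical helper: apply a style sheet string to an XML string.
function applyStyleSheet(string $xsl, string $xml): string
{
    $styleSheet = new DOMDocument();
    $styleSheet->loadXML($xsl);

    $input = new DOMDocument();
    $input->loadXML($xml);

    $processor = new XSLTProcessor();
    $processor->importStylesheet($styleSheet);

    // Trim the trailing newline added by the serialiser.
    return trim($processor->transformToXml($input));
}

// Assumed serialisations of a tight and a loose list with a single item.
$tight = '<document><list ordered="false" tight="true"><item><paragraph><text>one</text></paragraph></item></list></document>';
$loose = '<document><list ordered="false" tight="false"><item><paragraph><text>one</text></paragraph></item></list></document>';

$tightHtml = applyStyleSheet($xsl, $tight);
$looseHtml = applyStyleSheet($xsl, $loose);

echo $tightHtml, "\n"; // <ul><li>one</li></ul>
echo $looseHtml, "\n"; // <ul><li><p>one</p></li></ul>
```

No explicit priority attribute is needed here: a pattern with a predicate or a path gets a higher default priority than a bare element name.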

In our example we have neither imported the document element of $domDocument into another document nor adjusted the document in any other way. Instead, we have prepared a second XSL style sheet, named markdown-xsl-test.xsl, which imports markdown.xsl and adds a template matching /document. Because the input XML document is unchanged, this template is applied to the document element, and the transformed text is wrapped in only the most basic HTML structure.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:import href="markdown.xsl" />

    <xsl:output method="html" encoding="UTF-8" media-type="text/html" />

    <xsl:template match="/document">
        <xsl:text disable-output-escaping="yes">&lt;</xsl:text>
        <xsl:text>!DOCTYPE html</xsl:text>
        <xsl:text disable-output-escaping="yes">&gt;</xsl:text>

        <head>
            <title>My Page</title>
        </head>
        <body>
            <xsl:apply-templates />
        </body>
    </xsl:template>

</xsl:stylesheet>

In this final step of processing our Markdown-formatted text we use the VTCompose\Xml\Xsl\Transform class to perform the XSL transformation according to our style sheet.

$transform = new Transform();
$transform->load('markdown-xsl-test.xsl');
$html = $transform->transformToXml($domDocument);

echo $html;

The content of the $html variable is:

<!DOCTYPE html><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>My Page</title>
</head><body><h1>Hello World!</h1></body>

Note that we could have used the markdown.xsl style sheet directly for the transformation, which would have produced plain markup for the text only, without the surrounding HTML structure.