Ashley Sheridan​

Using XSL to convert XML to SQL

Posted on

The new Energizer Europe website has recently launched, and it has certainly been one of the biggest and complicated sites I've ever worked on. Part of this complexity came from the fact that it was to launch across more than 30 regions of Europe and in 10 languages. Now, there are many brilliant people out there who are fluently multi-lingual and can quite easily read and interpret website copy coming to them in a variety of languages. Unfortunately, I'm not one of those people, so I needed to go about this a different way.

The first problem to tackle was getting the content across to the translation company in a format that would describe the text. I figured there were two obvious choices: a CSV or XML. After finding out that the translation company could accept both of these, I plumped for XML, as I knew it would give me a bit more flexibility to describe data in a hierarchical manner, something that would be cumbersome to achieve in a CSV file.

I wrote some small PHP scripts to grab the content from various tables across the database where the source content was stored and output it into a series of custom XML files. Below is an excerpt of one such file:

<?xml version="1.0" encoding="UTF-8"?> <pages lang_id="11"> <page page_id="1" template_id="1" page_name="home" parent_id="0" page_url="/" page_order="0"> <display_name>Home</display_name> <element name="title" template_group="0" group_order="0">Energizer</element> <element name="slider_title" template_group="1" group_order="2">Energizer<sup>®</sup> Ultra+</element> <element name="slider_title" template_group="1" group_order="1">Energizer<sup>®</sup> Ultimate Lithium</element> <element name="slider_title" template_group="1" group_order="3">Rechargeable Batteries</element> <element name="slider_title" template_group="1" group_order="4">Rechargeable Chargers</element> <element name="slider_title" template_group="1" group_order="5">Lighting</element> <element name="slider_title" template_group="1" group_order="6">Specialty Batteries</element> <element name="slider_title" template_group="1" group_order="7">Hearing Aid Batteries</element> <element name="slider_title" template_group="1" group_order="8">Inductive Charger</element> <element name="slider_text" template_group="2" group_order="1">The world's longest lasting AA and AAA batteries<sup>*</sup><br/><small>*In high-tech devices</small></element> <element name="slider_text" template_group="2" group_order="2">Up to 45% longer battery-life<sup>*</sup>; From the manufacturers of the world's first<sup>**</sup> mercury-free alkaline battery.<br/><small>*in digital cameras, vs. Energizer<sup>®</sup> Classic (results vary by camera) **commercially available since 1991.</small></element> etc... </page>

Another advantage of this format was that I could specify extra information which would further describe the content as attributes of the XML tags, leaving all content that was to be translated as text nodes. This worked well, because in the few cases where some HTML had to creep into the database content (like link tags in paragraphs, etc) they would be perfectly preserved when I recieved the translated copy back.

The next step was tricker though. I needed a way to then import this new content into the database again. Again, there were a few choices I had here, such as writing some more PHP scripts to pull data from the DOM, but I went a slightly more unusual route; I wrote XSL files to convert each of the XML files into SQL statements which I could run against the database. Below is the XSL file which converts the above XML snippet:

<?xml version="1.0" encoding="UTF-8"?> <xsl:stylesheet version="1.0" xmlns:xsl=""> <xsl:template match="pages"> <html> <head> <style type="text/css"> body {font-family: "Courier New", monospaced; font-size: 10pt;} p  {margin: 0px;} p.block {margin-top: 15px;} </style> </head> <body> <p>Pages - Language ID: <xsl:value-of select="@lang_id"/></p> <xsl:apply-templates/> </body> </html> </xsl:template> <xsl:template match="page"> <p class="block">INSERT INTO page VALUES (<xsl:value-of select="../@lang_id"/>0<xsl:value-of select="@page_id"/>, <xsl:value-of select="@template_id"/>, "<xsl:value-of select="@page_name"/>", <xsl:value-of select="../@lang_id"/>, <xsl:value-of select="../@lang_id"/>0<xsl:value-of select="@parent_id"/>, "<xsl:value-of select="@page_url"/>", "<xsl:value-of select="./display_name/."/>", <xsl:value-of select="@page_order"/>) ON DUPLICATE KEY UPDATE display_name="<xsl:value-of select="./display_name/."/>";</p> <xsl:apply-templates select="element"/> </xsl:template> <xsl:template match="element"> <p>INSERT INTO page_element VALUES(<xsl:value-of select="../../@lang_id"/>0<xsl:value-of select="../@page_id"/>, "<xsl:value-of select="@name"/>", <xsl:value-of select="../../@lang_id"/>, "<xsl:apply-templates select="./child::node()" mode="escape"/>", <xsl:value-of select="@template_group"/>, <xsl:value-of select="@group_order"/>) ON DUPLICATE KEY UPDATE content="<xsl:apply-templates select="./child::node()" mode="escape"/>";</p> </xsl:template> <xsl:template match="*" mode="escape"> <xsl:text>&lt;</xsl:text> <xsl:value-of select="name()" /> <xsl:apply-templates mode="escape,/var>" select="@*" /> <xsl:text>&gt;</xsl:text> <xsl:apply-templates mode="escape" /> <xsl:text>&lt;/</xsl:text> <xsl:value-of select="name()" /> <xsl:text>&gt;</xsl:text> </xsl:template> <xsl:template match="@*" mode="escape"> <xsl:text> </xsl:text> <xsl:value-of select="name()" /> <xsl:text>=\"</xsl:text> <xsl:value-of select="." /> <xsl:text>\"</xsl:text> </xsl:template> </xsl:stylesheet>

This converts the XML into two types of SQL queries: 1 to insert a record into the page table, and the other to insert individual bits of data for that page in the page_element table. This is done by by creating separate templates within the XSL to handle the <page> and <element> tags in the XML respectively.

The last two template sections are used to allow HTML output to be pushed to the browser if it was part of the content of a node. The template that reads match="*" matches everything, and calls itself recursively if necessary for nested HTML tags in the content. It basically dumps out escaped < & > tags with the HTML tag name in between, and instructs the second template to work its magic for any attributes that might exist.

Finally, you might have noticed that in the last template, the quotation marks are escaped with a back-slash. This is not for the benefit of displaying it in a browser, but for the SQL. Without them being escaped in this fashion, they would break the SQL statements where they occurred.

All-in-all, while not the most elegant of methods, this does solve a problem, and could easily be tailored to other situations.