Orbeon Forms User Guide

Converters

1. Introduction

Converters are processors converting XML documents from one format to another. For example, the standard HTML converter documented below converts an XML document into an HTML document. This HTML document can then be sent to a web browser using the HTTP serializer, or attached to an email with the Email processor.

Converters typically have a data output containing the converted document.
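
As an illustration of the flow described above, the following is a minimal sketch of a pipeline that converts an XML document to HTML and sends it to the browser. The input URL oxf:/my-page.xml and the id converted are hypothetical names, and the empty config passed to the HTTP serializer is an assumption that its defaults are acceptable:

```xml
<p:config xmlns:p="http://www.orbeon.com/oxf/pipeline"
          xmlns:oxf="http://www.orbeon.com/oxf/processors">
    <!-- Convert the XML infoset into HTML text (default configuration) -->
    <p:processor name="oxf:html-converter">
        <p:input name="config">
            <config/>
        </p:input>
        <!-- Hypothetical input document -->
        <p:input name="data" href="oxf:/my-page.xml"/>
        <p:output name="data" id="converted"/>
    </p:processor>
    <!-- Send the converted document to the web browser -->
    <p:processor name="oxf:http-serializer">
        <!-- Assumption: an empty config selects the serializer defaults -->
        <p:input name="config">
            <config/>
        </p:input>
        <p:input name="data" href="#converted"/>
    </p:processor>
</p:config>
```

The converter only produces the text document on its data output; a serializer or another processor decides where that document goes.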

2. Standard Converters

The standard converters convert XML infosets (the XML documents that circulate in Orbeon Forms pipelines) into text according to standard output methods defined by the XSLT specification. They convert to the following formats:

  • XML: a standard XML document
  • HTML: a standard HTML document
  • XHTML: a standard XHTML document
  • Text: any text document

The resulting text is sent to the data output. It is embedded in an XML document as specified by the text document format.

2.1. Configuration

The configuration of the standard converters consists of the following optional elements:

Each element is listed with its purpose and its default value:

  • method: XSLT output method (one of xml, html, xhtml or text). Default: xml, html or text, depending on the converter.
  • content-type: content type hint specified on the output document element. Default: specific to each converter.
  • encoding: encoding hint specified on the output document element. Default: utf-8.
  • version: HTML or XML version number. Default: 4.01 for HTML (ignored for XML, which always outputs 1.0).
  • public-doctype: the public doctype. Default: "-//W3C//DTD HTML 4.01 Transitional//EN" for HTML, none otherwise.
  • system-doctype: the system doctype. Default: "http://www.w3.org/TR/html4/loose.dtd" for HTML, none otherwise.
  • omit-xml-declaration: specifies whether the XML declaration must be omitted. Default: false for XML and HTML (i.e. a declaration is output by default), ignored otherwise.
  • standalone: if true, specifies standalone="yes" in the document declaration. If false, specifies standalone="no" in the document declaration. If missing, no standalone attribute is produced. For more information about standalone document declarations, please refer to the relevant section of the XML specification. In most cases, this does not need to be specified. Default: not specified for XML, ignored otherwise.
  • indent: specifies whether the output is indented. This means that line breaks may be inserted between adjacent elements. The actual level of indentation is specified with the indent-amount configuration element. Default: true (ignored for text method).
  • indent-amount: specifies the number of indentation spaces. Default: 1 (ignored for text method).

Example:

<config>
    <content-type>text/html</content-type>
    <encoding>utf-8</encoding>
    <version>4.01</version>
    <public-doctype>-//W3C//DTD HTML 4.01//EN</public-doctype>
    <system-doctype>http://www.w3.org/TR/html4/strict.dtd</system-doctype>
    <indent-amount>4</indent-amount>
</config>
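
The example above targets HTML. As a complementary sketch, a configuration for XML output could combine the declaration and indentation elements listed in the configuration section. The values chosen here are illustrative only:

```xml
<config>
    <!-- Use the XSLT xml output method -->
    <method>xml</method>
    <encoding>utf-8</encoding>
    <!-- Leave out the <?xml ...?> declaration -->
    <omit-xml-declaration>true</omit-xml-declaration>
    <!-- Insert line breaks between elements, indenting by two spaces -->
    <indent>true</indent>
    <indent-amount>2</indent-amount>
</config>
```

All of these elements are optional, so a configuration normally lists only the settings that differ from the defaults.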

2.2. XML Converter

The XML converter outputs an XML document conforming to the semantics of the XSLT xml output method. By default, the output is indented with no spaces and encoded using the UTF-8 character set. The default MIME content type is application/xml. The following is a simple XML converter example:

<p:processor name="oxf:xml-converter">
    <p:input name="config">
        <config>
            <content-type>application/xml</content-type>
            <encoding>iso-8859-1</encoding>
            <version>1.0</version>
        </config>
    </p:input>
    <p:input name="data" href="oxf:/my-xml-document.xml"/>
    <p:output name="data" id="xml-document"/>
</p:processor>

This is an example of output produced by the XML converter:

<document xsi:type="xs:string" content-type="application/xml; charset=iso-8859-1">
<?xml version="1.0" encoding="iso-8859-1" standalone="no"?>
<claim xmlns="http://orbeon.org/oxf/examples/bizdoc/claim">
    <insured-info>
        <general-info>
            <name-info>
                <title-prefix>Dr.</title-prefix>
                <last-name>Doe</last-name>
                <first-name>John</first-name>
                <title-suffix/>
            </name-info>
            <address>
                <address-detail>
                    <street-name>N Columbus Dr.</street-name>
                    <street-number>511</street-number>
                    <unit-number/>
                </address-detail>
                <city>Chicago</city>
                <state-province>IL</state-province>
                <postal-code>60611</postal-code>
                <country>USA</country>
                <email>jdoe@acme.org</email>
            </address>
        </general-info>
        <person-info>
            <gender-code>M</gender-code>
            <birth-date>1972-10-01</birth-date>
            <marital-status-code>C</marital-status-code>
            <occupation>Manager</occupation>
        </person-info>
        <family-info>
            <children>
                <child>
                    <birth-date>2003-02-02</birth-date>
                    <first-name>Marco</first-name>
                </child>
                <child>
                    <birth-date/>
                    <first-name/>
                </child>
            </children>
            <comments>No comments at this point!</comments>
        </family-info>
        <claim-info>
            <accident-type>FOOT</accident-type>
            <accident-date>2004-07-06</accident-date>
            <rate/>
        </claim-info>
    </insured-info>
</claim>
</document>

2.3. HTML Converter

The HTML converter outputs an HTML document conforming to the semantics of the XSLT html output method. By default, the doctype is set to HTML 4.01 Transitional and the content is indented with no spaces and encoded using the UTF-8 character set. The default content type is text/html. The following is a simple HTML converter example:

<p:processor name="oxf:html-converter">
    <p:input name="config">
        <config>
            <content-type>text/html</content-type>
            <encoding>iso-8859-1</encoding>
            <public-doctype>-//W3C//DTD HTML 4.01 Transitional//EN</public-doctype>
            <version>4.01</version>
        </config>
    </p:input>
    <p:input name="data">
        <html>
            <head>
                <title>My HTML document</title>
            </head>
            <body>
                <p>This is the content of the HTML document.</p>
            </body>
        </html>
    </p:input>
    <p:output name="data" id="html-document"/>
</p:processor>

This is an example of output produced by the HTML converter:

<document xsi:type="xs:string" content-type="text/html; charset=iso-8859-1">
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
    <head>
        <title>My HTML document</title>
    </head>
    <body>
        <p>
            This is the content of the HTML document.
        </p>
    </body>
</html>
</document>

Note
The XML 1.0 Specification prohibits a DOCTYPE definition with a Public ID and no System ID.
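
For XML-based output, the note above means that a public-doctype should be accompanied by a system-doctype. The following is a sketch only: it assumes the XML converter honors both doctype elements as listed in the configuration section, and the XHTML 1.0 Strict identifiers are the standard W3C values rather than values taken from this guide:

```xml
<p:processor name="oxf:xml-converter">
    <p:input name="config">
        <config>
            <!-- Public ID and System ID are specified together,
                 as required for a valid XML DOCTYPE declaration -->
            <public-doctype>-//W3C//DTD XHTML 1.0 Strict//EN</public-doctype>
            <system-doctype>http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd</system-doctype>
        </config>
    </p:input>
    <!-- Hypothetical id of an XHTML document produced earlier in the pipeline -->
    <p:input name="data" href="#xhtml-page"/>
    <p:output name="data" id="xhtml-text"/>
</p:processor>
```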

2.4. Text Converter

The Text converter outputs a text document conforming to the semantics of the XSLT text output method. By default, the output is encoded using the UTF-8 character set. This converter is typically useful for pipelines generating Comma Separated Value (CSV) files. The default content type is text/plain. The following is a simple Text converter example:

<p:processor name="oxf:text-converter">
    <p:input name="config">
        <config/>
    </p:input>
    <p:input name="data">
        <document>This is just plain text. It will be output without the <em>text</em> and <em>em</em> elements.</document>
    </p:input>
    <p:output name="data" id="text-document"/>
</p:processor>

This is an example of output produced by the Text converter:

<document xsi:type="xs:string" content-type="text/plain; charset=utf-8">This is just plain text. It will be output without the text and em elements.</document>
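
The CSV use case mentioned above can be sketched as a two-step pipeline fragment: an XSLT step builds a document whose text content is the CSV data, and the Text converter turns it into a text document. This is a sketch under assumptions: the oxf:xslt processor is not described in this section, the id employees and the employee element names are hypothetical, and text/csv is simply passed as a content type hint:

```xml
<!-- Step 1: produce one comma-separated line of text per employee -->
<p:processor name="oxf:xslt">
    <!-- Hypothetical input: <employees><employee>...</employee></employees> -->
    <p:input name="data" href="#employees"/>
    <p:input name="config">
        <xsl:stylesheet version="1.0"
                        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
            <xsl:template match="/">
                <document>
                    <xsl:for-each select="/employees/employee">
                        <xsl:value-of select="firstname"/>
                        <xsl:text>,</xsl:text>
                        <xsl:value-of select="lastname"/>
                        <xsl:text>,</xsl:text>
                        <xsl:value-of select="age"/>
                        <!-- End each record with a line feed -->
                        <xsl:text>&#10;</xsl:text>
                    </xsl:for-each>
                </document>
            </xsl:template>
        </xsl:stylesheet>
    </p:input>
    <p:output name="data" id="csv-as-xml"/>
</p:processor>
<!-- Step 2: output the text content only, dropping the wrapper element -->
<p:processor name="oxf:text-converter">
    <p:input name="config">
        <config>
            <content-type>text/csv</content-type>
        </config>
    </p:input>
    <p:input name="data" href="#csv-as-xml"/>
    <p:output name="data" id="csv-text"/>
</p:processor>
```

This sketch does not quote or escape values, so fields containing commas or line breaks would need extra handling in the stylesheet.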

3. To-XML Converter

The To-XML Converter produces parsed XML documents from a binary document format.

The data input of the To-XML Converter follows the binary document format. Its data output is an XML document. The mandatory config input contains an empty config element reserved for future configuration parameters. This is an example of use:

<p:config xmlns:p="http://www.orbeon.com/oxf/pipeline"
          xmlns:oxf="http://www.orbeon.com/oxf/processors">
    <p:param type="output" name="data"/>
    <!-- Read the file as a binary document -->
    <p:processor name="oxf:url-generator">
        <p:input name="config">
            <config>
                <url>parsing-view.xsl</url>
                <content-type>binary/octet-stream</content-type>
                <force-content-type>true</force-content-type>
            </config>
        </p:input>
        <p:output name="data" id="xml-file-as-binary"/>
    </p:processor>
    <!-- Parse the binary document into an XML document -->
    <p:processor name="oxf:to-xml-converter">
        <p:input name="data" href="#xml-file-as-binary"/>
        <p:input name="config">
            <config/>
        </p:input>
        <p:output name="data" ref="data"/>
    </p:processor>
</p:config>

4. XSL-FO Converter

The XSL-FO Converter produces PDF documents from an XSL-FO description of the page. The default content type is application/pdf.

Note
The input document of the XSL-FO converter must follow the W3C XSL-FO Recommendation. Note that only the subset of the recommendation implemented by FOP 0.20.5 is supported.

The resulting binary stream is sent to the data output. It is embedded in an XML document as specified by the binary document format.
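
The following is a minimal sketch of how the XSL-FO Converter might be used. The processor name oxf:xslfo-converter and the empty config input are assumptions based on the naming and configuration pattern of the other converters in this section, and the id pdf-binary is hypothetical. The inline XSL-FO document describes a single A4 page with one block of text:

```xml
<p:processor name="oxf:xslfo-converter">
    <!-- Assumption: an empty config selects the defaults -->
    <p:input name="config">
        <config/>
    </p:input>
    <p:input name="data">
        <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
            <fo:layout-master-set>
                <!-- One A4 page layout with 2 cm margins -->
                <fo:simple-page-master master-name="page"
                                       page-width="21cm" page-height="29.7cm"
                                       margin-top="2cm" margin-bottom="2cm"
                                       margin-left="2cm" margin-right="2cm">
                    <fo:region-body/>
                </fo:simple-page-master>
            </fo:layout-master-set>
            <fo:page-sequence master-reference="page">
                <fo:flow flow-name="xsl-region-body">
                    <fo:block font-size="12pt">This is the content of the PDF document.</fo:block>
                </fo:flow>
            </fo:page-sequence>
        </fo:root>
    </p:input>
    <!-- Binary document containing the PDF -->
    <p:output name="data" id="pdf-binary"/>
</p:processor>
```

The binary document on the data output can then be passed to the HTTP serializer, in the same way as the output of the To XLS converter.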

5. XLS Converters

Orbeon Forms ships with the POI library, which allows importing and exporting Microsoft Excel documents. Orbeon Forms uses an Excel file template to define the layout of the spreadsheet. You mark the cells that will contain values with a special markup.

5.1. Preparing the Spreadsheet

First, create an Excel spreadsheet with the formatting of your choice. Apply a special markup to each cell you need to export values to:

  1. Select the cell
  2. Go to the menu Format->Cell
  3. In the Number tab, choose the Custom format and enter a format that looks like: #,##0;"/a/b|/c/d". In this example we have 2 XPath expressions separated by a pipe character (|): /a/b and /c/d. The first XPath expression is used when creating the Excel file (exporting) and is run against the data input document of the To XLS converter. The second expression is optional and is used when recreating an XML document from the Excel file (importing with the From XLS converter).
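
To make the export half of this markup concrete: with the format #,##0;"/a/b|/c/d", the cell is filled by evaluating /a/b against the data input of the To XLS converter. A hypothetical data document that this expression would match is shown below. The element names come from the example format, and the value is invented for illustration:

```xml
<!-- Hypothetical data input for the To XLS converter -->
<a>
    <!-- Selected by the export expression /a/b -->
    <b>1234</b>
</a>
```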

5.2. To XLS Converter

The To XLS converter takes a config input describing the XLS template file, and a data input containing the values to be inserted in the template. The processor scans the template and applies the XPath expressions to fill it in. It returns a binary document on its data output.

The config input takes a single config element with one attribute:

  • template: a URL pointing to an XLS template file

Example:

<p:processor name="oxf:to-xls-converter">
    <p:input name="config">
        <config template="oxf:/excel/template.xls"/>
    </p:input>
    <p:input name="data">
        <currency>
            <value1>10</value1>
            <value2>20</value2>
            <value3>30</value3>
        </currency>
    </p:input>
    <p:output name="data" id="xls-binary"/>
</p:processor>

The config element can also contain zero or more repeat-row elements with two attributes, row-num and for-each.

The To XLS converter is typically connected to the HTTP serializer. This allows specifying headers such as Content-Disposition:

<!-- Convert to XLS -->
<p:processor name="oxf:to-xls-converter">
    <p:input name="config">
        <config template="oxf:/examples/employees/export-excel/employees.xls">
            <repeat-row row-num="3" for-each="employees/employee"/>
        </config>
    </p:input>
    <p:input name="data" href="#workbook"/>
    <p:output name="data" id="xls-binary"/>
</p:processor>
<!-- Serialize -->
<p:processor name="oxf:http-serializer">
    <p:input name="data" href="#xls-binary"/>
    <p:input name="config">
        <config>
            <header>
                <name>Content-Disposition</name>
                <value>attachment; filename=employees.xls</value>
            </header>
        </config>
    </p:input>
</p:processor>

5.3. From XLS Converter

The From XLS converter takes an Excel file (for example uploaded with an XForms upload control), finds special markup cells and reconstructs an XML document from this markup. The converter has one data input which must receive a binary document, and a data output containing the generated XML document. Assume the following XForms model:

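<xforms:model xmlns:xforms="http://www.w3.org/2002/xforms"
              xmlns:xs="http://www.w3.org/2001/XMLSchema"
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <xforms:instance>
        <form>
            <action/>
            <files>
                <file filename="" mediatype="" size="" xsi:type="xs:anyURI"/>
            </files>
        </form>
    </xforms:instance>
    <xforms:submission method="post" encoding="multipart/form-data"/>
</xforms:model>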

The model can be filled with the following XForms controls:

<xforms:group ref="/form">
    <p>
        <xforms:upload ref="files/file[1]"/>
        <xforms:submit>
            <xforms:label>Submit</xforms:label>
            <xforms:setvalue ref="action">import</xforms:setvalue>
        </xforms:submit>
    </p>
</xforms:group>

Then the following pipeline can extract the data from the uploaded file:

<!-- Dereference URI stored in instance and return a binary -->
<p:processor name="oxf:url-generator">
    <p:input name="config" href="aggregate('config', aggregate('url', #instance#xpointer(string(/form/files/file[1]))), aggregate('content-type', #instance#xpointer('application/octet-stream')))"/>
    <p:output name="data" id="xls-binary"/>
</p:processor>
<!-- Convert file to XML -->
<p:processor name="oxf:from-xls-converter">
    <p:input name="data" href="#xls-binary"/>
    <p:output name="data" id="xls"/>
</p:processor>

This is an example of a returned document, given an appropriate configuration of the Excel template:

<workbook>
    <sheet>
        <employees>
            <employee-id>5398</employee-id>
            <firstname>Nils</firstname>
            <lastname>Aas</lastname>
            <phone>(555) 123 0434</phone>
            <title>Norwegian sculptor and illustrator</title>
            <age>70</age>
            <manager-id/>
            <employee-id>5028</employee-id>
            <firstname>Ali</firstname>
            <lastname>Abbasi</lastname>
            <phone>(555) 123 0060</phone>
            <title>BBC Scotland travel presenter</title>
            <age>42</age>
            <manager-id/>
        </employees>
    </sheet>
</workbook>