July 2, 2014

Plone and eXist-db

Connecting Plone with the XML database eXist-db

For some years now we have maintained the Onkopedia site, a medical portal with guidelines in the field of hematology and medical oncology.

The simplified workflow looks like this:

  • authors (doctors) write their guidelines in Word
  • an internal editorial process brings the Word documents into shape and checks them for consistency and style-guide compliance
  • we can convert the Word documents to XHTML/CSS using OpenOffice/LibreOffice
  • we generate PDF from the XHTML/CSS using a CSS3 Paged Media workflow and publish the PDF documents together with an HTML version on the web through Plone

For a variety of reasons the complete editorial process will be switched to an XML-driven workflow in which Plone remains the main web CMS but with a completely new workflow under the hood. A new Word-to-XML conversion workflow will produce XML documents according to XML schemas that are currently in the making. All documents including their assets (associated images in PNG and SVG, conversion reports etc.) will be stored within the eXist-db XML database.

Why eXist-db?

  • hierarchical storage model like Plone/ZODB with collections and subcollections
  • indexes HTML/XML out-of-the-box
  • stores arbitrary binary data
  • support for the most recent XML technologies like latest XSLT and XQuery versions
  • various web-service APIs: WebDAV, REST, HTTP, RESTXQ...
  • easily approachable and easy to use
  • open source, with a smart and helpful community
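
As a small illustration of the REST interface mentioned in the list above, the following sketch builds request URLs for resources and ad-hoc XQuery expressions. It assumes a stock eXist-db installation reachable at http://localhost:8080/exist/rest; the host, port and the example collection name are assumptions and not part of our setup. The `_query` and `_howmany` parameters are the ones defined by the eXist-db REST interface.

```python
from urllib.parse import quote, urlencode

# Assumed location of the REST endpoint of a stock eXist-db install.
EXIST_REST_BASE = "http://localhost:8080/exist/rest"


def rest_url(db_path, xquery=None, howmany=None):
    """Build a REST URL for a resource or collection below /db.

    db_path -- absolute database path, e.g. "/db/some/collection/doc.xml"
    xquery  -- optional XQuery/XPath expression passed as "_query"
    howmany -- optional maximum number of results passed as "_howmany"
    """
    url = EXIST_REST_BASE + quote(db_path)
    params = {}
    if xquery is not None:
        params["_query"] = xquery
    if howmany is not None:
        params["_howmany"] = str(howmany)
    if params:
        url += "?" + urlencode(params)
    return url
```

A plain GET on such a URL returns the stored resource; with `_query` set, eXist-db evaluates the expression against the addressed resource or collection.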

For the integration of Plone with eXist-db we wrote a small Dexterity-based connector to eXist-db (package zopyx.existdb).

The functionality is similar to the old Reflecto product for mounting a local filesystem into Plone.

The connector provides the following functionality:

  • mounts an arbitrary eXist-db collection into Plone
  • path-based traversal into subcollections
  • indexing support (limited to one content document per Connector instance)
  • pluggable API for custom views
  • ACE editor integration
  • ZIP export from eXist-db
  • ZIP import into eXist-db
  • preliminary API for calling arbitrary XQuery scripts from Plone

In our scenario we have several hundred individual documents stored in eXist-db, each in its own collection (aka folder). Such a collection contains the XML document, the converted HTML version, the PDF version, associated images and SVG graphics, the original Word document etc. Such a collection can be "mounted" into Plone by creating an instance of the Connector and associating it with the path to the subcollection.

Accessing content by URL traversal is very natural:

http://host:port/plone/documents/connector/@@view/path/inside/existdb/index.html

As part of the application it is possible to register dedicated views by type. In our case we have a special view that renders the main content along with some additional information and links to other resources, based on the metadata stored within the XML document.

eXist-db is fully transactional internally. Unfortunately, transaction support across multiple API operations is currently not available. This is not a major problem since Plone will basically only read from eXist-db; there are only a few special functions that allow us to update some metadata stored within some XML documents from the Plone side.
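
To make the URL traversal and the per-type views described above more concrete, here is a minimal sketch in plain Python. It is not the zopyx.existdb code: the function names, the view names and the example collection path are hypothetical, and the real connector works with Zope traversal rather than string handling. The sketch only shows the idea of mapping everything after `@@view` onto a path below the mounted collection and picking a view by file type.

```python
import posixpath

VIEW_MARKER = "@@view"

# Hypothetical view names keyed by file extension.
VIEWS_BY_EXTENSION = {".xml": "guideline_view", ".html": "html_view"}


def resolve_subpath(mount_root, url_path):
    """Map the URL segments after @@view onto a path below mount_root."""
    root = posixpath.normpath(mount_root)
    segments = [s for s in url_path.split("/") if s]
    if VIEW_MARKER not in segments:
        raise ValueError("URL path does not contain %s" % VIEW_MARKER)
    sub = segments[segments.index(VIEW_MARKER) + 1:]
    target = posixpath.normpath(posixpath.join(root, *sub))
    # Refuse paths that leave the mounted collection via ".." segments.
    if target != root and not target.startswith(root + "/"):
        raise ValueError("path escapes the mounted collection")
    return target


def pick_view(path):
    """Choose a view name by file extension, with a generic fallback."""
    ext = posixpath.splitext(path)[1].lower()
    return VIEWS_BY_EXTENSION.get(ext, "default_view")
```

With a connector mounted on the (made-up) collection `/db/docs/aml`, the URL path `/plone/documents/connector/@@view/path/inside/existdb/index.html` would resolve to `/db/docs/aml/path/inside/existdb/index.html` and be rendered by the HTML view.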

Right now we access content stored within the XML database through the WebDAV layer of eXist-db. A major relief is the Python pyfilesystem module, which abstracts the filesystem layer (local, WebDAV, SFTP, HTTP, ZIP etc.) behind a uniform API. You can read and write ZIP files and WebDAV directories using the same methods. This is a huge advantage because you can reconfigure your filesystems or the underlying storage layer simply by changing the URLs of the related systems. This is real transparency.
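
The idea of a uniform API selected by URL can be illustrated with a small standard-library sketch. This is deliberately not pyfilesystem's API: the class names, the `dir://` and `zip://` schemes and the `open_store` helper are made up for this example, and only a local directory and a ZIP archive are covered. It shows why the calling code does not need to change when the storage behind a URL changes.

```python
import os
import tempfile
import zipfile


class DirStore:
    """Files below a local directory."""

    def __init__(self, root):
        self.root = root

    def write(self, name, data):
        with open(os.path.join(self.root, name), "wb") as fh:
            fh.write(data)

    def read(self, name):
        with open(os.path.join(self.root, name), "rb") as fh:
            return fh.read()


class ZipStore:
    """Files inside a ZIP archive (created on first write)."""

    def __init__(self, archive):
        self.archive = archive

    def write(self, name, data):
        with zipfile.ZipFile(self.archive, "a") as zf:
            zf.writestr(name, data)

    def read(self, name):
        with zipfile.ZipFile(self.archive) as zf:
            return zf.read(name)


def open_store(url):
    """Pick a backend from the URL scheme, so callers only change the URL."""
    scheme, _, location = url.partition("://")
    backends = {"dir": DirStore, "zip": ZipStore}
    if scheme not in backends:
        raise ValueError("unsupported scheme: %r" % scheme)
    return backends[scheme](location)


def copy(src, dst, name):
    """Copy one file between two stores without knowing their types."""
    dst.write(name, src.read(name))


def demo():
    """Write into a directory, copy into a ZIP archive, read it back."""
    tmp = tempfile.mkdtemp()
    source = open_store("dir://" + tmp)
    target = open_store("zip://" + os.path.join(tmp, "export.zip"))
    source.write("index.html", b"<p>guideline</p>")
    copy(source, target, "index.html")
    return target.read("index.html")
```

The `copy` function never checks which kind of store it is talking to; swapping the ZIP archive for another backend would only mean passing a different URL to `open_store`.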