December 26, 2007

When the Plone migration fails - doing content-migration only

The standard Plone migration often fails, especially for heavily customized sites and for sites running on some prehistoric or unreleased Plone version. Plone traditionally performs an in-place migration. Often, however, you would rather create a new Plone site from scratch, which means you need some way to move your old content over to the new site.

Over the last few months I had the pleasure (or pain) of migrating several educational Plone sites to current Plone 3 technology. There were some Plone 2.0 beta-something sites and some Plone 1-ish sites: more or less stock Plone instances with a small list of add-ons. Unfortunately the standard Plone migration never worked. Big problem? Perhaps... or perhaps not.

After running a site for three or four years you collect a lot of trash on it, and you often want to start from scratch when it comes to design and site structure. Over such a period people come and go, which shows up within a Plone site as a pile of stale content, messed-up permissions and so on. So the general approach is to create a new Plone 3 site from scratch and to have some mechanism for getting your core content, with the most important metadata, over into the new site. This means you migrate the content only and basically don't care about the site configuration of your old system.

The idea is simple and not too complicated to implement. You write a script that performs a catalog search per content type and exports the content. For my own migrations I created a simple INI-style format containing the metadata for a particular content type. A generated documents.ini looks like this:

[document-abt6/weigel/resources/plasmids/Alc/pam58]
path = abt6/resources/plasmids/Alc/pam58
group = abt6
filename = exports/documents/tmpchvyF2
text-format= structured-text
content-type = text/plain
id = pam58
title = pAM58_(AlcA_LFY_pAM54)
Description = 
owner = admin
review-state = published
created = 1099242080.130000
effective = 1099242866.000000
expires = 253370674800.000000

The information is extracted directly from the original objects by calling the related accessor methods. Since all content types share the same metadata (Dublin Core), you can reuse the extraction code for all of them and only have to write the type-specific parts for each content type (e.g. for extracting the start and end date of an event).
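The split between shared and type-specific extraction could be sketched roughly as follows. This is not the original export script: the function names are made up for illustration, and it assumes the usual Dublin Core accessors (getId(), Title(), Description(), created(), effective(), expires()) plus start() and end() accessors on types that carry such dates, all returning Zope DateTime objects where a date is involved.

```python
def common_metadata(obj):
    """Collect the Dublin Core fields shared by all content types.

    Assumes the date accessors return Zope DateTime objects, whose
    timeTime() method gives seconds since the epoch as a float.
    """
    return {
        "id": obj.getId(),
        "title": obj.Title(),
        "description": obj.Description(),
        "created": "%f" % obj.created().timeTime(),
        "effective": "%f" % obj.effective().timeTime(),
        "expires": "%f" % obj.expires().timeTime(),
    }


def start_end_metadata(obj):
    """Type-specific extension: shared fields plus start and end date.

    start() and end() are assumed accessor names for illustration.
    """
    record = common_metadata(obj)
    record["start"] = "%f" % obj.start().timeTime()
    record["end"] = "%f" % obj.end().timeTime()
    return record
```

Each further content type then only needs a small function of its own that starts from common_metadata() and adds its extra fields.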

Some remarks to the export format:

  • one .ini file per content-type
  • binary content like image data, file data or the text body is stored in a dedicated file on the filesystem and referenced by its filename
  • dates are exported as timestamps (seconds since the epoch)
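As a rough illustration of these points, writing one section could look like the sketch below. It is not the original export code: export_document() and write_ini() are hypothetical helpers, the metadata is assumed to arrive as a plain dict of strings, and the code targets Python 3, where the module is called configparser (on the Python 2 interpreters that Plone 3 runs on it is ConfigParser).

```python
import configparser
import os
import tempfile


def export_document(config, export_dir, path, metadata, body):
    """Add one document section to `config`; store the body in its own file."""
    os.makedirs(export_dir, exist_ok=True)
    # The text body stays out of the .ini file; only its filename is recorded.
    fd, filename = tempfile.mkstemp(prefix="tmp", dir=export_dir)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(body)

    section = "document-" + path
    config.add_section(section)
    config.set(section, "path", path)
    config.set(section, "filename", filename)
    for key, value in metadata.items():
        # ConfigParser only accepts string values
        config.set(section, key, str(value))
    return filename


def write_ini(config, target):
    """Write all collected sections to a single .ini file."""
    with open(target, "w", encoding="utf-8") as handle:
        config.write(handle)
```

A parser created with configparser.ConfigParser(interpolation=None) is a sensible choice here, because titles containing a literal % would otherwise be treated as interpolation syntax.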

After creating a set of .ini files you can write a small import script that iterates over the list of files and parses them (trivial using the ConfigParser module of Python). Recreating the content within the new site is also pretty simple: you call invokeFactory() to create a new content object and then call the related mutator methods to restore the field values. If needed, you have to call some other Plone API methods to restore the ownership or to set the workflow state properly, but this is pretty easy if you have some experience with Python, Plone and Archetypes.
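A minimal sketch of such an import step, under a few assumptions: the code is written for Python 3 (configparser), read_records() and restore_document() are hypothetical names, the target folder has already been looked up, and the content object offers the Archetypes-style mutators setTitle(), setDescription() and setText(). Note that ConfigParser lowercases option names by default, so "Description" in the file is read back as "description".

```python
import configparser


def read_records(ini_path):
    """Yield one dict of metadata per section of an exported .ini file."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(ini_path, encoding="utf-8")
    for section in parser.sections():
        yield dict(parser.items(section))


def restore_document(folder, record):
    """Recreate one document inside `folder` from an exported record."""
    # invokeFactory() creates the new content object in the folder
    folder.invokeFactory("Document", id=record["id"])
    obj = folder[record["id"]]
    # Mutator names are assumptions based on common Archetypes usage
    obj.setTitle(record["title"])
    obj.setDescription(record.get("description", ""))
    with open(record["filename"], encoding="utf-8") as handle:
        obj.setText(handle.read())
    return obj
```

Ownership, workflow state and the exported dates are left out here on purpose; they need additional Plone API calls that depend on the target site.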

This approach is generic and extensible, and it made me feel much better about a failing Plone migration. I am pretty sure that the standard Plone migration machinery has some code for iterating over content or restoring it somehow, but I found this simple .ini file approach very robust. The standard Plone migration is getting better from version to version, but depending on the level of customization and the third-party products involved it will always fail at some point. Other alternatives for doing content migrations with Plone 2.5 or higher are, for example, tools like GSXML, which exports a whole folder structure as XML (this seems to work pretty well and appears to be stable). You might also look at XMLForest (which is supposed to work with Plone 2.1 or higher), but I always had some issues with it.

AND: Never use Export/Import through the ZMI - NEVER - for doing a migration. Export/import is designed for, and known to work for, moving stuff between identical installations only!