This is the first of a series of blog posts dealing with Plone migrations. It tells the story of how we do large-scale Plone migrations, which usually involve migrating code, switching add-ons, and reorganizing content. In our projects this is usually accomplished through an export of the original Plone site and a "magic" importer script on the target system. This migration approach has evolved over the years. Our current approach is based on the experience gained over the last year, when we worked on a large-scale migration for the University of Ghent (90,000 content objects, 50 GB of data) and on several sites for an industrial customer in Stuttgart.
Plone 5.2 supports a so-called "in-place" migration, which is usually the right choice for simple sites with simple content types and standard add-ons like PloneFormGen. In theory, the Plone 4.3 database file is migrated in place and transformed into a Plone 5.2 compliant database file. In practice, it is often necessary to perform additional work and extra migration steps to bring an old Plone database into Python 3 format, in particular when the Plone site has a certain age and history (which often implies a lot of cruft and stale configuration that can break or interfere with the standard Plone migration). We use this migration approach only for small and simple sites.
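To illustrate the mechanics: the Plone upgrade steps can be triggered from a small script passed to bin/instance run. This is a minimal sketch, assuming the Plone site id is "Plone"; the separate conversion of the database contents to Python 3 is typically handled by the zodbupdate tool and is not shown here.

```python
# upgrade_site.py -- run with: bin/instance run upgrade_site.py
# Minimal sketch; assumes the Plone site id is "Plone".
import transaction
from zope.component.hooks import setSite

site = app["Plone"]  # `app` is injected by `bin/instance run`
setSite(site)

# Run all pending upgrade steps, roughly what the @@plone-upgrade view does
site.portal_migration.upgrade()
transaction.commit()
```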
The export/import approach allows us to perform a clean export of the content from the old site and a clean import into a fresh Plone 5.2 site, so the new site will not contain any cruft content, obsolete configuration, etc. The export of the content is usually accomplished using the collective.jsonify add-on for Plone, which basically exports all content objects and their metadata as JSON, one JSON file per content object. We also have supplementary code to export additional Plone settings or, for example, the Plone users and groups (if needed, i.e. unless users and groups are maintained in LDAP, Active Directory, or something similar). The exported JSON files are the foundation for a fresh import using the Plone Transmogrifier tool. Transmogrifier is basically a processing pipeline for piping the JSON data into Plone. The import can be customized using so-called "blueprints" in order to inject project-specific steps, filtering, etc.
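To give an idea of what such a blueprint looks like, here is a minimal sketch of a custom filtering section for collective.transmogrifier (the blueprint class, its option name, and the filtering rule are made up for illustration):

```python
from collective.transmogrifier.interfaces import ISection
from collective.transmogrifier.interfaces import ISectionBlueprint
from zope.interface import implementer
from zope.interface import provider


@provider(ISectionBlueprint)
@implementer(ISection)
class SkipObsoleteTypes(object):
    """Hypothetical blueprint that drops obsolete content types from
    the pipeline before they reach the constructor section."""

    def __init__(self, transmogrifier, name, options, previous):
        self.previous = previous
        # the types to skip are configured in the pipeline .cfg file
        self.skip_types = set(options.get("skip-types", "").split())

    def __iter__(self):
        for item in self.previous:
            if item.get("_type") in self.skip_types:
                continue  # filter the item out of the pipeline
            yield item
```

Such a class is registered as a named utility for ISectionBlueprint via ZCML and then referenced by that name as a section in the pipeline configuration.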
In our larger Plone migration projects, we often use a variation of the Transmogrifier approach. The export of the data and content from the original site is also done using the collective.jsonify add-on. The exported JSON files are then imported into a migration database (we use ArangoDB). The migration database allows us to perform partial imports and to query and inspect the migration data, so we can work on migration aspects and problems much more easily than by searching gigabytes of JSON files on the filesystem with operating-system tools like grep. The import is done using a custom migration package (which is about 80% generic for most Plone migrations; the remaining 20% are project-specific adjustments or additions, because every migration is different). Our migration package allows us to run partial migrations in order to test migrations more flexibly, e.g. it allows us to import a subset of the content (either one or more folders, or only specific content types). This is very handy and time-saving: migrating a large Plone site with a lot of content can take many hours, so testing changes to a particular feature on a subset of the data in advance is always better than hitting a failure on the full migration set after some hours.
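As a rough sketch of how the migration database fits in (using the python-arango client; the database name, collection name, credentials, paths, and the _type/_path keys are illustrative assumptions, not our actual schema):

```python
# Minimal sketch using python-arango; all names are illustrative.
import json
from pathlib import Path

from arango import ArangoClient

client = ArangoClient(hosts="http://localhost:8529")
db = client.db("plone_migration", username="root", password="secret")

if not db.has_collection("content"):
    db.create_collection("content")
content = db.collection("content")

# Import the collective.jsonify export (one JSON file per content object)
for json_file in Path("/data/export").glob("**/*.json"):
    content.insert(json.loads(json_file.read_text()))

# Partial migration: fetch only a subset, e.g. News Items below /plone/news
cursor = db.aql.execute(
    """
    FOR doc IN content
        FILTER doc._type == @portal_type
        FILTER STARTS_WITH(doc._path, @prefix)
        RETURN doc
    """,
    bind_vars={"portal_type": "News Item", "prefix": "/plone/news"},
)
for doc in cursor:
    pass  # hand each document over to the actual import code
```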