In the last blog post, I described how to export a Plone site to the filesystem as a directory of JSON data using collective.jsonify. This blog post briefly explains how to import that JSON data into ArangoDB as an interim migration database.
Why ArangoDB? We use ArangoDB as a document database for storing the JSON data. This allows us to introspect and query the data to be migrated and imported, which is often necessary in complex migrations with unknown data and unknown object relations; running grep or similar tools over a huge JSON dump is rarely the right approach. Using ArangoDB also allows us to run partial migrations for testing purposes (e.g. testing the migration on a particular folder or a particular content type).
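To illustrate the kind of selection a partial migration needs, here is a minimal Python sketch. It assumes collective.jsonify-style documents with `_type` and `_path` keys (the exact key set depends on your export); in a real migration you would run equivalent queries against ArangoDB rather than filtering in Python:

```python
# Sketch: selecting a subset of exported documents for a partial migration.
# Assumes collective.jsonify-style documents with "_type" and "_path" keys.
from collections import Counter


def summarize_types(docs):
    """Count documents per portal type."""
    return Counter(doc["_type"] for doc in docs)


def select_subtree(docs, path_prefix):
    """Pick only the documents below a given folder path."""
    return [doc for doc in docs if doc["_path"].startswith(path_prefix)]


# Hypothetical sample documents for illustration
docs = [
    {"_type": "Folder", "_path": "/plone/news"},
    {"_type": "News Item", "_path": "/plone/news/launch"},
    {"_type": "Document", "_path": "/plone/about"},
]

print(summarize_types(docs))
print([d["_path"] for d in select_subtree(docs, "/plone/news")])
```

With the data in ArangoDB, the same kind of filtering becomes a one-line query instead of a custom script.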
ArangoDB is a multi-model database (key-value store, document store, graph database) and is available as a community edition, an enterprise edition and a cloud edition (ArangoDB Oasis). The community edition is sufficient for migration projects. There are various installation options for all operating systems and Linux distributions, as well as through Docker.
After installing ArangoDB, you need to create a dedicated user account and a dedicated database within ArangoDB. This can easily be done through the ArangoDB web UI. For the following steps we assume that you have created a database named plone and an ArangoDB user account plone with the password secret.
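If you prefer the command line, the same setup can be sketched with arangosh. This is only a sketch against a running server: the endpoint and root credentials are assumptions, and the database/user names match the examples in this post:

```shell
# Sketch: create the "plone" database and "plone" user from arangosh
# (run against the _system database with root credentials; adjust as needed)
arangosh --server.endpoint tcp://127.0.0.1:8529 --server.username root \
  --javascript.execute-string '
    db._createDatabase("plone");
    var users = require("@arangodb/users");
    users.save("plone", "secret");
    users.grantDatabase("plone", "plone", "rw");'
```

Either way, the result is a database and user that the import script below can connect with.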
Preparing your target Plone site
We assume that the target Plone site is running on Plone 5.2 and Python 3. The core migration functionality is provided by the collective.plone5migration add-on for Plone. Prepare your buildout like this:
[buildout]
extends = buildout.cfg
auto-checkout += collective.plone5migration
sources = sources

[sources]
collective.plone5migration = git git@github.com:collective/collective.plone5migration.git

[instance]
eggs += collective.plone5migration
Importing your JSON data into ArangoDB
After running buildout with the configuration given above, you will find this generated import script:
bin/import-jsondump-into-arangodb --help
usage: import-jsondump-into-arangodb [-h] [-d DATABASE] [-c COLLECTION]
                                     [-url CONNECTION_URL] [-u USERNAME]
                                     [-p PASSWORD] [-i IMPORT_DIRECTORY] [-x]

optional arguments:
  -h, --help            show this help message and exit
  -d DATABASE, --database DATABASE
                        ArangoDB database
  -c COLLECTION, --collection COLLECTION
                        ArangoDB collection
  -url CONNECTION_URL, --connection-url CONNECTION_URL
                        ArangoDB connection URL
  -u USERNAME, --username USERNAME
                        ArangoDB username
  -p PASSWORD, --password PASSWORD
                        ArangoDB password
  -i IMPORT_DIRECTORY, --import-directory IMPORT_DIRECTORY
                        Import directory with JSON files
  -x, --drop-collection
                        Drop collection
In the previous blog post we created a JSON export in /tmp/content_test_2020-11-25-15-40-59.
You can import this export directory into ArangoDB using:
bin/import-jsondump-into-arangodb -i /tmp/content_test_2020-11-25-15-40-59 -x -u plone -p secret -d plone
connection=http://localhost:8529 username=root database=ugent collection=import
import directory=/tmp/content_test_2020-11-25-15-40-59
truncating existing collection
truncating existing collection...DONE
......
A progress bar shows you the progress of the import operation. The import speed depends on your local system: a typical import of 100,000 JSON files (100,000 Plone objects) with a total size of 50 GB takes about 45 to 60 minutes, largely depending on the I/O speed of your disk.
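Once the import has finished, the data can be introspected with AQL queries in the ArangoDB web UI or arangosh. Two illustrative query sketches (run each separately), assuming the collection is named import and that the collective.jsonify documents carry _type and _path attributes (check your export for the exact key names):

```aql
// All paths of objects of a particular content type
FOR doc IN import
    FILTER doc._type == "Folder"
    RETURN doc._path

// All objects below a particular folder (for a partial migration)
FOR doc IN import
    FILTER doc._path LIKE "/plone/news/%"
    RETURN doc
```

Queries like these are exactly what makes ArangoDB more pleasant than grepping a multi-gigabyte JSON dump.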
collective.plone5migration has been developed by Andreas Jung as part of a customer project with Ghent University (migration of the ugent.be site to Plone 5).