In the last blog post, I described how to export a Plone site with collective.jsonify to the filesystem as a directory of JSON data. This blog post briefly explains how to import that JSON data into ArangoDB as an interim migration database.
Why ArangoDB? We use ArangoDB as a document database for storing the JSON data. This allows us to introspect and query the data to be migrated and imported, which is often necessary in complex migrations with unknown data and unknown object relations. Using grep or similar tools on a huge JSON dump is unlikely to be the right approach. ArangoDB also allows us to run partial migrations for testing purposes (e.g. testing the migration on a particular folder or a particular content type).
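As an illustration of the kind of introspection this enables, here is a sketch of an AQL query selecting all objects of one content type below a folder. The attribute names (portal_type, _path) and the collection name import are assumptions based on a typical collective.jsonify export; adjust them to your data. The query can be pasted into the ArangoDB web UI or run through any ArangoDB client.

```python
# Hypothetical AQL query: all Documents below /plone/news/.
# Attribute names follow a typical collective.jsonify export and
# may differ in your dump -- inspect a sample document first.
AQL = """
FOR doc IN import
    FILTER doc.portal_type == @portal_type
    FILTER STARTS_WITH(doc._path, @path_prefix)
    RETURN doc._path
"""

# Bind variables keep the query reusable for other types/folders.
bind_vars = {"portal_type": "Document", "path_prefix": "/plone/news/"}
print(AQL.strip())
```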
ArangoDB is a multi-model database (key-value store, document store, graph database) and is available as a community edition, an enterprise edition and a cloud edition (ArangoDB Oasis). The community edition is sufficient for migration projects. There are various installation options for all operating systems and Linux distributions, and through Docker.
After installing ArangoDB, you need to create a dedicated user account and a dedicated database within ArangoDB. This can easily be done through the ArangoDB web UI. For the following steps we assume that you have created a database named plone and an ArangoDB user account plone with password secret.
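If you prefer to script this instead of clicking through the web UI, ArangoDB's HTTP API creates the database together with its users in a single call (POST /_api/database, authenticated as root against the _system database). The snippet below only builds the request body, using the example credentials from above:

```python
import json

# Request body for POST /_api/database on the _system database
# (run as root). ArangoDB creates the database and the listed
# users in one call; "passwd" is the field name the API expects.
payload = {
    "name": "plone",
    "users": [
        {"username": "plone", "passwd": "secret", "active": True},
    ],
}
print(json.dumps(payload, indent=2))
```

The resulting JSON can then be sent, for example, with curl: curl -u root:ROOT_PASSWORD -X POST --data @payload.json http://localhost:8529/_api/database (the root password depends on your installation).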
We assume that the target Plone site is running on Plone 5.2 and Python 3. The core functionality is integrated in the collective.plone5migration add-on for Plone. Prepare your buildout like this:
[buildout]
extends = buildout.cfg
auto-checkout +=
    collective.plone5migration
sources = sources

[sources]
collective.plone5migration = git git@github.com:collective/collective.plone5migration.git

[instance]
eggs +=
    collective.plone5migration
After running buildout with the configuration given above, you will find this generated import script:
bin/import-jsondump-into-arangodb --help
usage: import-jsondump-into-arangodb [-h] [-d DATABASE] [-c COLLECTION] [-url CONNECTION_URL] [-u USERNAME] [-p PASSWORD] [-i IMPORT_DIRECTORY] [-x]
optional arguments:
  -h, --help            show this help message and exit
  -d DATABASE, --database DATABASE
                        ArangoDB database
  -c COLLECTION, --collection COLLECTION
                        ArangoDB collection
  -url CONNECTION_URL, --connection-url CONNECTION_URL
                        ArangoDB connection URL
  -u USERNAME, --username USERNAME
                        ArangoDB username
  -p PASSWORD, --password PASSWORD
                        ArangoDB password
  -i IMPORT_DIRECTORY, --import-directory IMPORT_DIRECTORY
                        Import directory with JSON files
  -x, --drop-collection
                        Drop collection
In the previous blog post we created a JSON export in /tmp/content_test_2020-11-25-15-40-59.
You can import this export directory into ArangoDB using:
bin/import-jsondump-into-arangodb -i /tmp/content_test_2020-11-25-15-40-59 -x -u plone -p secret -d plone
Output:
connection=http://localhost:8529
username=plone
database=plone
collection=import
import directory=/tmp/content_test_2020-11-25-15-40-59
truncating existing collection
truncating existing collection...DONE
......
A nice progress bar will show you the progress of the import operation. The import speed depends on your local system. A typical import of 100,000 JSON files (100,000 Plone objects) with a total size of 50 GB takes about 45 to 60 minutes (largely depending on the I/O speed of your disk).
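For a rough feel for what those figures mean, the back-of-the-envelope throughput works out as follows (computed from the numbers above; your hardware will vary):

```python
# 100,000 JSON files (~50 GB total) imported in 45-60 minutes.
n_files = 100_000
total_mb = 50 * 1024

for minutes in (45, 60):
    seconds = minutes * 60
    files_per_sec = n_files / seconds
    mb_per_sec = total_mb / seconds
    print(f"{minutes} min: {files_per_sec:.0f} files/s, {mb_per_sec:.0f} MB/s")
```

So the import sustains roughly 28 to 37 documents per second at 14 to 19 MB/s, which is why disk I/O dominates the runtime.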
collective.plone5migration has been developed by Andreas Jung as part of a customer project with Ghent University (migration of the ugent.be site to Plone 5).