November 27, 2020

Plone migrations: importing collective.jsonify export data into ArangoDB (Part 3)

In the last blog post, I described how to export a Plone site with collective.jsonify to the filesystem as a directory of JSON data. This blog post briefly explains how to import that JSON data into ArangoDB, which serves as an interim migration database.


Why ArangoDB? We are using ArangoDB as a document database for storing the JSON data. This allows us to introspect and query the data to be migrated and imported, which is often necessary in complex migrations with unknown data and unknown object relations. Using grep or similar tools on a huge JSON dump is unlikely to be the right approach. Using ArangoDB also allows us to run partial migrations for testing purposes (e.g. for testing the migration of a particular folder or a particular content type).
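To illustrate the kind of introspection meant here, the following is a minimal sketch of a query that lists the paths of all objects of one content type below one folder. It uses the python-arango driver, which is my choice for the example and not necessarily what the migration add-on uses. The attribute names `_type` and `_path` follow the collective.jsonify export format, but whether they are stored under exactly these names in the collection is an assumption, so both are parameters. The database name, credentials and folder path in the usage comment are example values.

```python
def paths_by_type(db, portal_type, folder_prefix="", collection="import",
                  type_field="_type", path_field="_path"):
    """Return the sorted paths of all documents of one content type.

    `db` is a python-arango database handle. `type_field` and `path_field`
    are assumptions about how the exported attributes are named in ArangoDB.
    """
    query = """
        FOR doc IN @@collection
            FILTER doc[@type_field] == @portal_type
            FILTER LEFT(doc[@path_field], LENGTH(@prefix)) == @prefix
            SORT doc[@path_field]
            RETURN doc[@path_field]
    """
    bind_vars = {
        "@collection": collection,   # collection bind parameters use a double @ in AQL
        "type_field": type_field,
        "path_field": path_field,
        "portal_type": portal_type,
        "prefix": folder_prefix,     # an empty prefix matches every path
    }
    return list(db.aql.execute(query, bind_vars=bind_vars))


# Usage sketch (requires `pip install python-arango` and a running server):
#   from arango import ArangoClient
#   client = ArangoClient(hosts="http://localhost:8529")
#   db = client.db("plone", username="plone", password="secret")
#   print(paths_by_type(db, "Document", folder_prefix="/Plone/news/"))
```

The same filter conditions can be used to select the subset of documents for a partial test migration.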

ArangoDB is a multi-model database (key-value store, document store, graph database) and is available as a community edition, an enterprise edition and a cloud edition (ArangoDB Oasis). The community edition is sufficient for migration projects. There are various installation options for the common operating systems and Linux distributions, as well as through Docker.

After the installation of ArangoDB, you need to create a dedicated user account within ArangoDB and a dedicated database. This can be easily achieved through the ArangoDB web UI. We assume for the following steps that you have created a database named plone and an ArangoDB user account plone with password secret.
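If you prefer a scripted setup over the web UI, the same database and user can be created with the python-arango driver. This is only a sketch of an alternative, not part of the add-on: the helper name `ensure_database` is made up for this example, and the root password in the usage comment is a placeholder for your own administrative credentials.

```python
def ensure_database(sys_db, name="plone", username="plone", password="secret"):
    """Create the database together with a user that may access it.

    `sys_db` is a python-arango handle on the `_system` database, opened
    with an administrative account. Returns True if the database was
    created and False if it already existed.
    """
    if sys_db.has_database(name):
        return False
    sys_db.create_database(
        name,
        users=[{"username": username, "password": password, "active": True}],
    )
    return True


# Usage sketch (requires `pip install python-arango` and a running server):
#   from arango import ArangoClient
#   client = ArangoClient(hosts="http://localhost:8529")
#   sys_db = client.db("_system", username="root", password="<root password>")
#   ensure_database(sys_db)
```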


Preparing your target Plone site

We assume that the target Plone site is running on Plone 5.2 and Python 3. The core functionality is integrated in the collective.plone5migration add-on for Plone. Prepare your buildout like this:

[buildout]
extends = buildout.cfg

auto-checkout +=
    collective.plone5migration

sources = sources

[sources]
collective.plone5migration = git git@github.com:collective/collective.plone5migration.git


[instance]
eggs +=
    collective.plone5migration

Importing your JSON data into ArangoDB

After running buildout with the configuration given above, you will find this generated import script:

bin/import-jsondump-into-arangodb --help
usage: import-jsondump-into-arangodb [-h] [-d DATABASE] [-c COLLECTION] [-url CONNECTION_URL] [-u USERNAME] [-p PASSWORD] [-i IMPORT_DIRECTORY] [-x]

optional arguments:
  -h, --help            show this help message and exit
  -d DATABASE, --database DATABASE
                        ArangoDB database
  -c COLLECTION, --collection COLLECTION
                        ArangoDB collection
  -url CONNECTION_URL, --connection-url CONNECTION_URL
                        ArangoDB connection URL
  -u USERNAME, --username USERNAME
                        ArangoDB username
  -p PASSWORD, --password PASSWORD
                        ArangoDB password
  -i IMPORT_DIRECTORY, --import-directory IMPORT_DIRECTORY
                        Import directory with JSON files
  -x, --drop-collection
                        Drop collection

In the previous blog post we created a JSON export in /tmp/content_test_2020-11-25-15-40-59.

You can import this export directory into ArangoDB using:

bin/import-jsondump-into-arangodb -i /tmp/content_test_2020-11-25-15-40-59 -x -u plone -p secret -d plone

Output (the username and database values shown here differ from the example command above):

connection=http://localhost:8529
username=root
database=ugent
collection=import
import directory=/tmp/content_test_2020-11-25-15-40-59
truncating existing collection
truncating existing collection...DONE
......

A progress bar shows the progress of the import operation. The import speed depends on your local system: a typical import of 100,000 JSON files (100,000 Plone objects) with a total size of 50 GB takes about 45 to 60 minutes, largely depending on the I/O speed of your disk.
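For readers who want a feel for what such an import involves, here is a minimal sketch of a batched import loop. It is not the implementation of the script shipped with collective.plone5migration, which may well transform the documents before storing them. The function names and the batch size are made up for this example, the `*.json` file pattern is an assumption about the export layout, and `collection` is assumed to be a python-arango collection object, whose `import_bulk` method accepts a list of documents.

```python
import json
from pathlib import Path


def iter_batches(import_directory, batch_size=100):
    """Yield lists of parsed JSON documents, at most `batch_size` per list."""
    batch = []
    # Walk the export directory recursively; sorting keeps the order reproducible.
    for json_file in sorted(Path(import_directory).rglob("*.json")):
        with json_file.open(encoding="utf-8") as handle:
            batch.append(json.load(handle))
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        # Remaining documents that did not fill a complete batch.
        yield batch


def import_directory_into(collection, import_directory, batch_size=100):
    """Send every batch to `collection.import_bulk`; return the document count."""
    total = 0
    for batch in iter_batches(import_directory, batch_size):
        collection.import_bulk(batch)
        total += len(batch)
    return total
```

Batching keeps memory usage bounded, which matters when individual JSON files are large, and it avoids one HTTP round trip per document.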


collective.plone5migration has been developed by Andreas Jung as part of a customer project with Ghent University (migration of the ugent.be site to Plone 5).