Migrating Neo4j Community from 3.2 to 4.4

Opening an older project can be overwhelming, and upgrading it can be hard. In this blog post we describe one way to migrate Neo4j Community 3.2.14 to Neo4j Community 4.4.8 running on a Kubernetes cluster. The path supported by Neo4j is to upgrade step by step between minor versions; a direct migration to a higher major version is available for the Enterprise Edition only. Since we want to stay on the Community Edition, we deploy a fresh 4.4.8 instance and migrate only the data inside the database. This approach worked for us because our graph database contains only nodes and relationships and no additional Neo4j-specific configuration. To follow this post you should have a basic understanding of Kubernetes, e.g. how to deploy using kubectl.

As described, the starting point is a running Neo4j 3.2.14 Community Edition instance. 3.2.14 is the latest stable release of the 3.2 line, which has been declared end-of-life (EOL), so a migration is due. From this version we export a GraphML dump that we can import into the new 4.4.8 database.
We use the export and import procedures of the APOC library. APOC is released for the different Neo4j versions, and the APOC version has to match the Neo4j version in its first two version numbers (here 3.2). For Neo4j 3.2.14 we installed apoc-3.2.3.6-all.jar.
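The matching rule can be written down as a small check. The following is only a sketch of the rule as stated above (compare the first two version components); the function name is our own, and the 4.4 APOC version string in the usage note is purely illustrative.

```python
def apoc_matches_neo4j(apoc_version: str, neo4j_version: str) -> bool:
    """Return True if the first two version components (major.minor) agree.

    Hypothetical helper for illustration; it only compares strings and
    does not check that the given APOC release actually exists.
    """
    return apoc_version.split(".")[:2] == neo4j_version.split(".")[:2]


if __name__ == "__main__":
    # The combination used in this post.
    print(apoc_matches_neo4j("3.2.3.6", "3.2.14"))
    # The old jar must not be reused for the new database.
    print(apoc_matches_neo4j("3.2.3.6", "4.4.8"))
```

For the target database the same check applies, so an APOC release of the form 4.4.x.y would be the candidate for Neo4j 4.4.8.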
To deploy it into the running Neo4j database, we download the required APOC jar and copy it to the plugins folder, which is where Neo4j picks up plugins. We do this with an Init Container in the Kubernetes deployment of the database. Init Containers initialize the pod by running a specialized task, here loading the required extensions and plugins, and they run to completion before the app container starts. The Init Container needs an image that provides curl and cp; the example below reuses the neo4j:3.2.14 image, and a small Alpine- or Bitnami-based image that ships curl should work as well.

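      initContainers:
        - name: init-container
          image: neo4j:3.2.14
          # Download the APOC jar and copy it into the shared plugins volume.
          # -f and && make the Init Container fail if the download fails.
          command: [ '/bin/sh', '-c', 'curl -fL -O https://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/download/3.2.3.6/apoc-3.2.3.6-all.jar && cp -v apoc-3.2.3.6-all.jar /plugins/' ]
          volumeMounts:
            - mountPath: /plugins/
              name: container-plugins
      # volumes is a sibling of initContainers and containers in the pod spec
      volumes:
        - name: container-plugins
          emptyDir: { }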

The jar file needs a volume to be stored on. This volume is created as an empty directory (emptyDir) when the deployment is applied. It is mounted at /plugins/ in the Init Container and at /var/lib/neo4j/plugins in the Neo4j container, so it contains the APOC jar once the Init Container has succeeded.
After Kubernetes has run the Init Container successfully, the Neo4j container starts with the APOC jar already in the right place. Installing APOC is not the only prerequisite, though: Neo4j must also be configured to allow the execution of the APOC procedures. This is done by whitelisting the procedure "apoc.export.graphml.all" and enabling file export. On Kubernetes we set these configuration parameters through the following environment variables, which are injected into the neo4j.conf file.

      containers:
        - name: neo4j
          image: neo4j:3.2.14
          ports:
            - containerPort: 7474
              name: neo4j-port
            - containerPort: 7687
              name: bolt-port
          env:
            - name: NEO4J_dbms_security_procedures_whitelist
              value: 'apoc.export.graphml.all'
            - name: NEO4J_apoc_export_file_enabled
              value: 'true'
          volumeMounts:
            - name: container-plugins
              mountPath: /var/lib/neo4j/plugins

After applying these changes the pod containing the current Neo4j 3.2.14 will restart, providing us a new instance of the database. We can then log into our running pod to create a GraphML file.

To access the cypher-shell, run:

kubectl exec -it neo4j-0 -- /var/lib/neo4j/bin/cypher-shell

Log in using your database credentials and create a GraphML file by executing:

WITH "migration-backup.graphml" AS filename
CALL apoc.export.graphml.all(filename, {useTypes:TRUE, storeNodeIds:FALSE})
YIELD file
RETURN file;

Copy the GraphML file to your local machine. For a large amount of data, add retries so that the complete file is transferred even if the connection is interrupted.

kubectl cp namespace/neo4j-0:/migration-backup.graphml /home/migration-backup.graphml --retries=5
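
Before deleting the old database it can be worth checking that the copied file is complete. The following sketch, which is our own addition and not part of the original procedure, streams the GraphML file with Python's standard library and counts its node and edge elements; a truncated file fails with a parse error. The function name and the tiny sample document are illustrative assumptions.

```python
import io
import xml.etree.ElementTree as ET


def count_graphml_elements(source):
    """Stream a GraphML document and count its <node> and <edge> elements.

    `source` is a file path or a binary file-like object. An incomplete
    (truncated) document raises xml.etree.ElementTree.ParseError.
    """
    counts = {"node": 0, "edge": 0}
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Strip a possible XML namespace: "{http://...}node" -> "node".
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag in counts:
            counts[tag] += 1
            elem.clear()  # drop the element's children to limit memory use
    return counts


if __name__ == "__main__":
    # Tiny hand-written sample; replace it with the path of the real export,
    # e.g. count_graphml_elements("/home/migration-backup.graphml").
    sample = io.BytesIO(
        b'<graphml xmlns="http://graphml.graphdrawing.org/xmlns">'
        b'<graph id="G" edgedefault="directed">'
        b'<node id="n0"/><node id="n1"/>'
        b'<edge source="n0" target="n1"/>'
        b"</graph></graphml>"
    )
    print(count_graphml_elements(sample))
```

The resulting numbers can be compared with the node and relationship counts of the source database, and later with those of the new 4.4.8 instance.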

Now we remove the complete 3.2.14 setup from the cluster, including the PVCs that contain the Neo4j 3.2 data. If you are unsure whether you can safely remove the PVCs, rename them in the deployment file instead so that new volumes are created.

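Change your configuration to use the newer Neo4j 4.4 image, and this time set the environment variables to allow importing data. Since the APOC version has to match the Neo4j version, the Init Container must now download an APOC release for 4.4 instead of apoc-3.2.3.6-all.jar.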

          env:
            # Whitelist the procedure that is called for the import
            - name: NEO4J_dbms_security_procedures_whitelist
              value: 'apoc.import.graphml'
            - name: NEO4J_apoc_import_file_enabled
              value: 'true'

Apply the new 4.4.8 configuration to your cluster. Copy the GraphML file created with Neo4j 3.2.14 to the database.

kubectl cp /home/migration-backup.graphml namespace/neo4j-0:/var/lib/neo4j/backup/migration-backup.graphml --retries=5

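Log into the pod and execute the following in the cypher-shell of the 4.4.8 instance. Depending on your APOC settings, a relative file name may be resolved against Neo4j's import directory, so make sure the path you pass to the procedure points to the location you copied the file to.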

CALL apoc.import.graphml("migration-backup.graphml", {readLabels: true});

Congrats, you should now have your new version 4.4.8 up and running with all data that was previously available on your 3.2.14.

Outlook: more articles on backing up Neo4j Community are coming up soon!

Continue reading: This article is part 2 of a series of 5 articles on Neo4j. The upcoming articles include: Backup for Neo4j Community Edition, Creating your own Neo4j image, Presenting Data with NeoDash.