03.08.2012

Migrating data with Liquibase

Recently, we started integrating Liquibase as a database schema migration tool into most of my team’s projects, for both new from-scratch projects and already existing ones. Liquibase is great because it allows us to use an SCM tool like Git to manage different revisions of an applications database schema – or more specifically, the changes required to migrate the database schema from one revision to another.

While migrating database schemas seems like a pretty straight-forward task at the beginning, things get more complicated as soon as you want to roll back schema changes without dropping your database (and then rebuilding it). Liquibase also supports migrating your data across schema changes, in both directions. But lets start with the basics.

For this example, I only used the Liquibase command line interface, along with the basic MySQL command line client. Of course, Liquibase also integrates nicely with Maven (as a Maven goal) or Spring (as a bean that executes during context initialization).

I start with a very basic table called „Person“, consisting only of an ID (primary key) and a name:


mysql> describe Person;
+-------+--------------+------+-----+---------+----------------+
| Field | Type         | Null | Key | Default | Extra          |
+-------+--------------+------+-----+---------+----------------+
| id    | bigint(20)   | NO   | PRI | NULL    | auto_increment |
| name  | varchar(255) | NO   | UNI | NULL    |                |
+-------+--------------+------+-----+---------+----------------+
2 rows in set (0.00 sec)

Liquibase uses so-called changesets, which are XML-snippets used to describe DDL statements. They are organized in change log files. The following change set is used to create a table (via the „createTable“-tag) and two columns (via the „column“-tag)


<databasechangelog xmlns="http://www.liquibase.org/xml/ns/dbchangelog" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemalocation="http://www.liquibase.org/xml/ns/dbchangelog http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-2.0.xsd">
  <changeset author="mueller@synyx.de" id="1" runonchange="true">
    <createtable tablename="Person">
      <column autoincrement="true" name="id" type="BIGINT">
        <constraints nullable="false" primarykey="true">
        </constraints>
      </column>
      <column name="name" type="VARCHAR(255)">
        <constraints nullable="false">
        </constraints>
      </column>
    </createtable>
  </changeset>
</databasechangelog>

When I run Liquibase via command line, it sets up the „Person“ table. The relevant command is „update“:


./liquibase --url=jdbc:mysql://localhost:3306/liquiblog --driver=com.mysql.jdbc.Driver --username=root --password="" --changeLogFile=db.changelog-0.1.0.xml <b>update</b>

Liquibase already knows how to roll back certain changesets, like the „createTable“ changeset above. If we call the command line client with „rollbackCount 1“ instead of „update“, it rolls back the last changeset it executed, and the „Person“ table is gone.

Other changesets cannot be rolled back automatically. Consider the following „insert“-changeset that inserts an entry into our „Person“ table:


<databasechangelog xmlns="http://www.liquibase.org/xml/ns/dbchangelog" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemalocation="http://www.liquibase.org/xml/ns/dbchangelog http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-2.0.xsd">
  <changeset author="mueller@synyx.de" id="init-1">
    <insert tablename="Person">
      <column name="name" value="John Doe">
      </column>
    </insert>
    <rollback>
      DELETE FROM Person WHERE name LIKE 'John Doe';
    </rollback>
  </changeset>
</databasechangelog>

I manually added a „rollback“-tag containg an SQL statement that reverses the changset. Note that the „rollback“-tag can contain either SQL statements as text or certain Liquibase refactoring tags. Since we now have two change log xml files, I created a „master“-file that imports the other files in the order in which they should be executed:


<?xml version="1.0" encoding="UTF-8"?>
<databasechangelog xmlns="http://www.liquibase.org/xml/ns/dbchangelog/1.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemalocation="http://www.liquibase.org/xml/ns/dbchangelog/1.9
                      http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-1.9.xsd">
    <include file="db.changelog-0.1.0.xml"></include>
    <include file="db.changelog-0.1.0.init.xml"></include>
</databasechangelog>

If we run the „update“ command with the master change log file, it checks wether the first changeset was already executed (depending on you rolled it back or not) and then executes the second changeset that adds a „Person“ entry. To make this work, Liquibase create a helper table called „DATABASECHANGELOGS“ containg already-executed change sets along with a hash value (to make sure no-one modifies changesets once they have been executed):


mysql> select id, md5sum, description from DATABASECHANGELOG;
+--------+------------------------------------+--------------+
| id     | md5sum                             | description  |
+--------+------------------------------------+--------------+
| 1      | 3:5a36f447e90b35c3802cb6fe16cb12a7 | Create Table |
| init-1 | 3:43c29e0011ebfcfd9cfbbb8450179a41 | Insert Row   |
+--------+------------------------------------+--------------+
2 rows in set (0.00 sec)

Now that we got the basics running, lets try something more challenging: an actual change to our schema that requires both schema and data migration. Our „Person“ table currently has only a name column, and we decided that we want to split it up into a „firstname“ and a „lastname“ column.

Before beginning work, I have Liquibase „tag“ the database so that we can roll back to this tag later on:


./liquibase --url=jdbc:mysql://localhost:3306/liquiblog --driver=com.mysql.jdbc.Driver --username=root --password="" --changeLogFile=changelog-master.xml <b>tag liquiblog_0_1_0</b>

I created a new change set that adds the two new columns:


<changeset author="mueller@synyx.de" id="1" runonchange="true">
  <addcolumn tablename="Person">
    <column name="firstname" type="VARCHAR(255)">
      <constraints nullable="false">
      </constraints>
    </column>
    <column name="lastname" type="VARCHAR(255)">
      <constraints nullable="false">
      </constraints>
    </column>
  </addcolumn>
</changeset>

Once again, Liquibase knows how to roll back this change set, so we can skip the rollback tag.

Now that the table has two additional columns, we must take care of migrating our existing data to the new schema before deleting the old, now obsolete „name“ column. Since data manipulation is not supported out-of-the-box by Liquibase, we have to use its „sql“ tag to include native SQL statements within a changeset.


<changeset author="mueller@synyx.de" id="2">
  <sql>
    UPDATE Person SET firstname = SUBSTRING_INDEX(name, ' ', 1);
    UPDATE Person SET lastname = SUBSTRING_INDEX(name, ' ', -1);
  </sql>
  <rollback>
    UPDATE Person SET firstname = '';
    UPDATE Person SET lastname = '';
  </rollback>
</changeset>

Note that the content of the „rollback“-tag is kind of redundant, but the tag itself is required because Liquibase prevents us from rolling back changesets that cannot be rolled back implicitly and have no explicit rollback tag.

Once again, after executing Liquibase with the „update“-option, the new changeset is run, and our newly-created „firstname“ and „lastname“ columns now contain data.

Finally, I want to remove the old „name“ column.


<changeset author="mueller@synyx.de" id="3" runonchange="true">
  <dropcolumn columnname="name" tablename="Person">
  </dropcolumn>
  <rollback>
    <addcolumn tablename="Person">
      <column name="name" type="VARCHAR(255)">
        <constraints nullable="false">
        </constraints>
      </column>
    </addcolumn>
    <sql>
      UPDATE Person SET name = CONCAT(firstname, CONCAT(' ', lastname));
   </sql>
  </rollback>
</changeset>

Again, the changeset itself is quite simple because Liquibase supports dropping columns, but the „rollback“-tag is more complicated: I first re-add the old „name“-column using the standart „addColumn“-tag, and then I used a custom SQL statement to set the columns value.

We end up with a new database schema, complete with data:


mysql> select * from Person;
+----+-----------+------------+
| id | firstname | lastname   |
+----+-----------+------------+
|  1 | John      | Doe        |
+----+-----------+------------+
1 rows in set (0.00 sec)

Because we created a tag earlier and included rollback instructions in all our changesets, we can always roll back these modifications without loosing any data! By running..


./liquibase --url=jdbc:mysql://localhost:3306/liquiblog --driver=com.mysql.jdbc.Driver --username=root --password="" --changeLogFile=changelog-master.xml <b>rollback liquiblog_0_1_0</b>

..we get our original database back!

Of course, the example with splitting / concatenating strings is a little far-fetched, but the same principles can be applied to more sophisticated refactorings. I came across the idea for this blog post when we had to split an existing domain class (mapped to a single table) into an abstract base class and two subclasses, preferrably without losing data.

Sven Müller

Migrating data with Liquibase