04.12.2013

NoSQL matters – It does! But think about your data!

The conference venue

The NoSQL matters conference took place in Barcelona, Spain, on 29 and 30 November. Barcelona is a big, beautiful (but crowded) city. The conference venue, the Casa Convalescència, belongs to the complex of the Hospital de la Santa Creu i Sant Pau, which UNESCO declared a World Heritage Site. It has a great atmosphere! The conference was sold out, and more than 150 participants came together to discuss NoSQL and related technologies. It was well organized, and the schedule left time for discussions and for changing rooms. The concluding ‘session’ brought all the participants together for tapas and beer and encouraged further lively discussions.

Day 1: The training day

The first day was a training day. The training sessions covered some of the available NoSQL stores (such as the graph database Neo4j and the key-value store Riak) and how to model data in a NoSQL world. I did not participate in the training sessions myself, as I arrived in Barcelona late on Friday, 29 November.

Day 2: The session day

The second day of the conference was a session day with talks covering all the different nuances of the NoSQL world. It was a very interesting day, with lots to learn and plenty of opportunities to discuss the presented topics. I will not report in detail on all the talks I attended, but will only summarize them briefly. Before that, I will point out some of the aspects that I personally found most interesting.

The gist: Think about your data

A main concern of many speakers was to make developers think about their applications (again): What do these demand, and what are the best-fitting solutions and architectures? There is no ‘one size fits all’ (Stonebraker 2005), and ‘there is no free lunch with distributed data’ (HP 2005)! To use the full potential of the new stores and technologies efficiently, a basic understanding of the database developments of roughly the last 50 years is helpful. Starting from hierarchical and network (CODASYL) systems, Codd’s abstraction to tuples was a big step forward (Codd 1970). Applications were freed from caring about how data is represented on the storage system, the process of normalization avoids redundancy and anomalies, and the ACID properties were desired in most business applications. Further, the mathematically precise definition of the tuple calculus made the theory formally accessible via set theory.

But the world is constantly changing: High availability and horizontal scaling are crucial in today’s business (web) applications, and consistency might sometimes be weakened in these environments, so BASE can be an appropriate alternative consistency model to the ‘good old’ ACID. With BASE come the phenomena described by the CAP theorem (Brewer 2000, Gilbert 2002). Some speakers warned: Keep in mind that relational database systems can be a perfectly sound choice! To design suitable applications and to make use of the powerful possibilities of ‘polyglot persistence’ (Fowler 2010) and the ‘Lambda architecture for big data’ (Marz 2012), the needs of the application have to be well understood; otherwise a sound choice among the large number of available database stores is impossible.

Complexity isolation, i.e. partitioning the demands into small parts and addressing each of them, can be a good way to achieve good software. Schemas are often desirable, as they introduce structure into your data based on explicit knowledge that can be exploited to maintain a system’s integrity.

The talks

The conference’s great keynote was given by N. Marz. He introduced the ‘Doofus Programmer’ (that we all are) and discussed the ‘insanity’ of the complexity of the database world. His proposal: go for ‘Human-Fault-Tolerant’ systems by using immutable (but versioned) databases, schemas, precomputation, complexity isolation and the Lambda architecture.
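To make the idea more tangible, here is a minimal sketch of how an immutable log, a precomputed batch view and a small real-time view might fit together. It is my own toy illustration of the Lambda architecture idea, not code from the keynote; the class `LambdaCounter` and all its method names are hypothetical, and everything lives in memory.

```python
from collections import Counter


class LambdaCounter:
    """Toy page-view counter in the spirit of the Lambda architecture (hypothetical names)."""

    def __init__(self):
        self.master_log = []         # append-only list of facts; entries are never changed
        self.batch_view = Counter()  # precomputed from the whole log by run_batch()
        self.speed_view = Counter()  # incremental counts for facts not yet in the batch view

    def append(self, page):
        # New facts are only ever appended; the speed view keeps queries up to date.
        self.master_log.append(page)
        self.speed_view[page] += 1

    def run_batch(self):
        # Recompute the batch view from scratch, then reset the speed view.
        self.batch_view = Counter(self.master_log)
        self.speed_view = Counter()

    def query(self, page):
        # A query merges the precomputed view with the recent increments.
        return self.batch_view[page] + self.speed_view[page]


counter = LambdaCounter()
for page in ["home", "docs", "home"]:
    counter.append(page)
print(counter.query("home"))
counter.run_batch()
print(counter.query("home"))
```

Because the views are derived from the immutable log, a view damaged by a programming error can be thrown away and rebuilt with `run_batch()`, which is the ‘human-fault-tolerant’ part of the idea.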

C. Gormley introduced Elasticsearch and its possibilities in detail, showing many examples that demonstrated powerful indexing options. Elasticsearch is a distributed document store capable of near-real-time data analysis.

The talk by M. Hausenblas was about the Internet of Things and how to use NoSQL technologies to harness it. The amount of data collected on a daily basis is huge. Proactive (automotive) services, optimization of (logistic) processes, patient monitoring, and smart houses and cities are just a few examples of applications. The key technologies to cope with the data can be polyglot persistence and the Lambda architecture. Use different stores for different needs!

D. Turnbull (a historian and computer scientist) presented a nice historical review of database evolution, starting from hierarchical and network systems and following their ‘evolution’ to NoSQL. He discussed the needs that led to the changes in data modeling.

U. Friedrichsen nicely illustrated the pitfalls of modeling in a BASE world, not with the aim of scaring developers, but to urge them to think about data and consistency properties. Different consistency models need different approaches, and again: The store that best matches the application’s demands has to be found! He presented code examples that demonstrated how to deal with some of the BASE ‘phenomena’, and how to achieve consistency properties like ‘read-your-own-writes’.
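As an illustration of what ‘read-your-own-writes’ means, here is a minimal sketch of one common approach: the client session remembers the version of each key it has written and only accepts a replica’s answer if the replica has caught up to that version. This is not the code shown in the talk; the classes `Replica` and `Session` are hypothetical, and replication is simulated by hand.

```python
class Replica:
    """Toy store keeping a (value, version) pair per key (hypothetical)."""

    def __init__(self):
        self.data = {}

    def apply(self, key, value, version):
        # Accept an update only if it is newer than what is already stored.
        current = self.data.get(key)
        if current is None or current[1] < version:
            self.data[key] = (value, version)

    def read(self, key):
        # Unknown keys are reported as (None, 0).
        return self.data.get(key, (None, 0))


class Session:
    """Client session that guarantees read-your-own-writes for its own writes."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas
        self.written = {}  # key -> highest version this session has written

    def write(self, key, value):
        _, version = self.primary.read(key)
        version += 1
        self.primary.apply(key, value, version)
        self.written[key] = version

    def read(self, key):
        needed = self.written.get(key, 0)
        for replica in self.replicas:
            value, version = replica.read(key)
            if version >= needed:
                return value  # this replica has seen our latest write
        # No replica is fresh enough yet, so fall back to the primary.
        return self.primary.read(key)[0]


primary, lagging = Replica(), Replica()
session = Session(primary, [lagging])
session.write("cart", ["book"])
print(session.read("cart"))  # served by the primary, as the replica lags behind
```

A different session that has not written the key may still see the stale replica state; the guarantee only covers a session’s own writes, which is exactly why it is cheaper than full consistency.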

D. Mytton, founder of Server Density, showed an example of how a replicated, fast and fault-tolerant database store can be built out of the box with MongoDB. The solution is used for time series analysis and is able to serve up to 3333 writes/s with fast response times.

J. Reijn showed how real-time visitor analysis can be achieved by combining Elasticsearch with Couchbase as part of Hippo CMS. He also gave a nice example of choosing stores according to needs: The system also uses Apache Jackrabbit, which is a hierarchical (key/value) store.

A scale-in approach to databases was presented by N. Björkman. As hardware, and especially RAM, is relatively cheap these days, memory-centric databases are possible. By holding database and applications in the same RAM (and letting them share heap space), performance can be significantly increased, and object mapping can be completely avoided. The system Starcounter offers full ACID compliance, a native .NET API and SQL support.

Some recommended articles

Stonebraker 2005 Stonebraker, Çetintemel; “One Size Fits All”: An Idea Whose Time Has Come and Gone, Proceedings of the 21st International Conference on Data Engineering ICDE ’05, IEEE Computer Society, 2005.
HP 2005 HP; There is no free lunch with distributed data white paper, 2005.
Codd 1970 Codd; A Relational Model of Data for Large Shared Data Banks, Communications of the ACM 13:6, 1970.
Brewer 2000 Brewer; Towards Robust Distributed Systems, Keynote at PODC, 2000.
Gilbert 2002 Gilbert, Lynch; Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services, ACM SIGACT News 33:2, 2002.
Fowler 2010 Fowler; http://martinfowler.com/bliki/PolyglotPersistence.html.
Marz 2012 Marz; Big Data – Principles and best practices of scalable realtime data systems, Early access edition, http://manning.com/marz.

Christian Mennerich