The first day of GOTO Aarhus 2012 is over and I’m pretty happy overall. I did change which talks I watched compared to what I had originally planned. This is a summary of, and my experience from, each of the talks I saw.
First of all I want to quickly mention the keynote by Rick Falkvinge from the Swedish Pirate Party. It was a great talk with some great points, though I think it’s fair to accuse Rick Falkvinge of simplifying the problems a bit, especially when he mentioned how you can get infinite space for your email for free. It’s not really free when it’s in exchange for ads and privacy.
The challenges of connected data
Jim Webber started his talk by giving a short history of databases and how it all started with Edgar F. Codd’s paper entitled “A Relational Model of Data for Large Shared Data Banks”, after which not much changed and everybody thought of data as squares with relations between them for some twenty years. It wasn’t until Tim Berners-Lee, being the lonely physicist he was, invented the web to move fleshy colored data around that the world began to think of data as interconnected webs of vertices.
After having introduced three of the four main types of NoSQL datastores, i.e. key/value (Riak), column stores (Cassandra) and document (MongoDB), united under Martin Fowler’s term Aggregate Oriented Databases, he presented Neo Technology’s graph database Neo4j and gave a short introduction to the Cypher query language using the Doctor Who universe as a starting point. For more information about Cypher see the documentation.
Lastly he described some of the nice use cases for a graph database, mainly looking for patterns in datasets, e.g. for retail analytics or real-time upselling, and mentioned some of their customers, like Adobe Creative Cloud and Cisco Network Management and sales.
Overall it was a very nice and very fun talk, in large part due to Jim Webber’s … communication style. I would however have preferred it if less time had been spent on database history and more on the query language and how Neo4j handles horizontal scalability.
My Agile journey: XP, Scrum, Lean, Kanban and back again
My colleague Mogens and I arrived a few minutes late to this talk and had to sit on the floor, which meant that I unfortunately couldn’t take many notes.
Jesper Boeg described how his attitude toward agile and the different methodologies has changed over time, going from very enthusiastic about Scrum, to very enthusiastic about Lean and Kanban, to embracing both. He also touched on some of the problems different teams face when trying to be agile. It was very enlightening for me since I don’t have much experience with doing either properly, other than from studying them and using them in small projects at the university.
All in all Jesper Boeg concluded that whether you choose one or the other depends on the organization in which you try to introduce it. Scrum is best suited for teams and organizations which are ready for change and won’t work against it, whereas Kanban is less invasive and can be easier to implement if the organization is not ready for a revolutionary change.
Big Data OLTP with Apache Cassandra
I have no doubt that Matt Dennis is a very skilled Cassandra engineer who understands the underlying data model to perfection and who knows how to tune all of the nuts and bolts in Cassandra and align the data just right. But I’m afraid his talk was not very good for a Cassandra newbie, and I feel he failed to explain the basics of Cassandra, which left some in the audience, myself and two colleagues included, somewhat unsatisfied and confused.
While the results he was able to display in terms of adoption rate, scalability and availability characteristics were very impressive, I think at least mentioning the basics of the data model would have been appropriate considering the audience.
Having said that, I think Cassandra looks to be a great proposition if you have the needs that it serves – it’s a shame it didn’t get a better introduction at GOTO Aarhus 2012. For additional talks about Cassandra see the homepage of Cassandra Summit 2012.
Scaling for Humongous amounts of data with MongoDB
I really liked the talk that Alvin Richards gave on MongoDB and I can’t help but compare it to Matt Dennis’ talk. One of the things Alvin Richards mentioned was that one of the primary reasons many developers choose NoSQL databases is to get more flexibility for rapid development compared to the world of RDBMS. Having experienced the dread of constant schema changes in an SQL database, I really think he’s right!
After a very short introduction to the reasoning behind MongoDB in relation to hardware price changes, deployment stories and the CAP theorem, Alvin Richards quickly described the three main factors that MongoDB addresses, i.e. agility, flexibility and cost: three very important factors that were somewhat overlooked in some of the other NoSQL talks.
The next part of the talk was based on the premise that we should let our use cases influence how we design our schemas, and he presented different ways one could structure the schema in MongoDB to best solve the Twitter use case. He touched on when you would want to use partitioning or linking versus embedding and buckets. Basically, partitioning (much like an RDBMS) is not very good for performance because of the many random reads and seeks it requires in order to load all of the data, embedding has the problem of large sequential reads, and buckets provide great performance because of small sequential reads.
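To make the difference between the first two layouts a bit more concrete, here is a minimal sketch in plain Python dictionaries. It is not taken from the talk and does not use a MongoDB driver; the field names (`user`, `day`, `text`) and the helper functions are my own assumptions, and counting fetched documents is only a rough stand-in for the random reads mentioned above.

```python
# Hypothetical sample data; the field names are assumptions for illustration.
TWEETS = [
    {"user": "alice", "day": "2012-10-01", "text": "hello"},
    {"user": "alice", "day": "2012-10-01", "text": "at GOTO"},
    {"user": "alice", "day": "2012-10-02", "text": "day two"},
    {"user": "bob", "day": "2012-10-01", "text": "hi"},
]


def linked_layout(tweets):
    """One document per tweet that refers to its user (RDBMS-like rows)."""
    return [dict(tweet, _id=index) for index, tweet in enumerate(tweets)]


def embedded_layout(tweets):
    """One document per user with every tweet embedded in a single array."""
    docs = {}
    for tweet in tweets:
        doc = docs.setdefault(tweet["user"], {"_id": tweet["user"], "tweets": []})
        doc["tweets"].append({"day": tweet["day"], "text": tweet["text"]})
    return list(docs.values())


def docs_read_for_user(docs, user):
    """Count the documents that must be fetched to load one user's tweets."""
    return sum(1 for d in docs if d.get("user") == user or d.get("_id") == user)


if __name__ == "__main__":
    print(docs_read_for_user(linked_layout(TWEETS), "alice"))
    print(docs_read_for_user(embedded_layout(TWEETS), "alice"))
```

The linked layout needs one fetch per tweet, while the embedded layout needs a single fetch whose size grows with every tweet the user has ever written.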
With a bucket you add an extra dimension to your schema (in this case a temporal one), a bit like sharding, in the sense that you store e.g. all tweets per user per day in one single document, allowing you to quickly load the latest tweets for each user. While this sounds wonderful, I’m not sure what to think of the influence this has on the data model if you want to consider other views of the data due to e.g. use-case changes. But maybe that’s just a matter of building up indexes; at least that’s how Alvin Richards explained it to me when I asked him afterwards.
In the rest of the talk Alvin Richards talked about how to use sharding, replication and MongoDB’s five different durability modes to solve the scalability, availability and partition-tolerance problems you might face with MongoDB.
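As a rough illustration of the bucket idea, here is a small sketch using plain Python dictionaries instead of a real MongoDB collection. The `user:day` key format, the field names and the helper functions are all my own assumptions and not something shown in the talk.

```python
def bucket_id(user, day):
    """Build a hypothetical bucket key from the user and an ISO date string."""
    return f"{user}:{day}"


def add_tweet(buckets, user, day, text):
    """Append a tweet to the user's bucket for that day, creating it if needed."""
    key = bucket_id(user, day)
    bucket = buckets.setdefault(
        key, {"_id": key, "user": user, "day": day, "tweets": []}
    )
    bucket["tweets"].append(text)
    return bucket


def latest_tweets(buckets, user):
    """Return the tweets stored in the user's most recent daily bucket."""
    user_buckets = [b for b in buckets.values() if b["user"] == user]
    if not user_buckets:
        return []
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    newest = max(user_buckets, key=lambda b: b["day"])
    return newest["tweets"]


if __name__ == "__main__":
    store = {}
    add_tweet(store, "alice", "2012-10-01", "hello")
    add_tweet(store, "alice", "2012-10-02", "day two")
    add_tweet(store, "alice", "2012-10-02", "still here")
    print(latest_tweets(store, "alice"))
```

Loading the latest tweets touches only one small document per user, which is the "small sequential read" property. A view that cuts across users or days would need a different key or an additional index, which is the trade-off I was wondering about.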
The People vs NOSQL DB’s
While it was an amusing talk, due in part to the bantering, I don’t really think it helped much in clarifying when one would choose one NoSQL strategy over another. Obviously it’s about tradeoffs, but I specifically think that Martin Fowler should have mentioned some of the reasons why ThoughtWorks had recommended one over the other, something he alluded to, without naming names, in one of his comments.
In the end it came down to a discussion of the CAP theorem and where each of the NoSQL databases were placed in relation to each of the guarantees. I think it would have been nice if they had also discussed how ease of use and general usability of the database and its APIs affected the market, something which only Chris Anderson from Couchbase mentioned briefly.
In light of the CAP theorem discussion I thought it appropriate to include an image by Nathan Hurst from his blog post Visual Guide to NoSQL Systems. For a discussion of different use cases for different types of NoSQL databases see 35+ Use Cases For Choosing Your Next NoSQL Database.
All in all a great day at GOTO Aarhus 2012!!