Posted about 1 month ago by avalanche123
My last post raised a lot of questions about the differences between document-oriented and relational databases, the possible use cases for each, and the approaches and gotchas to remember when dealing with either. I had some thoughts on the subject, but they didn't feel complete, so I decided to do some research. I started out by googling "document-oriented databases vs relational databases", which brought up a number of interesting results. After some intense reading and analysis, I think I have a good enough understanding of the concepts, strengths and weaknesses of the different data stores to write up and share my findings.
Relational databases were traditionally the most obvious solution for applications that needed to store and retrieve data. With the growth of the internet user base, the number of reads and writes a typical application needed to perform grew rapidly. This led to the need for scaling. Traditional RDBMSs were hard to scale (an SQL operation or transaction spanning multiple nodes doesn't scale well). With solutions like MySQL Cluster and Oracle RAC, this is much less of a problem now, but that wasn't the case for a while, which led many companies to abandon traditional RDBMSs for "NoSQL" data stores.
Relational Databases, Object Databases, Key-Value Stores, Document Stores, and Extensible Record Stores: A Comparison. By Rick Cattell:
The NoSQL data stores can be categorized into three groups, according to their data model and functionality:
- Key-value Stores provide a distributed index for object storage, where the objects are typically not even interpreted by the system: they are stored and handed back to the application as BLOBs, as in the popular memcached system I mentioned. However, these systems usually provide object replication for recovery, partitioning of the data over many machines, and rudimentary object persistence. Examples of key-value stores are memcached, Redis, Riak, Scalaris, and Voldemort.
- Document Stores provide more functionality: the system does recognize the structure of the objects stored. Objects (or documents) may have a variable number of named attributes of various types (integers, strings), objects can be grouped into collections, and the system provides a simple query mechanism to search collections for objects with particular attribute values. Like the key-value stores, document stores can also partition the data over many machines, replicate data for automatic recovery, and persist the data. Examples of document stores are SimpleDB, CouchDB, MongoDB, and Dynamo.
- Extensible Record Stores, sometimes called wide column stores, provide a data model more like relational tables, but with a dynamic number of attributes, and like document stores, higher scalability and availability made possible by database partitioning and by abandoning database-wide ACID semantics. Examples of extensible record stores are BigTable, HBase, HyperTable, and Cassandra.
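To make the three categories above a bit more concrete, here is a rough sketch of the data models using nothing but in-memory Python structures. It is only an illustration: the names `kv_store`, `collection`, `find`, `wide_rows` and `columns_of` are made up for this example and are not the API of memcached, MongoDB, Cassandra or any other product mentioned.

```python
import json

# Key-value store: the value is an opaque blob. The store can only put and
# get by key; it never looks inside the bytes.
kv_store = {}
kv_store["user:1"] = json.dumps({"name": "Ada", "city": "London"}).encode("utf-8")

# Document store: the system understands the structure of each document,
# documents in one collection may have different attributes, and the
# collection can be searched by attribute value.
collection = [
    {"_id": 1, "name": "Ada", "city": "London"},
    {"_id": 2, "name": "Linus", "city": "Helsinki", "languages": ["C"]},
]


def find(docs, **criteria):
    """Return the documents whose attributes equal every given criterion."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


# Extensible record (wide column) store: table-like rows addressed by a row
# key, where each row carries its own, dynamic set of columns.
wide_rows = {
    "user:1": {"profile:name": "Ada", "profile:city": "London"},
    "user:2": {"profile:name": "Linus", "stats:logins": 42},
}


def columns_of(row_key):
    """Return the sorted column names present in one row."""
    return sorted(wide_rows[row_key])
```

With the key-value model the application has to decode the blob itself (`json.loads(kv_store["user:1"])`), while `find(collection, city="London")` lets the store do the filtering, which is the extra functionality the document-store description refers to.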
So when should you use a document-oriented database, and when a relational one? The former usually performs better and is easier to scale, but it doesn't provide the ACID compliance and data integrity that the latter has by definition. This means that if we choose a document-oriented database, we get more performance and scalability, but need to keep in mind that database-level data integrity, transactions and locks are no longer there and will need to be built into the application logic itself, which affects how we write and structure our code. In my opinion, document-oriented databases cannot replace relational databases, and vice versa. Instead, they should be used to solve different kinds of problems. At OpenSky we use both MySQL and MongoDB.
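As a small illustration of what moving integrity into the application logic can look like, the sketch below contrasts a relational schema, where the database itself rejects an order that points at a missing user, with a schemaless store, where that check has to live in our own code. It assumes SQLite (via Python's standard `sqlite3` module) as a stand-in for the relational side and plain Python structures as a stand-in for the document side. The table names and helper functions are hypothetical and have nothing to do with our actual MySQL or MongoDB setup.

```python
import sqlite3


def make_relational_db():
    """Create an in-memory SQLite database that enforces a foreign key."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, "
        "user_id INTEGER NOT NULL REFERENCES users(id))"
    )
    conn.execute("INSERT INTO users (id) VALUES (1)")
    return conn


def relational_insert_order(conn, user_id):
    # No check needed here: the database raises sqlite3.IntegrityError
    # if user_id does not exist in the users table.
    conn.execute("INSERT INTO orders (user_id) VALUES (?)", (user_id,))


# Stand-in for a document store: nothing stops a dangling reference
# unless the application checks for it.
users = {1: {"name": "Ada"}}
orders = []


def document_insert_order(user_id):
    # The integrity rule now lives in application code.
    if user_id not in users:
        raise ValueError("unknown user_id: %r" % (user_id,))
    orders.append({"user_id": user_id})
```

Calling `relational_insert_order(conn, 99)` fails inside the database, while `document_insert_order(99)` fails only because we remembered to write the check. Forget it, and the bad order is stored silently.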
Comments (4)
RDBMS vs NoSQL Posted by Pierre about 1 month ago.
I mostly like the idea of using NoSQL as a kind of cache layer or view in front of an RDBMS. The RDBMS provides transactions, ACID, complex relations and so on, but for the sake of speed, we wrap (pull, push) the data used by frontends / clients in a NoSQL store. This surely adds another layer to develop (e.g. a web service that sits between the RDBMS and NoSQL to pull/push), but I like the idea.
Use the Right Tool Posted by Jonathon about 1 month ago.
As always, it comes down to "use the right tool for the right job". I think the difficulty comes when projects have developers/managers who only have experience with one or the other. Couple that with the natural tendency to fear the unknown and you get projects either using an RDBMS for applications that could benefit from the performance/freedom of NoSQL, or projects using NoSQL where relational integrity or ACID compliance should be a requirement.
just because I can't read it anymore Posted by mahono about 1 month ago.
CouchDB is ACID compliant: http://couchdb.apache.org/docs/overview.html
nosql isn't the holy grail Posted by Lukas about 1 month ago.
with nosql you will find yourself doing joins in userland, you will need to write scripts to do some basic bulk updates/deletes, quickly doing some statistics will be fairly painful as well. that being said, a lot of stuff will be easier too.
one more thing, whenever you talk about nosql, realize it's a broad market out there. for everything i said above you can probably find one nosql db that works just like an rdbms in one of these aspects, so nosql != nosql != rdbms, but each shares some subset with the others where they are the same or similar.