When choosing a relational database for a new project, the decision usually depends on several factors, for instance whether the product is proprietary and which extra features it offers, among others.
In the case of NoSQL databases, this selection is much more complex, since it depends not only on the factors above but also on which type of database is best suited to our application. This is because there are currently four main types of NoSQL databases: key-value, column-based, document-based and graph-based, each with its own peculiarities, pros and cons, which are detailed in this article.
Key-value databases are the simplest type. They keep a “tuple” containing a key and its value. A key-value store is essentially a hash table, used when all access to the database is made by primary key. It supports only three main operations: obtaining a value by its key, saving a value under a key, and deleting a key.
From the point of view of the database, the value is an opaque object: the database is unaware of its contents, and it is the responsibility of the application to interpret it. However, not all databases are so value-agnostic. For example, Redis, a key-value database, comes with certain datatypes that, in a way, interpret the stored values.
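The three operations and the opaque value can be sketched in a few lines of Python. This is a toy in-memory model, not the API of any real product: the `KeyValueStore` class, its method names and the `session:42` key are all assumptions made for illustration.

```python
import json
from typing import Dict, Optional


class KeyValueStore:
    """Toy in-memory key-value store (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # The store never inspects the bytes; it only files them under the key.
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        # Returns None when the key is absent.
        return self._data.get(key)

    def delete(self, key: str) -> None:
        # Deleting a missing key is a no-op in this sketch.
        self._data.pop(key, None)


store = KeyValueStore()
session = {"user": "alice", "cart": ["book", "pen"]}

# The application serializes the value; the store sees only bytes.
store.put("session:42", json.dumps(session).encode("utf-8"))

# The application, not the store, interprets the value on the way out.
raw = store.get("session:42")
decoded = json.loads(raw.decode("utf-8"))

store.delete("session:42")
```

Because the store treats the value as plain bytes, it cannot answer a question such as "which sessions contain a pen?" without the application reading and decoding every entry.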
Some examples of key-value databases are Riak, Redis, Memcached, HamsterDB and Amazon Dynamo.
The main advantage of this type of NoSQL database is that its simplicity allows for better performance and scalability.
Some use cases of key-value stores are:
- Session storage: all session information can be stored in a single object (the value) and the key may be the SessionId.
- User profiles / preferences.
- User information associated with a shopping cart.
Conversely, we should not use them in the following situations:
- When you need to model relationships between data.
- When you need transactions spanning multiple operations.
- When you need to query by the contents of the stored value.
Column-based databases store data in column families: groups of rows, each identified by a key, where every row has many columns associated with it. A column family groups related data that is usually accessed together. Each column family can be compared to a table in a relational database, where the key identifies the row and the row consists of multiple columns. It’s noteworthy that different rows don’t need to have the same columns, and columns can be added to any row without having to add them to the others.
We can interpret them as a special case of a key-value database in which the value is no longer an opaque object and becomes visible to the database.
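A minimal Python sketch of this model follows, assuming a column family can be pictured as a map from row key to a map of columns. The `ColumnFamily` class and the sample rows are hypothetical and do not reflect how any particular engine stores data on disk.

```python
from collections import defaultdict
from typing import Dict, Optional


class ColumnFamily:
    """Toy column family: row key -> {column name -> value} (illustrative only)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._rows: Dict[str, Dict[str, str]] = defaultdict(dict)

    def set_column(self, row_key: str, column: str, value: str) -> None:
        # A column is added to one row only; other rows are left untouched.
        self._rows[row_key][column] = value

    def get_column(self, row_key: str, column: str) -> Optional[str]:
        # Unlike an opaque value, individual columns are visible to the store.
        return self._rows.get(row_key, {}).get(column)

    def get_row(self, row_key: str) -> Dict[str, str]:
        # Return a copy so callers cannot mutate the stored row.
        return dict(self._rows.get(row_key, {}))


users = ColumnFamily("users")
users.set_column("u1", "name", "Ana")
users.set_column("u1", "email", "ana@example.com")
users.set_column("u2", "name", "Luis")
users.set_column("u2", "last_login", "2024-01-01")  # only u2 has this column

row_u1 = users.get_row("u1")
missing = users.get_column("u1", "last_login")
```

Here `u1` and `u2` live in the same column family but carry different columns, which is the flexibility described above.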
Examples of column-based databases are Cassandra, Hypertable and HBase.
Typical use cases for these databases are:
- Real-time data analytics
- Heavy real-time workloads
- Online gaming (e.g. sending messages in real time)
- Write-intensive systems
- Management of time series data
Document-based databases store documents in the value part of a key-value pair. A document is a data structure represented by a hierarchical tree that consists of maps, collections and scalar values, and it may be stored in XML, JSON or BSON formats, among others.
Stored documents should be similar to one another, but they don’t need to follow exactly the same structure. It is important to note that the value is no longer opaque to the database: it can be examined, and in many databases you can search for properties within it.
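To make the idea of searching inside the value concrete, here is a small Python sketch. It assumes documents are nested dictionaries and that properties are addressed with a dotted path; the `DocumentCollection` class, the `get_path` helper and the sample posts are hypothetical and are not the query API of any real engine.

```python
from typing import Any, Dict, List

Document = Dict[str, Any]


def get_path(doc: Document, path: str) -> Any:
    """Follow a dotted path such as 'author.name'; return None if absent."""
    current: Any = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


class DocumentCollection:
    """Toy document collection keyed by document id (illustrative only)."""

    def __init__(self) -> None:
        self._docs: Dict[str, Document] = {}

    def save(self, doc_id: str, doc: Document) -> None:
        self._docs[doc_id] = doc

    def find(self, path: str, expected: Any) -> List[str]:
        # The collection can look inside each document, unlike an opaque value.
        return [
            doc_id
            for doc_id, doc in self._docs.items()
            if get_path(doc, path) == expected
        ]


posts = DocumentCollection()
# The documents are similar but do not share exactly the same structure.
posts.save("p1", {"title": "Intro", "author": {"name": "Ana"}, "tags": ["nosql"]})
posts.save("p2", {"title": "Graphs", "author": {"name": "Luis", "email": "luis@example.com"}})
posts.save("p3", {"title": "Untitled draft"})

by_ana = posts.find("author.name", "Ana")
by_email = posts.find("author.email", "luis@example.com")
```

Documents that lack the requested property, such as `p3`, are simply skipped rather than causing an error.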
Among the best-known engines are MongoDB, CouchDB, RavenDB and Lotus Notes.
Some use cases are:
- Central repository of logging events
- CMS and blogging platforms (because no predefined structure is required)
- Web analytics and real-time analytics (for example, to store the number of visits to a site)
- E-commerce (also because of the ability to store flexible schemas)
Situations in which they aren’t advisable include:
- Complex transactions, which are generally not supported (RavenDB being the exception)
- Queries against documents whose structure varies
Graph-based databases are built on graph theory and use nodes and arcs to represent the stored data. They are very useful for storing information in models with many relationships. To do this they store entities, called nodes, and the relationships between those entities, called arcs. Both nodes and arcs can have properties. Arcs are also directed, which makes it possible to find patterns between nodes. The graph’s organization allows the data to be stored once and then interpreted in different ways, based on its relationships, through various queries. By definition, a graph database is a storage system that provides index-free adjacency: each element holds a pointer to its adjacent elements, so index lookups are not necessary to traverse the graph.
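The following Python sketch illustrates nodes, directed arcs with properties, and adjacency without an index: each node holds direct references to its outgoing arcs. The `Node` and `Arc` classes, the `connect` and `neighbours` helpers, and the sample data are assumptions for illustration, not the data model of a specific graph database.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(eq=False)
class Node:
    """An entity with properties and direct references to its outgoing arcs."""
    name: str
    properties: Dict[str, Any] = field(default_factory=dict)
    outgoing: List["Arc"] = field(default_factory=list)


@dataclass(eq=False)
class Arc:
    """A directed, labelled relationship that can carry its own properties."""
    label: str
    target: Node
    properties: Dict[str, Any] = field(default_factory=dict)


def connect(source: Node, label: str, target: Node, **properties: Any) -> Arc:
    # The arc is stored on the source node itself, so following it later
    # needs no lookup in a separate index.
    arc = Arc(label, target, properties)
    source.outgoing.append(arc)
    return arc


def neighbours(node: Node, label: str) -> List[str]:
    # Traversal only walks the pointers held by the node.
    return [arc.target.name for arc in node.outgoing if arc.label == label]


alice = Node("alice", {"age": 30})
bob = Node("bob")
acme = Node("acme", {"kind": "company"})

connect(alice, "KNOWS", bob, since=2015)
connect(alice, "WORKS_AT", acme)
connect(bob, "WORKS_AT", acme)

alice_knows = neighbours(alice, "KNOWS")
bob_knows = neighbours(bob, "KNOWS")  # arcs are directed: bob has no KNOWS arc
```

The same stored graph answers different questions depending on which arc label a query follows, for example `KNOWS` for social links or `WORKS_AT` for employment.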
Among the most popular graph databases are Neo4j, InfiniteGraph, OrientDB and FlockDB. A graph database may process information as a graph, merely store it as a graph, or do both, which would be ideal.
Among the uses of graph-oriented databases are:
- Interconnected data, such as social networks, where the graph represents people and the relationships between them.
- Routing or dispatch services, where a node could be a location and a relationship could hold the distance to another location.
- Recommendation engines: based on the graph, they can generate suggestions such as “could also be interested in product X” or “could also meet person X”.
- Systems with recursive searches across many levels.
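As a rough illustration of the “could also meet person X” case, the Python sketch below suggests friends of friends, ranked by the number of mutual friends. The adjacency-set representation, the `suggest_people` function and the sample names are all hypothetical; a real graph database would express this as a traversal query in its own language.

```python
from collections import Counter
from typing import Dict, List, Set


def suggest_people(graph: Dict[str, Set[str]], person: str) -> List[str]:
    """Suggest friends of friends, most mutual friends first (ties by name)."""
    direct = graph.get(person, set())
    mutual_counts: Counter = Counter()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            # Skip the person themselves and people they already know.
            if candidate != person and candidate not in direct:
                mutual_counts[candidate] += 1
    ranked = sorted(mutual_counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked]


# Hypothetical friendship graph, stored as adjacency sets.
friendships: Dict[str, Set[str]] = {
    "ana": {"luis", "marta"},
    "luis": {"ana", "pablo", "sofia"},
    "marta": {"ana", "pablo"},
    "pablo": {"luis", "marta"},
    "sofia": {"luis"},
}

suggestions_for_ana = suggest_people(friendships, "ana")
```

In this sample, `pablo` is ranked ahead of `sofia` for `ana` because two of her friends know him, while only one knows `sofia`.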
You shouldn’t use them in the following cases:
- Systems requiring massive updates over most of the entities or properties.
- Systems requiring highly distributed data. As shown in the article “Big Data and the CAP theorem”, graph-oriented databases fall into the CA space (consistency and availability), so it isn’t possible to partition the graph.