Start Using MongoDB :: Oracle Alchemist

[This entry is part 1 of 4 in the series Database Diversity]

Guten Tag, Namaste, Hola, Zdravstvuite! While you should always work to master your first language, a simple understanding of other cultures and their languages is always beneficial. In these days of social/career networks and rising global connectedness, being open to diversity is an absolute must.

This is true in the IT world as well. As I mentioned in my article “DBA, Grow Thyself – Moving and Shaking in the Era of Data Dominance”, new technologies will keep arriving, and the last thing you want is to be seen as either ignorant of or obstinate toward their potential. While I never advocate senseless database migrations or platform changes, with the incredible diversity available in the data world today, learning to at least say “hello” in a variety of environments can only be beneficial.

So for the near future, I will be doing a “Database Diversity” series (along with other posts) in which we perform basic operations in a variety of database environments. The basic operations (if applicable) will be: install, startup, create a table, insert rows, create an index, and run a query. Again, I stress “if applicable”, as these operations won’t always exist in every type of database. For today, our topic is…

MongoDB

MongoDB is a well-known and popular NoSQL option created by 10gen. It is an open source solution designed to be highly scalable and available, with robust options for data replication, processing via built-in query operations and custom MapReduce jobs, and document storage. Data in Mongo is document oriented, meaning there are no tables per se but collections of documents, each written in JSON-like notation (and stored internally as BSON, a binary form of JSON). This makes it an extremely powerful tool for storing dynamic data and for working directly with most application development frameworks. It is extensible as well: frameworks like Meteor combine node.js with MongoDB as the default database, and there are even tools out there to convert SQL queries to work with it.

But like it or not, MongoDB is not an RDBMS. In today’s Database Diversity, we will download and install Mongo, create a collection, insert some data, create an index, and query our collection.

Installing MongoDB

MongoDB can be installed on most Linux distributions using their respective package management tools. On RHEL/CentOS/OEL, you can add the 10gen repository to yum and install it like any other RPM. For Debian/Ubuntu, you can add the apt repository and install via apt-get.

Personally, I prefer to download the binaries for my platform and install them in the location of my choosing. If you have root privileges, you can put the binaries into a central location like /usr/local/bin. If not, you can download and run mongod (the MongoDB daemon) from any location as any user for very fast deployment and flexibility. The overall download page can be found here, but on a Linux x86_64 platform you don’t even have to go that far. Here I am downloading the latest x86_64 MongoDB build and extracting it:

steve@UbuntuVM:~/mongo$ pwd
/home/steve/mongo
steve@UbuntuVM:~/mongo$ wget http://downloads.mongodb.org/linux/mongodb-linux-x86_64-latest.tgz
--2013-02-25 08:58:56--  http://downloads.mongodb.org/linux/mongodb-linux-x86_64-latest.tgz
Resolving downloads.mongodb.org (downloads.mongodb.org)... 72.21.215.171
Connecting to downloads.mongodb.org (downloads.mongodb.org)|72.21.215.171|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 94187936 (90M) [application/x-tar]
Saving to: `mongodb-linux-x86_64-latest.tgz'

100%[======================================================================================================================================>] 94,187,936  7.12M/s   in 13s     

2013-02-25 08:59:10 (6.76 MB/s) - `mongodb-linux-x86_64-latest.tgz' saved [94187936/94187936]

steve@UbuntuVM:~/mongo$ tar -zxvf mongodb-linux-x86_64-latest.tgz 
mongodb-linux-x86_64-2013-02-24/README
mongodb-linux-x86_64-2013-02-24/THIRD-PARTY-NOTICES
mongodb-linux-x86_64-2013-02-24/GNU-AGPL-3.0
mongodb-linux-x86_64-2013-02-24/bin/mongodump
mongodb-linux-x86_64-2013-02-24/bin/mongorestore
mongodb-linux-x86_64-2013-02-24/bin/mongoexport
mongodb-linux-x86_64-2013-02-24/bin/mongoimport
mongodb-linux-x86_64-2013-02-24/bin/mongostat
mongodb-linux-x86_64-2013-02-24/bin/mongotop
mongodb-linux-x86_64-2013-02-24/bin/mongooplog
mongodb-linux-x86_64-2013-02-24/bin/mongofiles
mongodb-linux-x86_64-2013-02-24/bin/bsondump
mongodb-linux-x86_64-2013-02-24/bin/mongoperf
mongodb-linux-x86_64-2013-02-24/bin/mongosniff
mongodb-linux-x86_64-2013-02-24/bin/mongod
mongodb-linux-x86_64-2013-02-24/bin/mongos
mongodb-linux-x86_64-2013-02-24/bin/mongo
steve@UbuntuVM:~/mongo$ mv mongodb-linux-x86_64-2013-02-24/ mongo/

Normally, another step in a default installation is to create the /data/db directory, which is the default location for MongoDB data files. However, as we are running this as a non-root user for this tutorial, we will use a custom location instead. To start the MongoDB daemon, use the “mongod” command:

steve@UbuntuVM:~/mongo$ pwd
/home/steve/mongo
steve@UbuntuVM:~/mongo$ mkdir -p data logs
steve@UbuntuVM:~/mongo$ mongo/bin/mongod --fork --dbpath /home/steve/mongo/data --logpath /home/steve/mongo/logs/mongoDB.log
about to fork child process, waiting until server is ready for connections.
forked process: 16471
all output going to: /home/steve/mongo/logs/mongoDB.log
child process started successfully, parent exiting

The --dbpath option sets a custom data directory (in our case, /home/steve/mongo/data). The --fork option tells mongod to run as a background daemon. Without it, MongoDB runs in the foreground and you must use a separate window for other operations. If you do use --fork, you must also supply either --syslog (send MongoDB log output to syslog) or --logpath with your own log file path and name. The --port option is also available. If it is not set, mongod listens on the default port, 27017.

Now that MongoDB is installed and running, we will attempt a connection via the CLI:

steve@UbuntuVM:~/mongo$ mongo/bin/mongo
MongoDB shell version: 2.4.0-rc1-pre-
connecting to: test
Welcome to the MongoDB shell.
For interactive help, type "help".
For more comprehensive documentation, see
	http://docs.mongodb.org/
Questions? Try the support group
	http://groups.google.com/group/mongodb-user
>

If we want to see which database we are currently connected to, we can issue the “db” command. You can also switch databases with the “use” command:

> db
test
> use mydb
switched to db mydb

Note that the “mydb” database was never created. Even though I’ve switched to it, it doesn’t actually exist yet. You can switch to any database, but until you create a collection inside it, it exists only on your screen. To see which databases actually exist, use the “show dbs” command.

Creating a Collection and Inserting Data

I put these two topics in the same section because in MongoDB they are effectively one and the same. MongoDB has no predefined tables, only collections: loose groupings of documents that are meant to be logically related in some way. For the sake of learning, you can think of a collection as a table, a document as a row, and a key:value pair as a column. But given the dynamic nature of MongoDB, it can almost be dangerous to lean on that analogy. For instance, a MongoDB collection can hold documents that share no keys in common except the unique identifier key, _id. You are not constrained by a schema and can freely put nearly any data of any type into any collection. As you can imagine, this can be both good and bad depending on your development process, framework, and standards.

Here is an example MongoDB document:

{
  name: 'Steve Karam',
  database: 'MongoDB',
  nickname: 'Mongo Alchemist',
  really: false
}

But then here is an example of another document that can be part of the same collection:

{
  name: { first: 'Steve', last: 'Karam' },
  contact: [
    { type: 'blog', val: 'https://www.oraclealchemist.com' },
    { type: 'twitter', val: '@OracleAlchemist' } ]
}

This loose collection capability is both extremely intuitive and at the same time confusing for most relational oriented developers and DBAs. However, it grants us powerful capabilities for data storage/retrieval and complex aggregations.
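To make the “loose collection” idea concrete outside the shell, here is a minimal plain JavaScript sketch (runnable in Node.js, not the mongo shell). It models a collection as an array of objects and stores the two example documents above in it. The insert() and sharedKeys() helpers and the counter-based _id are hypothetical stand-ins. The real server generates an ObjectId and stores documents as BSON.

```javascript
// A toy "collection": an array of plain objects. It only mimics the idea that
// documents in one collection need not share a schema; it is not how MongoDB stores data.
const people = [];
let nextId = 1; // hypothetical stand-in for MongoDB's generated ObjectId

function insert(collection, doc) {
  // Like the shell, add an _id only when the document does not supply one.
  const stored = "_id" in doc ? { ...doc } : { _id: nextId++, ...doc };
  collection.push(stored);
  return stored;
}

// Keys present in every document of the collection.
function sharedKeys(collection) {
  if (collection.length === 0) return [];
  const keySets = collection.map((d) => Object.keys(d));
  return keySets[0].filter((k) => keySets.every((keys) => keys.includes(k)));
}

// The two example documents from above, stored in the same collection.
insert(people, { name: "Steve Karam", database: "MongoDB", nickname: "Mongo Alchemist", really: false });
insert(people, {
  name: { first: "Steve", last: "Karam" },
  contact: [
    { type: "blog", val: "https://www.oraclealchemist.com" },
    { type: "twitter", val: "@OracleAlchemist" },
  ],
});

console.log(sharedKeys(people));               // [ '_id', 'name' ]
console.log(people.map((d) => typeof d.name)); // [ 'string', 'object' ]
```

The two documents share only _id and name. Even name holds a plain string in one document and a sub-document in the other, which is the flexibility (and the risk) described above.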

So let’s create a simple collection with three documents (think rows) and retrieve all of them (like a “select *”):

> db.employees.insert( { name: 'Bob', title: 'DBA', salary: 80000 } );
> db.employees.insert( { name: 'Jane', title: 'Developer', salary: 70000 } );
> db.employees.insert( { name: 'Kim', title: 'Manager', salary: 130000 } );
> db.employees.find();
{ "_id" : ObjectId("512b7670dfea6825c38b0b55"), "name" : "Bob", "title" : "DBA", "salary" : 80000 }
{ "_id" : ObjectId("512b7674dfea6825c38b0b56"), "name" : "Jane", "title" : "Developer", "salary" : 70000 }
{ "_id" : ObjectId("512b7677dfea6825c38b0b57"), "name" : "Kim", "title" : "Manager", "salary" : 130000 }

For these three simple documents, I let MongoDB generate the unique _id field and kept the same elements (name, title, salary) in each. However, adding a new element is as simple as updating a document:

> db.employees.update( { name: 'Kim' }, { $set: { office: '37L' } } );
> db.employees.find( { name: 'Kim' } );
{ "_id" : ObjectId("512b7677dfea6825c38b0b57"), "name" : "Kim", "office" : "37L", "salary" : 130000, "title" : "Manager" }

By searching for name = ‘Kim’ and using the $set operator to add a new element called ‘office’, we updated the document, as the query shows. While this mimics updating a row in a relational database, MongoDB is far more powerful when it comes to in-document data manipulation. Beyond basic operations like insert, update, delete, and upsert, update operators such as $inc, $push, and $unset modify individual elements in place.
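The update above can be modeled in a few lines of plain JavaScript (Node.js). This is a simplified sketch under two assumptions. First, matching is equality-only. Second, only the first matching document is modified, which mirrors the 2.x shell’s update() default when the multi option is not set. The matches() and updateSet() helpers are hypothetical names, not the shell’s API.

```javascript
// Toy model of db.employees.update({ name: 'Kim' }, { $set: { office: '37L' } }).
const employees = [
  { _id: 1, name: "Bob", title: "DBA", salary: 80000 },
  { _id: 2, name: "Jane", title: "Developer", salary: 70000 },
  { _id: 3, name: "Kim", title: "Manager", salary: 130000 },
];

// True when every key/value pair in the query equals the document's value.
function matches(doc, query) {
  return Object.entries(query).every(([key, value]) => doc[key] === value);
}

// Apply $set to the first matching document; return how many documents matched (0 or 1).
function updateSet(collection, query, update) {
  const doc = collection.find((d) => matches(d, query));
  if (!doc) return 0;
  Object.assign(doc, update.$set); // adds missing fields, overwrites existing ones
  return 1;
}

const n = updateSet(employees, { name: "Kim" }, { $set: { office: "37L" } });
console.log(n, employees[2].office); // 1 37L
```

As with $set in MongoDB, the new element appears only in the matched document. Bob and Jane keep their original three elements.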

Remember how I said the ‘mydb’ database only existed on screen until we actually did something with it? By performing that first insert (Bob), we created the mydb database, the employees collection, and its first document all in one step.
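The “create on first write” behavior can be sketched in plain JavaScript (Node.js). This is an illustration only. The catalog Map and the use(), showDbs(), and insertInto() helpers are hypothetical stand-ins for the shell commands of similar names. A real server’s “show dbs” may also list system databases such as local.

```javascript
// Toy catalog: database name -> Map(collection name -> array of documents).
// Switching databases only records a name; nothing is created until the first insert.
const catalog = new Map();
let currentDb = "test";

function use(dbName) {
  currentDb = dbName; // like "use mydb": changes context, creates nothing
}

function showDbs() {
  return [...catalog.keys()];
}

function insertInto(collName, doc) {
  if (!catalog.has(currentDb)) catalog.set(currentDb, new Map()); // database appears
  const collections = catalog.get(currentDb);
  if (!collections.has(collName)) collections.set(collName, []); // collection appears
  collections.get(collName).push(doc);
}

use("mydb");
console.log(showDbs()); // []
insertInto("employees", { name: "Bob", title: "DBA", salary: 80000 });
console.log(showDbs()); // [ 'mydb' ]
console.log(catalog.get("mydb").get("employees").length); // 1
```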

Indexing

Indexes are just as important in MongoDB as they are in any relational database, and the rules for indexing well are nearly identical. Elements used for searching or sorting are good candidates for an index, and indexes can be built in ascending or descending order. Of course, MongoDB doesn’t stop there. In addition to standard B-tree and compound B-tree indexes, it also offers 2d geospatial indexes (for geohashed location queries), indexes on sub-documents, TTL indexes for automatically expiring documents, and more.

Creating an index is as simple as calling the ensureIndex() method:

> db.employees.ensureIndex( { 'salary': 1 } );

In this example, an index was created on the “salary” element of the employees collection in ascending order. A value of -1 would have been descending order.

Note that because of MongoDB’s loose structure, you can create an index on anything, even elements that don’t exist:

> db.employees.ensureIndex( { 'notexists': 1 } );
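To picture what these two indexes hold, here is a toy single-field index in plain JavaScript (Node.js): a sorted array of [key, _id] entries. This is a simplified sketch. MongoDB actually uses B-trees, and buildIndex() and compareKeys() are hypothetical helpers. The sketch encodes two MongoDB behaviors. First, a direction of 1 or -1 sorts ascending or descending. Second, in a non-sparse index, a document that lacks the field is indexed under null, which is why an index on notexists is legal but holds nothing useful.

```javascript
// Toy single-field index: sorted [key, _id] entries (MongoDB uses B-trees).
const employees = [
  { _id: 1, name: "Bob", title: "DBA", salary: 80000 },
  { _id: 2, name: "Jane", title: "Developer", salary: 70000 },
  { _id: 3, name: "Kim", title: "Manager", salary: 130000 },
];

// null sorts before any number, matching MongoDB's ordering of these two types.
function compareKeys(a, b) {
  if (a === b) return 0;
  if (a === null) return -1;
  if (b === null) return 1;
  return a < b ? -1 : 1;
}

// direction: 1 = ascending, -1 = descending, as in ensureIndex({ salary: 1 }).
function buildIndex(collection, field, direction = 1) {
  // A document without the field is indexed under null (non-sparse index).
  const entries = collection.map((doc) => [field in doc ? doc[field] : null, doc._id]);
  return entries.sort((x, y) => direction * compareKeys(x[0], y[0]));
}

console.log(buildIndex(employees, "salary", 1).map((e) => e[1]));  // [ 2, 1, 3 ]
console.log(buildIndex(employees, "salary", -1).map((e) => e[1])); // [ 3, 1, 2 ]
console.log(buildIndex(employees, "notexists").map((e) => e[0]));  // [ null, null, null ]
```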

With the getIndexes() command we can see what indexes exist on a collection:

> db.employees.getIndexes();
[
	{
		"v" : 1,
		"key" : {
			"_id" : 1
		},
		"ns" : "mydb.employees",
		"name" : "_id_"
	},
	{
		"v" : 1,
		"key" : {
			"salary" : 1
		},
		"ns" : "mydb.employees",
		"name" : "salary_1"
	},
	{
		"v" : 1,
		"key" : {
			"notexists" : 1
		},
		"ns" : "mydb.employees",
		"name" : "notexists_1"
	}
]

Note that there are three: the default unique index on _id (required), an index on salary, and an index on notexists…even though it doesn’t exist at the moment as an element in any document. Since I don’t really need that index, I’ll drop it:

> db.employees.dropIndex( 'notexists_1' );
{ "nIndexesWas" : 3, "ok" : 1 }

Querying a Collection

This is a highly complex topic, and we won’t be able to cover every possible query MongoDB can run, mostly because it can do nearly anything. The group command (GROUP BY on steroids) handles relatively simple aggregations with fewer than 20,000 unique groupings. MapReduce (GROUP BY times infinity + 1) can aggregate data in pretty much any way that can be coded. Between the two, there is very little you cannot do when querying MongoDB. For relatively simple queries, though, we use the find command:

> db.employees.find();
{ "_id" : ObjectId("512b7670dfea6825c38b0b55"), "name" : "Bob", "title" : "DBA", "salary" : 80000 }
{ "_id" : ObjectId("512b7674dfea6825c38b0b56"), "name" : "Jane", "title" : "Developer", "salary" : 70000 }
{ "_id" : ObjectId("512b7677dfea6825c38b0b57"), "name" : "Kim", "office" : "37L", "salary" : 130000, "title" : "Manager" }

A find command with no arguments is like a “select *” on a table: it returns all documents with all of their elements. You can also filter on specific elements easily:

> db.employees.find( { office: '37L' } );
{ "_id" : ObjectId("512b7677dfea6825c38b0b57"), "name" : "Kim", "office" : "37L", "salary" : 130000, "title" : "Manager" }
> db.employees.find( { name: 'Jane' } );
{ "_id" : ObjectId("512b7674dfea6825c38b0b56"), "name" : "Jane", "title" : "Developer", "salary" : 70000 }
> db.employees.find( { title: /d.*/i } );
{ "_id" : ObjectId("512b7670dfea6825c38b0b55"), "name" : "Bob", "title" : "DBA", "salary" : 80000 }
{ "_id" : ObjectId("512b7674dfea6825c38b0b56"), "name" : "Jane", "title" : "Developer", "salary" : 70000 }

Oh, did I mention you can use regular expressions? Support for Perl-compatible regular expressions (PCRE) makes MongoDB queries extremely powerful. You can, of course, also use comparison operators:

> db.employees.find( { salary: { $gt: 75000 } } );
{ "_id" : ObjectId("512b7670dfea6825c38b0b55"), "name" : "Bob", "title" : "DBA", "salary" : 80000 }
{ "_id" : ObjectId("512b7677dfea6825c38b0b57"), "name" : "Kim", "office" : "37L", "salary" : 130000, "title" : "Manager" }
> db.employees.find( { salary: { $gt: 75000, $lt: 85000 } } );
{ "_id" : ObjectId("512b7670dfea6825c38b0b55"), "name" : "Bob", "title" : "DBA", "salary" : 80000 }
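The three filter styles shown above (equality, regular expression, and comparison operators) can be sketched as a small matcher in plain JavaScript (Node.js). This is a simplified illustration, not MongoDB’s query engine. The find(), fieldMatches(), and ops names are hypothetical, and JavaScript’s RegExp stands in for the server’s PCRE engine, which behaves the same for simple patterns like these. One detail worth noting is that /d.*/i is unanchored: it matches any title containing a “d” anywhere, while /^d/i would match only titles that start with one.

```javascript
// Toy matcher: equality, RegExp, and $gt/$gte/$lt/$lte (implicitly ANDed).
const employees = [
  { _id: 1, name: "Bob", title: "DBA", salary: 80000 },
  { _id: 2, name: "Jane", title: "Developer", salary: 70000 },
  { _id: 3, name: "Kim", title: "Manager", salary: 130000, office: "37L" },
];

const ops = {
  $gt: (v, x) => v > x,
  $gte: (v, x) => v >= x,
  $lt: (v, x) => v < x,
  $lte: (v, x) => v <= x,
};

function fieldMatches(value, cond) {
  if (cond instanceof RegExp) return typeof value === "string" && cond.test(value);
  if (cond !== null && typeof cond === "object") {
    // Every operator in { $gt: ..., $lt: ... } must hold for the value.
    return value !== undefined && Object.entries(cond).every(([op, x]) => {
      if (!(op in ops)) throw new Error(`unsupported operator: ${op}`);
      return ops[op](value, x);
    });
  }
  return value === cond; // plain equality
}

function find(collection, query = {}) {
  return collection.filter((doc) =>
    Object.entries(query).every(([field, cond]) => fieldMatches(doc[field], cond)));
}

const names = (docs) => docs.map((d) => d.name);
console.log(names(find(employees)));                                          // [ 'Bob', 'Jane', 'Kim' ]
console.log(names(find(employees, { office: "37L" })));                       // [ 'Kim' ]
console.log(names(find(employees, { title: /d.*/i })));                       // [ 'Bob', 'Jane' ]
console.log(names(find(employees, { salary: { $gt: 75000 } })));              // [ 'Bob', 'Kim' ]
console.log(names(find(employees, { salary: { $gt: 75000, $lt: 85000 } }))); // [ 'Bob' ]
```

The results line up with the shell output above: an empty query returns everything, and several operators on one field must all be satisfied.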

Again, the list of querying capabilities in MongoDB goes on and on. Rather than dive into each of them, I want to make sure the index we created earlier is actually being used. Let’s run the salary query again, this time with an explain plan, by calling the explain() method on the result of find():

> db.employees.find( { salary: { $gt: 75000 } } ).explain();
{
	"cursor" : "BtreeCursor salary_1",
	"nscanned" : 2,
	"nscannedObjects" : 2,
	"n" : 2,
	"millis" : 0,
	"nYields" : 0,
	"nChunkSkips" : 0,
	"isMultiKey" : false,
	"indexOnly" : false,
	"indexBounds" : {
		"salary" : [
			[
				75000,
				1.7976931348623157e+308
			]
		]
	}
}

Notice that the query used the salary_1 B-tree index, and “indexBounds” shows the range of the index that was scanned. If I had run a query on a non-indexed element, you would have seen:

> db.exmployees.find( { name: 'Kim' } ).explain();
{
	"cursor" : "BasicCursor",
	"nscanned" : 0,
	"nscannedObjects" : 0,
	"n" : 0,
	"millis" : 0,
	"nYields" : 0,
	"nChunkSkips" : 0,
	"isMultiKey" : false,
	"indexOnly" : false,
	"indexBounds" : {
		
	}
}

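Notice that no indexBounds are defined and the cursor type is “BasicCursor”, which is more or less a full table scan. (The query above misspells the collection as “exmployees”, which is why nscanned and n are 0. Run against employees, a BasicCursor examines every document in the collection.) If you think the query should have used an index, you can supply a hint by passing in the index name: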

> db.employees.find( { salary: { $gt: 75000, $lt: 85000 } } ).hint('salary_1');
{ "_id" : ObjectId("512b7670dfea6825c38b0b55"), "name" : "Bob", "title" : "DBA", "salary" : 80000 }
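The difference between “BasicCursor” and “BtreeCursor salary_1” comes down to how many entries must be examined. The plain JavaScript sketch below (Node.js) counts them for each approach. fullScan() visits every document. rangeScan() binary-searches a sorted [salary, _id] list, standing in for the salary_1 index, and reads only the entries inside the bounds. It is a simplified model. The function names are hypothetical, and its counts only roughly correspond to what explain() reports as nscanned.

```javascript
// Toy comparison: full collection scan vs. index range scan, counting entries examined.
const employees = [
  { _id: 1, name: "Bob", salary: 80000 },
  { _id: 2, name: "Jane", salary: 70000 },
  { _id: 3, name: "Kim", salary: 130000 },
];

// Sorted [salary, _id] entries standing in for the salary_1 index.
const salaryIndex = employees.map((d) => [d.salary, d._id]).sort((a, b) => a[0] - b[0]);

function fullScan(docs, pred) {
  let scanned = 0;
  const ids = [];
  for (const d of docs) {
    scanned++; // every document is examined
    if (pred(d)) ids.push(d._id);
  }
  return { scanned, ids };
}

// Range scan for lo < key < hi: binary-search the first key above lo, then
// walk forward until a key reaches hi. Only entries inside the bounds are counted.
function rangeScan(index, lo, hi) {
  let left = 0;
  let right = index.length;
  while (left < right) {
    const mid = (left + right) >> 1;
    if (index[mid][0] <= lo) left = mid + 1;
    else right = mid;
  }
  let scanned = 0;
  const ids = [];
  for (let i = left; i < index.length && index[i][0] < hi; i++) {
    scanned++;
    ids.push(index[i][1]);
  }
  return { scanned, ids };
}

// { salary: { $gt: 75000 } } has an open upper bound (Infinity here; the 2.x
// explain output above prints it as 1.7976931348623157e+308).
console.log(fullScan(employees, (d) => d.salary > 75000)); // { scanned: 3, ids: [ 1, 3 ] }
console.log(rangeScan(salaryIndex, 75000, Infinity));      // { scanned: 2, ids: [ 1, 3 ] }
```

With three documents the savings are trivial. However, the full scan grows with the whole collection, while the range scan grows only with the number of matching entries (plus a logarithmic search).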

Conclusion

As I’ve mentioned many times, there is a lot more to MongoDB than these simple operations. However, even knowing the basics can be invaluable in the event that you or your business decides it is the right tool for a job at some point. The things you saw here just scratch the surface of what MongoDB actually does once you incorporate sharding, GridFS, MapReduce, etc. As you learn and grow with MongoDB, keep the following points in mind:

Don’t be afraid to try things out with MongoDB. It is very easy to get up and running, create multiple collections, databases, etc.
MongoDB can be very unforgiving. Dropping all data in a collection is as simple as db.employees.remove(), and once it is gone it is gone.
The documentation for MongoDB is incredible. You can find it all on their site.
If you’re a book learner, MongoDB: The Definitive Guide is an invaluable resource and is very easy to follow.
MongoDB is not SQL, but there are some easy comparisons that may help you come to grips with it.
Want to try MongoDB without ever even installing it? Here’s a mini Mongo implementation at MongoDB’s website.

See you soon for the next Database Diversity article!

3 comments

Manish says:

February 25, 2013 at 12:35 pm

Hello Steve,

Great article and fully agree that in this era of evolving technologies, it’s important to keep learning new stuff.

What I have been struggling with is finding or understanding use cases for these NoSQL databases. Sure, Facebook/Google will find uses for it…how/where does it help say a Manufacturing company’s Sales dept? If you find info. on use-cases for NoSQL/big-data for “traditional” IT shops, that would be a great blog post.

Thanks,

Manish
The Oracle Alchemist says:

February 25, 2013 at 1:06 pm

Great question Manish. In your Manufacturing company’s sales department you might put your general inventory and sales tables in a traditional relational database; the relationships are clearly defined and conform to standard normalization practices. One customer has many purchases. One manufacturer has many products. Easy.

But when someone is building their shopping cart, why use a high concurrency environment with cumbersome SQL queries when you can look up products directly from the application framework with a simple Riak key:value pair? Invoices can be stored in MongoDB as whole documents for fast lookup and aggregation based on any elements later. Complex graphing and visualization can be done with a database like Neo4J or other tools.

For “Big Data” let’s take it a step further. What if we want to record gas mileage, truck types, maintenance, shipment methods, lost shipments, time delayed in customs, inventory lost due to weather or corrosion, the temperature and weather conditions to cross-match with such losses, road conditions, traffic, etc? Doing so could be highly beneficial to improve your supply chain and calculate routes for improved delivery and less lost inventory. To store that kind of data from many different sources and sensors, a relational database would choke in no time. But a distributed filesystem content store like Hadoop with complex MapReduce jobs to process it over time could handle it. I talked a little about the whole “collect EVERYTHING” mentality here: http://www.oraclealchemist.com/news/just-how-big-is-your-data/

In the end, you have to use the right tools for your company. If all you want is a system to process orders then an OLTP relational database will work for you. If you want to store large quantities of differing documents, MongoDB might be better.

I agree that a blog post might be a good idea here! 🙂
