April 14, 2010

Looking beyond one's own nose - looking at RabbitMQ and MongoDB

Unsorted remarks on RabbitMQ and MongoDB plus some benchmarks with mass data

Over the last couple of weeks I took a closer look at RabbitMQ and MongoDB for an upcoming project.

MongoDB is one representative of the so-called NoSQL databases - RabbitMQ is a fast open-source message queue implementation.

Why MongoDB?

  • it is well-documented
  • it is easy to install and provides a good mix of NoSQL-ness with some SQL-ishness

Why RabbitMQ?

  • No idea - I came across RabbitMQ while doing some research and it was love at first sight.

Today I ran some benchmarks to get a rough idea of the speed.

The task: import 18,000 SGML documents into MongoDB (once directly and once with a message queue in between). The 18,000 SGML documents amount to about 180 MB of data. Exporting the SGML documents from our Zope-based CMS took about 30 minutes (which I consider really slow).

Native MongoDB import

The native importer script using the pymongo Python bindings took about 4-5 seconds(!) - that is roughly 3,600-4,500 documents per second or 36-45 MB of data per second. This is very fast and impressive.
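For illustration, a direct import of this kind could look roughly like the sketch below. It is not the script used for the benchmark: it assumes a current pymongo release (MongoClient and insert_many, which differ from the API available in 2010), and the database name "cms", the collection name "documents", the "*.sgml" file pattern and the UTF-8 encoding are all made-up assumptions.

```python
import time
from pathlib import Path


def read_documents(directory, pattern="*.sgml", encoding="utf-8"):
    """Yield one dict per SGML file found in `directory` (pattern is an assumption)."""
    for path in sorted(Path(directory).glob(pattern)):
        yield {"name": path.name, "sgml": path.read_text(encoding=encoding)}


def import_documents(collection, docs, batch_size=1000):
    """Insert `docs` in batches and return (number of documents, elapsed seconds)."""
    start = time.perf_counter()
    count = 0
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) >= batch_size:
            collection.insert_many(batch)
            count += len(batch)
            batch = []
    if batch:  # flush the last, partially filled batch
        collection.insert_many(batch)
        count += len(batch)
    return count, time.perf_counter() - start


def connect(uri="mongodb://localhost:27017", db="cms", coll="documents"):
    """Return a pymongo collection; the names used here are hypothetical."""
    # Imported here so the helpers above work without pymongo installed.
    from pymongo import MongoClient
    return MongoClient(uri)[db][coll]
```

Usage would be something like import_documents(connect(), read_documents("/path/to/export")); dividing the returned count by the returned seconds gives the documents-per-second figure.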

RabbitMQ + MongoDB

In this scenario I set up a message queue in RabbitMQ and pushed all SGML documents into it using a small Python producer script (using the carrot Python bindings for RabbitMQ). At the other end of the queue a consumer picked up the data and inserted it into MongoDB. This solution turned out to be (much) slower. Putting the 18,000 documents into the queue took about 20 seconds. The consumer needed the same time plus the 4-5 seconds of native insertion time in MongoDB (as with the native import approach)...so overall 4-5 times slower, with a throughput of roughly 1000 documents per second or 10 MB per second.
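The producer/consumer setup described above can be sketched as follows. Note the swap: this sketch uses the pika client instead of carrot (which is no longer maintained), so it is not the code behind the numbers above. The queue name, the JSON message format and the helper names are assumptions made for the example.

```python
import json


def publish_documents(channel, queue_name, docs):
    """Producer side: put each document on the queue as one JSON message."""
    # The queue is declared durable; making the messages themselves persistent
    # would additionally need delivery_mode=2 via pika.BasicProperties.
    channel.queue_declare(queue=queue_name, durable=True)
    sent = 0
    for doc in docs:
        channel.basic_publish(
            exchange="",             # default exchange routes by queue name
            routing_key=queue_name,
            body=json.dumps(doc).encode("utf-8"),
        )
        sent += 1
    return sent


def make_consumer_callback(collection):
    """Consumer side: build a callback that stores each message in MongoDB."""
    def on_message(ch, method, properties, body):
        collection.insert_one(json.loads(body))
        # Acknowledge only after the insert has gone through.
        ch.basic_ack(delivery_tag=method.delivery_tag)
    return on_message


def run_consumer(channel, queue_name, collection):
    """Block and process messages until interrupted."""
    channel.queue_declare(queue=queue_name, durable=True)
    channel.basic_consume(
        queue=queue_name,
        on_message_callback=make_consumer_callback(collection),
    )
    channel.start_consuming()


def open_channel(host="localhost"):
    """Open a blocking pika channel to a RabbitMQ server on `host`."""
    # Imported here so the helpers above can be exercised without pika.
    import pika
    return pika.BlockingConnection(pika.ConnectionParameters(host)).channel()
```

Every document makes an extra trip through the broker and is inserted one message at a time, which is one plausible reason this path is slower than a batched direct import.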

Testing environment

Intel Core 2 Duo (2.66 GHz), MongoDB 1.4 + RabbitMQ 1.7.2, both running with out-of-the-box configuration

Conclusions

Hard to tell...MongoDB with the pymongo Python bindings is very fast (compare this to the export time of our CMS, although the two are not directly comparable: different hardware, different software). RabbitMQ seems to be a pretty cool approach for coupling different components of an application. One nice thing about the carrot Python bindings for RabbitMQ: you can stuff almost anything into the queue: standard Python types, object instances (using the dedicated pickle serializer) or large data. Queues in RabbitMQ can be durable, surviving a shutdown of the RabbitMQ server. RabbitMQ supports clustering and various options for routing messages to queues....a pretty cool piece of software...and the company behind RabbitMQ (it's open-source) was bought by a division of VMware yesterday...lots of interesting cool stuff coming our way right now.
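To make the remark about putting object instances into the queue a bit more concrete: a message body is just bytes, so an object has to be serialized on the producer side and restored on the consumer side. The sketch below shows that round trip with the standard library's pickle module only. It does not show carrot's own serializer handling, and the Document class and the encode/decode helpers are hypothetical names.

```python
import pickle


class Document:
    """Hypothetical message payload: one exported SGML document."""

    def __init__(self, name, sgml):
        self.name = name
        self.sgml = sgml


def encode(obj):
    """Producer side: turn a Python object into bytes for the message body."""
    return pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)


def decode(body):
    """Consumer side: restore the object from the message body."""
    # Unpickling can execute arbitrary code, so only do this for messages
    # that come from producers you trust.
    return pickle.loads(body)
```

A consumer would call decode(body) on the received bytes and get back an object with the same name and sgml attributes, provided the Document class is importable on the consumer side as well.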