Unsorted remarks on RabbitMQ and MongoDB plus some benchmarks with mass data
MongoDB is one representative of the so-called NoSQL databases; RabbitMQ is a fast open-source message-queue implementation.
Today I ran some benchmarks to get a rough idea of the speed of both.
The task: import 18,000 SGML documents into MongoDB, once directly and once with a message queue in between. The 18,000 SGML documents amount to about 180 MB of data. Exporting the SGML documents from our Zope-based CMS took about 30 minutes (which I consider really slow).
For the direct import, the native importer script using the pymongo Python bindings took about 4-5 seconds(!), which works out to roughly 3,600-4,500 documents per second or 36-45 MB of data per second. This is very fast and impressive.
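To give an idea of what such a direct import looks like, here is a minimal sketch. It is not the original importer script. It assumes a current pymongo (`MongoClient`, `insert_many`), whose API differs from the pymongo release that matched MongoDB 1.4. The directory `sgml_export`, the database and collection names `cms` and `documents`, the batch size and all helper names are made up for illustration.

```python
from pathlib import Path


def read_documents(directory):
    """Yield one dict per *.sgml file below `directory` (hypothetical layout)."""
    for path in sorted(Path(directory).rglob("*.sgml")):
        yield {
            "filename": path.name,
            # SGML is stored as plain text; undecodable bytes are replaced.
            "content": path.read_text(encoding="utf-8", errors="replace"),
        }


def import_documents(collection, documents, batch_size=1000):
    """Insert documents in batches and return the number inserted.

    `collection` only needs an insert_many() method, as a pymongo
    collection has.
    """
    batch, total = [], 0
    for doc in documents:
        batch.append(doc)
        if len(batch) >= batch_size:
            collection.insert_many(batch)
            total += len(batch)
            batch = []
    if batch:  # flush the final, partially filled batch
        collection.insert_many(batch)
        total += len(batch)
    return total


def main():
    # Imported here so the helpers above work without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")  # assumed local server
    collection = client["cms"]["documents"]            # hypothetical names
    print(import_documents(collection, read_documents("sgml_export")))
```

Calling `main()` runs the import against a local MongoDB server. Batching keeps the number of round trips to the server low, which is probably a large part of why a plain import like this can be so quick.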
In the second scenario I set up a message queue in RabbitMQ and pushed all SGML documents into it with a small Python producer script (using the carrot Python bindings for RabbitMQ). At the other end, a consumer listened on the queue, picked up the data, and inserted it into MongoDB. This solution turned out to be (much) slower. Pushing the 18,000 documents into the queue took about 20 seconds. The consumer needed the same time plus the 4-5 seconds of native insertion time in MongoDB (as with the direct import approach), so about 24-25 seconds overall: roughly 5-6 times slower, with a throughput of about 700-750 documents per second or 7-7.5 MB per second.
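The shape of this queue-based setup can be sketched as follows. To keep the example runnable without a broker, Python's in-process `queue.Queue` stands in for the RabbitMQ queue and a plain list stands in for the MongoDB collection. Both substitutions, and all function names, are assumptions for illustration; these are not the carrot-based scripts used in the benchmark.

```python
import queue
import threading

_SENTINEL = object()  # marks the end of the document stream


def producer(q, documents):
    """Push every document onto the queue, then signal completion."""
    for doc in documents:
        q.put(doc)
    q.put(_SENTINEL)


def consumer(q, store):
    """Pull documents off the queue and hand them to the store."""
    while True:
        doc = q.get()
        if doc is _SENTINEL:
            break
        store.append(doc)  # stand-in for the MongoDB insert


def run_pipeline(documents):
    """Run producer and consumer concurrently; return what was stored."""
    q = queue.Queue()
    store = []
    worker = threading.Thread(target=consumer, args=(q, store))
    worker.start()
    producer(q, documents)
    worker.join()
    return store
```

Every document now makes an extra hop before it reaches the database. With a real broker that hop also includes a network connection and a serialization step, which is presumably where the additional time in the measurement comes from.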
Test setup: Intel Core 2 Duo (2.66 GHz), MongoDB 1.4 and RabbitMQ 1.7.2, both running with their out-of-the-box configuration.
Hard to tell... MongoDB with the pymongo Python bindings is very fast (compare this to the export time of our CMS, although the two are not directly comparable: different hardware, different software). RabbitMQ seems to be a pretty cool approach for coupling different components of an application. One nice thing about the carrot Python bindings for RabbitMQ: you can stuff almost anything into the queue, such as standard Python types, object instances (using the dedicated pickle serializer), or large data. Queues in RabbitMQ can be durable, surviving a shutdown of the RabbitMQ server. RabbitMQ supports clustering and various routing options for messages. All in all a pretty cool piece of software, and the company behind RabbitMQ (it's open-source) was bought by a division of VMware yesterday. Lots of interesting, cool stuff coming our way right now.
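What a pickle serializer does with a message can be shown with the standard library alone. This is only a sketch of the idea: the payload values are made up, and the two helper functions mimic what a messaging library does with the message body. They are not carrot's API.

```python
import datetime
import pickle


def to_message_body(obj):
    """Serialize an arbitrary picklable object into bytes for the queue."""
    return pickle.dumps(obj)


def from_message_body(data):
    """Rebuild the object on the consumer side.

    Only unpickle data from producers you trust: unpickling can
    execute arbitrary code.
    """
    return pickle.loads(data)


# Example payload mixing standard types with an object instance.
original = {
    "filename": "doc0001.sgml",          # hypothetical file name
    "content": "<doc>...</doc>",
    "exported": datetime.date(2010, 1, 1),  # arbitrary example value
}
restored = from_message_body(to_message_body(original))
print(restored == original)
```

The consumer gets back an equal object, including the `datetime.date` instance, without the producer having to convert it to strings first. The price is that both ends must be Python and must trust each other.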