In this post I evaluate the performance of the CMS Data Aggregation System (DAS) with its MongoDB back-end. I give a brief introduction to the DAS workflow and present preliminary benchmark results.
Introduction
DAS aggregates metadata from several CMS data-services. The workflow is as follows.
A user requests data via the DAS Query Language (DAS-QL), which maps naturally onto the MongoDB query language. DAS first looks up the results in the DAS merge collection. If they are not found, it checks the DAS raw API cache collection, aggregates the data on demand and stores the results in the merge collection; for more details see [Ref]. DAS provides RESTful interfaces to its cache and web servers, both written in Python using the CherryPy web framework.
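To make the lookup order concrete, here is a minimal Python sketch of the workflow as described above. It is not the DAS code: plain dicts stand in for the MongoDB merge and raw-cache collections, the `services` callbacks are hypothetical stand-ins for the CMS data-services, and both the "call the services on a raw-cache miss" step and the merge rule (combine records that share a `name` key) are assumptions for illustration.

```python
from typing import Callable, Dict, List

Records = List[dict]


class DASCacheSketch:
    """Hypothetical, in-memory illustration of the DAS lookup order."""

    def __init__(self, services: Dict[str, Callable[[str], Records]]):
        self.merge: Dict[str, Records] = {}  # stand-in for the merge collection
        self.raw: Dict[str, Records] = {}    # stand-in for the raw API cache collection
        self.services = services             # data-service name -> fetch function (hypothetical)

    def lookup(self, query: str) -> Records:
        # 1. Already aggregated? Serve straight from the merge collection.
        if query in self.merge:
            return self.merge[query]
        # 2. Otherwise use the raw API cache; call the data-services only on a miss
        #    (assumed behaviour for this sketch).
        if query not in self.raw:
            raw: Records = []
            for fetch in self.services.values():
                raw.extend(fetch(query))
            self.raw[query] = raw
        # 3. Aggregate on demand and store the result in the merge collection.
        merged = self._aggregate(self.raw[query])
        self.merge[query] = merged
        return merged

    @staticmethod
    def _aggregate(records: Records) -> Records:
        # Hypothetical merge rule: records sharing the same "name" are combined.
        by_name: Dict[str, dict] = {}
        for rec in records:
            by_name.setdefault(rec["name"], {}).update(rec)
        return list(by_name.values())
```

Because the merged result is stored even when it is empty, a repeated query in this sketch never re-runs the aggregation step.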
Testbed and tools
For the benchmark tests I set up the DAS cache server on a 64-bit Linux node with 8 cores (2.33 GHz) and 16 GB of RAM. MongoDB and the DAS cache server reside on the same node. The test client was written in Python and uses the multiprocessing module to spawn requests to the DAS server. I used a custom Python tool instead of existing tools such as Apache Benchmark because I needed to generate random queries for N parallel clients talking to a single DAS server REST API.
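A stripped-down client in the same spirit might look like the sketch below. The endpoint URL, its `query`/`idx`/`limit` parameters and all helper names are hypothetical, not the actual DAS REST API. Note also that this client measures round-trip wall time, whereas the numbers reported in this post are the time spent inside the DAS server.

```python
import multiprocessing
import random
import string
import time
import urllib.parse
import urllib.request

# Hypothetical DAS cache server endpoint; host, port, path and parameters are assumptions.
DAS_URL = "http://localhost:8211/rest/request"


def send_query(query):
    """Send one DAS-QL query to the (hypothetical) REST endpoint and read the reply."""
    url = DAS_URL + "?" + urllib.parse.urlencode({"query": query, "idx": 0, "limit": 1})
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def client_worker(n_requests, seed, send=send_query):
    """One client: issue n_requests random-pattern queries, return per-request wall times (s)."""
    rng = random.Random(seed)  # per-client seed so parallel clients ask different queries
    timings = []
    for _ in range(n_requests):
        query = "dataset=/%s*" % rng.choice(string.ascii_letters)
        start = time.perf_counter()
        send(query)
        timings.append(time.perf_counter() - start)
    return timings


def run_clients(n_clients, n_requests):
    """Run n_clients worker processes in parallel and collect all their timings."""
    with multiprocessing.Pool(processes=n_clients) as pool:
        per_client = pool.starmap(client_worker, [(n_requests, seed) for seed in range(n_clients)])
    return [t for timings in per_client for t in timings]
```

Call `run_clients(100, 10)` from inside an `if __name__ == "__main__":` guard, so that worker processes start safely on platforms that spawn rather than fork.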
Test data
For all tests I used ~50K datasets from the CMS DBS system and ~450K block records from both the DBS and Phedex CMS systems. All of them were populated into the DAS cache up front, since I was only interested in read tests (DAS is able to populate the cache itself).
The tests consist of three different types of queries:
- test #1: all clients use the same fixed value, e.g. dataset=/a/b/c or block=/a/b/c#123;
- test #2: all clients use the same fixed pattern, e.g. dataset=/a* or block=/a*;
- test #3: all clients use random patterns, e.g. dataset=/a*, dataset=/Z* or block=/a*, block=/Z*.
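The three query types can be generated with small helpers like the ones below. This is a sketch: the function names and the `QUERY_MAKERS` table are hypothetical, and the values simply mirror the examples in the list.

```python
import random
import string


def fixed_value_query(kind="dataset"):
    """Test #1: every client sends the same exact value."""
    value = "/a/b/c#123" if kind == "block" else "/a/b/c"
    return "%s=%s" % (kind, value)


def fixed_pattern_query(kind="dataset"):
    """Test #2: every client sends the same pattern."""
    return "%s=/a*" % kind


def random_pattern_query(kind="dataset", rng=random):
    """Test #3: each request uses a random leading letter, /a* ... /Z*."""
    return "%s=/%s*" % (kind, rng.choice(string.ascii_letters))


# Map test number -> query generator (hypothetical convenience table).
QUERY_MAKERS = {1: fixed_value_query, 2: fixed_pattern_query, 3: random_pattern_query}
```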
Once a query reaches the DAS cache server, I retrieve only the first record of the result set and ship it back to the client. The response time is measured as the total time the DAS server spends on a particular query.
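Retrieving only the first record keeps the cost low because a MongoDB cursor fetches results lazily, in batches. A minimal sketch of this server-side step, assuming `lookup` returns any iterable of records; with pymongo it could be something like `lambda q: collection.find(spec).limit(1)`, where `collection` and `spec` are placeholders. The function name is hypothetical.

```python
import time


def first_record_with_timing(query, lookup):
    """Run lookup(query), keep only its first record and report the elapsed server time (s)."""
    start = time.perf_counter()
    cursor = lookup(query)                 # e.g. a MongoDB cursor; nothing is read yet
    record = next(iter(cursor), None)      # pull just the first record, or None if empty
    elapsed = time.perf_counter() - start
    return record, elapsed
```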
Benchmark results
First, I tested our CherryPy server on its own: it can sustain a load of 500 parallel clients at the level of 10^-5 sec. Then I populated MongoDB with 50k dataset and 500k block records from the DBS and Phedex CMS systems and performed read tests of MongoDB and DAS using 1-500 parallel clients. These tests revealed several issues in my code, such as a missing index, wrong usage of regular expressions and repetition of the DAS merging step for non-existing queries. Once those issues were fixed, I repeated the tests and found the results satisfactory. To spice things up, I then populated MongoDB with 10x and 100x the statistics and repeated the tests. The plot below compares DAS (red lines) versus MongoDB (blue lines) read tests for 50k (circles), 500k (down triangles), 5M (up triangles) and 50M (stars):
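The post does not say exactly what was wrong with the regular expressions, so the following only sketches one common pitfall. MongoDB's documentation notes that a case-sensitive regular expression uses an index most efficiently when it is a prefix expression anchored with `^`; an unanchored pattern cannot use the index to narrow the key range. The hypothetical helper below turns a DAS wildcard value into such an anchored regex, assuming glob-style `*` semantics and a field name such as `dataset.name`.

```python
import re


def das_value_to_spec(key, value):
    """Translate a DAS-QL condition such as dataset=/a* into a MongoDB filter (hypothetical).

    An index on `key` is still required, e.g. with pymongo:
    collection.create_index("dataset.name")
    """
    if "*" not in value:
        return {key: value}  # exact match: served directly by the index
    # Assumed glob semantics: '*' matches any substring, everything else is literal.
    body = ".*".join(re.escape(part) for part in value.split("*"))
    if value.endswith("*"):
        # Drop the trailing ".*": a bare anchored prefix is the cheapest form.
        return {key: {"$regex": "^" + body[:-2]}}
    return {key: {"$regex": "^" + body + "$"}}
```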
I found these results very satisfactory. As expected, MongoDB easily sustains such a load at the level of a few milliseconds. The DAS numbers also seem reasonable, since the DAS workflow is much more complicated: it includes DAS-QL parsing, query analysis, analytics, etc.
Most importantly, DAS performance appears to be driven by the MongoDB back-end, with a constant scale factor that can be tuned later.
Next I performed the three tests discussed above (criteria #1-3) with 50M blocks. Here is the plot:
The curve with circles represents test #1 (fixed key-value), while the up/down triangles represent the fixed-pattern and random-pattern values, tests #2 and #3, respectively. As can be seen, the pattern tests differ by an order of magnitude from the fixed key-value test, but are almost identical to each other.
Finally, I tested DAS/MongoDB with random queries and random data access, asking for a single record from anywhere in the collection (not only the first one, as above). For that purpose I generated a random index and used idx/limit in the MongoDB queries. Here are the results:
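A sketch of the random-access step, assuming a hypothetical `fetch_page(skip, limit)` callback; with pymongo it could be `lambda s, l: list(collection.find(spec).skip(s).limit(l))`, where `collection` and `spec` are placeholders. Keep in mind that MongoDB's documentation warns that `skip()` becomes slower as the offset grows, because the server still walks past the skipped documents.

```python
import random


def random_single_record(fetch_page, total, rng=random):
    """Pick a random offset in [0, total) and fetch exactly one record there (hypothetical helper)."""
    if total <= 0:
        return None
    idx = rng.randrange(total)   # random index into the result set
    page = fetch_page(idx, 1)    # equivalent of idx/limit = idx/1
    return page[0] if page else None
```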
The blue line shows MongoDB performance, while the red line shows DAS. This time DAS and MongoDB differ by only one order of magnitude, compared with the first plot, and the difference is driven by the DAS workflow.
For the time being I conclude this post here and will continue testing the DAS code.
References