In scripting languages such as PHP, SphinxQL should generally be faster than SphinxAPI simply because it uses a compiled MySQL client implementation instead of the interpreted SphinxAPI client. The question is, how much faster exactly? So I've just benchmarked that.
The version used was the current trunk (r2286). 1000 queries were run against a 1,000,000-document test collection. Both the queries and the data were taken from a production blog search site. Two runs were made, and the first run was discarded so that everything would be cached evenly. Both Sphinx and the PHP benchmark script were run on the same machine.
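The two-run procedure described above can be sketched roughly as follows. This is only an illustration in Python, not the PHP script actually used for the benchmark; `benchmark` and `run_query` are hypothetical names, and `run_query` stands for whichever client path (SphinxAPI or SphinxQL) is being measured.

```python
import time


def benchmark(run_query, queries, runs=2):
    """Run every query `runs` times and return the time of the last pass only.

    Earlier passes act as warm-up: they are executed, but their timing is
    overwritten, so caches are populated before the measured pass.
    """
    elapsed = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        for query in queries:
            run_query(query)
        # Each pass overwrites the previous timing; only the final one is kept.
        elapsed = time.perf_counter() - start
    return elapsed
```

Timing the whole pass rather than individual queries matches the way the results are reported here, as total time per run.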
Most of the work that the API does is unpacking matches. So I used a number of different LIMIT values, ranging from 10 to 1000 matches.
To make things even, the SphinxAPI path used persistent connections and the extended2 matching mode, and the SphinxQL path issued not just a SELECT … MATCH statement but also a SHOW META statement after each query (and, of course, pulled all rows using mysql_fetch_row()).
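For illustration, the SphinxQL side of that setup might look roughly like this. It is a Python sketch rather than the PHP code used in the benchmark, and it assumes a DB-API cursor from a MySQL driver that uses `%s` placeholders, connected to searchd's SphinxQL listener. The function name `sphinxql_search` and the index name passed to it are hypothetical.

```python
def sphinxql_search(cursor, index, query, limit):
    """Issue SELECT ... MATCH followed by SHOW META, pulling all rows of both.

    `cursor` is assumed to be a DB-API cursor with the "format" paramstyle.
    `index` is interpolated directly, so it must be a trusted name.
    """
    # The full-text query goes in as a parameter so the driver escapes it.
    cursor.execute(
        f"SELECT * FROM {index} WHERE MATCH(%s) LIMIT {int(limit)}",
        (query,),
    )
    rows = cursor.fetchall()  # fetch every match, as the benchmark did

    # SHOW META returns (Variable_name, Value) pairs for the last query.
    cursor.execute("SHOW META")
    meta = dict(cursor.fetchall())
    return rows, meta
```

Fetching the SHOW META rows as well keeps the comparison fair, since a SphinxAPI result already carries the same per-query statistics.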
On my desktop Windows machine (C2D E8500 @ 3.16 GHz), the results are as follows (total time is graphed, so lower is better).
As expected, SphinxQL wins, by 1% to 7% when fetching 100 or fewer matches and by as much as 1.4x when fetching up to 1000 matches.
Note that this was a production workload in which many queries returned fewer matches than the limit, and frequently no matches at all! A synthetic test repeating a query that returned 1000 matches with a LIMIT of 1000 resulted in SphinxQL being 3.6x faster.
An x64 Linux box (C2D E6420 @ 2.13 GHz) performed differently, with SphinxQL winning noticeably even at low LIMIT values:
That’s a 1.25x difference at a limit of 10 matches and a 1.61x difference at a limit of 1000. A synthetic test that always pulled exactly 1000 matches resulted in a 4.7x difference.
To summarize, with PHP on Linux (the most common scenario), SphinxQL always outperforms SphinxAPI, from 1.25x on average with small result sets to as much as 4.7x when fetching 1000 rows. APIs in other scripting languages (Perl, Python, etc.) can be expected to perform more or less the same. Compiled APIs (C, Java, PECL), however, call for a separate benchmark, because in that case the CPU overhead caused by the API should be (somewhat) smaller, while network traffic overhead (which should in turn be smaller with SphinxAPI) can come into play.