In scripted languages such as PHP, SphinxQL should generally be faster than SphinxAPI, simply because it uses a compiled MySQL client implementation instead of an interpreted SphinxAPI client. The question is, specifically how much faster? So I’ve just benchmarked that.
The version used was current trunk (r2286). 1000 queries were run against a 1,000,000-document test collection. Both the queries and the data were taken from a production blog search site. Two runs were made, and the first run was discarded so that everything would be cached evenly. Both Sphinx and the PHP benchmark script were run on the same machine.
Most of the work that the API does is unpacking matches. So I used a number of different LIMIT values, ranging from 10 to 1000 matches.
To make things even, the SphinxAPI path used persistent connections and the extended2 matching mode, and the SphinxQL path issued not just a SELECT … MATCH statement but also a SHOW META statement after each query (and, of course, pulled all rows using mysql_fetch_row()).
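For readers who want to repeat this kind of measurement, here is a minimal sketch of the procedure described above: each search issues a SELECT … MATCH followed by SHOW META, each LIMIT value gets a warm-up run that is thrown away, and only the second run is timed. It is written in Python rather than the PHP used for the actual benchmark, and it is not the original script. The function names, the `execute` callable (which should send one statement over a MySQL-protocol connection to Sphinx and fetch all rows), the index name, the default LIMIT values, and the simplified quote escaping are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, List, Sequence


def sphinxql_statements(index: str, query: str, limit: int) -> List[str]:
    """Build the two statements issued per search: SELECT ... MATCH, then SHOW META.

    Escaping is deliberately simplified (backslashes and single quotes only);
    a real client should use its driver's own escaping routine.
    """
    escaped = query.replace("\\", "\\\\").replace("'", "\\'")
    return [
        f"SELECT * FROM {index} WHERE MATCH('{escaped}') LIMIT {limit}",
        "SHOW META",
    ]


def time_run(execute: Callable[[str], object], index: str,
             queries: Sequence[str], limit: int) -> float:
    """Run every query once at the given LIMIT and return total elapsed seconds."""
    start = time.perf_counter()
    for query in queries:
        for statement in sphinxql_statements(index, query, limit):
            execute(statement)  # assumed to send the statement and fetch all rows
    return time.perf_counter() - start


def benchmark(execute: Callable[[str], object], index: str,
              queries: Sequence[str],
              limits: Sequence[int] = (10, 100, 1000)) -> Dict[int, float]:
    """Return total time per LIMIT value, discarding a warm-up run for each."""
    results: Dict[int, float] = {}
    for limit in limits:
        time_run(execute, index, queries, limit)  # first run: warms caches, discarded
        results[limit] = time_run(execute, index, queries, limit)  # second run: kept
    return results
```

To use it, pass any callable that executes a statement and reads the full result set, for example a small wrapper around a MySQL client cursor connected to the Sphinx listener. A comparable SphinxAPI run would need its own `time_run` variant built on the API client; that part is not shown here.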
On my desktop Windows machine (C2D E8500 @ 3.16 GHz), the results are as follows (total time is graphed, so lower is better).

As expected, SphinxQL wins, with a 1% to 7% difference when fetching 100 or fewer matches, and as much as a 1.4x difference when fetching up to 1000 matches.
Note that this was a production workload in which many queries returned fewer matches than the limit, and frequently no matches at all! A synthetic test that repeated a query returning 1000 matches with a LIMIT of 1000 resulted in SphinxQL being 3.6x faster.
An x64 Linux box (C2D E6420 @ 2.13 GHz) performed differently: SphinxQL won noticeably even at low LIMIT values:

That’s a 1.25x difference at a limit of 10 matches and a 1.61x difference at a limit of 1000. A synthetic test that always pulled exactly 1000 matches resulted in a 4.7x difference.
To summarize, with PHP on Linux (the most common scenario), SphinxQL always outperforms SphinxAPI, from 1.25x on average with small result sets to as much as 4.7x when fetching 1000 rows. It can be expected that APIs in other scripted languages (Perl, Python, etc.) will perform more or less the same. Compiled APIs (C, Java, PECL), however, are subject to a separate benchmark, because in that case the CPU overhead caused by the API should be (somewhat) smaller, while network traffic overhead (which should in turn be smaller with SphinxAPI) can come into play.


Thanks, Andrew. I was just thinking about doing a benchmark like this recently myself.
Andrew,
Did you try comparing the C-based PHP driver, which should do all the heavy lifting in C, against SphinxQL and the native driver?
Peter, not yet. I expect it to perform somewhere in between these two.
So am I correct to wonder now about the need to profile:
1. The API implementation which supports multiqueries, and which could be a major effect for data you need grouped && ungrouped
vs.
2. The SphinxQL implementation which doesn’t, but moves the data much faster
If I’m correct, most of the time 1 will outweigh 2 on anything frontend-related, where only a small subset of the data is presented but the filters/statistics/counts can be optimized into an MQ.
Profiling works better than handwaving, though. Fun times ahead.
Interesting.
Jaimie, that is currently (!) correct. However we’re planning to add all the yet-missing things to SphinxQL, multi-queries included. (Mandatory plug: sponsors are welcome.) So the general future direction is SphinxQL.
I think Sphinx is the first product where MQs provide this optimization that will become more and more relevant with filter-type navigation.
With MySQL you only gain marginally, though it does give you access to SQL_CALC_FOUND_ROWS. SQL_CALC_FOUND_ROWS sounds like a trivial form of MQ itself, actually. It becomes even more stupid when you have to rerun a query with joins and all.
With SQL you either need a temporary table or a stored procedure. I’m pretty sure neither will benchmark quite as well as your MQs.
I’ll be writing this all up when I blog about faceted search and the technology behind it. If Sphinx had SphinxQL MQs and differing-filter optimizations it would give Endeca a big problem.
SphinxQL cannot be used from some tools (MySQL Workbench, MySql.NET Connector)
Is it an idea to implement SphinxQL into SphinxSE?
Advantages I see are:
- Separate connections to SphinxQL are (or can be) eliminated, since everything goes through the MySQL protocol
- Ability to join SphinxQL output to MySQL data in a fast and robust way when needed
- SphinxSE will be faster!
See my forum post @ http://sphinxsearch.com/forum/view.html?id=5896
with some questions and statements about this.
Arian, we might expose some stuff through SphinxSE per customer requests, on a request-by-request basis, but generally there are no plans to expose everything through SphinxSE. And no, that won’t be faster. MySQL adds quite a bit of overhead and complexity.
SphinxQL is better than SQL, and I am changing my code, but how can I change the default options in a SphinxQL request?
thanks..
indir, what “default options” are you referring to?