SphinxQL now lets you do everything querying-related that SphinxAPI did, in a simpler, faster, and more convenient way. For most of the features, the mapping of API calls to SphinxQL syntax is straightforwards (either via SQL syntax or using our OPTION clause). However, a few things, namely time segments, geosearches, overrides, index and weight fields, etc might now be less obvious. So let’s discuss them.
Time segments sorting mode can now be easily repeated using INTERVAL and NOW functions. Moreover, it’s so much more flexible in SphinxQL. In custom SphinxAPI mode, turn-points were hardcoded, and limited to sorting. With SphinxQL, you now can control those turn-points, and also use the result for something else, not just sorting.
Specifically, legacy SphinxAPI mode is completely equivalent to computing this INTERVAL expression and sorting by it:
SELECT *, INTERVAL(time_stamp_attr, NOW()-90*86400, NOW()-30*86400, NOW()-7*86400, NOW()-86400, NOW()-3600) AS time_seg ... ORDER BY time_seg DESC, @weight DESC |
86400 is the number of seconds per 24 hours so the turn-points are at 90 days, 30 days, 7 days, 1 day, and 1 hour respectively. That’s exactly how they are hard-coded in SPH_SORT_TIME_SEGMENTS mode.
Note, by the way, that you need not worry about NOW() function used too much or constant multiplication. All that stuff gets optimized by our expression engine and this INTERVAL() expression is simplified down to an efficient search of attribute values in a list of constants.
Geodistance searches are supported through GEODIST(lat1,lon1,lat2,lon2) function. That is similar to SetGeoAnchor() API call but more flexible again. API required that lat1/lon1 are attribute names, and lat2/lon2 are constant floats. SphinxQL does not have that restriction any more, all latitudes and longitudes can be arbitrary expressions!
At the same time we are not sacrificing any speed by making GEODIST() that generic, because we check whether any (or both) of the points are constant expressions, and optimize accordingly. For instance, GEODIST() between two constant points will be computed only once. And GEODIST() with a pair of attributes and a constant anchor will be computed exactly as before, without any extra computational overheads.
Attribute value overrides, depending on how did you use them, can now be implemented using either IF() or IN() functions. For instance, to boost matching posts written by user’s friends, you could use something like:
SELECT *, @weight + IN(author_id,2,3,5,8,13,21)*1000 ... ORDER BY @weight DESC |
And, say, giving a 10% discount on a given category of goods to a current user (because he punched in a valid promo code) can be accomplished like:
SELECT *, IF(category_id=123,0.9,1)*price AS adjusted_price ... ORDER BY adjusted_price ASC |
In this particular case just cutting 10% off could be expressed without an IF(), for example as price*(1-0.1*(category_id=123)), exploiting the fact that boolean expressions in scalar context evaluate to 0 or 1. However IF() might be more efficient when more complex expressions are involved, because it short-circuits. That is, either 2nd or 3rd argument gets evaluated, but never the both of them.
Index and field weights were, frankly speaking, overlooked a while ago and stayed unimplemented for some time. But as of bleeding edge trunk, they are available, through an OPTION clause:
SELECT ... OPTION field_weights=(title=5,abstract=2,content=1) |
To summarize, all the searching things you were able to do with the API are now available from SphinxQL as well, and sometimes in even more flexible and optimal way than before. So be sure to give it a try.
| « May 23, 2010. C++ compiler shootout | July 16, 2010. Sphinx at OSCON Portland July 19-23 » |


Hi Andrew,
nice explanation, but I don’t get it:
SELECT *, IF(category_id=123,0.9,1)*price AS adjusted_price
Maybe it is a misprint here.
–
Thanks,
Yaroslav
Yaroslav, exactly what seems a misprint? That’s valid SphinxQL syntax :-)
Very cool. Great job.
@Yaroslav:
Think of it as this:
$adjusted_ price = $price;
if($category_id === 123)
{
$adjusted_ price *= 0.9;
}
Will SPH_GROUPBY_DAY and other SetGroupBy() date built-in functions be supported in SphinxQL? I know they could be emulated by INTERVAL(), too, but it’s very clumsy when you have to search through the large time spans.
It’s being discussed here:
http://www.sphinxsearch.com/forum/view.html?id=6258
kmike, yes, that’s on the todo list.
Its not something I can directly put my finger on, but we had a lot of trouble using the sphinxQL interface on port 9306, we kept getting exceptions in searchd when the query returned a large amount of data. I think that the mysql protocol emulation has some buffer overflow or bounds checking issues when result sets are large, we store entire articles as string attributes to avoid having to hit the database again on a search listing.
We had no problem writing individual records to the RT indices via the MySQL emulation protocol, but could not reliably read back the results of queries. and had to resort to using the old protocol on 9312 for queries.
Tim, that might have been since fixed in current trunk.