When it comes time to query a Sphinx server it will be helpful to know a bit about the binary API and SphinxQL. In this post, we’ll provide a quick overview and we’ll discuss when it is best to use either SphinxQL or the API.
The Binary API
The binary API is the oldest method to query a Sphinx server and it’s offered in a multitude of languages, both official and unofficial. While it offers an easy programmatic way to query Sphinx indexes, it does come with several disadvantages:
- since it packs all search options into a binary format and unpacks the binary responses to a readable format, it can add an extra CPU load on the client. This load can be a problem, most noticeable on some client languages (like PHP) and when huge lists of filters are used (it’s not so uncommon to filter an attribute by thousands of values);
- the client API needs to be up-to-date in line with the Sphinx server. Using an older API, with a newer searchd, can either lead to the wrong interpretation of the query or to not being able to use newer options;
- The API is only designed for search ( a.k.a. SELECT ) queries. This means content changing operations on a RealTime index are not possible with the API – SphinxQL must be used.
SphinxQL is another way of querying Sphinx. It uses the MySQL 4.1 transport protocol and for this reason no API is required, any MySQL client can be used instead. SphinxQL is an SQL subset, similar to MySQL. However it has some peculiarities:
- while syntax and operators in Sphinx are similar to MySQL, there are several differences. For example, MATCH is not the same as MySQL’s MATCH. SphinxQL has an OPTION clause, which is not present in MySQL, but used frequently in Sphinx queries. Another difference is that in Sphinx, the OR operator in WHERE clauses is not supported, instead it’s supported in SELECT expressions. For this reason, adaptation to various ORMs is possible, but requires some work;
- SphinxQL has a special command SHOW META, that can be executed right after the SELECT ( during same connection), which offers some additional information like total count or docs/hits statistics (and, since 2.1.1, even more: IO and CPU stats if searchd was set to report these).
SphinxQL is currently the only method to perform content changes on a RealTime index, it also has some extra functionality that is not present in the API. For example, truncating a RT index or showing index structure (with DESCRIBE).
Querying in API vs. SphinxQL
No matter which method is used, the results of the query are the same. The main difference is that some types of queries require more definition in SphinxQL (and are therefore more configurable).
For example, the API has several matching modes. In SphinxQL, there is no such option as ‘matching mode’. The matching mode of SphinxQL is the same as SPH_MATCH_EXTENDED2 mode from the API. Nowadays, when you issue a search with the API, it is transformed internally into a SphinxQL command and so, it’s possible to simulate the API legacy matching modes in SphinxQL as well.
API Matching modes in SphinxQL
SELECT * FROM index WHERE MATCH('one two') OPTION ranker = proximity;
SELECT * FROM index WHERE MATCH('one | two') OPTION ranker = matchany;
SELECT * FROM index WHERE MATCH('"one two"/1') OPTION ranker = matchany;
SELECT * FROM index WHERE MATCH('"one two"') OPTION ranker = proximity;
SELECT * FROM index WHERE MATCH('one +two') OPTION ranker = none;
SPH_MATCH_EXTENDED and SPH_MATCH_EXTENDED2
SELECT * FROM index WHERE MATCH('one two') ;
Fullscan means no MATCH is made, to make a fullscan search in SphinxQL it's enough to simply not use MATCH clause
From SphinxQL query to programatic API query
A complete example in SphinxQL:
mysql> SELECT id,user_id FROM myindex WHERE MATCH('@title one @content two') AND property_id BETWEEN 10 and 100 ORDER BY group_id DESC,user_id ASC LIMIT 10,20 OPTION ranker=expr('sum(lcs*user_weight+exact_hit)*1000+bm25'), idf=plain, user_weight=(title=100,content=20);
Using the PHP API this would look like:
$cl = new SphinxClient(); $cl->SetMatchMode(SPH_MATCH_EXTENDED2); $cl->SetFilterRange('property_id', 10,100); $cl->SetSortMode(SPH_SORT_EXTENDED,'group_id DESC,user_id ASC'); $cl->SetFieldWeights(array('title' => 100, 'content' =>20)); $cl->SetQueryFlag('idf','plain'); $cl->SetRankingMode(SPH_RANK_EXPR,'sum(lcs*user_weight+exact_hit)*1000+bm25'); $cl->SetSelect('id,user_id'); $cl->Query('@title one @content two','myindex');
Implementing: from API to SphinxQL
A common example is how to implement GeoAnchor from API in SphinxQL;
$cl->SetGeoAnchor('latitude','longitude',1.3521,0.3432); $cl->SetSortMode(SPH_SORT_EXTENDED, '@geodist ASC'); $cl->SetFilterFloatRange('@geodist',0,1000); $cl->Query('..etc..','myindex');
mysql> SELECT *, GEODIST('latitude','longitude',1.3521,0.3432) as g FROM myindex WHERE MATCH('..etc..') AND g BETWEEN 0 AND 1000 ORDER BY g ASC;
Using the API, it’s very simple to implement a query batch, either to be used to take advantage of multi-query optimization or not (the common usage here is faceting).
$cl->SetSortMode ( SPH_SORT_RELEVANCE ); $cl->AddQuery ( "hello world", "myindex" ); $cl->SetSortMode ( SPH_SORT_ATTR_DESC, "price" ); $cl->AddQuery ( "hello world", "myindex" ); $results = $cl->RunQueries();
The result will contain an array with each query’s results. Additionally, the functions ResetFilters() and ResetGroupBy are used to do a full reset of already added filters or grouping calls.
In SphinxQL, multi-queries can be fired in one row, in the same way as MySQL. Most MySQL protocol clients support multi-queries, however they might have a dedicated multi_query() function or they may need a special flag to be enabled.
Both querying methods can be used with success, depending on the index type. For RealTime indexes, there is no problem with using SphinxQL to update the index and API to make queries on it. However, as noted in a previous post, the API can become slower under high traffic or heavy queries and you should consider switching to SphinxQL.
And last but not least, regarding SphinxQL – don’t think of it as querying a database table. It may be a subset of SQL, and RealTime indexes look like a database table, but with the introduction of JSON attributes, you are actually querying a special data repository, which allows unstructured data as well, using an SQL-like method.
|« July 19, 2013. Sphinx at OSCON 2013||July 31, 2013. LogZilla: Big Data Log Analysis with Sphinx »|