FAQ (Frequently Asked Questions)
The API only returns what the search index actually contains, and currently Sphinx does not store all of the original row data. (This might change in the future, but there are no immediate plans.) It can store numeric columns, but not the text ones. So you will have to pull document IDs from Sphinx and then go to the database for the full row contents. This is usually a matter of exactly one additional SQL query, for example:
SELECT * FROM documents WHERE id IN (3,5,7)
Note that the database will usually reorder the rows in the result set! So you will also have to reorder them back into the order received from Sphinx. With MySQL, you can use its ORDER BY FIELD() feature:
SELECT * FROM documents WHERE id IN (3,5,7) ORDER BY FIELD(id,3,5,7)
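If your database lacks such a feature, restoring the Sphinx order in application code is just as easy. A minimal Python sketch (the IDs and rows below are made-up sample data, not from the FAQ):

```python
# Reorder rows fetched from the database back into the order Sphinx
# returned them in. The IDs and rows below are made-up sample data.
sphinx_ids = [3, 5, 7]  # document IDs, in Sphinx ranking order
db_rows = [             # rows from the database, order not guaranteed
    {"id": 5, "title": "second"},
    {"id": 7, "title": "third"},
    {"id": 3, "title": "first"},
]

rows_by_id = {row["id"]: row for row in db_rows}
ordered = [rows_by_id[doc_id] for doc_id in sphinx_ids]
# ordered now follows the Sphinx ranking (ids 3, 5, 7)
```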
The output you might be seeing from the CLI search utility does exactly that: additional SQL queries (and, by the way, in a bad way, with a separate query per row instead of using IN() and firing only one query for the whole result set). It's only intended for debugging and testing.
You can do all of this, except for precise arbitrary-length sorting over several indexes.
To filter and group, you can replace the string with a unique numeric ID. Sometimes it's possible to create a lookup dictionary in the database (e.g. for fixed lists of cities or countries), or even use an existing one, replace strings with their IDs in that dictionary, then filter and group on that ID. If not, you can always replace the string with its checksum, e.g. CRC32() or (any) 64 bits taken from MD5() at indexing time (no need to alter the tables!), store it using sql_attr_uint or sql_attr_bigint respectively, and then filter or group on that checksum attribute. (Note that there's a certain chance of CRC32() collisions if you have millions of strings, but practically zero chance of MD5() collisions.)
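The two checksum variants can be computed like this; a Python sketch whose helper names are illustrative, not part of any Sphinx API (in practice you would more likely compute the value directly in sql_query, e.g. with MySQL's CRC32() function):

```python
import hashlib
import zlib


def crc32_key(s):
    # 32-bit checksum; fits a sql_attr_uint attribute
    return zlib.crc32(s.encode("utf-8")) & 0xFFFFFFFF


def md5_64_key(s):
    # first 64 bits of MD5; fits sql_attr_bigint (may need to be
    # reinterpreted as signed, since sql_attr_bigint is a signed type)
    return int.from_bytes(hashlib.md5(s.encode("utf-8")).digest()[:8], "big")
```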
Sorting is harder, but also possible to some extent. First, you can use sql_attr_str2ordinal, which replaces every string with its ordinal number (a sequential number in the sorted list of all the unique strings mentioned in this index). However, that will produce garbled results when querying through several indexes at a time, because every index will assign its own sequential number to the same strings. I.e. string "zzz" could be number 1000 in index A but at the same time number 1 in index B, and erroneously float to the top when searching through both A+B. Second, you can extract 4-byte or 8-byte substrings, store them as attributes, and sort on those attributes. That will work across different indexes, but only sorts on the first few bytes.
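The second approach, packing the first bytes of a string into a fixed-width integer sort key, can be sketched as follows (a Python illustration; the helper name is made up, and in practice the value would be exported as an attribute in sql_query):

```python
def prefix_sort_key(s, width=8):
    # Pack the first `width` bytes of the string into a big-endian
    # integer; zero-padding shorter strings preserves their order.
    raw = s.encode("utf-8")[:width].ljust(width, b"\x00")
    return int.from_bytes(raw, "big")


# The integer keys sort exactly like the first 8 bytes of the strings;
# strings identical in those bytes tie, which is the limitation noted above.
words = ["zzz", "abc", "abd", "ab"]
assert sorted(words, key=prefix_sort_key) == sorted(words)
```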
No, you don't!
SphinxSE is just an ordinary client that talks to searchd over the network. You need it when you do not have a native API for your language and want to work through MySQL instead, or when you want to optimize certain workloads that pull big Sphinx result sets directly into MySQL and process them further there (avoiding the overhead of pulling the Sphinx result set into the application and then immediately pushing it back to MySQL). However, you are not required to use SphinxSE to work with Sphinx.
There are two places where the default limit of at most 1000 matches per query is imposed.
First, on the server side, you have to change the max_matches setting in sphinx.conf and restart searchd. (With 0.9.9 and above, you can simply reload the config file using SIGHUP.)
Second, there's also a 3rd argument to SetLimits() API call that also defaults to 1000 and must be raised. (With SphinxSE, use "maxmatches" option.)
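Put together, the two changes might look like this (the values below are illustrative, not recommendations):

```
# sphinx.conf, server side
searchd
{
	max_matches = 10000
}
```

and on the client side, pass the matching cap as the third argument, e.g. $cl->SetLimits ( 0, 20, 10000 );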
Character encoding must be consistent everywhere: namely, when pulling data from SQL, when indexing that data, and when searching. The checklist is as follows:
sql_query_pre = SET NAMES utf8
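For UTF-8 data, the corresponding sphinx.conf fragments might look like this (the source and index names are illustrative):

```
source src1
{
	# make MySQL return rows in UTF-8
	sql_query_pre = SET NAMES utf8
}

index idx1
{
	# tell the indexer that incoming data is UTF-8
	charset_type = utf-8
}
```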
You can take one of the following routes:
# in sphinx.conf
sql_query = SELECT id, ..., CONCAT('_user',userid,'; _group',groupid) FROM docs

// in webapp
$cl->SetMatchMode ( SPH_MATCH_EXTENDED2 );
$cl->Query ( "my query ( _user123 | _group456 )", "myindex" );
$cl->SetSelect ( "*, userid=123 OR groupid=456 AS mycond" );
$cl->SetFilter ( "mycond", array(1) );
$cl->Query ( "my query", "myindex" );
You can enumerate several fields in the field limit operator:
@(title,body) hello world
This will search for both keywords (hello and world) in both fields (title and body).
Currently, you have to explicitly a) configure Sphinx to index either prefixes (substrings starting at word start) or infixes (general substrings), and b) enable star-syntax (i.e. hello wor*), which is disabled by default for compatibility reasons. Refer to the documentation on min_prefix_len, min_infix_len, and enable_star for details. Note that indexing time and index size can grow significantly when using these!
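A sphinx.conf fragment enabling both might look like this (the index name and the 3-character minimum are illustrative; see the documentation for the trade-offs):

```
index idx1
{
	# index all word prefixes of 3 or more characters
	min_prefix_len = 3

	# require an explicit star: "wor*" matches, plain "wor" does not
	enable_star = 1
}
```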
This can be caused by the configure-time options used to build your MySQL binary and the SphinxSE plugin. For instance, pre-built MySQL binaries from Sun usually come with the --with-fast-mutexes configure switch enabled. That results in a size mismatch between certain internal structures and leads to crashes. Unfortunately, we do not know of a reliable way to detect that automatically.
To fix the issue, either rebuild the SphinxSE plugin using ./configure --with-fast-mutexes, or rebuild the mysqld binary from source.
Did not find a specific answer?