Nov 16, 2011. Sphinx 2.0.2-beta is out

Sphinx version 2.0.2-beta is now out, with over 30 new features, options, and other significant changes. It also marks a feature freeze of the 2.0.x branch, and 2.0.3-release is scheduled shortly.

The most important new features in 2.0.2-beta are the new MVA64 attribute type, dict=keywords and MVA support in RT indexes, the expression-based ranker, the ATTACH INDEX statement, and WHERE support in the UPDATE statement.

Read on for a quick discussion of the biggest new ones and the planned release cycle, or just proceed to Downloads and grab it already!

MVA64 attributes are a further extension of our MVA feature. MVAs let you store a set of unsigned 32-bit values; MVA64 now lets you store sets of signed 64-bit values as well. This is useful to avoid CRC32 collisions when hashing string tags, or to encode extra auxiliary data into your MVA. MVA64 attributes are supported in both disk and RT indexes.
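To make the tag hashing point concrete, here is a minimal Python sketch that turns string tags into values fit for a plain MVA (unsigned 32-bit) and for MVA64 (signed 64-bit). It is only an illustration: the helper names are made up for this example, and SHA-256 is an arbitrary choice of a wider hash, not something Sphinx requires.

```python
import hashlib
import struct
import zlib


def tag_to_uint32(tag: str) -> int:
    """Hash a tag to an unsigned 32-bit value, the range a plain MVA holds."""
    return zlib.crc32(tag.encode("utf-8")) & 0xFFFFFFFF


def tag_to_int64(tag: str) -> int:
    """Hash a tag to a signed 64-bit value, the range MVA64 holds.

    SHA-256 is an arbitrary choice (assumption of this sketch); the first
    8 digest bytes are reinterpreted as a big-endian signed 64-bit integer.
    """
    digest = hashlib.sha256(tag.encode("utf-8")).digest()
    return struct.unpack(">q", digest[:8])[0]


tags = ["sphinx", "search", "realtime"]
mva_values = [tag_to_uint32(t) for t in tags]    # candidates for a 32-bit MVA
mva64_values = [tag_to_int64(t) for t in tags]   # candidates for an MVA64
```

By the usual birthday estimate, roughly 77,000 distinct tags are enough for about a 50% chance of at least one collision in a 32-bit space, which is why the wider 64-bit range helps on large tag vocabularies.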

Word start (prefix) searching in RT indexes is now supported through the dict=keywords feature. We deliberately chose not to port the prefix and infix pre-indexing used by disk indexes to RT, because that would have been a huge memory hog. Substring (infix) searching support with dict=keywords is in development now, and is scheduled for the 2.1.x branch.

MVA and index_exact_words in RT indexes are also supported now.

The new ATTACH INDEX statement is important because it lets you quickly convert a disk index to an RT index. That's right, you can now quickly batch index the main bulk of your data, then easily switch to RT, and keep updating the RT index. Right now, ATTACH only lets you convert the index once, and requires an empty RT target. However, enabling batch imports into existing, non-empty RT indexes is planned. That is why we chose to reserve ATTACH upfront instead of something like CONVERT.

The UPDATE statement now supports full-blown conditions in WHERE. So, say, updating prices on 1000 rows from vendor X, or just marking them for deletion, just became 1000 times easier. Just like attribute UPDATEs themselves, this works with both disk and RT indexes.

And last but not least, you can now create your own relevance formulas on the fly with the aid of the expression ranker. Previously, computing relevance values was limited to a number of built-in rankers, and changing the formula that combined various internal ranking factors (not to be confused with attributes) involved writing C++ code, then rebuilding and restarting searchd. From now on, you only need to specify a short and sweet formula, you can do it on the fly on a per-query basis, and many more ranking factors are now available than we ever computed before. Amusingly, this is not even super slow: on my smallish 1,000,000-document test collection, emulating the default ranker with the expression-based one was just 1.1x to 1.3x slower than the compiled C++ code.
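For a feel of what such a formula computes, here is a toy Python model of the expression sum(lcs). This is a sketch under stated assumptions and not Sphinx's actual code: it assumes lcs is the length, in words, of the longest verbatim match between the query and a field, and that sum() adds that factor up over all fields. The function names are hypothetical.

```python
def lcs_words(query: str, field_text: str) -> int:
    """Length, in words, of the longest query word run found verbatim in the field.

    Assumed meaning of the lcs factor for this toy model.
    """
    q = query.lower().split()
    f = field_text.lower().split()
    best = 0
    # prev[j] holds the length of the common run ending at q[i-2] and f[j-1]
    prev = [0] * (len(f) + 1)
    for i in range(1, len(q) + 1):
        cur = [0] * (len(f) + 1)
        for j in range(1, len(f) + 1):
            if q[i - 1] == f[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def rank_sum_lcs(query: str, fields: dict) -> int:
    """Toy equivalent of the ranking expression sum(lcs): add lcs over all fields."""
    return sum(lcs_words(query, text) for text in fields.values())


doc = {"title": "hello world", "content": "say hello to the whole world"}
weight = rank_sum_lcs("hello world", doc)
```

Here the title matches the whole two-word query verbatim while the content only matches single words, so the title contributes 2 and the content contributes 1.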

Besides the new features, we have also been busy improving our internal testing, build, and release processes. And the next release, 2.0.3-release, is now scheduled for approximately 1 month from now, as opposed to the long-standing tradition of shipping it when it's done. We did a lot of testing for 2.0.2-beta, there are no known major issues, and the pre-existing functionality should be more stable than ever. However, our policy is not to tag it "release" until it has received even more testing from the community. Thus, if you bump into anything in 2.0.2-beta, please be sure to report the issue! The earlier you report it, the sooner it gets fixed.

From there, the release plan is as follows. The 2.0.x branch is now frozen and will only receive bug fixes. So after 2.0.3-release we are going to publish bugfix releases on a regular basis. The specific intervals are going to be anywhere between 1 and 3 months, depending on the number and severity of reported and fixed issues.

Overall, we added a bunch of cool new stuff in 2.0.2, the 2.0.x branch is now feature frozen and will be supported from here, there's more new exciting stuff coming in 2.1.x, and this post is getting long. So grab the new version, give it a whirl, and be sure to report any issues so we can work on them for the forthcoming 2.0.3. And thanks for choosing Sphinx!



#1. Barry Hunter | 2011-11-16 22:31:28

How do you specify the expression in the API?


then what?

From looking at sphinxapi.php, it looks like a second param to setRankingMode is possible.

But it's not documented.

Also recommend
assert ( $ranker === 0 || $ranker>=1 && $ranker<SPH_RANK_TOTAL );

evaluates to a string, which then ==0 in integer context.

#2. Andrew Aksyonoff | 2011-11-16 23:04:08

@Barry, it's the 2nd argument to SetRankingMode() indeed:

$client->SetRankingMode(SPH_RANK_EXPR, "sum(lcs)");

Missing documentation was just overlooked and is a documentation bug. Will fix, thanks :)

#3. Ambrose Niels | 2011-11-23 10:38:35

Is it possible to use the Levenshtein distance between query and document as
1. matching criteria
that is as a custom ranker
2. sort function
=> currently only functions like ABS(),COS(),FIBONACCI() are supported: )

#4. Andrew Aksyonoff | 2011-11-23 11:51:48

@Ambrose, we do not (yet) ship a built-in Levenshtein function, however you can create a UDF and use it in SELECT expressions. That should cover sorting. The expression ranker is a bit of a different story and, IIRC, does not support UDFs at this point.

#9. Yukron | 2012-01-15 22:28:13

The best alternative is SphinxQL. It's not worth migrating because of problems with a third-party library ;)
