Archive for the ‘General’ Category

Welcome to the new Sphinx Search website!

Tuesday, November 2nd, 2010

As you have undoubtedly noticed, we have a new website design. We finally decided it was time to drop the 1996 look and move into the modern age, albeit very slowly and only slightly. As with anything new, there will be quirks and potential issues that could affect your familiarity with the site or even your ability to use it. So please be patient, and let us know if anything gets weird or breaks by filling out our contact form. In the subject line, simply write ‘Sphinx Website Feedback’ and include your comments.

FYI: We plan more changes from here, such as finally using Sphinx to power the site’s search, unified sign-on across all parts of the website, and a revamp of the powered-by and services pages. Again, feel free to reach out to us with suggestions, applause, or issues.

Sphinx Search Conference Update & Sphinx Team in Moscow

Thursday, October 21st, 2010

For those of you who have not yet noticed, we are holding a free Sphinx Conference Day in Moscow, Russia, on Sunday, October 24th, 2010. So if you are in the area, take a look at the conference registration page and let us know if you are coming.

If you are already registered and itching to attend but need further details, we have posted the much-anticipated update on the venue and conference schedule to the conference registration page. If you did not receive an email from our conference team referencing that page and update, please let us know ASAP, because you are probably not registered for the conference.

How Sphinx relevance ranking works

Tuesday, August 17th, 2010

Over time, we have added quite a few matching and ranking modes to Sphinx, and will be adding more. A number of questions that regularly pop up, ranging from “how do I force this document to rank first?” to “how do I draw 1 to 5 stars depending on match quality?”, in fact boil down to matching and ranking internals. So let’s cover that: how exactly do the matching and ranking modes work, which weighting factors contribute to the final weight and how, how can one tweak things, and so on. And, of course, the stars, our destination.

Sphinx at OSCON Portland July 19-23

Friday, July 16th, 2010

Just a quick update: we will be at OSCON in Portland, Oregon, from the 19th to the 23rd of July. Peter Zaitsev of Percona and the MySQL Performance Blog will be moderating a Birds of a Feather session on Sphinx in 2010, slated for Thursday, July 22nd, at 7 PM.

If you are in the area and are up for a casual meeting to discuss how you use Sphinx, or would like to learn more about it, please let us know by contacting Rich Kelm, who will be representing us at OSCON.

Doing time segments, geodistance searches, and overrides in SphinxQL

Sunday, June 27th, 2010

SphinxQL now lets you do everything query-related that SphinxAPI did, in a simpler, faster, and more convenient way. For most features, the mapping from API calls to SphinxQL syntax is straightforward (either via regular SQL syntax or via our OPTION clause). However, a few things, namely time segments, geodistance searches, overrides, index and field weights, etc., might be less obvious. So let’s discuss them.

C++ compiler shootout

Sunday, May 23rd, 2010

I’ve been curious for some time just how differently various C++ compilers might perform on a real-world code base, how much improvement one can expect over time, etc. Today, I finally benchmarked that.

The compilers used were GNU gcc 3.4.6 (pretty much the oldest you can expect these days, but still found in the wild, e.g. on the CentOS 4.7 box I used for the benchmarks); GNU gcc 4.5.0 (bleeding edge, built from source); and Intel icc 11.1. The hardware was 2x dual-core Xeon 3.6 GHz, for a total of 4 cores.

Presentation from Zagreb and upcoming workshop in Moscow

Wednesday, May 5th, 2010

As planned, I’m currently in Zagreb, Croatia, and earlier today I gave a talk on Sphinx at #dorscluc. Grab the “Meet the Sphinx” slides if you want to find out what was in those secret bullets at the bottom that the presentation machine chopped off :)

I will also be doing another workshop tomorrow, and one more workshop is scheduled in Moscow at Devconf on May 18th. That one will be in Russian. You can check out the workshop plan and register on the Devconf website. I will be covering the tasks that, in my experience, people typically face, so it should be a perfect match (aka a crash course) for people new to Sphinx.

SphinxAPI vs SphinxQL benchmark

Sunday, April 25th, 2010

In scripting languages such as PHP, SphinxQL should generally be faster than SphinxAPI simply because it uses the compiled MySQL client implementation instead of the interpreted SphinxAPI client. The question is, how much faster exactly? So I’ve just benchmarked that.


Sphinx slides from MySQL UC and RIT++ 2010

Saturday, April 17th, 2010

MySQL UC 2010 in Santa Clara ended yesterday. It was a busy but interesting time for us, and I’d like to thank everyone who made it so, whether by attending our BOF or my talk, or by finding the time to meet us. For those who weren’t able to attend, I’ve just uploaded the slides from my “Sphinx: full-text search in 2010” talk to the Presentations section.

In the meantime, Maciej Dobrzanski from Percona delivered his “Improving MySQL-based applications performance with Sphinx” talk at the RIT++ 2010 conference in Moscow, so thanks go out to him as well.

I will also be speaking at the DORS/CLUC event on May 5-7 in Zagreb, Croatia, so if you’re interested in meeting there, let us know.

2 cents on 2% optimizations

Friday, April 9th, 2010

Is it ever worth the effort to spend time on optimizations that yield just a 1-2% performance improvement? It sort of itches to answer “just focus on features and quit being so obsessed with performance”, right?

At my previous day job, which was in video games, I once went through an optimization death-march (or maybe death-October, to be more precise). The sole goal was to get the 3D renderer to run at a minimum of 30 fps. The problem was that it dropped to 20 fps, and sometimes even lower, at certain camera angles. To add some “icing” on the “cake”, when you enable V-sync to avoid tearing, falling to 19.9 fps means you’re actually running at 15, because the monitor runs at 60 Hz: if you haven’t managed to show a new frame within 3/60ths of a second (20 fps) of the last one, you wait until the next V-sync, which happens at 4/60ths of a second (15 fps). Now, the human eye runs at 24 fps (basically), so 30 fps is perfectly smooth, 20 fps is slightly uncomfortable but generally OK, but 15 fps is quite laggy.
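The V-sync arithmetic above can be sketched in a few lines. This is a minimal illustration assuming classic double-buffered V-sync and a constant per-frame render time (the name `vsync_fps` is hypothetical, not from any engine): each frame occupies a whole number of refresh intervals, so the displayed rate snaps down to 60/n.

```python
import math

REFRESH_HZ = 60  # monitor refresh rate, as in the text


def vsync_fps(raw_fps: float, refresh_hz: int = REFRESH_HZ) -> float:
    """Frame rate actually shown when every frame must wait for a V-sync.

    Assumes double buffering and a constant render time of 1/raw_fps seconds:
    a frame then occupies ceil(refresh_hz / raw_fps) refresh intervals.
    """
    intervals = math.ceil(refresh_hz / raw_fps)
    return refresh_hz / intervals


for raw in (30.0, 20.0, 19.9):
    print(f"{raw:5.1f} fps rendered -> {vsync_fps(raw):.1f} fps shown")
```

Rendering at 19.9 fps needs just over 3 refresh intervals per frame, so each frame is held for 4 of them and only 15 fps reaches the screen.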

Another problem was that there were no more major optimizations to pull out of the hat to save the day. (A battle-hardened graphics developer would silently insert small, intentionally hidden… “reserves” here and there over the course of the project, so that artists would do their best to hit the budget in that “reserved” version, then quickly cash in all those reserves the week before shipping the gold master and boost the frame rate nicely. Well. If I’m ever back in video games, I’m definitely doing that.)

So I started with one especially bad camera angle that ran at 19.1 fps or so, and kept trying every minor change I could come up with. Some of them weren’t even optimizations, actually, because once implemented, they’d hurt my precious-s-s fps.

Most of those optimizations were tiny. Even changes that improved things by just 0.1 fps, which is about 0.5%, got committed to trunk. Most of the changes were in the 0.1 to 0.5 fps range. Once I got a huge one that made a whopping 1.2 fps improvement. Huge. Once.

That was pretty exhausting. But a week or two later, we had a minimum of 25+ fps. That, in turn, was pretty satisfying. It was also a 30% improvement over the 19 fps that had initially seemed “impossible” to optimize.

Optimizations in general, including tiny 2% optimizations, pile up. And they pile up in a non-linear fashion, because each speedup multiplies the ones before it rather than adding to them. 30 different 2% optimizations result in a 1.81x improvement (1.02^30), not a 1.6x one. 10 different 5% ones result in 1.63x (1.05^10), not 1.5x. Of course, big optimizations pile up even better. But you rarely get many of those if you write your code more or less properly.
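The compounding effect can be checked directly. This sketch assumes each optimization is independent and raises the throughput of the whole workload by the same factor (1 + gain); the helper names `compounded` and `additive` are illustrative only.

```python
def compounded(gain: float, count: int) -> float:
    """Overall speedup when `count` independent optimizations each
    multiply throughput by (1 + gain); the factors multiply."""
    return (1.0 + gain) ** count


def additive(gain: float, count: int) -> float:
    """Naive estimate that simply sums the percentages."""
    return 1.0 + gain * count


for gain, count in ((0.02, 30), (0.05, 10)):
    print(f"{count} x {gain:.0%}: {compounded(gain, count):.2f}x compounded, "
          f"{additive(gain, count):.2f}x if simply added")
```

This prints 1.81x versus 1.60x for the 2% case and 1.63x versus 1.50x for the 5% case, and the gap widens as the number of optimizations grows.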

So do we hunt down every single 2% optimization opportunity in Sphinx? No, we definitely don’t. A 20x difference in code that runs once at startup and takes 0.001 sec anyway? Could not care less. A new feature that adds a 1% overall indexing slowdown that is very complicated (if at all possible) to eliminate? We accept the delay (with a heavy sigh); it’s only about 36 extra seconds per hour, after all. But we don’t blindly dismiss these tiny things either, and when writing slightly more efficient code takes reasonable effort, we go for it. Because that piles up.

That’s not to say I don’t do a death-march on Sphinx when I get the chance. But next time, I’d rather start from the equivalent of “25 fps”.