Archive for April, 2010

SphinxAPI vs SphinxQL benchmark

Sunday, April 25th, 2010

In scripting languages such as PHP, SphinxQL should generally be faster than SphinxAPI simply because it uses a compiled MySQL client implementation instead of the interpreted SphinxAPI client. The question is, how much faster exactly? So I’ve just benchmarked that.

Sphinx slides from MySQL UC and RIT++ 2010

Saturday, April 17th, 2010

MySQL UC 2010 in Santa Clara ended yesterday. It was a busy but interesting time for us, and I’d like to thank everyone who made it so by attending our BOF or my talk, or by finding the time to meet us. For those who weren’t able to attend, I’ve just uploaded the slides from my “Sphinx: full-text search in 2010” talk to the Presentations section.

In the meantime, Maciej Dobrzanski from Percona delivered his “Improving MySQL-based applications performance with Sphinx” talk at the RIT++ 2010 conference in Moscow, so thanks fly out to him as well.

I will also be speaking at the DORS/CLUC event on May 5-7 in Zagreb, Croatia, so if you’re interested in meeting there, let us know.

2 cents on 2% optimizations

Friday, April 9th, 2010

Is it ever worth the effort to spend time on optimizations that result in just a 1-2% performance improvement? It sort of itches to answer “just focus on features and quit being that obsessed with performance”, right?

At my previous day job, which was in video games, I once went through an optimization death-march (or maybe death-October, to be more precise). The sole goal was to get the 3D renderer to run at 30 fps minimum. The problem was that it went down to 20 fps, and sometimes even lower, at certain camera angles. To add some “icing” on the “cake”, when you enable V-sync to avoid tearing, falling to 19.9 fps means you’re actually doing 15, because the monitor runs at 60 Hz, and if you haven’t managed to show a new frame within 3/60-ths of a second (20 fps) since the last one, you’re waiting until the next V-sync, which happens at 4/60-ths of a second (15 fps). Now, the human eye runs at 24 fps (basically), so 30 fps is perfectly smooth, 20 fps is slightly uncomfortable but generally OK, but 15 fps is quite laggy.
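The V-sync arithmetic above can be sketched in a few lines of Python. This is only an illustration, not code from the renderer in question: the function name `vsync_fps` is made up, and the model assumes that every frame simply waits for the next refresh after it finishes rendering.

```python
import math

def vsync_fps(render_fps, refresh_hz=60):
    """Effective frame rate when each finished frame waits for the next refresh.

    Assumption: a frame that takes longer than N refresh intervals is shown
    at interval N + 1, so the rate is always refresh_hz divided by a whole number.
    """
    # Whole number of refresh intervals each frame ends up occupying.
    intervals = math.ceil(refresh_hz / render_fps)
    return refresh_hz / intervals

for fps in (30, 20, 19.9):
    print(fps, "->", vsync_fps(fps))
```

Under this model, rendering at exactly 20 fps still shows 20 fps, but 19.9 fps misses the 3/60 deadline and drops to 15.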

Another problem was that there were no more major optimizations to pull out of the hat and save the day. (A battle-hardened graphics developer would silently insert minor, intentionally hidden “reserves” here and there over the course of the project, so that artists would do their best to hit the budget in that “reserved” version, then quickly use up all those reserves the week before shipping the gold master and boost the frame rate nicely. Well. If I’m ever back in video games, I’m definitely doing that.)

So I started with a certain especially bad camera angle that resulted in 19.1 fps or something like that. And kept trying all the minor changes I could come up with. Some of them weren’t even optimizations, actually, because once implemented, they’d hurt my precious-s-s fps.

Most of those optimizations were tiny. Changes that improved things by 0.1 fps, which is 0.5%, did get committed into trunk. Most of the changes were in the 0.1 to 0.5 fps range. I got a huge one once that made for a whopping 1.2 fps improvement. Huge. Once.

That was pretty exhausting. But a week or two later, we had 25+ fps min. That, in turn, was pretty satisfying. Also, that was a 30% improvement over 19 fps that initially seemed “impossible” to optimize.

Optimizations in general, including tiny 2% optimizations, pile up. And they pile up in a non-linear fashion. 30 different 2% optimizations result in a 1.81x improvement, not a 1.6x one. 10 different 5% ones result in 1.63x, not 1.5x. Of course, big optimizations pile up even better. But you rarely get many of those if you write your code more or less properly.
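A quick way to check those numbers is the sketch below. It assumes each optimization multiplies throughput by (1 + gain) independently of the others; the helper names `compounded` and `linear` are made up for the illustration.

```python
def compounded(gain, count):
    """Total speedup when each of `count` optimizations multiplies speed by (1 + gain)."""
    return (1.0 + gain) ** count

def linear(gain, count):
    """Naive estimate that just adds the individual gains up."""
    return 1.0 + gain * count

for gain, count in ((0.02, 30), (0.05, 10)):
    print(f"{count} x {gain:.0%}: "
          f"compounded {compounded(gain, count):.2f}x, "
          f"linear {linear(gain, count):.2f}x")
```

The gap between the two estimates keeps growing as more small gains are stacked.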

So do we hunt down every single 2% optimization opportunity in Sphinx? No, we definitely don’t. A 20x difference in code that gets executed once on startup and eats 0.001 sec anyway? Could not care less. A new feature that introduces a 1% general indexing slowdown that is very complicated (if at all possible) to eliminate? Introduce this delay (with a heavy sigh); it’s an extra 36 seconds per hour, after all. But we don’t blindly dismiss these tiny things either, and when it takes reasonable effort to write slightly more efficient code, we go for it. Because that piles up.

Not that I wouldn’t do a death-march on Sphinx given the chance. But I’d rather start from an analogue of “25 fps” next time.

Facets, multi-queries, and searching 3x faster

Monday, April 5th, 2010

A number of Sphinx features are frequently overlooked, and multi-queries are one of them. I’ve heard a myth that Sphinx does not have faceted searching. In fact, it does. Moreover, the multi-query mechanism is more powerful and flexible than that: it lets you do more than just facets. But the documentation indeed never mentions the word “facets”, and voila, a myth is born. Well, time to debunk it!

So what are those multi-queries, and how can they make your searches 3x faster?