
// full-text diary

Sphinx at OSCON Portland July 19-23


posted on July 16th, 2010 by rich in Conferences, General

Just a quick update: we will be at OSCON in Portland, Oregon, from the 19th to the 23rd of July. Peter Zaitsev of Percona and the MySQL Performance Blog will be moderating a Birds of a Feather session on Sphinx in 2010, slated for Thursday the 22nd of July at 7 PM.

If you are in the area and are up for a casual meeting to discuss your use of Sphinx, or would like to learn more about it, please let us know by contacting Rich Kelm, who will be representing us at OSCON.

SphinxQL now lets you do everything querying-related that SphinxAPI did, in a simpler, faster, and more convenient way. For most of the features, the mapping of API calls to SphinxQL syntax is straightforward (either via SQL syntax or using our OPTION clause). However, a few things, namely time segments, geosearches, overrides, index and weight fields, etc., might now be less obvious. So let’s discuss them.
Read the rest of this entry »

C++ compiler shootout


posted on May 23rd, 2010 by shodan in General

I’ve been curious for some time just how differently various C++ compilers might perform on a real-world code base, how much improvement over time one can expect, etc. Today, I finally did a benchmark on that.

The compilers used were GNU gcc 3.4.6 (pretty much the oldest you can expect these days, but still found in the wild, e.g. on the CentOS 4.7 box I used for benchmarks); GNU gcc 4.5.0 (bleeding edge, built from source); and Intel icc 11.1. Hardware was 2x dual-core Xeon 3.6 GHz, making a total of 4 cores.
Read the rest of this entry »

As planned, I’m currently in Zagreb, Croatia, and I gave a talk on Sphinx at #dorscluc earlier today. Grab the slides at this “Meet the Sphinx” link if you want to find out what was in those secret bullets at the bottom that the presentation machine chopped off :)

I will also be doing a workshop tomorrow, and another workshop is scheduled in Moscow at Devconf on May 18th. It will be in Russian. You can check out the workshop plan and register on the Devconf website. I will be covering all the tasks that people, in my experience, typically face, so it should be a perfect match (aka a crash course) for people new to Sphinx.

SphinxAPI vs SphinxQL benchmark


posted on April 25th, 2010 by shodan in General

In scripting languages such as PHP, SphinxQL should generally be faster than SphinxAPI simply because it uses a compiled MySQL client implementation instead of an interpreted SphinxAPI client. The question is, how much faster exactly? So I’ve just benchmarked that.

Read the rest of this entry »

Sphinx slides from MySQL UC and RIT++ 2010


posted on April 17th, 2010 by shodan in General

MySQL UC 2010 in Santa Clara ended yesterday. It was a busy but interesting time for us, and I’d like to thank everyone who made it so by attending our BOF or my talk, or by finding the time to meet us. For those who weren’t able to attend, I’ve just uploaded slides from my “Sphinx: full-text search in 2010” talk to the Presentations section.

In the meantime, Maciej Dobrzanski from Percona delivered his “Improving MySQL-based applications performance with Sphinx” talk at the RIT++ 2010 conference in Moscow, so thanks go out to him as well.

I will also be speaking at the DORS/CLUC event on May 5-7 in Zagreb, Croatia, so if you’re interested in meeting there, let us know.

2 cents on 2% optimizations


posted on April 9th, 2010 by shodan in General

Is it ever worth the effort to spend time on optimizations that result in just a 1-2% performance improvement? It’s tempting to answer “just focus on features and quit being so obsessed with performance”, right?

At my previous day job, which was in video games, I once went through an optimization death-march (or maybe death-October, to be more precise). The sole goal was to get the 3D renderer to run at 30 fps minimum. The problem was that it went down to 20 fps and sometimes even worse at certain camera angles. To add some “icing” on the “cake”, when you enable V-sync to avoid tearing, falling to 19.9 fps means you’re actually doing 15, because the monitor runs at 60 Hz, and if you haven’t managed to show a new frame within 3/60-ths of a second (20 fps) since the last one, you’re waiting until the next V-sync, which happens at 4/60-ths of a second (15 fps). Now, the human eye runs at 24 fps (basically), so 30 fps is perfectly smooth, 20 fps is slightly uncomfortable but generally OK, but 15 fps is quite laggy.

Another problem was that there were no more major optimizations to pull out of the hat and save the day. (A battle-hardened graphics developer would silently insert minor, intentional, hidden... “reserves” here and there over the course of the project, so that artists would do their best to hit the budget in that “reserved” version, then quickly use all those reserves the week before shipping the gold master and boost the frame rate nicely. Well. If I’m ever back in video games, I’m definitely doing that.)
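
The V-sync arithmetic above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions (a fixed 60 Hz refresh rate, a constant render time per frame, and a frame that misses a refresh simply waiting for the next one); the function name `vsync_fps` is made up for this example and has nothing to do with the actual renderer.

```python
import math

def vsync_fps(raw_fps, refresh_hz=60):
    """Effective frame rate when each frame has to wait for the next V-sync.

    Assumes a constant render time of 1/raw_fps seconds per frame.
    """
    # A frame occupies a whole number of refresh intervals, rounded up.
    intervals = math.ceil(refresh_hz / raw_fps)
    return refresh_hz / intervals

for raw in (30.0, 20.0, 19.9):
    print(raw, "fps raw ->", vsync_fps(raw), "fps shown")
```

Under these assumptions, a renderer that manages exactly 20 fps is shown at 20 fps, but one that manages 19.9 fps needs a fourth refresh interval per frame and drops to 15 fps.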

So I started with a certain especially bad camera angle that resulted in 19.1 fps or something like that. And kept trying all the minor changes I could come up with. Some of them weren’t even optimizations, actually, because once implemented, they’d hurt my precious-s-s fps.

Most of those optimizations were tiny. Changes that improved things by 0.1 fps, which is 0.5%, did get committed into trunk. Most of the changes were in 0.1 to 0.5 fps range. I got a huge one once that made a whopping 1.2 fps of an improvement. Huge. Once.

That was pretty exhausting. But a week or two later, we had 25+ fps min. That, in turn, was pretty satisfying. Also, that was a 30% improvement over 19 fps that initially seemed “impossible” to optimize.

Optimizations in general, including tiny 2% optimizations, pile up. And they pile up in a non-linear fashion. 30 different 2% optimizations result in a 1.81x improvement, not a 1.6x one. 10 different 5% ones result in 1.63x, not 1.5x. Of course, big optimizations pile up even better. But you rarely get many of those if you write your code more or less properly.

So do we hunt down every single possible 2% optimization in Sphinx? No, we definitely don’t. A 20x difference in code that gets executed once on startup and eats 0.001 sec anyway? Could not care less. A new feature that introduces a 1% general indexing impact that is very complicated (if at all possible) to eliminate? Introduce this delay (with a heavy sigh); it’s an extra 30 seconds per hour, after all. But we don’t blindly dismiss these tiny things either, and when it takes reasonable effort to write slightly more efficient code, we go for it. Because that piles up.

Not that I’m not doing a death-march on Sphinx when I have a chance. But I’d rather start with an analogue of “25 fps” next time.
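
The compounding figures above are easy to check. Here is a minimal Python sketch, assuming every optimization multiplies throughput by the same factor and that the gains are independent of each other; the helper names `compound_speedup` and `naive_sum` are invented for this illustration.

```python
def compound_speedup(gain, count):
    """Total speedup when `count` optimizations each multiply speed by (1 + gain)."""
    return (1.0 + gain) ** count

def naive_sum(gain, count):
    """What you would expect if the gains merely added up."""
    return 1.0 + gain * count

for gain, count in ((0.02, 30), (0.05, 10)):
    print(
        f"{count} x {gain:.0%}: "
        f"compound {compound_speedup(gain, count):.2f}x, "
        f"additive {naive_sum(gain, count):.2f}x"
    )
```

The gap between the two numbers is the non-linear part: each new optimization is applied on top of everything gained so far.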

Facets, multi-queries, and searching 3x faster


posted on April 5th, 2010 by shodan in General

A number of Sphinx features are frequently overlooked, and multi-queries are one of them. I’ve heard a myth that Sphinx does not have faceted searching. In fact, it does. Moreover, the multi-query mechanism is more powerful and flexible than that: it lets you do more than just facets. But the documentation indeed never mentions the word “facets”, and voila, a myth is born. Well, time to debunk it!

So what are those multi-queries, and how can they make your searches 3x faster?
Read the rest of this entry »

Full-text search BOF session at MySQL Conference


posted on March 31st, 2010 by shodan in Conferences, General

Just wanted to post a quick update that our Birds-of-a-Feather session proposal for MySQL Conference got approved too, so come join us on Tuesday, April 13th, 7:00 PM PST if you are around the Santa Clara Convention Center. We’ll be discussing all things full-text search, including Sphinx, of course, but not really limited to it. Evening BOF sessions were free to attend in previous years, so consider it even if you aren’t attending the conference itself.

Now, if you are, I can’t help but plug a reminder about the “Sphinx: full-text search in 2010” talk the next evening, that is, Wednesday, April 14th, 5:15 PM PST. Note that the talk will be an overview of newly added features rather than a bird’s-eye view of the entire system. So if you haven’t used Sphinx before, I suggest fetching a copy and going through some tutorial before the talk. Even though we aren’t pushing our install as “the famous 5-minute” one, it still doesn’t take more than that. (So, too bad this punch line is taken.)

Sphinx vs MySQL expression benchmarks


posted on March 29th, 2010 by shodan in General

Curiously, even though we have supported arbitrary arithmetic expressions for a while, I’ve never actually benchmarked how our implementation compares to other ones, for instance, to MySQL. Time to fill that gap.

I used the current trunk version of Sphinx (namely r2265) and MySQL 5.0.37, both running on Windows. Test data was 1 million random rows generated with the following PHP script. Table type defaults to InnoDB.
Read the rest of this entry »