As you might know, the Sphinx team focuses not only on full-text search improvements, like the blended characters support we introduced in 2.0.1-beta, but also on general performance. One of the performance questions we run into frequently is, “How do you measure a single query’s speed, especially in a scalable, distributed environment?”
One application of distributed indexes in Sphinx is parallelizing queries across many CPU cores, even when running on a single server. There’s a well-known trick for that: add an agent line (or three) pointing to the very same master searchd instance. The only problem with that approach is that every query entails a bunch of one-off TCP connections, extra forks, and other redundant internal work. That’s okay when you’re serving a few heavy queries, but when you’re serving many quick ones, that overhead alone might burn over 50% of your CPU in system time.
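For reference, here is a minimal sketch of what that trick tends to look like in sphinx.conf. The index names (dist1, chunk1 through chunk4) and the port are assumptions made for illustration, and the chunk indexes are assumed to be defined elsewhere in the same file.

```
# Hypothetical distributed index: one chunk is searched locally,
# the other three go through agent lines pointing back at the
# very same searchd instance on this machine.
index dist1
{
    type  = distributed
    local = chunk1
    agent = localhost:9312:chunk2
    agent = localhost:9312:chunk3
    agent = localhost:9312:chunk4
}
```

Each agent line here means a separate network connection back to the same daemon on every single query, which is where the extra system time comes from.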
Now that’s a problem, but starting with 1.10-beta, there is a solution: the dist_threads directive. So if you’re still doing that agent=localhost trick, and suffering from TCP stack pressure and/or seeing way too much system time in top(1) or vmstat(8), do read on, because this applies to you. (As an aside, if you’re still on anything pre-2.0.1, you should seriously consider upgrading, too.)
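As a rough sketch of the direction this takes, the same chunks can be listed as local indexes, with dist_threads set in the searchd section. The index names and the thread count of 4 are assumptions for illustration, not recommendations; pick values that match your own chunk count and CPU cores.

```
# Hypothetical setup: all chunks are local, no agent lines needed.
index dist1
{
    type  = distributed
    local = chunk1
    local = chunk2
    local = chunk3
    local = chunk4
}

searchd
{
    # Max local worker threads used for one parallelizable request.
    # The default is 0, which disables in-request parallelism.
    dist_threads = 4
}
```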