MySQL fulltext search (FTS) is old and well known. It has a simple setup and requires only small changes for querying. For many people it’s more than enough to provide fulltext search. So, here’s the question: why add a new variable – Sphinx – into the system when the database already has the basic functionality?
We hear from many MySQL FTS users. They come to us with variations on a common theme. This theme goes something like this:
“We were using MySQL FTS, our popularity grew, our data grew, and our database choked. Administrative overhead was outrageous, indexes were finicky and we needed a more out-of-process search experience where the indexes were not completely dependent on the database. We chose Sphinx because everyone knows its super fast.”
In this post we’ll be running through some of the more obvious reasons that MySQL FTS users have chosen Sphinx. We’ll also be providing some examples of actual users who have made the move. So, let’s get to it — there are three outstanding reasons to bring Sphinx into the mix.
The first reason: flexibility
Does MySQL FTS have enough features? The answer is simple: no! It provides only basic options, like stopwords or minimum word length. It also doesn’t feature operators like quorum or proximity and the ranking system is not customizable. You may be wondering, “why do I need these?” This answer is also simple: try to use your search.. If you think it’s working fine, think again. Ask your users. You may be surprised.
The ability to ‘tune’ search is a must for systems where search plays a vital role. It’s simple: if your on-line shop brings bad (irrelevant, slow) results in searching, people will leave.. It’s not safe to assume they will diligently navigate the maze you’ve set before them to find the product they want. Searching must be easy. People have been spoiled by Google, we’re not in the 90′s anymore. A bad search experience could even raise questions about your service in general.
The second reason: total cost of ownership
And, unfortunately, throwing-in hardware when things get slow is not usually the wisest solution. There is a point when throwing-in hardware stops making sense. MySQL is a database, it’s designed to store data, not to be efficient at searching text. A single Sphinx server can replace several MySQL FTS servers. Sphinx indexes not only consume less memory, but can also serve several times more queries per second.
The cost of implementation (regardless if we’re talking about DIY or hiring a consultant) is paid once, the cost of servers is paid every month. If outrageous bills are not enough of an argument (or if you have money to burn), the speed provided by Sphinx should be. Let’s admit it, nobody has patience these days.. Everyone wants instant load, instant results. Again, everyone is used to Google, no one wants to stare at that ‘loading gif’ animation for seconds … and seconds.
High bounce-rates can be worse than the bills you’re paying for all of your servers. Bouncing users ultimately means less revenue. Again, if we’re talking about an on-line shop, the user must find what she’s looking for as fast as possible – and for a simple reason – you want her to buy! If it takes too much time to find what she wants, she’s most likely not going to go find something else to buy. She’s not going to think “Well, I’m still here, I better buy something” (she’s not in a supermarket) she will simply grow impatient and close the browser page!
Three: we make it really easy!
The following has been taken straight from documentation.
Starting with version 0.9.9-rc2, Sphinx searchd daemon supports MySQL binary network protocol and can be accessed with regular MySQL API. For instance, ‘mysql’ CLI client program works well. Here’s an example of querying Sphinx using the MySQL client:
$ mysql -P 9306 Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 1 Server version: 2.0.8 (r3831) Type 'help;' or '\h' for help. Type '\c' to clear the buffer. mysql> SELECT * FROM test1 WHERE MATCH('test') -> ORDER BY group_id ASC OPTION ranker=bm25; +------+--------+----------+------------+ | id | weight | group_id | date_added | +------+--------+----------+------------+ | 4 | 1442 | 2 | 1231721236 | | 2 | 2421 | 123 | 1231721236 | | 1 | 2421 | 456 | 1231721236 | +------+--------+----------+------------+ 3 rows in set (0.00 sec)
Note that mysqld was not even running on the test machine. Everything was handled by searchd itself.
The new access method is supported in addition to native APIs which all still work perfectly well. In fact, both access methods can be used at the same time. Also, native API is still the default access method. MySQL protocol support needs to be additionally configured. This is a matter of 1-line config change, adding a new listener with mysql41 specified as a protocol:
listen = localhost:9306:mysql41
Just supporting the protocol and not the SQL syntax would be useless so Sphinx now also supports a subset of SQL that we dubbed SphinxQL. It supports the standard querying all the index types with SELECT, modifying RT indexes with INSERT, REPLACE, and DELETE, and much more.
SphinxQL is our SQL dialect that exposes all of the search daemon functionality using a standard SQL syntax with a few Sphinx-specific extensions. Everything available via the SphinxAPI is also available via SphinxQL but not vice versa; for instance, writes into RT indexes are only available via SphinxQL.
SphinxSE is MySQL storage engine which can be compiled into MySQL server 5.x using its pluggable architecture. It is not available for MySQL 4.x series. It also requires MySQL 5.0.22 or higher in 5.0.x series, or MySQL 5.1.12 or higher in 5.1.x series.
Despite the name, SphinxSE does not actually store any data itself. It is actually a built-in client which allows MySQL server to talk to
searchd, run search queries, and obtain search results. All indexing and searching happen outside MySQL.
Obvious SphinxSE applications include:
- easier porting of MySQL FTS applications to Sphinx;
- allowing Sphinx use with progamming languages for which native APIs are not available yet;
- optimizations when additional Sphinx result set processing on MySQL side is required (eg. JOINs with original document tables, additional MySQL-side filtering, etc).
Go take a look at our powered-by page: you’ll see many examples of happy Sphinx users who’ve come to us using MySQL FTS. Here, we’re just going to provide a few quick examples to show why the move to Sphinx has been made.
The Homeownershub.com team provides some interesting insights. When asked, “why Sphinx?”, they had this to say:
“The problem that Sphinx lets us solve is the efficiency of utilizing server resources which are always in short supply, no matter how much you got. MySQL is very heavy on resources and fulltext search specifically is a slow process compared to other SQL operations. Unfortunately, results of fulltext queries cannot be effectively cached either, lest they fill up the allocated memory limit and force the system to constantly update the cache, thus defeating the whole purpose of cache. Each of our pages requires about a dozen or so fulltext-type operations and then about three times as many of other, more efficient SQL queries that could be easily cached and stored for a long time and, most importantly, used again multiple times, making cache more efficient. In addition, we are leveraging MySQL’s speed by caching table indexes as much as possible but if we tried to cache fulltext ones, we’d run out of memory very fast.
So we delegate fulltext searches to Sphinx (and those searches are also much more configurable than SQL) and end up executing more tasks in parallel without having to start too many very heavy MySQL threads and open too many large temp tables. None of our fulltext SQL searches would fit into memory, no matter how small, so we’re getting some speed improvement by eliminating unnecessary disk operations, too!
Oh, did I mention that we also use Sphinx for actual search? LOL!”
As you can see, flexibility and efficient use of server resources stand at the top of their list. The Homeownershub.com team are not the only ones to have noticed Sphinx’s efficient use of server resources.. Let’s take a look at why the ePHOTOzine.com team chose Sphinx:
“With a growing community and database we quickly found that we out grew MySQL’s fulltext. MyISAM was not ideal due to the table locking and we required more flexible and speedier search results. After discussing and researching our options with our server administrators, we chose Sphinx and are still very impressed. The speed of the queries are unbelievable. We are quickly finding that Sphinx is an option we go to for more advanced queries for performance reasons.”
These kinds of examples are common among Sphinx + MySQL users. One last example (short and to-the-point) is provided by the Torrenthound.com team. They told us that they even “use Sphinx for default category listing as it is faster than MySQL’s order by clause without any query. Server load with 2.5M records is down about 90%.” We don’t want to throw a long list of quotes at you, but as we mentioned earlier, you can go take a look at a lot more of these examples on our powered-by page.
So, in short: there are many reasons to give Sphinx a try. It’s free (open source), it’s easy to integrate, it’s super powerful, flexible, and it won’t suck your server resources dry. Maybe you didn’t even know there was a better way.. Hopefully this post has shown you that the sky is the limit when it comes to leveraging the flexible, lightning-fast, search tools provided by Sphinx.
|« June 21, 2013. Faceted Search with Sphinx||July 2, 2013. Geo distances with Sphinx »|