Jan 15, 2013. RealTime Index Improvements in 2.1.1

We are proud to announce that the Sphinx 2.1.x tree has been finalized. Sphinx 2.1.1-beta will be released in the very near future! We were so excited about the work we’ve accomplished that we couldn’t wait to start sharing some of the new features that will be included in the 2.1 series.

We will start off this series of posts with something that got a lot of love in the 2.1 series: RealTime (RT) indexes.

Here is a brief summary of the RT-related updates. First, after many improvements and optimizations, RT indexing has fully caught up with all of Sphinx’s on-disk indexing features: SPZ (sentence, paragraph, and zone) indexing and infix (substring) indexing are now supported. Second, RT indexing now comes with double buffering, meaning that updates should no longer stall much. Third, we addressed the fragmentation and degradation issues and added OPTIMIZE INDEX. Fourth, we added a small helper that lets you “reset” an index: the TRUNCATE RTINDEX statement.

RT vs text structure

It is now possible to use the SENTENCE, ZONE and PARAGRAPH operators in full-text searches over an RT index. See the extended query syntax documentation for details on these operators. In short, you enable the analyzers that detect sentence boundaries (index_sp=1), paragraph boundaries (index_sp=1 and html_strip=1), or document zone boundaries (html_strip=1 and index_zones=h*,th,title). Sphinx then stores the boundary locations in the index, and you can limit your searches to particular sentences, paragraphs, zones, or continuous zone spans, respectively. This has worked nicely for disk indexes since 2.0, and in 2.1 we extended this indexing mode to RT indexes as well.
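To see why stored boundary locations are enough, here is a toy Python sketch of the check behind the SENTENCE operator: given the token positions of two keywords and the positions where sentences end, it tells whether the keywords ever share a sentence. The list-based layout and the function names (sentence_of, same_sentence) are hypothetical simplifications for illustration, not Sphinx's actual index format or matching code.

```python
from bisect import bisect_left


def sentence_of(pos, sentence_ends):
    """Index of the sentence containing token position `pos`.

    `sentence_ends` is a sorted list holding the last token position of each
    sentence (a hypothetical stand-in for the stored boundary markers).
    """
    return bisect_left(sentence_ends, pos)


def same_sentence(positions_a, positions_b, sentence_ends):
    """True if some occurrence of A and some occurrence of B share a sentence."""
    sentences_a = {sentence_of(p, sentence_ends) for p in positions_a}
    return any(sentence_of(p, sentence_ends) in sentences_a for p in positions_b)


# "Sphinx is fast. It indexes text. Search is instant."
# tokens: 0=sphinx 1=is 2=fast | 3=it 4=indexes 5=text | 6=search 7=is 8=instant
ends = [2, 5, 8]
print(same_sentence([0], [2], ends))  # sphinx SENTENCE fast -> True
print(same_sentence([0], [5], ends))  # sphinx SENTENCE text -> False
```

PARAGRAPH would work the same way in this model, just with paragraph boundaries in place of sentence ones.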

RT vs substrings

In the past, infix (substring) searches were not possible with RT indexes, but that has now come to an end! You can now use them in RT indexes just as well, provided you use dict = keywords:

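index myrtindex {
    type = rt
    # enable infix (substring) indexing for keywords of 3+ characters
    min_infix_len = 3
    # in RT indexes, infixes require the keywords dictionary
    dict = keywords
    ...
}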

Unlike with disk indexes, we intentionally limit substring search support in RT indexes to dict=keywords. Even though it’s technically possible to implement pre-indexing of all possible keyword substrings with dict=crc in RT, the impact on the index size is way too big. That pre-indexing is less of an issue with disk indexes, where the majority of the data can stay on disk and only the (bloated) dictionary needs to be loaded into RAM. But with RT, the entire current working set stays in RAM. So implementing that code path (i.e., substrings + dict=crc) would mean either requiring 3-10x more RAM or causing 3-10x worse RT index fragmentation, and that wasn’t a price we were willing to pay.
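A quick back-of-the-envelope sketch of why storing every substring is so costly: the Python snippet below lists the distinct substrings of at least min_infix_len characters that a single keyword would add to the dictionary. It is a deliberately simplified model (the infixes helper is hypothetical, and it ignores any maximum infix length and any sharing across keywords), not a description of how Sphinx builds its dictionaries.

```python
def infixes(word, min_len):
    """Every distinct substring of `word` that is at least `min_len` characters long."""
    n = len(word)
    return {word[i:j] for i in range(n) for j in range(i + min_len, n + 1)}


for w in ("cat", "search", "substring"):
    print(w, len(infixes(w, 3)))
# cat 1
# search 10
# substring 28
```

So under this model even a 9-letter keyword turns into 28 dictionary entries, which gives a feel for where the multi-fold size growth comes from.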

Also, using dict=keywords has another nice side effect. With that dictionary type we can support not only substrings, but wildcards like [subs?tr*ngs] just as well. With dict=crc, that would have been impossible. And yes, wildcards do now work in RT too.
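Here is a minimal Python sketch of why the keywords dictionary makes wildcards possible: since the actual keywords are stored, a pattern can simply be matched against them to find the keywords to search for. It assumes only the `*` (any run of characters) and `?` (exactly one character) wildcards; the helper names are hypothetical, and CRC32 merely stands in for whatever hash a dict=crc index uses.

```python
import re
import zlib


def wildcard_to_regex(pattern):
    """Translate '*' (any run of characters) and '?' (exactly one character)."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def expand(pattern, keywords):
    """Expand a wildcard against a dictionary that stores the keywords themselves."""
    rx = wildcard_to_regex(pattern)
    return sorted(k for k in keywords if rx.fullmatch(k))


keywords = ["substance", "substrings", "subtract", "strings"]
print(expand("sub?tr*ngs", keywords))  # ['substrings']
print(expand("sub*", keywords))        # ['substance', 'substrings', 'subtract']

# A CRC-style dictionary keeps only hashes like these; there is no text left
# to match a pattern against, so the wildcard cannot be expanded.
print(sorted(zlib.crc32(k.encode()) for k in keywords))
```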

RT vs INSERT stalls

A problem that some of you may have encountered in the past is that, when a lot of INSERTs were made, the data occasionally needed to be dumped to disk. While searches still worked fine during the dump, the index was locked for subsequent INSERT operations. And disks are slow! So with a big enough rt_mem_limit, insertions could occasionally stall for a significant time (seconds, or in extreme cases even minutes).

To solve this, we introduced a so-called double buffer that makes insertions (and, actually, other changes too) much more seamless. We now reserve a small, hard-coded fraction of the rt_mem_limit memory for a “last line of defence” buffer. While data is being dumped to disk, incoming INSERTs are now accepted and processed in that buffer instead of being stalled until the disk writes complete. Of course, if the update pressure peaks so high that this second buffer overflows during the few seconds it takes to write the data to disk, subsequent INSERTs will still stall and pile up. However, under normal load INSERTs should now never stall for long.

Oh, and this is an internal change, so no configuration changes are needed. Just upgrade and you’re all set. Long story short, if your RT INSERTs were occasionally taking a lot of time, just upgrade to 2.1, and chances are that all of your INSERTs, rather than just most of them, will be super quick.
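The double buffer idea can be illustrated with a toy, single-threaded Python model. This is a sketch under stated assumptions, not Sphinx's implementation: sizes are counted in documents rather than bytes, `limit` stands in for rt_mem_limit, and `reserve_fraction` is a hypothetical value, since the real fraction is hard-coded and not given here. A stalled INSERT is modeled by `insert` returning False, whereas the real server would simply make it wait.

```python
class DoubleBufferedRam:
    """Toy model of an RT RAM chunk plus a small reserve ("double") buffer."""

    def __init__(self, limit, reserve_fraction=0.1):
        # Hypothetical split: a small reserve carved out of the total limit.
        self.reserve_size = max(1, int(limit * reserve_fraction))
        self.main_size = limit - self.reserve_size
        self.main = []      # documents accumulating in RAM
        self.reserve = []   # documents accepted while a dump is in progress
        self.dumping = []   # documents currently being written to disk
        self.disk = []      # documents already saved to disk

    def insert(self, doc):
        """Return True if the INSERT is accepted now, False if it would stall."""
        if self.dumping:
            # A dump is running: use the reserve instead of blocking the caller.
            if len(self.reserve) < self.reserve_size:
                self.reserve.append(doc)
                return True
            return False  # reserve overflowed: this INSERT has to wait
        self.main.append(doc)
        if len(self.main) >= self.main_size:
            # Main buffer is full: hand it over to the (slow) disk writer.
            self.dumping, self.main = self.main, []
        return True

    def finish_dump(self):
        """Simulate the disk write completing."""
        self.disk.extend(self.dumping)
        self.dumping = []
        # Whatever arrived during the dump becomes the new RAM contents.
        self.main, self.reserve = self.reserve, []


rt = DoubleBufferedRam(limit=10, reserve_fraction=0.2)
results = [rt.insert(i) for i in range(11)]
print(results.count(True), results.count(False))  # 10 1
rt.finish_dump()
print(len(rt.disk), len(rt.main))  # 8 2
```

With a 10-document limit and a 2-document reserve, the first 8 inserts fill the main buffer and trigger a dump, the next 2 land in the reserve, and only the 11th would stall until the dump completes.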

RT vs fragmentation

Here is another challenge that many of you likely faced in previous versions of Sphinx: when high volumes of updates and deletes were performed on an RT index with many chunks, internal data structures (kill-lists, index chunks, etc.) became highly fragmented. This fragmentation slowed down searches and made the index files grow. We did a lot of work on optimizing the internal processes, and you should see a great improvement if you make a lot of changes (INSERT, UPDATE, REPLACE, DELETE, etc.) to your RT index.

Disk chunk optimization is now implemented via the following SphinxQL command:

OPTIMIZE INDEX rtindex

This command merges all disk chunks. That reduces fragmentation and also physically purges data that is no longer needed. Optimization happens in a background thread, and the index remains searchable at all times.
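A rough idea of what merging chunks with kill-lists involves, as a toy Python model: each chunk's kill-list hides documents in all older chunks (deleted documents, or the old copy of a replaced one), and merging drops those copies for good. The (docs, kill_list) layout and merge_chunks are hypothetical simplifications, not Sphinx's actual chunk format or merge code.

```python
def merge_chunks(chunks):
    """Merge disk chunks into one, purging documents hidden by kill-lists.

    `chunks` is ordered oldest to newest; each item is (docs, kill_list),
    where `docs` maps doc_id -> document and `kill_list` holds doc_ids that
    this chunk hides in all *older* chunks.
    """
    merged = {}
    hidden = set()
    for docs, kill_list in reversed(chunks):  # walk from newest to oldest
        for doc_id, doc in docs.items():
            # Keep only the newest visible copy of each document.
            if doc_id not in hidden and doc_id not in merged:
                merged[doc_id] = doc
        hidden |= kill_list  # this chunk's kill-list applies to older chunks
    return dict(sorted(merged.items()))


chunks = [
    ({1: "old v1", 2: "two", 3: "three"}, set()),  # oldest chunk
    ({1: "new v1", 4: "four"}, {1, 3}),            # REPLACE of 1, DELETE of 3
]
print(merge_chunks(chunks))  # {1: 'new v1', 2: 'two', 4: 'four'}
```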

As with indexing, RT optimization can be disk I/O intensive, so you might want to limit its I/O. There are two new options for that in the searchd section of the configuration:

  • rt_merge_iops – limits the number of I/O operations per second performed by the RT optimization thread. The default is 0 (unlimited). Limiting the IOPS is especially useful with plain old HDDs, as they can only handle 70-100 IOPS per disk.
  • rt_merge_maxiosize – limits the maximum size of a single I/O operation performed by the RT optimization thread. The default is currently 256K, and accepted values are in the 0 to 2M range. Decreasing this option might be useful with SSDs, because they perform better with smaller block sizes.

Here’s an example that limits the RT optimization thread to at most 40 I/O operations per second, each at most 1 MB in size:

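searchd {
    ...
    # at most 40 I/O operations per second for the optimization thread
    rt_merge_iops = 40
    # each I/O operation at most 1 MB (1048576 bytes)
    rt_merge_maxiosize = 1048576
    ...
}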
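The two limits combine into a simple upper bound on merge bandwidth: operations per second times the maximum operation size. The Python helper below is hypothetical (not part of Sphinx) and only does that arithmetic; it assumes the 2M ceiling means 2 × 1024 × 1024 bytes and deliberately rejects 0 for the size, since this sketch does not model how Sphinx treats that value.

```python
MIB = 1024 * 1024
MAX_IO_SIZE = 2 * MIB  # upper end of the accepted rt_merge_maxiosize range


def merge_io_budget(rt_merge_iops, rt_merge_maxiosize=256 * 1024):
    """Upper bound on optimization (merge) I/O bandwidth in bytes per second.

    Returns None when rt_merge_iops is 0, i.e. unlimited.
    """
    if not 0 < rt_merge_maxiosize <= MAX_IO_SIZE:
        raise ValueError("this sketch expects 0 < rt_merge_maxiosize <= 2M")
    if rt_merge_iops == 0:
        return None
    return rt_merge_iops * rt_merge_maxiosize


budget = merge_io_budget(40, 1048576)
print(budget // MIB, "MB/s at most")  # 40 MB/s at most

# Lower bound on the time needed to move, say, 2 GB of merge reads plus writes:
print(2 * 1024 * MIB / budget, "seconds")  # 51.2 seconds
```

With the default 256K operation size, the same 40 IOPS would cap the merge at about 10 MB/s.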

New RT tool, TRUNCATE statement

In 2.0.2 we introduced the ATTACH command, which lets you convert an on-disk index to an RT index. Now we are introducing a new command that clears out an RT index:

TRUNCATE RTINDEX rtindex

This command disposes of the in-memory data, unlinks all the disk files, and clears out the associated binary logs. Please note that this command only purges the data; it does not change the index schema! So if you want to modify the index structure (i.e., columns and fields), for now you still need to stop the searchd process and delete the index files.

TRUNCATE does not do any heavyweight processing; it mostly just unlinks the files and disposes of the allocated memory, so it’s pretty quick (think ‘fractions of a second’ quick). So, for instance, if you want to rebuild an entire RT index from a data source really quickly, you can now build a disk index, reset the existing RT index, and then convert the newly built disk index to RT again:

TRUNCATE RTINDEX myrtindex
ATTACH INDEX ondisk TO RTINDEX myrtindex

Of course, rebuilding the “ondisk” index might take a while, but the flip itself will be instant, as both TRUNCATE and ATTACH are really, really lightweight operations.

Conclusion

As you can see, we’ve improved RT indexes a lot. They’re now as functionally complete as disk indexes, and a number of performance issues have been fixed. However, we’ve been working on much more than that, and the RT changes are only the tip of the iceberg :) So stay tuned for more updates on what to expect from the forthcoming Sphinx 2.1… or grab the trunk already and see it in action for yourself!



7 Responses to “RealTime Index Improvements in 2.1.1”

  1. Nils says:

    Awesome, exactly what I need.

  2. Michail says:

    Great job!
    And when will it be possible to access RT indexes over the binary protocol?

  3. adrian says:

    @Michail: You can search RT indexes using the binary protocol in the same way you do for on-disk ones.

  4. Diego says:

    Amazing, and so needed

  5. Denis says:

    Hi!

    Is the double buffer in sphinx-2.1.0-rf9f8b5f14900?

    Kind regards,
    Denis

  6. shodan says:

    Denis, most likely there isn’t. Judging by the revision number, that was some temporary branch, now integrated into 2.1 branch. And the double buffer (among a number of other nice things) was added directly to the 2.1 branch.

  7. Denis says:

    Thank you, shodan!
