Nov 11, 2011. Sphinx memory consumption

Let’s step back from performance and highlight how Sphinx uses memory.

There are two ways to store data in Sphinx: in an on-disk index or in a real-time (RT) index.

Let’s start with the good old on-disk indexes. Basically, a Sphinx on-disk index is a set of files generated by indexer during the indexing process. All of them, except .spd and .spp, are kept in memory for performance reasons. Here’s an example:

index lj1m
{
        type       = plain
        source     = src_lj1m
        path       = lj1m
        ...
}

Note that type = plain is optional and may be omitted. For this index, ls -lah shows the following:

-rw-r--r-- 1 vlad vlad  12M 2010-12-22 09:01 lj1m.spa
-rw-r--r-- 1 vlad vlad 334M 2010-12-22 09:01 lj1m.spd
-rw-r--r-- 1 vlad vlad  438 2010-12-22 09:01 lj1m.sph
-rw-r--r-- 1 vlad vlad  13M 2010-12-22 09:01 lj1m.spi
-rw-r--r-- 1 vlad vlad    0 2010-12-22 09:01 lj1m.spk
-rw------- 1 vlad vlad    0 2011-10-27 16:57 lj1m.spl
-rw-r--r-- 1 vlad vlad    0 2010-12-22 09:01 lj1m.spm
-rw-r--r-- 1 vlad vlad 111M 2010-12-22 09:01 lj1m.spp
-rw-r--r-- 1 vlad vlad    1 2010-12-22 09:01 lj1m.sps

So the total index size is 469 megabytes. However, memory consumption for this index will be 469M minus the .spd and .spp sizes. With the rounded figures above, 469M-334M-111M=24M; adding up the files that actually stay in memory (12M for .spa plus 13M for .spi) gives 25M, the difference being rounding in the listing. So Sphinx will need roughly 25 megabytes of RAM to serve this index. Additionally, every query needs a bit of extra memory for buffers, the result set, etc., but that memory is typically comparatively small, so this method provides a pretty good general estimate.
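The estimate above can be sketched as a small script. This is only an illustration of the arithmetic, not a Sphinx tool: the function names are made up for this example, and the sizes are the rounded values from the listing above, so the result is approximate.

```python
import os

# Extensions that, per the text above, are NOT kept in memory.
ON_DISK_ONLY = {".spd", ".spp"}


def estimate_resident_bytes(file_sizes):
    """Sum the sizes of all index files except those left on disk.

    file_sizes maps a file name (e.g. "lj1m.spa") to its size in bytes.
    """
    return sum(
        size
        for name, size in file_sizes.items()
        if os.path.splitext(name)[1] not in ON_DISK_ONLY
    )


def index_file_sizes(directory, index_name):
    """Collect the sizes of files named <index_name>.<ext> in a directory."""
    sizes = {}
    for entry in os.listdir(directory):
        stem, ext = os.path.splitext(entry)
        if stem == index_name and ext:
            sizes[entry] = os.path.getsize(os.path.join(directory, entry))
    return sizes


MB = 1024 * 1024
# Rounded sizes taken from the ls output above (illustrative input only).
lj1m = {
    "lj1m.spa": 12 * MB, "lj1m.spd": 334 * MB, "lj1m.sph": 438,
    "lj1m.spi": 13 * MB, "lj1m.spk": 0, "lj1m.spl": 0,
    "lj1m.spm": 0, "lj1m.spp": 111 * MB, "lj1m.sps": 1,
}
print(round(estimate_resident_bytes(lj1m) / MB, 1))  # prints 25.0
```

To run it against real files, pass the result of index_file_sizes(directory, "lj1m") to estimate_resident_bytes instead of the hard-coded sizes; working in bytes avoids the rounding seen in the ls -lah output.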

RT indexes are slightly different. They allocate memory up to rt_mem_limit as you insert new documents. Once rt_mem_limit is exhausted, Sphinx flushes all the data from the RAM chunk to disk (creating a new on-disk index, that is, a disk chunk), then resets the RAM chunk and continues with an empty one. So for an RT index you can estimate memory consumption by adding up the sizes of all on-disk chunks (minus the .spd and .spp sizes, as noted above) plus the maximum RAM chunk size (rt_mem_limit).
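As a rough sketch of that RT estimate: the chunk sizes and the 256M rt_mem_limit below are hypothetical numbers chosen for illustration, and the function name is not part of Sphinx.

```python
def estimate_rt_index_bytes(disk_chunks, rt_mem_limit):
    """Upper-bound estimate: the in-memory part of every disk chunk
    plus one completely full RAM chunk.

    disk_chunks is a list of dicts mapping a file extension to its
    size in bytes, one dict per on-disk chunk.
    """
    on_disk_only = {".spd", ".spp"}  # not kept in memory, per the text
    resident = sum(
        size
        for chunk in disk_chunks
        for ext, size in chunk.items()
        if ext not in on_disk_only
    )
    return resident + rt_mem_limit


MB = 1024 * 1024
# Two hypothetical disk chunks; all sizes are made up for the example.
chunks = [
    {".spa": 12 * MB, ".spd": 334 * MB, ".spi": 13 * MB, ".spp": 111 * MB},
    {".spa": 6 * MB, ".spd": 150 * MB, ".spi": 7 * MB, ".spp": 50 * MB},
]
print(estimate_rt_index_bytes(chunks, 256 * MB) // MB)  # prints 294
```

The result is a ceiling rather than a current reading, since the RAM chunk is usually only partly full.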

Feb 2013 update: all of the above still holds true and remains useful for estimating maximum RAM use, but there is now an easier way to check current RAM use: the SHOW INDEX myindex STATUS statement. Among other things, it returns a ram_bytes counter.

The last thing to mention is the memory required by indexer to build the actual index. There is only one option that controls memory usage there, mem_limit. We generally recommend setting it anywhere between 128M and 1024M (depending on the size of your document collection). Up to 2047M is supported, but interestingly enough, going over the 1G limit barely improves indexing performance.

3 Responses to “Sphinx memory consumption”

  1. Using the ondisk_dict = 1 option, of course, can reduce memory usage for on-disk indexes a bit as well. That’d mean omitting the .spi file, if I recall correctly.

  2. Miha Svalina says:

    Article says:
    ‘Where total index size is 469 Megabytes. Memory consumption for this index will be 469M minus .spd and .spp size. 496M-334M-111M=51M. Sphinx will need a 51 megabyte of memory to perform search against this index.’

    If you sum up all data sizes you get 470M (12M+334M+13M+111M = 470M well perhaps we should calculate in bytes and then /(1024*1024) to be more accurate).

    I think there is a mistake, 469M != 496M. So 469M-334M-111M = 24M.

  3. shodan says:

    Miha, you are right, that was a typo, thanks. The method itself is still correct, though. Now as of 2.1 there is also an easier way, SHOW INDEX STATUS.
