RT Performance Degradation Cutoff

cbeans

Name: Chris
Posts: 13

2011-06-24 20:37:20


I've recently been running some performance comparisons between Sphinx and Lucene for
indexing collections of email inboxes. In the smaller tests, Sphinx won across the board
and was much faster at processing and indexing the input, but I did notice that its
generated indexes were about three times larger than Lucene's. Granted, Lucene has a lot
more support for taming, cropping, and sorting indexes down to just the information you
want, but when these indexes began to exceed my 40 GB of RAM, I started to (perhaps
predictably) notice some serious performance degradation. I've seen analysis elsewhere
showing cutoffs for RT performance against, say, plain indexes, and I was curious whether
it is primarily the rate of growth of the indexes that limits their performance. Has
anyone else experienced something similar, or have any advice on how to trim down the
indexes to squeeze out better performance?

Tomat

Name: Stas Klinov
Posts: 909

to: cbeans, 2011-06-26 17:42:52


> I've recently been running some performance comparisons between using Sphinx and Lucene
> indexing collections of email inboxes. In the smaller tests, Sphinx was
...
> the rate of growth of the indexes that limits their performance. Has anyone else
> experienced something similar, or have any advice for how to trim down the indexes to
> squeeze out better performance?

It's hard to say anything without actual numbers, the config, etc. You may not have set
rt_mem_limit, which is 32 MB by default, so pushing a lot of data into such an index
gives you many disk chunks, and your performance could degrade greatly.
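
To get a feel for why a small rt_mem_limit matters, here is a rough Python sketch. It
assumes (as a simplification, not something taken from the Sphinx docs) that each disk
chunk holds about rt_mem_limit worth of data. The 10 GB index size and the 1 GB limit
are made-up figures, and estimate_disk_chunks is a hypothetical helper, not part of
Sphinx.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def estimate_disk_chunks(index_bytes: int, rt_mem_limit_bytes: int) -> int:
    """Rough disk chunk count, assuming each chunk holds about rt_mem_limit of data."""
    if rt_mem_limit_bytes <= 0:
        raise ValueError("rt_mem_limit must be positive")
    # Whatever does not fill a whole chunk is assumed to still sit in the RAM chunk.
    return index_bytes // rt_mem_limit_bytes


# Hypothetical 10 GB index, with the 32 MB default and an assumed 1 GB limit.
for limit in (32 * MB, 1 * GB):
    chunks = estimate_disk_chunks(10 * GB, limit)
    print(f"rt_mem_limit={limit // MB} MB -> about {chunks} disk chunks")
```

Real on-disk chunk sizes will differ from the in-memory size, so treat the result only
as an order-of-magnitude guide.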

cbeans

Name: Chris
Posts: 13

to: Tomat, 2011-06-27 18:41:13


> Its hard to say something without actual numbers, config etc as you could not set
> rt_mem_limit and it by default is 32 mb so pushing a lot of data to such index gives you
> many disk chunks and you performance could degrade greatly.
>


As it stands, I've got the indexes broken into 64 partitions, corresponding to the way
the mailboxes are split across separate MySQL databases. rt_mem_limit is set to 2 GB, so
we shouldn't have too many disk chunks. Additionally, how does the size of the indexes
when dumped to disk compare with their size while the daemon is using them? Will this
70 GB get reduced at all?

Tomat

Name: Stas Klinov
Posts: 909

to: cbeans, 2011-06-28 07:05:34


> As it stands, I've got the indexes broken into 64 partitions, corresponding to the
> partitions of the Mailboxes as they are broken up into separate MySQL databases.
> rt_mem_limit is set to 2gb, so we shouldn't have all too many disk chunks. Additionally,
> how does the size of the indexes when dumped out compare against their size when the
> daemon is using them? Will this 70gb get reduced at all?

Could you provide the disk chunk count for your indexes?

cbeans

Name: Chris
Posts: 13

to: Tomat, 2011-06-28 18:52:22


> Could you provide disk chunks count for your indexes?
>

I can indeed, but first I want to make sure that I'm relaying the information accurately.
Is it the case that each index gets an associated .ram file to use as the dump whenever
searchd halts, and that only once this .ram file is about to exceed the chunk size are
further chunks allocated, each as a set of files with a trailing .i.sp* for increasing i?
If so, then we have the 64 .ram files after the dump, 12 sets of .0.sp* files, 4 sets of
.1.sp* files, and 4 sets of .2.sp* files, which means we've got 84 chunks in total.

Tomat

Name: Stas Klinov
Posts: 909

to: cbeans, 2011-06-28 20:57:14


> dump whenever searchd halts, and that only once this .ram file is going to exceed the
> chunk size are further chunks allocated with a set of files with a trailing .i.sp* for
> increasing i's? If I understand things correctly, then we have the 64 ram files after the
> dump, 12 sets of .0.sp* files, 4 sets of .1.sp* files, and 4 sets of .2.sp* files. If I
> understand things correctly, that means we've got 84 chunks.

Yes, as explained here
http://sphinxsearch.com/docs/current.html#rt-internals

An RT index dumps its RAM chunk to disk as a plain index (a disk chunk) whenever the RAM
chunk outgrows rt_mem_limit.
So you have 4 RT indexes with 4 to 12 disk chunks per index.

The daemon searches the disk chunks and then the RAM chunk of each index.
That is why it could be slow if you issue a query against all indexes: in that case the
daemon has to search over 64 (RAM) * (4 to 12) (plain) indexes.
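
As a rough illustration of that fan-out, here is a small Python sketch. It assumes every
query touches all 64 RT indexes and that each index contributes one RAM chunk plus its
disk chunks. chunks_searched is a name made up for this example, not anything in Sphinx.

```python
def chunks_searched(num_indexes: int, disk_chunks_per_index: int) -> int:
    """Chunks one query has to visit: the disk chunks plus one RAM chunk per RT index."""
    return num_indexes * (disk_chunks_per_index + 1)


# 64 RT indexes with 4 to 12 disk chunks each, as discussed in this thread.
low = chunks_searched(64, 4)
high = chunks_searched(64, 12)
print(f"one query over all indexes visits between {low} and {high} chunks")
```

Under these assumptions a single query visits several hundred chunks, which is why
querying fewer indexes at once, or having fewer disk chunks per index, should help.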

Tomat

Name: Stas Klinov
Posts: 909

to: Tomat, 2011-06-28 20:58:30


> RT dumps plain index as disk chunk on ram chunk overgrow rt_mem_limit
> So you have 4 RT index with 4 to 12 disk chunks per index.
>

I mistyped: not 4 RT indexes but 64 RT indexes.

Tomat

Name: Stas Klinov
Posts: 909

to: cbeans, 2011-06-28 21:01:26


> increasing i's? If I understand things correctly, then we have the 64 ram files after the
> dump, 12 sets of .0.sp* files, 4 sets of .1.sp* files, and 4 sets of .2.sp*

It would be better to count RT chunks per index, rather than totalling the .0.*, .1.*
sets across all indexes. That is, for one index, look at its files:

rt_index1.ram
rt_index1.0.*
rt_index1.1.*
rt_index1.2.*
rt_index1.3.*

An RT index with these files has 4 disk chunks (plain indexes).
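
The per-index count can be scripted. Below is a minimal Python sketch that groups file
names by index and counts distinct disk chunk numbers. It assumes the files are named
<index>.<N>.sp<letter>, as in the listing above. The .spa and .spd names in the sample
list are only illustrative, and disk_chunks_per_index is a hypothetical helper.

```python
import re
from collections import defaultdict

# Matches names such as "rt_index1.0.spa": <index>.<chunk number>.sp<letter>
DISK_CHUNK_FILE = re.compile(r"^(?P<index>.+)\.(?P<chunk>\d+)\.sp[a-z]$")


def disk_chunks_per_index(filenames):
    """Map each RT index name to the number of distinct disk chunks it has."""
    chunks = defaultdict(set)
    for name in filenames:
        match = DISK_CHUNK_FILE.match(name)
        if match:  # .ram files and anything else are skipped
            chunks[match.group("index")].add(int(match.group("chunk")))
    return {index: len(ids) for index, ids in chunks.items()}


# Illustrative listing for a single index (file names are assumptions).
files = [
    "rt_index1.ram",
    "rt_index1.0.spa", "rt_index1.0.spd",
    "rt_index1.1.spa", "rt_index1.1.spd",
    "rt_index1.2.spa", "rt_index1.2.spd",
    "rt_index1.3.spa", "rt_index1.3.spd",
]
print(disk_chunks_per_index(files))  # {'rt_index1': 4}
```

An index that only has a .ram file does not appear in the result, since it has no disk
chunks yet. To run this against a real data directory, pass it the output of
os.listdir() for that directory.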
