View Issue Details
ID: 0000721
Project: Sphinx
Category: general
View Status: public
Date Submitted: 2011-02-26 22:08
Last Update: 2011-03-07 17:59
Reporter: omakase
Assigned To: Tomat
Priority: normal
Severity: crash
Reproducibility: always
Status: closed
Resolution: fixed
Platform:
OS:
OS Version:
Product Version: 1.11-dev
Target Version:
Fixed in Version:
Summary: 0000721: RT indexing crashing under heavy write load, descending document ids
Description: Using r2689.

Discovered while querying an RT index that was receiving thousands of writes per second. For certain keywords, queries that accessed attributes crashed the index completely.

The indexes wouldn't load with the debug build. I ran the debug searchd under gdb with fresh indexes, and it crashed within 15 seconds with the attached backtrace.
Tags: No tags attached.
Attached Files:
r2689-crash-1.rtf (1,956 bytes) 2011-02-26 22:08
r2689-crash-query.rtf (3,649 bytes) 2011-02-26 22:22


Notes
(0001220)
omakase (reporter)
2011-02-26 22:24

I just attached a second file: a segfault generated while running the non-debug build (built without --with-debug) under gdb. This segfault is triggered by a query for a specific keyword. Nine out of ten queries succeed. Some words cause this segfault, and only when the query orders by an attribute or selects one.
(0001221)
Tomat (manager)
2011-02-27 10:11

Could you provide the index, config, and query log that cause this crash? They would help us reproduce it locally.

You can send them to ftp://sphinxsearch.com ( user:sphinxbugs pass:stillhappen ). This FTP is write-only for customers, so you can safely upload even private data.

It may be enough to send only the *.ram, *.meta, and *.kill parts of your indexes.

To investigate the second crash, please post the crash info from searchd.log. That log stores the query that caused the crash.
(0001222)
omakase (reporter)
2011-03-02 00:04

I'll try to get you something by the end of the day. Because it crashes immediately, I don't see anything in the .ram, .meta, or .kill files...

I'm creating a script that reproduces the docid assertion. You should be able to run it locally with my config and gdb against the debug build:
Assertion failed: (DOCINFO2ID ( &dRows[i] ) > DOCINFO2ID ( &dRows[i-iStride] )), function CheckSegmentRows, file sphinxrt.cpp, line 1558.
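The assertion checks an invariant. Walking a segment's rows with a fixed stride, each row's document id must be strictly greater than the previous row's id. The rows must be sorted and contain no duplicates. The sketch below shows that invariant under stated assumptions. The flat `uint64_t` row buffer, the `RowDocId` helper (standing in for `DOCINFO2ID`), and `RowsStrictlyIncreasing` are hypothetical illustrations, not the actual Sphinx storage format or code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical row layout: each row occupies `stride` consecutive 64-bit
// words and the first word holds the document id. This only mirrors the
// idea behind DOCINFO2ID; it is NOT the real Sphinx docinfo layout.
using DocRow = std::uint64_t;

// Returns the document id stored at the start of the row beginning at `row`.
inline std::uint64_t RowDocId(const DocRow* row) { return row[0]; }

// Returns true if document ids across the flat row buffer are strictly
// increasing (sorted, no duplicates). Two rows with the same id, or ids in
// descending order, make this return false. That is the condition the
// CheckSegmentRows assertion reports.
bool RowsStrictlyIncreasing(const std::vector<DocRow>& rows, std::size_t stride) {
    assert(stride > 0 && rows.size() % stride == 0);
    for (std::size_t i = stride; i < rows.size(); i += stride) {
        if (RowDocId(&rows[i]) <= RowDocId(&rows[i - stride]))
            return false;
    }
    return true;
}
```

A segment built from a batch that contained the same id twice would fail this check at the second copy.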
(0001223)
omakase (reporter)
2011-03-02 00:43

OK uploaded bug-721 with an index and a script to reproduce the bug locally.
(0001224)
omakase (reporter)
2011-03-02 04:09

I was also able to reproduce the querying bug with the non-debug build: I inserted data with the attached script for about a minute and then ran this query:
select * from dist WHERE MATCH('the') ORDER BY created_at DESC LIMIT 500;

Might have to try a few different keywords to reproduce it...
(0001225)
Tomat (manager)
2011-03-02 09:53

And what is the stream module? I get this error:
ImportError: No module named stream

Could you tell me where to find that module?
(0001226)
omakase (reporter)
2011-03-02 09:57

Hmm, I uploaded it again; there should be a stream.py in that folder. Either the first upload failed or you have to run the script from within that directory. Are you on the IRC channel?
(0001227)
Tomat (manager)
2011-03-02 10:12

This time a different module is missing:
ImportError: No module named anyjson

I've joined IRC.
(0001228)
omakase (reporter)
2011-03-02 10:29

http://pastie.org/private/nka0be6ain8hyuye93j3a

Two dirty fixes to get things running:
1. Keep track of the last merged row to make sure we aren't adding duplicate ids.
2. Just return 0 where a null pointer would otherwise be dereferenced.
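Fix 1 can be illustrated with a two-way merge of ascending id lists. The merge remembers the last id it emitted and drops any id equal to it, so no duplicate reaches the output. This is a minimal sketch on bare ids rather than full rows. `MergeSkippingDupes` is a hypothetical name, and the choice that the first copy encountered wins is an assumption. The actual patch is in the pastie above and may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: merge two ascending id lists into one ascending list,
// tracking the last id written so that a repeated id is never written twice.
// On a tie the copy from `a` is taken first (an assumed policy), and the
// later copy from `b` is skipped. Duplicates within one input are skipped too.
std::vector<std::uint64_t> MergeSkippingDupes(const std::vector<std::uint64_t>& a,
                                              const std::vector<std::uint64_t>& b) {
    assert(std::is_sorted(a.begin(), a.end()) && std::is_sorted(b.begin(), b.end()));
    std::vector<std::uint64_t> out;
    out.reserve(a.size() + b.size());
    bool haveLast = false;
    std::uint64_t last = 0;
    std::size_t i = 0, j = 0;
    while (i < a.size() || j < b.size()) {
        std::uint64_t next;
        if (j == b.size() || (i < a.size() && a[i] <= b[j]))
            next = a[i++];
        else
            next = b[j++];
        if (haveLast && next == last)
            continue; // same id as the last merged row: skip it
        out.push_back(next);
        last = next;
        haveLast = true;
    }
    return out;
}
```

With `a = {1, 3, 3, 5}` and `b = {2, 3, 6}`, the result is `{1, 2, 3, 5, 6}`. Every id appears once and the output stays strictly increasing.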
(0001234)
omakase (reporter)
2011-03-04 03:04

I should also note that I'm doing batch REPLACE INTO queries because I occasionally try to insert duplicate documents. With a batch INSERT, the whole batch fails if one record is a duplicate.
(0001235)
Tomat (manager)
2011-03-04 07:17

Yes, the root cause of this crash is that records with the same id in the same batch were not handled.
I'm going to fix that. Until the fix lands, you can filter out those duplicate documents manually to avoid this crash.
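The suggested workaround amounts to deduplicating each batch by document id on the client before sending the REPLACE INTO. A minimal sketch follows. The `Doc` record type and `DedupBatchKeepLast` are hypothetical. Keeping the last occurrence of each id is an assumption based on REPLACE semantics, where a later row for an id is meant to supersede an earlier one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical record type for one row of a batched REPLACE INTO.
struct Doc {
    std::uint64_t id;
    std::string payload;
};

// Drops any row whose id appears again later in the same batch, so each id is
// sent at most once. The last occurrence is kept (assumed "later REPLACE
// wins" intent) and the relative order of the surviving rows is preserved.
std::vector<Doc> DedupBatchKeepLast(const std::vector<Doc>& batch) {
    std::unordered_map<std::uint64_t, std::size_t> lastIndex;
    for (std::size_t k = 0; k < batch.size(); ++k)
        lastIndex[batch[k].id] = k; // later positions overwrite earlier ones
    std::vector<Doc> out;
    out.reserve(lastIndex.size());
    for (std::size_t k = 0; k < batch.size(); ++k)
        if (lastIndex.at(batch[k].id) == k)
            out.push_back(batch[k]);
    assert(out.size() == lastIndex.size()); // exactly one row per distinct id
    return out;
}
```

For a batch `{1,"a"}, {2,"b"}, {1,"c"}, {3,"d"}`, the rows sent would be `{2,"b"}, {1,"c"}, {3,"d"}`.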
(0001236)
omakase (reporter)
2011-03-04 07:28

Were you able to reproduce any of the querying issues? Do they also stem from the duplicate records being indexed?
(0001237)
Tomat (manager)
2011-03-04 08:17

I haven't reproduced any of the querying issues. However, segments broken by duplicate documents in a batch would be expected to crash on search.
(0001241)
Tomat (manager)
2011-03-04 11:34

This issue has just been fixed in r2701.

Issue History
Date Modified Username Field Change
2011-02-26 22:08 omakase New Issue
2011-02-26 22:08 omakase File Added: r2689-crash-1.rtf
2011-02-26 22:22 omakase File Added: r2689-crash-query.rtf
2011-02-26 22:24 omakase Note Added: 0001220
2011-02-27 10:11 Tomat Note Added: 0001221
2011-02-27 10:11 Tomat Status new => assigned
2011-02-27 10:11 Tomat Assigned To => Tomat
2011-03-02 00:04 omakase Note Added: 0001222
2011-03-02 00:43 omakase Note Added: 0001223
2011-03-02 04:09 omakase Note Added: 0001224
2011-03-02 09:53 Tomat Note Added: 0001225
2011-03-02 09:57 omakase Note Added: 0001226
2011-03-02 10:12 Tomat Note Added: 0001227
2011-03-02 10:29 omakase Note Added: 0001228
2011-03-04 03:04 omakase Note Added: 0001234
2011-03-04 07:17 Tomat Note Added: 0001235
2011-03-04 07:28 omakase Note Added: 0001236
2011-03-04 08:17 Tomat Note Added: 0001237
2011-03-04 11:34 Tomat Note Added: 0001241
2011-03-04 11:34 Tomat Status assigned => resolved
2011-03-04 11:34 Tomat Resolution open => fixed
2011-03-07 17:59 Tomat Status resolved => closed

