Prototype: function BuildExcerpts ( $docs, $index, $words, $opts=array() )
Excerpts (snippets) builder function. Connects to searchd,
asks it to generate excerpts (snippets) from given documents, and returns the results.
$docs is a plain array of strings that carry the documents' contents.
$index is an index name string. Different settings (such as charset,
morphology, wordforms) from given index will be used.
$words is a string that contains the keywords to highlight. They will
be processed with respect to index settings. For instance, if English stemming
is enabled in the index, "shoes" will be highlighted even if keyword is "shoe".
Starting with version 0.9.9-rc1, keywords can contain wildcards, that work similarly to
star-syntax available in queries.
$opts is a hash which contains additional optional highlighting parameters:
A string to insert before a keyword match. Starting with version 1.10-beta, a %PASSAGE_ID% macro can be used in this string. The macro is replaced with an incrementing passage number within a current snippet. Numbering starts at 1 by default but can be overridden with "start_passage_id" option. In a multi-document call, %PASSAGE_ID% would restart at every given document. Default is "<b>".
A string to insert after a keyword match. Starting with version 1.10-beta, a %PASSAGE_ID% macro can be used in this string. Default is "</b>".
A string to insert between snippet chunks (passages). Default is " ... ".
Maximum snippet size, in symbols (codepoints). Integer, default is 256.
How much words to pick around each matching keywords block. Integer, default is 5.
Whether to highlight exact query phrase matches only instead of individual keywords. Boolean, default is false.
Whether to extract single best passage only. Boolean, default is false.
Whether to additionaly break passages by phrase boundary characters, as configured in index settings with phrase_boundary directive. Boolean, default is false.
Whether to sort the extracted passages in order of relevance (decreasing weight), or in order of appearance in the document (increasing position). Boolean, default is false.
Added in version 1.10-beta. Whether to handle $words as a query in extended syntax, or as a bag of words (default behavior). For instance, in query mode ("one two" | "three four") will only highlight and include those occurrences "one two" or "three four" when the two words from each pair are adjacent to each other. In default mode, any single occurrence of "one", "two", "three", or "four" would be highlighted. Boolean, default is false.
Added in version 1.10-beta. Ignores the snippet length limit until it includes all the keywords. Boolean, default is false.
Added in version 1.10-beta. Limits the maximum number of passages that can be included into the snippet. Integer, default is 0 (no limit).
Added in version 1.10-beta. Limits the maximum number of keywords that can be included into the snippet. Integer, default is 0 (no limit).
Added in version 1.10-beta. Specifies the starting value of
%PASSAGE_ID% macro (that gets detected and expanded in before_match,
after_match strings). Integer, default is 1.
Added in version 1.10-beta. Whether to handle $docs as data to extract snippets from (default behavior), or to treat it as file names, and load data from specified files on the server side. Starting with version 2.0.1-beta, up to dist_threads worker threads per request will be created to parallelize the work when this flag is enabled. Boolean, default is false. Starting with version 2.0.2-beta, building of the snippets could be parallelized between remote agents. Just set the 'dist_threads' param in the config to the value greater than 1, and then invoke the snippets generation over the distributed index, which contain only one(!) first(!) local agent and several remotes.
Added in version 2.0.2-beta. It assumes "load_files" option, and works only with distributed snippets generation with remote agents. The source files for snippets could be distributed among different agents, and the main daemon will merge together all non-erroneous results. So, if one agent of the distributed index has 'file1.txt', another has 'file2.txt' and you call for the snippets with both these files, the sphinx will merge results from the agents together, so you will get the snippets from both 'file1.txt' and 'file2.txt'.
Added in version 1.10-beta. HTML stripping mode setting. Defaults to "index", which means that index settings will be used. The other values are "none" and "strip", that forcibly skip or apply stripping irregardless of index settings; and "retain", that retains HTML markup and protects it from highlighting. The "retain" mode can only be used when highlighting full documents and thus requires that no snippet size limits are set. String, allowed values are "none", "strip", "index", and "retain".
Added in version 1.10-beta. Allows empty string to be returned as highlighting result when a snippet could not be generated (no keywords match, or no passages fit the limit). By default, the beginning of original text would be returned instead of an empty string. Boolean, default is false.
Added in version 2.0.1-beta. Ensures that passages do not cross a sentence, paragraph, or zone boundary (when used with an index that has the respective indexing settings enabled). String, allowed values are "sentence", "paragraph", and "zone".
Added in version 2.0.1-beta. Emits an HTML tag with an enclosing zone name before each passage. Boolean, default is false.
Snippets extraction algorithm currently favors better passages (with closer phrase matches), and then passages with keywords not yet in snippet. Generally, it will try to highlight the best match with the query, and it will also to highlight all the query keywords, as made possible by the limtis. In case the document does not match the query, beginning of the document trimmed down according to the limits will be return by default. Starting with 1.10-beta, you can also return an empty snippet instead case by setting "allow_empty" option to true.
Returns false on failure. Returns a plain array of strings with excerpts (snippets) on success.