Jan 24, 2013. SphinxQL 2.1.1 Cheat Sheet

In preparation for the release of Sphinx 2.1.1, we’re posting this SphinxQL “Cheat Sheet” to give you a taste of what to  expect. Brief summary: we added GROUPBY() that returns a grouping key (pretty useful with MVAs), GROUP_CONCAT(), EXIST() to let you query sets of indexes with both existent and non-existent columns, LIKE ‘%substring%’ clause support to SHOW and other statements that can dump a lot of data, added more statistics to SHOW META and a new SHOW INDEX STATUS statement, added UPDATE … OPTION ignore_nonexistent_columns, and a few more things like subselects that we’re going to discuss in the future posts.

New Grouping Functions

GROUPBY()function will return the value by which the grouping was made. This is equivalent to the special legacy attribute @groupby in the API. Note that you cannot use both the old special style attribute and the new function in a query.

mysql> select *,groupby()  from test group by mva;
+------+------+------+------+-----------+
| id   | uid  | gid  | mva  | groupby() |
+------+------+------+------+-----------+
|    1 |    1 |    2 | 2,3  |         2 |
|    1 |    1 |    2 | 2,3  |         3 |
|    2 |    2 |    5 | 3,4  |         4 |
+------+------+------+------+-----------+
3 rows in set (0.00 sec)

GROUP_CONCAT() function gives you the option to return a column with concatenated values from a specific attribute on a query that has applied a grouping, just like the MySQL counterpart would do. Note that currently GROUP_CONCAT only supports scalar and MVA fields (int, bigint, float, mva, and mva64):

mysql> select *,group_concat(uid)  from test group by mva;
+------+------+------+------+-------------------+
| id   | uid  | gid  | mva  | group_concat(uid) |
+------+------+------+------+-------------------+
|    1 |    1 |    2 | 2,3  | 1                 |
|    1 |    1 |    2 | 2,3  | 2,1               |
|    2 |    2 |    5 | 3,4  | 2                 |
+------+------+------+------+-------------------+
3 rows in set (0.01 sec)

The EXIST() function

Previously, searching in non-existing fulltext fields was possible using @@relaxed operator, but attributes didn’t have anything similar until now. The EXIST() function is an equivalent of @@relaxed, but it is specifically to be used for attributes:

SELECT *, EXIST('gid', 6) as cnd FROM i1, i2 WHERE cnd>5

This example will return results only from the index that doesn’t have the “gid” attribute. The first parameter is the attribute name that you’re looking for.  The second parameter is a default value to be used in case the attribute is not found. So instead of having a NULL value, which we currently do not support (hey, Sphinx is not a full blown SQL server… just yet), you need to specify a default value. This can be used not only to have a returned value, but also to grab documents that either miss or have the attribute declared in the first position.

EXIST() can be used only in SELECT statements. This function can have a primary use in searches across multiple indexes that don’t have the same attributes schema.

LIKE operator

The LIKE operator is added to SHOW STATUS, SHOW META, DESCRIBE, SHOW TABLES, and SHOW AGENT STATUS statements.

For example, getting only total_found from a SHOW META statement can be done using:

mysql> SHOW META LIKE 'total_found';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| total_found   | 346   |
+---------------+-------+
1 row in set (0.00 sec)

Note that you can also use standard SQL wildcards in LIKE. Both ‘%’ that matches any number of characters (including zero) and ‘_’ that matches exactly one character are supported:

mysql> DESCRIBE myindex LIKE 's_urce%';
+-------------+------+
| Field       | Type |
+-------------+------+
| source_id   | uint |
| source_type | uint |
+-------------+------+
2 rows in set (0.00 sec)

Statistics in SHOW META

Previously, starting searchd with –iostats –cpustats would log cpu and io related information in query.log, as well as the agents total spent – in case of a distributed index. Starting with 2.1.1, this information is also available in SHOW META.

mysql> show meta;
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| total                 | 2     |
| total_found           | 2     |
| time                  | 0.000 |
| cpu_time              | 0.335 |
| agents_cpu_time       | 0.000 |
| io_read_time          | 0.000 |
| io_read_ops           | 0     |
| io_read_kbytes        | 0.0   |
| io_write_time         | 0.000 |
| io_write_ops          | 0     |
| io_write_kbytes       | 0.0   |
| agent_io_read_time    | 0.000 |
| agent_io_read_ops     | 0     |
| agent_io_read_kbytes  | 0.0   |
| agent_io_write_time   | 0.000 |
| agent_io_write_ops    | 0     |
| agent_io_write_kbytes | 0.0   |
+-----------------------+-------+
17 rows in set (0.00 sec)

SHOW INDEX … STATUS operator

Another cool little thing we’ve extended SphinxQL this is a SHOW INDEX {indexname} STATUS operator that reports a number of interesting metrics. indexed_documents and indexed_bytes counters were accessible earlier via indextool –dumpheader. But simply exposing those via SHOW INDEX STATUS would not be interesting enough so we’ve also added field lengths (that now get computed when index_field_lengths=1) and memory use counters to its output. More counters might follow; and your feedback as to which per-index counters exactly you would be interested in is welcome.

mysql> show index lj status;
+----------------------+-----------+
| Variable_name        | Value     |
+----------------------+-----------+
| index_type           | disk      |
| indexed_documents    | 499984    |
| indexed_bytes        | 553430410 |
| field_tokens_title   | 1158953   |
| field_tokens_content | 83718821  |
| ram_bytes            | 37515362  |
+----------------------+-----------+
6 rows in set (0.00 sec)

Ignoring non-existent columns when doing updates

We talked earlier about having the possibility to query over a distributed index that might not have all of the same attributes across it’s indexes. Although,  if you want to make attribute updates on that distributed index, one of the indexes would not be updated because we specified that attributes that are not everywhere are not to be indexed. And that would raise an error, too, which is not always the desired behavior.

To overcome this, we have added a new OPTION that will tell the UPDATE command not to fail if it has tried to change an attribute that doesn’t exist.

This comes as an OPTION clause for UPDATE statements:

UPDATE index ... OPTION ignore_nonexistent_columns=

It also comes as an OPTION for UpdateAttributes API function:

UpdateAttributes ( $index, $attrs, $values, $mva=false, $ignorenonexistent=false )

Stay tuned for more announcements on the soon-to-be-released Sphinx 2.1.1.


« »

3 Responses to “SphinxQL 2.1.1 Cheat Sheet”

  1. Andrew says:

    Is is possible to order the results of GROUP_CONCAT? I tried WITHIN GROUP ORDER BY but that does not seem to work.

  2. shodan says:

    Andrew, nice to see you :-)

    I am a bit confused as to what ordering do you mean here. The WITHIN GROUP clause lets you pick the row that will represent the group, but it should not affect the aggregate functions (be it GROUP_CONCAT or MIN or MAX) in any way.

    As for ordering the result set by the result that GROUP_CONCAT() generated, I don’t think we ever tested that, but that *should* be possible.

  3. Andrew says:

    Thanks for responding, shodan.

    The latter is what I would like to achieve. I need to have a grouped resultset ordered by @weight, each of which have their own subgroup list of ids that are ordered by @weight.

    Something similar to MySQL:

    SELECT group_concat(@id ORDER BY @weight DESC) AS ids FROM myindex WHERE MATCH(‘somequery’) GROUP BY mygroup ORDER BY @weight DESC

Leave a Reply