Not sure how many of you have seen ElasticSearch (http://www.elasticsearch.com/), but from the demos and info on the site it looks pretty good. It's a nice framework for handling groups of search nodes, built on top of Lucene.
It got me thinking about how hard it would be to build something similar, at a high level, on top of Sphinx, and also wondering how useful it would be given that you can't retrieve full documents as you can with ES.
I had some thoughts about areas where it could be very helpful if you could configure multiple Sphinx indexes as a group, via some kind of web service or even from a command line. If you only indexed data that had passed through your application in some kind of structured form (XML, JSON, CSV, etc.), then you could add a notion of sharding to the app.
Easy management of distributed (sharded) indexes
You could define a column to shard by and a sharding algorithm, and let the software make sure that the right indexes were on the right machines, that configs were all correct, and so on. With a really strong setup you could perhaps handle re-sharding, or the loss of a search node.
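As a rough sketch of the column-based sharding idea: hash the shard column's value and take it modulo the shard count, so the same value always lands on the same index. The host map, function names, and shard count below are all invented for illustration; none of this is Sphinx API.

```python
import hashlib

# Hypothetical map from shard number to the search node holding
# that shard's Sphinx index (hosts invented for this example).
SHARD_HOSTS = {
    0: "search01:9312",
    1: "search02:9312",
    2: "search03:9312",
}

def shard_for(value: str, num_shards: int = 3) -> int:
    """Stable shard assignment: the same shard-column value always
    maps to the same shard, no matter which machine computes it."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Route a document by its shard column (here, a made-up user_id).
doc = {"user_id": "42", "body": "some text"}
host = SHARD_HOSTS[shard_for(doc["user_id"])]
```

The stable hash is what would make re-sharding tractable: you can recompute every document's target shard deterministically when the shard count changes.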
Dividing indexing work between nodes
With a set of search nodes you could divide up indexing tasks between machines in the cluster. You could do this with each node indexing one shard, or you could go further and have your app take an existing index, chunk the data into N pieces, and then re-assemble it with index merging.
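The re-assembly step could lean on Sphinx's `indexer --merge DST SRC` command, which folds one index into another. Here's a hedged sketch that just builds the merge command lines for a set of chunk indexes; the index names and helper function are invented, and a real setup would run these via your process runner of choice.

```python
def merge_commands(main_index: str, chunk_indexes: list) -> list:
    """Build one `indexer --merge` invocation per chunk index,
    each merging that chunk into the main index. --rotate tells
    searchd to pick up the rebuilt index without a restart."""
    return [
        ["indexer", "--merge", main_index, chunk, "--rotate"]
        for chunk in chunk_indexes
    ]

# Example: fold two chunk indexes (names invented) into one main index.
cmds = merge_commands("posts_main", ["posts_chunk0", "posts_chunk1"])
```

Each node could build its own chunk in parallel, with a coordinator running the merges serially at the end.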
Publishing data on which indexes exist where
This could be useful for apps using the Sphinx indexes, letting them know which servers to query, and if the system was fault-aware it could just dish out new information to the
app about which servers to use.
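The published data could be as simple as a JSON document the app polls from the web service. The schema, field names, and helper below are invented for illustration, not an actual Sphinx or ES format.

```python
import json

# Hypothetical payload the "which indexes live where" service might
# publish: each shard's location plus a liveness flag the service
# updates when it detects a failed node.
REGISTRY_JSON = """
{
  "indexes": {
    "posts_shard_0": {"host": "search01", "port": 9312, "alive": true},
    "posts_shard_1": {"host": "search02", "port": 9312, "alive": true},
    "posts_shard_2": {"host": "search03", "port": 9312, "alive": false}
  }
}
"""

def live_hosts(registry_json: str) -> list:
    """Return host:port strings for shards marked alive, so the
    app only sends queries to healthy search nodes."""
    registry = json.loads(registry_json)
    return [
        "%s:%d" % (info["host"], info["port"])
        for info in registry["indexes"].values()
        if info["alive"]
    ]
```

With something like this, the fault-awareness falls out naturally: the app re-fetches the registry and simply stops querying nodes the service has flagged as down.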
I'm sure there's a bunch more stuff it could do that I've not thought of, but you can see how it could be pretty handy. I hope I've seeded some ideas!