Jul 24, 2014. Distributed Sphinx Search in Docker Containers

We covered the basics of running Sphinx in a Docker container in an earlier post. In this post, we’ll use Docker containers to experiment with distributed indexing and search, ending with a simple demonstration of Sphinx high availability (HA) in action.

The idea is simple

Run many instances of Sphinx in many Docker containers. Each instance listens inside its container on a unique port, and the container exposes that port to the host machine. Some of the nodes are mirrors of each other, and some contain unique data.

In this example, each node in the first set of agent mirrors contains the first 100 rows from a MySQL table (also running in Docker) and listens on a port starting with 93. Each node in the second set of agent mirrors contains an index of the next 100 rows and listens on a port starting with 94. I started each of those nodes manually, but I wrote a script to build the configuration for the master node. Giving each set of agent mirrors ports that start with the same two digits makes the sets easy to tell apart, though there may be a better way to approach this.

For the curious, the master container’s Sphinx configuration file ends up looking something like this:

index dist
{
   type=distributed
   agent=172.17.42.1:9307|172.17.42.1:9306:test
   agent=172.17.42.1:9407|172.17.42.1:9406:test
   ha_strategy=nodeads
}
 
searchd
{
   listen=9999:mysql41
   log=/var/log/sphinx/searchd.log
   query_log=/var/log/sphinx/query.log
   query_log_format=sphinxql
   read_timeout=5
   max_children=30
   pid_file=/var/run/sphinx/searchd.pid
   workers=threads
}
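As a rough illustration of how a script could produce the dist index block above, here is a minimal sketch. It is not the author’s makelord.sh; the function names build_agent and build_dist_index are hypothetical, and the host, ports, and index name are simply the ones from the configuration shown above.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the author's makelord.sh.

# build_agent HOST INDEX PORT...
# Prints one agent line. All ports given are treated as mirrors of the
# same data, so they are joined with "|" and followed by ":INDEX".
build_agent() {
    host=$1
    index=$2
    shift 2
    mirrors=""
    for port in "$@"; do
        # Add a "|" separator only when a mirror is already listed.
        mirrors="${mirrors:+$mirrors|}$host:$port"
    done
    printf '   agent=%s:%s\n' "$mirrors" "$index"
}

# build_dist_index HOST
# Prints the whole distributed index block for the two mirror sets
# used in this post (ports 93xx and 94xx).
build_dist_index() {
    host=$1
    printf 'index dist\n{\n   type=distributed\n'
    build_agent "$host" test 9307 9306
    build_agent "$host" test 9407 9406
    printf '   ha_strategy=nodeads\n}\n'
}

# Example: print the block for the Docker bridge address used above.
# build_dist_index 172.17.42.1
```

Each call to build_agent corresponds to one set of mirrors, so adding another set is a matter of adding one more call with its own list of ports.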

Get the Dockerfile and other stuff

All the necessary files (including makelord.sh, which builds the master’s configuration file), along with more explanation, can be found here. Download these files and build an image. The -t flag tags your image; name it whatever you want:

docker build -t sphinx/dist .

Alternatively, you might choose to just pull the image from the Docker Hub, like this:

docker pull stefobark/sphinxdocker

Also, I went through each of the steps, with more detail, in this blog post.

Walkthrough

This video demonstrates starting up the master node (with the distributed index type that maps to the agents), successfully querying it, stopping a container, and then successfully querying it again. I set ha_strategy to nodeads, so the master stops sending queries to a mirror that has stopped responding. To learn more about the other high availability strategies, read the documentation.
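For readers who want to repeat the walkthrough by hand, the sequence might look something like the sketch below. It assumes the mysql command-line client is installed on the host and that the master’s port 9999 (the mysql41 listener from the configuration above) is reachable at 127.0.0.1. The helper query_master and the container name sphinx9307 are hypothetical; substitute whatever your containers are actually called.

```shell
#!/bin/sh
# Hypothetical helper: send one SphinxQL statement to the master node.
# MASTER_HOST and MASTER_PORT are assumptions; 9999 matches the
# listen=9999:mysql41 line in the master's searchd section.
query_master() {
    mysql -h "${MASTER_HOST:-127.0.0.1}" -P "${MASTER_PORT:-9999}" -e "$1"
}

# Walkthrough steps (commented out so that sourcing this file only
# defines the helper):
#
# 1. Query the distributed index while every mirror is up.
#    query_master 'SELECT * FROM dist LIMIT 5'
#
# 2. Stop one mirror. "sphinx9307" is a made-up container name.
#    docker stop sphinx9307
#
# 3. Query again. The remaining mirror in that set should answer.
#    query_master 'SELECT * FROM dist LIMIT 5'
```

If the second query still returns rows from both halves of the data, the mirror set is doing its job.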



Bye Bye

Basic info, but hopefully it inspired some creative Sphinx-thoughts. Thanks for reading!

Happy Sphinxing



2 Responses to “Distributed Sphinx Search in Docker Containers”

  1. marcel says:

    Are there any articles about how to distribute the data among sphinx instances, especially if they are mirrored?

  2. steve says:

    There’s more info on building Sphinx clusters in this blog post. Also, info on the different ha_strategies can be found here.
