SQL data source in Sphinx

These days I am getting more and more into Sphinx and falling in love with it :) . In this post we will see how to make use of SQL data source to create indexes.

Sphinx has inbuilt support for MySQL and PostgreSQL databases. The drivers for these databases can connect to the database using its native C/C++ API, execute the queries and fetch the data. The data returned by the query is then indexed. Without any further ado let's see this in action...

To get started with sphinx please read my earlier posts on installation, indexing using xmlpipe data source and searching.

Let's say we have two tables in MySQL viz books and authors. Each book belongs to one author. Database structure is as follows..

SQL:
  1. CREATE TABLE `authors` (
  2.   `id` int(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  3.   `name` varchar(50) NOT NULL,
  4.   PRIMARY KEY (`id`)
  5. ) ENGINE=MyISAM;
  6.  
  7. CREATE TABLE `books` (
  8.   `id` int(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  9.   `title` varchar(255) NOT NULL,
  10.   `author_id` int(10) UNSIGNED NOT NULL,
  11.   `publish_date` datetime NOT NULL,
  12.   PRIMARY KEY (`id`)
  13. ) ENGINE=MyISAM;

We want to create an index for searching books. We also want the search results to be filtered by author. The sphinx configuration will be as follows

CODE:
  1. # Define the source for the data.
  2. source src
  3. {
  4.     # We will use the mysql datasource
  5.     type               = mysql
  6.     # connection parameters
  7.     sql_host           = localhost
  8.     sql_user           = root
  9.     sql_pass           = mypass
  10.     sql_db             = mydb
  11.     # query that will fetch the data from db
  12.     sql_query          = SELECT id, title, UNIX_TIMESTAMP(publish_date) \
  13.         AS publish_date, author_id FROM books
  14.     # attributes
  15.     sql_attr_uint      = author_id
  16.     sql_attr_timestamp = publish_date
  17. }
  18. # Define the index.
  19. index books
  20. {
  21.     # Which source to use
  22.     source = src
  23.     # Path where to store the index data
  24.     path = /usr/local/sphinx/var/data/books
  25.     # Charset of the data
  26.     charset_type = utf-8
  27.     # Minimum lenght of a word to be indexed.
  28.     min_word_len = 3
  29. }
  30. # Indexer definition
  31. indexer
  32. {
  33.     # Memory limit for the indexer
  34.     mem_limit = 32M
  35. }
  36.  
  37. # Searchd settings
  38. searchd
  39. {
  40.     # Port to listen on
  41.     port          = 3312
  42.     # Next few are the paths to log files
  43.     log        = /usr/local/sphinx/var/log/searchd.log
  44.     query_log     = /usr/local/sphinx/var/log/query.log
  45.     # Maximum amount of concurrent searches to run - 0 for unlimited
  46.     max_children  = 30
  47.     # Path to pid file
  48.     pid_file      = /usr/local/sphinx/var/log/searchd.pid
  49. }

The important configuration options are :

  • sql_query : The query to be run to fetch the data from the database. There can be only one main query per data source. Document ID must be the very first field in the query and it must be unique unsigned integer. Rest of the fields will be fulltext indexed (unless they are defined as attributes).
  • sql_attr_uint : Option to specify the unsigned integer attribute. Results can be filtered using the attributes and attributes are not fulltext indexed.
  • sql_attr_timestamp : Option to specify the timestamp attribute.

After creating the configuration file run the indexer command:

/usr/local/sphinx/bin/indexer --config /path/to/config/file --all

If search daemon is not running then start it by running the following command as root

/usr/local/sphinx/bin/searchd

The php code in our application to search the index will be as follows..

PHP:
  1. <?php
  2. // Include the sphinx API class
  3. require('sphinxapi.php');
  4. // Connect to sphinx server
  5. $sp = new SphinxClient();
  6. // Set the server
  7. $sp->SetServer('localhost', 3312);
  8. // SPH_MATCH_ALL will match all words in the search term
  9. $sp->SetMatchMode(SPH_MATCH_ALL);
  10. // We want an array with complete per match information including the document ids
  11. $sp->SetArrayResult(true);
  12. // Set the filter so that only those books which has author_id = 1 or 3 are returned
  13. $sp->SetFilter('author_id', array(1,3));
  14. /**
  15. * Run the search query. Here the first argument is the search term
  16. * and the second is the name of the index to search in.
  17. * Search term can come from a search form
  18. */
  19. $results = $sp->Query('search term', 'books');
  20. ?>

$results['matches'] will be the an array with all the matched documents and you can then use this in your application accordingly.

About Abbas Ali

Abbas Ali is a Mechanical Engineer by education. He turned to programming and took it as a profession just after finishing his studies. He is fascinated equally by both machines and computers. He leads the team of dynamic programmers at SANIsoft and works as a Technology Manager. He is also an active developer on the Coppermine Picture Gallery team.

Leave a Reply