These days I am getting more and more into Sphinx and falling in love with it
. In this post we will see how to make use of SQL data source to create indexes.
Sphinx has inbuilt support for MySQL and PostgreSQL databases. The drivers for these databases can connect to the database using its native C/C++ API, execute the queries and fetch the data. The data returned by the query is then indexed. Without any further ado let's see this in action...
Let's say we have two tables in MySQL viz books and authors. Each book belongs to one author. Database structure is as follows..
-
CREATE TABLE `authors` (
-
`id` int(10) UNSIGNED NOT NULL AUTO_INCREMENT,
-
`name` varchar(50) NOT NULL,
-
PRIMARY KEY (`id`)
-
) ENGINE=MyISAM;
-
-
CREATE TABLE `books` (
-
`id` int(10) UNSIGNED NOT NULL AUTO_INCREMENT,
-
`title` varchar(255) NOT NULL,
-
`author_id` int(10) UNSIGNED NOT NULL,
-
`publish_date` datetime NOT NULL,
-
PRIMARY KEY (`id`)
-
) ENGINE=MyISAM;
We want to create an index for searching books. We also want the search results to be filtered by author. The sphinx configuration will be as follows
-
# Define the source for the data.
-
source src
-
{
-
# We will use the mysql datasource
-
type = mysql
-
# connection parameters
-
sql_host = localhost
-
sql_user = root
-
sql_pass = mypass
-
sql_db = mydb
-
# query that will fetch the data from db
-
sql_query = SELECT id, title, UNIX_TIMESTAMP(publish_date) \
-
AS publish_date, author_id FROM books
-
# attributes
-
sql_attr_uint = author_id
-
sql_attr_timestamp = publish_date
-
}
-
# Define the index.
-
index books
-
{
-
# Which source to use
-
source = src
-
# Path where to store the index data
-
path = /usr/local/sphinx/var/data/books
-
# Charset of the data
-
charset_type = utf-8
-
# Minimum lenght of a word to be indexed.
-
min_word_len = 3
-
}
-
# Indexer definition
-
indexer
-
{
-
# Memory limit for the indexer
-
mem_limit = 32M
-
}
-
-
# Searchd settings
-
searchd
-
{
-
# Port to listen on
-
port = 3312
-
# Next few are the paths to log files
-
log = /usr/local/sphinx/var/log/searchd.log
-
query_log = /usr/local/sphinx/var/log/query.log
-
# Maximum amount of concurrent searches to run - 0 for unlimited
-
max_children = 30
-
# Path to pid file
-
pid_file = /usr/local/sphinx/var/log/searchd.pid
-
}
The important configuration options are :
- sql_query : The query to be run to fetch the data from the database. There can be only one main query per data source. Document ID must be the very first field in the query and it must be unique unsigned integer. Rest of the fields will be fulltext indexed (unless they are defined as attributes).
- sql_attr_uint : Option to specify the unsigned integer attribute. Results can be filtered using the attributes and attributes are not fulltext indexed.
- sql_attr_timestamp : Option to specify the timestamp attribute.
After creating the configuration file run the indexer command:
/usr/local/sphinx/bin/indexer --config /path/to/config/file --all
If search daemon is not running then start it by running the following command as root
/usr/local/sphinx/bin/searchd
The php code in our application to search the index will be as follows..
-
<?php
-
// Include the sphinx API class
-
require('sphinxapi.php');
-
// Connect to sphinx server
-
$sp = new SphinxClient();
-
// Set the server
-
$sp->SetServer('localhost', 3312);
-
// SPH_MATCH_ALL will match all words in the search term
-
$sp->SetMatchMode(SPH_MATCH_ALL);
-
// We want an array with complete per match information including the document ids
-
$sp->SetArrayResult(true);
-
// Set the filter so that only those books which has author_id = 1 or 3 are returned
-
/**
-
* Run the search query. Here the first argument is the search term
-
* and the second is the name of the index to search in.
-
* Search term can come from a search form
-
*/
-
$results = $sp->Query('search term', 'books');
-
?>
$results['matches'] will be the an array with all the matched documents and you can then use this in your application accordingly.