Sphinx search engine and PHP (Part 1 – Installation and Indexing)
In the two part series we will see how to install sphinx, prepare the index and then search the index from our php scripts.
Shpinx is a free open source SQL full-text search engine and for those who might confuse it with the Great Sphinx of Giza, Sphinx is the acronym for SQL Phrase Index. Some key features of this search engine are (from official site):
- high indexing speed (upto 10 MB/sec on modern CPUs)
- high search speed (avg query is under 0.1 sec on 2-4 GB text collections)
- high scalability (upto 100 GB of text, upto 100 M documents on a single CPU)
- supports distributed searching (since v.0.9.6)
- supports MySQL natively (MyISAM and InnoDB tables are both supported)
- supports phrase searching
- supports phrase proximity ranking, providing good relevance
- supports English and Russian stemming
- supports any number of document fields (weights can be changed on the fly)
- supports document groups
- supports stopwords
- supports different search modes ("match all", "match phrase" and "match any" as of v.0.9.5)
- generic XML interface which greatly simplifies custom integration
- pure-PHP (ie. NO module compiling etc) search client API
The instructions given should work with most systems. I have used the following:
- Sphinx 0.9.9
- Kubuntu 9.10 (Any *nix OS should work)
- PHP 5.2.10
- MySQL 5.1.37
Installation
- As with any other application, you first need to download and extract the latest sphinx tar ball.
- Go to the root directory of sphinx and issue the ./configure command. I used --prefix=/usr/local/sphinx option to keep all sphinx related files in a single directory. Other important option is --with-mysql which specifies the directory to look for MySQL include and library files. Use this only if auto detection fails.
- Build and install by issuing make followed by make install command.
- When you are done, all the binaries should be in /usr/local/sphinx/bin directory.
Indexing
Shpinx uses special data structures called as index to facilitate full-text search queries. To build index we can use the indexer utility. We need a configuration file to tell indexer what and how to index. Lets see how to do all of this using a real world example.
Problem:
I have two tables members and addresses and Each member can have many addresses. Lets create an index to hold member names and all addresses so that full text search can be performed on any of those fields. The SQL for the tables is..
-
CREATE TABLE IF NOT EXISTS `members` (
-
`id` int(11) NOT NULL AUTO_INCREMENT,
-
`name` varchar(50) NOT NULL,
-
`email` varchar(100) NOT NULL,
-
`phone` int(10) NOT NULL,
-
`join_date` date NOT NULL,
-
PRIMARY KEY (`id`)
-
) ;
-
-
CREATE TABLE IF NOT EXISTS `addresses` (
-
`id` int(11) NOT NULL AUTO_INCREMENT,
-
`member_id` int(11) NOT NULL,
-
`address` varchar(255) NOT NULL,
-
`city` varchar(50) NOT NULL,
-
`zipcode` varchar(15) NOT NULL,
-
PRIMARY KEY (`id`)
-
) ;
Now create a sphinx configuration file /usr/local/sphinx/etc/sphinx.conf. I have put inline comments to explain each setting.
Note: The example below assumes that you have installed sphinx with --prefix=/usr/local/shpinx. If not then change the paths accordingly.
-
# Define the source for the data.
-
source src1
-
{
-
# We will use the xmlpipe2 datasource
-
type = xmlpipe2
-
-
# Command which should be executed to get the xml. The following php script outputs the required xml to stdout
-
xmlpipe_command = php /path/to/makeindex.php
-
}
-
-
# Define the index.
-
index members
-
{
-
# Which source to use
-
source = src1
-
-
# Path where to store the index data
-
path = /usr/local/sphinx/var/data/members
-
-
# Charset of the data
-
charset_type = utf-8
-
-
# Minimum lenght of a word to be indexed.
-
min_word_len = 3
-
}
-
-
# Indexer definition
-
indexer
-
{
-
# Memory limit for the indexer
-
mem_limit = 32M
-
}
Note: For a full list of options that can be specified in sphinx.conf, please see this
We are using xmlpipe2 datasource type. For this we will create a php script, makeindex.php, which will output the well-formed xml as required by the indexer to stdout. We will be specifying the schema (i.e. the set of fields and attributes) in the xml itself.
We will get the data from members and addresses table and will create the required xml. We will be using PHP 5's native xmlWriter class for this purpose. For brevity i am using mysql_* functions without any error trapping. In your actual code you should put all the error checks. Also the following code is just for illustration purposes.
-
<?php
-
// File: /path/to/makeindex.php
-
-
-
$data[$member['id']]['member'] = $member;
-
$data[$member['id']]['addresses'][] = $address;
-
}
-
}
-
-
$xmlwriter = new xmlWriter;
-
$xmlwriter->openMemory();
-
$xmlwriter->setIndent(true);
-
$xmlwriter->startDocument('1.0', 'UTF-8');
-
$xmlwriter->startElement('sphinx:docset');
-
-
$xmlwriter->startElement('sphinx:schema');
-
-
$xmlwriter->startElement('sphinx:field');
-
$xmlwriter->writeAttribute("name", "name");
-
$xmlwriter->endElement();
-
-
$xmlwriter->startElement('sphinx:field');
-
$xmlwriter->writeAttribute("name", "addresses");
-
$xmlwriter->endElement();
-
-
$xmlwriter->endElement();
-
-
foreach($data as $id => $member) {
-
$xmlwriter->startElement('sphinx:document');
-
$xmlwriter->writeAttribute("id", $id);
-
-
$xmlwriter->startElement('name');
-
$xmlwriter->text($member['member']['name']);
-
$xmlwriter->endElement();
-
-
-
$xmlwriter->startElement('addresses');
-
$buffer = '';
-
foreach($member['addresses'] as $address) {
-
$buffer .= $address['address'] . ' ' . $address['city'] . ' ' . $address['zipcode'] . ' ';
-
}
-
$xmlwriter->text($buffer);
-
$xmlwriter->endElement();
-
-
$xmlwriter->endElement();
-
}
-
$xmlwriter->endElement();
-
Above php script should output xml similar to this.. (you will get different set of data)
-
<?xml version="1.0" encoding="UTF-8"?>
-
<sphinx:docset>
-
<sphinx:schema>
-
<sphinx:field name="name"/>
-
<sphinx:field name="addresses"/>
-
</sphinx:schema>
-
<sphinx:document id="1">
-
<name>Abbas Ali</name>
-
<addresses>Shantinagar Nagpur 440002 Itwari Arcade Nagpur 440001 </addresses>
-
</sphinx:document>
-
<sphinx:document id="2">
-
<name>Amit Badkas</name>
-
<addresses>Shivajinagar Dharampeth Nagpur 440024 wadi amravati 440021 </addresses>
-
</sphinx:document>
-
</sphinx:docset>
Now to run the indexer issue the following command
/usr/local/sphinx/bin/indexer --all
This will create indexes for all the index defined in sphinx configuration file.
To test whether the index got created or not issue the following command
/usr/local/sphinx/bin/search searchterm
Replace searchterm with actual term and it should output the results.
In the next post, we will see how to search the index from a php script.
Update: Part 2 – Searching from PHP
About this entry
You’re currently reading “ Sphinx search engine and PHP (Part 1 – Installation and Indexing) ,” an entry on SANIsoft – PHP for E Biz
- Published:
- 1.11.10 / 10:33am
- Author:
- Abbas Ali
6 Comments
Jump to comment form | comments rss | trackback uri