In the two part series we will see how to install sphinx, prepare the index and then search the index from our php scripts.
Shpinx is a free open source SQL full-text search engine and for those who might confuse it with the Great Sphinx of Giza, Sphinx is the acronym for SQL Phrase Index. Some key features of this search engine are (from official site):
- high indexing speed (upto 10 MB/sec on modern CPUs)
- high search speed (avg query is under 0.1 sec on 2-4 GB text collections)
- high scalability (upto 100 GB of text, upto 100 M documents on a single CPU)
- supports distributed searching (since v.0.9.6)
- supports MySQL natively (MyISAM and InnoDB tables are both supported)
- supports phrase searching
- supports phrase proximity ranking, providing good relevance
- supports English and Russian stemming
- supports any number of document fields (weights can be changed on the fly)
- supports document groups
- supports stopwords
- supports different search modes (“match all”, “match phrase” and “match any” as of v.0.9.5)
- generic XML interface which greatly simplifies custom integration
- pure-PHP (ie. NO module compiling etc) search client API
The instructions given should work with most systems. I have used the following:
- Sphinx 0.9.9
- Kubuntu 9.10 (Any *nix OS should work)
- PHP 5.2.10
- MySQL 5.1.37
Installation
- As with any other application, you first need to download and extract the latest sphinx tar ball.
- Go to the root directory of sphinx and issue the ./configure command. I used –prefix=/usr/local/sphinx option to keep all sphinx related files in a single directory. Other important option is –with-mysql which specifies the directory to look for MySQL include and library files. Use this only if auto detection fails.
- Build and install by issuing make followed by make install command.
- When you are done, all the binaries should be in /usr/local/sphinx/bin directory.
Indexing
Shpinx uses special data structures called as index to facilitate full-text search queries. To build index we can use the indexer utility. We need a configuration file to tell indexer what and how to index. Lets see how to do all of this using a real world example.
Problem:
I have two tables members and addresses and Each member can have many addresses. Lets create an index to hold member names and all addresses so that full text search can be performed on any of those fields. The SQL for the tables is..
- CREATE TABLE IF NOT EXISTS `members` (
- `id` int(11) NOT NULL AUTO_INCREMENT,
- `name` varchar(50) NOT NULL,
- `email` varchar(100) NOT NULL,
- `phone` int(10) NOT NULL,
- `join_date` date NOT NULL,
- PRIMARY KEY (`id`)
- ) ;
- CREATE TABLE IF NOT EXISTS `addresses` (
- `id` int(11) NOT NULL AUTO_INCREMENT,
- `member_id` int(11) NOT NULL,
- `address` varchar(255) NOT NULL,
- `city` varchar(50) NOT NULL,
- `zipcode` varchar(15) NOT NULL,
- PRIMARY KEY (`id`)
- ) ;
Now create a sphinx configuration file /usr/local/sphinx/etc/sphinx.conf. I have put inline comments to explain each setting.
Note: The example below assumes that you have installed sphinx with –prefix=/usr/local/shpinx. If not then change the paths accordingly.
- # Define the source for the data.
- source src1
- {
- # We will use the xmlpipe2 datasource
- type = xmlpipe2
- # Command which should be executed to get the xml. The following php script outputs the required xml to stdout
- xmlpipe_command = php /path/to/makeindex.php
- }
- # Define the index.
- index members
- {
- # Which source to use
- source = src1
- # Path where to store the index data
- path = /usr/local/sphinx/var/data/members
- # Charset of the data
- charset_type = utf-8
- # Minimum lenght of a word to be indexed.
- min_word_len = 3
- }
- # Indexer definition
- indexer
- {
- # Memory limit for the indexer
- mem_limit = 32M
- }
Note: For a full list of options that can be specified in sphinx.conf, please see this
We are using xmlpipe2 datasource type. For this we will create a php script, makeindex.php, which will output the well-formed xml as required by the indexer to stdout. We will be specifying the schema (i.e. the set of fields and attributes) in the xml itself.
We will get the data from members and addresses table and will create the required xml. We will be using PHP 5′s native xmlWriter class for this purpose. For brevity i am using mysql_* functions without any error trapping. In your actual code you should put all the error checks. Also the following code is just for illustration purposes.
- <?php
- // File: /path/to/makeindex.php
- $dbh = mysql_connect('localhost', 'user', 'password');
- mysql_select_db('dbname');
- $result = mysql_query("SELECT * FROM members");
- while($member = mysql_fetch_assoc($result)) {
- $data[$member['id']]['member'] = $member;
- $addressResult = mysql_query("SELECT * FROM addresses WHERE member_id = '{$member['id']}'");
- $data[$member['id']]['addresses'] = array();
- while($address = mysql_fetch_assoc($addressResult)) {
- $data[$member['id']]['addresses'][] = $address;
- }
- }
- $xmlwriter = new xmlWriter;
- $xmlwriter->openMemory();
- $xmlwriter->setIndent(true);
- $xmlwriter->startDocument('1.0', 'UTF-8');
- $xmlwriter->startElement('sphinx:docset');
- $xmlwriter->startElement('sphinx:schema');
- $xmlwriter->startElement('sphinx:field');
- $xmlwriter->writeAttribute("name", "name");
- $xmlwriter->endElement();
- $xmlwriter->startElement('sphinx:field');
- $xmlwriter->writeAttribute("name", "addresses");
- $xmlwriter->endElement();
- $xmlwriter->endElement();
- foreach($data as $id => $member) {
- $xmlwriter->startElement('sphinx:document');
- $xmlwriter->writeAttribute("id", $id);
- $xmlwriter->startElement('name');
- $xmlwriter->text($member['member']['name']);
- $xmlwriter->endElement();
- $xmlwriter->startElement('addresses');
- $buffer = '';
- foreach($member['addresses'] as $address) {
- $buffer .= $address['address'] . ' ' . $address['city'] . ' ' . $address['zipcode'] . ' ';
- }
- $xmlwriter->text($buffer);
- $xmlwriter->endElement();
- $xmlwriter->endElement();
- }
- $xmlwriter->endElement();
- print $xmlwriter->flush();
Above php script should output xml similar to this.. (you will get different set of data)
- <?xml version="1.0" encoding="UTF-8"?>
- <sphinx:docset>
- <sphinx:schema>
- <sphinx:field name="name"/>
- <sphinx:field name="addresses"/>
- </sphinx:schema>
- <sphinx:document id="1">
- <name>Abbas Ali</name>
- <addresses>Shantinagar Nagpur 440002 Itwari Arcade Nagpur 440001 </addresses>
- </sphinx:document>
- <sphinx:document id="2">
- <name>Amit Badkas</name>
- <addresses>Shivajinagar Dharampeth Nagpur 440024 wadi amravati 440021 </addresses>
- </sphinx:document>
- </sphinx:docset>
Now to run the indexer issue the following command
/usr/local/sphinx/bin/indexer –all
This will create indexes for all the index defined in sphinx configuration file.
To test whether the index got created or not issue the following command
/usr/local/sphinx/bin/search searchterm
Replace searchterm with actual term and it should output the results.
In the next post, we will see how to search the index from a php script.
Update: Part 2 – Searching from PHP

i would like to suggest that you must have some useful and good looking design for your blog but any way is that very informative posting i have learn a lot from this post thanks for sharing.
hi could you please finish part 2? I have everything working up untill this point and im not sure how to get my search form to work with this. this is a great tutorial btw!
@cosmic: I intend to do the part 2 as soon as possible. Please subscribe to the blog so that you get a notification as soon as I post it.
Thanks for this short yet very informative tutorial. You’ve unlocked my understanding of the xmlpipe2 process.
I’ve never used xmlWriter so this might be a typical newbie question but i was wondering if instead of using the text() method, one should use the writeCData() method ?
Hi, Enjoying the tutorial, still trying to get my head around what exactly is going on with your conf file.
I started out with the Sphinx test and created the quick start demo but your instruction to create a conf in the same dir is a little confusing. Or maybe it is just not explicit enough for me, being a new kid on the block.
The conf resides in /usr/local/sphinx/ by default and has all options commented out, are you saying to add your config content to the file or replace the content with yours?
If so how does it know the DB connection etc, don’t tell me I think I know, From the PHP file.
Can have multiple config files residing in projects folders?
@Chris: For the above example, you should replace the contents of conf file with what I have shown above.
When using xmlpipe2 datasource – db connection is established in the php script which streams the xml so no question of defining the db connection in conf file.
Yes multiple config files can reside in projects folders and you need to use “-c” option while indexing (and starting searchd) to specify which config file to use.
How access/retrieve file hash that have been indexing with Sphinx?
@Dian, can you rephrase your question? Didn’t understand what you are asking.
You have to add Namespace to root element:
Why on earth would you write a whole XML script for your data when Sphinx can index data from MySQL directly?!
Madness…
@Jason SQL datasource has been discussed here – http://www.sanisoft.com/blog/2010/08/23/sql-data-source-in-sphinx/. The reason XML datasource is mentioned here because Sphinx supports it
.
detail :the usage of sphinx 2.0.3 cpu is always high level(50-90%),and the number of processes are about 1500. i want to know why? it lead to the results of search can not dispaly sometimes. i guess it is because that the numbers of process are too large. maybe it is the problem of setting? my server is 16 nucleus, 24m internal memory