Sphinx search engine and PHP (Part 1 – Installation and Indexing)

In the two part series we will see how to install sphinx, prepare the index and then search the index from our php scripts.

Shpinx is a free open source SQL full-text search engine and for those who might confuse it with the Great Sphinx of Giza,  Sphinx is the acronym for SQL Phrase Index. Some key features of this search engine are (from official site):

  • high indexing speed (upto 10 MB/sec on modern CPUs)
  • high search speed (avg query is under 0.1 sec on 2-4 GB text collections)
  • high scalability (upto 100 GB of text, upto 100 M documents on a single CPU)
  • supports distributed searching (since v.0.9.6)
  • supports MySQL natively (MyISAM and InnoDB tables are both supported)
  • supports phrase searching
  • supports phrase proximity ranking, providing good relevance
  • supports English and Russian stemming
  • supports any number of document fields (weights can be changed on the fly)
  • supports document groups
  • supports stopwords
  • supports different search modes (“match all”, “match phrase” and “match any” as of v.0.9.5)
  • generic XML interface which greatly simplifies custom integration
  • pure-PHP (ie. NO module compiling etc) search client API

The instructions given should work with most systems. I have used the following:

  • Sphinx 0.9.9
  • Kubuntu 9.10 (Any *nix OS should work)
  • PHP 5.2.10
  • MySQL 5.1.37

Installation

  1. As with any other application, you first need to download and extract the latest sphinx tar ball.
  2. Go to the root directory of sphinx and issue the ./configure command. I used –prefix=/usr/local/sphinx option to keep all sphinx related files in a single directory. Other important option is –with-mysql which specifies the directory to look for MySQL include and library files. Use this only if auto detection fails.
  3. Build and install by issuing make followed by make install command.
  4. When you are done, all the binaries should be in /usr/local/sphinx/bin directory.

Indexing

Shpinx uses special data structures called as index to facilitate full-text search queries. To build index we can use the indexer utility. We need a configuration file to tell indexer what and how to index. Lets see how to do all of this using a real world example.

Problem:

I have two tables members and addresses and Each member can have many addresses. Lets create an index to hold member names and all addresses so that full text search can be performed on any of those fields. The SQL for the tables is..

  1. CREATE TABLE IF NOT EXISTS `members` (
  2. `id` int(11) NOT NULL AUTO_INCREMENT,
  3. `name` varchar(50) NOT NULL,
  4. `email` varchar(100) NOT NULL,
  5. `phone` int(10) NOT NULL,
  6. `join_date` date NOT NULL,
  7. PRIMARY KEY (`id`)
  8. ) ;
  9.  
  10. CREATE TABLE IF NOT EXISTS `addresses` (
  11. `id` int(11) NOT NULL AUTO_INCREMENT,
  12. `member_id` int(11) NOT NULL,
  13. `address` varchar(255) NOT NULL,
  14. `city` varchar(50) NOT NULL,
  15. `zipcode` varchar(15) NOT NULL,
  16. PRIMARY KEY (`id`)
  17. ) ;

Now create a sphinx configuration file /usr/local/sphinx/etc/sphinx.conf. I have put inline comments to explain each setting.

Note: The example below assumes that you have installed sphinx with –prefix=/usr/local/shpinx. If not then change the paths accordingly.

  1. # Define the source for the data.
  2. source src1
  3. {
  4.     # We will use the xmlpipe2 datasource
  5.     type  = xmlpipe2
  6.    
  7.     # Command which should be executed to get the xml. The following php script outputs the required xml to stdout
  8.     xmlpipe_command  = php /path/to/makeindex.php
  9. }
  10.  
  11. # Define the index.
  12. index members
  13. {
  14.     # Which source to use
  15.     source = src1
  16.    
  17.     # Path where to store the index data
  18.     path = /usr/local/sphinx/var/data/members
  19.    
  20.     # Charset of the data
  21.     charset_type = utf-8
  22.    
  23.     # Minimum lenght of a word to be indexed.
  24.     min_word_len = 3
  25. }
  26.  
  27. # Indexer definition
  28. indexer
  29. {
  30.     # Memory limit for the indexer
  31.     mem_limit = 32M
  32. }

Note: For a full list of options that can be specified in sphinx.conf, please see this

We are using xmlpipe2 datasource type. For this we will create a php script, makeindex.php, which will output the well-formed xml as required by the indexer to stdout. We will be specifying the schema (i.e. the set of fields and attributes) in the xml itself.

We will get the data from members and addresses table and will create the required xml. We will be using PHP 5′s native xmlWriter class for this purpose. For brevity i am using mysql_* functions without any error trapping. In your actual code you should put all the error checks. Also the following code is just for illustration purposes.

  1. <?php
  2. // File: /path/to/makeindex.php
  3. $dbh = mysql_connect('localhost', 'user', 'password');
  4. mysql_select_db('dbname');
  5.  
  6. $result = mysql_query("SELECT * FROM members");
  7.  
  8. while($member = mysql_fetch_assoc($result)) {
  9.     $data[$member['id']]['member'] = $member;
  10.     $addressResult = mysql_query("SELECT * FROM addresses WHERE member_id = '{$member['id']}'");
  11.     $data[$member['id']]['addresses'] = array();
  12.     while($address = mysql_fetch_assoc($addressResult)) {
  13.         $data[$member['id']]['addresses'][] = $address;
  14.     }
  15. }
  16.  
  17. $xmlwriter = new xmlWriter;
  18. $xmlwriter->openMemory();
  19. $xmlwriter->setIndent(true);
  20. $xmlwriter->startDocument('1.0', 'UTF-8');
  21. $xmlwriter->startElement('sphinx:docset');
  22.  
  23. $xmlwriter->startElement('sphinx:schema');
  24.  
  25. $xmlwriter->startElement('sphinx:field');
  26. $xmlwriter->writeAttribute("name", "name");
  27. $xmlwriter->endElement();
  28.  
  29. $xmlwriter->startElement('sphinx:field');
  30. $xmlwriter->writeAttribute("name", "addresses");
  31. $xmlwriter->endElement();
  32.  
  33. $xmlwriter->endElement();
  34.  
  35. foreach($data as $id => $member) {
  36.     $xmlwriter->startElement('sphinx:document');
  37.     $xmlwriter->writeAttribute("id", $id);
  38.    
  39.     $xmlwriter->startElement('name');
  40.     $xmlwriter->text($member['member']['name']);
  41.     $xmlwriter->endElement();
  42.    
  43.    
  44.     $xmlwriter->startElement('addresses');
  45.     $buffer = '';
  46.     foreach($member['addresses'] as $address) {
  47.         $buffer .= $address['address'] . ' ' . $address['city'] . ' ' . $address['zipcode'] . ' ';
  48.     }
  49.     $xmlwriter->text($buffer);
  50.     $xmlwriter->endElement();
  51.    
  52.     $xmlwriter->endElement();
  53. }
  54. $xmlwriter->endElement();
  55.  
  56. print $xmlwriter->flush();

Above php script should output xml similar to this.. (you will get different set of data)

  1. <?xml version="1.0" encoding="UTF-8"?>
  2. <sphinx:docset>
  3.  <sphinx:schema>
  4.   <sphinx:field name="name"/>
  5.   <sphinx:field name="addresses"/>
  6.  </sphinx:schema>
  7.  <sphinx:document id="1">
  8.   <name>Abbas Ali</name>
  9.   <addresses>Shantinagar Nagpur 440002 Itwari Arcade Nagpur 440001 </addresses>
  10.  </sphinx:document>
  11.  <sphinx:document id="2">
  12.   <name>Amit Badkas</name>
  13.   <addresses>Shivajinagar Dharampeth Nagpur 440024 wadi amravati 440021 </addresses>
  14.  </sphinx:document>
  15. </sphinx:docset>

Now to run the indexer issue the following command

/usr/local/sphinx/bin/indexer –all

This will create indexes for all the index defined in sphinx configuration file.

To test whether the index got created or not issue the following command

/usr/local/sphinx/bin/search searchterm

Replace searchterm with actual term and it should output the results.

In the next post, we will see how to search the index from a php script.

Update: Part 2 – Searching from PHP

About Abbas Ali

Abbas Ali is a Mechanical Engineer by education. He turned to programming and took it as a profession just after finishing his studies. He is fascinated equally by both machines and computers. He leads the team of dynamic programmers at SANIsoft and works as a Technology Manager. He is also an active developer on the Coppermine Picture Gallery team.

17 Responses to Sphinx search engine and PHP (Part 1 – Installation and Indexing)

  1. Template Design January 18, 2010 at 6:31 pm #

    i would like to suggest that you must have some useful and good looking design for your blog but any way is that very informative posting i have learn a lot from this post thanks for sharing.

  2. cosmic January 25, 2010 at 6:39 am #

    hi could you please finish part 2? I have everything working up untill this point and im not sure how to get my search form to work with this. this is a great tutorial btw!

  3. Abbas Ali January 27, 2010 at 8:19 am #

    @cosmic: I intend to do the part 2 as soon as possible. Please subscribe to the blog so that you get a notification as soon as I post it.

  4. chris February 3, 2010 at 7:32 pm #

    Thanks for this short yet very informative tutorial. You’ve unlocked my understanding of the xmlpipe2 process. :)

    I’ve never used xmlWriter so this might be a typical newbie question but i was wondering if instead of using the text() method, one should use the writeCData() method ?

  5. Chris Rymer September 21, 2010 at 10:09 pm #

    Hi, Enjoying the tutorial, still trying to get my head around what exactly is going on with your conf file.

    I started out with the Sphinx test and created the quick start demo but your instruction to create a conf in the same dir is a little confusing. Or maybe it is just not explicit enough for me, being a new kid on the block.

    The conf resides in /usr/local/sphinx/ by default and has all options commented out, are you saying to add your config content to the file or replace the content with yours?

    If so how does it know the DB connection etc, don’t tell me I think I know, From the PHP file.

    Can have multiple config files residing in projects folders?

  6. Abbas Ali September 22, 2010 at 9:56 am #

    @Chris: For the above example, you should replace the contents of conf file with what I have shown above.

    When using xmlpipe2 datasource – db connection is established in the php script which streams the xml so no question of defining the db connection in conf file.

    Yes multiple config files can reside in projects folders and you need to use “-c” option while indexing (and starting searchd) to specify which config file to use.

  7. Dian March 12, 2011 at 7:10 am #

    How access/retrieve file hash that have been indexing with Sphinx?

  8. Abbas Ali March 14, 2011 at 12:00 pm #

    @Dian, can you rephrase your question? Didn’t understand what you are asking.

  9. Juraj Krivda May 18, 2011 at 2:03 pm #

    You have to add Namespace to root element:

  10. Jason November 24, 2011 at 2:33 pm #

    Why on earth would you write a whole XML script for your data when Sphinx can index data from MySQL directly?!

    Madness…

  11. Abbas Ali November 24, 2011 at 3:13 pm #

    @Jason SQL datasource has been discussed here – http://www.sanisoft.com/blog/2010/08/23/sql-data-source-in-sphinx/. The reason XML datasource is mentioned here because Sphinx supports it :) .

  12. mingfeng.wu May 10, 2012 at 11:47 am #

    detail :the usage of sphinx 2.0.3 cpu is always high level(50-90%),and the number of processes are about 1500. i want to know why? it lead to the results of search can not dispaly sometimes. i guess it is because that the numbers of process are too large. maybe it is the problem of setting? my server is 16 nucleus, 24m internal memory

Trackbacks/Pingbacks

  1. abcphp.com - January 11, 2010

    Sphinx search engine and PHP (Part 1 – Installation and Indexing) at SANIsoft – PHP for E Biz…

    Shpinx is a free open source SQL full-text search engine and for those who might confuse it with the Great Sphinx of Giza, Sphinx is the acronym for SQL Phrase Index. Some key features of this search engine are (from official site):…

  2. Sphinx search engine and PHP (Part 2 – Searching from PHP) at SANIsoft – PHP for E Biz - January 29, 2010

    [...] earlier post we saw how to install and configure sphinx. We also created an index for sphinx to search from. Now [...]

  3. SQL data source in Sphinx at SANIsoft – PHP for E Biz - August 23, 2010

    [...] get started with sphinx please read my earlier posts on installation, indexing using xmlpipe data source and [...]

  4. Extended query syntax in Sphinx search engine at SANIsoft – PHP for E Biz - March 7, 2011

    [...] Sphinx search engine TweetWe have been discussing sphinx for the past several months and have seen how to install it, how to perform a full-text search and how to update the [...]

  5. Geo-distance search in Sphinx at SANIsoft – PHP for E Biz - May 2, 2011

    [...] Sphinx makes it very easy to perform geo-distance based searches in such cases. All you need is the latitude and longitude attributes in your Sphinx index. We have already seen how to create Sphinx indexes. [...]

Leave a Reply