Sphinx search engine and PHP (Part 1 – Installation and Indexing)

In this two-part series we will see how to install Sphinx, prepare the index, and then search the index from our PHP scripts.

Sphinx is a free, open source SQL full-text search engine. For those who might confuse it with the Great Sphinx of Giza, Sphinx is an acronym for SQL Phrase Index. Some key features of this search engine (from the official site) are:

  • high indexing speed (up to 10 MB/sec on modern CPUs)
  • high search speed (avg query is under 0.1 sec on 2-4 GB text collections)
  • high scalability (up to 100 GB of text, up to 100 M documents on a single CPU)
  • supports distributed searching (since v.0.9.6)
  • supports MySQL natively (MyISAM and InnoDB tables are both supported)
  • supports phrase searching
  • supports phrase proximity ranking, providing good relevance
  • supports English and Russian stemming
  • supports any number of document fields (weights can be changed on the fly)
  • supports document groups
  • supports stopwords
  • supports different search modes (“match all”, “match phrase” and “match any” as of v.0.9.5)
  • generic XML interface which greatly simplifies custom integration
  • pure-PHP (i.e. no module compiling etc.) search client API

The instructions given should work with most systems. I have used the following:

  • Sphinx 0.9.9
  • Kubuntu 9.10 (Any *nix OS should work)
  • PHP 5.2.10
  • MySQL 5.1.37

Installation

  1. As with any other application, you first need to download and extract the latest Sphinx tarball.
  2. Go to the root directory of the extracted source and run the ./configure command. I used the --prefix=/usr/local/sphinx option to keep all Sphinx-related files in a single directory. Another important option is --with-mysql, which specifies the directory to search for the MySQL include and library files. Use it only if auto-detection fails.
  3. Build and install by running make followed by make install.
  4. When you are done, all the binaries should be in the /usr/local/sphinx/bin directory.

Indexing

Sphinx uses a special data structure called an index to answer full-text search queries. To build an index we use the indexer utility, which needs a configuration file telling it what to index and how. Let's see how to do all of this using a real-world example.

Problem:

I have two tables, members and addresses, and each member can have many addresses. Let's create an index that holds member names and all of their addresses so that a full-text search can be performed on any of those fields. The SQL for the tables is:

Now create a Sphinx configuration file, /usr/local/sphinx/etc/sphinx.conf. I have added inline comments to explain each setting.

Note: The example below assumes that you have installed Sphinx with --prefix=/usr/local/sphinx. If not, change the paths accordingly.

Note: For a full list of options that can be specified in sphinx.conf, please see the official Sphinx documentation.

We are using the xmlpipe2 data source type. For this we will create a PHP script, makeindex.php, which writes well-formed XML to stdout in the format the indexer requires. We will specify the schema (i.e. the set of fields and attributes) in the XML itself.

We will fetch the data from the members and addresses tables and build the required XML with PHP 5’s native XMLWriter class. For brevity I am using mysql_* functions without any error trapping; in your actual code you should add all the error checks. The following code is for illustration purposes only.
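As a rough illustration of the XMLWriter approach, here is a minimal sketch of how an xmlpipe2 stream could be generated. It is not the original makeindex.php: to stay self-contained it replaces the mysql_* queries with a hard-coded array, and the function name buildDocset, the array keys (id, name, addresses) and the sample rows are all assumptions made for this example. It is written for current PHP versions (short array syntax and a return type), not PHP 5.2.

```php
<?php
// Sketch only: $members and its keys (id, name, addresses) are hypothetical
// stand-ins for the rows you would fetch from the members and addresses tables.

/**
 * Build an xmlpipe2 document set: one <sphinx:document> per member,
 * with all of that member's addresses joined into a single field.
 */
function buildDocset(array $members): string
{
    $xml = new XMLWriter();
    $xml->openMemory();
    $xml->setIndent(true);
    $xml->startDocument('1.0', 'UTF-8');

    $xml->startElement('sphinx:docset');

    // Declare the schema inline: two full-text fields.
    $xml->startElement('sphinx:schema');
    foreach (['name', 'addresses'] as $field) {
        $xml->startElement('sphinx:field');
        $xml->writeAttribute('name', $field);
        $xml->endElement(); // sphinx:field
    }
    $xml->endElement(); // sphinx:schema

    // One document per member; the id attribute is the document ID.
    foreach ($members as $member) {
        $xml->startElement('sphinx:document');
        $xml->writeAttribute('id', (string) $member['id']);
        $xml->writeElement('name', $member['name']);
        $xml->writeElement('addresses', implode("\n", $member['addresses']));
        $xml->endElement(); // sphinx:document
    }

    $xml->endElement(); // sphinx:docset
    $xml->endDocument();

    return $xml->outputMemory();
}

// Hypothetical sample data in place of a database query.
$members = [
    ['id' => 1, 'name' => 'Jane Doe', 'addresses' => ['12 Example Street', '34 Sample Road']],
    ['id' => 2, 'name' => 'John Roe', 'addresses' => ['56 Demo Avenue']],
];

// The indexer reads the XML from the script's stdout.
echo buildDocset($members);
```

In a real script the array would be filled from the database, with each member's addresses collected into one list before the document is written. XMLWriter's writeElement() escapes characters such as & and < in the text it writes, so the field values need no manual escaping.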

The above PHP script should output XML similar to this (you will get a different set of data):

Now, to run the indexer, issue the following command:

/usr/local/sphinx/bin/indexer --all

This builds every index defined in the Sphinx configuration file.

To test whether the index was created, issue the following command:

/usr/local/sphinx/bin/search searchterm

Replace searchterm with an actual term and it should output the matching results.

In the next post, we will see how to search the index from a php script.

Update: Part 2 – Searching from PHP

About the Author

Abbas Ali is a Mechanical Engineer by education. He turned to programming and took it as a profession just after finishing his studies. He is fascinated equally by both machines and computers. He leads the team of dynamic programmers at SANIsoft and works as a Technology Manager. He is also an active developer on the Coppermine Picture Gallery team.

18 comments

  2. i would like to suggest that you must have some useful and good looking design for your blog but any way is that very informative posting i have learn a lot from this post thanks for sharing.

  3. Hi, could you please finish part 2? I have everything working up until this point and I'm not sure how to get my search form to work with this. This is a great tutorial, by the way!

  5. Thanks for this short yet very informative tutorial. You’ve unlocked my understanding of the xmlpipe2 process. :)

    I’ve never used xmlWriter so this might be a typical newbie question but i was wondering if instead of using the text() method, one should use the writeCData() method ?

  7. Hi, Enjoying the tutorial, still trying to get my head around what exactly is going on with your conf file.

    I started out with the Sphinx test and created the quick start demo but your instruction to create a conf in the same dir is a little confusing. Or maybe it is just not explicit enough for me, being a new kid on the block.

    The conf resides in /usr/local/sphinx/ by default and has all options commented out, are you saying to add your config content to the file or replace the content with yours?

    If so how does it know the DB connection etc, don’t tell me I think I know, From the PHP file.

    Can I have multiple config files residing in project folders?

  8. @Chris: For the above example, you should replace the contents of conf file with what I have shown above.

    When using the xmlpipe2 data source, the DB connection is established in the PHP script that streams the XML, so there is no need to define the DB connection in the conf file.

    Yes, multiple config files can reside in project folders; you need to use the “-c” option while indexing (and when starting searchd) to specify which config file to use.

  11. Detail: with Sphinx 2.0.3, CPU usage is always high (50-90%) and there are about 1500 processes. I want to know why. Because of this, the search results sometimes cannot be displayed. I guess it is because the number of processes is too large. Maybe it is a configuration problem? My server has 16 cores and 24m of memory.

  12. I know I am very late at party but I am hoping you can still help me with the issue I am facing.

    You mentioned that a member can have multiple addresses. I have a similar situation where there can be multiple associations. For example, A can have multiple Bs and B in turn can have multiple Cs and search should return As if anything matches from A and related Bs and Cs.

    I am wondering what’s the most efficient way to index such data. Further, does your example work if there are multiple addresses associated with same member?

    Thanks in anticipation.
