Integrating Sphinx into Perl Applications

Sphinx is a full-text search engine (http://www.sphinxsearch.com) designed
primarily for full-text search of database content.  It has many features but in
my opinion its best assets are speed of search and scalability.

We started using Sphinx when MySQL's built-in full-text search was becoming too
slow and too CPU intensive, and its accuracy was questionable.  Sphinx is lightning
fast compared to MySQL and returns more relevant results.

This note is about integration with the standalone Sphinx search server. Sphinx
also has a component ('SphinxSE') that runs as a MySQL 5 storage engine, so it can
be used as a direct replacement for MySQL full-text search; to use SphinxSE,
standard Perl DBI should be all that is necessary.

What you will need:

The following CPAN modules are likely to be useful:

Sphinx::Search
Sphinx::Manager
Sphinx::Config

Sphinx::Manager provides facilities to start and stop the search server and to
run the indexer.

Sphinx::Search provides the search API.

Sphinx::Config allows you to read/write the Sphinx configuration files from
code, in case you wish to maintain the configuration elsewhere (e.g. in your
database).

Putting it all together:

Running the Sphinx searchd server

Sphinx operates most efficiently if it is allowed to run persistently as a
background service.  Theoretically, you could start the Sphinx server, do a
search and then stop it on every request, with a small amount of overhead – but
here we will consider just the typical case.

Ideally you will use operating system tools such as daemontools, monit or just
the SysV startup scripts to start and monitor searchd, rather than having to
worry about it in your Perl app.  But if you need or want to start it from
Perl:

  use Sphinx::Manager;
  my $mgr = Sphinx::Manager->new({ config_file => '/etc/sphinx.conf' });
  $mgr->start_searchd;

You should verify that the effective UID of your Perl app has all of the appropriate
permissions:

  • to create and write to the PID file (see 'searchd' section of config, 'pid_file')
  • to create and write to the log file (see 'searchd'/'log')
  • to read the Sphinx database files ('path' in each of your 'index' specifications)
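
To make it clearer which settings the list above refers to, here is a minimal
sketch of the relevant parts of a sphinx.conf.  The directive names ('port',
'log', 'pid_file', 'path') are the ones mentioned in this note; the file paths
and the names 'myindex' and 'mysource' are placeholders I made up for
illustration, so substitute your own.

```
searchd
{
    port     = 3312
    # the app's effective UID must be able to create and write these two files
    log      = /var/log/sphinx/searchd.log
    pid_file = /var/run/sphinx/searchd.pid
}

index myindex
{
    source = mysource
    # the app's effective UID must be able to read the index files stored here
    path   = /var/lib/sphinx/myindex
}
```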

Adding Content to the Index

  use Sphinx::Manager;
  my $mgr = Sphinx::Manager->new({ config_file => '/etc/sphinx.conf' });
  $mgr->run_indexer('--rotate');

Sphinx gets its content for indexing directly from the database, according to
the 'sql_query' given in the config file.  'run_indexer' simply runs the command
line version of the Sphinx indexer program.  You can pass any indexer arguments
through to 'run_indexer'; '--rotate' is typical, to make searchd switch to
the newly created index without disrupting searches while indexing is
in progress.
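
For reference, a 'source' section with an 'sql_query' might look roughly like
the sketch below.  The connection details, the source name 'mysource' and the
table and column names are all hypothetical; the one firm requirement is that
the first column returned by the query is the unique integer document ID.

```
source mysource
{
    type      = mysql
    sql_host  = localhost
    sql_user  = sphinx
    sql_pass  = secret
    sql_db    = myapp
    # first column is the document ID; the remaining columns are indexed as text
    sql_query = SELECT id, title, body FROM articles
}
```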

Searching

Make sure you have a version of Sphinx::Search that is compatible with searchd.
A compatibility list is given at the top of the Sphinx::Search perldoc.
Hopefully a point will be reached where the Sphinx::Search client can support a
range of searchd versions, but for the moment that is impractical.

Sphinx::Search can be used with any logging object that supports error, warn,
info and debug methods.  In this example I have used Log::Log4perl.

  use Sphinx::Search;
  use Log::Log4perl qw(:easy);
  Log::Log4perl->easy_init($DEBUG);
  # The constructor takes its options as a hash reference
  my $sph = Sphinx::Search->new({ log => Log::Log4perl->get_logger('sphinx.search') });
  my $results = $sph->SetMatchMode(SPH_MATCH_ALL)
                    ->Query("...");
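
Once you have $results, you will usually want to walk through the matches.  The
sketch below assumes the structure described in the Sphinx::Search perldoc: a
hash reference with 'total', 'total_found', 'time' and a 'matches' array whose
entries carry 'doc' and 'weight', or undef if the query failed.  The helper name
'summarize_results' and the sample data are my own inventions, used here so the
example runs without a searchd; check the perldoc for your version before
relying on the key names.

```perl
use strict;
use warnings;

# Build a short text summary from a result structure shaped like the
# hash reference returned by Sphinx::Search's Query method.
# Assumed keys: total, total_found, time, matches => [ { doc, weight }, ... ]
sub summarize_results {
    my ($results) = @_;

    # Query is assumed to return undef on failure
    return "search failed\n" unless defined $results;

    my @lines = sprintf("%d of %d matches in %s sec",
        $results->{total}, $results->{total_found}, $results->{time});

    for my $match (@{ $results->{matches} || [] }) {
        push @lines, sprintf("doc=%d weight=%d", $match->{doc}, $match->{weight});
    }
    return join("\n", @lines) . "\n";
}

# Hypothetical sample data standing in for a real Query() result
my $sample = {
    total       => 2,
    total_found => 2,
    time        => '0.002',
    matches     => [
        { doc => 12, weight => 3 },
        { doc => 47, weight => 1 },
    ],
};

print summarize_results($sample);
```

In a real app you would pass the return value of Query straight to the helper,
and look up the 'doc' IDs in your database to fetch the rows to display.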

Configuring

Sphinx::Config provides the tools to read and write the Sphinx configuration file.

A typical problem is that searchd is running on a non-standard port (the default
is 3312), so how will your Perl app know where to find it?  Obviously you don't
want to hard-code port numbers in case they change…

  use Sphinx::Search;
  use Sphinx::Config;
  use Log::Log4perl qw(:easy);

  Log::Log4perl->easy_init($DEBUG);

  # The constructor takes its options as a hash reference
  my $sph = Sphinx::Search->new({ log => Log::Log4perl->get_logger('sphinx.search') });

  # Get port from config file
  my $conf = Sphinx::Config->new;
  $conf->parse('/etc/sphinx.conf');
  my $port = $conf->get('searchd', undef, 'port');

  # Tell Sphinx client
  $sph->SetServer('localhost', $port);

  my $results = $sph->Query("…");


Enjoy

We have had a considerable amount of success using Perl and Sphinx.  I hope you
do too.

