<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Jon Schutz Technical Notes and Recommendations &#187; Sphinx Search Engine</title>
	<atom:link href="http://notes.jschutz.net/topics/sphinx-search-engine/feed/" rel="self" type="application/rss+xml" />
	<link>http://notes.jschutz.net</link>
	<description>Useful snippets, technical info and recommendations</description>
	<lastBuildDate>Thu, 24 Jun 2010 07:07:40 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Sphinx Search Engine Performance</title>
		<link>http://notes.jschutz.net/2009/04/sphinx-search-engine-performance/</link>
		<comments>http://notes.jschutz.net/2009/04/sphinx-search-engine-performance/#comments</comments>
		<pubDate>Thu, 16 Apr 2009 06:05:41 +0000</pubDate>
		<dc:creator>jon</dc:creator>
				<category><![CDATA[Sphinx Search Engine]]></category>

		<guid isPermaLink="false">http://notes.jschutz.net/?p=87</guid>
		<description><![CDATA[The following is a summary of some real-world data collected from the Sphinx query logs on a cluster of 15 servers.  Each server runs its own copy of Sphinx, Apache, a busy web application, MySQL and miscellaneous services.  
The dataset contains 453 million query log instances from 180 Sphinx indexes, collected over several [...]]]></description>
			<content:encoded><![CDATA[<p>The following is a summary of some real-world data collected from the Sphinx query logs on a cluster of 15 servers.  Each server runs its own copy of Sphinx, Apache, a busy web application, MySQL and miscellaneous services.  </p>
<p>The dataset contains 453 million query log instances from 180 Sphinx indexes, collected over several months, using Sphinx version 0.9.8 on Linux kernel 2.6.18.  The servers are all Dell PowerEdge 1950 with Quad Core Intel® Xeon® E5335, 2&#215;4MB Cache, 2.0GHz, 1333MHz FSB, SATA drives, 7200rpm.</p>
<p>Keep in mind, though, that this is real world data and <strong>not a controlled test</strong>.  This is how Sphinx performed in our environment, for the particular way we use Sphinx.</p>
<p>The graph below displays the response time distribution for all servers and all indexes, and shows, for example, that 60% of queries complete within 0.01 sec, 80% within 0.1 sec and 99% within 0.5 sec.  Response times tend to fall into three bands (corresponding to the peaks in the frequency graph): &lt;0.001 sec, 0.03 sec and 0.3 sec, which partly reflects the number of disk accesses required to fulfil a request.  At 0.001 sec, all data is in memory, while at 0.3 sec, several disk accesses are occurring.  Whilst the middle peak is not so obvious in this graph, the per-server and per-index graphs often have different distributions but still tend to peak at one or more of these three bands.<br />
<img src="http://notes.jschutz.net/wp-content/uploads/2009/04/rt_total.png" alt="Sphinx Query Response Times Total for all servers, all indexes" title="Sphinx Query Response Times Total for all servers, all indexes" width="500" class="aligncenter size-full wp-image-91" /></p>
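<p>If you want to produce a similar cumulative summary from your own logs, the following is a minimal Perl sketch, not the script used for the graph above. It assumes the plain-text query log format described in the Sphinx 0.9.8 documentation, one line per query, e.g. <code>[Fri Jun 29 21:17:58 2007] 0.004 sec [all/0/rel 35254 (0,20)] [lj] test</code>. The subroutine names and the script name <code>rt_summary.pl</code> are hypothetical.</p>

```perl
use strict;
use warnings;

# Extract the query time (seconds) from one query log line, assuming the
# 0.9.8 plain-text format: "[date] 0.004 sec [mode/filters/sort ...] [index] query".
# Returns undef for lines that do not match.
sub query_time {
    my ($line) = @_;
    return $line =~ /^\[[^\]]*\]\s+(\d+(?:\.\d+)?)\s+sec\b/ ? $1 : undef;
}

# For each threshold (in seconds), compute the fraction of queries that
# completed within it; returns a hashref keyed by threshold.
sub fraction_within {
    my ($times, @thresholds) = @_;
    my %frac;
    for my $t (@thresholds) {
        my $n = grep { $_ <= $t } @$times;    # scalar grep = count
        $frac{$t} = @$times ? $n / @$times : 0;
    }
    return \%frac;
}

# Usage: perl rt_summary.pl /path/to/query.log
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "cannot open $ARGV[0]: $!";
    my @times = grep { defined } map { query_time($_) } <$fh>;
    close $fh;
    my $frac = fraction_within(\@times, 0.001, 0.01, 0.1, 0.5);
    printf "%-6s sec: %5.1f%%\n", $_, 100 * $frac->{$_}
        for sort { $a <=> $b } keys %$frac;
}
```

<p>Lines that do not match the assumed format are simply skipped, so check the match count against your log size before trusting the percentages.</p>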
<p>The next observation is that query word count affects performance, but not necessarily in proportion to the number of query words, as shown in the graph below.  Queries of 1-4 words consistently offer the best performance.  The 6-50 word range is consistently the slowest, most likely because documents matching several of the query words are likely to be found, so there is extra ranking effort involved.  Above 50 words, there is presumably a higher chance of including words with few matches, which speeds up the ranking process.<br />
<img src="http://notes.jschutz.net/wp-content/uploads/2009/04/wc_rt_total.png" alt="Sphinx Query Response Time by Query Word Count" title="Sphinx Query Response Time by Query Word Count" width="500" class="aligncenter size-full wp-image-93" /></p>
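<p>A breakdown by word count can be sketched the same way. This hypothetical Perl sketch again assumes the Sphinx 0.9.8 plain-text query log format, treats the last field of each line as the query text, and counts whitespace-separated tokens as words (so any query-syntax operators count as words too). The bands follow the ranges discussed above, with 5-word queries kept separate because the text does not place them.</p>

```perl
use strict;
use warnings;

# Parse "[date] 0.004 sec [all/0/rel 35254 (0,20)] [index] query words"
# (0.9.8 format, assumed). Returns (seconds, query_text), or () on no match.
sub parse_line {
    my ($line) = @_;
    return $line =~ /^\[[^\]]*\]\s+(\d+(?:\.\d+)?)\s+sec\s+\[[^\]]*\]\s+\[[^\]]*\]\s?(.*)$/
        ? ($1, $2) : ();
}

# Word-count band for a query: 1-4, 5, 6-50 or 51+.
sub word_band {
    my ($query) = @_;
    my @words = split ' ', $query;    # split ' ' ignores leading whitespace
    my $n = @words;
    return $n == 0  ? 'empty'
         : $n <= 4  ? '1-4'
         : $n == 5  ? '5'
         : $n <= 50 ? '6-50'
         :            '51+';
}

# Median of a list of numbers (mean of the two middle values for even counts).
sub median {
    my @s = sort { $a <=> $b } @_;
    return unless @s;
    my $mid = int(@s / 2);
    return @s % 2 ? $s[$mid] : ($s[$mid - 1] + $s[$mid]) / 2;
}

# Usage: perl wc_summary.pl /path/to/query.log
if (@ARGV) {
    my %times;    # band => [ seconds, ... ]
    open my $fh, '<', $ARGV[0] or die "cannot open $ARGV[0]: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my ($secs, $query) = parse_line($line) or next;
        push @{ $times{ word_band($query) } }, $secs;
    }
    close $fh;
    printf "%-6s n=%-8d median=%.4f sec\n",
        $_, scalar @{ $times{$_} }, median(@{ $times{$_} })
        for sort keys %times;
}
```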
<p>Finally, we see that the size of the inverted index (.spd files) also affects performance.  The three graphs below show how the response time distribution tends to move to the right as the index size increases.  The larger the index, the higher the chance that data will need to be re-read from disk (rather than from Sphinx-internal or system buffers/cache), hence this is not unexpected.<br />
<img src="http://notes.jschutz.net/wp-content/uploads/2009/04/index-rt-cumulative1.png" alt="Sphinx Query Response Times for Index Sizes 1MB - 3MB" title="Sphinx Query Response Times for Index Sizes 1MB - 3MB" width="500" class="aligncenter size-full wp-image-88" /><br />
<img src="http://notes.jschutz.net/wp-content/uploads/2009/04/index-rt-cumulative2.png" alt="Sphinx Query Response Times for Index Sizes 3MB - 30MB" title="Sphinx Query Response Times for Index Sizes 3MB - 30MB" width="500" class="aligncenter size-full wp-image-89" /><br />
<img src="http://notes.jschutz.net/wp-content/uploads/2009/04/index-rt-cumulative3.png" alt="Sphinx Query Response Times for Index Sizes &gt;30MB" title="Sphinx Query Response Times for Index Sizes &gt;30MB" width="500" class="aligncenter size-full wp-image-90" /></p>
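<p>To group your own indexes into the same size bands, a hypothetical sketch in Perl is shown below. It assumes that 1MB means 2<sup>20</sup> bytes and guesses at the boundary handling (exactly 3MB goes into the middle band, exactly 30MB stays in it); adjust both if your grouping differs. Paths passed on the command line are up to you.</p>

```perl
use strict;
use warnings;

my $MB = 1024 * 1024;    # assumption: "MB" means 2**20 bytes

# Classify an inverted-index (.spd) file size in bytes into the bands
# used in the three graphs above (plus one for indexes under 1MB).
sub size_band {
    my ($bytes) = @_;
    return $bytes < 1 * $MB   ? 'under 1MB'
         : $bytes < 3 * $MB   ? '1MB-3MB'
         : $bytes <= 30 * $MB ? '3MB-30MB'
         :                      'over 30MB';
}

# Usage: perl spd_bands.pl /path/to/sphinx/data/*.spd
for my $file (@ARGV) {
    next unless -f $file;
    my $bytes = -s _ || 0;    # -s returns false for empty files
    printf "%-10s %s\n", size_band($bytes), $file;
}
```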
<p>Here is a <a href="http://notes.jschutz.net/wp-content/uploads/2009/04/sphinx-performance.pdf">PDF summary of Sphinx performance</a> for this dataset, including many additional graphs of the data by server and by index.</p>
]]></content:encoded>
			<wfw:commentRss>http://notes.jschutz.net/2009/04/sphinx-search-engine-performance/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Integrating Sphinx into Perl Applications</title>
		<link>http://notes.jschutz.net/2008/11/integrating-sphinx-into-perl-applications/</link>
		<comments>http://notes.jschutz.net/2008/11/integrating-sphinx-into-perl-applications/#comments</comments>
		<pubDate>Mon, 03 Nov 2008 00:27:17 +0000</pubDate>
		<dc:creator>jon</dc:creator>
				<category><![CDATA[Perl]]></category>
		<category><![CDATA[Sphinx Search Engine]]></category>

		<guid isPermaLink="false">http://notes.jschutz.net/?p=18</guid>
		<description><![CDATA[Sphinx is a full-text search engine (http://www.sphinxsearch.com) designed
primarily for full-text search of database content.  It has many features but in
my opinion its best assets are speed of search and scalability.
We started using Sphinx when MySQL built-in full-text search was becoming too
slow and too CPU intensive, and of questionable accuracy.  Sphinx is lightning
fast compared to MySQL [...]]]></description>
			<content:encoded><![CDATA[<p>Sphinx is a full-text search engine (http://www.sphinxsearch.com) designed<br />
primarily for full-text search of database content.  It has many features but in<br />
my opinion its best assets are speed of search and scalability.</p>
<p>We started using Sphinx when MySQL&#8217;s built-in full-text search was becoming too slow, too CPU-intensive and of questionable accuracy.  Sphinx is lightning fast compared to MySQL and returns more relevant results.</p>
<p>This note is about integration with the standalone Sphinx search server.  Sphinx also has a component (&#8216;SphinxSE&#8217;) that runs as a MySQL 5 storage engine, so it can be used as a direct replacement for MySQL full-text search; to use SphinxSE, standard Perl DBI should be all that is necessary.</p>
<h2>What you will need:</h2>
<p>The following CPAN modules are likely to be useful:</p>
<ul>
<li>Sphinx::Search</li>
<li>Sphinx::Manager</li>
<li>Sphinx::Config</li>
</ul>
<p>Sphinx::Manager provides facilities to start and stop the search server and to run the indexer.</p>
<p>Sphinx::Search provides the search API.</p>
<p>Sphinx::Config allows you to read and write the Sphinx configuration file from code, in case you wish to maintain the configuration elsewhere (e.g. in your database).</p>
<h2>Putting it all together:</h2>
<h3>Running the Sphinx searchd server</h3>
<p>Sphinx operates most efficiently if it is allowed to run persistently as a background service.  Theoretically, you could start the Sphinx server, do a search and then stop it on every request, with a small amount of overhead &#8211; but here we will consider just the typical case.</p>
<p>Ideally you will use operating system tools such as daemontools, monit or just the SysV startup scripts to start and monitor searchd, rather than having to worry about it in your perl app.  But if you need or want to start it in perl:</p>
<pre>  use Sphinx::Manager;

  my $mgr = Sphinx::Manager-&gt;new({ config_file =&gt; '/etc/sphinx.conf' });
  $mgr-&gt;start_searchd;</pre>
<p>You should verify that the effective UID of your perl app has all of the appropriate permissions:</p>
<ul>
<li>to create and write to the PID file (see the &#8216;searchd&#8217; section of the config, &#8216;pid_file&#8217;)</li>
<li>to create and write to the log file (see &#8216;searchd&#8217;/&#8216;log&#8217;)</li>
<li>to read the Sphinx database files (&#8216;path&#8217; in each of your &#8216;index&#8217; specifications)</li>
</ul>
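<p>The three checks above can be automated from inside the app.  The following is a minimal core-Perl sketch, not part of Sphinx::Manager; the subroutine name is hypothetical, and the paths in the usage section are illustrative only (take the real ones from your sphinx.conf).  It relies on Perl&#8217;s <code>-r</code> and <code>-w</code> file tests, which check the effective UID/GID, and it assumes each index&#8217;s files share the &#8216;path&#8217; prefix with an extension appended.  Note that when running as root these tests will almost always succeed.</p>

```perl
use strict;
use warnings;
use File::Basename qw(dirname);

# Check the permissions listed above for the current process.
# Arguments mirror sphinx.conf settings: pid_file and log (searchd section)
# and index_paths (the 'path' prefix of each index).
# Returns a list of problems; an empty list means every check passed.
sub check_sphinx_permissions {
    my (%arg) = @_;
    my @problems;
    for my $key (qw(pid_file log)) {
        my $file = $arg{$key} or next;
        if (-e $file) {
            push @problems, "$key '$file' is not writable" unless -w $file;
        }
        else {
            # File does not exist yet: searchd must be able to create it.
            my $dir = dirname($file);
            push @problems, "$key directory '$dir' is not writable"
                unless -d $dir && -w $dir;
        }
    }
    for my $prefix (@{ $arg{index_paths} || [] }) {
        # Assumes index files are named PATH.spd, PATH.spi, etc., and that the
        # prefix contains no whitespace or glob metacharacters.
        my @files = glob("$prefix.*");
        push @problems, "no index files found for '$prefix'" unless @files;
        push @problems, "index file '$_' is not readable" for grep { !-r } @files;
    }
    return @problems;
}

# Usage (illustrative paths): perl check_perms.pl /var/data/sphinx/myindex
if (@ARGV) {
    my @problems = check_sphinx_permissions(
        pid_file    => '/var/run/searchd.pid',
        log         => '/var/log/sphinx/searchd.log',
        index_paths => [@ARGV],
    );
    if (@problems) { print "$_\n" for @problems }
    else           { print "all permission checks passed\n" }
}
```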
<h3>Adding Content to the Index</h3>
<pre>  use Sphinx::Manager;

  my $mgr = Sphinx::Manager-&gt;new({ config_file =&gt; '/etc/sphinx.conf' });
  $mgr-&gt;run_indexer('--rotate');</pre>
<p>Sphinx gets its content for indexing directly from the database, according to the &#8216;sql_query&#8217; given in the config file.  &#8216;run_indexer&#8217; simply runs the command-line version of the Sphinx indexer program.  You can pass any indexer arguments through to &#8216;run_indexer&#8217;; &#8216;--rotate&#8217; is typical, to force searchd to start using the newly created index without disrupting searches while indexing is occurring.</p>
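<p>For comparison, here is a rough sketch of running the indexer yourself with core Perl.  It is not how Sphinx::Manager implements &#8216;run_indexer&#8217;; it assumes the <code>indexer</code> binary is on your PATH and uses its <code>--config</code>, <code>--all</code> and <code>--rotate</code> options.  Both subroutine names are hypothetical.</p>

```perl
use strict;
use warnings;

# Build the argument list for the Sphinx command-line indexer.
# Extra arguments (index names, --all, --rotate, ...) are passed through unchanged.
sub indexer_command {
    my ($config_file, @args) = @_;
    return ('indexer', '--config', $config_file, @args);
}

# Run it using the list form of system(), which bypasses the shell.
sub run_indexer_directly {
    my @cmd = indexer_command(@_);
    if (system(@cmd) != 0) {
        die $? == -1
            ? "could not run '$cmd[0]': $!\n"
            : "indexer exited with status " . ($? >> 8) . "\n";
    }
    return 1;
}

# Example: rebuild every index and let searchd switch over via --rotate.
# run_indexer_directly('/etc/sphinx.conf', '--all', '--rotate');
```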
<h3>Searching</h3>
<p>Make sure you have a version of Sphinx::Search that is compatible with your searchd.  A compatibility list is given at the top of the Sphinx::Search perldoc.  Hopefully a point will be reached where the Sphinx::Search client can support a range of searchd versions, but for the moment that is impractical.</p>
<p>Sphinx::Search can be used with any logging object that supports error, warn, info and debug methods.  In this example I have used Log::Log4perl.</p>
<pre>  use Sphinx::Search;
  use Log::Log4perl qw(:easy);

  Log::Log4perl-&gt;easy_init($DEBUG);

  my $sph = Sphinx::Search-&gt;new({ log =&gt; Log::Log4perl-&gt;get_logger('sphinx.search') });

  my $results = $sph-&gt;SetMatchMode(SPH_MATCH_ALL)
                    -&gt;Query("...");</pre>
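<p>Because Sphinx::Search returns undef when a query fails, it is worth checking the result before using it.  The helper below is a hypothetical sketch; it assumes the result layout described in the Sphinx::Search documentation (a hashref with &#8216;total_found&#8217;, &#8216;time&#8217; and &#8216;matches&#8217;, where each match carries &#8216;doc&#8217; and &#8216;weight&#8217;), so check that against the version you have installed.</p>

```perl
use strict;
use warnings;

# Summarize a Sphinx::Search result hashref as text.
# On failure Query() returns undef; pass the client's GetLastError() text as $error.
# Assumed layout: total_found, time, and matches => [ { doc, weight, ... }, ... ].
sub summarize_results {
    my ($results, $error) = @_;
    return "search failed: " . (defined $error ? $error : 'unknown error')
        unless defined $results;
    my @lines = (sprintf "%d documents found in %.3f sec",
                 $results->{total_found}, $results->{time});
    for my $match (@{ $results->{matches} || [] }) {
        push @lines, sprintf("  doc %d (weight %d)", $match->{doc}, $match->{weight});
    }
    return join "\n", @lines;
}

# With the client from the example above:
#   print summarize_results($results, $sph->GetLastError), "\n";
```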
<h3>Configuring</h3>
<p>Sphinx::Config provides the tools to read and write the Sphinx configuration file.</p>
<p>A typical problem is that searchd is running on a non-standard port (the default is 3312), so how will your perl app know where to find it?  Obviously you don&#8217;t want to hard-code port numbers in case they change&#8230;</p>
<pre>  use Sphinx::Search;
  use Sphinx::Config;
  use Log::Log4perl qw(:easy);

  Log::Log4perl-&gt;easy_init($DEBUG);

  my $sph = Sphinx::Search-&gt;new({ log =&gt; Log::Log4perl-&gt;get_logger('sphinx.search') });

  # Get port from config file
  my $conf = Sphinx::Config-&gt;new;
  $conf-&gt;parse('/etc/sphinx.conf');
  my $port = $conf-&gt;get('searchd', undef, 'port');

  # Tell Sphinx client
  $sph-&gt;SetServer('localhost', $port);

  my $results = $sph-&gt;Query("...");</pre>
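<p>If Sphinx::Config is not available, a deliberately naive core-Perl fallback can pull the port out of the searchd section.  This is a hypothetical sketch, not a replacement for Sphinx::Config: it assumes one setting per line, no nested braces and no backslash-continued values, and falls back to the default port 3312 when no &#8216;port&#8217; setting is found.</p>

```perl
use strict;
use warnings;

# Find "port = N" inside the searchd { ... } block of a sphinx.conf
# supplied as a string; return 3312 (the default) if none is set.
sub searchd_port_from_text {
    my ($text) = @_;
    if ($text =~ /^\s*searchd\s*\{(.*?)^\s*\}/ms) {
        my $block = $1;
        return $1 if $block =~ /^\s*port\s*=\s*(\d+)\s*$/m;
    }
    return 3312;
}

# Usage: perl searchd_port.pl /etc/sphinx.conf
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "cannot open $ARGV[0]: $!";
    my $text = do { local $/; <$fh> };    # slurp the whole file
    close $fh;
    print searchd_port_from_text($text), "\n";
}
```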
<h3>Enjoy</h3>
<p>We have had a considerable amount of success using Perl and Sphinx.  I hope you do too.</p>
]]></content:encoded>
			<wfw:commentRss>http://notes.jschutz.net/2008/11/integrating-sphinx-into-perl-applications/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Sphinx::Search 0.08 released to CPAN</title>
		<link>http://notes.jschutz.net/2007/11/sphinxsearch-008-released-to-cpan/</link>
		<comments>http://notes.jschutz.net/2007/11/sphinxsearch-008-released-to-cpan/#comments</comments>
		<pubDate>Fri, 23 Nov 2007 23:26:13 +0000</pubDate>
		<dc:creator>jon</dc:creator>
				<category><![CDATA[Sphinx Search Engine]]></category>

		<guid isPermaLink="false">http://notes.jschutz.net/10/sphinx-search-engine/sphinxsearch-008-released-to-cpan</guid>
		<description><![CDATA[I have just uploaded to CPAN the latest version of Sphinx::Search, the Perl API for the Sphinx Search Engine.
Search for Sphinx::Search on CPAN to get the latest.
Version 0.08 is suitable for Sphinx 0.9.8-svn-r871 and later (currently r909).  This version fixes a couple of bugs related to error checking.
I have been asked a few [...]]]></description>
			<content:encoded><![CDATA[<p>I have just uploaded to CPAN the latest version of Sphinx::Search, the Perl API for the <a href="http://www.sphinxsearch.com/">Sphinx Search Engine.</a></p>
<p>Search for <a href="http://search.cpan.org/search?query=Sphinx%3A%3ASearch&amp;mode=all">Sphinx::Search</a> on CPAN to get the latest.</p>
<p>Version 0.08 is suitable for Sphinx 0.9.8-svn-r871 and later (currently r909).  This version fixes a couple of bugs related to error checking.</p>
<p>I have been asked a few times what makes Sphinx::Search different from the Perl API that comes bundled in the contrib directory of the Sphinx distribution. The bundled Sphinx.pm was used as the starting point of Sphinx::Search. Maintenance of that version appears to have lapsed at sphinx-0.9.7, so many of the newer API calls are not available there.  Sphinx::Search is mostly compatible with the old Sphinx.pm except:</p>
<ul>
<li>On failure, Sphinx::Search returns undef rather than 0 or -1.</li>
<li>Sphinx::Search &#8216;Set&#8217; functions are cascadable, e.g. you can do<br />
<code>Sphinx::Search-&gt;new -&gt;SetMatchMode(SPH_MATCH_ALL) -&gt;SetSortMode(SPH_SORT_RELEVANCE) -&gt;Query("search terms")</code></li>
<li>Sphinx::Search also provides documentation and unit tests, which were the main motivations for branching from the earlier work.</li>
</ul>
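<p>The first two differences are easiest to see in a toy class.  The package below is hypothetical and is not Sphinx::Search; it borrows the SetMatchMode/SetSortMode/Query/GetLastError names only to illustrate the two conventions: each Set method returns the object itself so calls can be chained, and Query returns undef (rather than 0 or -1) on failure.</p>

```perl
use strict;
use warnings;

# Hypothetical demo class, not Sphinx::Search itself.
package Demo::SearchClient;

sub new {
    my ($class) = @_;
    return bless { match_mode => 'all', sort_mode => 'relevance', error => '' }, $class;
}

# Setters return $self, which is what makes chaining possible.
sub SetMatchMode { my ($self, $mode) = @_; $self->{match_mode} = $mode; return $self }
sub SetSortMode  { my ($self, $mode) = @_; $self->{sort_mode}  = $mode; return $self }

sub Query {
    my ($self, $terms) = @_;
    unless (defined $terms && length $terms) {
        $self->{error} = 'empty query';
        return;    # undef in scalar context signals failure
    }
    return { query => $terms, match_mode => $self->{match_mode}, sort_mode => $self->{sort_mode} };
}

sub GetLastError { $_[0]{error} }

package main;

my $client  = Demo::SearchClient->new;
my $results = $client->SetMatchMode('any')->SetSortMode('date')->Query('search terms');
if (defined $results) { print "match mode: $results->{match_mode}\n" }
else                  { print "error: ", $client->GetLastError, "\n" }
```

<p>Testing with <code>defined</code> rather than truth is what the undef convention buys you: a caller never has to remember whether 0 or -1 means failure for a particular method.</p>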
<p>Sphinx has proven to be a far more efficient, higher-quality search engine than the built-in MySQL full-text search.  It is an order of magnitude faster for large data sets and provides better options for controlling search result relevancy.</p>
]]></content:encoded>
			<wfw:commentRss>http://notes.jschutz.net/2007/11/sphinxsearch-008-released-to-cpan/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
