<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="http://feeds.feedburner.com/~d/styles/rss2full.xsl" type="text/xsl" media="screen"?><?xml-stylesheet href="http://feeds.feedburner.com/~d/styles/itemcontent.css" type="text/css" media="screen"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>All Notes Technical</title>
	
	<link>http://notes.jschutz.net</link>
	<description>Useful snippets of statistical and other technical info</description>
	<pubDate>Wed, 19 Nov 2008 13:17:54 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.5</generator>
	<language>en</language>
			<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" href="http://feeds.feedburner.com/AllNotesTechnical" type="application/rss+xml" /><item>
		<title>MySQL Multi-Select Performance - The Sequel</title>
		<link>http://feeds.feedburner.com/~r/AllNotesTechnical/~3/458374641/mysql-multi-select-performance-the-sequel</link>
		<comments>http://notes.jschutz.net/20/mysql/mysql-multi-select-performance-the-sequel#comments</comments>
		<pubDate>Wed, 19 Nov 2008 13:07:44 +0000</pubDate>
		<dc:creator>jon</dc:creator>
		
		<category><![CDATA[MySQL]]></category>

		<guid isPermaLink="false">http://notes.jschutz.net/?p=20</guid>
		<description><![CDATA[Following my original post, it was suggested to me that one of the following may give better performance:

SELECT &#8230; UNION SELECT &#8230;
Using a temporary table with an index.

Well, not so.  I have added the above cases to my benchmarking script, and updated the graph as shown below.
SELECT &#8230; UNION gave all sorts of problems.  Firstly, [...]]]></description>
			<content:encoded><![CDATA[<p>Following my <a href="http://notes.jschutz.net/19/perl/mysql-many-row-select-performance">original post</a>, it was suggested to me that one of the following may give better performance:</p>
<ul>
<li>SELECT &#8230; UNION SELECT &#8230;</li>
<li>Using a temporary table with an index.</li>
</ul>
<p>Well, not so.  I have added the above cases to my <a href="http://notes.jschutz.net/data/multi-select-bench2.txt" target="_blank">benchmarking script</a>, and updated the graph as shown below.</p>
<p>SELECT &#8230; UNION gave all sorts of problems.  Firstly, it broke at a query set size of 1000 with the error</p>
<pre>Can't open file: './bench/test1.frm' (errno: 24)</pre>
<p>After a bit of searching I found that the remedy for this was to increase the MySQL open_files_limit setting (was 1024, increased to 8192).  This got it going again, only to fall over once more at a query set size of 10000, this time with the error</p>
<pre>parser stack overflow near 'UNION SELECT ...</pre>
<p>to which I could not find a solution.  In any case, the performance as shown in the graph is closely tracking the exponential degradation of the SELECT + OR case.  Conclusion: SELECT UNIONs are not suited for a large number of unions.  Useful when merging the results of several different SELECT statements, though.</p>
<p>The addition of an index to the temporary table also had no appreciable effect in this test, probably because MySQL will use the index in the main table to search while scanning through the temporary table.  Perhaps there might be an improvement for the case where the temporary table is larger than the main table - but that would imply duplicates in the temporary table.</p>
<p><img src="/images/multi-select-performance2.png" alt="" width="750" /></p>
]]></content:encoded>
			<wfw:commentRss>http://notes.jschutz.net/20/mysql/mysql-multi-select-performance-the-sequel/feed</wfw:commentRss>
		<feedburner:origLink>http://notes.jschutz.net/20/mysql/mysql-multi-select-performance-the-sequel</feedburner:origLink></item>
		<item>
		<title>MySQL - Many-row SELECT Performance - “OR” bad, “IN” good</title>
		<link>http://feeds.feedburner.com/~r/AllNotesTechnical/~3/454838194/mysql-many-row-select-performance</link>
		<comments>http://notes.jschutz.net/19/perl/mysql-many-row-select-performance#comments</comments>
		<pubDate>Sun, 16 Nov 2008 12:02:47 +0000</pubDate>
		<dc:creator>jon</dc:creator>
		
		<category><![CDATA[MySQL]]></category>

		<category><![CDATA[Perl]]></category>

		<guid isPermaLink="false">http://notes.jschutz.net/?p=19</guid>
		<description><![CDATA[Consider the situation where you have a list of row IDs and you need to retrieve the data for each of the rows.  The simplest way is to make one query per row, i.e.
(A) SELECT * from data_table WHERE id=?
For a large number of rows, that results in a lot of queries.  This could be [...]]]></description>
			<content:encoded><![CDATA[<p>Consider the situation where you have a list of row IDs and you need to retrieve the data for each of the rows.  The simplest way is to make one query per row, i.e.</p>
<p>(A) SELECT * from data_table WHERE id=?</p>
<p>For a large number of rows, that results in a lot of queries.  This could be condensed into one query, such as:</p>
<p>(B) SELECT * from data_table WHERE id=1 OR id=2 OR id=3 &#8230;</p>
<p>or</p>
<p>(C) SELECT * from data_table WHERE id IN (1,2,3,&#8230;)</p>
<p>When constructing potentially large SQL statements such as these (imagine if you wanted to retrieve 1,000,000 rows), it&#8217;s important to take into account the max_allowed_packet size which restricts the length of the query.  It might be necessary to divide the data up into several blocks and make a query for each block to ensure max_allowed_packet is not exceeded.</p>
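<p>As a rough illustration of that blocking idea, here is a minimal Python sketch that splits a list of integer IDs into several IN(&#8230;) queries, each no longer than a given byte budget.  The function name and the budget are hypothetical, and the budget should be chosen comfortably below the server&#8217;s actual max_allowed_packet.  It assumes plain integer keys; string keys would need proper quoting or driver placeholders.</p>
<pre>
```python
def chunked_in_queries(ids, max_bytes):
    """Split integer ids into IN(...) queries, each at most max_bytes long.

    A single id that is longer than the budget is still emitted on its own.
    """
    prefix = "SELECT * FROM data_table WHERE id IN ("
    suffix = ")"
    queries = []
    current = []                        # id literals for the query being built
    size = len(prefix) + len(suffix)    # bytes used so far (ASCII only)
    for value in ids:
        literal = str(int(value))       # int() rejects anything non-numeric
        extra = len(literal) + (1 if current else 0)   # +1 for the comma
        if current and size + extra > max_bytes:
            # Budget exceeded: close the current query and start a new one.
            queries.append(prefix + ",".join(current) + suffix)
            current = []
            size = len(prefix) + len(suffix)
            extra = len(literal)
        current.append(literal)
        size += extra
    if current:
        queries.append(prefix + ",".join(current) + suffix)
    return queries


for sql in chunked_in_queries(range(1, 8), 44):
    print(sql)
```
</pre>
<p>Each returned statement can then be executed in turn and the result sets concatenated by the caller.</p>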
<p>Another approach is to create a temporary table, insert the keys of the required rows, then do a JOIN query to retrieve the data, i.e.</p>
<p>(D) CREATE TEMPORARY TABLE tmp ( id INT(11) );</p>
<p>INSERT INTO tmp (id) VALUES (1), (2), (3), &#8230;</p>
<p>SELECT d.* FROM data_table d JOIN tmp USING (id)</p>
<p>This approach is somewhat cleaner, particularly when multiple keys are involved.  With multiple keys the WHERE syntax of the prior options becomes:</p>
<p>WHERE (key1=x1 AND key2=y1) OR (key1=x2 AND key2=y2) &#8230;</p>
<p>or</p>
<p>WHERE (key1, key2) IN ((x1, y1), (x2, y2), &#8230;)</p>
<p>Under the temporary table approach, the question then arises as to how to most efficiently insert the data. A &#8216;LOAD DATA INFILE&#8217; approach is the most efficient way to load a table, but here we assume this is not an option as it is not readily portable (due to security settings that differ between local and remote MySQL daemons).  The example (D) above assumes a long INSERT statement, which again may be affected by max_allowed_packet.  Other options include:</p>
<p>(E) Multiple single INSERTs, INSERT INTO tmp (id) VALUE (?)</p>
<p>(F) Multiple single INSERTs in a transaction block, begin_work .. commit</p>
<p>(G) Multiple single INSERTs as an array, using the DBI execute_array() function</p>
<p>(H) As for (G), in a transaction block.</p>
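<p>The temporary-table approach with batched single-row INSERTs inside one transaction (roughly options F to H) can be sketched as follows.  This is only an illustration: it uses Python&#8217;s built-in sqlite3 module and an in-memory database as a stand-in for MySQL and Perl DBI, with executemany() playing the role of execute_array(), so it says nothing about MySQL timings.  The payload column and sample rows are invented for the demo.</p>
<pre>
```python
import sqlite3


def fetch_via_temp_table(conn, ids):
    """Load ids into a temporary table, then JOIN to fetch the matching rows."""
    cur = conn.cursor()
    cur.execute("CREATE TEMPORARY TABLE tmp (id INTEGER)")
    with conn:  # one transaction around all of the single-row INSERTs
        cur.executemany("INSERT INTO tmp (id) VALUES (?)", [(i,) for i in ids])
    rows = cur.execute(
        "SELECT d.* FROM data_table d JOIN tmp USING (id) ORDER BY d.id"
    ).fetchall()
    cur.execute("DROP TABLE tmp")  # leave the connection clean for reuse
    return rows


# Demo data: a hypothetical data_table with ten rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_table (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO data_table VALUES (?, ?)",
    [(i, "row%d" % i) for i in range(1, 11)],
)
conn.commit()

print(fetch_via_temp_table(conn, [2, 5, 9]))
```
</pre>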
<p>These options were benchmarked using MySQL 5.0.45 and the results are shown in the figure below.  As would be expected, the use of single select statements scales linearly.  For small query set sizes, the setup times for the different query approaches have significant impact on the performance; as the query set size increases, three classes emerge - one group that performs similarly to single selects, another that performs much much better, and one that lives on a completely different planet (one you wouldn&#8217;t want to visit).  In summary:</p>
<ul>
<li>That SELECT + IN(&#8230;) (case C) offers best performance when the query set size is above 30 or so.  It is also interesting to note that the performance of SELECT + IN(&#8230;) is very similar to using a temporary table with a single, long INSERT statement for large query set sizes, presumably because internally the IN(&#8230;) operation is essentially implemented as a temporary table.</li>
<li>That SELECT + OR (case B) is a good choice for query set size &lt; 30</li>
<li><strong>That SELECT + OR hits a point where performance becomes exponentially worse</strong> (not shown on the graph: for the largest data set, the time reaches 1300s per query set!  Curiously, this is elapsed time; CPU time does not increase significantly. This suggests that some inefficient data moves or swapping are occurring).</li>
</ul>
<p>In short, as a rule of thumb, <em>use SELECT + OR for query sets &lt; 30 in size, and SELECT + IN(&#8230;) otherwise.</em></p>
<p>The SELECT + OR performance is a significant result; the Perl SQL::Abstract library turns a WHERE specification such as { A =&gt; [ 1, 2, 3] } into WHERE ( ( ( A = ? ) OR ( A = ? ) OR ( A = ? ) ) ).  It will do the same if there are 1000 options (try it - perl -MSQL::Abstract -e '$sql = SQL::Abstract-&gt;new; $w = $sql-&gt;where({ A =&gt; [ 1 .. 1000]}); print $w').  Thus libraries that use SQL::Abstract, such as DBIx::Class, are similarly affected.  A perfectly reasonable approach from the library&#8217;s perspective, but potentially a significant performance hit if used in this manner.</p>
<p>Feel free to <a href="/data/multi-select-bench.txt" target="_blank">review my benchmarking code</a> and tell me if I&#8217;ve got it wrong&#8230;</p>
<p>UPDATE Nov 19 2008:  There is a <a href="http://notes.jschutz.net/20/mysql/mysql-multi-select-performance-the-sequel">sequel</a> post that looks at SELECT &#8230; UNION and using a temporary table with an index.</p>
<p><img src="/images/multi-select-performance.png" alt="" width="750" /></p>
]]></content:encoded>
			<wfw:commentRss>http://notes.jschutz.net/19/perl/mysql-many-row-select-performance/feed</wfw:commentRss>
		<feedburner:origLink>http://notes.jschutz.net/19/perl/mysql-many-row-select-performance</feedburner:origLink></item>
		<item>
		<title>Integrating Sphinx into Perl Applications</title>
		<link>http://feeds.feedburner.com/~r/AllNotesTechnical/~3/440427406/integrating-sphinx-into-perl-applications</link>
		<comments>http://notes.jschutz.net/18/perl/integrating-sphinx-into-perl-applications#comments</comments>
		<pubDate>Mon, 03 Nov 2008 00:27:17 +0000</pubDate>
		<dc:creator>jon</dc:creator>
		
		<category><![CDATA[Perl]]></category>

		<category><![CDATA[Sphinx Search Engine]]></category>

		<guid isPermaLink="false">http://notes.jschutz.net/?p=18</guid>
		<description><![CDATA[Sphinx is a full-text search engine (http://www.sphinxsearch.com) designed
primarily for full-text search of database content.  It has many features but in
my opinion its best assets are speed of search and scalability.
We started using Sphinx when MySQL built-in full-text search was becoming too
slow and too CPU intensive, and of questionable accuracy.  Sphinx is lightning
fast compared to MySQL [...]]]></description>
			<content:encoded><![CDATA[<p>Sphinx is a full-text search engine (http://www.sphinxsearch.com) designed<br />
primarily for full-text search of database content.  It has many features but in<br />
my opinion its best assets are speed of search and scalability.</p>
<p>We started using Sphinx when MySQL built-in full-text search was becoming too<br />
slow and too CPU intensive, and of questionable accuracy.  Sphinx is lightning<br />
fast compared to MySQL and provides better results relevancy.</p>
<p>This note is about integration with the standalone Sphinx search server. Sphinx<br />
also has a component ('SphinxSE') that runs as a MySQL 5 storage engine, so it can be used as<br />
a direct replacement for MySQL full-text search; to use SphinxSE, standard Perl<br />
DBI should be all that is necessary.</p>
<h2>What you will need:</h2>
<p>The following CPAN modules are likely to be useful:</p>
<p>Sphinx::Search<br />
Sphinx::Manager<br />
Sphinx::Config</p>
<p>Sphinx::Manager provides facilities to start and stop the search server and to<br />
run the indexer.</p>
<p>Sphinx::Search provides the search API.</p>
<p>Sphinx::Config allows you to read/write the Sphinx configuration files from<br />
code, in case you wish to maintain the configuration elsewhere (e.g. in your<br />
database).</p>
<h2>Putting it all together:</h2>
<h3>Running the Sphinx searchd server</h3>
<p>Sphinx operates most efficiently if it is allowed to run persistently as a<br />
background service.  Theoretically, you could start the Sphinx server, do a<br />
search and then stop it on every request, with a small amount of overhead - but<br />
here we will consider just the typical case.</p>
<p>Ideally you will use your operating system&#8217;s tools, such as daemontools,<br />
monit or just the SysV startup scripts, to start and monitor searchd, rather than<br />
having to worry about it in your perl app.  But, if you need or want to start it<br />
in perl:</p>
<pre>  use Sphinx::Manager;

  my $mgr = Sphinx::Manager-&gt;new({ config_file =&gt; '/etc/sphinx.conf' });
  $mgr-&gt;start_searchd;</pre>
<p>You should verify that the effective UID of your perl app has all of the appropriate<br />
permissions:</p>
<ul>
<li>to create and write to the PID file (see the 'searchd' section of the config, 'pid_file')</li>
<li>to create and write to the log file (see 'searchd'/'log')</li>
<li>to read the Sphinx database files ('path' in each of your 'index' specifications)</li>
</ul>
<h3>Adding Content to the Index</h3>
<pre>  use Sphinx::Manager;

  my $mgr = Sphinx::Manager-&gt;new({ config_file =&gt; '/etc/sphinx.conf' });
  $mgr-&gt;run_indexer('--rotate');</pre>
<p>Sphinx gets its content for indexing directly from the database, according to<br />
the 'sql_query' given in the config file.  'run_indexer' simply runs the command<br />
line version of the Sphinx indexer program.  You can pass any indexer arguments<br />
through to 'run_indexer'; '--rotate' is typical, to force searchd to start using<br />
the newly created index without disrupting searches while indexing is<br />
occurring.</p>
<h3>Searching</h3>
<p>Make sure you have a version of Sphinx::Search that is compatible with searchd.<br />
A compatibility list is given at the top of the Sphinx::Search perldoc.<br />
Hopefully a point will be reached where the Sphinx::Search client can support a<br />
range of searchd versions, but for the moment that is impractical.</p>
<p>Sphinx::Search can be used with any logging object that supports error, warn,<br />
info and debug methods.  In this example I have used Log::Log4perl.</p>
<pre>  use Sphinx::Search;
  use Log::Log4perl qw(:easy);

  Log::Log4perl-&gt;easy_init($DEBUG);

  my $sph = Sphinx::Search-&gt;new( log =&gt; Log::Log4perl-&gt;get_logger('sphinx.search') );

  my $results = $sph-&gt;setMatchMode(SPH_MATCH_ALL)
                    -&gt;Query("...");</pre>
<h3>Configuring</h3>
<p>Sphinx::Config provides the tools to read and write the Sphinx configuration file.</p>
<p>A typical problem is that searchd is running on a non-standard port (the default<br />
is 3312), so how will your perl app know where to find it?  Obviously you don&#8217;t<br />
want to hard-code port numbers in case they change&#8230;</p>
<pre>  use Sphinx::Search;
  use Sphinx::Config;
  use Log::Log4perl qw(:easy);

  Log::Log4perl-&gt;easy_init($DEBUG);

  my $sph = Sphinx::Search-&gt;new( log =&gt; Log::Log4perl-&gt;get_logger('sphinx.search') );

  # Get port from config file
  my $conf = Sphinx::Config-&gt;new;
  $conf-&gt;parse('/etc/sphinx.conf');
  my $port = $conf-&gt;get('searchd', undef, 'port');

  # Tell Sphinx client
  $sph-&gt;setServer('localhost', $port);

  my $results = $sph-&gt;Query("...");</pre>
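<p>For a non-Perl client that needs the same port lookup, the idea can be sketched in a few lines of Python.  This is a deliberately minimal reader, not a replacement for Sphinx::Config: the function name and the sample config text are hypothetical, and it only understands simple &#8220;key = value&#8221; lines inside the searchd block (no line continuations or other config features).</p>
<pre>
```python
import re


def read_searchd_setting(config_text, key):
    """Return the value of `key` inside the searchd { ... } block, or None."""
    block = re.search(r"^\s*searchd\s*\{(.*?)^\s*\}", config_text, re.S | re.M)
    if block is None:
        return None
    for line in block.group(1).splitlines():
        line = line.split("#", 1)[0].strip()    # drop trailing comments
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            return value.strip()
    return None


# Hypothetical config text, for illustration only.
SAMPLE = """
index test1
{
    path = /var/data/test1
}

searchd
{
    port = 3313   # non-standard port
    log  = /var/log/searchd.log
}
"""

# Fall back to the default port mentioned above when none is configured.
port = int(read_searchd_setting(SAMPLE, "port") or 3312)
print(port)
```
</pre>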
<h3>Enjoy</h3>
<p>We have had a considerable amount of success using Perl and Sphinx.  I hope you<br />
do too.</p>
]]></content:encoded>
			<wfw:commentRss>http://notes.jschutz.net/18/perl/integrating-sphinx-into-perl-applications/feed</wfw:commentRss>
		<feedburner:origLink>http://notes.jschutz.net/18/perl/integrating-sphinx-into-perl-applications</feedburner:origLink></item>
		<item>
		<title>Adding Action Timings to your Catalyst Output</title>
		<link>http://feeds.feedburner.com/~r/AllNotesTechnical/~3/273557207/adding-action-timings-to-your-catalyst-output</link>
		<comments>http://notes.jschutz.net/17/perl/adding-action-timings-to-your-catalyst-output#comments</comments>
		<pubDate>Sat, 19 Apr 2008 14:22:27 +0000</pubDate>
		<dc:creator>jon</dc:creator>
		
		<category><![CDATA[Catalyst MVC Framework]]></category>

		<category><![CDATA[Perl]]></category>

		<guid isPermaLink="false">http://notes.jschutz.net/17/perl/adding-action-timings-to-your-catalyst-output</guid>
		<description><![CDATA[About a year ago, onemogin wrote an article on adding action timings to the HTML output of a Catalyst app.  To do so, it was necessary to access $c-&#62;stats, which at the time was an internal object (that is, there was no published API for it) and therefore subject to change.  As of [...]]]></description>
			<content:encoded><![CDATA[<p>About a year ago, onemogin wrote an article on <a href="http://www.onemogin.com/blog/559-adding-action-timings-to-your-output.html" target="_blank">adding action timings to the HTML output of a Catalyst app</a>.  To do so, it was necessary to access $c-&gt;stats, which at the time was an internal object (that is, there was no published API for it) and therefore subject to change.  As of Catalyst-Runtime 5.7012, $c-&gt;stats has a defined interface and returns a Catalyst::Stats object (or your own class, if you provide one) rather than the Tree::Simple object that it used to.</p>
<p>It&#8217;s easy to fix your code to work with 5.7012.  Onemogin&#8217;s code in the end() method looked like this:</p>
<pre><span style="color: #b1b100">  my</span> <span style="color: #0000ff">$tree</span> = <span style="color: #0000ff">$c</span>-&gt;<span style="color: #006600">stats</span><span style="color: #66cc66">(</span><span style="color: #66cc66">)</span>;

<span style="color: #b1b100">  my</span> <span style="color: #0000ff">$dvisit</span> = <span style="color: #000000; font-weight: bold">new</span> Tree::<span style="color: #006600">Simple</span>::<span style="color: #006600">Visitor</span><span style="color: #66cc66">(</span><span style="color: #66cc66">)</span>;
<span style="color: #0000ff">  $tree</span>-&gt;<span style="color: #006600">accept</span><span style="color: #66cc66">(</span><span style="color: #0000ff">$dvisit</span><span style="color: #66cc66">)</span>;
<span style="color: #0000ff">  $c</span>-&gt;<span style="color: #006600">stash</span>-&gt;<span style="color: #66cc66">{</span><span style="color: #ff0000">'action_stats'</span><span style="color: #66cc66">}</span> = <span style="color: #0000ff">$dvisit</span>-&gt;<span style="color: #006600">getResults</span><span style="color: #66cc66">(</span><span style="color: #66cc66">)</span>;</pre>
<p>which needs to become this:</p>
<pre>  my @report = $c-&gt;stats-&gt;report;
  $c-&gt;stash-&gt;{action_stats}= \@report;</pre>
<p>and your template will also need to change; here&#8217;s an example:</p>
<p><pre> 
 &lt;div id="stats"&gt;
 &lt;table border="0" cellspacing="0" cellpadding="0"&gt;
 [% space = '&amp;nbsp;&amp;nbsp;' %]
 &lt;tr&gt;&lt;th&gt;Action&lt;/th&gt;&lt;th&gt;Time&lt;/th&gt;&lt;/tr&gt;
 [% FOREACH r=action_stats %]
 &lt;tr&gt;&lt;td class="description"&gt;[% space.repeat(r.0) %][% r.1 | html %]&lt;/td&gt;
&lt;td class="elapsed"&gt;[% UNLESS r.3 %]+[% END %][% r.2 %]s&lt;/td&gt;&lt;/tr&gt;
 [% END %]
 &lt;/table&gt;
 &lt;/div&gt;
</pre></p>
<p>to produce an end result such as:</p>
<p><style type="text/css">
#stats { 
       font-family: arial,helvetica,sans-serif; 
       font-size: 100%;
       background: #E0E0E0;
       margin: 0;
       padding: 0;
       border: solid 1px #F00000; 
       width: 400px; 
       }

#stats table {
       width: 100%;
       border: 0;
}

#stats th {
       color: #FFFFFF;
       font-size: 120%;
       font-weight: bold;
       background: #D47878;
       border: 0; 
       }
#stats td {
       color: #000000;
       border: 0;
       line-height: 120%;
       padding-left: 5%;
       }
#stats td.elapsed {
       background: #F8F8F8;
       padding-left: 5%;
}
</style>
<div  id="stats">
<table border="0" cellspacing="0" cellpadding="0">

<tr><th>Action</th><th>Time</th></tr>

<tr><td class="description">/default</td><td class="elapsed">0.005895s</td></tr>

<tr><td class="description">&nbsp;&nbsp;-&gt; /look_left</td><td class="elapsed">0.00091s</td></tr>

<tr><td class="description">&nbsp;&nbsp;&nbsp;&nbsp;- starting critical bit</td><td class="elapsed">+0.000479s</td></tr>

<tr><td class="description">&nbsp;&nbsp;&nbsp;&nbsp;- critical bit complete</td><td class="elapsed">+0.000208s</td></tr>

<tr><td class="description">&nbsp;&nbsp;-&gt; /look_right</td><td class="elapsed">0.000587s</td></tr>

<tr><td class="description">&nbsp;&nbsp;-&gt; /look_left</td><td class="elapsed">0.000799s</td></tr>

<tr><td class="description">&nbsp;&nbsp;&nbsp;&nbsp;- starting critical bit</td><td class="elapsed">+0.000441s</td></tr>

<tr><td class="description">&nbsp;&nbsp;&nbsp;&nbsp;- critical bit complete</td><td class="elapsed">+0.000169s</td></tr>

<tr><td class="description">&nbsp;&nbsp;-&gt; /cross_over</td><td class="elapsed">0.001766s</td></tr>

<tr><td class="description">/end</td><td class="elapsed">0.000462s</td></tr>

</table>
</div>
</p>
<p>Here&#8217;s the bit of controller code that generated the example:</p>
<p><pre>
sub default : Private {
    my ( $self, $c ) = @_;

    $c->forward('look_left');
    $c->forward('look_right');
    $c->forward('look_left');
    $c->forward('cross_over');
}

sub look_left : Private {
    my ( $self, $c ) = @_;
    for (1 .. 100) {};
    $c->stats->profile("starting critical bit");
    for (1 .. 100) {};
    $c->stats->profile("critical bit complete");
}

sub look_right : Private {
    for (1 .. 1000) {};
}
sub cross_over : Private {
    for (1 .. 10000) {};
}

sub end : ActionClass('RenderView') {
    my ( $self, $c ) = @_;
    my @report = $c->stats->report;
    $c->stash->{action_stats}= \@report;
}

</pre></p>
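<p>The template above relies on each report row carrying four fields: a depth, a description, an elapsed time, and a flag that the template treats as true for action rows (profile points get a leading &#8220;+&#8221;).  As a language-neutral illustration of that layout, here is a small Python sketch that renders such rows as plain text.  The row values are copied from the example output, and the assumption that the flag is 1 for actions and 0 for profile points is inferred from the template rather than taken from the Catalyst::Stats documentation.</p>
<pre>
```python
def format_report(rows, indent="  "):
    """Render (depth, description, elapsed, rollup) rows as aligned text."""
    lines = []
    for depth, description, elapsed, rollup in rows:
        label = indent * depth + description
        sign = "" if rollup else "+"    # "+" marks a profile point, as in the template
        lines.append("%-40s %s%.6fs" % (label, sign, elapsed))
    return "\n".join(lines)


# A few rows taken from the example output; the flag values are assumed.
rows = [
    (0, "/default", 0.005895, 1),
    (1, "-> /look_left", 0.00091, 1),
    (2, "- starting critical bit", 0.000479, 0),
    (2, "- critical bit complete", 0.000208, 0),
]
print(format_report(rows))
```
</pre>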
]]></content:encoded>
			<wfw:commentRss>http://notes.jschutz.net/17/perl/adding-action-timings-to-your-catalyst-output/feed</wfw:commentRss>
		<feedburner:origLink>http://notes.jschutz.net/17/perl/adding-action-timings-to-your-catalyst-output</feedburner:origLink></item>
		<item>
		<title>MSIE Cookies Bite Back!</title>
		<link>http://feeds.feedburner.com/~r/AllNotesTechnical/~3/234906984/msie-cookies-bite-back</link>
		<comments>http://notes.jschutz.net/16/web-development/msie-cookies-bite-back#comments</comments>
		<pubDate>Thu, 14 Feb 2008 11:37:41 +0000</pubDate>
		<dc:creator>jon</dc:creator>
		
		<category><![CDATA[Web Development]]></category>

		<guid isPermaLink="false">http://notes.jschutz.net/16/web-development/msie-cookies-bite-back</guid>
		<description><![CDATA[Here we are in 2008.  We build computers with RAM measured in GB and disk in TB.  I just discovered (the hard way) that Microsoft Internet Explorer can only handle 4096 bytes of cookies for a page in JavaScript.  Total.  Not each.  Total.
Worse, if the cookies on your page exceed [...]]]></description>
			<content:encoded><![CDATA[<p>Here we are in 2008.  We build computers with RAM measured in GB and disk in TB.  I just discovered (the hard way) that Microsoft Internet Explorer can only handle 4096 bytes of cookies for a page in JavaScript.  Total.  Not each.  Total.</p>
<p>Worse, if the cookies on your page exceed this limit and you try to read the cookies using document.cookie, you don&#8217;t just get some of the cookies or a set that is truncated to 4096 bytes; you get NOTHING.</p>
<p>From the Microsoft Knowledge Base:  &#8220;For one domain name, each cookie is limited to 4,096 bytes. This total can exist as one name-value pair of 4 kilobytes (KB) or as up to 20 name-value pairs that total 4 KB. &#8230; If you use the <strong>document.cookie</strong> property to retrieve the cookie on the client side, the <strong>document.cookie</strong> property can retrieve only 4,096 bytes. This byte total can be one name-value pair of 4 KB, or it can be up to 20 name-value pairs that have a total size of 4 KB.&#8221;</p>
<p>Stack that up against RFC 2965, which says:</p>
<pre>   ...general-use
   user agents SHOULD provide each of the following minimum capabilities
   individually, although not necessarily simultaneously:

      *  at least 300 cookies

      *  at least 4096 bytes per cookie (as measured by the characters
         that comprise the cookie non-terminal in the syntax description
         of the Set-Cookie2 header, and as received in the Set-Cookie2
         header)

      *  at least 20 cookies per unique host or domain name

   User agents created for specific purposes or for limited-capacity
   devices SHOULD provide at least 20 cookies of 4096 bytes, to ensure
   that the user can interact with a session-based origin server.</pre>
<p>According to the references, this problem applies up to MSIE 6.0, but testing shows it is still a problem in IE 7.</p>
<p>Needless to say, this is only a problem in IE.  Firefox and Safari, although they presumably have some limit, do not suffer the same ridiculously small bound.</p>
<p>Test it yourself; here is a simple <a href="http://notes.jschutz.net/cookie-demo/example.html" target="_blank">cookie limit test page</a> containing a script that sets 10 cookies, each of about 72 bytes, printing document.cookie at each iteration.  On first visit, the cookies disappear at iteration 6, and on subsequent visits at iteration 1 (until you clear cookies or close your browser).</p>
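<p>One defensive measure is to check on the server side how large the combined cookie string is likely to be before issuing another cookie.  The Python sketch below is only an estimate under stated assumptions: it treats the limit as applying to the &#8220;name=value; name=value&#8221; string that document.cookie would return, counted in UTF-8 bytes.  Exactly how IE counts is not specified in the references, so leave some headroom.  The function and constant names are hypothetical.</p>
<pre>
```python
IE_DOCUMENT_COOKIE_LIMIT = 4096  # bytes, per the Knowledge Base quote above


def document_cookie_length(cookies):
    """Estimate the byte length of document.cookie for a dict of cookies."""
    pairs = ["%s=%s" % (name, value) for name, value in cookies.items()]
    return len("; ".join(pairs).encode("utf-8"))


def fits_in_ie(cookies, limit=IE_DOCUMENT_COOKIE_LIMIT):
    """True if the estimated cookie string stays within the limit."""
    return limit >= document_cookie_length(cookies)


ten = {"c%d" % i: "v" * 400 for i in range(10)}
eleven = {"c%d" % i: "v" * 400 for i in range(11)}
print(document_cookie_length(ten), fits_in_ie(ten))
print(document_cookie_length(eleven), fits_in_ie(eleven))
```
</pre>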
<p>I wonder how many shopping carts this has broken.</p>
<p>References:</p>
<ul>
<li><a href="http://support.microsoft.com/kb/306070">Number and size limits of a cookie in Internet Explorer</a></li>
<li><a href="http://support.microsoft.com/kb/820536/">Document.Cookie Property Returns an Empty String</a></li>
<li><a href="http://www.ietf.org/rfc/rfc2965.txt">RFC 2965 HTTP State Management Mechanism</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://notes.jschutz.net/16/web-development/msie-cookies-bite-back/feed</wfw:commentRss>
		<feedburner:origLink>http://notes.jschutz.net/16/web-development/msie-cookies-bite-back</feedburner:origLink></item>
		<item>
		<title>Page Load Times and Visitor Abandonment</title>
		<link>http://feeds.feedburner.com/~r/AllNotesTechnical/~3/234144998/page-load-times-and-visitor-abandonment</link>
		<comments>http://notes.jschutz.net/15/website-optimisation/page-load-times-and-visitor-abandonment#comments</comments>
		<pubDate>Wed, 13 Feb 2008 04:22:08 +0000</pubDate>
		<dc:creator>jon</dc:creator>
		
		<category><![CDATA[Website Optimisation]]></category>

		<guid isPermaLink="false">http://notes.jschutz.net/15/website-optimisation/page-load-times-and-visitor-abandonment</guid>
		<description><![CDATA[This is some 2005 data, courtesy  http://www.marketingexperiments.com/improving-website-conversion/page-weight.html and http://www.emarketer.com/



Visitor Abandonment


Page Load Time
Percent of Users
Continuing to Wait


10 seconds
84%


15 seconds
51%


20 seconds
26%


30 seconds
5%



 What You Need To UNDERSTAND: You will lose nearly half your visitors if they have to wait longer than 15 seconds for a page to load. Only 5% of visitors will wait longer than [...]]]></description>
			<content:encoded><![CDATA[<p align="left">This is some 2005 data, courtesy  http://www.marketingexperiments.com/improving-website-conversion/page-weight.html and <a href="http://www.emarketer.com/" onclick="exit=false">http://www.emarketer.com/</a></p>
<p class="subContentLg" align="left">
<table border="1" cellpadding="2" cellspacing="0">
<tr>
<th colspan="2">Visitor Abandonment</th>
</tr>
<tr>
<td><strong>Page Load Time</strong></td>
<td align="right"><strong>Percent of Users<br />
Continuing to Wait</strong></td>
</tr>
<tr>
<td>10 seconds</td>
<td align="right">84%</td>
</tr>
<tr>
<td>15 seconds</td>
<td align="right">51%</td>
</tr>
<tr>
<td>20 seconds</td>
<td align="right">26%</td>
</tr>
<tr>
<td>30 seconds</td>
<td align="right">5%</td>
</tr>
</table>
<p class="subContentLg" align="left"><strong>What You Need To UNDERSTAND:</strong> You will lose nearly half your visitors if they have to wait longer than 15 seconds for a page to load. Only 5% of visitors will wait longer than 30 seconds.</p>
<p class="subContentLg" align="left">In 2008, where most of the population is on broadband, I expect that visitors are less patient than ever.</p>
<p class="subContentLg" align="left">Update:  2006 data, quoted from http://www.avactis.com/forums/index.php?showtopic=238 : &#8220;The research shows that <strong>four seconds</strong> is the maximum length of time an average online shopper will wait for a Web page to load before abandoning one retail site and moving on to another.&#8221;</p>
<p class="subContentLg" align="left">&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://notes.jschutz.net/15/website-optimisation/page-load-times-and-visitor-abandonment/feed</wfw:commentRss>
		<feedburner:origLink>http://notes.jschutz.net/15/website-optimisation/page-load-times-and-visitor-abandonment</feedburner:origLink></item>
		<item>
		<title>MP3 to WAV Conversion on Linux</title>
		<link>http://feeds.feedburner.com/~r/AllNotesTechnical/~3/207001854/mp3-to-wav-conversion-on-linux</link>
		<comments>http://notes.jschutz.net/14/linux/mp3-to-wav-conversion-on-linux#comments</comments>
		<pubDate>Thu, 27 Dec 2007 12:50:03 +0000</pubDate>
		<dc:creator>jon</dc:creator>
		
		<category><![CDATA[Linux]]></category>

		<guid isPermaLink="false">http://notes.jschutz.net/14/linux/mp3-to-wav-conversion-on-linux</guid>
		<description><![CDATA[MP3 to WAV conversion is remarkably simple:
mpg123 -w out.wav in.mp3
For the purpose of  writing an audio CD, a sample rate of 44100 Hz and stereo output are essential:
mpg123 --stereo -r 44100 -w out.wav in.mp3
And then to write the WAV files to a CD:
cdrecord -audio -pad *.wav
]]></description>
			<content:encoded><![CDATA[<p>MP3 to WAV conversion is remarkably simple:</p>
<pre>mpg123 -w out.wav in.mp3</pre>
<p>For the purpose of  writing an audio CD, a sample rate of 44100 Hz and stereo output are essential:</p>
<pre>mpg123 --stereo -r 44100 -w out.wav in.mp3</pre>
<p>And then to write the WAV files to a CD:</p>
<pre>cdrecord -audio -pad *.wav</pre>
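<p>To convert a whole directory of files, the commands can be generated from a script.  Here is a minimal Python sketch that only builds the argument lists, using the same mpg123 and cdrecord flags shown above.  The function names are hypothetical, and actually running the commands (for example with subprocess.run(cmd, check=True)) is left to the caller.</p>
<pre>
```python
from pathlib import Path


def mpg123_command(mp3_path, sample_rate=44100):
    """Build the mpg123 argv that decodes one MP3 to a stereo WAV."""
    mp3 = Path(mp3_path)
    wav = mp3.with_suffix(".wav")       # in.mp3 becomes in.wav
    return ["mpg123", "--stereo", "-r", str(sample_rate), "-w", str(wav), str(mp3)]


def cdrecord_command(wav_paths):
    """Build the cdrecord argv that writes the given WAV files, in sorted order."""
    return ["cdrecord", "-audio", "-pad"] + sorted(str(p) for p in wav_paths)


print(" ".join(mpg123_command("in.mp3")))
print(" ".join(cdrecord_command(["track02.wav", "track01.wav"])))
```
</pre>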
]]></content:encoded>
			<wfw:commentRss>http://notes.jschutz.net/14/linux/mp3-to-wav-conversion-on-linux/feed</wfw:commentRss>
		<feedburner:origLink>http://notes.jschutz.net/14/linux/mp3-to-wav-conversion-on-linux</feedburner:origLink></item>
		<item>
		<title>IE Cookie Handling Policies</title>
		<link>http://feeds.feedburner.com/~r/AllNotesTechnical/~3/192207521/ie-cookie-handling-policies</link>
		<comments>http://notes.jschutz.net/12/website-optimisation/ie-cookie-handling-policies#comments</comments>
		<pubDate>Thu, 29 Nov 2007 04:38:30 +0000</pubDate>
		<dc:creator>jon</dc:creator>
		
		<category><![CDATA[Website Optimisation]]></category>

		<guid isPermaLink="false">http://notes.jschutz.net/12/website-optimisation/ie-cookie-handling-policies</guid>
		<description><![CDATA[This is a summary of Internet Explorer settings for handling cookies, under the so-called &#8220;Privacy&#8221; options; IE6 and IE7 are the same, although some of the wording has changed in the descriptions.  It&#8217;s important to keep these in mind when issuing cookies.  The Wikipedia article on HTTP Cookies outlines some of the alternatives.

Block [...]]]></description>
			<content:encoded><![CDATA[<p>This is a summary of Internet Explorer settings for handling cookies, under the so-called &#8220;Privacy&#8221; options; IE6 and IE7 are the same, although some of the wording has changed in the descriptions.  It&#8217;s important to keep these in mind when issuing cookies.  The <a href="http://en.wikipedia.org/wiki/HTTP_cookie">Wikipedia article on HTTP Cookies</a> outlines some of the alternatives.</p>
<ul>
<li>Block All Cookies<br />
Refuses all cookies from all web sites, and won&#8217;t send any existing cookies. Should be renamed &#8220;Unusable&#8221;.</li>
<li>High<br />
Blocks all cookies from websites that do not carry a compact privacy policy (P3P) and cookies that contain personally identifiable (contact) information.  A November 2007 study shows that only about 4% of sites use P3P, so this security setting is almost as unusable as &#8220;Block All Cookies&#8221;.</li>
<li> Medium High<br />
Same as &#8220;High&#8221; for 3rd party cookies.  Also blocks first party cookies that contain personally identifiable information.</li>
<li> Medium<br />
As above, but rather than &#8220;blocking&#8221; first party cookies that contain personally identifiable information, it only &#8220;restricts&#8221; them.  Efforts to find the difference between &#8220;block&#8221; and &#8220;restrict&#8221; have so far been fruitless.  It may mean that cookies are accepted but not sent (how useful would that be?), or that such cookies can only be used in the same web page that created them (i.e. a restriction on the domain/path components of the cookie), or that the cookies are not kept beyond the current session.</li>
<li> Low<br />
Same as &#8220;High&#8221; for 3rd party cookies.  No restrictions on first party cookies.</li>
<li> Accept All Cookies<br />
No restrictions.</li>
</ul>
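<p>A quick way to see whether a site would survive the &#8220;High&#8221; setting is to look for a <code>P3P</code> response header, which is where a compact policy is sent. The sketch below assumes <code>curl</code> is available; <code>has_p3p_header</code> is a hypothetical helper name, and it only checks that the header is present, not that its policy would satisfy IE.</p>

```shell
# Succeed if the HTTP response headers on stdin include a P3P header.
# (has_p3p_header is a hypothetical helper, not part of any tool.)
has_p3p_header() {
    grep -qi '^p3p:'
}

# Example use against a live site (example.com is a placeholder):
#   curl -sI http://example.com/ | has_p3p_header
#   echo "exit status: $?"
```

<p>An exit status of 0 means a P3P header was found; anything else means the site sends none.</p>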
<p>In contrast, Safari (Macintosh) offers three simple options: Accept Cookies Always/Never/Only from sites you navigate to (i.e. Always/Never/Only First Party).  Firefox by default allows all cookies except where specific exceptions have been defined.  There do not seem to be any Firefox extensions which emulate the IE or Safari behaviour - which perhaps places into perspective the real threat that third party cookies are(n&#8217;t) in general.</p>
<p><strong>References</strong></p>
<p><a href="http://www.securityspace.com/s_survey/data/man.200710/p3p.html">P3P Usage Survey</a></p>
<img src="http://feeds.feedburner.com/~r/AllNotesTechnical/~4/192207521" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://notes.jschutz.net/12/website-optimisation/ie-cookie-handling-policies/feed</wfw:commentRss>
		<feedburner:origLink>http://notes.jschutz.net/12/website-optimisation/ie-cookie-handling-policies</feedburner:origLink></item>
		<item>
		<title>Sphinx::Search 0.08 released to CPAN</title>
		<link>http://feeds.feedburner.com/~r/AllNotesTechnical/~3/189561459/sphinxsearch-008-released-to-cpan</link>
		<comments>http://notes.jschutz.net/10/sphinx-search-engine/sphinxsearch-008-released-to-cpan#comments</comments>
		<pubDate>Fri, 23 Nov 2007 23:26:13 +0000</pubDate>
		<dc:creator>jon</dc:creator>
		
		<category><![CDATA[Sphinx Search Engine]]></category>

		<guid isPermaLink="false">http://notes.jschutz.net/10/sphinx-search-engine/sphinxsearch-008-released-to-cpan</guid>
		<description><![CDATA[I have just uploaded to CPAN the latest version of Sphinx::Search, the Perl API for the Sphinx Search Engine.
Search for Sphinx::Search on CPAN to get the latest.
Version 0.08 is suitable for  Sphinx 0.9.8-svn-r871 and later (currently r909).  This version fixes a couple of bugs related to error checking.
I have been asked a few [...]]]></description>
			<content:encoded><![CDATA[<p>I have just uploaded to CPAN the latest version of Sphinx::Search, the Perl API for the <a href="http://www.sphinxsearch.com/">Sphinx Search Engine.</a></p>
<p>Search for <a href="http://search.cpan.org/search?query=Sphinx%3A%3ASearch&amp;mode=all">Sphinx::Search</a> on CPAN to get the latest.</p>
<p>Version 0.08 is suitable for  Sphinx 0.9.8-svn-r871 and later (currently r909).  This version fixes a couple of bugs related to error checking.</p>
<p>I have been asked a few times what makes Sphinx::Search different from the Perl API that comes bundled in the contrib directory of the Sphinx distribution. The bundled Sphinx.pm was used as the starting point of Sphinx::Search. Maintenance of that version appears to have lapsed at sphinx-0.9.7, so many of the newer API calls are not available there.  Sphinx::Search is mostly compatible with the old Sphinx.pm except:</p>
<ul>
<li>On failure, Sphinx::Search returns undef rather than 0 or -1.</li>
<li>Sphinx::Search ’Set’ functions are cascadable, e.g. you can do<br />
<code>Sphinx::Search-&gt;new -&gt;SetMatchMode(SPH_MATCH_ALL) -&gt;SetSortMode(SPH_SORT_RELEVANCE) -&gt;Query("search terms")</code></li>
<li>Sphinx::Search also provides documentation and unit tests, which were the main motivations for branching from the earlier work.</li>
</ul>
<p>Sphinx has proven to be a more efficient and better quality search engine than the built-in MySQL full text search. It is an order of magnitude faster for large data sets and provides better options for controlling search result relevancy.</p>
<img src="http://feeds.feedburner.com/~r/AllNotesTechnical/~4/189561459" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://notes.jschutz.net/10/sphinx-search-engine/sphinxsearch-008-released-to-cpan/feed</wfw:commentRss>
		<feedburner:origLink>http://notes.jschutz.net/10/sphinx-search-engine/sphinxsearch-008-released-to-cpan</feedburner:origLink></item>
		<item>
		<title>Google Searches per Day</title>
		<link>http://feeds.feedburner.com/~r/AllNotesTechnical/~3/189215711/google-searches-per-day</link>
		<comments>http://notes.jschutz.net/9/internet-search/google-searches-per-day#comments</comments>
		<pubDate>Fri, 23 Nov 2007 00:08:15 +0000</pubDate>
		<dc:creator>jon</dc:creator>
		
		<category><![CDATA[Internet Search]]></category>

		<guid isPermaLink="false">http://notes.jschutz.net/9/internet-search/google-searches-per-day</guid>
		<description><![CDATA[As of August 2007, Google is handling 1200 Million searches per day on average worldwide, according to a Clickz article reporting on Comscore data.  Yahoo is a long way behind at 275 Million, and MSN at 70 Million.  Baidu (a Chinese search engine) beats MSN, coming in at 105 Million.
2006 figures for the [...]]]></description>
			<content:encoded><![CDATA[<p>As of August 2007, Google is handling 1200 Million searches per day on average worldwide, according to a Clickz article reporting on Comscore data.  Yahoo is a long way behind at 275 Million, and MSN at 70 Million.  Baidu (a Chinese search engine) beats MSN, coming in at 105 Million.</p>
<p>2006 figures for the US only put Google at 91 Million searches per day.</p>
<p>In June 2007, Wikipedia received an average of 55.6 Million referrals per day from Google.  Guessing that each Google search results in 2 click-throughs on average, that means Wikipedia is getting about 2% of Google&#8217;s organic traffic.  Wikipedia is ranked #8 in Alexa&#8217;s traffic rankings, behind Yahoo, Google, YouTube, Live &amp; MSN, Myspace and Facebook.  Given that these probably get most of their traffic directly rather than via search engines, Wikipedia may be the most-referred site from Google.</p>
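<p>The 2% figure comes from dividing daily referrals by the estimated number of daily click-throughs. A minimal sketch of that arithmetic follows; <code>wikipedia_share</code> is a hypothetical helper name, the two click-throughs per search are the guess made above, and the referral and search figures come from different months (June and August 2007), so the result is only a rough estimate.</p>

```shell
# Estimate Wikipedia's share of Google click-throughs, in percent.
# Arguments: referrals per day, searches per day, click-throughs per search.
# The first two must be in the same units (millions here); the third is a guess.
wikipedia_share() {
    awk -v r="$1" -v s="$2" -v c="$3" 'BEGIN { printf "%.1f\n", 100 * r / (s * c) }'
}

wikipedia_share 55.6 1200 2
```

<p>With these inputs the function prints 2.3, which rounds to the 2% quoted.</p>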
<p>Alexa&#8217;s top 50 is dominated by search engines and various forms of social networking sites, with Ebay, Amazon and a few porn sites thrown in.</p>
<p>Google lists its most popular search terms on a daily basis in its <a href="http://www.google.com/trends/hottrends">Hot Trends</a> area.  Leading up to Thanksgiving in the US, about 10% of the top 100 search terms contained the word &#8220;turkey&#8221; - turkey cooking time, turkey recipe, how long to cook a turkey, roasting turkey, turkey thermometer, turkey soup, and more.  About 40% of the most popular queries related to Thanksgiving in some way.</p>
<p><strong>References:</strong></p>
<p><a href="http://www.clickz.com/3627303">Clickz article: Worldwide Internet: Now Serving 61 Billion Searches per Month</a></p>
<p><a href="http://www.alexa.com/site/ds/top_sites?ts_mode=global&amp;lang=none">Alexa Top 500</a><br />
<a href="http://leuksman.com/log/2007/06/07/wikimedia-page-views/">Wikimedia page views</a></p>
<p><a href="http://searchenginewatch.com/showPage.html?page=2156461">Search Engine Watch 2006 data</a></p>
<img src="http://feeds.feedburner.com/~r/AllNotesTechnical/~4/189215711" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://notes.jschutz.net/9/internet-search/google-searches-per-day/feed</wfw:commentRss>
		<feedburner:origLink>http://notes.jschutz.net/9/internet-search/google-searches-per-day</feedburner:origLink></item>
	</channel>
</rss>
