<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Jon Schutz Technical Notes and Recommendations &#187; Perl</title>
	<atom:link href="http://notes.jschutz.net/topics/perl/feed/" rel="self" type="application/rss+xml" />
	<link>http://notes.jschutz.net</link>
	<description>Useful snippets, technical info and recommendations</description>
	<lastBuildDate>Thu, 24 Jun 2010 07:07:40 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Perl client for Facebook&#8217;s scribe logging software</title>
		<link>http://notes.jschutz.net/2009/04/perl-client-for-facebooks-scribe-logging-software/</link>
		<comments>http://notes.jschutz.net/2009/04/perl-client-for-facebooks-scribe-logging-software/#comments</comments>
		<pubDate>Wed, 29 Apr 2009 04:13:57 +0000</pubDate>
		<dc:creator>jon</dc:creator>
				<category><![CDATA[Perl]]></category>
		<category><![CDATA[Software Development]]></category>

		<guid isPermaLink="false">http://notes.jschutz.net/?p=109</guid>
		<description><![CDATA[Scribe is a log aggregator, developed at Facebook and released as open source.  Scribe is built on Thrift, a cross-language RPC type platform, and therefore it is possible to use scribe with any of the Thrift-supported languages.  Whilst Perl is one of the supported languages, there is little in the way of working [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://developers.facebook.com/scribe/" rel="nofollow">Scribe</a> is a log aggregator, developed at Facebook and released as open source.  Scribe is built on <a href="http://incubator.apache.org/thrift/" rel="nofollow">Thrift</a>, a cross-language RPC framework, so scribe can be used from any of the Thrift-supported languages.  Perl is one of the supported languages, but there are few working examples, so here&#8217;s how I did it:</p>
<ol>
<li>Install Thrift.</li>
<li>Build and install the FB303 Perl modules:
<pre>
  cd thrift/contrib/fb303
  # Edit if/fb303.thrift and add the line 'namespace perl Facebook.FB303' after the other namespace declarations
  thrift --gen perl if/fb303.thrift
  sudo cp -a gen-perl/Facebook /usr/local/lib/perl5/site_perl/5.10.0/ # or wherever you keep your site perl
</pre>
<p>This creates the modules Facebook::FB303::Constants, Facebook::FB303::FacebookService and Facebook::FB303::Types.</p>
</li>
<li>Install Scribe.</li>
<li>Build and install the Scribe Perl modules:
<pre>
  cd scribe
  # Edit if/scribe.thrift and add 'namespace perl Scribe.Thrift' after the other namespace declarations
  thrift -I /path/to/thrift/contrib/ --gen perl if/scribe.thrift
  sudo cp -a gen-perl/Scribe /usr/local/lib/perl5/site_perl/5.10.0/ # or wherever
</pre>
<p>This creates the modules Scribe::Thrift::Constants, Scribe::Thrift::scribe and Scribe::Thrift::Types.</p>
</li>
</ol>
<p>Here is an example program that uses the client.  It reads one line at a time from stdin and sends each line to a scribe instance running locally on port 1465, using the first command-line argument (default &#8216;test&#8217;) as the category:</p>
<pre>
#! /usr/bin/perl

use Scribe::Thrift::scribe;
use Thrift::Socket;
use Thrift::FramedTransport;
use Thrift::BinaryProtocol;
use strict;
use warnings;

my $host = 'localhost';
my $port = 1465;
my $cat = shift(@ARGV) || 'test';   # shift, so the read loop below uses stdin rather than a file named after the category

my $socket = Thrift::Socket->new($host, $port);
my $transport = Thrift::FramedTransport->new($socket);
my $proto = Thrift::BinaryProtocol->new($transport);

my $client = Scribe::Thrift::scribeClient->new($proto, $proto);
my $le = Scribe::Thrift::LogEntry->new({ category => $cat });

$transport->open();

while (my $line = <>) {
    $le->message($line);
    my $result = $client->Log([ $le ]);
    if ($result == Scribe::Thrift::ResultCode::TRY_LATER) {
	print STDERR "TRY_LATER\n";
    }
    elsif ($result != Scribe::Thrift::ResultCode::OK) {
	print STDERR "Unknown result code: $result\n";
    }
}

$transport->close();
</pre>
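<p>The program above only reports TRY_LATER, so that line is never delivered.  If you would rather retry, here is a minimal sketch that waits longer after each attempt.  The helper name send_with_retry and its arguments are hypothetical and not part of Thrift or Scribe.  The result codes are passed in as arguments so the sketch can run without the generated Scribe modules:</p>

<pre>
use strict;
use warnings;

# Hypothetical helper: call $arg{send} (a code ref that sends one batch and
# returns a scribe result code) until it returns $arg{ok_code}.  While it
# returns $arg{try_later_code}, sleep and try again, doubling the delay each
# time.  Returns the number of attempts made; dies on an unknown code or
# when max_attempts is exhausted.
sub send_with_retry {
    my (%arg) = @_;
    my $max   = $arg{max_attempts}  // 5;
    my $delay = $arg{initial_delay} // 1;   # seconds
    for my $attempt (1 .. $max) {
        my $result = $arg{send}->();
        return $attempt if $result == $arg{ok_code};
        die "Unknown result code: $result\n" if $result != $arg{try_later_code};
        sleep $delay if $attempt < $max;
        $delay *= 2;
    }
    die "Gave up after $max TRY_LATER results\n";
}

# In the client above, the loop body could become something like:
#   send_with_retry(
#       send           => sub { $client->Log([ $le ]) },
#       ok_code        => Scribe::Thrift::ResultCode::OK,
#       try_later_code => Scribe::Thrift::ResultCode::TRY_LATER,
#   );
</pre>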
<p><b>UPDATE</b> Log::Dispatch::Scribe is now available on CPAN, and it also works with Log::Log4perl.  Note, though, that you still need to install the Thrift and Scribe Perl modules as described above.</p>
]]></content:encoded>
			<wfw:commentRss>http://notes.jschutz.net/2009/04/perl-client-for-facebooks-scribe-logging-software/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>MySQL &#8211; Many-row SELECT Performance &#8211; &#8220;OR&#8221; bad, &#8220;IN&#8221; good</title>
		<link>http://notes.jschutz.net/2008/11/mysql-many-row-select-performance/</link>
		<comments>http://notes.jschutz.net/2008/11/mysql-many-row-select-performance/#comments</comments>
		<pubDate>Sun, 16 Nov 2008 12:02:47 +0000</pubDate>
		<dc:creator>jon</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Perl]]></category>

		<guid isPermaLink="false">http://notes.jschutz.net/?p=19</guid>
		<description><![CDATA[Consider the situation where you have a list of row IDs and you need to retrieve the data for each of the rows.  The simplest way is to make one query per row, i.e.
(A) SELECT * from data_table WHERE id=?
For a large number of rows, that results in a lot of queries.  This could be [...]]]></description>
			<content:encoded><![CDATA[<p>Consider the situation where you have a list of row IDs and you need to retrieve the data for each of the rows.  The simplest way is to make one query per row, i.e.</p>
<p>(A) SELECT * from data_table WHERE id=?</p>
<p>For a large number of rows, that results in a lot of queries.  This could be condensed into one query, such as:</p>
<p>(B) SELECT * from data_table WHERE id=1 OR id=2 OR id=3 &#8230;</p>
<p>or</p>
<p>(C) SELECT * from data_table WHERE id IN (1,2,3,&#8230;)</p>
<p>When constructing potentially large SQL statements such as these (imagine if you wanted to retrieve 1,000,000 rows), it&#8217;s important to take into account the max_allowed_packet size which restricts the length of the query.  It might be necessary to divide the data up into several blocks and make a query for each block to ensure max_allowed_packet is not exceeded.</p>
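<p>Here is a minimal sketch of that chunking for case (C).  It builds one placeholder-based SELECT for each block of IDs.  The helper name chunked_in_queries is hypothetical, and so is the chunk size.  The chunk size is not derived from max_allowed_packet, so choose one that keeps each statement well under your server&#8217;s limit:</p>

<pre>
use strict;
use warnings;

# Hypothetical helper: split @$ids into blocks of at most $chunk_size and
# return one [ $sql, @bind ] entry per block, each of the form
#   SELECT * FROM $table WHERE $column IN (?,?,...)
sub chunked_in_queries {
    my ($table, $column, $ids, $chunk_size) = @_;
    my @queries;
    for (my $i = 0; $i < @$ids; $i += $chunk_size) {
        my $end = $i + $chunk_size - 1;
        $end = $#$ids if $end > $#$ids;
        my @chunk        = @{$ids}[ $i .. $end ];
        my $placeholders = join ',', ('?') x @chunk;
        push @queries, [ "SELECT * FROM $table WHERE $column IN ($placeholders)", @chunk ];
    }
    return @queries;
}

# Usage with a connected DBI handle $dbh (assumed):
#   for my $q (chunked_in_queries('data_table', 'id', \@ids, 1000)) {
#       my ($sql, @bind) = @$q;
#       my $rows = $dbh->selectall_arrayref($sql, { Slice => {} }, @bind);
#   }
</pre>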
<p>Another approach is to create a temporary table, insert the keys of the required rows, then do a JOIN query to retrieve the data, i.e.</p>
<p>(D) CREATE TEMPORARY TABLE tmp ( id INT(11) );</p>
<p>INSERT INTO tmp (id) VALUES (1), (2), (3), &#8230;</p>
<p>SELECT d.* FROM data_table d JOIN tmp USING (id)</p>
<p>This approach is somewhat cleaner, particularly when multiple keys are involved.  With multiple keys the WHERE syntax of the prior options becomes:</p>
<p>WHERE (key1=x1 AND key2=y1) OR (key1=x2 AND key2=y2) &#8230;</p>
<p>or</p>
<p>WHERE (key1, key2) IN ((x1, y1), (x2, y2), &#8230;)</p>
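<p>For the multiple-key case, here is a sketch that builds the second (row constructor) form with placeholders.  The helper composite_in_clause is hypothetical, and the caller supplies the key names and values:</p>

<pre>
use strict;
use warnings;

# Hypothetical helper: given key column names and a list of key tuples,
# return a WHERE fragment of the form
#   (key1, key2) IN ((?,?), (?,?), ...)
# followed by the flattened bind values.
sub composite_in_clause {
    my ($keys, $tuples) = @_;
    die "no tuples given\n" unless @$tuples;
    for my $t (@$tuples) {
        die "tuple has wrong number of values\n" unless @$t == @$keys;
    }
    my $row   = '(' . join(',', ('?') x @$keys) . ')';
    my $where = '(' . join(', ', @$keys) . ') IN ('
              . join(', ', ($row) x @$tuples) . ')';
    my @bind  = map { @$_ } @$tuples;
    return ($where, @bind);
}
</pre>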
<p>Under the temporary table approach, the question then arises as to how to most efficiently insert the data. A &#8216;LOAD DATA INFILE&#8217; approach is the most efficient way to load a table, but here we assume this is not an option as it is not readily portable (due to security settings that differ between local and remote MySQL daemons).  The example (D) above assumes a long INSERT statement, which again may be affected by max_allowed_packet.  Other options include:</p>
<p>(E) Multiple single INSERTs, INSERT INTO tmp (id) VALUE (?)</p>
<p>(F) Multiple single INSERTs in a transaction block, begin_work .. commit</p>
<p>(G) Multiple single INSERTs as an array, using the DBI execute_array() function</p>
<p>(H) As for (G), in a transaction block.</p>
<p>These options were benchmarked using MySQL 5.0.45, and the results are shown in the figure below.  As would be expected, the time taken by single SELECT statements scales linearly with the query set size.  For small query set sizes, the setup time of each approach has a significant impact on performance.  As the query set size increases, three classes emerge &#8211; one group that performs similarly to single SELECTs, another that performs much better, and one that lives on a completely different planet (one you wouldn&#8217;t want to visit).  In summary:</p>
<ul>
<li>SELECT + IN(&#8230;) (case C) offers the best performance when the query set size is above 30 or so.  For large query set sizes, its performance is also very similar to that of a temporary table filled by a single, long INSERT statement.  This is presumably because internally the IN(&#8230;) operation is essentially implemented as a temporary table.</li>
<li>SELECT + OR (case B) is a good choice for query set sizes &lt; 30.</li>
<li><strong>SELECT + OR hits a point where performance degrades dramatically</strong> (not shown on the graph: for the largest data set, the elapsed time reaches 1300s per query set!  Curiously, CPU time does not increase significantly, which suggests that inefficient data moves or swapping are occurring).</li>
</ul>
<p>In short, as a rule of thumb, <em>use SELECT + OR for query sets &lt; 30 in size, and SELECT + IN(&#8230;) otherwise.</em></p>
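<p>If you build these statements yourself, the rule of thumb can be captured in a small helper.  This is only a sketch.  The name where_ids is hypothetical.  The default threshold of 30 is the crossover seen in this benchmark on MySQL 5.0.45, so it may well differ on other versions or hardware:</p>

<pre>
use strict;
use warnings;

# Hypothetical helper: return a WHERE fragment and bind values for a list
# of IDs, using "col=? OR col=? ..." for short lists and "col IN (?,...)"
# once the list reaches $threshold entries.
sub where_ids {
    my ($column, $ids, $threshold) = @_;
    $threshold = 30 unless defined $threshold;
    die "empty ID list\n" unless @$ids;
    my $where = @$ids < $threshold
        ? join(' OR ', ("$column=?") x @$ids)
        : "$column IN (" . join(',', ('?') x @$ids) . ')';
    return ("($where)", @$ids);
}
</pre>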
<p>The SELECT + OR performance is a significant result.  The Perl SQL::Abstract library turns a WHERE specification such as { A =&gt; [ 1, 2, 3 ] } into WHERE ( ( ( A = ? ) OR ( A = ? ) OR ( A = ? ) ) ).  It will do the same if there are 1000 options (try it: <code>perl -MSQL::Abstract -e '$sql = SQL::Abstract-&gt;new; $w = $sql-&gt;where({ A =&gt; [ 1 .. 1000 ] }); print $w'</code>).  Thus libraries that use SQL::Abstract, such as DBIx::Class, are similarly affected.  This is a perfectly reasonable approach from the library&#8217;s perspective, but potentially a significant performance hit if used in this manner.</p>
<p>Feel free to <a href="/data/multi-select-bench.txt" target="_blank">review my benchmarking code</a> and tell me if I&#8217;ve got it wrong&#8230;</p>
<p>UPDATE Nov 19 2008:  There is a <a href="http://notes.jschutz.net/20/mysql/mysql-multi-select-performance-the-sequel">sequel</a> post that looks at SELECT &#8230; UNION and using a temporary table with an index.</p>
<p><img src="/images/multi-select-performance.png" alt="" width="750" /></p>
]]></content:encoded>
			<wfw:commentRss>http://notes.jschutz.net/2008/11/mysql-many-row-select-performance/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Integrating Sphinx into Perl Applications</title>
		<link>http://notes.jschutz.net/2008/11/integrating-sphinx-into-perl-applications/</link>
		<comments>http://notes.jschutz.net/2008/11/integrating-sphinx-into-perl-applications/#comments</comments>
		<pubDate>Mon, 03 Nov 2008 00:27:17 +0000</pubDate>
		<dc:creator>jon</dc:creator>
				<category><![CDATA[Perl]]></category>
		<category><![CDATA[Sphinx Search Engine]]></category>

		<guid isPermaLink="false">http://notes.jschutz.net/?p=18</guid>
		<description><![CDATA[Sphinx is a full-text search engine (http://www.sphinxsearch.com) designed
primarily for full-text search of database content.  It has many features but in
my opinion its best assets are speed of search and scalability.
We started using Sphinx when MySQL built-in full-text search was becoming too
slow and too CPU intensive, and of questionable accuracy.  Sphinx is lightning
fast compared to MySQL [...]]]></description>
			<content:encoded><![CDATA[<p>Sphinx is a full-text search engine (http://www.sphinxsearch.com) designed primarily for full-text search of database content.  It has many features, but in my opinion its best assets are speed of search and scalability.</p>
<p>We started using Sphinx when MySQL built-in full-text search was becoming too slow, too CPU intensive, and of questionable accuracy.  Sphinx is lightning fast compared to MySQL and returns more relevant results.</p>
<p>This note is about integration with the standalone Sphinx search server.  Sphinx also has a component (&#8216;SphinxSE&#8217;) that runs as a MySQL 5 storage engine, so it can be used as a direct replacement for MySQL full-text search; to use SphinxSE, standard Perl DBI should be all that is necessary.</p>
<h2>What you will need:</h2>
<p>The following CPAN modules are likely to be useful:</p>
<ul>
<li>Sphinx::Search provides the search API.</li>
<li>Sphinx::Manager provides facilities to start and stop the search server and to run the indexer.</li>
<li>Sphinx::Config allows you to read and write the Sphinx configuration files from code, in case you wish to maintain the configuration elsewhere (e.g. in your database).</li>
</ul>
<h2>Putting it all together:</h2>
<h3>Running the Sphinx searchd server</h3>
<p>Sphinx operates most efficiently if it is allowed to run persistently as a background service.  Theoretically, you could start the Sphinx server, do a search and then stop it on every request, with a small amount of overhead &#8211; but here we will consider just the typical case.</p>
<p>Ideally you will use operating system tools such as daemontools, monit or plain SysV startup scripts to start and monitor searchd, rather than having to worry about it in your perl app.  But, if you need or want to start it in perl:</p>
<pre>
  use Sphinx::Manager;

  my $mgr = Sphinx::Manager-&gt;new({ config_file =&gt; '/etc/sphinx.conf' });
  $mgr-&gt;start_searchd;
</pre>
<p>You should verify that the effective UID of your perl app has all of the appropriate permissions:</p>
<ul>
<li>to create and write to the PID file (see the &#8216;searchd&#8217; section of the config, &#8216;pid_file&#8217;)</li>
<li>to create and write to the log file (see &#8216;searchd&#8217;/&#8216;log&#8217;)</li>
<li>to read the Sphinx database files (&#8216;path&#8217; in each of your &#8216;index&#8217; specifications)</li>
</ul>
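<p>These checks can be made from Perl before starting searchd.  The sketch below is hypothetical: sphinx_permission_problems is not part of Sphinx::Manager.  It takes the paths as arguments, for example values you have read from sphinx.conf.  Perl&#8217;s -r and -w file tests are evaluated against the effective UID and GID, which is what matters here:</p>

<pre>
use strict;
use warnings;
use File::Basename qw(dirname);

# Hypothetical pre-flight check: returns a list of problems (empty if none).
#   pid_file, log : files searchd must be able to create and write
#   index_paths   : the 'path' prefixes from your index sections
sub sphinx_permission_problems {
    my (%arg) = @_;
    my @problems;

    for my $file (grep { defined } @arg{qw(pid_file log)}) {
        # An existing file must be writable; otherwise its directory must be.
        my $ok = -e $file ? -w $file : -w dirname($file);
        push @problems, "cannot create or write $file" unless $ok;
    }

    for my $prefix (@{ $arg{index_paths} || [] }) {
        my $dir = dirname($prefix);
        unless (-d $dir && -r _ && -x _) {
            push @problems, "cannot read index directory $dir";
            next;
        }
        # 'path' is a file name prefix; the index files are "$prefix.*"
        for my $f (glob "$prefix.*") {
            push @problems, "cannot read $f" unless -r $f;
        }
    }
    return @problems;
}
</pre>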
<h3>Adding Content to the Index</h3>
<pre>
  use Sphinx::Manager;

  my $mgr = Sphinx::Manager-&gt;new({ config_file =&gt; '/etc/sphinx.conf' });
  $mgr-&gt;run_indexer('--rotate');
</pre>
<p>Sphinx gets its content for indexing directly from the database, according to the &#8216;sql_query&#8217; given in the config file.  &#8216;run_indexer&#8217; simply runs the command line version of the Sphinx indexer program.  You can pass any indexer arguments through to &#8216;run_indexer&#8217;.  <code>--rotate</code> is typical; it makes searchd start using the newly created index without disrupting searches while indexing is occurring.</p>
<h3>Searching</h3>
<p>Make sure you have a version of Sphinx::Search that is compatible with searchd.  A compatibility list is given at the top of the Sphinx::Search perldoc.  Hopefully a point will be reached where the Sphinx::Search client can support a range of searchd versions, but for the moment that is impractical.</p>
<p>Sphinx::Search can be used with any logging object that supports error, warn, info and debug methods.  In this example I have used Log::Log4perl.</p>
<pre>
  use Sphinx::Search;
  use Log::Log4perl qw(:easy);

  Log::Log4perl-&gt;easy_init($DEBUG);

  my $sph = Sphinx::Search-&gt;new( log =&gt; Log::Log4perl-&gt;get_logger('sphinx.search') );

  my $results = $sph-&gt;setMatchMode(SPH_MATCH_ALL)
                    -&gt;Query("...");
</pre>
<h3>Configuring</h3>
<p>Sphinx::Config provides the tools to read and write the Sphinx configuration file.</p>
<p>A typical problem is that searchd is running on a non-standard port (the default is 3312), so how will your perl app know where to find it?  Obviously you don&#8217;t want to hard-code port numbers in case they change&#8230;</p>
<pre>
  use Sphinx::Search;
  use Sphinx::Config;
  use Log::Log4perl qw(:easy);

  Log::Log4perl-&gt;easy_init($DEBUG);

  my $sph = Sphinx::Search-&gt;new( log =&gt; Log::Log4perl-&gt;get_logger('sphinx.search') );

  # Get port from config file
  my $conf = Sphinx::Config-&gt;new;
  $conf-&gt;parse('/etc/sphinx.conf');
  my $port = $conf-&gt;get('searchd', undef, 'port');

  # Tell Sphinx client
  $sph-&gt;setServer('localhost', $port);

  my $results = $sph-&gt;Query("...");
</pre>
<h3>Enjoy</h3>
<p>We have had a considerable amount of success using Perl and Sphinx.  I hope you do too.</p>
]]></content:encoded>
			<wfw:commentRss>http://notes.jschutz.net/2008/11/integrating-sphinx-into-perl-applications/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Adding Action Timings to your Catalyst Output</title>
		<link>http://notes.jschutz.net/2008/04/adding-action-timings-to-your-catalyst-output/</link>
		<comments>http://notes.jschutz.net/2008/04/adding-action-timings-to-your-catalyst-output/#comments</comments>
		<pubDate>Sat, 19 Apr 2008 14:22:27 +0000</pubDate>
		<dc:creator>jon</dc:creator>
				<category><![CDATA[Catalyst MVC Framework]]></category>
		<category><![CDATA[Perl]]></category>

		<guid isPermaLink="false">http://notes.jschutz.net/17/perl/adding-action-timings-to-your-catalyst-output</guid>
		<description><![CDATA[About a year ago, onemogin wrote an article on adding action timings to the HTML output of a Catalyst app.  To do so, it was necessary to access $c-&#62;stats, which at the time was an internal object (that is, there was no published API for it) and therefore subject to change.  As of [...]]]></description>
			<content:encoded><![CDATA[<p>About a year ago, onemogin wrote an article on <a href="http://www.onemogin.com/blog/559-adding-action-timings-to-your-output.html" target="_blank">adding action timings to the HTML output of a Catalyst app</a>.  To do so, it was necessary to access $c-&gt;stats, which at the time was an internal object (that is, there was no published API for it) and therefore subject to change.  As of Catalyst-Runtime 5.7012, $c-&gt;stats has a defined interface and returns a Catalyst::Stats object (or your own class, if you provide one) rather than the Tree::Simple object that it used to.</p>
<p>It&#8217;s easy to fix your code to work with 5.7012.  Onemogin&#8217;s code in the end() method looked like this:</p>
<pre><span style="color: #b1b100">  my</span> <span style="color: #0000ff">$tree</span> = <span style="color: #0000ff">$c</span>-&gt;<span style="color: #006600">stats</span><span style="color: #66cc66">(</span><span style="color: #66cc66">)</span>;

<span style="color: #b1b100">  my</span> <span style="color: #0000ff">$dvisit</span> = <span style="color: #000000; font-weight: bold">new</span> Tree::<span style="color: #006600">Simple</span>::<span style="color: #006600">Visitor</span><span style="color: #66cc66">(</span><span style="color: #66cc66">)</span>;
<span style="color: #0000ff">  $tree</span>-&gt;<span style="color: #006600">accept</span><span style="color: #66cc66">(</span><span style="color: #0000ff">$dvisit</span><span style="color: #66cc66">)</span>;
<span style="color: #0000ff">  $c</span>-&gt;<span style="color: #006600">stash</span>-&gt;<span style="color: #66cc66">{</span><span style="color: #ff0000">'action_stats'</span><span style="color: #66cc66">}</span> = <span style="color: #0000ff">$dvisit</span>-&gt;<span style="color: #006600">getResults</span><span style="color: #66cc66">(</span><span style="color: #66cc66">)</span>;</pre>
<p>which needs to become this:</p>
<pre>  my @report = $c-&gt;stats-&gt;report;
  $c-&gt;stash-&gt;{action_stats}= \@report;</pre>
<p>and your template will also need to change; here&#8217;s an example:</p>
<p><pre> 
 &lt;div id="stats"&gt;
 &lt;table border="0" cellspacing="0" cellpadding="0"&gt;
 [% space = '&amp;nbsp;&amp;nbsp;' %]
 &lt;tr&gt;&lt;th&gt;Action&lt;/th&gt;&lt;th&gt;Time&lt;/th&gt;&lt;/tr&gt;
 [% FOREACH r=action_stats %]
 &lt;tr&gt;&lt;td class="description"&gt;[% space.repeat(r.0) %][% r.1 | html %]&lt;/td&gt;
&lt;td class="elapsed"&gt;[% UNLESS r.3 %]+[% END %][% r.2 %]s&lt;/td&gt;&lt;/tr&gt;
 [% END %]
 &lt;/table&gt;
 &lt;/div&gt;
</pre></p>
<p>to produce an end result such as:</p>
<p><style type="text/css">
#stats { 
       font-family: arial,helvetica,sans-serif; 
       font-size: 100%;
       background: #E0E0E0;
       margin: 0;
       padding: 0;
       border: solid 1px #F00000; 
       width: 400px; 
       }

#stats table {
       width: 100%;
       border: 0;
}

#stats th {
       color: #FFFFFF;
       font-size: 120%;
       font-weight: bold;
       background: #D47878;
       border: 0; 
       }
#stats td {
       color: #000000;
       border: 0;
       line-height: 120%;
       padding-left: 5%;
       }
#stats td.elapsed {
       background: #F8F8F8;
       padding-left: 5%;
}
</style>
<div  id="stats">
<table border="0" cellspacing="0" cellpadding="0">

<tr><th>Action</th><th>Time</th></tr>

<tr><td class="description">/default</td><td class="elapsed">0.005895s</td></tr>

<tr><td class="description">&nbsp;&nbsp;-&gt; /look_left</td><td class="elapsed">0.00091s</td></tr>

<tr><td class="description">&nbsp;&nbsp;&nbsp;&nbsp;- starting critical bit</td><td class="elapsed">+0.000479s</td></tr>

<tr><td class="description">&nbsp;&nbsp;&nbsp;&nbsp;- critical bit complete</td><td class="elapsed">+0.000208s</td></tr>

<tr><td class="description">&nbsp;&nbsp;-&gt; /look_right</td><td class="elapsed">0.000587s</td></tr>

<tr><td class="description">&nbsp;&nbsp;-&gt; /look_left</td><td class="elapsed">0.000799s</td></tr>

<tr><td class="description">&nbsp;&nbsp;&nbsp;&nbsp;- starting critical bit</td><td class="elapsed">+0.000441s</td></tr>

<tr><td class="description">&nbsp;&nbsp;&nbsp;&nbsp;- critical bit complete</td><td class="elapsed">+0.000169s</td></tr>

<tr><td class="description">&nbsp;&nbsp;-&gt; /cross_over</td><td class="elapsed">0.001766s</td></tr>

<tr><td class="description">/end</td><td class="elapsed">0.000462s</td></tr>

</table>
</div>
</p>
<p>Here&#8217;s the bit of controller code that generated the example:</p>
<p><pre>
sub default : Private {
    my ( $self, $c ) = @_;

    $c->forward('look_left');
    $c->forward('look_right');
    $c->forward('look_left');
    $c->forward('cross_over');
}

sub look_left : Private {
    my ( $self, $c ) = @_;
    for (1 .. 100) {};
    $c->stats->profile("starting critical bit");
    for (1 .. 100) {};
    $c->stats->profile("critical bit complete");
}

sub look_right : Private {
    for (1 .. 1000) {};
}
sub cross_over : Private {
    for (1 .. 10000) {};
}

sub end : ActionClass('RenderView') {
    my ( $self, $c ) = @_;
    my @report = $c->stats->report;
    $c->stash->{action_stats}= \@report;
}

</pre></p>
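<p>The same report rows can also be written somewhere other than a template, for example to a log.  Here is a minimal sketch.  It assumes the row layout the template above relies on: element 0 is the depth, 1 the description, 2 the elapsed time in seconds, and 3 a flag that is false for profile() points, which the template marks with a leading +.  format_stats_report is a hypothetical helper, not part of Catalyst::Stats:</p>

<pre>
use strict;
use warnings;

# Hypothetical helper: turn rows shaped like those returned by
# $c->stats->report (as used by the template above) into plain text lines.
# Each row is [ depth, description, elapsed_seconds, flag ]; rows whose flag
# is false (profile points) get a leading "+" on the time, as in the template.
sub format_stats_report {
    my @report = @_;
    my @lines;
    for my $r (@report) {
        my ($depth, $desc, $elapsed, $flag) = @$r;
        push @lines, sprintf('%-40s %s%ss',
            ('  ' x $depth) . $desc, ($flag ? '' : '+'), $elapsed);
    }
    return join("\n", @lines);
}

# e.g. in the end() action above:
#   $c->log->debug(format_stats_report($c->stats->report));
</pre>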
]]></content:encoded>
			<wfw:commentRss>http://notes.jschutz.net/2008/04/adding-action-timings-to-your-catalyst-output/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Unicode Character Classes</title>
		<link>http://notes.jschutz.net/2007/11/unicode-character-classes/</link>
		<comments>http://notes.jschutz.net/2007/11/unicode-character-classes/#comments</comments>
		<pubDate>Tue, 13 Nov 2007 23:30:43 +0000</pubDate>
		<dc:creator>jon</dc:creator>
				<category><![CDATA[Perl]]></category>
		<category><![CDATA[Unicode]]></category>
		<category><![CDATA[character classes]]></category>
		<category><![CDATA[regular expressions]]></category>

		<guid isPermaLink="false">http://notes.jschutz.net/8/unicode/unicode-character-classes</guid>
		<description><![CDATA[These are the Unicode &#8220;General Category&#8221; character class names used in regular expression matching, e.g. in Perl, \pP  or \p{Punctuation} to match all Unicode characters having the &#8220;punctuation&#8221; property.


Expression
Syntax
Long Name
Description


Letter
:L
Letter
Matches any letter, Ll &#124; Lm &#124; Lo &#124; Lt &#124; Lu


Uppercase letter
:Lu
Uppercase_Letter
Matches any one capital letter. For example, :Luhe matches &#8220;The&#8221; but not &#8220;the&#8221;.


Lowercase [...]]]></description>
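			<content:encoded><![CDATA[<p>These are the Unicode &#8220;General Category&#8221; character class names used in regular expression matching.  In Perl, for example, \pP or \p{Punctuation} matches any Unicode character that has the &#8220;punctuation&#8221; property.  The Syntax column gives each category&#8217;s short name in the colon-prefixed form used by some regular expression dialects.  In Perl, write the short name as \p{Lu} (or \pL for single-letter names).</p>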
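<p>Here is a small illustration of using these names in Perl regular expressions.  The helper names are hypothetical.  Short and long category names are interchangeable inside \p{...}:</p>

<pre>
use strict;
use warnings;

# Remove every character in General Category P (any punctuation).
sub strip_punctuation {
    my ($text) = @_;
    (my $clean = $text) =~ s/\p{P}+//g;
    return $clean;
}

# Return the words that start with an uppercase letter (Lu) followed by
# lowercase letters (Ll).
sub capitalised_words {
    my ($text) = @_;
    return $text =~ /\b(\p{Lu}\p{Ll}*)/g;
}

# Long names work too: \p{Punctuation} is the same class as \p{P}, and
# \p{Uppercase_Letter} is the same class as \p{Lu}.
</pre>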
<table cellspacing="0">
<tr>
<th>Expression</th>
<th>Syntax</th>
<th>Long Name</th>
<th>Description</th>
</tr>
<tr>
<td>Letter</td>
<td>:L</td>
<td>Letter</td>
<td>Matches any letter, Ll | Lm | Lo | Lt | Lu</td>
</tr>
<tr>
<td>Uppercase letter</td>
<td>:Lu</td>
<td>Uppercase_Letter</td>
<td>Matches any one capital letter. For example, <code class="ce">\p{Lu}he</code> matches &#8220;The&#8221; but not &#8220;the&#8221;.</td>
</tr>
<tr>
<td>Lowercase letter</td>
<td>:Ll</td>
<td>Lowercase_Letter</td>
<td>Matches any one lower case letter. For example, <code class="ce">\p{Ll}he</code> matches &#8220;the&#8221; but not &#8220;The&#8221;.</td>
</tr>
<tr>
<td>Title case letter</td>
<td>:Lt</td>
<td>Titlecase_Letter</td>
<td>Matches characters that combine an uppercase letter with a lowercase letter, such as the single-character digraphs &#459; (Nj) and &#498; (Dz).</td>
</tr>
<tr>
<td>Modifier letter</td>
<td>:Lm</td>
<td>Modifier_Letter</td>
<td>Matches letters or punctuation, such as commas, cross accents, and double prime, used to indicate modifications to the preceding letter.</td>
</tr>
<tr>
<td>Other letter</td>
<td>:Lo</td>
<td>Other_Letter</td>
<td>Matches other letters, such as the Gothic letter ahsa.</td>
</tr>
<tr>
<td>Cased letter</td>
<td>:LC</td>
<td>Cased_Letter</td>
<td>Matches any letter with case, Ll | Lt | Lu</td>
</tr>
<tr>
<td>Mark</td>
<td>:M</td>
<td>Mark</td>
<td>Matches any mark, Mc | Me | Mn</td>
</tr>
<tr>
<td>Non-spacing mark</td>
<td>:Mn</td>
<td>Nonspacing_Mark</td>
<td>Matches non-spacing marks.</td>
</tr>
<tr>
<td>Spacing combining mark</td>
<td>:Mc</td>
<td>Spacing_Mark</td>
<td>Matches spacing combining marks, i.e. combining marks that occupy their own horizontal space.</td>
</tr>
<tr>
<td>Enclosing mark</td>
<td>:Me</td>
<td>Enclosing_Mark</td>
<td>Matches enclosing marks.</td>
</tr>
<tr>
<td>Number</td>
<td>:N</td>
<td>Number</td>
<td>Matches any number, Nd | Nl | No</td>
</tr>
<tr>
<td>Decimal digit</td>
<td>:Nd</td>
<td>Decimal_Number</td>
<td>Matches decimal digits such as 0-9 and their full-width equivalents.</td>
</tr>
<tr>
<td>Letter digit</td>
<td>:Nl</td>
<td>Letter_Number</td>
<td>Matches letter digits such as roman numerals and ideographic number zero.</td>
</tr>
<tr>
<td>Other digit</td>
<td>:No</td>
<td>Other_Number</td>
<td>Matches other digits such as old italic number one.</td>
</tr>
<tr>
<td>Punctuation</td>
<td>:<span>P</span></td>
<td>Punctuation</td>
<td>Matches any punctuation, Pc | Pd | Pe | Pf | Pi | Po | Ps</td>
</tr>
<tr>
<td>Connector punctuation</td>
<td>:Pc</td>
<td>Connector_Punctuation</td>
<td>Matches the underscore or underline mark.</td>
</tr>
<tr>
<td>Dash punctuation</td>
<td>:Pd</td>
<td>Dash_Punctuation</td>
<td>Matches the dash mark.</td>
</tr>
<tr>
<td>Open punctuation</td>
<td>:Ps</td>
<td>Open_Punctuation</td>
<td>Matches opening punctuation such as open brackets and braces.</td>
</tr>
<tr>
<td>Close punctuation</td>
<td>:Pe</td>
<td>Close_Punctuation</td>
<td>Matches closing punctuation such as closing brackets and braces.</td>
</tr>
<tr>
<td>Initial quote punctuation</td>
<td>:Pi</td>
<td>Initial_Punctuation</td>
<td>Matches initial (opening) quotation marks such as &#8216;, &#8220; and &#171;.</td>
</tr>
<tr>
<td>Final quote punctuation</td>
<td>:Pf</td>
<td>Final_Punctuation</td>
<td>Matches final (closing) quotation marks such as &#8217;, &#8221; and &#187;.</td>
</tr>
<tr>
<td>Other punctuation</td>
<td>:Po</td>
<td>Other_Punctuation</td>
<td>Matches commas (,), ?, &quot;, !, @, #, %, &amp;, *, \, colons (:), semi-colons (;), ', and /.</td>
</tr>
<tr>
<td>Symbol</td>
<td>:S</td>
<td>Symbol</td>
<td>Matches any symbol, Sc | Sk | Sm | So</td>
</tr>
<tr>
<td>Math symbol</td>
<td>:Sm</td>
<td>Math_Symbol</td>
<td>Matches +, =, ~, |, &lt;, and &gt;.</td>
</tr>
<tr>
<td>Currency symbol</td>
<td>:Sc</td>
<td>Currency_Symbol</td>
<td>Matches $ and other currency symbols.</td>
</tr>
<tr>
<td>Modifier symbol</td>
<td>:Sk</td>
<td>Modifier_Symbol</td>
<td>Matches modifier symbols such as circumflex accent, grave accent, and macron.</td>
</tr>
<tr>
<td>Other symbol</td>
<td>:So</td>
<td>Other_Symbol</td>
<td>Matches other symbols, such as the copyright sign, pilcrow sign, and the degree sign.</td>
</tr>
<tr>
<td>Separator</td>
<td>:Z</td>
<td>Separator</td>
<td>Matches any separator, Zl | Zp | Zs</td>
</tr>
<tr>
<td>Paragraph separator</td>
<td>:Zp</td>
<td>Paragraph_Separator</td>
<td>Matches the Unicode character U+2029.</td>
</tr>
<tr>
<td>Space separator</td>
<td>:Zs</td>
<td>Space_Separator</td>
<td>Matches space characters, such as the ordinary space (U+0020) and the no-break space (U+00A0).</td>
</tr>
<tr>
<td>Line separator</td>
<td>:Zl</td>
<td>Line_Separator</td>
<td>Matches the Unicode character U+2028.</td>
</tr>
<tr>
<td>Other control</td>
<td>:Cc</td>
<td>Control</td>
<td>Matches control characters (U+0000 to U+001F and U+007F to U+009F), such as tab, line feed and carriage return.</td>
</tr>
<tr>
<td>Other format</td>
<td>:Cf</td>
<td>Format</td>
<td>Matches formatting control characters, such as the bidirectional control characters.</td>
</tr>
<tr>
<td>Surrogate</td>
<td>:Cs</td>
<td>Surrogate</td>
<td>Matches one half of a surrogate pair.</td>
</tr>
<tr>
<td>Other private-use</td>
<td>:Co</td>
<td>Private_Use</td>
<td>Matches any character from the private-use area.</td>
</tr>
<tr>
<td>Other not assigned</td>
<td>:Cn</td>
<td>Unassigned</td>
<td>Matches code points that are not assigned to any character.</td>
</tr>
</table>
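<p>Perl&#8217;s core Unicode::UCD module can tell you which of these categories a particular character belongs to.  Its charinfo() function returns a hash whose &#8216;category&#8217; entry holds the short name from the table.  Here is a minimal sketch (general_category and category_counts are hypothetical helper names):</p>

<pre>
use strict;
use warnings;
use Unicode::UCD qw(charinfo);

# Short General Category name (e.g. "Lu") of one character.  charinfo()
# returns undef for unassigned code points, which are reported here as "Cn".
sub general_category {
    my ($char) = @_;
    my $info = charinfo(ord $char);
    return $info ? $info->{category} : 'Cn';
}

# Count how many characters of each General Category occur in a string.
sub category_counts {
    my ($text) = @_;
    my %count;
    $count{ general_category($_) }++ for split //, $text;
    return \%count;
}
</pre>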
<p><strong>References:</strong></p>
<p><a href="http://www.unicode.org">unicode.org</a></p>
<p><a href="http://www.unicode.org/Public/UNIDATA/UCD.html#Properties">Unicode Character Properties</a></p>
<p><a href="http://www.unicode.org/reports/tr18/">Unicode Regular Expressions</a></p>
<p><a href="http://www.unicode.org/Public/UNIDATA/PropertyValueAliases.txt" target="_blank">Unicode Property Aliases </a></p>
<p><a href="http://search.cpan.org/~nwclark/perl-5.8.8/pod/perlre.pod" target="_blank">Perl Regular Expressions</a></p>
<p><a href="http://www.pcre.org/" target="_blank">PCRE</a></p>
]]></content:encoded>
			<wfw:commentRss>http://notes.jschutz.net/2007/11/unicode-character-classes/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
