Scribe Server Configuration

Due to reconfiguration at Sourceforge, the Scribe Configuration page is no longer there.  This is a copy of the page http://scribeserver.wiki.sourceforge.net/Configuration rescued from Google’s cache.

Configuring Scribe

The Scribe Server can be configured by:

  1. the file specified in the -c command line option
  2. the file at DEFAULT_CONF_FILE_LOCATION in env_default.h

Global Configuration Variables

port: assigned to variable “port”

  • what port the scribe server will listen on
  • default 0, passed at command line with -p, can also be set in conf file

max_msg_per_second: assigned to variable “maxMsgPerSecond”

  • used in scribeHandler::throttleDeny
  • default 100,000

max_queue_size: in bytes, assigned to variable “maxQueueSize”

  • used in scribeHandler::Log
  • default 500,000 bytes

check_interval: in seconds, assigned to variable “checkPeriod”

  • used to control how often to check each store
  • default 5

new_thread_per_category: boolean, assigned to variable “newThreadPerCategory”

  • If true, will create a new thread for every category seen. Otherwise, will only create a single thread for every store defined in the configuration.
  • default true

Example:

port=1463
max_msg_per_second=2000000
max_queue_size=10000000
check_interval=3

Store Configuration

Scribe Server determines how to log messages based on the Stores defined in the configuration. Every store must specify what message category it handles with two exceptions:
default store: The ‘default’ category handles any category that is not handled by any other store. There can only be one default store.

  • category=default

prefix stores: If the specified category ends in a *, the store will handle all categories that begin with the specified prefix.

  • category=web*
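The matching rules above can be sketched in a few lines of Python. This is only an illustration, not Scribe's implementation: the function name `pick_store` is hypothetical, and the precedence shown (exact match first, then the longest matching prefix, then the default store) is an assumption that the page does not spell out.

```python
def pick_store(category, configured):
    """Return the configured category value that would handle `category`.

    `configured` is the list of category= values taken from the <store>
    blocks, e.g. ["statistics", "web*", "default"].
    """
    # Assumption: an exact category match wins over a prefix match.
    if category in configured:
        return category
    # Prefix stores: "web*" handles every category that starts with "web".
    # Assumption: if several prefixes match, prefer the longest one.
    prefixes = [c for c in configured
                if c.endswith("*") and category.startswith(c[:-1])]
    if prefixes:
        return max(prefixes, key=len)
    # Anything left over goes to the single default store, if defined.
    return "default" if "default" in configured else None


stores = ["statistics", "web*", "default"]
for name in ("statistics", "webserver", "mail"):
    print(name, "->", pick_store(name, stores))
```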

Store Configuration Variables

category: Determines which messages are handled by this store
type:

  • Currently used types (defined in Store::createStore)
    1. file
    2. buffer
    3. network
    4. bucket
    5. thriftfile
    6. null
    7. multi

target_write_size: 16,384 bytes by default

  • determines how large to let the message queue grow for a given category before processing the messages

max_write_interval: 10 seconds by default

  • determines how long to let the messages queue for a given category before processing the messages

Example:

<store>
category=statistics
type=file
target_write_size=20480
max_write_interval=2

...

</store>

File Store Configuration

File Stores write messages to a file.

file_path: defaults to “/tmp”
base_filename: defaults to category name
rotate_period: “hourly”, “daily”, or “never”; “never” by default

  • determines how often to create new files

rotate_hour: 0-23, 1 by default

  • if rotate_period is daily, determines what hour of day to rotate

rotate_minute: 0-59, 15 by default

  • if rotate_period is daily or hourly, determines how many minutes after the hour to rotate

max_size: 1,000,000,000 bytes by default

  • determines approximately how large to let a file grow before rotating to a new file

write_meta: “yes” or anything else; false by default

  • whether to log the following metadata in each file:
      1. the length of each message is prepended to the message as an unsigned integer
      2. if the file was rotated, the last line will contain “scribe_meta<new_logfile>: ” followed by the next filename

fs_type: currently only “std” is supported; “std” by default
chunk_size: 0 by default

  • if a chunk size is specified, no messages within the file will cross chunk boundaries unless there are messages larger than the chunk size

add_newlines: 0 or 1, 0 by default

  • if set to 1, will write a newline after every message

create_symlink: “yes” or anything else; “yes” by default

  • if true, will maintain a symlink that points to the most recently written file

Example:

<store>
category=sprockets
type=file
file_path=/tmp/sprockets
base_filename=sprockets_log
max_size=1000000
add_newlines=1
rotate_period=daily
rotate_hour=0
rotate_minute=10
</store>

Network Store Configuration

Network Stores forward messages to other Scribe Servers.

remote_host: name or ip of remote host to forward messages
remote_port: port number on remote host
timeout: socket timeout, in MS; defaults to DEFAULT_SOCKET_TIMEOUT_MS, which is set to 5000 in store.h
use_conn_pool: “yes” or anything else; defaults to false

  • whether to use connection pooling instead of opening up multiple connections to each remote host

Example:

<store>
category=default
type=network
remote_host=hal
remote_port=1465
</store>

Buffer Store Configuration

Buffer Stores must have two sub-stores named “primary” and “secondary”. A Buffer Store will first attempt to log messages to the primary store and will only log to the secondary if the primary is not available. Once the primary store comes back online, the Buffer Store will read messages out of the secondary store and send them to the primary store. Only stores that are readable (stores that implement the readOldest() method) may be used as the secondary store. Currently, the only readable stores are File Stores and Null Stores.

max_queue_length: 2,000,000 messages by default

  • if the number of messages in the queue exceeds this value, the buffer store will switch to writing to the secondary store

buffer_send_rate: 1 by default

  • determines, for each check_interval, how many times to read a group of messages from the secondary store and send them to the primary store

retry_interval: 300 seconds by default

  • how long to wait to retry sending to the primary store after failing to write to the primary store

retry_interval_range: 60 seconds by default

  • will randomly pick a retry interval that is within this range of the specified retry_interval

Example:

<store>
category=default
type=buffer
buffer_send_rate=1
retry_interval=30
retry_interval_range=10

<primary>
type=network
remote_host=wopr
remote_port=1456
</primary>

<secondary>
type=file
file_path=/tmp
base_filename=thisisoverwritten
max_size=10000000
</secondary>
</store>

Note! When the network connection is re-established, the messages from the secondary store are sent one whole file at a time. Thus max_size determines not only the size of the file that triggers rotation, but also the size of the network messages; if this is too large, the receiver may not be able to handle it. Best to keep it to a number that can be comfortably handled in memory. max_size does not limit the total number of messages that can be buffered (presumably that’s limited by the amount of space available on the filesystem).
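To make retry_interval and retry_interval_range (described above) more concrete, here is a small Python sketch of one plausible reading: the delay is drawn uniformly from a window of width retry_interval_range centred on retry_interval. The centring and the uniform distribution are assumptions, and `next_retry_interval` is a hypothetical name rather than anything from the Scribe source.

```python
import random


def next_retry_interval(retry_interval=300, retry_interval_range=60, rng=random):
    """Pick a retry delay in seconds, spread around retry_interval.

    Assumption: the window is centred on retry_interval, so the defaults
    give a delay somewhere between 270 and 330 seconds.
    """
    half = retry_interval_range / 2
    return rng.uniform(retry_interval - half, retry_interval + half)


# With the values from the example above (retry_interval=30,
# retry_interval_range=10) each delay falls between 25 and 35 seconds.
print(next_retry_interval(30, 10))
```

Randomising the delay in this way keeps a group of servers that lost the same primary from all retrying at the same instant.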

Bucket Store Configuration

Bucket Stores will hash messages to multiple files using a prefix of each message as the key.
A Bucket Store must have a substore named “bucket” that is either a File Store or ThriftFile Store.

num_buckets: defaults to 1

  • number of buckets to hash into
  • messages that cannot be hashed into any bucket will be put into a special bucket number 0

bucket_type: “key_hash” or “key_modulo”
delimiter: must be an ascii code between 0 and 255; if 0, uses DEFAULT_DELIMITER (set in common.h)

  • The message prefix up to (but not including) the first occurrence of the delimiter will be used as the key to do the hash/modulo

bucket_subdir: the name of each subdirectory will be this name followed by the bucket number

Example:

<store>
category=bucket_me
type=bucket
num_buckets=5
bucket_subdir=bucket
bucket_type=key_hash
delimiter=58

<bucket>
type=file
fs_type=std
file_path=/tmp/scribetest
base_filename=bucket_me
</bucket>
</store>
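The key extraction and bucket selection described above might look roughly like this in Python. It is a sketch under stated assumptions: `bucket_for` is a hypothetical name, zlib.crc32 stands in for whatever hash Scribe actually uses, buckets are assumed to be numbered 1 to num_buckets, and a message with no delimiter (or a non-numeric key under key_modulo) is assumed to be what "cannot be hashed" means.

```python
import zlib


def bucket_for(message, num_buckets, delimiter=58, bucket_type="key_hash"):
    """Return a bucket number in 1..num_buckets, or 0 if no key is usable."""
    sep = chr(delimiter)  # delimiter is given as an ASCII code; 58 is ':'
    if sep not in message:
        return 0  # assumption: no delimiter means no key, so bucket 0
    # The key is the prefix up to, but not including, the first delimiter.
    key = message.split(sep, 1)[0]
    if bucket_type == "key_modulo":
        try:
            return int(key) % num_buckets + 1
        except ValueError:
            return 0  # assumption: a non-numeric key cannot be bucketed
    # key_hash: crc32 is a stand-in here, not Scribe's own hash function.
    return zlib.crc32(key.encode("utf-8")) % num_buckets + 1


print(bucket_for("7:some payload", 5, bucket_type="key_modulo"))
print(bucket_for("user42:clicked", 5))
print(bucket_for("no delimiter here", 5))
```

With bucket_subdir=bucket, as in the example configuration, a message assigned to bucket 3 would end up under a subdirectory named bucket3.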

Null Store Configuration

Null Stores can be used to tell Scribe to ignore all messages of a given category.

(no configuration parameters)

Example:

<store>
category=tps_report*
type=null
</store>

Multi Store Configuration

A Multi Store is a store that will forward all messages to multiple sub-stores.

A Multi Store may have any number of substores named “store0”, “store1”, “store2”, etc.
report_success: “all” or “any”, defaults to “all”

  • whether all substores or any substores must succeed in logging a message in order for the Multi Store to report the message logging as successful

Example:

<store>
category=default
type=multi
target_write_size=20480
max_write_interval=1

<store0>
type=file
file_path=/tmp/store0
</store0>

<store1>
type=file
file_path=/tmp/store1
</store1>
</store>

Thriftfile Store Configuration

A Thriftfile store is similar to a File store except that it stores messages in a Thrift TFileTransport file.

file_path: defaults to “/tmp”
base_filename: defaults to category name
rotate_period: “hourly”, “daily”, or “never”; “never” by default

  • determines how often to create new files

rotate_hour: 0-23, 1 by default

  • if rotate_period is daily, determines what hour of day to rotate

rotate_minute: 0-59, 15 by default

  • if rotate_period is daily or hourly, determines how many minutes after the hour to rotate

max_size: 1,000,000,000 bytes by default

  • determines approximately how large to let a file grow before rotating to a new file

write_meta: “yes” or anything else; false by default

  • whether to log the following metadata in each file:
      1. the length of each message is prepended to the message as an unsigned integer
      2. if the file was rotated, the last line will contain “scribe_meta<new_logfile>: ” followed by the next filename

fs_type: currently only “std” is supported; “std” by default
chunk_size: 0 by default

  • if a chunk size is specified, no messages within the file will cross chunk boundaries unless there are messages larger than the chunk size

create_symlink: “yes” or anything else; “yes” by default

  • if true, will maintain a symlink that points to the most recently written file

flush_frequency_ms: milliseconds, will use TFileTransport default of 3000ms if not specified

  • determines how frequently to sync the Thrift file to disk

msg_buffer_size: in bytes, will use TFileTransport default of 0 if not specified

  • if non-zero, store will reject any writes larger than this size

Example:

<store>
category=sprockets
type=thriftfile
file_path=/tmp/sprockets
base_filename=sprockets_log
max_size=1000000
flush_frequency_ms=2000
</store>


Perl client for Facebook’s scribe logging software

Scribe is a log aggregator, developed at Facebook and released as open source. Scribe is built on Thrift, a cross-language RPC type platform, and therefore it is possible to use scribe with any of the Thrift-supported languages. Whilst Perl is one of the supported languages, there is little in the way of working examples, so here’s how I did it:

  1. Install Thrift.
  2. Build and install FB303 perl modules
      cd thrift/contrib/fb303
      # Edit if/fb303.thrift and add the line 'namespace perl Facebook.FB303' after the other namespace declarations
      thrift --gen perl if/fb303.thrift
      sudo cp -a gen-perl/ /usr/local/lib/perl5/site_perl/5.10.0 # or wherever you keep your site perl
    

    This creates the modules Facebook::FB303::Constants, Facebook::FB303::FacebookService and Facebook::FB303::Types.

  3. Install Scribe.
  4. Build and install Scribe perl modules
      cd scribe
      # Edit if/scribe.thrift and add 'namespace perl Scribe.Thrift' after the other namespace declarations
      thrift -I /path/to/thrift/contrib/ --gen perl scribe.thrift
      sudo cp -a gen-perl/Scribe /usr/local/lib/perl5/site_perl/5.10.0/ # or wherever
    
    This creates the modules Scribe::Thrift::Constants, Scribe::Thrift::scribe and Scribe::Thrift::Types.

Here is an example program that uses the client (reading one line at a time from stdin and sending to a scribe instance running locally on port 1465):

      #! /usr/bin/perl
      
      use Scribe::Thrift::scribe;
      use Thrift::Socket;
      use Thrift::FramedTransport;
      use Thrift::BinaryProtocol;
      use strict;
      use warnings;
      
      my $host = 'localhost';
      my $port = 1465;
      my $cat = $ARGV[0] || 'test';
      
      my $socket = Thrift::Socket->new($host, $port);
      my $transport = Thrift::FramedTransport->new($socket);
      my $proto = Thrift::BinaryProtocol->new($transport);
      
      my $client = Scribe::Thrift::scribeClient->new($proto, $proto);
      my $le = Scribe::Thrift::LogEntry->new({ category => $cat });
      
      $transport->open();
      
      while (my $line = <>) {
          $le->message($line);
          my $result = $client->Log([ $le ]);
          if ($result == Scribe::Thrift::ResultCode::TRY_LATER) {
              print STDERR "TRY_LATER\n";
          }
          elsif ($result != Scribe::Thrift::ResultCode::OK) {
              print STDERR "Unknown result code: $result\n";
          }
      }
      
      $transport->close();
      

      UPDATE: Log::Dispatch::Scribe is now available on CPAN, and also works with Log::Log4perl. Note, though, that you still need to install the Thrift and Scribe perl modules as described above.


Sphinx Search Engine Performance

The following is a summary of some real-world data collected from the Sphinx query logs on a cluster of 15 servers. Each server runs its own copy of Sphinx, Apache, a busy web application, MySQL and miscellaneous services.

The dataset contains 453 million query log instances from 180 Sphinx indexes, collected over several months, using Sphinx version 0.9.8 on Linux kernel 2.6.18. The servers are all Dell PowerEdge 1950 with Quad Core Intel® Xeon® E5335, 2×4MB Cache, 2.0GHz, 1333MHz FSB, SATA drives, 7200rpm.

Keep in mind, though, that this is real world data and not a controlled test. This is how Sphinx performed in our environment, for the particular way we use Sphinx.

The graph below displays the response time distribution for all servers and all indexes, and shows, for example, that 60% of queries complete within 0.01 secs, 80% within 0.1 secs and 99% within 0.5 secs. Response times tend to occur in 3 bands (corresponding to the peaks in the frequency graph): <0.001 secs, 0.03 secs and 0.3 secs, which partly relates to the number of disk accesses required to fulfil a request. At 0.001 secs, all data is in memory, while at 0.3 secs, several disk accesses are occurring. Whilst the middle peak is not so obvious in this graph, the per-server or per-index graphs often have different distributions but still tend to have peaks at one or more of these three bands.

[Figure: Sphinx Query Response Times, total for all servers, all indexes]

The next observation is that query word count affects performance, but not necessarily in proportion to the number of query words, as shown in the graph below. 1-4 word queries consistently offer best performance. The 6-50 words range is consistently the slowest, most likely because the chance of finding documents with multiple matches is high so there is extra ranking effort involved. Above 50, there is presumably a higher chance of having words with few matches, which speeds up the ranking process.

[Figure: Sphinx Query Response Time by Query Word Count]

Finally, we see that the size of the inverted index (.spd files) also affects performance. The three graphs below show how the response time distribution tends to move to the right as the index size increases. The larger the index, the higher the chance that data will need to be re-read from disk (rather than from Sphinx-internal or system buffers/cache), hence this is not unexpected.

[Figure: Sphinx Query Response Times for Index Sizes 1MB - 3MB]
[Figure: Sphinx Query Response Times for Index Sizes 3MB - 30MB]
[Figure: Sphinx Query Response Times for Index Sizes >30MB]

Here is a PDF summary of Sphinx performance for this dataset, including many additional graphs of the data by server and by index.


MySQL Multi-Select Performance – The Sequel

Following my original post, it was suggested to me that one of the following may give better performance:

  • SELECT … UNION SELECT …
  • Using a temporary table with an index.

Well, not so.  I have added the above cases to my benchmarking script, and updated the graph as shown below.

SELECT … UNION gave all sorts of problems.  Firstly, it broke at a query set size of 1000 with the error

Can't open file: './bench/test1.frm' (errno: 24)

After a bit of searching I found that the remedy for this was to increase the MySQL open_files_limit setting (was 1024, increased to 8192).  This got it going again, only to fall over once more at a query set size of 10000, this time with the error

parser stack overflow near 'UNION SELECT ...

to which I could not find a solution.  In any case, the performance as shown in the graph is closely tracking the exponential degradation of the SELECT + OR case.  Conclusion: SELECT UNIONs are not suited for a large number of unions.  Useful when merging the results of several different SELECT statements, though.

The addition of an index to the temporary table also had no appreciable effect in this test, probably because MySQL will use the index in the main table to search while scanning through the temporary table.  Perhaps there might be an improvement for the case where the temporary table is larger than the main table – but that would imply duplicates in the temporary table.


MySQL – Many-row SELECT Performance – “OR” bad, “IN” good

Consider the situation where you have a list of row IDs and you need to retrieve the data for each of the rows.  The simplest way is to make one query per row, i.e.

(A) SELECT * from data_table WHERE id=?

For a large number of rows, that results in a lot of queries.  This could be condensed into one query, such as:

(B) SELECT * from data_table WHERE id=1 OR id=2 OR id=3 …

or

(C) SELECT * from data_table WHERE id IN (1,2,3,…)

When constructing potentially large SQL statements such as these (imagine if you wanted to retrieve 1,000,000 rows), it’s important to take into account the max_allowed_packet size which restricts the length of the query.  It might be necessary to divide the data up into several blocks and make a query for each block to ensure max_allowed_packet is not exceeded.
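Dividing the IDs into blocks can be sketched as follows in Python. This is an illustration only: `in_queries` is a hypothetical helper, the chunk size is counted in IDs as a rough proxy for query length in bytes, and the `%s` placeholder style is an assumption that would need to match whichever database driver is in use.

```python
def in_queries(ids, chunk_size=1000, placeholder="%s"):
    """Yield (sql, params) pairs, each covering at most chunk_size ids.

    chunk_size counts ids, not bytes; pick it so that the longest
    resulting statement stays comfortably under max_allowed_packet.
    """
    ids = list(ids)
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        marks = ",".join([placeholder] * len(chunk))
        yield f"SELECT * FROM data_table WHERE id IN ({marks})", chunk


for sql, params in in_queries(range(1, 6), chunk_size=2):
    print(sql, params)
```

Each pair can then be passed to the driver's execute call, with the per-block results concatenated by the caller.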

Another approach is to create a temporary table, insert the keys of the required rows, then do a JOIN query to retrieve the data, i.e.

(D) CREATE TEMPORARY TABLE tmp ( id INT(11) );

INSERT INTO tmp (id) VALUES (1), (2), (3), …

SELECT d.* FROM data_table d JOIN tmp USING (id)

This approach is somewhat cleaner, particularly when multiple keys are involved.  With multiple keys the WHERE syntax of the prior options becomes:

WHERE (key1=x1 AND key2=y1) OR (key1=x2 AND key2=y2) …

or

WHERE (key1, key2) IN ((x1, y1), (x2, y2), …)

Under the temporary table approach, the question then arises as to how to most efficiently insert the data. A ‘LOAD DATA INFILE’ approach is the most efficient way to load a table, but here we assume this is not an option as it is not readily portable (due to security settings that differ between local and remote MySQL daemons).  The example (D) above assumes a long INSERT statement, which again may be affected by max_allowed_packet.  Other options include:

(E) Multiple single INSERTs, INSERT INTO tmp (id) VALUE (?)

(F) Multiple single INSERTs in a transaction block, begin_work .. commit

(G) Multiple single INSERTs as an array, using the DBI execute_array() function

(H) As for (G), in a transaction block.
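For readers who want to see the temporary-table approach end to end, here is a self-contained Python sketch in the spirit of option (H). It uses the standard-library sqlite3 module as a stand-in for MySQL so that it can be run anywhere; the helper name `fetch_rows_via_temp_table` and the sample data are hypothetical, `executemany` merely plays the role that DBI's execute_array() plays in Perl, and none of the timings discussed in this post carry over to SQLite.

```python
import sqlite3


def fetch_rows_via_temp_table(conn, ids):
    """Load ids into a temporary table, then JOIN to fetch matching rows."""
    with conn:  # a single transaction around all of the inserts
        conn.execute("CREATE TEMPORARY TABLE IF NOT EXISTS tmp (id INTEGER)")
        conn.execute("DELETE FROM tmp")
        # One prepared INSERT executed once per id.
        conn.executemany("INSERT INTO tmp (id) VALUES (?)", [(i,) for i in ids])
    cur = conn.execute(
        "SELECT d.* FROM data_table d JOIN tmp USING (id) ORDER BY d.id"
    )
    return cur.fetchall()


# Sample data: ten rows in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_table (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO data_table VALUES (?, ?)",
    [(i, f"row {i}") for i in range(1, 11)],
)

rows = fetch_rows_via_temp_table(conn, [2, 3, 7])
print(rows)
```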

These options were benchmarked using MySQL 5.0.45 and the results are shown in the figure below.  As would be expected, the use of single select statements scales linearly.  For small query set sizes, the setup times for the different query approaches have significant impact on the performance; as the query set size increases, three classes emerge – one group that performs similarly to single selects, another that performs much much better, and one that lives on a completely different planet (one you wouldn’t want to visit).  In summary:

  • That SELECT + IN(…) (case C) offers best performance when the query set size is above 30 or so.  It is also interesting to note that the performance of SELECT + IN(…) is very similar to using a temporary table with a single, long INSERT statement for large query set sizes, presumably because internally the IN(…) operation is essentially implemented as a temporary table.
  • That SELECT + OR (case B) is a good choice for query set size < 30
  • That SELECT + OR hits a point where performance becomes exponentially worse. Not shown on the graph: for the largest data set the performance reaches 1300s per query set!  Curiously, this is elapsed time; CPU time does not significantly increase, which suggests there are some inefficient data moves/swapping occurring.

In short, as a rule of thumb, use SELECT + OR for query sets < 30 in size, and SELECT + IN(…) otherwise.

The SELECT + OR performance is a significant result; the Perl SQL::Abstract library turns a WHERE specification such as { A => [ 1, 2, 3] } into  WHERE ( ( ( A = ? ) OR ( A = ? ) OR ( A = ? ) ) ).  It will do the same if there are 1000 options (try it: perl -MSQL::Abstract -e '$sql = SQL::Abstract->new; $w = $sql->where({ A => [ 1 .. 1000]}); print $w').  Thus libraries that use SQL::Abstract, such as DBIx::Class, are similarly affected.  A perfectly reasonable approach from the library’s perspective, but potentially a significant performance hit if used in this manner.

Feel free to review my benchmarking code and tell me if I’ve got it wrong…

UPDATE Nov 19 2008:  There is a sequel post that looks at SELECT … UNION and using a temporary table with an index.


Integrating Sphinx into Perl Applications

Sphinx is a full-text search engine (http://www.sphinxsearch.com) designed
primarily for full-text search of database content.  It has many features but in
my opinion its best assets are speed of search and scalability.

We started using Sphinx when MySQL built-in full-text search was becoming too
slow and too CPU intensive, and of questionable accuracy.  Sphinx is lightning
fast compared to MySQL and provides better results relevancy.

This note is about integration with the standalone Sphinx search server. Sphinx
also has a component (‘SphinxSE’) that runs as a MySQL 5 engine so can be used as
a direct replacement for MySQL full-text search; to use SphinxSE, standard Perl
DBI should be all that is necessary.

What you will need:

The following CPAN modules are likely to be useful:

Sphinx::Search
Sphinx::Manager
Sphinx::Config

Sphinx::Manager provides facilities to start and stop the search server and to
run the indexer.

Sphinx::Search provides the search API.

Sphinx::Config allows you to read/write the Sphinx configuration files from
code, in case you wish to maintain the configuration elsewhere (e.g. in your
database).

Putting it all together:

Running the Sphinx searchd server

Sphinx operates most efficiently if it is allowed to run persistently as a
background service.  Theoretically, you could start the Sphinx server, do a
search and then stop it on every request, with a small amount of overhead – but
here we will consider just the typical case.

Ideally you will use operating system tools such as daemontools, monit or
just the SysV startup scripts to start and monitor searchd, rather than have
to worry about it in your perl app.  But, if you need or want to start it in
perl:

  use Sphinx::Manager;
  my $mgr = Sphinx::Manager->new({ config_file => '/etc/sphinx.conf' });
  $mgr->start_searchd;

You should verify that the effective UID of your perl app has all of the appropriate
permissions:

  • to create and write to the PID file (see the ‘searchd’ section of the config, ‘pid_file’)
  • to create and write to the log file (see ‘searchd’/‘log’)
  • to read the Sphinx database files (‘path’ in each of your ‘index’ specifications)

Adding Content to the Index

  use Sphinx::Manager;
  my $mgr = Sphinx::Manager->new({ config_file => '/etc/sphinx.conf' });
  $mgr->run_indexer('--rotate');

Sphinx gets its content for indexing directly from the database, according to
the ‘sql_query’ given in the config file.  ‘run_indexer’ simply runs the command
line version of the Sphinx indexer program.  You can pass any indexer arguments
through to ‘run_indexer’; ‘--rotate’ is typical, to force searchd to start using
the newly created index without disrupting searches while indexing is
occurring.

Searching

Make sure you have a version of Sphinx::Search that is compatible with searchd.
A compatibility list is given at the top of the Sphinx::Search perldoc.
Hopefully a point will be reached where the Sphinx::Search client can support a
range of searchd versions, but for the moment that is impractical.

Sphinx::Search can be used with any logging object that supports error, warn,
info and debug methods.  In this example I have used Log::Log4perl.

  use Sphinx::Search;
  use Log::Log4perl qw(:easy);
  Log::Log4perl->easy_init($DEBUG);
  # Constructor options are passed as a hash reference
  my $sph = Sphinx::Search->new({ log => Log::Log4perl->get_logger('sphinx.search') });
  my $results = $sph->SetMatchMode(SPH_MATCH_ALL)
                    ->Query("...");

Configuring

Sphinx::Config provides the tools to read and write the Sphinx configuration file.

A typical problem is that searchd is running on a non-standard port (the default
is 3312), so how will your perl app know where to find it?  Obviously you don’t
want to hard-code port numbers in case they change…

  use Sphinx::Search;
  use Sphinx::Config;
  use Log::Log4perl qw(:easy);

  Log::Log4perl->easy_init($DEBUG);

  my $sph = Sphinx::Search->new({ log => Log::Log4perl->get_logger('sphinx.search') });

  # Get port from config file
  my $conf = Sphinx::Config->new;
  $conf->parse('/etc/sphinx.conf');
  my $port = $conf->get('searchd', undef, 'port');

  # Tell Sphinx client
  $sph->SetServer('localhost', $port);

  my $results = $sph->Query("...");


Enjoy

We have had a considerable amount of success using Perl and Sphinx.  I hope you
do too.



Adding Action Timings to your Catalyst Output

About a year ago, onemogin wrote an article on adding action timings to the HTML output of a Catalyst app. To do so, it was necessary to access $c->stats, which at the time was an internal object (that is, there was no published API for it) and therefore subject to change. As of Catalyst-Runtime 5.7012, $c->stats has a defined interface and returns a Catalyst::Stats object (or your own class, if you provide one) rather than the Tree::Simple object that it used to.

It’s easy to fix your code to work with 5.7012. Onemogin’s code in the end() method looked like this:

  my $tree = $c->stats();

  my $dvisit = new Tree::Simple::Visitor();
  $tree->accept($dvisit);
  $c->stash->{'action_stats'} = $dvisit->getResults();

which needs to become this:

  my @report = $c->stats->report;
  $c->stash->{action_stats}= \@report;

and your template will also need to change; here’s an example:

 
 <div id="stats">
 <table border="0" cellspacing="0" cellpadding="0">
 [% space = '&nbsp;&nbsp;' %]
 <tr><th>Action</th><th>Time</th></tr>
 [% FOREACH r=action_stats %]
 <tr><td class="description">[% space.repeat(r.0) %][% r.1 | html %]</td>
<td class="elapsed">[% UNLESS r.3 %]+[% END %][% r.2 %]s</td></tr>
 [% END %]
 </table>
 </div>

to produce an end result such as:

Action                             Time
/default                           0.005895s
  -> /look_left                    0.00091s
    - starting critical bit        +0.000479s
    - critical bit complete        +0.000208s
  -> /look_right                   0.000587s
  -> /look_left                    0.000799s
    - starting critical bit        +0.000441s
    - critical bit complete        +0.000169s
  -> /cross_over                   0.001766s
/end                               0.000462s

Here’s the bit of controller code that generated the example:

sub default : Private {
    my ( $self, $c ) = @_;

    $c->forward('look_left');
    $c->forward('look_right');
    $c->forward('look_left');
    $c->forward('cross_over');
}

sub look_left : Private {
    my ( $self, $c ) = @_;
    for (1 .. 100) {};
    $c->stats->profile("starting critical bit");
    for (1 .. 100) {};
    $c->stats->profile("critical bit complete");
}

sub look_right : Private {
    for (1 .. 1000) {};
}
sub cross_over : Private {
    for (1 .. 10000) {};
}

sub end : ActionClass('RenderView') {
    my ( $self, $c ) = @_;
    my @report = $c->stats->report;
    $c->stash->{action_stats}= \@report;
}


MSIE Cookies Bite Back!

Here we are in 2008. We build computers with RAM measured in GB and disk in TB. I just discovered (the hard way) that Microsoft Internet Explorer can only handle 4096 bytes of cookies for a page in JavaScript. Total. Not each. Total.

Worse, if the cookies on your page exceed this limit and you try to read the cookies using document.cookie, you don’t just get some of the cookies or a set that is truncated to 4096 bytes; you get NOTHING.

From the Microsoft Knowledge Base: “For one domain name, each cookie is limited to 4,096 bytes. This total can exist as one name-value pair of 4 kilobytes (KB) or as up to 20 name-value pairs that total 4 KB. … If you use the document.cookie property to retrieve the cookie on the client side, the document.cookie property can retrieve only 4,096 bytes. This byte total can be one name-value pair of 4 KB, or it can be up to 20 name-value pairs that have a total size of 4 KB.”

Stack that up against RFC 2965, which says:

   ...general-use
   user agents SHOULD provide each of the following minimum capabilities
   individually, although not necessarily simultaneously:

      *  at least 300 cookies

      *  at least 4096 bytes per cookie (as measured by the characters
         that comprise the cookie non-terminal in the syntax description
         of the Set-Cookie2 header, and as received in the Set-Cookie2
         header)

      *  at least 20 cookies per unique host or domain name

   User agents created for specific purposes or for limited-capacity
   devices SHOULD provide at least 20 cookies of 4096 bytes, to ensure
   that the user can interact with a session-based origin server.

According to the references, this problem applies up to MSIE 6.0, but testing shows it is still a problem in IE 7.

Needless to say, this is only a problem in IE.  Firefox and Safari, although they presumably have some limit, do not suffer the same ridiculously small bound.

Test it yourself; here is a simple cookie limit test page containing a script that sets 10 cookies, each of about 72 bytes, printing document.cookie at each iteration. On first visit, the cookies disappear at iteration 6, and on subsequent visits at iteration 1 (until you clear cookies or close your browser).
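A server-side guard against this limit could be as simple as adding up the size of everything the page will expose through document.cookie. The Python sketch below is an assumption-laden illustration: the function names are hypothetical, the size is estimated from the "name=value; name=value" form that document.cookie returns, and whether the browser's cut-off is exactly at 4096 bytes or slightly below has not been verified here.

```python
IE_DOCUMENT_COOKIE_LIMIT = 4096  # bytes, per the Knowledge Base quote above


def document_cookie_size(cookies):
    """Estimate the byte length of document.cookie for a name -> value dict."""
    joined = "; ".join(f"{name}={value}" for name, value in cookies.items())
    return len(joined.encode("utf-8"))


def fits_ie_limit(cookies, limit=IE_DOCUMENT_COOKIE_LIMIT):
    """Return True if the combined cookies stay within the assumed limit."""
    return document_cookie_size(cookies) <= limit


small = {"session": "abc123", "theme": "dark"}
large = {f"c{i}": "x" * 500 for i in range(10)}
print(document_cookie_size(small), fits_ie_limit(small))
print(document_cookie_size(large), fits_ie_limit(large))
```

A check like this would let an application trim or consolidate cookies before a visitor's browser silently returns nothing.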

I wonder how many shopping carts this has broken.

References:


Page Load Times and Visitor Abandonment

This is some 2005 data, courtesy http://www.marketingexperiments.com/improving-website-conversion/page-weight.html and http://www.emarketer.com/

Visitor Abandonment

Page Load Time    Percent of Users Continuing to Wait
10 seconds        84%
15 seconds        51%
20 seconds        26%
30 seconds        5%

What You Need To UNDERSTAND: You will lose nearly half your visitors if they have to wait longer than 15 seconds for a page to load. Only 5% of visitors will wait longer than 30 seconds.

In 2008, where most of the population is on broadband, I expect that visitors are less patient than ever.

Update:  2006 data, quoted from http://www.avactis.com/forums/index.php?showtopic=238 : “The research shows that four seconds is the maximum length of time an average online shopper will wait for a Web page to load before abandoning one retail site and moving on to another.”

 


MP3 to WAV Conversion on Linux

MP3 to WAV conversion is remarkably simple:

mpg123 -w out.wav in.mp3

For the purpose of writing an audio CD, a sample rate of 44100 Hz and stereo output are essential:

mpg123 --stereo -r 44100 -w out.wav in.mp3

And then to write the WAV files to a CD:

cdrecord -audio -pad *.wav
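When there is a whole directory of MP3s to convert, the mpg123 invocation above can be generated per file. The following Python sketch only builds the argument lists (using the same flags as the command shown earlier) and leaves running them to the caller; the function names and the idea of naming each output after its input are assumptions made for illustration.

```python
from pathlib import Path


def mpg123_cd_command(mp3_path):
    """Build the mpg123 argument list for CD-ready WAV output."""
    src = Path(mp3_path)
    # Assumption: the WAV is written next to the MP3, with the same stem.
    dst = src.with_suffix(".wav")
    return ["mpg123", "--stereo", "-r", "44100", "-w", str(dst), str(src)]


def commands_for_directory(directory):
    """Return one mpg123 command per .mp3 file, in sorted order."""
    return [mpg123_cd_command(p) for p in sorted(Path(directory).glob("*.mp3"))]


print(mpg123_cd_command("in.mp3"))
```

Each list can be handed to subprocess.run(cmd, check=True), after which the cdrecord command above picks up the resulting WAV files.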
