Template::Recall vs. Template::Toolkit

2014-12-04, James Robson, http://soundly.me

Ever since I started caring about templates, I've disliked template systems that 'leak' logic into the template. I wrote about it here and argued about it here. (Long ago, in a galaxy far, far away...)

I'm fully aware of the arguments for and against different types of template systems, and I'm also aware that there is no definitive "right answer". At some point we're just arguing small matters of taste.

One of the more casual arguments I made in that article was that, in some cases, pipeline template designs have a higher CPU cost because they walk over the same data twice: once when your code generates the data, and again when the template engine generates the output, i.e. when it runs [% FOREACH %] or similar.
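A toy sketch in core Perl makes the "walks the data twice" point concrete. It uses neither template module. The data, the `$pipeline_visits`/`$inline_visits` counters and the two loops are hypothetical stand-ins for "code prepares data, then the engine renders it" versus "code renders as it goes". Both styles produce identical text, but the pipeline style touches every cell twice.

```perl
use strict;
use warnings;

# Hypothetical input: 3 rows of 4 numbers (11..14, 21..24, 31..34).
my @raw = map { my $r = $_; [ map { $r * 10 + $_ } 1 .. 4 ] } 1 .. 3;

my ($pipeline_visits, $inline_visits) = (0, 0);

# Pipeline style, pass 1: the code builds a formatted data structure ...
my @prepared;
for my $row (@raw) {
    push @prepared, [ map { $pipeline_visits++; sprintf '%2d', $_ } @$row ];
}

# ... pass 2: the "template engine" walks the same cells again to emit output.
my $pipeline_out = '';
for my $row (@prepared) {
    $pipeline_out .= join(' ', map { $pipeline_visits++; $_ } @$row) . "\n";
}

# Inline style: each cell is formatted and emitted in a single pass.
my $inline_out = '';
for my $row (@raw) {
    $inline_out .= join(' ', map { $inline_visits++; sprintf '%2d', $_ } @$row) . "\n";
}

# prints "pipeline visits: 24, inline visits: 12"
printf "pipeline visits: %d, inline visits: %d\n", $pipeline_visits, $inline_visits;
```

Whether the second pass actually costs much depends on how cheap each visit is, which is what the benchmarks below try to measure.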

I laid out my reasoning in that article, but never actually ran any comparisons to quantify the difference. So I decided to pit my own small, pure Perl module, Template::Recall, against the heavyweight of template systems, Template::Toolkit.

Toolkit is a high-quality system and seems to be the de facto standard for Perl. In my opinion it suffers from being a pipeline system, but its developers have done a remarkable job of addressing the shortcomings of that design (mainly by providing a fast, native backend for heavy processing, as you'll see).

The following comparisons are based on outputting an array of 10,000 rows, each an anonymous array of 10 random numbers from 1 to 99, generated like this:

my $arr = [];
for (0..9_999) {
    my $a = [];
    for (0..9) {
        # a random integer from 1 to 99, right-aligned in a 2-character field
        push @$a, sprintf '%2d', int(rand(99)+1);
    }
    push @$arr, $a;
}

Comparison 1

The first comparison I did was pretty simple, at least with regard to templates. Basically, I just wanted to output the above data as a simple block of text:

93 58 55  1 39 82  1 59 24 88
17 48 30 99 67 77 27 93 66 32
65 75 19 42  2 31 11 77 42 13
53 51 50 51  8 70 91 94 50 51
37 44 96 74 34 45 36 13  4 92

I set up the test for Template::Toolkit as follows:

use Template;

my $tt = Template->new;
my %vars = ( arr => $arr );
my $code = sub {
    my $output;
    $tt->process('test.tt', \%vars, \$output) or die $tt->error;
};

The template file test.tt looks like:

[% FOREACH a = arr %][% FOREACH b = a %] [% b -%][% END %]
[% END -%]

I set up the Template::Recall test as follows:

use Template::Recall;

my $tr = Template::Recall->new(template_path => 'test.tr');
my $code2 = sub {
    my $output;
    # one render call per row; the row is joined into a string in Perl first
    foreach my $a (@$arr) {
        $output .= $tr->render('row', { data => join ' ', @$a });
    }
};

Its template file test.tr looks like this:

[=row=] ['data']

(Hardly anything there, I know. A clever person will probably notice that between the two templates, we've basically inverted where the display logic fires.)
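The following sketch is for readers who haven't used a section-based template like test.tr. It is a hypothetical minimal reimplementation written only to show the idea, not Template::Recall's actual code. The `[=name=]` section markers and `['key']` placeholders are borrowed from the templates in this article. The sample template and the `render` sub are assumptions.

```perl
use strict;
use warnings;

# Hypothetical template text, using the [=name=] section markers and
# ['key'] placeholders seen in the templates in this article.
my $template = <<'END';
[=head=]
<ul>
[=row=]
<li>['data']</li>
[=foot=]
</ul>
END

# split with a capture group returns (leading text, name, body, name, body, ...);
# discard the empty leading text and pair each section name with its body.
my (undef, %section) = split /\[=(\w+)=\]\n?/, $template;

# Return one section with its ['key'] placeholders filled from $vars.
sub render {
    my ($name, $vars) = @_;
    $vars ||= {};
    my $text = $section{$name};
    die "no such section: $name\n" unless defined $text;
    $text =~ s{\['(\w+)'\]}{ $vars->{$1} // '' }ge;
    return $text;
}

# All looping stays in the calling code; the template is just named markup.
my $out = render('head');
$out .= render('row', { data => $_ }) for qw(a b);
$out .= render('foot');
print $out;
```

The template never decides how many rows there are. The caller does, which is the inversion described above.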

I then ran the test using Benchmark::cmpthese:

use Benchmark qw(cmpthese);

# $iterations: how many times to run each sub
cmpthese($iterations, {
    'Template::Toolkit' => $code,
    'Template::Recall'  => $code2,
});

The results are quite startling:

                    Rate Template::Toolkit  Template::Recall
Template::Toolkit 2.17/s                --              -88%
Template::Recall  18.2/s              741%                --
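Here is how to read the table. As I understand Benchmark's `cmpthese` output, each cell is the row entry's rate divided by the column entry's rate, minus one, expressed as a percentage. A quick check using the rounded rates printed above (the `relative_pct` helper is a hypothetical name):

```perl
use strict;
use warnings;

# Relative speed as shown in a cmpthese cell: how much faster (+) or
# slower (-) the row entry is than the column entry, in percent.
sub relative_pct {
    my ($row_rate, $col_rate) = @_;
    return sprintf '%.0f%%', 100 * $row_rate / $col_rate - 100;
}

# Rounded rates (runs per second) copied from the table above.
my ($toolkit, $recall) = (2.17, 18.2);

print 'Recall vs Toolkit: ', relative_pct($recall, $toolkit), "\n";   # 739%
print 'Toolkit vs Recall: ', relative_pct($toolkit, $recall), "\n";   # -88%
```

The computed 739% differs slightly from the table's 741%. This is presumably because `cmpthese` works from the unrounded rates, while only the rounded ones are printed.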

Toolkit has a natively compiled backend as well as a pure Perl backend, and it seems to switch between them as it deems necessary. In the test above, Toolkit appears to use its pure Perl backend (maybe because of the simplicity of the template?), and Recall outperforms it by a wide margin.

Comparison 2

The next comparison uses a more complicated template that groups the data by the first element of each array. For example, in the HTML table we output, all the arrays that start with 33 are clumped together:

<tr><td colspan="10"><h2>33</h2></td></tr>
<td> 1</td>
<td> 3</td>
<td> 7</td>

Basically, it means you have to do some pre-processing of $arr.
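The pre-processing amounts to grouping rows by their first element. Below is a small core-Perl sketch on hypothetical sample data. It sorts the keys before output. Both tests here iterate over `keys %h`, and Perl does not guarantee hash key order, so the group order can differ from run to run unless the keys are sorted.

```perl
use strict;
use warnings;

# Small hypothetical sample in the same shape as $arr.
my $arr = [ [33, 1, 3], [12, 5, 9], [33, 7, 2], [12, 8, 4] ];

# Group rows by their first element; push autovivifies each array ref.
my %groups;
push @{ $groups{ $_->[0] } }, $_ for @$arr;

# Hash order is unspecified, so sort the keys numerically for stable output.
my $out = '';
for my $key (sort { $a <=> $b } keys %groups) {
    $out .= "== $key ==\n";
    $out .= join(' ', @$_) . "\n" for @{ $groups{$key} };
}
print $out;
```

Within each group the rows keep their original order, because `push` appends in the order the rows are visited.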

The Toolkit test is based on the following code, which uses a hash to group the sub-arrays:

%vars = ();
$code = sub {
    my %h;
    my $output;
    # group the rows by their first element (push autovivifies the array ref)
    foreach my $a (@$arr) {
        push @{ $h{ $a->[0] } }, $a;
    }
    %vars = ( h => \%h, title => 'Template::Toolkit' );
    $tt->process('test2.tt', \%vars, \$output) or die $tt->error;
};

And the template in test2.tt uses some nested loops:

<table border="1">
<tr><td colspan="10"><h1>[% title %]</h1></td></tr>
[% FOREACH key IN h.keys %]
<tr><td colspan="10"><h2>[%key%]</h2></td></tr>
[% FOREACH a = h.$key %]
[% FOREACH b = a %]<td>[%b%]</td>[% END %]
[% END %]
[% END %]

Recall uses the following code, and you can see that all the logic, even 'presentation' logic, remains in the code. (Which is basically my argument for this kind of template.)

[Update: I didn't realize it at the time, but Google came to basically the same conclusion with their internal template system.]

$tr = Template::Recall->new(template_path => 'test2.tr');
$code2 = sub {
    my $output;
    my %h;
    # same grouping step as in the Toolkit test
    foreach my $a (@$arr) {
        push @{ $h{ $a->[0] } }, $a;
    }

    # every loop stays here; the template only supplies named chunks of markup
    $output .= $tr->render('head', { title => 'Template::Recall' });
    foreach my $k (keys %h) {
        $output .= $tr->render('group', { name => $k });
        foreach my $a (@{ $h{$k} }) {
            $output .= $tr->render('row_start');
            foreach my $b (@$a) {
                $output .= $tr->render('field', { data => $b });
            }
            $output .= $tr->render('row_end');
        }
    }
    $output .= $tr->render('foot');
};

The template in test2.tr is as follows:

<table border="1">
<tr><td colspan="10"><h1>['title']</h1></td></tr>
<tr><td colspan="10"><h2>['name']</h2></td></tr>

We do the comparison pretty much the same as above, although with fewer iterations (don't worry, the results were consistent):

cmpthese($iterations, {    # fewer iterations than in Comparison 1
    'Template::Toolkit' => $code,
    'Template::Recall'  => $code2,
});

Here, Toolkit is actually slightly faster:

                    Rate  Template::Recall Template::Toolkit
Template::Recall  1.85/s                --              -11%
Template::Toolkit 2.09/s               13%                --

The two finish within about a second of each other over 10 runs, as timethese shows:

Template::Recall:  5 wallclock secs ( 5.89 usr +  0.00 sys =  5.89 CPU) @  1.70/s (n=10)
Template::Toolkit:  5 wallclock secs ( 4.90 usr +  0.01 sys =  4.91 CPU) @  2.04/s (n=10)
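Both kinds of report can come from a single benchmark run. Benchmark's `timethese` returns a hash ref of results, and `cmpthese` also accepts that hash ref. The sketch below uses two toy subs standing in for the template engines. The `two_pass`/`one_pass` names and the smaller 1,000-row data set are hypothetical, and neither template module is involved. It also gives a rough feel for the cost of the extra pass on your own machine.

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

# Hypothetical data in the same shape as $arr, but smaller (1,000 rows).
my $arr = [ map { [ map { int(rand(99)) + 1 } 1 .. 10 ] } 1 .. 1_000 ];

my $results = timethese(50, {
    # two passes: build formatted rows first, then walk them again for output
    two_pass => sub {
        my @prepared = map { [ map { sprintf '%2d', $_ } @$_ ] } @$arr;
        my $out = join '', map { join(' ', @$_) . "\n" } @prepared;
    },
    # one pass: format and append each row as it is visited
    one_pass => sub {
        my $out = '';
        $out .= join(' ', map { sprintf '%2d', $_ } @$_) . "\n" for @$arr;
    },
});

# cmpthese accepts the hash ref returned by timethese, so no second run is needed.
cmpthese($results);
```

With so few iterations, Benchmark may warn that the count is too low for reliable timing. Raising the count (or passing a negative number, which means "run for at least that many CPU seconds") gives steadier figures.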

It struck me as remarkable that the speed difference would be so drastic in the simple comparison, yet so close in the complex one. The answer lies in the natively compiled backend.

Stepping through the Toolkit code in Perl's debugger, we reach this line just before the $output var is returned:

55:     my $context = $self->{ CONTEXT };
DB<8> p Dumper $context

... snip ...
     '_DEBUG' => 0,
     'title' => 'Template::Toolkit',
     '_PARENT' => bless( {
                           '_STRICT' => undef,
                           'global' => $VAR1->{'STASH'}{'global'},
                           'inc' => $VAR1->{'STASH'}{'inc'},
                           '_DEBUG' => 0,
                           'dec' => $VAR1->{'STASH'}{'dec'},
                           '_PARENT' => undef
                         }, 'Template::Stash::XS' )
   }, 'Template::Stash::XS' ),

The stash held by the processor's context has switched to Template::Stash::XS, as opposed to Template::Stash, which is the pure Perl implementation.
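A debugger dump is one way to infer which stash is in use. Another option, sketched below, is to pin the stash explicitly when constructing Toolkit and then check its class, so that both comparisons are measured against a known backend. The sketch assumes that Template's `STASH` option, `Template::Stash->new`, `$tt->context` and the context's `stash` method behave as I understand Template Toolkit's documentation describes. Check them against your installed version. If Toolkit isn't installed, the script just skips the check.

```perl
use strict;
use warnings;

my $out         = '';
my $stash_class = 'n/a';

if (eval { require Template; require Template::Stash; 1 }) {
    # Pass an explicit pure Perl stash; swap in Template::Stash::XS->new
    # (after requiring Template::Stash::XS) to measure the XS stash instead.
    my $tt = Template->new({ STASH => Template::Stash->new })
        or die Template->error;
    my $tpl = '[% FOREACH n IN nums %][% n %] [% END %]';
    $tt->process(\$tpl, { nums => [1, 2, 3] }, \$out) or die $tt->error;
    # context() returns the Template::Context; its stash() is the top-level stash
    $stash_class = ref $tt->context->stash;
}
else {
    warn "Template Toolkit is not installed; skipping the check\n";
}
print "stash class: $stash_class\noutput: $out\n";
```

Running each benchmark once with each stash class would separate the effect of the XS backend from the effect of the template design.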


Template::Recall, a two-file module with probably fewer than 200 lines of pure Perl, is either comparable to or much faster than Template::Toolkit, depending on the task. It relies on no native component for its performance.

This performance difference speaks solely to the design philosophy behind the two template systems. Toolkit is an excellently written pipeline system with lots of functionality and conscientious performance decisions.

My interest in creating Template::Recall was not exclusively performance, but true separation of concerns. Basically, I wanted to keep all the if, for, foreach, etc. statements in the code, where I feel they belong. The fact that its design lends itself to good performance is a nice, if somewhat incidental, perk.