=head1 NAME

mod_perl and Relational Databases

=head1 Description

Creating dynamic websites with mod_perl often involves using
relational databases. C<Apache::DBI>, which provides a database
connections persistence which boosts the mod_perl performance, is
explained in this chapter.

=head1 Why Relational (SQL) Databases

Nowadays millions of people surf the Internet. There are millions of
Terabytes of data lying around. To manipulate the data new smart
techniques and technologies were invented. One of the major inventions
was the relational database, which allows us to search and modify huge
stores of data very quickly. We use B<SQL> (Structured Query Language)
to access and manipulate the contents of these databases.

=head1 Apache::DBI - Initiate a persistent database connection

When people started to use the web, they found that they needed to
write web interfaces to their databases.  CGI is the most widely used
technology for building such interfaces. The main limitation of a CGI
script driving a database is that its database connection is not
persistent - on every request the CGI script has to re-connect to the
database, and when the request is completed the connection is closed.

C<Apache::DBI> was written to remove this limitation. When you use it,
you have a database connection which persists for the process' entire
life.  So when your mod_perl script needs to use a database,
C<Apache::DBI> provides a valid connection immediately and your script
starts work right away without having to initiate a database
connection first.

This is possible only with CGI running under a mod_perl enabled
server, since in this model the child process does not quit when the
request has been served.

It's almost as straightforward as is it sounds; there are just a few
things to know about and we will cover them in this section.

=head2 Introduction

The DBI module can make use of the C<Apache::DBI> module.  When it
loads, the DBI module tests if the environment variable
C<$ENV{MOD_PERL}> is set, and if the C<Apache::DBI> module has already
been loaded. If so, the DBI module will forward every connect()
request to the C<Apache::DBI> module. C<Apache::DBI> uses the ping()
method to look for a database handle from a previous connect()
request, and tests if this handle is still valid.  If these two
conditions are fulfilled it just returns the database handle.

If there is no appropriate database handle or if the ping() method
fails, C<Apache::DBI> establishes a new connection and stores the
handle for later re-use. When the script is run again by a child that
is still connected, C<Apache::DBI> just checks the cache of open
connections by matching the I<host>, I<username> and I<password>
parameters against it. A matching connection is returned if available
or a new one is initiated and then returned.

There is no need to delete the disconnect() statements from your
code. They won't do anything because the C<Apache::DBI> module
overloads the disconnect() method with an empty one.

=head2 When should this module be used and when shouldn't it be used?

You will want to use this module if you are opening several database
connections to the server. C<Apache::DBI> will make them persistent
per child, so if you have ten children and each opens two different
connections (with different connect() arguments) you will have in
total twenty opened and persistent connections. After the initial
connect() you will save the connection time for every connect()
request from your C<DBI> module. This can be a huge benefit for a
server with a high volume of database traffic.

You must B<not> use this module if you are opening a special
connection for each of your users (meaning that the login arguments
are different for each user).  Each connection will stay persistent
and after a certain period the number of open connections will reach
the allowed limit (configured by the database server) and new database
connection opening requests will be refused, rendering your service
unusable for some of your users.

If you want to use C<Apache::DBI> but you have both situations on one
machine, at the time of writing the only solution is to run two
Apache/mod_perl servers, one which uses C<Apache::DBI> and one which
does not.

=head2 Configuration

After installing this module, the configuration is simple - add the
following directive to C<httpd.conf>

  PerlModule Apache::DBI

Note that it is important to load this module before any other
C<Apache*DBI> module and before the C<DBI> module itself!

You can skip preloading C<DBI>, since C<Apache::DBI> does that. But
there is no harm in leaving it in, as long as it is loaded after
C<Apache::DBI>.

=head2 Preopening DBI connections

If you want to make sure that a connection will already be opened when
your script is first executed after a server restart, then you should
use the C<connect_on_init()> method in the startup file to preload
every connection you are going to use. For example:

  Apache::DBI->connect_on_init
  ("DBI:mysql:myDB:myserver",
   "username",
   "passwd",
   {
    PrintError => 1, # warn() on errors
    RaiseError => 0, # don't die on error
    AutoCommit => 1, # commit executes immediately
   }
  );

As noted above, use this method only if you want all of apache to be
able to connect to the database server as one user (or as a very few
users), i.e. if your user(s) can effectively share the connection. Do
B<not> use this method if you want for example one unique connection
per user.

Be warned though, that if you call C<connect_on_init()> and your
database is down, Apache children will be delayed at server startup,
trying to connect. They won't begin serving requests until either they
are connected, or the connection attempt fails.  Depending on your DBD
driver, this can take several minutes!

=head2 Debugging Apache::DBI

If you are not sure if this module is working as advertised, you should
enable Debug mode in the startup script by:

  $Apache::DBI::DEBUG = 1;

Starting with C<ApacheDBI-0.84>, setting C<$Apache::DBI::DEBUG = 1>
will produce only minimal output. For a full trace you should set
C<$Apache::DBI::DEBUG = 2>.

After setting the DEBUG level you will see entries in the C<error_log>
both when C<Apache::DBI> initializes a connection and when it returns one
from its cache.  Use the following command to view the log in real
time (your C<error_log> might be located at a different path, it is
set in the Apache configuration files):

  tail -f /usr/local/apache/logs/error_log

I use C<alias> (in C<tcsh>) so I do not have to remember the path:

  alias err "tail -f /usr/local/apache/logs/error_log"

=head2 Database Locking Risks

Be very careful when locking the database (C<LOCK TABLE ...>) or
singular rows if you use C<Apache::DBI> or similar persistent
connections.  MySQL threads keep tables locked until the thread ends
(connection is closed) or the tables are unlocked.  If your session
die()'s while tables are locked, they will stay neatly locked as your
connection won't be closed either.

See the section L<Handling the 'User pressed Stop button'
case|guide::debug/Handling_the__User_pressed_Stop_button__case>
for more information on prevention.

=head2 Troubleshooting

=head3 The Morning Bug

The SQL server keeps a connection to the client open for a limited
period of time. In the early days of C<Apache::DBI> developers were
bitten by so called I<Morning bug>, when every morning the first users
to use the site received a C<No Data Returned> message, but after that
everything worked fine.

The error was caused by C<Apache::DBI> returning a handle of the
invalid connection (the server closed it because of a timeout), and
the script was dying on that error. The C<ping()> method was
introduced to solve this problem, but it didn't worked properly till
C<Apache::DBI> version 0.82 was released. In that version and
afterwards ping() was called inside the C<eval> block, which resolved
the problem.

It's possible that some C<DBD::> drivers don't have the ping() method
implemented. The C<Apache::DBI> manpage explains how to write one.

Another solution was found - to increase the timeout parameter when
starting the database server. Currently we startup C<MySQL> server
with a script C<safe_mysql>, so we have modified it to use this
option:

  nohup $ledir/mysqld [snipped other options] -O wait_timeout=172800

172800 seconds is equal to 48 hours. This change solves the problem,
but the ping() method works properly in C<DBD::mysql> as well.

=head3 Opening Connections With Different Parameters

When C<Apache::DBI> receives a connection request, before it decides
to use an existing cached connection it insists that the new
connection be opened in exactly the same way as the cached
connection. If you have one script that sets C<AutoCommit> and one
that does not, C<Apache::DBI> will make two different connections.  So
if for example you have limited Apache to 40 servers at most, instead
of having a maximum of 40 open connections you may end up with 80.

So these two connect() calls will create two different connections:

  my $dbh = DBI->connect
      ("DBI:mysql:test:localhost", '', '',
       {
        PrintError => 1, # warn() on errors
        RaiseError => 0, # don't die on error
        AutoCommit => 1, # commit executes immediately
       }
      ) or die "Cannot connect to database: $DBI::errstr";

  my $dbh = DBI->connect
      ("DBI:mysql:test:localhost", '', '',
       {
        PrintError => 1, # warn() on errors
        RaiseError => 0, # don't die on error
        AutoCommit => 0, # don't commit executes immediately
       }
      ) or die "Cannot connect to database: $DBI::errstr";

Notice that the only difference is in the value of C<AutoCommit>.

However, you are free to modify the handle immediately after you get
it from the cache. So always initiate connections using the same
parameters and set C<AutoCommit> (or whatever) afterwards. Let's
rewrite the second connect call to do the right thing (not to create
a new connection):

  my $dbh = DBI->connect
      ("DBI:mysql:test:localhost", '', '',
       {
        PrintError => 1, # warn() on errors
        RaiseError => 0, # don't die on error
        AutoCommit => 1, # commit executes immediately
       }
      ) or die "Cannot connect to database: $DBI::errstr";
  $dbh->{AutoCommit} = 0; # don't commit if not asked to

When you aren't sure whether you're doing the right thing, turn debug
mode on.

However, when the C<$dbh> attribute is altered after connect() it
affects all other handlers retrieving this database handle. Therefore
it's best to restore the modified attributes to their original value
at the end of database handle usage. As of C<Apache::DBI> version 0.88
the caller has to do it manually. The simplest way to handle this is
to localize the attributes when modifying them:

  my $dbh = DBI->connect(...) ...
  {
    local $dbh->{LongReadLen} = 40;
  }

Here the C<LongReadLen> attribute overrides the value set in the
connect() call or its default value only within the enclosing block.

The problem with this approach is that prior to Perl version 5.8.0
this causes memory leaks. So the only clean alternative for older Perl
versions is to manually restore the C<dbh>'s values:

  my @attrs = qw(LongReadLen PrintError);
  my %orig = ();
  
  my $dbh = DBI->connect(...) ...
  # store the values away
  $orig{$_} = $dbh->{$_} for @attrs;
  # do local modifications
  $dbh->{LongReadLen} = 40;
  $dbh->{PrintError}  = 1;
  # do something with the filehandle
  # ...
  # now restore the values
  $dbh->{$_} = $orig{$_} for @attrs;

Another thing to remember is that with some database servers it's
possible to access more than one database using the same database
connection. MySQL is one of those servers. It allows you to use a
fully qualified table specification notation. So if there is a
database I<foo> with a table I<test> and database I<bar> with its own
table I<test>, you can always use:

  SELECT from foo.test ...

or:

  SELECT from bar.test ...

So no matter what database you have used in the database name string
in the connect() call (e.g.: C<DBI:mysql:foo:localhost>) you can still
access both tables by using a fully qualified syntax.

Alternatively you can switch databases with C<USE foo> and C<USE bar>,
but this approach seems less convenient, and therefore error-prone.


=head3 Cannot find the DBI handler

You must use C<DBI::connect()> as in normal DBI usage to get your $dbh
database handler. Using the C<Apache::DBI> does not eliminate the need
to write proper C<DBI> code. As the C<Apache::DBI> man page states,
you should program as if you are not using C<Apache::DBI> at
all. C<Apache::DBI> will override the DBI methods where necessary and
return your cached connection. Any C<disconnect()> call will be just
ignored.

=head3 Apache:DBI does not work

Make sure you have it installed.

Make sure you configured mod_perl with either:

  PERL_CHILD_INIT=1 PERL_STACKED_HANDLERS=1

or 

  EVERYTHING=1

Use the example script eg/startup.pl (in the mod_perl
distribution). Remove the comment from the line.

  # use Apache::DebugDBI;

and adapt the connect string.  Do not change anything in your scripts
for use with C<Apache::DBI>.

=head3 Skipping connection cache during server startup

Does your error_log look like this?

  10169 Apache::DBI PerlChildInitHandler
  10169 Apache::DBI skipping connection cache during server startup
  Database handle destroyed without explicit disconnect at
  /usr/lib/perl5/site_perl/5.005/Apache/DBI.pm line 29.

If so you are trying to open a database connection in the parent httpd
process. If you do, children will each get a copy of this handle,
causing clashes when the handle is used by two processes at the same
time. Each child must have its own, unique, connection handle.

To avoid this problem, C<Apache::DBI> checks whether it is called
during server startup. If so the module skips the connection cache and
returns immediately without a database handle.

You must use the C<Apache::DBI-E<gt>connect_on_init()> method in the
startup file.

=head3 Debugging code which deploys DBI

To log a trace of C<DBI> statement execution, you must set the
C<DBI_TRACE> environment variable. The C<PerlSetEnv DBI_TRACE>
directive must appear before you load C<Apache::DBI> and C<DBI>.

For example if you use C<Apache::DBI>, modify your C<httpd.conf> with:

  PerlSetEnv DBI_TRACE "3=/tmp/dbitrace.log"
  PerlModule Apache::DBI

Replace C<3> with the TRACE level you want. The traces from each
request will be appended to C</tmp/dbitrace.log>. Note that the logs
might interleave if requests are processed concurrently.

Within your code you can control trace generation with the trace()
method:

  DBI->trace($trace_level)
  DBI->trace($trace_level, $trace_filename)

DBI trace information can be enabled for all handles using this DBI
class method. To enable trace information for a specific handle use
the similar C<$h-E<gt>trace> method.

Using the handle trace option with a C<$dbh> or C<$sth> is handy for
limiting the trace info to the specific bit of code that you are
interested in.

Trace Levels:

=over

=item * 0 - trace disabled.

=item * 1 - trace DBI method calls returning with results.

=item * 2 - trace method entry with parameters and exit with results.

=item * 3 - as above, adding some high-level information from the
driver and also adding some internal information from the DBI.

=item * 4 - as above, adding more detailed information from the driver
and also including DBI mutex information when using threaded perl.

=item * 5 and above - as above but with more and more obscure
information.

=back

=head1 mysql_use_result vs. mysql_store_result.

Since many mod_perl developers use mysql as their preferred SQL
engine, these notes explain the difference between C<mysql_use_result()> and
C<mysql_store_result()>. The two influence
the speed and size of the processes.

The C<DBD::mysql> (version 2.0217) documentation includes the
following snippet:

  mysql_use_result attribute: This forces the driver to use
  mysql_use_result rather than mysql_store_result. The former is
  faster and less memory consuming, but tends to block other
  processes. (That's why mysql_store_result is the default.)

Think about it in client/server terms.  When you ask the server to
spoon-feed you the data as you use it, the server process must buffer
the data, tie up that thread, and possibly keep any database locks
open for a long time.  So if you read a row of data and ponder it for
a while, the tables you have locked are still locked, and the server
is busy talking to you every so often. That is C<mysql_use_result()>.

If you just suck down the whole dataset to the client, then the server
is free to go about its business serving other requests.  This results
in parallelism since the server and client are doing work at the same
time, rather than blocking on each other doing frequent I/O. That is
C<mysql_store_result()>.

As the mysql manual suggests: you should not use C<mysql_use_result()>
if you are doing a lot of processing for each row on the client side.
This can tie up the server and prevent other threads from updating the
tables.

=head1 Transactions Not Committed with MySQL InnoDB Tables

Sometimes, when using MySQL's InnoDB table type, you may notice that
changes you committed in one process don't seem to be visible to other
processes.  You may not be aware that InnoDB tables use a default
approach to transactions that is actually more cautious than PostgreSQL
or Oracle's default.  It's called "repeatable read", and the gist of
it is that you don't see updates made in other processes since your
last commit.  There is an explanation of this here:
http://dev.mysql.com/doc/mysql/en/InnoDB_Consistent_read_example.html

This is actually not directly related to mod_perl, but you wouldn't
notice this issue when using CGI because reconecting to the database on
each request resets things just as doing a commit does.  It is the
persistent connections used with mod_perl that make this issue visible.

If you suspect this is causing you problems, the simplest way to
deal with it is to change the isolation level to "read committed" --
which is more like what PostgreSQL and Oracle do by default -- with
the "set transaction" command, described here:
http://dev.mysql.com/doc/mysql/en/InnoDB_transaction_isolation.html

=head1 Optimize: Run Two SQL Engine Servers

Sometimes you end up running many databases on the same machine. These
might have very varying database needs (such as one db with sessions,
very frequently updated but tiny amounts of data, and another with
large sets of data that's hardly ever updated) you might be able to
gain a lot by running two differently configured databases on one
server. One would benefit from lots of caching, the other would
probably reduce the efficiency of the cache but would gain from fast
disk access. Different usage profiles require vastly different
performance needs.

This is basically a similar idea to having L<two Apache
servers|guide::strategy/One_Plain_Apache_and_One_mod_perl_enabled_Apache_Servers>,
each optimized for its specific requirements.

=head1 Some useful code snippets to be used with relational Databases

In this section you will find scripts, modules and code snippets to
help you get started using relational Databases with mod_perl
scripts. Note that I work with C<mysql> ( http://www.mysql.com ), so
the code you find here will work out of box with mysql. If you use
some other SQL engine, it might work for you or it might need some
changes.  YMMV.

=head2 Turning SQL query writing into a short and simple task

Having to write many queries in my CGI scripts, persuaded me to write
a stand alone module that saves me a lot of time in coding and
debugging my code. It also makes my scripts much smaller and easier to
read. I will present the module here, with examples following:

Notice the C<DESTROY> block at the end of the module, which makes
various cleanups and allows this module to be used under mod_perl and
C<mod_cgi> as well. Note that you will not get the benefit of
persistent database handles with mod_cgi.

=head2 The My::DB module

The F<code/My-DB.pm> module

(Note that you will not find this on CPAN. at least not yet :)

=head2 My::DB Module's Usage Examples

To use C<My::DB> in your script, you first have to create a C<My::DB>
object:

  use vars qw($db_obj);
  my $db_obj = new My::DB or croak "Can't initialize My::DB object: $!\n";

Now you can use any of C<My::DB>'s methods. Assume that we have a
table called I<tracker> where we store the names of the users and what
they are doing at each and every moment (think about an online community
program).

I will start with a very simple query--I want to know where the users
are and produce statistics. C<tracker> is the name of the table.

    # fetch the statistics of where users are
  my $r_ary = $db_obj->sql_get_matched_rows_ary_ref
    ("tracker",
     [qw(where_user_are)],
    );
  
  my %stats = ();
  my $total = 0;
  foreach my $r_row (@$r_ary){
    $stats{$r_row->[0]}++;
    $total++;
  }

Now let's count how many users we have (in table C<users>):

  my $count = $db_obj->sql_count_matched("users");

Check whether a user exists:

  my $username = 'stas';
  my $exists = $db_obj->sql_count_matched
  ("users",
   [username => ["=",$username]]
  );

Check whether a user is online, and get the time since she went online
(C<since> is a column in the C<tracker> table, it tells us when a user
went online):

  my @row = ();
  $db_obj->sql_get_matched_row
  (\@row,
   "tracker",
   ['UNIX_TIMESTAMP(since)'],
   [username => ["=",$username]]
  );
  
  if (@row) {
    my $idle = int( (time() - $row[0]) / 60);
    return "Current status: Is Online and idle for $idle minutes.";
  }

A complex query. I join two tables, and I want a reference to an array
which will store a slice of the matched query (C<LIMIT $offset,$hits>)
sorted by C<username>. Each row in the array is to include the fields
from the C<users> table, but only those listed in C<@verbose_cols>.
Then we print it out.

  my $r_ary = $db_obj->sql_get_matched_rows_ary_ref
    (
     "tracker STRAIGHT_JOIN users",
     [map {"users.$_"} @verbose_cols],
     [],
     ["WHERE tracker.username=users.username",
      "ORDER BY users.username",
      "LIMIT $offset,$hits"],
    );
  
  foreach my $r_row (@$r_ary){
    print ...
  }

Another complex query. The user checks checkboxes to be queried by,
selects from lists and types in match strings, we process input and
build the C<@where> array. Then we want to get the number of matches
and the matched rows as well.

  my @search_keys = qw(choice1 choice2);
  my @where = ();
    # Process the checkboxes - we turn them into a regular expression
  foreach (@search_keys) {
    next unless defined $q->param($_) and $q->param($_);
    my $regexp = "[".join("",$q->param($_))."]";
    push @where, ($_ => ['REGEXP',$regexp]);
  }
  
    # Add the items selected by the user from our lists
    # selected => exact match
  push @where,(country => ['=',$q->param('country')]) if $q->param('country');
  
    # Add the parameters typed by the user
  foreach (qw(city state)) {
    push @where,($_ => ['LIKE',$q->param($_)]) if $q->param($_);
  }
  
     # Count all that matched the query
  my $total_matched_users =  $db_obj->sql_count_matched
    (
     "users",
     \@where,
    );
  
    # Now process the orderby
  my $orderby = $q->param('orderby') || 'username';
  
     # Do the query and fetch the data
  my $r_ary = $db_obj->sql_get_matched_rows_ary_ref
  (
   "users",
   \@display_columns,
   \@where,
   ["ORDER BY $orderby",
    "LIMIT $offset,$hits"],
  );

C<sql_get_matched_rows_ary_ref> knows to handle both C<OR>ed and
C<AND>ed params. This example shows how to use C<OR> on parameters:

This snippet is an implementation of a watchdog.  Our users want to
know when their colleagues go online.  They register the usernames of
the people they want to know about.  We have to make two queries: one
to get a list of usernames, the second to find out whether any of
these users is online. In the second query we use the C<OR> keyword.

  # check who we are looking for
  $r_ary = $db_obj->sql_get_matched_rows_ary_ref
    ("watchdog",
     [qw(watched)],
     [username => ['=',$username)],
     ],
    );
  
    # put them into an array
  my @watched = map {$_->[0]} @{$r_ary};
  
  my %matched = ();
    # Does the user have some registered usernames?
  if (@watched) {
  
  # Try to fetch all the users who match the usernames exactly.
  # Put it into an array and compare it with a hash!
    $r_ary = $db_obj->sql_get_matched_rows_ary_ref
      ("tracker",
       [qw(username)],
       [username => ['=',\@watched],
       ]
      );
  
    map {$matched{$_->[0]} = 1} @{$r_ary};
  }
  
  # Now %matched includes the usernames of the users who are being
  # watched by $username and currently are online.

=head1 Maintainers

Maintainer is the person(s) you should contact with updates,
corrections and patches.

=over

=item *

Stas Bekman [http://stason.org/]

=back


=head1 Authors

=over

=item *

Stas Bekman [http://stason.org/]

=back

Only the major authors are listed above. For contributors see the
Changes file.


=cut