Advanced DBI
Perl Conference 3.0 Tutorial
August 1999
Tim Bunce

Topical Topics
Wheels within Wheels
Error Checking and Handling
Transactions
Speed Speed Speed!
Portability
The Power of the Proxy
DBI for the Web
Handling LONG/BLOB data
Tainting
Flexing the Multiplex

Wheels within Wheels
The DBI architecture
and how to watch it at work

Under the hood
DBI defines and implements an interface
Driver modules do much of the real work
DBI provides default methods, functions, tools etc  for drivers
Not limited to the lowest common denominator -  mechanism provided for driver specific extensions
Designed and built for speed
Valuable detailed call tracing/debugging built-in

A picture is worth?

Setting the scene
Inner and outer worlds
Application and Drivers
Inner and outer handles
DBI handles are references to tied hashes
The DBI Method Dispatcher
gateway between the worlds and the heart of the DBI
… Now we'll go all deep and visual for a while...

Architecture of the DBI classes #1

Architecture of the DBI classes #2

Anatomy of a DBI handle

Method call walk-through
Consider a simple prepare call:
  $dbh->prepare(...)
$dbh is reference to a DBI::db object (regardless of driver)
DBI::db::prepare is an alias for DBI dispatch method
Dispatch calls driver’s prepare method something like this:
   my $inner_hash_ref = …     # from tie magic
   my $implementor_class = …  # from DBI magic data
   $inner_hash_ref->$implementor_class::prepare(...)
Driver code gets the inner hash so it has fast access to the hash contents

Watching the DBI in action
DBI has detailed call tracing built-in
The trace can be very helpful in understanding application behavior and for debugging
Shows parameters and results
Trace information can be written to a file
Not used often enough!

Our program for today...
 1: #!/usr/bin/perl -w
 2:
 3: use DBI;
 4:
 5: $dbh = DBI->connect('', '', '', { RaiseError => 1 });
 6:
 7: $upd = $dbh->prepare("UPDATE prices SET price=? WHERE prod=?");
 8: $ins = $dbh->prepare("INSERT INTO prices (prod,price) VALUES(?,?)");
 9: $rows = $upd->execute(42, "Widgets");
10: $ins->execute("Widgets", 42) if $rows == 0;
11:
12: $dbh->disconnect;

Enabling Tracing
Per handle
$h->trace($level);
$h->trace($level, $filename);
Only effects that handle and any new child handles created from it
Child handles get trace level of parent in effect at time of creation
Global (internal to application)
DBI->trace(...);
Sets effective global default minimum trace level
Global (external to application)
Enabled using DBI_TRACE environment variable
DBI_TRACE=digits   DBI->trace(digits);
DBI_TRACE=filename   DBI->trace(2, filename);
DBI_TRACE=digits=filename   DBI->trace(digits, filename);

Trace level 1
Trace level 1 shows method results and line numbers:
    <- connect= DBI::db=HASH(0xe0abc) at DBI.pm line 356.
    <- STORE('PrintError', 1)= 1 at DBI.pm line 382.
    <- STORE('AutoCommit', 1)= 1 at DBI.pm line 382.
    <- STORE('RaiseError', 1)= 1 at DBI.pm line 382.
    <- prepare('UPDATE …')= DBI::st=HASH(0xe1238) at test.pl line 7.
    <- prepare('INSERT …')= DBI::st=HASH(0xe1504) at test.pl line 8.
    <- execute= '0E0' at test.pl line 9.
    <- execute= 1 at test.pl line 10.
    <- disconnect= 1 at test.pl line 11.
    <- DESTROY= undef
    <- DESTROY= undef
    <- DESTROY= undef
    <- DESTROY= undef

Trace level 2 and above
Trace level 2 shows calls with parameters and more:
    -> connect for DBD::ODBC::dr (DBI::dr=HASH(0x13dfec)~0xe14a4
                                  '' '' **** HASH(0xe0a10))
    <- connect= DBI::db=HASH(0xe0ab0) at DBI.pm line 356.
    -> STORE for DBD::ODBC::db (DBI::db=HASH(0xe0abc)~INNER 'PrintError' 1)
    <- STORE= 1 at DBI.pm line 382.
    -> prepare for DBD::ODBC::db (DBI::db=HASH(0xe0ab0)~0xe0abc
                                  'UPDATE prices SET price=? WHERE prod=?')
    <- prepare= DBI::st=HASH(0xe1274) at test.pl line 7.
Trace level 3 and above shows more processing details

Error Checking and Handling
To err is human, to detect, divine.

The importance of error checking
Errors happen!
Failure happens when you don't expect errors!
database crash / network disconnection
lack of disk space for insert or select (sort space for order by)
server math error on select (divide by zero)
and maybe, just maybe, errors in your own code (Gasp!)
Beat failure by expecting errors!
Detect errors early to limit effects
Defensive Programming, e.g., check assumptions
Through Programming, e.g., check for errors after fetch loops

Error checking - ways and means
Error checking the hard way...
$h->method || die "DBI method failed: $DBI::errstr";
$h->method || die "DBI method failed: $DBI::errstr";
$h->method || die "DBI method failed: $DBI::errstr";
Error checking the easy way...
$h->{RaiseError} = 1;
$h->method;
$h->method;
$h->method;

Handling errors the smart way
For simple applications immediate death on error is fine
For more advanced applications greater control is needed
Life after death:
$h->{RaiseError} = 1;
eval { … $h->method; … };
if ($@) { … }
Bonus prize
Other, non-DBI, code within the eval block may also raise an exception that will be caught and handled cleanly

Transactions
To do or to undo,
that is the question

Transactions - What's it all about?
Far more than just locking
The A.C.I.D. test
True transactions give true safety
even from power failures and system crashes!
Incomplete transactions are automatically rolled-back by the database server when it's restarted.
Also removes burden of undoing incomplete changes
Hard to implement (for the vendor) and has significant performance cost
A very large topic worthy of an entire tutorial

Transactions - Life Preserving
Classic: system crash between one bank account being debited and another being credited.
Dramatic: power failure during update statement on 3 million rows after 2 minutes when only part way through.
Real-world: complex series of inter-related updates, deletes and inserts on many separate tables fails at the last step due to a duplicate primary key on an insert.
Transaction recovery would handle all these situations automatically.
Makes a system far more robust and  trustworthy over the long term.
Use transactions if your database supports them.
If it doesn't and you need them, switch to a different database.

Transactions - How the DBI helps
Tools of the trade:
Set AutoCommit off, and RaiseError on
Wrap eval { … } around the code
Use $dbh->commit; and $dbh->rollback;
Disable AutoCommit via $dbh->{AutoCommit} = 0;
to enable transactions and thus rollback-on-error
Enable RaiseError via $dbh->{RaiseError} = 1;
to automatically 'throw an exception' on error
The surrounding eval { … }
catches the exception and puts the error message into $@
Test $@ and rollback() if set, else commit()

Transactions - Example code
$dbh->{RaiseError} = 1;
eval {
    $dbh->method(…);   # assorted DBI calls
    foo(...);          # application code
    $dbh->commit;      # commit the changes
};
if ($@) {
    warn "Transaction aborted because $@";
    $dbh->rollback;
    ...
}

Transactions - Further comments
The eval { … } catches all exceptions
not just from DBI calls. Also catches fatal runtime errors from Perl.
Put commit() inside the eval
ensures commit failure is caught cleanly.
remember that commit() itself may fail for many reasons.
Don't forget that rollback() may also fail
due to database crash or network failure etc.
Other points:
Always explicitly commit or rollback before disconnect().
Destroying a connected $dbh should always rollback.
END blocks can catch exit-without-disconnect to rollback and disconnect cleanly.

Speed    Speed    Speed!
What helps,what doesn't

Give me speed!
DBI was designed for speed from day one
DBI dispatch written in hand-crafted XS/C
Dispatch to XS driver method calls optimized
Cached attributes returned directly by DBI dispatch
DBI overhead is generally insignificant
So we'll talk about other speed issues instead ...

Partition for speed
Application partitioning
stop and think - work smarter not harder
Pick the right database for the job, if you have the choice
Work close to the data
Use stored procedures where appropriate
Moving data to/from the client is always expensive
think of latency as well as bandwidth
Use proprietary bulk-load where appropriate
Consider database maths vs. Perl maths
Multiple simple queries with 'joins' in Perl may be faster
Consider local caching, in memory or DBM file etc.
Mix 'n Match techniques as needed

Prepare for speed
prepare() - what happens in the server
Receive and parse the SQL statement into internal form
Get details for all the selected tables
Check access rights for each
Get details for all the selected fields
Check data types in expressions
Get details for all the indices of  the tables
Develop an optimised query 'access plan' for best execution
Return a handle for all this cached information
This can be an expensive process - especially the 'access plan'
Some databases, like MySQL, don't cache the information but have simpler, and thus faster, plan creation

The Best Laid Plans
Query optimisation is hard
Intelligent high quality cost based query optimisation is really hard!
Know your optimiser
Oracle, Informix, Sybase, DB2, SQL Server etc. all slightly different.
Check what it's doing
Use tools to see the plans used for your queries - very helpful
Help it along
Most 'big name' databases have a mechanism to analyse and store the key distributions of indices to help the optimiser make good plans
beware: keep it fresh, old key distributions might be worse than none
Some also allow you to embed 'hints' into the SQL as comments
beware: take it easy, over hinting hinders dynamic optimisation

Respect your server's SQL cache
Optimised Access Plan etc. is cached within the server
keyed by the exact original SQL string used
Without placeholders, SQL string varies each time
so cached one is not reused
and time is wasted creating a new access plan
the new statement and access plan are added to cache
so the cache fills and other statements get pushed out
Compare do("insert … $id");
            with do("insert … ?", $id);

Hot handles
Use prepare() to get a handle on the statement in the SQL cache and keep it there.
The handle for the pre-prepared statement avoids a round-trip to server for cache-check / parsing etc.
Avoid using $dbh->do(…) in a loop.
Use $sth = $dbh->prepare(…)
    and $sth->execute() instead.
For example, convert looped do("insert … ?", $id)
    to prepare("insert … ?")
    and looped execute($id).

Let the DBI cache your handles
Sometimes it's not easy to hold all your handles
e.g., library code to lookup values from the database given a $dbh
The prepare_cached() method gives you a client side statement handle cache:
  while ( ($field, $value) = each %search_fields ) {
      push @sql,   "$field = ?";
      push @values, $value;
  }
  $where = "";
  $where = "where ".join(" and ", @sql) if @sql;
  $sth = $dbh->prepare_cached("select * from table $where");
  $sth->execute(@values);
Can avoid the need for global statement handle variables

Sling less for speed
while(@row = $sth->fetchrow_array) {  }
one field:   3,100 fetches per cpu second
ten fields:  1,000 fetches per cpu second
while($row = $sth->fetchrow_arrayref) {  }
one field:   5,300 fetches per cpu second
ten fields:  4,000 fetches per cpu second
Notes:
Timings made on an old SPARC 10 using DBD::Oracle
Timings assume instant record fetch within driver
Fields all one char. @row would be even slower for more/bigger fields
Use bind_columns() for direct access to fetched fields

Keep a handle on your databases
Connecting to a database can be slow
Try to connect once and stay connected where practical
We'll discuss web server issues later
New method:
The new connect_cached() method acts like prepare_cached() but for database handles
It also checks the connection and reconnects if it's broken
Similar, but not quite the same as Apache::DBI
New and subject to change with experience
Potentially most useful with DBD::Proxy & DBI::ProxyServer

Portability
A Holy Grail
To be taken with a pinch of salt

Portability in practice
Portability requires care and testing - it can be tricky
Platform Portability - the easier bit
Availability of database client software (and server if required)
Availability of DBD driver
DBD::Proxy can address both these issues - see later
Database Portability - more tricky
Differences in SQL dialects cause most problems
Differences in data types can also be a problem
Driver capabilities (placeholders etc)
DBIx::Compat module (in DBIx::RecordSet) may be useful.
A standard DBI test suite is needed (and planned).

The Power of the Proxy
Thin clients ...
and other buzz words

DBD::Proxy & DBI::ProxyServer
Networking for Non-networked databases
DBD::Proxy driver forwards calls over network to remote DBI::ProxyServer
No changes in application behavior
Only the DBI->connect statement needs to be changed
Proxy can be made completely transparent by setting the DBI_AUTOPROXY environment variable
So not even the DBI->connect statement needs to be changed!
DBI::ProxyServer works on Win32
Access to Access and other Win32 ODBC and ADO data sources
Developed by Jochen Wiedmann

A proxy picture

Thin clients and other buzz words
Proxying for remote access: "thin-client"
No need for database client code on the DBI client
Proxying for network security: "encryption"
Can use Crypt::IDEA, Crypt::DES etc.
Proxying for "access control" and "firewalls"
handy for web servers
Proxying for action control
e.g., only allow specific select or insert statements
Proxying for local access: "n-tier middleware"
 connect_cached(),  memory use,  licensing limits
Proxying for performance: "compression"
Can compress data transfers using Compress::Zlib

DBI for the Web
Hand waving from 30,000 feet

Web DBI - Connect speed
Databases can be slow to connect
Traditional CGI forces a new connect per request
Move Perl and DBI into the web server
Apache with mod_perl and Apache::DBI module
Microsoft IIS with ActiveState's PerlEx
Connections can then persist between requests
Apache::DBI automatically used by DBI if loaded
No CGI script changes required to get persistence
Take care not to change the shared session behaviour
Leave the $dbh in the same state you found it!
Other alternatives include
FastCGI, CGI::SpeedyCGI and CGI::MiniSvr

Web DBI - Too many connections
Limits on Database Connections
Database platform resources or licensing
Memory consumption of web servers
Partition web servers into General and Database
Direct database work to the Database web servers
Use Reverse Proxy / Redirect / Rewrite to achieve this
Allows each subset of servers to be tuned to best fit workload
And be run on appropriate platforms

Web DBI - State-less-ness
No fixed client-server pair
Each request can be handled by a different process.
So can't simply stop fetching when one page is complete and continue fetching from the same $sth when the next page is requested.
And transactions can't span requests. Even if they could you'd have problems with database locks being held etc.
Need access to 'accumulated state' somehow:
via the client (e.g., hidden form fields)
via the database (records in a session_state table)
via the web server file system (DBM files etc)
Need to purge old state info if stored on server, so timestamp it
See Apache::Session module
DBI::ProxyServer + connect_cached with session ID may suit

Web DBI - Browsing pages of results
Re-execute query each time then count/discard (simple but expensive)
works well for small results sets or where users rarely view many pages
fast initial response, degrades gradually for later pages
count/discard in server is better but still inefficient for large result sets
affected by inserts and deletes from other processes
Re-execute query with where clause using min/max keys from last results
works well where original query can be qualified in that way, not common
Select and cache full result rows somewhere for fast access
can be expensive for large result sets with big fields
Select and cache only the row keys, fetch full rows as needed
optimisation of above, use ROWID if supported, "select … where … in (…)"
If data is static and queries predictable then custom pre-built indexes may be useful
The cache can be on web server, using DBM file for example, or on database server using a table keyed by session id

Web DBI - Concurrent editing
How to prevent updates overwriting each other?
You can use Optimistic Locking via 'qualified update':
update table set ...=...
where  key = $old_key
and field1 = $old_field1
and field2 = $old_field2 and … for all fields
Potential problems with floating point data values not matching
Some databases support a high-resolution 'update timestamp' field that can be checked instead
Check the update row count
If it's zero then you know the record has been changed or deleted by another process

Web DBI - Tips for the novice
Test one step at a time
Test perl + DBI + DBD driver outside the web server first
Test web server + non-DBI CGI next
Remember that CGI scripts run as a different user with a different environment - expect to be tripped up by that
DBI trace is your friend - use it
Use the Perl "-w" and "-T" options. Always "use strict;"
Read and inwardly digest the WWW Security FAQ:
http://www-genome.wi.mit.edu/WWW/faqs/www-security-faq.html
Read the Perl CGI FAQ:
http://www.perl.com/cgi-bin/pace/pub/doc/FAQs/cgi/perl-cgi-faq.html
And, if appropriate, read the mod_perl information available from:
http://perl.apache.org

Handling LONG/BLOB data
What makes LONG / BLOB / MEMO data special?
Not practical to pre-allocate fixed size buffers for worst case
Fetching LONGs - treat as normal fields after setting:
$dbh->{LongReadLen} - buffer size to allocate for expected data
$dbh->{LongTruncOk} - should truncating-to-fit be allowed
Inserting LONGs
The limitations of string literals
The benefits of placeholders
Chunking / Piecewise processing not supported
So you're limited to available memory
Some drivers support an unofficial blob_read() method

DBI security tainting
By default the DBI ignores Perl tainting
doesn't taint returned data
doesn't check that parameters are not tainted
The Taint attribute enables that behaviour
If Perl itself is in taint mode
Each handle has it's own inherited Taint attribute
So can be enabled for a connection and disabled for particular statements, for example:
$dbh = DBI->connect(…, { Taint => 1 });
$sth = $dbh->prepare("select * from safe_table");
$sth->{Taint} = 0;  # no tainting on this handle

Flexing the Multiplex
A new kid on the block

DBD::Multiplex
DBD::Multiplex - new and experimental
Connects to multiple databases at once (via DBI)
Executes statements on all of them by default
Could be configured to:
insert into all databases but select from one
fallback to alternate database if primary is unavailable
select from all and check results (pick most common)
select round-robin / or at random to distribute load
Can be used with DBD::Proxy, either above or below
May also acquire fancy caching in later versions
Watch this space!

Reference Materials
http://www.symbolstone.org/technology/perl/DBI
The DBI Home Page
http://www.isc.org/dbi-lists.html
To subscribe to the DBI Mailing Lists
http://www.perl.com/CPAN/authors/id/TIMB/DBI_Talk1_1997.tar.gz
My DBI session at The Perl Conference 1.0 - general introduction
http://www.perl.com/CPAN/authors/id/TIMB/DBI_Talk2_1998.tar.gz
My DBI session at The Perl Conference 2.0 - more depth
http://www.perl.com/CPAN/authors/id/TIMB/DBI_Talk3_1999.tar.gz
This DBI tutorial at The Perl Conference 3.0!
http://www.oreilly.com/catalog/perldbi/
or http://www.amazon.com/exec/obidos/ASIN/1565926994/dbi
The forthcoming DBI book! Due late 1999 or early 2000.
(Order in advance at the conference to get special discounted rate :-)

The end.
Till next year...