Advanced DBI
|
|
|
Perl Conference 3.0 Tutorial |
|
August 1999 |
|
|
|
Tim Bunce |
|
|
Topical Topics
|
|
|
Wheels within Wheels |
|
Error Checking and Handling |
|
Transactions |
|
Speed Speed Speed! |
|
Portability |
|
The Power of the Proxy |
|
DBI for the Web |
|
Handling LONG/BLOB data |
|
Tainting |
|
Flexing the Multiplex |
Wheels within Wheels
|
|
|
The DBI architecture |
|
and how to watch it at work |
Under the hood
|
|
|
DBI defines and implements an interface |
|
Driver modules do much of the real work |
|
DBI provides default methods,
functions, tools etc for drivers |
|
Not limited to the lowest common
denominator - mechanism provided for
driver specific extensions |
|
Designed and built for speed |
|
Valuable detailed call
tracing/debugging built-in |
A picture is worth?
Setting the scene
|
|
|
|
Inner and outer worlds |
|
Application and Drivers |
|
|
|
Inner and outer handles |
|
DBI handles are references to tied
hashes |
|
|
|
The DBI Method Dispatcher |
|
gateway between the worlds and the
heart of the DBI |
|
|
|
|
|
Now we'll go all deep and visual for
a while... |
Architecture of the DBI
classes #1
Architecture of the DBI
classes #2
Anatomy of a DBI handle
Method call walk-through
|
|
|
Consider a simple prepare call:
$dbh->prepare(...) |
|
$dbh is reference to a DBI::db object
(regardless of driver) |
|
DBI::db::prepare is an alias for DBI
dispatch method |
|
Dispatch calls drivers prepare method something
like this: |
|
my $inner_hash_ref =
#
from tie magic |
|
my $implementor_class =
#
from DBI magic data |
|
$inner_hash_ref->$implementor_class::prepare(...) |
|
Driver code gets the inner hash so it
has fast access to the hash contents |
Watching the DBI in action
|
|
|
DBI has detailed call tracing built-in |
|
The trace can be very helpful in
understanding application behavior and for debugging |
|
Shows parameters and results |
|
Trace information can be written to a
file |
|
Not used often enough! |
Our program for today...
|
|
|
1: #!/usr/bin/perl -w |
|
2: |
|
3: use DBI; |
|
4: |
|
5: $dbh = DBI->connect('', '', '', { RaiseError => 1 }); |
|
6: |
|
7: $upd = $dbh->prepare("UPDATE prices SET price=? WHERE
prod=?"); |
|
8: $ins = $dbh->prepare("INSERT INTO prices (prod,price)
VALUES(?,?)"); |
|
9: $rows = $upd->execute(42, "Widgets"); |
|
10:
$ins->execute("Widgets", 42) if $rows == 0; |
|
11: |
|
12: $dbh->disconnect; |
Enabling Tracing
|
|
|
|
|
Per handle |
|
$h->trace($level); |
|
$h->trace($level, $filename); |
|
Only effects that handle and any new
child handles created from it |
|
Child handles get trace level of parent
in effect at time of creation |
|
Global (internal to application) |
|
DBI->trace(...); |
|
Sets effective global default minimum
trace level |
|
Global (external to application) |
|
Enabled using DBI_TRACE environment
variable |
|
DBI_TRACE=digits DBI->trace(digits); |
|
DBI_TRACE=filename DBI->trace(2, filename); |
|
DBI_TRACE=digits=filename DBI->trace(digits, filename); |
Trace level 1
|
|
|
Trace level 1 shows method results and
line numbers: |
|
|
|
<- connect= DBI::db=HASH(0xe0abc) at DBI.pm line 356. |
|
<- STORE('PrintError', 1)= 1 at DBI.pm line 382. |
|
<- STORE('AutoCommit', 1)= 1 at DBI.pm line 382. |
|
<- STORE('RaiseError', 1)= 1 at DBI.pm line 382. |
|
|
|
<- prepare('UPDATE
')= DBI::st=HASH(0xe1238) at test.pl line 7. |
|
<- prepare('INSERT
')= DBI::st=HASH(0xe1504) at test.pl line 8. |
|
<- execute= '0E0' at test.pl line 9. |
|
<- execute= 1 at test.pl line 10. |
|
<- disconnect= 1 at test.pl line 11. |
|
|
|
<- DESTROY= undef |
|
<- DESTROY= undef |
|
<- DESTROY= undef |
|
<- DESTROY= undef |
Trace level 2 and above
|
|
|
Trace level 2 shows calls with
parameters and more: |
|
|
|
-> connect for DBD::ODBC::dr (DBI::dr=HASH(0x13dfec)~0xe14a4 |
|
'' '' **** HASH(0xe0a10)) |
|
<- connect= DBI::db=HASH(0xe0ab0) at DBI.pm line 356. |
|
|
|
-> STORE for DBD::ODBC::db (DBI::db=HASH(0xe0abc)~INNER
'PrintError' 1) |
|
<- STORE= 1 at DBI.pm line 382. |
|
|
|
-> prepare for DBD::ODBC::db (DBI::db=HASH(0xe0ab0)~0xe0abc |
|
'UPDATE prices SET price=?
WHERE prod=?') |
|
<- prepare= DBI::st=HASH(0xe1274) at test.pl line 7. |
|
|
|
Trace level 3 and above shows more
processing details |
|
|
Error Checking and Handling
|
|
|
To err is human, to detect, divine. |
The importance of error
checking
|
|
|
|
|
Errors happen! |
|
Failure happens when you don't expect
errors! |
|
database crash / network disconnection |
|
lack of disk space for insert or select
(sort space for order by) |
|
server math error on select (divide by
zero) |
|
and maybe, just maybe, errors in your
own code (Gasp!) |
|
|
|
Beat failure by expecting errors! |
|
|
|
Detect errors early to limit effects |
|
Defensive Programming, e.g., check
assumptions |
|
Through Programming, e.g., check for
errors after fetch loops |
Error checking - ways and
means
|
|
|
|
|
Error checking the hard way... |
|
|
|
$h->method || die "DBI method
failed: $DBI::errstr"; |
|
$h->method || die "DBI method
failed: $DBI::errstr"; |
|
$h->method || die "DBI method
failed: $DBI::errstr"; |
|
|
|
Error checking the easy way... |
|
|
|
$h->{RaiseError} = 1; |
|
$h->method; |
|
$h->method; |
|
$h->method; |
Handling errors the smart
way
|
|
|
|
|
|
For simple applications immediate death
on error is fine |
|
|
|
For more advanced applications greater
control is needed |
|
|
|
Life after death: |
|
$h->{RaiseError} = 1; |
|
eval {
$h->method;
}; |
|
if ($@) {
} |
|
|
|
Bonus prize |
|
Other, non-DBI, code within the eval
block may also raise an exception that will be caught and handled cleanly |
Transactions
|
|
|
To do or to undo, |
|
that is the question |
Transactions - What's it all
about?
|
|
|
|
|
Far more than just locking |
|
The A.C.I.D. test |
|
True transactions give true safety |
|
even from power failures and system
crashes! |
|
Incomplete transactions are
automatically rolled-back by the database server when it's restarted. |
|
Also removes burden of undoing
incomplete changes |
|
Hard to implement (for the vendor) and
has significant performance cost |
|
A very large topic worthy of an entire
tutorial |
Transactions - Life
Preserving
|
|
|
Classic: system crash between one bank
account being debited and another being credited. |
|
Dramatic: power failure during update
statement on 3 million rows after 2 minutes when only part way through. |
|
Real-world: complex series of
inter-related updates, deletes and inserts on many separate tables fails at
the last step due to a duplicate primary key on an insert. |
|
|
|
Transaction recovery would handle all
these situations automatically. |
|
Makes a system far more robust and trustworthy over the long term. |
|
Use transactions if your database
supports them. |
|
If it doesn't and you need them, switch
to a different database. |
Transactions - How the DBI
helps
|
|
|
|
|
Tools of the trade: |
|
Set AutoCommit off, and RaiseError on |
|
Wrap eval {
} around the code |
|
Use $dbh->commit; and $dbh->rollback; |
|
Disable AutoCommit via $dbh->{AutoCommit}
= 0; |
|
to enable transactions and thus
rollback-on-error |
|
Enable RaiseError via $dbh->{RaiseError}
= 1; |
|
to automatically 'throw an exception'
on error |
|
The surrounding eval {
} |
|
catches the exception and puts the
error message into $@ |
|
Test $@ and rollback() if set, else commit() |
Transactions - Example code
|
|
|
|
|
|
$dbh->{RaiseError} = 1; |
|
eval { |
|
$dbh->method(
); #
assorted DBI calls |
|
foo(...); #
application code |
|
$dbh->commit; # commit
the changes |
|
}; |
|
if ($@) { |
|
warn "Transaction aborted because $@"; |
|
$dbh->rollback; |
|
... |
|
} |
Transactions - Further
comments
|
|
|
|
|
The eval {
} catches all exceptions |
|
not just from DBI calls. Also catches
fatal runtime errors from Perl. |
|
Put commit() inside the eval |
|
ensures commit failure is caught
cleanly. |
|
remember that commit() itself may fail
for many reasons. |
|
Don't forget that rollback() may also
fail |
|
due to database crash or network
failure etc. |
|
Other points: |
|
Always explicitly commit or rollback
before disconnect(). |
|
Destroying a connected $dbh should
always rollback. |
|
END blocks can catch
exit-without-disconnect to rollback and disconnect cleanly. |
Speed Speed
Speed!
Give me speed!
|
|
|
|
|
DBI was designed for speed from day one |
|
DBI dispatch written in hand-crafted
XS/C |
|
Dispatch to XS driver method calls
optimized |
|
Cached attributes returned directly by
DBI dispatch |
|
DBI overhead is generally insignificant |
|
|
|
|
|
|
|
|
|
So we'll talk about other speed issues
instead ... |
Partition for speed
|
|
|
|
|
Application partitioning |
|
stop and think - work smarter not
harder |
|
Pick the right database for the job, if
you have the choice |
|
Work close to the data |
|
Use stored procedures where appropriate |
|
Moving data to/from the client is
always expensive |
|
think of latency as well as bandwidth |
|
Use proprietary bulk-load where
appropriate |
|
Consider database maths vs. Perl maths |
|
Multiple simple queries with 'joins' in
Perl may be faster |
|
Consider local caching, in memory or
DBM file etc. |
|
Mix 'n Match techniques as needed |
Prepare for speed
|
|
|
|
|
prepare() - what happens in the server |
|
Receive and parse the SQL statement
into internal form |
|
Get details for all the selected tables |
|
Check access rights for each |
|
Get details for all the selected fields |
|
Check data types in expressions |
|
Get details for all the indices of the tables |
|
Develop an optimised query 'access
plan' for best execution |
|
Return a handle for all this cached
information |
|
|
|
This can be an expensive process -
especially the 'access plan' |
|
Some databases, like MySQL, don't cache
the information but have simpler, and thus faster, plan creation |
The Best Laid Plans
|
|
|
|
|
Query optimisation is hard |
|
Intelligent high quality cost based
query optimisation is really hard! |
|
Know your optimiser |
|
Oracle, Informix, Sybase, DB2, SQL
Server etc. all slightly different. |
|
Check what it's doing |
|
Use tools to see the plans used for
your queries - very helpful |
|
Help it along |
|
Most 'big name' databases have a
mechanism to analyse and store the key distributions of indices to help the
optimiser make good plans |
|
beware: keep it fresh, old key
distributions might be worse than none |
|
Some also allow you to embed 'hints'
into the SQL as comments |
|
beware: take it easy, over hinting
hinders dynamic optimisation |
Respect your server's SQL
cache
|
|
|
|
Optimised Access Plan etc. is cached
within the server |
|
keyed by the exact original SQL string
used |
|
|
|
Without placeholders, SQL string varies
each time |
|
so cached one is not reused |
|
and time is wasted creating a new
access plan |
|
the new statement and access plan are
added to cache |
|
so the cache fills and other statements
get pushed out |
|
|
|
Compare do("insert
$id"); |
|
with do("insert
?", $id); |
Hot handles
|
|
|
Use prepare() to get a handle on the
statement in the SQL cache and keep it there. |
|
The handle for the pre-prepared
statement avoids a round-trip to server for cache-check / parsing etc. |
|
Avoid using $dbh->do(
) in a loop. |
|
Use $sth = $dbh->prepare(
) |
|
and $sth->execute() instead. |
|
For example, convert looped do("insert
?", $id) |
|
to prepare("insert
?") |
|
and looped execute($id). |
Let the DBI cache your
handles
|
|
|
|
|
Sometimes it's not easy to hold all
your handles |
|
e.g., library code to lookup values
from the database given a $dbh |
|
The prepare_cached() method gives you a
client side statement handle cache: |
|
while ( ($field, $value) = each %search_fields ) { |
|
push @sql, "$field =
?"; |
|
push @values, $value; |
|
} |
|
$where = ""; |
|
$where = "where ".join(" and ", @sql) if @sql; |
|
$sth = $dbh->prepare_cached("select * from table
$where"); |
|
$sth->execute(@values); |
|
Can avoid the need for global statement
handle variables |
Sling less for speed
|
|
|
|
while(@row = $sth->fetchrow_array)
{ } |
|
one field: 3,100 fetches per cpu second |
|
ten fields: 1,000 fetches per cpu second |
|
|
|
while($row =
$sth->fetchrow_arrayref) { } |
|
one field: 5,300 fetches per cpu second |
|
ten fields: 4,000 fetches per cpu second |
|
|
|
Notes: |
|
Timings made on an old SPARC 10 using
DBD::Oracle |
|
Timings assume instant record fetch
within driver |
|
Fields all one char. @row would be even
slower for more/bigger fields |
|
Use bind_columns() for direct access to
fetched fields |
Keep a handle on your
databases
|
|
|
|
Connecting to a database can be slow |
|
Try to connect once and stay connected
where practical |
|
We'll discuss web server issues later |
|
|
|
New method: |
|
The new connect_cached() method acts
like prepare_cached() but for database handles |
|
It also checks the connection and
reconnects if it's broken |
|
Similar, but not quite the same as
Apache::DBI |
|
New and subject to change with
experience |
|
Potentially most useful with DBD::Proxy
& DBI::ProxyServer |
Portability
|
|
|
A Holy Grail |
|
To be taken with a pinch of salt |
Portability in practice
|
|
|
|
Portability requires care and testing -
it can be tricky |
|
Platform Portability - the easier bit |
|
Availability of database client
software (and server if required) |
|
Availability of DBD driver |
|
DBD::Proxy can address both these
issues - see later |
|
Database Portability - more tricky |
|
Differences in SQL dialects cause most
problems |
|
Differences in data types can also be a
problem |
|
Driver capabilities (placeholders etc) |
|
DBIx::Compat module (in
DBIx::RecordSet) may be useful. |
|
A standard DBI test suite is needed
(and planned). |
The Power of the Proxy
|
|
|
Thin clients ... |
|
and other buzz words |
DBD::Proxy &
DBI::ProxyServer
|
|
|
|
Networking for Non-networked databases |
|
DBD::Proxy driver forwards calls over
network to remote DBI::ProxyServer |
|
No changes in application behavior |
|
Only the DBI->connect statement
needs to be changed |
|
Proxy can be made completely
transparent by setting the DBI_AUTOPROXY environment variable |
|
So not even the DBI->connect
statement needs to be changed! |
|
DBI::ProxyServer works on Win32 |
|
Access to Access and other Win32 ODBC
and ADO data sources |
|
Developed by Jochen Wiedmann |
A proxy picture
Thin clients and other buzz
words
|
|
|
|
|
Proxying for remote access:
"thin-client" |
|
No need for database client code on the
DBI client |
|
Proxying for network security:
"encryption" |
|
Can use Crypt::IDEA, Crypt::DES etc. |
|
Proxying for "access control"
and "firewalls" |
|
handy for web servers |
|
Proxying for action control |
|
e.g., only allow specific select or
insert statements |
|
Proxying for local access: "n-tier
middleware" |
|
connect_cached(), memory
use, licensing limits |
|
Proxying for performance:
"compression" |
|
Can compress data transfers using
Compress::Zlib |
|
|
DBI for the Web
|
|
|
Hand waving from 30,000 feet |
Web DBI - Connect speed
|
|
|
|
|
Databases can be slow to connect |
|
Traditional CGI forces a new connect
per request |
|
Move Perl and DBI into the web server |
|
Apache with mod_perl and Apache::DBI
module |
|
Microsoft IIS with ActiveState's PerlEx |
|
Connections can then persist between
requests |
|
Apache::DBI automatically used by DBI
if loaded |
|
No CGI script changes required to get
persistence |
|
Take care not to change the shared
session behaviour |
|
Leave the $dbh in the same state you
found it! |
|
Other alternatives include |
|
FastCGI, CGI::SpeedyCGI and
CGI::MiniSvr |
Web DBI - Too many
connections
|
|
|
|
Limits on Database Connections |
|
Database platform resources or
licensing |
|
Memory consumption of web servers |
|
|
|
Partition web servers into General and
Database |
|
|
|
Direct database work to the Database
web servers |
|
Use Reverse Proxy / Redirect / Rewrite
to achieve this |
|
Allows each subset of servers to be
tuned to best fit workload |
|
And be run on appropriate platforms |
Web DBI - State-less-ness
|
|
|
|
|
|
No fixed client-server pair |
|
Each request can be handled by a
different process. |
|
So can't simply stop fetching when one
page is complete and continue fetching from the same $sth when the next page
is requested. |
|
And transactions can't span requests.
Even if they could you'd have problems with database locks being held etc. |
|
Need access to 'accumulated state'
somehow: |
|
via the client (e.g., hidden form
fields) |
|
via the database (records in a
session_state table) |
|
via the web server file system (DBM
files etc) |
|
Need to purge old state info if stored
on server, so timestamp it |
|
See Apache::Session module |
|
DBI::ProxyServer + connect_cached with
session ID may suit |
Web DBI - Browsing pages of
results
|
|
|
|
Re-execute query each time then
count/discard (simple but expensive) |
|
works well for small results sets or
where users rarely view many pages |
|
fast initial response, degrades
gradually for later pages |
|
count/discard in server is better but
still inefficient for large result sets |
|
affected by inserts and deletes from
other processes |
|
Re-execute query with where clause
using min/max keys from last results |
|
works well where original query can be
qualified in that way, not common |
|
Select and cache full result rows
somewhere for fast access |
|
can be expensive for large result sets
with big fields |
|
Select and cache only the row keys,
fetch full rows as needed |
|
optimisation of above, use ROWID if
supported, "select
where
in (
)" |
|
If data is static and queries
predictable then custom pre-built indexes may be useful |
|
The cache can be on web server, using
DBM file for example, or on database server using a table keyed by session id |
Web DBI - Concurrent editing
|
|
|
|
|
How to prevent updates overwriting each
other? |
|
You can use Optimistic Locking via
'qualified update': |
|
update table set ...=... |
|
where
key = $old_key |
|
and field1 = $old_field1 |
|
and field2 = $old_field2 and
for all
fields |
|
Potential problems with floating point
data values not matching |
|
Some databases support a
high-resolution 'update timestamp' field that can be checked instead |
|
Check the update row count |
|
If it's zero then you know the record
has been changed or deleted by another process |
Web DBI - Tips for the
novice
|
|
|
|
|
Test one step at a time |
|
Test perl + DBI + DBD driver outside
the web server first |
|
Test web server + non-DBI CGI next |
|
Remember that CGI scripts run as a
different user with a different environment - expect to be tripped up by that |
|
DBI trace is your friend - use it |
|
Use the Perl "-w" and
"-T" options. Always "use strict;" |
|
Read and inwardly digest the WWW
Security FAQ: |
|
http://www-genome.wi.mit.edu/WWW/faqs/www-security-faq.html |
|
Read the Perl CGI FAQ: |
|
http://www.perl.com/cgi-bin/pace/pub/doc/FAQs/cgi/perl-cgi-faq.html |
|
And, if appropriate, read the mod_perl
information available from: |
|
http://perl.apache.org |
Handling LONG/BLOB data
|
|
|
|
What makes LONG / BLOB / MEMO data
special? |
|
Not practical to pre-allocate fixed
size buffers for worst case |
|
|
|
Fetching LONGs - treat as normal fields
after setting: |
|
$dbh->{LongReadLen} - buffer size to
allocate for expected data |
|
$dbh->{LongTruncOk} - should
truncating-to-fit be allowed |
|
|
|
Inserting LONGs |
|
The limitations of string literals |
|
The benefits of placeholders |
|
|
|
Chunking / Piecewise processing not
supported |
|
So you're limited to available memory |
|
Some drivers support an unofficial blob_read()
method |
DBI security tainting
|
|
|
|
|
By default the DBI ignores Perl
tainting |
|
doesn't taint returned data |
|
doesn't check that parameters are not
tainted |
|
|
|
The Taint attribute enables that
behaviour |
|
If Perl itself is in taint mode |
|
|
|
Each handle has it's own inherited
Taint attribute |
|
So can be enabled for a connection and
disabled for particular statements, for example: |
|
|
|
$dbh = DBI->connect(
, { Taint =>
1 }); |
|
$sth = $dbh->prepare("select *
from safe_table"); |
|
$sth->{Taint} = 0; # no tainting on this handle |
Flexing the Multiplex
DBD::Multiplex
|
|
|
|
DBD::Multiplex - new and experimental |
|
Connects to multiple databases at once
(via DBI) |
|
Executes statements on all of them by
default |
|
Could be configured to: |
|
insert into all databases but select
from one |
|
fallback to alternate database if
primary is unavailable |
|
select from all and check results (pick
most common) |
|
select round-robin / or at random to
distribute load |
|
Can be used with DBD::Proxy, either
above or below |
|
May also acquire fancy caching in later
versions |
|
Watch this space! |
Reference Materials
|
|
|
|
|
http://www.symbolstone.org/technology/perl/DBI |
|
The DBI Home Page |
|
http://www.isc.org/dbi-lists.html |
|
To subscribe to the DBI Mailing Lists |
|
http://www.perl.com/CPAN/authors/id/TIMB/DBI_Talk1_1997.tar.gz |
|
My DBI session at The Perl Conference
1.0 - general introduction |
|
http://www.perl.com/CPAN/authors/id/TIMB/DBI_Talk2_1998.tar.gz |
|
My DBI session at The Perl Conference
2.0 - more depth |
|
http://www.perl.com/CPAN/authors/id/TIMB/DBI_Talk3_1999.tar.gz |
|
This DBI tutorial at The Perl
Conference 3.0! |
|
http://www.oreilly.com/catalog/perldbi/ |
|
or http://www.amazon.com/exec/obidos/ASIN/1565926994/dbi |
|
The forthcoming DBI book! Due late 1999
or early 2000. |
|
(Order in advance at the conference to
get special discounted rate :-) |
The end.