perlfaq8 - System Interaction

This section of the Perl FAQ covers questions involving operating system interaction. This involves interprocess communication (IPC), control over the user-interface (keyboard, screen and pointing devices), and most anything else not related to data manipulation.

Read the FAQs and documentation specific to the port of perl to your operating system (eg, perlvms, perlplan9, ...). These should contain more detailed information on the vagaries of your perl.

How do I find out which operating system I'm running under?

The $^O variable ($OSTYPE if you use English) contains the operating system that your perl binary was built for.

How come exec() doesn't return?

Because that's what it does: it replaces your currently running program with a different one. If you want to keep going (as is probably the case if you're asking this question) use system() instead.

How do I do fancy stuff with the keyboard/screen/mouse?

How you access/control keyboards, screens, and pointing devices (``mice'') is system-dependent. Try the following modules:

Keyboard

    Term::Cap                   Standard perl distribution
    Term::ReadKey               CPAN
    Term::ReadLine::Gnu         CPAN
    Term::ReadLine::Perl        CPAN
    Term::Screen                CPAN

Screen

    Term::Cap                   Standard perl distribution
    Curses                      CPAN
    Term::ANSIColor             CPAN

Mouse

    Tk                          CPAN

How do I ask the user for a password?

(This question has nothing to do with the web. See a different FAQ for that.)

There's an example of this in crypt). First, you put the terminal into ``no echo'' mode, then just read the password normally. You may do this with an old-style ioctl() function, POSIX terminal control (see the POSIX manpage, and Chapter 7 of the Camel), or a call to the stty program, with varying degrees of portability.

You can also do this for most systems using the Term::ReadKey module from CPAN, which is easier to use and in theory more portable.

How do I read and write the serial port?

This depends on which operating system your program is running on. In the case of Unix, the serial ports will be accessible through files in /dev; on other systems, the devices names will doubtless differ. Several problem areas common to all device interaction are the following

lockfiles

Your system may use lockfiles to control multiple access. Make sure you follow the correct protocol. Unpredictable behaviour can result from multiple processes reading from one device.

open mode

If you expect to use both read and write operations on the device, you'll have to open it for update (see open for details). You may wish to open it without running the risk of blocking by using sysopen() and O_RDWR|O_NDELAY|O_NOCTTY from the Fcntl module (part of the standard perl distribution). See sysopen for more on this approach.

end of line

Some devices will be expecting a ``\r'' at the end of each line rather than a ``\n''. In some ports of perl, ``\r'' and ``\n'' are different from their usual (Unix) ASCII values of ``\012'' and ``\015''. You may have to give the numeric values you want directly, using octal (``\015''), hex (``0x0D''), or as a control-character specification (``\cM'').

    print DEV "atv1\012";       # wrong, for some devices
    print DEV "atv1\015";       # right, for some devices

Even though with normal text files, a ``\n'' will do the trick, there is still no unified scheme for terminating a line that is portable between Unix, DOS/Win, and Macintosh, except to terminate ALL line ends with ``\015\012'', and strip what you don't need from the output. This applies especially to socket I/O and autoflushing, discussed next.

flushing output

If you expect characters to get to your device when you print() them, you'll want to autoflush that filehandle, as in the older

    use FileHandle;
    DEV->autoflush(1);

and the newer

    use IO::Handle;
    DEV->autoflush(1);

You can use select() and the $| variable to control autoflushing (see $| and select):

    $oldh = select(DEV);
    $| = 1;
    select($oldh);

You'll also see code that does this without a temporary variable, as in

    select((select(DEV), $| = 1)[0]);

As mentioned in the previous item, this still doesn't work when using socket I/O between Unix and Macintosh. You'll need to hardcode your line terminators, in that case.

non-blocking input

If you are doing a blocking read() or sysread(), you'll have to arrange for an alarm handler to provide a timeout (see alarm). If you have a non-blocking open, you'll likely have a non-blocking read, which means you may have to use a 4-arg select() to determine whether I/O is ready on that device (see select.

How do I decode encrypted password files?

You spend lots and lots of money on dedicated hardware, but this is bound to get you talked about.

Seriously, you can't if they are Unix password files - the Unix password system employs one-way encryption. Programs like Crack can forcibly (and intelligently) try to guess passwords, but don't (can't) guarantee quick success.

If you're worried about users selecting bad passwords, you should proactively check when they try to change their password (by modifying passwd(1), for example).

How do I start a process in the background?

You could use

    system("cmd &")

or you could use fork as documented in fork, with further examples in the perlipc manpage. Some things to be aware of, if you're on a Unix-like system:

STDIN, STDOUT and STDERR are shared

Both the main process and the backgrounded one (the ``child'' process) share the same STDIN, STDOUT and STDERR filehandles. If both try to access them at once, strange things can happen. You may want to close or reopen these for the child. You can get around this with opening a pipe (see open) but on some systems this means that the child process cannot outlive the parent.

Signals

You'll have to catch the SIGCHLD signal, and possibly SIGPIPE too. SIGCHLD is sent when the backgrounded process finishes. SIGPIPE is sent when you write to a filehandle whose child process has closed (an untrapped SIGPIPE can cause your program to silently die). This is not an issue with system("cmd&").

Zombies

You have to be prepared to ``reap'' the child process when it finishes

    $SIG{CHLD} = sub { wait };

See Signals for other examples of code to do this. Zombies are not an issue with system("prog &").

How do I trap control characters/signals?

You don't actually ``trap'' a control character. Instead, that character generates a signal, which you then trap. Signals are documented in Signals and chapter 6 of the Camel.

Be warned that very few C libraries are re-entrant. Therefore, if you attempt to print() in a handler that got invoked during another stdio operation your internal structures will likely be in an inconsistent state, and your program will dump core. You can sometimes avoid this by using syswrite() instead of print().

Unless you're exceedingly careful, the only safe things to do inside a signal handler are: set a variable and exit. And in the first case, you should only set a variable in such a way that malloc() is not called (eg, by setting a variable that already has a value).

For example:

    $Interrupted = 0;   # to ensure it has a value
    $SIG{INT} = sub {
        $Interrupted++;
        syswrite(STDERR, "ouch\n", 5);
    }

However, because syscalls restart by default, you'll find that if you're in a ``slow'' call, such as <FH>, read(), connect(), or wait(), that the only way to terminate them is by ``longjumping'' out; that is, by raising an exception. See the time-out handler for a blocking flock() in Signals or chapter 6 of the Camel.

How do I modify the shadow password file on a Unix system?

If perl was installed correctly, the getpw*() functions described in the perlfunc manpage provide (read-only) access to the shadow password file. To change the file, make a new shadow password file (the format varies from system to system - see passwd(5) for specifics) and use pwd_mkdb(8) to install it (see pwd_mkdb(5) for more details).

How do I set the time and date?

Assuming you're running under sufficient permissions, you should be able to set the system-wide date and time by running the date(1) program. (There is no way to set the time and date on a per-process basis.) This mechanism will work for Unix, MS-DOS, Windows, and NT; the VMS equivalent is set time.

However, if all you want to do is change your timezone, you can probably get away with setting an environment variable:

    $ENV{TZ} = "MST7MDT";                  # unixish
    $ENV{'SYS$TIMEZONE_DIFFERENTIAL'}="-5" # vms
    system "trn comp.lang.perl";

How can I sleep() or alarm() for under a second?

If you want finer granularity than the 1 second that the sleep() function provides, the easiest way is to use the select() function as documented in select. If your system has itimers and syscall() support, you can check out the old example in http://www.perl.com/CPAN/doc/misc/ancient/tutorial/eg/itimers.pl .

How can I measure time under a second?

In general, you may not be able to. The Time::HiRes module (available from CPAN) provides this functionality for some systems.

In general, you may not be able to. But if you system supports both the syscall() function in Perl as well as a system call like gettimeofday(2), then you may be able to do something like this:

    require 'sys/syscall.ph';

    $TIMEVAL_T = "LL";

    $done = $start = pack($TIMEVAL_T, ());

    syscall( &SYS_gettimeofday, $start, 0)) != -1
               or die "gettimeofday: $!";

       ##########################
       # DO YOUR OPERATION HERE #
       ##########################

    syscall( &SYS_gettimeofday, $done, 0) != -1
           or die "gettimeofday: $!";

    @start = unpack($TIMEVAL_T, $start);
    @done  = unpack($TIMEVAL_T, $done);

    # fix microseconds
    for ($done[1], $start[1]) { $_ /= 1_000_000 }

    $delta_time = sprintf "%.4f", ($done[0]  + $done[1]  )
                                            -
                                 ($start[0] + $start[1] );

How can I do an atexit() or setjmp()/longjmp()? (Exception handling)

Release 5 of Perl added the END block, which can be used to simulate atexit(). Each package's END block is called when the program or thread ends (see the perlmod manpage manpage for more details). It isn't called when untrapped signals kill the program, though, so if you use END blocks you should also use

        use sigtrap qw(die normal-signals);

Perl's exception-handling mechanism is its eval() operator. You can use eval() as setjmp and die() as longjmp. For details of this, see the section on signals, especially the time-out handler for a blocking flock() in Signals and chapter 6 of the Camel.

If exception handling is all you're interested in, try the exceptions.pl library (part of the standard perl distribution).

If you want the atexit() syntax (and an rmexit() as well), try the AtExit module available from CPAN.

Why doesn't my sockets program work under System V (Solaris)? What does the error message "Protocol not supported" mean?

Some Sys-V based systems, notably Solaris 2.X, redefined some of the standard socket constants. Since these were constant across all architectures, they were often hardwired into perl code. The proper way to deal with this is to ``use Socket'' to get the correct values.

Note that even though SunOS and Solaris are binary compatible, these values are different. Go figure.

How can I call my system's unique C functions from Perl?

In most cases, you write an external module to do it - see the answer to ``Where can I learn about linking C with Perl? [h2xs, xsubpp]''. However, if the function is a system call, and your system supports syscall(), you can use the syscall function (documented in the perlfunc manpage).

Remember to check the modules that came with your distribution, and CPAN as well - someone may already have written a module to do it.

Where do I get the include files to do ioctl() or syscall()?

Historically, these would be generated by the h2ph tool, part of the standard perl distribution. This program converts cpp(1) directives in C header files to files containing subroutine definitions, like &SYS_getitimer, which you can use as arguments to your functions. It doesn't work perfectly, but it usually gets most of the job done. Simple files like errno.h, syscall.h, and socket.h were fine, but the hard ones like ioctl.h nearly always need to hand-edited. Here's how to install the *.ph files:

    1.  become super-user
    2.  cd /usr/include
    3.  h2ph *.h */*.h

If your system supports dynamic loading, for reasons of portability and sanity you probably ought to use h2xs (also part of the standard perl distribution). This tool converts C header files to Perl extensions. See the perlxstut manpage for how to get started with h2xs.

If your system doesn't support dynamic loading, you still probably ought to use h2xs. See the perlxstut manpage and MakeMaker for more information (in brief, just use make perl instead of a plain make to rebuild perl with a new static extension).

Why do setuid perl scripts complain about kernel problems?

Some operating systems have bugs in the kernel that make setuid scripts inherently insecure. Perl gives you a number of options (described in the perlsec manpage) to work around such systems.

How can I open a pipe both to and from a command?

The IPC::Open2 module (part of the standard perl distribution) is an easy-to-use approach that internally uses pipe(), fork(), and exec() to do the job. Make sure you read the deadlock warnings in its documentation, though (see Open2).

Why can't I get the output of a command with system()?

You're confusing the purpose of system() and backticks (``). system() runs a command and returns exit status information (as a 16 bit value: the low 8 bits are the signal the process died from, if any, and the high 8 bits are the actual exit value). Backticks (``) run a command and return what it sent to STDOUT.

    $exit_status   = system("mail-users");
    $output_string = `ls`;

How can I capture STDERR from an external command?

There are three basic ways of running external commands:

    system $cmd;                # using system()
    $output = `$cmd`;           # using backticks (``)
    open (PIPE, "cmd |");       # using open()

With system(), both STDOUT and STDERR will go the same place as the script's versions of these, unless the command redirects them. Backticks and open() read only the STDOUT of your command.

With any of these, you can change file descriptors before the call:

    open(STDOUT, ">logfile");
    system("ls");

or you can use Bourne shell file-descriptor redirection:

    $output = `$cmd 2>some_file`;
    open (PIPE, "cmd 2>some_file |");

You can also use file-descriptor redirection to make STDERR a duplicate of STDOUT:

    $output = `$cmd 2>&1`;
    open (PIPE, "cmd 2>&1 |");

Note that you cannot simply open STDERR to be a dup of STDOUT in your Perl program and avoid calling the shell to do the redirection. This doesn't work:

    open(STDERR, ">&STDOUT");
    $alloutput = `cmd args`;  # stderr still escapes

This fails because the open() makes STDERR go to where STDOUT was going at the time of the open(). The backticks then make STDOUT go to a string, but don't change STDERR (which still goes to the old STDOUT).

Note that you must use Bourne shell (sh(1)) redirection syntax in backticks, not csh(1)! Details on why Perl's system() and backtick and pipe opens all use the Bourne shell are in http://www.perl.com/CPAN/doc/FMTEYEWTK/versus/csh.whynot .

You may also use the IPC::Open3 module (part of the standard perl distribution), but be warned that it has a different order of arguments from IPC::Open2 (see Open3).

Why doesn't open() return an error when a pipe open fails?

It does, but probably not how you expect it to. On systems that follow the standard fork()/exec() paradigm (eg, Unix), it works like this: open() causes a fork(). In the parent, open() returns with the process ID of the child. The child exec()s the command to be piped to/from. The parent can't know whether the exec() was successful or not - all it can return is whether the fork() succeeded or not. To find out if the command succeeded, you have to catch SIGCHLD and wait() to get the exit status. You should also catch SIGPIPE if you're writing to the child -- you may not have found out the exec() failed by the time you write. This is documented in the perlipc manpage.

On systems that follow the spawn() paradigm, open() might do what you expect - unless perl uses a shell to start your command. In this case the fork()/exec() description still applies.

What's wrong with using backticks in a void context?

Strictly speaking, nothing. Stylistically speaking, it's not a good way to write maintainable code because backticks have a (potentially humungous) return value, and you're ignoring it. It's may also not be very efficient, because you have to read in all the lines of output, allocate memory for them, and then throw it away. Too often people are lulled to writing:

    `cp file file.bak`;

And now they think ``Hey, I'll just always use backticks to run programs.'' Bad idea: backticks are for capturing a program's output; the system() function is for running programs.

Consider this line:

    `cat /etc/termcap`;

You haven't assigned the output anywhere, so it just wastes memory (for a little while). Plus you forgot to check $? to see whether the program even ran correctly. Even if you wrote

    print `cat /etc/termcap`;

In most cases, this could and probably should be written as

    system("cat /etc/termcap") == 0
        or die "cat program failed!";

Which will get the output quickly (as its generated, instead of only at the end ) and also check the return value.

system() also provides direct control over whether shell wildcard processing may take place, whereas backticks do not.

How can I call backticks without shell processing?

This is a bit tricky. Instead of writing

    @ok = `grep @opts '$search_string' @filenames`;

You have to do this:

    my @ok = ();
    if (open(GREP, "-|")) {
        while (<GREP>) {
            chomp;
            push(@ok, $_);
        }
        close GREP;
    } else {
        exec 'grep', @opts, $search_string, @filenames;
    }

Just as with system(), no shell escapes happen when you exec() a list.

Why can't my script read from STDIN after I gave it EOF (^D on Unix, ^Z on MS-DOS)?

Because some stdio's set error and eof flags that need clearing. The POSIX module defines clearerr() that you can use. That is the technically correct way to do it. Here are some less reliable workarounds:

Try keeping around the seekpointer and go there, like this:
```
    $where = tell(LOG);
    seek(LOG, $where, 0);
```
If that doesn't work, try seeking to a different part of the file and then back.
If that doesn't work, try seeking to a different part of the file, reading something, and then seeking back.
If that doesn't work, give up on your stdio package and use sysread.

How can I convert my shell script to perl?

Learn Perl and rewrite it. Seriously, there's no simple converter. Things that are awkward to do in the shell are easy to do in Perl, and this very awkwardness is what would make a shell->perl converter nigh-on impossible to write. By rewriting it, you'll think about what you're really trying to do, and hopefully will escape the shell's pipeline datastream paradigm, which while convenient for some matters, causes many inefficiencies.

Can I use perl to run a telnet or ftp session?

Try the Net::FTP, TCP::Client, and Net::Telnet modules (available from CPAN). http://www.perl.com/CPAN/scripts/netstuff/telnet.emul.shar will also help for emulating the telnet protocol, but Net::Telnet is quite probably easier to use..

If all you want to do is pretend to be telnet but don't need the initial telnet handshaking, then the standard dual-process approach will suffice:

    use IO::Socket;             # new in 5.004
    $handle = IO::Socket::INET->new('www.perl.com:80')
            || die "can't connect to port 80 on www.perl.com: $!";
    $handle->autoflush(1);
    if (fork()) {               # XXX: undef means failure
        select($handle);
        print while <STDIN>;    # everything from stdin to socket
    } else {
        print while <$handle>;  # everything from socket to stdout
    }
    close $handle;
    exit;

How can I write expect in Perl?

Once upon a time, there was a library called chat2.pl (part of the standard perl distribution), which never really got finished. These days, your best bet is to look at the Comm.pl library available from CPAN.

Is there a way to hide perl's command line from programs such as "ps"?

First of all note that if you're doing this for security reasons (to avoid people seeing passwords, for example) then you should rewrite your program so that critical information is never given as an argument. Hiding the arguments won't make your program completely secure.

To actually alter the visible command line, you can assign to the variable $0 as documented in the perlvar manpage. This won't work on all operating systems, though. Daemon programs like sendmail place their state there, as in:

    $0 = "orcus [accepting connections]";

I {changed directory, modified my environment} in a perl script. How come the change disappeared when I exited the script? How do I get my changes to be visible?

Unix: In the strictest sense, it can't be done -- the script executes as a different process from the shell it was started from. Changes to a process are not reflected in its parent, only in its own children created after the change. There is shell magic that may allow you to fake it by eval()ing the script's output in your shell; check out the comp.unix.questions FAQ for details.
VMS: Change to %ENV persist after Perl exits, but directory changes do not.

How do I close a process's filehandle without waiting for it to complete?

Assuming your system supports such things, just send an appropriate signal to the process (see kill. It's common to first send a TERM signal, wait a little bit, and then send a KILL signal to finish it off.

How do I fork a daemon process?

If by daemon process you mean one that's detached (disassociated from its tty), then the following process is reported to work on most Unixish systems. Non-Unix users should check their Your_OS::Process module for other solutions.

Open /dev/tty and use the the TIOCNOTTY ioctl on it. See tty(4) for details.
Change directory to /
Reopen STDIN, STDOUT, and STDERR so they're not connected to the old tty.
Background yourself like this:
```
    fork && exit;
```

How do I make my program run with sh and csh?

See the eg/nih script (part of the perl source distribution).

How do I find out if I'm running interactively or not?

Good question. Sometimes -t STDIN and -t STDOUT can give clues, sometimes not.

    if (-t STDIN && -t STDOUT) {
        print "Now what? ";
    }

On POSIX systems, you can test whether your own process group matches the current process group of your controlling terminal as follows:

    use POSIX qw/getpgrp tcgetpgrp/;
    open(TTY, "/dev/tty") or die $!;
    $tpgrp = tcgetpgrp(TTY);
    $pgrp = getpgrp();
    if ($tpgrp == $pgrp) {
        print "foreground\n";
    } else {
        print "background\n";
    }

How do I timeout a slow event?

Use the alarm() function, probably in conjunction with a signal handler, as documented Signals and chapter 6 of the Camel. You may instead use the more flexible Sys::AlarmCall module available from CPAN.

How do I set CPU limits?

Use the BSD::Resource module from CPAN.

How do I avoid zombies on a Unix system?

Use the reaper code from Signals to call wait() when a SIGCHLD is received, or else use the double-fork technique described in fork.

How do I use an SQL database?

There are a number of excellent interfaces to SQL databases. See the DBD::* modules available from http://www.perl.com/CPAN/modules/dbperl/DBD .

How do I make a system() exit on control-C?

You can't. You need to imitate the system() call (see the perlipc manpage for sample code) and then have a signal handler for the INT signal that passes the signal on to the subprocess.

How do I open a file without blocking?

If you're lucky enough to be using a system that supports non-blocking reads (most Unixish systems do), you need only to use the O_NDELAY or O_NONBLOCK flag from the Fcntl module in conjunction with sysopen():

    use Fcntl;
    sysopen(FH, "/tmp/somefile", O_WRONLY|O_NDELAY|O_CREAT, 0644)
        or die "can't open /tmp/somefile: $!":

How do I install a CPAN module?

The easiest way is to have the CPAN module do it for you. This module comes with perl version 5.004 and later. To manually install the CPAN module, or any well-behaved CPAN module for that matter, follow these steps:

Unpack the source into a temporary area.
```
    perl Makefile.PL
```
```
    make
```
```
    make test
```
```
    make install
```

If your version of perl is compiled without dynamic loading, then you just need to replace step 3 (make) with make perl and you will get a new perl binary with your extension linked in.

See MakeMaker for more details on building extensions, the question ``How do I keep my own module/library directory?''

How do I keep my own module/library directory?

When you build modules, use the PREFIX option when generating Makefiles:

    perl Makefile.PL PREFIX=/u/mydir/perl

then either set the PERL5LIB environment variable before you run scripts that use the modules/libraries (see the perlrun manpage) or say

    use lib '/u/mydir/perl';

See Perl's the lib manpage for more information.

How do I add the directory my program lives in to the module/library search path?

    use FindBin;
    use lib "$FindBin:Bin";
    use your_own_modules;

How do I add a directory to my include path at runtime?

Here are the suggested ways of modifying your include path:

    the PERLLIB environment variable
    the PERL5LIB environment variable
    the perl -Idir commpand line flag
    the use lib pragma, as in
        use lib "$ENV{HOME}/myown_perllib";

The latter is particularly useful because it knows about machine dependent architectures. The lib.pm pragmatic module was first included with the 5.002 release of Perl.

How do I get one key from the terminal at a time, under POSIX?

    #!/usr/bin/perl -w
    use strict;
    $| = 1;
    for (1..4) {
        my $got;
        print "gimme: ";
        $got = getone();
        print "--> $got\n";
    }
    exit;

    BEGIN {
        use POSIX qw(:termios_h);

        my ($term, $oterm, $echo, $noecho, $fd_stdin);

        $fd_stdin = fileno(STDIN);

        $term     = POSIX::Termios->new();
        $term->getattr($fd_stdin);
        $oterm     = $term->getlflag();

        $echo     = ECHO | ECHOK | ICANON;
        $noecho   = $oterm & ~$echo;

        sub cbreak {
            $term->setlflag($noecho);
            $term->setcc(VTIME, 1);
            $term->setattr($fd_stdin, TCSANOW);
        }

        sub cooked {
            $term->setlflag($oterm);
            $term->setcc(VTIME, 0);
            $term->setattr($fd_stdin, TCSANOW);
        }

        sub getone {
            my $key = '';
            cbreak();
            sysread(STDIN, $key, 1);
            cooked();
            return $key;
        }

    }
    END { cooked() }

AUTHOR AND COPYRIGHT

Copyright (c) 1997 Tom Christiansen and Nathan Torkington. All rights reserved. See the perlfaq manpage for distribution information. END-of-perlfaq8.pod echo x - perlfaq9.pod sed 's/^X//' >perlfaq9.pod << 'END-of-perlfaq9.pod' =head1 NAME

perlfaq9 - Networking

DESCRIPTION

This section deals with questions related to networking, the internet, and a few on the web.

My CGI script runs from the command line but not the browser. Can you help me fix it?

Sure, but you probably can't afford our contracting rates :-)

Seriously, if you can demonstrate that you've read the following FAQs and that your problem isn't something simple that can be easily answered, you'll probably receive a courteous and useful reply to your question if you post it on comp.infosystems.www.authoring.cgi (if it's something to do with HTTP, HTML, or the CGI protocols). Questions that appear to be Perl questions but are really CGI ones that are posted to comp.lang.perl.misc may not be so well received.

The useful FAQs are:

    http://www.perl.com/perl/faq/idiots-guide.html
    http://www3.pair.com/webthing/docs/cgi/faqs/cgifaq.shtml
    http://www.perl.com/perl/faq/perl-cgi-faq.html
    http://www-genome.wi.mit.edu/WWW/faqs/www-security-faq.html
    http://www.boutell.com/faq/

How do I remove HTML from a string?

The most correct way (albeit not the fastest) is to use HTML::Parse from CPAN (part of the libwww-perl distribution, which is a must-have module for all web hackers).

Many folks attempt a simple-minded regular expression approach, like s/<.*?>//g, but that fails in many cases because the tags may continue over line breaks, they may contain quoted angle-brackets, or HTML comment may be present. Plus folks forget to convert entities, like < for example.

Here's one ``simple-minded'' approach, that works for most files:

    #!/usr/bin/perl -p0777
    s/<(?:[^>'"]*|(['"]).*?\1)*>//gs

If you want a more complete solution, see the 3-stage striphtml program in http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/striphtml.gz .

How do I extract URLs?

A quick but imperfect approach is

    #!/usr/bin/perl -n00
    # qxurl - tchrist@perl.com
    print "$2\n" while m{
        < \s*
          A \s+ HREF \s* = \s* (["']) (.*?) \1
        \s* >
    }gsix;

This version does not adjust relative URLs, understand alternate bases, deal with HTML comments, deal with HREF and NAME attributes in the same tag, or accept URLs themselves as arguments. It also runs about 100x faster than a more ``complete'' solution using the LWP suite of modules, such as the http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/xurl.gz program.

How do I download a file from the user's machine? How do I open a file on another machine?

In the context of an HTML form, you can use what's known as multipart/form-data encoding. The CGI.pm module (available from CPAN) supports this in the start_multipart_form() method, which isn't the same as the startform() method.

How do I make a pop-up menu in HTML?

Use the <SELECT> and <OPTION> tags. The CGI.pm module (available from CPAN) supports this widget, as well as many others, including some that it cleverly synthesizes on its own.

How do I fetch an HTML file?

One approach, if you have the lynx text-based HTML browser installed on your system, is this:

    $html_code = `lynx -source $url`;
    $text_data = `lynx -dump $url`;

The libwww-perl (LWP) modules from CPAN provide a more powerful way to do this. They work through proxies, and don't require lynx:

    # print HTML from a URL
    use LWP::Simple;
    getprint "http://www.sn.no/libwww-perl/";;

    # print ASCII from HTML from a URL
    use LWP::Simple;
    use HTML::Parse;
    use HTML::FormatText;
    my ($html, $ascii);
    $html = get("http://www.perl.com/";);
    defined $html
        or die "Can't fetch HTML from http://www.perl.com/";;
    $ascii = HTML::FormatText->new->format(parse_html($html));
    print $ascii;

how do I decode or create those %-encodings on the web?

Here's an example of decoding:

    $string = "http://altavista.digital.com/cgi-bin/query?pg=q&;what=news&fmt=.&q=%2Bcgi-bin+%2Bperl.exe";
    $string =~ s/%([a-fA-F0-9]{2})/chr(hex($1))/ge;

Encoding is a bit harder, because you can't just blindly change all the non-alphanumunder character (\W) into their hex escapes. It's important that characters with special meaning like / and ? not be translated. Probably the easiest way to get this right is to avoid reinventing the wheel and just use the URI::Escape module, which is part of the libwww-perl package (LWP) available from CPAN.

How do I redirect to another page?

Instead of sending back a Content-Type as the headers of your reply, send back a Location: header. Officially this should be a URI: header, so the CGI.pm module (available from CPAN) sends back both:

    Location: http://www.domain.com/newpage
    URI: http://www.domain.com/newpage

Note that relative URLs in these headers can cause strange effects because of ``optimizations'' that servers do.

How do I put a password on my web pages?

That depends. You'll need to read the documentation for your web server, or perhaps check some of the other FAQs referenced above.

How do I edit my .htpasswd and .htgroup files with Perl?

The HTTPD::UserAdmin and HTTPD::GroupAdmin modules provide a consistent OO interface to these files, regardless of how they're stored. Databases may be text, dbm, Berkley DB or any database with a DBI compatible driver. HTTPD::UserAdmin supports files used by the `Basic' and `Digest' authentication schemes. Here's an example:

    use HTTPD::UserAdmin ();
    HTTPD::UserAdmin
          ->new(DB => "/foo/.htpasswd")
          ->add($username => $password);

How do I make sure users can't enter values into a form that cause my CGI script to do bad things?

Read the CGI security FAQ, at http://www-genome.wi.mit.edu/WWW/faqs/www-security-faq.html, and the Perl/CGI FAQ at http://www.perl.com/CPAN/doc/FAQs/cgi/perl-cgi-faq.html.

In brief: use tainting (see the perlsec manpage), which makes sure that data from outside your script (eg, CGI parameters) are never used in eval or system calls. In addition to tainting, never use the single-argument form of system() or exec(). Instead, supply the command and arguments as a list, which prevents shell globbing.

How do I parse an email header?

For a quick-and-dirty solution, try this solution derived from page 222 of the 2nd edition of ``Programming Perl'':

    $/ = '';
    $header = <MSG>;
    $header =~ s/\n\s+/ /g;      # merge continuation lines
    %head = ( UNIX_FROM_LINE, split /^([-\w]+):\s*/m, $header );

That solution doesn't do well if, for example, you're trying to maintain all the Received lines. A more complete approach is to use the Mail::Header module from CPAN (part of the MailTools package).

How do I decode a CGI form?

A lot of people are tempted to code this up themselves, so you've probably all seen a lot of code involving $ENV{CONTENT_LENGTH} and $ENV{QUERY_STRING}. It's true that this can work, but there are also a lot of versions of this floating around that are quite simply broken!

Please do not be tempted to reinvent the wheel. Instead, use the CGI.pm or CGI_Lite.pm (available from CPAN), or if you're trapped in the module-free land of perl1 .. perl4, you might look into cgi-lib.pl (available from http://www.bio.cam.ac.uk/web/form.html).

How do I check a valid email address?

You can't.

Without sending mail to the address and seeing whether it bounces (and even then you face the halting problem), you cannot determine whether an email address is valid. Even if you apply the email header standard, you can have problems, because there are deliverable addresses that aren't RFC-822 (the mail header standard) compliant, and addresses that aren't deliverable which are compliant.

Many are tempted to try to eliminate many frequently-invalid email addresses with a simple regexp, such as /^[\w.-]+\@([\w.-]\.)+\w+$/. However, this also throws out many valid ones, and says nothing about potential deliverability, so is not suggested. Instead, see http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/ckaddr.gz , which actually checks against the full RFC spec (except for nested comments), looks for addresses you may not wish to accept email to (say, Bill Clinton or your postmaster), and then makes sure that the hostname given can be looked up in DNS. It's not fast, but it works.

Here's an alternative strategy used by many CGI script authors: Check the email address with a simple regexp (such as the one above). If the regexp matched the address, accept the address. If the regexp didn't match the address, request confirmation from the user that the email address they entered was correct.

How do I decode a MIME/BASE64 string?

The MIME-tools package (available from CPAN) handles this and a lot more. Decoding BASE64 becomes as simple as:

    use MIME::base64;
    $decoded = decode_base64($encoded);

A more direct approach is to use the unpack() function's ``u'' format after minor transliterations:

    tr#A-Za-z0-9+/##cd;                   # remove non-base64 chars
    tr#A-Za-z0-9+/# -_#;                  # convert to uuencoded format
    $len = pack("c", 32 + 0.75*length);   # compute length byte
    print unpack("u", $len . $_);         # uudecode and print

How do I return the user's email address?

On systems that support getpwuid, the $< variable and the Sys::Hostname module (which is part of the standard perl distribution), you can probably try using something like this:

    use Sys::Hostname;
    $address = sprintf('%s@%s', getpwuid($<), hostname);

Company policies on email address can mean that this generates addresses that the company's email system will not accept, so you should ask for users' email addresses when this matters. Furthermore, not all systems on which Perl runs are so forthcoming with this information as is Unix.

The Mail::Util module from CPAN (part of the MailTools package) provides a mailaddress() function that tries to guess the mail address of the user. It makes a more intelligent guess than the code above, using information given when the module was installed, but it could still be incorrect. Again, the best way is often just to ask the user.

How do I send/read mail?

Sending mail: the Mail::Mailer module from CPAN (part of the MailTools package) is UNIX-centric, while Mail::Internet uses Net::SMTP which is not UNIX-centric. Reading mail: use the Mail::Folder module from CPAN (part of the MailFolder package) or the Mail::Internet module from CPAN (also part of the MailTools package).

   # sending mail
    use Mail::Internet;
    use Mail::Header;
    # say which mail host to use
    $ENV{SMTPHOSTS} = 'mail.frii.com';
    # create headers
    $header = new Mail::Header;
    $header->add('From', 'gnat@frii.com');
    $header->add('Subject', 'Testing');
    $header->add('To', 'gnat@frii.com');
    # create body
    $body = 'This is a test, ignore';
    # create mail object
    $mail = new Mail::Internet(undef, Header => $header, Body => \[$body]);
    # send it
    $mail->smtpsend or die;

How do I find out my hostname/domainname/IP address?

A lot of code has historically cavalierly called the `hostname` program. While sometimes expedient, this isn't very portable. It's one of those tradeoffs of convenience versus portability.

The Sys::Hostname module (part of the standard perl distribution) will give you the hostname after which you can find out the IP address (assuming you have working DNS) with a gethostbyname() call.

    use Socket;
    use Sys::Hostname;
    my $host = hostname();
    my $addr = inet_ntoa(scalar(gethostbyname($name)) || 'localhost');

Probably the simplest way to learn your DNS domain name is to grok it out of /etc/resolv.conf, at least under Unix. Of course, this assumes several things about your resolv.conf configuration, including that it exists.

(We still need a good DNS domain name-learning method for non-Unix systems.)

How do I fetch a news article or the active newsgroups?

Use the Net::NNTP or News::NNTPClient modules, both available from CPAN. This can make tasks like fetching the newsgroup list as simple as:

    perl -MNews::NNTPClient
      -e 'print News::NNTPClient->;new->list("newsgroups")'