NAME

MIME::Decoder - an object for decoding the body part of a MIME stream

SYNOPSIS

Decoding a data stream. Here's a simple filter program to read quoted-printable data from STDIN (until EOF) and write the decoded data to STDOUT:

    use MIME::Decoder;
    
    $decoder = new MIME::Decoder 'quoted-printable' or die "unsupported";
    $decoder->decode(\*STDIN, \*STDOUT);

Encoding a data stream. Here's a simple filter program to read binary data from STDIN (until EOF) and write base64-encoded data to STDOUT:

    use MIME::Decoder;
    
    $decoder = new MIME::Decoder 'base64' or die "unsupported";
    $decoder->encode(\*STDIN, \*STDOUT);

You can write and install your own decoders so that MIME::Decoder will know about them:

    use MyBase64Decoder;
    
    install MyBase64Decoder 'base64';

You can also test if an encoding is supported:

    if (MIME::Decoder->supported('x-uuencode')) {
        # we can uuencode!
    }

DESCRIPTION

This abstract class and its private concrete subclasses (see below) provide an object-oriented front end to two actions:

Decoding a MIME-encoded stream.
Encoding a raw data stream into a MIME-encoded stream.

The constructor for MIME::Decoder takes the name of an encoding (base64, 7bit, etc.), and returns an instance of a subclass of MIME::Decoder whose decode() method will perform the appropriate decoding action, and whose encode() method will perform the appropriate encoding action.
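To picture how new() might pick a subclass, here is a minimal sketch of a lookup table keyed by lowercased encoding names. It is an assumption for illustration only. The packages SketchDecoder and SketchDecoder::Xbit and the %Handler table are hypothetical and do not reflect MIME::Decoder's actual internals.

```perl
use strict;
use warnings;

package SketchDecoder;

# Hypothetical registry: lowercased encoding name => handler class.
my %Handler;

# install ENCODING: class method; register the invoking class as the handler.
sub install {
    my ($class, $encoding) = @_;
    $Handler{ lc $encoding } = $class;
    return 1;
}

# new ENCODING: find the handler class and bless a new object into it.
# Returns undef (in scalar context) if no handler is installed.
sub new {
    my ($class, $encoding) = @_;
    my $handler = $Handler{ lc $encoding } or return;
    return bless { encoding => lc $encoding }, $handler;
}

# encoding: the lowercased name this object was created for.
sub encoding { return $_[0]{encoding} }

# A hypothetical concrete subclass.
package SketchDecoder::Xbit;
our @ISA = ('SketchDecoder');

package main;

SketchDecoder::Xbit->install('7bit');
SketchDecoder::Xbit->install('8bit');

my $decoder = SketchDecoder->new('7BIT');    # blessed into SketchDecoder::Xbit
```

Keeping the table private to the base class means a subclass only has to call install(). It never touches the table directly.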

PUBLIC INTERFACE

Standard interface

If all you are doing is using this class, these are the only methods you'll need:

new ENCODING

Class method. Create and return a new decoder object which can handle the given ENCODING.

    my $decoder = new MIME::Decoder "7bit";

Returns the undefined value if no installed decoder can handle ENCODING.

decode INSTREAM,OUTSTREAM

Instance method. Decode the document waiting in the input handle INSTREAM, writing the decoded information to the output handle OUTSTREAM.

Read the section in this document on I/O handles for more information about the arguments. Note that you can still supply old-style unblessed filehandles for INSTREAM and OUTSTREAM.

encode INSTREAM,OUTSTREAM

Instance method. Encode the document waiting in the input handle INSTREAM, writing the encoded information to the output handle OUTSTREAM.

Read the section in this document on I/O handles for more information about the arguments. Note that you can still supply old-style unblessed filehandles for INSTREAM and OUTSTREAM.

encoding

Instance method. Return the encoding that this object was created to handle, coerced to all lowercase (e.g., "base64").

supported [ENCODING]

Class method. With one arg (an ENCODING name), returns truth if that encoding is currently handled, and falsity otherwise. The ENCODING will be automatically coerced to lowercase:

    if (MIME::Decoder->supported('7BIT')) {
        # yes, we can handle it...
    }
    else {
        # drop back six and punt...
    }

With no args, returns a reference to a hash of all the available decoders, where each key is an encoding name (all lowercase, like '7bit') and each value is true. (The value happens to be the name of the class that handles the encoding, but you probably shouldn't rely on that.) Hence:

    my $supported = MIME::Decoder->supported;
    if ($supported->{'7bit'}) {
        # yes, we can handle it...
    }
    elsif ($supported->{'8bit'}) {
        # yes, we can handle it...
    }

You may safely modify this hash. Doing so will not change the way the module performs its lookups; only install() can do that.
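Here is a minimal sketch of how supported() could keep that promise. It assumes a private registry hash, and the package SketchRegistry and its contents are hypothetical. Called with no arguments, it returns a reference to a copy of the table, so the caller's edits never reach the real lookup.

```perl
use strict;
use warnings;

package SketchRegistry;

# Hypothetical registry contents, for illustration only.
my %Handler = (
    'base64' => 'SketchRegistry::Base64',
    '7bit'   => 'SketchRegistry::Xbit',
);

sub supported {
    my ($class, $encoding) = @_;
    if (defined $encoding) {
        # One arg: true/false for that encoding, case-insensitively.
        return exists $Handler{ lc $encoding } ? 1 : 0;
    }
    # No args: a reference to a shallow copy of the table, so changes
    # made by the caller cannot affect later lookups.
    return { %Handler };
}

package main;

my $supported = SketchRegistry->supported;
if ($supported->{'7bit'}) {    # keys starting with a digit must be quoted
    # yes, we can handle it...
}
```

Note the quoted keys. Perl auto-quotes only hash subscripts that look like identifiers, so a key such as 7bit has to be written as '7bit'.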

Thanks to Achim Bohnet for suggesting this method.

Subclass interface

If you are writing (or installing) a new decoder subclass, there are some other methods you'll need to know about:

decode_it INSTREAM,OUTSTREAM

Abstract instance method. The back-end of the decode method. It takes an input handle opened for reading (INSTREAM), and an output handle opened for writing (OUTSTREAM).

If you are writing your own decoder subclass, you must override this method in your class. Your method should read from the input handle via getline() or read(), decode this input, and print the decoded data to the output handle via print(). You may do this however you see fit, so long as the end result is the same.

Note that decode() automatically wraps scalar filehandle names and unblessed glob references as I/O handles for you, so you don't need to worry about them.

Your method must return either undef (to indicate failure), or 1 (to indicate success).

encode_it INSTREAM,OUTSTREAM

Abstract instance method. The back-end of the encode method. It takes an input handle opened for reading (INSTREAM), and an output handle opened for writing (OUTSTREAM).

If you are writing your own decoder subclass, you must override this method in your class. Your method should read from the input handle via getline() or read(), encode this input, and print the encoded data to the output handle via print(). You may do this however you see fit, so long as the end result is the same.

Note that encode() automatically wraps scalar filehandle names and unblessed glob references as I/O handles for you, so you don't need to worry about them.

Your method must return either undef (to indicate failure), or 1 (to indicate success).

init ARGS...

Instance method. Do any necessary initialization of the new instance, taking whatever arguments were given to new(). Should return the self object on success, undef on failure.

install ENCODING

Class method. Install this class so that ENCODING is handled by it. You should not override this method.

BUILT-IN DECODER SUBCLASSES

You don't need to "use" any other Perl modules; the following are included as part of MIME::Decoder.

MIME::Decoder::Base64

The built-in decoder for the "base64" encoding.

The name was chosen to jibe with the pre-existing MIME::Base64 utility package, which this class actually uses to translate each line.

When decoding, the input is read one line at a time. The input accumulates in an internal buffer, which is decoded in multiple-of-4-sized chunks (plus a possible ``leftover'' input chunk, of course).
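The buffering scheme above can be sketched with the core MIME::Base64 module. This is not the module's own code. The function name decode_base64_stream is hypothetical, and the sketch uses plain Perl filehandles (readline and print) in place of the getline()/print() handle methods so that it runs on its own.

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);

# Read $in line by line, keep only base64 alphabet characters, and decode
# the largest multiple-of-4 prefix of the buffer each time.
sub decode_base64_stream {
    my ($in, $out) = @_;
    my $buffer = '';
    while (defined(my $line = <$in>)) {
        $line =~ tr{A-Za-z0-9+/=}{}cd;    # drop newlines and other noise
        $buffer .= $line;
        my $usable = length($buffer) - length($buffer) % 4;
        next unless $usable;
        # 4-arg substr removes the decoded prefix from $buffer.
        print {$out} decode_base64(substr($buffer, 0, $usable, ''));
    }
    # Decode any leftover partial chunk at EOF.
    print {$out} decode_base64($buffer) if length $buffer;
    return 1;
}
```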

When encoding, the input is read 45 bytes at a time to keep the output lines short. 45 is a multiple of 3, so each full 45-byte chunk encodes to exactly 60 characters, under the 76-character line limit that RFC-1521 specifies.
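Here is a matching sketch of the 45-byte encoding loop, again using MIME::Base64 and plain filehandles. The name encode_base64_stream is hypothetical. The sketch assumes each read() returns a full 45 bytes until the last one. That holds for ordinary files and in-memory handles, but a short read from a pipe would put '=' padding in the middle of the output.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Read 45 bytes at a time and emit each chunk as one base64 line.
sub encode_base64_stream {
    my ($in, $out) = @_;
    my $buf;
    while (read($in, $buf, 45)) {
        # encode_base64 appends "\n"; a full 45-byte chunk gives 60 characters.
        print {$out} encode_base64($buf);
    }
    return 1;
}
```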

Thanks to Phil Abercrombie for locating one idiotic bug in this module, which led me to discover another.

MIME::Decoder::Binary

The built-in decoder for a "binary" encoding (in other words, no encoding).

The "binary" decoder is a special case, since it's ill-advised to read the input line-by-line: after all, an uncompressed image file might conceivably have loooooooooong stretches of bytes without a "\n" among them, and we don't want to risk blowing out our core. So, we read-and-write fixed-size chunks.

Both the encoder and decoder do a simple pass-through of the data from input to output.
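Here is a minimal sketch of such a fixed-size pass-through in plain Perl. The function name copy_binary_stream and the 2048-byte chunk size are assumptions for illustration, not values taken from MIME::Decoder::Binary. Following the decode_it()/encode_it() convention, it returns 1 on success and undef on failure.

```perl
use strict;
use warnings;

# Copy $in to $out in fixed-size chunks, never looking for newlines.
sub copy_binary_stream {
    my ($in, $out) = @_;
    binmode $in;
    binmode $out;
    my $buf;
    while (1) {
        my $n = read($in, $buf, 2048);
        return undef unless defined $n;    # read error
        last if $n == 0;                   # EOF
        print {$out} $buf or return undef; # write error
    }
    return 1;
}
```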

MIME::Decoder::QuotedPrint

The built-in decoder for the "quoted-printable" encoding.

The name was chosen to jibe with the pre-existing MIME::QuotedPrint utility package, which this class actually uses to translate each line.

The decoder does a line-by-line translation from input to output.

The encoder does a line-by-line translation, breaking lines so that they fall under the standard 76-character limit for this encoding.
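Here is a sketch of the line-by-line translation using the core MIME::QuotedPrint functions encode_qp() and decode_qp(). The wrapper names qp_encode_lines and qp_decode_lines are hypothetical. encode_qp() inserts "=" soft line breaks into long lines, and decode_qp() removes them again, so decoding the encoded lines one at a time restores the original text.

```perl
use strict;
use warnings;
use MIME::QuotedPrint qw(encode_qp decode_qp);

# Encode each input line (including its "\n") independently.
sub qp_encode_lines {
    my ($in, $out) = @_;
    while (defined(my $line = <$in>)) {
        print {$out} encode_qp($line);    # long lines get "=\n" soft breaks
    }
    return 1;
}

# Decode each input line; soft breaks ("=\n") are removed.
sub qp_decode_lines {
    my ($in, $out) = @_;
    while (defined(my $line = <$in>)) {
        print {$out} decode_qp($line);
    }
    return 1;
}
```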

Note: just like MIME::QuotedPrint, we currently use the native "\n" for line breaks, and not CRLF. This may need to change in future versions.

MIME::Decoder::Xbit

The built-in decoder for both "7bit" and "8bit" encodings, which guarantee short lines (a maximum of 1000 characters per line) compatible with RFC-821. "7bit" data is further restricted to US-ASCII; "8bit" data may also contain bytes with the high bit set.

The decoder does a line-by-line pass-through from input to output, leaving the data unchanged except that an end-of-line sequence of CRLF is converted to a newline ``\n''.
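The decoding rule above amounts to one substitution per line. This sketch uses plain filehandles and a hypothetical function name, xbit_decode.

```perl
use strict;
use warnings;

# Copy each line unchanged, except that a trailing CRLF becomes "\n".
sub xbit_decode {
    my ($in, $out) = @_;
    while (defined(my $line = <$in>)) {
        $line =~ s/\r\n\z/\n/;
        print {$out} $line;
    }
    return 1;
}
```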

The encoder does a line-by-line pass-through from input to output, splitting long lines if necessary. If created as a 7-bit encoder, it maps any 8-bit characters to zero or more 7-bit characters. This is a potentially lossy encoding if you hand it anything but 7-bit input, so don't use it on binary files (GIFs and the like). Use it only when it ``doesn't matter'' if extra newlines are inserted and 8-bit characters are squished.

There are several possible ways to use this class to encode arbitrary 8-bit text as 7-bit text:

Don't use this class: Really. Use a more appropriate encoding, like quoted-printable.
APPROX: Approximate the appearance of the Latin-1 character via Internet conventions; e.g., "\c,", "\n~", etc. This is the default behavior of this class. It will pull in the MIME::Latin1 module to do the translation. This will be useless to you if your 8-bit characters are not Latin-1 text.
STRIP: Strip out any 8-bit characters. Nice if you're really sure that any such characters in your input are mistakes to be deleted, but it'll transform non-English documents into an abbreviated mess. But then, you should be using quoted-printable for those...
QP: Encode them as though we were doing a quoted-printable encoding; e.g., ``=A0''. This won't help the mail viewing software, but some humans may get the gist, and at least the original data might be recoverable...
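Two of these schemes, STRIP and QP, are easy to sketch in plain Perl. APPROX is left out because it needs MIME::Latin1. The function map_8_to_7 and its (scheme, text) interface are hypothetical and are not the class's own map_8_to_7_by() method. Unlike real quoted-printable, this QP sketch does not escape a literal "=", so its output is not guaranteed to be reversible.

```perl
use strict;
use warnings;

# Map 8-bit bytes in $text to 7-bit text using the named scheme.
sub map_8_to_7 {
    my ($scheme, $text) = @_;
    if ($scheme eq 'STRIP') {
        $text =~ tr/\x00-\x7F//cd;    # delete every byte above 0x7F
    }
    elsif ($scheme eq 'QP') {
        # Replace each 8-bit byte with =XX, e.g. "\xA0" -> "=A0".
        $text =~ s/([\x80-\xFF])/sprintf('=%02X', ord $1)/ge;
    }
    else {
        die "unsupported scheme '$scheme'";
    }
    return $text;
}
```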

To affect the default scheme, use the class method:

    MIME::Decoder::Xbit->map_8_to_7_by('STRIP');

To affect just one decoder object:

    $decoder->map_8_to_7_by('STRIP');

NOTES

Input/Output handles

As of MIME-tools 2.0, this class has to play nice with the new MIME::Body class... which means that input and output routines cannot just assume that they are dealing with filehandles.

Therefore, all that MIME::Decoder and its subclasses require (and, thus, all that they can assume) is that INSTREAMs and OUTSTREAMs are objects which respond to the messages defined in MIME::IO (basically, a subset of those defined by IO::Handle).

For backwards compatibility, if you supply a scalar filehandle name (like "STDOUT") or an unblessed glob reference (like \*STDOUT) where an INSTREAM or OUTSTREAM is expected, this package will automatically wrap it in an object that fits the I/O handle criteria.
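Here is a minimal sketch of that wrapping step. It assumes a small hypothetical wrapper class, SketchGlobHandle, that forwards getline(), read() and print() to the underlying glob. It is not MIME::IO's actual code. The core function Symbol::qualify_to_ref turns a name such as "STDOUT" into a glob reference.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);
use Symbol qw(qualify_to_ref);

# Hypothetical wrapper giving a plain glob the handle methods decoders call.
package SketchGlobHandle;

sub new {
    my ($class, $glob) = @_;
    return bless { fh => $glob }, $class;
}

sub getline {
    my $fh = $_[0]{fh};
    return scalar <$fh>;
}

sub read {
    my $self = shift;    # after shift, $_[0] aliases the caller's buffer
    my $fh   = $self->{fh};
    return CORE::read($fh, $_[0], $_[1]);
}

sub print {
    my $self = shift;
    my $fh   = $self->{fh};
    return print {$fh} @_;    # the builtin print, not this method
}

package main;

# Accept a filehandle name ("STDOUT"), an unblessed glob reference
# (\*STDOUT or a lexical handle), or something that is already an object.
sub wrap_handle {
    my ($thing) = @_;
    return $thing if blessed($thing);
    my $glob = ref($thing) ? $thing : qualify_to_ref($thing, 'main');
    return SketchGlobHandle->new($glob);
}
```

Any argument that is already a blessed object passes through untouched, on the assumption that it already responds to the required messages.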

Thanks to Achim Bohnet for suggesting this more-generic I/O model.

Writing a decoder

If you're experimenting with your own encodings, you'll probably want to write a decoder. Here are the basics:

Create a module, like ``MyDecoder::'', for your decoder. Declare it to be a subclass of MIME::Decoder.
Create the following instance methods in your class, as described above:

    decode_it
    encode_it
    init

In your application program, activate your decoder for one or more encodings like this:

    require MyDecoder;

    install MyDecoder "7bit";        # use MyDecoder to decode "7bit"    
    install MyDecoder "x-foo";       # also, use MyDecoder to decode "x-foo"

To illustrate, here's a custom decoder class for the quoted-printable encoding:

    package MyQPDecoder;

    use strict;
    use MIME::Decoder;
    use MIME::QuotedPrint;            # provides encode_qp() and decode_qp()
    our @ISA = qw(MIME::Decoder);

    # decode_it - the private decoding method
    sub decode_it {
        my ($self, $in, $out) = @_;

        # Quoted-printable can be decoded one line at a time.
        while (defined(my $line = $in->getline())) {
            $out->print(decode_qp($line));
        }
        1;
    }

    # encode_it - the private encoding method
    sub encode_it {
        my ($self, $in, $out) = @_;

        # Encode line by line; encode_qp() breaks long lines for us.
        while (defined(my $line = $in->getline())) {
            $out->print(encode_qp($line));
        }
        1;
    }

    1;    # a module file must return a true value

That's it.

The task was pretty simple because the "quoted-printable" encoding can easily be converted line-by-line, as can even "7bit" and "8bit" (since all these encodings guarantee short lines, with a maximum of 1000 characters per line). The good news is that it will very likely be just as easy to write a MIME::Decoder for any future standard encodings.

The "binary" decoder, however, really required block reads and writes: see MIME::Decoder::Binary for details.

AUTHOR

VERSION

$Revision: 3.202 $ $Date: 1997/01/22 08:32:42 $