use URI::URL;

# Constructors
$url1 = new URI::URL 'http://www.perl.com/%7Euser/gisle.gif';
$url2 = new URI::URL 'gisle.gif', 'http://www.com/%7Euser';
$url3 = url 'http://www.sn.no/';          # handy constructor
$url4 = $url2->abs;                       # get absolute url using base
$url5 = $url2->abs('http:/other/path');
$url6 = newlocal URI::URL 'test';

# Stringify URL
$str1 = $url->as_string;                  # complete escaped URL string
$str2 = $url->full_path;                  # escaped path+params+query
$str3 = "$url";                           # use operator overloading

# Retrieving Generic-RL components:
$scheme = $url->scheme;
$netloc = $url->netloc;                   # see user,password,host,port below
$path   = $url->path;
$params = $url->params;
$query  = $url->query;
$frag   = $url->frag;

# Accessing elements in their escaped form
$path   = $url->epath;
$params = $url->eparams;
$query  = $url->equery;

# Retrieving Network location (netloc) components:
$user     = $url->user;
$password = $url->password;
$host     = $url->host;
$port     = $url->port;                   # returns default if not defined

# Retrieve escaped path components as an array
@path = $url->path_components;

# HTTP query-string access methods
@keywords = $url->keywords;
@form     = $url->query_form;

# All methods above can set the field values, e.g:
$url->scheme('http');
$url->host('www.w3.org');
$url->port($url->default_port);
$url->base($url5);                        # use string or object
$url->keywords(qw(dog bones));

# File methods
$url = new URI::URL "file:/foo/bar";
open(F, $url->local_path) or die;

# Compare URLs
$url->eq("http://www.sn.no") or die;
URI::URL objects are created by calling new(), which takes as argument a string representation of the URL or an existing URL object reference to be cloned. Specific individual elements can then be accessed via the scheme(), user(), password(), host(), port(), path(), params(), query() and frag() methods. In addition escaped versions of the path, params and query can be accessed with the epath(), eparams() and equery() methods. Note that not all URL schemes will support all these methods.
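As a rough illustration of the accessors described above, the following sketch builds one URL and prints a few of its components in both unescaped and escaped form. It assumes the URI::URL module is installed; the host www.example.com and the path are made-up values chosen only for the example.

```perl
use strict;
use warnings;
use URI::URL;

# Hypothetical URL, used only to show the accessors.
my $url = URI::URL->new('http://www.example.com/docs/a%20b.html?q=perl#top');

print "scheme: ", $url->scheme, "\n";
print "host:   ", $url->host,   "\n";
print "port:   ", $url->port,   "\n";   # no port given, so the scheme default is returned
print "path:   ", $url->path,   "\n";   # unescaped form
print "epath:  ", $url->epath,  "\n";   # escaped form
print "query:  ", $url->query,  "\n";
print "frag:   ", $url->frag,   "\n";
```

The user() and password() accessors are left out here because, as noted above, not every scheme supports every method.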
The object constructor new() must be able to determine the scheme for the URL. If a scheme is not specified in the URL itself, it will use the scheme specified by the base URL. If no base URL scheme is defined then new() will croak if URI::URL::strict(1) has been invoked, otherwise http is silently assumed. Once the scheme has been determined new() then uses the implementor() function to determine which class implements that scheme. If no implementor class is defined for the scheme then new() will croak if URI::URL::strict(1) has been invoked, otherwise the internal generic URL class is assumed.
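The effect of strict mode on unknown schemes might be sketched as follows. The scheme name x-unknown-scheme is an invented one, assumed to have no implementor class on the system running the code, and the example assumes the URI::URL module is installed.

```perl
use strict;
use warnings;
use URI::URL;

# Turn strict mode on; the previous setting is returned so it can be restored.
my $old = URI::URL::strict(1);

# With strict mode on, new() is expected to croak for a scheme
# that has no implementor, so the call is trapped with eval.
my $strict_url = eval { URI::URL->new('x-unknown-scheme:something') };
print defined $strict_url ? "accepted in strict mode\n"
                          : "rejected in strict mode\n";

# With strict mode off, the generic class is used instead.
URI::URL::strict(0);
my $lenient_url = URI::URL->new('x-unknown-scheme:something');
print "lenient scheme: ", $lenient_url->scheme, "\n";

# Restore whatever setting was active before.
URI::URL::strict($old);
```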
Internally defined schemes are implemented by the URI::URL::scheme_name module. The URI::URL::implementor() function can be used to explicitly set the class used to implement a scheme if you want to override this.
There is a conflict between the need to be able to represent many characters including spaces within a URI directly, and the need to be able to use a URI in environments which have limited character sets or in which certain characters are prone to corruption. This conflict has been resolved by use of a hexadecimal escaping method which may be applied to any characters forbidden in a given context. When URLs are moved between contexts, the set of characters escaped may be enlarged or reduced unambiguously. The canonical form for URIs has all white spaces encoded.
The components of a URL string must be individually escaped. Each component of a URL may have separate requirements regarding what must be escaped, and those requirements are also dependent on the URL scheme.
Never escape an already escaped component string.
The implementation expects an escaped URL string to be passed to new() and will return a fully escaped URL string from as_string() and full_path().
Individual components can be manipulated in unescaped or escaped form. The following methods return/accept unescaped strings:
scheme path user params password query host frag port
The following methods return/accept partially escaped strings:
netloc eparams epath equery
Partially escaped means that only reserved characters (i.e. ':', '@', '/', ';', '?', '=', '&' in addition to '%', '.' and '#') need to be escaped when they are to be treated as normal characters. Fully escaped means that all unsafe characters are escaped. Unsafe characters are all control characters (%00-%1F and %7F), all 8-bit characters (%80-%FF) as well as '{', '}', '|', '\', '^', '[', ']', '`', '"', '<' and '>'. Note that the character '~' is not considered unsafe by this library as it is common practice to use it to reference personal home pages, but it is still unsafe according to RFC 1738.
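The hexadecimal escaping idea, and the reason an already escaped string must not be escaped again, can be sketched in plain Perl. The helpers escape_unsafe() and unescape() below are hypothetical names written for this illustration and are not part of URI::URL. The character set is an approximation of the unsafe list above, with the space and '%' characters added as an assumption so that the double-escaping effect becomes visible.

```perl
use strict;
use warnings;

# Hypothetical helper: replace each unsafe byte with %XX.
# The set covers control characters, space, 8-bit characters,
# the punctuation listed as unsafe above, and '%' itself.
sub escape_unsafe {
    my ($str) = @_;
    $str =~ s/([\x00-\x20\x7F-\xFF{}|\\^\[\]`"<>%])/sprintf("%%%02X", ord($1))/ge;
    return $str;
}

# Hypothetical helper: turn every %XX sequence back into its byte.
sub unescape {
    my ($str) = @_;
    $str =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $str;
}

my $once  = escape_unsafe('a b<c>');   # escape a raw component once
my $twice = escape_unsafe($once);      # escaping again corrupts it

print "once:  $once\n";
print "twice: $twice\n";
print "round trip: ", unescape($once), "\n";
```

Escaping twice turns every '%' of the first pass into '%25', so a single unescape no longer recovers the original string.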
package MYURL::foo;
@ISA = (URI::URL::implementor());   # inherit from generic scheme
The URI::URL::implementor() function call with no parameters returns the name of the class which implements the generic URL scheme behaviour (typically URI::URL::_generic). All hierarchical schemes should be derived from this class.
Your class can then define overriding methods (e.g., new(), _parse()) as required.
To register your new class as the implementor for a specific scheme use code like:
URI::URL::implementor('x-foo', 'MYURL::foo');
Any new URL created for scheme 'x-foo' will be implemented by your
MYURL::foo
class. Existing URLs will not be affected.
$obj = eval { new URI::URL "snews:comp.lang.perl.misc" };
or set URI::URL::strict(0) if you do not care about bad or unknown schemes.
The url() function is exported by the URI::URL module and is easier both to type and read than calling URI::URL->new directly. Useful for constructs like this:
$h = url($str)->host;
This function is just a wrapper for URI::URL->new.
Attribute access methods marked with (*) can take an optional argument to set the value of the attribute, and they always return the old value.
The abs() method attempts to return a new absolute URI::URL object for a given URL. In order to convert a relative URL into an absolute one, a base URL is required. You can associate a default base with a URL either by passing a base to the new() constructor when a URI::URL is created or by using the base() method on the object later. Alternatively you can specify a one-off base as a parameter to the abs() method.
Some older parsers used to allow the scheme name to be present in the
relative URL if it was the same as the base URL scheme. RFC1808 says that
this should be avoided, but you can enable this old behaviour by passing a
TRUE value as the second argument to the abs()
method. The
difference is demonstrated by the following examples:
url("http:foo")->abs("http://host/a/b")    ==> "http:foo"
url("http:foo")->abs("http://host/a/b", 1) ==> "http://host/a/foo"
The rel()
method will do the opposite transformation.
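The three ways of supplying a base described above (at construction time, later through base(), or as a one-off argument to abs()) might look like this. The hosts and file names are invented for the example, and the URI::URL module is assumed to be installed.

```perl
use strict;
use warnings;
use URI::URL;

# 1. Base given to the constructor (hypothetical URLs).
my $rel  = URI::URL->new('gisle.gif', 'http://www.example.com/images/');
my $abs1 = $rel->abs;

# 2. Base attached later with base().
my $page = URI::URL->new('index.html');
$page->base('http://www.example.com/docs/');
my $abs2 = $page->abs;

# 3. One-off base passed to abs(); the stored base is not used here.
my $abs3 = $rel->abs('http://other.example.org/pics/index.html');

print "$abs1\n$abs2\n$abs3\n";
```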
abs()
method.
0: $url->scheme *)
1: $url->user
2: $url->password
3: $url->host
4: $url->port
5: $url->epath
6: $url->eparams
7: $url->equery
8: $url->frag

All elements except scheme will be undefined if the corresponding URL part is not available.
*) Note: The scheme (first element) returned by crack will always be defined. This is different from what $url->scheme returns, since it will return undef for relative URLs.
%XX encoding unless they are "reserved" or "unsafe".
epath()
method to be safe.
The rel() method returns a relative URL where possible, which is the opposite of what the abs() method does. For instance:
url("http://www.math.uio.no/doc/mail/top.html", "http://www.math.uio.no/doc/linux/")->rel
will return a relative URL with path set to "../mail/top.html" and with the same base as the original URL.
If the original URL already is relative or the scheme or netloc does not match the base, then a copy of the original URL is returned.
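A short sketch of this behaviour, assuming the URI::URL module is installed, is given below. The first URL is the one from the example above; other.example.org is a made-up host used to show the case where the netloc does not match the base.

```perl
use strict;
use warnings;
use URI::URL;

my $base = 'http://www.math.uio.no/doc/linux/';

# Same scheme and netloc as the base: a relative URL can be produced.
my $url  = URI::URL->new('http://www.math.uio.no/doc/mail/top.html', $base);
my $rel  = $url->rel;
my $back = $rel->abs;    # the base is kept, so abs() reverses rel()

# Different netloc (hypothetical host): a copy of the original comes back.
my $other     = URI::URL->new('http://other.example.org/x.html', $base);
my $unchanged = $other->rel;

print "relative:  $rel\n";
print "restored:  $back\n";
print "unchanged: $unchanged\n";
```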
equery()
method to be safe.
keywords()
and the query_form()
methods. Both will croak if the query is
not of the correct format. The encodings look like this:
word1+word2+word3..          # keywords
key1=val1&key2=val2...       # query_form
Note: These functions do not return the old value when they are used to set a value of the query string.
The keywords() method returns a list of unescaped strings. The method can also be used to set the query string by passing in the keywords as individual arguments to the method.
The query_form() method returns a list of unescaped key/value pairs. If you assign the return value to a hash you might lose some values if the key is repeated (which it is allowed to be).
This method can also be used to set the query string of the URL like this:
$url->query_form(foo => 'bar', foo => 'baz', equal => '=');
If the value part of a key/value pair is a reference to an array, then it will be converted to separate key/value pairs for each value. This means that these two calls are equivalent:
$url->query_form(foo => 'bar', foo => 'baz');
$url->query_form(foo => ['bar', 'baz']);
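Both query encodings, and the loss of repeated keys when the result is assigned to a hash, can be tried out with a sketch like the following. The URL is a made-up one and the URI::URL module is assumed to be installed.

```perl
use strict;
use warnings;
use URI::URL;

# Hypothetical URL used only to carry a query string.
my $url = URI::URL->new('http://www.example.com/search');

# Keyword style: word1+word2
$url->keywords(qw(dog bones));
my $keyword_query = $url->equery;
my @keywords      = $url->keywords;

# Form style: key1=val1&key2=val2, with a repeated key given as an array ref.
$url->query_form(foo => ['bar', 'baz'], equal => '=');
my $form_query = $url->equery;
my @pairs      = $url->query_form;   # keeps both 'foo' entries
my %as_hash    = @pairs;             # only the last 'foo' survives

print "keywords query: $keyword_query\n";
print "form query:     $form_query\n";
print "pairs:          @pairs\n";
print "hash foo:       $as_hash{foo}\n";
```

Keeping the result in an array, as with @pairs, preserves repeated keys; the hash is shown only to make the loss visible.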
local_path()
method that returns a path
suitable for access to files within the current filesystem. These methods
can not be used to set the path of the URL.
netloc()
method.
epath()
or equery()
instead. The path() method will, for instance, lose information if any path segment contains an (encoded) '/' character.
The path() method now considers a leading '/' to be part of the path. If the path is empty it will default to '/'. You can get the old behaviour by setting $URI::URL::COMPAT_VER_3 to TRUE before accessing the path() method.
wwwurl.pl
code in the libwww-perl distribution developed by Roy Fielding
Gisle Aas
If you have any suggestions, bug reports, fixes, or enhancements, send them
to the libwww-perl mailing list at
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.