WWW::Sitemap
functions for generating a site map for a given site URL.


NAME

WWW::Sitemap - functions for generating a site map for a given site URL.


SYNOPSIS


    use WWW::Sitemap;
    use LWP::UserAgent;

    my $ua = LWP::UserAgent->new;
    my $sitemap = WWW::Sitemap->new(
        EMAIL       => 'your@email.address',
        USERAGENT   => $ua,
        ROOT        => 'http://www.my.com/'
    );

    # report each URL as it is traversed
    $sitemap->url_callback(
        sub {
            my ( $url, $depth, $title, $summary ) = @_;
            print STDERR "URL: $url\n";
            print STDERR "DEPTH: $depth\n";
            print STDERR "TITLE: $title\n";
            print STDERR "SUMMARY: $summary\n";
            print STDERR "\n";
        }
    );
    $sitemap->generate();

    # get / set options after construction
    $sitemap->option( 'VERBOSE' => 1 );
    my $len = $sitemap->option( 'SUMMARY_LENGTH' );

    my $root = $sitemap->root();
    for my $url ( $sitemap->urls() )
    {
        if ( $sitemap->is_internal_url( $url ) )
        {
            # do something ...
        }
        my @links = $sitemap->links( $url );
        my $title = $sitemap->title( $url );
        my $summary = $sitemap->summary( $url );
        my $depth = $sitemap->depth( $url );
    }

    $sitemap->traverse(
        sub {
            my ( $sitemap, $url, $depth, $flag ) = @_;
            if ( $flag == 0 )
            {
                # do something at the start of a list of sub-pages ...
            }
            elsif( $flag == 1 )
            {
                # do something for each page ...
            }
            elsif( $flag == 2 )
            {
                # do something at the end of a list of sub-pages ...
            }
        }
    );


DESCRIPTION

The WWW::Sitemap module creates a sitemap for a site, by traversing the site using the WWW::Robot module. The sitemap object has methods to access a list of all the urls in the site, and a list of all the links for each of these urls. It is also possible to access the title of each url, and a summary generated from each url. The depth of each url can also be accessed; the depth is the minimum number of links from the root URL to that page.
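
The depth definition above (the minimum number of links from the root URL) corresponds to a breadth-first search over the link graph. The following is a minimal sketch on an in-memory graph, not the module's own code; the %links hash, its example URLs, and the compute_depths helper are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical link graph: each URL maps to the URLs it links to.
my %links = (
    'http://www.my.com/'       => [ 'http://www.my.com/a.html', 'http://www.my.com/b.html' ],
    'http://www.my.com/a.html' => [ 'http://www.my.com/c.html' ],
    'http://www.my.com/b.html' => [ 'http://www.my.com/c.html', 'http://www.my.com/' ],
    'http://www.my.com/c.html' => [],
);

# Breadth-first search: the first time a URL is reached, it has been
# reached by a shortest path, so its depth is final.
sub compute_depths {
    my ( $root, $links ) = @_;
    my %depth = ( $root => 0 );
    my @queue = ( $root );
    while ( @queue ) {
        my $url = shift @queue;
        for my $next ( @{ $links->{ $url } || [] } ) {
            next if exists $depth{ $next };    # already reached at a smaller or equal depth
            $depth{ $next } = $depth{ $url } + 1;
            push @queue, $next;
        }
    }
    return \%depth;
}

my $depth = compute_depths( 'http://www.my.com/', \%links );
print "$_ => $depth->{ $_ }\n" for sort keys %$depth;
```

Although c.html is linked from two pages, it is assigned a single depth, because only the shortest route from the root counts.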


CONSTRUCTOR

WWW::Sitemap->new [ $option => $value ] ...

Possible options are:

USERAGENT
User agent used to do the robot traversal. Defaults to LWP::UserAgent.

VERBOSE
Verbose flag, for printing out useful messages during traversal [0|1]. Defaults to 0.

SUMMARY_LENGTH
Maximum length of (automatically generated) summary.

EMAIL
E-mail address the robot uses to identify itself. This option is required.

DEPTH
Maximum depth of traversal.

ROOT
Root URL of the site for which the sitemap is being created. This option is required.

    my $sitemap = WWW::Sitemap->new(
        EMAIL       => 'your@email.address',
        USERAGENT   => $ua,
        ROOT        => 'http://www.my.com/'
    );
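
Since EMAIL and ROOT are required, it can help to check an option set before constructing the object. The following is a sketch only: the option names come from the list above, but the values for VERBOSE, DEPTH and SUMMARY_LENGTH are illustrative, and missing_required is a hypothetical helper, not part of the module.

```perl
use strict;
use warnings;

# Option names come from the list above; the values are illustrative only.
my %options = (
    EMAIL          => 'your@email.address',
    ROOT           => 'http://www.my.com/',
    VERBOSE        => 1,
    DEPTH          => 3,
    SUMMARY_LENGTH => 200,
);

# Hypothetical helper: return the names of required options that are
# undefined or empty.
sub missing_required {
    my ( %opt ) = @_;
    return grep { !defined $opt{ $_ } || $opt{ $_ } eq '' } qw( EMAIL ROOT );
}

my @missing = missing_required( %options );
die "Missing required option(s): @missing\n" if @missing;

# With the module installed, the checked options could then be passed on:
#     my $sitemap = WWW::Sitemap->new( %options );
print "All required options are set\n";
```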


METHODS

generate( )

Method for generating the sitemap, based on the constructor options.


    $sitemap->generate();

url_callback( sub { ... } )

This method lets you define a callback that is invoked for every URL traversed while the sitemap is generated. Its main purpose is to allow custom verbose reporting. The callback should be of the form:


    sub {
        my ( $url, $depth, $title, $summary ) = @_;
        # do something ...
    }
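
As one possible use of this callback, the sketch below records each reported URL and prints a progress line indented by depth. The @pages array and the $collect name are assumptions for illustration, and the callback is invoked by hand with made-up data so that the example runs without the module or network access.

```perl
use strict;
use warnings;

my @pages;    # one record per URL reported by the callback

# Callback with the argument list documented above.
my $collect = sub {
    my ( $url, $depth, $title, $summary ) = @_;
    push @pages, { url => $url, depth => $depth, title => $title, summary => $summary };
    printf STDERR "%s%s (%s)\n", '  ' x $depth, $url, defined $title ? $title : 'untitled';
};

# With the module installed this would be registered before generate():
#     $sitemap->url_callback( $collect );
# Here the callback is called directly with made-up data to show its effect.
$collect->( 'http://www.my.com/',       0, 'Home',  'Welcome page' );
$collect->( 'http://www.my.com/a.html', 1, 'About', 'About this site' );

print scalar( @pages ), " pages recorded\n";
```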

option( $option [ => $value ] )

Interface to get / set options after object construction.


    $sitemap->option( 'VERBOSE' => 1 );
    my $len = $sitemap->option( 'SUMMARY_LENGTH' );

root()

Returns the root URL for the site.


    my $root = $sitemap->root();

urls()

Returns a list of all the URLs on the sitemap.


    for my $url ( $sitemap->urls() )
    {
        # do something ...
    }

is_internal_url( $url )

Returns 1 if $url is an internal URL (i.e. if $url =~ /^$root/).


    if ( $sitemap->is_internal_url( $url ) )
    {
        # do something ...
    }
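
The test described above is a prefix match against the root URL. When a URL is interpolated into a pattern, characters such as "." act as regex metacharacters, so a standalone version of the same check may want to quote the root first. The sketch below shows that idea; looks_internal is a hypothetical helper, not the module's implementation, and the example URLs are made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the documented test; \Q...\E quotes the root so
# that '.' and other metacharacters in the URL match literally.
sub looks_internal {
    my ( $url, $root ) = @_;
    return $url =~ /^\Q$root\E/ ? 1 : 0;
}

my $root = 'http://www.my.com/';
for my $url ( 'http://www.my.com/a.html', 'http://wwwxmy.com/', 'http://other.org/' ) {
    printf "%-28s %s\n", $url, looks_internal( $url, $root ) ? 'internal' : 'external';
}
```

Without the quoting, the "." in the root would match any character, so a URL such as http://wwwxmy.com/ would also be treated as internal.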

links( $url )

Returns a list of all the links from a given URL in the site map.


    my @links = $sitemap->links( $url );

title( $url )

Returns the title of the URL.


    my $title = $sitemap->title( $url );

summary( $url )

Returns a summary of the URL - either from the <META NAME=DESCRIPTION> tag or generated automatically using HTML::Summary.


    my $summary = $sitemap->summary( $url );

    

depth( $url )

Returns the minimum number of links to traverse from the root URL of the site to this URL.


    my $depth = $sitemap->depth( $url );

traverse( \&callback )

The traverse method traverses the sitemap, starting at the root node and visiting each URL in the order in which it would be displayed in a sequential sitemap of the site. The callback is called at several points in the traversal, indicated by the $flag argument to the callback:

$flag = 0
Before each set of daughter URLs of a given URL.

$flag = 1
For each URL.

$flag = 2
After each set of daughter URLs of a given URL.

See the sitemapper.pl script distributed with this module for an example of the use of the traverse method.


    $sitemap->traverse(
        sub {
            my ( $sitemap, $url, $depth, $flag ) = @_;
            if ( $flag == 0 )
            {
                # do something at the start of a list of sub-pages ...
            }
            elsif( $flag == 1 )
            {
                # do something for each page ...
            }
            elsif( $flag == 2 )
            {
                # do something at the end of a list of sub-pages ...
            }
        }
    );
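
To make the flag sequence concrete, here is a sketch of a callback that builds an indented text outline, opening a nesting level on flag 0 and closing it on flag 2. It runs against a hypothetical mock_traverse driver over an in-memory tree rather than the real method; the %children hash, the driver, and the choice to pass the parent URL with flags 0 and 2 are all assumptions of this sketch.

```perl
use strict;
use warnings;

# Hypothetical site tree: each URL maps to its sub-pages ("daughter URLs").
my %children = (
    '/'       => [ '/a.html', '/b.html' ],
    '/a.html' => [ '/a1.html' ],
);

# Hypothetical driver that mimics the documented flag sequence. Passing the
# parent URL with flags 0 and 2 is an assumption of this sketch.
sub mock_traverse {
    my ( $callback, $url, $depth ) = @_;
    $callback->( undef, $url, $depth, 1 );     # flag 1: for each URL
    my @kids = @{ $children{ $url } || [] };
    return unless @kids;
    $callback->( undef, $url, $depth, 0 );     # flag 0: before the sub-pages
    mock_traverse( $callback, $_, $depth + 1 ) for @kids;
    $callback->( undef, $url, $depth, 2 );     # flag 2: after the sub-pages
}

my $level = 0;    # current nesting level, driven by flags 0 and 2
my @lines;        # finished outline, one line per URL

my $outline = sub {
    my ( $sitemap, $url, $depth, $flag ) = @_;
    if    ( $flag == 0 ) { $level++ }
    elsif ( $flag == 1 ) { push @lines, ( '  ' x $level ) . $url }
    elsif ( $flag == 2 ) { $level-- }
};

mock_traverse( $outline, '/', 0 );
print "$_\n" for @lines;
```

With the module installed, the same callback could be passed to $sitemap->traverse( $outline ); whether the real method passes the parent URL with flags 0 and 2 is not stated above, which is why the callback ignores $url in those branches.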


SEE ALSO


    LWP::UserAgent
    HTML::Summary
    WWW::Robot


AUTHOR

Ave Wrigley <Ave.Wrigley@itn.co.uk>


COPYRIGHT

Copyright (c) 1997 Canon Research Centre Europe (CRE). All rights reserved. This script and any associated documentation or files cannot be distributed outside of CRE without express prior permission from CRE.
