HTML::Summary
module for generating a summary from a web page.


NAME

HTML::Summary - module for generating a summary from a web page.


SYNOPSIS


    use HTML::Summary;
    use HTML::TreeBuilder;

    # $document holds the HTML source of the page as a string
    my $tree = HTML::TreeBuilder->new;
    $tree->parse( $document );
    $tree->eof;    # tell the parser the document is complete

    my $summarizer = HTML::Summary->new(
        LENGTH      => 200,
        USE_META    => 1,
    );

    my $summary = $summarizer->generate( $tree );

    $summarizer->option( 'USE_META' => 1 );
    my $length = $summarizer->option( 'LENGTH' );

    if ( $summarizer->meta_used() )
    {
        # do something ...
    }


DESCRIPTION

The HTML::Summary module produces summaries from the textual content of web pages. It does so using the location heuristic, which determines the value of a given sentence based on its position and status within the document; for example, headings, section titles and opening paragraph sentences may be favoured over other textual content. A LENGTH option can be used to restrict the length of the summary produced.
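As a rough illustration of a location heuristic, the following self-contained Perl sketch scores sentences by their position and keeps the best ones that fit a length limit. It does not call HTML::Summary: the sentence kinds (heading, first_sentence, body), their weights and the summarize function are hypothetical, and the module's actual scoring may well differ.

```perl
use strict;
use warnings;

# Illustrative position weights: an assumption for this sketch, not the
# values HTML::Summary actually uses.
my %WEIGHT = ( heading => 3, first_sentence => 2, body => 1 );

# summarize($max, @items): each item is [ $sentence, $kind ] in document
# order, where $kind is a key of %WEIGHT. Picks the highest-weighted
# sentences that fit in $max characters (equal to bytes for ASCII text)
# and returns them joined by spaces, in their original document order.
sub summarize {
    my ( $max, @items ) = @_;

    # Record each sentence's weight and original index.
    my @indexed =
        map { [ $items[$_][0], $WEIGHT{ $items[$_][1] }, $_ ] } 0 .. $#items;

    # Highest weight first; on a tie the earlier sentence wins.
    my @ranked = sort { $b->[1] <=> $a->[1] || $a->[2] <=> $b->[2] } @indexed;

    my @chosen;
    my $used = 0;
    for my $s (@ranked) {
        # Every sentence chosen after the first also costs one joining space.
        my $cost = length( $s->[0] ) + ( @chosen ? 1 : 0 );
        next if $used + $cost > $max;
        push @chosen, $s;
        $used += $cost;
    }

    # Restore document order before joining.
    return join ' ', map { $_->[0] } sort { $a->[2] <=> $b->[2] } @chosen;
}

my $summary = summarize(
    60,
    [ 'HTML::Summary overview.',         'heading' ],
    [ 'It summarizes web pages.',        'first_sentence' ],
    [ 'Some minor detail follows here.', 'body' ],
);
print "$summary\n";    # HTML::Summary overview. It summarizes web pages.
```

Packing greedily by rank and then re-sorting by position keeps the result readable: the most important sentences are selected, but they still appear in the order the page presents them.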


CONSTRUCTOR

new( $attr1 => $value1 [, $attr2 => $value2 ] )

Possible attributes are:

VERBOSE
Generate verbose messages to STDERR.

LENGTH
Maximum length of summary (in bytes). Default is 500.

USE_META
Flag telling the summarizer whether to use the content of the <META> description tag in the page header, if one is present, instead of generating a summary from the body text. When USE_META is set, it overrides LENGTH: a summary taken from the <META> tag is returned in full, even if it is longer than LENGTH bytes. Default is 0 (no).


    my $summarizer = HTML::Summary->new( LENGTH => 200 );
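The constructor's attribute handling can be pictured as documented defaults merged with caller-supplied values. This is a minimal sketch, not HTML::Summary's source: My::SummaryOptions is a hypothetical package, the VERBOSE default of 0 is an assumption (the documentation does not state one), and rejecting unknown attribute names is a design choice made here for illustration.

```perl
package My::SummaryOptions;    # hypothetical package, not part of HTML::Summary

use strict;
use warnings;
use Carp qw(croak);

# LENGTH 500 and USE_META 0 follow the documentation;
# VERBOSE 0 is an assumption of this sketch.
my %DEFAULTS = ( VERBOSE => 0, LENGTH => 500, USE_META => 0 );

sub new {
    my ( $class, %attr ) = @_;

    # Rejecting unknown attribute names is a choice made for this sketch.
    for my $name ( sort keys %attr ) {
        croak "Unknown attribute '$name'" unless exists $DEFAULTS{$name};
    }

    # Caller-supplied values override the defaults.
    my $self = { %DEFAULTS, %attr };
    return bless $self, $class;
}

package main;

my $opts = My::SummaryOptions->new( LENGTH => 200 );
print "LENGTH=$opts->{LENGTH} USE_META=$opts->{USE_META}\n";    # LENGTH=200 USE_META=0
```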


METHODS

option( )

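Get or set HTML::Summary configuration options. Called with an option name, it returns that option's current value; called with a name and a value, it sets the option.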


    my $length = $summarizer->option( 'LENGTH' );
    $summarizer->option( 'USE_META' => 1 );
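A combined get/set accessor like option() is a common Perl idiom. The sketch below shows one way to write it, assuming options are stored in the object's hash; My::Configurable is a hypothetical class, and whether HTML::Summary's option() returns the new value when setting is not documented, so that behaviour is an assumption.

```perl
package My::Configurable;    # hypothetical class for illustration only

use strict;
use warnings;

sub new {
    my ( $class, %opt ) = @_;
    my $self = {%opt};
    return bless $self, $class;
}

# option($name)         get: return the current value of $name.
# option($name, $value) set: store $value, then return it (returning the
#                       new value is an assumption of this sketch).
sub option {
    my ( $self, $name, @value ) = @_;
    # Testing @value rather than defined($value) lets a caller set undef.
    $self->{$name} = $value[0] if @value;
    return $self->{$name};
}

package main;

my $obj = My::Configurable->new( LENGTH => 500, USE_META => 0 );
$obj->option( 'USE_META' => 1 );        # set
my $length = $obj->option('LENGTH');    # get
print "LENGTH=$length USE_META=", $obj->option('USE_META'), "\n";    # LENGTH=500 USE_META=1
```

Checking the number of arguments, rather than whether the value is defined, means option( 'NAME' => undef ) clears an option instead of being mistaken for a get.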

generate( $tree )

Takes an HTML::Element object (for example, a tree built with HTML::TreeBuilder, which is a subclass of HTML::Element) and generates a summary from it.


    my $tree = HTML::TreeBuilder->new;
    $tree->parse( $document );
    $tree->eof;    # finish parsing before summarizing
    my $summary = $summarizer->generate( $tree );
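Because LENGTH counts bytes, not characters, a summary of non-ASCII text has to be cut on a character boundary, or the last character may be left as an invalid partial byte sequence. The following sketch assumes UTF-8 output (the documentation does not say which encoding is measured) and uses only the core Encode module; truncate_bytes is a hypothetical helper, not part of HTML::Summary.

```perl
use strict;
use warnings;
use Encode qw(encode);

# truncate_bytes($text, $max_bytes): hypothetical helper, not part of
# HTML::Summary. Returns the longest prefix of the character string $text
# whose UTF-8 encoding fits in $max_bytes, never splitting a character.
sub truncate_bytes {
    my ( $text, $max_bytes ) = @_;
    my ( $out, $used ) = ( '', 0 );
    for my $ch ( split //, $text ) {
        my $n = length encode( 'UTF-8', $ch );    # bytes for this character
        last if $used + $n > $max_bytes;
        $out  .= $ch;
        $used += $n;
    }
    return $out;
}

print truncate_bytes( "caf\x{e9} au lait", 4 ), "\n";    # caf  (U+00E9 needs 2 bytes)
```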

meta_used( )

Returns 1 if the META tag description was used to generate the summary.


    if ( $summarizer->meta_used() )
    {
        # do something ...
    }
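The documented interplay of USE_META, LENGTH and meta_used() can be expressed as a small decision function. This sketch is hypothetical: choose_summary and its argument names are invented for illustration, it truncates body text by characters rather than bytes, and it does not reproduce how HTML::Summary extracts the META description or builds the body summary.

```perl
use strict;
use warnings;

# choose_summary(%arg): hypothetical helper mirroring the documented
# USE_META / LENGTH / meta_used behaviour; it is not HTML::Summary's code.
# Argument names (use_meta, meta, body, length) are assumptions.
# Returns ( $summary, $meta_used ).
sub choose_summary {
    my (%arg) = @_;

    # A non-empty META description wins when use_meta is set, and is
    # returned in full even if it is longer than the length limit.
    if ( $arg{use_meta} && defined $arg{meta} && length $arg{meta} ) {
        return ( $arg{meta}, 1 );
    }

    # Otherwise fall back to body text cut to the limit
    # (characters here; the real LENGTH is in bytes).
    return ( substr( $arg{body}, 0, $arg{length} ), 0 );
}

my ( $summary, $meta_used ) = choose_summary(
    use_meta => 1,
    meta     => 'A description taken from the META tag.',
    body     => 'Body text of the page.',
    length   => 10,
);
print "meta_used=$meta_used: $summary\n";    # meta_used=1: A description taken from the META tag.
```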


SEE ALSO


    HTML::TreeBuilder
    Text::Sentence
    Lingua::JA::Jcode
    Lingua::JA::Jtruncate


AUTHORS


    Ave Wrigley <wrigley@cre.canon.co.uk>
    Tony Rose <tgr@cre.canon.co.uk>
    Neil Bowers <neilb@cre.canon.co.uk>


COPYRIGHT

Copyright (c) 1997 Canon Research Centre Europe (CRE). All rights reserved. This script and any associated documentation or files cannot be distributed outside of CRE without express prior permission from CRE.
