OurNet::Site
Extract web pages via templates


NAME

OurNet::Site - Extract web pages via templates


SYNOPSIS


    use LWP::Simple;
    use OurNet::Site;

    my ($query, $hits) = ('autrijus', 10);
    my $found;
    my $self;   # undefined in this standalone script; the callback receives it as its first argument

    # Create a bot
    my $bot = OurNet::Site->new('google');

    # Parse the result fetched with LWP::Simple
    $bot->callme($self, 0, get($bot->geturl($query, $hits)), \&callmeback);

    print '*** ' . ($found ? $found : 'No') . " match(es) found.\n";

    # Callback routine: print every entry that carries a URL
    sub callmeback {
        my ($self, $himself) = @_;

        foreach my $entry (@{$himself->{response}}) {
            if ($entry->{url}) {
                print "*** [$entry->{title}]" .
                      " ($entry->{score})" .
                      " - [$entry->{id}]\n" .
                      "    URL: [$entry->{url}]\n" .
                      "    $entry->{preview}\n";
                $found++;

                # Clear the URL so the entry is not reported twice
                delete($entry->{url});
            }
        }
    }


DESCRIPTION

This module parses results returned from a typical search engine. It reads a 'site descriptor' file that defines the engine's aspects and parses the results on the fly accordingly.
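
The general idea of descriptor-driven parsing can be sketched in plain Perl. This is only an illustration under stated assumptions, not how OurNet::Site is implemented: the %descriptor hash, its regex-based 'entry' pattern, the extract_entries function and the sample page are all hypothetical, and none of them correspond to the module's real descriptor formats or API.

```perl
use strict;
use warnings;

# Hypothetical in-memory "site descriptor": a name plus a pattern whose
# named captures become the fields of each result entry. This is NOT the
# .tt2, .xml or .fmt format that OurNet::Site reads.
my %descriptor = (
    name  => 'example',
    entry => qr{<a href="(?<url>[^"]+)">(?<title>[^<]+)</a>\s*<p>(?<preview>[^<]*)</p>},
);

# Apply the descriptor to a page and hand each extracted entry to a callback.
# Returns the number of entries found.
sub extract_entries {
    my ($desc, $html, $callback) = @_;
    my $count = 0;
    while ($html =~ /$desc->{entry}/g) {
        my %entry = %+;          # copy the named captures of this match
        $callback->(\%entry);
        $count++;
    }
    return $count;
}

# Made-up result page with two entries.
my $page = join '',
    '<a href="http://example.org/a">First</a> <p>Preview one</p>',
    '<a href="http://example.org/b">Second</a> <p>Preview two</p>';

my @entries;
my $found = extract_entries(\%descriptor, $page, sub { push @entries, $_[0] });

print "*** [$_->{title}] URL: [$_->{url}]\n    $_->{preview}\n" for @entries;
print '*** ', ($found || 'No'), " match(es) found.\n";
```

Keeping the pattern in a data structure rather than in the parsing code is what lets one routine serve several sites; a real descriptor file plays the role of %descriptor here.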

Since v1.52, OurNet::Site uses site descriptors in Template Toolkit format, with the extension '.tt2', by default. The template should contain at least one [% FOREACH entry %] block and a [% SET url.start %] directive.

Alternatively, you can use a special XML format for the site descriptor. See the .xml files in the Site directory for examples.

Finally, it also takes Inforia Quest .fmt-style site descriptors, available at http://www.pasia.com/. The author of course cannot support this usage.

Note that tt2 support is *highly* experimental and should not be relied upon until a more stable release comes.


BUGS

Probably lots. Most notably, the 'More' facility is lacking. There is also no template-generating ability. This is a must, but I couldn't find enough motivation to do it. Maybe you could.

Currently, tt2 does not (quite) support incremental parsing in conjunction with the OurNet::Query manpage.

Also, the XML format for site descriptors is not well-formed, let alone backed by a complete XML Schema or DTD.


SEE ALSO

the OurNet::Template manpage, the OurNet::Query manpage


AUTHORS

Autrijus Tang <autrijus@autrijus.org>


COPYRIGHT

Copyright 2001 by Autrijus Tang <autrijus@autrijus.org>.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

See http://www.perl.com/perl/misc/Artistic.html
