WWW::2ch
Scraper for a popular Japanese BBS.


NAME

WWW::2ch - scraper for a popular Japanese BBS.


SYNOPSIS


  use WWW::2ch;

  # Load a board, then print the first thread in its subject list.
  my $bbs = WWW::2ch->new(url => 'http://live19.2ch.net/ogame/',
                          cache => '/tmp/www2ch-cache');
  $bbs->load_setting;
  $bbs->load_subject;
  foreach my $dat ($bbs->subject->threads) {
      $dat->load;
      my $one = $dat->res(1);
      print $dat->title . "\n";
      print '>>1: ' . $one->body;
      foreach my $res ($dat->reslist) {
          print $res->resid . ':' . $res->date . "\n";
          print $res->body_text . "\n";
      }
      last;    # stop after the first thread
  }

  # Load a single thread from its read.cgi URL.
  my $bbs = WWW::2ch->new(url => 'http://live19.2ch.net/test/read.cgi/ogame/1140947283/l50',
                          cache => '/tmp/www2ch-cache');
  my $dat = $bbs->subject->thread('1140947283');
  $dat->load;

  # Take a dat out of the cache.
  my $bbs = WWW::2ch->new(url => 'http://live19.2ch.net/ogame/',
                          cache => '/home/ko/cpan/my/WWW-2ch/cache');
  my $dat = $bbs->recall_dat('1141300600');

  # Parse a dat read from a file.
  my $bbs = WWW::2ch->new(url => 'http://live19.2ch.net/ogame/',
                          cache => '/home/ko/cpan/my/WWW-2ch/cache');
  open my $fh, '<', 'test.dat' or die "cannot open test.dat: $!";
  my $data = join('', <$fh>);
  close($fh);
  my $dat = $bbs->parse_dat($data);

  # dat returns the raw article data.
  $dat->dat;

  # Load a plugin by name.
  my $bbs = WWW::2ch->new(url => 'http://example.jp/test/read.cgi/ogame/1140947283/l50',
                          cache => '/tmp/www2ch-cache',
                          plugin => 'ExampleJp');

  # Load a plugin from a file.
  my $bbs = WWW::2ch->new(url => 'http://example.com/test/read.cgi/ogame/1140947283/l50',
                          cache => '/tmp/www2ch-cache',
                          plugin => '/usr/local/www-2ch/lib/ExampleCom.pm');


DESCRIPTION

This module is suited to scraping a popular Japanese BBS.

Other BBSes, news sites, and other kinds of sites can also be scraped by adding a scraping plugin.

Please take care not to access the site excessively, or you may trigger its flood control.
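
One way to stay clear of flood control is to space requests out. The following is only a minimal sketch: make_throttle is a hypothetical helper, not part of WWW::2ch, and the 0.2 second interval is an arbitrary value chosen for illustration. A real script would pick an interval that suits the target site.

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);

# Hypothetical helper (not part of WWW::2ch): returns a closure that
# blocks until at least $min_interval seconds have passed since the
# previous call, then records the current time.
sub make_throttle {
    my ($min_interval) = @_;
    my $last;    # time of the previous call; undef before the first one
    return sub {
        if (defined $last) {
            my $wait = $min_interval - (time() - $last);
            sleep($wait) if $wait > 0;
        }
        $last = time();
        return;
    };
}

my $throttle = make_throttle(0.2);    # interval chosen only for illustration
for my $n (1 .. 3) {
    $throttle->();
    # A real script would fetch here, for example with $dat->load.
    print "request $n\n";
}
```

Calling the closure immediately before each load keeps the delay logic in one place, whichever method performs the fetch.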


METHODS

new(%option)

Options:

  • url
    The permalink of the top page. The SYNOPSIS also passes a thread (read.cgi) URL.

  • cache
    A cache directory, or a Cache module object.

  • plugin
    The plugin name, or the path to a plugin file (default: Base).
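
The SYNOPSIS passes two URL shapes to new: a board top page and a read.cgi thread URL. The sketch below shows how those two shapes could be told apart. split_bbs_url is a hypothetical helper written for this illustration; the module and its plugins may parse URLs differently.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::2ch): split a URL into host,
# board name and, for read.cgi thread URLs, the thread key.
sub split_bbs_url {
    my ($url) = @_;
    # Thread form: http://HOST/test/read.cgi/BOARD/KEY/...
    if ($url =~ m{^https?://([^/]+)/test/read\.cgi/([^/]+)/(\d+)}) {
        return (host => $1, board => $2, key => $3);
    }
    # Board form: http://HOST/BOARD/
    if ($url =~ m{^https?://([^/]+)/([^/]+)/?$}) {
        return (host => $1, board => $2);
    }
    return;    # unrecognised URL: empty list
}

for my $url ('http://live19.2ch.net/ogame/',
             'http://live19.2ch.net/test/read.cgi/ogame/1140947283/l50') {
    my %part = split_bbs_url($url);
    print join(' ', map { "$_=$part{$_}" } sort keys %part), "\n";
}
```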

encoding
The name of the encoding used by the plugin.

load_setting
Reads the board settings.

load_subject
Reads the article list.

parse_dat($data[, $subject])
Parses $data as a dat.
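
For orientation, the sketch below splits one line of a dat by hand. It assumes the commonly described layout, name<>mail<>date<>body<>title, with the title present only on the first line and <br> separating lines of the body. parse_dat_line is a hypothetical helper, not the module's parser, and it ignores character encoding, which real dat files require handling.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::2ch): split one dat line into fields.
# Assumed layout: name<>mail<>date<>body<>title (title on the first line only).
sub parse_dat_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;
    my ($name, $mail, $date, $body, $title) = split /<>/, $line, 5;
    my $text = defined $body ? $body : '';
    $text =~ s/\s*<br>\s*/\n/g;    # <br> separates lines of the body
    $text =~ s/^\s+|\s+$//g;       # trim padding around the body
    return {
        name  => $name,
        mail  => $mail,
        date  => $date,
        body  => $text,
        title => defined $title ? $title : '',
    };
}

my $res = parse_dat_line(
    "Anonymous<>sage<>2006/03/02 12:00:00<> first line <br> second line <>Sample thread\n");
print "$res->{title}\n$res->{date}\n$res->{body}\n";
```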

recall_dat($key)
Recalls a dat from the cache file.


SEE ALSO

http://2ch.net/, http://www.monazilla.org/, the WWW::2ch::Subject manpage, the WWW::2ch::Dat manpage, the WWW::2ch::Res manpage


AUTHOR

Kazuhiro Osawa <ko@yappo.ne.jp>


COPYRIGHT AND LICENSE

Copyright (C) 2006 by Kazuhiro Osawa

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.5 or, at your option, any later version of Perl 5 you may have available.
