
Win32::UrlCache
parse Internet Explorer's history/cache/cookies


NAME

Win32::UrlCache - parse Internet Explorer's history/cache/cookies


SYNOPSIS


    use Win32::UrlCache;

    my $index = Win32::UrlCache->new( 'index.dat' );
    foreach my $url ( $index->urls ) {
      print $url->url, "\n";
    }

Or, if memory usage is a concern, you can pass a callback function:

    use Win32::UrlCache;

    my $index = Win32::UrlCache->new( 'index.dat' );
    $index->urls( callback => \&callback );

    sub callback {
      my $entry = shift;
      my $url = $entry->url;
         $url =~ s/^Visited: //;
      $entry->url( $url );
      print $entry->url, "\n";
      return;  # to prevent the entry from being kept in the object
    }

If you want to know the title of a cached page (Win32 only):

    use Win32::UrlCache::Cache;
    use Win32::UrlCache::Title;
    use Encode;

    my $cache = Win32::UrlCache::Cache->new;
       $cache->urls( callback => \&callback );

    sub callback {
      my $entry = shift;
      print $entry->url, "\n";
      my $title = Win32::UrlCache::Title->extract( $entry->filename );
      print encode( shiftjis => $title ), "\n\n" if $title;
      return;
    }


DESCRIPTION

This parses so-called "Client UrlCache MMF Ver 5.2" index.dat files, which Internet Explorer uses to store its history, cache, and cookies. As of this writing, I've only tested it on Win2K + IE 6.0, but I hope it also works with some other versions of the OS and Internet Explorer. Note, however, that it is not based on an official/public MSDN specification, but on an unofficial hack found on the web. So, caveat emptor in every sense, especially for the redr entries ;)

Patches and feedback are welcome.


METHODS

new

receives the path to an 'index.dat' file and parses it to create an object.

urls

returns URL entries in the 'index.dat' file. Each entry has url, filename, headers, filesize, last_modified, last_accessed, and, optionally, title accessors (note that some of them may return meaningless values). As of 0.02, it can receive a callback function; see CALLBACK. As of 0.04, you can also pass ( extract_title => 1 ) to extract the title. However, this extraction happens after the callback has run. So, if you want both to use a callback and to extract the title, you might want to put the extraction code into the callback itself, as shown in the synopsis.
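
As a rough illustration of the extract_title option without a callback, the sketch below prints each URL with its title. The describe_entry helper and the '(no title)' placeholder are made up for this example, and it assumes the title accessor returns a decoded string (or nothing) once extract_title => 1 has been passed. Title extraction is Win32 only, so the script does nothing where the module or an index.dat file is missing.

```perl
use strict;
use warnings;

# Hypothetical helper: format one entry as "url<TAB>title", with a placeholder
# when no title could be extracted.
sub describe_entry {
    my ($entry) = @_;
    my $title = $entry->title;
    $title = '(no title)' unless defined $title && length $title;
    return join "\t", $entry->url, $title;
}

# Run only where the module and an index.dat file are actually available.
if ( -e 'index.dat' && eval { require Win32::UrlCache; 1 } ) {
    binmode STDOUT, ':encoding(UTF-8)';   # assumption: titles are decoded strings
    my $index = Win32::UrlCache->new('index.dat');
    print describe_entry($_), "\n" for $index->urls( extract_title => 1 );
}
```

Because no callback is given here, all entries are kept in memory; for a large cache the callback form is probably the better choice.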

leaks

almost the same as urls, but returns LEAK entries (if any) in the 'index.dat' file.

redrs

returns REDR entries (if any) in the 'index.dat' file. Each entry has a url accessor. As of 0.02, it can receive a callback function.
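
A minimal sketch of using redrs with a callback might look like the following. It tallies REDR entries per host instead of keeping them. The host_of and tally_redr helpers and the %redirects_per_host hash are hypothetical names, and the sketch assumes the url accessor returns something URL-like, which is not guaranteed for redr entries.

```perl
use strict;
use warnings;

my %redirects_per_host;   # hypothetical tally: host => number of REDR entries

# Pull a lower-cased host name out of a URL-like string, or '(unknown)'.
sub host_of {
    my ($url) = @_;
    my ($host) = $url =~ m{^\w+://([^/:]+)};
    return defined($host) ? lc($host) : '(unknown)';
}

# Callback: count the entry, then return false so it is not stored in the object.
sub tally_redr {
    my ($entry) = @_;
    $redirects_per_host{ host_of( $entry->url ) }++;
    return;
}

# Run only where the module and an index.dat file are actually available.
if ( -e 'index.dat' && eval { require Win32::UrlCache; 1 } ) {
    my $index = Win32::UrlCache->new('index.dat');
    $index->redrs( callback => \&tally_redr );
    printf "%5d  %s\n", $redirects_per_host{$_}, $_ for sort keys %redirects_per_host;
}
```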


CALLBACK

By default, the three methods above (urls, leaks, and redrs) return all the entries found in the index, which can use a lot of memory, especially if you use IE as your main browser. As of 0.02, these methods may receive a callback function, which takes an entry as its first (and, as of this writing, only) argument. If the callback returns true, the entry is stored in the ::UrlCache object; if it returns false, the entry is discarded after the callback has run.
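
To make the true/false rule concrete, here is a small sketch of a filtering callback that keeps only entries whose URL mentions a given host and lets the rest be discarded. The keep_if_host helper and the 'example.com' host are placeholders invented for this example, not part of the module.

```perl
use strict;
use warnings;

# Hypothetical filter: true (keep the entry) when its URL mentions $host,
# false (discard the entry) otherwise.
sub keep_if_host {
    my ( $entry, $host ) = @_;
    return index( $entry->url, $host ) >= 0 ? 1 : 0;
}

# Run only where the module and an index.dat file are actually available.
if ( -e 'index.dat' && eval { require Win32::UrlCache; 1 } ) {
    my $index = Win32::UrlCache->new('index.dat');
    $index->urls(
        callback => sub {
            my $entry = shift;
            my $keep  = keep_if_host( $entry, 'example.com' );
            print $entry->url, "\n" if $keep;
            return $keep;   # true: stored in the object; false: discarded
        },
    );
}
```

A plain substring test is used rather than URL parsing because history entries may carry a prefix such as "Visited: " in front of the URL, as the synopsis shows.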


SEE ALSO

http://www.latenighthacking.com/projects/2003/reIndexDat/


AUTHOR

Kenichi Ishigaki, <ishigaki at cpan.org>


COPYRIGHT AND LICENSE

Copyright (C) 2007 by Kenichi Ishigaki.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
