Help-Site Computer Manuals
Software
Hardware
Programming
Networking
  Algorithms & Data Structures   Programming Languages   Revision Control
  Protocols
  Cameras   Computers   Displays   Keyboards & Mice   Motherboards   Networking   Printers & Scanners   Storage
  Windows   Linux & Unix   Mac

Alvis::Convert
Perl extension for converting documents from a number of different source formats to Alvis XML format.

Alvis::Convert - Perl extension for converting documents from a number of different source formats to Alvis XML format.


NAME

Alvis::Convert - Perl extension for converting documents from a number of different source formats to Alvis XML format.


SYNOPSIS


 use Alvis::Convert;

 # Create a new instance, outputting under 'out'. Get the detected

 # encoding from sourceEncodingFromMeta.

 #

 my $C=Alvis::Convert->new(outputRootDir=>'out',

                           outputNPerSubdir=>1000,

                           outputAtSameLocation=>0,

                           includeOriginalDocument=>0,

                           sourceEncodingFromMeta=>1);

 # Restart output counters

 $C->init_output();

 # Convert e.g. HTML

 for my $html_text (@html)

 {

     my $alvisXML=$C->HTML($html_txt,$meta_txt);

     if (!defined($alvisXML))

     {

        warn $C->errmsg();

        $C->clearerr();

        next;

     }

 

     if (!$C->output_Alvis([$alvisXML]))

     {

         warn $C->errmsg();

         $C->clearerr();

         next;

     }

 }


DESCRIPTION

Converts document collections of different formats to Alvis XML format.


METHODS

new()

Options:


    fileType                 the MIME type of the source file to convert. 

                             Default: guess.

    sourceEncoding           encoding of the source document. Default: guess.  

    urlFromBasename          extract URL from basename. Default: no.

    outputAtSameLocation     output Alvis XML to the same directories as the

                             source documents. Default: no.

    alvisSuffix              suffix of the output Alvis XML records. Default:

                             'alvis'.

    outputRootDir            root directory for output files. Default: '.'

    outputNPerSubdir         number of records output per subdirectory.

                             Default: 1000

    defaultDocType           first guess document (MIME) type. Default: 'text'.

    defaultDocSubType        first guess document subtype. Default: 'html'.

    defaultEncoding          first guess encoding. Default: 'iso-8859-1'.

    includeOriginalDocument  include original document in the output?

                             Default: yes.

    ainodumpWarnings         issue warnings concerning ainodump conversion?

                             Default: yes.

    sourceEncodingFromMeta   read source encoding from Meta information?

                             Default: no.

HTML()


     my $alvisXML=$C->HTML($html_txt,$meta_txt,

                           {sourceEncoding=>'utf8',

                            sourceEncodingFromMeta=>0

                            });

     if (!defined($alvisXML))

     {

        warn $C->errmsg();

        $C->clearerr();

        next;

     }

newsXML()


     $meta_txt=$C->read_meta($news_xml_entries{$base_name}{metaF});

     if (!defined($meta_txt))

     {

         warn "Reading meta file " .

              "\"$news_xml_entries{$base_name}{metaF}\" failed. " .

              $C->errmsg();

         $C->clearerr();

         next;

     }

     my $alvisXMLs;

     $xml_txt=$C->read_news_XML($news_xml_entries{$base_name}{xmlF});

     if (!defined($xml_txt))

     {

         warn "Reading the news XML for basename \"$base_name\" failed. " .

               $C->errmsg();

         $C->clearerr();

         next;

     }

     $alvisXMLs=$C->newsXML($xml_txt,$meta_txt,$original_document_text);

     if (!defined($alvisXMLs))

     {

         warn "Obtaining the Alvis versions of the documents inside " .

              "\"$base_name\"'s XML file failed. " . $C->errmsg();

         $C->clearerr();

         next;

     }

ainodump()


    if (!$C->ainodump($ainodump_file))

    {

       warn "Obtaining the Alvis version of the " .

            "ainodump file \"$dump_entries{$base_name}{ainoF}\" " .

            "failed. " . $C->errmsg() if

              $Warnings;

       $C->clearerr();

    }



=head2 set()

    $C->set('alvisSuffix','foo');

read_HTML()


    $html_txt=$C->read_HTML($html_file,$meta_txt);

     if (!defined($html_txt))

     {

         warn "Reading the HTML failed. " .

               $C->errmsg();

         $C->clearerr();

         next;

     }

read_meta()

read_news_XML()

init_output()


    Initializes output counters.

output_alvis()


    $alvisXML=$C->HTML($html_txt,$meta_txt);

    if (!$C->output_Alvis([$alvisXML],$base_name))

    {

        warn "Outputting the Alvis records failed. " . $C->errmsg() if

                $Warnings;

        $C->clearerr();

        next;

    }



=head2 errmsg()

Returns a stack of error messages, if any. Empty string otherwise.


SEE ALSO

Alvis::Document


AUTHOR

Kimmo Valtonen, <kimmo.valtonen@hiit.fi>


COPYRIGHT AND LICENSE

Copyright (C) 2006 by Kimmo Valtonen

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available.




=cut





Programminig
Wy
Wy
yW
Wy
Programming
Wy
Wy
Wy
Wy