Help-Site Computer Manuals
Software
Hardware
Programming
Networking
  Algorithms & Data Structures   Programming Languages   Revision Control
  Protocols
  Cameras   Computers   Displays   Keyboards & Mice   Motherboards   Networking   Printers & Scanners   Storage
  Windows   Linux & Unix   Mac

Alvis::Encoding
Perl extension for guessing and checking the encoding of documents.

Alvis::Encoding - Perl extension for guessing and checking the encoding of documents.


NAME

Alvis::Encoding - Perl extension for guessing and checking the encoding of documents.


SYNOPSIS


 use Alvis::Encoding;

 # Create a new instance

 my $e=Alvis::Encoding->new();

 if (!defined($e))

 {

    die "Instantiating Alvis::Encoding failed.";

 }

 # Check that a (decimal) character code is legal UTF-8

 my $code=55;

 if (!$e->code_is_utf8($code))

 {

    # The message will contain the position and the offending character's code 

    die $e->errmsg();

 }

 # Check that a text is legal UTF-8

 my $text;

 if (!$e->is_utf8($text))

 {

    # The message will contain the position and the offending character's code 

    die $e->errmsg();

 }

 # If you need to obtain the position (1..) and the offending character,

 # pass a placeholder in a hash ref argument:

 my %err=();

 if (!$e->is_utf8($text,\%err))

 {

    my $position=$err{pos};

    my $code=$err{code};

    . . . 

 }

 # 

 # Guess the encoding of a document given a guess for its type 

 #

 my $type_guesser=Alvis::Document::Type->new();

 my ($doc_type,$doc_sub_type)=$type_guesser->guess($text);

 my $doc_encoding=$e->guess($text,$doc_type,$doc_sub_type);

 if (!defined($doc_encoding))

 {

     die('Cannot guess. ' . $e->errmsg());

 }

 # 

 # Try converting a document to UTF-8 with only its type known

 #

 my $type_guesser=Alvis::Document::Type->new();

 my ($doc_type,$doc_sub_type)=$type_guesser->guess($text);

 my $doc_in_utf8=$e->try_to_convert_to_utf8($text,$doc_type,$doc_sub_type);

 if (!defined($doc_in_utf8))

 {

     die('Cannot guess. ' . $e->errmsg());

 }

 

 # Try to guess what was meant 

 my @possibilities=$e->guess_typo_fixes('uft-8');


DESCRIPTION

A collection of methods for guessing, confirming and fixing the encoding of a document.


METHODS

new()

Options:


    defaultDocType       default type for a document. Default: text.

    defaultDocSubType    default sub type for a document. Default: html.

    defaultEncoding      default encoding for a document. Default: iso-8859-1.

code_is_utf8(decimal_code)

Returns 1 if the (decimal) character code is legal UTF-8.

is_utf8(text,err_hash_ref)

Returns 1 if all of the characters of $text are legal UTF-8 Else, returns 0 and sets an error message specifying the location (1..) of the first illegal character code If you wish to obtain the position and offending code, pass a hash ref ($err_hash_ref). The info is in $err_hash_ref->{pos} and $err_hash_ref->{code}.

guess(text,doc_type,doc_sub_type)

Guess the encoding of a document given a guess for its type (and subtype).

guess_and_convert(text,doc_type,doc_sub_type,target_encoding)

Tries to first guess the encoding of the document given a guess at its type and subtype, and then tries to convert it to $target_encoding.

convert(text,source_encoding,target_encoding)

Tries to convert $text from $source_encoding to $target_encoding.

guess_typo_fixes($typo)

Returns a set of guesses for the meant encoding in a case of an encoding name containing typos.

errmsg()

Returns a stack of error messages, if any. Empty string otherwise.


SEE ALSO

Alvis::Document::Type


AUTHOR

Kimmo Valtonen, <kimmo.valtonen@hiit.fi>


COPYRIGHT AND LICENSE

Copyright (C) 2006 by Kimmo Valtonen

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available.

Programminig
Wy
Wy
yW
Wy
Programming
Wy
Wy
Wy
Wy