MARC::Charset
convert MARC-8 encoded strings to UTF-8

NAME

MARC::Charset - convert MARC-8 encoded strings to UTF-8


SYNOPSIS


    # import the marc8_to_utf8 function
    use MARC::Charset 'marc8_to_utf8';

    # prepare STDOUT for utf8
    binmode(STDOUT, ':utf8');

    # print out some marc8 as utf8
    print marc8_to_utf8($marc8_string);


DESCRIPTION

MARC::Charset allows you to turn MARC-8 encoded strings into UTF-8 strings. MARC-8 is a single byte character encoding that predates Unicode, and allows you to put non-Roman scripts in MARC bibliographic records. The encoding is described in the Library of Congress MARC specifications:


    http://www.loc.gov/marc/specifications/spechome.html


EXPORTS

ignore_errors()

Tells MARC::Charset whether or not to ignore all encoding errors, and returns the current setting. This is helpful if you have records that contain both MARC-8 and Unicode characters.


    my $ignore = MARC::Charset->ignore_errors();

    MARC::Charset->ignore_errors(1); # ignore errors
    MARC::Charset->ignore_errors(0); # DO NOT ignore errors

assume_unicode()

Tells MARC::Charset whether or not to assume Unicode when an error is encountered in ignore_errors mode, and returns the current setting. This is helpful if you have records that contain both MARC-8 and Unicode characters.


    my $setting = MARC::Charset->assume_unicode();

    MARC::Charset->assume_unicode(1); # assume characters are unicode (utf-8)
    MARC::Charset->assume_unicode(0); # DO NOT assume characters are unicode

assume_encoding()

Tells MARC::Charset whether or not to assume a specific encoding when an error is encountered in ignore_errors mode and returns the current setting. This is helpful if you have records that contain both MARC8 and other characters.


    my $setting = MARC::Charset->assume_encoding();

    MARC::Charset->assume_encoding('cp850'); # assume characters are cp850
    MARC::Charset->assume_encoding(''); # DO NOT assume any encoding
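
Because these settings are global to MARC::Charset, it can be useful to change them only for one batch of records and then put the previous values back. The following is a minimal sketch of that pattern, assuming the getters and setters behave as described above; the sample string and the variable names are illustrative only.

```perl
use strict;
use warnings;
use MARC::Charset 'marc8_to_utf8';

# Remember the current global settings so they can be restored later.
my $previous_ignore   = MARC::Charset->ignore_errors();
my $previous_encoding = MARC::Charset->assume_encoding();

# Be lenient for this batch: ignore errors and treat problem
# characters as cp850 (the encoding named in the example above).
MARC::Charset->ignore_errors(1);
MARC::Charset->assume_encoding('cp850');

# Illustrative input: plain ASCII, which is the same in MARC-8 and UTF-8.
my $utf8 = marc8_to_utf8('Plain ASCII title');

# Restore the earlier settings; an empty string means "assume nothing".
MARC::Charset->ignore_errors($previous_ignore);
MARC::Charset->assume_encoding($previous_encoding // '');
```

For a single conversion, passing a true second argument to marc8_to_utf8() avoids touching the global setting at all.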

marc8_to_utf8()

Converts a MARC-8 encoded string to UTF-8.


    my $utf8 = marc8_to_utf8($marc8);

If you'd like to ignore errors, pass in a true value as the second parameter or call MARC::Charset->ignore_errors() with a true value:


    my $utf8 = marc8_to_utf8($marc8, 'ignore-errors');

  or

    MARC::Charset->ignore_errors(1);
    my $utf8 = marc8_to_utf8($marc8);

utf8_to_marc8()

Attempts to translate a UTF-8 string into MARC-8.


    my $marc8 = utf8_to_marc8($utf8);

If you'd like to ignore errors, or characters that can't be converted to MARC-8, pass in a true value as the second parameter or call MARC::Charset->ignore_errors() with a true value:


    my $marc8 = utf8_to_marc8($utf8, 'ignore-errors');

  or

    MARC::Charset->ignore_errors(1);
    my $marc8 = utf8_to_marc8($utf8);
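
The two conversion functions can be combined into a quick round-trip check. This is a small sketch rather than a test suite: it assumes both functions are imported by name as shown, and it uses a plain ASCII title (chosen for illustration) because ASCII text is the same in MARC-8 and UTF-8.

```perl
use strict;
use warnings;
use MARC::Charset qw(marc8_to_utf8 utf8_to_marc8);

# Illustrative input: an ASCII-only title.
my $original = 'Moby Dick';

# Convert to MARC-8 and back again.
my $marc8      = utf8_to_marc8($original);
my $round_trip = marc8_to_utf8($marc8);

# Report whether the string survived the round trip unchanged.
print $round_trip eq $original
    ? "round trip preserved the string\n"
    : "round trip changed the string\n";
```

The same check can be applied to strings with accented or non-Roman characters to see how a particular record fares in both directions.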


DEFAULT CHARACTER SETS

If you need to alter the default character sets you can set the $MARC::Charset::DEFAULT_G0 and $MARC::Charset::DEFAULT_G1 variables to the appropriate character set code:


    use MARC::Charset::Constants qw(:all);

    $MARC::Charset::DEFAULT_G0 = BASIC_ARABIC;

    $MARC::Charset::DEFAULT_G1 = EXTENDED_ARABIC;
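
Assigning to these variables changes the defaults for every later conversion in the program. If only some records need different defaults, one option is Perl's `local`, which restores the previous values when the enclosing block exits. The sketch below assumes the variables are ordinary package variables that are consulted at conversion time; the helper name `marc8_arabic_to_utf8` is hypothetical and not part of MARC::Charset.

```perl
use strict;
use warnings;
use MARC::Charset 'marc8_to_utf8';
use MARC::Charset::Constants qw(:all);

# Hypothetical helper: convert one string with Arabic default character
# sets, leaving the global defaults untouched for all other callers.
sub marc8_arabic_to_utf8 {
    my ($marc8) = @_;

    # 'local' restores the previous values when this sub returns.
    local $MARC::Charset::DEFAULT_G0 = BASIC_ARABIC;
    local $MARC::Charset::DEFAULT_G1 = EXTENDED_ARABIC;

    return marc8_to_utf8($marc8);
}
```

Code outside the helper continues to see whatever defaults were in effect before the call.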


SEE ALSO


AUTHOR

Ed Summers (ehs@pobox.com)
