
URI::Escape::XS
Drop-In replacement for URI::Escape


NAME

URI::Escape::XS - Drop-In replacement for URI::Escape


VERSION

$Id: XS.pm,v 0.1 2007/04/27 17:17:46 dankogai Exp dankogai $


SYNOPSIS


    # use it instead of URI::Escape
    use URI::Escape::XS qw/uri_escape uri_unescape/;
    $safe     = uri_escape("10% is enough\n");
    $verysafe = uri_escape("foo", "\0-\377");
    $str      = uri_unescape($safe);

    # or use encodeURIComponent and decodeURIComponent
    use URI::Escape::XS;
    $safe = encodeURIComponent("10% is enough\n");
    $str  = decodeURIComponent("10%25%20is%20enough%0A");


EXPORT

by default

encodeURIComponent and decodeURIComponent

on demand

uri_escape and uri_unescape


FUNCTIONS

encodeURIComponent

Does what JavaScript's encodeURIComponent does.


  $uri = encodeURIComponent("http://www.example.com/");
  # http%3A%2F%2Fwww.example.com%2F

Note that you cannot customize which characters are escaped. If you need to, use uri_escape.
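For reference, JavaScript's encodeURIComponent leaves only the letters A-Z and a-z, the digits 0-9 and the marks - _ . ! ~ * ' ( ) unescaped. The pure-Perl sketch below illustrates that fixed rule. encode_uri_component_pp is a hypothetical name, not part of this module, and the sketch assumes a byte string (it rejects characters above 0xFF). It is an illustration, not the XS code.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical pure-Perl illustration of the encodeURIComponent rule.
# Assumes a byte string: characters above 0xFF are rejected.
sub encode_uri_component_pp {
    my ($str) = @_;
    croak "wide character in input" if $str =~ /[^\x00-\xFF]/;
    # Everything outside JavaScript's unreserved set becomes %HH.
    $str =~ s/([^A-Za-z0-9\-_.!~*'()])/sprintf('%%%02X', ord $1)/ge;
    return $str;
}

print encode_uri_component_pp("http://www.example.com/"), "\n";
# http%3A%2F%2Fwww.example.com%2F
```

Because the set is hard-wired into the pattern, there is nothing to customize; uri_escape is the function that takes a caller-supplied set.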

decodeURIComponent

Does what JavaScript's decodeURIComponent does.


  $str = decodeURIComponent("http%3A%2F%2Fwww.example.com%2F");
  # http://www.example.com/

It decodes not only %HH sequences but also %uHHHH sequences, correctly combining surrogate pairs.


  $str = decodeURIComponent("%uD869%uDEB2%u5F3E%u0061");
  # the UTF-8 bytes of \x{2A6B2}\x{5F3E}a
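The decoding rule can be sketched in pure Perl as follows. decode_uri_component_pp and _utf8_bytes are hypothetical names. The sketch assumes well-formed input and is not the XS implementation. Like this module, it returns UTF-8 bytes with the utf8 flag off. Doing everything in a single substitution matters: a decoded %u0025 yields a literal %, which must not be decoded a second time.

```perl
use strict;
use warnings;

# Hypothetical helper: UTF-8 bytes (utf8 flag off) for one code point.
sub _utf8_bytes {
    my $s = chr(shift);
    utf8::encode($s);
    return $s;
}

# Hypothetical pure-Perl illustration of %HH / %uHHHH decoding.
# Single pass, so a decoded "%" is never decoded again.
sub decode_uri_component_pp {
    my ($str) = @_;
    $str =~ s{
        %u([Dd][89ABab][0-9A-Fa-f]{2})%u([Dd][C-Fc-f][0-9A-Fa-f]{2})  # surrogate pair
      | %u([0-9A-Fa-f]{4})                                         # single %uHHHH
      | %([0-9A-Fa-f]{2})                                          # plain %HH byte
    }{
        defined $1 ? _utf8_bytes(0x10000 + ((hex($1) - 0xD800) << 10) + (hex($2) - 0xDC00))
      : defined $3 ? _utf8_bytes(hex $3)
      :              chr(hex $4)
    }gex;
    return $str;
}

print decode_uri_component_pp("http%3A%2F%2Fwww.example.com%2F"), "\n";
# http://www.example.com/
```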

This function UNCONDITIONALLY returns the decoded string with the utf8 flag off. To get a utf8-decoded (character) string, use Encode:


  use Encode qw/decode_utf8/;
  $str = decode_utf8(decodeURIComponent($uri));

This is the correct behavior because you cannot tell whether the decoded bytes are actually UTF-8; they could just as well be ISO-8859-1 or Shift_JIS.
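The ambiguity is easy to see with the core Encode module. The same two bytes are one character when read as UTF-8 and two characters when read as ISO-8859-1. The byte string below stands in for a decodeURIComponent() result and is written out literally, so the example does not need this module installed.

```perl
use strict;
use warnings;
use Encode qw(decode decode_utf8);

# Bytes as they would come back from decoding "%C3%A9" (utf8 flag off).
my $bytes = "\xC3\xA9";

my $as_utf8   = decode_utf8($bytes);            # one character: U+00E9
my $as_latin1 = decode('ISO-8859-1', $bytes);   # two characters: U+00C3 U+00A9

printf "UTF-8: %d char(s), ISO-8859-1: %d char(s)\n",
    length $as_utf8, length $as_latin1;
# UTF-8: 1 char(s), ISO-8859-1: 2 char(s)
```

Only the caller knows which reading is right, which is why the choice is left to an explicit Encode call.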

uri_escape

Does exactly the same as URI::Escape::uri_escape() except when a utf8-flagged string is passed in.

URI::Escape::uri_escape() croaks and urges you to use uri_escape_utf8() instead, but that is pointless because URIs themselves have no such thing as a utf8 flag. The function in this module ALWAYS TREATS the string as a byte sequence, so you can use it safely without worrying about utf8 flags.
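One way to picture "always treats the string as a byte sequence" is the pure-Perl sketch below. uri_escape_bytes_pp is a hypothetical name, and its default unsafe set (everything except the RFC 3986 unreserved characters A-Z a-z 0-9 - . _ ~) is an assumption. As with uri_escape(), the optional second argument is the body of a character class. A utf8-flagged string is escaped as its UTF-8 bytes instead of causing a croak.

```perl
use strict;
use warnings;

# Hypothetical sketch: escape a string byte by byte.
# $unsafe is a character-class body, as in uri_escape($str, $unsafe).
sub uri_escape_bytes_pp {
    my ($str, $unsafe) = @_;
    $unsafe = '^A-Za-z0-9\-._~' unless defined $unsafe;  # assumed default set
    utf8::encode($str) if utf8::is_utf8($str);           # flagged: use its UTF-8 bytes
    $str =~ s/([$unsafe])/sprintf('%%%02X', ord $1)/ge;
    return $str;
}

print uri_escape_bytes_pp("10% is enough\n"), "\n";  # 10%25%20is%20enough%0A
print uri_escape_bytes_pp("foo", "\0-\377"), "\n";   # %66%6F%6F
print uri_escape_bytes_pp("\x{5F3E}"), "\n";         # %E5%BC%BE
```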

Note that this function is NOT EXPORTED by default, so you can use URI::Escape and URI::Escape::XS simultaneously.

uri_unescape

Does exactly the same as URI::Escape::uri_unescape() except when %uHHHH sequences are passed in.

URI::Escape::uri_unescape() leaves %uHHHH sequences undecoded, while the function in this module decodes them into the corresponding UTF-8 byte sequences.
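The difference comes down to the pattern. A %HH-only rule like the simplified one below (a hypothetical stand-in, not URI::Escape's actual source) never matches %u0041, because u is not a hex digit, so the sequence passes through untouched.

```perl
use strict;
use warnings;

# Simplified %HH-only unescaping: each %HH becomes one byte.
# "%uHHHH" cannot match because "u" is not a hex digit.
sub unescape_hh_only {
    my ($str) = @_;
    $str =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $str;
}

print unescape_hh_only("%41%u0041"), "\n";  # A%u0041
```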

Like uri_escape, this function is NOT EXPORTED by default.

Note on the %uHHHH sequence

With this module the resulting strings never have the utf8 flag on, so if you want Perl character strings, you have to decode them explicitly via Encode. Remember: URIs have always been byte sequences, not UTF-8 characters.

Had the %uHHHH sequence become standard, you could safely tell whether a given URI is in Unicode. But, more fortunately than unfortunately, the proposal was rejected, so you cannot tell which encoding is used just by looking at the URI.

http://en.wikipedia.org/wiki/Percent-encoding#Non-standard_implementations

I say "fortunately" because %uHHHH is nasty for non-BMP characters: each %uHHHH holds only one 16-bit value, so a character at U+10000 or above needs a surrogate pair.
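The surrogate arithmetic is fixed by UTF-16. Subtract 0x10000, then put the upper 10 bits into a high surrogate starting at 0xD800 and the lower 10 bits into a low surrogate starting at 0xDC00. The helper names below are hypothetical; the sketch reproduces the pair from the decodeURIComponent example above.

```perl
use strict;
use warnings;

# Split a code point >= U+10000 into a UTF-16 surrogate pair.
sub to_surrogates {
    my ($cp) = @_;
    my $v = $cp - 0x10000;                       # 20-bit value
    return (0xD800 + ($v >> 10), 0xDC00 + ($v & 0x3FF));
}

# Combine a high and a low surrogate back into one code point.
sub from_surrogates {
    my ($hi, $lo) = @_;
    return 0x10000 + (($hi - 0xD800) << 10) + ($lo - 0xDC00);
}

printf "%%u%04X%%u%04X\n", to_surrogates(0x2A6B2);   # %uD869%uDEB2
printf "U+%X\n", from_surrogates(0xD869, 0xDEB2);     # U+2A6B2
```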

Nevertheless, a significant number of URIs in the wild contain %uHHHH escapes, so this module supports decoding them, but not encoding.


SPEED

Since this module uses XS, it is really fast, except for uri_escape() on strings that need no escaping, such as uri_escape("noop").

The regexp used in URI::Escape is really fast when nothing matches, but slows down significantly when it has to replace characters.

BENCHMARK

On a MacBook Pro 2GHz, Perl 5.8.8.


 http://www.google.co.jp/search?q=%E5%B0%8F%E9%A3%BC%E5%BC%BE
 ============================================================
 Unescape it
 -----------
              Rate     U::E U::E::XS
 U::E      58526/s       --     -88%
 U::E::XS 486968/s     732%       --
 --------------
 Escape it back
 --------------
              Rate     U::E U::E::XS
 U::E      30046/s       --     -78%
 U::E::XS 136992/s     356%       --

 www.example.com
 ===============
 Unescape it
 -----------
              Rate     U::E U::E::XS
 U::E     821972/s       --      -4%
 U::E::XS 854732/s       4%       --
 --------------
 Escape it back
 --------------
              Rate U::E::XS     U::E
 U::E::XS 522969/s       --      -7%
 U::E     565112/s       8%       --


AUTHOR

Dan Kogai, <dankogai at dan.co.jp>


BUGS

Please report any bugs or feature requests to bug-uri-escape-xs at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.


SUPPORT

You can find documentation for this module with the perldoc command.


    perldoc URI::Escape::XS

You can also look for information at:


ACKNOWLEDGEMENTS

Gisle Aas for URI::Escape

Koichi Taniguchi for URI::Escape::JavaScript


COPYRIGHT & LICENSE

Copyright 2007 Dan Kogai, all rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
