
RTF::Control - Application of RTF::Parser for document conversion



RTF::Control is a subclass of RTF::Parser. It can be seen as a helper module for people who want to write their own document converters: RTF::HTML::Convertor and RTF::TEXT::Convertor both subclass it.

I am the new maintainer of this module. My aim is to keep the interface identical to the old interface while cleaning up, documenting, and testing the internals. There are things in the interface I'm unhappy with, and things I like. However, I'm maintaining rather than developing the module, so the interface is mostly frozen.


For starters, go and look at the source of RTF::TEXT::Convertor.

Except for RTF::Parser subs, the following is a list of variables exported by RTF::Control that you're expected to tinker with in your own subclass.

RTF::Parser subs

If you read the docs of RTF::Parser you'll see that you can redefine some subs there. RTF::Control has its own definitions for all of these, but you might want to override symbol(), text(), and char(). We'll look at what the defaults of each of these do, and what you need to do if you want to override any of them, a little further down.


%symbol

This hash is actually merged into %do_on_control, with each value wrapped in a subroutine that effectively says print shift. You can put any control words that should map directly to a certain output in here. \tab, for example, could be $symbol{'tab'} = "\t".
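A minimal sketch of that merge, not the module's actual code: the two hashes below are local stand-ins for the exported ones, the entries are illustrative, and the handlers append to a string instead of printing so the result is easy to inspect.

```perl
use strict;
use warnings;

# Stand-ins for the exported hashes; the entries are illustrative only.
my %symbol = ( tab => "\t", bullet => '*' );
my %do_on_control;

my $output = '';    # collects what the handlers would otherwise print

# Wrap each plain value in a coderef so it can live in the dispatch table.
for my $word ( keys %symbol ) {
    my $value = $symbol{$word};
    $do_on_control{$word} = sub { $output .= $value };
}

# Meeting the control words \bullet and \tab now emits their mapped text.
$do_on_control{$_}->() for qw( bullet tab );
```

The point of the wrapping is that every entry in the dispatch table can be called the same way, whether it started life as a plain string or as a hand-written handler.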


%info

This hash gets filled with document metadata, as per the RTF specification.


%par_props

Not really sure, but paragraph properties.

%do_on_event %do_on_control

%do_on_control tells us what to do when we meet a specific control word. The values are coderefs. %do_on_event also holds coderefs, but these are more abstract things to do, say when the stylesheet changes. %do_on_event thingies tend to be called by %do_on_control thingies, as far as I can tell.
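The relationship between the two tables might look roughly like this. It is a sketch under stated assumptions: the event name, the control word, and the argument passed are all hypothetical, and the real handlers receive arguments from RTF::Parser that are not modelled here.

```perl
use strict;
use warnings;

my @log;    # records which handlers ran, in order

# Hypothetical event table: abstract actions keyed by event name.
my %do_on_event = (
    style_change => sub { push @log, "event: style_change ($_[0])" },
);

# Hypothetical control table: one coderef per control word.
my %do_on_control = (
    b => sub {
        push @log, 'control: b';
        $do_on_event{style_change}->('bold');    # control word triggers an event
    },
);

# Dispatch a control word only if a handler is registered for it.
for my $word (qw( b unknownword )) {
    my $handler = $do_on_control{$word} or next;
    $handler->();
}
```

Keeping the abstract action in %do_on_event means a subclass can replace what a style change does without touching every control word that causes one.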

$style $newstyle

$style is the current style; $newstyle is the one we're about to change to, if a change is pending.


$event

The current event.


$text

The pending text.


new

Returns an RTF::Control object. RTF::Control is a subclass of RTF::Parser.

Internally, we call RTF::Parser's new() method, and then we call an internal method called _configure(), which takes care of the options we were passed.

ADD STUFF ON -output AND -confdir


sub new {
        my $proto = shift;
        my $class = ref( $proto ) || $proto;

        # Let RTF::Parser build the object, then process our own options
        my $self = $class->SUPER::new(@_);
        $self->_configure(@_);

        return $self;
}



# This is a private method. It accepts a hash (well, a list)
# of values, and stores them. If one of them is 'output',
# it calls a function I'm yet to examine. This was done
# in a horrendous way - it's now a lot tidier. :-)

sub _configure {
        my $self    = shift;
        my %options = @_;

        # Sanitize the options: strip a leading '-' and lowercase each key
        my %clean_options;
        for my $key ( keys %options ) {
                my $oldkey = $key;
                $key =~ s/^-//;
                $key = lc($key);
                $clean_options{ $key } = $options{ $oldkey };
        }

        $self->{'_RTF_Control_Options'} = \%clean_options;

        # Only redirect output if the caller actually supplied a target
        $self->set_top_output_to( $clean_options{'output'} )
               if $clean_options{'output'};

        return $self;
}
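The key clean-up can be tried in isolation. The helper name clean_options and the option values below are made up for illustration; only the stripping of a leading '-' and the lowercasing come from the code above.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the key clean-up: strip one leading '-'
# and lowercase the key, leaving the values untouched.
sub clean_options {
    my %options = @_;
    my %clean;
    for my $key ( keys %options ) {
        ( my $clean_key = lc $key ) =~ s/^-//;
        $clean{$clean_key} = $options{$key};
    }
    return %clean;
}

my $buffer = '';
my %clean  = clean_options( -Output => \$buffer, -confdir => '/tmp/rtf', Debug => 1 );

# Keys now read 'confdir', 'debug', and 'output', however the caller spelled them.
print join( ', ', sort keys %clean ), "\n";
```

This is why -output, -Output, and output all end up in the same slot.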


use constant APPLICATION_DIR => 0;


application_dir

I'm leaving this method in because removing it will cause a backward-compatibility nightmare. This method returns the ( wait for it ) path of the directory containing the .pm file for the object's class, without a trailing slash. Obviously this is nasty in several ways. If you've set -confdir in new(), that will be returned instead. You should definitely take that route if you're on an OS on which Perl can't use / as a directory separator.
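One way such a lookup could work is sketched below. This is an assumption about the approach, not the module's code: module_dir is a hypothetical stand-in, and it uses the core File::Basename module as the "class" so the example runs without RTF::Parser installed.

```perl
use strict;
use warnings;
use File::Basename qw( dirname );

# Hypothetical stand-in for the lookup: prefer an explicit confdir,
# otherwise use the directory of the class's .pm file as recorded in %INC.
sub module_dir {
    my ( $class, $confdir ) = @_;
    return $confdir if defined $confdir;

    ( my $file = "$class.pm" ) =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    return unless defined $INC{$file};          # class was never loaded
    return dirname( $INC{$file} );
}

my $found    = module_dir('File::Basename');                  # derived from %INC
my $override = module_dir( 'File::Basename', '/etc/rtf' );    # confdir wins
```

Passing an explicit directory sidesteps the path parsing entirely, which is the reason to prefer -confdir on systems with unusual directory separators.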


charmap_reader

This nicely abstracts away using application_dir and so on. It's a method call. It'll take the name of the class, and an argument for the module/file it's looking for. This is likely to be 'ansi' or 'charmap'. This argument, for historical reasons (ho ho ho), will have any _'s removed in the check for a module name, so $self->charmap_reader('char_map') will look for, for example, RTF::TEXT::charmap to load. It'll return the data in the file as an array of lines. This description sucks.
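The name handling can be illustrated on its own. The helper below is hypothetical, and it assumes the module name is formed by swapping the last component of the class for the requested name with its underscores removed; the text only confirms the underscore removal and the RTF::TEXT::charmap example.

```perl
use strict;
use warnings;

# Hypothetical helper: build the module name that would be searched for.
# Assumption: the last component of the class is swapped for the requested
# name with its underscores removed.
sub charmap_module_name {
    my ( $class, $name ) = @_;
    ( my $clean = $name ) =~ tr/_//d;          # 'char_map' -> 'charmap'
    ( my $base = $class ) =~ s/::[^:]+\z//;    # drop the final ::Component
    return "${base}::$clean";
}

my $module = charmap_module_name( 'RTF::TEXT::Convertor', 'char_map' );
print "$module\n";    # prints RTF::TEXT::charmap
```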

Stack manipulation


dump_stack

Serializes and prints the stack to STDERR.


output

Holder routine for the current thing to do with the output text we're given. It starts off the same as $string_output_sub, which adds the string to the element at the TOP of the output stack. However, the idea, I believe, is to allow that to be changed at will, using push_output.


push_output

Adds a blank element to the end of the stack. It will change (or maintain) the function of output to be $string_output_sub, unless you pass it the argument 'nul', in which case it will set output to be $nul_output_sub.


pop_output

Removes and returns the last element of the output stack.
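The stack behaviour described so far can be modelled in a few lines. This is a self-contained sketch, not the module's implementation: the names follow the text where it gives them, the stack variable is a stand-in, and what happens to the output routine after a pop is not modelled.

```perl
use strict;
use warnings;

my @output_stack = ('');    # the bottom element always exists

# The two behaviours "output" can be switched between.
my $string_output_sub = sub { $output_stack[-1] .= $_[0] };
my $nul_output_sub    = sub { };    # discard everything
my $output_sub        = $string_output_sub;

sub output { $output_sub->(@_) }

sub push_output {
    my $mode = shift // '';
    $output_sub = $mode eq 'nul' ? $nul_output_sub : $string_output_sub;
    push @output_stack, '';
}

sub pop_output { return pop @output_stack }

output('outer ');
push_output();
output('inner');
my $inner = pop_output();      # 'inner'

push_output('nul');
output('discarded');
my $ignored = pop_output();    # '' because the nul routine threw the text away
```

Pushing a fresh element lets a handler collect the text of a nested group separately, then decide what to do with it once the group ends.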


set_top_output_to

Only called at init time. This is a method call, not a function. It sets the action of flush_top_output, depending on whether you pass it a filehandle or a string reference.


flush_top_output

Outputs the top element of the stack in the way specified by the call to set_top_output_to.
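A rough model of how the flush action might be chosen is shown below. It is a sketch with assumptions: plain functions are used instead of methods to keep it short, anything that is not a scalar reference is treated as a filehandle, and emptying the element after flushing is a guess rather than documented behaviour.

```perl
use strict;
use warnings;

my @output_stack = ('');
my $flush_sub    = sub { };    # replaced by set_top_output_to()

# Choose how the top element is delivered, based on the target's type.
sub set_top_output_to {
    my $target = shift;
    if ( ref $target eq 'SCALAR' ) {
        # String reference: append to the caller's string.
        $flush_sub = sub { $$target .= $output_stack[-1]; $output_stack[-1] = '' };
    }
    else {
        # Assumed to be a filehandle, e.g. \*STDOUT.
        $flush_sub = sub { print {$target} $output_stack[-1]; $output_stack[-1] = '' };
    }
}

sub flush_top_output { $flush_sub->() }

my $buffer = '';
set_top_output_to( \$buffer );
$output_stack[-1] .= 'Hello, RTF';
flush_top_output();    # $buffer now holds the flushed text
```

Deciding the delivery method once, at set-up time, keeps the flush itself free of type checks.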