
NAME

semcor-reformat.pl - reformat SemCor files for use by wsd.pl


SYNOPSIS

semcor-reformat.pl {--semcor-dir DIR | --file FILE [FILE ...]} [--key]


EXAMPLE

semcor-reformat.pl --semcor-dir ~/semcor2.0


DESCRIPTION

This script reads SemCor-formatted files and produces text that can be used as input to wsd.pl. Alternatively, if the --key option is specified, the output also includes the sense number of each word, so that it can be used as a key file.
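The following minimal Python sketch shows the reading step. It is an illustration, not the script's own Perl code. It assumes SemCor's usual one-tag-per-line layout, for example `<wf cmd=done pos=NN lemma=group wnsn=1 lexsn=1:14:00::>group</wf>`. Everything else is hypothetical: the output tokens `lemma#pos` (wsd.pl input) and `lemma#pos#sense` (key mode), the names `parse_wf`, `format_token` and `POS_MAP`, and the Penn-to-WordNet part-of-speech mapping. The exact format wsd.pl expects may differ.

```python
import re

# One SemCor word-form tag, e.g.
#   <wf cmd=done pos=NN lemma=group wnsn=1 lexsn=1:14:00::>group</wf>
WF_RE = re.compile(r"<wf([^>]*)>([^<]*)</wf>")
# name=value pairs inside the tag; quotes are tolerated but optional.
ATTR_RE = re.compile(r'(\w+)="?([^"\s]+)"?')

# Hypothetical mapping from Penn Treebank tag prefixes to WordNet POS letters.
POS_MAP = {"NN": "n", "VB": "v", "JJ": "a", "RB": "r"}


def parse_wf(line):
    """Return (surface_word, attrs) for a <wf> line, or None for other lines."""
    m = WF_RE.search(line)
    if m is None:
        return None  # e.g. <punc>.</punc>, <s snum=1>, </p>
    return m.group(2), dict(ATTR_RE.findall(m.group(1)))


def format_token(attrs, key=False):
    """Hypothetical output token: 'lemma#pos' for wsd.pl input, or
    'lemma#pos#sense' in key mode (first sense if several are listed)."""
    token = f"{attrs['lemma']}#{POS_MAP[attrs['pos'][:2]]}"
    if key:
        token += "#" + attrs["wnsn"].split(";")[0]
    return token
```

For the sample line above, `format_token(attrs)` gives `group#n` and `format_token(attrs, key=True)` gives `group#n#1`.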

Several data sets are distributed in SemCor format, including SemCor itself and the Senseval-2 and Senseval-3 all-words data sets. Rada Mihalcea has made them available for download:

http://www.cs.unt.edu/~rada/downloads.html

Only words that are assigned valid sense numbers are passed through by this program; all other words are discarded. In practice, this means that only open-class words that appear in WordNet are kept, while closed-class words (pronouns, conjunctions, etc.) and other words not in WordNet are discarded.
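The filtering rule can be expressed as a predicate over a tag's attributes. This Python sketch relies on three assumptions about SemCor conventions:

- Closed-class words are tagged without a `wnsn` attribute, as in `<wf cmd=ignore pos=DT>The</wf>`.
- A word may list several senses separated by `;`.
- `0` or any other non-positive value counts as invalid.

The name `has_valid_sense` is hypothetical.

```python
def has_valid_sense(attrs):
    """Hypothetical filter: keep a word only if its wnsn attribute holds
    one or more positive sense numbers (several are separated by ';')."""
    wnsn = attrs.get("wnsn")
    if wnsn is None:
        return False  # assumed: closed-class words carry no wnsn at all
    # Every listed sense must be a positive integer; "0" or "-1" fail.
    return all(s.isdigit() and int(s) > 0 for s in wnsn.split(";"))
```

Under these assumptions, `{"cmd": "ignore", "pos": "DT"}` is discarded, while `{"wnsn": "1;2"}` is kept.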


OPTIONS

--semcor-dir=DIRECTORY
The location of the SemCor directory. This directory contains several subdirectories, including 'brown1' and 'brown2'. Specify only the directory that contains them, not the subdirectories themselves. For example, if /home/user/semcor2.0 contains the brown1 and brown2 directories, specify /home/user/semcor2.0 as the value of this option. Do not use this option together with the --file option.

--file=FILE
A SemCor-formatted file to process. Use this option instead of --semcor-dir to process only a few SemCor files or to process Senseval files. Multiple files can be given on the command line. For example:

 semcor-reformat.pl --file br-a01 br-a02 br-k18 br-m02 br-r05

Do not use this option together with the --semcor-dir option.

--key
Generate a key file for use by the scorer2 program instead of input for wsd.pl. The scorer2 program measures the performance of a word sense disambiguation program. See the documentation for scorer2-format.pl for more information.
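The option rules above can be mirrored with Python's argparse:

- Exactly one of --semcor-dir or --file is required, never both.
- --file takes one or more file names.
- --key is a simple switch.

This is a sketch, not the script's Perl option handling. The names `build_parser` and `input_files` are hypothetical. The `brown*/tagfiles/*` layout under the SemCor directory is an assumption about the SemCor distribution rather than something this page states.

```python
import argparse
from pathlib import Path


def build_parser():
    parser = argparse.ArgumentParser(prog="semcor-reformat.pl")
    # --semcor-dir and --file are mutually exclusive; exactly one is required.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--semcor-dir", metavar="DIR")
    source.add_argument("--file", nargs="+", metavar="FILE")  # one or more files
    parser.add_argument("--key", action="store_true")  # key output for scorer2
    return parser


def input_files(args):
    """Return the list of files to process for the parsed options."""
    if args.file:
        return [Path(f) for f in args.file]
    # Assumed layout: DIR/brown1/tagfiles/br-a01, DIR/brown2/tagfiles/..., etc.
    root = Path(args.semcor_dir)
    return sorted(p for p in root.glob("brown*/tagfiles/*") if p.is_file())
```

With `--file br-a01 br-a02 --key`, `input_files` returns the two named paths. Passing both --semcor-dir and --file makes argparse exit with a usage error.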


AUTHORS

Jason Michelizzi, <jmichelizzi at users.sourceforge.net>

Ted Pedersen, <tpederse at users.sourceforge.net>


COPYRIGHT AND LICENSE

Copyright (C) 2005 by Jason Michelizzi and Ted Pedersen

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.