
NAME - reformat SemCor files for use by

SYNOPSIS {--semcor-dir DIR | --file FILE [FILE ...]} [--key]

EXAMPLE --semcor-dir ~/semcor2.0


DESCRIPTION

This script reads a SemCor-formatted file and produces formatted text that can be used as input to another program. Alternatively, if the --key option is specified, the output also includes the sense number for each word, and this output can be used as a key file.

Several data sources are SemCor-formatted, including SemCor itself and the Senseval-2 and Senseval-3 all-words data sets. They have been made available for download by Rada Mihalcea.

Only words that are assigned valid sense numbers are passed through this program; all other words are discarded. In practice, this means that only open-class words that appear in WordNet are passed through. Closed-class words (pronouns, conjunctions, etc.) and other words not appearing in WordNet are discarded.
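The filtering rule above can be sketched in a few lines of Python. This is an illustration only, not the script's actual code. It assumes that each token is marked up as `<wf ATTR=VALUE ...>word</wf>`, that the sense number is carried in a `wnsn` attribute, and that "valid" means a single positive integer. The name `sense_tagged_words` and the sample line are hypothetical.

```python
import re

# Assumed token markup: <wf ATTR=VALUE ...>word</wf>
WF_TAG = re.compile(r"<wf\s+([^>]*)>([^<]*)</wf>")
ATTR = re.compile(r"(\w+)=(\S+)")


def sense_tagged_words(line):
    """Yield (word, sense_number) for tokens whose wnsn is a positive integer.

    Tokens with no wnsn attribute, or with a wnsn that is not a plain
    positive integer, are skipped. Treating those as "invalid" is an
    assumption made for this sketch.
    """
    for attr_text, word in WF_TAG.findall(line):
        attrs = dict(ATTR.findall(attr_text))
        wnsn = attrs.get("wnsn", "")
        if wnsn.isdigit() and int(wnsn) > 0:
            yield word, int(wnsn)


# Hypothetical sample line: two closed-class tokens without a sense
# number and two open-class tokens with one.
sample = (
    "<wf cmd=ignore pos=DT>The</wf>"
    "<wf cmd=done pos=NN lemma=jury wnsn=1 lexsn=1:14:00::>jury</wf>"
    "<wf cmd=done pos=VB lemma=say wnsn=1 lexsn=2:32:00::>said</wf>"
    "<wf cmd=ignore pos=IN>that</wf>"
)

kept = list(sense_tagged_words(sample))
print(kept)
```

Here "The" and "that" are dropped because they carry no sense number, while "jury" and "said" are kept together with their sense numbers.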


OPTIONS

--semcor-dir DIR

The location of the SemCor directory. This directory contains several sub-directories, including 'brown1' and 'brown2'. Do not specify these sub-directories; specify only the directory that contains them. For example, if /home/user/semcor2.0 contains the brown1 and brown2 directories, specify /home/user/semcor2.0 as the value of this option. Do not use this option together with the --file option.
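As a rough illustration of why the parent directory is the right value, the following Python sketch collects SemCor files from the brown1 and brown2 sub-directories of a given root. It is not taken from the script itself. The function name `semcor_files` is hypothetical, and the sketch assumes that the data files are named `br-*` (as in the --file example) and may sit at any depth below those sub-directories.

```python
from pathlib import Path


def semcor_files(semcor_dir, subdirs=("brown1", "brown2")):
    """Return sorted paths of br-* files found under the given sub-directories.

    Assumption: data files are named br-* and may be nested at any depth
    below brown1 and brown2. Sub-directories that do not exist are skipped.
    """
    root = Path(semcor_dir)
    found = []
    for name in subdirs:
        sub = root / name
        if sub.is_dir():
            found.extend(p for p in sub.rglob("br-*") if p.is_file())
    return sorted(found)
```

Calling `semcor_files("/home/user/semcor2.0")` would search both sub-directories, whereas passing `/home/user/semcor2.0/brown1` would find nothing, because that path has no brown1 or brown2 directory beneath it.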

--file FILE [FILE ...]

A SemCor-formatted file to process. This can be used instead of --semcor-dir to specify only a few SemCor files or to specify Senseval files. Multiple files can be given on the command line, for example:

--file br-a01 br-a02 br-k18 br-m02 br-r05

Do not use this option together with the --semcor-dir option.

--key

Generates a key file for use by the scorer2 program instead of the default formatted output. The scorer2 program can be used to measure the performance of a word sense disambiguation program. See that program's documentation for more information.
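The option rules described above (exactly one of --semcor-dir or --file, one or more file names after --file, and an optional --key flag) can be modelled with Python's argparse. This is only a sketch of the command-line contract given in the SYNOPSIS, not the script's own option handling. The function name `build_parser` is hypothetical.

```python
import argparse


def build_parser():
    """Build a parser mirroring the SYNOPSIS:
    {--semcor-dir DIR | --file FILE [FILE ...]} [--key]
    """
    parser = argparse.ArgumentParser(
        description="Hypothetical stand-in for the script's option handling"
    )
    # Exactly one input source must be given; the two cannot be combined.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--semcor-dir", metavar="DIR")
    source.add_argument("--file", metavar="FILE", nargs="+")
    # Optional switch selecting key-file output.
    parser.add_argument("--key", action="store_true")
    return parser


args = build_parser().parse_args(["--file", "br-a01", "br-a02", "--key"])
print(args.file, args.key, args.semcor_dir)
```

With this parser, passing both --semcor-dir and --file, or neither, makes argparse print a usage message and exit with an error, which matches the restriction stated for both options.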


AUTHORS

Jason Michelizzi, <jmichelizzi at>

Ted Pedersen, <tpederse at>


Copyright (C) 2005 by Jason Michelizzi and Ted Pedersen

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.