Audio::MFCC
Perl module for computing mel-frequency cepstral coefficients


NAME

Audio::MFCC - Perl module for computing mel-frequency cepstral coefficients


SYNOPSIS


  use Audio::MFCC;

  my $fe = Audio::MFCC->init(\%params);

  $fe->start_utt;

  my @ceps = $fe->process_utt($rawdata, $nsamps);

  my $leftover = $fe->end_utt;


DESCRIPTION

This module provides an interface to the Sphinx-II feature extraction library, which can be used to extract mel-frequency cepstral coefficients from audio data. These coefficients can then be passed to the Speech::Recognizer::SPX::uttproc_cepdata function.

You might find this useful if, for example, you wish to do the actual recognition on a different machine from the audio capture, and don't have the bandwidth to send a full stream of audio data over the network.

Currently, Sphinx-II also uses delta and double-delta cepstral vectors as input to its vector quantization module, but these values are calculated inside the recognizer's utterance processing module. In the future it may be possible to move the extraction of these features into the feature extraction library, or to use entirely different features as input (for example, LPC coefficients, though currently mel-scale cepstra give the best recognition performance).


INITIALIZATION


  my $fe = Audio::MFCC->init(\%params);

Initializes the parameters for feature extraction and returns an object which encapsulates the state of the extraction process.

The parameters are passed as a reference to a hash mapping parameter names to parameter values. Available parameters include:

sampling_rate
Sampling rate at which the audio data to be processed was captured, specified in samples per second.

frame_rate
Number of frames of data to be processed per second of sampled audio.

window_length
Size of the FFT window, in number of samples.

num_cepstra
Number of cepstral coefficients to compute.

num_filters
Number of filters to use for creating the mel-scale.

fft_size
Frame size for FFT analysis (must be a power of 2).

lower_filt_freq
Low end of filter band.

upper_filt_freq
High end of filter band.

pre_emphasis_alpha
Scaling factor for pre-emphasis of input audio data.

fb_type
This is documented for completeness, but you should never use it. It specifies the type of filter band to use in extraction - the options are (exportable constants) MEL_SCALE and LOG_LINEAR, but only MEL_SCALE is supported.
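
As a rough illustration of how these parameters relate to one another, the sketch below builds a parameter hash and checks two constraints before it would be handed to init. Only the key names come from the list above. The numeric values are assumptions picked for 16 kHz speech, not defaults documented by this module, and the Hz unit for the filter band edges and the check that fft_size is at least window_length are likewise assumptions.

```perl
use strict;
use warnings;

# Illustrative values only (assumed, not documented defaults of Audio::MFCC).
my %params = (
    sampling_rate      => 16000,   # samples per second
    frame_rate         => 100,     # frames per second
    window_length      => 410,     # samples per analysis window
    num_cepstra        => 13,
    num_filters        => 40,
    fft_size           => 512,     # must be a power of 2
    lower_filt_freq    => 130,     # Hz (assumed unit)
    upper_filt_freq    => 6800,    # Hz (assumed unit)
    pre_emphasis_alpha => 0.97,
);

# Number of new samples consumed per frame (the frame shift).
my $frame_shift = $params{sampling_rate} / $params{frame_rate};

# A positive integer is a power of 2 if it has exactly one bit set.
sub is_power_of_2 {
    my ($n) = @_;
    return $n > 0 && ($n & ($n - 1)) == 0;
}

die "fft_size must be a power of 2\n"
    unless is_power_of_2($params{fft_size});

# Assumption: the analysis window has to fit inside the FFT frame.
die "fft_size should be at least window_length\n"
    unless $params{fft_size} >= $params{window_length};

print "frame shift: $frame_shift samples\n";

# With the module installed, the hash would be passed by reference:
#   my $fe = Audio::MFCC->init(\%params);
```

With these values each frame advances by 160 samples, so consecutive 410-sample windows overlap.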


OBJECT METHODS

start_utt

  $fe->start_utt or die "start_utt failed";

Prepares the $fe object for cepstral extraction. If it fails (though I don't know why it would), it will return undef.

process_utt

  my @cepvectors = $fe->process_utt($rawdata, $nsamps);

Performs cepstral extraction on $nsamps samples of audio data from $rawdata. If any data is left over (under one frame), it will be carried over to the next call to process_utt, or analyzed and returned by end_utt.

Note that the audio data is currently always represented as a vector of 16-bit signed integers in native byte order.

Returns a list of array references, each of which points to the vector of cepstral coefficients extracted from one frame of data.
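
The following sketch shows one way to prepare $rawdata and $nsamps in the format described above. It assumes that $rawdata is a packed byte string, which this page does not state explicitly; the sine-wave test signal and its parameters are made up for illustration.

```perl
use strict;
use warnings;

# 10 ms of a 440 Hz sine at 16 kHz (values chosen for illustration).
my $rate    = 16000;
my $pi      = 4 * atan2(1, 1);
my @samples = map { int(10000 * sin(2 * $pi * 440 * $_ / $rate)) } 0 .. 159;

# The 's' template packs signed 16-bit integers in native byte order.
my $rawdata = pack('s*', @samples);
my $nsamps  = length($rawdata) / 2;   # two bytes per sample

print "$nsamps samples, ", length($rawdata), " bytes\n";

# Little-endian file data read on a big-endian host would need converting
# to native order first, for example (requires Perl 5.10 or later):
#   $rawdata = pack('s*', unpack('s<*', $file_bytes));

# With the module installed, the buffer would then be processed with:
#   my @cepvectors = $fe->process_utt($rawdata, $nsamps);
#   print join(' ', @$_), "\n" for @cepvectors;   # one line per frame
```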

end_utt

  my $leftover = $fe->end_utt;

Finishes the processing of utterance data. If there is any extra data remaining to be processed, it will be padded with zeroes to a single frame and cepstral extraction will be done, with the resulting vector returned (as an array reference). Otherwise, a false value is returned.
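
Putting the three methods together, a streaming loop might look like the sketch below. The extract_cepstra helper is hypothetical and not part of Audio::MFCC; it only relies on the start_utt, process_utt and end_utt behaviour described above, and the 4096-byte default chunk size and the utt.raw file name are arbitrary choices.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Audio::MFCC): read raw 16-bit audio from
# a filehandle in chunks and collect one cepstral vector per frame.
sub extract_cepstra {
    my ($fe, $fh, $chunk_bytes) = @_;
    $chunk_bytes ||= 4096;    # keep this even: two bytes per sample
    my @vectors;

    $fe->start_utt or die "start_utt failed";
    binmode $fh;
    while (1) {
        my $got = read($fh, my $buf, $chunk_bytes);
        die "read failed: $!" unless defined $got;
        last if $got == 0;
        # Partial frames are carried over inside $fe between calls.
        # A trailing odd byte, if any, is ignored by the sample count.
        push @vectors, $fe->process_utt($buf, int($got / 2));
    }

    # end_utt returns an array reference for the zero-padded final frame,
    # or a false value when nothing was left over.
    my $leftover = $fe->end_utt;
    push @vectors, $leftover if $leftover;
    return @vectors;
}

# Possible usage, with the module installed and a raw audio file at hand:
#   my $fe = Audio::MFCC->init(\%params);
#   open my $fh, '<', 'utt.raw' or die "utt.raw: $!";
#   my @ceps = extract_cepstra($fe, $fh);
```

Taking the extractor object as an argument keeps the loop independent of how the object was initialized, which also makes it easy to exercise with a stand-in object.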


AUTHOR

David Huggins-Daines <dhuggins@cs.cmu.edu>


SEE ALSO

perl(1), the Speech::Recognizer::SPX manpage
