
/var/sites/help-site.com/auto/tmp/CPAN/9677/Bio-Chaos-0.02/bin/cx-genbank2chaos.pl



NAME


  cx-genbank2chaos.pl


SYNOPSIS


  cx-genbank2chaos.pl sample-data/AE003734.gbk > AE003734.chaos.xml

  cx-genbank2chaos.pl -islands sample-data/AE003734.gbk


DESCRIPTION

Converts a GenBank file to a Chaos XML file (or, with -islands, to a collection of Chaos XML files).

The GenBank file is 'unflattened' in order to infer the relationships between features.
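To convert several files this way, the single-file invocation from the SYNOPSIS can be wrapped in a small shell loop. The following is only a sketch: it assumes the script is installed on your PATH as cx-genbank2chaos.pl and that input files end in .gbk. The helper names chaos_output_name and convert_all are made up for this example and are not part of the distribution.

```shell
# Derive the output name used in the SYNOPSIS example:
# sample-data/AE003734.gbk -> AE003734.chaos.xml
chaos_output_name() {
    base=${1##*/}                      # strip any leading directory
    printf '%s.chaos.xml\n' "${base%.gbk}"
}

# Convert each GenBank file given as an argument, one output file per input.
# Assumes cx-genbank2chaos.pl is on PATH (an assumption of this sketch).
convert_all() {
    for gbk in "$@"; do
        cx-genbank2chaos.pl "$gbk" > "$(chaos_output_name "$gbk")" || return 1
    done
}
```

Calling convert_all sample-data/*.gbk would then write one .chaos.xml file per input into the current directory and stop at the first failure.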

With the -islands option set, the script loops through a list of GenBank-formatted files and builds a Chaos file for every gene.

By default it stores each gene in a directory named after the sequence accession and names each file by the unique feature_id; for example:


  AE003644.2/

    gene:EMBLGenBankSwissProt:AE003644:128108:128179.xml

    gene:EMBLGenBankSwissProt:AE003644:128645:128716.xml

    gene:EMBLGenBankSwissProt:AE003644:128923:128994.xml

You can change the field used to name the file with -nameby; for example, if you use the chado/chaos name field like this:


  cx-genbank2chaos.pl -islands -nameby name AE003734.gbk

You will get


  AE003644.3/

   noc.xml

   osp.xml

   BG:DS07721.3.xml

The default is the feature_id field, which is usually more Unix-friendly (fly genes can have all kinds of unusual characters in their names). Using the 'name' field could also run into uniqueness issues, since two genes with the same name would map to the same file name.
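Both concerns can be checked on a list of candidate names before switching to -nameby name. The helpers below are a hedged sketch, not part of the tool: duplicate_names and has_awkward_chars are hypothetical names, and the set of characters treated as safe is this example's own choice (it allows the colons that appear in the default feature_id file names).

```shell
# Print every name that occurs more than once on standard input.
duplicate_names() {
    sort | uniq -d
}

# Succeed if a name contains characters outside a conservative set
# (letters, digits, '.', '_', ':' and '-'); the set is this sketch's choice.
has_awkward_chars() {
    case $1 in
        *[!A-Za-z0-9._:-]*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, a name containing parentheses, such as Su(var)3-9 (used here purely as an illustration), is flagged by has_awkward_chars because it would need quoting in a shell, while BG:DS07721.3 is not.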


HOW IT WORKS

  1. Parse GenBank to BioPerl
     uses the Bio::SeqIO::genbank manpage

  2. Unflatten the flat list of BioPerl SeqFeatures
     uses the Bio::SeqFeature::Tools::Unflattener manpage

  3. Turn BioPerl objects into a Chaos data structure
     uses the Bio::SeqIO::chaos manpage

  4. Remap every gene to an 'island' (virtual contig)
     uses the Bio::Chaos::ChaosGraph manpage

  5. Write each virtual contig Chaos graph to a file
     uses the Bio::Chaos::ChaosGraph manpage


ARGUMENTS

-islands

Exports one file per gene.

-ethresh ERRORTHRESH

Sets the error threshold; see the Bio::SeqFeature::Tools::Unflattener manpage.

You will want to keep this at its default setting of 3 (insensitive).

-remove_type GENBANKFEATURETYPE

Removes all features of the given type prior to unflattening.

This is useful if you wish to exclude a certain kind of feature (e.g. variation) from your analysis.

It is also required for the GenBank release of S_Pombe, which has a few scattered features, purportedly of type mRNA, that confuse the unflattening process.
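Before choosing a value for -remove_type, it can help to see which feature types a file actually contains. The sketch below counts feature keys with awk; it relies on the GenBank flat-file layout, in which feature keys start in column 6 of the FEATURES table and qualifier lines are indented further. The function name genbank_feature_types is hypothetical and not part of the distribution.

```shell
# Count the feature keys (gene, mRNA, variation, ...) in a GenBank flat file.
# Prints one "COUNT KEY" line per feature type, sorted by key.
genbank_feature_types() {
    awk '
        /^FEATURES/              { in_features = 1; next }
        in_features && /^[^ ]/   { in_features = 0 }
        in_features && /^     [^ ]/ { count[$1]++ }
        END { for (key in count) print count[key], key }
    ' "$1" | sort -k2
}
```

A type listed in the output, for example variation, could then be passed as -remove_type variation.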

-ds_root DIR (EXPERIMENTAL)

Root directory for building a datastore; see the Datastore::MD5 manpage.

-include_haplotypes

By default, only reference sequences are exported. If the GenBank definition line contains the string "haplotype", then the record is probably an alternative haplotype that will skew analyses. Such records are removed by default, unless this switch is set.

For an example, see contigs NG_002432 and NT_007592 (the former is an alternate haplotype of the latter).
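To see in advance which input files would be affected, the DEFINITION field can be inspected directly. The following sketch is an approximation of the rule described above, not the script's own code: it reads only the first record in a file, joins continuation lines, and matches "haplotype" case-sensitively (whether the script itself is case-sensitive is not stated here). The function names are hypothetical.

```shell
# Print the DEFINITION field of the first record in a GenBank flat file,
# joining indented continuation lines into a single line.
genbank_definition() {
    awk '
        /^DEFINITION/ { in_def = 1; sub(/^DEFINITION[ \t]*/, ""); text = $0; next }
        in_def && /^[ \t]/ { sub(/^[ \t]+/, ""); text = text " " $0; next }
        in_def { print text; printed = 1; exit }
        END { if (in_def && !printed) print text }
    ' "$1"
}

# Succeed if the definition mentions "haplotype".
looks_like_haplotype() {
    genbank_definition "$1" | grep -q 'haplotype'
}
```

A loop such as for f in *.gbk; do looks_like_haplotype "$f" && echo "$f"; done would list the files that look like alternative haplotypes.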


REQUIREMENTS

BioPerl 1.5 or later
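A quick way to confirm that the modules named under HOW IT WORKS are installed is to ask perl to load each one. This is a minimal sketch: missing_perl_modules is a hypothetical helper, and it checks only that a module loads, not which BioPerl version is present.

```shell
# Print each Perl module named as an argument that fails to load.
missing_perl_modules() {
    for module in "$@"; do
        perl -M"$module" -e 1 >/dev/null 2>&1 || printf '%s\n' "$module"
    done
}
```

For example, missing_perl_modules Bio::SeqIO Bio::SeqFeature::Tools::Unflattener Bio::Chaos::ChaosGraph prints nothing when all three modules can be loaded.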
