Various REGEX annotators
Author: Mike Marchywka
Contact: marchywka@hotmail.com
This set of programs was originally designed to examine speculative
or missing annotations on Affymetrix canine array data. I have since
added and generalized a number of features that may be of use for automated
genome searches or string processing. The primary output right now
is an annotation file that records rule hits (generally regular expressions)
against a variety of strings (proteins and base sequences) and displays
the results in human-readable formats (html, txt, bmp), although the eventual
audience is automated processes. There is also a fairly fast aligner based on
recursive exact string matching that may be related to research topics (see
citations below). It has not been fully developed, but gave possibly
competitive results aligning E. coli strains on a desktop computer.
See individual programs for credits: the source code contains citations
and most programs have an "-about" option for additional credits.
This code was developed on Cygwin and makes some use of other material
that is in the public domain or licensed under various "free for personal usage"
terms. Check the specifics before commercial usage.
Some code contains comments about "confidential" or NDA. These may
or may not apply but are intended to preserve any rights that may exist.
All of the programs and scripts take blast results or fasta files
(or NCBI nucleotide search results parsed into fasta files).
The most common output is text "annotation files" that describe
the position and details of features, determined from rules files,
in the various sequences. The final product can be a text or
graphical alignment or, soon, more descriptive comparisons
between homologues.
Unfortunately, as of now, there is a lot of dead code because obsolete
methods and programs were retained for reference.
User scripts: these may invoke others that would not normally be
visible to the user.
fastanote
fastanote.doc
findhomologues
findhomologues.doc
genome_search
I copied these out of the make file and believe them to be most
of the executables as of today:
make file_parsing
Create fasta or multi-fasta files from various sources.
make rules_annotater
make boost_rules_annotater
Take a set of fasta files and various rules files and output
annotation vector files. Many rules are in PERL regex form
and can be parsed with either greta or boost. Specialized
faster algorithms are used if a set of rules complies
with certain restrictions.
make mm_align_tool
make grannotater
Takes a variety of rules text files as well as fasta files and
makes a bitmap or jpeg ( not tested) or text alignment file
showing relative positions and rule hits.
make yaxml
The user can enter new rules based on extraction from the literature
and retain documentation such as publication or other notes in
XML format. This program, yet another xml, will parse these
into rules files for annotation.
make distance
make hitparse
make stringiness
make bruteparse
make string_correlator
make numgen
make probe_align
==============================================================================
help output and some doc file excerpts:
##########################################
Program help for annotater.exe:
########\n-about output if available :
Usage: annotater [ -flags n] [-orf file ] {-annotate fasta| }
Annotate a base sequence fasta file with ORF,
blast hits, or other results. stdin contains
hits to target sequence and repeats data that are
displayed in lower case but not listed
Commands are processed left to right as encountered
Model building commands:
-peptide : the guide is a peptide, not a DNA sequence.
Requires a coloring table for graphic output.
-orf : the following file contains ORF's derived from probe_align.
Each line contains start, stop, translation, and | delimited ID
Indices start at zero: 0 5 AB |ID|long name of thing
stdin file is same format but only indices are used
-blast : as above except that index base is 1 as per blast
Note blast output may require dos2unix for readability
-owrite : add to stdin additional overwrite info, likely from
'string_correlator -annotate' - NOTE: THIS HAS ASSUMED BASE OF ZERO WHILE
STDIN NOTES are ASSUMED TO BE BLAST INPUT OF BASE 1
-rules file : add next set of rules hits to model, assigned in order of
occurrence from left to right
-mrules files : multiple rules from one file, checked against
names in previously loaded fasta sequences
-misc_bases fn nm : read fn for annotations labelled nm
-homo fn nm : as above, but presumed to be homologs
NOTE: misc are ASSUMED to start at 0, HOMO's at 1 from blast
-contig fn : read a series of contigs from a multi-fasta file
Example: fastanote -show ex_fasta -repeat_max 10000
-owrite probe_notes -repeat_short 10 -repeat_dump 0 -width 130 -banner -xlate -inter
string_correlator -annotate ../dog_fasta ex_fasta 10 >probe_notes
Output options and parameters:
-annotate fn: annotate the fasta file in a text format
-html out: annotate a PRE-LOADED fasta into html file 'out'
-fasta fn : load fasta file fn
NOTE: THE STUPID CONVENTIONS CHANGE- HTML USED PRE-LOADED FASTA...
-guide_flags n : set guide label flags, see source code :)
-guide_top: put guide names at top
-guide_inter: guide names on each line(default)
-nguide_top:
-nguide_inter:
-user_folors fn: load a list of color overrides in format n,r,g,b
where n is toc serial number ( data dependent) and 0<=rgb<256
-jpeg : not supported, use bmp
-bmp : write pictorial alignment as a bmp file
-comment: add a comment string to image
-acid_rank f : load list of ranking schemes for acids
-acid_map n : which string to use
-reps_per_seg: maximum repeats per seg, keeps image tractable
-peptides_per_seg: as above for translation candidates
-flags : set option specific modifiers
-width : set output width, default is 72
-seg i,f : select segments to output instead of whole thing
-ref r1,r2,... : show only these references
-banner : output a section banner for each segment
-xlate : show translation products in top 3 lines
-limit n: limit blast and orf files to n entries for all following
reset to zero to stop limiting
The repeats options are a bit confusing because the blast metrics
and databases both have problems. Default is to only show
visible hits when the smallest are used first. Larger hits that
in aggregate cover all bases of smaller hits eliminate the smaller
hits from the list. Neither blast nor the database can do this cleanly AFAIK.
If you use details, then play with other options to get useful output.
-repeat_details : output hidden or redundant repeats
-repeat_max : most repeats to note per line, default is 10
-repeat_short : threshold number to abbreviate, default is 5
-repeat_minlen : minimum length to mention, default is 10
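One possible reading of the repeat filtering described above (smaller hits dropped when larger hits jointly cover every base of them) can be sketched in Python. This is only an illustration and not the annotater source: the function name `visible_hits` and the half-open `(start, end)` interval convention are assumptions made for the example.

```python
def visible_hits(hits):
    """Keep hits that are NOT fully covered by the union of strictly longer hits.

    hits: list of (start, end) half-open intervals (an assumed convention).
    """
    kept = []
    for start, end in hits:
        length = end - start
        # Only strictly longer hits are allowed to hide this one.
        larger = sorted(h for h in hits if h[1] - h[0] > length)
        pos = start  # first base not yet known to be covered
        for s, e in larger:
            if s > pos:
                break  # gap at pos: sorted order means nothing later covers it
            pos = max(pos, e)
            if pos >= end:
                break
        if pos < end:
            kept.append((start, end))
    return kept


if __name__ == "__main__":
    # (3, 8) is covered by (0, 6) and (4, 10) together, so it is dropped.
    print(visible_hits([(0, 6), (4, 10), (3, 8)]))
```

Something like -repeat_details would then correspond to also reporting the dropped hits, although that mapping is a guess.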
Debug and Informational:
-v : verbose- tutorial and debug
-q : quiet, can be used after -v around file of interest
-about : credit, version, and implementation information
Contact: marchywka@hotmail.com
JPEG version uses code from various sources
Bitmap fonts downloaded from example opengl program
and Cygwin X pcf font or Freetype
annotater.h224 Built on Mar 8 2008 at 09:19:05
Credit: Written by marchywka@hotmail.com ca. Aug 2007
SeeAlso: fastanote perl script by same author for usage ideas
Credit: acid ranks from http://www.expasy.org/tools/protscale.html
Credit: Default fonts from http://www.dgp.toronto.edu/~mjmcguff/learn/opengl/
font_reader.h72
http://fontforge.sourceforge.net/pcf-format.html
http://www-masu.ist.osaka-u.ac.jp/~kakugawa/VFlib/src-browse/VFlib3-3.6.12/HTML/S/src%20pcf.c.html
Credit: Freetype, Cygwin X pcf fonts and related code
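The annotater help above describes annotation lines of the form "start, stop, translation, and | delimited ID", zero-based for -orf input and one-based for -blast input. A minimal Python sketch of reading such lines and normalising the index base follows. The names `Hit` and `parse_hit` are hypothetical, and whether `stop` is inclusive is not stated in the help text, so the sketch leaves it untouched apart from the base shift.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    start: int
    stop: int
    translation: str
    ident: str


def parse_hit(line, base=0):
    """Parse 'start stop translation |ID|long name' into a zero-based Hit.

    base: index base of the input, 0 for -orf style files and 1 for
    blast-derived files (per the help text; everything else is assumed).
    """
    fields = line.split(None, 3)  # the ID field may itself contain spaces
    start = int(fields[0]) - base
    stop = int(fields[1]) - base
    translation = fields[2] if len(fields) > 2 else ""
    ident = fields[3].strip() if len(fields) > 3 else ""
    return Hit(start, stop, translation, ident)


if __name__ == "__main__":
    print(parse_hit("0 5 AB |ID|long name of thing"))
    print(parse_hit("1 6 AB |ID|long name of thing", base=1))
```

Converting everything to one base on input is a way to avoid the mixed-base pitfall that the -owrite and -homo notes warn about.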
##########################################
Program help for bruteparse.exe:
########\n-about output if available :
Usage : bruteparse -probes n [-flags flags] [-maxline n]
Read composite file from STDIN -
This will normally be dog info followed by brute_force
-probes : default is 11 probes, can be changed
-maxline : stupid line length max for composite line
-flags : IO and error control bits
1=suppress output, 2=suppress errors, 4= suppress summary
8=suppress zero line error 16=sort 32=invoke distance
64=die on error
ERROR: Zero lines of output
Summary :0 0 0 0 Skip:0 Start:0 Len:0 loQ:0 Out:0 Dups:0
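The -flags value for bruteparse is a sum of the bit values listed in the help above. A small Python sketch of composing and decoding such a value is shown here. The member names are invented for readability, and bit 32 is assumed to refer to the distance program.

```python
from enum import IntFlag


class BruteFlags(IntFlag):
    # Bit values are taken from the help text; the names are illustrative only.
    SUPPRESS_OUTPUT = 1
    SUPPRESS_ERRORS = 2
    SUPPRESS_SUMMARY = 4
    SUPPRESS_ZERO_LINE_ERROR = 8
    SORT = 16
    INVOKE_DISTANCE = 32
    DIE_ON_ERROR = 64


def describe(value):
    """List the names of the bits set in a numeric -flags value."""
    return [flag.name for flag in BruteFlags if flag & value]


if __name__ == "__main__":
    wanted = BruteFlags.SUPPRESS_ERRORS | BruteFlags.SORT
    print(int(wanted))    # the number to pass after -flags
    print(describe(18))
```

The same bits, apart from 64, are listed for hitparse, so the decoding would apply there as well under the same assumptions.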
##########################################
Program help for distance_matrix.exe:
########\n-about output if available :
Usage : distance_matrix dummy [-flags flags][-status flags]
Compare order string in each line to first line
and output each input line with distance info appended
Needs dummy arg to stop this message :)
-size : default is 11 probes, can be changed
-maxline : stupid line length max, possible string size limit
-verbose : output various checks and concerns
{ccccccccccc 11}#zzzzzzzzzzz#
##########################################
Program help for fasta_editor.exe:
########\n-about output if available :
Usage: fasta_editor [ -dest fn] [-mrules fn ] -do_edit inputfile }
Remove uninteresting or redundant fastas from file of indefinite size.
Removing nuisance or spurs also reduces size for size-limited readers
-dest fn: name of output file, can't go to stdout
-mrules fn : name of multirules file to read.
-criterion flags: bit 0-> require uniq, bit 1->require a rule hit if exists
2-> require a hit, truncate trailing files
-drop regex : exclude names containing PERL regex
-add regex : include names containing PERL regex
-addno n1,n2,n3 : add or drop (-n) specific list of serial numbers .
-range n1,n2,n3 : stop at n1, restart at n2, etc.
-do_edit inputfile: source file to edit
-v : enable verbose mode
-q : restore quiet mode
-about : brag about Mike Marchywka
fasta_editor.cpp135 Built on Mar 8 2008 at 09:16:04
Credit: Written by marchywka@hotmail.com ca. Aug 2007
Credit: Microsoft GRETA for regex,
Ref: http://research.microsoft.com/projects/greta/
SeeAlso: fastanote perl script by same author for usage ideas
##########################################
Program help for file_parsing.exe:
########\n-about output if available :
Usage : file_parsing file [-flags flags][-status flags]
Output a one-liner for each SEC document entry in file
file normally comes from secfulltext script that outputs
rendered html from sec full text search result
-flags : set flag for each field to be output
-defname : add DEFINITION line to name field
-status : set flag for each status field, updated each 10 entries
Flag bits are :
##########################################
Program help for hitparse.exe:
########\n-about output if available :
Usage : hitparse dummy [-flags flags][-probes n] [-maxline n]
Compare order string in each line to first line
and output each input line with distance info appended
Needs dummy arg to stop this message :)
-probes : default is 11 probes, can be changed
-maxline : stupid line length max for composite line
-flags : IO and error control bits
1=suppress output, 2=suppress errors, 4= suppress summary
8=suppress zero line error 16=sort 32=invoke distance
Seem to have blank line:1:
Seem to have blank line:2:
ERROR: Zero lines of output
Summary :0 0 0 0 linesout=0 skipped=0
##########################################
Program help for mm_align_tool.exe:
########\n-about output if available :
Usage: mm_align_tool
Align a pair of sequences using features identified,
from a rule check of the two sequences.
Commands are processed left to right as encountered
-rules file: load the next rules file. Successive rules
are applied as encountered
-mrules file: load all rules from one file. Names checked against
a previously loaded fasta for consistency.
-merge_mrules file: append rules starting from zero to previously read rules
-mrules would assume these relate to NEW fastas.
-pair_rules file: load feature pairs/anchors hints from string_test
-sort_rules : speeds up later alignment if rules are badly out of order.
-stats : show rule usage by rule set
-uniques type: filter rules to show various subsets
-use_rule n: specify rule number to use for alignment
-add_rule n: push rule n onto rule map for string align
-clear_rules: empty rule map for string align
-ref n: use sequence n as the reference to align to
-sample n: align sequence n to the reference
-fastas file: the fasta file containing sequences to align.
These are NOT used for alignment but only for
output display.
-marked/no_marks :subsequent fastas may or may not be marked up
-nocat : do not concatenate adjacent rule hits.
By default, adjacent hits with same offset are assumed
to imply alignment of intervening sequences but it may be
desirable to only see the hits.
-align : perform alignment, output only diagnostics
until -output is encountered
-coverage type: output rules hits count for each location.
Use rule map or all rules if map is empty. type can be
raw( location and hit count), zero (annotation vectors for empty
areas), hits ( annotes for covered areas)
-cover_limit n : minimum sqrt(c)*len for rule to include
-string_align n :(*) align rule hit offset strings at least n rules long.
-rare_align n: create align features based on long or rare rule hits
with rare defined as fewer than n occurrences in sample
-pair_align n : use pair features for alignment, not confined to pair-wise alignment
however as they are nice for multi-seq consistency checks
-refine x: choose refinement options for pair_align
x= -1(disable), 0(default, divide len by 2 each iteration)
-pair_params { uniq,motif,both} : output either the uniq suitable for alignment,
the motifs or both ( default )
-cluster {} : find and output clustering info, option required but not used
-doall : compare all samples to all refs
-doone : only align one ref to one sample
-all_samples : compare all samples to one ref
-one_samples : negate above option
-output format: output a previously performed alignment
format is "exclustal","fasta", "raw", "notes", or "text"
-flags : set option specific modifiers
Debug and Informational:
-v : verbose- tutorial and debug
-debug : more verbose than above
-q : quiet, can be used after -v around file of interest
-about : credit, version, and implementation information
Contact: marchywka@hotmail.com
mm_align_tool.h142 Built on Mar 8 2008 at 09:10:57
Credit: Written by marchywka@hotmail.com ca. Aug 2007
Note: See string_align.cpp for details. This adds features for
Note: extracting low-ambiguity alignment information from raw string hits.
Note: Most of this I made up myself but is probably analyzed in various forms
Note: in pubs at http://www.google.com/search?q=site%3Aciteseer.ist.psu.edu
Note: There is a lot of unexploited flexibility here as strings indicate
Note: either matches of unique features or repetitive junk based on frequency
Note: The indexing process automatically makes stats and filtering easy
Note: See for example progressive alignment, ca 1996
Ref: http://citeseer.ist.psu.edu/myers96progressive.html
Comment: Lots of speed tricks to compile or index expressions
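The -coverage option above reports a rule-hit count for each location ("raw"), the empty areas ("zero") and the covered areas ("hits"). The idea can be sketched in Python as follows. The hit representation as half-open `(start, end)` pairs and both function names are assumptions for illustration, and the actual mm_align_tool output format is not reproduced.

```python
def coverage_counts(hits, seq_len):
    """Count how many rule hits cover each position of a sequence."""
    counts = [0] * seq_len
    for start, end in hits:
        for i in range(max(start, 0), min(end, seq_len)):
            counts[i] += 1
    return counts


def runs(counts, covered):
    """Return half-open (start, end) runs that are covered (or uncovered)."""
    out = []
    run_start = None
    for i, c in enumerate(counts):
        if (c > 0) == covered:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            out.append((run_start, i))
            run_start = None
    if run_start is not None:
        out.append((run_start, len(counts)))
    return out


if __name__ == "__main__":
    counts = coverage_counts([(2, 5), (4, 8)], 10)
    print(counts)               # analogous to "raw"
    print(runs(counts, False))  # analogous to "zero"
    print(runs(counts, True))   # analogous to "hits"
```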
##########################################
Program help for numgen.exe:
########\n-about output if available :
Usage: numgen [ -flags n] [-v,-q] [-prop file ] {-make file }
Generate constrained random numbers for monte
carlo scripts. First used with chromo repeats scripts.
Use with -v to get prompts for properties file.
Basic requirements are N, number of points to generate
M, number of fields per point
mmaxM is maximum size for field M
blockszM is block size for field M
groupM is the group to which field M belongs
Output file is the group numbers followed by
random numbers as start position for each entry
Numbers should not point to overlapping blocks for
fields belonging to same group
-v,-q: verbose or quiet execute left to right
-prop : name of property file to read
-make : put points into output file
-about : show version and credit info
numgen.cpp 60
Credit: Written by marchywka@hotmail.com ca. Aug 2007
Ref: numgens bash script by same author
SeeAlso: repeats bash script by same author for usage ideas
Ref: http://msdn2.microsoft.com/en-us/library/398ax69y(VS.71).aspx
##########################################
Program help for probe_align.exe:
########\n-about output if available :
elapse times= 0 0 0 0
target size is 0 end at 0
No hits found, did you have a lower case file?
##########################################
Program help for rules_annotater.exe:
########\n-about output if available :
Usage: rules_annotater
Annotate a DNA sequence based on a regex rule list
optionally annotate translations products, optionally
based on transcript splicing rule lists.
-string : load string for compares
-fastas : load multi-fasta file as source
-regex : output annotations against regex
-greta : use MS greta for regex ( default)
-boost : use boost regex ( some builds)
-rules : output annotations against regex
-edit_rules : for translated annotation, pick
option splicing rules for transcript
-xrules rules : translated annotations against regex
-erules rules : apply rules to untranslated FASTAS but with non-alpha
removed- MUST USE THIS ON MARKED UP FASTAS!!!
-proc : process rna and report results
-abbrv : abbreviate above results on long sequences
-edit : apply edit rules before doing xrules
default is just do 3 fwd continuous frames
-trhits : output the edited transcripts as hits
-splice : ignore begin/end on rules(default)
-nosplice : check begin/end(^$) on rules
-which n : select the fasta to examine
-doall : apply to each entry put name into file
-clean : remove confusing hits from greta output
including negative or overlaps
-const : assume next rules are literal
-nconst : undo above assumption for righter things
-compile : compile next rules - LIMITED repertoire
-ncompile : NOT IMPLEMENTED
-rc : experimental rev-compl matches [ ] syntax
-nrc: turn off rc option
-output_trans : output transcript pieces as annotations
All translations use a codon file in the format of codon-acid pairs:
# nAn
TAT Y
TAC Y
TAA *
-q : quiet, default
-v : progress to stderr
-debug : lots of stuff to stderr
-status : progress stuff to stderr
-about : show version and credit info
Contact: marchywka@hotmail.com
rules_annotater.h50
Credit: Written by marchywka@hotmail.com ca. Sept 2007
Note: std::string append is very slow, uses low-level code
Credit: http://research.microsoft.com/projects/greta/default.aspx
Credit: http://www.boost.org ( some builds )
Note: the reverse complement regex is similar to that in rnamotif,
Ref:http://www.scripps.edu/case/rnamotif-3.0.4.tar.gz
Note: good source of peptide rules,
Credit: ftp://ftp.expasy.org/databases/prosite/prosite.dat
Note: miRNA rules source,
Credit:http://microrna.sanger.ac.uk/sequences/ftp.shtml
Cite: miRBase: microRNA sequences, targets and gene nomenclature.
Griffiths-Jones S, Grocock RJ, van Dongen S, Bateman A, Enright AJ.
NAR, 2006, 34, Database Issue, D140-D144
Ref:http://www.rnajournal.org/cgi/content/full/9/3/277
Note: Using amino acids to find translation initiation sites:
Ref:http://citeseer.ist.psu.edu/718129.html
Note:Splicing rules concepts, etc
Ref:http://citeseer.ist.psu.edu/zhang98statistical.html
Ref:http://citeseer.ist.psu.edu/391384.html
Note: TFBS pattern sources
Credit:http://lgsun.grc.nia.nih.gov/geneindex/mm6/download/TFpattern.txt
Credit:http://transcriptome.affymetrix.com/download/publication/tfbs_data/seq/
Note: promoter paper and fasta files:
Ref:http://genomebiology.com/2007/8/7/R140
Note: Pseudobase various RNA patterns:
Ref:http://biology.leidenuniv.nl/~batenburg/PKBAbout.html?
Note: misc references to bio stuff,
Ref:http://teosinte.wisc.edu/gen466_files/Lect19TransEuk.pdf
Ref:http://www.cs.rit.edu/~rpj/courses/bic2/lectures/Gene_Finding_Euk.ppt
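The rules_annotater help above says translations use a codon file of codon-acid pairs and that the default is three forward continuous frames. A minimal Python sketch of that idea is given below. It is not the program's code: the function names are hypothetical, lines starting with "#" are assumed to be comments, and unknown codons are rendered as "X" purely as a convention for the example.

```python
def load_codons(text):
    """Build a codon -> amino acid table from 'TAT Y' style lines."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # assumption: '#' introduces a comment line
        parts = line.split()
        if len(parts) >= 2 and len(parts[0]) == 3:
            table[parts[0].upper()] = parts[1]
    return table


def translate_frames(seq, table, unknown="X"):
    """Translate the three forward reading frames of a base sequence."""
    seq = seq.upper()
    frames = []
    for offset in range(3):
        acids = [
            table.get(seq[i:i + 3], unknown)
            for i in range(offset, len(seq) - 2, 3)
        ]
        frames.append("".join(acids))
    return frames


if __name__ == "__main__":
    codon_text = "# nAn\nTAT Y\nTAC Y\nTAA *\n"
    table = load_codons(codon_text)
    print(translate_frames("TATTACTAA", table))
```

With only the three codons shown in the help excerpt, frames 2 and 3 of the demo sequence consist entirely of unknown codons. A complete codon file would be needed for real use.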
##########################################
Program help for string_correlator.exe:
########\n-about output if available :
Usage : string_correlator [-flags n ] [-fix n] [-start1 n ][-start2 n ] [ ]
[-annotate fsmall fbig n ]
Find strings common to two operands or find a best alignment
by recursively finding longest matches between operand remnants.
Somewhat similar to clustalw but gapping is cheaper.
Operands can be strings, fasta files, and multi-fasta files
Some features are bio specific but correlation is generic.
c.f. You can use -wordn |sort|uniq -c for text analysis .
-fix : set bits to reverse(|1) and compl(|2).
-start : base at which to start translation of param 1 and 2.
-compare: 2 fasta files ,!(flags&2)->translate using start and fix params
use clustal format, !(flags&1)->output {} format
-substring: fasta_file, first, last: output sequence from first(start at 0)
upto and including last ( 0,1 should output 2 things)
-vector : fasta file names with vectors from word list
May create zombie exe when built with optimizations
-dump_strings : read fastas file, output each as name and string per line
-mcompare: a multifasta file, next param is single fasta to compare
output {} format string for each file
-word_lists: two files containing a list of words,
each word in f1 aligned to each
word in f2. (flags&1)->non-matches are also reported.
-fcompare: compare 2 fasta files in {} format, no xlate or clustal output
-annotate: find words of length>n in 2 fasta files, output in annotation format:
start,end,string, name - position of string from f1 in file 1
for use with annotater.exe on second file
-rcccompare: as above but rev-comp second file
-words: find common words in 2 strings at least 4 chars long in order
-wordn: as above but 3rd param is cutoff length
-strings: compare 2 string, flag: &1->{}format,&2->clustalw
-translate: translate a fasta file in all frames, 1/line
-motif: read a multi-fasta file and find words common to entries of length>=n
-motif_agct f n : find words of length n or more in f common to 2 entries
Use a search specialized to DNA
-motif_agct_fixed f n flag: as above, but apply fix to first entry
Fix flag bits are as with -fix reverse(1) and compl(2)
-speculate: multi-fasta protein file compared to rna fasta file. translate latter in
frames and report common words of length >= n
-jfcompare : compare two fasta files and write matrix to jpeg
Contact:
marchywka@hotmail.com; No additional credits as this is
dirt level c string processing with some inspiration from clustalw
JPEG code was obtained from various public sources
Segmentation fault (core dumped)
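The string_correlator usage text describes finding "a best alignment by recursively finding longest matches between operand remnants". The general technique can be sketched in Python as below. This is an illustrative reconstruction only, using a simple dynamic-programming longest common substring. The real program is C string code whose details are not shown here, and the function names and `min_len` parameter are assumptions.

```python
def longest_common_substring(a, b):
    """Return (i, j, n) with a[i:i+n] == b[j:j+n] and n maximal."""
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the match ending here
                if cur[j] > best[2]:
                    best = (i - cur[j], j - cur[j], cur[j])
        prev = cur
    return best


def recursive_align(a, b, min_len=1, a_off=0, b_off=0):
    """Anchor on the longest exact match, then recurse on both remnants.

    Returns a list of (pos_in_a, pos_in_b, length) blocks in order.
    """
    i, j, n = longest_common_substring(a, b)
    if n == 0 or n < min_len:
        return []
    left = recursive_align(a[:i], b[:j], min_len, a_off, b_off)
    right = recursive_align(a[i + n:], b[j + n:], min_len,
                            a_off + i + n, b_off + j + n)
    return left + [(a_off + i, b_off + j, n)] + right


if __name__ == "__main__":
    print(recursive_align("ACGTTTGACC", "ACGAATGACC"))
```

Everything between consecutive blocks is implicitly a gap or mismatch region, which matches the remark that gapping is cheap. The quadratic search used here is only suitable for short operands.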
##########################################
Program help for string_test.exe:
########\n-about output if available :
Usage: string_test
Perform filtered exhaustive search for common strings in various operands.
Operands can be command line strings or fasta files and output is exact
sequences common to both strings longer than a minimum length
Output formats are designed to work with other text tools.
This was supposed to be a test program and lacks iterator style
and iteration specified by cmd eg -fcompare_ALL
Note that args are processed l-to-r
-o1 s: String s is operand 1
-o2 s: String s is operand 2
-fastas f: Read fasta file f for operands
-ref n : use sequence n as operand 1 for later compares
-sample n: as above for operand 2
-compare : do a compare on the command line operands
( yes, this should figure out if you meant fasta or cl ops
but this is only a test program... )
-fcompare : compare fasta operands
-fcompare_all: do all fasta compares except self-against-self
-make_rules : look for self matches or RC, usually output in rule form
-gate_min : min sep gap for self rule hits
-gate_max : as above, maximum distance to output
-gate_p : max prob of chance match, ~1-(1-.25^l)^d
This means for n fastas you have n(n-1)/2 results
-features_all : output unique strings for all samples of specified length.
-fix flags: fix various things such as reverse or complement sample,
remove redundant strings, or enable output ref/sample labelling
enum{FIXCOMP=1, FIXREV=2,FIXCLEAN=4,FIXLABEL=8}
-filterN : remove strings starting with N - don't use on nonbase text
-nofilterN : turnoff above option on following compares
-filterID : if operands point to same string, don't check diagonals
Note: -fix and -filter commands use different user interface approach
for no good reason, should make consistent.
-output n : set output format , further modified by -fix labelling
Output is one hit per line with o1 and o2 positions, length and
contents optional
n=0 ( default ): position1, position2, string1, string2
( yes, strings should match, this is redundant )
n=1: position1 position2 length
n=2: position1 position2 length string1 string2
n=3: p1 p2 len delta misc misc string - for RC/ align info
n=4: p1 len string >self|1| etc - for rule output
-once : only output first of repeat rule hits
-q,-v,-debug : normal verbosity control for stderr commentary
-about : additional references, citations and imp notes
Example:
string_test -fastas gene_fastas -index 8 -length 25 -fix 12 -output 3 -filterN -filterID -status -fcompare_all > anchor_hits
The above would generate anchor_hits suitable for use with mm_align_tool
Author: marchywka@hotmail.com
Contact: marchywka@hotmail.com Nov 2007
Comment: uses some indexing to get speed up,
Comment: motivation for RC rules from this etc ,
Ref:http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1431710
Comment: and should work well on text or (modified slightly ) binary code too
Note: More code in mm_align_tool
Note: Based loosely on references such as these but 'common sense'
Note: seemed to work well as these are after-the-fact lookups
Ref: http://www.google.com/search?hl=en&safe=off&q=string+alignment+site%3Aciteseer.ist.psu.edu
Ref: http://citeseer.ist.psu.edu/csuros05rapid.html
Comment: Csuros, M., Ma, B.: Rapid homology search with two-stage extension and
Comment: daughter seeds. In: Proc. 11th Int. Computing and Combinatorics Conf. (COCOON).
Comment: Volume 3595 of LNCS., Springer-Verlag (2005) 104-- 114
Ref: http://citeseer.ist.psu.edu/468459.html
Ref: http://citeseer.ist.psu.edu/kahveci04speeding.html
Mar 8 2008 09:13:44 string_test.h204
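Three numeric details from the string_test help above can be checked with a short Python sketch: the chance-match estimate given for -gate_p, ~1-(1-.25^l)^d, the n(n-1)/2 result count for comparing all fastas, and the -fix bit values from the enum. Reading l as the match length and d as the separation distance is an assumption, as are the function names.

```python
from enum import IntFlag


class Fix(IntFlag):
    # Values copied from the enum quoted in the help text.
    FIXCOMP = 1
    FIXREV = 2
    FIXCLEAN = 4
    FIXLABEL = 8


def chance_match_probability(length, distance):
    """Estimate 1-(1-0.25^l)^d: the chance that a word of `length` bases
    recurs by accident somewhere within `distance` positions (assumed reading)."""
    return 1.0 - (1.0 - 0.25 ** length) ** distance


def pairwise_results(n):
    """Number of comparisons among n fastas, excluding self-against-self."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    print(chance_match_probability(10, 1000))
    print(pairwise_results(5))
    print([f.name for f in Fix if f & 12])  # decoding -fix 12 from the example
```

Under this reading, the `-fix 12` in the example command enables FIXCLEAN and FIXLABEL without reversing or complementing the sample.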
##########################################
Program help for stringiness.exe:
########\n-about output if available :
supply 2 strings for comparison
##########################################
Program help for table_tool.exe:
########\n-about output if available :
Usage: table_tool
Modify rules based on alignment table,
from a rule check of the two sequences.
Commands are processed left to right as encountered
-table file: load the table file generated from mm_align_tool table output.
fmt is global pos followed by each input position (n strings, n numbers per line)
-rules file: process this rules file using latest table and pre/suff info
-table_rules file: create a table of where rule hits are in aligned sequence
fmt is set of locations containing {seq number, position, offset from first hit}
-prefix s: prefix output names with s, usually a '>'
-suffix s: append s to each name output
-rule_suffix s: append s to each rule hit to convert strings to hits
-first n: first table index to use minus 1, n=0 implies use 2nd entry
as the first is the composite
-flags : set option specific modifiers
Debug and Informational:
-v : verbose- tutorial and debug
-debug : more verbose than above
-q : quiet, can be used after -v around file of interest
-about : credit, version, and implementation information
Example: this seemed to diff -w ok against manual DSCAM exons,
$progpath/table_tool -v -table table_table -rule_suffix "XXX" -prefix ">" -rules ncbi_exons > fixed_exons
Contact: marchywka@hotmail.com
table_tool.cpp248 Built on Mar 8 2008 at 09:18:49
Credit: Written by marchywka@hotmail.com ca. Feb 2008
Note: Nothing to cite, this is a bit of a kluge for fixed location rules
Note: like the DSCAM exon positions derived from ncbi entries
##########################################
Program help for yaxml.exe:
########\n-about output if available :
Usage: yet_another_xml -parse file
Create a document tree of the input file. Manipulate or
output various components. Designed to parse in pieces and
delete old stuff as needed.
-parse : add file to internal parse model
-dump nm: output all data from those named nm
-about : show version and credit info
Contact: marchywka@hotmail.com
yet_another_xml.cpp221
Credit: Written by marchywka@hotmail.com ca. Sept 2007
Note: std::string append is very slow, uses low-level code