GrailEXP FAQ Section 6: Syntax for Command-Line GrailEXP


How do I get version information?

Use the --version option.

693 odin /home/4ph/grailexp> grailexp --version
GrailEXP v3.0 [June, 2000]
694 odin /home/4ph/grailexp> 

How do I get a list of options?

Run grailexp with no arguments to print a short usage summary.

694 odin /home/4ph/grailexp> grailexp

Usage:  grailexp [eag] (options)
        grailexp --version
        grailexp --listorgs
        grailexp --help

For detailed explanation of options, do grailexp --help.

695 odin /home/4ph/grailexp> 

Use the --help option for a full list of options/switches.

695 odin /home/4ph/grailexp> grailexp --help

GrailEXP v3.0 [June, 2000]

Supported syntax:  --option=val, -option=val, -option val, --option val

grailexp --version:   Print version information.
grailexp --help:      Print this help information.
grailexp --listorgs:  Print the list of supported organisms.

grailexp a|c|e|g|r [--seqfile seqfile] [--mode serial|parallel] [--blast path]
  [--formatdb path] [--hostfile path] [--singleacc accnum]
  [--organism org label] [--repdb path] [--genedb path] [--exonfile path]
  [--alignfile path] [--(no)filter] [--fastalign] [--ends open|closed]
  [--strand f|r|b]

Requests:  e = Exons, a = Alignments, g = Genes, 
           c = CpG Islands, r = Repetitive Elements

General Flags

--organism:       Organism label (default human).
--strand:         Strand for analysis [f|r|b] (default both).
--seqfile:        Path to FASTA/raw sequence (mandatory)

Exon Prediction (Request e) Flags

--(no)filter:     Flag for repetitive filtering (default --filter)
--blast:          Path to blastall (default $GRAILEXP/blast/blastall)
--repdb:          Path to repetitive db (default $GRAILEXP/repbase/repbase)

Database Search (Request a) Flags

--exonfile:       Path to exon file (default none)
--blast:          Path to blastall (default $GRAILEXP/blast/blastall)
--formatdb:       Path to formatdb (default $GRAILEXP/blast/formatdb)
--mode:           Serial or parallel (default serial)
--genedb:         Path to list of search dbs (default $GRAILEXP/db/dblist)
--hostfile:       Path to hostfile (default $GRAILEXP/parallel/hostfile)
--singleacc:      Accession number of single reference to align with
--fastalign:      Search database in fast mode (does not report all alignments/splices)

Gene Assembly (Request g) Flags

--exonfile:       Path to exon file (default none)
--alignfile:      Path to alignment file (default none)
--ends:           Open (contig) or closed sequence ends (default closed)

For further documentation, consult the $GRAILEXP/doc directory.

696 odin /home/4ph/grailexp> 

How do I get a list of available organisms?

Use the --listorgs option.

696 odin /home/4ph/grailexp> grailexp --listorgs

GrailEXP v3.0 [June, 2000]

List of supported organisms

Label      Full Name                                Type

human      Homo sapiens                             Hardcoded 
mouse      Mus musculus                             Hardcoded 
aero       Aeropyrum pernix                         Microbial 
aquae      Aquifex aeolicus                         Microbial 
arab       Arabadopsis thaliana                     Model     
aful       Archaeoglobus fulgidus                   Microbial 
bsub       Bacillus subtilis                        Microbial 
bbur       Borrelia burgdorferi                     Microbial 
cjej       Campylobacter jejuni                     Microbial 
calb       Candida albicans                         Microbial 
ctraM      Chlamydia muridarum                      Microbial 
cpneu      Chlamydia pneumoniae                     Microbial 
ctra       Chlamydia trachomatis                    Microbial 
cpneuA     Chlamydophila pneumoniae AR39            Microbial 
drad       Deinococcus radiodurans                  Microbial 
droso      Drosophila melanogaster                  Model     
ecoli      Escherichia coli                         Microbial 
hinf       Haemophilus influenzae                   Microbial 
hpyl99     Helicobacter pylori strain J99           Microbial 
hpyl       Helicobacter pylori                      Microbial 
mthe       Methanobacterium thermoautotrophicum     Microbial 
mjan       Methanococcus jannaschii                 Microbial 
mtub       Mycobacterium tuberculosis               Microbial 
mgen       Mycoplasma genitalium                    Microbial 
mpneu      Mycoplasma pneumoniae                    Microbial 
nmen       Neisseria meningitidis                   Microbial 
ncras      Neurospora crassa                        Model     
paer       Pseudomonas aeruginosa                   Microbial 
pabyssi    Pyrococcus abyssi                        Microbial 
pyro       Pyrococcus horikoshii                    Microbial 
rpxx       Rickettsia prowazekii                    Microbial 
synecho    Synechocystis PCC6803                    Microbial 
tmar       Thermotoga maritima                      Microbial 
tpal       Treponema pallidum                       Microbial 
yeast      Yeast                                    Microbial 

697 odin /home/4ph/grailexp> 
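
The listing above is easy to load into a script, for instance to check a label before passing it to --organism. The following Python sketch is not part of GrailEXP. It assumes the layout shown above: a header row beginning with Label, then one row per organism whose first field is the label and whose last field is the type. The name parse_organisms is hypothetical, and the function expects only the program's output (no shell prompt lines).

```python
def parse_organisms(listing: str) -> dict[str, tuple[str, str]]:
    """Map each organism label to (full name, type)."""
    organisms = {}
    in_table = False
    for line in listing.splitlines():
        fields = line.split()
        if fields[:1] == ["Label"]:
            in_table = True      # header row: organism rows follow
            continue
        if not in_table or len(fields) < 3:
            continue             # banner, blank line, or unexpected row
        label, kind = fields[0], fields[-1]
        # Everything between the label and the type is the full name,
        # which may contain several words (e.g. a strain designation).
        organisms[label] = (" ".join(fields[1:-1]), kind)
    return organisms
```

A caller could then test `label in parse_organisms(text)` before launching a run.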

How do I request a complete analysis?

To perform a complete analysis, run grailexp acegr --seqfile my.seq, adding whatever other options you need.

Organism is specified with the --organism switch. The default value for organism is human.

The sequence input file is a mandatory argument. GrailEXP does NOT read from standard input, because it can read multiple input files for its genomic alignment and gene assembly programs. Every analysis must have the option --seqfile my.seq.

An example of a request for a Drosophila analysis:

715 odin /home/4ph/grailexp> grailexp cegar --organism droso --seqfile droso.seq > droso.out
--------------------------------------------------------------------------------
GrailEXP v3.0                                  http://compbio.ornl.gov/grailexp/

Authors:  Doug Hyatt, Manesh Shah, Victor Olman, Richard Mural, Ying Xu, and 
  Edward C. Uberbacher, 1996-2000

Reference:  "Automated Gene Identification in Large-Scale Genomic Sequences",
  Xu, Y. and Uberbacher, E.C., Journal of Computational Biology, Volume 4,
  Number 3, 1997
--------------------------------------------------------------------------------
Organism: Drosophila melanogaster
Sequence: Reading from droso.seq
CpGs    : Will locate CpG Islands
Repeats : Will locate repetitive elements
Exons   : Will perform prediction
Aligns  : Will perform search in serial mode
Genes   : Will perform assembly
RepDB   : /home/4ph/grailexp/repbase/repbase
GeneDB  : /home/4ph/grailexp/db/dblist
Blastall: /home/4ph/grailexp/blast/blastall
Formatdb: /home/4ph/grailexp/blast/formatdb
--------------------------------------------------------------------------------
Locating CpG Islands...done.
Locating repetitive elements...done.
Predicting exons...done.
Searching database...done.
Assembling genes...done.
716 odin /home/4ph/grailexp> 
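
When GrailEXP is driven from a script, it can help to assemble the command line in one place so that the mandatory --seqfile argument and the request letters are checked before anything runs. The following Python sketch is one way to do that. The helper names build_command and run are hypothetical. The flags it emits (--seqfile, --organism) and the request letters are the ones listed by grailexp --help above.

```python
import subprocess

VALID_REQUESTS = set("acegr")  # request letters listed by grailexp --help


def build_command(requests, seqfile, organism=None, extra=()):
    """Return the argument list for one grailexp run."""
    if not requests or not set(requests) <= VALID_REQUESTS:
        raise ValueError(f"requests must be drawn from a, c, e, g, r: {requests!r}")
    if not seqfile:
        raise ValueError("--seqfile is mandatory; grailexp does not read stdin")
    command = ["grailexp", requests, "--seqfile", seqfile]
    if organism is not None:     # when omitted, grailexp defaults to human
        command += ["--organism", organism]
    return command + list(extra)


def run(command, outfile):
    """Run the command, sending standard output to outfile (like `> outfile`)."""
    with open(outfile, "w") as out:
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(command, stdout=out, check=True)
```

For example, `run(build_command("cegar", "droso.seq", organism="droso"), "droso.out")` corresponds to the Drosophila request shown above.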

How do I specify what kind of output type I want?

Use the --output option. Valid values are raw (the default), gca (Genome Channel), and pretty (formatted text output). The raw output is easy for machines to parse but dense and hard for a human to read, so the pretty option is recommended for human-readable output (it is also the format used by the web version).

In addition, parsers that convert raw output into the pretty and gca formats are located in the $GRAILEXP/parsers subdirectory.

How can I specify which strand to analyze?

Use the --strand option. Valid options are f, r, or b, which represent forward strand only, reverse strand only, and both strands, respectively. By default, GrailEXP runs on both strands of the sequence.

How do I specify the draft sequence genefinder?

Use the --draftgap option. In most cases, this will be --draftgap 100. This tells GrailEXP never to build across gaps of size 100 unless there is EST/mRNA evidence allowing it to do so. For phase 1 draft data, this option should be used. For phase 2 draft or finished data, the draftgap option should not be used.

How do I request exon candidates?

Use the e request.

725 odin /home/4ph/grailexp> grailexp e --seqfile aa3.seq
--------------------------------------------------------------------------------
GrailEXP v3.0                                  http://compbio.ornl.gov/grailexp/

Authors:  Doug Hyatt, Manesh Shah, Victor Olman, Richard Mural, Ying Xu, and 
  Edward C. Uberbacher, 1996-2000

Reference:  "Automated Gene Identification in Large-Scale Genomic Sequences",
  Xu, Y. and Uberbacher, E.C., Journal of Computational Biology, Volume 4,
  Number 3, 1997
--------------------------------------------------------------------------------
Organism: Homo sapiens
Sequence: Reading from aa3.seq
CpGs    : Not requested
Repeats : Not requested
Exons   : Will perform prediction
Aligns  : Not requested
Genes   : Not requested
RepDB   : /home/4ph/grailexp/repbase/repbase
Blastall: /home/4ph/grailexp/blast/blastall
--------------------------------------------------------------------------------
Predicting exons...done.
begin exons
 f 1 64 266 1 1 61 0 1 443 1
 f 1 64 443 2 1 52 0 1 443 0
end exons
726 odin /home/4ph/grailexp> 
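
The raw output brackets each result set between a begin line and an end line (begin exons ... end exons above), which makes it simple to pull out with a script. The following Python sketch is a hypothetical helper, not one of the parsers shipped in $GRAILEXP/parsers. The meaning of the individual columns is not described in this section, so the rows are returned as uninterpreted strings. The begin marker is matched at the end of a line because, in some transcripts in this FAQ, it follows a progress message on the same line.

```python
def extract_block(raw: str, name: str) -> list[list[str]]:
    """Return the whitespace-split rows between 'begin NAME' and 'end NAME'."""
    rows, inside = [], False
    for line in raw.splitlines():
        stripped = line.strip()
        if stripped.endswith(f"begin {name}"):
            inside = True            # marker may trail a progress message
        elif stripped == f"end {name}":
            inside = False
        elif inside and stripped:
            rows.append(stripped.split())
    return rows
```

Calling `extract_block(text, "exons")` on the output above would return two rows of eleven fields each.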

How do I filter Grail Exon candidates for repetitive elements?

This is done by default. Unless you specify otherwise, GrailEXP will use the blastall executable located in $GRAILEXP/blast/blastall and the repetitive database located at $GRAILEXP/repbase/repbase.

If you do NOT want your exon candidates filtered for repetitives, then use the --nofilter option.

777 odin /home/4ph/grailexp> grailexp e --seqfile aa3.seq --nofilter
--------------------------------------------------------------------------------
GrailEXP v3.0                                  http://compbio.ornl.gov/grailexp/

Authors:  Doug Hyatt, Manesh Shah, Victor Olman, Richard Mural, Ying Xu, and 
  Edward C. Uberbacher, 1996-2000

Reference:  "Automated Gene Identification in Large-Scale Genomic Sequences",
  Xu, Y. and Uberbacher, E.C., Journal of Computational Biology, Volume 4,
  Number 3, 1997
--------------------------------------------------------------------------------
Organism: Homo sapiens
Sequence: Reading from aa3.seq
CpGs    : Not requested
Repeats : Not requested
Exons   : Will perform prediction
Aligns  : Not requested
Genes   : Not requested
--------------------------------------------------------------------------------
Predicting exons...done.
begin exons
 f 1 64 266 1 1 61 0 1 443 1
 f 1 64 443 2 1 52 0 1 443 0
end exons
778 odin /home/4ph/grailexp> 

How do I specify my own database of repetitive elements to use?

Use the --repdb option.

778 odin /home/4ph/grailexp> grailexp e --seqfile aa3.seq --repdb /auto/GAT/db/repbase/repbase
--------------------------------------------------------------------------------
GrailEXP v3.0                                  http://compbio.ornl.gov/grailexp/

Authors:  Doug Hyatt, Manesh Shah, Victor Olman, Richard Mural, Ying Xu, and 
  Edward C. Uberbacher, 1996-2000

Reference:  "Automated Gene Identification in Large-Scale Genomic Sequences",
  Xu, Y. and Uberbacher, E.C., Journal of Computational Biology, Volume 4,
  Number 3, 1997
--------------------------------------------------------------------------------
Organism: Homo sapiens
Sequence: Reading from aa3.seq
CpGs    : Not requested
Repeats : Not requested
Exons   : Will perform prediction
Aligns  : Not requested
Genes   : Not requested
RepDB   : /auto/GAT/db/repbase/repbase
Blastall: /home/4ph/grailexp/blast/blastall
--------------------------------------------------------------------------------
Predicting exons...done.
begin exons
 f 1 64 266 1 1 61 0 1 443 1
 f 1 64 443 2 1 52 0 1 443 0
end exons
779 odin /home/4ph/grailexp> 

How do I request genomic alignments?

Use the a request. Genomic alignments are usually requested by supplying the program with a sequence file and a list of exons, although you do not have to supply an exon file.

To predict exons and then get back alignments (the most typical use), do the following:

779 odin /home/4ph/grailexp> grailexp ea --seqfile aa3.seq
--------------------------------------------------------------------------------
GrailEXP v3.0                                  http://compbio.ornl.gov/grailexp/

Authors:  Doug Hyatt, Manesh Shah, Victor Olman, Richard Mural, Ying Xu, and 
  Edward C. Uberbacher, 1996-2000

Reference:  "Automated Gene Identification in Large-Scale Genomic Sequences",
  Xu, Y. and Uberbacher, E.C., Journal of Computational Biology, Volume 4,
  Number 3, 1997
--------------------------------------------------------------------------------
Organism: Homo sapiens
Sequence: Reading from aa3.seq
CpGs    : Not requested
Repeats : Not requested
Exons   : Will perform prediction
Aligns  : Will perform search in serial mode
Genes   : Not requested
RepDB   : /home/4ph/grailexp/repbase/repbase
GeneDB  : /home/4ph/grailexp/db/dblist
Blastall: /home/4ph/grailexp/blast/blastall
Formatdb: /home/4ph/grailexp/blast/formatdb
--------------------------------------------------------------------------------
Predicting exons...done.
begin exons
 f 1 64 266 1 1 61 0 1 443 1
 f 1 64 443 2 1 52 0 1 443 0
end exons
Searching database...done.
begin alignments
 dots human DT.304279 none 869 99 1 0 0 1 450 1 451
 dbest human AA393779.1 IMAGE:728389 451 100 1 0 0 1 450 1 450
end alignments
780 odin /home/4ph/grailexp> 

Alternatively, you can use an existing exon file, as in the following example:

780 odin /home/4ph/grailexp> grailexp e --seqfile aa3.seq >& aa3.exons
781 odin /home/4ph/grailexp> grailexp a --seqfile aa3.seq --exonfile aa3.exons
--------------------------------------------------------------------------------
GrailEXP v3.0                                  http://compbio.ornl.gov/grailexp/

Authors:  Doug Hyatt, Manesh Shah, Victor Olman, Richard Mural, Ying Xu, and 
  Edward C. Uberbacher, 1996-2000

Reference:  "Automated Gene Identification in Large-Scale Genomic Sequences",
  Xu, Y. and Uberbacher, E.C., Journal of Computational Biology, Volume 4,
  Number 3, 1997
--------------------------------------------------------------------------------
Organism: Homo sapiens
Sequence: Reading from aa3.seq
CpGs    : Not requested
Repeats : Not requested
Exons   : Reading from aa3.exons
Aligns  : Will perform search in serial mode
Genes   : Not requested
GeneDB  : /home/4ph/grailexp/db/dblist
Blastall: /home/4ph/grailexp/blast/blastall
Formatdb: /home/4ph/grailexp/blast/formatdb
--------------------------------------------------------------------------------
Searching database...done.
begin alignments
 dots human DT.304279 none 869 99 1 0 0 1 450 1 451
 dbest human AA393779.1 IMAGE:728389 451 100 1 0 0 1 450 1 450
end alignments
782 odin /home/4ph/grailexp> 

How do I specify my own BLAST program to use?

You can specify a different blastall executable with the --blast switch. This blastall must be platform-independent if you run in parallel mode across different operating systems/architectures. By default, GrailEXP uses a Perl script located at $GRAILEXP/blast/blastall. This Perl script contains signal handlers (including a one-hour timeout to stop BLAST when it hangs) and calls the appropriate blastall executable for the OS/architecture on which the script is running. Any BLAST program will work, as long as the output it produces looks like BLAST 2.xx output.

How do I specify my own formatdb program to use?

You can specify a different formatdb executable with the --formatdb option. This is not recommended unless you really know what you're doing. Formatdb is called with the -i inputfile -p F options by both the Perl script and the search module binary.

How do I specify which databases to search?

GrailEXP searches a list of databases. By default, it looks for this list at $GRAILEXP/db/dblist. The list is a plain text file containing the full pathnames of the databases, one per line. You can build different database lists (e.g. one with human, mouse, and Drosophila EST databases, another with only mouse, another with only human, etc.). To specify a particular list of databases to search, use the --genedb option.

785 odin /home/4ph/grailexp> grailexp ea --genedb mylist --seqfile aa3.seq 
--------------------------------------------------------------------------------
GrailEXP v3.0                                  http://compbio.ornl.gov/grailexp/

Authors:  Doug Hyatt, Manesh Shah, Victor Olman, Richard Mural, Ying Xu, and 
  Edward C. Uberbacher, 1996-2000

Reference:  "Automated Gene Identification in Large-Scale Genomic Sequences",
  Xu, Y. and Uberbacher, E.C., Journal of Computational Biology, Volume 4,
  Number 3, 1997
--------------------------------------------------------------------------------
Organism: Homo sapiens
Sequence: Reading from aa3.seq
CpGs    : Not requested
Repeats : Not requested
Exons   : Will perform prediction
Aligns  : Will perform search in serial mode
Genes   : Not requested
RepDB   : /home/4ph/grailexp/repbase/repbase
GeneDB  : mylist
Blastall: /home/4ph/grailexp/blast/blastall
Formatdb: /home/4ph/grailexp/blast/formatdb
--------------------------------------------------------------------------------
Predicting exons...done.
begin exons
 f 1 64 266 1 1 61 0 1 443 1
 f 1 64 443 2 1 52 0 1 443 0
end exons
Searching database...done.
begin alignments
end alignments
786 odin /home/4ph/grailexp> 

In this particular example, no alignments were found.
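
Since a database list is just a text file of full pathnames, one per line, it can be generated rather than edited by hand. The following Python sketch shows one way to do so. The helper name write_db_list and any database paths passed to it are hypothetical. Relative paths are converted to absolute ones because the list is expected to hold full pathnames.

```python
import os


def write_db_list(db_paths, list_path):
    """Write one full database pathname per line, for use with --genedb."""
    full_paths = [os.path.abspath(path) for path in db_paths]
    with open(list_path, "w") as handle:
        for path in full_paths:
            handle.write(path + "\n")
    return full_paths
```

The resulting file would then be passed as `--genedb list_path`.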

How do I align my sequence with a single cDNA/mRNA/EST?

Use the --singleacc option. GrailEXP will look for the specified accession number in the specified list of databases. The accession number must be the same as the one used in the database, e.g. if the accession number in the database is AA393779.1, then just specifying AA393779 will NOT work. You can blame NCBI for this; version number should really be a separate field, as should organism.

836 odin /home/4ph/grailexp> grailexp a --singleacc DT.103374 --seqfile /home/4ph/seqs/11.seq
--------------------------------------------------------------------------------
GrailEXP v3.0                                  http://compbio.ornl.gov/grailexp/

Authors:  Doug Hyatt, Manesh Shah, Victor Olman, Richard Mural, Ying Xu, and 
  Edward C. Uberbacher, 1996-2000

Reference:  "Automated Gene Identification in Large-Scale Genomic Sequences",
  Xu, Y. and Uberbacher, E.C., Journal of Computational Biology, Volume 4,
  Number 3, 1997
--------------------------------------------------------------------------------
Organism: Homo sapiens
Sequence: Reading from /home/4ph/seqs/11.seq
CpGs    : Not requested
Repeats : Not requested
Exons   : Not requested
Aligns  : Will perform search in serial mode
Genes   : Not requested
GeneDB  : /home/4ph/grailexp/db/dblist
AccNum  : DT.103374
Blastall: /home/4ph/grailexp/blast/blastall
Formatdb: /home/4ph/grailexp/blast/formatdb
--------------------------------------------------------------------------------
Searching database...[GrailExp Search][Warning]:  No exon file specified.  Searching whole sequence against database...
done.
begin alignments
 dots human DT.103374 none 1798 99 2 0 3 13674 13869 1 197
 dots human DT.103374 none 1798 100 3 2 12 13969 14156 198 385
 dots human DT.103374 none 1798 100 12 3 13 18059 18147 386 477
 dots human DT.103374 none 1798 100 13 12 7 18537 18582 478 523
 dots human DT.103374 none 1798 99 7 13 1 22150 22280 524 654
 dots human DT.103374 none 1798 100 1 7 4 23208 23423 655 870
 dots human DT.103374 none 1798 98 4 1 5 24128 24293 871 1037
 dots human DT.103374 none 1798 98 5 4 10 30196 30354 1038 1196
 dots human DT.103374 none 1798 100 10 5 6 30870 30952 1197 1279
 dots human DT.103374 none 1798 100 6 10 9 31182 31332 1280 1430
 dots human DT.103374 none 1798 100 9 6 8 31431 31546 1431 1546
 dots human DT.103374 none 1798 95 8 9 11 34124 34274 1547 1697
 dots human DT.103374 none 1798 96 11 8 0 34965 35065 1698 1798
end alignments
837 odin /home/4ph/grailexp> 

The genomic alignment program always prints a warning when no exon file is supplied to it.
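
Because --singleacc needs the identifier exactly as it appears in the database, including any version suffix, a script may want to resolve a bare accession first. The following Python sketch assumes you already have the set of identifiers used in your databases (how to obtain them depends on how the databases were built and is not covered here). The helper name resolve_accession is hypothetical.

```python
def resolve_accession(wanted, database_ids):
    """Return the exact database identifiers to try with --singleacc."""
    if wanted in database_ids:
        return [wanted]              # already matches an entry exactly
    prefix = wanted + "."
    # A bare accession matches nothing, so offer its versioned forms.
    return sorted(ident for ident in database_ids if ident.startswith(prefix))
```

With identifiers {"AA393779.1", "DT.304279"}, a request for AA393779 resolves to AA393779.1, the form that --singleacc requires.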

How do I do a complete alignment (i.e. emulate sim4)?

First, a few caveats. You almost NEVER want to do this. It is VERY slow and VERY inefficient compared to running with a Grail Exon Candidate file. If you do decide that this is what you want to do, then be sure to run on a repeat-masked sequence. In addition, it is highly advisable to run such a search in parallel mode (described later) or to multithread the BLAST search by modifying the blastall script to call blastall with the -a numthreads option.

Assuming you do know what you're doing (the search program will give you a warning to make sure), here is the syntax:

839 odin /home/4ph/grailexp> grailexp a --seqfile aa3.seq >& takes.a.long.time

What does the '--fastalign' option do?

The --fastalign option tells the genomic alignment program to throw away alignments it considers redundant. The program doesn't save any time on its initial scan through the database, but it can save a lot of time on the rigorous second alignment phase, in which each EST/cDNA of interest is aligned against the whole sequence. This option is useful if you have a lot of redundant entries in your search database, e.g. 1,234 immunoglobulin light chains or some such sequence. The --fastalign option does result in loss of information, however, so it should only be used if the program is running too slowly on a particular sequence.

How do I make the search module run in parallel?

Use the --mode parallel switch. By default, GrailEXP runs in serial mode.

How do I specify which machines should do the parallel search?

GrailEXP reads in a list of machines from $GRAILEXP/parallel/hostfile. You can specify your own list of machines using the --hostfile option. However, each machine in your list must be running a tcpserver. See the installation section for more information on setting up the parallel search.

How do I request gene models?

Use the g request. The gene assembly program can model from exons, alignments, or both. Some examples:

887 odin /home/4ph/grailexp> grailexp e --seq 11.seq >& 11.exons
888 odin /home/4ph/grailexp> grailexp a --seq 11.seq --exon 11.exons --mode parallel >& 11.aligns
891 odin /home/4ph/grailexp> grailexp g --seq 11.seq --exon 11.exons --align 11.aligns > 11.genes
--------------------------------------------------------------------------------
GrailEXP v3.0                                  http://compbio.ornl.gov/grailexp/

Authors:  Doug Hyatt, Manesh Shah, Victor Olman, Richard Mural, Ying Xu, and 
  Edward C. Uberbacher, 1996-2000

Reference:  "Automated Gene Identification in Large-Scale Genomic Sequences",
  Xu, Y. and Uberbacher, E.C., Journal of Computational Biology, Volume 4,
  Number 3, 1997
--------------------------------------------------------------------------------
Organism: Homo sapiens
Sequence: Reading from 11.seq
CpGs    : Not requested
Repeats : Not requested
Exons   : Reading from 11.exons
Aligns  : Reading from 11.aligns
Genes   : Will perform assembly
--------------------------------------------------------------------------------
Assembling genes...done.
892 odin /home/4ph/grailexp> grailexp g --seqfile 11.seq --exonfile 11.exons > 11.genes.noaligns
--------------------------------------------------------------------------------
GrailEXP v3.0                                  http://compbio.ornl.gov/grailexp/

Authors:  Doug Hyatt, Manesh Shah, Victor Olman, Richard Mural, Ying Xu, and 
  Edward C. Uberbacher, 1996-2000

Reference:  "Automated Gene Identification in Large-Scale Genomic Sequences",
  Xu, Y. and Uberbacher, E.C., Journal of Computational Biology, Volume 4,
  Number 3, 1997
--------------------------------------------------------------------------------
Organism: Homo sapiens
Sequence: Reading from 11.seq
CpGs    : Not requested
Repeats : Not requested
Exons   : Reading from 11.exons
Aligns  : Not requested
Genes   : Will perform assembly
--------------------------------------------------------------------------------
Assembling genes...[GrailExp Assemble][Warning]:  No alignment file specified.
done.
893 odin /home/4ph/grailexp> grailexp g --seqfile 11.seq --alignfile 11.aligns > 11.genes.noexons
--------------------------------------------------------------------------------
GrailEXP v3.0                                  http://compbio.ornl.gov/grailexp/

Authors:  Doug Hyatt, Manesh Shah, Victor Olman, Richard Mural, Ying Xu, and 
  Edward C. Uberbacher, 1996-2000

Reference:  "Automated Gene Identification in Large-Scale Genomic Sequences",
  Xu, Y. and Uberbacher, E.C., Journal of Computational Biology, Volume 4,
  Number 3, 1997
--------------------------------------------------------------------------------
Organism: Homo sapiens
Sequence: Reading from 11.seq
CpGs    : Not requested
Repeats : Not requested
Exons   : Not requested
Aligns  : Reading from 11.aligns
Genes   : Will perform assembly
--------------------------------------------------------------------------------
Assembling genes...[GrailExp Assemble][Warning]:  No exon file specified.
done.
894 odin /home/4ph/grailexp> 
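
The three-step sequence at the start of the transcript above (predict exons, search with the exon file, assemble with both files) can be planned by a script so that each step's output file is handed to the next. The following Python sketch only builds the commands and the output filenames; running them and choosing how to redirect output is left to the caller. The helper name staged_commands and the stem-based filenames are hypothetical. The flags are the full-length forms listed by grailexp --help.

```python
def staged_commands(seqfile, stem, parallel=False):
    """Return [(command, output file), ...] for the e, a, and g steps."""
    exons, aligns, genes = f"{stem}.exons", f"{stem}.aligns", f"{stem}.genes"
    search = ["grailexp", "a", "--seqfile", seqfile, "--exonfile", exons]
    if parallel:                     # serial mode is the default
        search += ["--mode", "parallel"]
    assemble = ["grailexp", "g", "--seqfile", seqfile,
                "--exonfile", exons, "--alignfile", aligns]
    return [
        (["grailexp", "e", "--seqfile", seqfile], exons),
        (search, aligns),
        (assemble, genes),
    ]
```

Calling `staged_commands("11.seq", "11", parallel=True)` yields the same three requests as the session above.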

You can perform gene modeling with no database search involved. This is pretty much equivalent to running Genscan (with the added capability of repeat-filtering the Grail Exon Candidates):

894 odin /home/4ph/grailexp> grailexp eg --seqfile 11.seq > 11.genes.nodb
--------------------------------------------------------------------------------
GrailEXP v3.0                                  http://compbio.ornl.gov/grailexp/

Authors:  Doug Hyatt, Manesh Shah, Victor Olman, Richard Mural, Ying Xu, and 
  Edward C. Uberbacher, 1996-2000

Reference:  "Automated Gene Identification in Large-Scale Genomic Sequences",
  Xu, Y. and Uberbacher, E.C., Journal of Computational Biology, Volume 4,
  Number 3, 1997
--------------------------------------------------------------------------------
Organism: Homo sapiens
Sequence: Reading from 11.seq
CpGs    : Not requested
Repeats : Not requested
Exons   : Will perform prediction
Aligns  : Not requested
Genes   : Will perform assembly
RepDB   : /home/4ph/grailexp/repbase/repbase
Blastall: /home/4ph/grailexp/blast/blastall
--------------------------------------------------------------------------------
Predicting exons...done.
Assembling genes...[GrailExp Assemble][Warning]:  No alignment file specified.
done.
895 odin /home/4ph/grailexp> 

You can build a gene model from a single reference source:

895 odin /home/4ph/grailexp> grailexp ag --singleacc AA393779.1 --seqfile aa3.seq 
--------------------------------------------------------------------------------
GrailEXP v3.0                                  http://compbio.ornl.gov/grailexp/

Authors:  Doug Hyatt, Manesh Shah, Victor Olman, Richard Mural, Ying Xu, and 
  Edward C. Uberbacher, 1996-2000

Reference:  "Automated Gene Identification in Large-Scale Genomic Sequences",
  Xu, Y. and Uberbacher, E.C., Journal of Computational Biology, Volume 4,
  Number 3, 1997
--------------------------------------------------------------------------------
Organism: Homo sapiens
Sequence: Reading from aa3.seq
CpGs    : Not requested
Repeats : Not requested
Exons   : Not requested
Aligns  : Will perform search in serial mode
Genes   : Will perform assembly
GeneDB  : /home/4ph/grailexp/db/dblist
AccNum  : AA393779.1
Blastall: /home/4ph/grailexp/blast/blastall
Formatdb: /home/4ph/grailexp/blast/formatdb
--------------------------------------------------------------------------------
Searching database...[GrailExp Search][Warning]:  No exon file specified.  Searching whole sequence against database...
done.
begin alignments
 dbest human AA393779.1 IMAGE:728389 451 100 1 0 0 1 450 1 450
end alignments
Assembling genes...[GrailExp Assemble][Warning]:  No exon file specified.
done.
begin genes
 1 1 summary f 1 1 450 -1 443 1 450
 1 1 exon 1 450 1 2 100
 1 1 evidence dbest human AA393779.1 451 1 450 1 450 100 100
 1 1 mrna gtttttcacacttcattatgaaatttccctggcaatgggcatttctattaggttttgttctaggtgctgtctctcctgctgttgttgtcccttacatgatggtgctgcaagaaaatggatatggtgttgaggaaggcattccaaccttattaatggctgctagcagtatggatgacattctggctatcactggattcaatacatgcttgagcatagtcttttcctcaggtggtatacttaataacgccatagcctctataaggaacgtatgtattagtctgctggcaggaattgttttgggattttttgttcgatattttccaagtgaagaccagaaaaaacttacattgaagagaggattccttgttttgactatgtgtgtttctgccgtcttagcagccaacgtattggtttacatggatctggaggattatgcacactagtgttgag
 1 1 translation FSHFIMKFPWQWAFLLGFVLGAVSPAVVVPYMMVLQENGYGVEEGIPTLLMAASSMDDILAITGFNTCLSIVFSSGGILNNAIASIRNVCISLLAGIVLGFFVRYFPSEDQKKLTLKRGFLVLTMCVSAVLAANVLVYMDLEDYAH*
end genes
896 odin /home/4ph/grailexp> 
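
In the raw genes block above, each line starts with two numeric fields followed by a record type (summary, exon, evidence, mrna, translation). The following Python sketch collects the mrna and translation strings. It treats the two leading fields as an opaque key, since their meaning is not described in this section, and the helper name gene_sequences is hypothetical.

```python
def gene_sequences(raw: str) -> dict[tuple[str, str], dict[str, str]]:
    """Collect the mrna and translation strings for each key in a genes block."""
    records, inside = {}, False
    for line in raw.splitlines():
        stripped = line.strip()
        if stripped.endswith("begin genes"):
            inside = True
        elif stripped == "end genes":
            inside = False
        elif inside:
            fields = stripped.split()
            # Sequence lines have exactly four fields: key, key, type, sequence.
            if len(fields) == 4 and fields[2] in ("mrna", "translation"):
                key = (fields[0], fields[1])
                records.setdefault(key, {})[fields[2]] = fields[3]
    return records
```

For the output above, the result would hold a single key ("1", "1") with its mrna and translation strings.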

Or, finally, you can do a full analysis with grailexp eag --seqfile my.seq. This is the most common usage of the program (with or without the --mode parallel switch).

How do I restrict the alignments to be from one organism?

Use the --matchorg option when you want to restrict the alignments to a single organism. For example, you may be searching RefSeq but only want to see how your human sequence compares with the mouse entries. In that case, --matchorg mouse builds gene models only from the mouse alignments.

How do I specify open/closed end gene modeling?

Use the --ends option. Acceptable values are open and closed. By default, GrailEXP runs with CLOSED ends. For more information on the difference between open and closed ends, see Section 4: The Gene Assembly Program.

How do I request CpG islands?

Use the c request.

902 odin /home/4ph/grailexp> grailexp c --seqfile /home/4ph/seqs/tigr.seq
--------------------------------------------------------------------------------
GrailEXP v3.0                                  http://compbio.ornl.gov/grailexp/

Authors:  Doug Hyatt, Manesh Shah, Victor Olman, Richard Mural, Ying Xu, and 
  Edward C. Uberbacher, 1996-2000

Reference:  "Automated Gene Identification in Large-Scale Genomic Sequences",
  Xu, Y. and Uberbacher, E.C., Journal of Computational Biology, Volume 4,
  Number 3, 1997
--------------------------------------------------------------------------------
Organism: Homo sapiens
Sequence: Reading from /home/4ph/seqs/tigr.seq
CpGs    : Will locate CpG Islands
Repeats : Not requested
Exons   : Not requested
Aligns  : Not requested
Genes   : Not requested
--------------------------------------------------------------------------------
Locating CpG Islands...begin cpgs
 109562 110143 1.04 64.81
 113324 113606 0.77 56.16
 139504 140023 1.19 70.05
end cpgs
done.
903 odin /home/4ph/grailexp> 

How do I request repetitive elements?

Use the r request.

903 odin /home/4ph/grailexp> grailexp r --seqfile aa3.seq
--------------------------------------------------------------------------------
GrailEXP v3.0                                  http://compbio.ornl.gov/grailexp/

Authors:  Doug Hyatt, Manesh Shah, Victor Olman, Richard Mural, Ying Xu, and 
  Edward C. Uberbacher, 1996-2000

Reference:  "Automated Gene Identification in Large-Scale Genomic Sequences",
  Xu, Y. and Uberbacher, E.C., Journal of Computational Biology, Volume 4,
  Number 3, 1997
--------------------------------------------------------------------------------
Organism: Homo sapiens
Sequence: Reading from aa3.seq
CpGs    : Not requested
Repeats : Will locate repetitive elements
Exons   : Not requested
Aligns  : Not requested
Genes   : Not requested
RepDB   : /home/4ph/grailexp/repbase/repbase
--------------------------------------------------------------------------------
Locating repetitive elements...begin smprpts
end smprpts
begin cpxrpts
end cpxrpts
begin maskedseq
 gtttttcacacttcattatgaaatttccctggcaatgggcatttctattaggttttgttc
 taggtgctgtctctcctgctgttgttgtcccttacatgatggtgctgcaagaaaatggat
 atggtgttgaggaaggcattccaaccttattaatggctgctagcagtatggatgacattc
 tggctatcactggattcaatacatgcttgagcatagtcttttcctcaggtggtatactta
 ataacgccatagcctctataaggaacgtatgtattagtctgctggcaggaattgttttgg
 gattttttgttcgatattttccaagtgaagaccagaaaaaacttacattgaagagaggat
 tccttgttttgactatgtgtgtttctgccgtcttagcagccaacgtattggtttacatgg
 atctggaggattatgcacactagtgttgag
end maskedseq
done.
904 odin /home/4ph/grailexp> 

In this particular case, no repetitive elements were located (the sequence being a mere 450 bases long).

How do I obtain a repetitive-masked sequence?

Simply by requesting repetitive elements with the r request. The masked sequence is automatically output along with the list of repetitive elements.
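
To reuse the masked sequence as input to another run, it has to be lifted out of the raw output and saved as a sequence file. The following Python sketch joins the lines between begin maskedseq and end maskedseq and returns them as a FASTA record. The helper name masked_fasta and the default header are hypothetical, and the 60-column line width simply mirrors the output shown in this FAQ.

```python
def masked_fasta(raw: str, header: str = "masked") -> str:
    """Return the maskedseq block of an r request as a FASTA record."""
    pieces, inside = [], False
    for line in raw.splitlines():
        stripped = line.strip()
        if stripped.endswith("begin maskedseq"):
            inside = True
        elif stripped == "end maskedseq":
            inside = False
        elif inside:
            pieces.append(stripped)
    sequence = "".join(pieces)
    if not sequence:
        raise ValueError("no maskedseq block found")
    # Re-wrap at 60 characters per line.
    lines = [sequence[i:i + 60] for i in range(0, len(sequence), 60)]
    return f">{header}\n" + "\n".join(lines) + "\n"
```

The returned text can be written to a file and passed to a later run with --seqfile.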

What are some other examples of command line requests?

Typical human run:

     grailexp eag --mode parallel --seq my.seq

Analyze an Arabidopsis sequence using a homemade database of Arabidopsis repetitive sequences:

     grailexp eag --mode parallel --org arab --seq my.seq --repdb my.arab.reps

Predict exons and perform gene modeling on a Drosophila contig in which you think there may be partial genes, but only on the forward strand:

     grailexp eg --org droso --seq my.seq --strand f --ends open

Find exons in an Aeropyrum pernix sequence and filter them against the E. coli genome:

     grailexp e --org aero --seq my.seq --repdb ecoli.genome.db

Do a human run but only use mouse ESTs/cDNAs as your search database:

     grailexp eag --seq my.seq --genedb whiskers.list

Run GrailEXP's search program using Genscan exons:

     genscan HumanIso.mat my.seq > my.genscan.exons
     grailexp a --seq my.seq --exon my.genscan.exons > my.aligns

These are but a few of the ways in which you can run GrailEXP.


The author and maintainer of this FAQ is Doug Hyatt (hyattpd@ornl.gov). This FAQ applies to GrailEXP version 3.2, released February, 2001. This FAQ was last updated February, 2001.