GrailEXP is currently used primarily to analyze human and mouse sequence, but support for many more organisms is under development, including Arabidopsis, Drosophila, rice, corn, and wheat.
The original GrailEXP (v1.0) was developed by Ying Xu, Manesh Shah, Richard Mural, and Edward C. Uberbacher.
A complete list of credits and acknowledgments, including all authors who have worked on any version, is available online.
GrailEXP provides users with many different capabilities. Some use it simply to see where the coding regions lie in their favorite sequence. Others use it as a sim4-like EST alignment program, or to compare human genomic sequence with mouse ESTs. Some are interested in developing gene finders for model organisms such as rice, corn, and wheat. Others use the program only to align a single mRNA with a genomic sequence suspected of containing that gene. Still others use it to quickly mask repetitive elements in a sequence. In short, how GrailEXP is used depends on the needs of each user.
If you're interested in obtaining fast, accurate analysis of DNA sequence, then GrailEXP may be the program for you.
GrailEXP features a Grail-like exon finder, adapted from the Grail 1.3 code, with improved splice site recognition and other minor changes. Its gene modeling, however, has been vastly improved by searching a database of known gene messages (complete and partial) and building gene models from the resulting alignments. These two additional tools, the gene message alignment program and the gene assembly program, are what distinguish GrailEXP from Grail. In addition, Grail's Smith-Waterman-like complex repeat finder, which is very slow, has been replaced by a BLAST-based method that is much faster, although admittedly less precise.
In raw exon prediction, Grail 1.3 performs slightly worse than Genscan on human and mouse sequence. However, adding similarity search information allows GrailEXP to consistently outperform Genscan. Genscan is particularly weak at predicting where genes begin and end. GrailEXP vastly outperforms Genscan in regions with EST/cDNA similarity, and still outperforms it where only partial EST information is available. In regions with no similarity to known genes, Genscan predicts exon edges slightly better than GrailEXP, but GrailEXP still predicts gene beginnings and ends better. In such regions, Genscan tends to produce genes that are too long, whereas GrailEXP tends to produce genes that are too short (i.e., it splits genes). Regardless, Genscan exons can be fed to GrailEXP's alignment program, effectively creating a "GenscanEXP" if the user wishes.
GrailEXP can also run reliably on unmasked sequence, because it filters its exons against a repetitive element database. As a result, it can keep real exons that overlap repetitive elements only slightly. Genscan lacks this capability: it must be run on repeat-masked sequence (possibly obliterating good splice sites), or else it returns many spurious predictions. One caveat of running on unmasked sequence is that a few predicted genes will overlap repetitive elements. However, we consider this an acceptable price for recovering the untranslated regions of genes that contain repetitive elements, which would be lost if GrailEXP were run on repeat-masked sequence.
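The idea of filtering exons against repeats, while tolerating slight overlap, can be illustrated with a minimal sketch. It assumes exons and repeat hits are half-open (start, end) intervals on the same sequence and keeps an exon only if the fraction of its bases covered by repeats stays below a cutoff. The function names and the 0.2 cutoff are hypothetical; this is not GrailEXP's actual filtering criterion.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or abutting intervals so coverage is not double-counted."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def repeat_fraction(exon: Interval, repeats: List[Interval]) -> float:
    """Fraction of the exon's bases covered by (already merged) repeat intervals."""
    start, end = exon
    if end <= start:
        raise ValueError("exon must have positive length")
    covered = 0
    for r_start, r_end in repeats:
        covered += max(0, min(end, r_end) - max(start, r_start))
    return covered / (end - start)


def filter_exons(exons: List[Interval], repeats: List[Interval],
                 max_fraction: float = 0.2) -> List[Interval]:
    """Keep exons whose repeat coverage is below max_fraction (hypothetical cutoff)."""
    merged = merge_intervals(repeats)
    return [e for e in exons if repeat_fraction(e, merged) < max_fraction]
```

For example, `filter_exons([(100, 200), (500, 600)], [(190, 260), (480, 590)])` keeps `(100, 200)`, which is only 10% repeat, and drops `(500, 600)`, which is 90% repeat.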
GrailEXP can also run reliably on draft sequence. The user can instruct the program not to build gene models across sequence gaps unless EST/mRNA evidence supports joining across the gap. This is another option lacking in most other gene-finding tools.
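The gap rule for draft sequence can be sketched as follows. This minimal example assumes that gaps are marked by runs of `N`, that a candidate gene model is an ordered list of half-open exon intervals, and that EST/mRNA evidence is reduced to genomic spans; a join across a gap is allowed only if some span bridges the two exons. All names here are hypothetical and do not reflect GrailEXP's internal data structures.

```python
import re
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates


def find_gaps(seq: str, min_len: int = 1) -> List[Interval]:
    """Locate runs of N (assumed gap marker in draft sequence) of at least min_len."""
    return [m.span() for m in re.finditer(r"[Nn]+", seq)
            if m.end() - m.start() >= min_len]


def split_at_unsupported_gaps(exons: List[Interval], gaps: List[Interval],
                              est_spans: List[Interval]) -> List[List[Interval]]:
    """Split an ordered exon list into separate gene models wherever a gap lies
    between consecutive exons and no EST span bridges that junction."""
    if not exons:
        return []
    models = [[exons[0]]]
    for left, right in zip(exons, exons[1:]):
        # A gap separates the exons if it lies entirely between them.
        crosses_gap = any(left[1] <= g_start and g_end <= right[0]
                          for g_start, g_end in gaps)
        # An EST span supports the join if it overlaps both exons.
        supported = any(s < left[1] and e > right[0] for s, e in est_spans)
        if crosses_gap and not supported:
            models.append([right])
        else:
            models[-1].append(right)
    return models
```

With `seq = "ACGT" * 5 + "N" * 10 + "ACGT" * 5` (gap at 20 to 30) and exons `[(2, 10), (12, 18), (32, 45)]`, no EST evidence yields two models, while an EST span of `(5, 40)` keeps all three exons in one model.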
GrailEXP was evaluated using the Guigo et al. test set from their recent study on gene prediction accuracy in large-scale genomic sequences. The results of running GrailEXP on this test set are available online.
GrailEXP's gene message alignment program, Galahad, is one of the best publicly available programs of its kind. Because it uses exon information to seed its search, uses BLAST to obtain its initial "ball-park" alignments, and runs in parallel, it runs hundreds of times faster than programs such as sim4. GrailEXP currently does not recognize short exons well, and repetitive zinc finger genes produce some erratic-looking alignments. However, its speed and reliability in finding splice sites make it a very useful utility. In addition, the gene assembly program addresses the "next step" questions, such as which ESTs agree on a gene model and which ESTs indicate an alternative splice, and it can assemble overlapping ESTs into gene models on the fly. A comparison of EST alignment programs was also performed.
For finding repetitive elements, RepeatMasker provides far more rigorous alignments against a repeat database. If your goal is to eliminate repetitive elements from your sequence quickly, however, GrailEXP is the more useful tool. One caveat: using BLAST to locate repetitive elements is much less sensitive than RepeatMasker (by a factor of two). Even so, the BLAST method works well for eliminating most genes that contain repetitive elements.
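The final masking step of a BLAST-based approach can be sketched as below. This example does not run BLAST; it assumes repeat hits are already available as BLAST+ tabular output (`-outfmt 6`), whose columns 7 and 8 are the 1-based, inclusive query start and end. "Masking" here means replacing covered bases with `N`, a common convention; the helper names are hypothetical and this is not GrailEXP's actual masking code.

```python
from typing import Iterable, List, Tuple


def parse_blast_tab(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Extract (qstart, qend) from BLAST tabular (-outfmt 6) lines.
    Columns 7 and 8 hold 1-based, inclusive query coordinates."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        q_start, q_end = int(fields[6]), int(fields[7])
        hits.append((min(q_start, q_end), max(q_start, q_end)))
    return hits


def mask_sequence(seq: str, hits: Iterable[Tuple[int, int]],
                  mask_char: str = "N") -> str:
    """Replace every base covered by a hit with mask_char (hard masking)."""
    bases = list(seq)
    for start, end in hits:
        for i in range(start - 1, end):  # 1-based inclusive -> 0-based indices
            bases[i] = mask_char
    return "".join(bases)
```

For instance, a hit line reporting query positions 3 to 6 turns `"ACGTACGTAC"` into `"ACNNNNGTAC"`.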
GrailEXP is also available commercially through ApoCom Genomics and Genome Informatics Corporation.