Perceval reads in a DNA sequence and produces a list of possible Grail Exon Candidates. It also locates repetitive elements and CpG islands, and filters the exon candidates against a repetitive element database.
The raw list of exons is first organized into clusters. Each cluster is filtered for repetitives, and candidates flagged as repetitive elements are eliminated. Next, a strand resolution process is applied: overlapping exons on opposite strands are examined, and the lower-scoring cluster (containing what we call "shadow exons") is eliminated.
The final list of exon candidates is then output (with the eliminated shadow and repetitive exons clearly indicated).
One could also mask a sequence for repetitives prior to submitting it to GrailEXP's exon prediction program, but this is not recommended and may lead to the loss of legitimate exons.
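The strand resolution step described above can be sketched as follows. This is only a minimal illustration, not Perceval's actual implementation: the Cluster class, the use of a single integer score per cluster, the overlap test on cluster extents, and the tie-breaking rule are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    strand: str           # "f" (forward) or "r" (reverse)
    begin: int            # forward-strand coordinates, begin <= end
    end: int
    score: int            # assumption: score of the cluster's best exon
    shadow: bool = False  # set when the cluster loses strand resolution


def overlaps(a: Cluster, b: Cluster) -> bool:
    """True if the two closed intervals share at least one base."""
    return a.begin <= b.end and b.begin <= a.end


def resolve_strands(clusters: list[Cluster]) -> list[Cluster]:
    """Flag the lower-scoring cluster of every opposite-strand overlap."""
    for i, a in enumerate(clusters):
        for b in clusters[i + 1:]:
            if a.strand != b.strand and overlaps(a, b):
                # Assumption: on a tie the later cluster is flagged.
                loser = a if a.score < b.score else b
                loser.shadow = True
    return clusters


# Illustrative values only.
clusters = resolve_strands([
    Cluster("f", 176, 454, 57),
    Cluster("r", 170, 415, 79),
    Cluster("f", 19170, 19295, 99),
])
shadow_flags = [c.shadow for c in clusters]
```

Here the first forward cluster overlaps the higher-scoring reverse cluster and is flagged as a shadow, while the third cluster has no opposite-strand overlap and is kept.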
--------------------------------------------------------------------------------
GrailEXP v3.2 http://compbio.ornl.gov/grailexp/
Authors: Doug Hyatt, Manesh Shah, Victor Olman, Richard Mural, Ying Xu, and
Edward C. Uberbacher, 1996-2001
Reference: "Automated Gene Identification in Large-Scale Genomic Sequences",
Xu, Y. and Uberbacher, E.C., Journal of Computational Biology, Volume 4,
Number 3, 1997
Sequence: >GrailEXP Input Sequence (36741 bp)
--------------------------------------------------------------------------------
PERCEVAL Exon Candidates (15 predicted)
Index Std  Begin    End Frm Type      Len Scr Quality
    1  -     200    386  2  Internal  187  79 Good
    2  -     595    693  0  Terminal   99  80 Good
    3  -    9207   9254  0  Internal   48  66 Marginal
    4  -    9910   9986  0  Initial    77  70 Good
    5  -   14287  14794  2  Terminal  508  49 Marginal
    6  +   19230  19291  0  Internal   62  99 Excellent
    7  +   26344  26466  2  Internal  123 100 Excellent
    8  +   28908  29051  2  Internal  144 100 Excellent
    9  +   29823  29938  2  Internal  116 100 Excellent
   10  +   31176  31303  1  Internal  128  98 Excellent
   11  +   32425  32496  0  Internal   72 100 Excellent
   12  +   32573  32674  0  Internal  102 100 Excellent
   13  +   32851  32915  0  Internal   65  60 Marginal
   14  +   34354  34483  2  Internal  130 100 Excellent
   15  +   35100  35202  0  Internal  103  94 Excellent
--------------------------------------------------------------------------------
Unlike the remaining outputs (GCA and raw), the pretty output
only reports the highest-scoring exon in each cluster. All indexing
is from the forward strand perspective.
begin exons
 f 1 176 313 1 0 57 1 176 454 1
 f 2 412 549 3 0 48 1 391 549 1
 f 3 4031 4063 0 0 54 1 3626 4288 1
 f 4 6846 7091 3 0 43 2 6789 7091 0
 f 4 6852 7091 3 0 45 2 6789 7091 1
 f 5 19230 19291 1 0 99 0 19170 19295 1
 f 5 19234 19291 1 1 91 0 19170 19295 0
 f 6 26291 26466 1 0 86 0 26267 26470 0
 f 6 26291 26470 2 0 78 0 26267 26470 0
 f 6 26344 26466 1 2 100 0 26267 26470 1
 f 6 26344 26470 2 2 94 0 26267 26470 0
 f 6 26402 26466 0 0 84 0 26267 26470 0
 f 7 28837 29051 0 0 88 0 28750 29055 0
 f 7 28908 28976 1 2 88 0 28750 29055 0
 f 7 28908 28986 1 2 82 0 28750 29055 0
 f 7 28908 29022 1 2 91 0 28750 29055 0
 f 7 28908 29051 1 2 100 0 28750 29055 1
 f 7 28908 29055 2 2 79 0 28750 29055 0
 f 7 28954 29051 0 0 93 0 28750 29055 0
 f 7 28954 29055 3 0 13 0 28750 29055 0
 f 8 29795 29895 1 1 83 0 29662 29946 0
 f 8 29795 29906 1 1 78 0 29662 29946 0
 f 8 29795 29915 1 1 79 0 29662 29946 0
 f 8 29795 29938 1 1 91 0 29662 29946 0
 f 8 29795 29942 1 1 80 0 29662 29946 0
 f 8 29795 29946 2 1 79 0 29662 29946 0
 f 8 29823 29938 1 2 100 0 29662 29946 1
 f 8 29823 29942 1 2 89 0 29662 29946 0
 f 8 29823 29946 2 2 89 0 29662 29946 0
 f 9 31151 31303 1 0 89 0 31148 31345 0
 f 9 31151 31312 1 0 88 0 31148 31345 0
 f 9 31151 31341 1 0 84 0 31148 31345 0
 f 9 31176 31303 1 1 98 0 31148 31345 1
 f 9 31176 31312 1 1 97 0 31148 31345 0
 f 9 31176 31341 1 1 93 0 31148 31345 0
 f 9 31176 31345 2 1 89 0 31148 31345 0
 f 10 32359 32469 0 0 84 0 32335 32643 0
 f 10 32359 32496 0 0 91 0 32335 32643 0
 f 10 32394 32496 1 2 96 0 32335 32643 0
 f 10 32401 32469 0 0 87 0 32335 32643 0
 f 10 32401 32487 0 0 87 0 32335 32643 0
 f 10 32401 32496 0 0 93 0 32335 32643 0
 f 10 32425 32469 1 0 90 0 32335 32643 0
 f 10 32425 32487 1 0 89 0 32335 32643 0
 f 10 32425 32496 1 0 100 0 32335 32643 1
 f 11 32573 32674 1 0 100 0 32501 33106 1
 f 11 32573 32690 1 0 89 0 32501 33106 0
 f 11 32573 32713 1 0 96 0 32501 33106 0
 f 11 32595 32674 1 1 83 0 32501 33106 0
 f 12 32851 32915 1 0 60 0 32644 32919 1
 f 13 34289 34483 0 0 88 0 34283 34513 0
 f 13 34289 34487 0 0 77 0 34283 34513 0
 f 13 34289 34491 0 0 76 0 34283 34513 0
 f 13 34289 34513 3 0 79 0 34283 34513 0
 f 13 34354 34470 1 2 92 0 34283 34513 0
 f 13 34354 34483 1 2 100 0 34283 34513 1
 f 13 34354 34487 1 2 92 0 34283 34513 0
 f 13 34354 34491 1 2 92 0 34283 34513 0
 f 13 34354 34493 1 2 90 0 34283 34513 0
 f 13 34354 34513 2 2 95 0 34283 34513 0
 f 13 34370 34483 1 0 93 0 34283 34513 0
 f 13 34370 34487 1 0 85 0 34283 34513 0
 f 13 34392 34483 1 1 83 0 34283 34513 0
 f 14 35022 35202 0 0 72 0 35013 35273 0
 f 14 35100 35202 1 0 94 0 35013 35273 1
 f 14 35100 35206 1 0 87 0 35013 35273 0
 r 9 200 321 1 1 57 0 170 415 0
 r 9 200 386 1 2 79 0 170 415 1
 r 8 595 693 2 0 80 0 595 1047 1
 r 8 595 818 2 1 32 0 595 1047 0
 r 7 9207 9254 1 0 66 0 8943 9257 1
 r 6 9910 9977 0 0 59 0 9894 10025 0
 r 6 9910 9980 0 0 60 0 9894 10025 0
 r 6 9910 9983 0 0 68 0 9894 10025 0
 r 6 9910 9986 0 0 70 0 9894 10025 1
 r 5 14287 14794 2 2 49 0 14287 15027 1
 r 4 25704 25740 2 2 53 1 25704 25889 0
 r 4 25708 25740 1 2 61 1 25704 25889 1
 r 3 26096 26277 0 0 50 1 25453 26286 1
 r 2 29488 29570 1 0 55 1 29466 29618 1
 r 1 31214 31310 1 2 66 1 31043 31432 1
end exons
All indexing is from the forward strand's perspective. All coordinates are in ASCENDING order; strand is indicated by a separate field.
The fields are, in order:
Strand, Cluster ID, Begin, End, Type, Phase, Score, Status, ORF Begin, ORF End, Grail Exon Flag
The fields are separated by spaces. A leading space begins each data line. The exon candidate list is enclosed by "begin exons" and "end exons" tags.
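A reader for this raw format might look like the sketch below. It assumes one record per line between the "begin exons" and "end exons" tags, as described above. The RawExon type and parse_raw_exons function are hypothetical names introduced for illustration, and the numeric fields are kept as plain integers because the meaning of the Type and Status codes is not spelled out for this format.

```python
from typing import NamedTuple


class RawExon(NamedTuple):
    # Field order follows the list given above.
    strand: str
    cluster_id: int
    begin: int
    end: int
    type: int
    phase: int
    score: int
    status: int
    orf_begin: int
    orf_end: int
    grail_flag: int


def parse_raw_exons(text: str) -> list[RawExon]:
    """Collect the records found between 'begin exons' and 'end exons'."""
    exons = []
    inside = False
    for line in text.splitlines():
        stripped = line.strip()  # also drops the leading space
        if stripped == "begin exons":
            inside = True
        elif stripped == "end exons":
            break
        elif inside and stripped:
            fields = stripped.split()
            if len(fields) != 11:
                raise ValueError(f"expected 11 fields, got: {line!r}")
            exons.append(RawExon(fields[0], *map(int, fields[1:])))
    return exons


sample = """begin exons
 f 5 19230 19291 1 0 99 0 19170 19295 1
 f 5 19234 19291 1 1 91 0 19170 19295 0
 r 9 200 386 1 2 79 0 170 415 1
end exons
"""
exons = parse_raw_exons(sample)
```

In the sample shown here, the records with Status 0 and Grail Exon Flag 1 are the ones that appear in the pretty output, so a filter such as `[e for e in exons if e.status == 0 and e.grail_flag == 1]` selects them; whether that rule holds in general is an inference from this sample, not something the format description states.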
exon_grailexp_v3=1|f|1|2|19230|19291|19170|19295|0.99|1
exon_grailexp_v3=2|f|1|2|19234|19291|19170|19295|0.91|0
exon_grailexp_v3=3|f|1|1|26291|26466|26267|26470|0.86|0
exon_grailexp_v3=4|f|2|1|26291|26470|26267|26470|0.78|0
exon_grailexp_v3=5|f|1|1|26344|26466|26267|26470|1|1
exon_grailexp_v3=6|f|2|1|26344|26470|26267|26470|0.94|0
exon_grailexp_v3=7|f|0|1|26402|26466|26267|26470|0.84|0
exon_grailexp_v3=8|f|0|0|28837|29051|28750|29055|0.88|0
exon_grailexp_v3=9|f|1|0|28908|28976|28750|29055|0.88|0
exon_grailexp_v3=10|f|1|0|28908|28986|28750|29055|0.82|0
exon_grailexp_v3=11|f|1|0|28908|29022|28750|29055|0.91|0
exon_grailexp_v3=12|f|1|0|28908|29051|28750|29055|1|1
exon_grailexp_v3=13|f|2|0|28908|29055|28750|29055|0.79|0
exon_grailexp_v3=14|f|0|0|28954|29051|28750|29055|0.93|0
exon_grailexp_v3=15|f|3|0|28954|29055|28750|29055|0.13|0
exon_grailexp_v3=16|f|1|0|29795|29895|29662|29946|0.83|0
exon_grailexp_v3=17|f|1|0|29795|29906|29662|29946|0.78|0
exon_grailexp_v3=18|f|1|0|29795|29915|29662|29946|0.79|0
exon_grailexp_v3=19|f|1|0|29795|29938|29662|29946|0.91|0
exon_grailexp_v3=20|f|1|0|29795|29942|29662|29946|0.8|0
exon_grailexp_v3=21|f|2|0|29795|29946|29662|29946|0.79|0
exon_grailexp_v3=22|f|1|0|29823|29938|29662|29946|1|1
exon_grailexp_v3=23|f|1|0|29823|29942|29662|29946|0.89|0
exon_grailexp_v3=24|f|2|0|29823|29946|29662|29946|0.89|0
exon_grailexp_v3=25|f|1|1|31151|31303|31148|31345|0.89|0
exon_grailexp_v3=26|f|1|1|31151|31312|31148|31345|0.88|0
exon_grailexp_v3=27|f|1|1|31151|31341|31148|31345|0.84|0
exon_grailexp_v3=28|f|1|1|31176|31303|31148|31345|0.98|1
exon_grailexp_v3=29|f|1|1|31176|31312|31148|31345|0.97|0
exon_grailexp_v3=30|f|1|1|31176|31341|31148|31345|0.93|0
exon_grailexp_v3=31|f|2|1|31176|31345|31148|31345|0.89|0
exon_grailexp_v3=32|f|0|0|32359|32469|32335|32643|0.84|0
exon_grailexp_v3=33|f|0|0|32359|32496|32335|32643|0.91|0
exon_grailexp_v3=34|f|1|0|32394|32496|32335|32643|0.96|0
exon_grailexp_v3=35|f|0|0|32401|32469|32335|32643|0.87|0
exon_grailexp_v3=36|f|0|0|32401|32487|32335|32643|0.87|0
exon_grailexp_v3=37|f|0|0|32401|32496|32335|32643|0.93|0
exon_grailexp_v3=38|f|1|0|32425|32469|32335|32643|0.9|0
exon_grailexp_v3=39|f|1|0|32425|32487|32335|32643|0.89|0
exon_grailexp_v3=40|f|1|0|32425|32496|32335|32643|1|1
exon_grailexp_v3=41|f|1|1|32573|32674|32501|33106|1|1
exon_grailexp_v3=42|f|1|1|32573|32690|32501|33106|0.89|0
exon_grailexp_v3=43|f|1|1|32573|32713|32501|33106|0.96|0
exon_grailexp_v3=44|f|1|1|32595|32674|32501|33106|0.83|0
exon_grailexp_v3=45|f|1|0|32851|32915|32644|32919|0.6|1
exon_grailexp_v3=46|f|0|1|34289|34483|34283|34513|0.88|0
exon_grailexp_v3=47|f|0|1|34289|34487|34283|34513|0.77|0
exon_grailexp_v3=48|f|0|1|34289|34491|34283|34513|0.76|0
exon_grailexp_v3=49|f|3|1|34289|34513|34283|34513|0.79|0
exon_grailexp_v3=50|f|1|1|34354|34470|34283|34513|0.92|0
exon_grailexp_v3=51|f|1|1|34354|34483|34283|34513|1|1
exon_grailexp_v3=52|f|1|1|34354|34487|34283|34513|0.92|0
exon_grailexp_v3=53|f|1|1|34354|34491|34283|34513|0.92|0
exon_grailexp_v3=54|f|1|1|34354|34493|34283|34513|0.9|0
exon_grailexp_v3=55|f|2|1|34354|34513|34283|34513|0.95|0
exon_grailexp_v3=56|f|1|1|34370|34483|34283|34513|0.93|0
exon_grailexp_v3=57|f|1|1|34370|34487|34283|34513|0.85|0
exon_grailexp_v3=58|f|1|1|34392|34483|34283|34513|0.83|0
exon_grailexp_v3=59|f|0|2|35022|35202|35013|35273|0.72|0
exon_grailexp_v3=60|f|1|2|35100|35202|35013|35273|0.94|1
exon_grailexp_v3=61|f|1|2|35100|35206|35013|35273|0.87|0
exon_grailexp_v3=62|r|1|2|36421|36542|36327|36572|0.57|0
exon_grailexp_v3=63|r|1|2|36356|36542|36327|36572|0.79|1
exon_grailexp_v3=64|r|2|0|36049|36147|35695|36147|0.8|1
exon_grailexp_v3=65|r|2|0|35924|36147|35695|36147|0.32|0
exon_grailexp_v3=66|r|1|1|27488|27535|27485|27799|0.66|1
exon_grailexp_v3=67|r|0|1|26765|26832|26717|26848|0.59|0
exon_grailexp_v3=68|r|0|1|26762|26832|26717|26848|0.6|0
exon_grailexp_v3=69|r|0|1|26759|26832|26717|26848|0.68|0
exon_grailexp_v3=70|r|0|1|26756|26832|26717|26848|0.7|1
exon_grailexp_v3=71|r|2|0|21948|22455|21715|22455|0.49|1
A technical description of the format:
exon_grailexp_v3=id|strand|type|frame|begin|end|orfbeg|orfend|score|best_flag
id             = exon identifier
strand         = f or r
type           = 0-3, where 0=Initial, 1=Internal, 2=Terminal, 3=Single
frame          = 0-2
begin, end     = exon coordinates
orfbeg, orfend = open reading frame coordinates
score          = score from 0.0 to 1.0
best_flag      = 0 or 1; 1 indicates the exon is a BEST EXON,
                 i.e. the best in its cluster; 0 indicates the
                 exon is not a best exon
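As an illustration of the field layout just described, a single Genome Channel record could be decoded along these lines. This is a sketch under the stated format only; the GcaExon class and parse_gca_exon function are hypothetical names, not part of GrailEXP.

```python
from dataclasses import dataclass

PREFIX = "exon_grailexp_v3="
# Type codes as listed in the format description.
EXON_TYPES = {0: "Initial", 1: "Internal", 2: "Terminal", 3: "Single"}


@dataclass
class GcaExon:
    id: int
    strand: str      # "f" or "r"
    type: str        # decoded through EXON_TYPES
    frame: int       # 0-2
    begin: int
    end: int
    orf_begin: int
    orf_end: int
    score: float     # 0.0 to 1.0
    best: bool       # True when best_flag is 1


def parse_gca_exon(record: str) -> GcaExon:
    """Decode one 'exon_grailexp_v3=...' record into a GcaExon."""
    record = record.strip()
    if not record.startswith(PREFIX):
        raise ValueError(f"not an exon record: {record!r}")
    parts = record[len(PREFIX):].split("|")
    if len(parts) != 10:
        raise ValueError(f"expected 10 fields, got {len(parts)}")
    return GcaExon(
        id=int(parts[0]),
        strand=parts[1],
        type=EXON_TYPES[int(parts[2])],
        frame=int(parts[3]),
        begin=int(parts[4]),
        end=int(parts[5]),
        orf_begin=int(parts[6]),
        orf_end=int(parts[7]),
        score=float(parts[8]),
        best=parts[9] == "1",
    )


exon = parse_gca_exon("exon_grailexp_v3=63|r|1|2|36356|36542|36327|36572|0.79|1")
```

For the record above this yields an Internal exon on the reverse strand with frame 2, score 0.79 and the best flag set. Scores written without a decimal point, such as "1", are read the same way by float().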
Genome Channel output format is always indexed from the TARGET STRAND'S PERSPECTIVE. This means all forward strand objects are indexed relative to the forward strand, and all reverse strand objects are indexed relative to the reverse strand.
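To compare reverse strand records across formats, target-strand coordinates can be converted to forward-strand coordinates. The sketch below assumes 1-based inclusive coordinates and that reverse strand positions count from the opposite end of the sequence; the function name to_forward_strand is hypothetical. The assumption is consistent with the sample data: with the 36741 bp input sequence, the reverse strand exon 36356..36542 maps to 200..386, which is how the same exon is reported from the forward strand perspective.

```python
def to_forward_strand(begin: int, end: int, strand: str, seq_len: int) -> tuple[int, int]:
    """Convert target-strand coordinates to forward-strand coordinates.

    Assumes 1-based inclusive coordinates. Forward strand objects are
    returned unchanged; reverse strand objects are mirrored about the
    sequence so that the result is again in ascending order.
    """
    if strand == "f":
        return begin, end
    if strand != "r":
        raise ValueError(f"unknown strand: {strand!r}")
    return seq_len - end + 1, seq_len - begin + 1


SEQ_LEN = 36741  # length of the sample input sequence
exon_fwd = to_forward_strand(36356, 36542, "r", SEQ_LEN)
orf_fwd = to_forward_strand(36327, 36572, "r", SEQ_LEN)
```

The same mirroring converts forward-strand coordinates of a reverse strand object back to the target strand perspective, since applying it twice returns the original pair.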