GrailEXP FAQ Section 5: CpG Island and Repetitive Element Prediction


How are CpG islands located?

CpG islands are located using a simple sliding window algorithm. The algorithm slides across the sequence, examining 100 base pair windows. Once it has located windows with a high concentration of CG, it optimizes the edges of each island it has found.
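
As a rough illustration of the window scan described above, here is a minimal Python sketch. It is not GrailEXP's actual code. The 100 bp window size comes from the text; the thresholds (min_ratio, min_gc), the function names, and the use of the common observed/expected formula (CG count times window length, divided by C count times G count) are assumptions made for the example. The edge optimization step is not shown.

```python
def cpg_window_stats(window):
    """Return (observed/expected CG ratio, percent GC) for one window."""
    w = window.lower()
    n = len(w)
    c, g = w.count("c"), w.count("g")
    cg = w.count("cg")  # "cg" cannot overlap itself, so count() is exact
    ratio = (cg * n) / (c * g) if c and g else 0.0
    pct_gc = 100.0 * (c + g) / n if n else 0.0
    return ratio, pct_gc


def candidate_windows(seq, size=100, min_ratio=0.6, min_gc=50.0):
    """Yield 1-based (begin, end) for every window passing both thresholds.

    min_ratio and min_gc are illustrative values, not GrailEXP settings.
    """
    for start in range(0, len(seq) - size + 1):
        ratio, pct_gc = cpg_window_stats(seq[start:start + size])
        if ratio >= min_ratio and pct_gc >= min_gc:
            yield start + 1, start + size
```

Adjacent passing windows would then be joined into one candidate island before its edges are refined.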

How are simple repeats located?

Simple repeats are located with a simple algorithm that looks for repeated short words (for example, runs such as "aaaaaaaaaa" or "tttctttctt"), in either exact or approximate form.
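
The FAQ does not spell out the algorithm, so the following Python sketch only illustrates the exact-match half of the idea: it reports stretches where every base equals the base one period earlier. The function name, max_period, and min_length are hypothetical, and neither approximate matching nor GrailEXP's scoring is attempted.

```python
def exact_tandem_repeats(seq, max_period=6, min_length=12):
    """Return sorted 1-based (begin, end, period) for exact tandem repeats.

    A run is a stretch where seq[i] == seq[i - period] for every position.
    When several periods describe the same span, the smallest is kept.
    """
    seq = seq.lower()
    spans = {}
    for period in range(1, max_period + 1):
        run_start = None  # 0-based start of the current periodic run
        # Iterate one position past the end so a trailing run is closed.
        for i in range(period, len(seq) + 1):
            if i < len(seq) and seq[i] == seq[i - period]:
                if run_start is None:
                    run_start = i - period
            elif run_start is not None:
                if i - run_start >= min_length:
                    spans.setdefault((run_start + 1, i), period)
                run_start = None
    return sorted((b, e, p) for (b, e), p in spans.items())
```

For example, a run of twelve "a" bases is reported once with period 1, even though it is also periodic with periods 2 through 6.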

How are complex repeats located?

Complex repeats are located by BLASTing the sequence against a database of repetitive elements. Elements in the same location are then merged, so that only one repetitive element is reported at each location in the sequence.
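
The merging step can be pictured as ordinary interval merging. The sketch below is an assumption about how such a merge might look, not GrailEXP's code: the function name and the (begin, end, name) tuple layout are hypothetical, coordinates are taken to be 1-based and inclusive, and hits are assumed to come from a single strand.

```python
def merge_hits(hits):
    """Merge overlapping (begin, end, name) hits into one record per location.

    Names of merged hits are joined with '/', mirroring the element lists
    that appear in the output examples in this FAQ.
    """
    merged = []
    for begin, end, name in sorted(hits):
        if merged and begin <= merged[-1][1]:
            last_begin, last_end, names = merged[-1]
            if name not in names:
                names.append(name)
            merged[-1] = (last_begin, max(last_end, end), names)
        else:
            merged.append((begin, end, [name]))
    return [(b, e, "/".join(names)) for b, e, names in merged]
```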

What are the advantages of using BLAST?

If your only goal is to mask the sequence and get rid of the junk, BLAST works very well. Its primary advantage, as mentioned in previous sections, is speed.

What are the disadvantages of using BLAST?

BLAST does not provide correct edges to the alignments and often breaks a single alignment into many fragments. GrailEXP's repeat-finding tool is primarily used to mask a sequence, NOT to report the boundaries of repetitive elements with pinpoint accuracy. A more rigorous program such as RepeatMasker is better suited to that task.

How is the sequence masked?

All bases in the sequence that are identified as part of a COMPLEX REPEAT are replaced by "n" characters. Simple repeats are NOT currently masked, except where they slip through the BLAST filter and get reported as complex repeats. Simple repeats do not cause problems in most applications, so they are not masked.
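
A minimal Python sketch of this kind of masking follows. The function name and arguments are hypothetical, and the coordinates are assumed to be 1-based and inclusive, given from the forward strand's perspective.

```python
def mask_sequence(seq, intervals, mask_char="n"):
    """Replace every base inside the given (begin, end) intervals.

    Intervals are assumed to be 1-based and inclusive; overlapping
    intervals are harmless because masking the same base twice is a no-op.
    """
    bases = list(seq)
    for begin, end in intervals:
        lo = max(begin, 1) - 1          # convert to a 0-based slice start
        hi = min(end, len(bases))       # clip to the sequence length
        if hi > lo:
            bases[lo:hi] = mask_char * (hi - lo)
    return "".join(bases)
```

Only the complex repeat intervals would be passed in; simple repeat intervals are left out, so those bases stay unmasked.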

What are the pretty output formats?

Here is an example of the pretty output formats for CpG Islands and repetitive elements:

--------------------------------------------------------------------------------
GrailEXP v3.2                                  http://compbio.ornl.gov/grailexp/

Authors:  Doug Hyatt, Manesh Shah, Victor Olman, Richard Mural, Ying Xu, and 
  Edward C. Uberbacher, 1996-2001

Reference:  "Automated Gene Identification in Large-Scale Genomic Sequences",
  Xu, Y. and Uberbacher, E.C., Journal of Computational Biology, Volume 4,
  Number 3, 1997

Sequence:  >GrailEXP Input Sequence (36741 bp)
--------------------------------------------------------------------------------
PERCEVAL CpG Islands (1 predicted)

 Index    Begin       End        Ratio    Pct_GC

      1       3511       4215     0.90     74.14
--------------------------------------------------------------------------------
PERCEVAL Repeats (54 located:  12 simple, 42 complex)


Simple Repeats (12 located)

 Index    Begin       End          Score    1st 10 Bases

      1       1349       1374         64    tctcaggtat...
      2       1526       1555         85    taattttgta...
      3       2636       2674        107    aaaaaaacaa...
      4       5194       5227         85    aaaaaaaaag...
      5       5886       5907        115    aaaaaaaaaa...
      6       8179       8200         61    tttctttctt...
      7      10257      10282        104    tttttttttt...
      8      13734      13775        127    aaaaaaaaaa...
      9      15084      15103         42    taagttaata...
     10      25143      25170         91    tttctttttt...
     11      28313      28346         83    aaaaaaaaaa...
     12      31740      31866        266    aaaaaaaaaa...

Complex Repeats (42 located)

 Index Std   Begin       End         E-Val             Element Names

      1 +        804        831      4e-08                     MIR3#SINE/MIR...
      2 +       1025       1361     8e-232    MSTA#LTR/MaLR/THE1C#LTR/MaLR/T...
      3 +       1400       1490      2e-07                         SVA#Other...
      4 +       2392       2863      0e+00    FLAM_C#SINE/Alu/AluJb#SINE/Alu...
      5 +       4918       5193      0e+00    B4A#SINE/B4/PB1D10#SINE/Alu/B1...
      6 +       5617       5885      0e+00    B4A#SINE/B4/PB1D7#SINE/Alu/PB1...
      7 +       7741       7795      6e-11                         SVA#Other...
      8 +       8360       8458      9e-04                         SVA#Other...
      9 +      10036      10081      1e-05                         SVA#Other...
     10 +      13466      13733      0e+00    FLAM_C#SINE/Alu/AluYb8#SINE/Al...
     11 +      15151      15172      0.004                         SVA#Other...
     12 +      15822      16086      0e+00    FLAM_C#SINE/Alu/AluYb8#SINE/Al...
     13 +      16963      17018      9e-07                         SVA#Other...
     14 +      18212      18272      6e-27        MIR3#SINE/MIR/MIR#SINE/MIR...
     15 +      18422      18689      0e+00    FLAM_C#SINE/Alu/AluYb8#SINE/Al...
     16 +      19642      19694      2e-14                         SVA#Other...
     17 +      21993      22023      4e-06                        L2#LINE/L2...
     18 +      22537      22805      0e+00    FLAM_C#SINE/Alu/AluYb8#SINE/Al...
     19 +      23348      23385      4e-09                       L2a#LINE/L2...
     20 +      27007      27063      2e-07                         SVA#Other...
     21 +      28043      28312      0e+00    FLAM_C#SINE/Alu/AluYb8#SINE/Al...
     22 +      31460      31739      0e+00    AluSc#SINE/Alu/AluYa5#SINE/Alu...
     23 -      30578      30604      7e-10                      MIR#SINE/MIR...
     24 -      28329      28365      1e-09                       L2a#LINE/L2...
     25 -      26987      27269      0e+00    BC200#SINE/Alu/FLAM_C#SINE/Alu...
     26 -      25177      25449      0e+00    PB1D10#SINE/Alu/PB1D7#SINE/Alu...
     27 -      24855      24892      2e-05                      HAL1#LINE/L1...
     28 -      24502      24773      0e+00    FLAM_C#SINE/Alu/AluSx#SINE/Alu...
     29 -      22147      22199      2e-13                      MIR#SINE/MIR...
     30 -      19617      19890      0e+00    FLAM_C#SINE/Alu/AluYb8#SINE/Al...
     31 -      18621      18672      9e-07                         SVA#Other...
     32 -      16948      17211      0e+00    FLAM_C#SINE/Alu/AluYb8#SINE/Al...
     33 -      15977      16002      0.004                         SVA#Other...
     34 -      14897      15383      0e+00    AluSp#SINE/Alu/FLAM_C#SINE/Alu...
     35 -      14762      14787      3e-10                   MLT1E2#LTR/MaLR...
     36 -      10011      10526      0e+00    AluYb8#SINE/Alu/AluSc#SINE/Alu...
     37 -       8202       8484      0e+00    BC200#SINE/Alu/FLAM_C#SINE/Alu...
     38 -       7601       8000      0e+00    BC200#SINE/Alu/FLAM_C#SINE/Alu...
     39 -       6907       7333     2e-164                    LTR32#LTR/ERVL...
     40 -       5811       5863      2e-07                         SVA#Other...
     41 -       2562       2614      4e-12                         SVA#Other...
     42 -       1378       1668      0e+00    FLAM_C#SINE/Alu/AluJb#SINE/Alu...

Masked Sequence

>GrailEXP Input Sequence, masked
gatctgggtaaagggttttccaggtgtcaggatggaagtgactaaggtgcagaggctgga
gggctggggcaggtagaagcaagcattcctgttacctactgctgtgtgacaatctccccc
taaaacacaatggcttaaaataacatccatttcattacatatctcaatactataggtcag
gaatttgggctgggcttacttgggtaattcttctgtcccacatggcattgaccaaagcct
ggttttcagtgggcagctgggctggatggcccaacacagcttcgctaacatgattgctgt
(rest deleted to save space)
--------------------------------------------------------------------------------

What are the raw output formats?

begin cpgs
 3511 4215 0.90 74.14
end cpgs

The elements of the CpG Island output are, in order:
Begin, End, Observed/Expected GC, Percent GC

begin smprpts
 1349 1374 64
 1526 1555 85
 2636 2674 107
 5194 5227 85
 5886 5907 115
 8179 8200 61
 10257 10282 104
 13734 13775 127
 15084 15103 42
 25143 25170 91
 28313 28346 83
 31740 31866 266
end smprpts

The fields of the simple repeat output are, in order:
Begin, End, Score

begin cpxrpts
 f 804 829 44 0.006 THER2___tRNA-related_SINE_element_from_therians.
 f 1025 1364 483 4e-102 MSTA___a_consensus./THE1B___a_consensus*
 f 1400 1456 58 4e-07 SVA___Composite_retroposon.
 f 1547 1646 44 0.006 SVA___Composite_retroposon.
 f 2357 2746 5353 0e+00 FLA/AluYb5/AluSq/AluSp/AluJb/AluJo/AluSz/AluSx/AluSg/AluSc/AluYa1/AluYb8/AluYa5/AluYb1/AluYa*
 f 2796 2903 2098 6e-273 AluSq/AluSc/AluSx/AluSg/AluSz/AluSp/AluJ*/AluJo/AluYb1/AluYa1/AluYa5/AluYa8/AluYb5/TAAA@*
 f 4918 5201 3064 0e+00 B1F___B1F_repetitive_element_-_an_old_subfamily_consensus./AluYa8/FLA/AluSp/AluJo/AluJb/AluSz/AluS*
 f 5617 5912 16953 0e+00 B1___Mouse_B1_repetitive_sequence_-_a_consensus./FLA/AluSq/AluSp/AluSx/AluYa1/AluSz/AluSg/AluS*/A@*
 f 7741 7795 79 1e-13 SVA___Composite_retroposon.
 f 8222 8278 50 1e-04 SVA___Composite_retroposon.
 f 8353 8404 52 2e-05 SVA___Composite_retroposon.
 f 10036 10081 63 6e-09 SVA___Composite_retroposon.
 f 10256 10278 220 6e-12 HERVH48I___HERVH-related_endogenous_retrorvirus_flanked_by_MER4*/LINE_CH___C.hircus_LINE_element*
 f 13466 13759 4440 0e+00 AluJb/AluJo/AluSq/AluSx/AluSz/AluSg/AluSp/AluSc/AluY/AluYa1/AluYa5/AluYb1/AluYa8/AluYb5/AluYb*/FL*
 f 15151 15360 54 6e-06 SVA___Composite_retroposon.
 f 15822 16101 4863 0e+00 FLA/AluJb/AluJo/AluSq/AluSz/AluSx/AluSg/AluSp/AluSc/AluY/AluYa1/AluYb1/AluYa5/AluYb5/AluYb8/AluYa*
 f 16315 16368 150 1e-12 LINE2___MIR2/LINE2_repetitive_element_-_a_consensus.
 f 16562 16604 198 4e-16 LINE2___MIR2/LINE2_repetitive_element_-_a_consensus*
 f 16963 17184 130 2e-18 SVA___Composite_retroposon.
 f 18213 18272 150 1e-12 MIR___mammalian-wide_interspersed_repeat_(MIR)_-_a_consensu*
 f 18422 18706 4171 0e+00 FLA/AluJb/AluJo/AluSx/AluSz/AluSg/AluSq/AluSc/AluSp/AluY/AluYa1/AluYb1/AluYa5/AluYb8/AluYb5/AluYa*
 f 19642 19875 152 9e-25 SVA___Composite_retroposon.
 f 22537 22812 4222 0e+00 FLA/AluSq/AluSp/AluSx/AluSz/AluSg/AluSc/AluY/AluYa1/AluYb1/AluYa5/AluJb/AluYb5/AluYa8/AluYb8/AluJ*
 f 23348 23385 168 8e-18 LINE2___MIR2/LINE2_repetitive_element_-_a_consensus.
 f 25193 25225 50 1e-04 SVA___Composite_retroposon.
 f 26063 26095 1548 2e-124 RSINE2___SINE_element;_RSINE2_family_-_a_consensus./AC@*
 f 26949 26986 740 8e-115 TAAA@2
 f 27007 27063 65 2e-09 SVA___Composite_retroposon.
 f 27135 27243 46 0.002 SVA___Composite_retroposon.
 f 28043 28333 5080 0e+00 AluYb5/AluSq/AluSp/AluSx/AluSz/AluSg/AluYa1/AluSc/AluYa5/AluYa8/AluYb1/AluYb8/AluJo/AluJ*/FLA/AluJ*
 f 31460 31752 2334 0e+00 AluYa8/AluYb1/AluYa5/FLA/AluSp/AluSq/AluYb5/AluJo/AluJb/AluSz/AluSx/AluSg/AluYa1/AluSc/AluYb*
 f 31793 31852 437 2e-44 GGAA@1/L1P_MA2___LINE1_subfamily_consensus;_L1PMA2_subfamily.
 r 31793 31852 180 2e-22 GGAA@2
 r 31740 31771 44 0.006 LINE_CH___C.hircus_LINE_element.
 r 30578 30604 184 2e-11 MIR___mammalian-wide_interspersed_repeat_(MIR)_-_a_consensu*
 r 28328 28365 189 2e-25 LINE2___MIR2/LINE2_repetitive_element_-_a_consensus.
 r 28061 28291 100 1e-08 SVA___Composite_retroposon.
 r 26949 27269 5221 0e+00 FLA/AluSx/AluSq/AluSz/AluSg/AluSp/AluSc/AluY/AluYa1/AluYa5/AluJb/AluYb1/AluYa8/AluYb5/AluYb8/AluJ*
 r 26065 26095 1184 6e-92 AC@2/MER4I___MER41I,_MER57I_and_MER65I_retroelements.
 r 25142 25449 4123 0e+00 B1___Mouse_B1_repetitive_sequence_-_a_consensus*/FLA/AluYb8/AluSq/AluSp/AluYb5/AluSc/AluSz/AluS*
 r 24495 24773 2289 0e+00 FLA/AluSz/AluSx/AluSg/AluSq/AluY/AluYa1/AluYb5/AluYa5/AluSc/AluYb8/AluYa8/AluYb1/AluJb/AluJ*
 r 22730 22784 46 0.002 SVA___Composite_retroposon.
 r 22549 22643 50 1e-04 SVA___Composite_retroposon.
 r 22147 22199 162 2e-16 MIR___Mammalian-wide_interspersed_repeat_(MIR)_-_a_consensus*
 r 19612 19900 4564 0e+00 B1F___B1F_repetitive_element_-_an_old_subfamily_consensus./FLA/AluYb5/AluSq/AluSp/AluSz/AluSx/AluS*
 r 18621 18672 63 6e-09 SVA___Composite_retroposon.
 r 18213 18275 150 8e-13 MON1___tRNA-related_SINE_element_from_monotremes./MIR3___MIR3_SINE_element_associated_with_L3*
 r 16936 17211 5308 0e+00 FLA/AluYb5/AluSx/AluSz/AluSg/AluJb/AluSc/AluYa1/AluYa5/AluYb8/AluYb1/AluYa8/AluJo/AluSq/AluS*/AluS*
 r 16013 16067 52 2e-05 SVA___Composite_retroposon.
 r 15835 15937 48 4e-04 SVA___Composite_retroposon.
 r 15096 15467 4056 0e+00 MLT1C1___(MLT1c_subfamily)_-_a_consensus./MLT1D___(MLT1d_subfamily)_-_a_consensus*/AluSp/FLA/AluYb*
 r 14897 15082 2839 0e+00 FLA/AluJb/ALU___Human_ALU_interspersed_repetitive_sequence_-_a_consensus./AluSq/AluSx/AluSz/AluS*
 r 14582 14785 247 5e-31 MLT1C1___(MLT1c_subfamily)_-_a_consensus./MLT1C___(MLT1c_subfamily)_-_consensus_sequenc*
 r 13678 13713 48 4e-04 SVA___Composite_retroposon.
 r 10445 10526 1379 2e-220 AluSq/AluSz/AluJb/ALU___Human_ALU_interspersed_repetitive_sequence_-_a_consensus./AluSx/AluSg/AluS*
 r 10255 10398 2792 0e+00 7SL___Human_gene_for_small_cytoplasmic_7SL_RNA_(7L30.1)./FLA/AluYb8/AluJo/AluJb/AluSc/AluYa1/AluYa*
 r 10011 10205 3200 0e+00 AluSx/AluSg/AluSz/AluSc/AluSq/AluY/AluYa1/AluYb8/AluYb5/AluSp/AluJ*/AluYb1/AluYa5/AluYa8/AluJ*/FL*
 r 8195 8484 4828 0e+00 FLA/AluSp/AluSq/AluSx/AluSz/AluSg/AluSc/AluY/AluYa1/AluJb/AluYa5/AluYb5/AluYb1/AluYa8/AluYb8/AluJ*
 r 7601 8000 6667 0e+00 FLA/AluSq/AluSp/AluYb5/AluJb/ALU___Human_ALU_interspersed_repetitive_sequence_-_a_consensus./AluJ*
 r 6907 7333 238 2e-61 LTR32___LTR_from_human_endogenous_retrovirus.
 r 5881 5911 5398 0e+00 TAM2_AM___Anthirrhinum_majus_DNA_for_transposon_tam2*/LINE_CH___C.hircus_LINE_element*/A@*/A@*/A@*
 r 5784 5863 109 1e-11 SVA___Composite_retroposon.
 r 2865 2903 689 6e-111 TAAA@2
 r 2631 2675 536 1e-50 CAAAAA@2/MER61I___Primate_LTR_retroviral-like_element_MER61I_-_a_consensus*
 r 2434 2614 123 7e-16 SVA___Composite_retroposon.
 r 1378 1668 3743 0e+00 FLA/AluJb/ALU___Human_ALU_interspersed_repetitive_sequence_-_a_consensus./AluJo/AluSp/AluSq/AluS*
 r 131 221 142 9e-22 MLT1G___(MLT1g_subfamily)_-_a_consensus.
end cpxrpts

The fields of the complex repeat output are, in order:
Strand, Begin, End, BLAST Score, BLAST E-Value, Element Name(s)

Complex repeat coordinates are always ASCENDING and are given from the FORWARD STRAND'S perspective.

begin maskedseq
 gatctgggtaaagggttttccaggtgtcaggatggaagtgactaaggtgcagaggctgga
 gggctggggcaggtagaagcaagcattcctgttacctactgctgtgtgacaatctccccc
 taaaacacaannnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
 nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnatggcattgaccaaagcct
 ggttttcagtgggcagctgggctggatggcccaacacagcttcgctaacatgattgctgt
 cttcgtagggatggtggaagcctgggctcagtgggactgtcaactggaatggccatatgt
 ggactctcttagcatgatggtctcttctagaagcttgggttcccagagagaatgttcaag
(rest deleted to save space)
end maskedseq

All outputs are surrounded by the appropriate "begin" and "end" tags. Each data line begins with a space.
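
Because every block is delimited by "begin" and "end" tags and every data line starts with a space, the raw output is easy to read mechanically. The following Python sketch shows one way to do it; the function name and the returned dictionary layout are hypothetical.

```python
def parse_raw_output(text):
    """Return {tag: [fields, ...]} for each begin/end block in raw output.

    Each data line (a line starting with a space inside a block) is split
    on whitespace. Lines outside a block, or without the leading space,
    are ignored.
    """
    blocks = {}
    current = None
    for line in text.splitlines():
        if line.startswith("begin "):
            current = line.split()[1]
            blocks[current] = []
        elif line.startswith("end "):
            current = None
        elif current is not None and line.startswith(" "):
            blocks[current].append(line.split())
    return blocks
```

Fields are returned as strings; for the "cpgs" block, for instance, the four fields would be converted to two integers and two floats by the caller.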

What are the Genome Channel output formats?

Here are examples of the formats, accompanied by their technical descriptions:

CpG Island Example

cpg_grailexp=1|3511|4215|0.90|74.14

CpG Island Format Technical Description

cpg_grailexp=num|begin|end|cpg_ratio|percent_gc

  num = numerical identifier
  begin = left boundary within the sequence
  end = right boundary within the sequence
  cpg_ratio = ratio of observed GC to expected GC (maximum value 2.0)
  percent_gc = percent GC within the island

Simple Repeat Example

smprpt_grailexp=1|1349|1374|64
smprpt_grailexp=2|1526|1555|85
smprpt_grailexp=3|2636|2674|107
smprpt_grailexp=4|5194|5227|85
smprpt_grailexp=5|5886|5907|115
smprpt_grailexp=6|8179|8200|61
smprpt_grailexp=7|10257|10282|104
smprpt_grailexp=8|13734|13775|127
smprpt_grailexp=9|15084|15103|42
smprpt_grailexp=10|25143|25170|91
smprpt_grailexp=11|28313|28346|83
smprpt_grailexp=12|31740|31866|266

Simple Repeat Format Technical Description

smprpt_grailexp=num|begin|end|count

  num = numerical identifier
  begin = left boundary within the sequence
  end = right boundary within the sequence
  count = Grail simple repeat count, divide by length to
    get a density score
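
The density calculation mentioned above could be sketched in Python as follows. The function name is hypothetical, and the length is assumed to be end - begin + 1 (inclusive coordinates), which the FAQ does not state explicitly.

```python
def simple_repeat_density(line):
    """Parse one smprpt_grailexp line and return (num, begin, end, density)."""
    key, _, value = line.partition("=")
    if key != "smprpt_grailexp":
        raise ValueError("not a simple repeat line: %r" % line)
    num, begin, end, count = (int(field) for field in value.split("|"))
    length = end - begin + 1  # assumes inclusive coordinates
    return num, begin, end, count / length
```

For the first example line, the repeat spans 26 bases with a count of 64, giving a density of about 2.46.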

Complex Repeat Example

cpxrpt_grailexp=1|f|1025|1361|326|3e-66|THE1B   a consensus./MSTA   a consensus*
cpxrpt_grailexp=2|f|1400|1456|58|3e-07|SVA   Composite retroposon.
cpxrpt_grailexp=3|f|2391|2635|2961|0e+00|FLA/AluJb/ALU   Human ALU interspersed repetitive sequence - a consensus./AluYb8/AluSq/AluSp/AluYb*
cpxrpt_grailexp=4|f|2676|2746|1122|5e-166|AluYa1/AluYb1/AluYb5/AluYb8/AluYa5/ALU   Human ALU interspersed repetitive sequence - a consensus*
cpxrpt_grailexp=5|f|2796|2863|924|5e-128|AluJb/ALU   Human ALU interspersed repetitive sequence - a consensus./AluSx/AluSz/AluSg/AluSq/AluS*
cpxrpt_grailexp=6|f|4918|5022|1547|5e-272|B1F   B1F repetitive element - an old subfamily consensus./AluYa1/AluYa5/AluYb1/AluYb5/AluYa8/AluS*
cpxrpt_grailexp=7|f|5071|5193|1647|8e-269|AluYb5/AluYb8/7SL   Human gene for small cytoplasmic 7SL RNA (7L30.1)./FLA/AluSc/AluYb1/AluYa*
cpxrpt_grailexp=8|f|5617|5885|3964|0e+00|AluYb1/FLA/AluYb8/AluSq/AluSp/AluYb5/AluSg/AluYa1/AluSz/AluSx/AluSc/AluYa5/AluJb/AluYa*/AluJo
cpxrpt_grailexp=9|f|7741|7795|77|4e-13|SVA   Composite retroposon.
cpxrpt_grailexp=10|f|8222|8278|50|8e-05|SVA   Composite retroposon.
cpxrpt_grailexp=11|f|10036|10081|60|9e-08|SVA   Composite retroposon.
cpxrpt_grailexp=12|f|13466|13733|4233|0e+00|AluYb1/FLA/AluYb8/AluSq/AluSp/AluYb5/AluJ*/AluJo/AluSx/AluSz/AluSg/AluSc/AluYa1/AluYa5/AluYa*
cpxrpt_grailexp=13|f|15323|15383|65|1e-09|AluYb1
cpxrpt_grailexp=14|f|15822|16086|4641|0e+00|AluYb1/AluYb8/AluYb5/AluJb/ALU   Human ALU interspersed repetitive sequence - a consensus./AluJ*
cpxrpt_grailexp=15|f|16963|17018|63|6e-09|SVA   Composite retroposon.
cpxrpt_grailexp=16|f|18212|18272|150|5e-13|MIR   mammalian-wide interspersed repeat (MIR) - a consensu*
cpxrpt_grailexp=17|f|18422|18689|3707|0e+00|AluYb1/FLA/AluSq/AluSp/AluYb5/AluJb/ALU   Human ALU interspersed repetitive sequence - a consensus*
cpxrpt_grailexp=18|f|19642|19694|81|2e-14|SVA   Composite retroposon.
cpxrpt_grailexp=19|f|22537|22805|3617|0e+00|AluYb1/FLA/AluJb/ALU   Human ALU interspersed repetitive sequence - a consensus./AluYb5/AluJo/AluS*
cpxrpt_grailexp=20|f|23348|23385|168|1e-18|LINE2   MIR2/LINE2 repetitive element - a consensus.
cpxrpt_grailexp=21|f|25193|25225|50|8e-05|SVA   Composite retroposon.
cpxrpt_grailexp=22|f|27007|27063|65|1e-09|SVA   Composite retroposon.
cpxrpt_grailexp=23|f|28043|28312|4576|0e+00|AluYb1/AluYb8/AluSq/AluSp/AluYb5/AluSg/AluYa1/AluSx/AluSz/AluSc/AluYa5/AluYa*/AluJo/AluJ*/FLA/AluJ*
cpxrpt_grailexp=24|f|31460|31565|1159|4e-153|AluSc/AluYb5/AluYa5/AluSg/AluYa1/AluYa8/AluYb8/AluYb1/AluS*/AluJb/FLA/AluSz/AluSx/AluSq/AluJo
cpxrpt_grailexp=25|f|31602|31739|1281|1e-191|AluSp/7SL   Human gene for small cytoplasmic 7SL RNA (7L30.1)./AluYa5/AluYa8/AluYb5/AluYa1/AluYb*
cpxrpt_grailexp=26|r|6138|6164|138|1e-09|MIR   mammalian-wide interspersed repeat (MIR) - a consensu*
cpxrpt_grailexp=27|r|8377|8413|183|8e-24|LINE2   MIR2/LINE2 repetitive element - a consensus.
cpxrpt_grailexp=28|r|8451|8473|46|0.001|SVA   Composite retroposon.
cpxrpt_grailexp=29|r|9473|9755|3591|0e+00|AluYb1/FLA/AluYb8/AluSq/AluSp/AluYb5/AluSx/AluSz/AluSg/AluSc/AluYa*/AluJb/AluYa5/AluYa8/AluJo
cpxrpt_grailexp=30|r|11293|11330|1062|6e-138|B1   Mouse B1 repetitive sequence - a consensus./AluSq/AluSx/AluYa5/AluYb1/AluSz/AluYa8/AluSg/AluJ*
cpxrpt_grailexp=31|r|11350|11565|2990|0e+00|AluYb8/AluYb5/AluSc/AluSg/AluYa1/AluYa5/AluYa8/AluSq/AluSp/AluSz/AluSx/FLA/AluJ*/AluJo/AluYb*
cpxrpt_grailexp=32|r|11969|12240|1401|2e-228|AluSz/AluSx/AluSq/AluSg/ALU   Human ALU interspersed repetitive sequence - a consensus./AluJb/AluJ*
cpxrpt_grailexp=33|r|13958|14012|46|0.001|SVA   Composite retroposon.
cpxrpt_grailexp=34|r|14543|14595|150|5e-13|MIR   mammalian-wide interspersed repeat (MIR) - a consensu*
cpxrpt_grailexp=35|r|16852|17125|4208|0e+00|AluYb1/FLA/AluYb8/AluSq/AluSp/AluYb5/AluSz/AluSx/AluSg/AluSc/AluYa1/AluYa5/AluYa8/AluJb/AluJ*
cpxrpt_grailexp=36|r|18070|18121|63|6e-09|SVA   Composite retroposon.
cpxrpt_grailexp=37|r|18470|18516|46|0.001|MIR3   MIR3 SINE element associated with L3.
cpxrpt_grailexp=38|r|19531|19794|4176|0e+00|AluYb1/AluYb8/AluSc/AluSx/AluSz/AluSg/AluJ*/AluYb5/AluYa1/AluYa5/AluYa8/AluJo/AluSq/AluSp/FL*/FL*
cpxrpt_grailexp=39|r|20675|20697|46|0.001|SVA   Composite retroposon.
cpxrpt_grailexp=40|r|21275|21303|46|0.001|MLT1C1   (MLT1c subfamily) - a consensus.
cpxrpt_grailexp=41|r|21359|21638|3367|0e+00|FLA/AluYb8/AluSc/AluSq/AluSp/AluYb5/AluJ*/AluJo/AluSz/AluSx/AluSg/AluYa1/AluYa5/AluYa8/AluYb*
cpxrpt_grailexp=42|r|21660|21845|2603|0e+00|FLA/AluJb/ALU   Human ALU interspersed repetitive sequence - a consensus./AluSq/AluSx/AluSz/AluS*
cpxrpt_grailexp=43|r|21957|22019|54|5e-06|MLT1C1   (MLT1c subfamily) - a consensus.
cpxrpt_grailexp=44|r|22065|22109|215|5e-22|MLT1C1   (MLT1c subfamily) - a consensus./MLT1C   (MLT1c subfamily) - a consensus*
cpxrpt_grailexp=45|r|23029|23064|48|3e-04|SVA   Composite retroposon.
cpxrpt_grailexp=46|r|26216|26293|1290|1e-195|AluYb8/AluYa5/AluYb1/AluYb5/AluSc/AluYa1/AluYa8/AluSz/AluSq/AluJ*/AluSx/AluSp/AluJo/AluSg/FL*
cpxrpt_grailexp=47|r|26344|26459|1938|0e+00|FLA/7SL   Human gene for small cytoplasmic 7SL RNA (7L30.1)./AluYb8/AluSc/AluYb5/AluJ*/AluJb/AluYa*
cpxrpt_grailexp=48|r|26537|26731|2922|0e+00|AluSq/AluYb5/AluSz/AluSx/AluSg/AluSc/AluYa1/AluYb8/AluSp/AluJ*/AluYb1/AluYa5/AluYa8/AluJ*/FL*
cpxrpt_grailexp=49|r|28258|28540|4381|0e+00|AluYb1/FLA/AluYb8/AluSc/AluYb5/AluSp/AluSq/AluSx/AluSz/AluSg/AluYa1/AluJb/AluYa*/AluYa8/AluJ*
cpxrpt_grailexp=50|r|28742|29141|6210|0e+00|AluYb1/FLA/AluYb8/AluSc/AluSq/AluSp/AluYb5/AluJ*/AluJo/AluSx/AluSz/AluSg/AluYa1/AluYa5/AluYa*/AluJ*
cpxrpt_grailexp=51|r|29409|29534|107|4e-22|LTR32   LTR from human endogenous retrovirus.
cpxrpt_grailexp=52|r|29573|29743|113|7e-24|LTR32   LTR from human endogenous retrovirus.
cpxrpt_grailexp=53|r|29782|29835|67|4e-10|LTR32   LTR from human endogenous retrovirus.
cpxrpt_grailexp=54|r|30879|30931|65|1e-09|SVA   Composite retroposon.
cpxrpt_grailexp=55|r|34128|34180|73|6e-12|SVA   Composite retroposon.
cpxrpt_grailexp=56|r|35074|35189|1863|0e+00|AluYb1/AluJb/ALU   Human ALU interspersed repetitive sequence - a consensus./AluJo/FLA/AluSp/AluS*
cpxrpt_grailexp=57|r|35210|35364|1892|0e+00|AluJb/ALU   Human ALU interspersed repetitive sequence - a consensus./AluJo/AluSp/AluSq/AluSz/AluS*
cpxrpt_grailexp=58|r|36544|36580|122|4e-16|MLT1G   (MLT1g subfamily) - a consensus.

Complex Repeat Format Technical Description

cpxrpt_grailexp=num|strand|begin|end|blast_score|blast_evalue|element_list

  num = numerical identifier
  strand = f or r (forward or reverse)
  begin = left boundary from that strand's perspective
  end = right boundary from that strand's perspective
  blast_score = BLAST score returned by blastall, sum of scores 
    if concatenation of elements
  blast_evalue = BLAST e-value returned by blastall, product of
    e-values if concatenation of elements
  element_list = list of elements in REPBASE hit by this 
    region, multiple elements separated by '/'
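
A line in this format can be split on '|' as in the Python sketch below. The function name and dictionary keys are hypothetical. The element list is kept as a single string, because some element descriptions themselves contain '/' (for example "MIR2/LINE2 repetitive element"), so splitting it on '/' is not always unambiguous.

```python
def parse_cpxrpt(line):
    """Parse one cpxrpt_grailexp line into a dictionary of typed fields."""
    key, _, value = line.partition("=")
    if key != "cpxrpt_grailexp":
        raise ValueError("not a complex repeat line: %r" % line)
    # Limit the split so any '|' inside the element list stays intact.
    num, strand, begin, end, score, evalue, elements = value.split("|", 6)
    return {
        "num": int(num),
        "strand": strand,            # 'f' or 'r'
        "begin": int(begin),
        "end": int(end),
        "blast_score": int(score),
        "blast_evalue": float(evalue),
        "element_list": elements,    # left unsplit, see note above
    }
```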

Genome Channel output format is always indexed from the TARGET STRAND'S PERSPECTIVE. This means all forward strand objects are indexed relative to the forward strand, and all reverse strand objects are indexed relative to the reverse strand.
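
To compare reverse strand Genome Channel coordinates with the raw output, they have to be flipped onto the forward strand. The Python sketch below assumes 1-based inclusive coordinates; the function name is hypothetical. The examples in this FAQ are consistent with this arithmetic: on the 36741 bp input sequence, the reverse strand Genome Channel entry 28742 to 29141 corresponds to the raw output entry 7601 to 8000.

```python
def to_forward_coords(begin, end, seq_length):
    """Convert a reverse-strand (begin, end) pair to forward-strand coordinates.

    Assumes 1-based inclusive coordinates. Applying the function twice
    returns the original pair.
    """
    return seq_length - end + 1, seq_length - begin + 1
```

Forward strand entries need no conversion.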

What is planned for future versions?

A routine will be added to the complex repeat finder to "merge" alignments hitting the same repetitive element that BLAST has incorrectly fragmented. An option may be added to mask the sequence for simple repeats.


The author and maintainer of this FAQ is Doug Hyatt (hyattpd@ornl.gov). This FAQ applies to GrailEXP version 3.2, released February 2001. This FAQ was last updated February 2001.