Genome Analysis and Systems Modeling Group

Analysis Tools

One of our primary goals is to design, implement, and provide practical computational tools for the analysis and annotation of genomes. The group is developing state-of-the-art tools for genomic analysis, including gene finding, with a special focus on systems for the retrieval, assembly, analysis, and annotation of the rapidly growing volume of DNA sequence data generated by sequencing centers around the world.

Currently Available Tools

  • GrailEXP, a gene finder for human and mouse genomes
  • Generation, a gene finder for microbial genomes
  • Pipeline, a comprehensive genome analysis pipeline
  • Genome Channel, a Java applet providing a comprehensive, sequence-based view of genomes
  • Grail, a tool for identifying genes, exons, and other features in DNA sequences

Tools Under Development

Current areas of research and development include:

  • Annotation/Visualization: The Genome Channel lets users examine contigs on a clickable chromosome map. It displays relevant information such as GrailEXP genes, Grail exons, GrailEXP EST alignments, GenBank annotations, Genscan genes, and many other features. The system is continually being refined to add capabilities.
  • Assembly: Efficient and accurate assembly of DNA fragments obtained from shotgun sequencing.
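To illustrate what fragment assembly involves, here is a minimal sketch of the textbook greedy overlap approach. It assumes short, error-free reads and a genome without long repeats, and it is not the group's assembler; the function names `overlap` and `greedy_assemble` and the `min_len` parameter are hypothetical.

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int) -> int:
    """Return the length of the longest suffix of a that equals a prefix of b.

    Only overlaps of at least min_len bases count; otherwise return 0.
    """
    start = 0
    while True:
        # Find the next place in a where b's first min_len bases occur.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Scanning left to right, the first full match is the longest overlap.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of contigs with the largest overlap.

    Returns the remaining contigs (a single string if everything joined).
    """
    # Remove duplicate reads, then reads contained in other reads.
    unique = list(dict.fromkeys(reads))
    contigs = [r for i, r in enumerate(unique)
               if not any(i != j and r in s for j, s in enumerate(unique))]

    while len(contigs) > 1:
        best_len, best_a, best_b = 0, None, None
        # Check every ordered pair, since overlap is directional.
        for a, b in permutations(contigs, 2):
            olen = overlap(a, b, min_len)
            if olen > best_len:
                best_len, best_a, best_b = olen, a, b
        if best_len == 0:
            break  # no remaining overlap of at least min_len bases
        contigs.remove(best_a)
        contigs.remove(best_b)
        # Join the pair, writing the overlapping bases only once.
        contigs.append(best_a + best_b[best_len:])
    return contigs


if __name__ == "__main__":
    reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
    print(greedy_assemble(reads))
```

For these four reads, the sketch joins the fragments into the single contig `ATTAGACCTGCCGGAATAC`. Greedy merging is easy to follow, but it can misassemble repeated regions. Practical assemblers generally use overlap-graph or de Bruijn-graph methods and also model sequencing errors.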

Bioinformation Systems

Our goal is to support advances in biological research by developing an underlying data and computing infrastructure that improves access to:

  • biological information resulting from research and analysis;
  • biological data and information needed for research and analysis; and
  • biological, bioinformatics, and computational resources and tools.

Our Approach

Biological information is both a required input to and the desired product of research in computational biology. It is extremely diverse and complex, and it changes rapidly. The infrastructure we are creating will accommodate both the exponentially growing volume of sequence data and related information and the evolving concepts used to describe them, and it will make available the knowledge inferred from these data. We will develop a mix of technologies and approaches that:

  • account for the diversity of types and formats of biological data and data resources;
  • provide a high benefit-to-cost ratio, making good use of limited resources;
  • are scalable, or are developed with a plan for transition after initial prototyping; and
  • address user needs for data visualization, browsing, and querying.

Our Products

We help provide data resources and databases. These include:

  • Draft Microbial Genome Database
  • Homologous Bacterial Genes (HOBACGEN) server

Our Present Efforts

Our initial focus is on information about genome annotation, protein structure, and gene and protein function. We will:

  • provide access to information and analyses resulting from our efforts and those of our collaborators;
  • link, import, catalog, or provide improved access to external data resources;
  • make these data resources stably available to external users; and
  • link to external resources and tools.

Ongoing projects in the group include:

  • importing several external DNA and protein data resources, including the Sequence Retrieval System (SRS) from EMBL/EBI, for use by our collaborators and by other genome and protein researchers in the section;
  • reorganizing key analytical and operational data produced during comparative genome analysis and storing them in an Annotation Data Warehouse;
  • automating several data-gathering and data-analysis processes;
  • adding simple hyperlinks to important resources;
  • creating links to protein structure resources and tools;
  • providing links to genome sequencing centers, functional genomics laboratories, and protein structure resources; and
  • providing the underlying system and data architectures.

Our Research Interests

Our future work will focus on:

  • bioinformatics and next-generation information analysis systems for biologists;
  • new concepts in information representation and next-generation information systems;
  • high-performance computing system architectures and new data storage technologies;
  • system analysis, design, and construction, including completing systems by analyzing user needs and prioritizing development within the constraints of budget, time, and technology; and
  • the semantics and structure of molecular biology knowledge.
