Protein Informatics Group
  

PROSPECT Version 2.0:

   

Running Prospect

We suggest that users run BLAST or PSI-BLAST before using PROSPECT, to check whether any homolog of the target exists in PDB. If PSI-BLAST finds a remote homolog that aligns to only part of the sequence and is not included in the DALI or FSSP list, we suggest adding it to the template library (see Templates) to verify whether it is the true fold or to generate the full alignment. The same can be done for a PDB structure whose function is similar to that of the target.

The programs included as part of the prospect suite are:


For the various examples below, we use 'LINUX' as the architecture in the commands.



Sequence Profile

We use PSI-Blast to generate sequence profiles.  The program blastpgp produces a 'checkpoint file' at the end of a search iteration.  

We include the script get_chk_file, which can be used with the command:
get_chk_file <seq file>

<seq file> : The path to the sequence file

This produces the file <seq file>.chk

You can run blastpgp yourself, with the command:

blastpgp -b 0 -j 3 -h 0.001 -d /data/nr/nr -i test.seq -C test.seq.chk

This tells blastpgp to show none of the alignments (-b 0), to run 3 iterations (-j 3) with an e-value threshold of 0.001 (-h 0.001), to use the database found at /data/nr/nr (-d) and the sequence test.seq (-i), and to save the checkpoint file (the output we are interested in here) to test.seq.chk (-C).
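When several query sequences need profiles, the same blastpgp call can be generated for each of them. The sketch below only prints the commands and executes nothing. The function name chk_command and the variables DB, ITERATIONS and EVALUE are hypothetical names introduced for this example, with defaults copied from the command above.

```shell
# Defaults copied from the blastpgp example above; override as needed.
DB="${DB:-/data/nr/nr}"
ITERATIONS="${ITERATIONS:-3}"
EVALUE="${EVALUE:-0.001}"

# chk_command SEQFILE: print the blastpgp command that writes SEQFILE.chk.
chk_command() {
    echo "blastpgp -b 0 -j $ITERATIONS -h $EVALUE -d $DB -i $1 -C $1.chk"
}

# Print one command per sequence file (nothing is executed here).
for seq in test.seq t0052.seq; do
    chk_command "$seq"
done
```

Piping the printed lines to sh would run the searches, assuming blastpgp is on the PATH and the database path is valid on your machine.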

The chk file is a binary file that is not portable between machines with different byte orders (e.g. between big-endian and little-endian machines). To convert it to ASCII, use read_chk:

read_chk.<Architecture> <chk file>

This writes all of the information in the file to standard output in ASCII format.  To save it, simply redirect the output to a file, for example:
read_chk.LINUX test.seq.chk > test.seq.freq

This file should consist of one line with the number of amino acids in the sequence (N), followed by the sequence on the next line, and then N lines 
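A quick sanity check of a .freq file can be sketched from the layout just described. This is only an illustration: check_freq is a hypothetical helper, it assumes the sequence is written on one line without spaces, and it checks only the line count and the sequence length, not the contents of the N profile lines.

```shell
# check_freq FILE: succeed if FILE has the layout described above.
check_freq() {
    awk '
        NR == 1 { n = $1 + 0 }            # line 1: residue count N
        NR == 2 { seqlen = length($1) }   # line 2: the sequence
        END {
            if (n < 1)       { print "missing residue count"; exit 1 }
            if (seqlen != n) { print "sequence has " seqlen " residues, expected " n; exit 1 }
            if (NR != n + 2) { print "found " NR " lines, expected " (n + 2); exit 1 }
            print "ok: " n " residues"
        }
    ' "$1"
}

# Demonstration on a made-up 3-residue file; the three profile rows are
# placeholders, since their real column layout is not described here.
demo="$(mktemp)"
printf '3\nACD\nrow1\nrow2\nrow3\n' > "$demo"
check_freq "$demo"
rm -f "$demo"
```

Running the check right after read_chk can catch a checkpoint file that was produced on a machine with a different byte order, assuming such a file decodes into an inconsistent layout.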


Secondary Structure Prediction

prospect_ssp.<Architecture> [-freqfile <file>]/[-seqfile <file>]/[-chkfile <file>] [-p]

-chkfile <file>

The frequency profile for a sequence, created by PSI-BLAST.

-freqfile <file>

The ASCII version of a checkpoint (chk) file, created with the tool read_chk.

-seqfile <file>

The raw sequence, typically in standard FASTA format

-p

Print output in PHD style format


Alignment

threading is the workhorse of the prospect suite: it is the program that performs the threading procedure itself.  Typically you do not call threading directly, but rather through prospect, which then runs threading against an entire database.

threading.<Architecture> [-phdfile <file>]/[-seqfile <file>] [-freqfile <file>] [-global]/[-global_local]/[-np]/[-wp] [-reliab] [-o <output file>] [TemplateName ]/[-tempfile <file>]

-phdfile <file>

A secondary structure prediction in PHD format, which can be generated by prospect_ssp.

-seqfile <file>

The amino acid sequence file, typically in FASTA format.  Note, if you provide a secondary structure prediction, you don't need to provide this file.

-freqfile <file>

The frequency profile for the sequence, as output by read_chk.

-global -global_local -np -wp

Select the type of threading procedure to use.  See Threading Methods below for more information.

-reliab

Calculate the z-score when doing global or global-local threading.  Does not apply to NP and WP threading.

-o <output file>

Name of the file to output to.

[TemplateName ]/[-tempfile <file>]

The name of the template to thread against or, if necessary, the path to the template file, given with the -tempfile argument.


e.g.,
threading.LINUX -seqfile 1ltsd.seq 1bova

The command-line options complement the settings specified in Configurations.

Threading Methods

The following threading methods are available (they are mutually exclusive, so pick one):

  • "-global": Global alignment
  • "-global_local": Global-local alignment, i.e., no end-gap penalty for the query sequence
  • "-np": Without pairwise interaction
  • "-wp": With pairwise interaction
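Because the four method flags are mutually exclusive, a wrapper script can reject a conflicting command line before any threading starts. The sketch below is one way to do that; count_methods and check_methods are hypothetical helper names, and the choice to accept a command line with no method flag at all (leaving the method to the configuration) is an assumption.

```shell
# count_methods ARGS...: print how many threading-method flags appear in ARGS.
count_methods() {
    n=0
    for arg in "$@"; do
        case "$arg" in
            -global|-global_local|-np|-wp) n=$((n + 1)) ;;
        esac
    done
    echo "$n"
}

# check_methods ARGS...: succeed only if at most one method flag is given.
check_methods() {
    [ "$(count_methods "$@")" -le 1 ]
}

check_methods -seqfile 1ltsd.seq -global 1bova && echo "ok"
check_methods -seqfile 1ltsd.seq -global -wp 1bova || echo "conflict: pick one method"
```

The same check applies unchanged to prospect command lines, since prospect passes these flags on to threading.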

Threading against a Database

For fold recognition:

prospect.<Architecture> [-phdfile <file>]/[-seqfile <file>] [-freqfile <file>] [-global]/[-global_local]/[-np]/[-wp] [-reliab] [-scop] [-fssp] [-custom] [-all] [-tfile <file>] [-ncpus N] [-o <output file>]


-phdfile <file>
-seqfile <file>
-freqfile <file>
-global -global_local -np -wp

All of these parameters are passed directly to threading.

-scop

Thread against the SCOP database.

-fssp

Thread against the FSSP database (this is the default action).

-custom

Thread against all templates found in the template paths that are not part of FSSP or SCOP (i.e. templates that you have generated with make_template).

-all

Thread against all templates found.

-tfile <file>

Thread against all the templates listed in <file>.

-o <output file>

Name of the output XML file.

-ncpus N

Try to launch N simultaneous threading jobs.

e.g.,

Run the default template list against a sequence:

prospect.LINUX -seqfile t0052.seq

Run a custom subset of templates against a PHD file:

prospect.LINUX -tfile subset.list -phdfile agouti.phd

The following optional flag is available:

  • "-tfile TemplateListFile": a list of templates to use for threading (default: the FSSP list located at $PROSPECT_PATH/data/parameters/fssp.list).



After Prospect

Sorting prospect Results


sortProspect scans the templates in a prospect output file and sorts them, in order to determine which templates are the best matches.

sortProspect.<Architecture> <prospect outfile> [-r/-z] [-s] [-1] [-top x]

<prospect outfile>

The file output by prospect.

-r

Sort by raw score

-z

Sort by z-score

-s

Save the prospect output file with the entries in sorted order

-1

One-column view: print only the names of the templates

-top x

Print out the top x scores of the sort


Scores can be sorted in several ways: by raw score, by z-score, or by SVM score.  Sorting by SVM score is the default.  Sorting by z-score or raw score is selected with the -z and -r flags respectively.

To sort according to the raw score in the fold recognition, use:

sortProspect.LINUX -r OutputFile.xml

To sort according to the SVM score, and save sort back to the file:

sortProspect.LINUX OutputFile.xml -s
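A fold-recognition run followed by a ranking of the results can be sketched as two commands. The helper below only prints them. fold_recognition_cmds is a hypothetical name, the output file name <seq file>.xml is chosen for the example, and pairing -z with a run that used -global and -reliab assumes that z-scores are only present in the output when -reliab was given.

```shell
ARCH="${ARCH:-LINUX}"

# fold_recognition_cmds SEQFILE NCPUS TOP: print the prospect run and the
# z-score ranking of its TOP best templates, in the order they should be run.
fold_recognition_cmds() {
    seq="$1"; ncpus="$2"; top="$3"
    out="$seq.xml"
    echo "prospect.$ARCH -seqfile $seq -global -reliab -ncpus $ncpus -o $out"
    echo "sortProspect.$ARCH $out -z -top $top"
}

fold_recognition_cmds t0052.seq 4 10
```

Dropping -z from the second command would rank by SVM score instead, which is the default.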


Confidence Index




Using Modeller

To create a modeller runfile from a prospect threading, call

modellerProspect.<Architecture> -prosfile <prospect outfile> <template name> [-o output directory name]

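-prosfile <prospect outfile> <template name>

The file output by prospect, followed by the name of the template whose alignment is to be used.

[-o output directory name]

The directory in which to place the files needed to run modeller.


For example: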

modellerProspect.LINUX -prosfile test.seq.xml 1aac -o test_1aac

This command would create a modeller alignment from the template 1aac, and place the files needed to run modeller in the directory test_1aac.

To actually run modeller:
cd test_1aac
modeller run.top
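To build models from several of the best templates, the one-column output of sortProspect can drive modellerProspect. The sketch below prints one modellerProspect command per template name read from standard input. modeller_commands is a hypothetical helper, and it assumes that `sortProspect -1` prints one template name per line.

```shell
ARCH="${ARCH:-LINUX}"

# modeller_commands XMLFILE PREFIX: read template names on stdin, one per
# line, and print a modellerProspect command for each, writing to the
# directory PREFIX_<template>.
modeller_commands() {
    xml="$1"; prefix="$2"
    while IFS= read -r tmpl; do
        [ -n "$tmpl" ] || continue   # skip blank lines
        echo "modellerProspect.$ARCH -prosfile $xml $tmpl -o ${prefix}_$tmpl"
    done
}

# Demonstration with two template names typed in by hand.
printf '1aac\n1bova\n' | modeller_commands test.seq.xml test
```

In a real run the names would come from something like `sortProspect.LINUX test.seq.xml -1 -top 3`, and each resulting directory would then be processed with modeller as shown above.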


Prospect File Utilities

catProspect.<Architecture> <file1> .... <file n>

Takes several prospect files and places all of them in a single prospect record.

mergeProspect.<Architecture> <file1> <file2> <file3>

Replaces records in file1 with those from file2 and saves the result into file3.
This tool lets you put newer threadings into a prospect file while keeping the older ones that do not need to be replaced.


Viewing Prospect Results

convertProspect.<Architecture> <prospect file> [-wrap] [-table] [-html] [-pdb <templateName>]

<prospect file>

 The file created by prospect

-wrap

Line wrap the alignments at 60 characters

-table

For HTML pages, produce a table at the top of the page that summarizes the results

-html

Output to HTML

-pdb <templateName>

Copy the template's PDB file, setting the temperature factors to show the level of alignment: hotter values mark positions aligned with exact matches, while colder values mark positions with no alignment.


Template Generation

Prospect 2.0 includes the tools needed for you to create new templates.

make_template.<Architecture> [-pdbfile <file>] [-c] [-n <name>] [-d <start res> <end res>] [-o <file>]

-pdbfile <file>

Path to the PDB file that the template will be based on.  Please check the templates page for notes on the requirements for base PDB files.

-c

The chain to use for the template

-n <name>

Base name for the template.

-d <start res> <end res>

If you are creating a template based on a domain rather than a chain, this selects the region between and including the start residue ID and the end residue ID.

-o <file>

Name of the output template


Life Sciences Division, ORNL