                       CONVERT

   Data conversion from CLUSTAL, PIR, SWISS-PROT, PEARSON,
        Intelligenetics datafiles to PROANALYST

     State Research Center of Virology and Biotechnology "Vector"
                Institute of Molecular  Biology
          Koltsovo, Novosibirsk Region,  633159  Russia

The program CONVERT creates data files for PROANALYST (and PROANAL,
PROANAL2) from five commonly used data file formats: CLUSTAL multiple
alignment, Pearson, PIR, SWISS-PROT and Intelligenetics.  Just start the
program, select the input data file, and select the name of the
corresponding format; you will get three files (*.seq, *.act, *.ali)
ready for analysis by PROANALYST.  To enter data on protein activity,
type the values into the lines that contain the protein names, at the
position given below for each format.

PEARSON and PIR formats - after the symbol ">".
Example:

> 455.8  P1;104K_THEPA

SWISS-PROT - in the DE (DESCRIPTION) line, after DE.
Example:

DE  455.8 CLARA CELLS 10 KD SECRETORY PROTEIN PRECURSOR (CC10).

INTELLIGENETICS - in the line with the short name (the line before the
sequence), after one space.

Example:

 455.8 KS61_MOUSE-356-724
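
The three placement rules above can be illustrated with a short Python
sketch.  It is not taken from CONVERT itself: the function name
header_activity, the format labels, the requirement that the number be
followed by whitespace, and the default of 0.0 when no number is present
are all assumptions made for the illustration.

```python
import re

# Hypothetical helper, not part of CONVERT.
# Matches a leading decimal number such as "455.8", "20" or ".15"
# that is followed by whitespace (an assumption of this sketch).
_NUMBER = re.compile(r"\s*(\d+\.?\d*|\.\d+)(?=\s)")


def header_activity(line, fmt):
    """Return the activity typed into a header line, or 0.0 if none is found."""
    if fmt in ("pearson", "pir"):
        # The value follows the ">" symbol.
        if not line.startswith(">"):
            return 0.0
        rest = line[1:]
    elif fmt == "swissprot":
        # The value follows the "DE" line code.
        if not line.startswith("DE"):
            return 0.0
        rest = line[2:]
    elif fmt == "intelligenetics":
        # The value starts the short-name line, after one space.
        rest = line
    else:
        raise ValueError("unknown format: " + fmt)
    match = _NUMBER.match(rest)
    return float(match.group(1)) if match else 0.0
```

Applied to the three example lines above, the sketch yields 455.8 in
each case; a header with no leading number yields 0.0.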


If you have a protein family alignment made by CLUSTAL (CLUSTAL3, 4, or
5) or prepared in another similar format (PHYLIP), you can easily
convert these data for PROANALYST.  Data on protein activities, if
available, should be entered at the beginning of the lines of the first
block of sequences.


Example:
The output file OUT4 of the program CLUSTAL4 looks as follows:

Gap fixed =  10 Gap vary. =  10
 * :=>  match across all seqs.
 . :=>  conservative substitutions

CPN1_HUMAN     RLNPEVLSPNAVQRFLPMVDAVARDFSQALCKKVLQNARGSLTLDVQPSIFH-YTIEASN
CPN1_RAT       QLNPNMLSPKAIQSFVPFVDVVARDFVENLCKRMLENVHGSMSINIQSNMFN-YTMEASH
CPN2_HUMAN     RLNPDVLSPKAVQRFLPMVDAVARDFSQALCKKVLQNARGSLTLDVQPSIFH-YTIEASN
CPS1_BOVIN     AL---LLGTRSS------MEPWVDQLTQEFCERMRVQAGAPVTIQKEFSL---LTCSIIC
CPS1_HUMAN     AL---LLGIRDS------MEPVVEQLTQEFCERMRAQPGTPVAIEEEFSL---LTCSIIC
CPS1_MOUSE     AL---MLGMRDS------MEPLIEQLTQEFCERMRAQAGTPVAIHKEFSF---LTCSIIS
                                            .*

CPN1_HUMAN     LALFGERLGLVG-HSPSSASLNFL
CPN1_RAT       FVISGERLGLTG-HDLKPESVTFT
CPN2_HUMAN     LALFGERLGLVG-HSPSSASLNFL
CPS1_BOVIN     YLTFG-----NK--EDT-----LV
CPS1_HUMAN     YLTFG-----DKIKDDN-----LM
CPS1_MOUSE     CLTFG-----D--KDST-----LV
                   *                 .

Run CONVERT, select the filename OUT4 in the menu, and you will get three
files (OUT4.SEQ, OUT4.ALI, OUT4.ACT) in the same directory.  In the file
OUT4.ACT all protein activities are equal to zero.  If the name of a
protein happens to begin with a number, this number will appear as the
activity value in the file with the extension "act".

To enter the activities, either edit the file OUT4.ACT or type the values
before the names of the sequences (as the first column) in the very first
block of sequences in the file OUT4, and then run CONVERT.  Do not add
activity values to the other blocks of sequences.



Example:

20  CPN1_HUMAN   RLNPEVLSPNAVQRFLPMVDAVARDFSQALCKKVLQNARGSLTLDVQPSIFH-YTIEASN
30  CPN1_RAT     QLNPNMLSPKAIQSFVPFVDVVARDFVENLCKRMLENVHGSMSINIQSNMFN-YTMEASH
65  CPN2_HUMAN   RLNPDVLSPKAVQRFLPMVDAVARDFSQALCKKVLQNARGSLTLDVQPSIFH-YTIEASN
.15 CPS1_BOVIN   AL---LLGTRSS------MEPWVDQLTQEFCERMRVQAGAPVTIQKEFSL---LTCSIIC
2.5 CPS1_HUMAN   AL---LLGIRDS------MEPVVEQLTQEFCERMRAQPGTPVAIEEEFSL---LTCSIIC
39  CPS1_MOUSE   AL---MLGMRDS------MEPLIEQLTQEFCERMRAQAGTPVAIHKEFSF---LTCSIIS
                                              .*

CPN1_HUMAN     LALFGERLGLVG-HSPSSASLNFL
CPN1_RAT       FVISGERLGLTG-HDLKPESVTFT
CPN2_HUMAN     LALFGERLGLVG-HSPSSASLNFL
CPS1_BOVIN     YLTFG-----NK--EDT-----LV
CPS1_HUMAN     YLTFG-----DKIKDDN-----LM
CPS1_MOUSE     CLTFG-----D--KDST-----LV
                   *                 .


In this case the file OUT4.ACT will contain the protein activities taken
from the file OUT4.
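
How the activity column of the first block might be read can be sketched
in Python as follows.  This is only an illustration under stated
assumptions, not the code of CONVERT: the function names are
hypothetical, a sequence line is assumed to start in column one and to
consist of either "name sequence" or "activity name sequence", and the
special case of a name that itself begins with a number is not
reproduced.

```python
def _is_number(token):
    """True if the token can be read as a decimal number (e.g. "20", ".15")."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def first_block_activities(lines):
    """Map sequence name -> activity for the first block of sequences only.

    Lines without an activity column get 0.0, mirroring the zero
    activities that the text describes for an unedited file.
    """
    activities = {}
    for line in lines:
        fields = line.split()
        if len(fields) == 3 and _is_number(fields[0]) and not line[0].isspace():
            # "activity  name  sequence"
            activities[fields[1]] = float(fields[0])
        elif len(fields) == 2 and not line[0].isspace():
            # "name  sequence" with no activity typed in
            activities[fields[0]] = 0.0
        elif activities:
            # First line that is not a sequence line: the first block is over,
            # so later blocks are deliberately ignored.
            break
    return activities
```

For the edited file OUT4 shown above, this sketch would give 20 for
CPN1_HUMAN, 30 for CPN1_RAT, and so on, and it never looks at the second
block of sequences.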

If you do not have any data on protein activities, it is convenient to
enter the ordinal numbers of the proteins in the file in place of the
activities, e.g.

1
2
3
4
5
6
.
.


This can be useful for comparative analysis of sequences on
structure-activity plots.
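
Typing the ordinal numbers by hand is tedious for a large family, so the
numbering could be prepared with a few lines of Python.  The helper
number_first_block is hypothetical and assumes that the caller passes
only the sequence lines of the first block; the four-character column
width simply mirrors the layout of the edited OUT4 example.

```python
def number_first_block(block_lines):
    """Prefix each sequence line of the first block with its ordinal number.

    The number is left-aligned in a four-character column, so the
    result has the form "1   NAME  SEQUENCE".
    """
    return [f"{i:<4}{line}" for i, line in enumerate(block_lines, start=1)]
```

The returned lines would replace the first block in the alignment file
before CONVERT is run; the other blocks stay unchanged.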





