BLAST Rule (00-03-09)
1. Choose entries with identity >= 95% (the default) from the BLAST search results. If the user specifies a lower identity limit, use that limit instead.
Example)
______________________________________________________________________________
sp|P00173|CYB5_RAT CYTOCHROME B5 >gi|65657|pir||CBRT5 cytochrome b5, microsomal -
rat >gi|220730|dbj|BAA02492| (D13205) cytochrome b5
precursor [Rattus rattus] >gi|2257957 (AF007108)
cytochrome b5 [Rattus norvegicus]
Length = 134
Score = 198 bits (498), Expect = 7e-51
Identities = 94/94 (100%), Positives = 94/94 (100%)
Query: 1 DKDVKYYTLEEIQKHKDSKSTWVILHHKVYDLTKFLEEHPGGEEVLREQAGGDATENFED 60
DKDVKYYTLEEIQKHKDSKSTWVILHHKVYDLTKFLEEHPGGEEVLREQAGGDATENFED
Sbjct: 6 DKDVKYYTLEEIQKHKDSKSTWVILHHKVYDLTKFLEEHPGGEEVLREQAGGDATENFED 65
Query: 61 VGHSTDARELSKTYIIGELHPDDRSKIAKPSETL 94
VGHSTDARELSKTYIIGELHPDDRSKIAKPSETL
Sbjct: 66 VGHSTDARELSKTYIIGELHPDDRSKIAKPSETL 99
_______________________________________________________________________________
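A minimal sketch of rule 1, assuming each hit is available as a block of plain-text BLAST output like the example above. The function names and the min_identity parameter are illustrative and not part of the rule itself.

```python
import re

# Matches the identity line of a hit, e.g. "Identities = 94/94 (100%)".
IDENTITY_RE = re.compile(r"Identities\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def identity_percent(hit_text):
    """Return the identity percentage reported in a hit block, or None."""
    match = IDENTITY_RE.search(hit_text)
    return int(match.group(3)) if match else None


def choose_entries(hit_blocks, min_identity=95):
    """Keep the hit blocks whose reported identity is >= min_identity."""
    kept = []
    for block in hit_blocks:
        percent = identity_percent(block)
        if percent is not None and percent >= min_identity:
            kept.append(block)
    return kept
```

Passing a lower min_identity stands in for the user-specified lower limit.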
2. Pull out the "real data" and the "embedded data" from the chosen entries. (Embedded data gets the same score, identities, positives, and expect value as the real data.)
Example)
_______________________________________________________________________________
real data: sp|P00173|
embedded data: gi|65657|, pir||CBRT5, gi|220730|, dbj|BAA02492|, gi|2257957|
_______________________________________________________________________________
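One way to sketch rule 2, assuming the definition line has already been joined into a single string. It also assumes that the first identifier is the real data and that every later identifier is embedded data, which matches the example above but is not stated as a general rule. The helper names are hypothetical.

```python
KNOWN_TAGS = {"gi", "gb", "pdb", "sp", "pir", "emb", "dbj"}


def parse_identifiers(id_string):
    """Split a '|'-delimited identifier string into (tag, field1, field2) tuples."""
    tokens = id_string.split("|")
    found = []
    i = 0
    while i < len(tokens):
        tag = tokens[i]
        if tag == "gi":
            # gi identifiers carry a single field: gi|number
            number = tokens[i + 1] if i + 1 < len(tokens) else ""
            found.append(("gi", number, ""))
            i += 2
        elif tag in KNOWN_TAGS:
            # All other tags carry two fields, either of which may be empty.
            fields = (tokens[i + 1:i + 3] + ["", ""])[:2]
            found.append((tag, fields[0], fields[1]))
            i += 3
        else:
            i += 1
    return found


def split_real_and_embedded(defline):
    """Return (real, embedded) identifier lists for one definition line."""
    identifiers = []
    for segment in defline.split(">"):
        words = segment.split()
        if words:
            # The identifier string is the first word of each ">" segment.
            identifiers.extend(parse_identifiers(words[0]))
    return identifiers[:1], identifiers[1:]
```

Because embedded data shares the alignment statistics of the real data, the caller can attach the same score, identities, positives, and expect value to every tuple returned.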
3. Make a loop inside the saveframe containing "_Mol_residue_sequence". List all of the PDB entries and the top 5 entries from the other databases (based on the score value), sorted by database name.
Example)
_______________________________________________________________________________
loop_
_Database_name
_Database_accession_code
_Database_entry_mol_name
_Sequence_query_to_submitted_percentage
_Sequence_subject_length
_Sequence_identity
_Sequence_positive
_Sequence_homology_expectation_value
SWISS-PROT P00173 "CYTOCHROME B5" 100% 94 100% 100% 7e-51
stop_
_______________________________________________________________________________
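The selection in rule 3 could be sketched as follows. This assumes "top 5" means the five highest-scoring non-PDB entries overall rather than five per database, and that each entry is a dict with "database" and "score" keys; both are assumptions made for illustration.

```python
def select_entries(entries, top_n=5):
    """Keep every PDB entry plus the top_n highest-scoring other entries,
    then order the result by database name."""
    pdb = [e for e in entries if e["database"] == "PDB"]
    others = [e for e in entries if e["database"] != "PDB"]
    # Highest score first; ties keep their original order (stable sort).
    others.sort(key=lambda e: e["score"], reverse=True)
    chosen = pdb + others[:top_n]
    # sorted() is stable, so score order survives within each database.
    return sorted(chosen, key=lambda e: e["database"])
```

The sort by database name is a plain string comparison, so the exact ordering depends on the converted names (for example "PDB" sorts before "PIR").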
a) _Database_name
   convert: gi and gb ------> GenBank
            pdb       ------> PDB
            sp        ------> SWISS-PROT
            pir       ------> PIR
            emb       ------> EMBL
            dbj       ------> DBJ
b) _Database_accession_code
   gi:  gi|accession_code
   gb:  gb|accession_code|locus
   pdb: pdb|accession_code|chain
   sp:  sp|accession_code|entryname
   pir: pir||accession_code
   emb: emb|accession_code|locus
   dbj: dbj|accession_code|locus or dbj||accession_code
c) _Database_entry_mol_name
   pdb and embedded data: "?"
   others: first line, or the string before ">gi"
d) _Sequence_query_to_submitted_percentage
   (sequence query length / residue count) * 100
e) _Sequence_subject_length
   length value
f) _Sequence_identity
   identity value
g) _Sequence_positive (No positive tag for 'blastn')
   positive value
h) _Sequence_homology_expectation_value
   Expect value
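Rules a), b), and d) can be sketched as below. The function names are hypothetical. The accession lookup relies on the observation that, in every format listed under b), the accession code is the first non-empty field after the database tag.

```python
# Database tag -> _Database_name, as listed under a).
DATABASE_NAMES = {
    "gi": "GenBank",
    "gb": "GenBank",
    "pdb": "PDB",
    "sp": "SWISS-PROT",
    "pir": "PIR",
    "emb": "EMBL",
    "dbj": "DBJ",
}


def database_name(tag):
    """Convert a BLAST database tag to the name used in the loop."""
    return DATABASE_NAMES[tag]


def accession_code(identifier):
    """Return the accession code from one identifier such as 'sp|P00173|CYB5_RAT'.

    Taking the first non-empty field after the tag covers both the
    'tag|accession|...' and the 'tag||accession' layouts.
    """
    fields = identifier.split("|")
    return next((field for field in fields[1:] if field), None)


def query_to_submitted_percentage(query_length, residue_count):
    """Rule d): (sequence query length / residue count) * 100."""
    return query_length / residue_count * 100
```

For the example entry, the tag "sp" maps to SWISS-PROT, the accession code is P00173, and a 94-residue query against a 94-residue submitted sequence gives 100.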
4. Update the time tags in front of the loop.
Example)
-----------------------------------------------------------------------------
_Sequence_homology_query_date 2000-03-09
_Sequence_homology_query_revised_last_date 2000-03-02
-----------------------------------------------------------------------------
5. Make a loop inside the saveframe containing "_Mol_system_name" and list all PDB entries.
Example)
-----------------------------------------------------------------------------
loop_
_Database_name
_Database_accession_code
_Database_entry_mol_name
_Database_entry_details
PDB 1BFX ? .
PDB 1IEU ? .
PDB 1IET ? .
PDB 1AQA ? .
PDB 2AXX ? .
PDB 1AW3 ? .
PDB 1AXX ? .
PDB 1B5A ? .
PDB 1B5B ? .
PDB 1BLV ? .
PDB 1WDB ? .
PDB 3B5C ? .
PDB 1CYO ? .
stop_
_____________________________________________________________________________
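A short sketch of rule 5, assuming the PDB accession codes have already been collected. It writes "?" for the molecule name and "." for the details, as in the example above; the function name is illustrative.

```python
def pdb_loop(pdb_codes):
    """Build the text of the PDB loop for the given accession codes."""
    lines = [
        "loop_",
        "_Database_name",
        "_Database_accession_code",
        "_Database_entry_mol_name",
        "_Database_entry_details",
    ]
    # One row per PDB entry: name, accession code, mol name, details.
    lines.extend(f"PDB {code} ? ." for code in pdb_codes)
    lines.append("stop_")
    return "\n".join(lines)
```

Calling pdb_loop(["1BFX", "1IEU"]) yields the four tag lines between loop_ and stop_, followed by one row for each code.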