1 Protein family and domain databases
1.1 Protein motif and domain databases
Motifs and domains
PROSITE database
PRINTS database
BLOCKS database
ProDom database
Pfam database
SMART database
InterPro database
Conserved Domain database
CDART
 
Motifs and domains:
Biologists can gain insight into protein function by identifying short consensus sequences related to known functions. These consensus sequence patterns are termed motifs and domains.
A motif is a short conserved sequence pattern associated with distinct functions of a protein or DNA.
It is often associated with a distinct structural site performing a particular function.
A typical motif, such as a Zn-finger motif, is ten to twenty amino acids long.
 
A domain is also a conserved sequence pattern, defined as an independent functional and structural unit.
Domains are normally longer than motifs.
A domain ranges from about 40 to as many as 700 residues, with an average length of about 100 residues.
A domain may or may not include motifs within its boundaries.
Examples: transmembrane domains and ligand-binding domains.
 
Identification of motifs and domains relies heavily on multiple sequence alignment as well as on profile and hidden Markov model (HMM) construction.
 
PROSITE (database of protein families and domains):
The first established sequence pattern database.
/prosite/
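PROSITE is a database of protein families and domains. It contains biologically significant sites, patterns, and statistical profiles that help identify the protein family to which a sequence belongs.
The sequence patterns in PROSITE include enzyme catalytic sites, ligand-binding sites, metal-ion-binding residues, cysteines involved in disulfide bonds, and regions that bind small molecules or other proteins.
PROSITE also includes statistical profiles built from multiple sequence alignments, which detect more sensitively whether an (unknown) sequence carries the corresponding feature.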
The functional information of these patterns is primarily based on published literature.
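As a minimal sketch of how a PROSITE-style sequence pattern can be applied to a sequence, the code below translates the pattern syntax into a Python regular expression and scans a protein sequence with it. The function names `prosite_to_regex` and `scan` are hypothetical, the test sequence is made up, and only the basic syntax elements are handled (`x`, `[...]`, `{...}`, repeat counts, and the `<` and `>` anchors); this is an illustration, not the scanning code used by PROSITE itself.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):          # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (3) or (2,4).
        m = re.fullmatch(r"([^()]+)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported element: {element!r}")
        core, lo, hi = m.groups()
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # any residue except those listed
        repeat = ""
        if lo is not None:
            repeat = "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)

def scan(pattern: str, sequence: str):
    """Return (start, matched_text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]

# A zinc-finger-like pattern written in PROSITE syntax, and a made-up sequence.
zinc_finger = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H"
toy_sequence = "AACPECGKAFSRSDELTRHIRTHTG"
print(prosite_to_regex(zinc_finger))
print(scan(zinc_finger, toy_sequence))
```

A regular expression reports only exact matches to the pattern, which is one reason the profile-based entries described above can be more sensitive.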
 
PRINTS (protein motif fingerprint database):
A fingerprint is a group of conserved motifs used to characterise a protein family; its diagnostic power is refined by iterative scanning of a SWISS-PROT/TrEMBL composite. Usually the motifs do not overlap, but are separated along a sequence, though they may be contiguous in 3D space.
bioinf.man.ac.uk/dbbrowser/PRINTS/
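It provides protein homology analysis, protein motif fingerprint analysis, phylogenetic and sequence evolution analysis, and microarray analysis, and it offers downloads of bioinformatics resources and PRINTS database data.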
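The idea of a fingerprint as several non-overlapping motifs that occur in a fixed order along the sequence can be sketched as follows. Everything here is an illustrative assumption: the function names, the toy motifs, the simple "residue seen in this column" score, and the 0.8 threshold are all hypothetical, and the scan is a plain left-to-right first-hit search rather than the scoring scheme PRINTS actually uses.

```python
def window_score(motif_seqs, window):
    """Fraction of window positions whose residue occurs in that motif column."""
    hits = sum(1 for i, res in enumerate(window)
               if any(seq[i] == res for seq in motif_seqs))
    return hits / len(window)

def match_fingerprint(fingerprint, sequence, threshold=0.8):
    """Scan left to right; return the start of each motif's first hit, or None."""
    positions = []
    start = 0
    for motif_seqs in fingerprint:
        width = len(motif_seqs[0])
        hit = None
        for pos in range(start, len(sequence) - width + 1):
            if window_score(motif_seqs, sequence[pos:pos + width]) >= threshold:
                hit = pos
                break
        positions.append(hit)
        if hit is not None:
            start = hit + width  # the next motif must lie after this one
    return positions

# Toy fingerprint: three short ungapped motifs, each given as aligned examples.
toy_fingerprint = [
    ["GDSGG", "GDSGS"],
    ["HCAAH", "HCVAH"],
    ["WVLTA", "WVVTA"],
]
print(match_fingerprint(toy_fingerprint, "MKTGDSGGPLVHCAAHQRNWVLTAE"))
```

A sequence that matches every motif in order is a strong candidate for family membership, while one that matches only some of the motifs yields a partial fingerprint, shown here as `None` entries.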
 
BLOCKS:
A database of blocks
Blocks: ungapped multiple alignments derived from the most conserved regions of homologous protein sequences.
The blocks, which are usually longer than motifs, are subsequently converted to PSSMs.
Because blocks often encompass motifs, the functional annotation of blocks is consistent with that of the motifs.
/blocks.
It is used to detect and identify protein motifs, with the tools BLOCK search, Get Blocks, and Block Maker.
A query sequence can be aligned against the precomputed profiles in the database to select the highest-scoring matches.
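The conversion of a block into a PSSM and the scoring of a query against precomputed profiles can be sketched as below. This is a simplified illustration under stated assumptions: a uniform background frequency, a fixed pseudocount of 1, and toy blocks invented for the example. The names `build_pssm`, `best_hit`, and `rank_blocks` are hypothetical, and the BLOCKS server uses more elaborate sequence weighting and scoring than shown here.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(block, pseudocount=1.0):
    """Log-odds PSSM from an ungapped block, assuming a uniform background."""
    width = len(block[0])
    if any(len(seq) != width for seq in block):
        raise ValueError("all rows of an ungapped block must have equal length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(block) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(width):
        counts = Counter(seq[col] for seq in block)
        pssm.append({aa: math.log2(((counts[aa] + pseudocount) / total) / background)
                     for aa in AMINO_ACIDS})
    return pssm

def best_hit(pssm, query):
    """Slide the PSSM along the query and return (best_score, start)."""
    width = len(pssm)
    if len(query) < width:
        raise ValueError("query is shorter than the block")
    # Residues outside the 20 standard amino acids contribute a score of 0.
    return max((sum(pssm[i].get(query[pos + i], 0.0) for i in range(width)), pos)
               for pos in range(len(query) - width + 1))

def rank_blocks(blocks, query):
    """Score the query against every block; highest-scoring block first."""
    hits = [(best_hit(build_pssm(rows), query), name) for name, rows in blocks.items()]
    return sorted(hits, reverse=True)

# Toy database of two invented blocks.
toy_blocks = {
    "block_1": ["GDSGGP", "GDSGGP", "GDAGGP", "GNSGGP"],
    "block_2": ["WVLTAE", "WVVTAE", "WILTAE"],
}
for (score, start), name in rank_blocks(toy_blocks, "MKTAGDSGGPLVCQ"):
    print(name, round(score, 2), start)
```

Raw scores from blocks of different widths are not directly comparable, so a real search would normalise or calibrate them before ranking.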
 
ProDom:
Domain database
ProDom is a comprehensive set of protein domain families automatically generated from the SWISS-PROT and TrEMBL sequence databases.
The domains are built using recursive iterations of PSI-BLAST.
prodom.prabi.fr/prodom/current/html/home.php
It provides similarity searches and multiple sequence alignments of related domains from SWISS-PROT.
 
Pfam (Protein families database of alignments and HMMs):
A database of protein domain alignments derived from sequences in SWISS-PROT and TrEMBL. Each motif or domain is represented by an HMM profile generated from the seed alignment of a number of conserved homologous proteins.
The Pfam database is composed of two parts:
Pfam-A, which is based on manual alignments;
Pfam-B, which is based on automatic alignments generated in a way similar to ProDom (PSI-BLAST).
The functional annotation of motifs in Pfam-A is often related to that in PROSITE. Pfam-B only contains sequence families not covered in Pfam-A.
Because it is generated automatically, Pfam-B has much larger coverage but is also more error-prone, since some of its HMMs are built from unrelated sequences.
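The first step in turning a seed alignment into an HMM profile can be sketched as below: deciding which alignment columns become match states and estimating residue emission probabilities for them. The 50% gap cutoff, the pseudocount of 1, the toy alignment, and the function names are assumptions made for illustration. Transition probabilities and insert and delete states are omitted, so this is not a complete profile HMM and not the procedure Pfam itself runs.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def match_columns(alignment, max_gap_fraction=0.5):
    """Indices of columns treated as match states (gap fraction below the cutoff)."""
    n_rows = len(alignment)
    return [col for col in range(len(alignment[0]))
            if sum(row[col] == "-" for row in alignment) / n_rows < max_gap_fraction]

def match_emissions(alignment, pseudocount=1.0):
    """Return (match column indices, per-column emission probabilities)."""
    cols = match_columns(alignment)
    emissions = []
    for col in cols:
        # Count residues only; gap characters are ignored.
        counts = Counter(row[col] for row in alignment if row[col] != "-")
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        emissions.append({aa: (counts[aa] + pseudocount) / total
                          for aa in AMINO_ACIDS})
    return cols, emissions

# Toy seed alignment; "-" marks a gap.
toy_seed = [
    "ACD-EF",
    "ACDQEF",
    "AC--EF",
    "GCD-EF",
]
cols, emissions = match_emissions(toy_seed)
print(cols)
print(max(emissions[0], key=emissions[0].get))
```

Columns that are mostly gaps, such as the fourth column of the toy alignment, would be modelled as insertions relative to the match states rather than as positions of the domain itself.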
 
SMART (Simple Modular Architecture Research Tool):
Contains HMM profiles constructed from manually refined protein domain alignments.
bl-heidelberg.de/
Alignments in the database are built from tertiary structures whenever available, or otherwise from PSI-BLAST profiles.
Alignments are further checked and refined by human annotators before HMM profile construction.
Protein functions are also manually curated.
The SMART database may be of better quality than Pfam, with more extensive functional annotations.
Compared to Pfam, the SMART database contains an independent collection of HMMs, with emphasis on signaling, extracellular, and chromatin-associated motifs and domains.
Sequence searching in this database produces a graphical output of domains, annotated with information on cellular localization, functional sites, superfamily, and tertiary structure.