Conserved Domain Database banner graphic
Last Revised 08/26/08

This document includes help for the Conserved Domain Database (CDD) and the CD-Search Tool.
Both resources can be used to help elucidate protein function. The data continue to evolve as research progresses. Comments about the data are welcome and can be sent to info@ncbi.nlm.nih.gov.

  Conserved Domain Database (CDD) Help


 
What is a conserved domain?
Thumbnail image for 3D structure of type-1 insulin-like growth-factor receptor (IGF-1R), viewed in the free Cn3D structure viewing program and colored by domain.  Click on image to jump to a larger, annotated version in this help document.

3-D structures and
conserved core motifs:

Thumbnail image for example of 3-dimensional structure: Cl- binding residues in Voltage-Gated Chloride Channel, cd00400.  Click on image to jump to a larger, annotated version in this help document.

Conserved features
(binding and catalytic sites)

Thumbnail image for examples of Conserved Features (Sites) in Voltage-Gated Chloride Channel, cd00400, including Cl- selectivity filter, pore-gating glutamate residue, Cl- binding residues, and dimer interface.  Click on image to jump to a larger, annotated version in this help document.

Domain family hierarchies
Thumbnail image of domain hierarchy showing divergence in a protein family based on phylogenetic relationships of protein sequences and functional properties.  Click on image to jump to a larger, annotated version in this help document.

  CD-Search Help



 

CD-Search Results: Concise Display:
top-scoring hits only


Thumbnail image for RPS-BLAST concise display (default), which shows only the top-scoring hits for each region of the query sequence.  Click on image to jump to a larger, annotated version in this help document.


CD-Search Results: Full Display: all hits

Thumbnail image for RPS-BLAST full display, which shows all hits on each region of the query sequence.  Click on image to jump to a larger, annotated version in this help document.


CD-Search Results: Small Triangles
represent conserved features/sites


Thumbnail image for small triangles shown in CD-Search results.  The triangles point to specific residues involved in conserved features, such as binding and catalytic sites, as mapped from a conserved domain to the query protein sequence. Click on image to jump to a larger, annotated version in this help document.


Specific Hits must meet or exceed
domain-specific threshold score


Thumbnail image that shows the method for determining the domain-specific E-value threshold for RPS-BLAST.  Each protein sequence that was used to curate a domain model is RPS-BLASTed against the domain model's PSSM.  The highest (i.e., weakest) E-value among the member sequences is the domain-specific threshold. If a protein query sequence is RPS-BLASTed against CDD and receives an E-value equal to or lower than the threshold, that protein is considered a specific hit.  Click on image to jump to a larger, annotated version in this help document.

Conserved Domain Database

What is a conserved domain?

Domains can be thought of as distinct functional and/or structural units of a protein. These two classifications often coincide: a unit of a polypeptide chain that folds independently frequently also carries a specific function. Domains are often identified as recurring (sequence or structure) units, which may exist in various contexts. The image below illustrates four "domains" identified as structural units in the MMDB entry 1IGR, chain A, as segments colored in magenta, blue, brown, and green.

In molecular evolution such domains may have been utilized as building blocks, and may have been recombined in different arrangements to modulate protein function. We define conserved domains as recurring units in molecular evolution, the extents of which can be determined by sequence and structure analysis.

Conserved domains contain conserved sequence patterns or motifs, which allow for their detection in polypeptide sequences. The distinction between domains and motifs is not sharp, however, especially in the case of short repetitive units. Functional motifs are also present outside the scope of structurally conserved domains. The CD database is not meant to systematically collect such motifs.
3D structure of type-1 insulin-like growth-factor receptor (IGF-1R), viewed in the free Cn3D structure viewing program and colored by domain.
For this query sequence, a good correspondence exists between structural units (3D domains), identified by purely geometric criteria, and units asserted to be evolutionarily conserved (domain families). The region annotated as "FU" (furin-repeat like) overlaps with a domain-split that was suggested by the MMDB domain parser.

Click anywhere on the image to open the complete, interactive record for this protein structure (1IGR) in Cn3D, a free helper application available for Windows, Macintosh, and Unix platforms. Cn3D installation takes only a couple of minutes and a tutorial describes the program's features and functions.

Open the 1IGR structure summary record in the Molecular Modeling Database (MMDB) to access more information about the protein, its conserved domains, and ligands (small molecules). Click on a conserved domain or ligand of interest to view its complete information in the Conserved Domain Database or PubChem, respectively. Click on the colored bar representing a 3D domain to retrieve similar 3D structures.

View the CD-Search help document for more details about the program that was used to identify the conserved domains in the protein chain. The concise display of the conserved domains is shown here and includes specific hits, superfamilies, and multi-domains. (Open the actual CD-Search results to view alignments of the query sequence to a conserved domain's consensus sequence, and/or to access a full display of all domain models found.)

Source Databases:  Where does CDD content come from?

Conserved Domains can be described by multiple local sequence alignments. Computational biologists have compiled collections of such alignments representing conserved domains, and CDD includes domains curated at NCBI as well as data imported from external sources.
NCBI-Curated Domains
NCBI-curated domains use 3D-structure information explicitly to define domain boundaries and aligned blocks, and to amend alignment details. More details about the unique features of NCBI-curated domains are below.

The goal of the curation project is to provide CDD users with insights into how patterns of residue conservation and divergence in a family relate to functional properties, and to provide useful links to more detailed information that may help to understand those sequence/structure/function relationships. The presence of conserved features helps to affirm family membership in search results with borderline significance, for example. NCBI CDD curators provide feature annotation and associated evidence in a computer-friendly way, so that the scientific community can build software tools to automate tasks such as annotation transfer.
External Data Sources
In addition, CDD imports data from four other major sources:
  • SMART, the Simple Modular Architecture Research Tool
  • Pfam, Pfam-A seed alignments from the Protein families database of alignments and HMMs
  • COGs, Clusters of Orthologous Groups of proteins
  • PRK, PRotein K(c)lusters
SMART and Pfam are public domain databases that are presented together with Hidden Markov Model (HMM)-based search engines and alignment visualization services.
COG (Clusters of Orthologous Groups) is an NCBI-curated protein classification resource. Sequence alignments corresponding to COGs are created automatically from constituent sequences and have not been validated manually when imported into CDD.
PRK (Protein Clusters) is an NCBI collection of related protein sequences (clusters) consisting of Reference Sequence proteins encoded by complete prokaryotic and chloroplast plasmids and genomes. It includes both curated and non-curated (automatically generated) clusters.

CDD also contains data from additional research projects, such as KOGs (a eukaryotic counterpart to COGs) and the Library of Ancient Domains (LOAD), contributed by L. Aravind, E. Koonin, and colleagues. The latter data sets are accessible as a separate CD-Search database and on the FTP site, respectively, but are not directly searchable by text term in Entrez CDD.
Accession Prefixes indicate data sources:
Source databases are evident from CD accessions:

Accession starts with   Source Database
cd                      Curated at NCBI
pfam                    Pfam
smart                   SMART
COG                     COGs
KOG                     KOGs (available as a separate search set via CD-Search (RPS-BLAST); not searchable by text term in Entrez)
PRK                     PRotein K(c)lusters (Entrez database)
CHL                     Chloroplast and organelle proteins; subset of the PRK database
MTH                     Mitochondrial proteins; subset of the PRK database
PHA                     Phage proteins; subset of the PRK database
LOAD_                   Library of Ancient Domains (LOAD) data set (available as a separate data set via FTP; not searchable by text term in Entrez)

Accessions that start with "cl" are for superfamily cluster records and can contain domain models from one or more source databases.
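
As an illustration only, the prefix table above can be turned into a small lookup. The sketch below is not part of any NCBI tool: the function name `source_database` is hypothetical, and the choice to match prefixes case-sensitively is an assumption based on how the prefixes are written in the table.

```python
# Prefix-to-source mapping copied from the accession prefix table above.
PREFIX_TO_SOURCE = {
    "cd": "Curated at NCBI",
    "pfam": "Pfam",
    "smart": "SMART",
    "COG": "COGs",
    "KOG": "KOGs",
    "PRK": "PRotein K(c)lusters",
    "CHL": "Chloroplast and organelle proteins (subset of PRK)",
    "MTH": "Mitochondrial proteins (subset of PRK)",
    "PHA": "Phage proteins (subset of PRK)",
    "LOAD_": "Library of Ancient Domains (LOAD)",
    "cl": "Superfamily cluster",
}


def source_database(accession: str) -> str:
    """Return the source database implied by an accession prefix (hypothetical helper)."""
    # Try longer prefixes first so a short prefix can never shadow a longer one.
    for prefix in sorted(PREFIX_TO_SOURCE, key=len, reverse=True):
        if accession.startswith(prefix):  # assumption: case-sensitive match
            return PREFIX_TO_SOURCE[prefix]
    return "unknown"


print(source_database("cd00400"))    # Curated at NCBI
print(source_database("pfam00654"))  # Pfam
print(source_database("cl02915"))    # Superfamily cluster
```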

When searching CDD, it is possible to limit search results to domains from any given source database by using the Database Search Field.

CD Assembly Process:  How have CDs been assembled?

NCBI-curated domain models are assembled using the methods briefly described in the source databases section of this document. More details about the NCBI curation process are provided by Marchler-Bauer, et al. (2007).

Domain models from external data sources are assembled by various methods, ranging from automated processing to manual curation, depending on the individual source database. Upon import into CDD, protein sequence alignments from each of the source databases are processed in an automated way to provide links from each aligned sequence to the corresponding, complete record in the Entrez Protein database. Occasionally, sequences that cannot be identified in Entrez's databases are omitted or replaced with closely related matches. Whenever possible, sequences in Pfam, SMART, and COGs alignments are replaced with closely related sequences that have direct links to three-dimensional structures in the Molecular Modeling Database (MMDB).

A representative sequence is chosen for each domain model, preferably with a structure-link, for technical reasons. The representative sequence is generally shown as the first member of the multiple sequence alignment for a domain model. By default, this representative is the 3D structure shown when CD alignments are visualized with Cn3D.

A consensus sequence is computed from the imported alignments. Alignment columns have to be represented in at least (weighted) 50% of all aligned sequences to determine the extent of the consensus. The most frequently occurring residue in each column (after weighting to account for redundancy) is reported. For the extent of the consensus sequence, a position-specific scoring matrix (PSSM) is calculated; the consensus sequence does not contribute to the residue frequency statistics. Search databases compiled of these PSSMs are available through the CD-Search service (help document) and on the NCBI FTP site as collections of pre-computed RPS-BLAST databases that can be used for locally installed versions of that program.
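
The column-coverage and most-frequent-residue rules described above can be sketched as follows. This is a simplified illustration, not the CDD implementation: the per-sequence weights are taken as given because the weighting scheme is not specified here, the function name, gap symbol, and toy alignments are assumptions, and the PSSM calculation is not covered.

```python
from collections import defaultdict


def consensus(alignment, weights, gap="-", min_coverage=0.5):
    """Consensus of equal-length aligned rows (hypothetical helper).

    A column contributes only if the weighted fraction of rows that have a
    residue (not a gap) in it is at least `min_coverage`.
    """
    total = sum(weights)
    out = []
    for col in range(len(alignment[0])):
        counts = defaultdict(float)
        for row, weight in zip(alignment, weights):
            if row[col] != gap:
                counts[row[col]] += weight
        covered = sum(counts.values())
        if covered / total >= min_coverage:
            # Most frequent residue after weighting; ties go to the
            # alphabetically first residue so the result is deterministic.
            out.append(max(sorted(counts), key=counts.get))
    return "".join(out)


rows = ["ACD-", "ACE-", "A-EF"]
print(consensus(rows, [1, 1, 1]))  # ACE  (last column covered by only 1/3)
print(consensus(rows, [1, 1, 4]))  # AEF  (heavier third row changes coverage)
```

The second call shows why weighting matters: down-weighting redundant rows can change both which columns are covered and which residue wins.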

What is unique about NCBI-curated domains?

Example of 3-dimensional structure: Cl- binding residues in Voltage-Gated Chloride Channel, cd00400.

As noted in the section on CDD data sources, the goal of the NCBI conserved domain curation project is to provide database users with insights into how patterns of residue conservation and divergence in a family relate to functional properties, and to provide useful links to more detailed information that may help to understand those sequence/structure/function relationships. To do this, CDD Curators include the following types of information in order to supplement and enrich the traditional multiple sequence alignments that form the foundation of domain models:

3-dimensional structures and conserved core motifs:   NCBI Conserved Domain Curators have re-evaluated and modified multiple sequence alignments imported from outside sources, and made them agree with what we can infer from three-dimensional structure and three-dimensional structure superposition. Curated alignments contain aligned blocks spanning all rows (with no gaps allowed inside blocks) and unaligned regions between blocks. The blocks are meant to represent conserved structural core motifs of the corresponding domain family. The 3D structures can be viewed interactively with the Cn3D structure viewing program. More information about viewing structures is provided in the section of this document on CD summary pages, and the illustration at the right provides an example of a protein structure that has been annotated by NCBI curators to highlight the Cl- binding residues.

Conserved features/sites:   In addition to working on the alignment model, NCBI curators also record, when possible, the location and nature of features conserved in the domain family. Typically these would describe catalytic residues, binding sites, or motifs commonly referred to in the literature.

Examples of Conserved Features (Sites) in Voltage-Gated Chloride Channel, cd00400, including Cl- selectivity filter, pore-gating glutamate residue, Cl- binding residues, and dimer interface.  Click anywhere on the image to open the complete, interactive record for this domain model in the Conserved Domain Database (CDD).

Features are added if they seem applicable to the family described in the CD's scope and if there is evidence linking the feature to a set of addresses on the alignment. Such evidence is recorded and available for inspection; it may be free-text comments, citations linked to PubMed, or "structure evidence" - exemplifying the existence of a site by highlighting an actual molecular complex, for example. Both features and evidence can be visualized on CD summary pages (in the conserved features/sites summary box and as hash marks (#) in the multiple sequence alignment displays), and with the Cn3D structure viewing program. An example is shown in the illustration at the right.

Phylogenetic organization:   Based on evidence from sequence comparison, NCBI Conserved Domain Curators attempt to organize related domain models into phylogenetic family hierarchies. (A separate illustration and additional information are provided below.) The CDTree program used by NCBI curators can be downloaded in order to view NCBI-curated domains interactively and in greater detail.

Links to electronic literature resources:   NCBI curated domains also provide links to citations in PubMed and NCBI Bookshelf that discuss the domain. These references are selected by curators and, whenever possible, include articles that provide evidence for the biological function of the domain and/or discuss the evolution and classification of a domain family.

NCBI-curated domains can be recognized in CDD search results by their "cd" accession number prefix. It is also possible to limit CDD search results to domain models from any given source database by using the Database Search Field.

What is a domain family hierarchy?

A domain family hierarchy is a set of related domains that share a common ancestor, a common set of conserved residues, and a common general function, but differ from each other in their specific phylogeny, specific functions, and additional spans of conserved residues. Domain hierarchies are present in NCBI-curated domains in order to provide insights into how patterns of residue conservation and divergence in a family relate to functional properties.

Some domain families have only a single node, while others have a hierarchy that is two or more levels deep, sometimes with numerous nodes at each level. Such hierarchies have generic "parent" models and more specific "children". The parent node contains a span of conserved residues that is also present in each of the children. Each of the child nodes can have additional conserved residues that extend beyond that span and help to further characterize the members of the child node.

NCBI CDD Curators attempt to split "children" nodes where they see evidence for ancient gene duplications resulting in orthologous groups, often occurring together with functional divergence. The CDTree program used by NCBI curators can be downloaded in order to view NCBI-curated domains interactively and in greater detail.

Image of domain hierarchy showing divergence in a protein family based on phylogenetic relationships of protein sequences and functional properties. Click anywhere on the image to open the complete, interactive record for this domain model (cd00400) in the Conserved Domain Database (CDD).

What is a superfamily?

A superfamily cluster is a set of conserved domain models that generate overlapping annotation on the same protein sequences. These models are assumed to represent evolutionarily related domains and may be redundant with each other.
Clustering methodology:
Superfamily members are clustered through an automated process that involves the following steps:
  1. identify domain models that have overlapping hits on Entrez protein sequences from at least two different sentinel taxonomic nodes (e.g., high level nodes such as flowering plants, conifers, mollusks, flatworms, roundworms, annelid worms, insects, amphibians, mammals, etc.).
  2. pull those domain models into the superfamily
  3. if any domain models are part of an NCBI-curated family hierarchy, pull in all members of the hierarchy
  4. repeat steps 1-3 for each newly added domain until no more new models are pulled in
NOTE: Multi-domain models that were computationally detected are not included in Superfamily clusters. These models are likely to contain multiple single domains and might falsely join superfamily clusters.
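
The iterative steps above amount to a transitive closure over two relations: overlapping hits and membership in a curated hierarchy. The sketch below illustrates that idea under stated assumptions. The function name, the input dictionaries, and the model identifiers are all hypothetical; the overlap relation is assumed to have already been filtered by the sentinel-node criterion of step 1, and the exclusion of multi-domain models is not modeled.

```python
def build_superfamily(seed, overlapping, hierarchy_members):
    """Grow a superfamily from `seed` until no new models are pulled in.

    overlapping: dict mapping a model to the models whose hits overlap with it
        (assumed pre-filtered by the two-sentinel-node rule of step 1).
    hierarchy_members: dict mapping a curated model to all models in its
        family hierarchy.
    """
    superfamily = set()
    pending = [seed]
    while pending:  # step 4: repeat until nothing new is added
        model = pending.pop()
        if model in superfamily:
            continue
        superfamily.add(model)                            # step 2
        pending.extend(overlapping.get(model, ()))        # step 1
        pending.extend(hierarchy_members.get(model, ()))  # step 3
    return superfamily


# Toy data with made-up identifiers.
overlapping = {
    "pfam_x": {"cd_parent"},
    "cd_parent": {"pfam_x"},
    "cd_child": {"cog_y"},
    "cog_y": {"cd_child"},
    "unrelated": {"other"},
    "other": {"unrelated"},
}
hierarchy_members = {
    "cd_parent": {"cd_parent", "cd_child"},
    "cd_child": {"cd_parent", "cd_child"},
}
print(sorted(build_superfamily("pfam_x", overlapping, hierarchy_members)))
# ['cd_child', 'cd_parent', 'cog_y', 'pfam_x']
```

Note that `cog_y` joins the cluster only because step 3 pulled in `cd_child`, which is the reason the steps have to be repeated.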
Rationale:
Superfamilies provide a method for organizing data within CDD in a non-redundant way. CDD contains conserved domains from a number of different source databases, each of which may have its own model for a given conserved domain. The models might share many similarities in their reported residue conservation patterns, but differ in the specific protein sequences used in the multiple alignment, their footprint length [domain boundaries], and biological annotations. Because of the similarities, RPS-BLAST might find that multiple domain models align to the same general region of a query protein, but have different footprints and E-value scores relative to the query protein. If the footprints of two or more domain models overlap on the query, those models are clustered into the same superfamily, then the superfamily continues to be extended using the methodology described above.
Example:
One example of a superfamily is Cluster ID cl02915, which contains various domain models for the voltage-gated chloride channel. Superfamily members include the NCBI-curated domain cd00400 and all members of that family hierarchy plus domain models from external resources.
Selection of Superfamily Representative:
A superfamily can contain one to many domain models. As of spring 2008, approximately 70% of the ~9,000 superfamilies contain a single model and the rest contain multiple models. Single model superfamilies often represent proteins specific to certain organisms or taxonomic lineages (for example, viruses). The numbers of superfamilies containing single or multiple domain models will continue to evolve as new domains are imported and new NCBI-curated hierarchies are added.

In superfamilies containing multiple domain models, one of the models is selected as the source of the superfamily name and description. The representative is one of the following, listed in priority order:
  • the parent node of an NCBI-curated domain family hierarchy, if one is present in the superfamily cluster. In the few cases where a superfamily contains more than one NCBI-curated domain, the parent of the hierarchy with the largest number of sequence hits is chosen as the superfamily representative.
  • the Pfam domain model that hits the largest number of Entrez protein sequences in an RPS-BLAST search
  • the SMART, COG, PRK, or CHL model that hits the largest number of Entrez protein sequences in an RPS-BLAST search
  • the sole member of a superfamily
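
The priority order above can be sketched as a tiered selection. This is an illustration under assumptions, not the actual CDD procedure: the `Model` record and `pick_representative` function are hypothetical, and for curated hierarchies the hit count stored on the parent record stands in for "the hierarchy with the largest number of sequence hits".

```python
from dataclasses import dataclass


@dataclass
class Model:
    accession: str
    source: str                        # e.g. "cd", "pfam", "smart", "COG", "PRK", "CHL"
    hits: int                          # Entrez protein sequences hit by RPS-BLAST
    is_hierarchy_parent: bool = False  # True for the parent node of a curated hierarchy


def pick_representative(models):
    """Return the superfamily representative following the listed priorities."""
    tiers = [
        lambda m: m.source == "cd" and m.is_hierarchy_parent,
        lambda m: m.source == "pfam",
        lambda m: m.source in {"smart", "COG", "PRK", "CHL"},
    ]
    for in_tier in tiers:
        candidates = [m for m in models if in_tier(m)]
        if candidates:
            # Within a tier, the model with the most sequence hits wins.
            return max(candidates, key=lambda m: m.hits)
    return models[0]  # sole member of the superfamily


members = [
    Model("pfam_a", "pfam", 500),
    Model("cd_parent", "cd", 100, is_hierarchy_parent=True),
    Model("smart_b", "smart", 900),
]
print(pick_representative(members).accession)  # cd_parent
```

A curated parent is chosen even though other models have more hits, because the tiers are checked in order before hit counts are compared.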

Search Tips: How to find conserved domains

Protein Query Sequence (CD-Search):
Most users will explore conserved domains starting from CD-Search results for a protein of interest.

The query can be a protein sequence in FASTA format or the GI or Accession of a protein sequence that exists in the Entrez Protein database.

The search results will show the conserved domains found in the protein. The colored bars that depict the domain footprints (shown in both the concise display and full display of CD-Search results) are active hotlinks that open the corresponding CD summary pages with your query sequence embedded in the multiple sequence alignment of proteins used to create the domain model.

The second half of this help document provides details on how to use the CD-Search service, including input required and output shown.

Search Entrez CDD directly:
Conserved domains can also be searched directly in the Entrez CDD database. The Entrez query interface allows searching for keywords, publication dates, taxonomic span, and more. The PubMed help document and Entrez help document provide general information about using the Entrez search system. The information below pertains specifically to CDD.

For example, search the Entrez CDD database for strings like "Kinase" or "pfam023*" or "Tetratrico*" to see how it works.

Search Results: Document Summary (DocSum)
The initial search results provide a list (document summary, or "DocSum") of the conserved domain records that contain your search term, which can appear in any field of the record, unless a search field was specified in the query.

If desired, you can narrow your search by restricting the query to a search field of interest or adding more terms with a Boolean AND.

Alternatively, you can broaden your search by adding more terms (e.g., synonyms) to your query with a Boolean OR, or by following links to Superfamily Members.
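
For readers who assemble query strings programmatically, narrowing with AND and broadening with OR can be sketched as below. The helper names are hypothetical and the code only builds text for the Entrez search box; it does not contact Entrez. Quoting multi-word terms and grouping OR terms in parentheses are assumptions about how one would usually want such terms treated.

```python
def quote_phrase(term: str) -> str:
    """Quote multi-word terms so they are searched as a phrase.

    Terms that are already quoted or already parenthesized are left alone.
    """
    if " " in term and term[0] not in '"(':
        return f'"{term}"'
    return term


def narrow(*terms: str) -> str:
    """Combine terms with a Boolean AND (fewer, more specific results)."""
    return " AND ".join(quote_phrase(t) for t in terms)


def broaden(*terms: str) -> str:
    """Combine terms, e.g. synonyms, with a Boolean OR (more results)."""
    return "(" + " OR ".join(quote_phrase(t) for t in terms) + ")"


print(narrow("kinase", "chloride channel"))
# kinase AND "chloride channel"
print(narrow(broaden("kinase", "phosphotransferase"), "eukaryotes[orgn]"))
# (kinase OR phosphotransferase) AND eukaryotes[orgn]
```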
"Display" menu options: back to top
The "Display" menu on the DocSum (search results) page allows you to view output in the formats below. The "Display" menu options act upon all of the CDD records in the current window (default) or on the subset selected with checkboxes.
Format: Description
Summary: shows the conserved domain's:
  • accession number
  • thumbnail image indicating whether the conserved domain includes a protein sequence from a 3D structure.
    If a 3D structure is included, the thumbnail will be a still graphic of the actual domain structure.
    If no 3D structure is available for the protein family from which the domain model was created, the thumbnail icon will show a schematic of a multiple sequence alignment.
  • short name, which concisely defines the domain
  • text summary, which provides a synopsis of biological function and salient features of the domain
  • PSSMid
Brief: shows the conserved domain's:
  • accession number
  • short name, which concisely defines the domain
  • PSSMid
UI List: shows only the conserved domain's:
  • PSSMid
Additional options: The other options in the Display menu are described in the section of this help document on Links to related data in Entrez.

    The detailed view ("CD Summary page") for a conserved domain can only be viewed for one record at a time by following the link for that record's accession number.

    In addition to displaying CDD search results in various formats, the Display menu can also be used to retrieve related data in CDD and in other Entrez databases by selecting the "xxxxx Links" menu items. It therefore provides integrated access to many different data types.

    For example, the Superfamily Member Links option will retrieve the other domain models in CDD that appear to be evolutionarily related to or redundant with the domains listed/selected on the page. The Protein, Structure, Gene, PubMed, etc. links traverse to associated data in those Entrez databases.
    "Links" pop-up menus:
    The "Links" menu for an individual CDD record on the DocSum page allows you to retrieve related data for that particular domain model. For example, if you select the Superfamily Members option from the "Links" menu of a CDD record (e.g., cd00400), you will retrieve the other domain models in CDD that appear to be evolutionarily related to or redundant with cd00400. In contrast, the "xxxxx Links" options in the "Display" menu near the top of the search results page act upon all of the CDD records in the current window (default) or on the subset selected with checkboxes.

    The links are described in the help document section on "CDD Record: What information is displayed for each domain model on its CD Summary page?" : "Links to related data in Entrez". Most links are accessible from both the DocSum page and the detailed CDD records (also called "CD summary pages"), with the exception of two (Architecture and Books) that are available only from the CD summary pages.
    Search Fields
    By default, Entrez searches All Fields of the database unless a specific search field is indicated in the query. Search fields can be selected from pop-up menus on either the Limits or the Preview/Index page, or can be typed directly in your query (surrounding field names with square brackets [], for example, [Organism] or [Orgn]).* The Index button on the Preview/Index page allows you to browse the index of each search field, where you can see the available terms, the number of records containing each term or phrase, as well as the syntax for entering values in search fields such as Modification Date or Publication Date.

    The currently available fields include:

    Each entry below gives the field name, its abbreviation*, a description, and a sample search where one is provided.

    All Fields [all]
    Description: Searches the complete database record.
    Sample search: "chloride channel"[All] will retrieve the CDD records that contain the phrase "chloride channel" in any field of the record. The quotes surrounding the search terms ensure they are searched as a phrase.**

    Accession [accn]
    Description: Searches only the accession number of the record, which is always an alphanumeric combination. The accession number prefix indicates the source database.
    Sample search: cd00400[accn] will retrieve the CDD record whose accession number is cd00400.

    Alternative Accession [AltAccn]
    Description: Native accession format from an external source database. For example, the Pfam database uses accessions with a format such as pf08617. When these are imported into CDD, the accessions are represented in a format such as pfam08617. Similarly, the SMART database uses a format such as sm00100, while records that have been imported into CDD have a format such as smart00100. This is primarily done to indicate that SMART and Pfam domain alignments may have been modified slightly by NCBI staff, for example by the substitution of a protein sequence that does not have 3D structure with a highly similar one that does (as explained in the help document section on the CD assembly process).
    Sample search: pf08617[AltAccn] will retrieve the pfam08617 record from CDD.

    Database [db]
    Description: Use this field to limit your search to a particular source database.
    Sample searches: cdd[db] will retrieve the NCBI-curated domain models and superfamily records, which are also created at NCBI, from CDD. pfam[db] will retrieve the domain models that were imported from the Pfam database.

    Filter [filt]
    Description: The "Filter" search field allows you to narrow your retrieval to records that have certain attributes, such as curated or uncurated, or records that have links to other Entrez databases of interest. Many attributes from the Filter field are provided in the "Display" and "Links" menus of an Entrez search results page, and in the "Links" box on a CD Summary Page. A detailed explanation of each type of link is provided in the description of the CD Summary page "Links" box.
    Sample search: cdd_gene[filt] will retrieve the CDD records that have associated data in the Entrez Gene database. On the CDD search results page, you can then open the "Display" menu and select the Gene Links option to view the corresponding Entrez Gene records.

    Modification Date [mdat]
    Description: Date of the most recent changes to the alignment model and/or descriptive information.

    Number of Sites [ns]
    Description: The number of conserved features, such as catalytic or binding sites, that have been annotated on a domain. Conserved features are available on NCBI-curated domains. As of April 2008, this ranges from zero to 21 sites. (To see the current range, select the "Number of Sites" search field on the "Preview/Index" page, then use the "Index" button to view the index of that search field and see available values.)
    Sample search: 4[ns] will retrieve the NCBI-curated domain models that contain four sites (i.e., four conserved features).

    Organism [Orgn]
    Description: The root taxonomy node of a conserved domain. This is the highest node in the NCBI Taxonomy database that encompasses all organisms whose protein sequences are in the multiple sequence alignment for a domain model.
    Sample search: eukaryotes[orgn] will retrieve conserved domains found in eukaryotes.

    PSSM Length [plen]
    Description: Length of the PSSM or domain search model. This is the same as the length of the consensus sequence.

    Publication Date [pdat]
    Description: Date a CD was published. [create date = date at which the seed (or de-novo) alignment was imported into CDD; what is publication date = date of release into public version of CDD?]

    Structure Representative [strp]
    Description: The number of structures that have protein sequences in the multiple sequence alignment for a domain model. As of April 2008, this ranges from zero to 72 protein sequences from structures. (To see the current range, select the "Structure Representative" search field on the "Preview/Index" page, then use the "Index" button to view the index of that search field and see available values.)
    Sample search: 6[strp] will retrieve domain models that contain six protein sequences from 3D structures in their multiple sequence alignment.

    Text Word [word]
    Description: The long description (text summary) of the conserved domain.

    The Description of Sites [sd]
    Description: Brief descriptions of conserved features.

    Title [titl]
    Description: The short name of a conserved domain, which concisely defines the domain. Example: Voltage gated ClC (voltage gated chloride channel)
    Sample search: voltage[titl] will retrieve the CDD records that have the term "voltage" as part of their short name, such as cd00400: Voltage gated ClC and pfam00654: Voltage CLC, which represent NCBI-curated and externally imported domain models, respectively, for the voltage gated chloride channel.
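
The relationship between native and CDD accession formats described for the Alternative Accession field can be illustrated with a short conversion helper. This is a sketch only: the function name is hypothetical, it covers just the two prefixes named above (pf and sm), and accepting upper-case native prefixes is an assumption.

```python
import re

# Native prefix -> CDD prefix, as described for the Alternative Accession field.
NATIVE_TO_CDD = {"pf": "pfam", "sm": "smart"}


def to_cdd_accession(native: str) -> str:
    """Convert a native Pfam/SMART accession to its CDD form (hypothetical helper)."""
    match = re.fullmatch(r"(pf|sm)(\d+)", native, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a recognized Pfam/SMART native accession: {native!r}")
    prefix, digits = match.groups()
    return NATIVE_TO_CDD[prefix.lower()] + digits


print(to_cdd_accession("pf08617"))  # pfam08617
print(to_cdd_accession("sm00100"))  # smart00100
```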

    * In a query, the field name may be typed as the full name or abbreviation, and may be in upper, lower, or mixed case. It must be surrounded by square brackets []. A space between the search term and the field specifier is optional. If desired, surround a phrase with quotes to force an adjacency search. For example, the sample queries below will work equally:
          "chloride channel" [WORD]
          "chloride channel"[WORD]
          "chloride channel" [word]
          "chloride channel"[Text Word]

    ** The quotes surrounding the search terms in the All Fields example ensure the terms are searched as a phrase. If quotes are not used and the terms are not automatically recognized as a phrase by the Entrez system, Entrez will insert a Boolean AND between the terms and they may or may not appear adjacent to each other in the retrieved records. More search tips are provided in the PubMed help document and Entrez help document.
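The field-specifier rules in the notes above can be sketched in a few lines of Python. This is an illustrative sketch only: the helper names (normalize_term, esearch_url) and the abbreviation table are hypothetical and are not part of any NCBI library. The table covers only the fields whose abbreviations appear in this section. The URL builder assumes the public E-utilities esearch endpoint is queried with db=cdd; it only constructs the URL and makes no network request.

```python
import re
from urllib.parse import urlencode

# Subset of the search fields listed in this section (full name -> abbreviation).
# The mapping is an illustrative helper, not an NCBI-provided table.
FIELD_ABBREVIATIONS = {
    "organism": "orgn",
    "pssm length": "plen",
    "publication date": "pdat",
    "structure representative": "strp",
    "text word": "word",
    "title": "titl",
}

# A search term, optional whitespace, then a field specifier in square brackets.
_TERM_RE = re.compile(r"^\s*(?P<term>.+?)\s*\[(?P<field>[^\]]+)\]\s*$")


def normalize_term(query):
    """Split one field-qualified term into (term, lowercase field abbreviation)."""
    match = _TERM_RE.match(query)
    if match is None:
        raise ValueError(f"no [field] specifier found in {query!r}")
    field = match.group("field").strip().lower()
    # Full names map to their abbreviation; anything else is passed through.
    return match.group("term"), FIELD_ABBREVIATIONS.get(field, field)


def esearch_url(query, retmax=20):
    """Build (but do not fetch) an E-utilities esearch URL, assuming db=cdd."""
    term, field = normalize_term(query)
    params = {"db": "cdd", "term": f"{term}[{field}]", "retmax": retmax}
    return "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?" + urlencode(params)


if __name__ == "__main__":
    samples = [
        '"chloride channel" [WORD]',
        '"chloride channel"[WORD]',
        '"chloride channel" [word]',
        '"chloride channel"[Text Word]',
    ]
    for sample in samples:
        print(normalize_term(sample))
    print(esearch_url("voltage[titl]"))
```

Under these assumptions, all four sample queries normalize to the same (term, field) pair, which mirrors the statement that they work equally well. The quotes are kept as part of the term so that the phrase search is preserved.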

    Entrez Protein links to Conserved Domains:
    All sequence records in the Entrez Protein database have been RPS-BLASTed against the Conserved Domain Database. These pre-calculated search results are available as "Conserved Domains" links from protein sequence records, making protein functional information one click away from the sequence record.

    Domain architecture: CDART:
    The Conserved Domain Architecture Retrieval Tool (CDART) program has been used to analyze the domain architecture of all sequence records in the Entrez Protein database, and to identify proteins with similar architecture. Those proteins are accessible by selecting "Domain Relatives" in the "Links" menu of a protein sequence record of interest.

    Or, you can search CDART directly by entering a query protein sequence in FASTA format, or entering the GI or Accession number of a protein sequence that already exists in the Entrez Protein database. CDART will then retrieve proteins that contain one or more of the domains present in the query sequence.
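The retrieval idea described above (find proteins that contain one or more of the query's domains) can be illustrated with a toy sketch. This is not CDART's implementation. The function name, the protein and domain labels, and the ranking by number of shared domains are all assumptions made for illustration.

```python
def domain_relatives(query_domains, architectures):
    """Return proteins sharing at least one domain with the query.

    architectures maps a protein label to the list of domains it contains.
    Results are ordered by number of shared domains (an assumed ranking),
    then alphabetically by protein label.
    """
    query = set(query_domains)
    shared = {}
    for protein, domains in architectures.items():
        common = query & set(domains)
        if common:
            shared[protein] = common
    return sorted(shared, key=lambda protein: (-len(shared[protein]), protein))


# Hypothetical placeholder labels, not real database records.
TOY_ARCHITECTURES = {
    "protein_1": ["domA", "domB", "domC"],
    "protein_2": ["domB"],
    "protein_3": ["domD"],
}

if __name__ == "__main__":
    print(domain_relatives(["domA", "domB"], TOY_ARCHITECTURES))
```

With the toy data, a query containing domA and domB retrieves protein_1 (two shared domains) and protein_2 (one shared domain), while protein_3 is excluded because it shares none.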


    CDD Record (CD Summary page): What information is displayed for each domain model?

    A CD summary page provides the following information for a domain model (example: cd00400: voltage-gated chloride channel):
    Text Summary (synopsis of function):
    • The text summary shown at the top of a CD summary page was written by curators at the source database and provides a synopsis of the domain's biological function. In NCBI curated domains, it also describes the taxonomic extent of the domain, whether it is a monomer or dimer, and any salient features. The text summary in a superfamily record is derived from the representative domain.
    Links to related data in Entrez:
    Link Name Description
    Family Members Members of the hierarchy to which the domain belongs. This link is only provided for NCBI-curated domain records.
    Superfamily This links to the record for the CDD superfamily to which this domain belongs.
    Superfamily Members This retrieves all the other domain models that belong to the superfamily.
    Architecture * Proteins found by CDART to contain one or more of the domains present in the proteins that are hit by domains found in the domain superfamily
    Protein Superset of all protein sequences found by RPS-BLAST to contain the domain (with an E-value equal to or better than the default cutoff of 0.01). This superset therefore includes both CD-Search hit types: specific hits and non-specific hits.
    Protein Specific Hits Subset of protein sequences found by RPS-BLAST to contain the domain (with an E-value that is equal to or lower than a domain-specific threshold E-value). More about specific hits...
    Structure All of the protein sequences from 3D structure records that RPS-BLAST matched to this domain model's PSSM.
    Gene Links from the RPS-BLAST concise display hits to the protein sequences listed in Entrez Gene records.

    Details: Each protein listed in an Entrez Gene record has been RPS-BLASTed against the domain models in CDD. Links are then created between specific regions of those protein sequences and the top-scoring domain models that align to them. Top-scoring domain models are shown either as specific hits, or as the superfamily to which the highest-ranking non-specific hit belongs.
    HomoloGene Links from the RPS-BLAST concise display hits to the protein sequences listed in HomoloGene records. (The details provided for Gene links, above, also apply to HomoloGene links.)
    PubMed PubMed citations annotated on the domain. All references have been identified by curators, either by NCBI staff for the NCBI-curated domains, or by the staff of the external databases represented in CDD.

    For NCBI-curated domains, the PubMed link leads to the citations that have been annotated on that particular node of a domain family hierarchy, not for all nodes in the tree. Whenever possible, the