

SynDB help/FAQ       

 

ABOUT SYNDB

 

Q: What is SynDB?

Q: What is synapse ontology?

Q: What does the number following a synapse ontology term mean?

Q: What can SynDB be used for?

Q: What does the db_source mean in the synapse ontology view?

Q: Which proteins are collected in SynDB?

Q: What information about each protein does SynDB contain?

Q: How is the SynDB database structured/designed?

Q: How is the SynDB user interface structured/designed?

Q: How current is the information in SynDB?

 

USING SYNDB

 

Q: How do I search for a protein by keyword?

Q: How do I search for a protein by sequence?

Q: How do I browse the records in SynDB?

Q: How do I interpret a protein record/entry in SynDB?

Q: How do I download the data?

 

FEEDBACK

 

Q: How do I contact SynDB?

Q: How can I contribute to SynDB?

Q: How do I cite SynDB in my research?

 

 

 

 

ABOUT SYNDB

 

 

Q: What is SynDB? (top)

 

A: SynDB is the first focused database of the molecular biology of the synapse proteome.  It contains the most comprehensive collection of proteins (13809 unique proteins spanning 1979 species and 104 protein domains, Aug 2006) that are known or predicted to be associated with synaptic activities.  It integrates extensive information on protein functions, sequences, structures, expression, pathways, interactions, and disease associations.  SynDB was generated using a combination of automated approaches, including keyword- and domain-based searches, and manual curation.  It serves as a starting point for future neurobiology, neuropharmacology, and neuroinformatics research. 

 

 

Q: What is synapse ontology? (top)

 

A: Synapse ontology is a standard vocabulary that helps describe all synaptic gene products in a consistent way. Like other ontologies, synapse ontology arranges its terms in a hierarchical structure, but it is restricted to the functional and structural annotation of synapse-related gene products.
   Synapse ontology is the product of collaboration between bioinformaticians and neurobiologists. It aims to describe all synaptic molecules in terms of the structure/biochemistry of the synapse and physiology/function at the synapse in a species-independent manner. The controlled vocabularies are hierarchically structured, so you can browse the related gene products at different levels: for example, you can find all the gene products of synaptic vesicle cycling or ion channels and receptors, or you can zoom in on all the gene products that play roles in the priming step of synaptic vesicle cycling.

 

 

Q: What does the number following a synapse ontology term mean? (top)

 

A: The number is the count of unique proteins under that term: the proteins assigned specifically to the term together with the proteins assigned to any term in its subtree, with each protein counted once.
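The counting rule described above can be sketched as follows. This is a minimal illustration, not SynDB's actual implementation: the toy ontology, the term "docking", and the protein identifiers P1 to P4 are hypothetical, and the function name is an assumption.

```python
def proteins_under(children, direct, term):
    """Return the set of unique proteins assigned to `term` or to any term in its subtree."""
    seen_terms = set()
    proteins = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen_terms:
            continue  # guard against visiting a shared child twice
        seen_terms.add(current)
        proteins.update(direct.get(current, ()))
        stack.extend(children.get(current, ()))
    return proteins

# Hypothetical toy ontology: parent term -> child terms
children = {
    "synaptic vesicle cycling": ["docking", "priming"],
    "docking": [],
    "priming": [],
}
# Hypothetical direct assignments: term -> proteins assigned specifically to it
direct = {
    "synaptic vesicle cycling": {"P1"},
    "docking": {"P2", "P3"},
    "priming": {"P3", "P4"},
}

counts = {term: len(proteins_under(children, direct, term)) for term in children}
print(counts)
```

Because the proteins are collected in a set, P3 is counted once under the parent term even though it is assigned to two child terms.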

 

 

Q: What does the db_source mean in the synapse ontology view? (top)

 

A: The db_source indicates where the gene product comes from. There are three symbols in total: sp, trembl, and refseq, which stand for UniProt/Swiss-Prot, UniProt/TrEMBL, and RefSeq respectively. The link_out column links to the corresponding entry in the source database.

 

 

 

Q: What can SynDB be used for? (top)

 

A: SynDB can serve as a repository for current knowledge and a starting point for future experimental design or in silico data mining.


For example:

    1. Extensive annotations in SynDB allow for analyses of global attributes of the synapse proteome, such as identifying the most prevalent molecular functions of the synapse proteome and potential miRNA targets.


    2. The complete set or a selected subset of mRNA sequences in SynDB can be downloaded for use in a DNA array experiment to study gene expression in synaptic processes.


    3. The previously unannotated proteins that were predicted to be synapse-related by domain search can serve as interesting candidates for cloning and further molecular biology validation.


    4. Phylogenetic analyses of the protein families can suggest possible experiments on model organisms.

 

 

Q: Which proteins are collected in SynDB? (top)

 

A: We used DAG-Edit to input, manage, and update the synapse ontology (SynO) (Figure 2). We annotated each term with Name, Synonyms, Definition, and Source references, as well as its 'part-of' or 'is-a' relationships to other terms. In the Definition field we recorded additional protein keywords associated with the term as well as InterPro domains related to the term. SynO is available for download in the Open Biomedical Ontologies (OBO) flat file format at http://syndb.cbi.edu.cn/download/SynO.obo

We developed a Perl script to generate a list of search keywords based on SynO, including and expanding from SynO terms and synonyms. If a SynO term consists of more than one word, the Perl script specifies which word can be expanded and whether the order of the words is flexible. All possible combinations were generated automatically. The expanded list of search keywords was used in the next step.
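The original script was written in Perl and is not reproduced here. The following Python sketch only illustrates the idea of expanding a multi-word term into all combinations; the function name, its parameters, and the word variants ("synapse", "vesicles") are assumptions made for the example.

```python
from itertools import permutations, product

def expand_term(words, variants=None, flexible_order=False):
    """Generate search keywords from a multi-word term.

    words          -- the words of the term, in their original order
    variants       -- maps a word to alternative forms it may expand to
    flexible_order -- if True, also emit every ordering of the words
    """
    variants = variants or {}
    # Each position offers the original word plus any listed alternatives.
    choices = [[word] + list(variants.get(word, [])) for word in words]
    keywords = set()
    for combo in product(*choices):
        orderings = permutations(combo) if flexible_order else [combo]
        for ordering in orderings:
            keywords.add(" ".join(ordering))
    return sorted(keywords)

# Hypothetical expansion rules for one term
keywords = expand_term(
    ["synaptic", "vesicle"],
    variants={"synaptic": ["synapse"], "vesicle": ["vesicles"]},
    flexible_order=True,
)
print(keywords)
```

With two forms per word and flexible order, this term yields eight keywords; a term with no variants and fixed order yields only itself.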

We searched the InterPro database using the search keywords and retrieved 400 protein domains. Through careful manual screening we identified 109 domains as being involved in synaptic activities and assigned them to the most appropriate SynO terms. We retrieved over 5000 proteins using the mapping between InterPro and UniProt and associated these proteins with SynO terms.

We then searched UniProt to retrieve additional protein entries that contain the search keywords. While domain-based searches tend to have a high false-negative rate (as not all domains can be modeled), keyword-based searches tend to have a high false-positive rate, requiring that we impose both automated and manual quality control. For example, entries containing "immune" or "immunological" were removed because "immunological synapse" is a term defining a process in the immunological system that occurs in hundreds of protein entries. In another example, thousands of false-positive entries were removed because they were annotated as being submitted by a company named Synapse. After manual review of thousands of entries, we retrieved over 10000 proteins and assigned SynO terms to them.
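The two automated exclusion rules mentioned above could look roughly like this. It is a sketch only: the record layout (dictionaries with "ac", "description", and "submitter" keys) and the sample entries are hypothetical, and the manual review step is not modeled.

```python
import re

# Matches "immune" or "immunological" as whole words, case-insensitively.
IMMUNE_RE = re.compile(r"\bimmun(?:e|ological)\b", re.IGNORECASE)

def is_false_positive(entry):
    """Apply the two example exclusion rules to one entry (hypothetical layout)."""
    if IMMUNE_RE.search(entry.get("description", "")):
        return True
    # Entries that match "synapse" only through the submitting company's name.
    if "synapse" in entry.get("submitter", "").lower():
        return True
    return False

# Hypothetical sample entries
entries = [
    {"ac": "X1", "description": "Synaptic vesicle glycoprotein", "submitter": "Some Lab"},
    {"ac": "X2", "description": "Component of the immunological synapse", "submitter": "Some Lab"},
    {"ac": "X3", "description": "Hypothetical protein", "submitter": "Synapse Inc."},
]

kept = [entry["ac"] for entry in entries if not is_false_positive(entry)]
print(kept)
```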

 

 

Q: What information about each protein does SynDB contain? (top)

 

A: To enhance SynDB's utility as an information resource and a data-mining tool, we collected extensive information on sequences, structures, expression, pathways, protein-protein and protein-small compound interactions, and disease associations. The table below lists the features and corresponding molecular databases. Detailed descriptions follow.

 

Protein Feature                      Cross-referenced Database

Name                                 HUGO (http://www.gene.ucl.ac.uk/nomenclature/)
                                     LocusLink (ftp://ftp.ncbi.nih.gov/refseq/LocusLink/)
Functional Category                  Gene Ontology (GO) (http://www.ebi.ac.uk/GOA/)
Species                              NCBI Taxonomy DB (ftp://ftp.ncbi.nih.gov/pub/taxonomy/)
Sequences                            GenBank (http://www.ncbi.nlm.nih.gov/)
Chromosomal Location                 GoldenPath (http://hgdownload.cse.ucsc.edu/goldenPath/)
Protein Domain                       InterPro (http://www.ebi.ac.uk/interpro/)
Structure                            Protein Data Bank (PDB) (http://www.rcsb.org/pdb/)
Gene Expression                      Bodymap-Xs (http://bodymap.jp/)
Pathway                              KEGG (http://www.genome.jp/kegg/)
Protein-Protein Interaction          PPID (http://www.anc.ed.ac.uk/mscs/PPID/)
                                     BIND (http://bind.ca/)
Protein-Small Compound Interaction   ChEBI (http://www.ebi.ac.uk/chebi/)
Disease Association                  OMIM (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM)
References                           PubMed (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi)



  • Name assignment

    For RefSeq proteins, gene names are parsed from the Entrez Gene database. For SwissProt/TrEMBL proteins that are still unassigned, the GN field of the original SwissProt-format file is used.



  • mRNA/CDS mapping

    mRNA/CDS accessions corresponding to SynDB proteins were retrieved from the "KnownGene" table of GoldenPath. Then, sequences were downloaded from GenBank.

  • Domain architecture

    For SwissProt/TrEMBL proteins, the domain composition is parsed directly from the InterPro release files.

  • Chromosomal location

    SynDB proteins of human, mouse and rat were mapped to the genome according to the "KnownGene" table of GoldenPath (See also Chromosomal browser)

  • Linking to GeneNetwork

    "GeneNetwork incorporates several large transcriptome and phenotype databases. We have assembled and curated about 25 years of published legacy data that includes well over 1000 classical system-level phenotypes in six GRPs in mice and rats (BXD, AXB/BXA, CXB, BXH, LXS, and HXB/BXH recombinant inbred strains)." Most SynDB human, mouse, and rat proteins have corresponding or orthologous entries in GeneNetwork, which can compensate for the lack of phenotypic data in SynDB.

  • Mapping to BodyMap-Xs

    BodyMap-Xs is an anatomical breakdown of animal ESTs. Based on the chromosomal mapping data of GoldenPath (the "all_est" table), orientation-reliable ESTs were mapped to the genome (see also the manual of NATsDB). The ESTs were then mapped to SynDB proteins by their genomic coordinates.

  • Mapping to protein-protein interaction database (PPID/BIND)

    Using the cross-references provided by Entrez Gene, SynDB proteins were mapped to PPID and BIND.

  • Mapping to pathway database (KEGG)

    KOBAS is used to assign KEGG Orthology (KO) terms to SynDB proteins, with a BLASTP E-value cutoff of 1e-5 and rank 5.

  • Mapping to PDB

    PDB is a popular structure database. Because it is highly redundant, a representative dataset, PDB_SELECT_25, was downloaded and the corresponding sequences were parsed out. Pairwise BLASTP with an E-value cutoff of 0.01 was run between SynDB entries and the chains of PDB_SELECT_25. PDB links could instead be extracted from the raw SwissProt/RefSeq data files, but few entries carry such cross-references. Moreover, with homology modeling in mind, it is useful to know whether PDB contains sequences similar to a given SynDB entry.

  • Mapping to ChEBI

    Chemical Entities of Biological Interest (ChEBI) is a dictionary of 'small molecular entities'. The term 'molecular entity' means any constitutionally or isotopically distinct atom, molecule, ion, complex, conformer, etc. The SwissProt division of SynDB could be mapped directly to ChEBI, because ChEBI provides links to UniProt.

  • In addition to the above mappings, OMIM information and GO assignments are parsed from their raw data files, as are the references.
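Several of the mappings above (KEGG via KOBAS, PDB via BLASTP) rest on filtering BLAST hits by an E-value cutoff. The sketch below illustrates one possible reading of the KEGG settings, "cutoff 1e-5 and rank 5": keep, for each query, only hits that are among its five best and whose E-value does not exceed 1e-5. This interpretation of "rank", the input layout of (query, subject, E-value) tuples, and all identifiers are assumptions; it does not reproduce KOBAS itself.

```python
from collections import defaultdict

def filter_hits(hits, max_evalue=1e-5, max_rank=5):
    """Keep, per query, the hits ranked within `max_rank` that also pass the E-value cutoff.

    hits -- iterable of (query, subject, evalue) tuples (hypothetical layout)
    """
    by_query = defaultdict(list)
    for query, subject, evalue in hits:
        by_query[query].append((evalue, subject))
    kept = {}
    for query, scored in by_query.items():
        scored.sort()  # ascending E-value, so the best hit comes first
        top = scored[:max_rank]
        kept[query] = [subject for evalue, subject in top if evalue <= max_evalue]
    return kept

# Hypothetical hits for two queries
hits = [
    ("Q1", "s1", 1e-60), ("Q1", "s2", 1e-50), ("Q1", "s3", 1e-40),
    ("Q1", "s4", 1e-30), ("Q1", "s5", 1e-20), ("Q1", "s6", 1e-10),
    ("Q2", "t1", 1e-8), ("Q2", "t2", 0.5),
]
filtered = filter_hits(hits)
print(filtered)
```

The same function with max_evalue=0.01 and a large max_rank would approximate the E-value filter described for the PDB mapping.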


    Q: How is the SynDB database structured/designed? (top)

     

    A: SynDB is stored in a MySQL relational database consisting of 116 tables and occupying over five gigabytes of hard-disk volume.  The database architecture schema is shown below:

     

    Q: How is the SynDB user interface structured/designed? (top)

     

    A: SynDB has a web interface consisting of five modules:

    Browse: browse SynDB protein entries according to Functional Category, Protein Domain, Species, and Chromosomal Location (for human, mouse, and rat). SynDB supports interaction between views and within the hierarchical structure of a view.

    Search: search for proteins of interest by text (protein ID, keywords, description) or sequence (amino acid or nucleotide).

    Download: download protein sequences and the corresponding cDNAs.

    Help: frequently asked questions about SynDB.

    Comment: user feedback on specific proteins or on SynDB in general.

     

     

    Q: How current is the information in SynDB? (top)

     

    A: SynDB is maintained weekly, with major updates scheduled quarterly.

     

      1. The last update was completed in Aug. 2006.

      2. The current source versions in SynDB are InterPro 12.0 and UniProt (downloaded in Jul. 2006).

     


     

     

    USING SYNDB

     

    Q: How do I search for a protein by keyword? (top)

     

    A: The text search page performs a free-text search of the Protein Name, ID, AC, Definition, and Species fields for all proteins in SynDB. It supports phrase searches (in quotes) and Boolean syntax for AND, OR, and NOT. Quick access to the search function is available at the top of each page. Results are returned as a summary of the matched proteins.

     

    Q: How do I search for a protein by sequence? (top)

    A: The sequence search page accepts a FASTA format sequence and performs a local BLAST (BLASTP or BLASTX) against one of the following data sets:

    Results are returned in the form of a BLAST report. If the sequence of interest is found, the user can jump directly to that entry.
    A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line is distinguished from the sequence data by a greater-than (">") symbol in the first column. It is recommended that all lines of text be shorter than 80 characters.
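The FASTA layout described above can be checked before submitting a query. The following is a minimal sketch, not part of SynDB; the function names and the example sequence are hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (description, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.rstrip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' description line")
            chunks.append(line.strip())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def long_lines(text, limit=80):
    """Return the 1-based numbers of lines that are not shorter than `limit` characters."""
    return [number for number, line in enumerate(text.splitlines(), 1) if len(line) >= limit]

# Hypothetical query sequence
example = ">query1 hypothetical example sequence\nMKTAYIAKQR\nQISFVKSHFS\n"
records = parse_fasta(example)
print(records)
print(long_lines(example))
```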

     

    Q: How do I browse the records in SynDB? (top)

     

    A:

    The Functional Category browser displays the synapse-related proteins of all species according to their Gene Ontology (GO) assignments. GO entries without assigned proteins do not appear in the list. The default 'Level 1' displays entries in the three root nodes, while selecting Level 2, 3, or 4 displays a list with the nodes expanded to that level. 'Total', 'H', 'M', and 'R' indicate the number of proteins in all species, Homo sapiens, Mus musculus, and Rattus norvegicus respectively.

     

    The Protein Domain browser displays the 306 synapse-related InterPro domains in 94 groups. The list hierarchy shows 'Parent/child' relationships as indented and 'Contains/found in' relationships on the same level. 'Total', 'H', 'M', and 'R' indicate the number of proteins in all species, Homo sapiens, Mus musculus, and Rattus norvegicus respectively, and 'Expand' mode lists all proteins in a given domain. (See also limits of domain or family assignments)

     

    The Species browser displays species names and corresponding numbers of proteins. Selecting a species takes you to a protein list from which you can select either a Functional Category or Protein Domain browser view for the proteins of that species only.

     

    The Chromosomal Location browser gives a graphical representation of the chromosomes of human, rat, or mouse. A '+' or '-' symbol indicates a protein coded by the plus or minus strand. Holding the cursor over a symbol displays the gene name and definition, while clicking on a symbol opens the corresponding protein entry. Users can adjust 'Locus number' to show only those loci clusters containing the designated number of loci. By combining locus number, species, and data source, users can display only the proteins of interest.

     

     

    Q: How do I interpret a protein record/entry in SynDB? (top)

     

    A:

     

    The entry page includes one top bar and over ten fields, which are organized into four large groups: General Information, Local Annotation, Cross Reference, and Reference. In addition, many links are provided to help users jump to the original sources of data or information.

     

    • The 'General Information' group shows the official names, the description, the GO assignment, and so on. Some fields are parsed directly from the original data source. Two points are noteworthy. First, in the 'GO' field, if multiple GO entries are assigned to the protein, a clickable schematic figure displays the relationships between these GO entries, i.e., the parent-child relationships. Second, in the domain field, Ensembl's InterPro mapping information is retrieved via its API, SwissProt/TrEMBL's is parsed from the InterPro 12.0 mapping file, and GenBank's CDD assignments are parsed directly from the region field of the GenBank-format file. Similarly, a figure is drawn to show the possible domain structure of the entry. The 'Possible synapse-related InterPro domains' are the 306 domains mentioned above. (See also Collection of domains)

    • 'Local Annotation' covers the fields computed locally, such as homolog cluster, chromosomal context, and protein family. The 'Similar Sequences' field displays possibly paralogous or redundant sequences from the same species, with overall identity above 95% and overall length coverage above 95%. Through this field, users can jump to the non-representative sequences, which are not labeled with '*'. The 'Loci structure' and 'Loci cluster' fields describe the chromosomal context of the entry, e.g., intron/exon structure, proteins in close proximity, synteny pair information, genetic band, and so on. For clarity, exons or loci are drawn alternately in black and gray. Hovering the mouse over a bar displays brief information. The 'Protein Family' field displays the corresponding cluster produced by TribeMCL.

      (See also Redundancy-removing and Chromosomal location)

      One point is noteworthy here. Differences in family distribution between species, especially closely related species, might be caused by different annotation methods for the background proteomes rather than by a genuine biological difference. For example, the numbers of genes in rat and mouse differ greatly in many families, which probably reflects the difference in the level of gene prediction for the two organisms.

    • 'Cross Reference' shows the mapping data. In the 'Interaction' field, a clickable schematic figure displays the protein-protein interaction map. The yellow oval marks the protein of interest. Some descriptive information is given too. In the KEGG pathway field, the pathway description appears where available. The BodyMap field is relatively complex. A column chart indicates the distribution of EST counts across 13 BodyMap-Xs organs. Clicking the "Details" link shows detailed statistics across 40 BodyMap-Xs tissues. Next is the 'PDB' field, which shows the identifier (ID) and description of similar PDB chains with an E-value no larger than 0.01. Hovering the mouse over the ID shows some BLASTP statistics. The 'OMIM' field then describes whether the protein is mapped to OMIM. The last field, 'ChEBI', lists related chemical compounds: their names and where they appear in the annotation of this protein. (See also Methods of cross reference)

    • 'Reference' lists all the references from the original data file, in the format: journal name, volume, publication date, and title.

     

     

    Q: How do I download the data? (top)

     

    A:

      1. Currently, protein sequences in FASTA format are available at http://syndb.cbi.edu.cn/sequence/

      2. Right-click the link and choose 'Save target as', then extract the file with gunzip or WinZip.

      


     

    FEEDBACK

     

    Q: How do I contact SynDB? (top)

     

    A: You can leave a message at http://syndb.cbi.edu.cn/sdb_message.php

     

    Q: How can I contribute to SynDB? (top)

     

    A: User feedback is welcome and supported through the web site. Comments on specific protein entries can be added and displayed through the top bar of each protein entry.

     

    Q: How do I cite SynDB in my research? (top)

     

    A: SynDB has not been published yet, but a paper will be submitted soon.


    Copyright © Center for Bioinformatics, Peking University