About dbSAM
Correct subcellular localization is fundamental to protein function, and mislocalization is a key driver of diverse human diseases. However, the landscape of localization-altering mutations (SAMs) remains largely uncharacterized. Here, we present dbSAM, a comprehensive database of experimentally validated and computationally predicted SAMs. We manually curated over 500 publications to compile an experimental dataset consisting of 1,512 verified SAMs and 2,789 mutations with no significant effect on protein localization. By integrating 8,934,325 missense mutations across 18,675 human proteins from public resources, we systematically identified 2,137,705 putative SAMs through a multi-dimensional screening pipeline incorporating two localization prediction models (pSAM and ProtGPS), post-translational modification sites, localization signals, and condensate-associated functional domains. We further annotated these variants against disease-related databases, yielding 181,955 pathogenic SAMs implicated in 3,638 Mendelian diseases and 622 cancer types. These results indicate that 3%–12% of pathogenic missense mutations trigger protein mislocalization. Our findings highlight protein mislocalization as a prevalent and fundamental pathogenic mechanism underlying human diseases.
(1) Variant-induced transitions between nuclear localization levels
(2) Number of DNL/NES/NLS and number of variants occurring within these determinants
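The second statistic, the number of variants falling within localization determinants, can be illustrated with a simple interval lookup. The sketch below is not the dbSAM implementation: the `Region` type, the `count_variants_in_regions` helper, and the toy coordinates are all hypothetical, and it assumes 1-based inclusive residue positions.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Region(NamedTuple):
    """A localization determinant on one protein (hypothetical type)."""
    kind: str   # e.g. "NLS" or "NES"
    start: int  # 1-based residue position, inclusive
    end: int    # 1-based residue position, inclusive


def count_variants_in_regions(
    regions: Iterable[Region], variant_positions: Iterable[int]
) -> Counter:
    """Count, per determinant kind, the variants that fall inside a region."""
    regions = list(regions)
    counts: Counter = Counter()
    for pos in variant_positions:
        # A variant is counted once per kind, even if regions of that kind overlap.
        hit_kinds = {r.kind for r in regions if r.start <= pos <= r.end}
        counts.update(hit_kinds)
    return counts


# Toy data, invented for illustration only.
regions = [Region("NLS", 10, 20), Region("NES", 50, 60)]
variants = [5, 12, 20, 55, 100]
counts = count_variants_in_regions(regions, variants)
print(dict(counts))
```

A linear scan per variant is enough for a single protein; for a proteome-scale table, an interval tree or a sorted-boundary search would be a more reasonable choice.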
Variant data used in dbSAM
| Resource | Description | URL |
|---|---|---|
| PubMed | 507 published articles (updated to March 2025) | https://pubmed.ncbi.nlm.nih.gov/ |
| OncoKB | A precision oncology knowledge base | http://oncokb.org/ |
| dbSNP | Database for human single nucleotide variants | https://www.ncbi.nlm.nih.gov/snp/ |
| GWAS | Human genome-wide association studies | https://www.ebi.ac.uk/gwas/ |
| COSMIC | Catalogue of Somatic Mutations in Cancer | https://cancer.sanger.ac.uk/cosmic/ |
| ClinVar | Relationships between human variations and phenotypes | https://www.ncbi.nlm.nih.gov/clinvar/ |
Annotation sources used in dbSAM
| Data type | Resource | Description | URL |
|---|---|---|---|
| Basic information | UniProt | Universal protein resource | https://www.uniprot.org/ |
| Disease association | OncoKB | A precision oncology knowledge base | http://oncokb.org/ |
| | GWAS | Human genome-wide association studies | https://www.ebi.ac.uk/gwas/ |
| | TCGA | Cancer-associated somatic mutations | https://www.cbioportal.org/ |
| | ClinVar | Relationships between human variations and phenotypes | https://www.ncbi.nlm.nih.gov/clinvar/ |
| | OMIM | Database of human genes and genetic phenotypes | https://www.omim.org/ |
| | AlphaMissense | Pathogenicity scores and classes of amino acid substitutions | https://alphamissense.hegelab.org/ |
| | dbNSFP | Database for functional annotations and deleteriousness predictions of non-synonymous single-nucleotide variants | https://www.dbnsfp.org/ |
| | OncoTree | A cancer classification system | https://oncotree.org/ |
| Experimental NLS/NES region | SeqNLS | Nuclear localization signal prediction | http://mleg.cse.sc.edu/seqNLS/ |
| | ValidNESs | Validated NES-containing proteins, functional NES sites and NES predictions | http://validness.ym.edu.tw/ |
| | NESbase | Nuclear export signal database | https://www.nesbase.org/ |
| Predicted NLS/NES region | NLSdb | Nuclear localization signal database | https://service.rostlab.org/nlsdb/ |
| Experimental subcellular localization | UniProt | Universal protein resource | https://www.uniprot.org/ |
| Predicted subcellular localization | Compartments | Subcellular localization database | https://compartment.jensenlab.org/ |
| | Translocatome | Predicted translocating proteins from human cells | http://translocatome.linkgroup.hu |
| Post-translational modification | qPTM | Quantification of post-translational modifications | https://qptm.omicsbio.info/ |
| | PhosphoSitePlus | Post-translational modification database | https://phosphosite.org/ |
| 3D structure | PDB | Protein data bank | https://www.rcsb.org/ |
| | AlphaFold DB | Predicted 3D structure of proteins | https://alphafold.ebi.ac.uk/ |
| Physicochemical property | AAindex | Amino acid index database | https://www.genome.jp/aaindex/ |
Tools used in dbSAM
| Tools | Description | URL |
|---|---|---|
| pSAM | A deep learning model for predicting five types of subcellular localization and the site-specific contributions of proteins | https://github.com/lzxlab/pSAM |
| ProtGPS | A language model for predicting 12 types of condensate compartments of proteins | https://github.com/pgmikhael/protgps |
| IUPred | A tool that predicts the tendency of amino acids to be in disordered regions based on energy estimation and experimental annotation | https://iupred1.elte.hu/ |
| NetSurfP | A tool for predicting solvent accessibility, secondary structure, structural disorder and backbone dihedral angles for each residue of an amino acid sequence | https://services.healthtech.dtu.dk/services/NetSurfP-1.0/ |
