About dbSAM

Introduction

Correct subcellular localization is fundamental to protein function, and mislocalization is a key driver of diverse human diseases. However, the landscape of localization-altering mutations (SAMs) remains largely uncharacterized. Here, we present dbSAM, a comprehensive database of experimentally validated and computationally predicted SAMs. We manually curated over 500 publications to compile an experimental dataset consisting of 1,512 verified SAMs and 2,789 mutations with no significant effect on protein localization. By integrating 8,934,325 missense mutations across 18,675 human proteins from public resources, we systematically identified 2,137,705 putative SAMs through a multi-dimensional screening pipeline incorporating two localization prediction models (pSAM and ProtGPS), post-translational modification sites, localization signals, and condensate-associated functional domains. We further annotated these variants against disease-related databases, yielding 181,955 pathogenic SAMs implicated in 3,638 Mendelian diseases and 622 cancer types. This analysis indicates that 3%–12% of pathogenic missense mutations trigger protein mislocalization. Our findings highlight protein mislocalization as a prevalent and fundamental pathogenic mechanism underlying human diseases.
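The counts quoted in the introduction imply a few simple proportions. The Python sketch below only redoes that arithmetic from the published totals; the variable and function names are illustrative and are not part of dbSAM. It does not attempt to reproduce the 3%–12% estimate, because the denominators behind that range are not given on this page.

```python
# Totals quoted in the introduction of this page.
TOTAL_MISSENSE = 8_934_325   # missense mutations integrated from public resources
PUTATIVE_SAMS = 2_137_705    # putative SAMs flagged by the screening pipeline
PATHOGENIC_SAMS = 181_955    # putative SAMs annotated as pathogenic
VERIFIED_SAMS = 1_512        # experimentally verified SAMs (curated)
NO_EFFECT = 2_789            # curated mutations with no significant effect


def share(part, whole):
    """Return part / whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Share of all integrated missense mutations flagged as putative SAMs.
putative_share = share(PUTATIVE_SAMS, TOTAL_MISSENSE)
# Share of putative SAMs that carry a pathogenic annotation.
pathogenic_share = share(PATHOGENIC_SAMS, PUTATIVE_SAMS)
# Share of the curated experimental dataset that consists of verified SAMs.
verified_share = share(VERIFIED_SAMS, VERIFIED_SAMS + NO_EFFECT)

print(f"putative SAMs among missense mutations: {putative_share}%")
print(f"pathogenic among putative SAMs: {pathogenic_share}%")
print(f"verified SAMs in curated dataset: {verified_share}%")
```

These ratios are descriptive only: roughly a quarter of the integrated missense mutations are flagged as putative SAMs, and a little under a tenth of those carry a pathogenic annotation.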

dbSAM Statistics

(1) Variant-induced transitions between nuclear localization levels

(2) Number of DNL/NES/NLS and number of variants occurring within these determinants
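A statistic like (2) can be obtained by testing whether each variant position falls inside an annotated determinant. The following is a minimal Python sketch under stated assumptions: the region coordinates and variant positions are made-up examples, positions are taken to be 1-based and inclusive, and the function name is hypothetical. It is not the procedure used to build dbSAM.

```python
def count_variants_in_determinants(determinants, positions):
    """Count variant positions falling inside each kind of determinant.

    determinants: list of (kind, start, end) tuples, 1-based inclusive.
    positions: list of 1-based residue positions carrying a variant.
    """
    counts = {}
    for kind, start, end in determinants:
        # A variant counts once per region that contains it.
        hits = sum(1 for p in positions if start <= p <= end)
        counts[kind] = counts.get(kind, 0) + hits
    return counts


# Hypothetical example data for a single protein (not taken from dbSAM).
example_determinants = [("NLS", 10, 18), ("NES", 40, 49)]
example_positions = [5, 12, 18, 41, 60]

example_counts = count_variants_in_determinants(example_determinants, example_positions)
print(example_counts)
```

Counting per region keeps the sketch simple; if determinants of the same kind overlapped, a variant in the overlap would be counted once for each region, which may or may not match how the figure was computed.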

Data Sources

Variant data used in dbSAM

Resource | Description | URL
PubMed | 507 published articles (updated to 2025.03) | https://pubmed.ncbi.nlm.nih.gov/
OncoKB | A precision oncology knowledge base | http://oncokb.org/
dbSNP | Database for human single nucleotide variants | https://www.ncbi.nlm.nih.gov/snp/
GWAS | Human genome-wide association studies | https://www.ebi.ac.uk/gwas/
COSMIC | Catalogue of Somatic Mutations in Cancer | https://cancer.sanger.ac.uk/cosmic/
ClinVar | Relationships between human variations and phenotypes | https://www.ncbi.nlm.nih.gov/clinvar/

Annotation sources used in dbSAM

Data type | Resource | Description | URL
Basic information | UniProt | Universal protein resource | https://www.uniprot.org/
Disease association | OncoKB | A precision oncology knowledge base | http://oncokb.org/
Disease association | GWAS | Human genome-wide association studies | https://www.ebi.ac.uk/gwas/
Disease association | TCGA | Cancer-associated somatic mutations | https://www.cbioportal.org/
Disease association | ClinVar | Relationships between human variations and phenotypes | https://www.ncbi.nlm.nih.gov/clinvar/
Disease association | OMIM | Database of human genes and genetic phenotypes | https://www.omim.org/
Disease association | AlphaMissense | Database of pathogenicity scores and classes for amino acid substitutions | https://alphamissense.hegelab.org/
Disease association | dbNSFP | Database for functional annotations and deleteriousness predictions of non-synonymous single-nucleotide variants | https://www.dbnsfp.org/
Disease association | OncoTree | A cancer classification system | https://oncotree.org/
Experimental NLS/NES region | SeqNLS | Nuclear localization signal prediction | http://mleg.cse.sc.edu/seqNLS/
Experimental NLS/NES region | ValidNESs | Validated NES-containing proteins, functional NES sites and NES predictions | http://validness.ym.edu.tw/
Experimental NLS/NES region | NESbase | Nuclear export signal database | https://www.nesbase.org/
Predicted NLS/NES region | NLSdb | Nuclear localization signal database | https://service.rostlab.org/nlsdb/
Experimental subcellular localization | UniProt | Universal protein resource | https://www.uniprot.org/
Predicted subcellular localization | Compartments | Subcellular localization database | https://compartment.jensenlab.org/
Predicted subcellular localization | Translocatome | Predicted translocating proteins from human cells | http://translocatome.linkgroup.hu
Post-translational modification | qPTM | Quantification of post-translational modifications | https://qptm.omicsbio.info/
Post-translational modification | PhosphoSitePlus | Post-translational modification database | https://phosphosite.org/
3D structure | PDB | Protein Data Bank | https://www.rcsb.org/
3D structure | AlphaFold DB | Predicted 3D structures of proteins | https://alphafold.ebi.ac.uk/
Physicochemical property | AAindex | Amino acid index database | https://www.genome.jp/aaindex/

Tools used in dbSAM

Tool | Description | URL
pSAM | A deep learning model for predicting five types of subcellular localization and the site-specific contributions of proteins | https://github.com/lzxlab/pSAM
ProtGPS | A language model for predicting 12 types of condensate compartments of proteins | https://github.com/pgmikhael/protgps
IUPred | A tool that predicts the tendency of amino acids to be in disordered regions based on energy estimation and experimental annotation | https://iupred1.elte.hu/
NetSurfP | A tool for predicting solvent accessibility, secondary structure, structural disorder and backbone dihedral angles for each residue of an amino acid sequence | https://services.healthtech.dtu.dk/services/NetSurfP-1.0/

Contact us

This study was performed by Jiamin Hu, Jun Wu and Ze-Xian Liu, who are from

Sun Yat-sen University Cancer Center,

Building 2#, 651 Dongfeng East Road,

Guangzhou 510060, P. R. China


Email: liuzx AT sysucc.org.cn

Tel/Fax: +86-20-87342025