Analysis of Cys2-His2 zinc finger (C2H2-zf) proteins and domains in the PDB.
For details on the analysis refer to the Jupyter notebook.
- The UniProt reference proteomes can be downloaded with the bash script get_data.sh
- The PROSITE to UniProt mappings (i.e. uniprot2prosite.tab.gz) are obtained from UniProt by customizing the output columns to Entry, Gene names, Organism and PROSITE
- The PDB to UniProt mappings (i.e. uniprot2pdb.tab.gz) are obtained from UniProt by customizing the output columns to Entry, Gene names, Organism and PDB
Pickles and any intermediate files are created on the fly.
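As a rough illustration of how the two mapping files could be read with pandas, here is a minimal sketch. It is not the notebook's actual code: the function name `load_uniprot_mapping` is hypothetical, and it assumes the files are tab-separated with a header row and that multiple cross-references in a cell are separated by semicolons. Check the column names in your own download before relying on it.

```python
import pandas as pd


def load_uniprot_mapping(path, xref_column):
    """Return one row per (UniProt entry, cross-reference) pair.

    Assumptions (not confirmed by the repository): the file is a
    tab-separated UniProt export with a header row, and the
    cross-reference cell holds ';'-separated identifiers.
    """
    # pandas infers gzip compression from the ".gz" extension
    df = pd.read_csv(path, sep="\t", dtype=str)
    # Split "1ABC;2DEF;" into a list, then expand to one row per identifier
    df[xref_column] = df[xref_column].fillna("").str.split(";")
    df = df.explode(xref_column).reset_index(drop=True)
    df[xref_column] = df[xref_column].str.strip()
    # Drop the empty strings left by trailing separators or empty cells
    return df[df[xref_column] != ""].reset_index(drop=True)
```

For example, `load_uniprot_mapping("uniprot2pdb.tab.gz", "PDB")` would give a table with one PDB identifier per row, provided the cross-reference column is actually named `PDB` in the downloaded file.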
- PDB2UniProt
- Python 3.7 with the following libraries: Biopython and pandas
- ScanProsite (version 1.86)
The following results were obtained as of April 24, 2020, are based on predictions by the PROSITE zinc finger C2H2 type domain profile, and refer to the UniProt reference proteomes of eukaryotes (one protein per gene).
- The total number of C2H2-zf proteins is 186,703
- For human, the number is 762
- The total number of C2H2-zf domains is 963,553
- For human, the number is 7,180
- The total number of C2H2-zf proteins with a PDB structure is 136
- For protein-DNA complex structures, the number is 23
- For human, these numbers are 91 and 12, respectively
- The total number of C2H2-zf domains covered (i.e. with atomic coordinates) by a PDB structure is 331
- For protein-DNA complex structures, the number is 86
- For human, these numbers are 270 and 48, respectively
- The total number of PDB protein-DNA complexes covering one or more C2H2-zf domains is 104
- For human, the number is 62
- The total number of C2H2-zf proteins with one or more domains covered by a PDB protein-DNA complex is 19
- For human, the number is 11
Note that because the analysis is restricted to reference proteomes, some PDB entries may be missed (e.g. 1tf3, 1tf6, 1un6, 2drp or 2hgh).
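To put the counts listed above in proportion, the following sketch computes the fraction of C2H2-zf proteins and domains that have structural coverage. The numbers are copied from the list of results; the dictionary layout and the function name `coverage` are only illustrative and do not come from the notebook.

```python
# Counts copied from the results listed above (as of April 24, 2020)
counts = {
    "all eukaryotes": {
        "proteins": 186703, "proteins_with_pdb": 136,
        "domains": 963553, "domains_in_pdb": 331,
    },
    "human": {
        "proteins": 762, "proteins_with_pdb": 91,
        "domains": 7180, "domains_in_pdb": 270,
    },
}


def coverage(group):
    """Return (protein fraction, domain fraction) covered by a PDB structure."""
    protein_fraction = group["proteins_with_pdb"] / group["proteins"]
    domain_fraction = group["domains_in_pdb"] / group["domains"]
    return protein_fraction, domain_fraction


for name, group in counts.items():
    protein_fraction, domain_fraction = coverage(group)
    print(f"{name}: {protein_fraction:.2%} of proteins, "
          f"{domain_fraction:.2%} of domains")
```

The human figures come out far higher than the eukaryote-wide ones, which reflects how strongly structural data in these counts is concentrated on human proteins.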