The ZF Gene Family
Zinc Finger (ZF) genes constitute a very large family of transcription factor genes in humans. The molecular evolution of this gene family is particularly interesting for three reasons: it is exceptionally large, with over four hundred members in the human genome; its functional role remains mostly mysterious; and its simple, conserved gene structures make it unusually easy to identify, characterize and compare ZF genes between species.
Nearly half of all annotated transcription factors in the human genome belong to the C2H2 zinc finger (ZF) superfamily, defined by the presence of one or more Cys2His2 zinc finger DNA-binding domains. Most genes containing C2H2 ZF domains contain multiple tandem repeats of the C2H2 ZF motif, and among human proteins with multiple C2H2 ZF domains, the number of ZF domains varies from just a few to more than 30, with a mean value of about 8. Most of the ZF repeats in these proteins are present in tandem and they are remarkably homogeneous in their spacing and core structure: nearly all are 21 amino acids long with the pattern C-x2-C-x12-H-x3-H, and they are separated from each other by a 7 amino acid linker of conserved sequence. In almost all cases, protein-protein interaction domains such as KRAB and SCAN are located on separate exons upstream of the ZF domains, which are contained on a single large exon.
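The regularity of the C-x2-C-x12-H-x3-H core is what makes these domains easy to locate in a protein sequence. The following is a minimal sketch of that idea, not a validated domain caller: the names ZF_CORE and find_zf_cores are hypothetical, the regular expression encodes only the four Cys/His positions and their spacing, and it would miss the minority of fingers with non-canonical spacing.

```python
import re

# C-x2-C-x12-H-x3-H: 1 + 2 + 1 + 12 + 1 + 3 + 1 = 21 residues.
# The lookahead wrapper lets finditer report matches that overlap one another.
ZF_CORE = re.compile(r"(?=(C.{2}C.{12}H.{3}H))")


def find_zf_cores(protein):
    """Return (start, motif) pairs for every 21-residue C2H2 core match.

    `start` is the 0-based offset of the first cysteine in `protein`.
    """
    return [(m.start(), m.group(1)) for m in ZF_CORE.finditer(protein.upper())]


def count_tandem_fingers(protein, linker_length=7):
    """Count the longest run of cores separated by exactly `linker_length` residues."""
    starts = [start for start, _ in find_zf_cores(protein)]
    step = 21 + linker_length  # core length plus linker
    best = run = 1 if starts else 0
    for prev, cur in zip(starts, starts[1:]):
        run = run + 1 if cur - prev == step else 1
        best = max(best, run)
    return best
```

For example, a toy sequence built from three identical 21-residue cores joined by a made-up 7-residue linker (neither is a real protein sequence) should yield three cores spaced 28 residues apart.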
Most ZF proteins have a shared architecture consisting of an N-terminal domain that interacts with other proteins and a C-terminal region that consists of one or multiple C2H2 (Kruppel-type) zinc finger repeats that are presumed to bind DNA. About half of human ZF super-family members have an N-terminal KRAB domain, a transcriptional repression domain, and about one in ten includes a SCAN domain, a protein-protein interaction domain that may be found as the only N-terminal domain of ZF genes or alongside a KRAB domain.
Despite the large number and diversity of ZF proteins encoded in the human genome, the ZF gene family seems to be a recent invention. The ancestral size of the ZF gene family is small, and the combination of a KRAB domain with a tandem ZF structure first arose only in the tetrapod vertebrates. Additionally, the KRAB-ZF proteins have been recognized as important subjects of lineage-specific expansion in vertebrates. Rapid expansion of this gene family has occurred on the primate lineage, and a substantial proportion of human ZF genes have no mouse ortholog.
ZF domains bind their target DNA sequences by lying in tandem along the major groove of DNA, and in the case of tandem ZF domains multiple ZF domains bind to sequential nucleotides to form a single contiguous binding site; each ZF domain contacts three new nucleotides and one nucleotide in common with the previous ZF domain. Most of the specificity for these four nucleotide contacts is provided by several residues residing within the central α-helix of the ZF domain. Adopting standard residue-naming conventions, residues 6, 3 and -1 of the α-helix make prominent contacts with nucleotides 1, 2, and 3 of a putative binding site in that order, while residue 2 contacts the nucleotide complementary to position 4. In addition to their known functions mediating ZF-DNA contacts, residues -1, 2, 3 and 6 of the central α-helix also bear evidence of rapid sequence divergence and positive selection in ZF genes, a pattern that suggests selective pressure to specifically modify DNA binding preferences.
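The overlapping contact scheme described above can be written down as a small lookup, which makes the shared fourth position explicit. This is only an illustrative sketch: the names CONTACTS, finger_subsite and footprint_length are hypothetical, fingers are simply numbered in the order their subsites occur along the site, and the orientation of the protein relative to the DNA strand is deliberately ignored.

```python
# Helix residue -> (position within the finger's 4-nucleotide subsite, strand).
# Residues 6, 3 and -1 read positions 1, 2 and 3 on the primary strand;
# residue 2 reads the base complementary to position 4.
CONTACTS = {
    6: (1, "primary"),
    3: (2, "primary"),
    -1: (3, "primary"),
    2: (4, "complementary"),
}


def finger_subsite(finger_index):
    """Map each contact residue of one finger to a 1-based site position.

    `finger_index` is 0-based and counts fingers in the order their
    subsites occur along the binding site (an assumption of this sketch).
    """
    offset = 3 * finger_index  # each finger advances the site by three nucleotides
    return {res: (offset + pos, strand) for res, (pos, strand) in CONTACTS.items()}


def footprint_length(n_fingers):
    """Total nucleotides contacted: three new per finger plus one shared end position."""
    if n_fingers < 1:
        raise ValueError("n_fingers must be at least 1")
    return 3 * n_fingers + 1
```

Under this numbering, position 4 is read on the complementary strand by residue 2 of the first finger and on the primary strand by residue 6 of the second finger, which is the one-nucleotide overlap between neighbouring fingers.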
In accordance with the canonical model in which each ZF domain binds three nucleotides, a typical human ZF protein has the potential to specifically bind sequences in excess of 20 nucleotides, far longer than typical transcription factor binding sites. The longest human ZF proteins, with more than 20 ZF domains arranged in tandem, could have a DNA binding profile in excess of 60 nucleotides. Since a given 16-nucleotide sequence is expected to appear only about once in a random 3 Gb genome, these very long potential recognition sites are surprising, even accounting for potential degeneracy in binding affinities. One possibility, consistent with previous evidence, is that proteins containing large numbers of tandem C2H2 zinc fingers do not use all fingers to bind one specific target but rather bind several different targets using different subsets of their zinc fingers. Another possibility is that very substantial degeneracy within binding sites exists in many tandem ZF proteins. Finally, it is also possible that very long binding sites could allow tandem ZF proteins to retain DNA binding even to target elements with mismatches to the preferred sequence at several sites. This might be a useful property if the target sequences of tandem ZF proteins were not themselves subject to selection to retain this binding relationship.
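The arithmetic behind this argument can be checked with a short calculation. The sketch below assumes a uniformly random genome with equal base frequencies and counts matches on a single strand only; the function names are hypothetical, and real genomes depart from these assumptions considerably.

```python
GENOME_SIZE = 3_000_000_000  # 3 Gb, as in the text
NT_PER_FINGER = 3            # canonical model: three nucleotides per ZF domain


def site_length(n_fingers):
    """Potential binding-site length under the three-nucleotides-per-finger model."""
    return NT_PER_FINGER * n_fingers


def expected_occurrences(k, genome_size=GENOME_SIZE):
    """Expected count of one fixed k-mer on one strand of a random genome.

    Assumes independent positions with each base at frequency 1/4.
    """
    windows = genome_size - k + 1  # number of k-long windows in the sequence
    return windows / 4 ** k


if __name__ == "__main__":
    for fingers in (8, 20):
        k = site_length(fingers)
        print(f"{fingers} fingers -> {k} nt, expected matches {expected_occurrences(k):.3g}")
    print(f"16-mer expected matches: {expected_occurrences(16):.2f}")
```

With these assumptions a 16-mer is expected roughly 0.7 times per strand, which is the "about once" figure, while the 24-nucleotide site of a protein with the mean of 8 fingers is already several orders of magnitude more specific than needed to be unique.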
ZF-Associated Protein Domains
It is thought that ZF genes bearing an N-terminal KRAB domain function as transcriptional repressors, with the KRAB box mediating a cascade of protein-protein interactions eventually leading to a closed chromatin state that results in stable epigenetic gene silencing.
The KRAB domain (KRuppel-Associated-Box; named for its association with C2H2 or Kruppel-type Zinc Finger domains) was probably derived originally from the Meisetz (PRDM9) gene, with the first bona fide KRAB-ZF gene aside from Meisetz appearing near the root of tetrapod vertebrates. KRAB is thought to act almost exclusively through KRAB-ZF genes in which the zinc finger array provides DNA target recognition and the KRAB domain recruits KAP-1 (KRAB-Associated-Protein-1), which serves as a scaffold for further recruitment of histone deacetylases (HDACs) and histone methyltransferases (HMTs) such as SETDB1 (also known as ESET). This protein complex effects chromatin modifications which lead to a localized heterochromatic signal that silences gene transcription, and there is evidence that this chromatin modification leads to stable epigenetic transcriptional repression.
Instead of or alongside the KRAB domain, many genes with tandem ZF domains contain a SCAN domain. The SCAN domain is a conserved motif of approximately 80 amino acids found at the N-terminus of many C2H2-type zinc finger proteins, and is leucine-rich and dominated by α-helical structure. The SCAN domain is known to be involved in protein-protein interactions, and is capable of dimerization leading to the creation of homo- and heterodimer SCAN-ZF complexes. A SCAN domain is a common feature of the tandem C2H2 zinc finger gene complements of many mammals and is present in about 50 human ZF genes, about one in ten of all human ZF genes. SCAN domains are often found alongside KRAB domains, in which case the SCAN, KRAB and C2H2-ZF domains are present in that order and are usually found on separate exons. This association is not universal, however, and many SCAN-ZF genes exist with no associated KRAB domain.