Chromatin immunoprecipitation (ChIP): For DNA Binding Proteins and DNA Binding Sites Analysis

Chromatin immunoprecipitation followed by sequencing (ChIP-Seq) is a well-established method for identifying the binding sites of DNA-binding proteins. The method involves (1) cross-linking DNA-binding proteins to DNA, (2) shearing the DNA, (3) immunoprecipitating the DNA fragments bound by the protein of interest, and (4) sequencing the precipitated DNA fragments. The resulting sequencing reads are aligned to the corresponding genome, and genomic regions bound by the protein of interest are expected to be enriched for aligned reads.
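As a minimal sketch of how read enrichment can be quantified once reads are aligned, the Python snippet below computes per-nucleotide coverage from aligned read intervals. The function name, the 0-based half-open coordinates, and the toy reads are assumptions made for illustration only; real pipelines typically read alignments from BAM files with dedicated tools.

```python
from typing import List, Tuple


def per_base_coverage(reads: List[Tuple[int, int]], genome_length: int) -> List[int]:
    """Return the number of reads covering each position of a linear genome.

    `reads` holds (start, end) intervals, 0-based and half-open (assumed convention).
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (genome_length + 1)
    for start, end in reads:
        start = max(start, 0)
        end = min(end, genome_length)
        if start < end:
            diff[start] += 1
            diff[end] -= 1

    # A running sum of the difference array gives the depth at each position.
    coverage = []
    depth = 0
    for delta in diff[:genome_length]:
        depth += delta
        coverage.append(depth)
    return coverage


# Toy example: three overlapping reads on a 10-nucleotide "genome".
toy_reads = [(0, 5), (3, 8), (4, 6)]
toy_coverage = per_base_coverage(toy_reads, 10)
```

Positions where several reads overlap receive the highest values, which is the signal that later steps compare against background.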

Ideally, only genomic regions bound by the protein of interest would display read coverage. In practice, some DNA fragments are isolated and sequenced non-specifically, resulting in background coverage of reads aligned across the genome. To estimate this background, control experiments are typically run alongside the ChIP sample, for example by sequencing input DNA that has not been immunoprecipitated or by performing a mock immunoprecipitation with a non-specific antibody. Binding sites for the protein of interest are then identified as genomic regions whose read coverage in the ChIP-Seq experiment is significantly greater than the background coverage. ChIP-Seq also produces a strand-specific signature of enrichment, in which forward-strand reads accumulate on one side of the binding site and reverse-strand reads on the other; this signature can be used to identify true binding peaks.
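The comparison against background can be illustrated with a simple statistical test. The sketch below assumes that background read counts in a fixed-size window follow a Poisson distribution whose mean is the scaled control count; this is one common simplification, not the method of any particular package. The function names, the window counts, the scale factor, and the p-value threshold are all hypothetical.

```python
import math


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), summed directly over the upper tail."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    # Start from the probability mass at k, computed in log space.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    i = k
    while term > 0.0:
        total += term
        i += 1
        term *= lam / i  # pmf(i) = pmf(i - 1) * lam / i
        if i > lam and term < total * 1e-16:
            break  # remaining terms are negligible
    return min(1.0, total)


def enrichment_p_value(chip_count: int, control_count: int, scale: float = 1.0) -> float:
    """P-value for seeing at least `chip_count` reads given the control background."""
    # A floor of 1.0 avoids a zero mean in empty control windows (assumed choice).
    lam = max(control_count * scale, 1.0)
    if chip_count <= lam:
        return 1.0  # not above background; conservative shortcut
    return poisson_upper_tail(chip_count, lam)


# Hypothetical (ChIP, control) read counts for four windows.
windows = [(12, 10), (95, 11), (8, 9), (40, 12)]
THRESHOLD = 1e-5  # illustrative cutoff only
enriched = [
    index
    for index, (chip, control) in enumerate(windows)
    if enrichment_p_value(chip, control) < THRESHOLD
]
```

In practice the scale factor would account for differences in sequencing depth between the ChIP and control libraries, and p-values would be corrected for the number of windows tested.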

DNA-binding proteins, especially transcription factors, typically bind to short DNA sequences (on the order of 15 nucleotides or fewer). Enriched peaks, however, typically span several hundred base pairs because the fragments generated during ChIP are much larger than the binding site itself (typically around 250 nucleotides). Moreover, when multiple closely spaced binding sites exist at a particular location, the read coverage for these sites can merge into a single broad enriched region.
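Because reads are sequenced from the ends of fragments, the 5' ends of forward-strand and reverse-strand reads around a binding site tend to be offset by roughly the fragment length. The sketch below estimates that offset by finding the shift that best aligns the two strands. It is a simplified illustration under stated assumptions: the function name, the toy read positions, and the search range are hypothetical, and published peak callers use more elaborate models.

```python
from collections import Counter
from typing import Iterable


def best_strand_shift(fwd_starts: Iterable[int], rev_starts: Iterable[int], max_shift: int) -> int:
    """Return the shift (in nucleotides) that maximizes the overlap between
    forward-strand 5' ends and reverse-strand 5' ends."""
    fwd = Counter(fwd_starts)
    rev = Counter(rev_starts)
    best_shift, best_score = 0, -1
    for shift in range(max_shift + 1):
        # Cross-correlation at this shift: matching forward/reverse read pairs.
        score = sum(count * rev.get(pos + shift, 0) for pos, count in fwd.items())
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift


# Toy data: reverse-strand 5' ends sit 200 nt downstream of forward-strand ones.
forward_5prime = [1000, 1005, 1010, 1010, 1020]
reverse_5prime = [position + 200 for position in forward_5prime]
estimated_fragment_length = best_strand_shift(forward_5prime, reverse_5prime, max_shift=400)
```

An estimate of this kind can be used to shift or extend reads toward the fragment center, which narrows the enriched region around the actual binding site.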

Numerous algorithms and software programs have been developed for ChIP-Seq analysis. The packages differ in their options and underlying assumptions, and can therefore report somewhat different sets of binding sites for the same data. Importantly, most of them are tailored to large eukaryotic genomes and are not optimized for smaller microbial genomes. For a given sequencing depth, the small size of microbial genomes increases the mean coverage of an experiment from a couple of reads per nucleotide to hundreds. As a result, binding sites can be called with higher accuracy, and various ChIP-Seq artifacts can be detected and filtered out.
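The effect of genome size on coverage follows from simple arithmetic: mean coverage is the number of aligned reads multiplied by the read length, divided by the genome size. The sketch below works through one example; the read count, read length, and approximate genome sizes are illustrative assumptions, not values from a specific experiment.

```python
def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of reads covering each nucleotide of the genome."""
    return n_reads * read_length / genome_size


# Hypothetical experiment: 40 million aligned reads of 100 nt each.
N_READS = 40_000_000
READ_LENGTH = 100

# Approximate genome sizes used for illustration.
HUMAN_GENOME = 3_100_000_000   # about 3.1 billion nucleotides
E_COLI_GENOME = 4_600_000      # about 4.6 million nucleotides

human_coverage = mean_coverage(N_READS, READ_LENGTH, HUMAN_GENOME)    # roughly 1 read per nucleotide
ecoli_coverage = mean_coverage(N_READS, READ_LENGTH, E_COLI_GENOME)   # several hundred reads per nucleotide
```

With the same sequencing effort, the bacterial genome in this example is covered several hundred times more deeply than the human genome.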

The most commonly used ChIP-Seq software takes sequence data and outputs a set of predicted enriched regions. Known peak callers and related analysis tools include FindPeaks, CisGenome, PeakSeq, HPeak, PeakAnalyzer, ChIPpeakAnno, PeakRanger, MACS, ChIPseeqer, CSAR, CentriMo, SISSRs, DROMPA, and jMOSAiCS. These packages are optimized primarily for ChIP-Seq on complex eukaryotic genomes.
