ChIP-seq ENCODE Guidelines

Chromatin immunoprecipitation (ChIP) has been a powerful tool for determining protein-DNA interactions for more than ten years. Pairing ChIP with sequencing (ChIP-seq) increased the resolution of the technique compared with the earlier, microarray-based ChIP-on-chip approach. It also made the data more complex to interpret. There are many different methods for preparing ChIP experiments, so data analysis becomes harder if the appropriate controls are not in place. In addition, if experiments are not designed in a similar way, comparing different studies becomes difficult if not impossible.
The solution to these issues came from the ENCODE consortium, which has performed many different ChIP experiments across many different organisms. The consortium developed a set of guiding principles for ChIP assays that help to ensure the quality and reproducibility of data. These guidelines cover three major areas: experimental design, data interpretation, and reporting.

Experimental Design
ENCODE lists five general considerations for experimental design. First, the optimal antibody must be selected for the immunoprecipitation. Antibodies should be put through two different tests. The first gauges the selectivity of the antibody by immunoblot or immunochemical staining, which shows whether the antibody binds off-target proteins or gives the expected, properly localized signal. The second test is a robustness criterion. For example, two different antibodies targeting different epitopes on the same protein can be used to see whether they give the same results. This provides an independent assessment of whether the DNA is being bound by the protein of interest.
The second general consideration is the use of affinity tags to purify proteins. This can be an attractive alternative when no suitable antibody exists, but it can also introduce errors if the tagged protein is expressed at a higher or lower level than the native protein of interest.
Third, replication is critical to ensure reproducibility. ENCODE recommends running two completely independent experiments. If binding at the DNA site of interest is weak, running more replicates can also help to strengthen the data.
Fourth, the selection of proper controls is pivotal in ensuring quality data. In particular, sonication does not break DNA evenly, so some regions are over- or under-represented regardless of the antibody used. A control sample can be either a non-precipitated DNA mixture that has been fixed and sonicated (input) or DNA precipitated with an antibody unrelated to the protein of interest. The control shows which DNA fragments are preferentially detected in the experiment independently of the antibody pulldown.
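One way a control sample can be used is to scale it to the depth of the ChIP sample and compare the two bin by bin. The following is only a minimal sketch under stated assumptions: the function name, the fixed genomic bins, the total-count scaling, and the pseudocount are all illustrative choices, not a method prescribed by ENCODE.

```python
def fold_enrichment(chip_counts, control_counts, pseudocount=1.0):
    """Per-bin ratio of ChIP signal to depth-scaled control signal.

    chip_counts and control_counts are read counts over the same
    genomic bins (hypothetical inputs). The control is scaled so that
    both samples have the same total count, and a pseudocount
    (an assumption) avoids division by zero in empty bins.
    """
    if len(chip_counts) != len(control_counts):
        raise ValueError("ChIP and control must cover the same bins")
    control_total = sum(control_counts)
    if control_total == 0:
        raise ValueError("control sample contains no reads")
    scale = sum(chip_counts) / control_total
    return [
        (chip + pseudocount) / (control * scale + pseudocount)
        for chip, control in zip(chip_counts, control_counts)
    ]


# Toy example: the second bin stands out against a flat control.
chip = [10, 50, 10, 10]
control = [10, 10, 10, 10]
ratios = fold_enrichment(chip, control)
```

Bins with a ratio well above 1 are candidates for true enrichment; bins that are high in both samples are more likely to reflect fragmentation or mappability bias than antibody pulldown.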
The fifth and final experimental design consideration recommended by the ENCODE consortium is the choice of peak calling software. Significance values and false positives are calculated using different statistical methods from program to program; therefore, a direct comparison of data from different programs is difficult. The thresholds for detection of peaks as well as the software to be used should be carefully considered when designing an experiment.

Interpreting ChIP Data
Data quality and success of the experiment can be difficult to ascertain with ChIP assays. For example, new antibodies may emerge and remain uncharacterized for ChIP, or a new DNA site may be discovered. Quality control in these situations is both critical and lacking in the initial stages. The ENCODE consortium suggests a few guiding principles that will help in the interpretation of data.
An initial characterization of your data can be performed by simply comparing the ChIP signal to the control sample by eye. This is a non-quantitative means of analyzing data, but it can help refine the parameters for further analysis. It is also excessively slow if applied over a large portion of the genome. A more global measure is the fraction of reads in peaks (FRiP): the fraction of all mapped reads that fall within called peak regions. FRiP reflects how strongly bound sequences are enriched above the background, and so tracks the degree of immunoprecipitation. Setting a threshold for detection can help to quickly analyze data across the entire genome, but it can be subject to false positives and negatives.
Cross-correlation analysis can be used to assess enrichment without calling peaks. When a protein-bound site is enriched by immunoprecipitation, reads from the forward and reverse strands of the DNA around that site are enriched as well, and they pile up on either side of the site, offset by roughly the fragment length. Correlating the forward-strand and reverse-strand read densities over a range of shifts therefore gives a maximum near that offset. This strand-specific clustering is weak or absent if the DNA was not pulled down by the protein.
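As an illustration, FRiP can be computed from a list of read positions and a list of called peaks. The sketch below assumes each read is reduced to a single (chromosome, position) pair and that peaks are non-overlapping half-open intervals; the function name and input layout are hypothetical, and real pipelines work from alignment and peak files instead.

```python
from bisect import bisect_right


def frip(read_positions, peaks):
    """Fraction of reads in peaks (FRiP).

    read_positions: iterable of (chrom, pos) pairs, one per mapped read.
    peaks: iterable of (chrom, start, end) with start <= pos < end,
           assumed not to overlap each other.
    """
    # Group peaks by chromosome and sort them by start coordinate.
    by_chrom = {}
    for chrom, start, end in peaks:
        by_chrom.setdefault(chrom, []).append((start, end))
    for intervals in by_chrom.values():
        intervals.sort()
    starts = {c: [s for s, _ in iv] for c, iv in by_chrom.items()}

    total = 0
    in_peaks = 0
    for chrom, pos in read_positions:
        total += 1
        intervals = by_chrom.get(chrom)
        if not intervals:
            continue
        # Find the last peak starting at or before pos, then check its end.
        i = bisect_right(starts[chrom], pos) - 1
        if i >= 0 and pos < intervals[i][1]:
            in_peaks += 1
    return in_peaks / total if total else 0.0


# Toy example: two of the four reads fall inside a peak.
reads = [("chr1", 150), ("chr1", 500), ("chr1", 1020), ("chr2", 50)]
called_peaks = [("chr1", 100, 200), ("chr1", 1000, 1100)]
score = frip(reads, called_peaks)  # 0.5
```

Because FRiP depends on the peak set, values are only comparable between experiments that were processed with the same peak caller and thresholds.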
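The idea behind cross-correlation can be sketched on a single toy chromosome. This example assumes reads are reduced to their 5' end positions per strand and uses a plain Pearson correlation at each shift; the function names and the toy coordinates are illustrative assumptions, not the implementation used by any particular tool.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if flat)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def strand_cross_correlation(fwd_positions, rev_positions, length, max_shift):
    """Correlation between forward and reverse 5'-end densities per shift.

    Entry s of the result compares forward-strand position i with
    reverse-strand position i + s, for s = 0 .. max_shift.
    """
    fwd = [0] * length
    rev = [0] * length
    for p in fwd_positions:
        fwd[p] += 1
    for p in rev_positions:
        rev[p] += 1
    return [pearson(fwd[: length - s], rev[s:]) for s in range(max_shift + 1)]


# Toy data: each reverse-strand read sits 50 bp downstream of a forward read.
forward = [100, 300, 500]
reverse = [150, 350, 550]
profile = strand_cross_correlation(forward, reverse, length=700, max_shift=100)
best_shift = max(range(len(profile)), key=lambda s: profile[s])  # 50
```

In this toy data the correlation peaks at a shift of 50, the offset built into the example. On real data the profile is noisier, and a sample without enrichment shows little or no peak at the fragment-length shift.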

Reporting Data
Transparency and reproducibility are critical features of ChIP data, and both depend on proper dissemination of information. ENCODE therefore suggests that all raw data be made available with the publication for further analysis and verification by other labs. This allows other labs to reproduce the analysis performed in the published work. Other software algorithms for data analysis can also be run on these data sets to test how robust the conclusions are.