Seqsite for Binding Site Analysis

Chromatin immunoprecipitation (ChIP) is used primarily to identify the sites where proteins bind DNA. Getting the best interpretation out of a ChIP experiment depends on data analysis that is matched to the data. ChIP-seq, a form of ChIP that combines the traditional purification with deep sequencing to detect the isolated DNA fragments, has been gaining in popularity over the more traditional ChIP-chip technique.

The rising popularity of ChIP-seq has created a new problem. The computational tools developed for ChIP-chip were designed for the relatively low-resolution data that technique produces. ChIP-chip is better suited to characterizing regions of transcription factor binding than exact sites, and its analytical tools were optimized accordingly. They are not necessarily sufficient for analyzing the higher-resolution ChIP-seq data. Seqsite was developed specifically to pinpoint the exact binding sites of transcription factors, as opposed to regions.

Seqsite uses a two-step methodology. First, regions containing multiple binding sites are identified with a screening step that filters out sequences that are too short or too poorly enriched. This coarsely locates where multiple binding sites occur while limiting false positives. Next, statistical methods are applied to the data set to find where enrichment is strongest, which identifies likely DNA binding sites at higher resolution.
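The screening step described above can be illustrated with a minimal sketch. This is not Seqsite's actual implementation: the function name screen_regions, the gap-based clustering of tags, and the thresholds max_gap, min_length and min_tags are all assumptions chosen for illustration. It only shows the general idea of discarding candidate regions that are too short or too weakly enriched before any finer analysis.

```python
from typing import List, Tuple


def screen_regions(
    tag_positions: List[int],
    max_gap: int = 100,     # hypothetical: max distance between neighboring tags in one region
    min_length: int = 50,   # hypothetical: minimum region length in base pairs
    min_tags: int = 5,      # hypothetical: minimum number of tags (enrichment proxy)
) -> List[Tuple[int, int, int]]:
    """Group tags into candidate regions and keep those that pass both filters.

    Returns a list of (start, end, tag_count) tuples.
    """
    if not tag_positions:
        return []

    positions = sorted(tag_positions)
    regions = []
    start = prev = positions[0]
    count = 1

    for pos in positions[1:]:
        if pos - prev <= max_gap:
            # Tag is close enough to the previous one: extend the current region.
            count += 1
        else:
            # Gap too large: close the current region and start a new one.
            regions.append((start, prev, count))
            start = pos
            count = 1
        prev = pos
    regions.append((start, prev, count))

    # Drop regions that are too short or too poorly enriched.
    return [
        (s, e, n)
        for (s, e, n) in regions
        if (e - s + 1) >= min_length and n >= min_tags
    ]


# Toy tag positions on one chromosome (made-up numbers, not real data).
tags = [100, 120, 150, 180, 210, 1000, 1010,
        5000, 5040, 5080, 5120, 5160, 5200]
candidate_regions = screen_regions(tags)
```

With these toy inputs, the two-tag cluster near position 1000 is discarded because it is both short and weakly enriched, while the other two clusters are kept as candidates for the second, statistical step.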

The researchers then validated the method, which was implemented as a software program. They used known data sets at varying sequencing depths to characterize the power of Seqsite. Binding site prediction was highly accurate when more than seventy tags were enriched, which represents relatively low coverage for a ChIP experiment. The method could also resolve two distinct binding sites when they were more than sixty base pairs apart. Further analysis of known data sets revealed that previously characterized transcription factors bound to regions containing multiple binding sites more than half of the time.
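Validation against known data sets generally comes down to comparing predicted site positions with reference positions. The sketch below shows one simple way to do that, assuming a greedy one-to-one matching within a fixed distance. The function name match_sites and the 30 bp tolerance are hypothetical and are not taken from the Seqsite evaluation; the positions are made-up numbers.

```python
from typing import List, Tuple


def match_sites(
    predicted: List[int],
    reference: List[int],
    tolerance: int = 30,  # hypothetical matching distance in base pairs
) -> Tuple[int, int, int]:
    """Greedily match each predicted site to the nearest unmatched reference site.

    Returns (true_positives, false_positives, false_negatives).
    """
    unmatched_ref = sorted(reference)
    true_pos = 0

    for p in sorted(predicted):
        # Find the closest still-unmatched reference site within tolerance.
        best = None
        for r in unmatched_ref:
            if abs(r - p) <= tolerance and (best is None or abs(r - p) < abs(best - p)):
                best = r
        if best is not None:
            unmatched_ref.remove(best)  # each reference site can be matched once
            true_pos += 1

    false_pos = len(predicted) - true_pos
    false_neg = len(unmatched_ref)
    return true_pos, false_pos, false_neg


# Two predicted sites 70 bp apart near two reference sites, plus one spurious call.
predicted_sites = [1000, 1070, 5000]
reference_sites = [1010, 1075, 3000]
tp, fp, fn = match_sites(predicted_sites, reference_sites)

precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
```

Counting matches one-to-one matters when sites are close together: a single predicted site cannot take credit for two neighboring reference sites, so a method that fails to separate them is penalized in recall.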

Finally, Seqsite was compared to other methods on both the identification of binding regions and the resolution of specific binding sites. Its fidelity in detecting regions was comparable to that of other methods. For most data sets, Seqsite showed a significant advantage in resolving the different binding sites within a specific region. When the binding sites were too close together, however, Seqsite did not significantly improve on other methods.

Overall, the analytical methodology built into Seqsite for ChIP-seq identifies DNA binding sites at higher resolution than previous tools. This type of analysis is essential for characterizing transcription factor interactions with DNA. A better understanding of these interactions will help molecular biologists characterize transcriptional regulation.