Re-Scan CNVs

Using simulated data, we found that most current methods fail to detect all true CNVs. This matters because many methods (e.g. PennCNV) over-predict CNVs, which leads people to assume that they also capture every true CNV. Moreover, many CNVs are fragmented or have a wrong start or stop position. With known psychiatric CNVs, we observed that supplying the known position improved CNV prediction in those regions. To test whether this could help in other CNVRs, we used the positions from hotspots to re-scan all samples. The results from real data suggested that using a specific position improves CNV prediction. This happens because the data are sometimes too noisy to locate the correct CNV position, even though the signal is strong enough to be considered a CNV. To understand exactly why methods fail to predict all CNVs, and to make a robust evaluation of false negatives, we used mock data simulating all 22 autosomes.

To correct for false positive and fragmented CNVs, we developed a method that uses CNVRs (or hotspot results) to re-scan all samples. It re-evaluates LRR and BAF, but this time at a specific position. This simplifies the problem, as many true CNVs share similar positions. Figure 13 shows an example with 100 simulated samples. The CN state, position, length and signal of each CNV are randomly generated every time you run the function, so there is no prior knowledge to help the prediction. The function that creates the simulated data returns a set with the true CNVs, which is used for evaluation. The simulated data include random noise, which is stronger in chromosomes with high GC content.

We selected two chromosomes to demonstrate how Re-Scan works. Figures 13a and 13b show the true CNVs in the simulated data. The iPsychCNV prediction and its hotspots for both chromosomes are shown in figures 13c and 13d, and the Re-Scan results in figures 13e and 13f. Overall, Re-Scan improves prediction in all hotspots, especially on chromosomes with high GC content and in samples with higher levels of noise. The iPsychCNV prediction for chromosomes 1 and 16 is 84% and 50%, respectively. After re-scanning with the hotspot positions, the prediction improved to 94% and 100%, respectively.

The HotspotsCNV method finds all regions with true CNVs because those regions contain a consistent number of CNVs with similar positions, which also removes false positive CNVs that have random positions. The Re-Scan method re-evaluates the copy-number state of each CNV, fixing possibly wrong CN states. It also helps to correct the CNV position when a call is fragmented. Probably the most relevant result, however, is its capacity to find CNVs that were missed, increasing the total number of true CNVs found in a CNVR.
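The idea of re-evaluating LRR and BAF at a fixed position can be illustrated with a small base-R sketch. This is not the code used by ReScanCNVs: the helper name rescan_region, the column names Position, LRR and BAF, and the thresholds are all assumptions chosen for illustration only.

```r
# Hypothetical helper, NOT the iPsychCNV implementation: re-evaluate one
# sample at a fixed region. Thresholds are illustrative assumptions.
rescan_region <- function(snps, start, stop, lrr_cut = 0.2) {
  # snps: data.frame with columns Position, LRR, BAF (one chromosome)
  inside <- snps$Position >= start & snps$Position <= stop
  mean_lrr <- mean(snps$LRR[inside])
  # Fraction of SNPs whose BAF lies in the heterozygous band around 0.5
  het_frac <- mean(abs(snps$BAF[inside] - 0.5) < 0.1)
  cn <- 2
  # Loss: low LRR and (almost) no heterozygous SNPs
  if (mean_lrr < -lrr_cut && het_frac < 0.1) cn <- 1
  # Gain: high LRR (BAF is ignored here to keep the sketch short)
  if (mean_lrr > lrr_cut) cn <- 3
  data.frame(Start = start, Stop = stop, NumSNPs = sum(inside),
             MeanLRR = mean_lrr, CN = cn)
}

# Toy sample: 300 SNPs with a simulated deletion covering SNPs 101 to 150
set.seed(1)
pos <- seq(1000, by = 1000, length.out = 300)
lrr <- rnorm(300, mean = 0, sd = 0.15)
baf <- sample(c(0, 0.5, 1), 300, replace = TRUE)
del <- 101:150
lrr[del] <- lrr[del] - 0.5
baf[del] <- sample(c(0, 1), length(del), replace = TRUE)
snps <- data.frame(Position = pos, LRR = lrr, BAF = baf)

hit  <- rescan_region(snps, start = pos[101], stop = pos[150])  # deleted region
ctrl <- rescan_region(snps, start = pos[201], stop = pos[250])  # normal region
print(rbind(hit, ctrl))
```

Because the boundaries are given, the region only has to be classified, not discovered, which is why a noisy sample can still yield a call when the average signal inside the region is strong enough.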

Tutorial

library(iPsychCNV)

# Creating mock data: 100 simulated samples written to the working directory
MockDataCNVs <- MockData(N=100, Type="Blood", Cores=20)
# True CNVs (CN != 2) of the first sample, used as regions of interest
roi_mock <- subset(MockDataCNVs, ID %in% "MockSample_1.tab" & CN != 2)
roi_mock$Class <- "ROI"
# Plot all true CNVs
tmp <- subset(MockDataCNVs, CN != 2)
PlotAllCNVs(tmp, Name="MockData.png", Roi=roi_mock)

# iPsychCNV prediction
iPsych.Pred <- iPsychCNV(PathRawData=".", Cores=28, Pattern="^MockSample", Skip=0)
PlotAllCNVs(df=iPsych.Pred, Name="iPsych.Pred.png", Roi=roi_mock, hg="hg19")

# Hotspots from the predicted CNVs
iPsych.Hotspots <- HotspotsCNV(df=iPsych.Pred, Freq=3, OverlapCutoff=0.7, Cores=22)
# Separate file name so the previous plot is not overwritten
PlotAllCNVs(df=iPsych.Pred, Name="iPsych.Pred.Hotspots.png", Roi=iPsych.Hotspots, hg="hg19")

# Re-scan all samples at the hotspot positions
iPsych.ReScan <- ReScanCNVs(CNVs=iPsych.Hotspots, PathRawData=".", Pattern="^MockSample",
                            Skip=0, Cores=28, IndxPos=TRUE, CNVSignal=0, OnlyCNVs=FALSE)
PlotAllCNVs(iPsych.ReScan, Name="iPsych.ReScan.png", Roi=iPsych.Hotspots)
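
To put a number on how many true CNVs a prediction recovers, one option is to count the true CNVs that are covered by a predicted CNV in the same sample. The sketch below uses base R only and toy tables. The function frac_recovered, the column names ID, Chr, Start and Stop, and the 50% coverage cutoff are assumptions for illustration, not part of iPsychCNV.

```r
# Hypothetical helper: fraction of true CNVs covered by a prediction.
# Column names (ID, Chr, Start, Stop) are assumed; adapt them to your tables.
frac_recovered <- function(truth, pred, min_overlap = 0.5) {
  found <- vapply(seq_len(nrow(truth)), function(i) {
    tr <- truth[i, ]
    # Predictions in the same sample and chromosome
    p <- pred[pred$ID == tr$ID & pred$Chr == tr$Chr, ]
    if (nrow(p) == 0) return(FALSE)
    # Overlap length (negative when the intervals do not overlap)
    ov <- pmin(p$Stop, tr$Stop) - pmax(p$Start, tr$Start)
    any(ov / (tr$Stop - tr$Start) >= min_overlap)
  }, logical(1))
  mean(found)
}

# Toy data: four true CNVs, three predictions
truth <- data.frame(ID = c("s1", "s2", "s3", "s4"), Chr = 1,
                    Start = c(100, 100, 500, 900),
                    Stop  = c(200, 200, 600, 1000),
                    stringsAsFactors = FALSE)
pred <- data.frame(ID = c("s1", "s2", "s3"), Chr = 1,
                   Start = c(110, 400, 540),
                   Stop  = c(210, 450, 700),
                   stringsAsFactors = FALSE)

recovered <- frac_recovered(truth, pred)
print(recovered)  # 0.5: s1 and s3 are covered, s2 and s4 are missed
```

Running the same function on the prediction before and after re-scanning gives a simple before/after comparison on the mock data.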


Challenges

Amplified DNA from dried blood spots (DBS) poses a number of challenges for copy number variation detection. Here we describe some of the challenges one can encounter when working with DBS data.

Tools

The iPsychCNV package offers a series of tools that can be used in the CNV prediction pipeline, or independently alongside other programs.

Classification

Evaluating CNV prediction performance is an important step when comparing methods. Here we describe how binary classification is used to evaluate method performance.

Methods

iPsychCNV uses many different methods to perform a series of functions. Here we describe in detail the methods used by iPsychCNV.

  • Github

    iPsychCNV is an open source R package project. People are welcome to give suggestions, code new functions and/or improve existing ones. The source code is available on GitHub.

  • About iPsychCNV

    iPsychCNV is a method to find copy number variation in amplified DNA from dried blood spots on Illumina SNP arrays. It is designed to handle large variation in Log R ratio, and uses B allele frequency to improve CNV calls. iPsychCNV is an open source project on GitHub.

  • About iPSYCH

    The project will study five specific mental disorders: autism, ADHD, schizophrenia, bipolar disorder and depression. All of these disorders are associated with major human and societal costs all over the world. The iPSYCH project will study them from many different angles, ranging from genes and cells to population studies, from fetus to adult, and from cause to symptoms of the disorder, and this knowledge will be combined in new ways across scientific fields. For more information, visit iPSYCH.