A robust method for copy number variation detection on dried blood spots.
Methods
CNV prediction
CNV start and stop positions are determined by detecting multiple change points in the Log R ratio (LRR). iPsychCNV uses
the R package changepoint,
which implements the Pruned Exact Linear Time (PELT) method (Killick et al., 2012).
On dried blood spots, the LRR signal can be noisy and deviate from the expected values. Change point methods therefore have an advantage over model-based methods such as hidden Markov models (HMMs),
because they do not expect specific values for each state. To see how the methods perform on unexpected signal,
see mock data.
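The idea behind PELT can be illustrated with a minimal Python sketch. This is not the code iPsychCNV runs (the package calls the R package changepoint); the function name pelt_mean, the squared-error cost for shifts in the mean, the penalty value and the synthetic LRR values are all assumptions made for illustration.

```python
def pelt_mean(signal, penalty):
    """Return change-point positions (index where a new segment starts)
    for shifts in the mean, using a PELT-style pruned dynamic programme."""
    n = len(signal)
    # Prefix sums give the cost of any segment in constant time.
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(signal):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def cost(a, b):
        # Sum of squared deviations from the mean of signal[a:b].
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    best = [0.0] * (n + 1)   # best[t]: optimal penalised cost of signal[:t]
    best[0] = -penalty
    last = [0] * (n + 1)     # last[t]: start of the final segment in that optimum
    candidates = [0]
    for t in range(1, n + 1):
        scores = [best[s] + cost(s, t) + penalty for s in candidates]
        i_min = min(range(len(scores)), key=scores.__getitem__)
        best[t] = scores[i_min]
        last[t] = candidates[i_min]
        # Pruning: drop segment starts that can never be optimal again.
        candidates = [s for s, sc in zip(candidates, scores)
                      if sc - penalty <= best[t]]
        candidates.append(t)

    # Walk back through the stored segment starts to recover the change points.
    change_points = []
    t = n
    while last[t] > 0:
        t = last[t]
        change_points.append(t)
    return sorted(change_points)


# Synthetic LRR (made-up values): 120 probes with small deterministic noise
# and a deletion-like drop of 0.6 across probes 50..69.
lrr = [0.01 * ((i * 37) % 11 - 5) for i in range(120)]
for i in range(50, 70):
    lrr[i] -= 0.6
print(pelt_mean(lrr, penalty=0.5))
```

Note that the sketch never states what the LRR of a deletion or duplication should be. It only looks for positions where the mean shifts, which is the property that makes change point methods tolerant of unexpected signal levels.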
CNV validation
Two methods are used to validate a CNV. Both use the B allele frequency (BAF) to check whether it agrees with the Log R ratio (LRR).
The first method is based on the BAF density distribution. Each copy number state of a CNV is expected to have a specific
BAF distribution. The function turnpoints, from the R package pastecs, is used to find peaks in the distribution. The second
method splits the BAF into seven centroids where BAF values are expected to appear. Each copy number state has a distinct
centroid distribution.
Turning point
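As a rough illustration of peak finding in the BAF distribution, the following Python sketch counts local maxima in a histogram of BAF values. It does not reproduce the turnpoints function from pastecs; the function name density_peaks, the use of a histogram as a crude density estimate, the bin count and the min_fraction threshold are all assumptions.

```python
def density_peaks(baf_values, bins=25, min_fraction=0.05):
    """Return the BAF positions of local maxima in a histogram of baf_values."""
    if not baf_values:
        raise ValueError("baf_values must not be empty")
    counts = [0] * bins
    for baf in baf_values:
        # Clamp so that values at or beyond the 0..1 range fall in an edge bin.
        idx = min(max(int(baf * bins), 0), bins - 1)
        counts[idx] += 1

    threshold = min_fraction * len(baf_values)
    peaks = []
    i = 0
    while i < bins:
        # Extend j over a plateau of equal counts so it is reported once.
        j = i
        while j + 1 < bins and counts[j + 1] == counts[i]:
            j += 1
        left = counts[i - 1] if i > 0 else -1
        right = counts[j + 1] if j + 1 < bins else -1
        if counts[i] > left and counts[i] > right and counts[i] >= threshold:
            peaks.append((i + j + 1) / (2 * bins))  # centre of the plateau
        i = j + 1
    return peaks


# Made-up BAF values resembling a normal two-copy region:
# homozygous clusters near 0 and 1, heterozygous cluster near 0.5.
baf = [0.01] * 30 + [0.49, 0.5, 0.51] * 10 + [0.99] * 30
print(density_peaks(baf))
```

In this toy input the sketch finds three peaks, as expected for two copies. A region with only the two outer peaks would be more consistent with a one-copy deletion, and four peaks with a three-copy duplication.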
BAF centroids
For each BAF point, its distance to each centroid is calculated. The BAF point is assigned to the centroid with the smallest
distance. There are seven centroids, with means 0, 0.25, 0.33, 0.5, 0.66, 0.75 and 1. The percentage of BAF points assigned to each
centroid is returned and used to evaluate the copy number state.
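The assignment step described above can be sketched in a few lines of Python. The centroid means are taken from the text; the function name centroid_percentages and the example BAF values are hypothetical, and the sketch stops at the percentages because the rule that turns them into a copy number state is not described here.

```python
# Centroid means as listed in the text.
CENTROIDS = (0.0, 0.25, 0.33, 0.5, 0.66, 0.75, 1.0)


def centroid_percentages(baf_values, centroids=CENTROIDS):
    """Assign each BAF value to its nearest centroid and return the
    percentage of values per centroid."""
    if not baf_values:
        raise ValueError("baf_values must not be empty")
    counts = {c: 0 for c in centroids}
    for baf in baf_values:
        # Nearest centroid by absolute distance.
        nearest = min(centroids, key=lambda c: abs(baf - c))
        counts[nearest] += 1
    total = len(baf_values)
    return {c: 100.0 * k / total for c, k in counts.items()}


# Made-up BAF values with no heterozygous cluster around 0.5.
baf = [0.01, 0.02, 0.98, 0.99, 0.0, 1.0, 0.03, 0.97]
for centroid, pct in centroid_percentages(baf).items():
    print(centroid, pct)
```

For this input, half of the points fall on the centroid at 0 and half on the centroid at 1, with nothing in between, which is the pattern expected for a one-copy deletion.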
Amplified DNA from dried blood spots (DBS) poses a number of challenges for copy number variation detection. Here we describe some of the challenges one can encounter when working with DBS data.
Evaluating CNV prediction performance is an important step when comparing methods. Here we describe how binary classification is used to evaluate the performance of a method.
iPsychCNV is an open source R package project. Contributors are welcome to give suggestions,
write new functions and/or improve existing ones. The source code is available on
GitHub.
About iPsychCNV
iPsychCNV is a method for finding copy number variation in amplified DNA from dried blood spots genotyped on Illumina SNP arrays. It is designed to handle large variation in the Log R ratio, and it uses the B allele frequency to improve CNV calls. iPsychCNV is an open source project on GitHub.
About iPSYCH
The iPSYCH project will study five specific mental disorders: autism, ADHD, schizophrenia, bipolar disorder and depression. All of these disorders are associated with major human and societal costs all over the world. The project will study them from many different angles, ranging from genes and cells to population studies, from fetus to adult, and from cause to symptoms of the disorder, and this knowledge will be combined in new ways across scientific fields. For more information, visit iPSYCH.