Multi-sample aCGH Data Analysis via Total Variation and Spectral Regularization
Xiaowei Zhou, Can Yang, Xiang Wan, Hongyu Zhao and Weichuan Yu
Abstract
DNA copy number variation (CNV) accounts for a large
proportion of genetic variation. One commonly used approach to detecting CNVs
is array-based comparative genomic hybridization (aCGH). Although many methods
have been proposed to analyze aCGH data, it is not clear how to combine
information from multiple samples to improve CNV detection. In this paper, we
propose to use a matrix to approximate the multi-sample aCGH data and minimize
the total variation of each sample as well as the nuclear norm of the whole
matrix. In this way, we can make use of the smoothness property of each sample
and the correlation among multiple samples simultaneously in a convex
optimization framework. We also develop an efficient and scalable algorithm
to handle large-scale data. Experiments demonstrate that the proposed method
outperforms state-of-the-art techniques under a wide range of scenarios and
can process large data sets with millions of probes.
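As a rough sketch of the idea described in the abstract, the objective can be written in the following form. The squared-error fit term, the weights \lambda_1 and \lambda_2, and the convention that rows of the n-by-p matrices index samples and columns index probes are assumptions made for illustration; the exact formulation is given in the paper.

```latex
% D: observed multi-sample aCGH data (n samples x p probes) -- assumed layout
% X: approximating matrix of the same size
\min_{X \in \mathbb{R}^{n \times p}} \;
  \frac{1}{2}\,\lVert D - X \rVert_F^2
  \;+\; \lambda_1 \lVert X \rVert_*
  \;+\; \lambda_2 \sum_{i=1}^{n} \sum_{j=1}^{p-1} \bigl| X_{i,j+1} - X_{i,j} \bigr|
```

Here \lVert X \rVert_* is the nuclear norm (the sum of the singular values of X), which encourages correlation among samples, and the last term is the total variation of each sample, which encourages piecewise-smooth profiles along the probes. Every term is convex in X, consistent with the convex optimization framework mentioned above.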
Matlab code: https://bioinformatics.hkust.edu.hk/tvsp/tvsp.zip
1. Run example_real and example_simu to see how to use this package.
2. File list:
TVSp: Algorithm 1 in the paper
TVSpC: Algorithm 2 in the paper
aCGH_TVSp: the main function, which calls TVSpC to analyze aCGH data, including parameter tuning and FDR estimation
genSimuData: the function that generates the synthesized data described in the paper
flsa.*: external MEX functions that solve the fused lasso signal approximator (FLSA)
Copyright reserved. For any questions, please contact eexwzhou@ust.hk