<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art><ui>1471-2164-13-342</ui><ji>1471-2164</ji><fm><dochead>Methodology article</dochead><bibl><title><p>Genome-wide identification of significant aberrations in cancer genome</p></title><aug><au id="A1" ce="yes"><snm>Yuan</snm><fnm>Xiguo</fnm><insr iid="I1"/><insr iid="I2"/><email>xiguoyuan@mail.xidian.edu.cn</email></au><au id="A2" ce="yes"><snm>Yu</snm><fnm>Guoqiang</fnm><insr iid="I2"/><insr iid="I3"/><email>yug@vt.edu</email></au><au id="A3"><snm>Hou</snm><fnm>Xuchu</fnm><insr iid="I2"/><email>bella@vt.edu</email></au><au id="A4"><snm>Shih</snm><fnm>Ie-Ming</fnm><insr iid="I4"/><insr iid="I8"/><email>ishih@jhmi.edu</email></au><au id="A5"><snm>Clarke</snm><fnm>Robert</fnm><insr iid="I5"/><email>clarker@georgetown.edu</email></au><au id="A6"><snm>Zhang</snm><fnm>Junying</fnm><insr iid="I1"/><email>jyzhang@mail.xidian.edu.cn</email></au><au id="A7"><snm>Hoffman</snm><mi>P</mi><fnm>Eric</fnm><insr iid="I6"/><email>ehoffman@cnmcresearch.org</email></au><au id="A8"><snm>Wang</snm><mi>R</mi><fnm>Roger</fnm><insr iid="I7"/><email>wang.rroger@gmail.com</email></au><au id="A9"><snm>Zhang</snm><fnm>Zhen</fnm><insr iid="I8"/><email>zzhang7@jhmi.edu</email></au><au id="A10" ca="yes"><snm>Wang</snm><fnm>Yue</fnm><insr iid="I2"/><email>yuewang@vt.edu</email></au></aug><insg><ins id="I1"><p>School of Computer Science and Technology, Xidian University, Xi'an, P. R. 
China</p></ins><ins id="I2"><p>Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA, USA</p></ins><ins id="I3"><p>Center for Sleep Sciences and Medicine, Stanford University School of Medicine, Palo Alto, CA, 94304, USA</p></ins><ins id="I4"><p>Departments of Gynecology/Obstetrics and Oncology, Johns Hopkins University School of Medicine, Baltimore, MD, 21231, USA</p></ins><ins id="I5"><p>Lombardi Comprehensive Cancer Center and Department of Oncology, Georgetown University, Washington, DC, 20057, USA</p></ins><ins id="I6"><p>Research Center for Genetic Medicine, Children's National Medical Center, Washington, DC, 20010, USA</p></ins><ins id="I7"><p>The International Baccalaureate Magnet Diploma Program, Richard Montgomery High School, Rockville, MD, 20852, USA</p></ins><ins id="I8"><p>Department of Pathology, Johns Hopkins Medical Institutions, Baltimore, MD, 21231, USA</p></ins></insg><source>BMC Genomics</source><section><title><p>Human and rodent genomics</p></title></section><issn>1471-2164</issn><pubdate>2012</pubdate><volume>13</volume><issue>1</issue><fpage>342</fpage><url>http://www.biomedcentral.com/1471-2164/13/342</url><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-13-342</pubid><pubid idtype="pmpid">22839576</pubid></pubidlist></xrefbib></bibl><history><rec><date><day>1</day><month>2</month><year>2012</year></date></rec><acc><date><day>27</day><month>7</month><year>2012</year></date></acc><pub><date><day>27</day><month>7</month><year>2012</year></date></pub></history><cpyrt><year>2012</year><collab>Yuan et al.; licensee BioMed Central Ltd.</collab><note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly 
cited.</note></cpyrt><abs><sec><st><p>Abstract</p></st><sec><st><p>Background</p></st><p>Somatic Copy Number Alterations (CNAs) in human genomes are present in almost all human cancers. Systematic efforts to characterize such structural variants must effectively distinguish significant consensus events from random background aberrations. Here we introduce Significant Aberration in Cancer (SAIC), a new method for characterizing and assessing the statistical significance of recurrent CNA units. Three main features of SAIC include: (1) exploiting the intrinsic correlation among consecutive probes to assign a score to each CNA unit instead of single probes; (2) performing permutations on CNA units that preserve correlations inherent in the copy number data; and (3) iteratively detecting Significant Copy Number Aberrations (SCAs) and estimating an unbiased null distribution by applying an SCA-exclusive permutation scheme.</p></sec><sec><st><p>Results</p></st><p>We test and compare the performance of SAIC against four peer methods (GISTIC, STAC, KC-SMART, CMDS) on a large number of simulation datasets. Experimental results show that SAIC outperforms peer methods in terms of larger area under the <it>Receiver Operating Characteristics</it> curve and increased detection power. We then apply SAIC to analyze structural genomic aberrations acquired in four real cancer genome-wide copy number data sets (ovarian cancer, metastatic prostate cancer, lung adenocarcinoma, glioblastoma). When compared with previously reported results, SAIC successfully identifies most SCAs known to be of biological significance and associated with oncogenes (e.g., KRAS, CCNE1, and MYC) or tumor suppressor genes (e.g., CDKN2A/B). 
Furthermore, SAIC identifies a number of novel SCAs in these copy number data that encompass tumor-related genes and may warrant further studies.</p></sec><sec><st><p>Conclusions</p></st><p>Supported by a well-grounded theoretical framework, SAIC has been developed and used to identify SCAs in various cancer copy number data sets, providing useful information to study the landscape of cancer genomes. Open-source and platform-independent SAIC software is implemented using C++, together with R scripts for data formatting and Perl scripts for user interfacing, and it is easy to install and efficient to use. The source code and documentation are freely available at <url>http://www.cbil.ece.vt.edu/software.htm</url>.</p></sec></sec></abs></fm><bdy><sec><st><p>Background</p></st><p>Somatic copy number alterations (CNAs) are common genetic events in the development and progression of various human cancers, and significantly contribute to tumorigenesis <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>. The coverage of CNAs in tumors varies from a few hundred to several million nucleotide bases, consisting of both deletions and amplifications with highly complex patterns <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>. Recent advances in oligonucleotide-based single nucleotide polymorphism (SNP) arrays have made it possible to detect regional amplifications and deletions with high resolution on a genome-wide scale <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp>. A critical challenge in the genome-wide analysis of CNAs is to distinguish between the &#8220;driver&#8221; mutations that allow the tumor to initiate, grow, and persist, and the &#8220;passenger&#8221; mutations that represent random somatic events accumulated during tumorigenesis <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B3">3</abbr><abbr bid="B7">7</abbr></abbrgrp>. 
Identification of these &#8220;driver&#8221; alterations can provide important insights into the cellular defects that cause cancer and suggest potential diagnostic, prognostic, and targeted therapeutic strategies <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp>.</p><p>When a sufficiently large collection of cancer samples is studied, Significant Copy Number Aberrations (SCAs), defined as significantly recurrent CNAs that affect the same region in multiple tumors, are widely considered informative surrogates of &#8220;driver&#8221; mutations that may help pinpoint novel cancer-causing genes <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B9">9</abbr></abbrgrp>. Past studies have detected many SCAs in a wide range of cancer types, covering many known oncogenes and tumor suppressor genes <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B7">7</abbr></abbrgrp>. Several methods for finding regions of SCAs using CNA data have been described in the literature, where the task of distinguishing between sporadic CNAs and SCAs is largely one of statistical significance testing. Two reviews with qualitative comparisons of different methods have been published <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>. Despite the use of different algorithms, a common theme in these methods is that they often adopt a four-step strategy: (1) detect CNAs and separate deletions and amplifications; (2) design and calculate ensemble test statistics associated with a genomic locus; (3) construct and/or estimate the probability distribution of test statistics under the null hypothesis; (4) perform multiple testing on a pool of genomic loci.</p><p>Significance testing for aberrant copy number (STAC) starts by converting the normalized log-ratios into a binary matrix, with zeros indicating no change and ones indicating losses and gains <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. 
STAC then proposes two statistics (footprint and frequency) to define regions of SCAs while adjusting for multiple comparisons, where the null hypothesis is that the detected CNAs from single-sample analysis are the realizations of random CNA placements whose probability distribution is generated by permutations on CNA segments <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. Genomic Identification of Significant Targets in Cancer (GISTIC) works on the real-valued step function of log-ratios, which allows it to exploit both the type (amplification/deletion) and amplitude of CNAs <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B3">3</abbr></abbrgrp>. Using a semi-parametric permutation assuming independence between probes, GISTIC calculates a score that is based on both the amplitude and frequency of CNAs at each probe position and subsequently identifies regions of SCAs, where amplification and deletion CNAs are handled separately, and arm-level and focal CNAs are further analyzed independently <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. Aiming to correlate information from neighboring probes with the amplitude and frequency of CNAs at each probe position, Kernel Convolution &#8211; a Statistical Method for Aberrant Regions detection (KC-SMART) uses varying-width kernel functions to calculate the test statistics from the original log-ratios across multiple samples, producing the kernel smoothed estimate (KSE) at each locus by locally weighted regression <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. SCAs are selected based on a permutation-generated null distribution and Bonferroni correction. To substantially reduce the computational burden of analyzing high-resolution and large-population data, correlation matrix diagonal segmentation (CMDS) identifies SCAs based on a between-chromosomal-site correlation analysis directly using the raw intensity ratios across all samples <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. 
CMDS uses a correlation statistic to detect SCAs with a standard normal null distribution whose parameters are estimated directly from the data, and adjusts for multiple comparisons by controlling the false discovery rate.</p><p>Existing methods have several limitations. When working with unprocessed raw intensity ratios <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr></abbrgrp>, most methods are oblivious to noise clutter that can significantly confound estimation of the null distribution about true yet sporadic CNAs <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B17">17</abbr></abbrgrp>. Furthermore, these methods cannot distinguish between the contributions of amplifications and deletions to the calculated overall test statistics, which may affect the power to detect SCAs. While some effort has been made to incorporate correlation among neighboring probes into the test statistics, most methods assign a score to, and test the significance at, each individual probe locus <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr></abbrgrp>. In addition, while it is widely accepted that CNA signals at adjacent probes are highly correlated <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr></abbrgrp>, the assumption of probe independence is often adopted in constructing and learning the null distribution, probably for mathematical convenience <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B16">16</abbr></abbrgrp>. 
Moreover, existing permutation experiments using multiple samples cannot distinguish between the contributions of sporadic CNAs (obeying the null distribution) and actual SCAs (deviating from the null distribution) to the estimation of null distributions, resulting in theoretically conservative estimates, especially when the number of true SCAs participating in the permutation is large.</p><p>We now report Significant Aberration in Cancer (SAIC), a carefully motivated method for accurately identifying SCAs using CNA data from multiple samples. To distinguish between the different biological roles of CNA types and between noise and sporadic CNAs, we use discretized CNA data and separately analyze copy number amplifications and deletions. By exploiting the intrinsic correlation among consecutive probes, we calculate and assign a score (test statistic) to each CNA unit instead of each single probe, based on both the amplitude and frequency of CNAs within the unit. To accurately estimate the null distribution governing sporadic CNAs, we perform random positional permutations on CNA units that preserve correlations inherent to the copy number data. More importantly, to minimize the unwanted participation of true SCAs in determining the null distribution <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B14">14</abbr></abbrgrp>, we iteratively detect SCAs and estimate an unbiased null distribution by an SCA-exclusive permutation scheme.</p><p>We tested SAIC on extensive simulation data sets, observing significantly improved performance with larger areas under the <it>Receiver Operating Characteristics</it> (ROC) curves and higher sensitivities at acceptably low false discovery rates, as compared to four popular peer methods (GISTIC, STAC, KC-SMART, and CMDS). 
We then applied SAIC to four real benchmark data sets, successfully identified the majority (84%) of previously reported SCAs harboring regions associated with well-known tumor-causing genes, and more importantly, detected some novel SCAs partially validated by the presence of known cancer-related genes.</p></sec><sec><st><p>Methods</p></st><sec><st><p>Data format and definitions</p></st><p>Preprocessed log-ratio data are stored in a numeric <it>N</it>&#8201;&#215;<it>M</it> matrix <it>X</it>. Each entry <it>x</it><sub><it>nm</it></sub> represents the DNA copy number (as a log2-ratio) for sample <it>n</it> at probe <it>m</it>, and each row <it>X</it><sub><it>n</it></sub> holds the copy numbers of the <it>n</it>th sample at the <it>M</it> probes. Copy number amplifications and deletions are analyzed separately. We use the indicator function to divide matrix <it>X</it> into two matrices <it>X</it>&#8201;=&#8201;<it>X</it><sub>amplification</sub>&#8201;+<it>X</it><sub>deletion</sub>, where</p><p><display-formula id="M1"><m:math name="1471-2164-13-342-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mtable columnalign="left">
   <m:mtr>
      <m:mtd>
         <m:msub>
            <m:mi>X</m:mi>
            <m:mtext>amplification</m:mtext>
         </m:msub>
         <m:mo>=</m:mo>
         <m:mfenced open="{" close="}">
            <m:mrow>
               <m:mi>I</m:mi>
               <m:mfenced open="(" close=")">
                  <m:mrow>
                     <m:msub>
                        <m:mi>x</m:mi>
                        <m:mi mathvariant="italic">nm</m:mi>
                     </m:msub>
                     <m:mo>&#8805;</m:mo>
                     <m:msub>
                        <m:mi>&#952;</m:mi>
                        <m:mtext>amplification</m:mtext>
                     </m:msub>
                  </m:mrow>
               </m:mfenced>
               <m:mo>&#183;</m:mo>
               <m:msub>
                  <m:mi>x</m:mi>
                  <m:mi mathvariant="italic">nm</m:mi>
               </m:msub>
            </m:mrow>
         </m:mfenced>
         <m:mo>,</m:mo>
      </m:mtd>
   </m:mtr>
   <m:mtr>
      <m:mtd>
         <m:msub>
            <m:mi>X</m:mi>
            <m:mtext>deletion</m:mtext>
         </m:msub>
         <m:mo>=</m:mo>
         <m:mfenced open="{" close="}">
            <m:mrow>
               <m:mi>I</m:mi>
               <m:mfenced open="(" close=")">
                  <m:mrow>
                     <m:msub>
                        <m:mi>x</m:mi>
                        <m:mi mathvariant="italic">nm</m:mi>
                     </m:msub>
                     <m:mo>&#8804;</m:mo>
                     <m:msub>
                        <m:mi>&#952;</m:mi>
                        <m:mtext>deletion</m:mtext>
                     </m:msub>
                  </m:mrow>
               </m:mfenced>
               <m:mo>&#183;</m:mo>
               <m:msub>
                  <m:mi>x</m:mi>
                  <m:mi mathvariant="italic">nm</m:mi>
               </m:msub>
            </m:mrow>
         </m:mfenced>
         <m:mtext>,</m:mtext>
      </m:mtd>
   </m:mtr>
</m:mtable>
</m:math></display-formula></p><p>with <it>&#952;</it><sub>amplification</sub> and <it>&#952;</it><sub>deletion</sub> being the pre-specified thresholds. For brevity, we focus all subsequent discussion on <it>X</it><sub>amplification</sub> and comment on <it>X</it><sub>deletion</sub> when necessary.</p><sec><st><p>Definition 1</p></st><p>Any copy number probe <it>m</it> whose associated copy number is amplified or deleted in at least one of the <it>N</it> samples is called a CNA probe.</p><p>To exploit correlations inherent in copy number data, we first merge consecutive CNA probes into CNA regions, leaving gaps that consist only of non-CNA probes (see Figure <figr fid="F1">1</figr>). Within each CNA region, the Pearson correlation coefficient <it>&#961;</it><sub><it>ij</it></sub> between CNA probes <it>i</it> and <it>j</it> is then calculated for <inline-formula><m:math name="1471-2164-13-342-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mfenced open="{" close="}">
      <m:mrow>
         <m:mi>i</m:mi>
         <m:mo>&#8800;</m:mo>
         <m:mi>j</m:mi>
      </m:mrow>
   </m:mfenced>
   <m:mo>&#8712;</m:mo>
   <m:mi>M</m:mi>
</m:mrow>
</m:math></inline-formula>:</p><p><display-formula id="M2"><m:math name="1471-2164-13-342-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:msub>
      <m:mi>&#961;</m:mi>
      <m:mi mathvariant="italic">ij</m:mi>
   </m:msub>
   <m:mo>=</m:mo>
   <m:mfrac>
      <m:mrow>
         <m:munderover>
            <m:mo>&#8721;</m:mo>
            <m:mrow>
               <m:mi>n</m:mi>
               <m:mo>=</m:mo>
               <m:mn>1</m:mn>
            </m:mrow>
            <m:mi>N</m:mi>
         </m:munderover>
         <m:mo stretchy="false">(</m:mo>
         <m:msub>
            <m:mi>x</m:mi>
            <m:mi mathvariant="italic">ni</m:mi>
         </m:msub>
         <m:mo>&#8722;</m:mo>
         <m:msub>
            <m:mover accent="true">
               <m:mi>x</m:mi>
               <m:mo>&#175;</m:mo>
            </m:mover>
            <m:mi>i</m:mi>
         </m:msub>
         <m:mo stretchy="false">)</m:mo>
         <m:mo stretchy="false">(</m:mo>
         <m:msub>
            <m:mi>x</m:mi>
            <m:mi mathvariant="italic">nj</m:mi>
         </m:msub>
         <m:mo>&#8722;</m:mo>
         <m:msub>
            <m:mover accent="true">
               <m:mi>x</m:mi>
               <m:mo>&#175;</m:mo>
            </m:mover>
            <m:mi>j</m:mi>
         </m:msub>
         <m:mo stretchy="false">)</m:mo>
      </m:mrow>
      <m:mrow>
         <m:mo stretchy="false">(</m:mo>
         <m:mi>N</m:mi>
         <m:mo>&#8722;</m:mo>
         <m:mn>1</m:mn>
         <m:mo stretchy="false">)</m:mo>
         <m:msub>
            <m:mi>s</m:mi>
            <m:mi>i</m:mi>
         </m:msub>
         <m:msub>
            <m:mi>s</m:mi>
            <m:mi>j</m:mi>
         </m:msub>
      </m:mrow>
   </m:mfrac>
   <m:mtext>,</m:mtext>
</m:mrow>
</m:math></display-formula></p><p>where <inline-formula><m:math name="1471-2164-13-342-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:msub>
   <m:mover accent="true">
      <m:mi>x</m:mi>
      <m:mo>&#175;</m:mo>
   </m:mover>
   <m:mi>i</m:mi>
</m:msub>
</m:math></inline-formula>, <inline-formula><m:math name="1471-2164-13-342-i5" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:msub>
   <m:mover accent="true">
      <m:mi>x</m:mi>
      <m:mo>&#175;</m:mo>
   </m:mover>
   <m:mi>j</m:mi>
</m:msub>
</m:math></inline-formula>, <it>s</it><sub><it>i</it></sub> and <it>s</it><sub><it>j</it></sub> are the estimated means and standard deviations of copy numbers at probes <it>i</it> and <it>j</it> across the <it>N</it> samples, respectively. If <it>&#961;</it><sub><it>ij</it></sub> is less than a pre-specified threshold <it>&#952;</it><sub><it>&#961;</it></sub>, a breakpoint occurs between probes <it>i</it> and <it>j</it>.</p><fig id="F1"><title><p>Figure 1</p></title><caption><p>An illustration of how CNA units are defined</p></caption><text>
   <p><b>An illustration of how CNA units are defined.</b> Left: Consecutive CNA probes are merged into two intervals, with the first interval containing probes 1&#8211;10 and the second interval containing probes 14&#8211;16. Right: Each of the two intervals is split into CNA units according to the correlation coefficients between CNA probes defined by Eq. (2), <it>e.g.,</it> the first interval is split into three independent CNA units.</p>
</text><graphic file="1471-2164-13-342-1"/></fig></sec><sec><st><p>Definition 2</p></st><p>A sequence of consecutive CNA probes with no breakpoints is defined as a CNA unit, denoted by <it>u</it> (<it>k, L</it>) with <it>k</it> being the starting probe index and <it>L</it> being the length of the CNA unit.</p><p>Intuitively, a CNA unit consists of a sequence of highly correlated consecutive CNA probes. Figure <figr fid="F1">1</figr> illustrates the concepts of CNA region and CNA unit, where two CNA regions contain 10 and 3 CNA probes, respectively, and the first CNA region is further split into three CNA units due to two breakpoints within the CNA region.</p></sec></sec><sec><st><p>Summary statistics and significance assessment</p></st><p>Units that exhibit high or low average copy number are of interest, so it is natural to examine summary statistics for each unit. SAIC identifies significant aberration units in two steps. First, the method calculates a statistic (<it>U</it> score) that incorporates both the frequencies of occurrence and the amplitudes of the CNA probes within the unit, leading to the unit summary statistic given by</p><p><display-formula id="M3"><m:math name="1471-2164-13-342-i6" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:msub>
      <m:mi>U</m:mi>
      <m:mrow>
         <m:mi>k</m:mi>
         <m:mtext>,</m:mtext>
         <m:mspace width="0.25em"/>
         <m:mi>L</m:mi>
      </m:mrow>
   </m:msub>
   <m:mo>=</m:mo>
   <m:mfrac>
      <m:mn>1</m:mn>
      <m:mrow>
         <m:mi>L</m:mi>
         <m:mi>N</m:mi>
      </m:mrow>
   </m:mfrac>
   <m:munderover>
      <m:mo>&#8721;</m:mo>
      <m:mrow>
         <m:mi>n</m:mi>
         <m:mo>=</m:mo>
         <m:mn>1</m:mn>
      </m:mrow>
      <m:mi>N</m:mi>
   </m:munderover>
   <m:mspace width="0.25em"/>
   <m:mstyle>
      <m:munderover>
         <m:mo>&#8721;</m:mo>
         <m:mrow>
            <m:mi>l</m:mi>
            <m:mo>=</m:mo>
            <m:mi>k</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>k</m:mi>
            <m:mo>+</m:mo>
            <m:mi>L</m:mi>
            <m:mo>&#8722;</m:mo>
            <m:mn>1</m:mn>
         </m:mrow>
      </m:munderover>
      <m:msub>
         <m:mi>x</m:mi>
         <m:mi mathvariant="italic">nl</m:mi>
      </m:msub>
   </m:mstyle>
   <m:mtext>.</m:mtext>
</m:mrow>
</m:math></display-formula></p><p>Second, the method assesses the statistical significance of each CNA unit by comparing the observed statistic to the <it>U</it> scores that would be expected by chance.</p><p>Sporadic CNA units often occur throughout the genome, so a null distribution for <it>U</it><sub><it>k, L</it></sub>, under the hypothesis that no SCAs are present, can be estimated by randomly permuting the overall pattern of presumed all-sporadic CNA units across the genome <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B9">9</abbr><abbr bid="B12">12</abbr><abbr bid="B15">15</abbr></abbrgrp>. Although various permutation schemes can be adopted, permutation of CNA units across rows/samples should be avoided because tumor samples differ in their rates of CNA and in their percentages of normal tissue contamination <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. As noted above, permutation should be performed on CNA units rather than on single CNA probes, so that correlations inherent to the copy number data are preserved even if the CNA units are sporadic <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B9">9</abbr><abbr bid="B15">15</abbr></abbrgrp>. Another subtle but often ignored issue is the different background rates of CNA units with varying lengths <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. Short CNA units occur at a frequency inversely related to their lengths, and long CNA units occur approximately 30 times more frequently than would be expected from the inverse-length distribution. This observation is seen across all cancer types, is applicable to both copy gains and losses, and is supported by the calculated genome-average background rates for CNAs as a function of length <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. These considerations motivate the design of the SAIC permutation scheme.</p><p>Let <inline-formula><m:math name="1471-2164-13-342-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mi mathvariant="double-struck">L</m:mi>
</m:math></inline-formula> denote the integer set containing the lengths of all the observed CNA units in <it>X</it>, <inline-formula><m:math name="1471-2164-13-342-i8" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mi mathvariant="double-struck">K</m:mi>
</m:math></inline-formula> denote the integer set containing the starting probe indices of all the observed CNA units in <it>X</it>, and <it>X</it><sup>(<it>t</it>)</sup> be the random positional permutation of <it>X</it> for <it>t</it>&#8201;=&#8201;1,2,&#8230;,<it>T</it>, with <it>T</it> being the total number of permutations. We now describe our method for analyzing CNA units for evidence of significant alteration in cancer, where we account for the difference in background rates between CNA units of different lengths by considering them adaptively.</p><sec><st><p>Algorithm 1</p></st><p>Assessing the statistical significance of <it>U</it><sub><it>k, L</it></sub></p><p indent="1">(1) Perform <it>T</it> random within-row positional permutations <it>X</it><sup>(<it>1</it>)</sup>, <it>X</it><sup>(<it>2</it>)</sup>, &#8230;, <it>X</it><sup>(<it>T</it>)</sup> of the data matrix <it>X</it> on CNA units;</p><p indent="1">(2) Compute the value of summary statistic <inline-formula><m:math name="1471-2164-13-342-i9" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:msub>
      <m:mi>U</m:mi>
      <m:mrow>
         <m:mi>k</m:mi>
         <m:mtext>,</m:mtext>
         <m:mspace width="0.25em"/>
         <m:mi>L</m:mi>
      </m:mrow>
   </m:msub>
   <m:mfenced open="(" close=")">
      <m:msup>
         <m:mi>X</m:mi>
         <m:mfenced open="(" close=")">
            <m:mi>t</m:mi>
         </m:mfenced>
      </m:msup>
   </m:mfenced>
</m:mrow>
</m:math></inline-formula> for each permuted data set <it>t</it>&#8201;=&#8201;1,2,&#8230;,<it>T</it>, and for each starting probe <inline-formula><m:math name="1471-2164-13-342-i10" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi>k</m:mi>
   <m:mo>=</m:mo>
   <m:mn>1</m:mn>
   <m:mo>,</m:mo>
   <m:mn>2</m:mn>
   <m:mo>,</m:mo>
   <m:mo>&#8230;</m:mo>
   <m:mo>,</m:mo>
   <m:mi>M</m:mi>
   <m:mo>&#8722;</m:mo>
   <m:mi>L</m:mi>
   <m:mo>+</m:mo>
   <m:mn>1</m:mn>
</m:mrow>
</m:math></inline-formula> and each length <inline-formula><m:math name="1471-2164-13-342-i11" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi>L</m:mi>
   <m:mo>&#8712;</m:mo>
   <m:mi mathvariant="double-struck">L</m:mi>
</m:mrow>
</m:math></inline-formula>;</p><p indent="1">(3) Calculate and assign a P-value to each observed CNA unit <it>u</it> (<it>k, L</it>) for <inline-formula><m:math name="1471-2164-13-342-i12" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi>k</m:mi>
   <m:mo>&#8712;</m:mo>
   <m:mi mathvariant="double-struck">K</m:mi>
</m:mrow>
</m:math></inline-formula> based on the extreme right-hand tail probability given by <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B19">19</abbr></abbrgrp></p><p><display-formula id="M4"><m:math name="1471-2164-13-342-i13" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi>P</m:mi>
   <m:mfenced open="(" close=")">
      <m:mrow>
         <m:msub>
            <m:mi>U</m:mi>
            <m:mrow>
               <m:mi>k</m:mi>
               <m:mtext>,</m:mtext>
               <m:mspace width="0.25em"/>
               <m:mi>L</m:mi>
            </m:mrow>
         </m:msub>
         <m:mfenced open="(" close=")">
            <m:mi>X</m:mi>
         </m:mfenced>
      </m:mrow>
   </m:mfenced>
   <m:mo>=</m:mo>
   <m:mfrac>
      <m:mrow>
         <m:mn>1</m:mn>
         <m:mo>+</m:mo>
         <m:mstyle>
            <m:munderover>
               <m:mo>&#8721;</m:mo>
               <m:mrow>
                  <m:mi>t</m:mi>
                  <m:mo>=</m:mo>
                  <m:mn>1</m:mn>
               </m:mrow>
               <m:mi>T</m:mi>
            </m:munderover>
            <m:mrow>
               <m:mi>I</m:mi>
               <m:mfenced open="(" close=")">
                  <m:mrow>
                     <m:munder>
                        <m:mo>max</m:mo>
                        <m:mrow>
                           <m:mi>k</m:mi>
                           <m:mo>'</m:mo>
                        </m:mrow>
                     </m:munder>
                     <m:msub>
                        <m:mi>U</m:mi>
                        <m:mrow>
                           <m:mi>k</m:mi>
                           <m:mo>'</m:mo>
                           <m:mtext>,</m:mtext>
                           <m:mspace width="0.25em"/>
                           <m:mi>L</m:mi>
                        </m:mrow>
                     </m:msub>
                     <m:mfenced open="(" close=")">
                        <m:msup>
                           <m:mi>X</m:mi>
                           <m:mfenced open="(" close=")">
                              <m:mi>t</m:mi>
                           </m:mfenced>
                        </m:msup>
                     </m:mfenced>
                     <m:mo>&#8805;</m:mo>
                     <m:msub>
                        <m:mi>U</m:mi>
                        <m:mrow>
                           <m:mi>k</m:mi>
                           <m:mtext>,</m:mtext>
                           <m:mspace width="0.25em"/>
                           <m:mi>L</m:mi>
                        </m:mrow>
                     </m:msub>
                     <m:mfenced open="(" close=")">
                        <m:mi>X</m:mi>
                     </m:mfenced>
                  </m:mrow>
               </m:mfenced>
            </m:mrow>
         </m:mstyle>
      </m:mrow>
      <m:mrow>
         <m:mi>T</m:mi>
         <m:mo>+</m:mo>
         <m:mn>1</m:mn>
      </m:mrow>
   </m:mfrac>
   <m:mtext>,</m:mtext>
</m:mrow>
</m:math></display-formula></p><p>where <inline-formula><m:math name="1471-2164-13-342-i14" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi>I</m:mi>
   <m:mfenced open="(" close=")">
      <m:mo>&#183;</m:mo>
   </m:mfenced>
</m:mrow>
 </m:math></inline-formula> is the indicator function.</p><p>The empirical P-values on <it>X</it><sub>deletion</sub> are calculated from the extreme left-hand tail probabilities by reversing the inequality in Eq. (4). Both definitions produce P-values that are easy to interpret, and the &#8220;max&#8221; operation automatically adjusts the P-values for multiple comparisons across CNA units and thus controls the family-wise error rate <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>.</p><p>In Algorithm 1, it is important to note that when we generate a randomly permuted data set based on the observed data, we do not re-define the CNA units but re-use the already-defined CNA units. Specifically, in each permutation, we randomly place the already-defined CNA units over the whole genome or each chromosome within each sample, and calculate the summary <it>U</it> score for each length of CNA units. Thus, independent of the unit length, the observed CNA units will always be retained (implicitly) in the permuted data set. Moreover, when the number of permutations is sufficiently large, the P-values of observed CNA units can be accurately estimated. More precisely, to assess the P-value associated with an observed CNA unit of length <it>L</it>, we calculate the <it>U</it> scores for any consecutive <it>L</it> probes (probes do not need to reside within the same unit) across the genome, and compare the maximum score with the score of the observed CNA unit.</p></sec></sec><sec><st><p>Iterative estimation of unbiased null distribution</p></st><p>One important issue concerning Algorithm 1 is the presence of true SCAs (departing from the null distribution) in cancer genomes that presumably contribute high copy number deviations to the estimation of the overall null distribution (governing only sporadic CNAs), potentially reducing the power to detect less-extreme SCAs due to theoretical conservativeness <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B14">14</abbr></abbrgrp>. 
Loss of power is particularly critical in real-world applications where the number of true SCAs in cancer genomes may be large. Thus, to minimize the unwanted participation of true SCAs in determining the null distribution, we iteratively detect SCAs and estimate an unbiased null distribution by applying an SCA-exclusive permutation scheme. SAIC assesses the &#8216;new&#8217; SCAs conditional on having found the &#8216;existing&#8217; SCAs, successively correcting for true SCAs in order to better dissect and detect SCAs. Specifically, the CNA units associated with the &#8216;existing&#8217; SCAs are masked as zeros after each iteration, resulting in a new data set <it>X</it><sub>-SCAs</sub> in which already-detected SCAs become null.</p><sec><st><p>Algorithm 2</p></st><p>Iteratively assessing the statistical significance of <it>U</it><sub><it>k, L</it></sub></p><p indent="1">(1) Perform Algorithm 1;</p><p indent="1">(2) Check whether &#8216;new&#8217; SCAs are detected. If &#8216;yes&#8217;, continue; if &#8216;no&#8217;, stop and re-calculate the P-values for all SCAs using the truth-converging null distribution;</p><p indent="1">(3) Mask the CNA units associated with newly detected SCAs as zeros and let <inline-formula><m:math name="1471-2164-13-342-i15" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi>X</m:mi>
   <m:mo>=</m:mo>
   <m:msub>
      <m:mi>X</m:mi>
      <m:mrow>
         <m:mo>-</m:mo>
         <m:mtext>SCAs</m:mtext>
      </m:mrow>
   </m:msub>
</m:mrow>
</m:math></inline-formula>, then go to step (1).</p><p>It has been shown experimentally that additional power to detect SCAs can be gained by removing the effect of newly detected SCAs after each iteration <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. However, an iterative SCA-exclusive permutation scheme raises another subtle yet critical issue concerning the convergence of null distribution learning and potential bias due to the expected false positive SCAs under the truth-converging null distribution. Fortunately, based on the careful design of Algorithm 2, the following theorem shows that, if we apply a significance level <inline-formula><m:math name="1471-2164-13-342-i16" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi>&#945;</m:mi>
   <m:mo>'</m:mo>
   <m:mo>=</m:mo>
   <m:mi>&#945;</m:mi>
   <m:mo>/</m:mo>
   <m:mfenced open="(" close=")">
      <m:mrow>
         <m:mn>1</m:mn>
         <m:mo>+</m:mo>
         <m:mi>&#945;</m:mi>
      </m:mrow>
   </m:mfenced>
</m:mrow>
 </m:math></inline-formula>, where <it>&#945;</it> is the targeted false positive rate (FPR), unbiased estimation and detection results can be readily obtained using Algorithm 2 (see the formal proof in Appendix A).</p></sec><sec><st><p>Theorem 1</p></st><p><it>Suppose that Algorithm 2 is used to iteratively detect SCAs and estimate the truth-converging null distribution</it>. <it>Let &#945;</it> <it>be the targeted FPR and</it> <inline-formula><m:math name="1471-2164-13-342-i17" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi>&#945;</m:mi>
   <m:mo>'</m:mo>
   <m:mo>=</m:mo>
   <m:mi>&#945;</m:mi>
   <m:mo>/</m:mo>
   <m:mfenced open="(" close=")">
      <m:mrow>
         <m:mn>1</m:mn>
         <m:mo>+</m:mo>
         <m:mi>&#945;</m:mi>
      </m:mrow>
   </m:mfenced>
</m:mrow>
 </m:math></inline-formula> <it>be the significance level used to detect SCAs. Then an unbiased truth-converging null distribution can be obtained together with a theoretical FPR &#945;.</it></p></sec></sec><sec><st><p>SAIC algorithm and data preprocessing</p></st><p>Figure <figr fid="F2">2</figr> shows the flowchart describing the entire SAIC algorithm. Our algorithm begins with two data preprocessing steps <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. First, the raw copy number signals extracted from CEL files are normalized using benchmark methods such as dChip (DNA-Chip Analyzer) <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr></abbrgrp>. Second, the normalized copy number signals are segmented into CNA regions using existing single-sample analysis methods such as CBS (Circular Binary Segmentation) <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr></abbrgrp>. The preprocessed log2-transformed ratios are subsequently analyzed by the novel algorithm described here.</p><fig id="F2"><title><p>Figure 2</p></title><caption><p>Schematic flowchart of combined SAIC algorithms 1 and 2</p></caption><text>
   <p>
      <b>Schematic flowchart of combined SAIC algorithms 1 and 2.</b>
   </p>
 </text><graphic file="1471-2164-13-342-2"/></fig></sec></sec><sec><st><p>Results</p></st><p>In the absence of definitive ground truth about the recurrent CNAs in the cancer genomes, the validation of a new method for detecting SCAs is always problematic <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B13">13</abbr><abbr bid="B16">16</abbr><abbr bid="B18">18</abbr><abbr bid="B24">24</abbr></abbrgrp>. We first validated SAIC on multiple realistic simulation data sets and then evaluated the method using real CNA data sets. All data sets were analyzed according to the algorithm described in Figure <figr fid="F2">2</figr>. We tested SAIC and the four peer methods (GISTIC, STAC, KC-SMART, CMDS) on realistic simulation data sets. Comparative performance was assessed against the ground truth in terms of detection power <abbrgrp><abbr bid="B18">18</abbr></abbrgrp> and the <it>Receiver Operating Characteristics</it> (ROC) curves <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. For the real CNA data, we compared and discussed the biological plausibility of the implicated SCAs, and examined the relative SCA coverage of SAIC and GISTIC on benchmark data sets using Venn diagrams. To ensure a meaningful and discriminating comparison, we emphasized experiment suitability when choosing algorithm parameter settings. 
For example, the algorithm parameter settings cannot be too &#8220;simple&#8221; (if there are only a few arm-level SCAs, all methods may perform equally well) or too &#8220;complex&#8221; (if there are many weak focal SCAs, no method will perform consistently well) <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>.</p><sec><st><p>Simulation studies</p></st><p>Multiple simulation data sets with definitive ground truth and various design or parameter settings were generated based on the modified benchmark models proposed in <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B16">16</abbr><abbr bid="B18">18</abbr><abbr bid="B24">24</abbr></abbrgrp> and used to assess various performance characteristics <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B16">16</abbr><abbr bid="B18">18</abbr></abbrgrp>. We first assessed the family-wise type 1 error rate (FWER), whose accuracy is crucial for methods that detect SCAs based on their P-values. If the FWER is either too conservative or too liberal, the P-value loses its intended meaning and does not reflect the actual false positive rate, so the number of false positives can no longer be controlled by setting a P-value based threshold <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. A large number of simulated null data sets (under the null hypothesis that no recurrent CNAs are present) were generated based on the realistic model proposed in <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> and subsequently analyzed with SAIC; results are presented in Table <tblr tid="T1">1</tblr>. Algorithm 2 was repeated 10,000 times, and the observed FWER was estimated by the proportion of runs in which at least one <it>U</it><sub><it>k, L</it></sub> (<it>X</it>) in <it>X</it> was significant at the <it>&#945;</it>&#8201;=&#8201;0.05 level <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. 
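</p><p>As a minimal sketch of the max-based empirical P-value of Eq. (4) and of the empirical FWER estimate described above, the following Python code assumes that the maximum <it>U</it> score of each permuted data set (for a given unit length <it>L</it>) has already been computed. The function names, the array layout, and the convention that a unit is called significant when its P-value does not exceed <it>&#945;</it> are illustrative assumptions and are not taken from the SAIC software.</p><p>
```python
import numpy as np


def max_stat_pvalues(observed_scores, null_max_scores):
    """Empirical P-values in the spirit of Eq. (4) (hypothetical helper).

    observed_scores: U scores of the observed CNA units of one length L.
    null_max_scores: length-T array; entry t is the maximum U score found
        in the t-th permuted data set for the same length L.
    """
    observed = np.atleast_1d(np.asarray(observed_scores, dtype=float))
    null_max = np.atleast_1d(np.asarray(null_max_scores, dtype=float))
    # Count the permutations whose maximum reaches each observed score.
    exceed = (null_max[None, :] >= observed[:, None]).sum(axis=1)
    # The added 1 in numerator and denominator keeps every P-value positive.
    return (1.0 + exceed) / (null_max.size + 1.0)


def empirical_fwer(min_pvalues, alpha=0.05):
    """Fraction of null data sets with at least one significant unit.

    min_pvalues: smallest P-value found in each simulated null data set.
    Assumption: a unit is significant when its P-value is at most alpha.
    """
    min_p = np.asarray(min_pvalues, dtype=float)
    return float(np.mean(alpha >= min_p))
```
</p><p>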
Values of the observed FWER in Table <tblr tid="T1">1</tblr> (0.0497) suggest that SAIC controls the FWER almost exactly at the nominal level, compared with the slightly conservative values (0.0452) of a similar method <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>.</p><table id="T1"><title><p>Table 1</p></title><caption><p><b>Empirical type 1 error rate for simulated data sets under the null hypothesis</b></p></caption><tgroup align="left" cols="2"><colspec align="left" colname="c1" colnum="1" colwidth="1*"/><colspec align="center" colname="c2" colnum="2" colwidth="1*"/><thead valign="top"><row rowsep="1"><entry colname="c1"><p><b>Null simulation model</b></p></entry><entry align="center" colname="c2"><p><b>Empirical FWER at </b><b><it>&#945;</it></b><b> = 0.05 </b><b>level</b></p></entry></row></thead><tbody valign="top"><row><entry colname="c1"><p>Copy number data</p></entry><entry align="char" char="." colname="c2"><p>0.0488</p></entry></row><row><entry colname="c1"><p>Clumped copy number data (25%)</p></entry><entry align="char" char="." colname="c2"><p>0.0500</p></entry></row><row><entry colname="c1"><p>Clumped copy number data (50%)</p></entry><entry align="char" char="." colname="c2"><p>0.0493</p></entry></row><row rowsep="1"><entry colname="c1"><p>Clumped copy number data (75%)</p></entry><entry align="char" char="." colname="c2"><p>0.0505</p></entry></row></tbody></tgroup></table><p>We then assessed the detection power of SAIC as compared to GISTIC. Based on the simulation model proposed in <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>, we generated 100 simulation data sets under each combinatorial parameter setting, resulting in a total of 1,900 simulation data sets, where each data set consists of <it>N</it>&#8201;=&#8201;40&#8201;~&#8201;80 samples and each sample contains <it>M</it>&#8201;=&#8201;5,000 probes. 
To replicate the effect of inevitable normal cell contamination <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>, the copy numbers at every probe are simulated by a mixture of normal and tumor genomes, where the normal cell fraction <it>&#955;</it> is randomly drawn from a normal distribution <inline-formula><m:math name="1471-2164-13-342-i18" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi mathvariant="script">N</m:mi>
   <m:mfenced open="(" close=")">
      <m:mrow>
         <m:msub>
            <m:mi>&#956;</m:mi>
            <m:mi>&#955;</m:mi>
         </m:msub>
         <m:mo>,</m:mo>
         <m:msub>
            <m:mi>&#963;</m:mi>
            <m:mi>&#955;</m:mi>
         </m:msub>
      </m:mrow>
   </m:mfenced>
</m:mrow>
 </m:math></inline-formula> with <it>&#956;</it><sub><it>&#955;</it></sub> and <it>&#963;</it><sub><it>&#955;</it></sub> being the mean and standard deviation of the normal cell fraction in the sample. Each sample contains two sporadic CNA regions, one deletion and one amplification, with copy numbers randomly drawn from the integer sets {0, 1} and {3, 4,&#8230;,8}, respectively. Each data set contains two recurrent CNA regions that are contributed by a fraction of the samples according to a specified frequency <it>&#969;</it>, one deletion and one amplification designed in the same way as above. The length of both sporadic and recurrent CNA regions is randomly assigned from 150 to 250 probes, realistically reflecting the estimated background rate of focal CNAs in a typical cancer sample genome <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. To equally assess the power in detecting deletion or amplification SCAs, we calculate the detection power of SAIC or GISTIC as the rate of successfully detecting the inserted deletion or amplification SCAs across 100 data sets. Table <tblr tid="T2">2</tblr> summarizes the comparative detection power of SAIC and GISTIC for a total of 19 parameter settings across 1,900 data sets. 
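</p><p>The mixture of normal and tumor genomes described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the simulation code used in the study: the linear mixing rule (normal cells contributing two copies), the clipping of <it>&#955;</it> to the interval [0, 1], the small floor that avoids taking the logarithm of zero, and all function and parameter names are hypothetical.</p><p>
```python
import numpy as np


def mixed_log2_ratio(tumor_copy_number, normal_fraction):
    """Log2 ratio of a tumor-normal mixture relative to two copies.

    Assumption: normal cells carry two copies and mixing is linear.
    """
    lam = np.clip(normal_fraction, 0.0, 1.0)
    tumor = np.asarray(tumor_copy_number, dtype=float)
    mixed = lam * 2.0 + (1.0 - lam) * tumor
    # Floor guards against log2(0) for a zero-copy region in a pure tumor.
    return np.log2(np.maximum(mixed, 1e-3) / 2.0)


def simulate_sample(n_probes, start, length, copy_number, mu, sigma, rng):
    """One sample with a single CNA region and a random normal fraction."""
    lam = rng.normal(mu, sigma)          # normal cell fraction for this sample
    tumor = np.full(n_probes, 2.0)       # copy-neutral background
    tumor[start:start + length] = copy_number
    return mixed_log2_ratio(tumor, lam)
```
</p><p>For example, a four-copy amplification observed at a normal cell fraction of 0.6 corresponds to a mixed copy number of 0.6 &#215; 2 + 0.4 &#215; 4 = 2.8, that is, a log2 ratio of about 0.49.</p><p>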
These comparative experimental results consistently show that SAIC outperforms GISTIC with significantly increased detection power in 18 out of 19 simulations.</p><table id="T2"><title><p>Table 2</p></title><caption><p><b>Power to detect SCAs by SAIC and GISTIC in simulation studies</b></p></caption><tgroup align="left" cols="6"><colspec align="left" colname="c1" colnum="1" colwidth="1*"/><colspec align="char" colname="c2" colnum="2" colwidth="1*"/><colspec align="char" colname="c3" colnum="3" colwidth="1*"/><colspec align="char" colname="c4" colnum="4" colwidth="1*"/><colspec align="char" colname="c5" colnum="5" colwidth="1*"/><colspec align="char" colname="c6" colnum="6" colwidth="1*"/><thead valign="top"><row rowsep="1"><entry colname="c1"><p><b><it>N</it></b><b>&#8201;=&#8201;60,</b><b><it>&#969;</it></b>&#8201;<b>=&#8201;0.2,</b><b><it>&#956;</it></b><sub><b><it>&#955;</it></b></sub>&#8201;<b>=&#8201;0.6,</b><b><it>&#963;</it></b><sub><b><it>&#955;</it></b></sub><b>=</b></p></entry><entry align="char" char="." colname="c2"><p><b>0.15</b></p></entry><entry align="char" char="." colname="c3"><p><b>0.2</b></p></entry><entry align="char" char="." colname="c4"><p><b>0.25</b></p></entry><entry align="char" char="." colname="c5"><p><b>0.3</b></p></entry><entry align="char" char="." 
colname="c6"><p><b>0.35</b></p></entry></row></thead><tbody valign="top"><row><entry colname="c1"><p>GISTIC</p></entry><entry align="center" colname="c2"><p>89%</p></entry><entry align="center" colname="c3"><p>86%</p></entry><entry align="center" colname="c4"><p>79%</p></entry><entry align="center" colname="c5"><p>74%</p></entry><entry align="center" colname="c6"><p>72%</p></entry></row><row><entry colname="c1"><p>SAIC</p></entry><entry align="center" colname="c2"><p>96%</p></entry><entry align="center" colname="c3"><p>94%</p></entry><entry align="center" colname="c4"><p>86%</p></entry><entry align="center" colname="c5"><p>86%</p></entry><entry align="center" colname="c6"><p>82%</p></entry></row><row><entry colname="c1"><p><it>N</it>&#8201;=&#8201;60, <it>&#969;</it>&#8201;=&#8201;0.2, <it>&#963;</it><sub><it>&#955;</it></sub>&#8201;=&#8201;0.25, <it>&#956;</it><sub><it>&#955;</it></sub> =</p></entry><entry align="char" char="." colname="c2"><p>0.4</p></entry><entry align="char" char="." colname="c3"><p>0.5</p></entry><entry align="char" char="." colname="c4"><p>0.6</p></entry><entry align="char" char="." colname="c5"><p>0.7</p></entry><entry align="char" char="." 
colname="c6"><p>0.8</p></entry></row><row><entry colname="c1"><p>GISTIC</p></entry><entry align="center" colname="c2"><p>83%</p></entry><entry align="center" colname="c3"><p>81%</p></entry><entry align="center" colname="c4"><p>82%</p></entry><entry align="center" colname="c5"><p>72%</p></entry><entry align="center" colname="c6"><p>79%</p></entry></row><row><entry colname="c1"><p>SAIC</p></entry><entry align="center" colname="c2"><p>93%</p></entry><entry align="center" colname="c3"><p>91%</p></entry><entry align="center" colname="c4"><p>87%</p></entry><entry align="center" colname="c5"><p>79%</p></entry><entry align="center" colname="c6"><p>74%</p></entry></row><row><entry colname="c1"><p><it>&#969;</it>&#8201;=&#8201;0.2, <it>&#963;</it><sub><it>&#955;</it></sub>&#8201;=&#8201;0.25, <it>&#956;</it><sub><it>&#955;</it></sub>&#8201;=&#8201;0.6, <it>N</it>=</p></entry><entry align="center" colname="c2"><p>40</p></entry><entry align="center" colname="c3"><p>50</p></entry><entry align="center" colname="c4"><p>60</p></entry><entry align="center" colname="c5"><p>70</p></entry><entry align="center" colname="c6"><p>80</p></entry></row><row><entry colname="c1"><p>GISTIC</p></entry><entry align="center" colname="c2"><p>58%</p></entry><entry align="center" colname="c3"><p>73%</p></entry><entry align="center" colname="c4"><p>79%</p></entry><entry align="center" colname="c5"><p>86%</p></entry><entry align="center" colname="c6"><p>89%</p></entry></row><row><entry colname="c1"><p>SAIC</p></entry><entry align="center" colname="c2"><p>65%</p></entry><entry align="center" colname="c3"><p>83%</p></entry><entry align="center" colname="c4"><p>87%</p></entry><entry align="center" colname="c5"><p>93%</p></entry><entry align="center" colname="c6"><p>94%</p></entry></row><row><entry colname="c1"><p><it>N</it>&#8201;=&#8201;60, <it>&#963;</it><sub><it>&#955;</it></sub>&#8201;=&#8201;0.25, <it>&#956;</it><sub><it>&#955;</it></sub>&#8201;=&#8201;0.6, <it>&#969;</it> =</p></entry><entry 
colname="c2"/><entry align="char" char="." colname="c3"><p>0.1</p></entry><entry align="char" char="." colname="c4"><p>0.15</p></entry><entry align="char" char="." colname="c5"><p>0.2</p></entry><entry align="char" char="." colname="c6"><p>0.25</p></entry></row><row><entry colname="c1"><p>GISTIC</p></entry><entry colname="c2"/><entry align="center" colname="c3"><p>30%</p></entry><entry align="center" colname="c4"><p>58%</p></entry><entry align="center" colname="c5"><p>80%</p></entry><entry align="center" colname="c6"><p>92%</p></entry></row><row rowsep="1"><entry colname="c1"><p>SAIC</p></entry><entry colname="c2"/><entry align="center" colname="c3"><p>37%</p></entry><entry align="center" colname="c4"><p>72%</p></entry><entry align="center" colname="c5"><p>87%</p></entry><entry align="center" colname="c6"><p>97%</p></entry></row></tbody></tgroup></table><p>We further assessed the overall performance of SAIC, measured by both sensitivity and specificity via ROC curves, as compared with the four peer methods (GISTIC, STAC, KC-SMART, CMDS). Based on the modified benchmark model proposed in <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>, we generated 100 simulation data sets under each combinatorial parameter setting, where each data set consists of <it>N</it>&#8201;=&#8201;50 samples and each sample contains <it>M</it>&#8201;=&#8201;5,000 probes. The log-ratios at every probe are simulated by a mixture of normal and tumor genomes, with the normal cell fraction <it>&#955;</it> being randomly drawn from a uniform distribution <inline-formula><m:math name="1471-2164-13-342-i19" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi mathvariant="script">U</m:mi>
   <m:mfenced open="(" close=")">
      <m:mrow>
         <m:mn>0.2</m:mn>
         <m:mo>,</m:mo>
         <m:mn>0.8</m:mn>
      </m:mrow>
   </m:mfenced>
</m:mrow>
</m:math></inline-formula>. Zero-mean Gaussian noise is then added to each sample with three levels of standard deviation <it>&#963;</it> randomly drawn from uniform distributions <inline-formula><m:math name="1471-2164-13-342-i20" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi mathvariant="script">U</m:mi>
   <m:mfenced open="(" close=")">
      <m:mrow>
         <m:mn>0.2</m:mn>
         <m:mo>,</m:mo>
         <m:mn>0.4</m:mn>
      </m:mrow>
   </m:mfenced>
</m:mrow>
 </m:math></inline-formula>, <inline-formula><m:math name="1471-2164-13-342-i21" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi mathvariant="script">U</m:mi>
   <m:mfenced open="(" close=")">
      <m:mrow>
         <m:mn>0.4</m:mn>
         <m:mo>,</m:mo>
         <m:mn>0.6</m:mn>
      </m:mrow>
   </m:mfenced>
</m:mrow>
</m:math></inline-formula>, and <inline-formula><m:math name="1471-2164-13-342-i22" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi mathvariant="script">U</m:mi>
   <m:mfenced open="(" close=")">
      <m:mrow>
         <m:mn>0.6</m:mn>
         <m:mo>,</m:mo>
         <m:mn>0.8</m:mn>
      </m:mrow>
   </m:mfenced>
</m:mrow>
 </m:math></inline-formula>. To make the simulations more realistic, for each simulated sample genome, we insert 2 to 10 randomly located background CNA regions with lengths varying from 10 to 50 probes. There are three &#8216;amplification&#8217; (<it>L</it>&#8201;=&#8201;30, 20, 10) and one &#8216;deletion&#8217; (<it>L</it>&#8201;=&#8201;20) ground truth SCAs embedded in each of the simulation data sets with a baseline frequency <it>&#969;</it>&#8201;=&#8201;0.1. The copy numbers associated with the amplification SCAs are 3, 4 and 5, and those associated with the deletion SCA are 0 and 1. In our simulation software, we use two parameters <it>&#946;</it><sub><it>L</it></sub> and <it>&#946;</it><sub><it>&#969;</it></sub> to modify the length and frequency of these SCAs. Other parameter settings include <it>&#952;</it><sub><it>&#961;</it></sub>&#8201;=&#8201;0.75, <it>&#952;</it><sub>amplification</sub>&#8201;=&#8201;0.1 and <it>&#952;</it><sub>deletion</sub>&#8201;=&#8201;&#8722;0.1 (the default settings of GISTIC and CBS) for defining CNA probes and units. Based on the estimated true positive rate (TPR) and corresponding FPR at different significance levels, Figure <figr fid="F3">3</figr> presents ROC curves of SAIC and peer methods derived from the simulation studies. These comparative experimental results consistently show that SAIC outperforms the peer methods in terms of larger areas <it>A</it><sub><it>z</it></sub> under the ROC curves or increased sensitivity at low FPR. 
More simulation studies are given in Additional file <supplr sid="S1">1</supplr>, where we report the power of these methods to detect the boundaries of SCAs; once again, SAIC outperforms the peer methods <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B14">14</abbr></abbrgrp>.</p><suppl id="S1"><title><p>Additional file 1</p></title><text><p><b>Table S1.</b> Comparative detection rates of ground truth SCA boundaries by STAC, GISTIC, KC-SMART, CMDS, and SAIC for simulation data sets under various model parameter settings. The results are calculated based on 100 replications for each of the parameter settings, using a p-value (or q-value) cutoff threshold of &lt;0.05.</p></text><file name="1471-2164-13-342-S1.doc">
   <p>Click here for file</p>
</file></suppl><fig id="F3"><title><p>Figure 3</p></title><caption><p>Comparative performance of SAIC and four peer methods (STAC, GISTIC, KC-SMART, CMDS) on realistic simulation data sets, quantified by the partial ROC curves (north-west) (TPR: true positive rate; FPR: false positive rate)</p></caption><text>
   <p><b>Comparative performance of SAIC and four peer methods (STAC, GISTIC, KC-SMART, CMDS) on realistic simulation data sets, quantified by the partial ROC curves (north-west) (TPR: true positive rate; FPR: false positive rate).</b> The results are the averages calculated based on 100 replications under each of various parameter settings.</p>
 </text><graphic file="1471-2164-13-342-3"/></fig></sec><sec><st><p>Application to four real cancer copy number data sets</p></st><p>We applied SAIC to four real cancer copy number data sets and identified many SCAs that encompass established or potentially novel cancer &#8216;driver&#8217; genes. The data sets are from ovarian cancer <abbrgrp><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr></abbrgrp>, prostate cancer <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B18">18</abbr></abbrgrp>, lung adenocarcinoma <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B7">7</abbr></abbrgrp>, and glioblastoma <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B3">3</abbr></abbrgrp>. Due to their distinct biological functions in cancer development, SAIC analyzes chromosomes 1&#8211;22 and chromosomes X/Y separately. To account for the different background CNA rates across chromosomes, we identify SCAs by performing SAIC on individual chromosomes. Other parameter settings include <it>T</it>&#8201;=&#8201;1000 and <it>&#945;</it>&#8201;=&#8201;0.05 (theoretical significance level or FPR/FWER). To provide a somewhat independent verification, we compared the SCAs detected by SAIC with those produced by GISTIC on the lung adenocarcinoma and glioblastoma data sets, which have been previously reported <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B7">7</abbr></abbrgrp>.</p></sec><sec><st><p>Results on the ovarian cancer data set</p></st><p>Our in-house ovarian cancer data set consists of <it>N</it>&#8201;=&#8201;63 tumor samples <abbrgrp><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr></abbrgrp>. Copy number signals were acquired using the Affymetrix Human Mapping 250&#8201;K Sty SNP Array platform <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. Each sample contains a total of 238,230 probes across the whole genome. 
Other algorithm parameter settings include <it>&#952;</it><sub><it>&#961;</it></sub>&#8201;=&#8201;0.95, <it>&#952;</it><sub>amplification</sub>&#8201;=&#8201;0.263 (2.4 copies) and <it>&#952;</it><sub>deletion</sub>&#8201;=&#8201;&#8722;0.322 (1.6 copies) <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. The genome-wide landscapes (via -log<sub>10</sub><it>P</it>) of recurrent or sporadic CNAs observed in the data sets are given in Figure <figr fid="F4">4</figr>, where amplifications and deletions are separately shown (left and right sides). SAIC detected several SCAs (both amplification and deletion), many of which are biologically plausible and include known oncogenes (e.g., KRAS, CCNE1 and CCND2) and tumor suppressor genes (e.g., CDKN2A and CDKN2B) <abbrgrp><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr></abbrgrp>. Full lists of the genes covered by these SCAs are given in Additional file <supplr sid="S2">2</supplr> (ST 2). SAIC also identified many other cancer driver genes within individual chromosomes (ST 3), such as SKIL, CDK4, PIK3CA, PTEN, FGD4, FGFR1.</p><suppl id="S2"><title><p>Additional file 2</p></title><text><p><b>Table S2 and Table S3.</b> Details about the implicated SCAs and full list of genes covered by these SCAs, derived from the ovarian cancer data set.</p></text><file name="1471-2164-13-342-S2.doc">
   <p>Click here for file</p>
</file></suppl><fig id="F4"><title><p>Figure 4</p></title><caption><p>Genome-wide landscapes of recurrent or sporadic CNAs derived from 63 ovarian cancer samples</p></caption><text>
   <p><b>Genome-wide landscapes of recurrent or sporadic CNAs derived from 63 ovarian cancer samples.</b> Amplifications and deletions are displayed on the left and right sides, respectively, where dashed lines correspond to the significance level <it>&#945;</it>&#8201;=&#8201;0.05 for calling SCAs.</p>
 </text><graphic file="1471-2164-13-342-4"/></fig></sec><sec><st><p>Results on the metastatic prostate cancer dataset</p></st><p>Our in-house prostate cancer data set consists of <it>N</it>&#8201;=&#8201;55 clustered metastatic tumor samples, obtained from 13 prostate cancer patients. Copy number signals were acquired using Affymetrix Genome-Wide Human SNP Array 6.0 <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B18">18</abbr></abbrgrp>. Each sample contains a total of 1,868,857 probes across the whole genome. To discount the potential bias due to imbalanced subject-cluster sampling <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>, we chose to analyze the <it>N</it>&#8201;=&#8201;13 representative samples and to detect global recurrent CNAs by SAIC. Other algorithm parameter settings include <it>&#952;</it><sub><it>&#961;</it></sub>&#8201;=&#8201;0.95, <it>&#952;</it><sub>amplification</sub>&#8201;=&#8201;0.263 and <it>&#952;</it><sub>deletion</sub>&#8201;=&#8201;&#8722;0.322, the same as those used in analyzing the ovarian cancer data. The genome-wide landscape of recurrent or sporadic CNAs observed in metastatic prostate cancer data is given in Figure <figr fid="F5">5</figr>, where SAIC detected 15 amplification SCAs (318 genes) and 21 deletion SCAs (756 genes). The full list of the genes covered by these SCAs is given in Additional file <supplr sid="S3">3</supplr> (ST 4). Many of these genes are cancer related (e.g., EGFR, BRCA2, TP53, ATBF1, MYC and RB1). In the individual chromosome analysis of the data set, SAIC identified many other SCAs involving cancer driver genes, such as PTEN (ST 5).</p><suppl id="S3"><title><p>Additional file 3</p></title><text><p><b>Table S4 and Table S5. </b>Details about the implicated SCAs and full list of genes covered by these SCAs, derived from the prostate cancer data set.</p></text><file name="1471-2164-13-342-S3.doc">
   <p>Click here for file</p>
</file></suppl><fig id="F5"><title><p>Figure 5</p></title><caption><p>Genome-wide landscapes of recurrent or sporadic CNAs derived from 13 metastatic prostate cancer samples</p></caption><text>
   <p><b>Genome-wide landscapes of recurrent or sporadic CNAs derived from 13 metastatic prostate cancer samples.</b> Amplifications and deletions are displayed on the left and right sides, separately, where dashed lines correspond to the significance level <it>&#945;</it>&#8201;=&#8201;0.05 for calling SCAs.</p>
 </text><graphic file="1471-2164-13-342-5"/></fig></sec><sec><st><p>Results on the lung adenocarcinoma and glioblastoma datasets</p></st><p>The lung adenocarcinoma data set consists of <it>N</it>&#8201;=&#8201;371 tumor samples, publicly available at <url>http://www.broad.mit.edu/cancer/pub/tsp</url> <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. Copy number signals were acquired using Affymetrix 250K Sty SNP Array, where each sample contains a total of 216,327 probes across the whole genome <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. To assure the general comparability of the results produced by SAIC and GISTIC, we adopted algorithm parameter settings similar to those used by GISTIC for detecting focal SCAs: <it>&#952;</it><sub>amplification</sub>&#8201;=&#8201;0.848 and <it>&#952;</it><sub>deletion</sub>&#8201;=&#8201;&#8722;1.15, in addition to <it>&#952;</it><sub><it>&#961;</it></sub>&#8201;=&#8201;0.9. The genome-wide landscape of recurrent or sporadic CNAs observed in lung adenocarcinoma data is given in Figure <figr fid="F6">6</figr>, where SAIC detected 23 amplification SCAs and 26 deletion SCAs (after combining some of 98 recurrent CNAs within the same cytobands). The full list of the genes covered by these SCAs is given in Additional file <supplr sid="S4">4</supplr> (ST 6). The Venn diagram in Figure <figr fid="F7">7</figr> reveals the numbers of common and distinctive SCAs detected by SAIC and GISTIC. It can be seen that SAIC successfully detected most (87% amplification and 75% deletion regions) of the SCAs that have been detected by GISTIC, while also revealing many additional SCAs (10 amplification and 23 deletion regions) <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. In addition, the result from within-chromosome analysis of the data set is listed in Additional file <supplr sid="S4">4</supplr> (ST 7).</p><suppl id="S4"><title><p>Additional file 4</p></title><text><p><b>Table S6 and Table S7. 
</b>Details about the implicated SCAs and full list of genes covered by these SCAs, derived from the lung adenocarcinoma data set.</p></text><file name="1471-2164-13-342-S4.doc">
   <p>Click here for file</p>
</file></suppl><fig id="F6"><title><p>Figure 6</p></title><caption><p>Genome-wide landscapes of recurrent or sporadic CNAs derived from 371 lung adenocarcinoma samples</p></caption><text>
   <p><b>Genome-wide landscapes of recurrent or sporadic CNAs derived from 371 lung adenocarcinoma samples.</b> Amplifications and deletions are displayed on the left and right sides, respectively, where dashed lines correspond to the significance level <it>&#945;</it>&#8201;=&#8201;0.05 for calling SCAs.</p>
</text><graphic file="1471-2164-13-342-6"/></fig><fig id="F7"><title><p>Figure 7</p></title><caption><p>Venn diagram on the numbers of common and distinct focal SCAs detected by SAIC and GISTIC in the lung adenocarcinoma samples</p></caption><text>
   <p>
      <b>Venn diagram on the numbers of common and distinct focal SCAs detected by SAIC and GISTIC in the lung adenocarcinoma samples.</b>
   </p>
</text><graphic file="1471-2164-13-342-7"/></fig><p>The glioblastoma data set consists of <it>N</it>&#8201;=&#8201;141 tumor samples, publicly available at <url>http://www.broad.mit.edu/cancer/pub/GISTIC</url>, where each sample contains a total of 115,593 probes across the whole genome <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. Once again, we adopted algorithm parameter settings similar to those used by GISTIC for detecting focal SCAs. The genome-wide landscape of recurrent or sporadic CNAs observed in the glioblastoma data is given in Figure <figr fid="F8">8</figr>, where SAIC detected 15 amplification SCAs and 30 deletion SCAs (after combining some of 67 recurrent CNAs within the same cytobands). The full list of the genes covered by these SCAs is given in Additional file <supplr sid="S5">5</supplr> (ST 8). The Venn diagram in Figure <figr fid="F9">9</figr> shows the numbers of common and distinctive SCAs detected by SAIC and GISTIC. SAIC detected most (88% of amplification and 75% of deletion regions) of the SCAs that were detected by GISTIC, while also revealing many additional SCAs (8 amplification and 27 deletion regions) <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. In addition, the result from within-chromosome analysis of the data set is listed in Additional file <supplr sid="S5">5</supplr> (ST 9).</p><suppl id="S5"><title><p>Additional file 5</p></title><text><p><b>Table S8 and Table S9.</b> Details about the implicated SCAs and full list of genes covered by these SCAs, derived from the glioblastoma data set.</p></text><file name="1471-2164-13-342-S5.doc">
   <p>Click here for file</p>
</file></suppl><fig id="F8"><title><p>Figure 8</p></title><caption><p>Genome-wide landscapes of recurrent or sporadic CNAs derived from 141 glioblastoma samples</p></caption><text>
   <p><b>Genome-wide landscapes of recurrent or sporadic CNAs derived from 141 glioblastoma samples.</b> Amplifications and deletions are displayed on the left and right sides, respectively, where dashed lines correspond to the significance level <it>&#945;</it>&#8201;=&#8201;0.05 for calling SCAs.</p>
</text><graphic file="1471-2164-13-342-8"/></fig><fig id="F9"><title><p>Figure 9</p></title><caption><p>Venn diagram on the numbers of common and distinct focal SCAs detected by SAIC and GISTIC in the glioblastoma samples</p></caption><text>
   <p>
      <b>Venn diagram on the numbers of common and distinct focal SCAs detected by SAIC and GISTIC in the glioblastoma samples.</b>
   </p>
</text><graphic file="1471-2164-13-342-9"/></fig><p>The common SCA regions (e.g., 7p11.2, 12p12.1, and 9p21.3) are highly consistent with previous reports, and largely encompass well-known oncogenes or tumor suppressor genes. For example, EGFR (epidermal growth factor receptor) is an oncogene within 7p11.2 whose mutations or amplifications have been shown to contribute to uncontrolled cell division (a predisposition for cancer) <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. Many additional SCA regions (e.g., 8p23.2, 21q22.2) contain or are adjacent to disease-related genes (e.g., CSMD1 and TMPRSS3) and may warrant further study.</p></sec></sec><sec><st><p>Discussion</p></st><p>SAIC is similar to many peer methods in that it assesses the statistical significance of SCAs using a permutation-based null distribution <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B12">12</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr></abbrgrp>. However, in contrast to the existing procedures, the CNA units used by SAIC preserve the essential correlation structures of serial probes, whose estimated average correlation coefficient can be as high as 0.985 <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. Moreover, by automatically adjusting P-values for multiple comparisons <abbrgrp><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr></abbrgrp> and iteratively re-estimating the null distribution exclusive of detected SCAs <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>, SAIC can preserve the intrinsic false positive rate without resorting to sometimes overly conservative schemes that compromise detection power <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr></abbrgrp>. Theoretical analysis and extensive experimental results show that SAIC preserves both type 1 error and detection power (see Tables&#8201;<tblr tid="T1">1</tblr> and <tblr tid="T2">2</tblr>).
Furthermore, the novel concept of the CNA unit and the associated scoring and permutation scheme neatly parallel many considerations in the revised GISTIC2.0 <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>; for example, serial probes covering driver events should be more highly correlated than probes covering only passengers and are thus more likely to identify the target genes. The flexible length-adaptive significance assessment of CNA units via Eq. (4) automatically accounts for distinct background rates according to their lengths and is thus more likely to detect independent SCAs.</p><p>As for the <it>&#952;</it><sub>amplification</sub> and <it>&#952;</it><sub>deletion</sub> parameters in the SAIC algorithm, there is no general guideline on how to select their values <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>, since different types of cancers usually have different rates and magnitudes of background CNAs <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B26">26</abbr><abbr bid="B35">35</abbr></abbrgrp>. In addition, various degrees of normal cell contamination <abbrgrp><abbr bid="B18">18</abbr></abbrgrp> and intratumor heterogeneity <abbrgrp><abbr bid="B35">35</abbr><abbr bid="B36">36</abbr></abbrgrp> occur in many samples, and these further complicate the selection of parameter values. In practice, lower thresholds have been used to define broad (arm-level) CNAs, while higher thresholds have been used to define focal CNAs <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B14">14</abbr></abbrgrp>. A newly proposed strategy is to apply joint magnitude-length thresholds <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> and to correct for normal cell contamination using BACOM <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. Since our main objective here is to identify focal CNAs, we have largely adopted the same strategy used in <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B14">14</abbr></abbrgrp>, i.e., we used relatively higher thresholds to define focal CNAs for subsequent analyses.
Specifically, based on the observation that the magnitude of CNAs in ovarian and prostate cancers is relatively low, we used lower, commonly used thresholds (2.0&#8201;&#177;&#8201;0.4), i.e., 2.4 copies for amplification and 1.6 copies for deletion. In contrast, on the datasets of lung adenocarcinoma and glioblastoma, we applied relatively higher thresholds (2.0&#8201;+&#8201;1.6, 2.0&#8201;&#8722;&#8201;1.1), i.e., 3.6 copies for amplification and 0.9 copies for deletion, which are similar to the thresholds used by GISTIC algorithms <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B14">14</abbr></abbrgrp>.</p><p>A similar situation arises in the selection of <it>&#952;</it><sub><it>&#961;</it></sub> in defining CNA units <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. Lower values of <it>&#952;</it><sub><it>&#961;</it></sub> often produce longer CNA units while higher values of <it>&#952;</it><sub><it>&#961;</it></sub> often produce shorter CNA units. It has been reported that the average successive probe correlation of the segmented data can be as high as 0.985 <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B32">32</abbr></abbrgrp>. In our experience in analyzing real cancer datasets, a value of <inline-formula><m:math name="1471-2164-13-342-i23" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:msub>
   <m:mi>&#952;</m:mi>
   <m:mi>&#961;</m:mi>
</m:msub>
</m:math></inline-formula> between 0.7 and 0.95 would be a suitable choice.</p><p>It is important to note that the general conclusion on the relative performance of SAIC and peer methods, at least based on the extensive simulation studies, remains largely true. We have used the same parameter values in all methods so that a fair comparison of their relative performances can be assured. Based on our analysis of real datasets using current parameter settings, it appears that SAIC performs well when compared to peer methods. In addition, the results of extensive simulation studies, performed under a variety of probe correlation schemes, show that SAIC preserves the expected type 1 error well, even when the probes follow non-stationary correlation structures similar to those found in real data <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>.</p><p>SAIC currently can perform either genome-wide (except X/Y chromosomes due to their distinct biological roles) or chromosome-based CNA unit permutations. In the application of SAIC to real cancer data sets, we performed genome-wide, autosome-based, and X/Y-chromosome-based permutations. The combined results from using different permutation schemes contain more SCAs that may involve novel cancer driver genes. Experimental results show that, by exploiting the novel concepts of CNA probe, CNA unit, and multiscale permutation, SAIC can accurately detect the boundaries of SCAs with different lengths (see Additional file <supplr sid="S1">1</supplr>).</p><p>We have also performed simulation studies (data not shown) indicating that the detection power of SAIC can be further improved by correcting for normal tissue contamination using the recently developed BACOM method <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. However, the current version of BACOM requires paired tumor-normal sampling, availability of two-channel signals, and existence of deletion CNAs.
Thus, we leave the combination of SAIC and BACOM as an extension for future research.</p></sec><sec><st><p>Conclusions</p></st><p>We have presented a novel approach for accurately detecting significant recurrent CNAs in cancer genomes that is statistically principled and, as illustrated by real examples, can be very effective at revealing SCAs within data. The concepts of CNA unit and iterative permutation are relatively simple to interpret, yet still convey considerable novel mathematical insights into data structure and bias correction.</p><p>It is worth noting that there are three novel features associated with SAIC. First, we define the CNA unit to capture the intrinsic correlation structure in copy number data. Second, we perform iterative SCA-exclusive permutation to produce an unbiased null distribution. Third, we apply SAIC to real cancer copy number datasets and detect most previously reported SCAs covering well-known cancer genes.</p><p>Two important pending issues with the present algorithm are the expected significant impacts of intratumor heterogeneity and normal cell contamination <abbrgrp><abbr bid="B18">18</abbr><abbr bid="B35">35</abbr><abbr bid="B36">36</abbr></abbrgrp>. We are currently investigating applications of BACOM-based normal cell correction <abbrgrp><abbr bid="B18">18</abbr></abbrgrp> and hierarchical bi-clustering that optimize critical steps such as the selection of various thresholds and identification of subtype-specific copy number alterations.</p></sec><sec><st><p>Appendix A</p></st><p><it>Proof of Theorem 1.</it> Let &#945;&#8217; be the significance level used in each iteration to detect SCAs in Algorithm 2. Under the truth-converging null distribution, we have</p><p><display-formula id="M5"><m:math name="1471-2164-13-342-i24" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi mathvariant="normal">Pr</m:mi>
   <m:mfenced open="(" close=")">
      <m:mrow>
         <m:msup>
            <m:mtext>SCA</m:mtext>
            <m:mrow>
               <m:mo stretchy="false">(</m:mo>
               <m:mi>r</m:mi>
               <m:mo stretchy="false">)</m:mo>
            </m:mrow>
         </m:msup>
         <m:mtext>= 'yes'</m:mtext>
         <m:mo stretchy="true">|</m:mo>
         <m:msup>
            <m:mtext>SCA</m:mtext>
            <m:mrow>
               <m:mo stretchy="false">(</m:mo>
               <m:mi>r</m:mi>
               <m:mo>&#8722;</m:mo>
               <m:mn>1</m:mn>
               <m:mo stretchy="false">)</m:mo>
            </m:mrow>
         </m:msup>
         <m:mtext>= 'yes'</m:mtext>
      </m:mrow>
   </m:mfenced>
   <m:mo>=</m:mo>
   <m:mi>&#945;</m:mi>
   <m:mo>'</m:mo>
   <m:mtext>,</m:mtext>
</m:mrow>
</m:math></display-formula></p><p>for iterations <inline-formula><m:math name="1471-2164-13-342-i25" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi>r</m:mi>
   <m:mo>=</m:mo>
   <m:mn>1</m:mn>
   <m:mo>,</m:mo>
   <m:mn>2</m:mn>
   <m:mo>,</m:mo>
   <m:mo>&#8230;</m:mo>
   <m:mo>,</m:mo>
   <m:mi>&#8734;</m:mi>
</m:mrow>
</m:math></inline-formula> since SAIC assesses the &#8216;new&#8217; SCAs at the <it>r</it>th iteration conditional on having found the &#8216;existing&#8217; SCAs at the (<it>r</it>&#8722;1)th iteration. For the second iteration, we have</p><p><display-formula id="M6"><m:math name="1471-2164-13-342-i26" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mtable columnalign="left">
   <m:mtr>
      <m:mtd>
         <m:mspace width="1em"/>
         <m:mi mathvariant="normal">Pr</m:mi>
         <m:mfenced open="(" close=")">
            <m:mrow>
               <m:msup>
                  <m:mtext>SCA</m:mtext>
                  <m:mrow>
                     <m:mo stretchy="false">(</m:mo>
                     <m:mn>2</m:mn>
                     <m:mo stretchy="false">)</m:mo>
                  </m:mrow>
               </m:msup>
               <m:mtext>= 'yes'</m:mtext>
            </m:mrow>
         </m:mfenced>
      </m:mtd>
   </m:mtr>
   <m:mtr>
      <m:mtd>
         <m:mo>=</m:mo>
         <m:mi mathvariant="normal">Pr</m:mi>
         <m:mfenced open="(" close=")">
            <m:mrow>
               <m:msup>
                  <m:mtext>SCA</m:mtext>
                  <m:mrow>
                     <m:mo stretchy="false">(</m:mo>
                     <m:mn>2</m:mn>
                     <m:mo stretchy="false">)</m:mo>
                  </m:mrow>
               </m:msup>
               <m:msup>
                  <m:mtext>= 'yes',SCA</m:mtext>
                  <m:mrow>
                     <m:mo stretchy="false">(</m:mo>
                     <m:mn>1</m:mn>
                     <m:mo stretchy="false">)</m:mo>
                  </m:mrow>
               </m:msup>
               <m:mtext>= 'yes'</m:mtext>
            </m:mrow>
         </m:mfenced>
      </m:mtd>
   </m:mtr>
   <m:mtr>
      <m:mtd>
         <m:mo>=</m:mo>
         <m:mi mathvariant="normal">Pr</m:mi>
         <m:mfenced open="(" close=")">
            <m:mrow>
               <m:msup>
                  <m:mtext>SCA</m:mtext>
                  <m:mrow>
                     <m:mo stretchy="false">(</m:mo>
                     <m:mn>2</m:mn>
                     <m:mo stretchy="false">)</m:mo>
                  </m:mrow>
               </m:msup>
               <m:mtext>= 'yes'</m:mtext>
               <m:mo stretchy="true">|</m:mo>
               <m:msup>
                  <m:mtext>SCA</m:mtext>
                  <m:mrow>
                     <m:mo stretchy="false">(</m:mo>
                     <m:mn>1</m:mn>
                     <m:mo stretchy="false">)</m:mo>
                  </m:mrow>
               </m:msup>
               <m:mtext>= 'yes'</m:mtext>
            </m:mrow>
         </m:mfenced>
         <m:mi mathvariant="normal">Pr</m:mi>
         <m:mfenced open="(" close=")">
            <m:mrow>
               <m:msup>
                  <m:mtext>SCA</m:mtext>
                  <m:mrow>
                     <m:mo stretchy="false">(</m:mo>
                     <m:mn>1</m:mn>
                     <m:mo stretchy="false">)</m:mo>
                  </m:mrow>
               </m:msup>
               <m:mtext>= 'yes'</m:mtext>
            </m:mrow>
         </m:mfenced>
      </m:mtd>
   </m:mtr>
   <m:mtr>
      <m:mtd>
         <m:mo>=</m:mo>
         <m:mi>&#945;</m:mi>
         <m:mo>'</m:mo>
         <m:mo>&#183;</m:mo>
         <m:mi>&#945;</m:mi>
         <m:mo>'</m:mo>
         <m:mo>=</m:mo>
         <m:mi>&#945;</m:mi>
         <m:msup>
            <m:mo>'</m:mo>
            <m:mn>2</m:mn>
         </m:msup>
         <m:mtext>.</m:mtext>
      </m:mtd>
   </m:mtr>
</m:mtable>
</m:math></display-formula>Therefore, for the <it>r</it>th iteration,</p><p><display-formula id="M7"><m:math name="1471-2164-13-342-i27" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mtable columnalign="left">
   <m:mtr>
      <m:mtd>
         <m:mspace width="0.5em"/>
         <m:mi mathvariant="normal">Pr</m:mi>
         <m:mfenced open="(" close=")">
            <m:mrow>
               <m:msup>
                  <m:mtext>SCA</m:mtext>
                  <m:mrow>
                     <m:mo stretchy="false">(</m:mo>
                     <m:mi>r</m:mi>
                     <m:mo stretchy="false">)</m:mo>
                  </m:mrow>
               </m:msup>
               <m:mtext>= 'yes'</m:mtext>
            </m:mrow>
         </m:mfenced>
      </m:mtd>
   </m:mtr>
   <m:mtr>
      <m:mtd>
         <m:mo>=</m:mo>
         <m:mi mathvariant="normal">Pr</m:mi>
         <m:mfenced open="(" close=")">
            <m:mrow>
               <m:msup>
                  <m:mtext>SCA</m:mtext>
                  <m:mrow>
                     <m:mo stretchy="false">(</m:mo>
                     <m:mi>r</m:mi>
                     <m:mo stretchy="false">)</m:mo>
                  </m:mrow>
               </m:msup>
               <m:msup>
                  <m:mtext>= 'yes',SCA</m:mtext>
                  <m:mrow>
                     <m:mo stretchy="false">(</m:mo>
                     <m:mi>r</m:mi>
                     <m:mo>&#8722;</m:mo>
                     <m:mn>1</m:mn>
                     <m:mo stretchy="false">)</m:mo>
                  </m:mrow>
               </m:msup>
               <m:mtext>= 'yes',</m:mtext>
               <m:mo>&#8230;</m:mo>
               <m:msup>
                  <m:mtext>,SCA</m:mtext>
                  <m:mrow>
                     <m:mo stretchy="false">(</m:mo>
                     <m:mn>1</m:mn>
                     <m:mo stretchy="false">)</m:mo>
                  </m:mrow>
               </m:msup>
               <m:mtext>= 'yes'</m:mtext>
            </m:mrow>
         </m:mfenced>
      </m:mtd>
   </m:mtr>
   <m:mtr>
      <m:mtd>
         <m:mo>=</m:mo>
         <m:mi mathvariant="normal">Pr</m:mi>
         <m:mfenced open="(" close=")">
            <m:mrow>
               <m:msup>
                  <m:mtext>SCA</m:mtext>
                  <m:mrow>
                     <m:mo stretchy="false">(</m:mo>
                     <m:mi>r</m:mi>
                     <m:mo stretchy="false">)</m:mo>
                  </m:mrow>
               </m:msup>
               <m:mtext>= 'yes'</m:mtext>
               <m:mo stretchy="true">|</m:mo>
               <m:msup>
                  <m:mtext>SCA</m:mtext>
                  <m:mrow>
                     <m:mo stretchy="false">(</m:mo>
                     <m:mi>r</m:mi>
                     <m:mo>&#8722;</m:mo>
                     <m:mn>1</m:mn>
                     <m:mo stretchy="false">)</m:mo>
                  </m:mrow>
               </m:msup>
               <m:msup>
                  <m:mtext>= 'yes',SCA</m:mtext>
                  <m:mrow>
                     <m:mo stretchy="false">(</m:mo>
                     <m:mi>r</m:mi>
                     <m:mo>&#8722;</m:mo>
                     <m:mn>2</m:mn>
                     <m:mo stretchy="false">)</m:mo>
                  </m:mrow>
               </m:msup>
               <m:mtext>= 'yes',</m:mtext>
               <m:mo>&#8230;,</m:mo>
               <m:msup>
                  <m:mtext>SCA</m:mtext>
                  <m:mrow>
                     <m:mo stretchy="false">(</m:mo>
                     <m:mn>1</m:mn>
                     <m:mo stretchy="false">)</m:mo>
                  </m:mrow>
               </m:msup>
               <m:mtext>= 'yes'</m:mtext>
            </m:mrow>
         </m:mfenced>
      </m:mtd>
   </m:mtr>
   <m:mtr>
      <m:mtd>
         <m:mo>&#183;</m:mo>
         <m:mi mathvariant="normal">Pr</m:mi>
         <m:mfenced open="(" close=")">
            <m:mrow>
               <m:msup>
                  <m:mtext>SCA</m:mtext>
                  <m:mrow>
                     <m:mo stretchy="false">(</m:mo>
                     <m:mi>r</m:mi>
                     <m:mo>&#8722;</m:mo>
                     <m:mn>1</m:mn>
                     <m:mo stretchy="false">)</m:mo>
                  </m:mrow>
               </m:msup>
               <m:mtext>= 'yes'</m:mtext>
               <m:mo stretchy="true">|</m:mo>
               <m:msup>
                  <m:mtext>SCA</m:mtext>
                  <m:mrow>
                     <m:mo stretchy="false">(</m:mo>
                     <m:mi>r</m:mi>
                     <m:mo>&#8722;</m:mo>
                     <m:mn>2</m:mn>
                     <m:mo stretchy="false">)</m:mo>
                  </m:mrow>
               </m:msup>
               <m:mtext>= 'yes'</m:mtext>
               <m:mo>,</m:mo>
               <m:msup>
                  <m:mtext>SCA</m:mtext>
                  <m:mrow>
                     <m:mo stretchy="false">(</m:mo>
                     <m:mi>r</m:mi>
                     <m:mo>&#8722;</m:mo>
                     <m:mn>3</m:mn>
                     <m:mo stretchy="false">)</m:mo>
                  </m:mrow>
               </m:msup>
               <m:mtext>= 'yes',</m:mtext>
               <m:mo>&#8230;</m:mo>
               <m:msup>
                  <m:mtext>,SCA</m:mtext>
                  <m:mrow>
                     <m:mo stretchy="false">(</m:mo>
                     <m:mn>1</m:mn>
                     <m:mo stretchy="false">)</m:mo>
                  </m:mrow>
               </m:msup>
               <m:mtext>= 'yes'</m:mtext>
            </m:mrow>
         </m:mfenced>
      </m:mtd>
   </m:mtr>
   <m:mtr>
      <m:mtd>
         <m:mo>&#183;</m:mo>
         <m:mo>&#8230;</m:mo>
         <m:mo>&#183;</m:mo>
         <m:mi mathvariant="normal">Pr</m:mi>
         <m:mfenced open="(" close=")">
            <m:mrow>
               <m:msup>
                  <m:mtext>SCA</m:mtext>
                  <m:mrow>
                     <m:mo stretchy="false">(</m:mo>
                     <m:mn>2</m:mn>
                     <m:mo stretchy="false">)</m:mo>
                  </m:mrow>
               </m:msup>
               <m:mtext>= 'yes'</m:mtext>
               <m:mo stretchy="true">|</m:mo>
               <m:msup>
                  <m:mtext>SCA</m:mtext>
                  <m:mrow>
                     <m:mo stretchy="false">(</m:mo>
                     <m:mn>1</m:mn>
                     <m:mo stretchy="false">)</m:mo>
                  </m:mrow>
               </m:msup>
               <m:mtext>= 'yes'</m:mtext>
            </m:mrow>
         </m:mfenced>
         <m:mi mathvariant="normal">Pr</m:mi>
         <m:mfenced open="(" close=")">
            <m:mrow>
               <m:msup>
                  <m:mtext>SCA</m:mtext>
                  <m:mrow>
                     <m:mo stretchy="false">(</m:mo>
                     <m:mn>1</m:mn>
                     <m:mo stretchy="false">)</m:mo>
                  </m:mrow>
               </m:msup>
               <m:mtext>= 'yes'</m:mtext>
            </m:mrow>
         </m:mfenced>
      </m:mtd>
   </m:mtr>
   <m:mtr>
      <m:mtd>
         <m:mo>=</m:mo>
         <m:mi mathvariant="normal">Pr</m:mi>
         <m:mfenced open="(" close=")">
            <m:mrow>
               <m:msup>
                  <m:mtext>SCA</m:mtext>
                  <m:mrow>
                     <m:mo stretchy="false">(</m:mo>
                     <m:mi>r</m:mi>
                     <m:mo stretchy="false">)</m:mo>
                  </m:mrow>
               </m:msup>
               <m:mtext>= 'yes'</m:mtext>
               <m:mo stretchy="true">|</m:mo>
               <m:msup>
                  <m:mtext>SCA</m:mtext>
                  <m:mrow>
                     <m:mo stretchy="false">(</m:mo>
                     <m:mi>r</m:mi>
                     <m:mo>&#8722;</m:mo>
                     <m:mn>1</m:mn>
                     <m:mo stretchy="false">)</m:mo>
                  </m:mrow>
               </m:msup>
               <m:mtext>= 'yes'</m:mtext>
            </m:mrow>
         </m:mfenced>
         <m:mo>&#183;</m:mo>
         <m:mi mathvariant="normal">Pr</m:mi>
         <m:mfenced open="(" close=")">
            <m:mrow>
               <m:msup>
                  <m:mtext>SCA</m:mtext>
                  <m:mrow>
                     <m:mo stretchy="false">(</m:mo>
                     <m:mi>r</m:mi>
                     <m:mo>&#8722;</m:mo>
                     <m:mn>1</m:mn>
                     <m:mo stretchy="false">)</m:mo>
                  </m:mrow>
               </m:msup>
               <m:mtext>= 'yes'</m:mtext>
               <m:mo stretchy="true">|</m:mo>
               <m:msup>
                  <m:mtext>SCA</m:mtext>
                  <m:mrow>
                     <m:mo stretchy="false">(</m:mo>
                     <m:mi>r</m:mi>
                     <m:mo>&#8722;</m:mo>
                     <m:mn>2</m:mn>
                     <m:mo stretchy="false">)</m:mo>
                  </m:mrow>
               </m:msup>
               <m:mtext>= 'yes'</m:mtext>
            </m:mrow>
         </m:mfenced>
      </m:mtd>
   </m:mtr>
   <m:mtr>
      <m:mtd>
         <m:mo>&#183;</m:mo>
         <m:mo>&#8230;</m:mo>
         <m:mo>&#183;</m:mo>
         <m:mi mathvariant="normal">Pr</m:mi>
         <m:mfenced open="(" close=")">
            <m:mrow>
               <m:msup>
                  <m:mtext>SCA</m:mtext>
                  <m:mrow>
                     <m:mo stretchy="false">(</m:mo>
                     <m:mn>2</m:mn>
                     <m:mo stretchy="false">)</m:mo>
                  </m:mrow>
               </m:msup>
               <m:mtext>= 'yes'</m:mtext>
               <m:mo stretchy="true">|</m:mo>
               <m:msup>
                  <m:mtext>SCA</m:mtext>
                  <m:mrow>
                     <m:mo stretchy="false">(</m:mo>
                     <m:mn>1</m:mn>
                     <m:mo stretchy="false">)</m:mo>
                  </m:mrow>
               </m:msup>
               <m:mtext>= 'yes'</m:mtext>
            </m:mrow>
         </m:mfenced>
         <m:mi mathvariant="normal">Pr</m:mi>
         <m:mfenced open="(" close=")">
            <m:mrow>
               <m:msup>
                  <m:mtext>SCA</m:mtext>
                  <m:mrow>
                     <m:mo stretchy="false">(</m:mo>
                     <m:mn>1</m:mn>
                     <m:mo stretchy="false">)</m:mo>
                  </m:mrow>
               </m:msup>
               <m:mtext>= 'yes'</m:mtext>
            </m:mrow>
         </m:mfenced>
      </m:mtd>
   </m:mtr>
   <m:mtr>
      <m:mtd>
         <m:mo>=</m:mo>
         <m:mi>&#945;</m:mi>
         <m:mo>'</m:mo>
         <m:mo>&#183;</m:mo>
         <m:mi>&#945;</m:mi>
         <m:mo>'</m:mo>
         <m:mo>&#183;</m:mo>
         <m:mi>&#945;</m:mi>
         <m:mo>'</m:mo>
         <m:mo>&#8943;</m:mo>
         <m:mi>&#945;</m:mi>
         <m:mo>'</m:mo>
         <m:mo>=</m:mo>
         <m:mi>&#945;</m:mi>
         <m:msup>
            <m:mo>'</m:mo>
            <m:mi>r</m:mi>
         </m:msup>
         <m:mtext>.</m:mtext>
      </m:mtd>
   </m:mtr>
</m:mtable>
</m:math></display-formula>The rationale behind the above derivation is that <inline-formula><m:math name="1471-2164-13-342-i28" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:msup>
      <m:mtext>SCA</m:mtext>
      <m:mrow>
         <m:mo stretchy="false">(</m:mo>
         <m:mi>r</m:mi>
         <m:mo>&#8722;</m:mo>
         <m:mn>1</m:mn>
         <m:mo stretchy="false">)</m:mo>
      </m:mrow>
   </m:msup>
   <m:mtext>= 'yes'</m:mtext>
</m:mrow>
</m:math></inline-formula> already implies <inline-formula><m:math name="1471-2164-13-342-i29" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:msup>
      <m:mtext>SCA</m:mtext>
      <m:mrow>
         <m:mo stretchy="false">(</m:mo>
         <m:mi>r</m:mi>
         <m:mo>&#8722;</m:mo>
         <m:mn>2</m:mn>
         <m:mo stretchy="false">)</m:mo>
      </m:mrow>
   </m:msup>
   <m:mtext>= 'yes',</m:mtext>
   <m:mo>&#8230;,</m:mo>
   <m:msup>
      <m:mtext>SCA</m:mtext>
      <m:mrow>
         <m:mo stretchy="false">(</m:mo>
         <m:mn>1</m:mn>
         <m:mo stretchy="false">)</m:mo>
      </m:mrow>
   </m:msup>
   <m:mtext>= 'yes'</m:mtext>
</m:mrow>
</m:math></inline-formula>. In other words, we have</p><p><display-formula id="M8"><m:math name="1471-2164-13-342-i30" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi mathvariant="normal">Pr</m:mi>
   <m:mfenced open="(" close=")">
      <m:mrow>
         <m:msup>
            <m:mtext>SCA</m:mtext>
            <m:mrow>
               <m:mo stretchy="false">(</m:mo>
               <m:mi>r</m:mi>
               <m:mo stretchy="false">)</m:mo>
            </m:mrow>
         </m:msup>
         <m:mtext>= 'yes'</m:mtext>
      </m:mrow>
   </m:mfenced>
   <m:mo>=</m:mo>
   <m:mi mathvariant="normal">Pr</m:mi>
   <m:mfenced open="(" close=")">
      <m:mrow>
         <m:msup>
            <m:mtext>SCA</m:mtext>
            <m:mrow>
               <m:mo stretchy="false">(</m:mo>
               <m:mi>r</m:mi>
               <m:mo stretchy="false">)</m:mo>
            </m:mrow>
         </m:msup>
         <m:msup>
            <m:mtext>= 'yes',SCA</m:mtext>
            <m:mrow>
               <m:mo stretchy="false">(</m:mo>
               <m:mi>r</m:mi>
               <m:mo>&#8722;</m:mo>
               <m:mn>1</m:mn>
               <m:mo stretchy="false">)</m:mo>
            </m:mrow>
         </m:msup>
         <m:mtext>= 'yes',</m:mtext>
         <m:mo>&#8230;,</m:mo>
         <m:msup>
            <m:mtext>SCA</m:mtext>
            <m:mrow>
               <m:mo stretchy="false">(</m:mo>
               <m:mn>1</m:mn>
               <m:mo stretchy="false">)</m:mo>
            </m:mrow>
         </m:msup>
         <m:mtext>= 'yes'</m:mtext>
      </m:mrow>
   </m:mfenced>
</m:mrow>
</m:math></display-formula></p><p>and</p><p><display-formula id="M9"><m:math name="1471-2164-13-342-i31" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mtable columnalign="left">
      <m:mtr columnalign="left">
         <m:mtd columnalign="left">
            <m:mrow>
               <m:mi mathvariant="normal">Pr</m:mi>
               <m:mfenced open="(" close=")">
                  <m:mrow>
                     <m:msup>
                        <m:mtext>SCA</m:mtext>
                        <m:mrow>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mi>r</m:mi>
                           <m:mo stretchy="false">)</m:mo>
                        </m:mrow>
                     </m:msup>
                     <m:mtext>= 'yes'</m:mtext>
                     <m:mo stretchy="true">|</m:mo>
                     <m:msup>
                        <m:mtext>SCA</m:mtext>
                        <m:mrow>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mi>r</m:mi>
                           <m:mo>&#8722;</m:mo>
                           <m:mn>1</m:mn>
                           <m:mo stretchy="false">)</m:mo>
                        </m:mrow>
                     </m:msup>
                     <m:msup>
                        <m:mtext>= 'yes',SCA</m:mtext>
                        <m:mrow>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mi>r</m:mi>
                           <m:mo>&#8722;</m:mo>
                           <m:mn>2</m:mn>
                           <m:mo stretchy="false">)</m:mo>
                        </m:mrow>
                     </m:msup>
                     <m:mtext>= 'yes',</m:mtext>
                     <m:mo>&#8230;,</m:mo>
                     <m:msup>
                        <m:mtext>SCA</m:mtext>
                        <m:mrow>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mn>1</m:mn>
                           <m:mo stretchy="false">)</m:mo>
                        </m:mrow>
                     </m:msup>
                     <m:mtext>= 'yes'</m:mtext>
                  </m:mrow>
               </m:mfenced>
            </m:mrow>
         </m:mtd>
      </m:mtr>
      <m:mtr columnalign="left">
         <m:mtd columnalign="left">
            <m:mrow>
               <m:mo>=</m:mo>
               <m:mi mathvariant="normal">Pr</m:mi>
               <m:mfenced open="(" close=")">
                  <m:mrow>
                     <m:msup>
                        <m:mtext>SCA</m:mtext>
                        <m:mrow>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mi>r</m:mi>
                           <m:mo stretchy="false">)</m:mo>
                        </m:mrow>
                     </m:msup>
                     <m:mtext>= 'yes'</m:mtext>
                     <m:mo stretchy="true">|</m:mo>
                     <m:msup>
                        <m:mtext>SCA</m:mtext>
                        <m:mrow>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mi>r</m:mi>
                           <m:mo>&#8722;</m:mo>
                           <m:mn>1</m:mn>
                           <m:mo stretchy="false">)</m:mo>
                        </m:mrow>
                     </m:msup>
                     <m:mtext>= 'yes'</m:mtext>
                  </m:mrow>
               </m:mfenced>
            </m:mrow>
         </m:mtd>
      </m:mtr>
   </m:mtable>
   <m:mtext>.</m:mtext>
</m:mrow>
</m:math></display-formula>Letting <it>&#945;</it> be the targeted FPR, we have</p><p><display-formula id="M10"><m:math name="1471-2164-13-342-i32" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mtable columnalign="left">
   <m:mtr>
      <m:mtd>
         <m:mspace width="0.5em"/>
         <m:mi>&#945;</m:mi>
         <m:mtext>=</m:mtext>
         <m:mstyle>
            <m:munderover>
               <m:mo>&#8721;</m:mo>
               <m:mrow>
                  <m:mi>r</m:mi>
                  <m:mo>=</m:mo>
                  <m:mn>1</m:mn>
               </m:mrow>
               <m:mi>&#8734;</m:mi>
            </m:munderover>
            <m:mrow>
               <m:mi mathvariant="normal">Pr</m:mi>
               <m:mfenced open="(" close=")">
                  <m:mrow>
                     <m:msup>
                        <m:mtext>SCA</m:mtext>
                        <m:mrow>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mi>r</m:mi>
                           <m:mo stretchy="false">)</m:mo>
                        </m:mrow>
                     </m:msup>
                     <m:mtext>= 'yes'</m:mtext>
                  </m:mrow>
               </m:mfenced>
            </m:mrow>
         </m:mstyle>
      </m:mtd>
   </m:mtr>
   <m:mtr>
      <m:mtd>
         <m:mo>=</m:mo>
         <m:mi>&#945;</m:mi>
         <m:mo>'</m:mo>
         <m:mo>+</m:mo>
         <m:mi>&#945;</m:mi>
         <m:msup>
            <m:mo>'</m:mo>
            <m:mn>2</m:mn>
         </m:msup>
         <m:mo>+</m:mo>
         <m:mo>&#8230;</m:mo>
         <m:mo>+</m:mo>
         <m:mi>&#945;</m:mi>
         <m:msup>
            <m:mo>'</m:mo>
            <m:mi>r</m:mi>
         </m:msup>
         <m:mo>+</m:mo>
         <m:mo>&#8230;</m:mo>
         <m:mo>=</m:mo>
         <m:mfrac>
            <m:mrow>
               <m:mi>&#945;</m:mi>
               <m:mo>'</m:mo>
            </m:mrow>
            <m:mrow>
               <m:mn>1</m:mn>
               <m:mo>&#8722;</m:mo>
               <m:mi>&#945;</m:mi>
               <m:mo>'</m:mo>
            </m:mrow>
         </m:mfrac>
         <m:mtext>,</m:mtext>
         <m:mspace width="1.25em"/>
         <m:mfenced open="(" close=")">
            <m:mrow>
               <m:mi>&#945;</m:mi>
               <m:mo>'</m:mo>
               <m:mo>&lt;</m:mo>
               <m:mn>1</m:mn>
            </m:mrow>
         </m:mfenced>
         <m:mtext>.</m:mtext>
      </m:mtd>
   </m:mtr>
</m:mtable>
</m:math></display-formula>Solving this relation for <it>&#945;</it>', we obtain <inline-formula><m:math name="1471-2164-13-342-i33" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi>&#945;</m:mi>
   <m:mo>'</m:mo>
   <m:mo>=</m:mo>
   <m:mi>&#945;</m:mi>
   <m:mo>/</m:mo>
   <m:mfenced open="(" close=")">
      <m:mrow>
         <m:mn>1</m:mn>
         <m:mo>+</m:mo>
         <m:mi>&#945;</m:mi>
      </m:mrow>
   </m:mfenced>
</m:mrow>
</m:math></inline-formula>. For example, a targeted FPR of <it>&#945;</it> = 0.05 corresponds to <it>&#945;</it>' = 0.05/1.05 &#8776; 0.048.</p></sec><sec><st><p>Additional files</p></st></sec><sec><st><p>Competing interests</p></st><p>The authors declare that they have no competing interests.</p></sec><sec><st><p>Authors&#8217; contributions</p></st><p>XY, GY and YW participated in the design of concepts and methods. XY and GY developed the permutation strategy and CNA simulation algorithm. XY implemented the C++ code. RRW implemented the R code of GISTIC. GY, XY and XH analyzed and evaluated the algorithm. XH and YW constructed and proved Theorem 1. YW, XY and GY drafted the manuscript. IMS and EPH interpreted the results on real cancer data. JZ, RC and EPH helped edit the manuscript. YW, RC and ZZ conceived of the study, participated in its design and coordination, and helped edit the paper. All authors read and approved the final manuscript.</p></sec></bdy><bm><ack><sec><st><p>Acknowledgements</p></st><p>This work was supported in part by the US National Institutes of Health under Grants CA160036, CA149147, NS029525, and GM085665, by the Natural Science Basic Research Plan in Shaanxi Province of China (Program No. 
2012JQ8027), and the Fundamental Research Funds for the Central Universities (No.K50511030002), and the Natural Science Foundation of China under Grants 61070137, 91130006, and 60933009.</p></sec></ack><refgrp><bibl id="B1"><title><p>The landscape of somatic copy-number alteration across human cancers</p></title><aug><au><snm>Beroukhim</snm><fnm>R</fnm></au><au><snm>Mermel</snm><fnm>CH</fnm></au><au><snm>Porter</snm><fnm>D</fnm></au><au><snm>Wei</snm><fnm>G</fnm></au><au><snm>Raychaudhuri</snm><fnm>S</fnm></au><au><snm>Donovan</snm><fnm>J</fnm></au><au><snm>Barretina</snm><fnm>J</fnm></au><au><snm>Boehm</snm><fnm>JS</fnm></au><au><snm>Dobson</snm><fnm>J</fnm></au><au><snm>Urashima</snm><fnm>M</fnm></au><etal/></aug><source>Nature</source><pubdate>2010</pubdate><volume>463</volume><issue>7283</issue><fpage>899</fpage><lpage>905</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nature08822</pubid><pubid idtype="pmcid">2826709</pubid><pubid idtype="pmpid" link="fulltext">20164920</pubid></pubidlist></xrefbib></bibl><bibl id="B2"><title><p>Integrated analysis of homozygous deletions, focal amplifications, and sequence alterations in breast and colorectal cancers</p></title><aug><au><snm>Leary</snm><fnm>RJ</fnm></au><au><snm>Lin</snm><fnm>JC</fnm></au><au><snm>Cummins</snm><fnm>J</fnm></au><au><snm>Boca</snm><fnm>S</fnm></au><au><snm>Wood</snm><fnm>LD</fnm></au><au><snm>Parsons</snm><fnm>DW</fnm></au><au><snm>Jones</snm><fnm>S</fnm></au><au><snm>Sjoblom</snm><fnm>T</fnm></au><au><snm>Park</snm><fnm>BH</fnm></au><au><snm>Parsons</snm><fnm>R</fnm></au><etal/></aug><source>Proc Natl Acad Sci U S A</source><pubdate>2008</pubdate><volume>105</volume><issue>42</issue><fpage>16224</fpage><lpage>16229</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.0808041105</pubid><pubid idtype="pmcid">2571022</pubid><pubid idtype="pmpid" link="fulltext">18852474</pubid></pubidlist></xrefbib></bibl><bibl id="B3"><title><p>Assessing the significance of chromosomal aberrations 
in cancer: methodology and application to glioma</p></title><aug><au><snm>Beroukhim</snm><fnm>R</fnm></au><au><snm>Getz</snm><fnm>G</fnm></au><au><snm>Nghiemphu</snm><fnm>L</fnm></au><au><snm>Barretina</snm><fnm>J</fnm></au><au><snm>Hsueh</snm><fnm>T</fnm></au><au><snm>Linhart</snm><fnm>D</fnm></au><au><snm>Vivanco</snm><fnm>I</fnm></au><au><snm>Lee</snm><fnm>JC</fnm></au><au><snm>Huang</snm><fnm>JH</fnm></au><au><snm>Alexander</snm><fnm>S</fnm></au><etal/></aug><source>Proc Natl Acad Sci U S A</source><pubdate>2007</pubdate><volume>104</volume><issue>50</issue><fpage>20007</fpage><lpage>20012</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.0710052104</pubid><pubid idtype="pmcid">2148413</pubid><pubid idtype="pmpid" link="fulltext">18077431</pubid></pubidlist></xrefbib></bibl><bibl id="B4"><title><p>The genomic landscapes of human breast and colorectal cancers</p></title><aug><au><snm>Wood</snm><fnm>LD</fnm></au><au><snm>Parsons</snm><fnm>DW</fnm></au><au><snm>Jones</snm><fnm>S</fnm></au><au><snm>Lin</snm><fnm>J</fnm></au><au><snm>Sjoblom</snm><fnm>T</fnm></au><au><snm>Leary</snm><fnm>RJ</fnm></au><au><snm>Shen</snm><fnm>D</fnm></au><au><snm>Boca</snm><fnm>SM</fnm></au><au><snm>Barber</snm><fnm>T</fnm></au><au><snm>Ptak</snm><fnm>J</fnm></au><etal/></aug><source>Science</source><pubdate>2007</pubdate><volume>318</volume><issue>5853</issue><fpage>1108</fpage><lpage>1113</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.1145720</pubid><pubid idtype="pmpid" link="fulltext">17932254</pubid></pubidlist></xrefbib></bibl><bibl id="B5"><title><p>An integrated view of copy number and allelic alterations in the cancer genome using single nucleotide polymorphism 
arrays</p></title><aug><au><snm>Zhao</snm><fnm>X</fnm></au><au><snm>Li</snm><fnm>C</fnm></au><au><snm>Paez</snm><fnm>JG</fnm></au><au><snm>Chin</snm><fnm>K</fnm></au><au><snm>Janne</snm><fnm>PA</fnm></au><au><snm>Chen</snm><fnm>TH</fnm></au><au><snm>Girard</snm><fnm>L</fnm></au><au><snm>Minna</snm><fnm>J</fnm></au><au><snm>Christiani</snm><fnm>D</fnm></au><au><snm>Leo</snm><fnm>C</fnm></au><etal/></aug><source>Cancer Res</source><pubdate>2004</pubdate><volume>64</volume><issue>9</issue><fpage>3060</fpage><lpage>3071</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1158/0008-5472.CAN-03-3308</pubid><pubid idtype="pmpid" link="fulltext">15126342</pubid></pubidlist></xrefbib></bibl><bibl id="B6"><title><p>Array comparative genomic hybridization and its applications in cancer</p></title><aug><au><snm>Pinkel</snm><fnm>D</fnm></au><au><snm>Albertson</snm><fnm>DG</fnm></au></aug><source>Nat Genet</source><pubdate>2005</pubdate><volume>37</volume><issue>Suppl</issue><fpage>S11</fpage><lpage>S17</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">15920524</pubid></xrefbib></bibl><bibl id="B7"><title><p>Characterizing the cancer genome in lung adenocarcinoma</p></title><aug><au><snm>Weir</snm><fnm>BA</fnm></au><au><snm>Woo</snm><fnm>MS</fnm></au><au><snm>Getz</snm><fnm>G</fnm></au><au><snm>Perner</snm><fnm>S</fnm></au><au><snm>Ding</snm><fnm>L</fnm></au><au><snm>Beroukhim</snm><fnm>R</fnm></au><au><snm>Lin</snm><fnm>WM</fnm></au><au><snm>Province</snm><fnm>MA</fnm></au><au><snm>Kraja</snm><fnm>A</fnm></au><au><snm>Johnson</snm><fnm>LA</fnm></au><etal/></aug><source>Nature</source><pubdate>2007</pubdate><volume>450</volume><issue>7171</issue><fpage>893</fpage><lpage>898</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nature06358</pubid><pubid idtype="pmcid">2538683</pubid><pubid idtype="pmpid" link="fulltext">17982442</pubid></pubidlist></xrefbib></bibl><bibl id="B8"><title><p>Copy number analysis indicates monoclonal origin of lethal metastatic prostate 
cancer</p></title><aug><au><snm>Liu</snm><fnm>W</fnm></au><au><snm>Laitinen</snm><fnm>S</fnm></au><au><snm>Khan</snm><fnm>S</fnm></au><au><snm>Vihinen</snm><fnm>M</fnm></au><au><snm>Kowalski</snm><fnm>J</fnm></au><au><snm>Yu</snm><fnm>G</fnm></au><au><snm>Chen</snm><fnm>L</fnm></au><au><snm>Ewing</snm><fnm>CM</fnm></au><au><snm>Eisenberger</snm><fnm>MA</fnm></au><au><snm>Carducci</snm><fnm>MA</fnm></au><etal/></aug><source>Nat Med</source><pubdate>2009</pubdate><volume>15</volume><issue>5</issue><fpage>559</fpage><lpage>565</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nm.1944</pubid><pubid idtype="pmcid">2839160</pubid><pubid idtype="pmpid" link="fulltext">19363497</pubid></pubidlist></xrefbib></bibl><bibl id="B9"><title><p>DiNAMIC: a method to identify recurrent DNA copy number aberrations in tumors</p></title><aug><au><snm>Walter</snm><fnm>V</fnm></au><au><snm>Nobel</snm><fnm>AB</fnm></au><au><snm>Wright</snm><fnm>FA</fnm></au></aug><source>Bioinformatics</source><pubdate>2011</pubdate><volume>27</volume><issue>5</issue><fpage>678</fpage><lpage>685</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btq717</pubid><pubid idtype="pmcid">3042182</pubid><pubid idtype="pmpid" link="fulltext">21183584</pubid></pubidlist></xrefbib></bibl><bibl id="B10"><title><p>Modeling recurrent DNA copy number alterations in array CGH data</p></title><aug><au><snm>Shah</snm><fnm>SP</fnm></au><au><snm>Lam</snm><fnm>WL</fnm></au><au><snm>Ng</snm><fnm>RT</fnm></au><au><snm>Murphy</snm><fnm>KP</fnm></au></aug><source>Bioinformatics</source><pubdate>2007</pubdate><volume>23</volume><issue>13</issue><fpage>i450</fpage><lpage>i458</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btm221</pubid><pubid idtype="pmpid" link="fulltext">17646330</pubid></pubidlist></xrefbib></bibl><bibl id="B11"><title><p>Finding recurrent copy number alteration regions: a review of 
methods</p></title><aug><au><snm>Rueda</snm><fnm>OM</fnm></au><au><snm>Diaz-Uriarte</snm><fnm>R</fnm></au></aug><source>Curr Bioinforma</source><pubdate>2010</pubdate><volume>5</volume><fpage>17</fpage></bibl><bibl id="B12"><title><p>STAC: A method for testing the significance of DNA copy number aberrations across multiple array-CGH experiments</p></title><aug><au><snm>Diskin</snm><fnm>SJ</fnm></au><au><snm>Eck</snm><fnm>T</fnm></au><au><snm>Greshock</snm><fnm>J</fnm></au><au><snm>Mosse</snm><fnm>YP</fnm></au><au><snm>Naylor</snm><fnm>T</fnm></au><au><snm>Stoeckert</snm><fnm>CJ</fnm></au><au><snm>Weber</snm><fnm>BL</fnm></au><au><snm>Maris</snm><fnm>JM</fnm></au><au><snm>Grant</snm><fnm>GR</fnm></au></aug><source>Genome Res</source><pubdate>2006</pubdate><volume>16</volume><issue>9</issue><fpage>1149</fpage><lpage>1158</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gr.5076506</pubid><pubid idtype="pmcid">1557772</pubid><pubid idtype="pmpid" link="fulltext">16899652</pubid></pubidlist></xrefbib></bibl><bibl id="B13"><title><p>CNAnova: a new approach for finding recurrent copy number abnormalities in cancer SNP microarray data</p></title><aug><au><snm>Ivakhno</snm><fnm>S</fnm></au><au><snm>Tavare</snm><fnm>S</fnm></au></aug><source>Bioinformatics</source><pubdate>2010</pubdate><volume>26</volume><issue>11</issue><fpage>1395</fpage><lpage>1402</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btq145</pubid><pubid idtype="pmpid" link="fulltext">20403815</pubid></pubidlist></xrefbib></bibl><bibl id="B14"><title><p>GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers</p></title><aug><au><snm>Mermel</snm><fnm>CH</fnm></au><au><snm>Schumacher</snm><fnm>SE</fnm></au><au><snm>Hill</snm><fnm>B</fnm></au><au><snm>Meyerson</snm><fnm>ML</fnm></au><au><snm>Beroukhim</snm><fnm>R</fnm></au><au><snm>Getz</snm><fnm>G</fnm></au></aug><source>Genome 
Biol</source><pubdate>2011</pubdate><volume>12</volume><issue>4</issue><fpage>R41</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/gb-2011-12-4-r41</pubid><pubid idtype="pmcid">3218867</pubid><pubid idtype="pmpid" link="fulltext">21527027</pubid></pubidlist></xrefbib></bibl><bibl id="B15"><title><p>Identification of cancer genes using a statistical framework for multiexperiment analysis of nondiscretized array CGH data</p></title><aug><au><snm>Klijn</snm><fnm>C</fnm></au><au><snm>Holstege</snm><fnm>H</fnm></au><au><snm>de Ridder</snm><fnm>J</fnm></au><au><snm>Liu</snm><fnm>X</fnm></au><au><snm>Reinders</snm><fnm>M</fnm></au><au><snm>Jonkers</snm><fnm>J</fnm></au><au><snm>Wessels</snm><fnm>L</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2008</pubdate><volume>36</volume><issue>2</issue><fpage>e13</fpage><xrefbib><pubidlist><pubid idtype="pmcid">2241875</pubid><pubid idtype="pmpid" link="fulltext">18187509</pubid></pubidlist></xrefbib></bibl><bibl id="B16"><title><p>CMDS: a population-based method for identifying recurrent DNA copy number aberrations in cancer from high-resolution data</p></title><aug><au><snm>Zhang</snm><fnm>Q</fnm></au><au><snm>Ding</snm><fnm>L</fnm></au><au><snm>Larson</snm><fnm>DE</fnm></au><au><snm>Koboldt</snm><fnm>DC</fnm></au><au><snm>McLellan</snm><fnm>MD</fnm></au><au><snm>Chen</snm><fnm>K</fnm></au><au><snm>Shi</snm><fnm>X</fnm></au><au><snm>Kraja</snm><fnm>A</fnm></au><au><snm>Mardis</snm><fnm>ER</fnm></au><au><snm>Wilson</snm><fnm>RK</fnm></au><etal/></aug><source>Bioinformatics</source><pubdate>2010</pubdate><volume>26</volume><issue>4</issue><fpage>464</fpage><lpage>469</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btp708</pubid><pubid idtype="pmcid">2852218</pubid><pubid idtype="pmpid" link="fulltext">20031968</pubid></pubidlist></xrefbib></bibl><bibl id="B17"><title><p>A double-layered mixture model for the joint analysis of DNA copy number and gene expression 
data</p></title><aug><au><snm>Choi</snm><fnm>H</fnm></au><au><snm>Qin</snm><fnm>ZS</fnm></au><au><snm>Ghosh</snm><fnm>D</fnm></au></aug><source>J Comput Biol</source><pubdate>2010</pubdate><volume>17</volume><issue>2</issue><fpage>121</fpage><lpage>137</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1089/cmb.2009.0019</pubid><pubid idtype="pmcid">3148827</pubid><pubid idtype="pmpid" link="fulltext">20170400</pubid></pubidlist></xrefbib></bibl><bibl id="B18"><title><p>BACOM: in silico detection of genomic deletion types and correction of normal cell contamination in copy number data</p></title><aug><au><snm>Yu</snm><fnm>G</fnm></au><au><snm>Zhang</snm><fnm>B</fnm></au><au><snm>Bova</snm><fnm>GS</fnm></au><au><snm>Xu</snm><fnm>J</fnm></au><au><snm>Shih</snm><fnm>IM</fnm></au><au><snm>Wang</snm><fnm>Y</fnm></au></aug><source>Bioinformatics</source><pubdate>2011</pubdate><volume>27</volume><issue>11</issue><fpage>1473</fpage><lpage>1480</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btr183</pubid><pubid idtype="pmcid">3102226</pubid><pubid idtype="pmpid" link="fulltext">21498400</pubid></pubidlist></xrefbib></bibl><bibl id="B19"><title><p>Approximating the extreme right-hand tail probability for the distribution of the number of patterns in a sequence of multi-state trials</p></title><aug><au><snm>Fu</snm><fnm>JC</fnm></au><au><snm>Johnson</snm><fnm>BC</fnm></au><au><snm>Chang</snm><fnm>Y-M</fnm></au></aug><source>Journal of Statistical Planning and Inference</source><pubdate>2011</pubdate><volume>142</volume><issue>2</issue><fpage>473</fpage><lpage>480</lpage></bibl><bibl id="B20"><title><p>Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection</p></title><aug><au><snm>Li</snm><fnm>C</fnm></au><au><snm>Wong</snm><fnm>WH</fnm></au></aug><source>Proc Natl Acad Sci U S A</source><pubdate>2001</pubdate><volume>98</volume><issue>1</issue><fpage>31</fpage><lpage>36</lpage><xrefbib><pubidlist><pubid 
idtype="pmcid">14539</pubid><pubid idtype="pmpid" link="fulltext">11134512</pubid></pubidlist></xrefbib></bibl><bibl id="B21"><title><p>dChipSNP: significance curve and clustering of SNP-array-based loss-of-heterozygosity data</p></title><aug><au><snm>Lin</snm><fnm>M</fnm></au><au><snm>Wei</snm><fnm>LJ</fnm></au><au><snm>Sellers</snm><fnm>WR</fnm></au><au><snm>Lieberfarb</snm><fnm>M</fnm></au><au><snm>Wong</snm><fnm>WH</fnm></au><au><snm>Li</snm><fnm>C</fnm></au></aug><source>Bioinformatics</source><pubdate>2004</pubdate><volume>20</volume><issue>8</issue><fpage>1233</fpage><lpage>1240</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/bth069</pubid><pubid idtype="pmpid" link="fulltext">14871870</pubid></pubidlist></xrefbib></bibl><bibl id="B22"><title><p>A faster circular binary segmentation algorithm for the analysis of array CGH data</p></title><aug><au><snm>Venkatraman</snm><fnm>ES</fnm></au><au><snm>Olshen</snm><fnm>AB</fnm></au></aug><source>Bioinformatics</source><pubdate>2007</pubdate><volume>23</volume><issue>6</issue><fpage>657</fpage><lpage>663</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btl646</pubid><pubid idtype="pmpid" link="fulltext">17234643</pubid></pubidlist></xrefbib></bibl><bibl id="B23"><title><p>Circular binary segmentation for the analysis of array-based DNA copy number data</p></title><aug><au><snm>Olshen</snm><fnm>AB</fnm></au><au><snm>Venkatraman</snm><fnm>ES</fnm></au><au><snm>Lucito</snm><fnm>R</fnm></au><au><snm>Wigler</snm><fnm>M</fnm></au></aug><source>Biostatistics</source><pubdate>2004</pubdate><volume>5</volume><issue>4</issue><fpage>557</fpage><lpage>572</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/biostatistics/kxh008</pubid><pubid idtype="pmpid" link="fulltext">15475419</pubid></pubidlist></xrefbib></bibl><bibl id="B24"><title><p>A comparison study: applying segmentation to array CGH data for downstream 
analyses</p></title><aug><au><snm>Willenbrock</snm><fnm>H</fnm></au><au><snm>Fridlyand</snm><fnm>J</fnm></au></aug><source>Bioinformatics</source><pubdate>2005</pubdate><volume>21</volume><issue>22</issue><fpage>4084</fpage><lpage>4091</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/bti677</pubid><pubid idtype="pmpid" link="fulltext">16159913</pubid></pubidlist></xrefbib></bibl><bibl id="B25"><title><p>Comparative analysis of methods for detecting interacting loci</p></title><aug><au><snm>Chen</snm><fnm>L</fnm></au><au><snm>Yu</snm><fnm>G</fnm></au><au><snm>Langefeld</snm><fnm>CD</fnm></au><au><snm>Miller</snm><fnm>DJ</fnm></au><au><snm>Guy</snm><fnm>RT</fnm></au><au><snm>Raghuram</snm><fnm>J</fnm></au><au><snm>Yuan</snm><fnm>X</fnm></au><au><snm>Herrington</snm><fnm>DM</fnm></au><au><snm>Wang</snm><fnm>Y</fnm></au></aug><source>BMC Genomics</source><pubdate>2011</pubdate><volume>12</volume><fpage>344</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-12-344</pubid><pubid idtype="pmcid">3161015</pubid><pubid idtype="pmpid" link="fulltext">21729295</pubid></pubidlist></xrefbib></bibl><bibl id="B26"><title><p>Analysis of DNA copy number alterations in ovarian serous tumors identifies new molecular genetic changes in low-grade and high-grade carcinomas</p></title><aug><au><snm>Kuo</snm><fnm>KT</fnm></au><au><snm>Guan</snm><fnm>B</fnm></au><au><snm>Feng</snm><fnm>Y</fnm></au><au><snm>Mao</snm><fnm>TL</fnm></au><au><snm>Chen</snm><fnm>X</fnm></au><au><snm>Jinawath</snm><fnm>N</fnm></au><au><snm>Wang</snm><fnm>Y</fnm></au><au><snm>Kurman</snm><fnm>RJ</fnm></au><au><snm>Shih Ie</snm><fnm>M</fnm></au><au><snm>Wang</snm><fnm>TL</fnm></au></aug><source>Cancer Res</source><pubdate>2009</pubdate><volume>69</volume><issue>9</issue><fpage>4036</fpage><lpage>4042</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1158/0008-5472.CAN-08-3913</pubid><pubid idtype="pmcid">2782554</pubid><pubid idtype="pmpid" 
link="fulltext">19383911</pubid></pubidlist></xrefbib></bibl><bibl id="B27"><title><p>DNA copy numbers profiles in affinity-purified ovarian clear cell carcinoma</p></title><aug><au><snm>Kuo</snm><fnm>KT</fnm></au><au><snm>Mao</snm><fnm>TL</fnm></au><au><snm>Chen</snm><fnm>X</fnm></au><au><snm>Feng</snm><fnm>Y</fnm></au><au><snm>Nakayama</snm><fnm>K</fnm></au><au><snm>Wang</snm><fnm>Y</fnm></au><au><snm>Glas</snm><fnm>R</fnm></au><au><snm>Ma</snm><fnm>MJ</fnm></au><au><snm>Kurman</snm><fnm>RJ</fnm></au><au><snm>Shih Ie</snm><fnm>M</fnm></au><etal/></aug><source>Clin Cancer Res</source><pubdate>2010</pubdate><volume>16</volume><issue>7</issue><fpage>1997</fpage><lpage>2008</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1158/1078-0432.CCR-09-2105</pubid><pubid idtype="pmcid">2848895</pubid><pubid idtype="pmpid" link="fulltext">20233889</pubid></pubidlist></xrefbib></bibl><bibl id="B28"><title><p>Analyzing DNA copy number changes using fused margin regression</p></title><aug><au><snm>Feng</snm><fnm>Y</fnm></au><au><snm>Yu</snm><fnm>G</fnm></au><au><snm>Wang</snm><fnm>T-L</fnm></au><au><snm>Shih</snm><fnm>I-M</fnm></au><au><snm>Wang</snm><fnm>Y</fnm></au></aug><source>Intl J of Functional Informatics and Personalized Medicine</source><pubdate>2010</pubdate><volume>3</volume><issue>1</issue><fpage>3</fpage><lpage>15</lpage><xrefbib><pubid idtype="doi">10.1504/IJFIPM.2010.033242</pubid></xrefbib></bibl><bibl id="B29"><title><p>The biology of ovarian cancer: new opportunities for translation</p></title><aug><au><snm>Bast</snm><fnm>RC</fnm></au><au><snm>Hennessy</snm><fnm>B</fnm></au><au><snm>Mills</snm><fnm>GB</fnm></au></aug><source>Nat Rev Cancer</source><pubdate>2009</pubdate><volume>9</volume><issue>6</issue><fpage>415</fpage><lpage>428</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nrc2644</pubid><pubid idtype="pmcid">2814299</pubid><pubid idtype="pmpid" link="fulltext">19461667</pubid></pubidlist></xrefbib></bibl><bibl id="B30"><title><p>Integrated 
genomic analyses of ovarian carcinoma</p></title><source>Nature</source><pubdate>2011</pubdate><volume>474</volume><issue>7353</issue><fpage>609</fpage><lpage>615</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nature10166</pubid><pubid idtype="pmcid">3163504</pubid><pubid idtype="pmpid" link="fulltext">21720365</pubid></pubidlist></xrefbib></bibl><bibl id="B31"><title><p>Activating mutations in the epidermal growth factor receptor underlying responsiveness of non-small-cell lung cancer to gefitinib</p></title><aug><au><snm>Lynch</snm><fnm>TJ</fnm></au><au><snm>Bell</snm><fnm>DW</fnm></au><au><snm>Sordella</snm><fnm>R</fnm></au><au><snm>Gurubhagavatula</snm><fnm>S</fnm></au><au><snm>Okimoto</snm><fnm>RA</fnm></au><au><snm>Brannigan</snm><fnm>BW</fnm></au><au><snm>Harris</snm><fnm>PL</fnm></au><au><snm>Haserlat</snm><fnm>SM</fnm></au><au><snm>Supko</snm><fnm>JG</fnm></au><au><snm>Haluska</snm><fnm>FG</fnm></au><etal/></aug><source>N Engl J Med</source><pubdate>2004</pubdate><volume>350</volume><issue>21</issue><fpage>2129</fpage><lpage>2139</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1056/NEJMoa040938</pubid><pubid idtype="pmpid" link="fulltext">15118073</pubid></pubidlist></xrefbib></bibl><bibl id="B32"><title><p>High-resolution global genomic survey of 178 gliomas reveals novel regions of copy number alteration and allelic imbalances</p></title><aug><au><snm>Kotliarov</snm><fnm>Y</fnm></au><au><snm>Steed</snm><fnm>ME</fnm></au><au><snm>Christopher</snm><fnm>N</fnm></au><au><snm>Walling</snm><fnm>J</fnm></au><au><snm>Su</snm><fnm>Q</fnm></au><au><snm>Center</snm><fnm>A</fnm></au><au><snm>Heiss</snm><fnm>J</fnm></au><au><snm>Rosenblum</snm><fnm>M</fnm></au><au><snm>Mikkelsen</snm><fnm>T</fnm></au><au><snm>Zenklusen</snm><fnm>JC</fnm></au><etal/></aug><source>Cancer Res</source><pubdate>2006</pubdate><volume>66</volume><issue>19</issue><fpage>9428</fpage><lpage>9436</lpage><xrefbib><pubidlist><pubid 
idtype="doi">10.1158/0008-5472.CAN-06-1691</pubid><pubid idtype="pmpid" link="fulltext">17018597</pubid></pubidlist></xrefbib></bibl><bibl id="B33"><aug><au><snm>Westfall</snm><fnm>PH</fnm></au><au><snm>Young</snm><fnm>SS</fnm></au></aug><source>Resampling-based multiple testing : examples and methods for P-value adjustment</source><publisher>New York, Wiley</publisher><pubdate>1993</pubdate></bibl><bibl id="B34"><title><p>Multiple hypothesis testing</p></title><aug><au><snm>Shaffer</snm><fnm>JP</fnm></au></aug><source>Annu Rev Psychol</source><pubdate>1995</pubdate><volume>46</volume><fpage>24</fpage></bibl><bibl id="B35"><title><p>Cancer. Heterogeneity and tumor history</p></title><aug><au><snm>Shibata</snm><fnm>D</fnm></au></aug><source>Science</source><pubdate>2012</pubdate><volume>336</volume><issue>6079</issue><fpage>304</fpage><lpage>305</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.1222361</pubid><pubid idtype="pmpid" link="fulltext">22517848</pubid></pubidlist></xrefbib></bibl><bibl id="B36"><title><p>Intra-tumour heterogeneity: a looking glass for cancer?</p></title><aug><au><snm>Marusyk</snm><fnm>A</fnm></au><au><snm>Almendro</snm><fnm>V</fnm></au><au><snm>Polyak</snm><fnm>K</fnm></au></aug><source>Nat Rev Cancer</source><pubdate>2012</pubdate><volume>12</volume><issue>5</issue><fpage>323</fpage><lpage>334</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nrc3261</pubid><pubid idtype="pmpid" link="fulltext">22513401</pubid></pubidlist></xrefbib></bibl></refgrp></bm></art>