Ann Oncol. 2015 Jan;26(1):64-70.
doi: 10.1093/annonc/mdu479. Epub 2014 Oct 15.

Sequenza: allele-specific copy number and mutation profiles from tumor sequencing data

F Favero et al. Ann Oncol. 2015 Jan.

Abstract

Background: Exome or whole-genome deep sequencing of tumor DNA along with paired normal DNA can potentially provide a detailed picture of the somatic mutations that characterize the tumor. However, analysis of such sequence data can be complicated by the presence of normal cells in the tumor specimen, by intratumor heterogeneity, and by the sheer size of the raw data. In particular, determination of copy number variations from exome sequencing data alone has proven difficult; thus, single nucleotide polymorphism (SNP) arrays have often been used for this task. Recently, algorithms to estimate absolute, but not allele-specific, copy number profiles from tumor sequencing data have been described.

Materials and methods: We developed Sequenza, a software package that uses paired tumor-normal DNA sequencing data to estimate tumor cellularity and ploidy, and to calculate allele-specific copy number profiles and mutation profiles. We applied Sequenza, as well as two previously published algorithms, to exome sequence data from 30 tumors from The Cancer Genome Atlas. We assessed the performance of these algorithms by comparing their results with those generated using matched SNP arrays and processed by the allele-specific copy number analysis of tumors (ASCAT) algorithm.

Results: Comparison between Sequenza/exome and SNP/ASCAT revealed strong correlation in cellularity estimates (Pearson's r = 0.90) and in ploidy estimates (r = 0.42, or r = 0.94 after manually inspecting alternative solutions). This performance was noticeably superior to that of the previously published algorithms. In addition, in artificial data simulating normal-tumor admixtures, Sequenza detected the correct ploidy in samples with tumor content as low as 30%.

Conclusions: The agreement between Sequenza and SNP array-based copy number profiles suggests that exome sequencing alone is sufficient not only for identifying small scale mutations but also for estimating cellularity and inferring DNA copy number aberrations.

Keywords: cancer genomics; copy number alterations; mutations; next-generation sequencing; software.

Figures

Figure 1.
Representative output of the Sequenza algorithm. Exome sequencing data from an ovarian tumor (TCGA-42-2591-01A) and matched normal (TCGA-42-2591-10A) specimen were analyzed with Sequenza.
(A) The log posterior probability (LPP) of the observed data was calculated for a range of candidate ploidy and cellularity values. The point estimate is the ploidy and cellularity with maximum LPP. The 95% confidence region is the smallest (not necessarily contiguous) set of points with a total posterior probability >0.95. The background color indicates the rank of the LPP (blue = most likely, white = least likely), provided here to contrast other possible parameters that are very unlikely under our model but might still be of interest. Local maxima are marked with a ‘+’ and indicate possible alternative solutions.
(B) Observed depth ratio (DR) and B allele frequency (BAF) values for each genomic segment (black circles and dots) along with the representative joint LPP density (colors). The representative joint LPP density is calculated for the cellularity and ploidy estimates identified in (A), and for a hypothetical representative 10 Mb segment. The actual joint LPP density depends on segment size and variability and thus varies quantitatively but not qualitatively for each segment. Observed segments with highly unlikely DR and BAF values may indicate subclonality, measurement errors, or incorrect model parameters.
(C) Chromosome plot indicating mutant allele frequency (top panel), B allele frequency (middle panel), and depth ratio (bottom panel) according to genomic position. Here, chromosome 1 is shown. The mutant allele frequency at a given position is the fraction of reads with a mutation, and is displayed if >0.1 for each genomic position with sufficient sequencing depth. For the sake of visualization, the B allele frequency and depth ratio are summarized within 1 Mb windows staggered every 0.5 Mb. Within each window, a thick black line indicates the median value, and a blue bar indicates the interquartile range. Red lines indicate segmented values. The thin dotted lines indicate the expectation values under the fitted model; their placement is based on the estimated cellularity, ploidy, and copy number profile. In the top panel, the dotted lines indicate the number of alleles with mutation, with the lowest line starting at one. In the middle panel, the dotted lines indicate the minor allele copy number, with the lowest line starting at zero. In the bottom panel, the dotted lines indicate the copy number.
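The point estimate and 95% confidence region described for Figure 1A can be illustrated with a minimal Python sketch. It assumes the LPP values are already available as a 2-D grid (rows for candidate ploidy values, columns for candidate cellularity values); the function name `confidence_region` and the toy grid are hypothetical and are not part of the Sequenza package.

```python
import numpy as np

def confidence_region(lpp, mass=0.95):
    """Return (mask, best) for a grid of log posterior probabilities.

    mask marks the smallest set of grid points whose total posterior
    probability exceeds `mass`; best is the index of the maximum LPP.
    """
    lpp = np.asarray(lpp, dtype=float)
    # Normalize in a numerically stable way: subtract the maximum
    # before exponentiating, then rescale so the grid sums to 1.
    post = np.exp(lpp - lpp.max())
    post /= post.sum()

    flat = post.ravel()
    order = np.argsort(flat)[::-1]        # most probable points first
    cum = np.cumsum(flat[order])
    # Index of the first cumulative sum that is strictly above `mass`.
    first_above = int(np.searchsorted(cum, mass, side="right"))
    n_keep = min(first_above + 1, flat.size)

    mask = np.zeros(flat.size, dtype=bool)
    mask[order[:n_keep]] = True
    best = tuple(int(i) for i in np.unravel_index(order[0], lpp.shape))
    return mask.reshape(lpp.shape), best

# Toy grid (illustrative values only): 2 ploidy x 2 cellularity candidates.
toy_lpp = np.log(np.array([[0.60, 0.30],
                           [0.06, 0.04]]))
region, best = confidence_region(toy_lpp)
```

Because points are added in order of decreasing probability regardless of their position on the grid, the resulting region need not be contiguous, which matches the caption's description.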
Figure 2.
Comparison of cellularity and ploidy estimates and copy number profiles derived from exome sequencing with those derived from SNP arrays, and testing on simulated data.
(A–C) Matched tumor-normal exome sequencing and SNP array data from 10 ovarian cancer patients and 20 renal cell carcinoma patients were obtained from TCGA. Exome data were analyzed with Sequenza, and SNP array data were analyzed with ASCAT. (A) Ploidy and (B) cellularity estimates were compared between the two platforms. (C) Copy number profiles were compared by calculating the absolute difference in estimated copy number for each genomic position (ΔCN). The figure indicates the fraction of the covered genome with each level of ΔCN. Asterisks indicate tumors for which the Sequenza cellularity estimate is lower than 0.4.
(D and E) Sequenza (D) ploidy and (E) cellularity estimates from simulated whole-genome sequencing with varying cellularity for cell lines HCC1954 and HCC1143. Vertical lines indicate 95% confidence intervals on the estimates. Dashed horizontal lines indicate ploidy estimates of the same cell lines by SNP array in an independent study [4].
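The ΔCN comparison in Figure 2C can be sketched as follows. This is an illustrative Python example, not the authors' code: it assumes each copy number profile for one chromosome is a list of half-open `(start, end, copy_number)` segments, and the names `cn_at` and `delta_cn_fractions` are hypothetical. Positions covered by only one of the two profiles are skipped, so the fractions refer to the jointly covered genome.

```python
from collections import defaultdict

def cn_at(segments, pos):
    """Copy number of the segment containing pos, or None if uncovered."""
    for start, end, cn in segments:
        if start <= pos < end:
            return cn
    return None

def delta_cn_fractions(profile_a, profile_b):
    """Fraction of jointly covered positions at each |CN_a - CN_b| level."""
    # Every segment boundary from either profile is a potential change point.
    points = sorted({p for s, e, _ in list(profile_a) + list(profile_b)
                     for p in (s, e)})
    lengths = defaultdict(int)
    for left, right in zip(points, points[1:]):
        a = cn_at(profile_a, left)
        b = cn_at(profile_b, left)
        if a is None or b is None:
            continue  # interval not covered by both platforms
        lengths[abs(a - b)] += right - left
    total = sum(lengths.values())
    if total == 0:
        return {}
    return {delta: n / total for delta, n in sorted(lengths.items())}

# Toy profiles (illustrative values only).
exome_profile = [(0, 100, 2), (100, 200, 3)]
array_profile = [(0, 150, 2), (150, 200, 4)]
fractions = delta_cn_fractions(exome_profile, array_profile)
```

In this toy case the two profiles agree over half of the covered positions and differ by one copy over the other half.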

References

    1. Hudson TJ, Anderson W, Artez A, et al. International network of cancer genome projects. Nature. 2010;464(7291):993–998.
    2. Popova T, Manié E, Stoppa-Lyonnet D, et al. Genome Alteration Print (GAP): a tool to visualize and mine complex cancer genomic profiles obtained by SNP arrays. Genome Biol. 2009;10(11):R128.
    3. Van Loo P, Nordgard SH, Lingjærde OC, et al. Allele-specific copy number analysis of tumors. Proc Natl Acad Sci U S A. 2010;107(39):16910–16915.
    4. Carter SL, Cibulskis K, Helman E, et al. Absolute quantification of somatic DNA alterations in human cancer. Nat Biotechnol. 2012;30(5):413–421.
    5. Lamy P, Andersen CL, Dyrskjot L, et al. A Hidden Markov Model to estimate population mixture and allelic copy-numbers in cancers using Affymetrix SNP arrays. BMC Bioinformatics. 2007;8:434.
