New statistical tool will improve understanding of cancer genetics

July 12, 2013
An article published in the July 8 Proceedings of the National Academy of Sciences (PNAS) describes the development of a new data-mining tool that will improve researchers’ understanding of cancer genetics.
Guanhua Chen

Dr. Michael Kosorok


The paper, “Biclustering with heterogeneous variance,” is co-authored by Guanhua Chen, biostatistics doctoral student in the Gillings School of Global Public Health; Patrick Sullivan, MD, Ray M. Hayworth and Family Distinguished Professor of psychiatry, professor of genetics, and adjunct professor of epidemiology in The University of North Carolina at Chapel Hill School of Medicine and Gillings School of Global Public Health; and Michael Kosorok, PhD, W.R. Kenan Jr. Distinguished Professor and chair of biostatistics and professor of statistics and operations research.

The diagnosis and treatment of disease are improved by categorizing patients into subtypes based on a disease’s etiology and the types of therapy to which it responds. This is particularly true of cancer, which in reality is composed of several diseases. One way to group patients is by clustering, which identifies subgroups of individuals with similar genetic profiles. Clustering groups objects so that those in one cluster are more similar to each other than to objects in other clusters.

Current clustering methods have limitations, however: they do not account for the fact that people or disease characteristics may display differing magnitudes of volatility in the way genes decode, or express, genetic information. This “heterogeneity of variance,” if not accounted for, can lead to inaccurate cluster sets and incorrect research conclusions.
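To make the idea of heterogeneous variance concrete, the short Python sketch below simulates two hypothetical patient groups whose measurements share the same average but differ sharply in volatility. It is only an illustration under assumed numbers (the group sizes, means, and standard deviations are invented for the example) and is not the method described in the paper. A grouping rule that looks only at averages would see the two groups as nearly identical, while their variances tell them apart.

```python
import random
import statistics


def simulate_groups(n_per_group=500, seed=0):
    """Simulate two hypothetical groups with equal means but unequal spread.

    All numbers here are illustrative assumptions, not values from the study.
    """
    rng = random.Random(seed)
    # "Stable" group: mean 0, standard deviation 1.
    stable = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    # "Volatile" group: same mean 0, but standard deviation 4.
    volatile = [rng.gauss(0.0, 4.0) for _ in range(n_per_group)]
    return stable, volatile


def summarize(values):
    """Return the sample mean and sample variance of a list of numbers."""
    return statistics.mean(values), statistics.variance(values)


stable, volatile = simulate_groups()
mean_stable, var_stable = summarize(stable)
mean_volatile, var_volatile = summarize(volatile)

# The means are close together, so a mean-only comparison sees little difference.
print(f"stable:   mean={mean_stable:.2f}  variance={var_stable:.2f}")
# The variances differ widely, which is the signal a mean-only method misses.
print(f"volatile: mean={mean_volatile:.2f}  variance={var_volatile:.2f}")
```

In this toy setting the difference between the groups lives entirely in the variance, which is the kind of structure a framework modeling both mean and variance is meant to capture.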

Chen and colleagues developed and implemented a statistical framework that captures both mean and variance structures in genetic data. The resulting data-mining tool, which the researchers applied both to synthetic (simulated) data and to two cancer data sets, identifies for the first time certain genes and cancer types that exhibit hypervariability of DNA methylation levels and detects clearer subgroup patterns in lung cancer. DNA methylation is a process in which methyl groups are added to certain DNA nucleotides in order to maintain healthy cell life.

“Not only is this work important scientifically,” Kosorok said, “but it is also significant that a paper appearing in a top-tier general science journal such as PNAS has a student as first author.”

The software used in the study is available online.


Gillings School of Global Public Health contact: David Pesci, director of communications, (919) 962-2600 or