New statistical tool will improve understanding of cancer genetics
One way to group patients is by clustering, which identifies subgroups of individuals with similar genetic profiles. In a good clustering, the members of each cluster are more similar to one another than to members of other clusters.
Current clustering methods have limitations, however: they do not account for the fact that people or disease characteristics may differ in how much the way genes decode, or express, genetic information fluctuates. This “heterogeneity of variance,” if not accounted for, can produce inaccurate clusters and lead to incorrect research conclusions.
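To make the idea of heterogeneity of variance concrete, here is a minimal sketch in Python. It is not the researchers' method or software. The two groups, their means and standard deviations, and the function names are all invented for illustration, and the groups are assumed to be equally common. The sketch compares a rule that looks only at distance to each group's mean with a rule that also takes each group's spread into account.

```python
import math

# Hypothetical groups for a single gene-level score: (mean, standard deviation).
# These numbers are invented for illustration; they are not from the study.
CLUSTERS = {"stable": (0.0, 1.0), "hypervariable": (4.0, 4.0)}


def nearest_mean(x, clusters):
    """Assign x to the group whose mean is closest (variance is ignored)."""
    return min(clusters, key=lambda name: abs(x - clusters[name][0]))


def log_density(x, mean, sd):
    """Log of the normal density with the given mean and standard deviation."""
    return -0.5 * math.log(2 * math.pi) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2


def most_likely(x, clusters):
    """Assign x to the group under which it has the highest normal log-density.

    Assumes equal group proportions, so no prior term is added.
    """
    return max(clusters, key=lambda name: log_density(x, *clusters[name]))


if __name__ == "__main__":
    for x in (0.5, -3.0):
        print(x, nearest_mean(x, CLUSTERS), most_likely(x, CLUSTERS))
```

With these invented numbers, a value of 0.5 goes to the stable group under both rules. A value of -3.0 is closer to the stable group's mean, yet it is three standard deviations away from it, so the variance-aware rule assigns it to the hypervariable group instead.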
Chen and colleagues developed and implemented a statistical framework that captures both mean and variance structures in genetic data. The resulting data-mining tool, which the researchers applied to synthetic (simulated) data and to two cancer data sets, identifies for the first time certain genes and cancer types that exhibit hypervariability of DNA methylation levels, and it detects clearer subgroup patterns in lung cancer. DNA methylation is a process in which methyl groups are added to certain DNA nucleotides in order to maintain healthy cell life.
“Not only is this work important scientifically,” Kosorok said, “but it is also significant that a paper appearing in a top-tier general science journal such as PNAS has a student as first author.”
The software used in the study is available online.