Novel biostatistics methodology yields more efficient design and analysis of genomic studies

July 17, 2013
Gillings School of Global Public Health biostatisticians have developed a novel approach to analyzing genetic traits in large cohorts. The technique allows researchers to correctly evaluate the associations between genetic variants and disease traits when only the subjects with the highest or lowest trait values were selected in a sequencing study, thereby leading to improved genetic understanding and treatment of disease.

Dr. Danyu Lin

Dr. Donglin Zeng


Co-authors are Danyu Lin, PhD, Dennis Gillings Distinguished Professor; Donglin Zeng, PhD, professor; and Zhengzheng Tang, doctoral student, all in the Gillings School’s Department of Biostatistics.

Their research, “Quantitative trait analysis in sequencing studies under trait-dependent sampling,” was published online July 11 in the Proceedings of the National Academy of Sciences (PNAS).

When examining genetic traits in a large cohort, it is not economically feasible to sequence all cohort members. A cost-effective strategy is to select only the subjects who have extreme trait values. Not accounting for such “trait-dependent” sampling in the association analysis would substantially increase false-positive results and reduce true-positive results. Lin and colleagues developed valid and efficient methods for such analysis.
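The effect of ignoring trait-dependent sampling can be illustrated with a toy simulation. The sketch below is not the authors' method; it only shows how a naive regression fitted to extreme-trait subjects can overstate a variant's effect. The cohort size, allele frequency, effect size and tail fraction are all assumed values chosen for illustration, and the function names are hypothetical.

```python
import random
import statistics


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n=20000, maf=0.3, beta=0.5, tail=0.05, seed=1):
    """Compare the naive slope in the full cohort and in the extremes.

    All parameter values are illustrative assumptions, not study data.
    """
    rng = random.Random(seed)
    # Genotype: number of minor alleles (0, 1 or 2) for each subject.
    geno = [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]
    # Quantitative trait: linear genetic effect plus standard normal noise.
    trait = [beta * g + rng.gauss(0, 1) for g in geno]

    # Trait-dependent sampling: keep only the lowest and highest tails.
    order = sorted(range(n), key=lambda i: trait[i])
    k = int(n * tail)
    chosen = order[:k] + order[-k:]

    full = ols_slope(geno, trait)
    selected = ols_slope([geno[i] for i in chosen],
                         [trait[i] for i in chosen])
    return full, selected


full_slope, selected_slope = simulate()
print(f"full cohort slope:      {full_slope:.2f}")
print(f"extreme-sample slope:   {selected_slope:.2f}")
```

In this toy setting the slope estimated from the full cohort stays close to the simulated effect, while the slope estimated from the extremes alone is noticeably larger, which is why an analysis has to account for how subjects were chosen.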

The authors applied their methodology to data from the National Heart, Lung and Blood Institute’s Exome Sequencing Project (ESP), a signature initiative funded by the National Institutes of Health through the American Recovery and Reinvestment Act.
The NHLBI ESP was designed to identify genetic variants in all protein-coding regions of the human genome that are associated with heart, lung and blood diseases. Subjects were drawn from multiple large cohorts, including the Atherosclerosis Risk in Communities study, Cardiovascular Health Study, Women’s Health Initiative, Framingham Heart Study and Jackson Heart Study. From approximately 150,000 cohort members, investigators selected about 3,000 subjects with the lowest or highest values of body mass index, LDL cholesterol level or blood pressure.
A genome is an organism’s complete set of DNA. The exome, the protein-coding portion that makes up about one percent of the genome, is thought to contain about 85 percent of the mutations that cause disease.
The study is available on the PNAS website.


Gillings School of Global Public Health contact: David Pesci, director of communications, (919) 962-2600 or