Scientists train a computer to classify breast cancer tumors
November 30, 2018
In a study published in the journal NPJ Breast Cancer, researchers from the University of North Carolina at Chapel Hill reported using a form of artificial intelligence called machine learning, and more specifically deep learning, to train a computer to identify certain features of breast cancer tumors from images.
The computer also could identify the tumor type based on complex molecular and genomic features, something pathologists cannot yet do from a picture alone.
The researchers believe this approach, while still in its early stages, eventually could lead to cost savings in the clinic and in breast cancer research.
Melissa Troester, PhD, professor of epidemiology at the UNC Gillings School of Global Public Health and member of the UNC Lineberger Comprehensive Cancer Center, and James Stephen Marron, PhD, professor of biostatistics at the UNC Gillings School and Amos Hawley Distinguished Professor of statistics and operations research in the College of Arts and Sciences, are co-authors of the study.
“Your smartphone can interpret your speech, and find and identify faces in a photo,” said the study’s first author Heather D. Couture, a graduate research assistant in the UNC-Chapel Hill Department of Computer Science. “We’re using similar technology by which we capture abstract properties in images, but we’re applying it to a totally different problem.”
For the study, the researchers used a set of 571 images of breast cancer tumors from the Carolina Breast Cancer Study to train the computer to classify tumors for grade, estrogen receptor status, PAM50 intrinsic subtype, histologic subtype and risk of recurrence score. To facilitate this, they created software that learned how to predict labels from images using a training set, so that new images could be processed similarly.
They then used a different set of 288 images to test the computer’s ability to distinguish features of the tumor on its own, comparing the computer’s responses to findings by a pathologist for each tumor’s grade and subtype, and to separate tests for gene expression subtypes.
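The train-then-test workflow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the researchers' software: the nearest-centroid classifier, the four-number feature vectors and every function name are hypothetical choices made for brevity, and the data are randomly generated rather than taken from the Carolina Breast Cancer Study. Only the set sizes (571 and 288) come from the article.

```python
import random


def train_centroids(features, labels):
    """Average the feature vectors of each class (nearest-centroid training)."""
    sums, counts = {}, {}
    for vec, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, vec):
    """Return the label whose centroid is closest to vec (squared Euclidean distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))
    return min(centroids, key=lambda label: dist(centroids[label]))


def make_toy_set(n, rng):
    """Synthetic stand-in for image-derived features; NOT data from the study."""
    features, labels = [], []
    for _ in range(n):
        label = rng.choice(["low-intermediate", "high"])
        shift = 1.0 if label == "high" else -1.0
        features.append([rng.gauss(shift, 1.0) for _ in range(4)])
        labels.append(label)
    return features, labels


rng = random.Random(0)
train_x, train_y = make_toy_set(571, rng)  # training set size from the article
test_x, test_y = make_toy_set(288, rng)    # held-out test set size from the article

# Learn from the training set only, then score on images the model never saw.
centroids = train_centroids(train_x, train_y)
predictions = [predict(centroids, vec) for vec in test_x]
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
print(f"held-out accuracy on toy data: {accuracy:.2f}")
```

The point of the separate test set is that the accuracy figure reflects how the model handles new cases rather than how well it memorized the examples it was trained on.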
The computer was able to distinguish low-intermediate versus high-grade tumors 82 percent of the time, they found. When two pathologists reviewed the tumor grade for the low-intermediate grade group, the pathologists agreed with each other about 89 percent of the time, which was slightly higher than the computer’s accuracy.
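Both figures in that comparison are simple percent-agreement calculations: the share of cases on which two sets of labels match. The Python sketch below shows the arithmetic on ten made-up cases. The function name and the labels are hypothetical and chosen only to make the calculation visible; they are not the study's data and do not reproduce its 82 and 89 percent results.

```python
def percent_agreement(labels_a, labels_b):
    """Percentage of cases on which two equal-length label lists match."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    if not labels_a:
        raise ValueError("label lists must not be empty")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return 100.0 * matches / len(labels_a)


# Made-up grades for ten tumors (illustrative only).
pathologist_1 = ["low", "low", "high", "high", "low", "high", "low", "low", "high", "low"]
pathologist_2 = ["low", "low", "high", "low", "low", "high", "low", "low", "high", "low"]
computer = ["low", "high", "high", "high", "low", "high", "low", "high", "high", "low"]

# Agreement between the two human readers.
inter_rater = percent_agreement(pathologist_1, pathologist_2)
# Agreement between the computer and one reader, i.e. its accuracy against that reader.
computer_accuracy = percent_agreement(computer, pathologist_1)
print(inter_rater, computer_accuracy)
```

Comparing the computer's accuracy with the agreement between two human readers gives a sense of how much disagreement is inherent in the grading task itself.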
In addition, the computer, with high levels of accuracy, identified estrogen receptor status, distinguished between ductal and lobular tumors, and determined whether each case had a high or low risk of recurrence. The computer also identified one of the molecular subtypes of breast cancers – the basal-like subtype, which is based on how genes within the tumor were expressed – with 77 percent accuracy.
“Using artificial intelligence, or machine learning, we were able to do a number of things that pathologists can do at a similar accuracy, but we were also able to do a thing or two that pathologists are not able to do today,” said UNC Lineberger’s Charles M. Perou, PhD, the May Goldman Shaw Distinguished Professor of Molecular Oncology, and professor of genetics and of pathology and laboratory medicine in the UNC School of Medicine. “This has a long way to go in terms of validation, but I think the accuracy is only going to get better as we acquire more images with which to train the computer.”
The computer’s ability to identify the basal-like subtype was exciting to researchers and could have applications in cancer research. The researchers also believe the technology could help to validate pathologists’ findings and have applications in communities that do not have pathology resources.
“We were surprised that the computer was able to get a fairly high accuracy in estimating biomarker risk just from looking at the pictures,” said Troester. “We spend thousands of dollars measuring those biomarkers using molecular tools, and this new method can take the image and get 80 percent accuracy or better at estimating the tumor phenotype or subtype. That was quite amazing.”
Couture said deep learning technology has been used in a range of applications, including speech recognition and autonomous vehicles.
“Humans can look at one or two examples of something and be able to generalize when they see other objects,” Couture said. “For example, chairs come in so many different forms, but we can recognize it as something we sit on. Computers have a much harder time generalizing from small amounts of data. But on the other hand, if you provide enough labeled data, they can learn concepts that are much more complex than humans can assess visually – such as identifying the basal-like subtype from an image alone.”
The unique aspect of their work, researchers said, was the ability to use the technology to see features of the tumors that humans cannot. The team members want to determine what the computer is seeing and study whether the technology could predict outcomes.
“The computer extracted a lot of information from the images,” Troester said. “We would like to test how well these features predict outcomes, and whether we can use these features together with things such as molecular data to give patients a more precise view of their disease course and which treatments might be most effective.”
A version of this article was posted first on the UNC Lineberger Comprehensive Cancer Center website.