Neural nets analyze gene expression

Computer scientists at Carnegie Mellon University have taken a deep learning method commonly used for facial recognition and other image-based applications and applied it to the study of gene relationships. The scientists represented large quantities of gene expression data in a graphical form that can be analyzed by convolutional neural networks (CNNs). Because CNNs excel at analyzing visual imagery, they can be used to make inferences about interactions between genes. According to the researchers, the approach outperformed existing methods for inferring gene relationships.

A report on how CNNs can help identify disease-related genes, as well as genetic and developmental pathways that drugs might target, was published on Dec. 10, 2019, in the Proceedings of the National Academy of Sciences. Ziv Bar-Joseph, a professor of machine learning and computational biology at Carnegie Mellon, explained that the new method, called CNNC, has applications beyond gene interactions. Bar-Joseph and Ye Yuan, a postdoctoral researcher in Carnegie Mellon’s Machine Learning Department, co-authored the paper.

“CNNs, which were developed a decade ago, are revolutionary. I'm still in awe of Google Photos, which uses them for facial recognition,” Bar-Joseph said in a press release. “We sometimes take this technology for granted because we use it all the time. But it's incredibly powerful and is not restricted to images. It's all a matter of how you represent your data.”

To fully comprehend human development or disease, researchers must find out how genes work with one another in networks and complexes, because the roughly 20,000 human genes function in concert. These relationships can be inferred by observing gene expression, which represents gene activity levels in cells. According to Yuan, two genes being active at the same time is generally a sign that they are interacting. However, the co-activity could be mere coincidence, or both genes could be activated by a third gene.

Yuan and Bar-Joseph trained CNNs on single-cell expression data to analyze gene relationships. Single-cell expression data come from experiments that can discern the activity level of all genes in a given cell. The data from hundreds of thousands of these single-cell analyses were organized into a matrix, or histogram, in which each entry represents a different level of co-expression for a pair of genes. Represented this way, the data more closely resemble an image, which is the kind of input CNNs analyze well. The researchers used data from genes with already-established interactions to train the CNNs to distinguish, on the basis of the matrix’s visual patterns, gene pairs that were interacting from those that were not.
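To make the matrix representation concrete, the following is a minimal sketch of how the expression levels of one gene pair across many cells might be binned into an image-like histogram. It is not the researchers' code: the function name coexpression_histogram, the 32 bins per axis, the log transform and the simulated counts are all assumptions chosen for illustration.

```python
import numpy as np


def coexpression_histogram(expr_a, expr_b, bins=32):
    """Bin the joint expression of two genes across cells into a 2D histogram.

    expr_a, expr_b: one expression value per cell for each gene.
    bins: bins per axis (32 is an illustrative assumption).
    """
    # log1p compresses the long right tail of count data
    # (an assumed preprocessing step, not taken from the article).
    a = np.log1p(np.asarray(expr_a, dtype=float))
    b = np.log1p(np.asarray(expr_b, dtype=float))

    # Each entry counts the cells whose (gene A, gene B) levels fall in that bin.
    hist, _, _ = np.histogram2d(a, b, bins=bins)

    # Normalize so the entries sum to 1, making the matrix independent
    # of how many cells were measured.
    total = hist.sum()
    return hist / total if total > 0 else hist


# Simulated counts for 1,000 cells (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
gene_a = rng.poisson(5.0, size=1000)
gene_b = gene_a + rng.poisson(1.0, size=1000)  # gene B loosely tracks gene A

image = coexpression_histogram(gene_a, gene_b)
print(image.shape)
```

A matrix like this, one per gene pair, is the kind of image-like input a CNN classifier could then be trained on, with known interacting and non-interacting pairs supplying the labels.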

“It's very, very hard to distinguish between causality and correlation,” Yuan commented. While the CNNC method’s effectiveness is limited by the quantity of gene expression data available, it proved statistically more accurate than existing methods. Yuan and Bar-Joseph expect CNNC to be one of several techniques that researchers will eventually deploy in analyzing large datasets. In the future, CNNC could be used to discern causality in all sorts of phenomena, from social networking to financial data.