Accelerating the discovery of disease mechanisms through deep learning and high-dimensional data analysis
Synopsis
The most critical next step in advancing the understanding of disease is to discover the underlying mechanisms that give rise to the large number of symptoms, the sea of biomarkers, and the millions of combinations of perturbed variables observed across individual patients. For most of the 6,000 diseases at issue, medical scientists and practitioners have very limited knowledge of, and few hypotheses about, the mechanisms and pathways involved. Studies of disease mechanisms are usually conducted independently for a single disease, or consider only a few diseases. Deep learning models and high-dimensional data analysis, which can integrate large volumes of multi-omics, clinical, lifestyle and other heterogeneous data, have drawn attention as a way to accelerate the discovery of mechanisms across multiple diseases.
Research in this area will be critical for progress, given the complexity and variability of diseases. Diseases are usually brought about by perturbations of different types, including genetic, environmental and lifestyle factors, which alter biochemical reactions and perturb pathways, and in turn give rise to changes in symptoms and biomarkers. A further difficulty is that a large number of diseases are complex diseases caused by several different factors. Different factors may cause similar damage to the same organs and cells, or generate similar symptoms and biomarkers; conversely, a single elementary perturbation may manifest in many different ways. In medical data, for most diseases only the disease label and a limited number of simple treatment experiments on a single disease are available, which is far from enough to infer the mechanism. Understanding the mechanism therefore remains an important yet challenging task, and the lack of it stands in the way of breakthrough cures for these medically obscure diseases.
High-dimensional data, which record the behavior of biological systems under different conditions or after an intervention, have become very common in biological research. A critical requirement is to infer the underlying biological mechanisms or models that explain the observed behavior. A large number of methodologies have been proposed to analyze high-dimensional data and infer such models. Most existing methods focus on low-dimensional, parsimonious models, such as ordinary differential equations, stochastic differential equations and Gaussian processes. These models are too rigid to represent most phenomena of biological systems, and as data volumes grow rapidly they cannot satisfactorily explain the complex non-linear behavior of those systems.
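The rigidity of a parsimonious model can be illustrated with a small sketch. Everything in it is an assumption made for illustration only: the synthetic damped-oscillation trajectory, the one-parameter decay equation, and the radial-basis regression used as a stand-in for a more flexible model. It is not the methodology proposed here, and it uses no real biological data.

```python
import numpy as np

# Illustrative "observed" trajectory (synthetic, not from any study):
# a damped oscillation standing in for non-linear system behavior.
t = np.linspace(0.0, 10.0, 200)
x_obs = np.exp(-0.3 * t) * np.cos(2.0 * t)

# Parsimonious model: dx/dt = -k * x with x(0) = 1, i.e. x(t) = exp(-k * t).
# The single rate k is chosen by grid search on the mean squared error.
k_grid = np.linspace(0.01, 10.0, 1000)
errors = [np.mean((np.exp(-k * t) - x_obs) ** 2) for k in k_grid]
rigid_mse = float(min(errors))

# More flexible stand-in (hypothetical choice): linear least squares
# on Gaussian radial basis features spread over the time axis.
centres = np.linspace(0.0, 10.0, 30)
features = np.exp(-((t[:, None] - centres[None, :]) ** 2) / (2 * 0.5 ** 2))
weights, *_ = np.linalg.lstsq(features, x_obs, rcond=None)
flex_mse = float(np.mean((features @ weights - x_obs) ** 2))

print(f"one-parameter ODE fit MSE: {rigid_mse:.4f}")
print(f"radial-basis fit MSE:      {flex_mse:.2e}")
```

In this toy setting the one-parameter model cannot follow the oscillation however its rate is tuned, while the flexible fit tracks the trajectory closely. The flexible fit does not by itself reveal a mechanism, which is the gap that motivates models that are both expressive and interpretable.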
The research goal is to propose new, more flexible and powerful models and methodologies that automatically recover the mechanisms underlying high-dimensional data with little prior knowledge of the system or the data. Because of its crucial role in improving human health and welfare, this problem is a perennial focus of interdisciplinary collaboration and attracts an enormous amount of attention and research.