
keywords: ml, interpretable, transparent, explainable
Alzheimer’s Disease (AD), an irreversible neurodegenerative disease, affects over 55 million people globally. The understanding of key genes responsible for AD remains incomplete. This study presents the Reverse-Gene-Finder technology, a neuron-to-gene-token backtracking approach in a neural network architecture that aims to elucidate novel causal genetic biomarkers and the underlying mechanisms driving AD. Leveraging recent advancements in pre-trained foundation models and large-scale genomic data, this approach identifies previously unknown genetic candidates contributing to AD onset. The Reverse-Gene-Finder technology comprises three key innovations. Firstly, we exploit the observation that the genes with the highest probability of causing AD, defined as the most causal genes (MCGs), must have the highest probability of activating the neurons (in the neural network architecture) with the highest probability of causing AD, defined as the most causal neurons (MCNs). Secondly, we use a gene token representation at the input layer, where each gene is represented by a gene token, so that when we backtrack from the MCNs to the input layer, we can identify the most causal tokens (MCTs), that is, the tokens most likely to cause AD. Lastly, in contrast to existing neural network architectures, which track neuron activations from the input layer to the output layer in a feed-forward manner, we develop a backtracking method that tracks backwards from the MCNs to the input layer, identifying the MCTs and, consequently, the corresponding MCGs. Our Reverse-Gene-Finder approach proceeds in three stages. First, we fine-tune a Gene-former model specifically for AD classification, using single-cell gene expression data from AD patients and healthy control subjects. Second, we mask out of the input data the genes known to be strongly associated with AD and systematically identify the MCNs related to AD, employing causal tracing techniques to observe the effects of these in-silico perturbations on neurons across different layers of the fine-tuned Gene-former. Finally, our Reverse-Gene-Finder backtracking method identifies the MCTs (and the MCGs they represent) most likely to activate the MCNs. By using gene tokens instead of genes as inputs, our approach discovers previously unknown genetic candidates that contribute significantly to AD, offering fresh insights into the genetic mechanisms underlying the disease. The Reverse-Gene-Finder technology offers a highly interpretable, generalizable, and adaptable framework, providing a promising avenue for application in other disease scenarios to uncover novel causal genetic biomarkers and underlying disease mechanisms.
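The second and third stages described above (masking known genes to locate MCNs, then backtracking from the MCNs to gene tokens) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the scoring rules, so everything here is an assumption made for illustration. A single hidden layer over summed token embeddings stands in for the fine-tuned model, and the names `EMBED`, `HIDDEN_W`, `GENE_X`, `GENE_Y`, `GENE_Z`, and all weights are hypothetical. `APOE` and `APP` serve only as example "known" genes.

```python
# Toy sketch only: the model, weights, embeddings, and GENE_* names are hypothetical.
KNOWN_AD_GENES = {"APOE", "APP"}  # example genes treated as "known" here

# Hypothetical 2-dimensional embedding for each gene token.
EMBED = {
    "APOE":   [1.0, 0.0],
    "APP":    [0.9, 0.1],
    "GENE_X": [0.8, 0.0],
    "GENE_Y": [0.0, 1.0],
    "GENE_Z": [0.1, 0.9],
}

# Hypothetical hidden layer: one weight row per neuron.
HIDDEN_W = [
    [1.0, 0.0],  # neuron 0
    [0.0, 1.0],  # neuron 1
    [0.5, 0.5],  # neuron 2
]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def hidden_activations(tokens):
    """ReLU activations of each neuron for the summed token embeddings."""
    dim = len(HIDDEN_W[0])
    pooled = [sum(EMBED[t][d] for t in tokens) for d in range(dim)]
    return [max(0.0, dot(row, pooled)) for row in HIDDEN_W]


def most_causal_neurons(tokens, known, top_k=1):
    """Rank neurons by how much masking the known genes changes their activation."""
    clean = hidden_activations(tokens)
    masked = hidden_activations([t for t in tokens if t not in known])
    effect = [abs(c - m) for c, m in zip(clean, masked)]
    order = sorted(range(len(effect)), key=lambda j: effect[j], reverse=True)
    return order[:top_k]


def backtrack_tokens(tokens, known, mcns):
    """Rank the remaining tokens by their direct contribution to the MCNs."""
    scores = {
        t: sum(dot(HIDDEN_W[j], EMBED[t]) for j in mcns)
        for t in tokens
        if t not in known
    }
    return sorted(scores, key=scores.get, reverse=True)


cell = ["APOE", "APP", "GENE_X", "GENE_Y", "GENE_Z"]
mcns = most_causal_neurons(cell, KNOWN_AD_GENES)
ranked = backtrack_tokens(cell, KNOWN_AD_GENES, mcns)
print("MCNs:", mcns)
print("Candidate tokens, most causal first:", ranked)
```

In this toy setting, the token whose embedding drives the same neuron as the masked genes ranks first. A real transformer would require tracing through attention and multiple layers rather than a single linear map.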