Hope in the Dark: How AI is Unlocking New Possibilities for Rare Disease Patients

Rare diseases affect a significant portion of the population: more than 7,000 rare diseases collectively affect over 400 million people worldwide. The majority of these diseases fall under the category of neurodevelopmental disorders (NDDs). Patients with rare diseases face numerous challenges, including misdiagnosis, delayed diagnosis, and a lack of treatment options, all exacerbated by the limited data and information available about these diseases. The growing availability of open biomedical data has created opportunities for machine learning methods to help retrieve valuable insights. However, this information poses its own challenges, including how the data is represented and how accessible it is to healthcare professionals and to disease or parent communities.

In this post, we discuss how the digitalisation of the healthcare sector can help address these challenges. AI-based technologies offer a promising way to handle the vast amounts of biomedical data available while also coping with the particular difficulties of rare disease data. By combining machine learning and interactive data visualisation, the framework we present here enables the prioritisation and exploration of rare disease-related documents. It also gives healthcare professionals a tool for deepening their understanding of rare diseases, which can lead to improved diagnosis and treatment options. Throughout, we highlight the potential of machine learning to aid the analysis and understanding of rare diseases and their associated biomedical data.

Machine learning-based text mining technologies present a unique opportunity to overcome these obstacles by providing more comprehensive and accessible information about rare diseases. Language is one major barrier to accessing information, particularly for disease-driven communities worldwide, and text mining technologies can lower it by supporting information retrieval and interaction in multiple languages. The AI framework that we have developed over the course of several European Commission projects is a good example of such an effort: it allows users to explore best practices across worldwide news and published science, and it is accessible to anyone comfortable with the well-established PubMed search engine. It uses the MEDLINE dataset and its controlled vocabulary, the MeSH Headings, to annotate and analyse rare disease-related documents. By automatically annotating text documents with MeSH Headings and extending their metadata, the system enables the exploration of rare disease data from sources such as news and scientific publications through interactive data visualisation.
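As a rough illustration of how MeSH Headings make MEDLINE searchable, the sketch below builds a PubMed query URL for a single heading using NCBI's public E-utilities `esearch` endpoint. It is not part of the framework described in this post: the helper name `pubmed_search_url` and its parameters are hypothetical, and the function only constructs the URL without calling the service.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(mesh_heading, max_year=None, retmax=20):
    """Build (but do not send) a PubMed search for one MeSH Heading.

    Hypothetical helper for illustration only.
    """
    # The [MeSH Terms] field tag restricts the match to the indexed headings.
    params = {
        "db": "pubmed",
        "term": f'"{mesh_heading}"[MeSH Terms]',
        "retmode": "json",
        "retmax": retmax,
    }
    if max_year is not None:
        # mindate and maxdate must be supplied together; "pdat" filters
        # on publication date. The lower bound 1900 is an arbitrary choice.
        params.update({"datetype": "pdat", "mindate": "1900", "maxdate": str(max_year)})
    return f"{ESEARCH}?{urlencode(params)}"


url = pubmed_search_url("Rett Syndrome", max_year=2022)
print(url)
```

Fetching the resulting URL with any HTTP client should return a JSON document listing the matching PubMed identifiers, which is one simple way to gauge how much indexed literature exists for a given disease.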

To assess the first results and identify the bottlenecks and challenges ahead, we focus on 16 representative rare monogenic neurodevelopmental disorders. The literature review is usually a demanding first stage in healthcare research, mostly because of the fast pace of scientific publication (this was one of the main takeaways about the usefulness of our work from discussions with the stakeholders of the MIDAS project on Big Data Health, earlier discussed in this post). We have used the labelled data to train text mining algorithms and build an automated MeSH classifier capable of assigning those medical categories to any snippet of text, from health records to posts in disease community discussion forums. We expect this contribution to support the retrieval of best practices and insights from past publications and experiences, enriching the common body of knowledge on the targeted rare diseases.
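To make the idea of a MeSH classifier concrete, here is a minimal sketch of multi-label text annotation. It is not the model used in our framework: it swaps in a deliberately simple nearest-centroid scorer over TF-IDF vectors, written with the Python standard library only. The training snippets are invented for illustration, and the function names and the 0.4 threshold are assumptions.

```python
import math
import re
from collections import Counter, defaultdict

# Toy labelled snippets, invented for illustration (not real MEDLINE records).
TRAINING = [
    ("MECP2 mutations cause Rett syndrome in girls", {"Rett Syndrome", "Mutation"}),
    ("Hand stereotypies and regression in Rett syndrome", {"Rett Syndrome"}),
    ("Hyperphagia and obesity in Prader-Willi syndrome", {"Prader-Willi Syndrome", "Obesity"}),
    ("Growth hormone therapy for Prader-Willi syndrome", {"Prader-Willi Syndrome"}),
    ("FMR1 repeat expansion mutation in fragile X syndrome", {"Fragile X Syndrome", "Mutation"}),
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(texts):
    """Smoothed inverse document frequency for every token seen in training."""
    doc_freq = Counter()
    for text in texts:
        doc_freq.update(set(tokenize(text)))
    n_docs = len(texts)
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def vectorize(text, idf):
    """TF-IDF weights for known tokens; unseen tokens are ignored."""
    counts = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def train(examples):
    """Average the TF-IDF vectors of each heading's examples into a centroid."""
    idf = fit_idf([text for text, _ in examples])
    sums, sizes = defaultdict(Counter), Counter()
    for text, headings in examples:
        vec = vectorize(text, idf)
        for heading in headings:
            sums[heading].update(vec)  # adds the weights term by term
            sizes[heading] += 1
    centroids = {h: {t: w / sizes[h] for t, w in s.items()} for h, s in sums.items()}
    return idf, centroids


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def score(text, model):
    """Similarity of the text to every heading's centroid."""
    idf, centroids = model
    vec = vectorize(text, idf)
    return {h: cosine(vec, c) for h, c in centroids.items()}


def annotate(text, model, threshold=0.4):
    """Return every heading whose similarity reaches the (assumed) threshold."""
    return sorted(h for h, s in score(text, model).items() if s >= threshold)


MODEL = train(TRAINING)
print(annotate("My daughter was diagnosed with Rett syndrome last year", MODEL))
```

Because each heading is scored independently, a snippet can receive several headings or none at all, which is the behaviour an annotator needs for free text such as forum posts. A production system would be trained on far more data and would likely use a stronger model.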

This work presents challenges quite different from those in the study of more prevalent diseases such as cancer or diabetes. A preliminary analysis, to be published next month, shows that most of the 16 targeted diseases have low coverage in news and scientific articles. The most exposed are Prader-Willi syndrome, Rett syndrome and Fragile X syndrome, each with around a thousand news articles in 2022 and 2,500 scientific articles annotated in MEDLINE up to 2022. The early results of this study will be presented this June at the Kleefstra Syndrome Conference, organised by the NGO iDefine Europe, which approaches these topics from the perspective of the parenting community and interacts remarkably closely with the healthcare research community. I will also be presenting this work at the workshop on Semantics-Powered Data Mining and Analytics, co-located with #AIMed2023, the major research conference on AI and Medicine.

Figure - Literature review on rare diseases across languages in worldwide news, published science and accepted patents

In recent years, AI and machine learning technologies have also been applied to another research direction: rare disease diagnosis, primarily through imaging-based approaches. However, the volume and heterogeneity of biomedical publications remain a significant obstacle to using AI in the diagnosis and treatment of rare diseases. Text mining technologies can help overcome these obstacles, giving patients and healthcare providers more comprehensive and accessible information. Our approach can also support the development of a real-time recommendation system focused on the challenges of rare diseases, drawing on a substantial body of historical data about the target diseases: data ingested from publications, news, health records and guidelines, as well as data crowdsourced by patient communities and related non-governmental organisations.

While the healthcare domain has traditionally been slow to adopt new technologies, recent successes with AI in other industries have paved the way for innovation in healthcare. Evidence-based decision-making is critical to healthcare, and AI technologies can help by providing more comprehensive and accurate information about rare diseases. Where cohorts are small and data is limited, as is typical for rare diseases, AI can make a significant impact on diagnosis and treatment. Overall, machine learning-based text mining presents a promising opportunity for the healthcare industry to improve the diagnosis and treatment of rare diseases. By overcoming language barriers and opening access to more comprehensive information, AI can help bridge the gap between rare disease patients and the limited resources available to them. This is definitely a battle worth fighting, don't you think? Please share your comments here, along with your views on how other AI-related developments could help overcome the many obstacles facing rare disease communities.