MIT Solve

Solution Overview

Solution Name:

Congenica

One-line solution summary:

Delivery of effective personalized rare disease therapies in an equitable manner to underrepresented communities

Pitch your solution.

Rare disease is recognized globally as a public-health priority because the diagnostic odyssey can be lengthy and costly for many patients. For many individuals from underrepresented minorities, this odyssey is even more arduous or never begins.

Congenica’s scalable end-to-end software is used globally to analyse and interpret next-generation sequencing data, significantly shortening the diagnostic odyssey for patients affected by rare diseases.

If properly supported, we can provide a free-of-charge diagnostic solution for thousands of people from disadvantaged groups affected by rare conditions. The newly collected data will also play a key role in identifying ancestry bias in our A.I.-engine and generating explanatory DNA variant lists that minimize the bias towards underrepresented populations.

This will create a virtuous data system that can increase the diagnostic rate across ethnic minorities and mitigate the injustice in healthcare access for disadvantaged groups.

Film your elevator pitch.

What specific problem are you solving?

Rare diseases affect more than 400 million people globally, and although scientists are sequencing millions of human genomes in the hope of revolutionizing healthcare, most participants in these studies are of European descent. It is well known that rare disease genes show significant ethnic-specific variation, and this strong bias in the data will lead to missed diagnoses in underrepresented ethnicities.

As an example:

300,000 infants are born with sickle cell anemia annually with higher prevalence in Africans, Hispanics, and Native Americans than in Caucasians
Keloid formation has been linked to genes predominant within African ancestries with an incidence of 6%-16% in this population
Familial hypercholesterolemia affects 1 in 250 people, and its prevalence differs across ethnic/racial groups

However, even for rare diseases that occur across all populations, significant disparities remain. Ethnic minorities are likely to face substantial barriers to screening, diagnosis, and treatment due to a variety of cultural and socioeconomic factors.

With this grant we would like to provide free access to our platform for the analysis and diagnosis of thousands of ethnic minority patients. In doing so, we will collect new data that will help us improve our diagnostic models so that they account for the ancestry heterogeneity of rare diseases.

What is your solution?

Congenica’s unique heritage from the Wellcome Trust Sanger Institute and Great Ormond Street Hospital has played a key role in developing our pioneering clinical decision support platform for rare diseases. More than 150 institutes around the world use our platform, and we have been selected as the exclusive partner of the UK NHS Genomic Medicine Service. We have recently developed and validated an innovative, interpretable machine learning framework that can significantly increase diagnostic yield.

Our solution aims to select a group of institutes around the world that will be able to use our platform free of charge for the diagnosis of ethnic minority patients. Candidate institutes will be selected by our Patient Advocacy and Engagement Advisory Board with the help of the International Rare Disease Research Consortium (IRDiRC). These institutes will support the advancement of rare disease diagnosis in underrepresented populations by collecting new data from these minorities. These data will be used to identify and tackle any ancestry bias in our A.I. engine, validate and improve our existing clinical data-packs (disease-specific variant lists), and develop new ones for specific diseases of interest. These data-packs will be continuously refined to improve the diagnosis and treatment of rare diseases across all ancestries.

Who does your solution serve, and in what ways will the solution impact their lives?

There are more than 7,000 recognised rare diseases affecting 400 million people, that is, 6-8% of the world population. Most rare diseases (~80%) have genetic associations and are often severely debilitating, impairing physical and mental abilities and shortening life expectancy.

Despite advances in rare disease diagnosis, the benefits are distributed inequitably across diverse populations. Rare variants tend to be population-specific, yet studies too often exclude the ethnic minority samples needed to produce diagnostic models that are fair across all ancestries. For example, less than 3% of the participants in published genome-wide association studies are of African, Hispanic, or Latin American ancestry, and 86% of clinical trial participants are white.

The lack of diversity in research and clinical studies is not the only problem. Individuals in underserved rare disease communities can face significant barriers to care, often before they even enter the healthcare system. For example, an estimated 370-500 million Indigenous people cannot access diagnostic applications because of linguistic and socio-economic barriers [1]. In addition, diverse communities are more likely to experience delays in care and to be hospitalized for preventable conditions. It has been estimated that in the US, if non-Hispanic Black people had the same rate of preventable hospitalizations as non-Hispanic white people, they would have had approximately 430,000 fewer hospitalizations [2].

Congenica is a member of the International Rare Disease Research Consortium (IRDiRC), and we have always prioritized timely and correct diagnosis for all. To this end, we are actively involved in initiatives to understand how differences in culture, religion, or ethnicity affect access to genome testing around the world. One example is our work with the International Cerebral Palsy Genomics Consortium (ICPGC) in the Aboriginal community; we are also exploring this in Malaysia, South Africa, India, and Muslim communities. We will extend this work to as many countries and communities as we can.

The proposed solution has great potential: it can significantly improve the diagnostic rate among ethnic minorities and, in doing so, make genomic testing more acceptable in these underrepresented communities. In addition, it will provide us with a wealth of information and data to further develop our A.I. frameworks, detecting significant differences between ethnic groups not only in disease patterns but also, via validated clinical data-packs, in responses to therapy.

Which dimension of the Challenge does your solution most closely address?

Address the unjust and disproportionate burden of rare diseases faced by disinvested communities and historically underrepresented identity groups

Explain how the problem you are addressing, the solution you have designed, and the population you are serving align with the Challenge.

Our solution uses Congenica’s unique capabilities to address the important problem of inequity in healthcare access for minority groups, with a focus on rare diseases. In the short term, we will provide free-of-charge access to our pioneering platform, which will help clinicians reach an accurate diagnosis for thousands of ethnic minority patients affected by rare conditions. In the medium to long term, we want to create validated lists of rare disease-specific variants and their clinical evidence (clinical data-packs) to provide effective personalized treatments regardless of the social and racial background of the patients.

In what city, town, or region is your solution team headquartered?

Cambridge, UK

What is your solution’s stage of development?

Scale: A sustainable enterprise working in several communities or countries that is looking to scale significantly, focusing on increased efficiency.

Explain why you selected this stage of development for your solution.

Congenica provides the world’s leading software for the analysis and interpretation of next-generation sequencing data, and our Patient Advocacy and Engagement initiative sits at the core of the company to ensure that all our products serve patients as well as they can. Our pioneering, scalable end-to-end platform is currently used by more than 150 institutes and national programs across 5 continents and has led to the publication of more than 30 articles in peer-reviewed journals.

Who is the Team Lead for your solution?

Sandro Morganella

More About Your Solution

If your solution has a website or an app, provide the links here:

https://www.congenica.com/

If you have additional video content that explains your solution, provide a YouTube or Vimeo link here:

Which of the following categories best describes your solution?

A new application of an existing technology

What makes your solution innovative?

Disease-specific variant lists (clinical data-packs) can be used to identify personalized therapies. However, these variant lists are currently generated using time-consuming, static approaches that are constrained by the number of known pathogenic variants in repositories largely derived from individuals of European ancestry.

Our innovative solution overcomes these limitations by using a data self-sufficient approach that is scalable, dynamic, and emphasises the heterogeneity of rare diseases across different ethnicities.

Our solution implements a virtuous feedback loop in which the collection of data from, and diagnosis of, ethnic minority patients is followed by a reinforcement step that uses these new data to improve our existing A.I. engine and data-packs. The new diagnostic models will replace the old ones in our platform and will be used both for future diagnoses and for reanalysis of previously undiagnosed cases.
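The order of steps in this loop can be illustrated with a deliberately simplified sketch. It assumes the A.I. engine can be reduced to a set of variants confirmed as diagnostic; the names Case, KnowledgeBase, and feedback_round are hypothetical and are not part of Congenica's platform. It only shows the sequence: diagnose new cases, replace the model with an updated one, then reanalyse previously undiagnosed cases.

```python
# Toy sketch of a diagnose -> learn -> reanalyse feedback loop.
# Hypothetical: a set of confirmed diagnostic variants stands in for the A.I. engine.
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Case:
    case_id: str
    candidate_variants: set[str]
    diagnosed_variant: str | None = None


@dataclass
class KnowledgeBase:
    # Variants confirmed as diagnostic so far (stand-in for a trained model).
    known_pathogenic: set[str] = field(default_factory=set)

    def diagnose(self, case: Case) -> str | None:
        """Return a known diagnostic variant among the case's candidates, if any."""
        hits = sorted(case.candidate_variants & self.known_pathogenic)
        return hits[0] if hits else None

    def update(self, confirmed: set[str]) -> KnowledgeBase:
        """Build a new knowledge base that replaces the old one."""
        return KnowledgeBase(self.known_pathogenic | confirmed)


def feedback_round(kb: KnowledgeBase, new_cases: list[Case],
                   confirmed: set[str], undiagnosed: list[Case]) -> KnowledgeBase:
    """Diagnose new cases, learn from confirmed variants, reanalyse unsolved cases."""
    for case in new_cases:
        case.diagnosed_variant = kb.diagnose(case)
    kb = kb.update(confirmed)  # reinforcement step
    for case in undiagnosed:  # reanalysis with the updated knowledge
        if case.diagnosed_variant is None:
            case.diagnosed_variant = kb.diagnose(case)
    return kb


if __name__ == "__main__":
    kb = KnowledgeBase({"VAR_A"})
    old = Case("old-1", {"VAR_B", "VAR_C"})  # previously undiagnosed
    new = Case("new-1", {"VAR_A", "VAR_D"})
    kb = feedback_round(kb, [new], {"VAR_B"}, [old])
    print(new.diagnosed_variant, old.diagnosed_variant)  # VAR_A VAR_B
```

In a real system the update step would retrain a model rather than extend a set, but the key design point carries over: reanalysis runs only after the updated model has replaced the old one.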

At the core of our reinforcement step is our pioneering, highly scalable, and fully interpretable machine learning framework that:

Is trained on our unique, curated dataset spanning more than 50 rare disease areas and addresses the problem of ethnicity bias via ad-hoc features
Can accurately predict the pathogenicity of rare variants that may not have been seen before in patients (including variants of uncertain significance, which represent a major problem for clinical decision-making), removing the bottleneck of providing predictions only for known variants
Can generate predictions for thousands of patients in a few hours, which, when fully integrated with our end-to-end platform, allows clinical data-packs to be generated

Describe the core technology that powers your solution.

Our solution is supported by two technologies: our end-to-end platform for the analysis and interpretation of next-generation sequencing (NGS) data, and our new A.I. engine for the characterization of causative rare variants.

Provide evidence that this technology works. Please cite your sources.

Our end-to-end platform is currently used across more than 150 institutes and national programs around the world and has led to the publication of several articles [1]. One notable exclusive collaboration is with Genomics England, which has been successful in establishing the world’s first national health service to offer whole genome sequencing, the UK NHS Genomic Medicine Service. Congenica software has enabled Genomics England to achieve a 50% increase in diagnostic yield and a 95% reduction in interpretation times while delivering over 2,700 whole genomes per week [2]. Here is a video with a demo of our product.

We have successfully validated our pioneering A.I. solution for the characterization of rare diagnostic variants on both internal and external cohorts [unpublished data]. We showed that we could predict the pathogenicity of rare variants with 96% accuracy and correctly reclassify 92.8% of variants that our clinical team had previously classified as variants of uncertain significance (VUS). In a retrospective assessment of 73 cases, the causative variant was ranked in the top 3 in 79% of cases. In 126 cases from two external institutes, the diagnostic variant was ranked in the top 3 in 62% of cases. Our external validation cohort included Hispanic/Latino cases; Hispanic/Latino people form the largest racial/ethnic minority population in the US and often lack access to healthcare and have poor health outcomes. A comparison between ethnic groups showed that our model provided fair predictions.
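The "top 3" figures above are a ranking metric: the share of cases whose confirmed causative variant appears among the model's three highest-ranked variants. A minimal sketch follows, assuming each case records the model's ranked variant list, the clinically confirmed causative variant, and an ancestry label (the field names "ranking", "causative", and "ancestry" are hypothetical). Stratifying the rate by group is one simple way to compare performance across ethnic groups; it is not necessarily the comparison used in the validation described here.

```python
# Hypothetical top-k evaluation, overall and stratified by ancestry group.
from __future__ import annotations

from collections import defaultdict


def top_k_hit(ranked_variants: list[str], causative: str, k: int = 3) -> bool:
    """True if the causative variant appears among the first k ranked variants."""
    return causative in ranked_variants[:k]


def top_k_rate(cases: list[dict], k: int = 3) -> float:
    """Fraction of cases whose causative variant is ranked in the top k."""
    hits = sum(top_k_hit(c["ranking"], c["causative"], k) for c in cases)
    return hits / len(cases)


def top_k_by_group(cases: list[dict], k: int = 3) -> dict[str, float]:
    """Top-k rate computed separately for each ancestry group."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for c in cases:
        groups[c["ancestry"]].append(c)
    return {group: top_k_rate(members, k) for group, members in groups.items()}


if __name__ == "__main__":
    cases = [
        {"ranking": ["v1", "v2", "v3", "v4"], "causative": "v2", "ancestry": "A"},
        {"ranking": ["v5", "v6", "v7", "v8"], "causative": "v8", "ancestry": "A"},
        {"ranking": ["v9", "v10"], "causative": "v9", "ancestry": "B"},
    ]
    print(round(top_k_rate(cases), 3))  # 0.667
    print(top_k_by_group(cases))        # {'A': 0.5, 'B': 1.0}
```

A large gap between group rates would flag possible ancestry bias; with small cohorts, such gaps should be read alongside group sizes.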

Our new Congenica A.I.-powered framework has been used to generate a data-pack for familial hypercholesterolemia, which consists of clinically reviewed data, gene/disease relationships, variant/disease relationships, and other key evidence including ACMG criteria and the associated publications. We are currently validating this data-pack via expert clinical review.
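The contents of a data-pack listed above (gene/disease relationships, variant/disease relationships, ACMG criteria, publications, and clinical review status) can be pictured as a simple data structure. The sketch below is hypothetical: the class names, field names, relationship labels, and placeholder identifiers are illustrative assumptions and do not describe Congenica's actual data-pack format.

```python
# Hypothetical data model for a clinical data-pack; field names are illustrative.
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VariantEvidence:
    variant: str                # variant identifier (format is an assumption)
    gene: str
    disease: str
    acmg_criteria: list[str]    # ACMG/AMP evidence codes, e.g. "PM2"
    publications: list[str]     # supporting literature identifiers
    clinically_reviewed: bool = False


@dataclass
class ClinicalDataPack:
    disease: str
    gene_disease: dict[str, str] = field(default_factory=dict)  # gene -> relationship
    variants: list[VariantEvidence] = field(default_factory=list)

    def add(self, evidence: VariantEvidence) -> None:
        """Add a variant/disease relationship, checking it matches this pack."""
        if evidence.disease != self.disease:
            raise ValueError("variant evidence belongs to a different disease")
        self.variants.append(evidence)

    def reviewed_variants(self) -> list[VariantEvidence]:
        """Entries that have passed expert clinical review."""
        return [v for v in self.variants if v.clinically_reviewed]


if __name__ == "__main__":
    fh = ClinicalDataPack("familial hypercholesterolemia",
                          gene_disease={"LDLR": "established", "APOB": "established"})
    fh.add(VariantEvidence("LDLR:variant-1", "LDLR", "familial hypercholesterolemia",
                           ["PM2"], ["publication-1"], clinically_reviewed=True))
    fh.add(VariantEvidence("APOB:variant-2", "APOB", "familial hypercholesterolemia",
                           ["PM2"], ["publication-2"]))
    print(len(fh.variants), len(fh.reviewed_variants()))  # 2 1
```

Keeping a review flag on each entry lets the pack grow from A.I.-generated candidates while still exposing only clinically reviewed evidence downstream.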

Please select the technologies currently used in your solution:

Artificial Intelligence / Machine Learning
Big Data

Does this technology introduce any risks? How are you addressing or mitigating these risks in your solution?

The main risk associated with our solution is:

Use of our A.I.-engine could increase the risk of detecting high-impact, non-treatable incidental findings, that is, findings concerning an individual that have potential health significance but are beyond the aims of the study. This is an important aspect that can be mitigated by:
- Including a clear clause in the agreement on the disclosure and use of possible incidental findings
- Discarding all predictions that are not directly associated with the patient’s reported phenotypes (this functionality is already implemented in our A.I. prediction model)
- Using phenotype-specific gene panels
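The last two mitigations can be combined into a simple filter: keep only predictions in genes that belong to a panel for one of the patient's reported phenotypes. The sketch below is a minimal illustration under that assumption; the function name, the (gene, variant, score) tuple layout, and the panel contents are hypothetical and do not reproduce the filter implemented in our prediction model.

```python
# Hypothetical phenotype-driven filter for incidental findings.
from __future__ import annotations

Prediction = tuple[str, str, float]  # (gene, variant, pathogenicity score)


def filter_by_phenotype(predictions: list[Prediction],
                        patient_phenotypes: list[str],
                        gene_panels: dict[str, set[str]]) -> list[Prediction]:
    """Keep predictions whose gene is on a panel for a reported phenotype."""
    allowed: set[str] = set()
    for phenotype in patient_phenotypes:
        allowed |= gene_panels.get(phenotype, set())
    # Anything outside the phenotype-specific panels is discarded, not reported.
    return [p for p in predictions if p[0] in allowed]


if __name__ == "__main__":
    panels = {"hypercholesterolemia": {"LDLR", "APOB", "PCSK9"}}
    preds = [("LDLR", "variant-1", 0.97), ("BRCA1", "variant-2", 0.95)]
    print(filter_by_phenotype(preds, ["hypercholesterolemia"], panels))
    # [('LDLR', 'variant-1', 0.97)]
```

Filtering on the gene side keeps the rule simple to audit: a high-scoring variant outside the panels (here the BRCA1 entry) is never surfaced, whatever its score.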

Select the key characteristics of your target population.

Minorities & Previously Excluded Populations
Persons with Disabilities

Which of the UN Sustainable Development Goals does your solution address?

3. Good Health and Well-being

In which countries do you currently operate?

Australia
Austria
Belgium
Canada
Chile
China
Czechia
Denmark
Finland
Germany
Greece
Hong Kong SAR, China
India
Iran, Islamic Rep.
Ireland
Italy
Malaysia
Netherlands
Portugal
Romania
Saudi Arabia
Singapore
Slovak Republic
Spain
Switzerland
Thailand
United Arab Emirates
United Kingdom
United States

In which countries will you be operating within the next year?

Australia
Austria
Belgium
Canada
Chile
China
Czechia
Denmark
Finland
Germany
Greece
Hong Kong SAR, China
India
Iran, Islamic Rep.
Ireland
Italy
Malaysia
Netherlands
Portugal
Romania
Saudi Arabia
Singapore
Slovak Republic
Spain
Switzerland
Thailand
United Arab Emirates
United Kingdom
United States

How many people does your solution currently serve? How many will it serve in one year? In five years?

Current: Our end-to-end platform is currently used across more than 150 institutes and national programs in 29 countries spread across 5 continents. Our databases hold data for over 50,000 people. We have also produced our first clinical data-pack, for familial hypercholesterolemia, with the potential to serve ~150,000 people. (This estimate is based on the condition affecting 1 in 250 people, which means that ~30 million people around the world live with familial hypercholesterolemia; given the worldwide use of our platform, we have conservatively assumed that we can reach 0.5% of this affected population.)
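The ~150,000 figure follows from two multiplications: world population times prevalence gives the affected population, and 0.5% of that gives the reachable population. The sketch below reproduces this with exact fractions. The world population of 7.5 billion is an assumption, chosen as a round figure consistent with the ~30 million affected people stated above; the prevalence and reach fraction come from the text.

```python
# Reproduces the familial hypercholesterolemia reach estimate with exact fractions.
from fractions import Fraction

WORLD_POPULATION = 7_500_000_000     # assumption: round figure consistent with ~30 million affected
FH_PREVALENCE = Fraction(1, 250)     # 1 in 250 people (from the text)
REACH_FRACTION = Fraction(5, 1000)   # 0.5% of affected people (from the text)


def estimated_reach(population: int, prevalence: Fraction,
                    reach: Fraction) -> tuple[int, int]:
    """Return (people affected, people reachable) for one data-pack."""
    affected = population * prevalence
    return int(affected), int(affected * reach)


if __name__ == "__main__":
    affected, reachable = estimated_reach(WORLD_POPULATION, FH_PREVALENCE, REACH_FRACTION)
    print(affected, reachable)  # 30000000 150000
```

Using Fraction rather than floats avoids rounding noise such as 150000.00000000003 in the printed estimate.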

1 Year: In one year we expect to have several additional users in different countries and, based on our estimate, ~100,000 people in our databases. In the meantime, we will also work to generate data-packs for three additional genetic conditions (confidential data), which, based on the previous calculation, should give us the opportunity to serve ~500,000 people.

5 Years: In five years we aim to have a world-leading product used in more than 100 countries and a total of close to 500,000 patients in our database (including ~10,000 patients from ethnic minorities who will have received our free-of-charge service as a result of this project). In addition, we will work to produce a portfolio of at least 15 data-packs (confidential data), which, based on the previous calculation, should give us the opportunity to serve ~5 million people.

What are your impact goals for the next year and the next five years, and -- importantly -- how will you achieve them?

In the first year of our project, we will select a set of institutes (number to be decided) that will have free-of-charge access to our platform for the analysis of ~10,000 samples. Institutes will be selected based on key parameters, with a focus on their access to ethnic minorities. We will achieve this through close collaboration between our Patient Advocacy and Engagement Advisory Board and the International Rare Disease Research Consortium (IRDiRC).

In the next five years we will have provided a diagnosis for most of the ~10,000 ethnic minority patients. This interpretation work will have significantly increased the heterogeneity of our dataset. This new data will be fed back into our existing reinforcement pipeline to improve our clinical models, which will be used retrospectively to analyze previously undiagnosed cases. In addition, the heterogeneity of the data will play a key role in generating clinical data-packs that are effective across all ancestries.

How are you measuring your progress toward your impact goals?

Number of selected institutes that will access the free-of-charge service
Number of ethnic minority patients submitted to our platform
Number of ethnic minority patients submitted to our platform that have received a diagnosis
Number of new diagnostic variants identified in the ethnic minority patients
Number of unique rare diseases contained in our databases
Number of validated clinical data-packs

About Your Team

What type of organization is your solution team?

For-profit, including B-Corp or similar models

How many people work on your solution team?

Our solution will be based on our platform for the analysis and interpretation of next-generation sequencing data for rare diseases. This platform is at the core of our company, and all 150+ people employed by Congenica work on it (although from different perspectives).

How long have you been working on your solution?

9 years

How are you and your team well-positioned to deliver this solution?

Our company brings together a unique combination of profiles that will play a key role in the effective implementation of the proposed solution. In particular:

A.I. Team: will be responsible for the implementation and tailoring of the required A.I. infrastructures
Data Team: will be responsible for data collection and curation
Engineering and Development: will be responsible for evaluating, maintaining and improving our analysis platform based on specific requests
Clinical Team: will be providing support to the institutes for the interpretation of the most complex cases
Customer Support: will be investigating and resolving any questions/complaints from the institutes
Patient Advocacy and Engagement Advisory Board: will play the key role of selecting and engaging with the institutes that will have exclusive free-of-charge use of our platform

What is your approach to building a diverse, equitable, and inclusive leadership team?

Diversity and inclusion have always been a focus of Congenica. We aim to create a diverse environment where people can flourish, knowing that diversity of thought is key to creating a successful business. A number of interventions have aimed to address a perceived imbalance in the representation of some groups.

Congenica is dedicated to creating a diverse environment and is proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, colour, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, or veteran status.

We also have a periodic review in place to explore the state of diversity within the company. This is carried out by our 'diversity committee', which systematically looks at specific trends and how they have changed over time, in order to highlight any bias and take appropriate action.

Is your team led or managed by a person with a rare disease?

Our Company is advised by a Patient Advocacy Board of patients and families of those with rare disease.

Your Business Model & Partnerships

Partnership & Prize Funding Opportunities

Why are you applying to Solve?

The main goal of our solution is to produce clinical data-packs for rare diseases that are effective across all ancestries by collecting more data from ethnic minorities. Winning this competition would give us both initial funding to offer free access to our platform to institutes that are in close contact with these underrepresented populations, and visibility that could help us start key collaborations and grow a community aimed at reducing the gap in healthcare access for minority groups.

In which of the following areas do you most need partners or support?

Monitoring & Evaluation (e.g. collecting/using data, measuring impact)

Please explain in more detail here.

We would like to collect more data from ethnic minorities to identify and tackle any ancestry bias of our diagnostic models, with great emphasis on validating/improving our existing clinical data-packs (disease-specific variant lists) and developing new ones for specific diseases of interest.

What organizations would you like to partner with, and how would you like to partner with them?

In May 2021, the National Human Genome Research Institute (NHGRI) created a new consortium focused on reducing ancestry bias in genomic databases for polygenic risk scoring. This consortium includes research sites at the University of Maryland, College Park; Massachusetts General Hospital; the University of North Carolina, Chapel Hill; the Broad Institute; and the University of California, Los Angeles. Working alongside this consortium could be a breakthrough for our goal of tackling the ancestry bias of diagnostic models for rare diseases.

We would also like to work together with pharmaceutical companies to improve/create clinical data-packs of interest aimed at the characterization of personalized therapies by using the longitudinal data obtained from their clinical studies.