Modern Automatic Speech Recognition (ASR) and natural language processing tools do not exist for most Indigenous languages. This technology not only enables voice assistants like Siri and educational apps like Duolingo, but can also help preserve and protect a language when placed in the right hands. Time is running out for the revitalization of many Indigenous languages: currently, there are only 16 living speakers of Makah.
The goal of the International Wakashan AI Consortium is to empower the Makah to build advanced voice experiences and conduct research on their own endangered language in order to revitalize it for generations to come. To bridge this technical gap, international brethren will collect data, conduct research, and build the skills needed to create an Automatic Speech Recognition AI. This AI will be a deep neural network model implemented on the machine learning platform TensorFlow using open source software.
The ASR model will take audio files as input and return the corresponding text in the Makah orthography. Using the resulting text, future Makah software engineers can apply simple logic to build interactive experiences. Because Makah is closely related to other Wakashan languages, this model can be used to jumpstart custom models that accommodate the unique attributes of each language, and vice versa.
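As a rough illustration of the "audio in, text out, simple logic on top" idea, here is a minimal Python sketch. It is not the Consortium's implementation: `fake_transcribe` is a hypothetical stand-in for the trained ASR model, the file names are made up, and the `PLACEHOLDER_*` strings stand in for real Makah text, which is deliberately not invented here.

```python
from typing import Callable, Dict


def fake_transcribe(audio_path: str) -> str:
    """Hypothetical stand-in for the ASR model.

    A real model would decode the audio file into text in the Makah
    orthography; this stub only returns canned placeholder strings.
    """
    canned = {
        "greeting.wav": "PLACEHOLDER_GREETING",
        "farewell.wav": "PLACEHOLDER_FAREWELL",
    }
    return canned.get(audio_path, "")


def respond(
    audio_path: str,
    transcribe: Callable[[str], str],
    responses: Dict[str, str],
    fallback: str = "Please try again.",
) -> str:
    """Map the recognized text to an action using a plain dictionary lookup."""
    text = transcribe(audio_path).strip()
    return responses.get(text, fallback)


# Assumed mapping from recognized phrases to app behavior.
RESPONSES = {
    "PLACEHOLDER_GREETING": "Play the greeting lesson.",
    "PLACEHOLDER_FAREWELL": "Play the farewell lesson.",
}

if __name__ == "__main__":
    print(respond("greeting.wav", fake_transcribe, RESPONSES))
    print(respond("unknown.wav", fake_transcribe, RESPONSES))
```

Because the transcription step is passed in as a function, the stub could later be swapped for a real model without changing the application logic.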
In our partner community of 4,000 people, only 16 members are Native speakers of the Makah language. We’re teaming up to revitalize this language and others in the Wakashan language family.
This problem affects the Makah Nation of nearly 4,000 members, as well as the communities of the Wakashan language family in Canada, comprising over 25,000 members internationally. Furthermore, because Makah and the other Wakashan languages are highly polysynthetic, this solution will contribute to the broader Indigenous language research community working to build ASR for polysynthetic languages across the globe. More broadly, research on polysynthetic ASR affects hundreds of language communities across North America, with the potential to impact millions of Indigenous people and thousands of endangered languages worldwide.
The ASR market is expected to be valued at $24.9 billion by 2025, while the overall natural language processing market is expected to grow to $80.68 billion by 2026 at a compound annual growth rate (CAGR) of 32.4 percent.
We are currently working with members of the Makah community and will help them learn to annotate data in support of language and cultural revitalization efforts for years to come.
After establishing our model for Automatic Speech Recognition (ASR) software for polysynthetic language groups, using the Wakashan language family as our prototype, we aim to grow our relationships with other Indigenous communities to adapt our ASR technology for new language groups.
The International Wakashan AI Consortium currently seeks:
To partner with seasoned experts in Computational Linguistics, particularly those with a strong background in polysynthetic language groups.
The support of partners who have experience successfully launching products that harness AI for community benefit.
Seattle, Washington, United States
US and Canada
- Michael Running Wolf, Software Development Engineer, International Wakashan AI Consortium