Khaya African Language Translation and Speech Recognition AI Demonstrates Major Improvements
Khaya AI App Expands into Northern Ghana, Demonstrates Major Improvements in Performance by Learning from User Feedback, and Outperforms Google Translate in Yoruba Text Translation
The following describes work done by the NLP Ghana and Algorine teams, both of which I am a part of, to democratize modern machine learning tool access for Ghanaian and other African Languages. It culminates in the release of version 1.0.4 of the Khaya AI app, pushing further the automatic speech recognition (ASR) and machine translation “state-of-the-art” for over 60 million people in West Africa.
What Is The Inspiration Behind The Name Khaya AI?
Khaya AI is named after the Khaya African Mahogany tree. Just like the tree, it is rooted in Africa. We hope it will similarly become a nourishing, sustaining resource for Africa and Africans in the digital future. It is also a word for “home” in several Southern African languages.
What Could The Previous Version of Khaya AI Do?
Nine months ago, version 1.0.3 of the Khaya AI App was released. It provided Twi and Yoruba Automatic Speech Recognition (ASR), as well as neural machine translation of Ga, Ewe, Twi and Yoruba text. It also included the crucial ability to gather feedback from the public to improve quality over time.
Over the past 9 months, the NLP Ghana and Algorine teams have been working diligently to improve the quality of these machine learning systems and to address the concerns raised. Today we are happy to release version 1.0.4 of the Khaya AI App, which showcases dramatic improvements in quality. You can test the app on Web, Android or iOS by following the links at https://linktr.ee/nlpghana
This article outlines the improvements, first as a list of highlights and then in more detail in the sections that follow. In short, the systems are improving, a lot!
What Can The New Version of Khaya AI Do?
Highlights
1. The addition of Dagbani translation and speech recognition marks the beginning of the expansion of the AI into Northern Ghana. Hausa, Frafra and Buli are next in the release pipeline, among others. We are committed to fairness in linguistic coverage.
2. Our Yoruba text translator outperforms Google Translate. We are committed to providing world-class solutions across Africa.
3. Across-the-board improvements in text translation have been achieved, as measured by the BLEU metric, confirmed by human evaluators, and shown in Table 1.
4. Across-the-board improvements in Automatic Speech Recognition (ASR) have been achieved, as measured by the Word Error Rate (WER), confirmed by human evaluators, and shown in Table 2.
5. Collaboration with the Harvard African Language School on East African languages (to date Swahili, Kikuyu and Kimeru) furthers our commitment to providing world-class solutions across Africa.
Now let’s dive deeper into the enhancements being released and ongoing work.
Dagbani Introduced as Expansion in Northern Ghana Begins
If you go through the reviews of the Khaya app on the Android store (where the majority of the current user base is), a notable theme is the absence of Northern Ghanaian languages in version 1.0.3 of the app. We invested a significant amount of effort in Northern language research over the past year and are proud to introduce Dagbani text translation and ASR in version 1.0.4. We worked closely with the Dagbani Wikimedia Group on data and evaluation of the Dagbani technologies. This is the tip of the proverbial iceberg for our planned Northern Ghanaian language coverage: text translators for Hausa, Gurune (Frafra) and Buli are slated for release shortly, and models for languages such as Dagaare, Mamprusi, Gonja and Kasem are in early research phases.
Text Translator Improvements
We analyzed the feedback submitted by the user base: tens of thousands of users across mobile (Android & iOS) and the web app. Human evaluators were needed to analyze this data since it was noisy, sometimes including entire messages of encouragement from users to us 🙃
Validated suggestions and corrections were fed back into the training data, and the models were fine-tuned further on it. Training data was also augmented manually from other sources, using this feedback as a guide to uncover deficiencies in coverage. Additionally, training data was augmented with back translations of monolingual data, a well-known technique in the machine translation community for improving the fluency of models. As a result, significant improvements in the BLEU metric were achieved for most languages; see Table 1 (Ga improvements are ongoing, which is why Ga has been omitted from the table).
What is notable from Table 1 is that our Yoruba text translators outperform Google Translate in both directions! Yoruba is the only overlap in coverage: to the best of our knowledge, neither Google Translate nor any other existing solution covers any of the Ghanaian languages we handle. This comparison therefore gives us confidence that we are building a world-class solution.
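To make the back translation step concrete, here is a minimal sketch of the general technique, not our production pipeline. It assumes you already have some target-to-source translator available as a plain function; `back_translate` and `toy_target_to_source` are hypothetical names introduced only for this illustration.

```python
from typing import Callable, Iterable, List, Tuple


def back_translate(
    monolingual_target: Iterable[str],
    target_to_source: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Build synthetic (source, target) training pairs from target-only text.

    The target side stays human-written; only the source side is machine
    generated. That is why the technique tends to help output fluency.
    """
    pairs = []
    for sentence in monolingual_target:
        sentence = sentence.strip()
        if not sentence:
            continue  # skip blank lines in the monolingual corpus
        synthetic_source = target_to_source(sentence)
        pairs.append((synthetic_source, sentence))
    return pairs


def toy_target_to_source(sentence: str) -> str:
    """Hypothetical stand-in for a trained reverse-direction model."""
    return f"EN({sentence})"


if __name__ == "__main__":
    corpus = ["sentence one", "", "sentence two"]
    for source, target in back_translate(corpus, toy_target_to_source):
        print(source, "->", target)
```

In practice the synthetic pairs would be mixed with the human-translated pairs before fine-tuning, so the model does not learn only from machine-generated source text.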
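For readers unfamiliar with BLEU, the sketch below shows roughly what the metric computes: clipped n-gram precision for n = 1 to 4, combined by a geometric mean and scaled by a brevity penalty. It is a simplified illustration that assumes whitespace tokenization, a single reference per sentence and no smoothing, so its numbers will not match the scores in Table 1 or those of standard evaluation tools.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[str], references: List[str], max_n: int = 4) -> float:
    """Simplified corpus-level BLEU on a 0-100 scale (one reference each)."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each n-gram count by how often it appears in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives zero BLEU
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Penalize system output that is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    reference = ["the cat sat on the mat"]
    print(corpus_bleu(["the cat sat on the mat"], reference))
    print(corpus_bleu(["the cat sat on a mat"], reference))
```

Higher is better, and scores are only comparable when computed on the same test set with the same tokenization.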
Automatic Speech Recognition (ASR) Improvements
Due to the more sensitive nature of voice data, as well as the higher cost of collecting and storing it, we did not solicit ASR feedback in our deployment. We did, however, perform internal quality checks, discover drawbacks and keep augmenting the training data, tracking performance with the Word Error Rate (WER) metric and confirming improvements with human evaluators. See Table 2 for the performance enhancements measured, which varied between 3.5% and 6%, and try the app today on your preferred platform to test the improvements for yourself!
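As a reminder of what WER measures, here is a minimal sketch: the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the system output, divided by the number of words in the reference. It assumes whitespace tokenization with no text normalization, and `word_error_rate` is a name chosen for this illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from the output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Lower is better, and because insertions count as errors, WER can exceed 100% when the output is much longer than the reference.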
What Comes Next?
1. API Release
We are building an API to empower African developers to build application solutions for their communities on top of the AI and ML systems we have created. The API is currently scheduled to open for free trials in early May, within a couple of weeks of this writing.
Want to build a smart home solution? A language learning app? Any technology you can imagine that relies on translating and transcribing local language speech? Access our models through the API and deploy your solutions in a matter of days 😊
2. More Languages
Improvements to the Ga text translation system are in the pipeline for release. Also on the road map this year are (1) text translators for Swahili, Hausa, Frafra (Gurune), Buli, Dagaare, Mamprusi and Shona, and (2) ASR for Swahili and Hausa.
Addition of the Kenyan languages Kikuyu and Kimeru, joint work with Prof. John Mugane of the Harvard African Language School, is in progress. Our ambition is to scale out across the African continent, with languages such as Amharic and Wolof also scheduled. As long as there is a need for our solutions, we will continue building them.
In fact, the translators and models for many of these cases have already been built. The real difficulty has been scaling them out in production at a rate affordable for a bootstrapping, unfunded outfit like ours, which brings us to the next section.
3. Language Scaling
When we started out, our models were “one-to-one” — meaning a separate model was trained for Twi to English, English to Twi, English to Ga, etc. As you can imagine, this approach is difficult to scale in production, as general purpose translators and speech recognition systems are relatively large and computationally expensive.
We have since moved to a “many-to-one” and “one-to-many” approach, which requires one model for translating any local language to English and one model for translating English to any local language. This has eased many of the initial challenges we had scaling out the app to more languages, so we expect a more rapid expansion timeline from now on. Additionally, we are working towards a “many-to-many” model: a single model for everything, which will also be able to handle translations between any pair of local languages, such as Ga to Ewe, Twi or Yoruba, and vice versa.
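The scaling argument can be sketched in a few lines. The first function counts how many models each approach needs for a given number of local languages. The second shows one common way a single model can serve several target languages, by prepending a target-language tag to the input text. The tag format and language names here are assumptions for illustration, not a description of how the Khaya models are actually built.

```python
from typing import Dict


def models_needed(num_local_languages: int) -> Dict[str, int]:
    """Number of models to deploy under each approach."""
    n = num_local_languages
    return {
        # One model per direction for each local language paired with English.
        "one-to-one, English pairs only": 2 * n,
        # One model per ordered pair among English plus all local languages.
        "one-to-one, all pairs": (n + 1) * n,
        # Any local language to English, plus English to any local language.
        "many-to-one + one-to-many": 2,
        # A single model covering every direction.
        "many-to-many": 1,
    }


def tag_for_target(source_text: str, target_lang: str) -> str:
    """Prepend a hypothetical target-language tag so one model can route output."""
    return f"<2{target_lang}> {source_text}"


if __name__ == "__main__":
    # Five local languages, e.g. Ga, Ewe, Twi, Yoruba and Dagbani.
    for approach, count in models_needed(5).items():
        print(f"{approach}: {count}")
    print(tag_for_target("Good morning", "twi"))
```

The one-to-one count grows with every language added, while the multilingual approaches stay constant, which is the main reason deployment gets cheaper.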
How To Support Our Work
Please consider purchasing an ad-free version of the app to support our work. We have priced it at quite an affordable $10/year or $1/month. We will also be working to add some amazing exclusive functionality to the premium subscription, beyond being ad-free. How does Augmented and Virtual Reality with Object Detection in your chosen language sound? How about Voice to Voice translation? Or translating entire web pages so you can browse the internet in your local language? Stay tuned!
Please share our work with your friends and social media communities. This work will only be able to continue with your support. If there isn’t enough support from the community, the work may not be able to sustain itself and may have to stop. So please do support the work if you think it is worthy of existence. On the other hand, more support means we will be able to expand to new languages and make improvements to all systems faster and better. Be sure to submit feedback when you find the app not performing well, so it can improve over time.
Are you a researcher who wants to participate in this exciting research revolution? Hit us up through https://ghananlp.org/contact