Introducing ABENA: BERT Natural Language Processing for Twi

Fig. 1: We named our main model ABENA — A BERT Now in Akan

Introduction

Fig. 2: A screenshot of our fastText (subword word2vec) Twi embedding model, from a previous article
  • We first employ transfer learning to fine-tune a multilingual BERT (mBERT) model on the Twi subset of the JW300 dataset, the same data we used to develop our fastText model. This data is composed largely of the Akuapem dialect of Twi.
  • Subsequently, we fine-tune this model further on the Asante Twi Bible data to obtain an Asante Twi version of the model.
  • Additionally, we repeat both experiments using the DistilBERT architecture instead of BERT, yielding smaller and more lightweight versions of the (i) Akuapem and (ii) Asante ABENA models. A sketch of the fine-tuning recipe follows this list.
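The fine-tuning step can be reproduced with the Hugging Face transformers library. The sketch below is illustrative rather than our exact training script: the file twi_jw300.txt stands in for the JW300 Twi subset, and the hyperparameters are placeholders.

```python
# Minimal sketch: fine-tune multilingual BERT (mBERT) on monolingual Twi text
# with the masked-language-modeling (MLM) objective.
# "twi_jw300.txt" is a placeholder for the JW300 Twi subset.
from transformers import (
    BertTokenizerFast,
    BertForMaskedLM,
    DataCollatorForLanguageModeling,
    LineByLineTextDataset,
    Trainer,
    TrainingArguments,
)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-multilingual-cased")
model = BertForMaskedLM.from_pretrained("bert-base-multilingual-cased")

# One training example per line of Twi text.
dataset = LineByLineTextDataset(
    tokenizer=tokenizer,
    file_path="twi_jw300.txt",  # placeholder path
    block_size=128,
)

# Randomly mask 15% of tokens, the standard BERT MLM recipe.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

training_args = TrainingArguments(
    output_dir="abena-akuapem",
    num_train_epochs=3,  # placeholder hyperparameters
    per_device_train_batch_size=16,
    save_steps=10_000,
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=collator,
    train_dataset=dataset,
)
trainer.train()
trainer.save_model("abena-akuapem")
```

Fine-tuning further on the Asante Twi Bible text follows the same recipe, starting from the saved Akuapem checkpoint instead of mBERT.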

Motivation

A word can mean very different things depending on its context, in English and in Twi alike:
  • He examined the cell under the microscope.
  • He was locked in a cell.
  • ɔkraman no so paa — the dog is very big
  • ɔkra no da mpa no so — the cat is sleeping on the bed
A static embedding such as our earlier fastText model assigns each word a single vector, so it cannot separate the two senses of “cell”, or of the Twi word “so” (“big” in the first sentence, “on” in the second). Contextual models like BERT compute a different representation for each occurrence, as illustrated below.
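As a quick demonstration of this point, the sketch below (using stock multilingual BERT, purely for illustration) extracts the contextual vector of “cell” from each English sentence and compares them; a static model would return the identical vector both times.

```python
# Sketch: the same word gets different contextual vectors in BERT.
# Uses stock multilingual BERT purely for illustration.
import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-multilingual-cased")
model = BertModel.from_pretrained("bert-base-multilingual-cased")

def word_vector(sentence: str, word: str) -> torch.Tensor:
    """Return the hidden state of the first subword of `word` in `sentence`."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # shape: (seq_len, 768)
    # Locate the word's first subword in the encoded sentence.
    word_id = tokenizer.encode(word, add_special_tokens=False)[0]
    position = enc["input_ids"][0].tolist().index(word_id)
    return hidden[position]

v1 = word_vector("He examined the cell under the microscope.", "cell")
v2 = word_vector("He was locked in a cell.", "cell")
similarity = torch.cosine_similarity(v1, v2, dim=0)
print(f"cosine similarity: {similarity:.3f}")  # below 1.0: context changes the vector
```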
Fig. 3: Illustrating the key idea behind transfer learning: instead of learning everything from scratch, prior knowledge and experience should be shared and reused to make the current task easier. Here, learning to play the drum is easier for someone who already plays the piano. Image from “Transfer Learning for NLP” [https://www.manning.com/books/transfer-learning-for-natural-language-processing]

ABENA Twi BERT Models

Fig. 4: Convergence info for ABENA models. All models were trained on a single Tesla K80 GPU on an NC6 Azure VM instance.
Fig. 5: Convergence info for DistilABENA models. All models were trained on a single Tesla K80 GPU on an NC6 Azure VM instance.
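Once trained, the models can be queried directly for masked-word prediction. The sketch below assumes the checkpoints are published under the Ghana-NLP organization on the Hugging Face Hub; the exact model identifier is an assumption, so substitute the actual published ID.

```python
# Sketch: masked-word prediction with a trained ABENA checkpoint.
# The model ID below is an assumption; replace it with the published one.
from transformers import pipeline

fill_mask = pipeline(
    "fill-mask",
    model="Ghana-NLP/abena-base-akuapem-twi-cased",  # assumed Hub ID
)

# "The cat is sleeping on the [MASK]."
for prediction in fill_mask("ɔkra no da mpa no [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```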

BAKO Twi BERT Models

Fig. 6: We also investigate training BERT models from scratch, yielding BAKO (BERT with Akan Knowledge Only). The Twi word “Bako” or “Baako” means “One”.
Fig. 7: Convergence info for RoBAKO models trained from scratch. All models were trained on a single Tesla K80 GPU on an NC6 Azure VM instance.
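Training from scratch differs from ABENA in two ways: a new tokenizer is learned on Twi text alone, and the model weights are randomly initialized. Below is a minimal RoBERTa-style sketch; the corpus path, vocabulary size, and model size are placeholders, not our exact settings.

```python
# Sketch: RoBERTa-style model trained from scratch on Twi only (RoBAKO).
# Paths, vocab size, and model dimensions are placeholders.
import os

from tokenizers import ByteLevelBPETokenizer
from transformers import RobertaConfig, RobertaForMaskedLM, RobertaTokenizerFast

# 1) Learn a byte-level BPE vocabulary on the Twi corpus alone.
tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["twi_jw300.txt"],  # placeholder corpus path
    vocab_size=52_000,
    min_frequency=2,
    special_tokens=["<s>", "<pad>", "</s>", "<unk>", "<mask>"],
)
os.makedirs("robako-tokenizer", exist_ok=True)
tokenizer.save_model("robako-tokenizer")

# 2) Build a small RoBERTa with randomly initialized weights.
config = RobertaConfig(
    vocab_size=52_000,
    max_position_embeddings=514,
    num_hidden_layers=6,  # placeholder: a small model
    num_attention_heads=12,
    hidden_size=768,
)
model = RobertaForMaskedLM(config)
twi_tokenizer = RobertaTokenizerFast.from_pretrained("robako-tokenizer")

# 3) MLM training then proceeds exactly as in the ABENA sketch above,
#    passing `model` and `twi_tokenizer` to the Trainer.
```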

Simple Sentiment Analysis/Classification Example

Fig. 8: Description of all the models we trained and shared in this work.
Fig. 9: Simple Sentiment Analysis Example Dataset
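A trained ABENA checkpoint can be adapted to sentiment classification by attaching a classification head and fine-tuning on labeled examples. The sketch below shows the mechanics only: the model ID is again an assumption, and the two labeled Twi examples are hypothetical stand-ins for a real dataset.

```python
# Sketch: fine-tune an ABENA checkpoint for binary sentiment classification.
# The model ID and the two labeled examples are hypothetical placeholders.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "Ghana-NLP/abena-base-akuapem-twi-cased"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

texts = ["me ho yɛ", "me werɛ ahow"]  # hypothetical labeled examples
labels = torch.tensor([1, 0])         # 1 = positive, 0 = negative

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # a few gradient steps on the toy batch
    out = model(**batch, labels=labels)
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

model.eval()
with torch.no_grad():
    logits = model(**batch).logits
print(logits.softmax(dim=-1))  # per-class probabilities
```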

Limitations and Ongoing/Future Work

Join Us?


Paul Azunre holds a PhD in Computer Science from MIT and has served as a Principal Investigator on several DARPA programs. He founded Algorine and Ghana NLP.
