Future-proofing te reo with data science
An iwi-led media company in Northland is proving Aotearoa can not only keep up but lead the way in global data science. Te Hiku Media, based in Kaitāia, has created cutting-edge Automatic Speech Recognition (ASR) software that transcribes te reo Māori.
It is the first ASR to recognise any indigenous language and operates with a 92 per cent accuracy rate, outperforming similar attempts by major international tech companies, according to Time magazine.
The ASR has been developed using decades’ worth of archived radio content and additional requested audio clips.
Te Hiku Media CEO Peter-Lucas Jones says the organisation started its life as an iwi broadcaster in 1990, and its connection to Māori language and people has acted as a foundation for the transition into data science.
“Over 30 years, we have gathered huge amounts of corpus. Our focus, of course, is Māori language broadcasting and as part of our digital transition we worked with our communities to digitise all of these analog recordings. Alongside that was a governance responsibility, which involves managing the assets we develop.”
In addition to his role at Te Hiku Media, Jones is chair of Te Whakaruruhau o Ngā Reo Irirangi Māori (National Iwi Radio Network), and deputy chair of Whakaata Māori. He is a leading innovator in natural language processing (NLP), and a pioneer in the revitalisation of te reo Māori and other indigenous languages.
When approaching AI, Jones says directors and board members need to have a crystal clear understanding of what their core responsibilities are, which vary between industries and as tech evolves.
“Nothing is stagnant. We’re teaching computers how to speak a minority language. It’s an endangered language, a language that could be counted among the thousands of languages that are predicted to become extinct by 2050. So what does that mean for us with the work we’re doing?”
Te Hiku Media is governed by a “unique set of circumstances”, where the board is made up of representatives from each tribal group connected to the organisation, and members are appointed by those groups.
The original recordings pulled from the iwi radio archive and digitised include those from native te reo Māori speakers – something Jones says was a key consideration with the board when the project was in early stages of development.
When making decisions on how to proceed with not only digitising its years of precious radio audio but using it to future-proof te reo Māori, Jones says responsible, ethical data management was always at the forefront of the board’s decision-making.
“We have made strategic decisions about how we can connect with our people living in Aotearoa and around the world, and serve them with an experience that is enhanced by the artificial intelligence tools we create.
“An example of that is recognising there are many lifestyles our people have. There are more traditional lifestyles in our marae communities, and then there are urban lifestyles. In recognising that, and that language revitalisation occurs in different ways in different settings, strategically, how do we make decisions that account for cultural context and don’t just empower one domain?”
Examples of this at Te Hiku Media include ensuring the organisation has ongoing control and ownership over its data. “As a board member, you need to establish how to govern these new types of assets and recognise that data is an asset, data is land. These are the strategic decisions that all board members need to know about.”
When artificial intelligence tools are used in this way, Jones says they can become part of an educational experience – one that can either bridge the gap or be seen as a risk. “You’re constantly identifying risks, but also acknowledging how a risk can be turned into an opportunity.”
These considerations can be used to elevate a board’s strategic decisions, such as bringing services in-house. “You’re either paying somebody else for those services, or you’re creating jobs for your community. In doing that, we wanted to ensure we had a Graphics Processing Unit (GPU) cluster in Kaitāia, creating the best bilingual tools for te reo Māori in the world, at home.”
That cluster was supported by Te Hiku Media’s acceptance into an NVIDIA program, which provided it with half-price GPUs – making the tech accessible to groups that were recognised as a form of indigenous startup.
Jones says directors should be asking themselves, ‘What are the key drivers for your organisation?’ “Once you have your values, your principles and your focus, which for us is providing a quality service and contributing to building a Māori language economy in New Zealand, then you can begin to provide a quality service.
“It’s important that directors and boards understand what benchmarking is, and understand that big tech has the capacity to erode marginalised language by not understanding what quality means to the communities that those languages belong.”
Jones has recently returned from visiting the United States, where he was named in Time magazine’s top 100 AI leaders list. He says the international recognition of Te Hiku Media has brought a significant level of interest from other communities around the world.
“We are closely related linguistically and culturally to other Pacific cultures. So thinking about how we can contribute at a governance level is related to how we can contribute internationally.
“Our tools support providing quality information to decision-makers, but they also provide a new and innovative way to curate information. At a governance level, they can extract information which supports good decision-making. Your decision-making was only ever going to be as good as the quality of information you’re referring to.”