Co-developing a Speech Emotion Recognition System for te reo Māori

Speech emotion recognition, the task of identifying emotions from speech, is widely used in human-computer interaction, driven by the demand for contactless interaction. However, such speech technology is being developed for only a few well-resourced languages (100 of the more than 7,000 world languages), such as American English. As a result, the speech technology used in human-computer interaction is biased against low-resourced languages like te reo Māori, the Indigenous language of Aotearoa New Zealand. Current speech emotion recognition systems often classify speech into predefined emotion categories taken from English, or into direct translations of them, on the assumption that emotions are universal. However, there is an ongoing debate about whether emotions are categorised differently across languages because of cultural variation. To date, no studies have verified how emotions are categorised in te reo Māori. In addition, current systems developed for Indigenous languages often do not employ community-oriented approaches, and they lack integration of speech processing knowledge.

This project aims to address these gaps by co-developing the first te reo Māori speech emotion recognition system with the community.

To achieve this, we have developed a five-stage methodology.

1. Community Media Collection

As the first step, emotional speech data in te reo Māori was collected from Te Hiku Media and various Māori-language television shows. The TV shows were accessed through the University of Auckland (UoA) library, with the required permissions.

2. Identifying Common Social Emotions in Te Reo Māori

To understand what the common social emotion categories are in te reo Māori, we conducted an online questionnaire to gather feedback from the community. This resulted in the identification of 218 emotion terms. We then held focus group discussions to further analyse and categorise these terms. Through this process, we identified 16 emotion categories for te reo Māori speech, which are not fully captured by the presumed universal emotions.

3. Database Recording

We are building a te reo Māori emotional speech database by recording an emotional speech corpus based on the 16 identified emotion categories. This involves working with Māori voice actors to capture expressive speech that reflects each emotion. So far, we have recorded 224 sentences with one professional actor. These recordings are currently being used in preliminary experiments to explore how emotional expression is conveyed in te reo Māori speech.

4. Acoustic Feature Analysis

The next step in our project is to conduct an acoustic analysis of the recorded speech. We will examine features such as fundamental frequency, mean intensity, speech rate, and Mel-frequency cepstral coefficients (MFCCs). This analysis will help us determine which acoustic features are most effective in differentiating between the emotion categories.
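To make two of these features concrete, the following is a minimal sketch of how fundamental frequency and mean intensity can be estimated from a single frame of audio. It is an illustration only and not the project's analysis pipeline: the function names, the 8 kHz sample rate, the 75 to 400 Hz search range, and the synthetic 200 Hz tone standing in for a voiced speech frame are all assumptions made for this example. It uses a plain autocorrelation peak for pitch and RMS level in dB relative to full scale for intensity.

```python
import math


def synth_tone(freq_hz, sample_rate, duration_s, amplitude=0.5):
    """Return a synthetic sine wave standing in for one voiced speech frame."""
    n_samples = int(sample_rate * duration_s)
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
        for i in range(n_samples)
    ]


def estimate_f0(frame, sample_rate, fmin=75.0, fmax=400.0):
    """Estimate F0 in Hz from the lag that maximises the autocorrelation."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalised (biased) autocorrelation: longer lags sum fewer terms,
        # which helps avoid picking a multiple of the true period.
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def mean_intensity_db(frame):
    """Return the RMS level in dB relative to full scale (amplitude 1.0)."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return 20 * math.log10(rms)


SAMPLE_RATE = 8000  # Hz, assumed for this example only
frame = synth_tone(freq_hz=200.0, sample_rate=SAMPLE_RATE, duration_s=0.1)
f0_hz = estimate_f0(frame, SAMPLE_RATE)
intensity_db = mean_intensity_db(frame)
print(f"F0: {f0_hz:.1f} Hz, intensity: {intensity_db:.2f} dB")
```

Real speech is far less regular than a pure tone, so a practical analysis would normally work frame by frame over voiced regions and rely on an established speech analysis toolkit rather than a hand-written estimator like this one.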

5. Technology Development

Our final goal is to develop emotion recognition technology that works for the Māori community. Using the identified emotion categories and acoustic features, we will train machine learning-based classifiers to recognise emotions in te reo Māori speech.
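As a rough illustration of what training a classifier on acoustic features involves, the sketch below fits a nearest-centroid classifier to made-up feature vectors. Everything in it is hypothetical: the placeholder labels `category_a` and `category_b` do not correspond to any of the 16 identified categories, the feature values are randomly generated rather than measured from recordings, and nearest-centroid is simply one of the simplest possible models, chosen here for brevity rather than because the project uses it.

```python
import math
import random


def train_centroids(features, labels):
    """Compute the mean feature vector (centroid) of each class."""
    sums, counts = {}, {}
    for vec, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for j, value in enumerate(vec):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, vec):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


def make_synthetic(rng, n_per_class):
    """Generate made-up [mean F0 (Hz), mean intensity (dB)] vectors per class."""
    class_means = {"category_a": (120.0, 60.0), "category_b": (220.0, 70.0)}
    features, labels = [], []
    for label, (f0_mean, db_mean) in class_means.items():
        for _ in range(n_per_class):
            features.append([rng.gauss(f0_mean, 10.0), rng.gauss(db_mean, 2.0)])
            labels.append(label)
    return features, labels


rng = random.Random(0)  # fixed seed so the example is repeatable
train_x, train_y = make_synthetic(rng, n_per_class=50)
test_x, test_y = make_synthetic(rng, n_per_class=20)

centroids = train_centroids(train_x, train_y)
predictions = [predict(centroids, vec) for vec in test_x]
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
print(f"Held-out accuracy on synthetic data: {accuracy:.2f}")
```

The synthetic classes are deliberately well separated, so the accuracy printed here says nothing about how separable real emotion categories are. With features on different scales, as here, a real system would usually standardise them before computing distances.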

This project will be the first research of its kind into technology based on Indigenous emotions, paving the way for other language communities to develop frameworks that follow community-oriented approaches.

Project Team

 

Publications & Outputs

 

1. Rathnayake, H., James, J., Leoni, G., Nicholas, A., Watson, C., and Keegan, P. (2024). Towards Identifying the Common Social Emotions in Spoken te reo Māori: A Community-Oriented Approach. Australasian International Conference on Speech Science and Technology.

Funding and Support