The Project at a Glance

The Challenge

Transcribing RNZ’s diverse spoken audio content, which standard AI models could not process accurately because they are less familiar with te reo Māori and with English spoken in a New Zealand accent.

What We Did

Evaluated existing AI transcription technologies, exploring their adaptability to RNZ’s unique requirements for accurate transcription.

The Solution

Provided recommendations on the most appropriate AI technologies, based on the results of our evaluation. Also recommended how to integrate AI technologies into RNZ’s audio publishing workflows so that results can be reviewed and edited, and so that language models can be improved over time.

Outcome

The evaluation, run against a sample of RNZ’s audio, resulted in a set of tailored recommendations, laying the groundwork for a prototype service to meet RNZ’s specific transcription needs.

Background

RNZ sought to improve the accessibility and discoverability of their spoken audio content by providing transcripts and captions alongside it. RNZ approached Ackama to help them understand the landscape for AI-generated transcripts and captions, and what sort of experience current technologies might help them deliver.

As a publicly funded media organisation, RNZ must endeavour to provide digitally inclusive stories for all New Zealanders. This means people in different contexts, or with different needs or abilities, shouldn’t face barriers to accessing and understanding RNZ’s content. By providing and indexing transcripts and captions of RNZ’s spoken audio, RNZ will give people greater access to its content as part of an improved user experience.

RNZ’s spoken audio files can be accompanied by a summary of their content, but they don’t include more specific information, such as discussion topics, speakers or quotes, that a full transcription would provide. RNZ also understands that some of their audience listen in contexts where captions would help. Unfortunately, RNZ does not have the resources to manually transcribe, review and edit audio content, particularly where the relevance of the information depends on how quickly it is published. To help solve this problem, they decided to seek support to investigate technological solutions.

In recent years AI technologies have made significant advancements, but they are still challenged by the New Zealand accent and te reo Māori. Ackama was eager to tackle this challenge and test the available AI technologies to see how they could work in the New Zealand context with RNZ’s publishing workflows.

Exploratory Approach to AI

Our experiment set out to evaluate whether non-generative AI technologies could help make RNZ’s considerable body of audio more accessible. Our goal was to identify a solution that could transcribe RNZ’s content effectively.

We rigorously evaluated a range of AI technologies to assess their adaptability to the unique linguistic and cultural elements of RNZ’s content. We collaborated closely with RNZ to develop evaluation criteria that fit their requirements, then ran a test dataset drawn from RNZ’s audio content through multiple AI technologies, ranging from major cloud providers to niche firms.

The evaluation focused on several critical aspects:

  • The accuracy of the transcription
  • The ability to handle captioning data
  • Speaker identification
  • Overall transcription quality

This thorough testing aimed to determine whether any existing AI technologies could integrate with RNZ’s existing workflows to provide a sustainable and reliable transcription service.
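The case study does not say which metric was used to score transcription accuracy. One widely used measure is word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the AI transcript into a human reference transcript, divided by the number of words in the reference. The sketch below is a minimal illustration under that assumption; the function name and the sample sentences are made up for the example and are not from the project.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: a te reo Māori greeting misheard by the model.
human = "kia ora and welcome to the programme"
machine = "key aura and welcome to the programme"
print(round(word_error_rate(human, machine), 3))  # 2 errors / 7 words -> 0.286
```

A single score like this hides which words were wrong, so in a setting like RNZ's it would likely be worth reporting errors on te reo Māori words and place names separately from the overall figure.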

The AI services available on the market fell short of our criteria: their accuracy was inconsistent, and their transcriptions required a significant amount of manual correction to be fit for publishing.

Instead of viewing this as a setback, we saw it as a constructive pivot point. It shifted the focus from seeking a fully autonomous solution to one where AI technologies handle the bulk of the transcription and human oversight provides quality assurance. That review step ensures editorial safety and accuracy, and helps the AI language models develop.

Rationalising Our Method

In exploring AI transcription solutions, Ackama navigated a landscape rich with bold promises, many of which hold only under specific conditions and do not translate well to real-world applications. The project highlighted the stark difference between how AI performs within the boundaries of its training data and how it performs beyond them.

For instance, a transcription model might fill gaps in its understanding with plausible but incorrect words, distorting the intended message. This can be particularly detrimental in a news context like RNZ’s, where published transcripts need to be accurate.

Through our methodical investigation, we deepened our understanding of RNZ’s unique needs and recognised that an AI technology’s broad claims must be carefully evaluated against its ability to handle specific contexts. 

We discovered that although none of the services came close to 100% accuracy, there is still significant value in producing transcriptions that meet an acceptable level of quality, on the understanding that they will be reviewed. This is especially true for audio files that currently have no transcripts at all.

This process led us to pivot from the initial concept of a one-size-fits-all solution to a collaborative approach with a company skilled in local language models. This change is expected to ensure the integrity and quality of the transcription service that will be developed. 

Transcriptions intended for public consumption need to balance accuracy with accessibility. One of our recommendations is a hybrid model in which AI technologies produce initial transcriptions that are then reviewed before publishing. Alternatively, transcripts that are clearly marked as AI-generated, such as those currently available on platforms like YouTube, may be more easily forgiven for mistakes.
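To make the hybrid model more concrete, here is a minimal sketch of how a draft transcript could be routed through editorial review. It assumes the transcription service reports a confidence score per segment, which many services do but which the case study does not confirm. The class names, field names and the 0.85 threshold are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start_seconds: float
    text: str
    confidence: float  # assumed 0.0-1.0 score from the transcription service


@dataclass
class DraftTranscript:
    segments: list[Segment]
    approved_by_editor: bool = False

    def review_queue(self, threshold: float = 0.85) -> list[Segment]:
        """Segments an editor should check first, least confident first."""
        uncertain = [s for s in self.segments if s.confidence < threshold]
        return sorted(uncertain, key=lambda s: s.confidence)

    def can_publish(self) -> bool:
        """Nothing is published until an editor has signed off."""
        return self.approved_by_editor


draft = DraftTranscript([
    Segment(0.0, "Kia ora koutou", 0.62),
    Segment(2.5, "welcome to the programme", 0.97),
    Segment(5.0, "from Te Whanganui-a-Tara", 0.71),
])

for segment in draft.review_queue():
    print(f"{segment.start_seconds:>5.1f}s  {segment.confidence:.2f}  {segment.text}")
print("Publishable:", draft.can_publish())
```

In this sketch the confidence scores only prioritise the editor's attention; publication still depends on an explicit human sign-off, in line with the review-before-publishing recommendation.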

We also note that transcriptions for news need to be accurate and should appropriately reflect te reo Māori phrases and place names to remain credible across RNZ’s audience.

Our experience reaffirms the importance of context in deploying AI technologies and underscores the value of localised solutions that better honour regional contexts.

The collaboration between Ackama and RNZ uncovered a path to a more context-aware transcription service and recommended a strategic model for the flexible application of AI technologies.

By focusing on an outcome that is practical and adaptable, this project has laid the groundwork for future projects that would benefit from a nuanced approach to technology implementation.