Pronunciation and AI: Will the AI revolution remove the need for pronunciation teachers?


PRONSIG 12 October 2024 conference

A full-day conference where we discussed what the AI revolution could mean for pronunciation instruction.

These are the talks I attended, with excerpts taken from the amazing PRONSIG conference brochure.

Discussions about AI’s role in English language teaching are inescapable at present, as are concerns that it could ultimately replace teachers, or, at the very least, change our learning environments beyond all recognition. This is particularly visible in the field of pronunciation instruction, where voice recognition technology, automated learner feedback, and voice assistants are becoming increasingly common. While this may cause worry initially, we cannot ignore the potential benefits these developments could offer both learners and teachers.

Talk 1

Title: What Would It Take for AI to Replace Pronunciation Teachers?

Opening plenary speaker: Beata Walesiak, University of Warsaw

Beata took us through the history of AI and pronunciation and explained many acronyms:

ANI: Artificial Narrow Intelligence

AGI: Artificial General Intelligence

ML: Machine Learning

DL: Deep Learning

DNN: Deep Neural Networks, capable of mimicking human behaviour by learning more complex patterns (deeplearning.ai and nvidia.com offer good courses)

ASR: Automatic Speech Recognition, e.g. dictation tools

TTS: Text-to-Speech, e.g. ChatGPT

VC: Voice Conversion, changing the pitch and tone of your voice, e.g. Golden Speaker Builder

VC: Voice Cloning, a digital replica of a person's voice, e.g. Speechify.com
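
To make ASR and TTS concrete, here is a minimal Python sketch of both ideas. The libraries (pyttsx3 and SpeechRecognition) and every call in it are my own illustration, not tools shown at the conference.

```python
# A minimal sketch of TTS and ASR, assuming the pyttsx3 and
# SpeechRecognition libraries (pip install pyttsx3 SpeechRecognition).

import pyttsx3                   # offline text-to-speech
import speech_recognition as sr  # wrapper around several ASR engines

# TTS: speak a model sentence for the learner to imitate.
engine = pyttsx3.init()
engine.say("The thirty-third thing is worth thinking about.")
engine.runAndWait()

# ASR: transcribe the learner's attempt from the microphone
# (the Microphone class needs the PyAudio backend installed).
recognizer = sr.Recognizer()
with sr.Microphone() as source:
    audio = recognizer.listen(source)
print(recognizer.recognize_google(audio))  # free Google Web Speech API
```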

Thanks to the above, and to its accessibility, personalisation and scalability, AI is currently flexible enough to support learners in self-study and to help teachers design classroom materials that require only minor corrections.

Many apps offer task-based pronunciation practice, but only some offer diagnostics and personalised feedback. Most apps focus on individual sounds; prosodic features, such as stress, are rarely covered.

For example, Lola Speak gives a percentage score for how native-like your speech is. We all shuddered in dismay! It also underlines the stressed syllable.

Elsa.ai does role play and provides feedback at phoneme level, e.g. ‘th’ pronunciation.

PingoLearn

Flowchase, a Belgian app

Ewa, a great app for teenagers

Twee.com creates dialogues for reading; for example, it can take a YouTube clip and create a dialogue from it, a good exercise for recycling vocabulary. Other teachers at the conference were using this.

For AI to effectively replace pronunciation teachers, however, Beata emphasised it would need to achieve much more. Tailored pronunciation resources fitted to any CEFR level and age, personalised feedback on real-time speech that matches the Communicative Framework for Teaching English Pronunciation, and detailed analyses of suprasegmentals that lead to the “So what?” are just a few aspects AI would still need to get better at handling.

Talk 2

Title: AI Technology as an Efficient Colleague, Not a Foe

Nobuo Yuzawa is a professor of English phonetics at Utsunomiya University, Japan. His research focuses on English intonation, teaching pronunciation, pronunciation models, and varieties of English accent.

In Japan, some learners are shy and worried about making mistakes in class. To overcome this, Prof Yuzawa believes practising at home with AI technology can be beneficial.

Learning English pronunciation in Japan faces several challenges, especially in traditional school settings. With classrooms of 35-40 students, teaching pronunciation and speaking in pairs is difficult without rearranging the environment. Additionally, the grammar-translation method used by many Japanese teachers, driven by limited budgets, places little emphasis on authentic communication or conversational skills.

English is not a daily language in Japan, so there is limited communicative need for conversational practice. As a result, there is a stronger focus on receptive skills like reading and listening, rather than productive skills like speaking. This, combined with the significant grammatical differences between Japanese and English, makes learning English even more challenging for many students.

Since English pronunciation is crucial yet often neglected in traditional classrooms, many students turn to private tutors or self-study. However, finding a skilled communication coach or pronunciation expert can be difficult in Japan. With an 8-hour time difference between Britain and Japan, accessing British tutors adds further complexity. Learning at home is often the most practical solution, with the added benefit of students feeling more relaxed and confident.

To enhance learning, visual aids play a critical role in helping students understand how to position their vocal organs correctly when practising pronunciation. Tools like Elsa.ai, a pronunciation app with free and paid versions, are also gaining popularity. While the app doesn't guarantee good production, it allows students to practise simulated conversations at their own pace, motivating them to improve. Interestingly for me, many Japanese students prefer an American accent over a British one, finding it more accessible and popular.

Talk 3

Title: Empowering Self-Directed Pronunciation Learning through AI

Stella Palavecino is a teacher, teacher trainer, ELT materials writer, author, editor and consultant specialising in English phonetics, based in Buenos Aires, Argentina. She lectures in Phonetics and Phonology at a teacher training college.

Stella showed us how self-directed learning has become more accessible than ever. She demonstrated TTSMAKER, a text-to-speech app where students can master pronunciation features through a ‘learning to learn’ approach, supported by AI. She explained that there is no difference in Spanish between long and short vowels, and that you can put in minimal pairs:

kiss/keys, wheel/will, eat/it, men/mean

The student can then imitate the minimal pair, record themselves and get feedback.
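
As a rough illustration of how such a minimal-pair drill could be scripted, the sketch below generates one audio file per pair for self-study. TTSMAKER is a web tool, so the pyttsx3 library here is my substitution, not what Stella demonstrated.

```python
# Hedged sketch: batch-generate minimal-pair audio with pyttsx3.

import pyttsx3

MINIMAL_PAIRS = [("kiss", "keys"), ("wheel", "will"),
                 ("eat", "it"), ("men", "mean")]

engine = pyttsx3.init()
engine.setProperty("rate", 120)  # slow the delivery down for learners

for a, b in MINIMAL_PAIRS:
    # One file per pair, e.g. "kiss_keys.wav".
    engine.save_to_file(f"{a}. {b}.", f"{a}_{b}.wav")
engine.runAndWait()
```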

Stella also showed us playphrase.me, which finds tongue twisters in movie clips. You can then copy and paste the phrase, and students find it fun. Following this, you can repeat the tongue twister into ChatGPT or the student's phone and get a transcription to check whether the transcription app has understood you correctly. You can also receive an oral response from ChatGPT.
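
The ‘did the machine understand me?’ check can also be scripted. Below is a hedged sketch assuming the SpeechRecognition library and a simple string-similarity score; the talk itself used ChatGPT or the student's phone rather than code like this.

```python
# Hedged sketch: compare an ASR transcription against a target phrase.

import difflib
import speech_recognition as sr

target = "she sells sea shells by the sea shore"  # invented example phrase

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    print(f"Say: {target}")
    audio = recognizer.listen(source)

heard = recognizer.recognize_google(audio).lower()
score = difflib.SequenceMatcher(None, target, heard).ratio()
print(f"Transcribed: {heard}")
print(f"Match: {score:.0%}")  # rough proxy for whether you were understood
```
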
Stella firmly believes AI builds confidence and empowers students, but teachers are needed to guide them through the different tools and fill in the gaps, with AI complementing and assisting.

Talk 4

Title: In- and Out-of-Class AI-Based Pronunciation Activity

William Gottardi is a doctoral candidate at the Federal University of Santa Catarina (Brazil). His current research interests include digital technologies for pronunciation teaching and learning, CALL, and autonomous learning.

William demonstrated AI-based pronunciation activities intended for second language (L2) English classes. The activities incorporate an AI-powered assistant, the free standard version of Google Gemini, using its automatic speech recognition feature to give instant feedback during pronunciation practice.

One suggested activity was practising big numbers with Gemini. If you have difficulty with spelling, say your name or address and ask Gemini to ‘spell each letter of each word I say very slowly’, then repeat after it. You can also ask it to transcribe the words you say, or to add a phonetic transcription. This is useful because, as William noted, the English sound of ‘M’ does not exist in Portuguese.
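
William demonstrated this with voice input in the free Gemini app; purely as an illustration, the same spelling prompt might look like this through Google's generative AI Python SDK. The model name, the API key placeholder and the example words are all my assumptions.

```python
# Hedged sketch of the spelling prompt via the google-generativeai SDK.

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")            # placeholder, not a real key
model = genai.GenerativeModel("gemini-1.5-flash")  # assumed model name

# The prompt wording comes from the talk; the name and street are invented.
response = model.generate_content(
    "Spell each letter of each word I say very slowly: Miriam, Rua Azul."
)
print(response.text)
```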

Talk 5

Title: Streamlined Pronunciation Practice Using Automatic Speech Recognition, Text-to-Speech and Large Language Models

Anton de la Fuente is a software engineer with a PhD in theoretical physics. He currently lives in Japan and has developed an AI app to help his Japanese friends practise English pronunciation.

Anton developed an app, prosodyai.com, to make it easy to practise pronunciation. It uses large language models (LLMs, such as ChatGPT) to generate personalised phrases, which users repeat to match a text-to-speech (TTS) voice. The app provides intelligibility-based feedback using automatic speech recognition (ASR). Teachers can also assign phrases as homework, allowing students to practise independently and focus on specific challenges in class.
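
The overall loop is easy to picture in code. Here is a hedged sketch of that pipeline (phrase, TTS model, learner repeats, ASR, score), assuming pyttsx3, SpeechRecognition and a crude string-similarity metric; it is not how prosodyai.com is actually built, and the phrases are hard-coded where the real app would call an LLM.

```python
# Hedged sketch of a drill loop: model voice -> learner repeats -> ASR score.

import difflib
import pyttsx3
import speech_recognition as sr

# The real app generates personalised phrases with an LLM; a fixed list
# keeps this sketch self-contained.
phrases = ["Could you repeat that more slowly, please?",
           "The meeting was moved to Thursday morning."]

tts = pyttsx3.init()
recognizer = sr.Recognizer()

for phrase in phrases:
    tts.say(phrase)  # play the model voice for the learner to match
    tts.runAndWait()

    with sr.Microphone() as source:  # record the learner's repetition
        audio = recognizer.listen(source)

    heard = recognizer.recognize_google(audio).lower()
    score = difflib.SequenceMatcher(None, phrase.lower(), heard).ratio()
    print(f"{phrase!r} -> heard {heard!r} ({score:.0%} match)")
```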

Beata described this as the ultimate drill machine, where you can speak with the app and repeat afterwards!

Talk 6

Title: Exploring the Impact of AI on Spanish-Catalan Teenagers’ English Pronunciation

Marina Palomo holds BA and MA degrees in English Studies from Universitat Rovira i Virgili. She wrote theses on neurolinguistics and pronunciation/AI and will soon begin a PhD in Neurophonetics.

Her session explored AI's potential in teaching American English vowels to L2 Spanish-Catalan learners. As we all know, pronunciation is one of the hardest aspects of English to learn; many teachers are unsure how to teach it, some think students can simply assimilate it, and there is never sufficient time devoted to pronunciation in the curriculum.

There are 14 American English vowel sounds, 5 short Spanish vowel sounds and 8 Catalan vowel sounds, five of which are identical to the Spanish vowels. In both Spanish and Catalan there is a one-to-one correspondence between sound and spelling, unlike in English, where vowel spellings map onto many different pronunciations.

Marina presented a study with 15-year-olds comparing two AI tools with traditional methods. Her findings suggest that the AI tools used do not significantly improve pronunciation, and that stimulus type affects AI comprehension, indicating that AI is not yet efficient enough for effective EFL pronunciation teaching. However, she noted that the 14-to-15-year-old students were very motivated to learn using apps.

Talk 7

End plenary: Studying the Gap between CAPT (Computer-Assisted Pronunciation Training) and STT (Speech-to-Text) Programmes

Dr Shannon McCrocklin of Applied Language and Technology at Southern Illinois University took us through the fundamentals of good pronunciation training, with academic references from 2016 to 2024:

·        intelligibility is the focus

·        students need to be assessed and their specific needs addressed

·        they require a model for intelligible pronunciation

·        they require explicit feedback

·        the teacher needs to promote learner autonomy and choice in how they learn

·        and finally, the student needs opportunities to experiment.

Shannon also took us through the 3 Ps model (prediction, perception and production), with controlled, guided and communicative production.

Then she explained CAPT and gave us examples:

  • Pronunciation Power, where you can see the articulators in the mouth

  • English Accent Coach

  • Blue Canoe, with a colour-coded gamification system

  • ELSA, with good listening exercises, phonetics, and feedback given as a percentage.

Her research confirmed that CAPT could integrate non-native speakers well, giving high scoring accuracy, and may be more useful for beginners. Students were positive about using it.

STT apps, on the other hand, were built entirely for native speakers, yet they can improve segmental accuracy and overall intelligibility.

However, CAPT has no focus on meaning or intelligibility, though it is good for modelling and provides explanations and structured lessons.

STTs provide no feedback but can focus on meaning.

Shannon also researched the question ‘Can AI chatbots fill the void?’, looking at Gliglish, which does role plays, can run one-to-one classes, and operates in different modes: beginner, intermediate and advanced. It also produces written and audio output and provides suggestions. However, it is not totally accurate and its explanations were not sufficient. It did not provide useful feedback on pronunciation errors: it was better on vowels but very poor on suprasegmentals, sometimes not acknowledging any errors at all.

Shannon concluded that none of the AI apps has a completely coherent plan, and teachers need to continue modelling and guiding.