The importance of the voice in digital learning materials

This week the Miller chapter, along with the Film Festival chat, has firmed up a few of my emerging thoughts about the human relationship to tech, particularly around disembodiment and the importance of the voice.

The importance of the voice in digital learning materials

During the week I’ve been creating some learning materials and I wanted to include a voice-over to introduce each section. With this week’s reading at the back of my mind, but lacking the time to record someone ‘real’, I decided to use an online text-to-speech site. The output sounded almost indistinguishable from a ‘live’ actor, the emphasis being on almost. The subtle nuances and imperfections of natural speech were missing and, while the end result was a very close facsimile of the ‘real thing’, the automation was still evident.

So I decided to find out whether there is any research indicating a difference in learning when material is presented by a human voice versus a synthesised voice. Writing about the results of two experiments, Mayer, Sobko, and Mautone (2003) found that students performed better in a transfer test and rated the speaker more positively if the narrator had a standard accent rather than a foreign accent (Experiment 1) and if the voice was human rather than machine synthesised (Experiment 2). So does this mean that learners will always respond better to an on-screen human tutor than a computer-generated equivalent? There is research indicating that people will treat computers in the same way as humans given the right circumstances. Reeves and Nass (1996) found that people will comply with social conventions and be polite to computers when asked to evaluate them directly, more so than when evaluating one computer from a different one (the equivalent of giving face-to-face feedback compared to giving feedback about someone to a third party).

Moreno, Mayer, Spires, & Lester (2001) found that there was little difference in the test performance of students learning about botany principles presented by a cartoon on-screen tutor compared to an on-screen human tutor. They also found that students learned equally well even if there was no on-screen tutor, so long as the students could hear the tutor’s voice. This suggests that voice quality and clarity are more important than whether the voice is human or not.

My own experience of being ‘fooled’ by automated telephone services suggests that it will not be long before AI is indistinguishable from a human agent. The more recent Mayer, Sobko, and Mautone experiments suggest that this could be beneficial to those producing digital learning materials, whereas the Moreno, Mayer, Spires, & Lester (2001) experiment indicates that it might not make much difference.

Visualising the concepts in Miller, V (2011)

I’m continuing to mindmap the set readings and other related texts I’ve researched. At this stage the maps are just my way of visualising the concepts and arguments so that I can see how they fit together; currently they don’t offer any critical examination of the texts.

This is my mindmap of the Miller text. It’s a better resolution than the previous maps, which I will update when I revisit them.

Mindmap of Miller V (2011) Chapter 9 The Body and Information Technology. Right click and select open in new tab to view full screen and enable zooming.

Miller, V. (2011). Chapter 9: The Body and Information Technology. In Understanding Digital Culture. London: Sage.

Mayer, R.E., Sobko, K., & Mautone, P.D. (2003). Social cues in multimedia learning: Role of speaker’s voice. Journal of Educational Psychology, 95, 419–425.

Reeves, B., & Nass, C. (1996). The media equation: How people treat computers, television, and new media like real people and places. New York: Cambridge University Press.

Moreno, R., Mayer, R.E., Spires, H., & Lester, J. (2001). The case for social agency in computer-based teaching: Do students learn more deeply when they interact with animated pedagogical agents? Cognition and Instruction, 19, 177–214.

2 thoughts on “The importance of the voice in digital learning materials”

  1. Excellent summary here Nigel!

    Do note the 250 word count for these summary posts. Nevertheless, lots of interesting ideas here! Remember that the focus in these weekly summaries should be the specific content that has gone into your lifestream the previous week (I know you started a little late though), rather than a general reflection on the themes. We want to hear why you’ve added particular tweets or YouTube videos to your lifestream.

    However, as I say, great reflection here that definitely has a place, although perhaps more as a blog post. On the subject of machine ‘believability’, have you come across ELIZA? As the research you cite shows, there is plenty of evidence to show that humans are pretty accepting of non-human agents. The Turing Test is the early example of trying to figure this out.

    I have to say, these debates seem to be more concerned with our own psychological responses to machines, rather than the machines themselves – the very simple programming of ELIZA being the perfect example of that.

    Super mind map of the Miller chapter! Would there be another way to share higher resolution versions, perhaps into your lifestream?

    1. Thank you for the encouragement and advice Jeremy.
      >Do note the 250 word count for these summary posts.
      Yes I had got rather caught up in the novelty of the medium and lost sight of the assessment criteria! I will do a little site curation to resolve, using the original post to keep everything in chronological order (which will keep me happy!)
      I’ve also noted the point you’ve made regarding the tweeted content. It’s taken me a little while to understand how best to use the Lifestream format and to home in on topics that will help my professional practice. You summed this up nicely in your intro video to week 3 as initial exploration and experimentation ahead of getting more focused on some narrower topics.

      >have you come across ELIZA
      I hadn’t, although I am aware of the Turing test. I know there has been a lot of research into the topic, although my specific interest was in the intonation, tone and other peculiarities that make us assume that a voice is ‘real’ (such as uncertainty and hesitation). In other words, how the words are said rather than what is said. I thought this was illustrated beautifully in the Gumdrop clip.

      >Would there be another way to share higher resolution versions, perhaps into your lifestream?
      I’m working on that. The size limit makes it difficult, but the latest map I uploaded in png format looks a lot better, so at the very least I will go back and update the previous maps. I intend to add my own thoughts to all of them so they’ll need to be updated anyway.

      Thanks again for the feedback 🙂
