Create Motivating YouTube Videos:

Using Dual Coding Theory and Multimedia Learning Theory to Investigate Viewer Perceptions


AERA SIG: Advanced Technologies for Learning



Ya-Ting Teng

Department of Human Resource Education

University of Illinois at Urbana-Champaign


Curtis J. Bonk

Professor, Department of Instructional Systems Technology

Indiana University, Bloomington


Alex J. Bonk

Sociology Department

Indiana University, Bloomington


Meng-Fen Grace Lin
Department of Educational Technology
University of Hawaii


Georgette M. Michko

Educational Technology and University Outreach

University of Houston




This research was part of a larger study that attempted to determine why people share, create, save, and comment on YouTube videos. It also explored motivational and instructional design elements of such shared online video. In this study, six videos representing three different types of video were compared (two videos from each category): (1) text only, (2) text, pictures, and voicing, and (3) celebrity advocacy. The 113 respondents were each randomly sent to one of six Web-based surveys. As predicted by dual coding theory and multimedia learning theory, participants preferred videos that combined multiple media elements: text, pictures, and voice. Such media-rich videos were deemed to be more creative and engaging.




In the past few years, people have begun to reconsider the way that Internet technology can improve human life and education. Web 2.0, the second generation of Web technology, has been created to promote the concept of rich user experiences and collective intelligence (O'Reilly, 2005). Among the Web 2.0 sites, YouTube has revolutionized the use of the Internet by bringing shared online video into everyday life. In 2005, YouTube capitalized on the fact that the Flash viewer was ideal for viewing videos (Downes, 2008). As Green noted, “Flash allows Web users to post videos in a convenient format that is not constrained to platform-specific players or applications. YouTube further enhances this advantage by allowing users to upload their video in almost any format and have it converted into Flash video” (Green, 2006, para. 8). These innovations reduce the cost of, and the barriers to, creating and sharing videos online. As a result, many now see the immense potential of YouTube and other shared online video resources in education.


Educational Uses of YouTube Videos


When it comes to teaching and learning, YouTube is generating waves of new opportunities in K-12, higher education, workplace learning, and e-learning. For instance, Christopher Conway, a professor of Latin American literature and culture, used YouTube videos to provide background context so that students could see the ways in which the legacy of the 19th-century Venezuelan Liberator, Simón Bolívar, remains alive today (Conway, 2006). Beyond individual instructors, many universities, such as Stanford University and the Massachusetts Institute of Technology (MIT), have created their own channels on YouTube and share their lecture videos with the world as a form of academic outreach.


As the aforementioned examples show, YouTube quickly transformed the traditional way of using videos in teaching and learning. No longer does one need to rely on an audio-visual department for course videos. Instead, YouTube videos can serve as supplemental course resources. A great number of how-to videos offering procedural knowledge are now shared on YouTube. These types of videos are prominent in areas such as cooking, music, art, and photography. As in the Professor Conway example above, shared online videos can also be an anchor or ender for instructional activities (Bonk, 2008), including discussions, reflection exercises, and debates. Such videos provide a “macrocontext” (CTGV, 1990, p. 3) or commonly viewed experience for later learning and reflection. The macrocontext provides a learning space that can be replayed or revisited and discussed from many perspectives and over an extended period of time. In this way, videos provide a common experience for learners to discuss and reflect on concepts and ideas, as in anchored instruction.


Theoretical Frameworks


Paivio’s Dual Coding Theory concerns the relationships and connections between verbal and nonverbal/imagery representations and the processes that can enhance the development and activation of memory structures. The verbal mode includes visual, auditory, and other forms of text. The nonverbal mode contains modality-specific images for pictures, sounds, actions, and other nonlinguistic objects. From the perspective of Dual Coding Theory, the use of concrete and personal examples helps comprehension and retention of concepts (Clark & Paivio, 1991). The Multimedia Learning Theory proposed by Mayer goes a step further (Reed, 2006). In Mayer’s model, material in a multimedia presentation is presented as either words or pictures (Mayer, 2001). Verbal and nonverbal representations are processed in both sensory and working memory. The theory distinguishes sounds and visual images, which are organized separately into a verbal model and a pictorial model. Prior knowledge is also retrieved from long-term memory and integrated with the verbal and pictorial models within working memory.


Both Dual Coding Theory and Multimedia Learning Theory have received some support from empirical studies. Children recalled more information when they listened to stories while viewing relevant pictures than children who only heard the stories (Levin & Berry, 1980). Students showed better retention and transfer when they were exposed to materials combining pictures and words (Moreno & Valdez, 2005). Regarding the form of the verbal representation, Alty, Al-Sharrah, and Beacham (2006) found that students learned better from a presentation of spoken text and diagrams than from a presentation of written text only or a combination of written text and diagrams. Moreover, students performed better when learning occurred with a combination of voice, text, and pictures than with text and pictures alone (Truman & Truman, 2006).


Research Questions


Depending on how videos are presented, YouTube may extend learning beyond text to auditory, visual, or episodic memory, thereby fostering student dual coding of information (Paivio, 1986) and increasing learner retention and transfer of information (Moreno & Valdez, 2005). Given the vast educational opportunities of YouTube videos, it is important to select appropriate videos. The criteria for picking videos should be based on their instructional value and on how well they serve the learning objectives. At the same time, motivating and engaging videos can arouse student interest in a course. However, studies based on Dual Coding Theory and Multimedia Learning Theory have rarely considered the impact of different designs on learners’ perceptions. Therefore, the purpose of this study is to understand the relationship between the different components within videos and viewers’ perceptions and motivation on a Web 2.0 platform.




This survey took place between August 2007 and March 2008 using SurveyShare, a Web-based survey tool. Survey participants first answered a few demographic questions. In the middle of the survey, they were randomly assigned to watch one of the six videos listed in Table 1. After watching the YouTube video, each respondent answered a few more questions about that video. An assortment of techniques was employed to solicit survey respondents (e.g., Google AdWords, iPod drawings, a Facebook research group, blogs, online newsletter announcements, and free membership in the survey tool).


Three different types of videos were compared in this study: (1) text only, (2) text, pictures, and voicing, and (3) celebrity advocacy. While the original study included 60 videos in total, this particular study focused on two videos from each of these three categories, or six videos in total. Table 1 shows the characteristics of each video. The six YouTube videos were selected based on their popularity, content, length, and overall appropriateness to the study. The two text only videos include background music with text describing various facts about globalization or digital learning. The two videos with a combination of text, pictures, and audio (i.e., voice narration) introduce the ideas of wikis and RSS. Finally, the two celebrity advocacy videos feature Michael J. Fox backing stem cell research and Leonardo DiCaprio supporting environmental issues. While celebrity advocacy videos have not been examined in previous research, such videos have been widely created and distributed. Theoretically, celebrities represent an image that may create stronger emotional connections with viewers. In addition to this bond with celebrities, the messages that they deliver fall within the auditory verbal mode. Thus, this type of video is hypothesized to be more engaging than the text only videos.


Table 1

Characteristics of Videos









Video type                     Video title
Text only                      Did You know; Shift Happens–Globalization; Information Age
Text only                      Pay Attention
Text, pictures, and voicing    Wikis in Plain English
Text, pictures, and voicing    Video: RSS in Plain English
Celebrity advocacy             Michael J. Fox
Celebrity advocacy             Leonardo DiCaprio's YouTube Message






Of our 113 survey participants, 55.8% were female and 44.2% were male. Participants represented all age groups (see Table 2) and education levels (see Table 3). However, they tended to be older and more educated.

Table 2

 Frequency of Age Groups

Age groups    Percent (%)
Under 25
Over 55







Table 3

 Frequency of Level of Education

Level of Education                      Percent (%)
Less than high school or high school
Some college
2 yrs college
4 yrs college
Masters Plus










Regarding their experience with Web 2.0 applications, only 15.9% had created a YouTube channel and a mere 12.4% had produced a podcast. Not surprisingly, more people had used a social network site (61.1%), created an online photo album (43.4%), or maintained a blog (41.6%). Figure 1 illustrates the various technology applications that our participants use or have used, which reflects their familiarity with new technologies. In particular, downloading video (80.5%) was highly familiar to our respondents.


Figure 1. Percentage of the uses of technologies.



Participants were randomly assigned to one of the six videos. Of the 113 respondents, 29 watched text only videos, 41 watched videos with a combination of text, pictures, and voicing, and 43 watched celebrity advocacy videos. To parallel YouTube's own video rating system, respondents rated the video they watched on a 5-star scale. The combination videos received the highest rating (M = 3.78, SD = 1.255), followed by the text only videos (M = 3.62, SD = 1.178) and the celebrity advocacy videos (M = 3.4, SD = 1.072). Surprisingly, the celebrity advocacy videos were rated even lower than the text only videos. However, a one-way ANOVA on the ratings across the three types did not reveal a significant difference.


More specifically, respondents indicated whether the video had the following four positive aspects: (1) creativity and originality, (2) current information, (3) funny and humorous, and (4) informative and educational (see Figure 2). Each of the four aspects was either selected (score = 1) or not selected (score = 0). Respondents more often rated the text only and celebrity advocacy videos as informative and current, whereas they rated the combination videos as not only informative but also creative. A post hoc Tukey comparison test shows that the combination videos were rated significantly more creative (F = 12.993, p < .001) and funnier (F = 18.307, p < .001) than the text only videos and the celebrity advocacy videos. In terms of how engaging the videos were (where 1 = “No, not engaging,” 2 = “Yes, somewhat engaging,” and 3 = “Yes, extremely engaging”), the results show that the combination videos (M = 2.17, SD = .704) were significantly more engaging than the text only videos (M = 2.14, SD = .743) and the celebrity advocacy videos (M = 1.79, SD = .638), F(2, 110) = 3.776, p < .05.
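The two omnibus comparisons reported above (star ratings and engagement) can be roughly re-derived from the published group sizes, means, and standard deviations alone. The sketch below is illustrative only and is not the authors' analysis script: the function names `anova_from_summary` and `p_value_df1_is_2` are hypothetical, the reported SDs are assumed to be sample standard deviations, and the closed-form p-value applies only because three groups give a numerator df of 2.

```python
def anova_from_summary(groups):
    """One-way ANOVA F statistic from per-group (n, mean, sd) summaries.

    Assumes each sd is a sample standard deviation (n - 1 denominator).
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n
    # Between-groups sum of squares: spread of group means around the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    # Within-groups sum of squares, recovered from each group's variance.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def p_value_df1_is_2(f_stat, df_within):
    """Upper-tail p-value of the F distribution; valid ONLY when numerator df == 2."""
    return (df_within / (df_within + 2.0 * f_stat)) ** (df_within / 2.0)


# (n, M, SD) as reported in the text: text only, combination, celebrity advocacy.
star_ratings = [(29, 3.62, 1.178), (41, 3.78, 1.255), (43, 3.40, 1.072)]
engagement = [(29, 2.14, 0.743), (41, 2.17, 0.704), (43, 1.79, 0.638)]

for label, data in (("star rating", star_ratings), ("engagement", engagement)):
    f_stat, df1, df2 = anova_from_summary(data)
    p = p_value_df1_is_2(f_stat, df2)
    print(f"{label}: F({df1}, {df2}) = {f_stat:.3f}, p = {p:.3f}")
```

Under these assumptions, the star ratings yield an F only slightly above 1 (not significant), while the engagement scores yield an F close to the reported 3.776 on 2 and 110 degrees of freedom; small discrepancies are expected because the inputs are rounded summary values.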


Figure 2. Percentage of positive attributes.



Regarding motivation to watch the videos, participants rated eight different motives: (1) captivating, (2) funny or humorous, (3) heard about or seen this one before, (4) informational or educational, (5) inspirational or motivational, (6) interesting topic, (7) read the reviews, and (8) sharable with others. The top three motives for the text only videos were informational, interesting, and captivating (see Figure 3). Informational and interesting were the top two motives across all three types. However, funny emerged as the third most selected motive for watching the combination videos, and it was selected by significantly more people (p < .01) than for the other two types. Inspirational was the third motive for viewing the celebrity advocacy videos, and significantly more people (p < .05) chose this motive than among those who watched the combination videos. Respondents were then asked if they would recommend the video to others or add it to their favorites. Most people watching the text only videos (62.1%) and the combination videos (61%) would recommend the video. In contrast, a majority of celebrity advocacy viewers (55.8%) would not recommend the video that they watched.


Figure 3. Percentage of motives for watching each type.



The findings indicate that Dual Coding Theory and Multimedia Learning Theory can help predict behaviors and reactions to shared online video. The richness of the online media influences not only how well people learn but also viewers’ perceptions and motivation to watch. People rated the videos with a combination of text, pictures, and voicing more positively and reported this type to be more engaging than the text only videos and the celebrity advocacy videos. Text only videos were seen as informative but dull, and people rated this type of video as less sharable. This finding is important for educators wishing to use YouTube and other shared online video in their classes. Such educators should attempt to find multimedia-rich videos or those with text only. Perhaps the ultimate form of the video selected will depend on needs, subject matter, and timing within the course.


Interestingly, although respondents reported different positive aspects and different levels of engagement for these three types of videos, there was no significant difference among their star ratings. This finding implies that a one-dimensional rating on a 5-star scale may not be able to reflect the quality of videos. People may rate on the 5-star scale based on different criteria. Future research is needed on what the 5-star scale represents and on how to design a better rating scheme for identifying video quality, so that ratings can help instructors and students select videos based on their educational value.


While the richness of video seemed to be central to usability, the celebrity advocacy videos failed to arouse more positive perceptions than the text only videos. One possible explanation is that the two celebrity videos may not have created strong enough emotional connections with our respondents. Future research could study a more well-known celebrity and also control for viewers’ familiarity with and sense of relatedness to the celebrity, in order to identify more clearly whether a celebrity advocacy video is more engaging than a pure text video.




