A CHILD’S JOB IS TO PLAY, WE SHOULD LET THEM...

Pamela Wong

Research Manager, Direction First




Introduction
There appears to be very little consensus and a shortage of research investigating effective
research approaches and question types with children. Direction First has put the standard
approaches to the test, along with using the latest technology from GMI to evaluate more modern
approaches. We wanted to know which questionnaire scales gave better discrimination and to
determine if the use of interactive and gaming scales would improve data quality by improving
engagement.

Direction First has undertaken original research challenging the traditional approach of questioning
children by creating audio and visually interactive game based techniques designed to answer
‘traditional’ objectives. Today’s children live in a digital world and we wanted to test if online
gaming methodologies maintained attention better and led to better quality data. There have been
few studies that compare and measure the discrimination and engagement of different question
types and methods. In this research we explored different question types and scales to understand
which types enabled better discrimination, and ultimately, which question types were more
engaging and provided better quality data.

We compared different question scales on over 500 Australian children between 7 and 10 years old
in an online survey. The research was conducted in three different stages. Each stage contained an
independent sample of participants. Children in each stage rated their liking of the same fifteen
items on one scale before moving onto the next scale until all four scales had been used.

Sensory Food Research on Children
Globally, the children’s market is estimated to be valued at USD 1.3 trillion (Nairn, 2010). Children
have much more autonomy and influence over household purchases than previous generations, to
such an extent that today’s youth are more likely to be described as consumers rather than as
children (Geraci, 2004). The growth in the consumption power of children as consumers and
influencers of family purchases, including household groceries, has been recognised as substantial
business, and this has similarly led to growth in spending to find out what children want, why and
how to best market to them.

The children's market is a notoriously challenging market to research. Whilst children are being
exposed to significantly more information and technology at a younger age, they still tend to have
limited linguistic and numeracy skills, cognitive abilities and short attention spans. Because of this,
they may be able to participate and respond to research in more limited ways unless techniques are
adapted. For this reason, there are specialty companies and departments dedicated to conducting
research with children.

In food sensory research literature, it has been found that children have difficulty with understanding
and remembering instructions, interpreting abstract symbols or pictures, and completing tasks such
as seriation (ranking in order of magnitude) and attending to multiple aspects, for example, texture
and flavour of a food (Popper and Kroll, 2005, 2003). Younger children tend to focus on a single
aspect of a product, without attending to other aspects (Fliegelman et al, 2004).

Children develop linguistic, literacy and numeracy skills at different rates, and there is such
tremendous variation in such skills among children of the same age (variations up to 4 years) that
some researchers believe school grades may be better determinants of skills/abilities among
children than age alone (C&R Research, 2009). The changing vernacular of children from each
generation is of particular importance to researchers, as it affects the language with which we
communicate with children. Whilst language needs to be familiar, child friendly and suitable to the
age group, children often aspire to be older and look up to children who are older than them. So
although it is important to keep things simple and familiar enough to understand, children must not
feel that everything has been dumbed down for them. This also applies to themes and imagery.

When children are asked whether they like something, they tend, for various reasons, to respond
positively; that is, they are more likely to respond with positive descriptors than negative ones
(Geraci, 2004). Children tend to rate new products and ideas
positively because they are excited about novelty and not necessarily because they really like the
products. C&R Research addressed this issue by designing an unbalanced scale that made most
responses sound positive, such as a five point scale labelled “love it”, “like a lot”, “like a little”, “it’s
ok”, and “don’t like at all”. This aimed to enable children to distinguish products that they really
loved and those that were just interesting because they were new. Winning concepts were believed
to have clearly surfaced (Fliegelman et al, 2004).

If children don’t like an idea or product because it’s novel, then familiarity may also be a factor that
falsely drives liking. Introducing unfamiliar foods to kids several times has been found to enhance
liking of the product due to the “mere exposure effect” (Birch and Marlin, 1982). This has
implications for researchers and companies introducing new products to market. Most sensory
protocols expose a child only once to a novel food in small portions; however, Ubrick (2002)
proposes that new foods may require repeated testing to assess the true potential of a product.

Popper and Kroll (2005) have emphasised the importance of considering cognitive and social
factors that affect sensory food testing with children. Food preferences are influenced by the
interplay of nature (e.g. innate preference for sweet tastes, aversion to bitter tastes) and nurture
(e.g. parents, peers, and the environment). Peer influences can also have long lasting effects on
children’s food preferences. Children’s food choices may be affected by their desire to exercise
control of themselves and to be viewed as older and more mature. Changing societal influences
have led to children maturing earlier, which has resulted in increases in cognitive demands and
processing skills needed to meet these demands (Chambers, 2005). Today, technology continues
to create generations of child consumers that are exposed to more products, ideas and technology
than previous generations. Not only are children growing up with more media and entertainment
options to choose from, but more media is being targeted directly to them than in previous
generations. Multi-tasking while using various forms of technology (e.g. surfing the internet while
watching TV) is enjoyed by most children. This lends support to our belief that children may be
more capable of completing more sophisticated questionnaires than we originally thought.

Questionnaire scales
Researching children requires procedures different from those routinely applied to adults, including
attention to psychological factors such as gaining confidence and trust and providing motivation,
communicating in child-appropriate language and using appropriate questionnaire scales (Schraidt,
2009). Specialized research methods, adaptations and techniques have been developed by various
firms conducting research on children. One such firm is the Peryam & Kroll Research Corporation,
which has published the bulk of sensory food research on children and conducts a specialty practice
in this field. The P&K Corporation believes that there is a consensus among the research community
that children (as young as 5 years old) can discriminate, particularly in regard to expressing their
degree of liking, which means they are able to indicate a degree of preference if the correct
measuring techniques are used (Schraidt, 2009). There is little consensus in the literature, however,
on which are the most effective techniques, question types and scales when conducting research on
children.

Hedonic scales for food acceptance have been used widely for consumer testing. In Australia,
different agencies are using very different questioning types and scales for children, recognising the
fact that children require special questioning techniques. Questionnaire scales used on children
include face scales, star scales, line scales, and normal descriptor type scales amongst others
(Figures 1-4).

Figure 1. Standard 9 point hedonic scale for adults
    1         2         3         4         5       6               7          8          9
   Like                                 Neither                                        Dislike
extremely                               like nor                                      extremely
                                         dislike




Figure 2. Facial scale for children




Figure 3. P&K scale for children

   1           2           3             4            5          6          7          8         9
 Super       Really       Good       Just a        Maybe     Just a        Bad       Really    Super
 good        good                     little       good or    little                  bad       bad
                                     good          maybe      bad
                                                     bad

Figure 4. Star scales for children
  [Star icons running from “Dislike a lot” (left) to “Like a lot” (right)]


Facial scales (Figure 2), which were designed to inspire closer attention to the scaling task, have
continued to be popular based on the rationale that children have limited reading and linguistic skills
and cannot understand complex words or phrases. Whilst this scale continues to be used by some
for conducting sensory research, it has been found to be less discriminating than verbal
scales and may introduce unintended bias. Children tend to respond to pictures based on the
emotion that they show (a smiley face shows a happy person) rather than what they are supposed
to represent (how the food makes you feel). Pictorial facial hedonic scales have been said to be
ambiguous, as a face intended to show a degree of dislike can be interpreted by children
as showing anger, an emotion not usually experienced when thinking about food (Popper
and Kroll, 2003; Cooper, 2002).


The P&K scale (Figure 3) is a child-oriented scale developed by Peryam and Kroll specifically for
children who are semi-literate (Popper and Kroll, 2005). This scale was reported to
perform better than the standard hedonic scales and the smiley face scale. Whilst there are many
merits to the application of the face scale, Kroll (1990) found that the face scale was less effective
and less discriminating compared to hedonic ratings on the P&K scale.

No references were found in literature on the star scale (Figure 4), but several specialists in food
sensory research on children have recommended this scale above other scales, and it has been
used by sensory research firms in Australia for many years. It has been said that children
understand the star scale easily, as the stars represent grades or rewards that closely follow the
grades that they are awarded for good work at school. However, it is important when using any
scaling to emphasise that there are no right or wrong answers to help children to answer truthfully
(Fliegelman et al., 2004).

Other researchers believe that, because children cannot distinguish shades of meaning, asking
any type of rating question on a scale is not useful, as children do not understand it. Simplified, finite
scales such as “like it”, “it’s ok” or “don’t like it” have been recommended for younger children
(Fliegelman et al, 2004). Pair-wise questionnaire approaches, where children choose their favourite
of 2 options, were reported as effective among very young children (Fliegelman et al,
2004). On a similar basis, a bifurcated approach where children were firstly asked if a food was
“good” or “bad” before being asked if it was “really good” or “really bad” was found to be effective for
children under 7 years old (Kroll, 1990).

Kroll (1990) conducted a comprehensive study on children to compare various sensory
questionnaire scales, scale lengths and the effectiveness of self-administered versus one-on-one
interviews. In this study, the relative merits of the different rating scales that can be used in testing
children were assessed. A standard hedonic scale, a face scale, a child-oriented scale (P&K) and
paired comparison were used with children between 5 and 10 years. Findings showed that the P&K
scale performed better than the standard hedonic or face scale in terms of discrimination. The use
of a shorter scale, under the hypothesis that it would offer simplicity (7 points as opposed to 9
points) was not found to offer any advantages among children. The 9 point scale resulted in better
discrimination and produced more reliable results than the 7 point scale. In one-on-one interviews, it
has been hypothesised that children may respond positively out of acquiescence, which provides a
plausible reason for using self-administered questionnaires when possible. Children over 8 years
old performed as well in self-administered questionnaires as in one-on-one interviews.

Sensory researchers agree that children are different to adults and require tailored research
approaches. Guinard (2001) reported differences in sensory intensity (strength) thresholds between
adults and children; however, these differences in perception may be more reflective of the
differences in how children interpret questions and how they use intensity scales, rather than true
physiological differences. This provides further support to the need to conduct more research in this
area.

Respondent engagement
Respondent engagement in online research has been discussed extensively throughout the
research industry. Common metrics of engagement include completion rates, survey time spent,
verbosity of open ended responses, consistency checks, fatigue and satisficing (doing just enough
to complete a task) measures, and the ability of participants to follow instructions accurately. These
measures have been said to be indicators of engagement, which ultimately determine completion
rates, enjoyment and data quality.

SSI research revealed that on average, survey response rates in the UK, France and the
Netherlands collapsed dramatically from 30% in 2004 to 10% in 2009. Research was conducted to
understand the effects of survey length, fatigue and subsequent effects on response quality (Cape,
2009). Fatigue and satisficing behaviour were hypothesised to be indicators of participants’ lack of
engagement, so researchers used various measures to investigate reasons for changes in survey
behaviour since 2004. By positioning non-mandatory question scales, SSI measured rates of
non-response. Data on drop-out rates, survey time spent, rates of satisficing, numbers of words typed in
open ended questions, and rates of answering falsely (in order to skip a section) were used as
metrics to explain survey behaviour and as measures of data quality. The research indicated that
there was a critical limit of 20 minutes for surveys, after which engagement and data quality
dropped.

Sleep and Puleston of Engage Research and GMI (2009) examined causes of boredom in online
surveys. Various techniques were tested with the aim of improving data quality, including the use of
visuals/animations, use of alternatives to grid questions, role playing, survey energisers and
improving language, amongst others. Data quality measures were examined including straight-
lining, responses to open ended questions and the ability to follow instructions accurately.
Techniques applied resulted in a successful reduction in drop-out rates, increased time spent and
supply of higher volumes of data (open ended responses, follow on questions) and better quality
data.
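
As an illustration of one of the data quality measures mentioned above, straight-lining can be flagged by checking whether a respondent gave the same answer to every item in a grid. The Python sketch below is a minimal example under stated assumptions: the function names, the data layout and the minimum grid length are hypothetical, and it is not the procedure used by Sleep and Puleston.

```python
def is_straight_lining(answers, min_items=5):
    """Return True if every answer in a grid is identical.

    `answers` is the list of scale points one respondent gave to the items
    of a single grid question. `min_items` is an assumed threshold: very
    short grids are not flagged, because identical answers there are
    plausible for an attentive respondent.
    """
    if len(answers) < min_items:
        return False
    return len(set(answers)) == 1


def straight_lining_rate(grids):
    """Share of respondents flagged, given one answer list per respondent."""
    if not grids:
        raise ValueError("need at least one respondent")
    flagged = sum(is_straight_lining(answers) for answers in grids)
    return flagged / len(grids)
```

A rate computed this way could be compared between survey formats, alongside drop-out rates and time spent, as a rough indicator of engagement.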

A substantial volume of research on improving engagement has been conducted on panels of adult
respondents, who it seems are becoming bored with online surveys. This is a trend seen globally.
It therefore seems reasonable to believe that for children, who have much shorter attention spans and
more limited cognitive abilities, traditional “black and white” form surveys and research question
scales designed for adults are not likely to be highly engaging.

Today’s youth are becoming more technologically savvy at a much younger age. In countries where
all choices of media are available, children use between 4 and 6 media a day (e.g. TV, radio,
internet and books), and often simultaneously (Solomon and Peters 2005). It is believed that the
ability to follow several topics more or less simultaneously with attention switching from one
medium to another demands quite an advanced level of cognitive and memory coordination.

While there is a consensus that children can provide valuable information for marketers, there is
little consensus on the extent to which survey design needs to be simplified to minimise confusion
and capture accurate information. The hypothesis is that children need simplicity, however many
researchers have found evidence contrary to this belief.

Connecting with the most inter-connected generation of youth is not an easy task. In Australia,
access to media is ubiquitous and over 90% of children aged between 7 and 10 years spend
between 30 and 60 minutes a day surfing the internet and using various types of media, often
simultaneously (Direction First online survey, June 2010). This level of multi-tasking by children
means that marketing messages need to be interesting and compelling, and this also applies to
market research on children.

Australia has been described as a “Game Nation” and playing video and computer games (e.g.
Figures 5-6) has become as popular as the internet and television. Whilst playing video games
does not compete for time spent in non-media activities, it competes with use of older media, and is
increasingly becoming a more social activity (Brand et al, 2009). The enormous popularity of games
and the high proportion of young gamers under 10 years old give us reason to believe that research
on young digital natives needs to do more to capture their attention and to be more enjoyable,
interactive, immersive and engaging.




Figure 5. A single player computer game of the past: Nintendo Tetris




Figure 6. A current massively multi-player online role playing game (MMORPG): Nintendo Wii
The Legend of Zelda




One of the most successful television based educational-entertainment programs was Sesame
Street, which first aired in 1969 after substantial academic scrutiny. The creators turned
what was considered a low involvement, non-educational and non-interactive medium into an
enormously successful teaching tool (Gladwell, 2001). Inspiration was drawn from educational
psychology, television commercials and comedy sketches to improve numeracy and literacy skills
among preschoolers, and the program was shown to improve viewers’ reading and learning skills. Much in the
same way that “edutainment” derived its parentage from educational psychology, advertising and
entertainment to capture children’s attention and teach during play time, researchers can draw on
such techniques to make research more appropriate, fun and engaging for children and adults,
whilst collecting better quality data.

Background
In June 2010, Direction First conducted an online study to investigate which question scales work
best on children, and to determine whether interactive and gaming elements improved engagement.

The main objectives of the research were to:
     Compare a standard hedonic questionnaire scale with scales designed for children to see
        which gave better discriminating power.
     Determine whether the use of interactive elements, or a combination of interactive and gaming
        elements, would improve data quality by improving engagement.
     Determine which of the scales and questionnaire formats was the most engaging,
        enjoyable and fun.

Over 500 Australian children aged between 7 and 10 years were invited to participate in the online
study conducted in June 2010. The research was conducted in three different stages, with each
stage comprising an independent sample of participants. Children in each stage rated their liking
of the same fifteen items on one scale before moving onto the next scale until all four scales had
been used. The order of the scales was randomised in a balanced block design to avoid
positional bias. Parents first completed a screening exercise, with children taking over once the
screener was complete to undertake the survey.
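
The exact block design is not specified here. One common way to balance the presentation order of four conditions is a balanced Latin square (Williams design), in which each scale appears once in every position and directly follows each other scale exactly once. The Python sketch below illustrates that idea only; the scale labels, function names and the simple cyclic allocation of respondents to rows are assumptions, not the procedure used in the study.

```python
from itertools import cycle

# Illustrative labels for the four scales under test.
SCALES = ["standard", "smiley", "pk", "star"]


def balanced_latin_square(n):
    """Build a Williams design for an even number of conditions.

    Each condition appears once in every position, and each condition
    directly follows every other condition exactly once.
    """
    if n < 2 or n % 2 != 0:
        raise ValueError("this simple construction assumes an even n >= 2")
    # First row interleaves low and high indices: 0, 1, n-1, 2, n-2, ...
    first = [0]
    low, high = 1, n - 1
    while len(first) < n:
        first.append(low)
        low += 1
        if len(first) < n:
            first.append(high)
            high -= 1
    # Every later row shifts the first row by one, modulo n.
    return [[(c + shift) % n for c in first] for shift in range(n)]


def assign_orders(respondent_ids):
    """Allocate respondents to the rows of the square in rotation."""
    rows = cycle(balanced_latin_square(len(SCALES)))
    return {rid: [SCALES[i] for i in next(rows)] for rid in respondent_ids}
```

In practice respondents could be allocated to rows at random rather than in rotation, provided each row ends up with a similar number of respondents.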

‘Warm-up’ questions were asked at the beginning of each new scale to ensure respondents were
aware that they had progressed onto a new scale. Scale experience questions were presented at
the end of each scale to find out how much children enjoyed the experience and how easy it was for
them. Consistency check questions were used to determine whether respondents were engaged
and attentive at the beginning and end of the survey.


Fifteen concepts were selected for the research and included conceptual text descriptions and
images of unbranded common food consumption items, flavours, and unbranded commercial-like
products. Common food consumption images included milk, honey, ice cream, bread and water.
Flavours presented as words included mint, chocolate, cinnamon, peanut butter, and lemon. The
unbranded commercial-like concepts included images of sweet biscuit and savoury snack products
that were relatively similar to existing market products. Concepts were selected so that the range
contained a mix of liked, neutral, and disliked flavours and products to represent a wide hedonic
range. The concepts researched create a context for conducting concept testing as well as
addressing other aspects more likely to be presented in food sensory testing applications.

The 3 stages were as follows:
    Stage 1: Traditional. N=96.
    Stage 2: Interactive. N=167.
    Stage 3: Interactive and gaming. N=248.

The 4 question scales tested in each of the 3 stages included the following:
    9 pt standard hedonic scale
    5 pt smiley face scale
    9 pt P&K scale
    9 pt star scale

Scales read left to right from negative to positive in all surveys. Whilst some researchers use some
of the scales the other way around, we decided to keep it consistent with our current questionnaire
scales to avoid confusion.

Traditional (Stage 1)
The first stage of the research was designed to compare and test the 4 scales in their
traditional, ‘black and white’ format. A sample of about 100 children (N=96) evaluated concepts and flavours by
answering questions that appeared as they usually would on paper questionnaires. Essentially, this
was a paper questionnaire placed in an online survey (Figures 7 – 10).




Figure 7. Stage 1 - Standard 9pt hedonic scale




Figure 8. Stage 1 - 5pt Smiley face scale




Figure 9. Stage 1 - 9pt P&K scale




Figure 10. Stage 1 - 9pt Star scale
  [Star icons running from “Dislike a lot” (left) to “Like a lot” (right)]

Interactive (Stage 2)
The second stage introduced the four scales in a graphically enhanced, interactive format, with
sliders and audio-visual scales. The interactive scales were designed by Direction First using Flash
technology on GMI’s platform (Figures 11 – 14).



Figure 11. Stage 2 - Standard 9pt hedonic scale




Figure 12. Stage 2 - 5pt Smiley face scale




Figure 13. Stage 2 - 9pt P&K scale




Figure 14. Stage 2 - 9pt Star scale




Gaming and interactive (Stage 3)
The third stage repeated the four interactive scales used in Stage 2. Drawing inspiration from the
latest online video games, Direction First designed an avatar-like character that participants were
asked to choose and dress at the beginning of the survey (Figure 15). The character continued
through the survey journey with the participant, in the same way that popular role playing video games
are played today. This third stage also introduced a series of popular video game inspired
backgrounds (Figure 16).

Figure 15. Dressing your character




Figure 16. Character in survey




1. Comparison of scales
To ensure that the scales were comparable, we converted scores on the 5 point smiley face scale to
the 9 point range used by the other scales. A 5 point facial scale was used rather
than a 9 point one because 9 point facial scales have not been commonly used and, after reviewing a 9 point
facial scale, we found the differences in expressions to be too subtle and somewhat
confusing. All mean scores were reported on a 9 point scale (Table 1).

Table 1. Comparison of scales
 Scale            Score
 9 point Standard 1         2         3         4         5         6         7         8         9
 9 point P&K      1         2         3         4         5         6         7         8         9
 9 point Star     1         2         3         4         5         6         7         8         9
 5 point Smiley   1                   2                   3                   4                   5
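
Reading Table 1 column by column, the five smiley face points line up with points 1, 3, 5, 7 and 9 of the 9 point scales, which corresponds to the linear mapping 2x - 1. A minimal Python sketch of that conversion follows; the function names are illustrative, and the formula is a reading of Table 1 rather than a stated procedure.

```python
def smiley_to_nine_point(score):
    """Map a 5 point smiley face score onto the 9 point range (Table 1)."""
    if score not in (1, 2, 3, 4, 5):
        raise ValueError("smiley face scores run from 1 to 5")
    return 2 * score - 1


def mean_on_nine_point(smiley_scores):
    """Mean of the converted scores; equals 2 * (5 point mean) - 1."""
    if not smiley_scores:
        raise ValueError("need at least one score")
    converted = [smiley_to_nine_point(s) for s in smiley_scores]
    return sum(converted) / len(converted)
```

Because the mapping is linear, converting each rating and then averaging gives the same result as converting the 5 point mean directly.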

2. Comparison of stages
To ensure that the samples from each of the 3 stages were homogeneous and comparable,
interlocking quotas were used at each stage of the research to obtain an even gender and age
balance. Because significant differences in age and gender proportions nevertheless remained in each of the
stages, the dataset was weighted, with each individual stage balanced towards the target
quotas of 25% in each cell (Table 2).

Table 2. Weighted proportions in each sample
                        7 to 8yrs       9 to 10yrs
 Male                   25%             25%
 Female                 25%             25%
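
The weighting towards the Table 2 targets can be illustrated with simple cell weighting, where each respondent's weight is the target share of their gender by age cell divided by the observed share of that cell within the stage. The Python sketch below assumes this straightforward scheme; the cell labels and function name are hypothetical, and the weighting procedure actually used may have differed.

```python
from collections import Counter

# Target share for each gender x age cell (Table 2).
TARGET_SHARE = {
    ("male", "7-8"): 0.25,
    ("male", "9-10"): 0.25,
    ("female", "7-8"): 0.25,
    ("female", "9-10"): 0.25,
}


def cell_weights(cells):
    """Return one weight per cell so weighted shares match TARGET_SHARE.

    `cells` holds one (gender, age_band) tuple per respondent in a stage.
    Weight = target share / observed share of the cell.
    """
    counts = Counter(cells)
    unknown = set(counts) - set(TARGET_SHARE)
    missing = set(TARGET_SHARE) - set(counts)
    if unknown or missing:
        raise ValueError("cells must match the four target cells exactly")
    n = len(cells)
    return {cell: TARGET_SHARE[cell] / (counts[cell] / n) for cell in TARGET_SHARE}
```

For example, a cell holding 10% of a stage's respondents would receive a weight of 2.5, so that it contributes 25% of the weighted sample.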

Which scale performed best?
The performance of the scales was compared using several different approaches to cater for the
different hypotheses surrounding inter-scale and inter-stage differences.
We hypothesised that the widely used star scale would be the most discriminating scale, followed
by the child-oriented P&K scale. We thought that the smiley face and standard hedonic scales
would perform equally in terms of discriminating power. We also believed that the interactive scale
would improve engagement and therefore lead to better quality data and consistency.

The main areas of measurement of scale effectiveness and inter-scale performance were scale
discrimination power and range (proportion) of scale used. We investigated a number of statistical
measures to compare the scales (see Appendix). Prior to scale comparisons, respondents who had
failed any of the consistency checks were removed from the data file.
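
The precise definition of the range (proportion) of scale used is given in the Appendix and is not reproduced here. One simple reading, sketched in Python below, is the share of available scale points that a respondent selected at least once across the items, averaged over respondents. This definition and the function names are assumptions made for illustration only.

```python
def proportion_of_scale_used(ratings, n_points=9):
    """Share of the available scale points a respondent used at least once."""
    if not ratings:
        raise ValueError("need at least one rating")
    if any(r < 1 or r > n_points for r in ratings):
        raise ValueError("rating outside the scale")
    return len(set(ratings)) / n_points


def mean_proportion_used(all_ratings, n_points=9):
    """Average the per-respondent proportions for one scale."""
    if not all_ratings:
        raise ValueError("need at least one respondent")
    props = [proportion_of_scale_used(r, n_points) for r in all_ratings]
    return sum(props) / len(props)
```

Under this reading, the smiley face scale would be assessed against its own 5 points (`n_points=5`) before any conversion to the 9 point range.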

In Stage 1, where the traditional, “black and white” survey format was used, there was an
opportunity to compare the effectiveness of the scales without the influence of interactive audio-
visual elements or avatars. We examined the results from this survey to determine which of the 4
scales provided the best discriminating power.

Repeated measures analysis of variance with Duncan’s tests was used to compare the scales on
all possible pairs of means. In Stage 1, the overall hedonic ratings showed a very similar pattern
across scales (Figure 17).
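
As a simplified illustration of the kind of analysis described above, the Python sketch below computes the F-ratio for the item effect in a one-way repeated measures analysis of variance within a single scale. A larger F indicates that the items are separated more clearly relative to residual noise, which is one way discriminating power can be summarised. Treating this F-ratio as the discrimination index is an assumption; the function name is hypothetical and Duncan’s post hoc test is not implemented.

```python
def repeated_measures_f(data):
    """F-ratio for the item effect in a one-way repeated measures ANOVA.

    `data[i][j]` is respondent i's rating of item j; every respondent must
    have rated every item.
    """
    if not data or len(data) < 2 or len(data[0]) < 2:
        raise ValueError("need at least 2 respondents and 2 items")
    n, k = len(data), len(data[0])  # respondents, items
    if any(len(row) != k for row in data):
        raise ValueError("every respondent must rate every item")

    grand = sum(sum(row) for row in data) / (n * k)
    item_means = [sum(row[j] for row in data) / n for j in range(k)]
    resp_means = [sum(row) / k for row in data]

    # Partition the total sum of squares into items, respondents and error.
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_items = n * sum((m - grand) ** 2 for m in item_means)
    ss_resp = k * sum((m - grand) ** 2 for m in resp_means)
    ss_error = ss_total - ss_items - ss_resp
    if ss_error <= 0:
        raise ValueError("no residual variation; F is undefined")

    df_items, df_error = k - 1, (n - 1) * (k - 1)
    return (ss_items / df_items) / (ss_error / df_error)
```

Computing this ratio separately for each scale, on the same respondents and items, would give a rough basis for comparing how well the scales separate the fifteen items.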




Figure 17. Stage 1 Means of the fifteen items tested across each scale




In the traditional, ‘black and white’ survey (Stage 1), the 4 different question scales (Standard, Star,
Smiley Face and P&K) performed similarly, in terms of providing similar patterns in overall hedonic
ratings for the fifteen concepts.




On examination of the overall hedonic results when scales were interactive (Stage 2), there was a
very similar pattern across the scales. An interesting pattern emerged whereby the P&K scale
tended to record slightly higher scores than the other scales. The same pattern was observed when
interactive gaming elements (Stage 3) were used. Furthermore, the Standard 9 point scale was
also shown to have a tendency toward higher ratings than the Star and Smiley Face scales.

Comparing discriminating power of the scales in the traditional survey (Stage 1), a very slight
advantage went to the Standard 9 pt hedonic scale. Despite having fewer scale points, the Smiley
face scale showed a similar level of performance to the other scales. When interactive elements
were used in Stage 2, scale discrimination was observed to drop overall, and no single scale
performed better. The P&K scale performed marginally better than the other scales in the
interactive gaming survey (Stage 3).

In terms of scale range or proportion used, a large proportion of the scales were used, and there
were no significant differences observed between the scales in the traditional questionnaire (Stage
1). Results were similarly observed when scales were interactive (Stage 2). When interactive
gaming elements were used (Stage 3), a significantly larger proportion of the Star and Smiley Face
scales were used compared to the P&K scale (Table 3).

Table 3. Proportion of scale used across the stages
                              Stage 1              Stage 2                  Stage 3
                              N=96                 N=167                    N=248
Scale                         Proportion of scale used
Star 9pt                      74%                  77%                      74%
Smiley Face 5 pt              77%                  78%                      76%
Standard 9pt                  76%                  76%                      71%
P&K 9pt                       72%                  73%                      67%

Further analysis of the scales revealed that when the 15 hedonic scores were averaged, and the
scales compared on average performance, no significant differences were observed.

The inter-stage comparison of each individual scale revealed that there were also no significant
differences in the performance of the individual scales between the stages. This suggests that the
interactive and gaming elements did not affect the research outcome significantly.




Which questionnaire was more engaging?
To measure respondent engagement, 5-point Likert scales were used to obtain feedback from
participants on how easy each scale was to use and how much fun they had using it. The items
on the ease of use scale ranged from 1=‘Hard’ to 5=‘Easy’. The items on the fun scale ranged from
1=‘No fun at all’ to 5=‘Lots of fun’. Both scales had numerical values assigned to all points on the
scale, and so were treated as scale variables for the purpose of analysis.

To compare the performance of the various stages, ability to follow instructions and response
consistency were measured through a series of question checks that were repeated at the
beginning and end of the survey. This involved clicking at selected points on scales and indicating
the number of brothers and sisters the participants had.

Time to complete the surveys was also recorded and compared at each stage.

Which scale was easiest to use?
All of the questionnaire scales used at each stage were seen as easy to use (mean scores of over 4
out of 5) (Table 4).

In the traditional survey (Stage 1), the Smiley face scale was considered as significantly easier to
use than the Standard and Star scales, but not significantly easier than the P&K scale. The P&K
scale was significantly easier to use than the Standard 9pt scale.

When participants used interactive scales (Stage 2), the Smiley face and P&K scales were
considered as slightly (directionally) easier to use than the Standard scale.

With gaming and interactive elements activated (Stage 3), all scales were considered as similarly
easy to use and there were no significant differences.




Table 4. Mean scores on ease of use
 Question: "How EASY was it to answer the questions about the flavours and foods on this scale?"
                                 Stage 1             Stage 2                  Stage 3
                                 N=96                N=167                    N=248
 Scale                           Mean (out of 5)
 Standard 9pt                    4.5                 4.5                      4.5
 Star 9pt                        4.6                 4.6                      4.6
 P&K 9pt                         4.7                 4.7                      4.6
 Smiley Face 5pt                 4.9                 4.8                      4.7
No significant differences were observed by age and gender.

Which scale was fun to use?
All of the scales across all stages were seen as fun to use, with all obtaining mean scores of over 4
out of 5 (Table 5).

In the traditional survey (Stage 1), the Smiley scale was directionally more fun than the Standard
scale (i.e. approaching a significant level).

When interactive survey elements were used (Stage 2), the Smiley Face and Star scales were both
seen as significantly more fun to use than the Standard.

With interactive-gaming elements (Stage 3), the Smiley Face scale was viewed as significantly
more fun to use than the Standard and P&K scales. The Star scale was considered as significantly
more fun than the Standard.




Table 5. Mean scores for rating of fun
 Question: "How much FUN did you have answering the questions about the flavours and foods on this scale?"
                               Stage 1             Stage 2               Stage 3
                               N=96                N=167                 N=248
 Scale                         Mean (out of 5)
 Standard 9pt                  4.2                 4.1                   4.1
 Star 9pt                      4.4                 4.5                   4.6
 P&K 9pt                       4.4                 4.3                   4.3
 Smiley Face 5pt               4.6                 4.5                   4.7
No significant differences were observed across age and gender in Stage 2 and 3. In Stage 1,
younger males did not have as much fun on the 9-point hedonic and P&K scales as their older
counterparts.

Response consistency and following instructions
The ability to answer consistently and follow instructions is a measure of respondent engagement,
as it determines whether a participant is paying attention and is engaged in the task.

Participants were asked to indicate how many brothers and how many sisters they had at 2 different
points in each survey stage (Figure 18). This was used because it was a question that was
relatively easy for most children to answer, didn’t require an opinion (unchanging), and therefore
should have remained constant. The questions are shown below:




Figure 18. Consistency question on number of siblings




The results below (Table 6) indicate that there were very high levels of consistency when the sibling
question was used and Chi-squared testing indicated that there were no significant differences
across the different stages on this measure (          ).

Table 6. Proportion of respondents making consistency errors when asked about number of
siblings
                        Stage 1             Stage 2               Stage 3
                        N=96                N=167                 N=248
 No mismatch            97%                 93%                   95%
 1 mismatch             2%                  7%                    5%
 Both mismatch          1%                  0%                    0%
 Total                  100%                100%                  100%

Participants were also asked to select a specific point on a scale at 2 different points in each survey
stage. The second consistency check was used to check if participants were paying attention and if
they were able to follow simple instructions at each stage. The question is shown below (Figure 19):




Figure 19. Consistency question on following instructions




As shown in Table 7 below, Chi-squared testing revealed that there were significant differences in the
proportions of consistency errors made by participants across the 3 stages (            ).

Table 7. Proportion of respondents making consistency errors when following simple
instructions
                                       Stage 1         Stage 2           Stage 3
                                       N=96            N=167             N=248
 Neither wrong                         94%             79%               81%
 Both wrong                            3%              5%                10%
 First check wrong, second check right 3%              14%               9%
 First check right, second check wrong 0%              2%                0%
 Total                                 100%            100%              100%
Pairwise comparisons (p=0.05)

Further analysis revealed that one in ten participants in the gaming stage (Stage 3) got both
consistency checks incorrect, a significantly higher proportion than those in either the first or second
stages. 14% of those in Stage 2 got the first check wrong.
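
The Chi-squared comparison of consistency errors across stages can be sketched as follows. This is only an illustration under stated assumptions, not the analysis that was run: the function name is hypothetical, the table is collapsed to "neither wrong" versus "at least one wrong", and the counts are approximations back-calculated by rounding the Table 7 percentages against the stage sample sizes, so they will not match the weighted data exactly.

```python
def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for a
    contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if outcome were independent of stage.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Approximate counts (assumption): columns are Stage 1, Stage 2, Stage 3.
counts = [
    [90, 132, 201],  # neither check wrong
    [6, 35, 47],     # at least one check wrong
]
stat, dof = chi_squared(counts)
print(f"chi-squared = {stat:.2f} on {dof} degrees of freedom")
```

The statistic would then be compared against the Chi-squared distribution with the stated degrees of freedom to obtain a p-value.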

It is possible that interactive and gaming elements distracted participants from completing simple
tasks. Whilst a higher proportion of participants failed to follow the simple instructions properly in
Stages 2 and 3, they still managed to consistently answer questions about themselves.




Time taken to complete survey
One common metric often used to determine data quality and respondent engagement has been
time spent in the survey. It has been said that spending too short a time or spending too much time
are both indicators of inattentiveness, resulting from speeding or distraction.

The time taken to complete the survey was similar across each stage and there were no significant
differences observed (Table 8). Note that this time was calculated based on time from frame to
frame so excluded the building of the character in Stage 3 (for fairer comparison).

Table 8. Time taken to complete survey by stage
 Stage          Time taken (HH:MM:SS)
 Stage 1        00:16:52
 Stage 2        00:15:24
 Stage 3        00:15:18
Data more than 2 standard deviations from the mean were removed before analysis.
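
The trimming rule noted under Table 8 can be sketched as follows. This is a minimal illustration rather than the script used in the study: the function name and the completion times are hypothetical, and we assume the rule means dropping values more than 2 sample standard deviations above or below the mean.

```python
from statistics import mean, stdev

def trim_outliers(durations_sec, n_sd=2.0):
    """Keep durations within n_sd sample standard deviations of the mean."""
    m = mean(durations_sec)
    sd = stdev(durations_sec)
    return [d for d in durations_sec if abs(d - m) <= n_sd * sd]

# Hypothetical completion times in seconds (not study data).
times = [900, 950, 1000, 1010, 980, 940, 970, 4000]
kept = trim_outliers(times)
print(len(times) - len(kept), "value(s) removed")
```

Because a single extreme value inflates both the mean and the standard deviation, a rule of this kind is fairly conservative in small samples.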

We expected that our participants would spend more time on surveys where interactive elements
were present, and even more time when gaming elements were activated. However, the results
show that there were no significant differences; if anything, very slightly (not significantly) less
time was spent where interactive and gaming elements were present.

Conclusions
In terms of inter-scale comparison, all 4 questionnaire scales (standard, star, smiley face and P&K)
presented in the traditional, ‘black and white’ survey format (Stage 1) performed similarly, in terms
of providing similar patterns in overall hedonic ratings for the fifteen concepts. It was observed that
the Standard scale offered a slight advantage, as it had marginally more discriminating power.
However, when the hedonic scores were averaged across all products and the individual scales
compared, there were no differences. This suggests that all scales performed equally and no scale
performed better in terms of discriminating power.

In the interactive survey (Stage 2), discriminating power of all scales appeared lower overall,
suggesting some level of interference and no single scale stood out from the rest. In the interactive-
gaming survey (Stage 3), discriminating power was on average on par with the traditional “black
and white” survey, with the P&K scale performing marginally better than the others.

The inter-stage comparison of each individual scale revealed that there were also no significant
differences in the performance of the individual scales between the stages. This suggests that the
interactive and gaming elements did not affect the scales. However, inter-stage effects were
observed in consistency checks. A significantly higher proportion of respondents in the traditional
survey (Stage 1) followed instructions correctly than in the interactive (Stage 2) and interactive-
gaming (Stage 3) surveys. When asked questions about themselves (i.e. number of siblings), most
children answered consistently at all stages. Perhaps children don’t like being told what to do when
they are playing?

Overall, all 4 question scales at all stages were seen as easy and fun to use. In terms of ease of
use, the Standard scale was considered as less easy to use when presented in a traditional and
interactive survey. With interactive-gaming elements, all the scales were seen as similarly easy to
use, suggesting that gaming elements made them somewhat easier. There was consensus that the
Standard scale was less fun to use, and this was observed across all stages.

The addition of interactive and gaming elements neither enhanced nor reduced the level of
enjoyment. Because enjoyment and ease of use scores were positive and high across all stages, a
few questions arose:
     • Did the children not want to admit that the task was difficult because they aspire to do things
       that older children can do easily?
     • Were these results affected by the tendency for children to acquiesce when asked if they had
       fun, even when they had not?
     • Was it that we thought some of the scales were more fun than others when, from a
       child’s perspective, they were not as much fun as we expected?

The standard scale offered slightly better discriminating power, but was not as easy and fun to use
compared to other scales. We would suggest that this scale could lead to boredom and be less
engaging when conducting research with children.

The P&K and star scales, designed originally for children, both performed well and similarly in
discriminating power. Both were considered as easy to use and fun. The P&K scale rated as slightly
easier to use in traditional format (Stage 1), whilst the Star scale was considered as slightly more
fun when used with interactive and gaming formats (Stages 2 and 3). Because both scales perform
equally well on one aspect or another, it would be important in moving forward that we test their
performance on other aspects, such as predicting real behaviour. Both of these scales remain more
or less suitable for research with children, as we have not yet found a better scale. With the star
scale, we would recommend that children should be reminded that the stars do not reflect right or
wrong answers, so giving lower scores to something does not mean they will not be rewarded and
vice versa. The P&K scale is not widely used in Australia and may be more suitable because the
language is more child-friendly. However, it should be noted that language differences between
American and Australian children could mean that this scale needs to be adapted to the vernacular
of Australian kids. Language may be another area to investigate in future.

The smiley face scale performed on par with the other scales in terms of discrimination power, and
was seen as significantly more fun and easy to use than the Standard scale. This scale, however,
has been criticised in sensory food literature because it is considered ambiguous and may lead
to misinterpretation due to its emotional element. Whilst some would say that the advantages of this
scale are unlikely to outweigh its uncertainties, there may be more merit to this scale than is
currently recognised. It is also possible that what some believe to be the shortcomings of the scale
are in fact its strengths. There is currently a convergence of thought surrounding the role of emotions
in decision making. Damasio (1994), in his book “Descartes’ Error: Emotion, Reason and the Human
Brain”, suggests that rationality stems from emotion, and that emotion stems from bodily senses. This
theory is now informing developments in biometric and neuroscience research.

Next Steps
This research prompts Direction First to consider the potential role of biometrics and emotions in
sensory food research in future. Traditional sensory research relies heavily on self-reported data for
measuring hedonics. However, because self-reported data is often obscured by experience and
conscious thought, it may not provide enough insight into true responses and behaviour. Biometrics
measures involuntary physiological responses such as heart rate, respiration patterns, perspiration
and body movements. Biometrics such as those used by Bryant (2009) and Zeinstra (2009) utilise
cutting edge technology to interpret what are considered involuntary, and therefore unobscured,
“real” and “true” measures of appeal, enjoyment, engagement, and attention. Currently, these
technologies are not widely available; however, they may become common measures in
emotion and sensory food research in the future. The ethics of the use of biometrics with children
will require industry discussion.

In light of this research, the general performance of the different scales, response consistency and
the suggestion that gaming elements did not significantly contribute to scale discrimination, we
suggest caution in moving scales strongly in this direction without consideration for the whole
research approach taken with children. Children played with us in their answers when we
established a more playful environment, and perhaps this is not what we want in research.




References
Balogh, M., 2002. Cracking the kids marketing code, B&T, 2002
[http://www.bandt.com.au/articles/03/0C00FC03.asp, accessed 22.01.10]

Brand, J., Borchard, J. and Holmes, K., 2009. Interactive Australia, 2009. National Research
prepared by Bond University for the Interactive Entertainment Association of Australia.

Bryant, J. A., Weinberg, L., Levine, B., Jacobs, D. and Massoudian, M., 2009. Inspiring Change:
Innovative Methods and Integrated Advertising. Online Research, Part 1, ESOMAR 2009.

Cape, P., 2009. Questionnaire Length, Fatigue Effects and Response Quality Revisited. Survey
Sampling International.

Chambers, E IV. 2005. Conducting Sensory Research with Children: A Commentary. J. Sensory
Studies. 20: 90-92.

Cooper, H., 2002. Designing successful diagnostic scales for children. Presented at Ann. Mtg.
Institute of Food Technologists, Anaheim, CA, June 15-19.

Covey, N., 2007. Connected Kids: Trends in Youth Gaming. ARF Youth Council, 21 August, 2007.
The Nielsen Company.

Cranmer, S. and Ulicsak, M., 2010. Gaming in Families, Final Report, Futurelab, United Kingdom.

C&R Research, 2009. YouthBeat, KidzBeat Magazine Winter.

Damasio, A., 1994. Descartes’ Error: Emotion, Reason and the Human Brain.

Fliegelman, A., Metx, P., and McIlrath, M., 2004. The ABC’s of Conducting Effective Market
Research with Kids. C&R Research. Published in Media Research Club of Chicago (MRCC), June
2004.

Franco, C., 2010. Popular Online Games: new insight from European Research, WARC

Geraci, J.C. 2004. What Do Youth Marketers Think About Selling to Kids? Harris Interactive.
Published in Media Research Club of Chicago (MRCC), June 2004.

Gladwell, M., 2001. The Tipping Point, Abacus, London, UK.

Guinard, J.X., 2001. Sensory and consumer testing with children. Trends in Food Science and
Technology, 11(8), 273–283.

Kroll, B. J., 1990. Evaluating rating scales for sensory testing with young children. Food
Technology, 44, 78–86.

Nairn, A., 2009. Protection or Participation? Getting research ethics right for children in the digital
age, ESOMAR Congress.

Lawless, H. T., Popper, R. and Kroll, B. J., 2010. A comparison of the labelled magnitude (LAM)
scale, an 11-point category scale and the traditional 9-point hedonic scale. Food Quality and
Preference 21 (2010): 4-12.

Popper, R., & Kroll, J. J., 2005. Issues and viewpoints conducting sensory research with children.
Journal of sensory studies, 20(1), 75–87. Also published in Food Technology, May 2003 Vol 57:5,
60-65.

Popper, R. and Kroll, J.J., 2003. Conducting Sensory Research with Children. Food Technology,
Vol. 57:5, 60-65.

Schraidt, M.F., 2009. Testing with Children: Getting Reliable Information from Kids. Peryam & Kroll
Research Corporation (http://www.pk-research.com/paper_15.html, accessed April, 2010)

Sleep, D. and Puleston, J., 2009. Leveraging interactive techniques to engage online respondents,
Engage Research and GMI Interactive.

Solomon, D. and Peters, J., 2005. Resolving Issues in children’s research. Young Consumers,
Quarter 4, World Advertising Research Center, 68-73.

Ubrick, B. (2002). Kids have great taste: An update to sensory work with children. Presented at
Ann. Mtg. Institute of Food Technologists, Anaheim, CA, June 15-19.

Zeinstra, G.G., Koelen, M.A., Colindres, D., Kok, F.J., de Graaf, C., 2009. Facial expressions in
school-aged children are a good indicator of ‘dislikes’, but not of ‘likes’. Food Quality and
Preference 20 (2009): 620–624.




Appendix

           1.1 Materials and Methods

The study was conducted in three separate stages. Each stage contained an independent sample
of respondents. The core materials and methods used were the same in each of the three stages.
The main contrasts across each of the stages were as follows:
     • The first experiment was designed to compare the four scales as they exist in their standard
       ‘black and white’ form.
     • The second experiment introduced the four scales in a graphically enhanced ‘interactive’
       format, with the scales providing light and sound feedback to respondents.
     • The third experiment repeated the four interactive scales with the introduction of an
       avatar-like character that respondents designed at the beginning of the survey and which then
       operated as a guide taking them through the survey. This stage also introduced a series of
       background images that the guide was placed within.

                      1.1.1   Samples

The samples were presented as conceptual text descriptions and images of common food
consumption items, flavours, and commercial-like products.

The common food consumption items included milk, honey, ice cream, bread and water. The
flavours included the taste of mint, chocolate, cinnamon, peanut butter, and lemon. The
commercial-like products were made-up concepts for a mix of sweet biscuit and savoury snack
products that were relatively similar to some existing market products.

This mix of different foods, flavours and products was used to ensure that the scales were tested
across the different levels of food consumption – from basic flavours, to common foodstuffs, to
commercial products. This was to test the scales in contexts relating not only to concept testing,
but also on aspects more likely to be presented in sensory testing applications. The products were
also selected to contain a mix of liked, neutral, and disliked flavours and products, and so represent
a wider hedonic range.




                      1.1.2    Measurement Instruments

The four scales were tested in each of the stages:
   1. 9 pt star scale
   2. 5 pt smiley face scale
   3. 9 pt hedonic scale
   4. 9 pt P&K scale

                      1.1.3    Procedure

Each participant rated their liking of the fifteen items on one scale, before moving onto the next
scale, until all four scales had been used to rate the items. The order of the scales was randomised
in a balanced block design across participants. Scale experience questions were presented at the
end of each scale, and ‘warm-up’ questions were used at the beginning of each new scale to
ensure respondents were aware that they had transitioned onto a new scale.
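
One way of producing a balanced set of scale orders is sketched below. The paper does not state which balanced block design was used, so the cyclic Latin square here is an assumption, and the function names are hypothetical. In this design each scale appears once in every position across the four orders; it does not balance which scale follows which.

```python
def latin_square_orders(scales):
    """Cyclic Latin square: each scale appears once in every position."""
    k = len(scales)
    return [[scales[(start + pos) % k] for pos in range(k)]
            for start in range(k)]

def assign_order(participant_index, orders):
    """Rotate participants through the orders so each is used equally often."""
    return orders[participant_index % len(orders)]

scales = ["Star", "Smiley Face", "Standard", "P&K"]
orders = latin_square_orders(scales)
for i in range(4):
    print(i, assign_order(i, orders))
```

In practice the assignment of participants to orders would also be randomised, rather than following arrival order as in this sketch.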

           1.2 Statistical Analysis

                      1.2.1    Making the samples from each stage comparable

To ensure samples from each of the stages of research were homogeneous in terms of age and
gender, an interlocking quota was used in each stage of the research with an even balance of age
and gender as follows:

Table 1.
           7 to 8    9 to 10
Male       25%       25%
Female     25%       25%

At the completion of surveying, Chi-squared testing indicated that there were significant differences
in the proportions across the stages (        ). The dataset was therefore weighted with each
individual stage being balanced towards the target quotas, with 25% obtained in each cell.
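
The weighting step can be sketched as simple cell weighting, where each respondent's weight is the target proportion for their age and gender cell divided by the observed proportion in that cell. This is an assumption about the method, since the weighting procedure is not described in detail; the function name and the cell counts below are hypothetical.

```python
def cell_weights(observed_counts, target_props):
    """Weight per cell = target proportion / observed proportion."""
    n = sum(observed_counts.values())
    return {cell: target_props[cell] / (count / n)
            for cell, count in observed_counts.items()}

# Hypothetical cell counts for one stage (not study data).
observed = {
    ("Male", "7 to 8"): 20,
    ("Male", "9 to 10"): 30,
    ("Female", "7 to 8"): 25,
    ("Female", "9 to 10"): 25,
}
target = {cell: 0.25 for cell in observed}  # 25% target in each cell
weights = cell_weights(observed, target)
for cell, w in weights.items():
    print(cell, round(w, 3))
```

Applied within each stage separately, weights of this kind bring every stage to the same 25% per cell profile while leaving the weighted sample size unchanged.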




                      1.2.2   Making the scales comparable

As three of the four scales had nine data points, the smiley face scale of five points was converted
to a 9-point scale with 1=1, 2=3, 3=5 and so on, as shown in the following table.

Table 2.
                    Scale Score
 9-point hedonic    1         2        3          4         5         6          7         8        9
 9-point P&K        1         2        3          4         5         6          7         8        9
 Star (9 point)     1         2        3          4         5         6          7         8        9
 Smiley (5 point)   1                  2                    3                    4                  5

All reported mean scores on the items tested are therefore on a 9 point scale.
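
The conversion in the table above amounts to the linear mapping 2 × score − 1. A minimal sketch follows; the function name is hypothetical.

```python
def smiley_to_nine_point(score):
    """Map a 5-point smiley score onto the 9-point range:
    1 -> 1, 2 -> 3, 3 -> 5, 4 -> 7, 5 -> 9."""
    if score not in (1, 2, 3, 4, 5):
        raise ValueError("smiley score must be an integer from 1 to 5")
    return 2 * score - 1

converted = [smiley_to_nine_point(s) for s in range(1, 6)]
print(converted)
```

Note that the converted smiley scores can only take odd values, so the scale still offers five response options even though its means are reported on a 9-point range.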

                      1.2.3   How the scales were compared

The performances of the scales were compared using several different approaches to cater for the
different hypotheses surrounding inter-scale and inter-stage differences.
The main areas of measurement were response consistency, respondent engagement, scale
discrimination power, and range of scale used.

To compare the performance of the various stages, response consistency was measured through a
series of question checks that were repeated at the beginning and end of the survey. These
involved indicating the number of brothers and sisters participants had, and clicking at selected
points on scales.

To measure respondent engagement, 5-point Likert scales were used to obtain feedback from
respondents on how easy each scale was to use and how much fun they had using it. The items
on the ease of use scale ranged from 1 ‘Hard’ to 5 ‘Easy’. The items on the fun scale ranged from
1 ‘No fun at all’ to 5 ‘Lots of fun’. Both scales had numerical values assigned to all points on the
scale, and so were treated as scale variables for the purpose of analysis.

The F-ratio from ANOVA represents the ratio of systematic to unsystematic variance, or signal to
noise. Consequently, it has been used as a measure of a scale’s ability to differentiate products
(Lawless, Popper, & Kroll, 2010). The number of differences between means in post hoc
comparisons is also a common measure of product differentiation. The product F-ratio from
ANOVA and the number of different means by Duncan’s multiple range test were therefore used
as measures of product discrimination. Where variances were unequal, non-parametric alternatives
were used.

Prior to scale comparisons based on F-Ratios, respondents who had failed any of the consistency
checks were removed from the data file. Because there were different age and gender sample
sizes in each of the stages, weighting was applied to make these consistent across stages.
The range of the scale used was calculated as the highest minus the lowest rating given across all
fifteen attributes, and then divided by the total scale range.
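
The range calculation can be sketched as below. We assume here that "total scale range" means the highest scale point minus the lowest (8 for a 9-point scale, 4 for the 5-point scale); the text does not specify this, and dividing by the number of scale points instead would give slightly lower figures. The function name and ratings are hypothetical.

```python
def proportion_of_scale_used(ratings, scale_min=1, scale_max=9):
    """(Highest rating - lowest rating) / (scale_max - scale_min)
    for one respondent's ratings on one scale."""
    return (max(ratings) - min(ratings)) / (scale_max - scale_min)

# Hypothetical ratings of the fifteen items by one respondent (not study data).
ratings = [2, 5, 8, 7, 3, 4, 6, 8, 2, 5, 7, 6, 4, 3, 8]
used = proportion_of_scale_used(ratings)
print(f"{used:.0%} of the scale used")
```

Averaging this proportion over respondents gives a per-scale figure of the kind reported in the main text.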

                      1.1.1   Participants

In the first experiment one hundred participants with children aged from seven to ten years were
recruited to a web survey. Parents of children completed a screening exercise, with children taking
over once the screener was complete to undertake the survey.

            1.2 Results – Stage 1

                      1.2.1   Respondent Engagement

In some cases Levene’s test indicated that the variances associated with the scales were not equal,
so a non-parametric F-test, Brown-Forsythe, was used. For paired comparison post hoc analysis in
such cases the Games-Howell test was used.

                                 1.2.1.1 How easy were the scales to use?

Brown-Forsythe revealed a significant difference among the scales, F(3,378)=6.4, p<.01.
Games-Howell paired comparison tests indicated that the Smiley scale was significantly easier to use than
the Standard 9pt and Star scales, but not significantly better than the P&K scale; while the P&K
scale was significantly easier to use than the Standard 9pt scale (p<.05 in all cases).

Table 3.
 Scale       Avg.
 Standard    4.5
 Star        4.6
 P&K         4.7
 Smiley      4.9




No significant differences were observed by age and gender.
Table 4.
How EASY was it to answer the questions about the flavours and foods on this scale?
                  Age & Gender
Scale             7-8 male   7-8 female   9-10 male   9-10 female
Stage 1 Star      4.5        4.6          4.7         4.5
Stage 1 Smiley    5.0        4.9          4.9         4.8
Stage 1 9pt       4.2        4.7          4.7         4.4
Stage 1 P&K       4.5        4.7          4.7         4.9

                                1.2.1.2 How much fun were the scales to use?

Brown-Forsythe did not reveal a significant difference among the scales, F(3,455)=2.2, p=.082. The
Welch F-Ratio approached significance, F(3,262)=2.4, p=.067. Games-Howell paired
comparison tests revealed a ‘trend’ that the Smiley scale was more fun than the Standard 9-point
scale (p=.06).
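
For readers unfamiliar with the Welch F-ratio referred to above, the following is a minimal standard-library sketch of Welch's one-way ANOVA, which does not assume equal variances. The function name and the ratings are hypothetical, and this is not presented as the software or procedure used in the study. The p-value is omitted because it requires an F distribution, which the standard library does not provide.

```python
from statistics import mean, variance


def welch_anova(groups):
    """Return (F, df1, df2) for Welch's one-way ANOVA.

    Each group needs at least two observations and a non-zero variance.
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    # Precision weights: group size divided by sample variance.
    weights = [n / variance(g) for n, g in zip(sizes, groups)]
    w_total = sum(weights)
    grand = sum(w * m for w, m in zip(weights, means)) / w_total
    numerator = sum(w * (m - grand) ** 2 for w, m in zip(weights, means)) / (k - 1)
    lam = sum((1 - w / w_total) ** 2 / (n - 1) for w, n in zip(weights, sizes))
    denominator = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * lam)  # adjusted denominator degrees of freedom
    return numerator / denominator, df1, df2


# Hypothetical fun ratings for three scales (not study data).
f_ratio, df1, df2 = welch_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

The adjusted second degrees of freedom is generally not an integer, which is why Welch results are often reported with different degrees of freedom from the standard F-test on the same data.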

Table 5.
 Scale      Avg.
 Standard   4.2
 Star       4.4
 Super      4.4
 Smiley     4.6

While no significant differences were observed in terms of ease of use, there was a
directional indication that younger males did not have as much fun on the 9-point Hedonic and P&K
scales as their older counterparts, as indicated by the Welch test, F(3,61)=2.6, p=.06.

Table 6.
How much FUN did you have answering the questions about the flavours and foods on this scale?
                  Age and Gender
Scale             7-8 male   7-8 female   9-10 male   9-10 female
Stage 1 Star      4.1        4.4          4.7         4.3
Stage 1 Smiley    4.5        4.6          4.6         4.6
Stage 1 9pt       3.9        4.3          4.6         4.1
Stage 1 P&K       4.0        4.4          4.7         4.4

                        1.2.2   Scale Discrimination Power

Repeated measures analysis of variance with Duncan’s tests was used to compare all possible
pairs of means on each scale. The overall hedonic ratings showed a very similar pattern across scales.

Figure 1. Stage 1 Means of the fifteen items tested across each scale




*Sorted in descending order of product liking

A slight advantage went to the Standard 9pt Hedonic scale in terms of identifying a greater number
of significant differences. Despite having fewer scale points, the Smiley scale showed a similar
level of performance to the other scales.

Table 7. F-Ratios and Significant Duncan tests
                            Stage 1
 Scale                      F-Ratio   No. of sig Duncan tests
 Star                       37.4      85 / 105
 Smiley                     41.6      85 / 105
 Standard                   41.0      87 / 105
 Super                      40.3      85 / 105

Df for stage 1 were (14,1784) for all F-Ratios. All F’s were significant at p<.001.
Duncan’s tests were performed on 105 possible pairs of means.
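
The figure of 105 follows from the number of unordered pairs that can be formed from fifteen items. A short check, with variable names chosen for illustration only:

```python
from itertools import combinations
from math import comb

N_ITEMS = 15  # fifteen items rated on each scale

# Every unordered pair of item means that a pairwise test would compare.
pairs = list(combinations(range(1, N_ITEMS + 1), 2))
n_pairs = comb(N_ITEMS, 2)

# Share of pairs found significantly different, e.g. 87 of 105 as in Table 7.
share_significant = 87 / n_pairs
```

Expressing the Duncan counts as a share of all pairs makes the scales easier to compare: 87 of 105 is roughly 83 percent, against roughly 81 percent for 85 of 105.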

Proportion of Scale used

The proportion of the scale used was treated as a mean score (from 0-1). No significant differences
were observed.

Table 8.
                    Scale                            Proportion of scale used
                    Stage 1 Star                     74%
                    Stage 1 Smiley                   77%
                    Stage 1 9pt                      76%
                    Stage 1 P&K                      72%

            1.1 Results – Stage 2

                       1.1.1   Respondent Engagement

                                    1.1.1.1 How easy were the scales to use?
Levene’s test indicated that the variances associated with the scales were unequal, so a
non-parametric F-test, Brown-Forsythe, was used. For paired comparison post hoc analysis the
Games-Howell test was used.
All of the scales were felt to be easy to use – all obtaining a mean score of over 4 out of 5.
Brown-Forsythe revealed a marginally significant effect, F(3,429)=2.6, p=.05, and post-hoc tests
revealed a directional difference suggesting the Smiley scale to be easier to use than the 9-point
Hedonic scale (p=.07).

Table 9.
 Scale       Avg.
 Standard    4.5
 Star        4.6
 Super       4.7
 Smiley      4.8



No significant differences were observed across age and gender.
Table 10.
How EASY was it to answer the questions about the flavours and foods on this scale?
                  Age & Gender
Scale             7-8 male   7-8 female   9-10 male   9-10 female
Stage 2 Star      4.5        4.5          4.9         4.5
Stage 2 Smiley    4.9        4.7          4.7         4.8
Stage 2 9pt       4.4        4.5          4.7         4.5
Stage 2 P&K       4.7        4.9          4.7         4.6

                                  1.1.1.2 How much fun were the scales to use?

All of the scales were fun for the respondents to use – all obtaining a mean score of over 4 out of 5.
A significant difference between the scales was found with Brown-Forsythe, F(3,464)=4.5, p<.01.
Post-hoc tests revealed the Star and Smiley scales to be significantly more fun to use than the
9-point Hedonic scale (p<.05).

Table 11.
 Scale      Avg.
 Standard   4.1
 Super      4.3
 Star       4.5
 Smiley     4.5

No significant differences were observed across age and gender.
Table 12.
How much FUN did you have answering the questions about the flavours and foods on this scale?
                  Age and Gender
Scale             7-8 male   7-8 female   9-10 male   9-10 female
Stage 2 Star      4.7        4.3          4.6         4.3
Stage 2 Smiley    4.4        4.5          4.6         4.6
Stage 2 9pt       4.1        4.0          4.3         4.1
Stage 2 P&K       4.2        4.1          4.5         4.4


                       1.1.2   Scale Discrimination Power

Repeated measures analysis of variance with Duncan’s tests was used to compare all possible
pairs of means on each scale. The overall hedonic ratings showed a very similar pattern across
scales. In stage 2, however, a pattern emerged whereby the P&K scale tended to record higher
scores than the other scales.

Figure 2. Stage 2 Means of the fifteen items tested across each scale




*Sorted in descending order of product liking

Table 13. F-Ratios and Significant Duncan tests
                            Stage 2
 Scale                      F-Ratio   No. of sig Duncan tests
 Star                       29.4      79 / 105
 Smiley                     30.6      79 / 105
 Standard                   30.4      79 / 105
 Super                      30.0      80 / 105
Df for stage 1 were (14,1784) for all F-Ratios. All F’s were significant at p<.001.
Duncan’s tests were performed on 105 possible pairs of means.

                      1.1.3   Proportion of Scale Used

The proportions of the scales used in Stage 2 were not significantly different, ranging between 73
and 78 percent.

Table 14.
                    Scale                            Proportion of scale used
                    Stage 2 Star                     77%
                    Stage 2 Smiley                   78%
                    Stage 2 9pt                      76%
                    Stage 2 P&K                      73%

            1.2 Results – Stage 3

                      1.2.1   Respondent Engagement

                                   1.2.1.1 How easy were the scales to use?

Levene’s test indicated that the variances associated with the scales were unequal, so a
non-parametric F-test, Brown-Forsythe, was used. For paired comparison post hoc analysis the
Games-Howell test was used.
All of the scales were felt to be very easy to use – all obtaining a mean score of 4.5 or more out of
5. No significant differences between the scales were recorded, nor were any differences observed
across age and gender.

Table 15.
 Scale       Avg.
 Standard    4.5
 Super       4.6
 Star        4.6
 Smiley      4.7




Table 16.
How EASY was it to answer the questions about the flavours and foods on this scale?
                  Age & Gender
Scale             7-8 male   7-8 female   9-10 male   9-10 female
Stage 3 Star      4.7        4.6          4.7         4.6
Stage 3 Smiley    4.6        4.7          4.8         4.7
Stage 3 9pt       4.5        4.6          4.3         4.5
Stage 3 P&K       4.6        4.7          4.4         4.7

                                  1.2.1.2 How much fun were the scales to use?

All of the scales were fun for the respondents to use – all obtaining a mean score of over 4 out of 5.
Levene’s test indicated that the variances associated with the scales were unequal, so a
non-parametric F-test, Brown-Forsythe, was used. For paired comparison post hoc analysis the
Games-Howell test was used.

Significant differences across the scales were identified, F(3,416)=6.6, p<.01. Post-hoc tests
revealed that the Smiley scale was significantly more fun to use than the 9-point Hedonic and P&K
scales, and that the Star scale was significantly more fun than the 9-point Hedonic scale (p<.05
in all cases).

Table 17.
 Scale      Avg.
 Standard   4.1
 Super      4.3
 Star       4.6
 Smiley     4.7




No significant differences were observed across age and gender.
Table 18.
How much FUN did you have answering the questions about the flavours and foods on this scale?
                  Age and Gender
Scale             7-8 male   7-8 female   9-10 male   9-10 female
Stage 3 Star      4.7        4.5          4.5         4.5
Stage 3 Smiley    4.7        4.7          4.6         4.6
Stage 3 9pt       4.2        4.3          4.0         4.0
Stage 3 P&K       4.4        4.4          4.1         4.2

                      1.2.2   Scale Discrimination Power

Repeated measures analysis of variance with Duncan’s tests was used to compare all possible
pairs of means on each scale. The overall hedonic ratings showed a very similar pattern across scales.
The pattern observed in stage 2 was repeated in stage 3, where the P&K scale tended to record
higher scores than the other scales. In this stage, the 9-point Hedonic scale also showed a
tendency toward higher ratings than the Star and Smiley scales.

Figure 3. Stage 3 Means of the fifteen items tested across each scale




*Sorted in descending order of product liking

Table 19. F-Ratios and Significant Duncan tests
                            Stage 3
 Scale                      F-Ratio   No. of sig Duncan tests
 Star                       44.0      82 / 105
 Smiley                     43.8      83 / 105
 Standard                   39.7      83 / 105
 Super                      37.3      85 / 105
Df for stage 1 were (14,1784) for all F-Ratios. All F’s were significant at p<.001.
Duncan’s tests were performed on 105 possible pairs of means.

                       1.2.3   Proportion of Scale Used

A significantly larger proportion of the Star and Smiley scales was used than of the P&K scale,
F(3,475)=4.3, p<.01.

Table 20.
                    Scale                            Proportion of scale used
                    Stage 3 Star                     74%
                    Stage 3 Smiley                   76%
                    Stage 3 9pt                      71%
                    Stage 3 P&K                      67%


            1.3 Results – Comparisons of Stages

                       1.3.1   Response Consistency Measures – Clicking specified numbers on a scale

Chi-squared testing indicated that there were significant differences in the proportions of
consistency errors across the stages (           ).
Follow up pairwise comparisons (p<.05) revealed that the first stage (containing standard scales)
contained a significantly higher proportion of respondents who made no errors compared to both
the second and third stages.
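
As a rough sketch of the kind of test described above, the Pearson chi-squared statistic for a table of counts can be computed as follows. The function name is hypothetical and the counts are invented for illustration; they are not the study's data and do not reproduce its test result. The p-value is omitted because it requires a chi-squared distribution, which the standard library does not provide.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic and degrees of freedom for a
    contingency table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical counts (not study data): rows are two stages,
# columns are "no errors" and "at least one error".
chi2, dof = chi_squared_statistic([[10, 20], [20, 10]])
```

The statistic grows as the observed counts depart from what would be expected if error rates were the same in every stage.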

Table 21. Consistency check I
 Consistency Checks                          Stage 1     Stage 2     Stage 3
                                             N=96        N=167       N=248
 Neither wrong                               94%         79%         81%
 Both wrong                                  3%          5%          10%
 First check wrong, second check right       3%          14%         9%
 First check right, second check wrong       0%          2%          0%
 Total                                       100%        100%        100%
Proportion of respondents making consistency errors
One in ten respondents got both of the consistency checks incorrect in the third stage, a
significantly higher proportion than in either the first or second stages (p<.05).

                      1.3.2   Consistency Measures – Number of brothers and sisters

The consistency measure that assessed the number of brothers and sisters a respondent had at
the beginning and then again at the end of the survey revealed very high levels of consistency.
Chi-squared testing indicated that there were no significant differences across the different stages
on this consistency measure (          ).

Table 22. Consistency check II
                       Stage 1     Stage 2     Stage 3
 No mismatch           97%         93%         95%
 1 mismatch            2%          7%          5%
 Both mismatch         1%          0%          0%

                      1.3.3   Time Taken

Time taken to complete the survey was similar across each stage, with no significant differences
observed. Data more than two standard deviations from the mean were removed before analysis.
Note: this time is calculated from frame to frame, so it excludes the building of the
character in stage 3 (for a fairer comparison).
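
The outlier rule described above could be sketched as below. The function name, the use of the sample standard deviation, and the completion times are all assumptions for illustration; the paper does not state which standard deviation was used.

```python
from statistics import mean, stdev


def trim_outliers(values, n_sd=2.0):
    """Keep values within n_sd sample standard deviations of the mean."""
    centre = mean(values)
    spread = stdev(values)  # sample standard deviation (an assumption)
    return [v for v in values if abs(v - centre) <= n_sd * spread]


# Hypothetical completion times in seconds (not study data).
times = [880, 900, 910, 925, 940, 950, 960, 975, 990, 3600]
trimmed = trim_outliers(times)
```

In this made-up example the single hour-long session is dropped, while the nine sessions of around fifteen minutes are retained.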




Table 23. Time taken to complete survey by stage
 Stage         Time Taken (HH:MM:SS)
 Stage 1       00:16:52
 Stage 2       00:15:24
 Stage 3       00:15:18




                                                         Page 42 of

Contenu connexe

Similaire à A child's job is to play, we should let them (Final Paper

Understanding How 'Screen Time' Affects Learning
Understanding How 'Screen Time' Affects Learning Understanding How 'Screen Time' Affects Learning
Understanding How 'Screen Time' Affects Learning Lisa Guernsey
 
Naplan Pp
Naplan PpNaplan Pp
Naplan PpCourt22
 
ECE 205 CHILD DEVELOPMENT WEEK ONE INSTRUCTOR GUIDANCE .docx
 ECE 205 CHILD DEVELOPMENT WEEK ONE INSTRUCTOR GUIDANCE .docx ECE 205 CHILD DEVELOPMENT WEEK ONE INSTRUCTOR GUIDANCE .docx
ECE 205 CHILD DEVELOPMENT WEEK ONE INSTRUCTOR GUIDANCE .docxaryan532920
 
Learning Through Play: Position Paper
Learning Through Play: Position PaperLearning Through Play: Position Paper
Learning Through Play: Position PaperChristina Sookdeo
 
Integrating Toys into Children's wear
Integrating Toys into Children's wearIntegrating Toys into Children's wear
Integrating Toys into Children's wearLaura Lepre
 
Young children's deep processing of endorsement marketing
Young children's deep processing of endorsement marketingYoung children's deep processing of endorsement marketing
Young children's deep processing of endorsement marketingtim smits
 
Role Of Play In Overly Academic Kindergarten Naeyc 2010
Role Of Play In Overly Academic Kindergarten Naeyc 2010Role Of Play In Overly Academic Kindergarten Naeyc 2010
Role Of Play In Overly Academic Kindergarten Naeyc 2010gesellinstitute
 
Simulations For Teaching Social Interaction[1]
Simulations For Teaching Social Interaction[1]Simulations For Teaching Social Interaction[1]
Simulations For Teaching Social Interaction[1]waywilldo
 
Ethics case studies
Ethics case studiesEthics case studies
Ethics case studiesCarla Piper
 
Perbedaan Kritik Sastra Dan Essay Sastra
Perbedaan Kritik Sastra Dan Essay SastraPerbedaan Kritik Sastra Dan Essay Sastra
Perbedaan Kritik Sastra Dan Essay SastraHeather Lopez
 
The Best baby gift for all parents to make their kids early perfect learners
The Best baby gift for all parents to make their kids early perfect learnersThe Best baby gift for all parents to make their kids early perfect learners
The Best baby gift for all parents to make their kids early perfect learnersSuperDadi
 
Assessing of children article
Assessing of children articleAssessing of children article
Assessing of children articleAahil Malik
 
Talkingtoyour childrengrades
Talkingtoyour childrengradesTalkingtoyour childrengrades
Talkingtoyour childrengradesIS Manila
 
Early Sensitivity to Language Context in a Trilingual Toddler
Early Sensitivity to Language Context in a Trilingual ToddlerEarly Sensitivity to Language Context in a Trilingual Toddler
Early Sensitivity to Language Context in a Trilingual ToddlerJames Lee
 

Similaire à A child's job is to play, we should let them (Final Paper (20)

Understanding How 'Screen Time' Affects Learning
Understanding How 'Screen Time' Affects Learning Understanding How 'Screen Time' Affects Learning
Understanding How 'Screen Time' Affects Learning
 
Naplan Pp
Naplan PpNaplan Pp
Naplan Pp
 
Kids, parents, toys & gender
Kids, parents, toys & genderKids, parents, toys & gender
Kids, parents, toys & gender
 
ECE 205 CHILD DEVELOPMENT WEEK ONE INSTRUCTOR GUIDANCE .docx
 ECE 205 CHILD DEVELOPMENT WEEK ONE INSTRUCTOR GUIDANCE .docx ECE 205 CHILD DEVELOPMENT WEEK ONE INSTRUCTOR GUIDANCE .docx
ECE 205 CHILD DEVELOPMENT WEEK ONE INSTRUCTOR GUIDANCE .docx
 
Learning Through Play: Position Paper
Learning Through Play: Position PaperLearning Through Play: Position Paper
Learning Through Play: Position Paper
 
Integrating Toys into Children's wear
Integrating Toys into Children's wearIntegrating Toys into Children's wear
Integrating Toys into Children's wear
 
Young children's deep processing of endorsement marketing
Young children's deep processing of endorsement marketingYoung children's deep processing of endorsement marketing
Young children's deep processing of endorsement marketing
 
Role Of Play In Overly Academic Kindergarten Naeyc 2010
Role Of Play In Overly Academic Kindergarten Naeyc 2010Role Of Play In Overly Academic Kindergarten Naeyc 2010
Role Of Play In Overly Academic Kindergarten Naeyc 2010
 
Simulations For Teaching Social Interaction[1]
Simulations For Teaching Social Interaction[1]Simulations For Teaching Social Interaction[1]
Simulations For Teaching Social Interaction[1]
 
Pecha kucha 2
Pecha kucha 2Pecha kucha 2
Pecha kucha 2
 
CCHD_2007
CCHD_2007CCHD_2007
CCHD_2007
 
Ethics case studies
Ethics case studiesEthics case studies
Ethics case studies
 
Perbedaan Kritik Sastra Dan Essay Sastra
Perbedaan Kritik Sastra Dan Essay SastraPerbedaan Kritik Sastra Dan Essay Sastra
Perbedaan Kritik Sastra Dan Essay Sastra
 
The Best baby gift for all parents to make their kids early perfect learners
The Best baby gift for all parents to make their kids early perfect learnersThe Best baby gift for all parents to make their kids early perfect learners
The Best baby gift for all parents to make their kids early perfect learners
 
Assessing of children article
Assessing of children articleAssessing of children article
Assessing of children article
 
Talkingtoyour childrengrades
Talkingtoyour childrengradesTalkingtoyour childrengrades
Talkingtoyour childrengrades
 
Essay On Children
Essay On ChildrenEssay On Children
Essay On Children
 
Synops Final
Synops FinalSynops Final
Synops Final
 
Early Sensitivity to Language Context in a Trilingual Toddler
Early Sensitivity to Language Context in a Trilingual ToddlerEarly Sensitivity to Language Context in a Trilingual Toddler
Early Sensitivity to Language Context in a Trilingual Toddler
 
Review lecture 12 chapter 12
Review lecture 12   chapter 12Review lecture 12   chapter 12
Review lecture 12 chapter 12
 

Plus de DirectionFirst

AMSRS 2011 Highly Commended Paper by Erica van Lieven- The Caterpillar Become...
AMSRS 2011 Highly Commended Paper by Erica van Lieven- The Caterpillar Become...AMSRS 2011 Highly Commended Paper by Erica van Lieven- The Caterpillar Become...
AMSRS 2011 Highly Commended Paper by Erica van Lieven- The Caterpillar Become...DirectionFirst
 
Second Life - Highly Commended Paper AMSRS Conference 2009
Second Life - Highly Commended Paper AMSRS Conference 2009Second Life - Highly Commended Paper AMSRS Conference 2009
Second Life - Highly Commended Paper AMSRS Conference 2009DirectionFirst
 
Cnw preso v7 no video no animation slide share
Cnw preso v7 no video no animation slide shareCnw preso v7 no video no animation slide share
Cnw preso v7 no video no animation slide shareDirectionFirst
 
Research with Children
Research with ChildrenResearch with Children
Research with ChildrenDirectionFirst
 
New Virtual MR Festival - Semantic Web 3.0 preso rethought (2010)
New Virtual MR Festival - Semantic Web 3.0 preso rethought (2010)New Virtual MR Festival - Semantic Web 3.0 preso rethought (2010)
New Virtual MR Festival - Semantic Web 3.0 preso rethought (2010)DirectionFirst
 
Semantic web 3.0 paper (2009)
Semantic web 3.0 paper (2009)Semantic web 3.0 paper (2009)
Semantic web 3.0 paper (2009)DirectionFirst
 
Young Research Group Preso - When MR meets Glocal Trend (2011)
Young Research Group Preso - When MR meets Glocal Trend (2011)Young Research Group Preso - When MR meets Glocal Trend (2011)
Young Research Group Preso - When MR meets Glocal Trend (2011)DirectionFirst
 
A child’s job is to play, we should let them pamela wong direction first 09.0...
A child’s job is to play, we should let them pamela wong direction first 09.0...A child’s job is to play, we should let them pamela wong direction first 09.0...
A child’s job is to play, we should let them pamela wong direction first 09.0...DirectionFirst
 

Plus de DirectionFirst (9)

AMSRS 2011 Highly Commended Paper by Erica van Lieven- The Caterpillar Become...
AMSRS 2011 Highly Commended Paper by Erica van Lieven- The Caterpillar Become...AMSRS 2011 Highly Commended Paper by Erica van Lieven- The Caterpillar Become...
AMSRS 2011 Highly Commended Paper by Erica van Lieven- The Caterpillar Become...
 
Second Life - Highly Commended Paper AMSRS Conference 2009
Second Life - Highly Commended Paper AMSRS Conference 2009Second Life - Highly Commended Paper AMSRS Conference 2009
Second Life - Highly Commended Paper AMSRS Conference 2009
 
Cnw preso v7 no video no animation slide share
Cnw preso v7 no video no animation slide shareCnw preso v7 no video no animation slide share
Cnw preso v7 no video no animation slide share
 
Trends in Brief 2011
Trends in Brief 2011Trends in Brief 2011
Trends in Brief 2011
 
Research with Children
Research with ChildrenResearch with Children
Research with Children
 
New Virtual MR Festival - Semantic Web 3.0 preso rethought (2010)
New Virtual MR Festival - Semantic Web 3.0 preso rethought (2010)New Virtual MR Festival - Semantic Web 3.0 preso rethought (2010)
New Virtual MR Festival - Semantic Web 3.0 preso rethought (2010)
 
Semantic web 3.0 paper (2009)
Semantic web 3.0 paper (2009)Semantic web 3.0 paper (2009)
Semantic web 3.0 paper (2009)
 
Young Research Group Preso - When MR meets Glocal Trend (2011)
Young Research Group Preso - When MR meets Glocal Trend (2011)Young Research Group Preso - When MR meets Glocal Trend (2011)
Young Research Group Preso - When MR meets Glocal Trend (2011)
 
A child’s job is to play, we should let them pamela wong direction first 09.0...
A child’s job is to play, we should let them pamela wong direction first 09.0...A child’s job is to play, we should let them pamela wong direction first 09.0...
A child’s job is to play, we should let them pamela wong direction first 09.0...
 

A child's job is to play, we should let them (Final Paper

  • 1. A CHILD’S JOB IS TO PLAY, WE SHOULD LET THEM... Pamela Wong Research Manager, Direction First Page 1 of 1
  • 2. A child’s job is to play, we should let them... Pamela Wong, Direction First Introduction There appears to be very little consensus and a shortage of research investigating effective research approaches and question types with children. Direction First has put the standard approaches to the test, along with using the latest technology from GMI to evaluate more modern approaches. We wanted to know which questionnaire scales gave better discrimination and to determine if the use of interactive and gaming scales would improve data quality by improving engagement. Direction First has undertaken original research challenging the traditional approach of questioning children by creating audio and visually interactive game based techniques designed to answer ‘traditional’ objectives. Today’s children live in a digital world and we wanted to test if online gaming methodologies maintained attention better and led to better quality data. There have been few studies that compare and measure the discrimination and engagement of different question types and methods. In this research we explored different question types and scales to understand which types enabled better discrimination, and ultimately, which question types were more engaging and provided better quality data. We compared different question scales on over 500 Australian children between 7 and 10 years old in an online survey. The research was conducted in three different stages. Each stage contained an independent sample of participants. Children in each stage rated their liking of the same fifteen items on one scale before moving onto the next scale until all four scales had been used. Sensory Food Research on Children Globally, the children’s market is estimated to be valued at $USD1.3 trillion (Nairn, 2010). 
Children have much more autonomy and influence over household purchases than previous generations, to such an extent that today’s youth are more likely to be described as consumers rather than as children (Geraci, 2004). The growth in the consumption power of children as consumers and influencers of family purchases, including household groceries, has been recognised as substantial business, and this has similarly led to growth in spending to find out what children want, why and how to best market to them. The children's market is a notoriously challenging market to research. Whilst children are being exposed to significantly more information and technology at a younger age, they still tend to have limited linguistic and numeracy skills, cognitive abilities and short attention spans. Because of this, they may be able to participate and respond to research in more limited ways unless techniques are adapted. For this reason, there are specialty companies and departments dedicated to conducting research with children. Page 1 of
  • 3. A child’s job is to play, we should let them... Pamela Wong, Direction First In food sensory research literature, it has been found that children have difficulty with understanding and remembering instructions, interpreting abstract symbols or pictures, and completing tasks such as seriation (ranking in order of magnitude) and attending to multiple aspects, for example, texture and flavour of a food (Popper and Kroll, 2005, 2003). Younger children tend to focus on a single aspect of a product, without attending to other aspects (Fliegelman et al, 2004). Children develop linguistic, literacy and numeracy skills at different rates, and there is such tremendous variation in such skills among children of the same age (variations up to 4 years) that some researchers believe school grades may be better determinants of skills/abilities among children than age alone (C&R Research, 2009). The changing vernacular of children from each generation is of particular importance to researchers, as it affects the language with which we communicate with children. Whilst language needs to be familiar, child friendly and suitable to the age group, children often aspire to be older and look up to children who are older than them, so it is important to keep things simple enough to understand and be familiar, they must not feel that everything has been dumbed down for them. This also applies to themes and imagery. When asking children questions, there is a tendency to respond positively to questions about whether they like something for different reasons, that is, they are more likely to respond with positive descriptors than negative (Geraci, 2004). Children tend to rate new products and ideas positively because they are excited about novelty and not necessarily because they really like the products. 
C&R Research addressed this issue by designing an unbalanced scale that made most responses sound positive, such as a five point scale labelled as, “love it”, “like a lot”, like a little”, it’s ok”, and “don’t like at all”. This aimed to enable children to distinguish products that they really loved and those that were just interesting because they were new. Winning concepts were believed to have clearly surfaced (Fliegelman et al, 2004). If children don’t like an idea or product because it’s novel, then familiarity may also be a factor that falsely drives liking. Introducing unfamiliar foods to kids several times has been found to enhance liking of the product due to the “mere exposure effect” (Birch and Marlin, 1982). This has implications for researchers and companies introducing new products to market. Most sensory protocols expose a child only once to a novel food in small portions, however, Ubrick (2002) proposes that new foods may require repeated testing to assess the true potential of a product. Popper and Kroll (2005) have emphasised the importance of considering cognitive and social factors that affect sensory food testing with children. Food preferences are influenced by the interplay of nature (e.g. innate preference for sweet tastes, aversion to bitter tastes) and nurture (e.g. parents, peers, and the environment). Peer influences can also have long lasting effects on children’s food preferences. Children’s food choices may be affected by their desire to exercise control of themselves and to be viewed as older and more mature. Changing societal influences Page 2 of
  • 4. have led to children maturing earlier, which has resulted in increases in cognitive demands and in the processing skills needed to meet these demands (Chambers, 2005). Today, technology continues to create generations of child consumers who are exposed to more products, ideas and technology than previous generations. Not only are children growing up with more media and entertainment options to choose from, but more media is being targeted directly at them than in previous generations. Multi-tasking while using various forms of technology (e.g. surfing the internet while watching TV) is enjoyed by most children. This lends support to our belief that children may be more capable of completing sophisticated questionnaires than we originally thought.

Questionnaire scales
Researching children requires procedures different from those routinely applied to adults, including attention to psychological factors such as gaining confidence and trust and providing motivation, communicating in child-appropriate language and using appropriate questionnaire scales (Schraidt, 2009). Specialized research methods, adaptations and techniques have been developed by various firms conducting research on children. One such firm is the Peyram & Kroll Research Corporation, which has published the bulk of sensory food research on children and conducts a specialty practice in this field. The P&K Corporation believes that there is a consensus among the research community that children (as young as 5 years old) can discriminate, particularly in regard to expressing their degree of liking, which means they are able to indicate a degree of preference if the correct measuring techniques are used (Schraidt, 2009). There is little consensus in the literature, however, on which are the most effective techniques, question types and scales when conducting research on children. Hedonic scales for food acceptance have been used widely for consumer testing.
In Australia, different agencies are using very different question types and scales for children, recognising that children require special questioning techniques. Questionnaire scales used on children include face scales, star scales, line scales and normal descriptor type scales, amongst others (Figures 1-4).

Figure 1. Standard 9 point hedonic scale for adults

1                2   3   4   5                          6   7   8   9
Like extremely               Neither like nor dislike               Dislike extremely
  • 5. Figure 2. Facial scale for children

Figure 3. P&K scale for children

1  Super good
2  Really good
3  Good
4  Just a little good
5  Maybe good or maybe bad
6  Just a little bad
7  Bad
8  Really bad
9  Super bad

Figure 4. Star scales for children (rows of stars anchored from “Dislike a lot” to “Like a lot”)

Facial scales (Figure 2), which were designed to inspire closer attention to the scaling task, have continued to be popular based on the rationale that children have limited reading and linguistic skills and cannot understand complex words or phrases. Whilst this scale continues to be used by some for conducting sensory research, it has been found to be less discriminating than other verbal scales and may introduce unintended bias. Children tend to respond to pictures based on the emotion that they show (a smiley face shows a happy person) rather than what they are supposed to represent (how the food makes you feel). Pictorial facial hedonic scales have been said to be ambiguous, as the face that is intended to show a degree of dislike can be interpreted by children as feeling angry, which is an emotion not usually experienced when thinking about food (Popper and Kroll, 2003; Cooper, 2002).
  • 6. The P&K scale (Figure 3) is a child oriented scale developed by Peyram and Kroll for use with children who are semi-literate (Popper and Kroll, 2005). This scale was reported to perform better than the standard hedonic scales and the smiley face scale. Whilst there are many merits to the application of the face scale, Kroll (1990) found that it was less effective and less discriminating than hedonic ratings on the P&K scale.

No references were found in the literature on the star scale (Figure 4), but several specialists in food sensory research on children have recommended this scale above other scales, and it has been used by sensory research firms in Australia for many years. It has been said that children understand the star scale easily, as the stars resemble the grades or rewards that they are awarded for good work at school. However, it is important when using any scale to emphasise that there are no right or wrong answers, to help children answer truthfully (Fliegelman et al., 2004).

Other researchers believe that because children cannot distinguish shades of meaning, asking any type of rating question on a scale is not useful, as they do not understand it. Simplified, finite scales such as “like it”, “it’s ok” or “don’t like it” have been recommended for younger children (Fliegelman et al, 2004). Pair-wise questionnaire approaches, where children choose their favourite between 2 options, were reported as effective among very young children (Fliegelman et al, 2004). On a similar basis, a bifurcated approach, where children were first asked if a food was “good” or “bad” before being asked if it was “really good” or “really bad”, was found to be effective for children under 7 years old (Kroll, 1990).
Kroll (1990) conducted a comprehensive study on children to compare various sensory questionnaire scales, scale lengths and the effectiveness of self-administered versus one-on-one interviews. In this study, the relative merits of the different rating scales that can be used in testing children were assessed. A standard hedonic scale, a face scale, a child-oriented scale (P&K) and paired comparison were used with children between 5 and 10 years. Findings showed that the P&K scale performed better than the standard hedonic or face scale in terms of discrimination. The use of a shorter scale, under the hypothesis that it would offer simplicity (7 points as opposed to 9 points) was not found to offer any advantages among children. The 9 point scale resulted in better discrimination and produced more reliable results than the 7 point scale. In one-on-one interviews, it has been hypothesised that children may respond positively to acquiesce, which provides a plausible reason for using self-administered questionnaires when possible. Children over 8 years old performed as well in self administered questionnaires as one-on-one interviews. Sensory researchers agree that children are different to adults and require tailored research approaches. Guinard (2001) reported differences found in sensory intensity (strength) thresholds in adults and children, however, these differences in perception may be more reflective of the Page 5 of
  • 7. A child’s job is to play, we should let them... Pamela Wong, Direction First differences in how children interpret questions and how they use intensity scales, rather than true physiological differences. This provides further support to the need to conduct more research in this area. Respondent engagement Respondent engagement in online research has been discussed extensively throughout the research industry. Common metrics of engagement include completion rates, survey time spent, verbosity of open ended responses, consistency checks, fatigue and satisficing (doing just enough to complete a task) measures, and the ability of participants to follow instructions accurately. These measures have been said to be indicators of engagement, which ultimately determine completion rates, enjoyment and data quality. SSI research revealed that on average, survey response rates in the UK, France and the Netherlands collapsed dramatically from 30% in 2004 to 10% in 2009. Research was conducted to understand the effects of survey length, fatigue and subsequent effects on response quality (Cape, 2009). Fatigue or satisficing behaviour was hypothesised as indicators of participant’s lack of engagement, so researchers used various measures to investigate reasons for changes in survey behaviour since 2004. By positioning non-mandatory question scales, SSI measured rates of non- response. Data on drop-out rates, survey time spent, rates of satisficing, numbers of words typed in open ended questions, and rates of answering falsely (in order to skip a section) were used as metrics to explain survey behaviour, and measures of data quality. The research indicated that there was a critical limit of 20 minutes for surveys, after which engagement and data quality dropped. Sleep and Puleston of Engage Research and GMI (2009) examined causes of boredom in online surveys. 
Various techniques were tested with the aim of improving data quality, including the use of visuals/animations, use of alternatives to grid questions, role playing, survey energisers and improving language, amongst others. Data quality measures were examined including straight- lining, responses to open ended questions and the ability to follow instructions accurately. Techniques applied resulted in a successful reduction in drop-out rates, increased time spent and supply of higher volumes of data (open ended responses, follow on questions) and better quality data. A substantial volume of research on improving engagement has been conducted on panels of adult respondents, who it seems are becoming bored with online surveys. This is a trend seen globally. So it seems reasonable to believe that for children who have much shorter attention spans, and more limited cognitive abilities, that traditional “black and white” form surveys and research question scales for adults are not likely to be highly engaging. Page 6 of
  • 8. A child’s job is to play, we should let them... Pamela Wong, Direction First Today’s youth are becoming more technologically savvy at a much younger age. In countries where all choices of media are available, children use between 4 and 6 media a day (e.g. TV, radio, internet and books), and often simultaneously (Solomon and Peters 2005). It is believed that the ability to follow several topics more or less simultaneously with attention switching from one medium to another demands quite an advanced level of cognitive and memory coordination. While there is a consensus that children can provide valuable information for marketers, there is little consensus on the extent to which survey design needs to be simplified to minimise confusion and capture accurate information. The hypothesis is that children need simplicity, however many researchers have found evidence contrary to this belief. Connecting with the most inter-connected generation of youth is not an easy task. In Australia, access to media is ubiquitous and over 90% of children aged between 7 and 10 years, spend between 30 to 60 minutes a day, surfing the internet and using various types of media, often simultaneously (Direction First online survey, June 2010). This level of multi-tasking by children means that marketing messages need to be interesting and compelling, and this also applies to market research on children. Australia has been described as a “Game Nation” and playing video and computer games (e.g. Figures 5-6) has become as popular as the internet and television. Whilst playing video games does not compete for time spent in non-media activities, it competes with use of older media, and is increasingly becoming a more social activity (Brand et al, 2009). 
The enormous popularity of games and the high proportion of young gamers under 10 years old give us reason to believe that research on young digital natives needs to be conducted in ways that capture their attention and are more enjoyable, interactive, immersive and engaging.
  • 9. Figure 5. A single player computer game of the past: Nintendo Tetris

Figure 6. A current massively multi-player online role playing game (MMORPG): Nintendo Wii The Legend of Zelda
  • 10. One of the most successful television based educational-entertainment programs was Sesame Street, which first aired in 1969 after substantial academic scrutiny. The creators turned what was considered a low involvement, non-educational and non-interactive medium into an enormously successful teaching tool (Gladwell, 2001). Inspiration was drawn from educational psychology, television commercials and comedy sketches to improve numeracy and literacy skills among preschoolers, and the program was proven to improve viewers’ reading and learning skills. Much in the same way that “edutainment” derived its parentage from educational psychology, advertising and entertainment to capture children’s attention and teach during play time, researchers can draw on such techniques to make research more appropriate, fun and engaging for children and adults, whilst collecting better quality data.

Background
In June 2010, Direction First conducted an online study to investigate which question scales work best on children, and to determine whether interactive and gaming elements improved engagement. The main objectives of the research were to:

- Test a standard hedonic questionnaire scale against scales designed for children to see which gave better discriminating power.
- Determine if the use of interactive elements, or a combination of interactive and gaming elements, would improve data quality by improving engagement.
- Determine which of the scales and questionnaire formats was the most engaging, enjoyable and fun.

Over 500 Australian children aged between 7 and 10 years were invited to participate in the online study. The research was conducted in three stages, each comprising an independent sample of participants.
Children in each stage rated their liking of the same fifteen items on one scale before moving on to the next scale, until all four scales had been used. The order of the scales was randomised in a balanced block design to avoid positional bias. Parents first completed a screening exercise, with children taking over to undertake the survey once the screener was complete.

‘Warm-up’ questions were asked at the beginning of each new scale to ensure respondents were aware that they had progressed to a new scale. Scale experience questions were presented at the end of each scale to find out how much children enjoyed the experience and how easy it was for them. Consistency check questions were asked at the beginning and end of the survey to determine whether respondents were engaged and attentive.
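The paper does not specify how the balanced block design for scale order was constructed. As a minimal sketch, one simple way to balance position is a cyclic Latin square, in which each of the four scales appears exactly once in each position. The function names and the rotation rule for assigning respondents are assumptions for illustration, not the procedure used in the study.

```python
# Illustrative sketch only: the study's actual design is not described in detail.
SCALES = ["Standard 9pt", "Smiley face 5pt", "P&K 9pt", "Star 9pt"]


def cyclic_latin_square(items):
    """Return len(items) orders in which every item occupies every position once."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


def assign_order(respondent_index, orders):
    """Rotate through the orders so that each order is used equally often (assumed rule)."""
    return orders[respondent_index % len(orders)]


if __name__ == "__main__":
    orders = cyclic_latin_square(SCALES)
    for respondent in range(8):
        print(respondent, assign_order(respondent, orders))
```

A cyclic square balances position only; it does not balance which scale immediately precedes which, so carry-over effects would need a different design.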
  • 11. Fifteen concepts were selected for the research and included conceptual text descriptions and images of unbranded common food consumption items, flavours, and unbranded commercial-like products. Common food consumption images included milk, honey, ice cream, bread and water. Flavours presented as words included mint, chocolate, cinnamon, peanut butter, and lemon. The unbranded commercial-like concepts included images of sweet biscuit and savoury snack products that were relatively similar to existing market products. Concepts were selected so that the range contained a mix of liked, neutral, and disliked flavours and products to represent a wide hedonic range. The concepts researched create a context for conducting concept testing as well as addressing other aspects more likely to be presented in food sensory testing applications.

The 3 stages were as follows:

- Stage 1: Traditional. N=96.
- Stage 2: Interactive. N=167.
- Stage 3: Interactive and gaming. N=248.

The 4 question scales tested in each of the 3 stages were:

- 9 pt standard hedonic scale
- 5 pt smiley face scale
- 9 pt P&K scale
- 9 pt star scale

Scales read left to right from negative to positive in all surveys. Whilst some researchers use some of the scales the other way around, we decided to keep the direction consistent with our current questionnaire scales to avoid confusion.

Traditional (Stage 1)
The first stage of the research was designed to compare and put to the test 4 different scales in their traditional, ‘black and white’ format. A sample of 100 children evaluated concepts and flavours by answering questions that appeared as they usually would on paper questionnaires. Essentially, this was placing a paper questionnaire in an online survey (Figures 7-10).
  • 12. Figure 7. Stage 1 - Standard 9pt hedonic scale

Figure 8. Stage 1 - 5pt Smiley face scale

Figure 9. Stage 1 - 9pt P&K scale

Figure 10. Stage 1 - 9pt Star scale (rows of stars anchored from “Dislike a lot” to “Like a lot”)

Interactive (Stage 2)
The second stage introduced the four scales in a graphically enhanced, interactive format, with sliders and audio-visual scales. The interactive scales were designed by Direction First using Flash technology on GMI’s platform (Figures 11-14).
  • 13. Figure 11. Stage 2 - Standard 9pt hedonic scale

Figure 12. Stage 2 - 5pt Smiley face scale

Figure 13. Stage 2 - 9pt P&K scale

Figure 14. Stage 2 - 9pt Star scale
  • 14. Gaming and interactive (Stage 3)
The third stage repeated the four interactive scales used in Stage 2. Drawing inspiration from the latest online video games, Direction First designed an avatar-like character that participants were asked to choose and dress at the beginning of the survey (Figure 15). The character continued through the survey journey with the participant, in the same way that popular role playing video games are played today. This third stage also introduced a series of backgrounds inspired by popular video games (Figure 16).

Figure 15. Dressing your character
  • 15. Figure 16. Character in survey

1. Comparison of scales
To ensure that the scales were comparable, we converted the 5 point smiley face scale to a 9 point scale. A 5 point facial scale was used rather than a 9 point one because 9 point facial scales are not commonly used and, after reviewing a 9 point facial scale, we found the differences in expressions too subtle and somewhat confusing. All mean scores were reported on a 9 point scale (Table 1).

Table 1. Comparison of scales

Scale              Score
9-point Standard   1 2 3 4 5 6 7 8 9
9-point P&K        1 2 3 4 5 6 7 8 9
9 point Star       1 2 3 4 5 6 7 8 9
5 point Smiley     1 2 3 4 5

2. Comparison of stages
To ensure that the samples from each of the 3 stages were homogeneous and comparable, interlocking quotas were used at each stage of the research to obtain an even gender and age balance. Because there were significant differences in age and gender proportions between the stages, the dataset was weighted, with each individual stage balanced towards the target quotas of 25% in each cell (Table 2).
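The two data preparation steps described above, rescaling the 5 point smiley scores and weighting each stage to equal age-by-gender cells, can be sketched as follows. The paper does not state the conversion formula, so the linear mapping (1, 2, 3, 4, 5 to 1, 3, 5, 7, 9) is an assumption, as are the function names. The cell counts in the usage example are hypothetical and are not the study's data.

```python
# Illustrative sketch only; mapping and counts are assumptions, not the study's method.

def rescale_5_to_9(score):
    """Map a 5 point score linearly onto the 9 point range: 1->1, 2->3, 3->5, 4->7, 5->9."""
    if score not in (1, 2, 3, 4, 5):
        raise ValueError("score must be an integer from 1 to 5")
    return 1 + 2 * (score - 1)


def cell_weights(cell_counts, target_share=0.25):
    """Return one weight per cell so that each cell carries target_share of the stage."""
    total = sum(cell_counts.values())
    weights = {}
    for cell, n in cell_counts.items():
        if n == 0:
            raise ValueError(f"cell {cell!r} is empty and cannot be weighted")
        weights[cell] = target_share * total / n
    return weights


if __name__ == "__main__":
    # Hypothetical unweighted counts for one stage (sums to 96).
    counts = {"Male 7-8": 30, "Male 9-10": 20, "Female 7-8": 26, "Female 9-10": 20}
    for cell, w in cell_weights(counts).items():
        print(cell, round(w, 3))
    print([rescale_5_to_9(s) for s in range(1, 6)])
```

Each respondent in a cell receives that cell's weight, so under-represented cells are weighted up and over-represented cells are weighted down.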
  • 16. Table 2. Weighted proportions in each sample

          7 to 8yrs   9 to 10yrs
Male      25%         25%
Female    25%         25%

Which scale performed best?
The performance of the scales was compared using several different approaches to cater for the different hypotheses surrounding inter-scale and inter-stage differences. We hypothesised that the widely used star scale would be the most discriminating scale, followed by the child-oriented P&K scale. We thought that the smiley face and standard hedonic scales would perform equally in terms of discriminating power. We also believed that the interactive scales would improve engagement and therefore lead to better quality data and consistency.

The main measures of scale effectiveness and inter-scale performance were scale discrimination power and the range (proportion) of the scale used. We investigated a number of statistical measures to compare the scales (see Appendix). Prior to scale comparisons, respondents who had failed any of the consistency checks were removed from the data file.

In Stage 1, where the traditional, “black and white” survey format was used, there was an opportunity to compare the effectiveness of the scales without the influence of interactive audio-visual elements or avatars. We examined the results from this survey to determine which of the 4 scales provided the best discriminating power. Repeated measures analysis of variance with Duncan’s tests was used to compare the scales on all possible pairs of means. In Stage 1, the overall hedonic ratings showed a very similar pattern across scales (Figure 17).
  • 17. Figure 17. Stage 1 Means of the fifteen items tested across each scale

In the traditional, ‘black and white’ survey (Stage 1), the 4 different question scales (Standard, Star, Smiley Face and P&K) performed similarly, in terms of providing similar patterns in overall hedonic ratings for the fifteen concepts.
  • 18. On examination of the overall hedonic results when scales were interactive (Stage 2), there was a very similar pattern across the scales. An interesting pattern emerged whereby the P&K scale tended to record slightly higher scores than the other scales. The same pattern was observed when interactive gaming elements (Stage 3) were used. Furthermore, the Standard 9 point scale was also shown to have a tendency toward higher ratings than the Star and Smiley Face scales.

Comparing the discriminating power of the scales in the traditional survey (Stage 1), a very slight advantage went to the Standard 9 pt hedonic scale. Despite having fewer scale points, the Smiley face scale showed a similar level of performance to the other scales. When interactive elements were used in Stage 2, scale discrimination was observed to drop overall, and no single scale performed better. The P&K scale performed marginally better than the other scales in the interactive gaming survey (Stage 3).

In terms of scale range or proportion used, a large proportion of each scale was used, and there were no significant differences observed between the scales in the traditional questionnaire (Stage 1). Similar results were observed when scales were interactive (Stage 2). When interactive gaming elements were used (Stage 3), a significantly larger proportion of the Star and Smiley Face scales was used compared to the P&K scale (Table 3).

Table 3. Proportion of scale used across the stages

Scale             Stage 1 (N=96)   Stage 2 (N=167)   Stage 3 (N=248)
Star 9pt          74%              77%               74%
Smiley Face 5pt   77%              78%               76%
Standard 9pt      76%              76%               71%
P&K 9pt           72%              73%               67%

Further analysis of the scales revealed that when the 15 hedonic scores were averaged, and the scales compared on average performance, no significant differences were observed.
The inter-stage comparison of each individual scale revealed that there were also no significant differences in the performance of the individual scales between the stages. This suggests that the interactive and gaming elements did not affect the research outcome significantly.
  • 19. A child’s job is to play, we should let them... Pamela Wong, Direction First Which questionnaire was more engaging? To measure respondent engagement, 5-point Likert scales were used to obtain feedback from participants on how easy and how much fun they had on each of the scales in the study. The items on the ease of use scale ranged from 1=‘Hard’ to 5=‘Easy’. The items on the fun scale ranged from 1=‘No fun at all’ to 5=‘Lots of fun’. Both scales had numerical values assigned to all points on the scale, and so were treated as scale variables for the purpose of analysis. To compare the performance of the various stages, ability to follow instructions and response consistency were measured through a series of question checks that were repeated at the beginning and end of the survey. This involved clicking at selected points on scales and indicating the number of brothers and sisters the participants had. Time to complete the surveys was also recorded and compared at each stage. Which scale was easiest to use? All of the questionnaire scales used at each stage were seen as easy to use (mean scores of over 4 out of 5) (Table 4). In the traditional survey (Stage 1), the Smiley face scale was considered as significantly easier to use than the Standard and Star scales, but not significantly easier than the P&K scale. The P&K scale was significantly easier to use than the Standard 9pt scale. When participants used interactive scales (Stage 2), the Smiley face and P&K scales were considered as slightly (directionally) easier to use than the Standard scale. With gaming and interactive elements activated (Stage 3), all scales were considered as similarly easy to use and there were no significant differences. Page 18 of
  • 20. Table 4. Mean scores on ease of use
Question: How EASY was it to answer the questions about the flavours and foods on this scale?

Scale (mean out of 5)   Stage 1 (N=96)   Stage 2 (N=167)   Stage 3 (N=248)
Standard 9pt            4.5              4.5               4.5
Star 9pt                4.6              4.6               4.6
P&K 9pt                 4.7              4.7               4.6
Smiley Face 5pt         4.9              4.8               4.7

No significant differences were observed by age and gender.

Which scale was fun to use?
All of the scales across all stages were seen as fun to use, with all obtaining mean scores of over 4 out of 5 (Table 5). In the traditional survey (Stage 1), the Smiley scale was directionally more fun than the Standard scale (i.e. approaching a significant level). When interactive survey elements were used (Stage 2), the Smiley Face and Star scales were both seen as significantly more fun to use than the Standard. With interactive-gaming elements (Stage 3), the Smiley Face scale was viewed as significantly more fun to use than the Standard and P&K scales. The Star scale was considered significantly more fun than the Standard.
  • 21. Table 5. Mean scores for rating of fun
Question: How much FUN did you have answering the questions about the flavours and foods on this scale?

Scale (mean out of 5)   Stage 1 (N=96)   Stage 2 (N=167)   Stage 3 (N=248)
Standard 9pt            4.2              4.1               4.1
Star 9pt                4.4              4.5               4.6
P&K 9pt                 4.4              4.3               4.3
Smiley Face 5pt         4.6              4.5               4.7

No significant differences were observed across age and gender in Stages 2 and 3. In Stage 1, younger males did not have as much fun on the 9-point hedonic and P&K scales as their older counterparts.

Response consistency and following instructions
The ability to answer consistently and follow instructions is a measure of respondent engagement, as it indicates whether a participant is paying attention and is engaged in the task. Participants were asked to indicate how many brothers and how many sisters they had at 2 different points in each survey stage (Figure 18). This question was used because it was relatively easy for most children to answer and did not require an opinion, so the answer should have remained constant. The questions are shown below:
  • 22. Figure 18. Consistency question on number of siblings

The results below (Table 6) indicate that there were very high levels of consistency when the sibling question was used, and Chi-squared testing indicated that there were no significant differences across the different stages on this measure.

Table 6. Proportion of respondents making consistency errors when asked about number of siblings

                Stage 1 (N=96)   Stage 2 (N=167)   Stage 3 (N=248)
No mismatch     97%              93%               95%
1 mismatch      2%               7%                5%
Both mismatch   1%               0%                0%
Total           100%             100%              100%

Participants were also asked to select a specific point on a scale at 2 different points in each survey stage. This second consistency check was used to check if participants were paying attention and if they were able to follow simple instructions at each stage. The question is shown below (Figure 19):
  • 23. Figure 19. Consistency question on following instructions

In Table 7 below, Chi-squared testing revealed that there were significant differences in the proportions of consistency errors made by participants across the 3 stages.

Table 7. Proportion of respondents making consistency errors when following simple instructions

                                        Stage 1 (N=96)   Stage 2 (N=167)   Stage 3 (N=248)
Neither wrong                           94%              79%               81%
Both wrong                              3%               5%                10%
First check wrong, second check right   3%               14%               9%
First check right, second check wrong   0%               2%                0%
Total                                   100%             100%              100%

Pairwise comparisons (p=0.05)

Further analysis revealed that one in ten participants in the gaming stage (Stage 3) got both consistency checks incorrect, a significantly higher proportion than those in either the first or second stages. 14% of those in Stage 2 got the first check wrong. It is possible that interactive and gaming elements distracted participants from completing simple tasks. Whilst a higher proportion of participants failed to follow the simple instructions properly in Stages 2 and 3, they still managed to consistently answer questions about themselves.
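For readers who wish to retrace the Chi-squared comparison across stages, a minimal sketch of a Pearson test of independence on Table 7 follows. The counts are approximated by multiplying the rounded, weighted percentages by the stage sample sizes, which is an assumption: the raw counts are not reported, so the statistic will differ from whatever the original analysis produced. Several expected counts are also below 5, so the approximation is rough. The function and variable names are illustrative.

```python
# Illustrative sketch only; counts are reconstructed from rounded percentages.
STAGE_N = {"Stage 1": 96, "Stage 2": 167, "Stage 3": 248}

# Rows follow Table 7: neither wrong, both wrong, first wrong only, second wrong only.
TABLE_7_PCT = {
    "Stage 1": [94, 3, 3, 0],
    "Stage 2": [79, 5, 14, 2],
    "Stage 3": [81, 10, 9, 0],
}

# Standard 5% critical value of the Chi-squared distribution with 6 degrees of freedom.
CRITICAL_5PCT_DF6 = 12.592


def approximate_counts(pct_by_stage, n_by_stage):
    """Turn percentages into approximate (possibly fractional) counts per stage."""
    return [[p / 100 * n_by_stage[stage] for p in pcts]
            for stage, pcts in pct_by_stage.items()]


def pearson_chi_squared(observed):
    """Return (statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected > 0:
                statistic += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(col_totals) - 1)
    return statistic, df


if __name__ == "__main__":
    stat, df = pearson_chi_squared(approximate_counts(TABLE_7_PCT, STAGE_N))
    print(f"chi-squared = {stat:.2f}, df = {df}, "
          f"exceeds 5% critical value: {stat > CRITICAL_5PCT_DF6}")
```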
  • 24. Time taken to complete survey
One common metric used to determine data quality and respondent engagement is time spent in the survey. It has been said that spending too short a time and spending too much time are both indicators of inattentiveness, resulting from speeding or distraction. The time taken to complete the survey was similar across the stages and there were no significant differences observed (Table 8). Note that this time was calculated from frame to frame, so it excluded the building of the character in Stage 3 (for fairer comparison).

Table 8. Time taken to complete survey by stage

Stage     Time taken (HH:MM:SS)
Stage 1   00:16:52
Stage 2   00:15:24
Stage 3   00:15:18

Data points more than 2 standard deviations from the mean were removed before analysis.

We thought that our participants would spend more time on surveys where interactive elements were present, and even more time when gaming elements were activated. However, the results show that there were no differences, and even very slightly (not significantly) less time was spent where interactive and gaming elements were present.

Conclusions
In terms of inter-scale comparison, all 4 questionnaire scales (standard, star, smiley face and P&K) presented in the traditional, ‘black and white’ survey format (Stage 1) performed similarly, in terms of providing similar patterns in overall hedonic ratings for the fifteen concepts. It was observed that the Standard scale offered a slight advantage, as it had marginally more discriminating power. However, when the hedonic scores were averaged across all products and the individual scales compared, there were no differences. This suggests that all scales performed equally and no scale performed better in terms of discriminating power.
In the interactive survey (Stage 2), discriminating power of all scales appeared lower overall, suggesting some level of interference, and no single scale stood out from the rest. In the interactive-gaming survey (Stage 3), discriminating power was on average on par with the traditional ‘black and white’ survey, with the P&K scale performing marginally better than the others.

The inter-stage comparison of each individual scale revealed that there were also no significant differences in the performance of the individual scales between the stages. This suggests that the
interactive and gaming elements did not affect the scales.

However, inter-stage effects were observed in consistency checks. A significantly higher proportion of respondents in the traditional survey (Stage 1) followed instructions correctly than in the interactive (Stage 2) and interactive-gaming (Stage 3) surveys. When asked questions about themselves (i.e. number of siblings), most children answered consistently at all stages. Perhaps children don’t like being told what to do when they are playing?

Overall, all 4 question scales at all stages were seen as easy and fun to use. In terms of ease of use, the Standard scale was considered less easy to use when presented in a traditional and interactive survey. With interactive-gaming elements, all the scales were seen as similarly easy to use, suggesting that gaming elements made them somewhat easier.

There was consensus that the Standard scale was less fun to use, and this was observed across all stages. The addition of interactive and gaming elements neither enhanced nor reduced the level of enjoyment.

Because enjoyment and ease of use scores were positive and high across all stages, a few questions arose:

• Did the children not want to admit that the task was difficult because they aspire to do things that older children can do easily?
• Were these results affected by the tendency for children to acquiesce when asked if they had fun, even when they had not?
• Was it that we thought some of the scales were more fun than others when, from a child’s perspective, they were not as fun as we expected?

The Standard scale offered slightly better discriminating power, but was not as easy or as fun to use as the other scales. We would suggest that this scale could lead to boredom and be less engaging when conducting research with children.
The P&K and star scales, both designed originally for children, performed well and similarly in discriminating power. Both were considered easy and fun to use. The P&K scale rated as slightly easier to use in the traditional format (Stage 1), whilst the Star scale was considered slightly more fun when used with the interactive and gaming formats (Stages 2 and 3). Because each scale performs well on one aspect or another, it will be important in moving forward to test their performance on other aspects, such as predicting real behaviour.

Both of these scales remain more or less suitable for research with children, as we have not yet found a better scale. With the star scale, we would recommend that children be reminded that the stars do not reflect right or wrong answers, so giving lower scores to something does not mean they will not be rewarded, and vice versa. The P&K scale is not widely used in Australia and may be more suitable because the language is more child-friendly; however, it should be noted that language differences between
American and Australian children could mean that this scale needs to be adapted to the vernacular of Australian kids. Language may be another area to investigate in future.

The smiley face scale performed on par with the other scales in terms of discriminating power, and was seen as significantly more fun and easier to use than the Standard scale. This scale, however, has been criticised in the sensory food literature because it is considered ambiguous and may lead to misinterpretation due to its emotional element.

Whilst some would say that the advantages of this scale are unlikely to outweigh its uncertainties, there may be more merit to it than is currently recognised. It is possible that what some believe to be the shortcomings of the scale are in fact its strengths.

There is currently a convergence of thought surrounding the role of emotions in decision making. Damasio (1994), in his book “Descartes’ Error: Emotion, Reason, and the Human Brain”, suggests that rationality stems from emotion, and that emotion stems from bodily senses. This theory is now informing developments in biometric and neuroscience research.

Next Steps

This research prompts Direction First to consider the potential role of biometrics and emotions in sensory food research in future. Traditional sensory research relies heavily on self-reported data for measuring hedonics. However, because self-reported data is often obscured by experience and conscious thought, it may not provide enough insight into true responses and behaviour.

Biometrics measures involuntary physiological responses such as heart rate, respiration patterns, perspiration and body movements. Biometric approaches such as those used by Bryant (2009) and Zeinstra (2009) utilise cutting-edge technology to interpret what are considered involuntary and therefore unobscured, “real” and “true” measures of appeal, enjoyment, engagement and attention.
Currently, these technologies are not widely available; however, they may become common measures in emotion and sensory food research in the future. The ethics of using biometrics with children will require industry discussion.

In light of this research, the general performance of the different scales, the response consistency results and the suggestion that gaming elements did not significantly contribute to scale discrimination, we suggest caution in moving scales strongly in this direction without consideration for the whole research approach taken with children. Children played with us in their answers when we established a more playful environment, and perhaps this is not what we want in research.
References

Balogh, M., 2002. Cracking the kids marketing code. B&T, 2002 [http://www.bandt.com.au/articles/03/0C00FC03.asp, accessed 22.01.10].

Brand, J., Borchard, J. and Holmes, K., 2009. Interactive Australia 2009. National research prepared by Bond University for the Interactive Entertainment Association of Australia.

Bryant, J. A., Weinberg, L., Levine, B., Jacobs, D. and Massoudian, M., 2009. Inspiring Change: Innovative Methods and Integrated Advertising. Online Research, Part 1, ESOMAR 2009.

Cape, P., 2009. Questionnaire Length, Fatigue Effects and Response Quality Revisited. Survey Sampling International.

Chambers, E. IV, 2005. Conducting Sensory Research with Children: A Commentary. J. Sensory Studies, 20: 90-92.

Cooper, H., 2002. Designing successful diagnostic scales for children. Presented at Ann. Mtg. Institute of Food Technologists, Anaheim, CA, June 15-19.

Covey, N., 2007. Connected Kids: Trends in Youth Gaming. ARF Youth Council, 21 August, 2007. The Nielsen Company.

Cranmer, S. and Ulicsak, M., 2010. Gaming in Families, Final Report. Futurelab, United Kingdom.

C&R Research, 2009. YouthBeat, KidzBeat Magazine, Winter.

Damasio, A., 1994. Descartes’ Error: Emotion, Reason, and the Human Brain.

Fliegelman, A., Metx, P. and McIlrath, M., 2004. The ABC’s of Conducting Effective Market Research with Kids. C&R Research. Published in Media Research Club of Chicago (MRCC), June 2004.

Franco, C., 2010. Popular Online Games: new insight from European Research. WARC.

Geraci, J.C., 2004. What Do Youth Marketers Think About Selling to Kids? Harris Interactive. Published in Media Research Club of Chicago (MRCC), June 2004.

Gladwell, M., 2001. The Tipping Point. Abacus, London, UK.
Guinard, J.X., 2001. Sensory and consumer testing with children. Trends in Food Science and Technology, 11(8), 273–283.

Kroll, B. J., 1990. Evaluating rating scales for sensory testing with young children. Food Technology, 44, 78–86.

Nairn, A., 2009. Protection or Participation? Getting research ethics right for children in the digital age. ESOMAR Congress.

Lawless, H. T., Popper, R. and Kroll, B. J., 2010. A comparison of the labelled magnitude (LAM) scale, an 11-point category scale and the traditional 9-point hedonic scale. Food Quality and Preference, 21 (2010): 4-12.

Popper, R. and Kroll, J. J., 2005. Issues and viewpoints conducting sensory research with children. Journal of Sensory Studies, 20(1), 75–87. Also published in Food Technology, May 2003, Vol. 57:5, 60-65.

Popper, R. and Kroll, J. J., 2003. Conducting Sensory Research with Children. Food Technology, Vol. 57:5, 60-65.

Schraidt, M.F., 2009. Testing with Children: Getting Reliable Information from Kids. Peryam & Kroll Research Corporation (http://www.pk-research.com/paper_15.html, accessed April, 2010).

Sleep, D. and Puleston, J., 2009. Leveraging interactive techniques to engage online respondents. Engage Research and GMI Interactive.

Solomon, D. and Peters, J., 2005. Resolving Issues in children’s research. Young Consumers, Quarter 4, World Advertising Research Center, 68-73.

Urbick, B., 2002. Kids have great taste: An update to sensory work with children. Presented at Ann. Mtg. Institute of Food Technologists, Anaheim, CA, June 15-19.

Zeinstra, G.G., Koelen, M.A., Colindres, D., Kok, F.J. and de Graaf, C., 2009. Facial expressions in school-aged children are a good indicator of ‘dislikes’, but not of ‘likes’. Food Quality and Preference, 20 (2009): 620–624.
Appendix

1.1 Materials and Methods

The study was conducted in three separate stages. Each stage contained an independent sample of respondents. The core materials and methods used were the same in each of the three stages. The main contrasts between the stages were as follows:

• The first experiment was designed to compare the four scales as they exist in their standard ‘black and white’ form.
• The second experiment introduced the four scales in a graphically enhanced ‘interactive’ format, with the scales providing light and sound feedback to respondents.
• The third experiment repeated the four interactive scales with the introduction of an avatar-like character that respondents designed at the beginning of the survey and which then operated as a guide taking them through the survey. This stage also introduced a series of background images within which the guide was placed.

1.1.1 Samples

The samples were presented as conceptual text descriptions and images of common food consumption items, flavours, and commercial-like products. The common food consumption items included milk, honey, ice cream, bread and water. The flavours included the taste of mint, chocolate, cinnamon, peanut butter, and lemon. The commercial-like products were made-up concepts for a mix of sweet biscuit and savoury snack products that were relatively similar to some existing market products.

This mix of different foods, flavours and products was used to ensure that the scales were tested across the different levels of food consumption – from basic flavours, to common foodstuffs, to commercial products. This was to test the scales in contexts relating not only to concept testing, but also to aspects more likely to be presented in sensory testing applications.

The products were also selected to contain a mix of liked, neutral, and disliked flavours and products, and so represent a wider hedonic range.
1.1.2 Measurement Instruments

The four scales were tested in each of the stages:

1. 9 pt star scale
2. 5 pt smiley face scale
3. 9 pt hedonic scale
4. 9 pt P&K scale

1.1.3 Procedure

Each participant rated their liking of the fifteen items on one scale before moving on to the next scale, until all four scales had been used to rate the items. The order of the scales was randomised in a balanced block design across participants.

Scale experience questions were presented at the end of each scale, and ‘warm-up’ questions were used at the beginning of each new scale to ensure respondents were aware that they had transitioned onto a new scale.

1.2 Statistical Analysis

1.2.1 Making the samples from each stage comparable

To ensure samples from each of the stages of research were homogeneous in terms of age and gender, an interlocking quota was used in each stage of the research with an even balance of age and gender as follows:

Table 1.

         7 to 8   9 to 10
Male     25%      25%
Female   25%      25%

At the completion of surveying, Chi-squared testing indicated that there were significant differences in the proportions across the stages ( ). The dataset was therefore weighted, with each individual stage being balanced towards the target quotas so that 25% was obtained in each cell.
1.2.2 Making the scales comparable

As three of the four scales had nine data points, the smiley face scale of five points was converted to a 9-point scale with 1=1, 2=3, 3=5 and so on, as shown in the following table.

Table 2.

Scale             Score
9-point hedonic   1  2  3  4  5  6  7  8  9
9-point P&K       1  2  3  4  5  6  7  8  9
Star (9-point)    1  2  3  4  5  6  7  8  9
Smiley            1     2     3     4     5

All reported mean scores on the items tested are therefore on a 9-point scale.

1.2.3 How the scales were compared

The performances of the scales were compared using several different approaches to cater for the different hypotheses surrounding inter-scale and inter-stage differences. The main areas of measurement were response consistency, respondent engagement, scale discrimination power, and range of scale used.

To compare the performance of the various stages, response consistency was measured through a series of question checks that were repeated at the beginning and end of the survey. These involved indicating the number of brothers and sisters participants had, and clicking at selected points on scales.

To measure respondent engagement, 5-point Likert scales were used to obtain feedback from respondents on how easy each scale was to use and how much fun they had on it. The items on the ease of use scale ranged from 1 ‘Hard’ to 5 ‘Easy’. The items on the fun scale ranged from 1 ‘No fun at all’ to 5 ‘Lots of fun’. Both scales had numerical values assigned to all points on the scale, and so were treated as scale variables for the purpose of analysis.

The F-ratio from ANOVA represents the ratio of systematic to unsystematic variance, or signal to noise. Consequently, it has been used as a measure of scales’ ability to differentiate products (Lawless, Popper, & Kroll, 2010). The number of differences between means in post hoc comparisons is also a common measure of product differentiation.
Consequently, the product F-ratio from ANOVA and the number of significantly different means by Duncan’s multiple range test were used as measures of product discrimination. Where variances were uneven, non-parametric alternatives were used.
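
The two calculations described in sections 1.2.2 and 1.2.3 can be illustrated with a minimal Python sketch. It is not the analysis code used in the study. The function names are hypothetical, the linear mapping 2x − 1 is inferred from the stated correspondence (1=1, 2=3, 3=5 and so on), and the F-ratio shown is a plain one-way version, a simplification of the repeated measures analysis of variance reported in the results.

```python
from statistics import mean


def smiley_to_nine_point(score: int) -> int:
    """Map a 5-point smiley rating onto the 9-point range (1->1, 2->3, ..., 5->9)."""
    if score not in range(1, 6):
        raise ValueError("smiley score must be an integer from 1 to 5")
    return 2 * score - 1


def one_way_f_ratio(groups: list[list[float]]) -> float:
    """Return the one-way ANOVA F-ratio: between-group over within-group variance.

    Each inner list holds the ratings given to one product. This ignores the
    repeated measures structure of the study (an assumption made for brevity).
    """
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)

    # Systematic variance: how far each product mean sits from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Unsystematic variance: spread of ratings around their own product mean.
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)
```

A larger F-ratio for one scale than another, on the same products and respondents, would indicate that the scale separates the products more clearly relative to respondent noise, which is the sense in which the paper uses it as a measure of discrimination.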
Prior to scale comparisons based on F-ratios, respondents who had failed any of the consistency checks were removed from the data file. Because there were different age and gender sample sizes in each of the stages, weighting was applied to make these consistent across stages.

The range of the scale used was calculated as the highest minus the lowest rating given across all fifteen attributes, divided by the total scale range.

1.1.1 Participants

In the first experiment, one hundred participants with children aged from seven to ten years were recruited to a web survey. Parents completed a screening exercise, with children taking over once the screener was complete to undertake the survey.

1.2 Results – Stage 1

1.2.1 Respondent Engagement

In some cases Levene’s test indicated that the variances associated with the scales were not even, so a non-parametric F-test, Brown-Forsythe, was used. For paired comparison post hoc analysis in such cases the Games-Howell test was used.

1.2.1.1 How easy were the scales to use?

Brown-Forsythe revealed a significant difference among the scales, F(3,378)=6.4, p<.01. Games-Howell paired comparison tests indicated that the Smiley scale was significantly easier to use than the Standard 9pt and Star scales, but not significantly easier than the P&K scale, while the P&K scale was significantly easier to use than the Standard 9pt scale (p<.05 in all cases).

Table 3.

Scale      Avg.
Standard   4.9
Star       4.6
Super      4.7
Smiley     4.9
No significant differences were observed by age and gender.

Table 4. Age & Gender: How EASY was it to answer the questions about the flavours and foods on this scale?

                 7-8    7-8      9-10   9-10
                 male   female   male   female
Stage 1 Star     4.5    4.6      4.7    4.5
Stage 1 Smiley   5.0    4.9      4.9    4.8
Stage 1 9pt      4.2    4.7      4.7    4.4
Stage 1 P&K      4.5    4.7      4.7    4.9

1.2.1.2 How much fun were the scales to use?

Brown-Forsythe did not reveal a significant difference among the scales, F(3,455)=2.2, p=.082. The Welch F-ratio almost reached a significant level, F(3,262)=2.4, p=.067. Games-Howell paired comparison tests revealed a ‘trend’ that the Smiley scale was more fun than the standard 9-point scale (p=.06).

Table 5.

Scale      Avg.
Standard   4.2
Star       4.4
Super      4.4
Smiley     4.6

While no significant differences were observed by age and gender in terms of ease of use, there was a directional indication that younger males did not have as much fun on the 9-point hedonic and P&K scales as their older counterparts, as indicated by the Welch test, F(3,61)=2.6, p=.06.

Table 6. Age and Gender: How much FUN did you have answering the questions about the flavours and foods on this scale?

                 7-8    7-8      9-10   9-10
                 male   female   male   female
Stage 1 Star     4.1    4.4      4.7    4.3
Stage 1 Smiley   4.5    4.6      4.6    4.6
Stage 1 9pt      3.9    4.3      4.6    4.1
Stage 1 P&K      4.0    4.4      4.7    4.4
1.2.2 Scale Discrimination Power

Repeated measures analysis of variance with Duncan’s tests was used to compare all possible pairs of means on each scale. The overall hedonic ratings showed a very similar pattern across scales.

Figure 1. Stage 1 means of the fifteen items tested across each scale
*Sorted in descending order of product liking

A slight advantage went to the Standard 9pt Hedonic scale in terms of identifying a greater number of significant differences. Despite having fewer scale points, the Smiley scale showed a similar level of performance to the other scales.

Table 7. F-Ratios and Significant Duncan tests, Stage 1

Scale      F-Ratio   No. of sig Duncan tests
Star       37.4      85 / 105
Smiley     41.6      85 / 105
Standard   41.0      87 / 105
Super      40.3      85 / 105
Df for stage 1 were (14,1784) for all F-Ratios. All F’s were significant at p<.001. Duncan’s tests were performed on 105 possible pairs of means.

Proportion of Scale Used

The proportion of the scale used was treated as a mean score (from 0 to 1). No significant differences were observed.

Table 8.

Scale            Proportion of scale used
Stage 1 Star     74%
Stage 1 Smiley   77%
Stage 1 9pt      76%
Stage 1 P&K      72%

1.1 Results – Stage 2

1.1.1 Respondent Engagement

1.1.1.1 How easy were the scales to use?

Levene’s test indicated that the variances associated with the scales were not even, so a non-parametric F-test, Brown-Forsythe, was used. For paired comparison post hoc analysis the Games-Howell test was used.

All of the scales were felt to be easy to use, all obtaining a mean score of over 4 out of 5. Brown-Forsythe revealed a just significant effect, F(3,429)=2.6, p=.05, and post hoc tests revealed a directional difference suggesting the Smiley scale to be easier to use than the 9-point Hedonic scale (p=.07).

Table 9.

Scale      Avg.
Standard   4.5
Star       4.6
Super      4.7
Smiley     4.8
No significant differences were observed across age and gender.

Table 10. Age & Gender: How EASY was it to answer the questions about the flavours and foods on this scale?

                 7-8    7-8      9-10   9-10
                 male   female   male   female
Stage 2 Star     4.5    4.5      4.9    4.5
Stage 2 Smiley   4.9    4.7      4.7    4.8
Stage 2 9pt      4.4    4.5      4.7    4.5
Stage 2 P&K      4.7    4.9      4.7    4.6

1.1.1.2 How much fun were the scales to use?

All of the scales were fun for the respondents to use, all obtaining a mean score of over 4 out of 5. A significant difference between the scales was found with Brown-Forsythe, F(3,464)=4.5, p<.01. Post hoc tests revealed the Star and Smiley scales to be significantly more fun to use than the 9-point Hedonic scale (p<.05).

Table 11.

Scale      Avg.
Standard   4.1
Super      4.3
Star       4.5
Smiley     4.5

No significant differences were observed across age and gender.

Table 12. Age and Gender: How much FUN did you have answering the questions about the flavours and foods on this scale?

                 7-8    7-8      9-10   9-10
                 male   female   male   female
Stage 2 Star     4.7    4.3      4.6    4.3
Stage 2 Smiley   4.4    4.5      4.6    4.6
Stage 2 9pt      4.1    4.0      4.3    4.1
Stage 2 P&K      4.2    4.1      4.5    4.4
1.1.2 Scale Discrimination Power

Repeated measures analysis of variance with Duncan’s tests was used to compare all possible pairs of means on each scale. The overall hedonic ratings showed a very similar pattern across scales; however, in stage 2 a pattern emerged whereby the P&K scale tended to record higher scores than the other scales.

Figure 2. Stage 2 means of the fifteen items tested across each scale
*Sorted in descending order of product liking

Table 13. F-Ratios and Significant Duncan tests, Stage 2

Scale      F-Ratio   No. of sig Duncan tests
Star       29.4      79 / 105
Smiley     30.6      79 / 105
Standard   30.4      79 / 105
Super      30.0      80 / 105

Df for stage 1 were (14,1784) for all F-Ratios. All F’s were significant at p<.001.
Duncan’s tests were performed on 105 possible pairs of means.

1.1.3 Proportion of Scale Used

The proportions of the scales used in Stage 2 were not significantly different, ranging between 73 and 78 percent.

Table 14.

Scale            Proportion of scale used
Stage 2 Star     77%
Stage 2 Smiley   78%
Stage 2 9pt      76%
Stage 2 P&K      73%

1.2 Results – Stage 3

1.2.1 Respondent Engagement

1.2.1.1 How easy were the scales to use?

Levene’s test indicated that the variances associated with the scales were not even, so a non-parametric F-test, Brown-Forsythe, was used. For paired comparison post hoc analysis the Games-Howell test was used.

All of the scales were felt to be very easy to use, all obtaining a mean score of 4.5 or more out of 5. No significant differences between the scales were recorded, nor were any differences observed across age and gender.

Table 15.

Scale      Avg.
Standard   4.5
Super      4.6
Star       4.6
Smiley     4.7
Table 16. Age & Gender: How EASY was it to answer the questions about the flavours and foods on this scale?

                 7-8    7-8      9-10   9-10
                 male   female   male   female
Stage 3 Star     4.7    4.6      4.7    4.6
Stage 3 Smiley   4.6    4.7      4.8    4.7
Stage 3 9pt      4.5    4.6      4.3    4.5
Stage 3 P&K      4.6    4.7      4.4    4.7

1.2.1.2 How much fun were the scales to use?

All of the scales were fun for the respondents to use, all obtaining a mean score of over 4 out of 5.

Levene’s test indicated that the variances associated with the scales were not even, so a non-parametric F-test, Brown-Forsythe, was used. For paired comparison post hoc analysis the Games-Howell test was used.

Significant differences across the scales were identified, F(3,416)=6.6, p<.01. Post hoc tests revealed that the Smiley scale was significantly more fun to use than the 9-point Hedonic and P&K scales, and that the Star scale was significantly more fun than the 9-point Hedonic scale (p<.05 in all cases).

Table 17.

Scale      Avg.
Standard   4.1
Super      4.3
Star       4.6
Smiley     4.7
No significant differences were observed across age and gender.

Table 18. Age and Gender: How much FUN did you have answering the questions about the flavours and foods on this scale?

                 7-8    7-8      9-10   9-10
                 male   female   male   female
Stage 3 Star     4.7    4.5      4.5    4.5
Stage 3 Smiley   4.7    4.7      4.6    4.6
Stage 3 9pt      4.2    4.3      4.0    4.0
Stage 3 P&K      4.4    4.4      4.1    4.2

1.2.2 Scale Discrimination Power

Repeated measures analysis of variance with Duncan’s tests was used to compare all possible pairs of means on each scale. The overall hedonic ratings showed a very similar pattern across scales. The pattern observed in stage 2 was repeated in stage 3, where the P&K scale tended to record higher scores than the other scales. In this stage, the 9-point Hedonic scale was also shown to have a tendency toward higher ratings than the Star and Smiley scales.

Figure 3. Stage 3 means of the fifteen items tested across each scale
*Sorted in descending order of product liking
Table 19. F-Ratios and Significant Duncan tests, Stage 3

Scale      F-Ratio   No. of sig Duncan tests
Star       44.0      82 / 105
Smiley     43.8      83 / 105
Standard   39.7      83 / 105
Super      37.3      85 / 105

Df for stage 1 were (14,1784) for all F-Ratios. All F’s were significant at p<.001. Duncan’s tests were performed on 105 possible pairs of means.

1.2.3 Proportion of Scale Used

Respondents used a significantly larger proportion of the Star and Smiley scales than of the P&K scale, F(3,475)=4.3, p<.01.

Table 20.

Scale            Proportion of scale used
Stage 3 Star     74%
Stage 3 Smiley   76%
Stage 3 9pt      71%
Stage 3 P&K      67%

1.3 Results – Comparisons of Stages

1.3.1 Response Consistency Measures – Clicking specified numbers on a scale

Chi-squared testing indicated that there were significant differences in the proportions of consistency errors across the stages ( ). Follow-up pairwise comparisons (p<.05) revealed that the first stage (containing standard scales) had a significantly higher proportion of respondents who made no errors than either the second or third stage.
Table 21. Consistency check I: proportion of respondents making consistency errors

Consistency checks                      Stage 1   Stage 2   Stage 3
                                        N=96      N=167     N=248
Neither wrong                           94%       79%       81%
Both wrong                              3%        5%        10%
First check wrong, second check right   3%        14%       9%
First check right, second check wrong   0%        2%        0%
Total                                   100%      100%      100%

One in ten respondents got both of the consistency checks incorrect in the third stage, a significantly higher proportion than in either the first or second stage (p<.05).

1.3.2 Consistency Measures – Number of brothers and sisters

The consistency measure that compared the number of brothers and sisters a respondent reported at the beginning and again at the end of the survey revealed very high levels of consistency. Chi-squared testing indicated that there were no significant differences across the stages on this consistency measure ( ).

Table 22. Consistency check II

                Stage 1   Stage 2   Stage 3
No mismatch     97%       93%       95%
1 mismatch      2%        7%        5%
Both mismatch   1%        0%        0%

1.3.3 Time Taken

Time taken to complete the survey was similar across each stage, with no significant differences observed. Data that were two standard deviations away from the mean were removed for analysis. Note: this time is calculated from frame to frame, so it excludes the building of the character in stage 3 (for fairer comparison).
Table 23. Time taken to complete survey by stage

Stage     Time taken (HH:MM:SS)
Stage 1   00:16:52
Stage 2   00:15:24
Stage 3   00:15:18
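
The outlier rule applied to completion times in section 1.3.3 can be sketched in a few lines of Python. This is an illustration only, not the study's analysis code. The function names are hypothetical, the rule is read here as "drop times more than two standard deviations from the mean", and the use of the sample standard deviation is an assumption, since the paper does not say which estimator was used.

```python
from statistics import mean, stdev


def trim_outliers(seconds: list[float], n_sd: float = 2.0) -> list[float]:
    """Keep only completion times within n_sd standard deviations of the mean."""
    centre = mean(seconds)
    spread = stdev(seconds)  # sample standard deviation (assumed estimator)
    return [t for t in seconds if abs(t - centre) <= n_sd * spread]


def to_hhmmss(seconds: float) -> str:
    """Format a duration in seconds as HH:MM:SS, as used in the time-taken table."""
    total = round(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"
```

Under this reading, a stage's reported figure would be `to_hhmmss(mean(trim_outliers(times)))`, where `times` holds each respondent's frame-to-frame completion time in seconds.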