Article for Category: ‘Evidence-based practice’

Extra large letter spacing improves reading in dyslexia. Or does it?

June 12th, 2012

High prevalence, high impact disorders like dyslexia are prone to sensational claims from quacks, scientists, journal editors and journalists. The latest is a sensational claim from an article in the high-profile journal Proceedings of the National Academy of Sciences (PNAS) that increasing letter spacing “ameliorated the reading performance of dyslexic children” (Zorzi et al., 2012).

The popular media has picked up these claims. For example, the ABC quoted the lead author Marco Zorzi as saying: “Our findings offer a practical way to ameliorate dyslexics’ reading achievement without any training”. But are the claims fact or fiction?

The idea for the study seems to have been grounded in effects of crowding in dyslexia (see Martelli et al., 2009). Crowding occurs when stimuli from the periphery interfere with processing of stimuli in the focus of vision. I am not an expert in this aspect of the dyslexia literature and perhaps someone else may comment. However, my non-expert eye suggests two problems with this literature. First, almost all studies (see here for an exception) have used word and/or letter stimuli, which confounds reading ability and ‘crowding’ effects. Second, most studies have used age-matched controls rather than reading-age matched controls, which leaves open the possibility that the effects on crowding tasks are the consequence of poor reading rather than the cause.

For the purposes of this post, let’s accept that crowding affects people with dyslexia more than good readers. Zorzi et al. (2012) predicted that widening the space between letters in words would decrease the effects of crowding and lead to better reading. They tested this idea in Italian and French people diagnosed with dyslexia (aged 8-14 years). The children had to read 24 short meaningful sentences taken from the Test for the Reception of Grammar. Print was Times New Roman 14-point. One reading condition had normal spacing between letters and the other had letter-spacing 2.5 points greater than normal (normal letter spacing is 2.7 pt in normal text; who knew?). Why they didn’t use single words and nonwords instead of the text-reading task is unclear given that dyslexia is widely acknowledged to be a deficit in single-word reading. People with dyslexia read better in context than they read words in lists (see here). Surely if crowding were a cause of dyslexia we would see it more in the reading of word lists than of stories, and if increasing letter-spacing improved reading in dyslexia we would see larger effects in a single-word task?

Anyway. The results of one experiment showed that both the French and Italian groups with dyslexia made fewer reading errors in the condition in which letter-spacing was greater than normal. However, that on its own tells us nothing other than that doing something led to fewer errors. It doesn’t say that the specific manipulation (increased letter-spacing) was the key factor. It may be that chewing gum while reading does the same thing. Zorzi et al. recognised this and suggested that if crowding really does affect reading and extra letter-spacing reduces crowding effects, it is more important to show that people with dyslexia benefit more from the increased letter-spacing condition than reading-age matched controls do. This they attempted to do in a second experiment.

The data from Experiment 2 (Zorzi et al.) are shown in the figure below. Zorzi et al. claimed that the increased letter-spacing condition improved the reading (i.e., they made fewer errors) of their French and Italian groups with dyslexia compared to the reading-age matched controls. These data are what the sensational claims reported in the media are based on. The problem is that their ‘control’ group was not of the same reading age. Groups with the same reading ability should perform equally in the normal-spaced condition. The figure below shows that this was not the case. The “reading-age matched controls” were significantly better readers in the first place.


What does this methodological flaw mean? Essentially it means that the claims Zorzi et al (or at least the media) made cannot be supported by the data. Using a group of younger children who were already better readers than the people with dyslexia is irrelevant to the research question. It leaves a data set that is subject to the same criticism as their first experiment. That is, it tells us nothing about the specific manipulation (increased letter-spacing) and it remains possible that any experimental manipulation, including the ridiculous like chewing gum, produces the same results.

Furthermore, it is possible, indeed likely in my view, that the reason the children in the “reading-age matched control” group did not improve as much in the increased-spacing condition is that they didn’t have much room to improve. They were already good readers and were at ceiling on the test. It is unlikely that any kind of experimental manipulation will make a good reader a better reader. Which leads me to my suggestion for replicating this study. Don’t replicate it!
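
The ceiling argument can be made concrete with a small simulation. This is only a sketch under stated assumptions: the scores, the ceiling of 100 and the size of the gain are all invented for illustration and are not taken from Zorzi et al. Both simulated groups receive exactly the same true benefit from the manipulation, but the test cannot register scores above the maximum.

```python
import random

MAX_SCORE = 100  # hypothetical test ceiling (e.g., percent of items read correctly)


def observed_gain(baseline_mean, true_gain, n=1000, sd=5.0, seed=0):
    """Mean observed gain when the test cannot record scores above MAX_SCORE."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        ability = rng.gauss(baseline_mean, sd)            # underlying skill
        before = min(ability, MAX_SCORE)                  # score without the manipulation
        after = min(ability + true_gain, MAX_SCORE)       # score with the manipulation
        total += after - before
    return total / n


# Invented group means: one group well below ceiling, one close to it.
poor_readers = observed_gain(baseline_mean=70, true_gain=8)
good_readers = observed_gain(baseline_mean=97, true_gain=8)

print(f"Observed gain, poor readers: {poor_readers:.1f}")
print(f"Observed gain, good readers: {good_readers:.1f}")
```

In this toy setup the group near ceiling shows a smaller observed gain even though the true gain is identical, which is why a smaller improvement in good readers says little about whether the manipulation is specific to dyslexia.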

I can’t see how using either age- or reading-age matched controls (i.e., good readers) will allow adequate testing of the hypothesis that increased letter-spacing results in improved reading ability in people with dyslexia because of the point I made above: it is unlikely that any kind of experimental manipulation will make a good reader a better reader. In my view, the next piece of research will need to use equivalent groups of people with dyslexia, one of which receives the extra letter-spacing manipulation and one of which does not. It is also worth noting that recent research has shown that the effects of another visual manipulation (coloured overlays) on reading ability are not reliable on repeat testing (Henderson et al., 2012) so any future research should probably run the test multiple times for each condition. Finally, if the research is conducted in English, it would be interesting to see if increased letter-spacing changes error rates (for better or worse) for words that involve single grapheme-to-phoneme correspondence compared to those that have digraphs (e.g., chip and rain) or trigraphs (e.g., slight). It might also be interesting to see if increased letter-spacing reduces errors for words in which letter-position errors can occur (e.g., trail-trial).

Until we see these data I’m keeping my ink dry.


NAPLAN and learning difficulties Part II

June 11th, 2012

In a recent blog post I identified what I saw as problems with the Australian literacy and numeracy testing (NAPLAN) for grades 3, 5, 7 and 9. A number of colleagues have questioned my position and, being the “social phobe” that I am, I am compelled to clarify my position.

There is certainly a lobby opposed to a national literacy and numeracy test full stop but I’m not buying their arguments. A national literacy and numeracy test has many advantages. We know that teachers are not great at ranking student reading achievement for example (see Madelaine & Wheldall, 2007). An objective test may be helpful in identifying struggling students who would otherwise not be identified if we relied on teacher judgment alone. A standardised test can also allow us to see how literacy and numeracy standards are changing across time. For example, a Grade 3 cohort with good maths skills who became mediocre by Grade 5 might highlight a need for better maths instruction in Grades 4 and 5.

What I am arguing is that NAPLAN in its current form fails in two important ways.

First, it begins in Grade 3, by which stage most children who are going to fail have already failed. This is a problem because early intervention is crucial for children who have learning difficulties. If we take reading as the example, the effects of intervention halve after Grade 1. For example, the US National Reading Panel report (NICHHD, 2000) reported that the mean effect size for systematic phonics in kindergarten was d = 0.56; d = 0.54 for first grade; and d = 0.27 for grades 2-6. Clearly we have to get in early. One might argue that schools have early identification and intervention in hand before Grade 3 NAPLAN. I strongly suspect this isn’t the case in most schools. I recently published a book chapter that looked at the growth in reading skills of a group of 61 poor readers and 52 good readers over the course of a school year. All poor readers were engaged in learning support interventions of some description. The outcome was that only one of the 61 children who were poor readers at the beginning of the year made meaningful growth in reading skills. All the others essentially stayed at the same level. Figure 1 below, taken from Wright and Conlon (2012), shows standard scores from the Woodcock Basic Reading Skills Cluster (a combination of nonword and real word reading skills) taken at the beginning of each of the four school terms. The age-standardised scores of controls (good readers) didn’t change, as one would expect. Unfortunately, the same thing occurred for the poor readers. If they were a poor reader at the beginning of the year they remained so at the end of the year. The conclusion was the same as that of Denton et al. (2003): normal school learning support services often lack the specificity and intensity to make a meaningful change in the reading skills of struggling readers. That is, they do little to “close the gap”.
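
To make the “effects halve” arithmetic explicit, here is a small sketch using the three effect sizes quoted above. The conversion to standard-score points is an added assumption for illustration: it treats d as a fraction of a 15-point standard deviation, which only holds if the study samples had about the same spread as the normative population.

```python
# Effect sizes for systematic phonics as quoted in the text (NICHHD, 2000).
effect_sizes = {"kindergarten": 0.56, "grade 1": 0.54, "grades 2-6": 0.27}

SD = 15  # assumption: standard-score scale with mean 100 and SD 15

# How much of the Grade 1 effect remains in grades 2-6?
ratio = effect_sizes["grades 2-6"] / effect_sizes["grade 1"]

# Rough translation of each effect size into standard-score points.
points = {stage: d * SD for stage, d in effect_sizes.items()}

print(f"Grades 2-6 effect as a share of the Grade 1 effect: {ratio:.2f}")
for stage, pts in points.items():
    print(f"{stage}: about {pts:.1f} standard-score points")
```

On these assumptions the later intervention buys roughly four standard-score points where the earlier one buys roughly eight.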

Figure 1 (from Wright & Conlon, 2012). Poor reader and control group means and standard deviations (standard scores with mean of 100 and SD of 15) on the Basic Reading Cluster at the beginning of each of the four school terms.


The second problem with NAPLAN in its current form has been discussed in the previous post. That is, the test format does not provide data that helps teachers identify what specific parts of the reading, writing and numeracy (and arguably language) processes are going wrong and, most importantly, does not provide data that on its own allows design of effective interventions. See also Kerry Hempenstall’s comments.

The answer may lie in the quality response-to-intervention (RTI) approach in Australian schools that I admitted yearning for in my previous post. I would like to see every Kindergarten/Prep teacher employ the very best methods for the teaching of language, reading, spelling and maths skills/knowledge. A sample of children should be subject to weekly tests on curriculum-based measures of the above skills. Estimates of normal growth rates can then be obtained. Every child in Kindy/Prep should then be assessed on these measures weekly and their growth plotted. Any child with a lower than average growth rate should be hit with extra instruction. These children should again be assessed at the beginning of Grade 1 on standardised tests and, if they are still behind their peers, should be hit with another round of intervention using a systematic program (see examples here and here). A NAPLAN style test in May-June of Grades 1, 3, 5, 7 and 9 can then be used to ensure that these children maintain their gains and to identify any children missed by the previous procedure.
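
The growth-monitoring step described above could be sketched as follows. Everything here is hypothetical: the scores are invented, the names `weekly_growth`, `norm_sample` and `class_scores` are illustrative, and using a least-squares slope as the “growth rate” is one reasonable choice rather than a prescribed method.

```python
def weekly_growth(scores):
    """Least-squares slope of score against week number (0, 1, 2, ...)."""
    n = len(scores)
    weeks = range(n)
    mean_w = sum(weeks) / n
    mean_s = sum(scores) / n
    num = sum((w - mean_w) * (s - mean_s) for w, s in zip(weeks, scores))
    den = sum((w - mean_w) ** 2 for w in weeks)
    return num / den


# Hypothetical weekly curriculum-based scores from a norming sample.
norm_sample = [
    [8, 11, 14, 17, 20],
    [12, 14, 17, 19, 22],
    [9, 12, 14, 17, 19],
]
expected_growth = sum(weekly_growth(s) for s in norm_sample) / len(norm_sample)

# Hypothetical weekly scores for children in one class.
class_scores = {
    "child_a": [10, 13, 16, 19, 22],
    "child_b": [10, 11, 11, 12, 12],
    "child_c": [9, 12, 15, 18, 22],
}

# Flag any child growing more slowly than the norming sample's average rate.
flagged = [name for name, s in class_scores.items()
           if weekly_growth(s) < expected_growth]

print(f"Expected growth per week: {expected_growth:.2f}")
print("Flag for extra instruction:", flagged)
```

A real system would need enough data points per child for the slope to be stable, and a cut-off chosen with measurement error in mind rather than the bare average used here.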


Vision therapy and dyslexia: What’s the evidence?

March 25th, 2012

The title of the original version of this post may have implied that vision therapy is an inappropriate treatment in general, whereas I intended only to refer to it as inappropriate for treating dyslexia/reading problems. There is evidence that vision therapy is an appropriate treatment for some vision problems such as convergence insufficiency. I regret that any adverse inference may have been drawn from the title.

Vision and reading

Many people have attributed reading problems to one or more subtle ocular or visual abnormalities, including Samuel Orton, who wrote about the difficulty he thought children with dyslexia had with reversible letters and words (e.g., b/d, god/dog). However, scientific research has shown that Orton’s view and other views that reading problems are the result of issues with visual processing, visual perception or visual memory are almost certainly incorrect.

In the 1970s Frank Vellutino and colleagues performed a series of studies in which they compared poor and good readers on a variety of visual processing tasks (e.g., visual discrimination, spatial orientation, visual memory, and visual learning). Most importantly, the tasks carefully controlled for verbal coding ability.

For example, Vellutino et al. found that memory for visually presented words and letters that were visually similar (e.g., b/d, was/saw) was the same in good and poor readers when a written rather than a verbal response was required. In other words, the kids with dyslexia see the same thing and can replicate the symbol but have more difficulty producing the letter or word name verbally. Another experiment showed that good and poor readers performed equally on visual recognition and recall of symbols from the Hebrew script with which both groups were equally unfamiliar. Finally, poor readers do make more ‘visual’ errors when reading compared to good readers of the same age. However, they DO NOT make more ‘visual’ errors than younger children of the same reading age. These data tell us that ‘visual’ errors are the result of poor reading, not the cause.

Even when individuals make errors that seem “visual”, such as migration errors within words (e.g., reading trail as trial) or between adjoining words (e.g., reading fig tree as fig free) what seems to be a visual or attention problem is actually a specific problem with the word-reading process. We know this because people who make these errors do not make the same errors for digit stimuli.

Eye movements and dyslexia

The eye movements of individuals who have dyslexia do differ from those of skilled readers (Rayner, 1998). While reading, people with dyslexia exhibit longer eye fixations, shorter saccades and a higher proportion of regressive (backward) saccades than controls (Huxler et al., 2006). However, research has demonstrated that abnormalities in eye movements occur specifically in reading tasks. When people with dyslexia and controls are compared on non-reading visual tasks that require similar perceptual and ocular motor demands to reading, there are no differences between eye movements of the groups. Hence the divergent eye movement patterns of people with dyslexia during reading reflect difficulties in the reading process rather than a primary impairment of ocular motor control (Huxler et al., 2006). This conclusion is supported by studies that have demonstrated that the eye movements of people with dyslexia do not differ from younger, reading age matched controls (Hyona & Olson, 1995) and that when people with dyslexia are given reading-level texts, their eye movements are comparable to controls (Olson et al., 1983).

Vision therapy

Vision therapy involves eye exercises, eye-hand coordination tasks and other exercises designed to improve the individual’s motor memory activity. Although in widespread use, a number of reviews have concluded that vision therapy has limited evidence for efficacy (e.g., Barrett, 2009; Bishop, 1989; Wright, 2007). In response to concerns regarding the use of visual therapies, a number of influential bodies have conducted reviews and released policy statements for their members. For example, the joint statement of the Committee on Children With Disabilities, American Academy of Pediatrics, American Association for Pediatric Ophthalmology and Strabismus, and the American Academy of Ophthalmology states the following in regard to visual therapy:

“No scientific evidence supports claims that the academic abilities of children with learning disabilities can be improved with treatments that are based on 1) visual training, including muscle exercises, ocular pursuit, tracking exercises, or ‘training’ glasses; 2) neurological organisational training (laterality training, crawling, balance board, perceptual training).”

They go on to say that: “diagnostic and treatment approaches for dyslexia that lack scientific evidence of efficacy such as behavioral vision therapy and eye muscle exercises are not endorsed or recommended.”

Other recent reviews (e.g., The American Academy of Ophthalmology; Wright, 2007) have concluded that there is no scientific evidence that supports behavioural vision therapy or orthoptic vision therapy as effective treatments for reading difficulties. Claims of improvement after visual therapy have typically been based on poorly controlled studies and testimonials and reported benefits can often be explained by the traditional educational strategies with which vision therapies are usually combined or by placebo effects. Eye movements and visual perception are not critical factors in the reading impairment found in dyslexia and the majority of people with known ocular motility and eye movement defects read normally and even people with severely misaligned eyes can excel in reading and academics.

What is the evidence?

Evidence for any form of therapy can come from two sources: (a) theoretical evidence linking a treatment to a problem in a logical manner (e.g., excessive caloric intake results in weight gain so reducing caloric intake will probably result in weight loss) and (b) direct evidence for the efficacy of the treatment in reducing symptom severity (e.g., reducing caloric intake leads to greater weight loss in people with obesity compared to a placebo treatment). Of the two, the latter is a far stronger form of evidence. Both of these forms of evidence are missing for vision therapy as a treatment for dyslexia/reading problems.

Theoretical evidence linking vision and vision therapy to reading

It is likely true that some people with dyslexia and other reading difficulties experience vision problems. However, it is also likely that there are good readers who experience vision problems. To prove a causal link between low scores on vision/visual processing tests and reading problems, one would have to demonstrate: (a) that the vision/visual processing problems are specific to individuals who have reading problems, (b) that individuals who have vision/visual processing problems have different behavioural sequelae to individuals who do not have vision/visual processing problems, (c) that therapy that targets the putative visual problem leads to improvements in reading skills in the therapy group but not in an equivalent control group who received a placebo therapy, and (d) that vision therapy produces greater gains in reading ability than reading intervention alone. There is currently no evidence showing a-d to be true. Therefore, one has to conclude that there is limited theoretical evidence for using vision therapy to treat reading problems.

Evidence for efficacy

The importance of demonstrating that vision therapy is an effective and therefore appropriate treatment for reading problems has been highlighted in a recent review (p. 5). It was noted that: “Demonstrating treatment efficacy is especially important here because these children and their parents represent a vulnerable group” and “the onus is clearly on treatment providers to produce the evidence in support of the treatment(s) that they are offering. Without such evidence, parents inevitably run the risk of wasting their time, effort and resources, and they and their children may become disillusioned if expectations are repeatedly raised and then dashed.”

There is currently no evidence that vision therapy improves reading ability directly or that greater reading growth occurs following vision therapy (i.e., that it enables better learning). The one study of sufficient scientific merit showed that vision therapy did not improve reading and spelling scores in poor readers compared to a control group of poor readers who received a placebo treatment.

Given the widespread use of vision therapy in a range of learning difficulties and its considerable cost, in terms of money and time, it is astounding that the clinicians and the professional bodies that represent them have not invested in better research. The onus must surely be on clinicians and their professional bodies to prove scientifically that their treatments work. Until they manage that, the conclusions of previous reviews that vision therapy is not recommended remain valid.





Using scientific evidence to improve educational decisions

February 15th, 2012

I often find myself playing the role of Grumpy Old Man in conversations about the selection of intervention programs and other teaching practices. A statement along the lines of “but there’s no evidence that it works” is often preceded by much face rubbing and hair pulling on my part. The response I hear most often is “but we see it working” which precipitates more face rubbing and hair pulling from yours truly. Perhaps the biggest barrier in these conversations is that teachers and scientists often have different definitions of what is meant by “evidence”. This blog attempts to explain what scientists mean by the term evidence.

Evidence-based practice

The term evidence-based practice began in medicine. It seeks to maximise the accuracy of clinical decisions based on evidence gathered from the scientific method. No one wants to see a doctor who prescribes a treatment just because they believe it works or because they heard a colleague give a presentation on it. There should be a burden of proof (and theoretically there is, although this burden doesn’t exempt doctors from making mistakes) on a doctor to make treatment choices that have been shown to work significantly better than no treatment or, if there are alternative treatments available, to choose the one that works most effectively with the fewest side effects. These statements are axiomatic but they aren’t often applied to education, an area at least as important as health.

What constitutes scientific evidence?

A very brief description of the scientific method in relation to treatments is as follows:

  1. Select 2 equivalent groups. If they don’t share the same characteristics and the same level of skills before the intervention you can’t be certain that any differences observed after the treatment were due to the treatment itself or to the pre-existing differences between the groups.
  2. Gather pre-treatment data using well-validated and reliable instruments that clearly measure the outcome in question. For example, a comparison of a group versus a one-on-one reading program should be evaluated with tests of reading ability not of motor skill.
  3. Randomly allocate students to the groups. If students, parents or teachers actively select the groups it biases the results. For example, a recent study compared neurofeedback therapy for ADHD to a non-treatment group. The results showed that parent-report measures of ADHD symptoms improved in the neurofeedback group relative to the untreated group. However, because parents actively decided to enrol their child in the neurofeedback group or actively decided not to, all these data show is that if parents believe that neurofeedback is going to work they will report that it does work!
  4. Implement the treatment while making sure that the treatment is run the same for everyone and that additional teaching or therapies are not going on at the same time.
  5. Administer post-tests to determine outcome.
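
The steps above can be sketched in a few lines. This is a minimal illustration, not a full analysis plan: the student identifiers and scores are invented, the function names `allocate` and `treatment_effect` are hypothetical, and a real study would also test whether the difference is statistically reliable.

```python
import random
from statistics import mean


def allocate(student_ids, seed=None):
    """Randomly split students into two equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def treatment_effect(pre, post, treatment, control):
    """Difference between the groups' mean gains from pre-test to post-test."""
    gain_t = mean(post[s] - pre[s] for s in treatment)
    gain_c = mean(post[s] - pre[s] for s in control)
    return gain_t - gain_c


students = [f"s{i:02d}" for i in range(20)]
treatment, control = allocate(students, seed=42)

# Invented data: everyone starts at 85; controls grow 3 points anyway,
# treated students grow 6 points.
pre = {s: 85 for s in students}
post = {s: pre[s] + (6 if s in treatment else 3) for s in students}

effect = treatment_effect(pre, post, treatment, control)
print(f"Gain attributable to the treatment: {effect} points")
```

Note that the control group’s gain is subtracted out: the 3 points every child gained regardless is exactly the kind of improvement that would be credited to the program if there were no comparison group.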

Additional points of note:

  1. Children’s development is not static; they are in a constant state of personal improvement. Observations that students seem to improve over the course of a program/intervention are therefore mostly meaningless. All students will improve over time. The only way to tell if a teaching method works is to compare a child (or preferably a group of children) to an equivalent group who receive a different teaching method.
  2. Placebo effects are large in children. They will often improve just because of the extra attention paid to them. Therefore, observations, or even more empirical data, collected on a treated group in the absence of a group receiving an equal amount of attention are prone to error.
  3. Regression to the mean is a statistical artefact whereby extreme scores (e.g., low scores on a reading test) are likely to fall closer to the mean (average) on repeat testing. Hence, proponents of a brief intervention, say of 2 weeks, may claim that their treatment resulted in the student improving from a score of 70 to 80 and that the change was significant when in fact the reason for the improvement may simply be regression to the mean.
  4. Beware research using tests that measure what is taught in the treatment. See Merzenich et al. (1996) for an example.
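
Regression to the mean (point 3 above) is easy to demonstrate by simulation. The sketch below is built on invented numbers: each simulated child has a stable true skill, each test adds independent measurement error, and nothing at all happens between the two tests. The cut-off of 80 and the error size are assumptions chosen for illustration.

```python
import random
from statistics import mean

rng = random.Random(1)
N = 10_000

# Stable underlying skill plus independent measurement error on each occasion.
true_skill = [rng.gauss(100, 12) for _ in range(N)]
test1 = [t + rng.gauss(0, 9) for t in true_skill]
test2 = [t + rng.gauss(0, 9) for t in true_skill]  # no intervention in between

# Select children the way a program might: low scorers on the first test.
low = [i for i in range(N) if test1[i] <= 80]

first = mean(test1[i] for i in low)
second = mean(test2[i] for i in low)

print(f"Children selected: {len(low)}")
print(f"Mean on first test:  {first:.1f}")
print(f"Mean on second test: {second:.1f}")
```

The selected group scores higher on the second test without any treatment, because children chosen for low scores were, on average, also unlucky on the first test. An untreated comparison group selected in the same way is the only way to separate this artefact from a real effect.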

An example of good data

Hatcher, Hulme, and Ellis (1994) compared four groups of children all of whom had comparable reading difficulties. The children were randomly allocated to four groups each of which received a different treatment. One got phonological awareness training, one phonological awareness plus reading and another just did reading. The last group received no treatment. Although the 3 treated groups received different treatments, they received the same amount of instruction in terms of time, attention and contact with teachers. Because this study controlled for all other variables they were able to claim that the stronger reading growth in the phonological awareness plus reading group was due to that treatment being superior. These claims could not have been made if the students were not randomly assigned to groups, if they differed on some other variable before treatment, or if some other variable such as the time spent on intervention differed between the groups.

An example of poor data

I recently suggested to a teacher that their suggestion that a child with reading problems see a behavioural optometrist was not evidence-based. See here for a review of the evidence for visual therapy in reading difficulties. The person claimed that they had a student in a previous year who improved in their reading ability while doing vision therapy and that they therefore believed that vision therapy worked. Unfortunately, the belief simply cannot be supported by scientific principles. First, the improvements could have been a placebo effect. In fact, it is safe to assume that placebo effects represent at least part of all positive teaching outcomes for all students. Paying attention to them helps them improve. It is probably equally likely that the child would have improved if they recited the alphabet while standing on a wobble board each day. Second, children improve in almost all skills over the course of a year as a natural course of events. In this case, there is no way of being certain that the improvement wouldn’t have occurred anyway. Finally, teachers obviously want students to improve and history is full of examples where even eminent scientists have deluded themselves into believing something because they were keen for it to be true (e.g., see the case of cold fusion). In summary, beware of using observations of single cases like this as the basis for educational decisions.

Types of evidence

Carter and Wheldall (2008) have proposed 5 levels of evidence that can be used to guide interpretation of educational research.

Level 1 

Level 1 programs or practices meet two criteria. First, they are consistent with existing scientific evidence in terms of current theory and recommended practice. Second, there is evidence of efficacy from a number of independent randomised controlled trials. Carter and Wheldall (2008) refer to Level 1 as the ‘gold standard’ and suggest that programs and practices meeting these criteria may be recommended with confidence.

The Hatcher et al. (1994) study described above represents an example of the gold standard Level 1 evidence.

Level 2

Like programs or practices that meet Level 1 criteria, Level 2 programs or practices are consistent with existing scientific evidence in terms of current theory and recommended practice. They also have empirical evidence supporting their efficacy but the design of the studies may not quite meet the gold standard of a randomised controlled trial necessary for Level 1 rating. These programs represent the silver standard and can be recommended with reasonable confidence.

An example of a Level 2 program is my own Understanding Words reading intervention program. The data we have on Understanding Words are summarised briefly below.

  1. A clinic study showed that a group of students made significant improvements in response to two terms of Understanding Words teaching. The gains were strong and similar to the average growth seen in randomised trials reported in the literature. However, because the study didn’t have a control group we can’t guarantee that the changes were the result of the intervention rather than of some other variable.
  2. A controlled study showed that a group of Grade 1 students with reading difficulties made significantly greater gains than a control group of average readers. In other words, the poor readers ‘closed the gap’ on the good readers as a result of the intervention. This study goes close to meeting the gold standard criteria except that the students were not randomly allocated to groups and the research was not independent of the program developer.
  3. We also have four studies using well-controlled case series designs. These studies have shown that introduction of the Understanding Words treatment prompts increased reading growth in individual students compared to baseline periods in which no treatment or an alternative treatment was being provided.

Together, these studies fall short of the gold standard of Level 1 evidence but the program fits into the Level 2 stratum based on its theoretical soundness and treatment-outcome data.

Level 3

Level 3 programs and practices make theoretical sense. These programs could be said to be based on evidence because there is often empirical data showing the effectiveness of the type of teaching contained in the program. However, there have been no scientific studies documenting the effectiveness of the program or practice itself. These programs might be used in the absence of an alternative that has stronger evidence. However, they should be used with caution. An example may be the ELF reading program. Arguably, ELF has a reasonably sound theoretical basis; however, there is no evidence beyond observations that the program works. Teachers and clinicians who want to be evidence-based practitioners would be cautious about selecting the program when there are other programs with stronger evidence bases.

Level 4

Level 4 programs are Not Recommended. They provide little or no empirical evidence for efficacy. They often rely on testimonials and observational ‘data’ to support their claims. Examples include fish oil as a treatment for ADHD and behavioural optometry as a treatment for reading difficulties.

Level 5

Level 5 programs and practices represent those for which there is evidence that the program is unsafe or results in negative outcomes. These programs and practices should be avoided at all costs.
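
As a rough summary, the five levels described above can be written as a decision rule. This is a simplification for illustration and not Carter and Wheldall’s own formulation: the function name, its parameters, and in particular the threshold of two independent trials standing in for “a number of” are assumptions.

```python
def evidence_level(theory_consistent, independent_rcts=0,
                   other_controlled_studies=0, evidence_of_harm=False,
                   min_rcts=2):
    """Return a level from 1 (gold standard) to 5 (avoid).

    min_rcts is an assumed stand-in for "a number of" independent
    randomised controlled trials; the text gives no exact figure.
    """
    if evidence_of_harm:
        return 5  # unsafe or negative outcomes: avoid
    if theory_consistent and independent_rcts >= min_rcts:
        return 1  # gold standard: recommend with confidence
    if theory_consistent and (independent_rcts + other_controlled_studies) > 0:
        return 2  # silver standard: recommend with reasonable confidence
    if theory_consistent:
        return 3  # makes theoretical sense but untested: use with caution
    return 4      # testimonials only: not recommended


# Hypothetical programs, described only by the inputs to the rule.
print(evidence_level(True, independent_rcts=3))            # 1
print(evidence_level(True, other_controlled_studies=2))    # 2
print(evidence_level(True))                                # 3
print(evidence_level(False))                               # 4
print(evidence_level(True, evidence_of_harm=True))         # 5
```

The order of the checks matters: evidence of harm overrides everything else, and theoretical soundness alone never lifts a program above Level 3.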


Teachers can do a lot of good by becoming evidence-based teachers. At present, most of teachers’ professional reading involves practically-oriented periodicals or books rather than research-based and peer-reviewed journals (Rudland & Kemp, 2004). It has also been reported that regular and special education teachers value the opinion of colleagues, workshops and in-service activities (which may present opinions with no evidence-base) as more trustworthy than professional journals (Landrum, Cook, Tankersley, & Fitzgerald, 2002). Further, Boardman, Arguelles, Vaughn, Hughes, and Klinger (2005) reported that when making decisions about classroom implementation of practices, special education teachers did not consider it important that they be research-based. I suspect that this is understandable as practical strategies may seem to be more applicable for teachers. However, it would be nice if these things began to change as they have in medicine. Teachers could become more critical about the claims made by proponents of educational practices and more critical of their own teaching methods. They could begin by asking themselves what the evidence is to support the use of programs and practices. They could actively seek out that evidence from peer-reviewed sources rather than relying on books, the Internet or the opinions of colleagues presenting PD. They could ask themselves: “Would I be happy if my GP used a Level 3, 4 or 5 treatment on me just because s/he believed that it worked?”. If not, they could ask another question: “Should I therefore be cautious in selecting educational programs that have limited evidence?”.

Final note

The last thing I intended was for this blog to be interpreted as teacher-bashing. I could write a similar blog about some of my psychologist, occupational therapist, or paediatrician colleagues. Nor am I immune to human foibles and biases. However, the fact is that we should all strive to do better for students who have learning difficulties and indeed all children. To do that we need to recognise the dangers of our belief system and of relying on the opinions of peers. We all need to strive to base decisions on science, not on philosophy or pseudoscience. We would also do better to recognise the limits of our knowledge. To paraphrase Donald Rumsfeld, the best teachers (and other clinicians) know what they don’t know.