Measuring Gains with the WJ: A "Back of the Napkin" Explanation of W Scores and RPI scores

When re-evaluating a student, we're often faced with a tricky situation: How do you effectively measure a student's growth? Usually, parents and teachers want to know, "Did the student make gains over time? Did they respond well, adequately, or not at all to the intervention? Did they actually lose ground? Did they catch up with their peers?" These questions are difficult to answer with Standard Scores. For example, let's say you test a child who gets a WJ-IV Letter-Word Identification Standard Score of 89 at Time 1, and a Standard Score of 88 a year later at Time 2. Did the child actually get worse with treatment? Or did she stay the same, or maybe even make some progress? More importantly, did she learn anything? Did she get any better at the skill we're measuring? It's hard to answer these questions with Standard Scores (we'll go more into why later in this post). So, what can you look at? Well, there are informal methods, like teacher or tutor reports. There are also curriculum-based measures, which I'll cover in another post. You can also look at circumstantial evidence, such as the child displaying less frustration and anxiety when faced with academic tasks. But to actually, directly answer the question of growth using nationally normed tests, it helps to know about the W score and the RPI.

The W score and the RPI are scores you can get from the Woodcock-Johnson Tests of Academic Achievement, Fourth Edition. I frequently refer to Assessment Bulletin #11 (which is for the WJ-III-NU but still applicable) when thinking about W scores and RPI. Here's the link: https://www.hmhco.com/~/media/sites/home/hmh-assessments/clinical/woodcock-johnson/pdf/wjiii/wj3_asb_11.pdf?la=en. You may want to bookmark the bulletin to refer back to -- it explains everything I'll talk about below, and then some.

Here's the way I think about W scores: The W score is essentially a mathematically transformed raw score. It measures the child's actual ability on the skill being measured, on an equal-interval scale*. The W score is not a Standard Score, which measures the child's skills in comparison to peers. It's a numerical measure of how much of skill X this student knows.


We don't really have any kind of equivalent score, so it's a little difficult to give a good analogy. However, let's imagine we could quantify all of "K-12 math" on a 1000 point scale. You know 0 maths when you start school, and 1000 maths when you finish 12th grade, and each math point is worth exactly the same as any other math point. The W score is basically like being able to say "Right now, Sheila knows 343 of all of K-12 math. She knows 343 maths."

One of the important features of W scores is that when the W score increases, the child is actually performing better on the test. She is getting more items right, and specifically more difficult items right. By extension, when the W score increases, the student likely actually knows more of the skill being measured.


At a very basic level, it's a more precise or better way of saying "This child was able to get more and harder items correct the second time she was tested." Or, to use the analogy above, it's a way to say something like, "6 months later, after some intervention, Sheila now knows 371 of all of K-12 math. She's gained 28 maths!" So the W score lets you actually see if the student has made gains in the skill, just by seeing if the W score increased from Time 1 to Time 2.

Other great properties of W scores are that all the items on the WJ are weighted for item difficulty (this is called the W difficulty score) and there's a reference W score for each age level and grade level (the reference W is the W score at which 50% of kids that age got the item right and 50% got it wrong). I assume anyone reading this just fell asleep after that sentence. Here's the important thing about those properties: With this information, the WJ scoring program calculates a student's Relative Proficiency Index, or RPI.

Here's how I think about RPI:

The RPI is the child's likelihood of being successful on tasks that her same-grade level peers have a 90% chance of being successful at. The RPI is expressed as X/90. For example, if Sheila has an RPI of 45/90, this means she has a 45% chance of success on tasks that her typical grade-level peers have a 90% chance of completing successfully. As another example, an RPI of 60/90 means Sheila has a 60% chance of being successful on things her grade-level peers can complete with 90% success. In the bulletin I linked above, there's a table on page 10 that has descriptive labels that tell you how hard the child would likely think grade level work is based on her RPI score. Specifically, it tells you if she would find grade level work: extremely easy, very easy, easy, manageable, difficult, very difficult, extremely difficult, or impossible. The table also has some other useful information, like how to describe her proficiency based on her RPI score. However, I tend to use the difficulty label most often, since it's pretty intuitive.
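For the curious, here's a rough sketch of where a number like 45/90 could come from. It assumes the Rasch-style logistic relationship usually described for the W scale, in which a student whose W score sits 20 points above a task's difficulty has about a 90% chance of success (and about 10% when 20 points below). The function name and the sample W values are made up for illustration. The actual WJ scoring program does this for you and may differ in details such as rounding.

```python
# Assumed W-scale property: a 20-point W advantage -> roughly 90% success.
W_POINTS_FOR_90_PERCENT = 20.0


def estimate_rpi(student_w: float, reference_w: float) -> int:
    """Estimate the X in an RPI of X/90 (illustrative, not the official algorithm).

    Assumes a logistic model in which the odds of success multiply by 9
    for every 20 W points of advantage over the task's difficulty.
    """
    # Tasks that typical peers pass 90% of the time sit 20 W points
    # below the peers' reference W.
    task_difficulty = reference_w - W_POINTS_FOR_90_PERCENT
    advantage = student_w - task_difficulty
    probability = 1.0 / (1.0 + 9.0 ** (-advantage / W_POINTS_FOR_90_PERCENT))
    return round(probability * 100)


# Hypothetical W scores, not taken from any real norm table.
print(f"{estimate_rpi(480, 480)}/90")  # student exactly at the reference W
print(f"{estimate_rpi(460, 480)}/90")  # student 20 W points below it
```

Under these assumptions, a student right at the reference W comes out at 90/90, and one who is 20 W points behind comes out at 50/90. The point is that the RPI depends only on the gap between the student's W score and the reference W for her age or grade.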

As pointed out in the Bulletin, this allows you to make statements like this one from page 17:

“Since we last tested Johnny, his RPI has increased from 35/90 to 75/90. Whereas a year ago, he was likely to handle grade-level reading material with about 35% success, his current scores indicate that he’d be about 75% successful."

Let's say Johnny was in grade 2 at Time 1 and grade 3 at Time 2. In narrative terms, looking at that table on page 10, we can now say Johnny probably found second grade level reading "very difficult" last year. This year, after a year of intervention, he probably finds third grade level reading (even though it's now a grade level harder) only "difficult."

So, this all means that the best ways to measure growth over time on the WJ are to:

  1. Look at the W score to see if there were actual gains in the skill over time, and

  2. Look at the RPI to see if likelihood of success with grade-level material has improved with intervention.
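If you keep your scores in a spreadsheet or script, the two-step check above could be sketched like this. The `Administration` class and `summarize_growth` function are hypothetical names of my own, and the numbers simply reuse Sheila's illustrative values from earlier in the post (343 and 371 for the W score, 45/90 and 60/90 for the RPI). They are not real test results.

```python
from dataclasses import dataclass


@dataclass
class Administration:
    """Scores from one administration of a single WJ test (hypothetical container)."""
    label: str
    w_score: int
    rpi: int  # the X in an RPI of X/90


def summarize_growth(time1: Administration, time2: Administration) -> str:
    """Describe the W score change and the RPI change between two testings."""
    w_change = time2.w_score - time1.w_score
    if w_change > 0:
        w_note = f"gained {w_change} W points"
    elif w_change < 0:
        w_note = f"lost {-w_change} W points"
    else:
        w_note = "showed no change in W score"
    return (
        f"From {time1.label} to {time2.label}, the student {w_note}. "
        f"RPI went from {time1.rpi}/90 to {time2.rpi}/90."
    )


# Illustrative values only.
before = Administration(label="Time 1", w_score=343, rpi=45)
after = Administration(label="Time 2", w_score=371, rpi=60)
print(summarize_growth(before, after))
```

The sketch keeps the two questions separate on purpose: the W change tells you whether the student learned more of the skill, and the RPI change tells you whether grade-level work has become more manageable.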

Because the WJ uses the W score to calculate age-equivalents and grade-equivalents, you can also look at those scores to get a rough estimate of growth (in a way parents are, honestly, more likely to understand). Because they are derived from the W score, the AEs and GEs on the WJ are more mathematically anchored than on many tests that use those metrics. However, age-equivalents and grade-equivalents do not capture variability very well. For example, if Johnny gets a grade equivalent score of 3.5, it's not really the case that he has "third grade level reading skills" overall. Probably, he has some reading skills that are much stronger than the average third grader, and some areas where the average third grader would actually be more successful than him. Also, age-equivalents and grade-equivalents are not equal-interval scales. In other words, 1 year of improvement in reading in primary school is much more improvement than 1 year of improvement in reading in high school. To put that another way, going from a grade equivalent of 2.0 to 3.0 is not the same level of gain as going from a grade equivalent of 9.0 to 10.0. That one year of gain is not equal, and instead depends on the child's age/grade level. These are interpretive problems that W scores and RPI do not have, which is what makes them so valuable for measuring growth.

To be honest, I still use age- and grade-equivalents in my reports, despite these problems of interpretation. In my opinion, these metrics are the easiest way to get the information or "story" across to a layperson in a limited amount of time. However, I always make sure the W scores and RPI scores support the rough grade- or age-equivalence level argument I'm making. That way, if I need to better quantify or defend my evaluation results, I can easily do so.

Of course, you can still look at the Standard Score to try to see if the child has kept pace, improved, or declined. However, keep in mind that this interpretation would require thinking about the child's gains in relation to peers who started at the same level as he did. This requires a lot of inference, especially because Standard Scores for struggling students often stay the same (how's that for alliteration!) or even decline, even though the child might have clearly learned more of the skill over the time interval. This is especially true in early grades where non-struggling students are learning at an astronomical rate. In other words, our struggling student Sheila might definitely be making progress compared to herself with intervention. However, unless a long time has passed and a lot of intervention has occurred between Time 1 and Time 2, she probably hasn't caught up with her non-struggling peers at all. She may even be a little bit further behind, since her peers are making leaps and bounds of progress. This means Sheila's Time 2 Standard Score will probably be about the same as or even lower than her Time 1 score, even though she's actually learning new skills through intervention. For these reasons, it's often hard or downright impossible to explain (or even see) if the child is actually making progress and learning more of the skill just by looking at the change in Standard Score from Time 1 to Time 2.
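Here's a toy illustration of that last point. It treats the Standard Score as a simple rescaling of how far the student's W score sits from the peer average (mean 100, SD 15). That is a simplification I'm assuming for the sake of the example, not how the WJ scoring program actually derives Standard Scores, and every W value below is invented.

```python
def standard_score(student_w: float, peer_mean_w: float, peer_sd_w: float) -> int:
    """Rescale a W score against peers to a mean-100, SD-15 metric (simplified)."""
    z = (student_w - peer_mean_w) / peer_sd_w
    return round(100 + 15 * z)


# Hypothetical numbers: the student gains 20 W points in a year,
# while her typical peers gain 21.
time1 = standard_score(student_w=450, peer_mean_w=461, peer_sd_w=15)
time2 = standard_score(student_w=470, peer_mean_w=482, peer_sd_w=15)

print("Standard Scores:", time1, "then", time2)
print("Student W gain:", 470 - 450)
print("Peer W gain:", 482 - 461)
```

With these made-up numbers, the Standard Score slips from 89 to 88 (the same pattern as the example at the top of this post) even though the student's W score went up by 20 points. She learned a lot; her peers just learned a tiny bit more.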

Hopefully this "back of the napkin" explanation helps those who are trying to figure out how to best use W scores and RPI scores. If you're a stats nerd and I got anything wrong, please let me know! For a much more sophisticated explanation, Benson, Beaujean, Donahue, and Ward have a 2016 article for practitioners on W scores, what they mean, and how to interpret them called "W Scores: Background and Derivation." It is unfortunately paywalled for me, but in case you have journal access, or $36 to spare, here's the link for that article: https://journals.sagepub.com/doi/abs/10.1177/0734282916677433.



©2018 by Stephanie Nelson, Ph.D.