Let's Talk about Stats

Stephanie Nelson Ph.D.
Apr 2, 2020

Let's talk about some advanced statistical concepts. Concepts that can inform complex decisions about when to test and treat. Stats issues that, when thought through carefully, lead to some potentially counter-intuitive conclusions. Such as that the kindest thing you could do right now if you suspect a child has dyslexia is to not test them.

If you're thinking, "wait... what?!" then read on.

We can make empirically-based decisions about when to test or treat for a suspected condition. While we should always strive towards empirically-based assessment, these principles are even more essential during uncertain times. Empirically-based decisions depend on 3 factors:

1. The prior probability of the disorder. That is, how likely is it that the child has the condition? This depends on the base rate, both in the general population and in your specific clinic. It also depends on other things that may increase the likelihood of a disorder, like family history.
2. The positive and negative predictive validity of your test. That is, how accurate are your tests?
3. Your threshold for treatment. That is, how confident do you need to be before you recommend treatment? This is going to depend on issues such as the costs of treatment and potential adverse effects.

Let's consider each of these points in turn, using the example of dyslexia. (By the way, if you're looking for more reading in this area, Youngstrom's group has fantastic resources available. They talk about how and why you should be considering these factors in your practice in many places, including this excellent article.)

As we know, dyslexia is a high incidence, pervasive, and well-understood disorder. Somewhere between 5-16% of children meet criteria, depending on what criteria you're using. It's present across settings (i.e., pervasive) pretty much whenever a child is reading. Its causes are well-understood. Research shows that ~75-80% of the time a child struggles with reading (more than expected for their age, cognitive/language level, and amount of education), it's due to deficits in phonological awareness and/or rapid naming. (For more info on this, see Dehaene 2009 and many, many other references.)

So if a child with no known cognitive/language issues or educational gaps has trouble reading, the prior probability of dyslexia (or at the very least, subthreshold weaknesses in PA and RN) is extremely high. That is, there's already a 75-80% chance that the child has dyslexia even in the absence of any other information.

As early as 1954, Paul Meehl advised that, in the absence of other info, you should "bet on the base rate".

In turn, we know there's a very good chance the child would benefit from a phonologically-based reading intervention.

We also know the probability of dyslexia is even higher if other risk factors are present. For example, a family history of dyslexia. And a developmental history that includes early markers of phonological processing deficits. And school records showing they spelled 'guitar' as "jektrr." Let's say the probability is as high as 85% in these cases. (Technically, we'd use Bayes' Theorem: convert the prior probability to pre-test odds, multiply those odds by the likelihood ratio to get the post-test odds, and convert back to a posterior probability, or just read it off a nomogram. But that's a lot of statistics, so for now let's assume roughly 85%.)
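For anyone who would rather see the odds arithmetic than a nomogram, here is a minimal sketch. The function names are mine, and the likelihood ratio is back-calculated purely for illustration: it is the value that would move an 80% prior to 85%, not a number from the research literature.

```python
def to_odds(p: float) -> float:
    """Convert a probability (0 < p < 1) to odds."""
    return p / (1.0 - p)


def to_prob(odds: float) -> float:
    """Convert odds back to a probability."""
    return odds / (1.0 + odds)


def update(prior: float, likelihood_ratio: float) -> float:
    """Post-test probability from pre-test odds times the likelihood ratio."""
    return to_prob(to_odds(prior) * likelihood_ratio)


prior = 0.80   # upper end of the 75-80% base rate for struggling readers
target = 0.85  # the "roughly 85%" assumed once risk factors are present

# Illustrative assumption: the combined likelihood ratio that the risk
# factors would need to carry to move 80% to 85%.
implied_lr = to_odds(target) / to_odds(prior)

print(round(implied_lr, 2))                 # 1.42
print(round(update(prior, implied_lr), 2))  # 0.85
```

In other words, the risk factors only need a fairly modest combined likelihood ratio to get from the base rate to 85%, because the base rate is already so high.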

The risks of recommending treatment for a struggling reader are also pretty low. Instruction in phonological awareness and phoneme-grapheme correspondence helps all children learning to read. This is especially true in orthographically-opaque languages like English. There are also very few known adverse side effects of treatment. The treatment can be expensive, so that needs to factor into our decision. However, the potential benefits of treatment outweigh the costs in most cases.

Fun fact: Because Italian is entirely phonetically-regular, the incidence of dyslexia is much lower than it is in English.

Given this info, how confident do you want to be that a child has dyslexia before you recommend treatment? 70% confident? 80% confident? 90%? Let's say that given the low risks and high potential reward, we're comfortable recommending a struggling reader receive treatment for dyslexia if there's an 80% chance they have it.

Ok, so now we have data point #1 (an 85% probability that they have dyslexia) and data point #3 (our threshold for treatment is set at 80% confidence). Given this, testing is unnecessary.

We are already more confident than our threshold to treat! We should skip testing and go ahead and recommend treatment. If testing is expensive, time consuming, or difficult to get, we're even more advised to skip testing. We should go straight to recommendations.

[If you're protesting: But... but... but what about if the school has already done testing and says it's not dyslexia?! Shouldn't I test those kids? Well, given that our tests are not perfectly sensitive and the prior probability is so high, you should still skip testing. Indeed, given how likely it is that a struggling reader has dyslexia, you should assume the previous test was a false negative. That's right. Unless you have other compelling data, you should assume the last assessor was wrong. You should go ahead and recommend your standard dyslexia recommendations.]

In fact, low-fidelity testing in these high-probability situations is probably going to muck things up.

"No one told me assessment would be this muddy."

Testing these kids now will slightly decrease your diagnostic accuracy. Especially compared to just assuming they have dyslexia. [This decline in accuracy will be due to false negatives in most situations. However, if you do the testing via telehealth (as we'll get to later), you also increase your risk of false positives.]

As you may have noticed, I'm not a fan of comprehensive testing for high incidence disorders like dyslexia. [I prefer simple screening with a low threshold to treat, and saving comprehensive evaluations for the kids who do not get better or who it turns out screening missed.]

Widespread comprehensive testing for high prior probability conditions leads to many systemic problems. This is a little outside the main point, but worth thinking about.

These problems include:

Requiring comprehensive testing creates unnecessary barriers to treatment. This is especially true if testing is costly (e.g., expensive, time-consuming, or a health threat). Over time, these barriers create an unjust society where only the richest families get treatment. See this heartbreaking USA Today article about a family who was able to pay for a $5000 NP eval to confirm dyslexia and a family who was not.
Testing comes to be seen as "unnecessary" or even "useless" by many. Others notice that most of the time, our testing just confirms the suspected disorder and doesn't add anything. This in turn leads to things like insurance refusing to pay for the "unnecessary" service. It leads to schools seeing outside evals as useless. Even many psychologists(!) view much of their testing as a way to "check the boxes" or prove that they are thorough and conscientious.
This also leads to testing psychologists exaggerating the importance of testing for high incidence conditions like dyslexia. In part, this is because psychologists are "right" so often when their tests say it is dyslexia. As we'll see below, given the high probability, a positive test result is basically a lock for dyslexia. So, a psychologist who tests for dyslexia a lot is going to constantly high-five themselves for getting it "right." They'll naturally ignore the times they missed it, because we all have that cognitive bias. This will be true even though really good tests for high incidence conditions have a lot of false negatives. (Which means we all miss it a lot.)

"We're so right and everyone else is wrong!"

Testing psychologists will come to think schools "miss" dyslexia a lot more than they actually do. This is because psychologists often test the false negatives the school missed. This bias towards crediting ourselves for when we got it right and blaming schools when they got it wrong leads to squabbling and bad feelings. We'll each distrust the other, even though the problem is a statistical one and not a problem of accuracy.
More to the point: when psychologists see their testing as more important and accurate than it really is, they'll insist on doing testing when it's not really necessary. Such as during a global pandemic.
Psychologists also forget their evaluations should focus on trying to find the 20% of kids with reading problems who do not have dyslexia. Many psychologists forget they should spend most of their eval trying to rule in or out other potential causes of reading problems. For example, ID, language probs, or rare visual problems. Or lack of educational opportunities. Or social-emotional causes. Or rare medical disorders. Instead, they'll err towards giving a high number of reading-related tests, trying to confirm dyslexia. This is redundant, since, well, we already know it's probably dyslexia. It's also tough on kids who have to take all those reading tests, when reading is something that's hard for them.
Evaluations get focused on confirming dyslexia rather than on figuring out what's going on when it's not dyslexia. This means kids can end up with an eval that says "It's not dyslexia" because the GORT and CTOPP are fine. But, the family is not told what could be causing the reading problem, nor what might help. Nor are they told the likelihood of this being a false negative, or when they should consider a re-eval.
Testing companies focus their efforts on creating better tests to find the disorder (which we're already pretty good at finding based on simple history!). This is unhelpful. Testing companies should make tests to explore the alternate explanations for reading problems. We're actually pretty good at finding the kids with dyslexia and helping them. We need to better find the kids who have something else going on, and figure out how to help them.

I appreciate your patience for that little diversion. The reason that those things were worth thinking about is that many of these issues magnify when we test under low-fidelity conditions.

Let's look at why.

I want to make some sort of pun about the magnocellular theory of dyslexia but I can't think of one.

We've figured out that we can skip testing if we're OK recommending treatment when we're 80% confident it's dyslexia. But let's say we want to be 90% confident before recommending treatment. And let's assume for the moment that testing is free for families and the time or effort involved is not a burden. In this hypothetical situation, since we're only 85% sure it's dyslexia from the history, should we test to increase our confidence to 90%?

This is where data point #2 (how good your testing is) comes into play.

We should do testing in this situation if our tests have good positive and good negative predictive validity. That is, we should test if we're confident our evaluation will do a pretty good job of finding dyslexia if it's present. And an amazing job of finding an alternate explanation for the reading problems if it's not dyslexia.

If we go ahead, we're going to need a high-fidelity testing environment and a comprehensive test battery. In fact, there are two things the statistics would tell you that you don't want to do, especially in a crisis:

Shorten our test battery, or
Mess with the fidelity of testing.

Let's look at the first point. We definitely do not want to focus only on trying to "find" the dyslexia in our eval. That is, we want to avoid only giving reading tests. We especially want to avoid convincing ourselves this is a "starting point" in a time of crisis, and that "something is better than nothing."

Because it's already probably dyslexia, this well-meaning strategy ironically increases our likelihood of false negatives. You know, those cases where the CTOPP scores are OK... but that's only because the test isn't sensitive enough to pick up the child's subtle phonological problems.

I'm going to throw some statistics at you here. You can plot these out yourself at calculators like this one if you don't trust my math.

Let's look at what happens under even "good" testing conditions. In this "good" condition, let's assume our tests have sensitivities and specificities of roughly .8. In this scenario, if our test is "positive" for dyslexia, we end up with a posterior probability of 96%. This "translates" to essentially every kid who tests positive being a true positive. Good job us!

But if our test is "negative" for dyslexia, we end up with a posterior probability of 59%. This "translates" to roughly 1 out of every 1.7 negative tests being a false negative. So about 3 out of every 5 times we get a negative test result, the child actually does have dyslexia and we "missed" it.
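If you'd rather check those two posteriors in code than in an online calculator, here is a minimal sketch. The function name is mine. The inputs are the ones used above: an 85% prior and a sensitivity and specificity of .8.

```python
def posterior(prior, sensitivity, specificity, test_positive):
    """Probability the child has the condition, given the test result."""
    if test_positive:
        with_condition = prior * sensitivity                  # true positives
        without_condition = (1 - prior) * (1 - specificity)   # false positives
    else:
        with_condition = prior * (1 - sensitivity)            # false negatives
        without_condition = (1 - prior) * specificity         # true negatives
    return with_condition / (with_condition + without_condition)


after_positive = posterior(0.85, 0.8, 0.8, test_positive=True)
after_negative = posterior(0.85, 0.8, 0.8, test_positive=False)

print(f"{after_positive:.2f}")  # 0.96
print(f"{after_negative:.2f}")  # 0.59
```

The second number is the uncomfortable one. Even after a "negative" result, dyslexia is still the more likely explanation.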

These false negative stats get worse if we lower the specificity of evaluation. For example, if we don't rule out other possible reasons for reading problems. Narrowing the evaluation in this case increases our likelihood of missing the problem.
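To see the direction of that effect, here is a rough sketch that holds the prior at 85% and sensitivity at .8, then lowers specificity step by step. The specificity values other than .8 are illustrative assumptions, not estimates for any real test battery.

```python
def p_condition_after_negative(prior, sensitivity, specificity):
    """Probability the child still has the condition after a negative test."""
    missed = prior * (1 - sensitivity)             # has it, test said no
    correctly_cleared = (1 - prior) * specificity  # doesn't have it, test said no
    return missed / (missed + correctly_cleared)


# Illustrative specificities, from a thorough eval down to a narrow one.
specificities = (0.9, 0.8, 0.7, 0.6)
results = [p_condition_after_negative(0.85, 0.8, s) for s in specificities]

for spec, p in zip(specificities, results):
    print(f"specificity {spec:.1f}: {p:.2f} chance a negative result is a miss")
```

Under these assumptions the chance that a negative result is a miss climbs from about 56% to about 65% as specificity drops from .9 to .6.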

What's the bottom line here?

Especially in situations where testing is difficult to conduct, we can better serve children by saying "It's probably dyslexia." We can then recommend our standard recs, including to "get a comprehensive eval in a few months if treatment doesn't help". The alternative is creating an unacceptably high number of false negatives that then later have to be undone. The alternative is having those children lose opportunities for intervention.

The alternative is the very thing we wanted to avoid doing.

The other thing we would not want to do is give tests under non-valid conditions. Or what I called "messing with the fidelity."

That is, we don't want to test in conditions where we can reasonably assume the test results will be less accurate. For example, we already know reading online reduces comprehension relative to print reading. Here's another paper showing this, which was somehow published in the future of July 2020 (when presumably everything is great again). This research means tests of reading comprehension are going to be hard to administer validly via telehealth.

Tests of phonological processing are also acoustically-sensitive tests. So we can surmise that online administration will affect them. [Though maybe that's just me. I have pretty crap phonological processing and already have enough trouble telling if the child said wulanuwup or wuvanuwup in the quiet of my office... I can't imagine how I'd be accurate via telehealth.]

Giving tests in a way that decreases their accuracy is going to, well, decrease our accuracy.

Specific to reading comprehension and phonological processing, we'd suspect increased false positives. That's because the factors above -- not to mention the general level of distress most people are feeling right now -- will depress scores. In turn, this will result in over-identification.

The false negatives will still outweigh the false positives given that the base rate is so high, even if our tests are wildly inaccurate. But this does mean we're more likely to incorrectly say it's dyslexia in those tricky cases where the child has trouble reading but it's not actually due to dyslexia. This is probably not a huge deal in the grand scheme of things, given that the treatment for dyslexia is low risk.
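Here is a quick sketch of why the base rate drives this. It counts expected errors among 100 struggling readers with an 85% prior. The .6 accuracy figure is my stand-in for "wildly inaccurate" and is an assumption, not a measured value for telehealth testing.

```python
def error_counts(n_children, prior, sensitivity, specificity):
    """Return expected (false_negatives, false_positives) among n_children."""
    with_condition = n_children * prior
    without_condition = n_children - with_condition
    false_negatives = with_condition * (1 - sensitivity)
    false_positives = without_condition * (1 - specificity)
    return false_negatives, false_positives


# 0.8 is the "good" condition from the post; 0.6 is a hypothetical poor one.
for accuracy in (0.8, 0.6):
    fn, fp = error_counts(100, 0.85, accuracy, accuracy)
    print(f"sens = spec = {accuracy}: ~{fn:.0f} false negatives, ~{fp:.0f} false positives")
```

With equal sensitivity and specificity, misses outnumber false alarms by the prior odds (85 to 15) no matter how inaccurate the test gets. That works out to about 17 versus 3 at .8, and about 34 versus 6 at .6.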

But it's definitely not the best use of anyone's time.

(There's also a small but real risk that doing testing like this will further convince certain folks that comprehensive, in-office testing is "useless", since we showed we were "able" to give abbreviated online testing.)

"I gave my kid the CTOPP at home... can you analyze these and just tell me yes or no by 6PM?"

For all the above reasons, we can't reasonably recommend dyslexia testing at this time.

It would be much kinder to "consult" with families. Skip testing altogether. If the risk factors are there, recommend whatever you'd suggest if you were sure it was dyslexia.

If later you need test results for a specific reason, like the IEP process, do that testing under more accurate conditions. Then, you'll even have the possible advantage of some data on whether intervention helped the child.

We also can't advise testing for very low incidence disorders like emerging psychosis or bipolar. You can think through why this is for yourself, or do the actual math. But the bottom line is you risk massive numbers of false positives.
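If you want to do the actual math, here is a minimal sketch. The 1% prior is a hypothetical stand-in for "very low incidence," and the .8 sensitivity and specificity are borrowed from the dyslexia example. Neither is an estimate for psychosis or bipolar specifically.

```python
def positive_predictive_value(prior, sensitivity, specificity):
    """Probability the child has the condition, given a positive test."""
    true_positives = prior * sensitivity
    false_positives = (1 - prior) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


hypothetical_prior = 0.01  # assumed base rate, for illustration only
ppv = positive_predictive_value(hypothetical_prior, 0.8, 0.8)

print(f"{ppv:.1%}")  # 3.9%
```

Under those assumptions, roughly 96 out of every 100 positive results would be false positives.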

Right now, you could make a case for testing of medium incidence, multifactorial disorders. Or those where the costs associated with treatment are quite high. For example, testing for ADHD and ASD.

However, you'd have to be convinced you can conduct a valid evaluation during these conditions. You'd also have to accept certain limits to your accuracy. You'd probably only want to consider this option if you considered it primarily a screening. And if the child was in crisis, and you needed to make an immediate clinical decision. But you could view this kind of non-ideal testing as a necessary screener to help you determine whether a child:

Should get primary intervention even if that intervention is expensive or unpalatable
Should get secondary intervention and be triaged for immediate comprehensive eval once things go back to normal, or
Should go into what Youngstrom and colleagues call the "wait-to-treat + primary prevention zone."

Personally, I'm not confident in my ability to be valid or accurate enough for this kind of testing right now. I view evaluations as an interactive and iterative process. One where the test results and my behavior and process observations help me generate and evaluate hypotheses. I can't figure out how to test online in a way that doesn't jeopardize the validity of both my test results and my behavior and process observations.

However, that's my opinion, rather than a conclusion based on statistics.

Other psychologists may make different decisions. If you disagree with me, as long as you can justify your decisions based on something other than your intuition, that's fantastic. Go forth and help children with your telehealth evaluations. Be careful about it and do good work.

But if you're making the same decision as I am and suspending your practice, rest assured that you are making an empirically-informed decision that is also actually helping children.

And maybe take this time to refresh your memory about Bayesian statistics.

This is the only "statistics" photo Wix had

Note: This post is written at the 9th grade level. There are 211 sentences, 48 of which are hard to read and 16 of which are very hard to read.

Second Note: I should mention that I only used "best case scenario" numbers above, where you're only testing kids who are struggling with reading. If you are talking about dyslexia in the "average random kid" who you assess and you don't necessarily expect dyslexia, your prior probability will be much lower (as low as 5% and maybe as high as 30%?). In that case doing the testing, especially when your sensitivity and specificity are likely to be compromised, will be roughly as accurate as a coin flip under the best of circumstances.
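One way to read the coin-flip remark is to ask how often a positive result would be right at those lower priors. This is a rough sketch. The .7 sensitivity and specificity are my assumption for "compromised" testing, not a measured figure.

```python
def positive_predictive_value(prior, sensitivity, specificity):
    """Probability the child has the condition, given a positive test."""
    true_positives = prior * sensitivity
    false_positives = (1 - prior) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


# Priors from the note above; 0.7 is a hypothetical "compromised" accuracy.
table = {}
for prior in (0.05, 0.30):
    for accuracy in (0.8, 0.7):
        table[(prior, accuracy)] = positive_predictive_value(prior, accuracy, accuracy)
        print(f"prior {prior:.2f}, sens = spec = {accuracy}: "
              f"positive result is right {table[(prior, accuracy)]:.0%} of the time")
```

At a 30% prior with accuracy at .7, a positive result is right about half the time. At a 5% prior it is right far less often than that.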