Empirically-Based Assessment in the Wild

Updated: May 30, 2023

You've heard the phrase "testing is an art and a science." How do you balance the clinical art with the science of assessment? How do you ensure you are really doing empirically-based assessment?

Eric Youngstrom and his colleagues have been looking at this problem and developing a model they use in their clinic that is rigorous and scientifically supported. In 2014, they published their Clinical Guide to the Evidence-Based Assessment Approach to Diagnosis and Treatment in the journal Cognitive and Behavioral Practice. If I had my way, this would be required reading for psychologists who do assessment.

Their model is a scientifically guided and clinically meaningful 12-step process for applying the principles of evidence-based medicine to psychological diagnosis and treatment.

Their approach assumes that making diagnoses and initiating treatment should balance the competing interests of both: (1) ensuring that individuals get the help they need in a timely manner, and (2) being thorough and conservative in making diagnoses and treating problems, so that individuals are not being given treatment they do not need. They give the example of parents of a child with attention problems who want to be sure their child has ADHD before they decide to treat. These parents want to pursue treatment if it is necessary, but only if they can be reasonably sure the treatment is appropriate.

Balancing these competing interests involves finding the "threshold" at which proceeding to treatment makes sense. For example, these parents will likely feel comfortable trying treatment if we can determine that the likelihood of ADHD is 90%. As such, their empirically-based assessment model is focused on procedures that can determine whether the probability of ADHD is at or above the threshold to treat for that child.

Making this determination involves 8 of the 12 steps in their model (with the later steps in the model relating to monitoring progress, deciding when to stop treatment, and guarding against relapse).

Their first steps involve looking at base rates, both nationally and locally (e.g., in their region and in their own clinic). Using the example above, this information determines the pre-assessment likelihood of ADHD. They also use demographic information (e.g., family history, age, gender) to identify any immediate moderators that might raise or lower the "index of suspicion" that the child has ADHD. In their model, they use something called a probability nomogram to "combine the prior probability with likelihood ratios to determine the posterior probability of ADHD."

Which, you know, good for them. They helpfully provide the formulae and other information you would need to do this for yourself, but in my practice, that's beyond what I have the time or sophistication to do. This is where I start using their model for empirically-guided inspiration, rather than following it perfectly.
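For anyone curious about what the nomogram is doing, the arithmetic it stands in for is short: convert the prior probability to odds, multiply by each likelihood ratio, and convert back to a probability. Below is a minimal sketch in Python. The function names, the 20% base rate, the two likelihood ratios, and the 90% treatment threshold are all made-up illustrations, not figures from the Youngstrom paper, and chaining several likelihood ratios this way assumes the findings are roughly independent of one another.

```python
def probability_to_odds(p):
    """Convert a probability (strictly between 0 and 1) to odds."""
    if not 0 < p < 1:
        raise ValueError("probability must be strictly between 0 and 1")
    return p / (1 - p)


def odds_to_probability(odds):
    """Convert odds back to a probability."""
    return odds / (1 + odds)


def posterior_probability(prior, likelihood_ratios):
    """Update a prior probability with a sequence of likelihood ratios.

    Posterior odds = prior odds x LR1 x LR2 x ...
    Multiplying ratios together assumes the findings are independent.
    """
    odds = probability_to_odds(prior)
    for lr in likelihood_ratios:
        odds *= lr
    return odds_to_probability(odds)


# Hypothetical numbers, for illustration only.
prior = 0.20                    # assumed local base rate of ADHD
likelihood_ratios = [3.0, 2.5]  # assumed LRs for two positive findings
treat_threshold = 0.90          # assumed threshold at which to treat

posterior = posterior_probability(prior, likelihood_ratios)
print(f"Posterior probability: {posterior:.2f}")  # prints 0.65
print("At or above treatment threshold:", posterior >= treat_threshold)
```

With these invented numbers, two positive findings move the estimate from 20% to about 65%, which is still short of a 90% threshold, so more assessment would be needed before treating.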

Later steps in their model involve using the information you now have about base rates to prioritize assessment probabilities, guide test selection, and refine the assessment problem. These steps are great and the model is a marvel. However, it's also a bit cumbersome, theoretical, mathematical, and complex for my everyday use.

So, I use their model as a guide for my approach. Using their model as a very strong inspiration, I have developed an 8-step approach I use to try to perform Empirically-Based Assessment in the Wild (by which I mean, as a solo practitioner in a very busy practice).

Here are the 8 steps I follow:

  1. Consider demographic information, referral source, referral questions, and local base rates to get a sense of the likelihood of the various outcomes prior to assessment. For example: If the referral question relates to "auditory hallucinations" and the patient is a 7-year-old child, research on base rates shows us that the probabilities of no diagnosis, anxiety, and ADHD/behavioral disorder are all quite high, while the probability of childhood schizophrenia is extremely low. However, if the child is 17, base rates and demographic information raise the likelihood of psychosis considerably.

  2. Use intake interview to gather information about known risks and moderators that raise or lower the "index of suspicion" for each outcome. In this step, I am considering factors that research has shown could change how I interpret the test results. Examples of these kinds of factors include family history, medical health (e.g., sleep, exercise level, nutrition, pain, substance use, comorbid medical conditions, medical history), and past and current stressors. For example, if I learn the child has a sleep disorder, I will be much more cautious about making a diagnosis of ADHD in the presence of attention problems than I would be for a child without a sleep disorder.

  3. Use information from the intake, background questionnaire, observations, and other sources to consider additional contextual factors. These factors involve considering the "whole child" and her environment, rather than just test results. They are contextual factors because they provide context for interpreting the test results. These factors will not change how the test results themselves are interpreted, but will affect treatment planning. Examples of these kinds of factors include family functioning, family social support, the child's coping skills, his or her temperament and personality (and match with parent temperament), child-environment fit, and child strengths and interests.

  4. Design a test battery that includes comprehensive testing of all likely outcomes, and screens of less likely concerns. Generally, I set my threshold at about 10% -- that is, if I estimate the probability of a concern to be 10% or lower, I am only screening for that concern. For any concern where I estimate a higher likelihood, I conduct more thorough testing. Usually I have a list of about 1-3 highly probable outcomes, an additional 3-5 plausible outcomes, and another 3-5 unlikely outcomes. I design a test battery that will comprehensively test for the highly probable and plausible outcomes, and screen for the less likely outcomes.

  5. Add testing and other forms of assessment as needed to clarify diagnosis. My test battery is flexible. If I have planned comprehensive testing of a domain but the child is scoring well above expected levels in that domain, I do not exhaustively insist on additional testing in that area. Similarly, if the screening test suggested more assessment is needed in a domain, I add in testing to clarify. I will also add in other forms of assessment (e.g., school observation; additional collateral informants; functional behavior assessment) if needed to clarify the diagnoses.

  6. Interpret cross-informant and cross-domain data patterns. This step is the ultimate mixture of art and science. Here, I am trying to use my knowledge of the research and accumulated clinical wisdom to synthesize 5 sources of data (informant report, self-report, standardized rating scale scores, direct test scores, and behavior observations) while considering 5 things:

     (A) Comparison to Normative Sample and/or Functional Impairment: Is this child actually demonstrating a deficit or impairment in some area relative to his or her peers (e.g., "Is this normal? Or atypical?")?
     (B) Brain-Behavior Relationships: Does this child's pattern of test scores fit with what is known about the functioning of brain structures and systems in the context of development?
     (C) Processing Strengths and Weaknesses: Is this child demonstrating the specific processing strengths and weaknesses that research has shown are related to a specific disorder or condition?
     (D) Research on Cross-Informant Agreement: How consistent are the concerns across settings, keeping in mind the research on the relatively low agreement between raters, and even between behavioral observations and everyday impairment?
     (E) Research on Ecological Validity: How well do the test results match informant data, when considered in the context of the relatively low ecological validity of some of our tests (e.g., tests of executive functioning)?

  7. Review step 3 to finalize case formulation and generate treatment recommendations that consider the "whole child." Once I have a diagnostic formulation, I return to the information about contextual factors I obtained in step 3. In this step, I am thinking about how this information "colors" how I am going to explain the results to the family and what I am going to recommend. For example, I am going to make different recommendations and explain things in a slightly different way depending on things like the family's functioning, the parents' personalities, what resources and social supports are available to the family, whether the child attends public or private school, and what coping skills the child has.

  8. Present findings using therapeutic techniques, including seeking out and incorporating client preferences in the treatment plan. Families are not likely to follow through on recommendations unless they understand and agree with the diagnostic formulation and treatment plan. Even if medication is the best treatment option, if the family is dead-set against medication, it does not make sense to spend the entire feedback session talking about medication. Let's talk more about the recommendations they are likely to actually follow through on. Similarly, if the family does not agree with a specific diagnosis, let's find what we can agree on (e.g., "problems making friends") and focus the majority of the report and treatment planning on those agreed-upon problems.
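The triage rule in step 4 (screen anything estimated at 10% or lower, test everything else comprehensively) is simple enough to write down. Here is a minimal sketch in Python. The function name `plan_battery` and the probability estimates are hypothetical and exist only to illustrate the rule; in practice these estimates come from clinical judgment informed by base rates, not from a formula.

```python
def plan_battery(estimates, screen_threshold=0.10):
    """Split candidate outcomes into comprehensive testing vs. screening.

    estimates: dict mapping outcome name -> estimated probability (0 to 1).
    Outcomes at or below screen_threshold are screened only;
    outcomes above it get comprehensive testing.
    """
    plan = {"comprehensive": [], "screen": []}
    # Sort from most to least likely so the plan reads in priority order.
    ranked = sorted(estimates.items(), key=lambda item: item[1], reverse=True)
    for outcome, probability in ranked:
        if probability <= screen_threshold:
            plan["screen"].append(outcome)
        else:
            plan["comprehensive"].append(outcome)
    return plan


# Hypothetical pre-assessment estimates for one referral (illustration only).
estimates = {
    "ADHD": 0.45,
    "anxiety": 0.30,
    "learning disorder": 0.15,
    "autism spectrum disorder": 0.08,
    "psychosis": 0.01,
}

plan = plan_battery(estimates)
print("Comprehensive testing:", plan["comprehensive"])
print("Screen only:", plan["screen"])
```

The plan is a starting point rather than a commitment: as step 5 describes, a positive screen moves that concern into fuller testing, and a domain where the child is scoring well above expectations can be trimmed.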

So, that's my take on conducting "real-world" yet scientifically-guided evaluations. In other words, this is how I conduct "Empirically-based Assessment in the Wild."

1 Comment

I am so happy to hear someone else bring up base rates. The base rate fallacy, Bayes' theorem, and Bayesian statistics were drilled into us throughout grad school, but when I talk to other clinicians "in the real world," almost none of them were taught about their importance. In forensic work, people are fixated on specificity and sensitivity but don't consider the base rates of the populations upon which the specific measures were normed. I could pontificate for way too long, but I just wanted to say I appreciate this thoughtful post.