How Outcomes are Measured in Psychiatric Research


This article discusses the ways in which the improvement or worsening of people diagnosed with mental disorders is measured in research settings and clinical drug trials, and how symptom questionnaires are used to determine drug effectiveness.

What do researchers actually measure when they are trying to measure the effectiveness of drugs?

There are various ways that the effectiveness of psychiatric drugs is measured in clinical trials, but generally the approach is very different from how measurements are typically done in clinical studies of most other pharmaceutical drugs. Usually the effectiveness of a drug is measured by observing the biological reduction or elimination of a targeted disease or pathology, such as an infection or cancerous tumor. Other times, a drug may be aimed at influencing a theorized biological proxy or surrogate marker for a disease – such as drugs that aim to reduce cholesterol levels, because in a small percentage of high-risk individuals cholesterol may apparently contribute towards worsening heart disease.

However, there are no biological methods of any kind to detect if mental disorders are present or not present. (See ICI’s “Learn about How Mental Disorders are Diagnosed” for more information.) Therefore, it’s also impossible to biologically detect if people’s conditions are improving or worsening in any way. Consequently, in psychiatric drug studies and clinical trials, researchers sometimes note whether the trial participants taking the drugs report feeling or functioning better, and occasionally in longer-term studies they examine whether people’s work lives, overall health, or relationships improve. But most often in studies of psychiatric drug effectiveness, researchers simply try to determine if people’s “symptoms” – or problematic experiences and behaviors – generally increase or decrease as a result of their taking the prescribed drugs for a period of a few weeks to a few months.

Very roughly and generally, if a drug seems to decrease symptoms in more than half of the clinical trial participants by more than placebo pills do, then the drug stands a good chance of being determined to be effective. So how is it determined whether a psychiatric drug is decreasing people’s symptoms or not?

What instruments do researchers use to measure improving or worsening of people’s conditions during psychiatric drug clinical trials?

In most clinical trials for psychiatric drugs, researchers administer questionnaires of about 5-30 questions that ask about patients’ feelings, thoughts, emotions, experiences and behaviors. These questionnaires are also sometimes called rating scales, symptom rating instruments, or outcome measurement tools. Much like mental health screening tests and diagnostic questionnaires, these clinical research questionnaires are designed to help figure out how strong the matches are between what patients are experiencing or doing and the experiences and behaviors that are listed as symptoms of particular mental disorders in the Diagnostic and Statistical Manual of Mental Disorders.

Common clinical research questionnaires include, for example, the Hamilton Rating Scale for Depression, the Hamilton Anxiety Rating Scale, the Clinician Administered PTSD Scale, the Positive and Negative Syndrome Scale, and the Young Mania Rating Scale. Usually clinicians complete these questionnaires while interviewing and observing patients, though sometimes the patients themselves, or parents or school teachers who know the patients, may complete them. The questions have multiple-choice answers and each answer receives a specific score, often from 0 to 4 points. The highest possible total scores on some common tests are 52, 60 or 210, and total scores are believed to indicate how severe a person’s mental disorder symptoms are.
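The scoring arithmetic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 14-item, 0-to-4 layout, the function name and the ratings themselves are hypothetical and are not taken from any real questionnaire or patient.

```python
from typing import Sequence

def total_score(ratings: Sequence[int], max_per_item: int = 4) -> int:
    """Sum the item ratings after checking each lies in 0..max_per_item."""
    for r in ratings:
        if not 0 <= r <= max_per_item:
            raise ValueError(f"rating {r} is outside 0..{max_per_item}")
    return sum(ratings)

# Hypothetical ratings for a 14-item scale (illustrative values only).
example_ratings = [2, 3, 1, 2, 0, 4, 2, 1, 3, 2, 1, 0, 2, 3]

print(total_score(example_ratings))   # total for this made-up interview: 26
print(len(example_ratings) * 4)       # highest possible total for 14 items: 56
```

A single total like this is what gets reported as the person’s symptom severity.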

The same questionnaires are administered at the beginning of the trial and at the end, and sometimes on several occasions during the course of a trial. The symptom scores at the end are compared with those at the beginning to determine if a patient has improved or worsened.
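As a minimal sketch of this before-and-after comparison, the Python below subtracts each end-of-trial total from the starting total and averages the result for two groups. The function names and every score are hypothetical, chosen only to show the arithmetic; they are not data from any actual trial.

```python
from statistics import mean

def improvement(baseline: int, endpoint: int) -> int:
    """Points of improvement: positive when the total score fell."""
    return baseline - endpoint

def mean_improvement(pairs):
    """Average improvement over (baseline, endpoint) score pairs."""
    return mean(improvement(b, e) for b, e in pairs)

# Hypothetical (baseline, endpoint) totals; not data from any real trial.
drug_group = [(26, 14), (30, 20), (22, 14)]      # improvements: 12, 10, 8
placebo_group = [(27, 18), (29, 21), (23, 16)]   # improvements: 9, 8, 7

drug_mean = mean_improvement(drug_group)         # 10
placebo_mean = mean_improvement(placebo_group)   # 8
print(drug_mean - placebo_mean)                  # drug-minus-placebo difference: 2
```

In this made-up example both groups improve, and the figure of interest is the gap between the two average improvements.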

Are the questions on clinical research questionnaires appropriate for reliably measuring people’s experiences, and why is that important?

Much like mental health screening tests, diagnostic questionnaires, and the Diagnostic and Statistical Manual of Mental Disorders itself (see ICI’s “How Mental Disorders are Diagnosed”), the questions on most clinical research questionnaires tend to be vague and widely open to personal interpretation. For example, the extensively used Hamilton Anxiety Rating Scale (HAM-A) includes questions about the degree to which a person shows “worries, anticipation of the worst…”, has “difficulty in concentration, poor memory,” or is exhibiting “fidgeting, restlessness or pacing…” The clinician observes and scores each of these experiences in the person as being not present, mild, moderate, severe or very severe.

The commonly used Young Mania Rating Scale rates people on issues such as how well-groomed or unkempt they are, and whether the clinician believes they are “animated” or have “excessive” energy.

These kinds of questions leave vast amounts of room for personal, ad-hoc assessments of the meanings of key words and ideas, including central issues relating to intensity, frequency, duration and context. For example:

  • How does one rate the exact degree – from “not present” or “mild” to “moderate”, “severe” or “very severe” – to which a person is experiencing “worry” or “anticipations of the worst”? (intensity of emotions)
  • How does one distinguish between “mild” amounts and “moderate” amounts of fidgeting and restlessness that are taking place? (frequency of feelings) 
  • How does one distinguish between “appropriate” social grooming, “minimally unkempt” grooming, “moderately disheveled” grooming and “disheveled” grooming? When is someone’s level of energy in social situations to be considered acceptably “animated” versus inappropriately “excessive”? (cultural context for behaviors)
  • For how many consecutive seconds, minutes or hours at a time must a person seem talkative in order to be scored as “talkative” versus “verbose at times” versus “difficult to interrupt”? (duration of behaviors)

The questionnaires do not include instructions on exactly how clinicians are expected to make these kinds of determinations, so a great deal is left to clinicians’ personal opinions.

And this wide leeway for personal opinion is extremely important. Even tiny nuances in how clinicians choose to interpret these answers have very significant ramifications for the clinical trial, and for our broader society’s beliefs about psychiatric drugs. For instance, when a clinician lowers a person’s score from “very severe” to “severe”, or from “moderate” to “mild” on just a few of the HAM-A’s 14 total questions, that creates a 3- or 4-point symptom-improvement score. And in most clinical trials, the measured difference in effectiveness between psychiatric drugs and placebo pills for reducing people’s symptoms is on average little more – and often less – than that amount. Nevertheless, those few points are usually enough for a drug to be approved by the U.S. Food and Drug Administration and for the drug company to be allowed to promote the drug to physicians and the public as “effective”. (For detailed examples of this actually occurring in pivotal clinical trials for major psychiatric drugs, read any of the articles or mini-booklets about psychiatric drugs in ICI’s Interventions section.)
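To make the size of that effect concrete, here is a small Python sketch of how a one-level shift on three items changes a 14-item total. It assumes the five answer levels are scored 0 through 4 in the order listed earlier; the ratings are hypothetical and the names `LEVELS`, `total`, `first_rating` and `second_rating` are illustrative only.

```python
# Answer levels in ascending order; list position is the assumed score (0..4).
LEVELS = ["not present", "mild", "moderate", "severe", "very severe"]

def total(ratings):
    """Convert each verbal rating to its 0..4 score and add them up."""
    return sum(LEVELS.index(r) for r in ratings)

# Hypothetical ratings on 14 items (illustrative values only).
first_rating = ["severe"] * 4 + ["moderate"] * 6 + ["mild"] * 4

# The same 14 items, rated one level lower on just three of them.
second_rating = list(first_rating)
second_rating[0] = "moderate"      # was "severe"
second_rating[4] = "mild"          # was "moderate"
second_rating[10] = "not present"  # was "mild"

print(total(first_rating))                          # 28
print(total(second_rating))                         # 25
print(total(first_rating) - total(second_rating))   # 3-point "improvement"
```

Nothing about the eleven other items changed, yet the total moved by an amount comparable to the drug-placebo differences described above.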

If clinical research questionnaires usually only focus on symptoms, what is being missed?

Another significant limitation of this whole approach to measuring drug effectiveness is that most of the questionnaires focus strictly on very limited sets of symptoms. So, for example, if researchers are testing a drug that is intended to alleviate depression and the questionnaires show that people’s feelings of depression decreased during the clinical trial, then the drug is considered to be effective for treating depression. But if the drug simultaneously increased many people’s feelings of anxiety during the trial and also caused a variety of adverse physical effects, that generally has no bearing on the evaluation of the drug’s effectiveness at specifically treating depression.

When considering whether or not a psychiatric drug’s risks outweigh its possible benefits, then, it’s critical to understand exactly how that particular drug’s effectiveness was measured. By way of example, as many as 50-70% of antidepressant users will reportedly experience negative effects on their libido and sexual function, and the drugs have well-documented abilities to increase suicidal and homicidal feelings, bone fractures, tremors, and other adverse effects. Yet it’s now widely recognized that, compared to placebo, antidepressant drugs on average reduce people’s feelings of depression by only about 2 points out of 50 or 62 total points on common depression rating scales. This is important information to have when weighing the possible risks of taking antidepressants versus the likely benefits.
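For a sense of proportion, the short Python sketch below expresses that 2-point difference as a share of each scale’s maximum total. The function name is hypothetical; the inputs are simply the figures quoted above.

```python
def share_of_scale(points: float, scale_max: float) -> float:
    """Return points as a percentage of the scale's maximum total."""
    return 100 * points / scale_max

for scale_max in (50, 62):
    pct = share_of_scale(2, scale_max)
    print(f"2 of {scale_max} points = {pct:.1f}% of the scale")
# 2 of 50 points = 4.0% of the scale
# 2 of 62 points = 3.2% of the scale
```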