Circulation 36,475 • Volume 16, No. 3 • Fall 2001

Special Article: How Many Medical Error Deaths Are There Really? Noted Expert Critiques Conflicting Study Reports

Lucian L. Leape, MD

When the Institute of Medicine (IOM) Report “To Err is Human” was released in November 1999 (1), the shocking figures that 44,000-98,000 patients die annually as the result of medical errors drowned out the primary message of the report. That message was that accidental injuries result from faulty systems, not from faulty doctors and nurses. As detailed in the IOM report, errors in health care, like errors in every other walk of life, are common. However, they can be prevented or mitigated by attention to how we design the processes and conditions of our work. Certainly no group of physicians understands that concept better than anesthesiologists. The application of this concept over the past 20 years has dramatically improved the safety of anesthesia and provided an excellent model for the rest of medicine.

Most physicians do not accept that concept. They are well-trained to believe that errors represent carelessness or failure to try hard enough. Many find it very difficult to believe that their errors might have been caused by factors beyond their immediate control. From this reference point, all the IOM numbers do is make us look bad. A corollary of this assessment is that if the numbers were really lower, we would not look so bad. One way to reduce the numbers is to show that many of the patients would have died anyway. This is what Hayward and Hofer set out to do. In their recent critique of the IOM report, they had multiple reviewers examine 111 deaths (2). They found that 6% of the patients would not have died if they had received optimal care, but that only 0.5% would have survived intact for 3 months. From this they conclude that previous studies overestimated the mortality from errors.

I believe there are three serious flaws in this study: its objective is ethically indefensible, the findings are irrelevant, and the methods are inappropriate. The fundamental proposition that since a patient is going to die anyway the error (or suboptimal care) is somehow less important or does not “count” is ethically indefensible. Hospitals are full of sick people, some of them deathly so — they need our best efforts. Certainly, patients expect our best efforts regardless of the prognosis, and to provide less, or “excuse” less, violates our ethical principles. In all fairness, Hayward and Hofer make that point in their discussion. However, they then say that “previous interpretations of medical error statistics are probably misleading” (i.e., overblown), and their main conclusion is explicitly that only 0.5% would have made it — i.e., would have “counted.” Not surprisingly, the press calculated new totals from their percentages and concluded that the number of deaths that “count” is not 98,000 but 5,000-15,000. Not exactly what Hayward and Hofer said, but the “message” of the paper, as so often in life, is not what is said but what is implied. The implication was clear, and the press heard it correctly.

The finding that only 0.5% would have lived three months is irrelevant for advancing safety, for the simple reason that the purpose of analyzing accidents is not to “bring back the dead,” but to prevent future injuries and deaths. Whether the patient lived three months, or three hours, or three years, the purpose of the analysis is the same. The goal is to understand the underlying causes of the errors and fix them if we can. This protects subsequent patients from the same mistakes. For this reason, most serious researchers in safety are interested not in counting deaths, injuries, or errors, but in understanding their causes. In addition, the finding that only 0.5% would have lived three months has no validity. (Surely no physician who takes care of patients believes it!)

Fatal Flaws?

Finally, even if the study were justified and the findings were of value, the methods by which the data were obtained and analyzed have serious, and I believe, “fatal,” flaws:

a) The sample was not representative: it was a convenience sample of elderly veterans from seven participating VA hospitals. Not a population from which to extrapolate to national estimates!

b) The sample was very small. The conclusions in this report are based on multiple reviews. (That was one point of the report: that with multiple reviews you get closer to the truth.) But reviews were only multiple for 62 patients (not 111 as the abstract indicates). Forty-nine patients had only one review, and 33 had two reviews (the number used in the Medical Practice Study (MPS)), so only 29 patients had more reviews than in previous studies. Therefore, conclusions were made about the universe of patients who die in hospitals based on a variable number of reviews of 62 nonrepresentative patients.

c) Actually, the major finding that only 0.5% would have lived three months in the absence of an error is based on the evaluation of only seven patients! The authors found that 6% of patients probably would have survived with optimal care. With a total sample of 111, that equals 7 patients. (The low confidence limits reflect the large number of reviews. Get enough judgments and it gets very low — but it is still based on just seven patients!)

d) The multiple reviews were not done randomly. Why would anyone design a study to assess inter-rater reliability in this way?

e) The statistical methods were inappropriate. I am not qualified to independently judge this, so I asked the two best statisticians I know at Harvard. They identified several maneuvers that they felt led to false (low) conclusions, including the use of back transformation by means of inverse log-odds transformation and lack of true randomization of reviewers. The immense variation (1 to 14) in reviews suggests it was an assignment of convenience. The use of inappropriate confidence intervals does not reflect the sampling variability of records and reviewers. I call it “tortured” statistics, which some have found offensive. At the very least, I think a fair reading is that there are serious methodological questions that should give anyone pause in extrapolating these results.
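
The arithmetic behind points b) and c) can be checked with a short sketch. The Python below is illustrative only and is not the method used by Hayward and Hofer. It takes the figures quoted above (111 deaths, a 6% rate, 33 and 29 patients with multiple reviews) and, as an assumption made here, treats the seven patients as a simple binomial count to show how wide a conventional 95% interval (Wilson score) is when it reflects only the number of patients. The function names are hypothetical.

```python
import math

# Figures quoted in the article.
TOTAL_DEATHS = 111        # deaths reviewed
PREVENTABLE_RATE = 0.06   # share judged preventable with optimal care
TWO_REVIEWS = 33          # patients with exactly two reviews
MORE_THAN_TWO = 29        # patients with more than two reviews


def implied_patients(total: int, rate: float) -> int:
    """Number of patients a percentage corresponds to, rounded to a whole patient."""
    return round(total * rate)


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    Assumption for illustration: each patient is one independent trial,
    ignoring the multiple-reviewer design of the actual study.
    """
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


patients = implied_patients(TOTAL_DEATHS, PREVENTABLE_RATE)   # 7
multi_reviewed = TWO_REVIEWS + MORE_THAN_TWO                  # 62
low, high = wilson_interval(patients, TOTAL_DEATHS)

print(f"6% of {TOTAL_DEATHS} deaths is about {patients} patients")
print(f"Patients with multiple reviews: {multi_reviewed}")
print(f"95% Wilson interval on {patients}/{TOTAL_DEATHS}: {low:.1%} to {high:.1%}")
```

Under that simplifying assumption the interval runs from roughly 3% to 12%, which is the sense in which the headline estimate rests on just seven patients, however many reviews each record received.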

Reviewer Bias?

One last methodological point: despite the investigators’ best, and, I believe, sincere efforts, reviewer bias looms large in this study. This is often the case in studies that require judgments on outcome of care. As many have pointed out, physician reviews are strongly affected by mindset, as well as hindsight bias. While reviewer bias was also present in the first MPS, it was undoubtedly more randomly distributed, both for and against attributing causation. When we undertook the MPS we had no idea what we would find. Although all reviewers brought their own personal biases into the process, they were not aware that there was a huge problem: 180,000 deaths from medical accidents, 98,000 of them preventable. They had no “axe to grind.” They had no reason to find for or against error as a cause of injury. The release of the MPS figures changed all that. My observation is that the vast majority of physicians today do not believe these numbers, do not want to believe them, and think that somebody is just trying to make them look bad. Unless Hayward and Hofer found a very exceptional group, it is reasonable to assume that a fair number of their reviewers had at least unconscious desires to show that things are not as bad as people say they are.

But the most interesting part of this paper is that even with all these methodological problems they came up with an estimate of the fraction of preventable deaths (before estimating how many would have lived three months) of 6%, which is remarkably similar to the 8% found in the MPS! You rarely get that kind of agreement in two studies! In fact, it was closer to the MPS than the findings from the population-based study done in Utah and Colorado, and was, thus, smack in the middle of the IOM estimate of 44,000-98,000 taken from those two studies! If that was all they had done, the newspaper headline for this paper should have been: “VA study confirms high estimate of deaths due to medical injury.”

The take-home message from their study is less about the numbers or the methods used to derive them than about our failure to change physicians’ mindsets away from individual culpability toward systems analysis. Virtually everyone who has seriously investigated medical accidents finds the numbers are higher, not lower, than the crude population estimates indicate. The personal medical experiences of our friends and loved ones bear this out. The focus on the numbers is a sad diversion from the work of safety. This is not where the future of safety lies. We do not make hospitals safer by counting bodies.

We have horrendous problems in our hospitals. It is past time to face up to that and do something about it. We do not do it by pretending that these problems do not exist or are not as bad as some people say. The challenge is to get doctors and administrators to recognize this and move beyond blame.

Dr. Leape is Adjunct Professor, Harvard School of Public Health, Boston, Massachusetts, and Member, Quality of Healthcare in America Committee, Institute of Medicine.


1. Kohn LT, Corrigan JM, Donaldson MS. To Err is Human: Building a Safer Health System. Washington, DC: National Academy Press, 1999.

2. Hayward RA, Hofer TP. Estimating hospital deaths due to medical errors: preventability is in the eye of the reviewer. JAMA 2001;286(4):415-20.