Reading notes on ‘Evaluating the Quality of Intelligence Analysis: By What (Mis) Measure?’ (Stephen Marrin, 2012)

UPDATE 2017-03-22: related reading: “How good is your batting average?” Early IC Efforts To Assess the Accuracy of Estimates by Jim Marchio, in Studies in Intelligence Volume 60, Number 4 (December 2016).

These are my reading notes concerning ‘Evaluating the Quality of Intelligence Analysis: By What (Mis) Measure?’ (Stephen Marrin, 2012).

Marrin provides an exposition of the problem of evaluating the quality of intelligence analysis. Marrin first discusses three measures that are employed for retrospective evaluation of the quality of intelligence analysis:

  • Accuracy;
  • Preventing surprise;
  • Influence on policy.

According to Marrin, all three measures are problematic, and given the inevitability of intelligence failures, evaluation of intelligence quality should not rely on absolute measures (e.g. black/white: something is either accurate or inaccurate) but should be oriented toward relative measures and ‘improving on the margins’.

Below are snippets from various sections of Marrin’s paper that I keep here for my own purposes. Emphasis is mine.

Accuracy


“One way to evaluate intelligence analysis is according to an accuracy standard. (…) However, while using accuracy as an evaluative criterion is simple in theory, actually comparing the analysis to ground truth and determining whether the analysis was accurate or inaccurate can be very difficult to implement in practice.

First, there is the presence of qualifiers in the analysis. Uncertainty is part of the intelligence production process. (…) When precise information is desired, such as the condition of a country’s WMD program, a CIA analyst cobbles together bits and pieces of information to form a picture or story and frequently discovers many gaps in the data. As a result, an intelligence analyst’s judgment frequently rests on a rickety foundation of assumptions, inferences and educated guesses.

Caveats and qualifiers are necessary in finished intelligence as a way to communicate analytic uncertainty. Intelligence agencies would be performing a disservice to policymakers if their judgments communicated greater certainty than the analysts possessed.


Words such as ‘probably’, ‘likely’ and ‘may’ are scattered throughout intelligence publications and prevent easy assessment of accuracy.


Removing caveats for sake of simplicity in assessing intelligence accuracy also unfairly removes the record of analytic uncertainty and, in the end, assesses something with which the analyst never would have agreed. For example, if an analyst says that a coup is likely to occur in a foreign country within six months, and the coup happened 12 months later, would that analysis be accurate or inaccurate?


In addition, the analytic judgment could not be considered completely accurate nor completely inaccurate; it is somewhere in between. It is for this reason that the then-Director of Central Intelligence George Tenet said: ‘In the intelligence business, you are almost never completely wrong or completely right’.


In addition, even if accurate analysis was produced, a ‘self-negating prophecy’ resulting from analysis produced within a decision cycle could occur. This means that intelligence analysis can help change what may happen in the future, making the analysis inaccurate. (…) This causal dynamic exists for all intelligence issues including political, economic, and scientific due to the nature of the intelligence mission.”

Preventing surprise

“Like accuracy, another absolute standard for evaluating analytic quality involves the prevention of decision-maker surprise. By describing, explaining, evaluating, and forecasting the external environment, intelligence analysts facilitate decision-maker understanding to the point that decision-makers are not surprised by the events that take place. When decision-makers are surprised, by definition there must have been an intelligence failure since it failed to achieve its objective; preventing surprise.

The problem with this expectation, of course, is that surprise is ever present in international relations. Many surprises are the intentional result of adversaries who employ secrecy to hide their intentions. Secrecy in policy creation and implementation magnifies the effectiveness of power application internationally because, when done successfully, the intended target has little or no time to effectively counter the respective policy. (…)

(…) According to Christopher Andrew: ‘Good intelligence diminishes surprise, but even the best cannot possibly prevent it altogether. Human behavior is not, and probably never will be, fully predictable’. Richard Betts, in his article on the inevitability of intelligence failure, suggests that policies should be implemented in such a way as to be able to withstand the inevitability of surprise, with ‘tolerance for disaster’. (…)”

Betts and Shifting the Standard from Absolute to Relative

“Richard Betts has made the greatest contributions to shifting the evaluative metric from the unattainable ideal of accuracy to something more realistic with his argument that intelligence failures, consisting of either inaccuracy or surprise, are inevitable. Betts’ argument is a sophisticated one which acknowledges that (1) the analytic task is really difficult; and (2) anything done to ‘fix’ or reform perceived problems will lead to other problems. (…)

Betts is primarily responding to earlier efforts to eliminate failure by identifying causes of inaccuracy or surprise and then trying to eliminate them one by one. Many causes of failure have been identified, including an individual analyst’s cognitive limitations, and as a result ‘analysis is subject to many pitfalls – biases, stereotypes, mirror-imaging, simplistic thinking, confusion between cause and effect, bureaucratic politics, group-think, and a host of other human failings’, according to Ron Garst and Max Gross. Many efforts to identify causes of failure then proceed to produce recommendations for ways to eliminate them.

Betts, on the other hand, does not believe failure can be eliminated. According to Betts, failure results from paradoxes of perception that include the impossibility of perfect warning, and the distortion of analysis due to motivated biases resulting from organizational or operational goals. Another source of analytic inaccuracy, according to Betts, is a byproduct of the inherent ambiguity of information, which is related to the limitations of intelligence analysis due to the inductive fallacy which Klaus Knorr highlighted in 1964.

(…) Betts’ conclusion that failure will be inevitable has become the consensus among intelligence scholars. (…)

(…) Yet at the same time Betts says that failure can become less frequent on the margins and also has recommendations for how policymakers can make the inevitable failures less costly or significant.”

The Batting Average Metaphor

“Metaphors from baseball are frequently employed by scholars to frame the evaluation of intelligence performance precisely because many useful inferences can be derived from them. For example, the difference between the fielding percentage, where most anything less than perfection is an error, and a batting average, which provides more room for error without condemnation, highlights the importance of the standard used to evaluate relative performance. In addition, the use of the batting average metric also makes it clear that it is relative success versus an opposing force in the context of a competition where the fates of the batters will, as Betts says, ‘depend heavily on the quality of the pitching they face’. The fact that relative success or failure is contingent on the skill of the opposition has clear parallels in the world of intelligence.”

Using Decision-makers’ Evaluative Framework

“(…) Intelligence analysis is regularly ignored by decision-makers, and frequently has limited to no impact on the decisions they make.

As a result, a new kind of theory or model more effectively explaining what happens at the intersection of intelligence analysis and decision-making has been developed. It conceptualizes the purpose of intelligence as to ensure that decision-makers’ power is used as effectively and efficiently as possible, with the purpose of intelligence analysis being to integrate and assess information as a delegated, subordinate, and duplicative step in a decision-making process. This conceptualization privileges the role of the decision-maker in the assessment process over that of the analyst, thus turning the standard model on its head.

(…) Unfortunately, this emphasis on the significance of the decision-maker in evaluating the analytic product has not been universally embraced by scholars, practitioners, or the general public. Instead, more simplistic measures such as accuracy or surprise tend to predominate the discussion of intelligence performance as a way of characterizing failures. Yet evaluating intelligence analysis using the decision-makers’ perspective could be important since, as Kuhns suggests, the decision-maker is ‘the only person whose opinion really matters’. If decision-makers find the analysis informative, insightful, relevant or useful, then the intelligence analysis has succeeded whereas if the decision-makers are left unsatisfied then the analysis has failed.

Intelligence analysis can be evaluated based on the decision-maker’s perception of its relevance, influence, utility or impact. First, there is intelligence analysis that is relevant to decision-makers. (…) Second, there is intelligence analysis that is influential in terms of shaping or influencing the decision-maker’s judgment on a particular issue. (…) Third, there is intelligence analysis that is useful – which also has to be relevant by definition – and could indicate analysis that is either useful in the sense of improving judgment (i.e. influential) or useful in the sense of achieving policy outcomes, or both.

Asking for feedback from decision-makers may be a way to evaluate the analysis’ relevance, influence on judgment or utility, but doing so can be fraught with peril. Policymaker satisfaction with intelligence analysis is a notoriously fickle and idiosyncratic metric. Decision-makers may not be satisfied with intelligence analysis if it conflicts with their own biases, assumptions, policy preferences, or conveys information that indicates a policy may be failing. (…)

(…) Unfortunately, as Ford goes on to say, reason is not the only factor that drives policymaking. Instead, ‘all kinds of forces go into their making of policy, not excluding timidity, ambition, hubris, misunderstandings, budgetary ploys, and regard for how this or that policy will play in Peoria’.”

Clarifying Purpose and Improving on the Margins

“In the end, there is no single metric or standard used to evaluate intelligence analysis, and different people use different standards. This highlights an even more significant issue: the reason that different standards are used is because there is no consensus in either the practitioner’s or scholar’s camp regarding the purpose of intelligence analysis. Some believe that the purpose of intelligence analysis is to be accurate; others believe the purpose is to prevent surprise; while yet others believe the purpose is to be influential or useful. If the intelligence analysis does not meet any of these criteria, then failure is the descriptor that is frequently used.

But the fact that different kinds of failures really represent different normative visions of what intelligence analysis is supposed to accomplish is not acknowledged by most participants. Despite the fact that intelligence analysis has existed as a function of government for decades, both practitioners and scholars have failed to develop a consensus on or even acknowledge differences of opinion regarding exactly what it is supposed to do.

If the failure is determined to be inaccuracy, is the implicit expectation perfection? If the failure is one of surprise, is the implicit expectation omniscience? If the failure is one of lack of influence, to what degree is that more of a policy failure than an intelligence failure? These are questions that both scholars and practitioners should make explicit when they discuss the quality of intelligence analysis and the causes of intelligence failure.

Perhaps the goal of policy should be trying to improve intelligence analysis across the board by improving accuracy, preventing surprise and increasing the value of the product for the decision-maker. But even this will not eliminate failures altogether. As Betts has said, echoing Knorr before him, we should focus on improving performance on the margins – raising the batting average by 50 points, or raising the level of liquid in the glass – not achieving perfection or omniscience.


Finally, understanding and improving intelligence analysis may also require clarifying what we believe the purpose of intelligence analysis is, or what the purposes of intelligence analysis are, and how to best achieve them. Rather than focusing on and studying failure, perhaps trying to achieve success will do more to improve the quality of intelligence analysis than trying to eliminate failure.”

