JOM Forum: Theory Testing Is Theory Generation

Mikko Ketokivi et al.

Journal of Operations Management · 2026 · https://doi.org/10.1002/joom.70039 · article

FT50 · UTD24 · AJG 4* · ABDC A*

Weight

0.50

What the paper says

In this paper, we propose that theory-testing research offers just as much potential for generating theory as theory-building and theory-elaborating research, the two variants typically associated with theory generation (Ketokivi and Choi 2014; Lee et al. 1999). Responding to Bendoly and Oliva's (2025) call to search for meaningful theoretical pathways for research contributions, we suggest that theory-testing research has always constituted a meaningful pathway to theoretical contributions when it extends beyond merely applying theory to challenging, expanding, and elaborating it. These extensions can lead to significant adjustments in bodies of knowledge over time as research programs progress. To understand the generative aspect of theory testing, we must distinguish it from theory application. When we apply theory, the objective is usually to address a practical problem, without any intention of contributing to an ongoing theoretical conversation. In empirical operations management (OM) research, the application of factory physics offers an illustrative example: Researchers apply concepts such as Little's Law and laws of variability to improve factory productivity (Schmenner and Swink 1998). In this context, theory consists of the relevant applicable laws that are treated as given, which makes theory effectively axiomatic from an epistemological point of view (Popper 1935/2005, 51). In stark contrast to theory application, the fundamental idea in theory-testing research is to place the theory itself under empirical scrutiny. Accordingly, theory is no longer treated as self-evident and certain but as propositional and conjectural, subject to revision (Lakatos 1970; Popper 1963).
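To make the "theory as given" stance concrete, Little's Law can be written out and applied in one step. The sketch below uses the factory-physics reading of the law (work in process, throughput, cycle time); the symbols and the numbers are illustrative assumptions and do not come from the paper.

```latex
% Little's Law in its factory-physics form (symbols are assumptions):
%   WIP = average work in process (units)
%   TH  = average throughput (units per hour)
%   CT  = average cycle time (hours)
\[
  \mathrm{WIP} = \mathrm{TH} \times \mathrm{CT}
\]
% Hypothetical application: a line with TH = 10 units/hour and CT = 4 hours.
\[
  \mathrm{WIP} = 10 \times 4 = 40 \ \text{units}
\]
```

In an application like this, the law itself is never questioned: the practitioner deduces a number and acts on it, and nothing flows back to the theory.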
As an example of theory-testing research, consider Williamson's (1971) question “Why do firms integrate vertically?” This question gave birth to transaction cost economics (TCE), one of the most influential and established research programs on organizational boundaries (Santos and Eisenhardt 2005). The theoretical essence of TCE is succinctly captured by the discriminating alignment hypothesis: “Transactions, which differ in their attributes, are aligned with governance structures, which differ in their costs and competencies, in a discriminating (mainly transaction cost economizing) way” (Williamson 1996, 46–47). Importantly, this statement is not meant as axiomatic but conjectural, as the word ‘hypothesis’ implies: Whether actual governance decisions align transactions and governance structures in a “mainly transaction cost economizing way” is to be settled empirically. Consider Walker and Weber's (1984) seminal TCE-based study that examined the make-or-buy decision in the final assembly of automobiles. TCE-as-conjecture becomes salient in the discussion section, where several of TCE's central propositions are called into question based on the empirical analysis. For example, the finding that “the effect of transaction costs on make-or-buy decisions was substantially overshadowed by comparative production costs” (Walker and Weber 1984, 387) is inconsistent with TCE's original central proposition that transactions will be aligned with governance structures in a “mainly transaction cost economizing” (Williamson 1996, 47, emphasis added) way. When the qualifier “mainly” is interpreted as conjectural and malleable, empirical research not only tests but also informs theory. Walker and Weber's (1984) findings suggest that while transaction costs are relevant, they constitute only a portion of total costs, which are decisive in make-or-buy decisions. Such findings, and many others, have expanded TCE's focus over time from transaction costs to total costs.
Another, more recent development is that instead of focusing solely on costs, researchers have incorporated the revenue side into the comparative analysis as well (Ketokivi and Mahoney 2020). More generally, reviews of empirical TCE literature (e.g., Macher and Richman 2008) demonstrate how TCE as a theory has developed significantly over time, mainly through the broadening of its scope. TCE illustrates a general and essential characteristic of theory-testing research: When theory is taken as conjectural, testing theory also generates theory through marginal adjustments. Such adjustments link individual theory-testing research efforts to a broader theoretical conversation and, consequently, enable the accumulation of theoretical knowledge and theory progress. We do not witness similar accumulation in knowledge communities where theories are merely applied.
Theory-testing research is often described as hypothetico-deductive (Mantere and Ketokivi 2013). We submit that the label “deductive” is accurate for theory application but inaccurate for theory testing; for the latter, the descriptively accurate term is hypothetico-abductive. In this section, we seek to establish this by comparing reasoning in theory testing versus theory application. To understand the role of abduction, we need to distinguish between two central reasoning tasks in theory-testing research: connecting theoretical and observational statements (the theorist's concern) and connecting observational statements with data (the statistician's concern) (Meehl 1990, 116). The statistician's concern is comparatively straightforward, and there is no difference between theory application and theory testing: The statistician's concern is addressed using the established tools of statistical inference, that is, a combination of deductive and inductive reasoning. Differences are found in how the researcher addresses the theorist's concern (Figure 1).
In theory application, the theorist's concern is methodologically comparatively simpler. When theory is merely applied, there is no feedback arrow from observational predictions to theory. Furthermore, if theory consists of empirically salient concepts, observational predictions can be deduced from the theoretical foundation (Schmenner and Swink 1998)—hence the term hypothetico-deductive. The case of theory testing is comparatively more complex, as adjustments to theoretical conjectures do not follow a deductive, computational logic (Mantere and Ketokivi 2013). Rather, adjustments are iterative steps of abductive inferences which adjust conjectures based on often surprising findings (Peirce 1877). As an example, let us revisit TCE's discriminating alignment hypothesis. Its central terms (e.g., transaction, governance structure, competence) are theoretical and must be translated from the language of theory into the language of empirical observation. Given that translation involves several possible, non-obvious interpretations (Quine 1951), the reasoning process cannot possibly be deductive. Similarly, since translation does not involve generalization of any kind, it cannot be inductive either. The only remaining form of reasoning is abduction, which is indeed the reasoning tool by which theory-testing researchers bridge the theoretical to the empirical. The abductive translation process is generative because it creates new meaning for theoretical concepts (Gadamer 1975). In their make-or-buy study, Walker and Weber (1984) translated TCE's general concept of uncertainty into volume uncertainty and further into unpredictable fluctuations in demand for components in automobile final assembly. This translation created specific and contextualized—in a word, new—meaning for the concept of uncertainty. The other complicating factor has to do with the feedback arrow to theory (Figure 1). 
Specifically, testing hypotheses is ultimately a means to the end of testing theoretical conjectures. Empirical evidence that is consistent with the hypothesis constitutes an instance of positive corroboration, whereas inconsistency means negative corroboration (Popper 1935/2005, 264–266). Both kinds not only inform theory but may also lead to adjustments and elaborations. The feedback arrow to theory makes the reasoning process in theory-testing research significantly more complex than in theory-application research because it involves the use of modus tollens. The use of modus tollens becomes particularly complex in the case of negative corroboration: What conclusions do we draw about theory if the evidence is inconsistent with a theoretical prediction? In his seminal contribution to the literature on theory testing, Lakatos (1970, 133) maintained that in the case of negative corroboration, we are not permitted to direct the modus tollens to the “hard core” of the theory but to its “protective belt” (i.e., measurement issues, data quality, contextual issues, and other problems or oversights that might have given rise to the failed prediction). This is particularly relevant when the theory under scrutiny has amassed a high degree of positive corroboration from past research, or, as Meehl (1990, 108) put it, has “money in the bank.” To suggest that all this money would be forfeited based on just one instance of negative corroboration is both unreasonable and methodologically dubious: There are no defensible methodological principles that permit us to immediately direct the modus tollens to the hard core of the theory. Reasoning about corroboration is an abductive process.
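The logical point about directing the modus tollens can be sketched formally. The notation below is an assumption introduced here for illustration and is not the authors': $T$ stands for the hard core of the theory, $A$ for the conjunction of protective-belt assumptions (measurement, data quality, context), and $O$ for the observational prediction.

```latex
\begin{align*}
&(T \land A) \rightarrow O
  && \text{hard core plus auxiliary assumptions entail the prediction} \\
&\lnot O
  && \text{negative corroboration: the prediction fails} \\
&\therefore\ \lnot (T \land A) \;\equiv\; \lnot T \lor \lnot A
  && \text{modus tollens, then De Morgan's law}
\end{align*}
```

Deduction alone delivers only the disjunction $\lnot T \lor \lnot A$; it cannot say which disjunct is responsible. On this reading, deciding whether to revise the protective belt or, eventually, the hard core is the step that requires abductive judgment.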
The specific form of abduction used in back-translating the empirical to the theoretical differs from the abduction used in translating the theoretical to the empirical; consistent with Bendoly and Oliva's (2025, 7) terminology, we label these “abduction a posteriori” and “abduction a priori,” respectively. Understanding how theory testing is theory generation hinges specifically on understanding these two variants of abduction. The connection from abduction to theory generation stems from the fact that abduction is the only form of reasoning that allows the introduction of new ideas in the conclusion of a reasoning process (Locke et al. 2008). Bendoly and Oliva's (2025, 7) observation that abduction is a form of sensemaking offers a useful starting point for establishing that theory testing generates theory. Because both the practices and the objectives of our sensemaking are diverse (Weick 1995), so are the forms of abduction: some forms are selective, others creative; some are theoretical, others empirical; some are explanatory, others non-explanatory; some incorporate only observables while others include unobservables; and so on.
Given that there are literally dozens of variants of abduction (Hoffmann 2011; one must be about the specific form In the we the use of abduction in the two of theory-testing of the theoretical conjectures in Walker and Weber (1984) addressed the role of a has in a the of a (Walker and Weber 1984, where the hypothesis to the that we hypotheses from theory, Walker and Weber that the hypothesis was and not from does not include in a as a factor in production the hypothesis incorporated not only TCE's theoretical logic but also contextual Ketokivi and is well that in the automobile the final in a informs the make-or-buy To be there are other in which the final has no relevant in a which effectively the make-or-buy decision into a while the hypothesis about was consistent with it was by contextual as The reasoning used was not but abduction, described by emphasis added) as a given theory core to new application Walker and Weber TCE to the case of associated with the assembly of theory testing is the a and of the theoretical conjectures. the a the a involves a because one from one language to Both in the case of positive and negative corroboration, this involves an where the conclusion is theoretical, the reasoning an to an This is no longer abduction but to the of abduction 2008). The word a and must be examined for positive and negative we we to In case the empirical are consistent with the theory the word the researcher not to the a theory without as this can lead to For example, and point that many of the empirical as for TCE can be to other theories as we not we to When empirical are inconsistent with theory the word the researcher to a of the we to the application of modus Given that the empirical evidence the has failed are we to are the theoretical of the the theory and if in does it have to be Understanding the use of abduction both a and a the generative potential of theory-testing We the of a theory with the of its empirical application. 
For example, TCE is often described as a theory of the make-or-buy the make-or-buy decision is merely one of TCE's empirical Macher and TCE's empirical to have significantly over In contrast with that on (Williamson TCE-based ultimately to any that can be as a structure, organizational structure, organizational and and Richman More generally, theory-testing research can be generative when it a a for The other aspect of to the a TCE is merely one example of empirical testing not only to but also to significant theoretical adjustments and elaborations. The to total costs and to not only costs but also the and revenue side of transactions has significantly by empirical these have based on empirical we submit that the essence of the contributions is indeed As theoretical contributions are not about generating new theory from about This is the by which theory-testing research is Empirical analysis application and theory testing both an role in empirical In this paper, we have to establish that the has always generative potential to our theoretical To this we must our abductive reasoning practices both in the a and a of in reasoning this involves that theories to new empirical on the one and to theory based on negative corroboration on the This the process by which abductive sensemaking the of theoretical Bendoly and an pathway to theory. The As in the original discussion theory testing is generative because it involves abduction. Researchers must theoretical ideas into a that is and often ideas have to from the evidence to theory, which for they have be because the evidence can often be in more than one and a core it has More such point to problems with or scope. 
time, these kinds of adjustments a theory can and its this in theory from new theories and more from through theory testing (e.g., and This is a that for the of the and, for in a that a theoretical contribution as a for a question by general principles about how to contributions to theory how do put into What can to these principles in To this of methodological research to a theory contributions and Importantly, demonstrate the practical and of this do this by with the illustrative case of TCE theory as Specifically, how et al. which the of of the meaningful theory contributions by of the steps some of Researchers must by where a theory is or et al. by that TCE has but the of the suggest there is much to their study the of TCE's central the need to evidence on TCE's core and to and this in the addresses the need to use evidence to theoretical on evidence than empirical et al. a of to TCE's This findings into than The is that this addresses the that theory testing must be and because often are Researchers must a theory from where it The et al. study core TCE predictions discriminating from that uncertainty from TCE theoretical This researchers where a theory or with emphasis on theoretical development incorporate instead of as et al. theory to firms under volume uncertainty and theory to merely lead to more These theories not they used to effect Importantly, this addresses theory generation by how theories et al. Researchers use effect to the of theoretical et al. that TCE predictions are often their are (e.g., to These a of TCE's and the conclusion that transaction costs only of This theory from an logic the and of et al. is essential for theory testing and et al. that are a of specific and that are when are both specific and than when they are merely interpreted this as evidence that empirical often and theoretical boundaries between TCE and This is a example of theory through and both TCE and or findings be treated as for theory et al. 
that the TCE's that transaction costs, and governance such as governance may are than the these as for theory development than of empirical this with the that are the of theory not to theory section theoretical et al. by specific when versus how TCE with other theories under and how governance transaction costs. this in the addresses from theory testing to theory In the to the general epistemological principles into a it is essential to into this and the a study will and theory in and other The When was by his physics from the of the when with a In the a with propose a theory, a study to the theory, the study to and the to or the theory, to the to the language that a without terms or abduction. In their Ketokivi and focus mainly on the and in that is, a study to a theory, which they label as “abduction a priori,” and the to the theory, which they label as “abduction a their new are when terms and are but their to these two since both and their are more and than they these Ketokivi and a finding are the if the is is, how be and how their be may be useful to with some positive analysis by and and researchers in their of and their it is usually to a to it. Such may is, of the and problems we do not well as which of are well or This by for between theory and study as the does with Walker and Weber and for between and theoretical interpretations of practical from such may researchers to understand the their and The of analysis of are no only In a of and there is no study for or to observation can only be by more (e.g., in which case the may be by and of it. 
under or to may or study may be by (e.g., or or by on data or to for their the and are in the theory to the example of “mainly transaction cost to fundamental concepts time and the of and demand some about is in study and some from and in under may to in the Walker and Weber (1984) when and TCE theory for a specific in a specific but is an is a more concern than about with his question for to TCE to not the concept it to the time of the study, given to and it is to that study is a that will be in some the we can for is a practical of study that more on and tools than a of The is also of where also the modus tollens example in the an in which the or of empirical can be interpreted of for the of where is the of the theory is the of the is the of the is the of the empirical and the remaining are the of the this would not only to for the of but also knowledge defensible about the or other so that for or for its This is more than but is further by two of to and the of to some of the may to their their or with in which case many of the from study example, to between and (Weick researchers may be of some or of in of these is often of may always more of an than an As are or they are and they are Swink researchers their as or by of the (i.e., hypotheses research much of is theory testing research in in other is not testing is In researchers often to a study and for well any theory is research in of in literature is by than theory. 
Researchers theories to to research and This a form of than that described by Ketokivi and theory application as using a laws to an in researchers often to a for theoretical do researchers with a theory and an to or its core In practical is a and many would that this is the of its to such as and the and the has most through such as the and than researchers have theories from other (e.g., organizational theory, some view this on theories as a it is a of of the theories we have as a the in most and was as production or management in these was by theories (e.g., theory, to in the past has more into and empirical research where is is that and by and researchers to about In the has the of its the time, in an to we have theory development as a central research More than on theories from other theoretical in is the of research through a of a but in direct to There have The over versus in one The rise of operations in the of through most theories in with meaningful theoretical the of theory described by Ketokivi and and the of our can under we researchers to new Given the in our Theory-testing extensions and are to than theory-building The broadening of established as transaction cost the or the theory of Such have these theories by new and it may be much to theory testing to new theory Importantly, negative corroboration is often the most for theory When methodological can be surprising or findings the of a abductive by Ketokivi and as Ketokivi and also point theory development also a abduction. 
may be particularly researchers be hypothesis and than on the logic of a theory This would and more in the that empirical than merely or to a theoretical is to This would in our to that can a contribution as a to theory and would need to for in our or through data analysis to hypothesis This logic the on which to for research by observation than by to an established theoretical such as these may the development of new theories in the time, the to our empirical and to the of both and research so will us the of much theory, not (Schmenner et al. Bendoly The Ketokivi and a case that research with the of testing theory has always generative We central abductive reasoning in both the a and a of theory testing the of new theoretical consistent with our view that theories are but a of from to of In our terms and their to research that on we by theoretical conjectures and to than to that in or empirical on how their analysis and our understanding of while about the role of in theory We are particularly to how their of abduction a and a the connection between abductive sensemaking and the of theoretical that we as a meaningful pathway for research analysis of how Walker and Weber (1984) translated TCE's concept of uncertainty into meaning illustrates the of generative reasoning we in we to and the in that for Ketokivi and draw a between theory application theory is axiomatic and there is no feedback to and theory testing theory is conjectural and feedback is this is for the of a that is central to our In research, researchers use theory to in The theory is it is as not As in the more general case described when from the theory as they the researcher is into the of abductive sensemaking that Ketokivi and The creates a where application and testing are not but The in our and theory the process but the data by the into theory development research does not into side of Ketokivi and and its because it one of the for theory generation in the of to theoretical abductive Ketokivi 
and establish that abduction is the reasoning form through which theory testing becomes theory not all of abduction are created that abduction is generative is a The more question for our a abductive to to and from a We have that the of a can be the by it be in and empirically These apply with to the abductive inferences theory When Walker and Weber (1984) as a factor not from that abduction all is, it was with TCE's in contextual knowledge of automobile and with The example because the abduction was not because it was merely Furthermore, and a practical for the abductive Ketokivi and When a surprising finding from theory testing, the researcher must a new for the makes need not a is the is as we have in the of and The point is that abduction structure, not just We submit that the case for generative testing when in the of researchers is the study of how and to feedback structures, variability and the of and and This for the case for generative testing because particularly for abduction. When an researcher a general theoretical concept into Walker and Weber translated into in automobile final are not merely are theory. The contextual translation that Ketokivi and as an a abductive is, in the where is where concepts concepts, and where our contributions the of variability feedback or empirical for a abduction. When us in an context, the process itself often the is not organizational boundaries or but in the of This is theory testing in has generative because the but because the are we an that Ketokivi and makes but does not In our and we from Ketokivi and analysis that generative theory testing is where these two study as an through a abduction. 
when a to the the researcher into new theoretical to for was The a is, in the bridge between our two a This has a practical researchers in theory testing be permitted by and from deductive to abductive sensemaking the The of which we have is particularly in theory-testing research because it the a that Ketokivi and so effectively


Evidence weight

0.50

Balanced mode · F 0.40 / M 0.15 / V 0.05 / R 0.40

F · citation impact	0.50 × 0.40 = 0.200
M · momentum	0.50 × 0.15 = 0.075
V · venue signal	0.50 × 0.05 = 0.025
R · text relevance †	0.50 × 0.40 = 0.200

† Text relevance is estimated at 0.50 on the detail page — for your query’s actual relevance score, open this paper from a search result.
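The evidence weight above appears to be a plain weighted sum of four component scores. The following is a minimal sketch under that assumption; the function name `evidence_weight`, the dictionary layout, and the idea that every component score lies between 0 and 1 are hypothetical and are not taken from the tool itself.

```python
# Component weights copied from the "Balanced mode" line of the page.
# Everything else (names, structure) is a hypothetical illustration.
WEIGHTS = {"F": 0.40, "M": 0.15, "V": 0.05, "R": 0.40}


def evidence_weight(scores, weights=WEIGHTS):
    """Return the weighted sum of component scores.

    `scores` maps each component key (F, M, V, R) to a value that is
    assumed to lie in [0, 1].
    """
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(weights[key] * scores[key] for key in weights)


# All four components are shown as 0.50 on the page.
scores = {"F": 0.50, "M": 0.50, "V": 0.50, "R": 0.50}

# Unrounded per-component contributions, then the overall weight.
contributions = {key: WEIGHTS[key] * scores[key] for key in WEIGHTS}
total = evidence_weight(scores)

for key, value in contributions.items():
    print(f"{key}: {value:.3f}")
print(f"evidence weight: {total:.2f}")
```

Because the four weights sum to 1, equal component scores of 0.50 yield an overall weight of 0.50, which matches the figure shown on the page.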