In 1979, Xerox discovered it was losing market share to Japanese competitors whose unit manufacturing costs were lower than Xerox's selling prices. The company's response was not to cut harder. It was to measure differently.
The discipline that forgot to measure itself
Robert Camp's subsequent benchmarking programme at Xerox – over 200 studies comparing processes against the best in any industry, not just direct competitors – became the founding case study for modern competitive benchmarking (Camp, 1989). The principle was simple: you cannot manage a competitive position you cannot see.
That principle has since become infrastructure across most enterprise disciplines. Financial performance has the income statement, audited against international accounting standards and comparable across sectors, regions and time. Safety performance has RIDDOR reporting, standardised by the Health and Safety Executive and underpinning regulatory enforcement across every sector in the UK. Environmental compliance has emissions reporting frameworks validated by the Environment Agency and international protocols. Even customer loyalty, a famously subjective concept, gained an independent benchmark when Bain & Company partnered with J.D. Power and ROIRocket to create the Certified Net Promoter Score – surveying 3,000 to 100,000 consumers per sector through a neutral intermediary to eliminate the sample and responder bias that invariably surfaces when companies survey their own customers.
Asset performance management – the discipline governing how trillions in physical assets are maintained, invested in and ultimately retired across oil and gas, mining, power generation, water, manufacturing and infrastructure – has nothing equivalent. No independent, comparable benchmark. No way for a board to ask "where do we stand relative to our peers?" and receive a rigorous, cross-sector, evidence-based answer.
The absence is not for lack of trying. It is structural. And understanding the structure of the gap is the precondition for filling it.
What a benchmark is – and what it is not
The word "benchmark" is used loosely in asset management. Maturity assessments are called benchmarks. Consulting scorecards are called benchmarks. Vendor dashboards carry benchmark labels. The imprecision matters because it obscures a structural distinction between two fundamentally different kinds of measurement.
A diagnostic assesses one organisation against a standard or an internal frame of reference. It identifies strengths and gaps. It answers the question: where do we stand against what we said we would do? The Institute of Asset Management's Self Assessment Methodology (SAM2025), aligned to ISO 55001:2024, is a diagnostic. It evaluates an organisation's capabilities against the 40 subject areas of the Asset Management Anatomy v4 on a 0–5 maturity scale. It is rigorous, well-established and supported by over a decade of practitioner use. It is not a benchmark.
A benchmark compares across organisations using a consistent methodology, producing data that allows relative positioning over time. It answers a different question: where do we stand relative to our peers – and is that position improving or declining?
The distinction is not semantic. It is architectural. A diagnostic can be deeply rigorous without producing a single data point that is comparable to another organisation. A benchmark, by definition, must.
Solomon Associates understood this when it built what became the gold standard for refining benchmarking. For nearly 40 years, Solomon's Comparative Performance Analysis (CPA™) has accumulated data from over 500 refineries worldwide, using patented methodologies and rigorous data validation to ensure that every comparison is fair, relevant and defensible. Its Energy Intensity Index – where 100 represents the average and deviations above or below indicate relative efficiency – became so trusted that regulators adopted it as the benchmark for refinery greenhouse gas emissions, and operators across the North Sea and UK downstream sector use it as the reference point for operational performance. Solomon did not achieve this by building a better diagnostic. It built a different kind of instrument entirely: one designed from inception for cross-organisational comparison, longitudinal tracking and peer-group granularity.
Asset management has produced excellent diagnostics. It has not yet produced a CPA™.
Five instruments and the space between them
The past two decades have given asset management a rich measurement ecosystem. A review of the principal approaches reveals why none of them, individually or collectively, functions as a benchmark – and what a benchmark would need to do differently.
Standards-based maturity assessments
ISO 55001:2024 and the GFMAM Asset Management Landscape v3.0 (2024) provide the requirements and taxonomy for an asset management system. The IAM's SAM2025 enables organisations to evaluate their capability against every "shall" statement in ISO 55001's 31 clauses and sub-clauses. PAS 55 preceded this with a similar structure aligned to the earlier British standard. These assessments are robust and well-established. Their limitation is structural: they measure management system conformance, not operational performance. An organisation can score highly on governance maturity while its assets underperform. Conversely, an organisation can deliver strong availability figures through reactive heroics rather than systematic management. The assessment cannot distinguish between the two.
Metrics-based benchmarking
SMRP Best Practices, the European Federation of National Maintenance Societies (EFNMS), and ARC Advisory Group publish maintenance and reliability metrics: overall equipment effectiveness, mean time between failure, planned maintenance percentage, maintenance cost as a percentage of replacement asset value. Solomon provides sector-specific metrics for refining and petrochemicals. The limitations here are more fundamental than scope. First, these metrics are lagging indicators – they report what already happened, not whether the organisation has the capability to sustain or improve it. An operation reporting 97% availability through round-the-clock reactive firefighting looks identical to one achieving 97% through systematic preventive programmes. The metric cannot distinguish between the two – yet one is sustainable and the other is not. Second, they are vulnerable to exactly the distortion Goodhart described: when availability becomes the target, the definition of "available" quietly narrows; when planned maintenance percentage becomes the KPI, work orders get reclassified rather than planning discipline improving. Third, they fragment by discipline – maintenance metrics, reliability metrics, cost metrics – with no integration across governance, workforce capability, digital maturity, risk management, or supply chain resilience. A high OEE number tells you nothing about whether the organisation's decision-making, investment strategy, or culture can reproduce that number next year.
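To make the distortion concrete, here is a minimal sketch using the conventional definitions of availability and planned maintenance percentage. The two operations and all figures are hypothetical, chosen only to illustrate why identical headline availability can conceal very different maintenance realities.

```python
# Illustrative only: two hypothetical operations reporting identical availability
# from very different maintenance profiles, using the conventional definitions.

def availability(uptime_hours: float, total_hours: float) -> float:
    """Availability as the fraction of calendar time the asset was able to run."""
    return uptime_hours / total_hours

def planned_maintenance_pct(planned_hours: float, total_maint_hours: float) -> float:
    """Share of maintenance labour hours spent on planned (scheduled) work."""
    return planned_hours / total_maint_hours

PERIOD_HOURS = 8_760  # one year

# Operation A: systematic preventive programme
a_avail = availability(uptime_hours=8_497, total_hours=PERIOD_HOURS)
a_planned = planned_maintenance_pct(planned_hours=9_000, total_maint_hours=10_000)

# Operation B: round-the-clock reactive firefighting
b_avail = availability(uptime_hours=8_497, total_hours=PERIOD_HOURS)
b_planned = planned_maintenance_pct(planned_hours=3_000, total_maint_hours=10_000)

print(f"A: availability {a_avail:.1%}, planned work {a_planned:.0%}")
print(f"B: availability {b_avail:.1%}, planned work {b_planned:.0%}")
# Both report 97.0% availability; only the work-mix metric hints at the difference,
# and that metric is itself vulnerable to reclassification of work orders.
```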
Consulting-led assessments
Major consulting firms offer proprietary maturity assessments calibrated to the client's sector and context. These can be thorough and actionable. Their limitation is comparability: each uses a different framework, different scales and different weightings. Results from one engagement cannot be compared to results from another. The assessment, however rigorous, dies with the project.
Industry surveys
Plant Engineering, Reliabilityweb and the GFMAM periodically survey their members and readership. The GFMAM's 2024 Asset Management Survey provides a useful directional picture of trends and maturity levels across regions. These surveys offer breadth but not depth: they are descriptive rather than scored, and do not produce a composite index or maturity classification for individual respondents.
OEM and vendor scorecards
Technology vendors offer adoption and usage metrics within their platform ecosystems – condition monitoring dashboards, APM analytics, digital twin maturity scores. These measure adoption of a product, not the performance of the management system around it. They are, by definition, not independent.
The common thread: every instrument measures artefacts or outcomes, never both. Standards-based tools assess the documented system – policies, procedures, governance structures, role definitions – without measuring whether those artefacts produce operational results. Metrics-based tools measure operational outcomes without assessing the management system that produces them. Consulting tools are rigorous but not comparable. Surveys offer breadth without depth. Vendor scorecards measure product adoption, not organisational capability. The profession has invested heavily in measuring what it prescribes. It has not invested in measuring what it performs.
When a measure becomes a target
Before defining what a credible benchmark requires, it is worth pausing on why the measurement problem is harder than it appears.
Charles Goodhart, writing on monetary policy in 1975, identified the dynamic now known as Goodhart's law, popularly paraphrased as "when a measure becomes a target, it ceases to be a good measure." The insight has aged well. In asset management, it manifests as organisations optimising for the metrics that are easiest to report rather than the capabilities that are hardest to build. Planned maintenance percentage rises on paper because work orders are reclassified, not because planning discipline has improved. Availability figures hold steady because the definition of "available" has been quietly narrowed. Maturity scores climb because the assessment covers documented systems, and documentation is cheaper to produce than operational change.
This is not cynicism. It is a structural feature of any measurement system that assesses only one side of the equation. The distinction drawn in the performativity literature that follows Callon (1998) – between models as cameras (passive descriptions of pre-existing reality) and models as engines (active devices that shape the reality they claim to describe) – is instructive here. Standards and maturity models are engines: they are intended to shape organisational behaviour. But engines only work under the right conditions. J.L. Austin (1962) called these felicity conditions – the circumstances that allow a performative act to actually change the world. When those conditions are absent, social institutions restrict performativity, and the model fails to reshape its setting (Brisset, 2016, 2017).
My doctoral research (Moyo, 2018) tracked this dynamic in detail. Over 27 months of ethnographic fieldwork inside a multinational oil and gas company, I studied the Planned Maintenance Change Request (PMCR) routine – the boundary-spanning process through which the organisation developed and changed maintenance procedures for offshore assets. The study identified four felicity conditions that supported the new strategy: pressures from ambiguity that motivated change, the recognition of rework and inefficiencies, participants' capacity for synthesis and improvisation, and the truces and accommodations negotiated across functional boundaries. It also identified four infelicity conditions that resisted it: the creation of new uncertainty, contradictions and tensions between competing requirements, apathy and ambivalence among participants, and open conflict and antagonism between teams. The interplay of these conditions – some supporting alignment, others maintaining the status quo – defined what I termed the performativity space: the zone within which models and their settings mutually constitute one another.
In asset management terms, felicity conditions include clear decision rights, competent and confident people interpreting data, a culture that tolerates surfacing ambiguity, and incentives that reward proactive rather than reactive performance. You can have a beautifully architected APM stack and still fail performatively if those conditions are absent. You can have modest tooling but strong sociotechnical conditions and outperform peers with far more technology. The ethnographic evidence showed that even well-designed strategies produce two distinct forms of adaptation: adaptation through strict performativity, where intended outcomes are realised, and adaptation through performativity struggles and overflows, where realised outcomes diverge from intentions (Moyo, 2018). Both forms are ongoing. Neither is captured by a snapshot assessment.
A benchmark that measures only the artefacts – the documented system, the technology stack, the reported metrics – risks reinforcing the very distortion Goodhart warned about. A credible benchmark must measure both what the system prescribes and what the organisation actually performs.
Six conditions for a credible industry benchmark
Drawing on the design principles that underpin established benchmarks across finance, safety, energy and customer experience – including the structural features that earned Solomon its regulatory standing and Bain's Certified NPS its cross-industry adoption – six conditions emerge.
| Condition | Rationale |
|---|---|
| Independence | The benchmark must not be operated by an entity with a commercial interest in the outcome. Solomon's credibility rests on its independence from the refineries it benchmarks. Bain's Certified NPS eliminated responder bias by collecting data through a neutral intermediary. A benchmark tied to a technology vendor, certification body, or consulting engagement cannot hold this ground. |
| Methodological transparency | The scoring architecture, dimension weightings, bias controls and classification thresholds must be published and open to scrutiny. An index that cannot explain how it arrives at a score cannot earn trust. Camp's (1989) original benchmarking methodology at Xerox was documented, shared and adopted precisely because its logic was visible. |
| Dual-sided measurement | It must measure both what assets deliver (operational outcomes) and how the organisation is structured to sustain that delivery (management system capability). Measuring one without the other is the structural limitation that prevents every existing instrument from functioning as a benchmark. |
| Peer comparison architecture | Participants must be able to compare against meaningfully defined peer groups – by sector, region, asset type and organisational scale. Solomon segments by refinery complexity and throughput. A gold mine in the Pilbara should not be compared against a European water utility. The architecture must ensure that comparison is meaningful, not misleading. |
| Longitudinal design | A single measurement is a photograph. Watson's (1993) five-phase model of benchmarking evolution – from reverse engineering through competitive, process and strategic benchmarking to global benchmarking – requires data that compounds over time. One measurement gives a baseline. Three reveal trajectory. Five or more surface deeper patterns. |
| Independent governance | Methodological decisions must be subject to external oversight. An advisory council or governance body with no commercial dependency on the index provides institutional credibility that self-governance cannot. This is the mechanism that separates a benchmark from a proprietary scorecard wearing the benchmark label. |
GARPI™: designed against the six conditions
The Global Asset Reliability & Performance Index was not conceived as an improvement on existing tools. It was designed to occupy a space none of them claims. Each of the six conditions above was a design constraint from inception.
Independence. GARPI™ is developed and administered by Optimal® – independent, with no technology platform to sell, no certification to issue and no vendor partnership that influences scoring. Participation is free. There is no commercial relationship between a participant's score and any product recommendation.
Methodological transparency. The scoring architecture is documented in two published whitepapers. Measuring What Matters (Moyo, 2026a) reviews five categories of existing measurement instrument – standards-based maturity assessments, metrics-based benchmarking, consulting-led assessments, industry surveys and vendor scorecards – identifies the structural gap, and sets out the complete scoring framework including six requirements any credible global index must satisfy. The Performativity Gap in Asset Management (Moyo, 2026b) provides the theoretical foundations, drawing on Callonian performativity theory, Austin's felicity conditions, Brisset's analysis of how social institutions restrict performativity, and 27 months of doctoral ethnographic research within a multinational oil and gas company (Moyo, 2018). That thesis, Exploring Limits to Performativity: (Re)Constituting Everyday Performances through Planned Change, developed both the framework for the performativity space and the Model for the Co-performation of Routines and Strategy that underpin GARPI™'s architecture. The instrument comprises 70 scored items across 12 sections and eight weighted dimensions, explicitly aligned to ISO 55001:2024 and the GFMAM Asset Management Landscape v3.0. Dimension weightings, maturity tier thresholds and the composite scoring formula are published and open to peer review.
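To show the shape of such a composite – not the published formula itself – the sketch below combines eight dimension scores into a weighted score and a tier label. The weights are equal-weight placeholders and the tier names and thresholds are illustrative assumptions; the actual values are those documented in the whitepapers.

```python
# A hedged sketch of a weighted composite score over eight dimensions.
# Dimension names follow the whitepaper; the weights, tier labels and thresholds
# shown here are placeholders for illustration, not GARPI's published values.

DIMENSIONS = {
    # management system dimensions
    "strategy_and_governance": 0.125,
    "risk_and_decision_making": 0.125,
    "people_and_culture": 0.125,
    "information_and_technology": 0.125,
    # operational outcome dimensions
    "asset_reliability_and_availability": 0.125,
    "maintenance_effectiveness": 0.125,
    "cost_and_value": 0.125,
    "supply_chain_and_resilience": 0.125,
}

TIERS = [(80, "Tier 1"), (60, "Tier 2"), (40, "Tier 3"), (0, "Tier 4")]  # illustrative

def composite_score(dimension_scores: dict[str, float]) -> float:
    """Weighted sum of dimension scores, each expressed on a 0-100 scale."""
    return sum(DIMENSIONS[d] * dimension_scores[d] for d in DIMENSIONS)

def tier(score: float) -> str:
    """Map a composite score to the first tier whose threshold it meets."""
    return next(label for threshold, label in TIERS if score >= threshold)

example = {d: 62.0 for d in DIMENSIONS}  # a flat profile, for illustration
print(round(composite_score(example), 1), tier(composite_score(example)))  # 62.0 Tier 2
```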
Dual-sided measurement. Four dimensions assess management system maturity: strategy and governance, risk and decision-making, people and culture, and information and technology. Four assess operational outcomes: asset reliability and availability, maintenance effectiveness, cost and value, and supply chain and resilience. Every GARPI™ score reflects both what the system prescribes and what the organisation achieves in practice. The instrument includes reverse-scored items for social desirability bias control and cross-validation flags that identify inconsistencies between self-reported capability and self-reported outcomes.
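The two bias controls can be sketched in a few lines. The response scale, the item direction and the tolerance below are illustrative assumptions, not the published item specification.

```python
# Hedged sketch of two bias controls: reverse-scored items and a cross-validation flag.
# A 1-5 response scale and a 25-point tolerance are assumed purely for illustration.

SCALE_MAX = 6  # for a 1-5 Likert response, reverse score = 6 - response

def reverse_score(response: int) -> int:
    """Flip a negatively worded item so that a higher value always means stronger capability."""
    return SCALE_MAX - response

def cross_validation_flag(capability_score: float, outcome_score: float,
                          tolerance: float = 25.0) -> bool:
    """Flag a respondent whose self-reported capability and self-reported outcomes
    (both on a 0-100 scale) diverge by more than the tolerance, for follow-up review."""
    return abs(capability_score - outcome_score) > tolerance

print(reverse_score(5))                       # 1: strong agreement with a negative item
print(cross_validation_flag(85.0, 48.0))      # True: capability claims outrun reported outcomes
```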
Peer comparison architecture. The segmentation framework supports over 12,000 distinct peer cohorts, defined by sector, sub-sector, region, asset criticality profile and organisational scale. Comparison is granular by design: an LNG facility's maintenance function is compared against operations of comparable complexity, not against an abstract industry average.
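A simplified sketch of cohort keying illustrates the principle. The field names and example values below are assumptions for illustration, not the actual segmentation schema.

```python
# Hedged sketch of cohort keying: each participant is compared only against
# others sharing the same segmentation attributes.

from dataclasses import dataclass

@dataclass(frozen=True)
class CohortKey:
    sector: str
    sub_sector: str
    region: str
    criticality_profile: str
    organisational_scale: str

def peer_scores(participant: CohortKey,
                results: dict[CohortKey, list[float]]) -> list[float]:
    """Return only the scores of participants in the same cohort, so an LNG facility
    is never averaged against, say, a water utility."""
    return results.get(participant, [])

lng = CohortKey("oil_and_gas", "lng", "asia_pacific", "high", "large")
cohort = peer_scores(lng, results={lng: [61.2, 58.7, 64.0]})  # same-cohort scores only
```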
Longitudinal design. GARPI™ runs on an annual cycle. The instrument includes a propensity-to-act scale that tracks not only current performance but stated intent to change, creating a leading indicator alongside the lagging score. This is not a one-off assessment. It is designed to compound in analytical value with each successive cycle.
Independent governance. In March 2026, we announced the formation of the GARPI™ Global Advisory Council – four independent seats spanning energy, mining, power and manufacturing, and academia. Council members hold responsibility for endorsing the scoring methodology, validating sector relevance and assuring data anonymity protocols. They hold no IP rights and receive no commercial benefit from GARPI™ outcomes. Governance Charter GOV-001 is published and available on request.
Why measure the gap – not just the score
In From Conformance to Performance, I introduced the concept of the performativity gap: the structural distance between how we frame asset management through standards, policies, APM architectures and dashboards, and how asset management is actually performed through daily routines, operational decisions and interactions in specific sociotechnical settings.
In The Performativity Gap in Asset Management (Moyo, 2026b), I identify three distinct manifestations of this gap.

The conformance-performance gap arises because certification measures framing, not performance – an organisation can adopt the language of ISO 55001 and still have assets that underperform, because the governance structures may be in place but the felicity conditions required to translate documented procedures into effective operational outcomes are absent. Certification and performance are orthogonal (Moyo, 2026b). No Tier 1 mining company publicly claims ISO 55001 certification, yet every major miner frames its asset strategy using the standard's language (Anglo American, 2024; Glencore, 2024; South32, 2024). The artefacts exist on paper. The performance does not follow automatically. Worse, certification creates false confidence: board members and executives see the certificate and assume asset management is under control, when what they are seeing is a validated set of artefacts – not evidence that those artefacts produce results.

The intention-realisation gap reflects the finding from my doctoral research (Moyo, 2018) that intended outcomes and realised outcomes lie on a spectrum between non-performativity and full performativity – the gap is not a failure of implementation but a structural feature of how models are enacted in complex settings.

The temporal gap recognises that strict performativity is temporary and fragile: certification captures a single moment, but overflows and reframings continuously widen the distance between the certified state and the current operational reality.
Every asset-intensive organisation already experiences these gaps. McKinsey's analysis of mining operations found average overall equipment effectiveness (OEE) of 39% in open-pit mining – against reported availability of 92% (McKinsey, 2020). That 53-point gap is the performativity gap made visible in a single metric: the framing says the assets are available; the performance says they are producing value less than 40% of the time. The maintenance strategy says preventive; the maintenance reality is 40–50% reactive. The APM platform generates predictive alerts; the planning process cannot act on them within the decision window. The risk register identifies critical spares; the supply chain cannot deliver them within acceptable lead times.
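A brief arithmetic sketch, assuming the conventional OEE decomposition (availability × performance rate × quality rate), shows what those two reported figures jointly imply. The split between performance and quality losses is not reported, so only the combined factor can be inferred.

```python
# Arithmetic behind the 53-point gap, assuming the conventional OEE decomposition
# OEE = availability x performance rate x quality rate.

reported_availability = 0.92   # reported availability (McKinsey, 2020)
reported_oee = 0.39            # reported overall equipment effectiveness

implied_performance_x_quality = reported_oee / reported_availability
print(f"Implied performance x quality factor: {implied_performance_x_quality:.0%}")
# ~42%: even while counted as "available", the assets deliver output at well under
# half of their theoretical rate-and-quality potential.
```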
Routine dynamics theory distinguishes between the ostensive aspect of a routine – the abstract idea, how it is documented – and its performative aspect – how it is actually enacted by specific people at specific times (Feldman and Pentland, 2003). In the PMCR study, the ostensive routine remained relatively stable throughout the 27-month observation period. But the performative enactment – how maintenance engineers actually used, adapted, improvised within and sometimes circumvented those procedures – shifted constantly in response to changing pressures and contexts (Moyo, 2018). The gap between what the standard describes and what practitioners perform is where reliability is won or lost. It is also where most existing measurement instruments stop looking.
GARPI™ does not claim to close that gap. My doctoral research demonstrated that the gap cannot and should not be closed, because the creative adaptation within it – technicians adjusting routines to handle equipment that does not match design assumptions, planners improvising around competing constraints, engineers reconciling technical standards with economic realities – is how organisations build resilience. The Model for the Co-performation of Routines and Strategy (Moyo, 2018) shows that strategy implementation is not a linear journey from current state to target state but an ongoing process of mutual shaping between the model and the setting. What GARPI™ does is make the gap visible, measurable and comparable across organisations. That visibility is the precondition for managing it.
The real risk is not that the performativity gap exists, but that our current instruments cannot show it to us clearly. A benchmark that measures both sides of the gap – framing and performance together, over time – transforms the gap from a hidden liability into a manageable variable.
What turns a methodology into a standard
Solomon did not become the reference point for refining benchmarking overnight. It earned that standing over nearly four decades of consistent methodology, growing participation and demonstrated value to the organisations that contributed data. Camp's original benchmarking work at Xerox spread through the 1980s to Motorola, du Pont and eventually the service sector – not because Xerox mandated adoption, but because the methodology proved itself through use.
The same principle applies here. The methodology is published. The governance is independent. The instrument is open to every asset-intensive organisation, regardless of sector, region, or maturity level. What transforms a rigorous methodology into the reference point the profession uses is the dataset that grows with each cycle.
Every organisation that participates strengthens the peer comparisons for every other participant. The more granular the dataset, the more meaningful each individual result. This is the network effect that underpins every credible industry benchmark – and the organisations that shape the index in its first cycles define what the profession measures itself against.
Participate in GARPI™
The GARPI™ survey takes approximately 15 minutes to complete. Participation is free. Every respondent receives a personalised benchmark report showing their composite score, dimension-level performance, and peer group positioning.
References
Anglo American (2024). Integrated Report and Financial Statements. Anglo American plc.
Austin, J.L. (1962). How to Do Things with Words. Oxford University Press.
Brisset, N. (2016). "Economics is not always performative: some limits for performativity." Journal of Economic Methodology, 23(2), pp. 160–184.
Brisset, N. (2017). "On Performativity: Option Theory and the Resistance of Financial Phenomena." Journal of the History of Economic Thought, 39(4), pp. 549–569.
Callon, M. (1998). The Laws of the Markets. Blackwell.
Camp, R.C. (1989). Benchmarking: The Search for Industry Best Practices that Lead to Superior Performance. ASQC Quality Press.
Feldman, M.S. and Pentland, B.T. (2003). "Reconceptualizing organizational routines as a source of flexibility and change." Administrative Science Quarterly, 48(1), pp. 94–118.
GFMAM (2024). The Asset Management Landscape, 3rd Edition. Global Forum on Maintenance and Asset Management.
Glencore (2024). Sustainability Report. Glencore plc.
Goodhart, C.A.E. (1975). "Problems of monetary management: the U.K. experience." Papers in Monetary Economics, Reserve Bank of Australia.
IAM (2025). Self Assessment Methodology 2025 (SAM2025). Institute of Asset Management.
ISO (2024). ISO 55001:2024 – Asset management – Asset management system – Requirements. International Organization for Standardization.
McKinsey & Company (2020). "How digital innovation can improve mining productivity and performance." McKinsey Metals & Mining Practice.
Moyo, L.L. (2018). Exploring Limits to Performativity: (Re)Constituting Everyday Performances through Planned Change. Doctor of Business Administration thesis, University of Strathclyde, Hunter Centre for Entrepreneurship.
Moyo, L.L. (2026a). "Measuring What Matters: The Global Asset Reliability & Performance Index." Optimal Whitepaper GARPI™-WP-2026-01.
Moyo, L.L. (2026b). "The Performativity Gap in Asset Management." Optimal Whitepaper GARPI™-WP-2026-02.
South32 (2024). Sustainability Report. South32 Limited.
Watson, G.H. (1993). Strategic Benchmarking: How to Rate Your Company's Performance Against the World's Best. John Wiley & Sons.