# The AI Multiplier Effect: Why AI Exposes Incompetence Faster Than It Replaces Competence

AI is not a replacement for human skill -- it is a multiplier of it. Research on the jagged frontier, cognitive offloading, and the reverse Dunning-Kruger effect reveals that AI widens the gap between those with domain judgment and those without it.

- Canonical URL: https://buildooor.com/research/ai-multiplier-effect
- Author: Rob Baratta
- Published: 2026-02-10
- Version: Working Paper v1.0
- Keywords: AI multiplier effect, jagged frontier, prompt engineering, cognitive offloading, Dunning-Kruger AI, domain expertise, AI productivity, skill amplification, metacognition gap, AI fluency trap, human-AI collaboration, taste-based judgment

---

<ResearchAbstract>
  The dominant narrative around AI and employment follows a predictable arc: a new model launches,
  social media declares a profession "ended," and the same people who celebrate the death of
  one field insist their own is immune. This paper argues that this framing reveals more about the
  speaker than the technology. Drawing on experimental evidence from Harvard/BCG's jagged frontier
  study (n=758), Aalto University's reverse Dunning-Kruger research, and an OECD meta-analysis of 14
  productivity experiments, we demonstrate that AI functions as a multiplier of existing human skill,
  judgment, and domain expertise — not a replacement for it. The data shows that AI compresses
  inequality for narrow, well-defined tasks while dramatically widening the gap for judgment-intensive
  work. Most critically, AI introduces a novel metacognitive hazard: users who lack domain expertise
  systematically overestimate the quality of AI-assisted output, creating a class of confidently wrong
  practitioners who cannot detect their own errors. The real divide is not between humans and AI. It
  is between humans with judgment plus AI, and everyone else being quietly left behind while thinking
  they are keeping up.
</ResearchAbstract>

<ResearchSection number={1} title="The Feed Pattern: A Diagnostic, Not a Take">

There is a specific pattern that recurs on every social platform whenever a new AI model launches.
It follows a rigid template: *"RIP [profession I was already bad at]. [New model] just ended
the field. Luckily, AI is still terrible at [profession I happen to be good at]."* The
person posting this believes they are making a commentary about technology. They are actually making
a confession about their own skill distribution — and, more importantly, about their inability
to prompt effectively.

This is not a rhetorical flourish. It is an empirically testable claim. The Harvard Business School
and Boston Consulting Group's landmark 2023 study — "Navigating the Jagged
Technological Frontier" — demonstrated that AI capabilities cover an uneven, jagged set
of tasks. Inside the frontier, GPT-4 users completed 12.2% more tasks, 25.1% faster, with 40% higher
quality. Outside the frontier, those same users were 19 percentage points less likely to produce
correct solutions than people with no AI access at all. The frontier is not a clean line. It is
jagged, domain-specific, and — crucially —
invisible to people who lack the domain expertise to know where it falls.

This means that when someone declares a profession "ended," they are revealing that they
experienced the inside of the frontier for tasks they were already weak at. The AI made their weakness
painfully obvious by doing it better. And when they claim AI "can't do" their own
job, they may be right — or they may simply lack the prompting skill to extract AI's best work
in their own domain, so the mediocre results confirm what they already believed. Either way, the statement is a
mirror, not a measurement.

<ResearchTable
  caption="Table 1. Common Feed Patterns and What They Actually Reveal"
  columns={[
    { label: 'Claim Pattern' },
    { label: 'What It Reveals' },
    { label: 'Research Basis', muted: true },
  ]}
  rows={[
    ['\u201CAI replaced X profession\u201D', 'Speaker was below the frontier in X', 'BCG/Harvard jagged frontier study'],
    ['\u201CAI can\u2019t do what I do\u201D', 'Speaker may be right -- or may lack prompting skill', 'Frontiers prompt engineering research'],
    ['\u201CAI makes everyone equal\u201D', 'True for narrow tasks; false for judgment-heavy work', 'Brynjolfsson et al. compression finding'],
    ['\u201CI use AI and I\u2019m more productive\u201D', 'Likely true -- but metacognitive accuracy unknown', 'Aalto/Welsch reverse D-K study'],
    ['\u201CAI output is just as good as human\u201D', 'Speaker likely cannot distinguish quality', 'Hallucination detection research'],
  ]}
  footnote="Source: Author analysis mapped to published research findings."
  compact
/>

</ResearchSection>

<ResearchSection number={2} title="The Multiplier Thesis: Skill In, Skill Out">

The core thesis of this paper is simple: **AI is a multiplier, not a replacement.** It
multiplies whatever you bring to the interaction — your domain knowledge, your judgment, your
taste, your prompting ability. If you bring expertise, AI amplifies it. If you bring ignorance, AI
amplifies that too, but wraps it in fluent, confident prose that makes it harder to detect.

The evidence for this is now substantial. Across 14 experimental studies reviewed by the OECD in 2025,
AI-assisted workers showed an average productivity gain of 24%. But the variance is enormous. In the
Brynjolfsson et al. study of 5,179 customer support agents, low-skilled workers gained 34% productivity
while top performers gained effectively 0%. In the Noy and Zhang experiment with college-educated
professionals, AI reduced completion time by 40% and raised quality by 18% — but the gains were
concentrated among bottom-half performers. The BCG/Harvard study found 40% quality improvements inside
the frontier, and a 19% quality penalty outside it.

<ResearchTable
  caption="Table 2. AI Productivity Gains Across Major Experimental Studies"
  columns={[
    { label: 'Study' },
    { label: 'Population' },
    { label: 'Productivity Gain', align: 'right', mono: true },
    { label: 'Key Finding', muted: true },
  ]}
  rows={[
    ['Harvard/BCG (2023)', '758 BCG consultants', '+40% quality', 'Inside frontier tasks only; 19% worse outside frontier'],
    ['Noy & Zhang (2023)', 'College-educated professionals', '+40% speed', 'Bottom-half performers gained most; inequality compressed'],
    ['Brynjolfsson et al. (2023)', '5,179 customer support agents', '+14% average', 'Low-skilled workers gained 34%; top performers gained 0%'],
    ['OECD Meta-Analysis (2025)', '14 experimental studies', '+24% average', 'Effects vary drastically by task type and user skill'],
    ['Anthropic (2025)', 'Claude conversation analysis', '-80% time/task', '~90 min tasks completed in ~18 min with AI'],
    ['Penn Wharton (2025)', 'US labor force projection', '+1.8% annually', 'Aggregate GDP growth; conditional on broad adoption'],
  ]}
  footnote="Sources: Harvard/BCG (2023), Noy & Zhang (2023), Brynjolfsson et al. (2023), OECD (2025), Anthropic (2025), Penn Wharton (2025)."
/>

These findings create an apparent paradox. If AI helps the weakest performers the most, shouldn't
it be a great equalizer? The answer is: only for narrow, well-defined tasks with clear right answers.
Customer support has scripts. LSAT questions have correct answers. Business writing has templates. For
these tasks, AI does compress inequality because the "skill" being multiplied is largely
procedural — following a known pattern correctly.

But for judgment-intensive work — strategy, diagnosis, architecture, creative direction, legal
reasoning — the multiplier effect cuts the other way. Domain experts use AI as a force multiplier
for their existing taste and judgment. They prompt better because they know what good output looks like.
They catch errors because they have calibrated expectations. They iterate productively because they
know which dimension of the output to push on. Non-experts do none of these things. They accept the
first output. They cannot evaluate it. And the AI is too fluent to signal its own failures.

<ResearchCallout>
  AI does not eliminate jobs in domains where humans are strong. It exposes people who were already weak.
  If you are bad at something, AI will make that painfully obvious. If you are good at something, AI
  will make you terrifyingly effective. That is not anti-human. That is pro-competence.
</ResearchCallout>

</ResearchSection>

<ResearchSection number={3} title="The Jagged Frontier and the Invisible Cliff">

The BCG/Harvard study introduced the concept of the "jagged technological frontier" —
the boundary between tasks AI handles well and tasks where it actively degrades human performance. The
frontier is jagged because superficially similar tasks can fall on different sides of it. A consultant
can use GPT-4 to brainstorm market entry strategies (inside the frontier) and then use it to evaluate
whether a specific acquisition target's financial structure is sound (outside the frontier). The
second task looks similar to the first. It requires the same tool, the same interface, the same
confidence from the model. But the consultant using AI for it is 19 percentage points less likely
to get it right than if they had done it alone.

This is the invisible cliff. There is no warning label. The AI does not announce "I am now
operating outside my capabilities." It generates the same fluent, confident, well-structured
response regardless of whether the content is correct. The only people who can see the cliff are those
with enough domain expertise to recognize when the AI has crossed it.

<ResearchTable
  caption="Table 3. The Jagged Frontier: Inside vs. Outside AI Capabilities"
  columns={[
    { label: 'Task Type' },
    { label: 'Inside Frontier?', align: 'center' },
    { label: 'AI Effect on Quality', align: 'right', mono: true },
    { label: 'Risk for Non-Experts', muted: true },
  ]}
  rows={[
    ['Creative ideation', 'Yes', '+40%', 'Low -- errors are visible'],
    ['Market sizing & analysis', 'Yes', '+25%', 'Medium -- plausible-sounding errors'],
    ['Data table generation', 'Yes', '+30%', 'Medium -- hallucinated numbers pass inspection'],
    ['Business strategy (novel)', 'No', '-19%', 'High -- confident but wrong recommendations'],
    ['Causal reasoning', 'No', '-15%', 'Very high -- logical errors masked by fluency'],
    ['Domain-specific compliance', 'No', '-22%', 'Critical -- wrong answers indistinguishable from right'],
  ]}
  footnote="Source: Adapted from Dell'Acqua et al. (2023), 'Navigating the Jagged Technological Frontier,' Harvard Business School."
  compact
/>

The implications are severe. Among the 758 consultants working on 18 realistic consulting tasks, the
researchers found two distinct behavioral patterns. "Centaurs" strategically divided work
between themselves and the AI, delegating frontier-interior tasks and handling frontier-exterior tasks
manually. "Cyborgs" integrated AI into every step of their workflow, blending human and
machine effort continuously. Both patterns worked — but only when the user could accurately
identify which side of the frontier each task fell on. Users who could not make this distinction
performed worse with AI than without it.
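
To make the centaur pattern concrete, here is a minimal TypeScript sketch of the division of labor
the study describes. The task shape and function names are illustrative assumptions, not artifacts
of the study; the `insideFrontier` flag is the crux, because it is supplied by human judgment and
everything downstream depends on it being right.

```typescript
// Illustrative sketch of the "centaur" workflow: strategic task division.
// The frontier check is a human judgment call -- the exact capability the
// jagged-frontier study found non-experts lack.

interface Task {
  description: string;
  insideFrontier: boolean; // supplied by domain expertise, not by the model
}

function centaurRoute(task: Task): "delegate-to-ai" | "do-manually" {
  // Inside the frontier: delegate to AI and review. Outside: work manually,
  // because AI assistance degrades quality there (the 19-point penalty).
  return task.insideFrontier ? "delegate-to-ai" : "do-manually";
}

const tasks: Task[] = [
  { description: "Brainstorm market entry strategies", insideFrontier: true },
  { description: "Evaluate an acquisition target's financial structure", insideFrontier: false },
];

for (const t of tasks) {
  console.log(`${t.description} -> ${centaurRoute(t)}`);
}
```

Note what the sketch cannot contain: a reliable automatic test for `insideFrontier`. If that flag
could be computed, the cliff would not be invisible.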

**This is the mechanism behind the "RIP profession" post.** The person
posting it experienced an inside-the-frontier task in someone else's domain and concluded the
whole profession was inside the frontier. They do not see the cliff because they lack the domain
knowledge to see it. Their prompt produced a fluent output, and fluency was their only quality signal.

</ResearchSection>

<ResearchSection number={4} title="The AI Psy-Op: Cognitive Offloading and the Reverse Dunning-Kruger Effect">

The most dangerous aspect of AI is not that it replaces human judgment. It is that it convinces people
they have judgment when they do not. This is not speculation. It is a measured, replicated finding.

In 2025, researchers at Aalto University published "AI makes you smarter but none the wiser"
in *Computers in Human Behavior*. The study had participants solve 20 logical reasoning problems
from the Law School Admission Test, with and without ChatGPT assistance. The results were striking:
actual task performance improved by roughly 3 points when using AI. But self-assessed performance
inflated by 7 points. Users consistently believed they performed better than they actually did —
and the gap was largest for users who self-identified as AI-literate.

<ResearchTable
  caption="Table 4. The Reverse Dunning-Kruger Effect in AI-Assisted Tasks"
  columns={[
    { label: 'Metric' },
    { label: 'Without AI', align: 'right', mono: true },
    { label: 'With AI', align: 'right', mono: true },
    { label: 'Delta', align: 'right', mono: true },
  ]}
  rows={[
    ['Actual task performance (LSAT problems)', '~16/20', '~19/20', '+3 points'],
    ['Self-assessed performance', '~16/20', '~23/20', '+7 points'],
    ['Overconfidence gap', '~0', '+4 points', 'New gap created'],
    ['AI-literate users: overconfidence', '--', 'Higher', 'Reversed D-K effect'],
    ['AI-naive users: overconfidence', '--', 'Lower', 'Traditional D-K effect'],
  ]}
  footnote="Source: Adapted from Welsch & da Silva Fernandes (2025), 'AI makes you smarter but none the wiser,' Computers in Human Behavior."
  compact
/>

This is a reversal of the classic Dunning-Kruger effect. In the traditional formulation, people with
low ability overestimate their competence because they lack the metacognitive tools to evaluate their
own performance. With AI, the effect inverts: **people with higher AI literacy show greater
overconfidence**, not less. The researchers attribute this to "cognitive offloading"
— users delegate the actual reasoning to the AI, engage with the output at a shallow level, and
then attribute the AI's performance to themselves. They "feel smart" because they are
holding smart output, but the feeling is borrowed.
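
The metric at the heart of this finding is simple enough to state in a few lines. A minimal sketch
in TypeScript, using the approximate scores from Table 4 (the function and variable names are ours,
not the study's):

```typescript
// Overconfidence gap = self-assessed score minus actual score.
// Values approximate the Aalto LSAT experiment as summarized in Table 4.

function overconfidenceGap(actual: number, selfAssessed: number): number {
  return selfAssessed - actual;
}

const withoutAI = overconfidenceGap(16, 16); // ~0: calibration intact
const withAI = overconfidenceGap(19, 23);    // ~+4: performance up 3, confidence up 7

console.log({ withoutAI, withAI });
```

The point of writing it down is that the gap is not noise around zero. It is a new, systematic bias
that AI use itself creates, and it is largest among the users most fluent with the tools.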

This is the psy-op. AI will convince you that you are doing great work when you are not. The mechanism
is simple: AI output is fluent, confident, and structurally correct even when it is substantively wrong.
If you do not already have taste, standards, or domain intuition, you have no basis for evaluating the
output. It "sounds right." It "looks professional." It passes every test you
know how to apply — because you do not know the tests that matter.

**Sounding right is not the same as being right.** But distinguishing between the two
requires exactly the domain expertise that the person declaring a profession "ended"
does not have.

<ResearchCallout>
  Experts use AI as an amplifier. Non-experts use AI as a mirror — and mirrors lie when you do
  not know what you are looking for.
</ResearchCallout>

</ResearchSection>

<ResearchSection number={5} title="The Expert's Conclusion: Same Verdict, Different Mechanism">

Here is the subtle part that most commentary misses. When someone who is genuinely good at the
"ended" profession evaluates AI, they often reach a similar-sounding conclusion: "Yes,
this role as it existed is largely over." But the mechanism is entirely different.

The non-expert sees AI doing their weak-domain tasks and concludes the profession is replaced. The
expert sees AI doing their strong-domain tasks and concludes the profession is *transformed*.
The expert's workflow becomes faster, higher-leverage, and more focused on the judgment layer that
AI cannot replicate. They do not feel replaced. They feel liberated from drudgery. A senior developer
using AI does not worry about losing their job to code generation — they worry about the junior
developers who think code generation is the whole job.

Consider the empirical pattern from the BCG study: when experienced consultants used AI inside the
frontier, they completed tasks 25% faster while maintaining or improving quality. They did not do
*less* work. They did *different* work — more synthesis, more judgment, more
client-facing communication. The AI handled the commodity labor. The human handled everything the AI
could not.

This is why the expert's conclusion sounds similar but is fundamentally different. The non-expert
says: "AI can do this job." The expert says: "AI can do the parts of this job that
were never the hard part." The non-expert is announcing that they were the bottleneck. The expert
is announcing that the bottleneck has been removed — and now the real work begins.

<ResearchTable
  caption="Table 5. The Cost of Missing Domain Expertise in AI-Assisted Work"
  columns={[
    { label: 'Scenario' },
    { label: 'Domain Expert', align: 'center' },
    { label: 'Non-Expert', align: 'center' },
  ]}
  rows={[
    ['AI generates a contract clause', 'Spots missing liability cap in 10 sec', 'Ships it -- looks professional'],
    ['AI writes a financial model', 'Catches circular reference on review', 'Presents to investors -- numbers are wrong'],
    ['AI produces a diagnosis protocol', 'Flags contraindication immediately', 'Follows it -- patient harmed'],
    ['AI writes marketing copy', 'Adjusts tone for audience nuance', 'Posts it -- tone-deaf but grammatically perfect'],
    ['AI generates architecture docs', 'Identifies missing failure modes', 'Deploys it -- system goes down at scale'],
  ]}
  footnote="Source: Author analysis of failure modes documented in AI deployment case studies."
  compact
/>

The asymmetry is stark. In every scenario, the domain expert and the non-expert receive the same AI
output. The difference is entirely in what happens next. The expert has a calibrated quality signal.
The non-expert has only the AI's confidence — and AI is always confident.

</ResearchSection>

<ResearchSection number={6} title="Prompt Engineering as Domain Translation">

The "AI multiplier" thesis has a second variable that most people undercount: prompting
skill. Prompting is not a generic skill. It is domain translation — the ability to decompose a
domain problem into sub-components that an AI model can execute on, then evaluate and iterate on the
output with domain-calibrated judgment. Research published in *Frontiers in Education* in 2024
framed prompt engineering as a "new 21st century skill" and found that effective prompting
can improve AI output quality by up to 300% compared to basic prompts.

The BCG study corroborated this at scale. Participants with moderate coding experience — used as
a proxy for structured thinking ability — outperformed self-identified novices by 10 to 20
percentage points across all three experimental tasks, even tasks with zero coding involved. The
researchers concluded that it was not the coding itself that mattered but the "engineering
mindset" it represented: the ability to decompose problems, test sub-components, and iterate
systematically.
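
Read as a workflow, domain translation is a loop: decompose the problem, generate, evaluate against
domain-calibrated criteria, iterate. The TypeScript sketch below is schematic; every helper is a
hypothetical stand-in, and `evaluate` is the step that encodes domain judgment and cannot be
delegated to the model itself.

```typescript
// Schematic of prompting as domain translation. All helpers are hypothetical
// stand-ins for illustration, not a real API.

type Verdict = { ok: boolean; feedback: string };

// Stand-in: an expert decomposes the problem into AI-sized sub-tasks.
const decompose = (problem: string): string[] =>
  problem.split(";").map((s) => s.trim());

// Stand-in for a model call; in practice, an API request.
async function generateWithAI(sub: string, feedback = ""): Promise<string> {
  return `draft for "${sub}"${feedback ? ` (revised: ${feedback})` : ""}`;
}

// Stand-in for domain-calibrated review -- the step non-experts skip.
function evaluate(draft: string): Verdict {
  return draft.includes("revised")
    ? { ok: true, feedback: "" }
    : { ok: false, feedback: "tighten the reasoning; cite the source" };
}

async function domainTranslate(problem: string, maxIterations = 3): Promise<string[]> {
  const results: string[] = [];
  for (const sub of decompose(problem)) {
    let draft = await generateWithAI(sub);
    for (let i = 0; i < maxIterations; i++) {
      const verdict = evaluate(draft); // domain judgment, applied every round
      if (verdict.ok) break;
      draft = await generateWithAI(sub, verdict.feedback); // directed iteration
    }
    results.push(draft);
  }
  return results;
}

domainTranslate("size the market; draft the summary").then(console.log);
```

The non-expert's failure mode, in these terms, is skipping the inner loop entirely: accept the
first draft, because nothing in their experience flags it as wrong.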

This creates a two-variable multiplier matrix. AI output quality is the product of domain expertise
and prompting skill. Neither variable alone is sufficient. A domain expert who cannot prompt will
under-extract value from AI but will at least be able to evaluate the output. A skilled prompter
without domain expertise will extract fluent, confident, *wrong* output and not know it. The
most dangerous combination is high prompting skill with low domain expertise — because the
output looks indistinguishable from expert work to everyone except an actual expert.

<ResearchTable
  caption="Table 6. The Two-Variable Multiplier: Domain Skill × Prompting Skill"
  columns={[
    { label: 'User Profile' },
    { label: 'Domain Skill' },
    { label: 'Prompt Skill' },
    { label: 'AI Output Quality', muted: true },
    { label: 'Self-Assessment Accuracy', muted: true },
  ]}
  rows={[
    ['Expert + skilled prompter', 'High', 'High', 'Excellent -- catches errors, directs iteration', 'Accurate -- calibrated expectations'],
    ['Expert + naive prompter', 'High', 'Low', 'Moderate -- good judgment, poor extraction', 'Accurate -- knows when output is wrong'],
    ['Novice + skilled prompter', 'Low', 'High', 'Dangerous -- fluent output, no error detection', 'Overconfident -- cannot evaluate output'],
    ['Novice + naive prompter', 'Low', 'Low', 'Poor -- obvious failures', 'Paradoxically safer -- obviously bad output'],
  ]}
  footnote="Source: Author synthesis from BCG/Harvard (2023), Aalto University (2025), and Frontiers in Education (2024)."
  compact
/>

The novice-plus-naive-prompter cell of this matrix is counterintuitively the safest failure mode. A
novice with no prompting skill produces obviously bad output — bad enough that even the producer
can tell something is wrong. The novice-plus-skilled-prompter cell is the trap: a novice who can
prompt well produces output that looks excellent, reads fluently, and is wrong in ways that require
domain expertise to detect. This person will ship the output, present it confidently, and — per the
Aalto University research — rate their own performance as significantly higher than it actually was.
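
One way to see why that cell is the trap is to write the two-variable claim down as a toy model.
This is an illustration of the paper's argument, not a fitted model; the functional forms and
constants are assumptions chosen only to make the asymmetry visible.

```typescript
// Toy model of the two-variable multiplier. Both skills are in [0, 1].
// Assumption: actual quality requires both variables (a product), while
// perceived quality tracks fluency, which prompting skill alone buys.

function actualQuality(domainSkill: number, promptSkill: number): number {
  return domainSkill * promptSkill; // neither variable alone is sufficient
}

function perceivedQuality(domainSkill: number, promptSkill: number): number {
  const fluency = promptSkill; // fluent output is what skilled prompting produces
  const actual = actualQuality(domainSkill, promptSkill);
  // Experts judge by calibrated standards; novices judge by fluency.
  return domainSkill * actual + (1 - domainSkill) * fluency;
}

// The four cells of Table 6:
const cells: Array<[string, number, number]> = [
  ["expert + skilled prompter", 0.9, 0.9],
  ["expert + naive prompter", 0.9, 0.2],
  ["novice + skilled prompter", 0.1, 0.9],
  ["novice + naive prompter", 0.1, 0.2],
];

for (const [label, d, p] of cells) {
  const gap = perceivedQuality(d, p) - actualQuality(d, p); // overconfidence
  console.log(label, { actual: actualQuality(d, p).toFixed(2), overconfidence: gap.toFixed(2) });
}
```

Under these assumed forms, the overconfidence gap peaks in the novice-plus-skilled-prompter cell,
reproducing the qualitative pattern of Table 6 and the Aalto finding that fluency with the tools
inflates self-assessment.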

</ResearchSection>

<ResearchSection number={7} title="The Macro Divergence: Task Compression vs. Labor Polarization">

The micro-level finding — AI compresses inequality for narrow tasks — sits in direct
tension with the macro-level trend: AI is widening the labor market gap. Understanding why requires
separating two distinct phenomena.

At the task level, AI is an equalizer. The Brynjolfsson et al. study showed a 34% productivity gain
for low-skilled customer support agents versus 0% for top performers. Noy and Zhang found similar
compression effects. When the task is procedural — follow a script, apply a template, generate
a known pattern — AI levels the field because the "skill" being multiplied is
primarily pattern-matching, and AI is better at pattern-matching than most humans.

But at the job level, the opposite is happening. McKinsey's November 2025 report found that while
AI and robotics could theoretically automate 57% of U.S. work hours, actual job elimination has been
minimal. Employment held steady at ~62% through 2025. The reason is that jobs are bundles of tasks,
and the tasks AI handles well (inside the frontier) coexist with tasks it handles poorly (outside the
frontier) within the same role. What changes is not whether the job exists but what the job *requires*.
The procedural tasks get automated. The judgment tasks get amplified. The people who were doing mostly
procedural work find their role hollowed out. The people who were doing mostly judgment work find their
productivity multiplied.
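
The mechanism is easy to state as a toy decomposition. Treat a job as a bundle of task hours, let
AI absorb the procedural hours, and watch what remains. The numbers below are illustrative
assumptions, not measurements:

```typescript
// Toy model: a job as a bundle of procedural and judgment task hours.
// Assumption: AI absorbs procedural hours; judgment hours remain human.

interface Role {
  name: string;
  proceduralHours: number;
  judgmentHours: number;
}

function humanHoursAfterAI(role: Role): number {
  return role.judgmentHours; // the procedural layer is automated away
}

const roles: Role[] = [
  { name: "mostly procedural", proceduralHours: 36, judgmentHours: 4 },
  { name: "mostly judgment", proceduralHours: 8, judgmentHours: 32 },
];

for (const r of roles) {
  const remaining = humanHoursAfterAI(r);
  const share = remaining / (r.proceduralHours + r.judgmentHours);
  console.log(`${r.name}: ${remaining}h of a 40h role stays human (${Math.round(share * 100)}%)`);
}
```

The job title survives in both cases; what differs is whether the human contribution shrinks to a
sliver or becomes the whole point of the role.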

The IMF's 2026 staff discussion note quantified this divergence. In AI-exposed sectors, cumulative
growth in revenue per employee nearly quadrupled, from ~7% over 2018–2022 to ~27% over 2018–2024.
But this growth accrued disproportionately to workers who could leverage AI
for judgment-intensive tasks. Early-career workers (ages 22–25) in the most AI-exposed occupations
experienced a 13% relative decline in employment since the release of ChatGPT.

This is the macro divergence. AI compresses inequality at the task level and widens it at the labor
market level. The beneficiaries are not "people who use AI" — everyone uses AI. The
beneficiaries are people who use AI with domain judgment. Everyone else is being compressed into a
shrinking pool of procedural work that AI already does better.

</ResearchSection>

<ResearchSection number={8} title="Framework: The AI Multiplier Matrix">

The preceding analysis suggests a classification framework for how individuals interact with AI —
and what the likely outcomes are. This is not a speculation matrix. Every cell maps to empirical
findings from the studies cited above.

**Axis 1: Domain Expertise (Low → High).** This is the user's pre-existing
skill, judgment, and calibration in the relevant domain. It determines their ability to evaluate AI
output, catch errors, and iterate productively. It is the difference between knowing what good looks
like and accepting whatever the AI produces.

**Axis 2: AI Proficiency (Low → High).** This includes prompting skill, model
selection, workflow integration, and the engineering mindset the BCG study identified. It determines
how much value the user can extract from AI tools and how effectively they can decompose problems
for AI execution.

**Quadrant I: High Domain, High AI (the "Terrifyingly Effective").** These
users extract maximum value from AI while maintaining quality control. They are the centaurs and cyborgs
from the BCG study who correctly identified the jagged frontier. They use AI for inside-frontier tasks
and their own expertise for outside-frontier tasks. Their productivity multiplier is 2–5x. They
are the ones who, from within the "ended" profession, see the role as transformed —
not replaced. They are winning.

**Quadrant II: High Domain, Low AI (the "Underleveraged").** These users have
the judgment but under-extract value from AI tools. They are often skeptics or late adopters. Their
work quality remains high but their productivity does not compound. They are safe from the psy-op —
they can evaluate any AI output — but they are leaving leverage on the table. This is the
"AI can't do what I do" person who may be right about quality but wrong about
efficiency.

**Quadrant III: Low Domain, High AI (the "Confidently Wrong").** This is the
danger zone. These users produce fluent, confident output that they cannot evaluate. They are subject
to the reverse Dunning-Kruger effect documented by Aalto University. They are the ones posting
"RIP [profession]" because AI produced output that looked professional to them —
and they have no way to know it was wrong. They are the most dangerous participants in the AI economy
because their output *looks* indistinguishable from Quadrant I output to anyone who is also
in Quadrant III.

**Quadrant IV: Low Domain, Low AI (the "Obviously Struggling").** Paradoxically,
this is the safest failure mode. These users produce obviously poor output. They know it is poor. Others
know it is poor. There is no psy-op because the output does not pass even basic inspection. These users
benefit the most from AI task-level compression (the Brynjolfsson 34% finding) but only for procedural
work with clear right answers.
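
For completeness, the matrix reduces to a small classification function. A sketch with illustrative
thresholds; the axes are continuous in practice, and the 0.5 cutoffs are an assumption of the
sketch, not of the framework.

```typescript
// The AI Multiplier Matrix as a classifier. Thresholds are illustrative.

type Quadrant =
  | "I: Terrifyingly Effective"
  | "II: Underleveraged"
  | "III: Confidently Wrong"
  | "IV: Obviously Struggling";

function classify(domainExpertise: number, aiProficiency: number): Quadrant {
  const highDomain = domainExpertise >= 0.5;
  const highAI = aiProficiency >= 0.5;
  if (highDomain && highAI) return "I: Terrifyingly Effective";
  if (highDomain) return "II: Underleveraged";
  if (highAI) return "III: Confidently Wrong"; // the danger zone
  return "IV: Obviously Struggling"; // paradoxically the safest failure mode
}

console.log(classify(0.9, 0.2)); // "II: Underleveraged"
console.log(classify(0.2, 0.9)); // "III: Confidently Wrong"
```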

</ResearchSection>

<ResearchSection number={9} title="Implications: The Real Divide">

The evidence converges on a single, uncomfortable conclusion. The divide that AI creates is not between
humans and machines. It is between humans with judgment plus AI and humans without judgment plus AI.
Both groups use the same tools. Both groups produce output. But only one group can tell whether the
output is any good.

**Implication 1: "AI ending professions" is a self-report.** When someone
announces that AI has ended a profession, they are announcing that they were below the frontier in that
domain. The profession is not ended. The procedural layer of the profession is automated. The judgment
layer is amplified. If the person only ever interacted with the procedural layer, the profession
looks "ended" to them. This is a confession, not a prediction.

**Implication 2: AI literacy without domain expertise is actively dangerous.** The reverse
Dunning-Kruger finding is the most underreported result in AI research. Higher AI literacy correlates
with greater overconfidence, not less. This means that "learn to prompt" advice, without
simultaneous domain skill development, produces people who are more confident and less accurate. The
traditional skills pipeline — learn the domain first, then learn the tools — remains the
correct order.

**Implication 3: Quality evaluation is the new bottleneck.** In a world where AI can
generate any output in seconds, the scarce resource is no longer production capacity. It is evaluation
capacity — the ability to look at an AI-generated deliverable and know whether it is good. This
is a purely human skill. It requires domain expertise, calibrated taste, and the metacognitive
awareness to know what you do not know. No amount of prompting skill substitutes for it.

**Implication 4: The market will sort this slowly, then quickly.** In the short term,
Quadrant III practitioners ("Confidently Wrong") are indistinguishable from Quadrant I
practitioners ("Terrifyingly Effective") to clients and employers who lack domain
expertise themselves. This creates a temporary market for low-quality, AI-generated work that appears
professional. Over time, the consequences of bad output accumulate — shipped bugs, wrong
financial models, failed strategies, regulatory violations — and the market corrects. But the
correction will be painful for anyone who built a career on AI-assisted output they could not
actually evaluate.

<ResearchCallout>
  People who talk about AI "ending professions" are usually just announcing that they were
  the bottleneck all along. The profession is not over. The role of "person who does the easy
  part" is over. What replaced it is not AI doing the job. It is humans with judgment plus AI,
  and everyone else being quietly left behind while thinking they are keeping up. That is the
  real divide.
</ResearchCallout>

</ResearchSection>

<ResearchSection number={10} title="Conclusion: Five Principles">

**1. AI is a multiplier, not a judge.** It multiplies whatever you bring — skill,
ignorance, taste, sloppiness. The output quality ceiling is set by the human, not the model.

**2. The jagged frontier is real and invisible to non-experts.** AI capabilities are
unevenly distributed across tasks. The only people who can see the boundary are those with enough
domain expertise to know when AI has crossed it. Everyone else walks off the cliff.

**3. Fluency is not accuracy.** AI output is always confident, always well-structured,
and always fluent — regardless of whether it is correct. This makes non-expert evaluation
nearly impossible and makes expert evaluation the scarcest resource in the AI economy.

**4. The reverse Dunning-Kruger effect is the most underestimated risk in AI adoption.** AI
use inflates self-assessed performance. Higher AI literacy makes this worse, not better. "Learn
to prompt" without "learn the domain" produces confident practitioners who cannot
detect their own errors.

**5. The real divide is judgment, not access.** Everyone has access to the same models.
The differentiator is not who uses AI but who can evaluate what AI produces. Domain expertise, taste,
and metacognitive calibration are the new moats. Everything else is commodity.

</ResearchSection>

<ResearchReferences>

Dell'Acqua, F., McFowland, E., Mollick, E., et al. (2023). "Navigating the Jagged
Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker
Productivity and Quality." *Harvard Business School Working Paper 24-013.*

Welsch, R. & da Silva Fernandes, D. (2025). "AI Makes You Smarter but None the Wiser:
The Disconnect Between Performance and Metacognition." *Computers in Human Behavior,* 2026.

Brynjolfsson, E., Li, D., & Raymond, L. (2023). "Generative AI at Work."
*The Quarterly Journal of Economics,* 140(2), 889–942.

Noy, S. & Zhang, W. (2023). "Experimental Evidence on the Productivity Effects of
Generative Artificial Intelligence." *Science,* 381(6654), 187–192.

OECD. (2025). "Unlocking Productivity with Generative AI: Evidence from Experimental
Studies." *OECD Digital Economy Papers.*

Boston Consulting Group. (2024). "GenAI Doesn't Just Increase Productivity. It Expands
Capabilities." *BCG Global.*

McKinsey Global Institute. (2025). "Agents, Robots, and Us: Skill Partnerships in the Age
of AI." *McKinsey & Company.*

International Monetary Fund. (2026). "Bridging Skill Gaps for the Future: New Jobs Creation
in the AI Age." *IMF Staff Discussion Note SDN/EA/2026/001.*

Penn Wharton Budget Model. (2025). "The Projected Impact of Generative AI on Future
Productivity Growth." *University of Pennsylvania.*

Anthropic. (2025). "Estimating AI Productivity Gains from Claude Conversations."
*Anthropic Research.*

Federiakin, D., Molerov, D., Zlatkin-Troitschanskaia, O., & Maur, A. (2024). "Prompt Engineering
as a New 21st Century Skill." *Frontiers in Education,* 9, 1366434.

Gibreel, O. & Arpaci, I. (2025). "Development and Validation of the Prompt Engineering
Competence Scale (PECS)." *Information Development.*

</ResearchReferences>

<ResearchColophon
  citation={`Baratta, R. (2026). "The AI Multiplier Effect: Why AI Exposes Incompetence Faster Than It Replaces Competence." Buildooor Research Brief, February 2026.`}
  email="buildooor@gmail.com"
/>
