Abstract
The dominant narrative around AI and employment follows a predictable arc: a new model launches, social media declares a profession "ended," and the same people who celebrate the death of one field insist their own is immune. This paper argues that this framing reveals more about the speaker than the technology. Drawing on experimental evidence from the Harvard/BCG jagged frontier study (n=758), Aalto University's reverse Dunning-Kruger research, and an OECD meta-analysis of 14 productivity experiments, we demonstrate that AI functions as a multiplier of existing human skill, judgment, and domain expertise, not a replacement for it. The data show that AI compresses inequality for narrow, well-defined tasks while dramatically widening the gap for judgment-intensive work. Most critically, AI introduces a novel metacognitive hazard: users who lack domain expertise systematically overestimate the quality of AI-assisted output, creating a class of confidently wrong practitioners who cannot detect their own errors. The real divide is not between humans and AI. It is between humans with judgment plus AI, and everyone else being quietly left behind while thinking they are keeping up.
1. The Feed Pattern: A Diagnostic, Not a Take
There is a specific pattern that recurs on every social platform whenever a new AI model launches. It follows a rigid template: "RIP [profession I was already bad at]. [New model] just ended the field. Luckily, AI is still terrible at [profession I happen to be good at]." The person posting this believes they are offering commentary on the technology. They are actually making a confession about their own skill distribution and, more importantly, about their inability to prompt effectively.
This is not a rhetorical flourish. It is an empirically testable claim. Harvard Business School and Boston Consulting Group's landmark 2023 study, "Navigating the Jagged Technological Frontier," demonstrated that AI capability covers an uneven, jagged set of tasks. Inside the frontier, GPT-4 users completed 12.2% more tasks, 25.1% faster, with 40% higher quality. Outside the frontier, those same users performed 19% worse than people with no AI access at all. The frontier is not a clean line. It is jagged, domain-specific, and, crucially, invisible to people who lack the domain expertise to know where it falls.
This means that when someone declares a profession "ended," they are revealing that they experienced the inside of the frontier for tasks they were already weak at. The AI made their weakness painfully obvious by doing it better. And when they claim AI "can't do" their own job, they may be right — or they may simply be better at prompting within their domain of expertise, producing results that confirm their existing competence. Either way, the statement is a mirror, not a measurement.
| Claim Pattern | What It Reveals | Research Basis |
|---|---|---|
| “AI replaced X profession” | Speaker was below the frontier in X | BCG/Harvard jagged frontier study |
| “AI can’t do what I do” | Speaker may be right -- or may lack prompting skill | Frontiers prompt engineering research |
| “AI makes everyone equal” | True for narrow tasks; false for judgment-heavy work | Brynjolfsson et al. compression finding |
| “I use AI and I’m more productive” | Likely true -- but metacognitive accuracy unknown | Aalto/Welsch reverse D-K study |
| “AI output is just as good as human” | Speaker likely cannot distinguish quality | Hallucination detection research |
2. The Multiplier Thesis: Skill In, Skill Out
The core thesis of this paper is simple: AI is a multiplier, not a replacement. It multiplies whatever you bring to the interaction — your domain knowledge, your judgment, your taste, your prompting ability. If you bring expertise, AI amplifies it. If you bring ignorance, AI amplifies that too, but wraps it in fluent, confident prose that makes it harder to detect.
The evidence for this is now substantial. Across 14 experimental studies reviewed by the OECD in 2025, AI-assisted workers showed an average productivity gain of 24%. But the variance is enormous. In the Brynjolfsson et al. study of 5,179 customer support agents, low-skilled workers gained 34% productivity while top performers gained effectively 0%. In the Noy and Zhang experiment with college-educated professionals, AI reduced completion time by 40% and raised quality by 18% — but the gains were concentrated among bottom-half performers. The BCG/Harvard study found 40% quality improvements inside the frontier, and a 19% quality penalty outside it.
| Study | Population | Productivity Gain | Key Finding |
|---|---|---|---|
| Harvard/BCG (2023) | 758 BCG consultants | +40% quality | Inside frontier tasks only; 19% worse outside frontier |
| Noy & Zhang (2023) | College-educated professionals | +40% speed | Bottom-half performers gained most; inequality compressed |
| Brynjolfsson et al. (2023) | 5,179 customer support agents | +14% average | Low-skilled workers gained 34%; top performers gained 0% |
| OECD Meta-Analysis (2025) | 14 experimental studies | +24% average | Effects vary drastically by task type and user skill |
| Anthropic (2025) | Claude conversation analysis | ~80% time saved | ~90 min tasks completed in ~18 min with AI |
| Penn Wharton (2025) | US labor force projection | +1.8% annually | Aggregate GDP growth; conditional on broad adoption |
These findings create an apparent paradox. If AI helps the weakest performers the most, shouldn't it be a great equalizer? The answer is: only for narrow, well-defined tasks with clear right answers. Customer support has scripts. LSAT questions have correct answers. Business writing has templates. For these tasks, AI does compress inequality because the "skill" being multiplied is largely procedural — following a known pattern correctly.
But for judgment-intensive work — strategy, diagnosis, architecture, creative direction, legal reasoning — the multiplier effect cuts the other way. Domain experts use AI as a force multiplier for their existing taste and judgment. They prompt better because they know what good output looks like. They catch errors because they have calibrated expectations. They iterate productively because they know which dimension of the output to push on. Non-experts do none of these things. They accept the first output. They cannot evaluate it. And the AI is too fluent to signal its own failures.
AI does not eliminate jobs in domains where humans are strong. It exposes people who were already weak. If you are bad at something, AI will make that painfully obvious. If you are good at something, AI will make you terrifyingly effective. That is not anti-human. That is pro-competence.
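The two regimes can be made concrete with a toy model. Nothing below is drawn from the cited studies; the functional forms and numbers are illustrative assumptions, chosen only to show how one tool can compress gaps on ceiling-limited procedural work while widening them on judgment work:

```python
# Toy model (illustrative assumptions, not data from the cited studies):
# procedural work has a quality ceiling the AI itself can hit, so baseline
# skill barely matters; judgment work multiplies whatever the user brings.

def procedural_output(skill: float, ai_ceiling: float = 0.8) -> float:
    """AI lifts everyone to roughly its own ceiling on scripted tasks."""
    return max(skill, ai_ceiling)

def judgment_output(skill: float, leverage: float = 2.0) -> float:
    """AI multiplies the user's evaluation ability on open-ended tasks."""
    return skill * leverage

for label, skill in [("novice (0.3)", 0.3), ("expert (0.9)", 0.9)]:
    print(f"{label}: procedural {procedural_output(skill):.2f}, "
          f"judgment {judgment_output(skill):.2f}")

# novice (0.3): procedural 0.80, judgment 0.60
# expert (0.9): procedural 0.90, judgment 1.80
# The procedural gap shrinks from 0.60 to 0.10 (compression); the judgment
# gap doubles from 0.60 to 1.20 (widening). Same tool, opposite effects.
```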
3. The Jagged Frontier and the Invisible Cliff
The BCG/Harvard study introduced the concept of the "jagged technological frontier": the boundary between tasks AI handles well and tasks where it actively degrades human performance. The frontier is jagged because superficially similar tasks can fall on different sides of it. A consultant can use GPT-4 to brainstorm market entry strategies (inside the frontier) and then use it to evaluate whether a specific acquisition target's financial structure is sound (outside the frontier). The second task looks similar to the first. It uses the same tool and the same interface, and it elicits the same confidence from the model. But the output is 19% more likely to be wrong than if the consultant had done it alone.
This is the invisible cliff. There is no warning label. The AI does not announce "I am now operating outside my capabilities." It generates the same fluent, confident, well-structured response regardless of whether the content is correct. The only people who can see the cliff are those with enough domain expertise to recognize when the AI has crossed it.
| Task Type | Inside Frontier? | AI Effect on Quality | Risk for Non-Experts |
|---|---|---|---|
| Creative ideation | Yes | +40% | Low -- errors are visible |
| Market sizing & analysis | Yes | +25% | Medium -- plausible-sounding errors |
| Data table generation | Yes | +30% | Medium -- hallucinated numbers pass inspection |
| Business strategy (novel) | No | -19% | High -- confident but wrong recommendations |
| Causal reasoning | No | -15% | Very high -- logical errors masked by fluency |
| Domain-specific compliance | No | -22% | Critical -- wrong answers indistinguishable from right |
The implications are severe. Among the 758 consultants working on 18 realistic consulting tasks, the researchers found two distinct behavioral patterns. "Centaurs" strategically divided work between themselves and the AI, delegating frontier-interior tasks and handling frontier-exterior tasks manually. "Cyborgs" integrated AI into every step of their workflow, blending human and machine effort continuously. Both patterns worked, but only when the user could accurately identify which side of the frontier each task fell on. Users who could not make this distinction performed worse with AI than without it.
This is the mechanism behind the "RIP profession" post. The person posting it experienced an inside-the-frontier task in someone else's domain and concluded the whole profession was inside the frontier. They do not see the cliff because they lack the domain knowledge to see it. Their prompt produced a fluent output, and fluency was their only quality signal.
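Why that distinction is decisive can be put in a few lines of Python. This is a sketch under invented magnitudes (only the sign of the inside/outside effects follows the study): a centaur who can see the frontier routes tasks correctly, while a frontier-blind user, for whom every output looks equally fluent, routes everything to the AI.

```python
# Toy routing model (invented magnitudes; only the +/- directionality
# follows the BCG/Harvard result). Quality is relative to a solo-human
# baseline of 1.0: AI helps inside the frontier, hurts outside it.

def quality(inside_frontier: bool, use_ai: bool) -> float:
    if not use_ai:
        return 1.0
    return 1.40 if inside_frontier else 0.81

def centaur(inside: bool) -> bool:         # delegate only inside-frontier tasks
    return inside

def frontier_blind(inside: bool) -> bool:  # fluency looks good, so: always AI
    return True

def mean_quality(policy, inside_share: float) -> float:
    """Average quality over a task mix with `inside_share` inside the frontier."""
    return (inside_share * quality(True, policy(True))
            + (1 - inside_share) * quality(False, policy(False)))

for share in (0.7, 0.3):
    print(f"inside-frontier share {share:.0%}: "
          f"centaur {mean_quality(centaur, share):.3f}, "
          f"blind {mean_quality(frontier_blind, share):.3f}")

# inside-frontier share 70%: centaur 1.280, blind 1.223
# inside-frontier share 30%: centaur 1.120, blind 0.987
# On a judgment-heavy mix, the frontier-blind user lands below the no-AI
# baseline of 1.0 -- the "worse with AI than without" result.
```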
4. The AI Psy-Op: Cognitive Offloading and the Reverse Dunning-Kruger Effect
The most dangerous aspect of AI is not that it replaces human judgment. It is that it convinces people they have judgment when they do not. This is not speculation. It is a measured, replicated finding.
In 2025, researchers at Aalto University published "AI makes you smarter but none the wiser" in Computers in Human Behavior. The study had participants solve 20 logical reasoning problems from the Law School Admission Test, with and without ChatGPT assistance. The results were striking: actual task performance improved by roughly 3 points when using AI. But self-assessed performance inflated by 7 points. Users consistently believed they performed better than they actually did — and the gap was largest for users who self-identified as AI-literate.
| Metric | Without AI | With AI | Delta |
|---|---|---|---|
| Actual task performance (LSAT problems) | ~16/20 | ~19/20 | +3 points |
| Self-assessed performance | ~16/20 | ~23/20 | +7 points |
| Overconfidence gap | ~0 | +4 points | New gap created |
| AI-literate users: overconfidence | -- | Higher | Reversed D-K effect |
| AI-naive users: overconfidence | -- | Lower | Traditional D-K effect |
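In calibration terms, using the rounded table values, the finding is that AI use manufactures an overconfidence gap that did not exist before. Writing $S$ for self-assessed score and $A$ for actual score:

$$
\Delta_{\text{overconfidence}} = (S_{\text{AI}} - A_{\text{AI}}) - (S_{\text{solo}} - A_{\text{solo}}) \approx (23 - 19) - (16 - 16) = +4
$$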
This is a reversal of the classic Dunning-Kruger effect. In the traditional formulation, people with low ability overestimate their competence because they lack the metacognitive tools to evaluate their own performance. With AI, the effect inverts: people with higher AI literacy show greater overconfidence, not less. The researchers attribute this to "cognitive offloading" — users delegate the actual reasoning to the AI, engage with the output at a shallow level, and then attribute the AI's performance to themselves. They "feel smart" because they are holding smart output, but the feeling is borrowed.
This is the psy-op. AI will convince you that you are doing great work when you are not. The mechanism is simple: AI output is fluent, confident, and structurally correct even when it is substantively wrong. If you do not already have taste, standards, or domain intuition, you have no basis for evaluating the output. It "sounds right." It "looks professional." It passes every test you know how to apply — because you do not know the tests that matter.
Sounding right is not the same as being right. But distinguishing between the two requires exactly the domain expertise that the person declaring a profession "ended" does not have.
Experts use AI as an amplifier. Non-experts use AI as a mirror — and mirrors lie when you do not know what you are looking for.
5. The Expert's Conclusion: Same Verdict, Different Mechanism
Here is the subtle part that most commentary misses. When someone who is genuinely good at the "ended" profession evaluates AI, they often reach a similar-sounding conclusion: "Yes, this role as it existed is largely over." But the mechanism is entirely different.
The non-expert sees AI doing their weak-domain tasks and concludes the profession is replaced. The expert sees AI doing their strong-domain tasks and concludes the profession is transformed. The expert's workflow becomes faster, higher-leverage, and more focused on the judgment layer that AI cannot replicate. They do not feel replaced. They feel liberated from drudgery. A senior developer using AI does not worry about losing their job to code generation — they worry about the junior developers who think code generation is the whole job.
Consider the empirical pattern from the BCG study: when experienced consultants used AI inside the frontier, they completed tasks 25% faster while maintaining or improving quality. They did not do less work. They did different work — more synthesis, more judgment, more client-facing communication. The AI handled the commodity labor. The human handled everything the AI could not.
This is why the expert's conclusion sounds similar but is fundamentally different. The non-expert says: "AI can do this job." The expert says: "AI can do the parts of this job that were never the hard part." The non-expert is announcing that they were the bottleneck. The expert is announcing that the bottleneck has been removed — and now the real work begins.
| Scenario | Domain Expert | Non-Expert |
|---|---|---|
| AI generates a contract clause | Spots missing liability cap in 10 sec | Ships it -- looks professional |
| AI writes a financial model | Catches circular reference on review | Presents to investors -- numbers are wrong |
| AI produces a diagnosis protocol | Flags contraindication immediately | Follows it -- patient harmed |
| AI writes marketing copy | Adjusts tone for audience nuance | Posts it -- tone-deaf but grammatically perfect |
| AI generates architecture docs | Identifies missing failure modes | Deploys it -- system goes down at scale |
The asymmetry is stark. In every scenario, the domain expert and the non-expert receive the same AI output. The difference is entirely in what happens next. The expert has a calibrated quality signal. The non-expert has only the AI's confidence — and AI is always confident.
6. Prompt Engineering as Domain Translation
The "AI multiplier" thesis has a second variable that most people undercount: prompting skill. Prompting is not a generic skill. It is domain translation — the ability to decompose a domain problem into sub-components that an AI model can execute on, then evaluate and iterate on the output with domain-calibrated judgment. Research published in Frontiers in Education in 2024 framed prompt engineering as a "new 21st century skill" and found that effective prompting can improve AI output quality by up to 300% compared to basic prompts.
The BCG study corroborated this at scale. Participants with moderate coding experience — used as a proxy for structured thinking ability — outperformed self-identified novices by 10 to 20 percentage points across all three experimental tasks, even tasks with zero coding involved. The researchers concluded that it was not the coding itself that mattered but the "engineering mindset" it represented: the ability to decompose problems, test sub-components, and iterate systematically.
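What "domain translation" looks like in practice can be sketched as follows. The helper `ask(prompt)` is a hypothetical stand-in for whatever chat-model call you use, and the acceptance checks are crude substring proxies; the pattern is the point: decompose the problem into sub-tasks, then gate each step on a check that only domain knowledge can supply.

```python
# Sketch of prompting as domain translation. `ask(prompt)` is a hypothetical
# stand-in for any chat-model call; the structure is what matters: decompose,
# then gate each step on a domain-calibrated check (crude proxies here).

def review_liability_clause(clause_text: str, ask) -> list[str]:
    steps = [
        # (sub-prompt, acceptance check standing in for expert judgment)
        ("List every obligation this clause places on each party:\n" + clause_text,
         lambda out: "obligation" in out.lower()),
        ("Identify any liability caps or carve-outs the clause creates or omits.",
         lambda out: "cap" in out.lower()),
        ("Draft redlines adding limitation-of-liability language where missing.",
         lambda out: len(out.strip()) > 0),
    ]
    outputs = []
    for prompt, acceptable in steps:
        out = ask(prompt)
        if not acceptable(out):              # judgment as a gate, not an afterthought
            out = ask("That answer missed the key issue. Retry:\n" + prompt)
        outputs.append(out)
    return outputs

# The naive alternative ("review this contract") collapses all three steps
# into one prompt, leaving the model to decide what matters -- and leaving
# the novice with no way to notice what it skipped.
```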
This creates a two-variable multiplier matrix. AI output quality is the product of domain expertise and prompting skill. Neither variable alone is sufficient. A domain expert who cannot prompt will under-extract value from AI but will at least be able to evaluate the output. A skilled prompter without domain expertise will extract fluent, confident, wrong output and not know it. The most dangerous combination is high prompting skill with low domain expertise — because the output looks indistinguishable from expert work to everyone except an actual expert.
| User Profile | Domain Skill | Prompt Skill | AI Output Quality | Self-Assessment Accuracy |
|---|---|---|---|---|
| Expert + skilled prompter | High | High | Excellent -- catches errors, directs iteration | Accurate -- calibrated expectations |
| Expert + naive prompter | High | Low | Moderate -- good judgment, poor extraction | Accurate -- knows when output is wrong |
| Novice + skilled prompter | Low | High | Dangerous -- fluent output, no error detection | Overconfident -- cannot evaluate output |
| Novice + naive prompter | Low | Low | Poor -- obvious failures | Paradoxically safer -- obviously bad output |
The last row of this matrix, the novice with no prompting skill, is counterintuitively the safest failure mode. Such a user produces obviously bad output, bad enough that even the producer can tell something is wrong. The third row is the trap: a novice who can prompt well produces output that looks excellent, reads fluently, and is wrong in ways that require domain expertise to detect. This person will ship the output, present it confidently, and, per the Aalto University research, rate their own performance as significantly higher than it actually was.
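The matrix's core asymmetry can be sketched numerically. The functional forms below are assumptions invented for illustration (nothing here is measured): polish is driven by prompting skill, while the rate of errors that survive review is gated only by domain skill.

```python
# Toy model of the two-variable multiplier (invented functional forms).
# draft_polish: how professional the output LOOKS (prompting-driven).
# shipped_error_rate: what actually goes out the door (domain-gated).

def draft_polish(prompt_skill: float) -> float:
    return 0.5 + 0.5 * prompt_skill             # fluent models polish anything

def shipped_error_rate(domain_skill: float, base_errors: float = 0.30) -> float:
    return base_errors * (1.0 - domain_skill)   # only domain skill catches errors

profiles = {
    "expert + skilled prompter": (0.9, 0.9),
    "expert + naive prompter":   (0.9, 0.1),
    "novice + skilled prompter": (0.1, 0.9),    # the trap
    "novice + naive prompter":   (0.1, 0.1),
}
for name, (domain, prompt) in profiles.items():
    print(f"{name}: looks {draft_polish(prompt):.2f}, "
          f"ships {shipped_error_rate(domain):.0%} errors")

# The novice+skilled row looks nearly as polished as the expert rows (0.95)
# while shipping nearly as many errors as the novice+naive row (27%) --
# polish and correctness have come fully apart.
```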
7. The Macro Divergence: Task Compression vs. Labor Polarization
The micro-level finding — AI compresses inequality for narrow tasks — sits in direct tension with the macro-level trend: AI is widening the labor market gap. Understanding why requires separating two distinct phenomena.
At the task level, AI is an equalizer. The Brynjolfsson et al. study showed a 34% productivity gain for low-skilled customer support agents versus 0% for top performers. Noy and Zhang found similar compression effects. When the task is procedural — follow a script, apply a template, generate a known pattern — AI levels the field because the "skill" being multiplied is primarily pattern-matching, and AI is better at pattern-matching than most humans.
But at the job level, the opposite is happening. McKinsey's November 2025 report found that while AI and robotics could theoretically automate 57% of U.S. work hours, actual job elimination has been minimal. Employment held steady at ~62% through 2025. The reason is that jobs are bundles of tasks, and the tasks AI handles well (inside the frontier) coexist with tasks it handles poorly (outside the frontier) within the same role. What changes is not whether the job exists but what the job requires. The procedural tasks get automated. The judgment tasks get amplified. The people who were doing mostly procedural work find their role hollowed out. The people who were doing mostly judgment work find their productivity multiplied.
The IMF's 2026 staff discussion note quantified this divergence. AI-exposed sectors saw growth in revenue per employee nearly quadruple, from roughly 7% over 2018–2022 to roughly 27% over 2018–2024. But this growth accrued disproportionately to workers who could leverage AI for judgment-intensive tasks. Early-career workers (ages 22–25) in the most AI-exposed occupations experienced a 13% relative decline in employment since the release of ChatGPT.
This is the macro divergence. AI compresses inequality at the task level and widens it at the labor market level. The beneficiaries are not "people who use AI" — everyone uses AI. The beneficiaries are people who use AI with domain judgment. Everyone else is being compressed into a shrinking pool of procedural work that AI already does better.
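A bundle-level toy model, with parameters invented purely for illustration, shows how the same tool produces compression at the task level and polarization at the job level:

```python
# Toy bundle model (invented parameters): a job is a mix of procedural and
# judgment tasks. AI automates the procedural share for everyone, leaving
# thin residual value, and amplifies the judgment share in proportion to
# the judgment skill the worker brings.

def post_ai_job_value(procedural_share: float, judgment_skill: float) -> float:
    judgment_share = 1.0 - procedural_share
    procedural_value = procedural_share * 0.2          # automated away
    judgment_value = judgment_share * (1.0 + 2.0 * judgment_skill)
    return procedural_value + judgment_value

for role, proc in [("procedural-heavy role", 0.8), ("judgment-heavy role", 0.2)]:
    for label, skill in [("low judgment", 0.2), ("high judgment", 0.9)]:
        print(f"{role}, {label}: {post_ai_job_value(proc, skill):.2f}")

# procedural-heavy, low judgment : 0.44  <- hollowed out
# procedural-heavy, high judgment: 0.72
# judgment-heavy,   low judgment : 1.16
# judgment-heavy,   high judgment: 2.28  <- multiplied
```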
8. Framework: The AI Multiplier Matrix
The preceding analysis suggests a classification framework for how individuals interact with AI — and what the likely outcomes are. This is not a speculation matrix. Every cell maps to empirical findings from the studies cited above.
Axis 1: Domain Expertise (Low → High). This is the user's pre-existing skill, judgment, and calibration in the relevant domain. It determines their ability to evaluate AI output, catch errors, and iterate productively. It is the difference between knowing what good looks like and accepting whatever the AI produces.
Axis 2: AI Proficiency (Low → High). This includes prompting skill, model selection, workflow integration, and the engineering mindset the BCG study identified. It determines how much value the user can extract from AI tools and how effectively they can decompose problems for AI execution.
Quadrant I: High Domain, High AI (the "Terrifyingly Effective"). These users extract maximum value from AI while maintaining quality control. They are the centaurs and cyborgs from the BCG study who correctly identified the jagged frontier. They use AI for inside-frontier tasks and their own expertise for outside-frontier tasks. Their productivity multiplier is 2–5x. They are the ones who, from within the "ended" profession, see the role as transformed — not replaced. They are winning.
Quadrant II: High Domain, Low AI (the "Underleveraged"). These users have the judgment but under-extract value from AI tools. They are often skeptics or late adopters. Their work quality remains high but their productivity does not compound. They are safe from the psy-op — they can evaluate any AI output — but they are leaving leverage on the table. This is the "AI can't do what I do" person who may be right about quality but wrong about efficiency.
Quadrant III: Low Domain, High AI (the "Confidently Wrong"). This is the danger zone. These users produce fluent, confident output that they cannot evaluate. They are subject to the reverse Dunning-Kruger effect documented by Aalto University. They are the ones posting "RIP [profession]" because AI produced output that looked professional to them — and they have no way to know it was wrong. They are the most dangerous participants in the AI economy because their output looks indistinguishable from Quadrant I output to anyone who is also in Quadrant III.
Quadrant IV: Low Domain, Low AI (the "Obviously Struggling"). Paradoxically, this is the safest failure mode. These users produce obviously poor output. They know it is poor. Others know it is poor. There is no psy-op because the output does not pass even basic inspection. These users benefit the most from AI task-level compression (the Brynjolfsson 34% finding) but only for procedural work with clear right answers.
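Encoded as data, the framework reduces to a two-bit lookup. The sketch below simply restates this section's four quadrants; the labels and outcome notes come from the text above, and the 0.5 thresholds are arbitrary.

```python
# Direct encoding of the Section 8 matrix. Thresholds are arbitrary; the
# quadrant labels and outcome notes come straight from the text above.

def quadrant(domain_expertise: float, ai_proficiency: float) -> tuple[str, str]:
    high_domain = domain_expertise >= 0.5
    high_ai = ai_proficiency >= 0.5
    if high_domain and high_ai:
        return ("I. Terrifyingly Effective", "2-5x multiplier; sees the frontier")
    if high_domain:
        return ("II. Underleveraged", "quality intact; leverage left on the table")
    if high_ai:
        return ("III. Confidently Wrong", "fluent output; no error detection")
    return ("IV. Obviously Struggling", "poor output, but visibly poor")

print(quadrant(0.9, 0.2))  # ('II. Underleveraged', ...)
print(quadrant(0.2, 0.9))  # ('III. Confidently Wrong', ...)
```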
9. Implications: The Real Divide
The evidence converges on a single, uncomfortable conclusion. The divide that AI creates is not between humans and machines. It is between humans with judgment plus AI and humans without judgment plus AI. Both groups use the same tools. Both groups produce output. But only one group can tell whether the output is any good.
Implication 1: "AI ending professions" is a self-report. When someone announces that AI has ended a profession, they are announcing that they were below the frontier in that domain. The profession is not ended. The procedural layer of the profession is automated. The judgment layer is amplified. If the person only ever interacted with the procedural layer, the profession looks "ended" to them. This is a confession, not a prediction.
Implication 2: AI literacy without domain expertise is actively dangerous. The reverse Dunning-Kruger finding is the most underreported result in AI research. Higher AI literacy correlates with greater overconfidence, not less. This means that "learn to prompt" advice, without simultaneous domain skill development, produces people who are more confident and less accurate. The traditional skills pipeline — learn the domain first, then learn the tools — remains the correct order.
Implication 3: Quality evaluation is the new bottleneck. In a world where AI can generate any output in seconds, the scarce resource is no longer production capacity. It is evaluation capacity — the ability to look at an AI-generated deliverable and know whether it is good. This is a purely human skill. It requires domain expertise, calibrated taste, and the metacognitive awareness to know what you do not know. No amount of prompting skill substitutes for it.
Implication 4: The market will sort this slowly, then quickly. In the short term, Quadrant III practitioners ("Confidently Wrong") are indistinguishable from Quadrant I practitioners ("Terrifyingly Effective") to clients and employers who lack domain expertise themselves. This creates a temporary market for low-quality, AI-generated work that appears professional. Over time, the consequences of bad output accumulate — shipped bugs, wrong financial models, failed strategies, regulatory violations — and the market corrects. But the correction will be painful for anyone who built a career on AI-assisted output they could not actually evaluate.
People who talk about AI "ending professions" are usually just announcing that they were the bottleneck all along. The profession is not over. The role of "person who does the easy part" is over. What replaced it is not AI doing the job. It is humans with judgment plus AI, and everyone else being quietly left behind while thinking they are keeping up. That is the real divide.
10. Conclusion: Five Principles
1. AI is a multiplier, not a judge. It multiplies whatever you bring — skill, ignorance, taste, sloppiness. The output quality ceiling is set by the human, not the model.
2. The jagged frontier is real and invisible to non-experts. AI capabilities are unevenly distributed across tasks. The only people who can see the boundary are those with enough domain expertise to know when AI has crossed it. Everyone else walks off the cliff.
3. Fluency is not accuracy. AI output is always confident, always well-structured, and always fluent — regardless of whether it is correct. This makes non-expert evaluation nearly impossible and makes expert evaluation the scarcest resource in the AI economy.
4. The reverse Dunning-Kruger effect is the most underestimated risk in AI adoption. AI use inflates self-assessed performance. Higher AI literacy makes this worse, not better. "Learn to prompt" without "learn the domain" produces confident practitioners who cannot detect their own errors.
5. The real divide is judgment, not access. Everyone has access to the same models. The differentiator is not who uses AI but who can evaluate what AI produces. Domain expertise, taste, and metacognitive calibration are the new moats. Everything else is commodity.
References
Dell'Acqua, F., McFowland, E., Mollick, E., et al. (2023). "Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality." Harvard Business School Working Paper 24-013.
Welsch, R. & da Silva Fernandes, D. (2025). "AI Makes You Smarter but None the Wiser: The Disconnect Between Performance and Metacognition." Computers in Human Behavior (2026 issue; published online 2025).
Brynjolfsson, E., Li, D., & Raymond, L. (2023). "Generative AI at Work." The Quarterly Journal of Economics, 140(2), 889–942.
Noy, S. & Zhang, W. (2023). "Experimental Evidence on the Productivity Effects of Generative Artificial Intelligence." Science, 381(6654), 187–192.
OECD. (2025). "Unlocking Productivity with Generative AI: Evidence from Experimental Studies." OECD Digital Economy Papers.
Boston Consulting Group. (2024). "GenAI Doesn't Just Increase Productivity. It Expands Capabilities." BCG Global.
McKinsey Global Institute. (2025). "Agents, Robots, and Us: Skill Partnerships in the Age of AI." McKinsey & Company.
International Monetary Fund. (2026). "Bridging Skill Gaps for the Future: New Jobs Creation in the AI Age." IMF Staff Discussion Note SDN/EA/2026/001.
Penn Wharton Budget Model. (2025). "The Projected Impact of Generative AI on Future Productivity Growth." University of Pennsylvania.
Anthropic. (2025). "Estimating AI Productivity Gains from Claude Conversations." Anthropic Research.
Frontiers in Education. (2024). "Prompt Engineering as a New 21st Century Skill." Frontiers in Education, 9, 1366434.
Gibreel, O. & Arpaci, I. (2025). "Development and Validation of the Prompt Engineering Competence Scale (PECS)." Information Development.
Suggested citation: Baratta, R. (2026). "The AI Multiplier Effect: Why AI Exposes Incompetence Faster Than It Replaces Competence." Buildooor Research Brief, February 2026.
Correspondence: buildooor@gmail.com