A professional submits a report. It contains a citation to a study that does not exist. The client's researcher finds it. The professional had no framework for understanding why it happened — and no framework means no defense, no correction, and no way to prevent it next time.
This is not a story about a broken tool. It is a story about the gap between AI literacy and AI fluency — and that gap is where professional liability lives.
Literacy is knowing which tools to use. Fluency is understanding the cognitive nature of the entity you are collaborating with: what it does at a superhuman level, what it cannot do at all, and what remains irreducibly yours. The professional who submitted that report was literate. She could prompt, iterate, and generate outputs. What she could not do was evaluate what she was working with — because no one had ever given her a framework for it. She was using very sophisticated autocomplete, and she thought she was using a reasoning partner.
That gap is what this course closes.
AI systems generate outputs through pattern completion, not knowledge retrieval. This is not a subtle distinction. It means that an AI's confidence of expression is structurally uncorrelated with the accuracy of its content. It means the tool has reliability zones — domains near the center of its training distribution where it is genuinely excellent, and edges where it produces fluent, authoritative, wrong outputs with no internal signal that anything has gone wrong. It means that every ambiguity in a prompt is a decision the AI makes without the human — and most professionals issuing prompts have never thought about what decisions they are delegating or whether those decisions are theirs to give.
This course builds that architecture. It then requires you to do original research with it, under real conditions, and to name — specifically and honestly — the judgment calls that required your values, your domain knowledge, or your professional accountability that the AI could not have made on your behalf.
Course information
| Field | Detail |
| --- | --- |
| Course title | Irreducibly Human: What AI Can and Can't Do — Botspeak: The Nine Pillars of AI Fluency |
| Credit hours | 4 |
| Delivery | In-person; lecture/seminar (weekly) + TA-led Mode Lab (weekly) |
| Level | Graduate |
| Prerequisites | None — series entry point. Access to a laptop and at least one AI tool. |
| Instructor | Nik Bear Brown · ni.brown@neu.edu |
| Series | Part of the Irreducibly Human series at Northeastern University — College of Engineering. Botspeak is the series entry point. Companion courses: Conducting AI, Causal Reasoning, AImagineering. |
Who this course is for
This course is for any graduate-level professional who uses or will use AI in their work — and who has never been given a framework for evaluating what they are working with.
What this course assumes
Access to a laptop and at least one AI tool. You have used an AI tool at least once. Nothing else is required. This is the only course in the series with no technical prerequisites — it is designed to be the entry point.
What this course does not assume
Prior coursework in AI, machine learning, computer science, or data science. No technical background. No philosophy or ethics background. No prior prompting experience.
What you will leave with
- A complete AI fluency framework — the Five Modes, the tier taxonomy, the nine pillars — applied to real work in your own domain, with explicit reasoning you can defend in a job interview, a client meeting, or in front of the person who commissioned the work.
- Original research demonstrating fluency under real conditions: a complete research project in AI fluency, with a full iteration log, a verification record, and the one section that cannot be produced by a tool — the honest, specific account of the judgment calls that required your values, your domain knowledge, or your professional accountability.
- The capacity to answer, on demand, the question that separates professionals who use AI well from professionals who use it confidently and incorrectly: what, specifically, did the AI do in this work — and what, specifically, did you do that it could not?
What this course builds
By the end of this course, students can:
- Explain how AI systems generate outputs through pattern completion rather than knowledge retrieval, and predict failure types from this mechanism
- Apply proportional skepticism to AI outputs, calibrating verification depth to stakes, reliability zone, and reversibility
- Locate any cognitive task on the seven-tier Irreducibly Human taxonomy — identifying what the AI can perform and what the human must supply
- Write a complete five-component specification and predict its failure modes before any prompt is written
- Produce a delegation map for a complex task, with explicit tier-location and boundary rationale for every component, addressing the performance paradox
- Conduct adversarial AI conversation using at least two adversarial strategies, producing annotated transcripts that demonstrate intellectual ownership
- Apply a tiered verification protocol to an AI output, returning a structured verdict with domain-specific findings by layer
- Design a Diligence protocol for a deployed AI workflow, specifying monitoring cadence, drift indicators, escalation conditions, and shutdown criteria
- Produce a trust calibration map for a multi-step AI-assisted workflow, identifying the highest-risk compounding step
- Apply the PARU cycle diagnostic to an AI system — classifying its architecture, evaluating its Human Decision Node, and identifying what genuine oversight requires
- Apply adversarial validation to an AI output, identifying failures that ordinary verification is not designed to find
- Execute original research in AI fluency, demonstrating all five modes under real conditions and naming the judgment calls that required human values, domain knowledge, or accountability
How the course is assessed
Every assignment requires an AI Use Disclosure — not as compliance, but as the course's primary assessment instrument. Students document what they used, how they used it, what they changed, and — this field is not optional — what the AI could not do. Specifically: at least one judgment call that required the student's values, domain knowledge, or professional accountability. A disclosure that cannot name one such judgment call has not demonstrated that the student performed the irreducibly human layer. That declaration is the assessment spine of every submission.
The Irreducibly Human section of the final capstone carries 50% of the final project grade. Not because the research is less important — because the honest, specific account of what required human judgment is exactly what the research is for.
Relative grading applies at the top of the scale, comparing students on depth of fluency reasoning and quality of domain judgment. Absolute grading applies below the threshold.
How the course is structured
The course runs in three acts, tracking the arc from literacy to fluency to original practice.
Act One builds the survival vocabulary: what AI systems actually are, how they fail, and why those failures are structural rather than incidental. Two high-stakes failure cases across two chapters establish the pattern before the framework is introduced. The tier taxonomy and the Five Modes are named at the end of Act One — after students have spent three weeks accumulating domain-specific evidence that the framework is necessary. Act One closes with the first Mode Exercise: mapping a real task from the student's own work onto the full framework.
Week 1: What You're Actually Talking To
The course opens with the fabricated citation — no definitions yet. The case is presented so students feel the failure before they have vocabulary to name it. Session B names the mechanism: pattern completion versus knowledge retrieval, and why confidence of expression is structurally uncorrelated with accuracy of content. Reliability zones — the center and the edge of the training distribution — are introduced as the first diagnostic tool.
Reading Response #1 — 30 pts
Week 2: The Confidence Trap
A second high-stakes failure case in a different domain establishes that the Week 1 failure was structural, not incidental. Hallucination, confabulation, and automation bias are defined precisely — three distinct phenomena that the word "AI error" collapses into one. The proportional skepticism protocol is introduced: calibrating verification depth to stakes, reliability zone, and reversibility. The symmetric failure modes — over-trust and under-trust — are given equal weight.
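As a rough illustration of how the protocol's three inputs might combine, here is a minimal Python sketch. The enum values, the scoring rule, and the depth labels are all hypothetical; they stand in for whatever rubric a given domain requires.

```python
# Minimal sketch of proportional skepticism: verification depth as a function
# of stakes, reliability zone, and reversibility. All values and thresholds
# here are illustrative, not the course's official rubric.
from enum import Enum

class Stakes(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3

class Zone(Enum):
    CENTER = 1  # near the center of the training distribution
    EDGE = 2    # near the edge, where fluent-but-wrong output is most likely

def verification_depth(stakes: Stakes, zone: Zone, reversible: bool) -> str:
    """Map the three calibration inputs to a verification depth."""
    score = stakes.value + zone.value + (0 if reversible else 2)
    if score <= 2:
        return "spot-check"        # skim for obvious errors
    if score <= 4:
        return "source-check"      # verify key claims against primary sources
    return "full-verification"     # independent reconstruction before use

# High stakes, edge of the distribution, irreversible: maximum depth.
print(verification_depth(Stakes.HIGH, Zone.EDGE, reversible=False))
```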
Reading Response #2 — 30 pts
Week 3: The Map and the Language
Move 37. AlphaGo versus Lee Sedol, March 10, 2016. Fan Hui goes silent. "It's not a human move." The Irreducibly Human tier taxonomy is introduced through the most famous moment in the history of AI outperforming human experts — not to celebrate the machine, but to name what it reveals about cognitive architecture. The Five Modes are introduced as a pre/during/post temporal architecture. The literacy/fluency distinction is made explicit. Act One closes. The first Mode Exercise requires students to map a real task from their own domain onto the full framework — tier location, modes required, irreducibly human judgment named.
Mode Exercise #1 — 25 pts
Act Two teaches one mode per week. Each mode is introduced through a failure case from a professional domain. Each week's Mode Exercise requires application, not recitation. Act Two closes with the midterm: a novel multi-mode case, no annotation, all five modes applied as practices — not recited, not described. Application only.
Week 4: Specification
Opening case: AI-generated documentation describes a slightly wrong API, because ambiguities in the specification were resolved using training-distribution patterns.
Every ambiguity in a specification is a decision the AI makes without the human. The five-component specification: intent, constraints, success criteria, exclusions, output format. The full prompt pattern toolkit. Specification for agentic contexts, where the stakes of unresolved ambiguity are highest. Students write a complete specification for a described case and name the failure mode each component prevents.
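The five components map naturally onto a data structure. The sketch below is a hypothetical rendering: the field names follow the chapter, but the example contents are invented.

```python
# Hypothetical rendering of the five-component specification as a dataclass.
# Field names follow the chapter; the example contents are invented.
from dataclasses import dataclass

@dataclass
class Specification:
    intent: str              # what the output is for, not just what it is
    constraints: list[str]   # hard limits the AI must not cross
    success_criteria: str    # how a correct output will be recognized
    exclusions: list[str]    # what must not appear in the output
    output_format: str       # the structure the result must take

spec = Specification(
    intent="Summarize the attached contract for a non-lawyer client",
    constraints=["No legal advice", "Under 500 words"],
    success_criteria="Every obligation and deadline in the contract appears",
    exclusions=["Speculation about the parties' motives"],
    output_format="Bulleted list grouped by party",
)
# Each field left vague is an ambiguity the AI resolves on its own,
# which is exactly the failure mode that component exists to prevent.
```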
Mode Exercise #2 — 25 pts · Reading Response #3 — 30 pts
Week 5: Delegation
Opening case: A strategy consultant delegates all research synthesis to AI. The output is comprehensive. She misses the critical competitive dynamic her domain intuition would have caught — but she wasn't in the process to apply that intuition, because she had delegated herself out of it.
This is the performance paradox: better short-term outputs, long-term capability degradation. The four delegation questions. The cognitive offloading distinction — when it amplifies, when it atrophies. Students produce a delegation map with explicit tier-location and boundary rationale for every component.
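One plausible shape for a delegation map is a list of per-component records, as in the sketch below; the tier labels and the example task are illustrative, not drawn from the course materials.

```python
# Illustrative delegation map: one record per task component, each with an
# explicit tier location, a delegation decision, and a boundary rationale.
# Tier labels and the example task are invented for illustration.
delegation_map = [
    {
        "component": "Collect and summarize competitor filings",
        "tier": "pattern retrieval",   # something the AI performs at scale
        "delegate": True,
        "rationale": "Low stakes, easily verified against the sources",
    },
    {
        "component": "Judge which competitive dynamic actually matters",
        "tier": "domain judgment",     # irreducibly human
        "delegate": False,
        "rationale": "Needs intuition the AI cannot supply; delegating it "
                     "removes the human from the step where it is applied",
    },
]
```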
Mode Exercise #3 — 25 pts
Week 6: Conversation
Opening case: A policy researcher develops a polished, internally consistent argument over 45 minutes of AI conversation. The strongest counterargument in her field never appeared. She never asked. Her peer reviewer found it immediately.
The four adversarial strategies — steelman the opposition, edge case probe, assumption surface, devil's advocate role assignment — are introduced as the antidote to sycophantic drift. This week's Mode Lab is mandatory and supervised — the adversarial conversation exercise cannot be performed by reading alone. Students produce annotated transcripts with before/after intellectual positions and the ownership test applied.
Mode Exercise #4 — supervised workshop — 25 pts
Week 7: Discernment
Opening case: The pharmacist at the Human Decision Node — 14-medication patient, compromised renal function, AI interaction check that returned "no significant interactions." Three options: accept, reject, or discern.
This chapter teaches Option 3. The four verification layers — fact, reasoning, framing, omission — and why the omission layer is the one most professionals never reach. The tiered verification protocol: Tier 0 scan through Tier 3 adversarial. Calibration against stakes, reliability zone, and reversibility. This is the densest chapter in the course — Chapters 9 and 11 both build directly on it.
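The sketch below shows one plausible shape for the protocol. The layer and tier names come from the chapter; the verdict rule is an assumption, included only to make the structure concrete.

```python
# Sketch of the tiered verification protocol. Layer and tier names follow
# the chapter; the verdict logic is a hypothetical stand-in.
LAYERS = ["fact", "reasoning", "framing", "omission"]

TIERS = {
    0: "scan",          # quick read for surface errors
    1: "fact-check",    # verify individual claims against sources
    2: "reconstruct",   # independently redo the reasoning
    3: "adversarial",   # actively try to break the output
}

def verdict(findings: dict[str, list[str]]) -> str:
    """Return a structured verdict from per-layer findings."""
    failed = [layer for layer in LAYERS if findings.get(layer)]
    if not failed:
        return "accept"
    if "framing" in failed or "omission" in failed:
        # The layers most professionals never reach are the structural ones.
        return "reject: structural failure in " + ", ".join(failed)
    return "revise: correctable errors in " + ", ".join(failed)

print(verdict({"fact": [], "omission": ["strongest counterargument absent"]}))
```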
Mode Exercise #5 — 25 pts
Week 8: Diligence
Opening case: The Amazon recruiting case — not why the system was biased, but why no one caught it for a year. The accountability chain was intact on paper and absent in practice.
Three forms of AI degradation: model drift, context drift, use case drift. Three ways accountability gets obscured: process laundering, tool diffusion, verification gap. The four-component Diligence protocol. Students design a complete Diligence protocol for a described deployment — monitoring cadence, drift indicators, escalation conditions, shutdown criteria. The Act Two gate follows: the midterm presents a novel multi-mode case with no scaffolding.
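Rendered as a configuration object, the four components might look like the sketch below; every concrete value is invented for illustration.

```python
# The four-component Diligence protocol as a config sketch.
# All concrete values are invented for illustration.
from dataclasses import dataclass

@dataclass
class DiligenceProtocol:
    monitoring_cadence: str           # how often outputs are audited
    drift_indicators: list[str]       # model, context, and use case drift
    escalation_conditions: list[str]  # when a human must be pulled in
    shutdown_criteria: list[str]      # when the system comes offline

protocol = DiligenceProtocol(
    monitoring_cadence="weekly audit of a 20-output sample",
    drift_indicators=[
        "audit error rate rises two consecutive weeks",          # model drift
        "input distribution diverges from deployment baseline",  # context drift
        "tool used for decisions it was never validated for",    # use case drift
    ],
    escalation_conditions=["any drift indicator fires",
                           "a user disputes an output"],
    shutdown_criteria=["error rate exceeds the pre-agreed ceiling"],
)
```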
Mode Exercise #6 — 25 pts · Reading Response #4 — 30 pts · Midterm — 100 pts
Act Three stops giving clean single-mode cases. Trust calibration across compounding multi-step workflows. The PARU cycle and the Human Decision Node. Verification under adversarial conditions. Rapid prototyping as a research method, and the gap that defines it: AI can find a research gap efficiently; it has no basis for evaluating whether the gap is worth filling. The act closes with original research — a complete capstone demonstrating fluency under real conditions.
Week 9: Trust Calibration
Opening case: A financial model with four AI-assisted stages. A rounding convention in Stage 2 compounds through Stage 3 into a trend line wrong by 7% — above the committee's 5% decision threshold. No single step failed enough to trigger the verification protocol.
Error compounding: trust miscalibrations at earlier steps constrain the reliability ceiling of every step that follows. System-trust versus output-trust. Students produce a trust calibration map for a multi-step workflow — appropriate trust level, calibration rationale, and compounding risk for every stage.
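A toy calculation makes the compounding mechanics concrete. The 2% per-stage figure below is illustrative; it is not the case study's number.

```python
# Toy arithmetic for error compounding: small per-stage errors multiply.
stage_error = 0.02   # each stage is "only" 2% off
stages = 4
compounded = (1 + stage_error) ** stages - 1
print(f"compounded error: {compounded:.1%}")  # about 8.2%, past a 5% threshold
# No single stage failed badly enough to trigger verification on its own;
# the workflow failed as a system.
```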
Mode Exercise #7 — 25 pts · Reading Response #5 — 30 pts
Week 10: Automation, Agency, and the Human Decision Node
Move 37 revisited — not "how remarkable" but "how did it find it?" The PARU cycle as the answer: Perceive, Act, Reward, Update. Then immediately: Amazon's recruiting tool. Same surface behavior, structurally different architectures, fundamentally different oversight requirements. The Human Decision Node — the difference between genuine judgment and rubber-stamp approval. Students apply the PARU diagnostic to a described deployment and redesign the Human Decision Node with a specific proposal.
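As a schematic, the PARU cycle is a control loop. The sketch below is a placeholder illustration; `environment` and `policy` are stand-ins, not an implementation of AlphaGo or any real system.

```python
# Schematic of the PARU cycle as a control loop. The objects are placeholders.
def paru_cycle(environment, policy, steps: int) -> None:
    for _ in range(steps):
        state = environment.perceive()         # Perceive: observe the world
        action = policy.act(state)             # Act: choose an intervention
        reward = environment.reward(action)    # Reward: score the outcome
        policy.update(state, action, reward)   # Update: change future behavior

# The Human Decision Node sits outside this loop: nothing inside the loop
# can question the reward signal itself, which is why genuine oversight
# cannot be automated into the cycle.
```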
Mode Exercise #8 — 25 pts
Week 11: Verification Under Adversarial Conditions
Opening case: A Phase II clinical trial succeeds. Phase III fails. The Phase II analysis was technically correct for its data. The patient population was systematically unrepresentative. No facts were hallucinated. The framing failure was invisible to ordinary verification.
Adversarial validation targets the three failure modes that ordinary review is not designed to find: distributional shift, framing failure, and assumption invisibility. Students apply all four adversarial moves to a described case and produce a formal report — what was probed, what was found, and whether it constitutes a structural failure.
Mode Exercise #9 — 25 pts
Week 12: Rapid Prototyping as a Research Method
Opening case: A graduate student develops a beautifully structured research proposal over three weeks of AI-assisted work. Her advisor asks one question: "Why is this gap worth filling?" She cannot answer. The AI found the gap efficiently. It had no basis for evaluating whether the gap was worth pursuing.
Three rapid prototyping principles. AI-assisted literature synthesis — uses and limits. The iteration log as an accountability document. The Research Protocol Checkpoint is the go/no-go gate for the capstone.
Research Protocol Checkpoint — 100 pts
Week 13: Capstone Work Session (Original Research in AI Fluency)
No new instruction. Execution. The instructor's role this week is to ask the questions the AI will not ask: "Why is this question worth answering?" "What would have to be true for this finding not to matter?" Four capstone tracks available — plus an Adaptation Track for students rebuilding a course framework for a non-engineering domain.
Week 14: Capstone Peer Review Session
Structured peer review using the Five Modes rubric — written feedback, not just spoken. Each student reviews one peer's project against all five modes, produces at least one specific finding per mode, and names one question the submitted materials cannot answer. Revision work follows.
Peer Review Checkpoint — 100 pts
Week 15: Capstone Presentations and Final Submission
The terminal deliverable: a complete original research project in AI fluency. Research question, methodology with Five Modes documentation, findings with verification record, full iteration log. And the section that carries half the grade — the Irreducibly Human accounting: three specific judgment calls that required human values, domain knowledge, or accountability; one judgment call that was tried-as-delegation and then reclaimed; and an honest assessment of the collaboration — where the AI was genuinely useful, where it produced confident-sounding noise, and what the student would do differently.
Final Capstone Submission — 250 pts
Mode Lab participation (100 pts) is assessed continuously across all 15 weeks. The lowest-scoring Mode Exercise is dropped — 8 of 9 count toward the final grade. Week 6's adversarial conversation workshop cannot be made up by reading alone — it is the one lab session with no equivalent substitute.