A conductor does not play any instrument.
They hold the whole performance in mind while each section plays its part. They hear the wrong note before the score confirms it. They decide which piece is worth performing and how it should be interpreted. The performance collapses without them even though they produce no sound themselves.
This is what graduate-level AI supervision looks like — and it is the role that every AI integration program currently fails to develop.
Graduate engineers learn to use AI tools. They learn to prompt, to delegate, to verify outputs. They become, by any reasonable measure, competent. And then they encounter a situation where something feels wrong before they can prove it, where the problem they have been handed is the wrong problem, where the tool they are using is producing a result that is accurate, efficient, and pointed in the wrong direction — and they have no framework for what to do. They have learned to play their instrument. Nobody taught them to conduct.
The structural argument this course is built on is not philosophical. It is mathematical: AI systems solve faster than any human, and that gap will not close; it will widen. What widens with it is the solve-verify asymmetry: AI optimizes for the common and the likely. It cannot verify whether its output is grounded in the specific domain reality at hand, cannot reframe a poorly formulated problem, cannot interpret what an accurate output means in a specific human context, and cannot integrate multiple legitimate but conflicting perspectives into a recommendation that someone is accountable for. These are not limitations that better models will eventually overcome. They are structural features of what statistical pattern matching is.
The five supervisory capacities this course develops:
- Plausibility auditing: the judgment that happens before verification, determining whether consulting a source is necessary and what to look for when you do.
- Problem formulation: reframing toward the salient and important before AI engagement, not after.
- Tool orchestration: selecting and sequencing tools for what the procedure requires at this step, with every handoff and trust decision explicit.
- Interpretive judgment: supplying moral and cognitive legitimacy to an AI output, the two types of legitimacy AI does not achieve.
- Executive integration: not sequencing the four prior capacities but holding all four simultaneously toward a unified goal, recognizing when one raises a concern that requires another to re-engage.
Course information
| Course title | Irreducibly Human: What AI Can and Can't Do — Conducting AI: The Five Supervisory Capacities No Algorithm Possesses |
| Credit hours | 4 |
| Delivery | In-person · Lecture/Seminar (weekly) + TA-led Supervision Lab (weekly) |
| Level | Graduate |
| Prerequisite | Botspeak or equivalent AI fluency foundation |
| Instructor | Nik Bear Brown · ni.brown@neu.edu |
| Series | Part of the Irreducibly Human series at Northeastern University — College of Engineering. Companion courses: Causal Reasoning, AImagineering, Ethical Play. Any can be taken after Botspeak. |
Who this course is for
This course is for engineers and AI-adjacent professionals who can operate AI tools fluently but cannot yet explain — rigorously, to a skeptic — why their outputs should be trusted.
What this course assumes
AI tool competence at Botspeak level. You understand the difference between pattern completion and knowledge retrieval. You have used AI tools at specification-and-delegation level. You have encountered the failure mode this course is designed to address: a plausible, confident, consequentially wrong AI output that you did not catch before it mattered.
What this course does not assume
Advanced AI systems knowledge. Prior coursework in metacognition or cognitive science. Experience managing AI-assisted teams — though students in that context will find the framework directly applicable.
What you will leave with
- A complete supervisory analysis of a real AI-assisted professional problem in your own domain — six sections, five capacities demonstrated, every judgment call documented and defended.
- Adversarial audit results: your supervisory analysis submitted to Claude functioning as a Plausibility Auditor — attempting to find what you missed — with your evaluation of every finding and your Gap Account naming the failure mode the auditor could not detect, and why detecting it required human supervisory capacity no prompt can supply.
- A personal account of what has changed: three specific judgment calls you now make that you would have delegated, deferred, or missed at the start of the semester. Named. Specific. The course's closing document.
What this course builds
By the end of this course, students can:
- Define the solve-verify asymmetry and explain why it deepens rather than closes as AI capability scales
- Perform a plausibility audit on an AI output before any verification check — naming the domain-grounded mechanism and what it surfaced
- Formulate a problem independently before AI engagement, produce genuinely distinct reframings, and defend a selection against a constraint set
- Design a multi-tool workflow with every handoff, verification step, and trust decision explicitly documented — and identify which tool to use to audit another's output (a sketch of what such documentation might look like follows this list)
- Produce an interpretive judgment that supplies moral and cognitive legitimacy to an AI output, naming the AI's contribution and the professional's contribution separately and explicitly
- Hold all five capacities simultaneously toward a unified recommendation — recognizing when one raises a concern that requires another to re-engage
- Submit a supervisory analysis to an adversarial AI auditor, evaluate each finding, and name the gap the auditor could not find because finding it required a human
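To make the workflow-documentation outcome above concrete, here is a minimal sketch, in Python, of what an explicitly documented workflow might look like. Everything in it is hypothetical: the step names, tools, and trust rationales are illustrative, and the course does not prescribe this or any particular notation.

```python
from dataclasses import dataclass

@dataclass
class WorkflowStep:
    """One step in an AI-assisted workflow, with its trust decision made explicit."""
    name: str                  # what this step does
    tool: str                  # which tool or actor performs it
    input_from: str | None     # upstream step whose output this one consumes
    verification: str | None   # how the output is checked before the next handoff
    trust_rationale: str       # why that level of trust is warranted

# Hypothetical three-step workflow: one tool's output is audited by a second
# tool with a different failure mode before a human signs the recommendation.
workflow = [
    WorkflowStep("draft_summary", "LLM generation", None,
                 "claims audited by the extraction step below",
                 "fluent but can fabricate plausibly"),
    WorkflowStep("extract_claims", "structured extraction + retrieval", "draft_summary",
                 "each claim checked against source documents",
                 "different failure mode from the generator: omits rather than invents"),
    WorkflowStep("final_memo", "human author", "extract_claims", None,
                 "the human supplies interpretive judgment and signs the result"),
]

def undocumented_handoffs(steps: list[WorkflowStep]) -> list[str]:
    """Flag machine steps whose output crosses a handoff with no verification recorded."""
    return [s.name for s in steps
            if s.tool != "human author" and s.verification is None]

print(undocumented_handoffs(workflow))  # [] -- every machine handoff here is verified
```

The design choice worth noticing is that verification and trust are fields, not afterthoughts: a step cannot be written down without deciding them, which is the point of the orchestration capacity.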
How the course is assessed
The terminal assessment is the Plausibility Audit — a novel adversarial architecture in which Claude functions as the Plausibility Auditor, applying the five-capacity framework to the student's supervisory analysis and looking for undocumented handoffs, defaulted formulations, absent legitimacy accounts, and integration gaps. The student evaluates each finding (genuine failure or false positive) and produces a Gap Account naming what the auditor could not detect. The student whose revised analysis has no genuine findings has demonstrated all five capacities under adversarial conditions.
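Mechanically, the audit is a model call with an adversarial brief. Below is a minimal sketch of what such a harness might look like, assuming the Anthropic Python SDK; the model name and prompt wording are illustrative, not the course's actual instrument.

```python
import anthropic  # assumes the Anthropic Python SDK and ANTHROPIC_API_KEY in the environment

# Illustrative auditor brief built from the four failure classes named above.
AUDITOR_PROMPT = """You are a Plausibility Auditor. Apply the five-capacity framework
adversarially to the supervisory analysis you are given. Look for four classes of
failure: undocumented handoffs, defaulted problem formulations, absent legitimacy
accounts, and integration sections that sequence rather than integrate.
For each finding, quote the passage and explain what is missing."""

def run_plausibility_audit(analysis_text: str) -> str:
    """Submit a supervisory analysis to Claude acting as an adversarial auditor."""
    client = anthropic.Anthropic()
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # illustrative model choice
        max_tokens=2000,
        system=AUDITOR_PROMPT,
        messages=[{"role": "user", "content": analysis_text}],
    )
    return response.content[0].text  # findings the student must then evaluate
```

The sketch also shows where the division of labor sits: a prompt can name the failure classes, but classifying each finding as genuine or false positive, and naming what the auditor could not find, is exactly the work the framework reserves for the student.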
Relative grading applies at the top of the scale, comparing students on supervisory depth and on the specificity of the judgment calls they identify. Absolute grading applies below that threshold.
How the course is structured
The course runs in three acts, organized around the conductor metaphor as a learning arc.
Act One opens with five cases, presented with no framework and no vocabulary, each an output that is plausible, confident, and consequentially wrong. After the fifth case comes a single question: what was missing in every one of these situations? The answer, the conductor, is earned rather than given. Act One builds the three-level diagnostic and ends with the five capacities named but not yet developed, and with the student's personal inventory of their own AI practice, a self-assessment returned in Week 15 to document what has changed.
The Conductorless Orchestra
Five cases, no framework. An engineer, a physician, a lawyer, a financial analyst, a logistics manager — each using AI tools competently, each a failure the tools did not cause and the human did not catch. After the fifth case, one question. The conductor metaphor earns its place rather than being declared. The Gardnerian Gap follows: Howard Gardner catalogued the instruments human minds can play; nobody named the metacognitive capacity to direct them toward a unified purpose. In the age of AI, every instrument has been augmented. The conductor has not. Students produce a personal case inventory from their own practice — the self-assessment returned in Week 15 to document what has changed.
Reading Response #1 — 30 pts

The Solve-Verify Asymmetry
The structural argument the course rests on: AI solves faster than any human, and verification remains irreducibly human — not because AI is limited but because verification requires exactly the capacities AI cannot structurally supply. This chapter removes the comfort that better models will close the gap. It also removes the threat: the human who verifies in a high-velocity AI workflow is doing the consequential work, not the diminished work. The asymmetry deepens as AI capability scales. This is the course's thesis, stated and defended before any capacity is introduced.
Reading Response #2 — 30 pts

Three Levels of AI Usage
The diagnostic chapter. Copy editor (Level I), research assistant (Level II), supercollaborator (Level III) — three qualitatively distinct levels with three qualitatively distinct supervisory demands. A professional operating at Level III with Level I supervisory habits has a specific and diagnosable gap. Students produce a personal AI usage inventory classifying their current practice by level and identifying the supervisory gap at each stage. This document is the starting point for the Week 15 reflection.
Supervision Lab Exercise #1 — 25 pts

Act Two develops one capacity per two-week sequence — a framework week followed by an application week. Each capacity is introduced through a case that makes its absence felt before its structure is named. Act Two closes with the supervisory analysis: a complete six-section document that carries through the adversarial audit.
Capacity 1: Plausibility Auditing
The ARC puzzle: GPT-4 scores in the 90th percentile on the bar exam yet cannot solve a puzzle a five-year-old can. Statistical likelihood across a training corpus is not the same as a grounded model of how the world works. Plausibility auditing is defined precisely through what it is not — not verification, not fact-checking, not error detection. It is the judgment that happens before any of these. Two cognitive mechanisms: domain-grounded pattern recognition and anomaly detection against a mental model of the domain. Application week opens with the assessment instrument before any instruction — five AI outputs, audit first.
Reading Response #3 — 30 pts
Supervision Lab Exercise #2 — 25 pts
Supervision Lab Exercise #3 — 25 pts

Capacity 2: Problem Formulation
The Semmelweis case: the formulation that saves lives was not the computationally tractable one. AI optimizes for the common and likely. Humans must reframe toward the salient and important. The co-evolutionary model — problem and solution develop together through reflective practice — explains why handing problem definition to an AI system is not a delegation but an abdication. Primary generators: the conceptual anchors that shape how a practitioner first approaches a problem, often without their awareness. Application week requires three genuinely distinct reframings — not variations on a theme — evaluated against a constraint set, one defended.
Reading Response #4 — 30 pts
Supervision Lab Exercise #4 — 25 pts
Supervision Lab Exercise #5 — 25 pts

Capacity 3: Tool Orchestration
A surgeon selects instruments based on what the procedure requires at this step — not on what is already in their hand. The AI capability stack mapped: language generation, structured extraction, retrieval-augmented generation, multi-step reasoning, agentic execution. Verifiability-first engineering as a design principle: verification is a first-class objective, not an afterthought. Application week opens with a flawed workflow presented before any instruction. Every handoff documented, every trust decision made explicit. The sequencing principle: use one tool to audit another's output, selecting audit tools with different failure modes so they catch each other's blind spots.
Reading Response #5 — 30 pts
Supervision Lab Exercise #6 — 25 pts
Supervision Lab Exercise #7 — 25 pts

Capacity 4: Interpretive Judgment
The same chord in two contexts — a nursery rhyme and a funeral march. Identical notes. The conductor is the source of the difference. The three legitimacy types: pragmatic (does it work efficiently), moral (is it aligned with human values), cognitive (is it transparent and trustworthy). AI achieves pragmatic legitimacy readily. Humans must supply the other two. Application week requires producing Memo B — the same AI analysis rewritten to name what the AI produced, supply the moral and cognitive legitimacy account, and write the recommendation the professional would sign their name to.
Supervision Lab Exercise #8 — 25 pts
Supervision Lab Exercise #9 — 25 pts

Capacity 5: Executive Integration
A quarterback who can throw, read defenses, manage the clock, and lead a huddle — but who has never done all four simultaneously in a fourth-quarter drive. Executive integration is not sequencing the four prior capacities. It is holding all four simultaneously toward a unified goal — recognizing when one raises a concern that requires another to re-engage. The weave: head (problem formulation), hand (tool orchestration), heart (interpretive judgment), spirit (plausibility auditing). The urban planning case: four AI tools, four technically correct outputs, no integration, a wrong performance. Application week opens with the assignment specification. No case. No warm-up. The supervisory analysis — six sections, 2,500 to 4,000 words — is the Act Two deliverable that carries into the adversarial audit.
Supervisory Analysis First Submission — 100 pts

Act Three is the dress rehearsal and the performance itself: structured peer critique applying the five-capacity framework, then the Plausibility Audit.
The Dress Rehearsal
Peer critique applies the five-capacity framework as a structured evaluation instrument — specific evidence required for every finding. The student evaluates the peer critique (which findings are valid, which are misreadings) and produces a revised analysis with documented reasoning for each change accepted or rejected. Before submitting to the Plausibility Auditor, the student predicts the three most likely audit findings. That prediction is graded against the auditor's actual findings. The student who has addressed all three common failure modes before submission will find the auditor looking for something more specific — which is precisely where the Week 15 learning happens.
Peer Critique and Revised Analysis — 100 pts

The Plausibility Audit
The chapter opens with the audit submission — the student submits before reading any content. Claude functions as Plausibility Auditor, applying the five-capacity framework adversarially: looking for undocumented problem formulation handoffs, orchestration steps without verification, interpretive judgment that diagnoses legitimacy gaps without filling them, and integration sections that sequence rather than integrate. The student evaluates each finding — genuine failure or false positive — with reasoning. Then the Gap Account: a written account of at least one failure mode the auditor could not detect, with a specific explanation of why detecting it required human supervisory judgment that no prompt can supply. The course closes with the Week 1 personal case inventory returned: three specific judgment calls the student now makes that they would have delegated, deferred, or missed. Named. Specific.
Final Submission — Plausibility Audit, Gap Account, and Closing Reflection — 250 pts

Supervision Lab participation (100 pts) is assessed continuously across all 15 weeks. The lowest-scoring Lab Exercise is dropped; 8 of 9 count toward the final grade.
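For readers reconciling point totals, a short sketch of the drop-lowest arithmetic, assuming the assessments listed above are the complete gradebook (an assumption; the syllabus does not state a total):

```python
# Illustrative tally under the assumption that the listed assessments are exhaustive.
lab_scores = [25, 22, 25, 18, 25, 25, 20, 25, 25]   # nine Lab Exercises, 25 pts each (example scores)
lab_total = sum(sorted(lab_scores)[1:])             # drop the lowest; best 8 of 9 count (max 200)

components = {
    "Reading Responses (5 at 30 pts)": 150,
    "Lab Exercises (best 8 of 9 at 25 pts)": 200,
    "Supervision Lab participation": 100,
    "Supervisory Analysis First Submission": 100,
    "Peer Critique and Revised Analysis": 100,
    "Final Submission (Audit, Gap Account, Reflection)": 250,
}
print(lab_total)                 # 192 for the example scores above
print(sum(components.values()))  # 900 total points under these assumptions
```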