Irreducibly Human Series · Prerequisite: Botspeak · Northeastern University · College of Engineering

Conducting AI: The Five Supervisory Capacities No Algorithm Possesses

Irreducibly Human: What AI Can and Can't Do
Research Report | March 2026
The conductor

A conductor does not play any instrument.

They hold the whole performance in mind while each section plays its part. They hear the wrong note before the score confirms it. They decide which piece is worth performing and how it should be interpreted. The performance collapses without them even though they produce no sound themselves.

This is what graduate-level AI supervision looks like — and it is the role that every AI integration program currently fails to develop.

Graduate engineers learn to use AI tools. They learn to prompt, to delegate, to verify outputs. They become, by any reasonable measure, competent. And then they encounter a situation where something feels wrong before they can prove it, where the problem they have been handed is the wrong problem, where the tool they are using is producing a result that is accurate, efficient, and pointed in the wrong direction — and they have no framework for what to do. They have learned to play their instrument. Nobody taught them to conduct.

The structural argument this course is built on is not philosophical. It is mathematical. AI systems solve faster than any human. That gap will not close — it will widen. What will not change is the solve-verify asymmetry: AI optimizes for the common and the likely. It cannot verify whether its output is grounded in the specific domain reality at hand, cannot reframe a poorly formulated problem, cannot interpret what an accurate output means in a specific human context, and cannot integrate multiple legitimate but conflicting perspectives into a recommendation that someone is accountable for. These are not limitations that better models will eventually close. They are structural features of what statistical pattern matching is.

As AI capability scales, the human supervisory role does not diminish. It becomes more consequential. The conductor's five capacities — plausibility auditing, problem formulation, tool orchestration, interpretive judgment, executive integration — are not soft skills. They are the decisive professional differentiator in AI-assisted knowledge work, and they are now assessable under adversarial conditions. This course builds them and proves it.
1 · Plausibility Auditing

The judgment that happens before verification — determining whether consulting a source is necessary and what to look for when you do.

2 · Problem Formulation

Reframing toward the salient and important before AI engagement — not after.

3 · Tool Orchestration

Selecting and sequencing tools for what the procedure requires at this step, with every handoff and trust decision explicit.

4 · Interpretive Judgment

Supplying moral and cognitive legitimacy to an AI output — the two types AI does not achieve.

5 · Executive Integration

Not sequencing four capacities but holding all four simultaneously toward a unified goal — recognizing when one raises a concern that requires another to re-engage.

Course information

Course title Irreducibly Human: What AI Can and Can't Do — Conducting AI: The Five Supervisory Capacities No Algorithm Possesses
Credit hours 4
Delivery In-person | Lecture/Seminar (weekly) + TA-led Supervision Lab (weekly)
Level Graduate
Prerequisite Botspeak or equivalent AI fluency foundation
Instructor Nik Bear Brown · ni.brown@neu.edu
Series Part of the Irreducibly Human series at Northeastern University — College of Engineering. Companion courses: Causal Reasoning, AImagineering, Ethical Play. Any can be taken after Botspeak.

Who this course is for

This course is for engineers and AI-adjacent professionals who can operate AI tools fluently but cannot yet explain — rigorously, to a skeptic — why their outputs should be trusted.

What this course assumes

AI tool competence at Botspeak level. You understand the difference between pattern completion and knowledge retrieval. You have used AI tools at specification-and-delegation level. You have encountered the failure mode this course is designed to address: a plausible, confident, consequentially wrong AI output that you did not catch before it mattered.

What this course does not assume

Advanced AI systems knowledge. Prior coursework in metacognition or cognitive science. Experience managing AI-assisted teams — though students in that context will find the framework directly applicable.

A note for students with strong technical AI backgrounds: The gap this course addresses is not a technical gap. Evaluating whether an AI output should be trusted is not a harder version of generating it — it is a different cognitive operation, one that requires domain knowledge and situated judgment that no tool possesses. Students who approach the five capacities as genuinely new terrain will get the most from this course.

What you will leave with

  1. A complete supervisory analysis of a real AI-assisted professional problem in your own domain — six sections, five capacities demonstrated, every judgment call documented and defended.
  2. Adversarial audit results: your supervisory analysis submitted to Claude functioning as a Plausibility Auditor — attempting to find what you missed — with your evaluation of every finding and your Gap Account naming the failure mode the auditor could not detect, and why detecting it required human supervisory capacity no prompt can supply.
  3. A personal account of what has changed: three specific judgment calls you now make that you would have delegated, deferred, or missed at the start of the semester. Named. Specific. The course's closing document.

What this course builds

By the end of this course, students can:

  1. Audit an AI output for plausibility before verification, determining whether consulting a source is necessary and what to look for when they do.
  2. Reframe a problem toward the salient and important before AI engagement, not after.
  3. Select and sequence tools for what the procedure requires at each step, with every handoff and trust decision explicit.
  4. Supply the moral and cognitive legitimacy account for an AI output, the two types AI does not achieve.
  5. Hold all four prior capacities simultaneously toward a unified goal, recognizing when one raises a concern that requires another to re-engage.

How the course is assessed

Every deliverable requires the Assessment Spine: a written statement naming at least one judgment call that required the student's values, domain knowledge, or accountability — that an AI could not have made on their behalf. Not optional. Not a reflection prompt. The course's proof-of-concept, repeated across every submission.

The terminal assessment is the Plausibility Audit — a novel adversarial architecture in which Claude functions as the Plausibility Auditor, applying the five-capacity framework to the student's supervisory analysis and looking for undocumented handoffs, defaulted formulations, absent legitimacy accounts, and integration gaps. The student evaluates each finding (genuine failure or false positive) and produces a Gap Account naming what the auditor could not detect. The student whose revised analysis has no genuine findings has demonstrated all five capacities under adversarial conditions.
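The audit-and-evaluate loop described above can be sketched as a small data model. This is an illustrative sketch only: the type names, field names, and the pass rule are assumptions for exposition, not the course's official rubric.

```python
from dataclasses import dataclass
from enum import Enum, auto

class Capacity(Enum):
    """The five supervisory capacities the auditor checks against."""
    PLAUSIBILITY_AUDITING = auto()
    PROBLEM_FORMULATION = auto()
    TOOL_ORCHESTRATION = auto()
    INTERPRETIVE_JUDGMENT = auto()
    EXECUTIVE_INTEGRATION = auto()

@dataclass
class Finding:
    capacity: Capacity   # which capacity the auditor flags
    description: str     # what the auditor claims is missing
    genuine: bool        # student's ruling: genuine failure or false positive
    reasoning: str       # student's documented reasoning for that ruling

def passes_audit(findings: list[Finding]) -> bool:
    """A revised analysis demonstrates all five capacities under adversarial
    conditions when no finding is ruled a genuine failure."""
    return not any(f.genuine for f in findings)
```

The point of the model is the ruling step: the auditor only produces findings; the student supplies the judgment that classifies each one.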

Relative grading applies at the top of the scale, comparing students on supervisory depth and specificity of judgment call identification. Absolute grading applies below the threshold.

How the course is structured

The course runs in three acts, organized around the conductor metaphor as a learning arc.

Act One — The Conductorless Orchestra · Weeks 1–3

Act One opens with five cases — no framework, no vocabulary. An engineer, a physician, a lawyer, a financial analyst, a logistics manager — each using AI tools competently, each producing an output that is plausible, confident, and consequentially wrong, each a failure the tools did not cause and the human did not catch. After the fifth case, a single question: what was missing in every one of these situations? The answer — the conductor — is earned rather than given. Act One builds the three-level diagnostic and ends with the five capacities named but not yet developed, and the student's personal AI usage inventory — a self-assessment returned in Week 15 to document what has changed.

Week 1

The Conductorless Orchestra

Five cases, no framework. An engineer, a physician, a lawyer, a financial analyst, a logistics manager — each using AI tools competently, each a failure the tools did not cause and the human did not catch. After the fifth case, one question. The conductor metaphor earns its place rather than being declared. The Gardnerian Gap follows: Gardner catalogued the instruments human minds can play; nobody named the metacognitive capacity to direct them toward a unified purpose. In the age of AI, every instrument has been augmented. The conductor has not. Students produce a personal case inventory from their own practice — the self-assessment returned in Week 15 to document what has changed.

Reading Response #1 — 30 pts
Week 2

The Solve-Verify Asymmetry

The structural argument the course rests on: AI solves faster than any human, and verification remains irreducibly human — not because AI is limited but because verification requires exactly the capacities AI cannot structurally supply. This chapter removes the comfort that better models will close the gap. It also removes the threat: the human who verifies in a high-velocity AI workflow is doing the consequential work, not the diminished work. The asymmetry deepens as AI capability scales. This is the course's thesis, stated and defended before any capacity is introduced.

Reading Response #2 — 30 pts
Week 3

Three Levels of AI Usage

The diagnostic chapter. Copy editor (Level I), research assistant (Level II), supercollaborator (Level III) — three qualitatively distinct levels with three qualitatively distinct supervisory demands. A professional operating at Level III with Level I supervisory habits has a specific and diagnosable gap. Students produce a personal AI usage inventory classifying their current practice by level and identifying the supervisory gap at each stage. This document is the starting point for the Week 15 reflection.

Supervision Lab Exercise #1 — 25 pts
Act Two — The Five Capacities · Weeks 4–13

Act Two develops one capacity per two-week sequence — a framework week followed by an application week. Each capacity is introduced through a case that makes its absence felt before its structure is named. Act Two closes with the supervisory analysis: a complete six-section document that carries into the adversarial audit.

Weeks 4–5

Capacity 1: Plausibility Auditing

The ARC puzzle: GPT-4 scores at the 90th percentile on medical licensing exams and cannot solve a puzzle a five-year-old can. Statistical likelihood across a training corpus is not the same as a grounded model of how the world works. Plausibility auditing is defined precisely through what it is not — not verification, not fact-checking, not error detection. It is the judgment that happens before any of these. Two cognitive mechanisms: domain-grounded pattern recognition and anomaly detection against a mental model of the domain. Application week opens with the assessment instrument before any instruction — five AI outputs, audit first.

Reading Response #3 — 30 pts · Supervision Lab Exercise #2 — 25 pts · Supervision Lab Exercise #3 — 25 pts
Weeks 6–7

Capacity 2: Problem Formulation

The Semmelweis case: the formulation that saves lives was not the computationally tractable one. AI optimizes for the common and likely. Humans must reframe toward the salient and important. The co-evolutionary model — problem and solution develop together through reflective practice — explains why handing problem definition to an AI system is not a delegation but an abdication. Primary generators: the conceptual anchors that shape how a practitioner first approaches a problem, often without their awareness. Application week requires three genuinely distinct reframings — not variations on a theme — evaluated against a constraint set, one defended.

Reading Response #4 — 30 pts · Supervision Lab Exercise #4 — 25 pts · Supervision Lab Exercise #5 — 25 pts
Weeks 8–9

Capacity 3: Tool Orchestration

A surgeon selects instruments based on what the procedure requires at this step — not on what is already in their hand. The AI capability stack mapped: language generation, structured extraction, retrieval-augmented generation, multi-step reasoning, agentic execution. Verifiability-first engineering as a design principle: verification is a first-class objective, not an afterthought. Application week opens with a flawed workflow presented before any instruction. Every handoff documented, every trust decision made explicit. The sequencing principle: use one tool to audit another's output, selecting audit tools with different failure modes so they catch each other's blind spots.
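The sequencing principle above can be sketched in a few lines. Everything here is illustrative: the function name, the two checks, and their pass conditions are invented for exposition, not part of the course material. The design point is that each audit check has a different blind spot, so one can catch what another misses.

```python
from collections.abc import Callable

def cross_audit(output: str, auditors: dict[str, Callable[[str], bool]]) -> list[str]:
    """Run independent audit checks with different failure modes over one output.
    Returns the names of every check that flagged a problem."""
    return [name for name, passes in auditors.items() if not passes(output)]

# Two illustrative checks with different blind spots: one catches missing
# citations, the other catches unhedged numeric claims. Either one can miss
# exactly what the other catches.
auditors = {
    "has_citation": lambda text: "[" in text and "]" in text,
    "hedges_numbers": lambda text: not any(c.isdigit() for c in text)
                                   or "approximately" in text,
}
```

A sentence like "Throughput rose 40%." would pass neither check, while adding a citation and a hedge clears both — the same cross-checking shape the chapter applies to pairs of AI tools.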

Reading Response #5 — 30 pts · Supervision Lab Exercise #6 — 25 pts · Supervision Lab Exercise #7 — 25 pts
Weeks 10–11

Capacity 4: Interpretive Judgment

The same chord in two contexts — a nursery rhyme and a funeral march. Identical notes. The conductor is the source of the difference. The three legitimacy types: pragmatic (does it work efficiently), moral (is it aligned with human values), cognitive (is it transparent and trustworthy). AI achieves pragmatic legitimacy readily. Humans must supply the other two. Application week requires producing Memo B — the same AI analysis rewritten to name what the AI produced, supply the moral and cognitive legitimacy account, and write the recommendation the professional would sign their name to.

Supervision Lab Exercise #8 — 25 pts · Supervision Lab Exercise #9 — 25 pts
Weeks 12–13

Capacity 5: Executive Integration

A quarterback who can throw, read defenses, manage the clock, and lead a huddle — but who has never done all four simultaneously in a fourth-quarter drive. Executive integration is not sequencing the four prior capacities. It is holding all four simultaneously toward a unified goal — recognizing when one raises a concern that requires another to re-engage. The weave: head (problem formulation), hand (tool orchestration), heart (interpretive judgment), spirit (plausibility auditing). The urban planning case: four AI tools, four technically correct outputs, no integration, a wrong performance. Application week opens with the assignment specification. No case. No warm-up. The supervisory analysis — six sections, 2,500 to 4,000 words — is the Act Two deliverable that carries into the adversarial audit.

Supervisory Analysis First Submission — 100 pts
Act Three — The Full Performance · Weeks 14–15

Act Three is the dress rehearsal and the performance itself. Peer critique applies the five-capacity framework as a structured evaluation instrument — not general feedback. The Plausibility Audit follows. The student who has addressed all three common failure modes before submission will find the auditor looking for something more specific — which is precisely where the final learning happens.

Week 14

The Dress Rehearsal

Peer critique applies the five-capacity framework as a structured evaluation instrument — specific evidence required for every finding. The student evaluates the peer critique (which findings are valid, which are misreadings) and produces a revised analysis with documented reasoning for each change accepted or rejected. Before submitting to the Plausibility Auditor, the student predicts the three most likely audit findings. That prediction is graded against the auditor's actual findings. The student who has addressed all three common failure modes before submission will find the auditor looking for something more specific — which is precisely where the Week 15 learning happens.

Peer Critique and Revised Analysis — 100 pts
Week 15

The Plausibility Audit

The chapter opens with the audit submission — the student submits before reading any content. Claude functions as Plausibility Auditor, applying the five-capacity framework adversarially: looking for undocumented problem formulation handoffs, orchestration steps without verification, interpretive judgment that diagnoses legitimacy gaps without filling them, and integration sections that sequence rather than integrate. The student evaluates each finding — genuine failure or false positive — with reasoning. Then the Gap Account: a written account of at least one failure mode the auditor could not detect, with a specific explanation of why detecting it required human supervisory judgment that no prompt can supply. The course closes with the Week 1 personal case inventory returned: three specific judgment calls the student now makes that they would have delegated, deferred, or missed. Named. Specific.

Final Submission — Plausibility Audit, Gap Account, and Closing Reflection — 250 pts

Supervision Lab participation (100 pts) is assessed continuously across all 15 weeks. The lowest-scoring Lab Exercise is dropped — 8 of 9 count toward the final grade.

Irreducibly Human: What AI Can and Can't Do — Conducting AI · Graduate seminar · College of Engineering · Northeastern University · 4 credit hours · Prerequisite: Botspeak or equivalent · Instructor: Nik Bear Brown · ni.brown@neu.edu