Irreducibly Human Series · Northeastern University · College of Engineering

Causal Reasoning

Irreducibly Human: What AI Can and Can't Do
Research Report | March 2026
The broken assumption

Every Monday morning meeting in every data-driven organization in the world is running on the same broken assumption: that the question is answered when the dashboard is read.

The dashboard told you what happened. It did not tell you why. It certainly did not tell you what would happen if you did something about it. The data was accurate. The model was useless — not because it was wrong, but because it was answering the wrong question. Prediction assumes the future resembles the past. Strategy is the act of making the future different from the past. These two activities require different tools, and almost no one building causal AI systems today is teaching engineers to supply the one tool those systems cannot build themselves.

That tool is the causal model. And the reason it cannot be built by the machine is mathematical, not provisional.

Multiple distinct causal structures can be perfectly consistent with the same statistical data. Three graphs — a chain where A causes B causes C, a fork where B causes both A and C, a collider where A and C both cause B — are statistically indistinguishable from observational evidence alone. The discovery algorithms return this ambiguity honestly: directed edges where the data can say something, undirected edges where it cannot. What resolves the undirected edges is domain knowledge. The engineer who says "I know from operating this system for fifteen years that Y causes Z, not the other way around" is providing something no dataset contains. Not because the dataset is too small. Not because the algorithm is insufficiently sophisticated. Because the data cannot answer the question being asked.
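The indistinguishability claim can be checked directly by simulation. The sketch below (variables and coefficients are invented for illustration) generates data from a chain, a fork, and a collider, then computes the marginal and partial correlations between A and C. The chain and the fork carry the same conditional-independence signature; only the collider differs, and it differs in the counterintuitive direction.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

def partial_corr(x, y, z):
    """Correlation of x and y after linearly regressing out z."""
    rx = x - np.polyval(np.polyfit(z, x, 1), z)
    ry = y - np.polyval(np.polyfit(z, y, 1), z)
    return np.corrcoef(rx, ry)[0, 1]

# Chain: A -> B -> C
a = rng.normal(size=n)
b = a + rng.normal(size=n)
c = b + rng.normal(size=n)
chain = (np.corrcoef(a, c)[0, 1], partial_corr(a, c, b))

# Fork: A <- B -> C
b = rng.normal(size=n)
a = b + rng.normal(size=n)
c = b + rng.normal(size=n)
fork = (np.corrcoef(a, c)[0, 1], partial_corr(a, c, b))

# Collider: A -> B <- C
a = rng.normal(size=n)
c = rng.normal(size=n)
b = a + c + rng.normal(size=n)
coll = (np.corrcoef(a, c)[0, 1], partial_corr(a, c, b))

print(chain)  # A and C correlated; independent given B
print(fork)   # same independence signature as the chain
print(coll)   # A and C independent; dependent once B is conditioned on
```

No test over these three structures can orient the undirected edges from observational data alone; the engineer's domain knowledge is what breaks the tie.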

This is the identification layer — the set of decisions that sit between the data and the causal claim: which variables to include, how to draw the arrows, what to condition on, and whether the result should be trusted. Causal AI tools are genuinely powerful at everything that comes after this layer. What they cannot do is perform it. That requires domain expertise that no algorithm supplies.

This course teaches you to perform it.

Course information

Course title Irreducibly Human: What AI Can and Can't Do — Causal Reasoning
Credit hours 4
Delivery In-person | Lecture/Seminar (weekly) + TA-led DAG Workshop (weekly)
Level Graduate
Prerequisites Graduate-level applied statistics or machine learning. Working knowledge of regression. Comfort reading and writing Python.
Instructor Nik Bear Brown · ni.brown@neu.edu
Series Part of the Irreducibly Human series at Northeastern University — College of Engineering. Companion course: Conducting AI. Either can be taken first; neither requires the other.

Who this course is for

This course is for engineers and applied technical practitioners who use data to make decisions, have encountered causal claims in their domain, and have never been taught the layer of reasoning that sits between the data and the claim.

What this course assumes

Graduate-level applied statistics or machine learning. Comfort reading and writing Python. Working knowledge of regression — you know what a coefficient means in practice. You have been asked whether your model is measuring what you think it is measuring, and you did not have a rigorous answer.

What this course does not assume

Prior knowledge of causal inference, DAGs, or graph theory. No econometrics. No measure-theoretic probability.

A note for students with strong ML backgrounds: Students who arrive most confident in their modeling skills often find the early weeks the most disorienting. That disorientation is the course working as intended. The identification layer is not a harder version of what you already know — it is a different cognitive operation. The course's central question — "Is what your system is measuring actually causing the outcome, or just correlated with it?" — is not answerable by better modeling. It requires a different kind of work.
Missing a prerequisite? Contact the instructor before the first week. This course builds on applied quantitative fluency from Session 1 — there is no ramp.

What you will leave with

  1. A complete causal analysis plan for a real problem in your own engineering domain — from DAG construction through output evaluation, defensible in a job interview or a boardroom.
  2. The ability to answer clearly and specifically the question that separates engineers who use causal AI well from engineers who use it confidently and incorrectly: "Is what your system is measuring actually causing the outcome, or just correlated with it?" — and to name the assumption that makes the difference.
  3. A qualified conclusion in two registers — for a statistician and for a decision-maker who will never see your DAG — that states what your causal analysis supports, what it does not support, and why.

What this course builds

By the end of this course, students can:

  1. Draw a DAG for a problem in their own domain, state every arrow as a causal claim, and name what every missing arrow quietly asserts.
  2. Diagnose confounder, mediator, and collider structures and derive a valid adjustment set using the backdoor criterion.
  3. Translate a defended DAG into a complete estimation specification and evaluate a causal tool's output against the identification assumptions behind it.
  4. Quantify sensitivity to unmeasured confounding and write a qualified conclusion in both technical and plain-language registers.

How the course is assessed

Grading is structured around a straightforward premise: causal AI tools are available, capable, and expected. What is being assessed is the identification layer those tools cannot perform.

Every assignment requires an AI Use Disclosure — not as compliance, but as the course's central analytical act. Students document what they used, how they used it, what they changed, and — this field is not optional — what the AI could not do. Specifically: at least one identification decision that required the student's domain knowledge. A disclosure that cannot name one such decision has not demonstrated that the student performed the irreducibly human layer. That declaration is the spine of every graded submission.

The grade reflects depth of causal reasoning, quality of domain judgment, and evidence that the identification decisions were made by the student — not delegated to a tool. Relative grading applies at the top of the scale. Absolute grading applies below the threshold, ensuring a floor for demonstrated competence.

How the course is structured

The course runs in three acts.

Act One — Establish · Weeks 1–4

Act One opens in the middle of a failure. Before any definitions are introduced, students see a complete causal disaster — a decision that looked right, made from data that couldn't support it. The act builds the vocabulary to name what went wrong: the three disciplinary registers for the same causal problem, the structure of a directed acyclic graph, and the identification layer itself. The act closes with the midterm: two unseen causal scenarios, no definitions asked, no recall tested. Students draw the implied DAG, diagnose the identification failure, and name the domain knowledge required to address it.

Week 1

The Decision That Looked Right

The course opens in the middle of a causal failure — no definitions, no framework, just a decision that was made from data that couldn't support it. Students see the structure of what went wrong before they have vocabulary to name it. By the end of the week, they can describe the difference between a pattern in data and a causal claim — using an example from their own domain.

Reading Response #1 — 30 pts
Week 2

Three Words for the Same Problem

Conditioning, confounding, controlling for — one underlying operation, named differently across statistics, epidemiology, and machine learning. Students learn why the vocabulary is not neutral, and translate a causal claim from their own domain into all three registers. What each framing reveals that the others obscure is the week's analytical question.

Reading Response #2 — 30 pts
Week 3

The Map Before the Territory

The DAG is introduced — not as a visualization tool, but as a formal object where every arrow is a causal claim and every missing arrow is an assumption by omission. Students draw a DAG for a known domain problem, label every edge, and identify what the graph is quietly asserting about the world by not including certain relationships.
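The "assumption by omission" idea can be made mechanical. In this minimal sketch (the variables are a hypothetical operations example, not course material), a DAG is represented as an adjacency dict, and every ordered pair of nodes with no arrow is surfaced as an implicit claim of no direct effect.

```python
from itertools import permutations

# Hypothetical DAG: every key -> value pair is an explicit causal claim;
# every absent pair is an assumption by omission.
dag = {
    "deploy_frequency": ["config_drift"],
    "config_drift": ["latency"],
    "traffic_load": ["latency"],
    "latency": [],
}

edges = {(u, v) for u, vs in dag.items() for v in vs}
nodes = list(dag)

# Every ordered pair with no arrow is a claim of "no direct effect".
omissions = [(u, v) for u, v in permutations(nodes, 2) if (u, v) not in edges]
for u, v in omissions:
    print(f"implicit claim: {u} has no direct effect on {v}")
```

Four nodes admit twelve possible arrows; drawing three of them means silently asserting the other nine away, which is exactly the exercise of labeling what the graph refuses to say.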

Weekly DAG Assignment #1 — 25 pts
Week 4

The Identification Layer: What Only You Can Do

The identification layer is named explicitly: the set of decisions within a causal analysis that require domain expertise, that no algorithm supplies, and that determine whether the result should be trusted. Three identification failure types are introduced through an unseen case. The midterm follows: two scenarios, draw the implied DAG, diagnose the failure, name the assumption, specify what domain knowledge would address it.

Midterm — 100 pts · Reading Response #3 — 30 pts
Act Two — Build · Weeks 5–11

Act Two constructs the identification toolkit piece by piece through domain cases the students recognize. Confounders. Mediators. Colliders — the hardest conceptual week in the course, where the definition is deliberately withheld until students have felt the problem. The backdoor criterion. DAG defense in two registers. The act closes with the DAG Draft Checkpoint — a defended causal model for the student's own domain problem, ready for feedback and estimation.

Week 5

Confounders: The Variable You Forgot

The confounder is introduced structurally — not as a nuisance to control for, but as a specific topological position in a DAG that opens backdoor paths between treatment and outcome. Students apply three diagnostic questions, identify backdoor paths, determine a valid adjustment set, and name unmeasured confounders with bias direction.
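A simulation makes the backdoor path tangible. In this sketch (coefficients and the true effect of 2.0 are invented for illustration), a confounder U drives both treatment and outcome; the naive regression of Y on T absorbs the open backdoor path, while adjusting for U recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical structure: U -> T, U -> Y, T -> Y with true effect 2.0.
u = rng.normal(size=n)                       # confounder
t = 1.5 * u + rng.normal(size=n)             # treatment
y = 2.0 * t + 3.0 * u + rng.normal(size=n)   # outcome

# Naive slope of Y on T: the backdoor path T <- U -> Y is open, so this is biased.
naive = np.polyfit(t, y, 1)[0]

# Adjusting for U closes the backdoor path: regress Y on [T, U] jointly.
X = np.column_stack([t, u, np.ones(n)])
adjusted = np.linalg.lstsq(X, y, rcond=None)[0][0]

print(naive, adjusted)  # biased upward vs. close to 2.0
```

The diagnostic questions of the week decide *which* variables belong in that second regression; the code only executes a decision already made at the identification layer.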

Weekly DAG Assignment #2 — 25 pts
Week 6

Mediators: The Variable You Shouldn't Touch

Conditioning on a mediator does not improve a causal estimate — it destroys it. This week introduces the structural reason why, through the workplace wellness case, and requires students to make an explicit analytical choice between total and direct effect estimation in their own domain. The choice is a domain judgment, not a statistical one.
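The destruction is easy to reproduce. In this sketch (a hypothetical linear system, not the wellness case itself), T affects Y both directly and through a mediator M; regressing Y on T alone yields the total effect, while "controlling for" M blocks the indirect path and silently switches the estimand to the direct effect.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Hypothetical structure: T -> M -> Y and T -> Y.
t = rng.normal(size=n)
m = 1.0 * t + rng.normal(size=n)             # mediator
y = 0.5 * t + 1.0 * m + rng.normal(size=n)   # total effect = 0.5 + 1.0 = 1.5

total = np.polyfit(t, y, 1)[0]  # Y on T alone: total effect

# Conditioning on M blocks the indirect path, leaving only the direct effect.
X = np.column_stack([t, m, np.ones(n)])
direct = np.linalg.lstsq(X, y, rcond=None)[0][0]

print(total, direct)  # ~1.5 vs ~0.5
```

Neither number is wrong; they answer different questions. Which question the decision-maker is actually asking is the domain judgment.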

Weekly DAG Assignment #3 — 25 pts
Week 7

Colliders: The Variable That Breaks Everything (Part 1)

The hardest conceptual week in the course. The definition is deliberately withheld until Session B: students spend Session A with a hiring puzzle that produces discomfort before the structural explanation arrives. The key insight: conditioning on a collider does not reveal a spurious association. It creates one. The path was closed. Conditioning opens it.
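A version of the hiring puzzle can be simulated in a few lines. In this sketch (trait names and the hiring threshold are invented), skill and charisma are independent by construction, but both cause hiring. Among the hired, a strong negative association appears out of nowhere.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

skill = rng.normal(size=n)
charisma = rng.normal(size=n)      # independent of skill by construction
hired = (skill + charisma) > 1.0   # collider: both traits cause hiring

marginal = np.corrcoef(skill, charisma)[0, 1]
within_hired = np.corrcoef(skill[hired], charisma[hired])[0, 1]

print(marginal, within_hired)  # ~0.0 in the population, clearly negative among the hired
```

Restricting the analysis to the hired is conditioning on the collider. The negative correlation is not a fact about people; it is a fact about the selection.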

Weekly DAG Assignment #4 — 25 pts
Week 8

Colliders: The Variable That Breaks Everything (Part 2)

Selection bias as a structural collider problem — and the reason a larger sample does not fix it. The obesity paradox. M-bias. What collider structure means for AI training data. Students explain, in one paragraph, why increasing sample size within a restricted population does not resolve collider bias — because the restriction is the problem.
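The one-paragraph argument has a two-line empirical counterpart. Reusing the hypothetical hiring setup from the collider week, this sketch grows the restricted sample by two orders of magnitude and watches the estimate converge, tightly, on the biased value.

```python
import numpy as np

rng = np.random.default_rng(4)

def hired_corr(n):
    """Correlation of two independent traits within the selected subpopulation."""
    skill = rng.normal(size=n)
    charisma = rng.normal(size=n)
    hired = (skill + charisma) > 1.0  # collider selection
    return np.corrcoef(skill[hired], charisma[hired])[0, 1]

# More data makes the estimate more precise, not less biased.
results = [round(hired_corr(n), 3) for n in (10_000, 100_000, 1_000_000)]
print(results)
```

Sample size shrinks variance around whatever the restricted population exhibits. Since the restriction itself created the association, precision and correctness diverge.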

Weekly DAG Assignment #5 — 25 pts · Reading Response #4 — 30 pts
Week 9

The Backdoor Criterion (Part 1)

The backdoor criterion arrives as relief — four weeks of intuition about path-blocking have been building toward a procedure. Students learn to trace all backdoor paths in a complex DAG and apply the criterion's first condition. Path-tracing — listing every path, naming every node, identifying whether each is open or closed — is the deliverable. Deriving the adjustment set comes next week.
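Path-tracing is mechanical enough to sketch in code. The toy below (variable names are illustrative, and it deliberately ignores the descendants-of-colliders subtlety) enumerates every path between two nodes in a small DAG and classifies each as open or closed for a given conditioning set: a collider blocks unless conditioned on, a chain or fork blocks when conditioned on.

```python
# Toy DAG as parent lists; T <- U -> Y is the backdoor path of interest.
parents = {"T": ["U"], "Y": ["T", "U", "M"], "M": ["T"], "U": []}

def edges(p):
    return {(a, b) for b, pas in p.items() for a in pas}

def all_paths(p, src, dst):
    """Enumerate simple undirected paths from src to dst."""
    nbrs = {}
    for a, b in edges(p):
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    def walk(node, seen):
        if node == dst:
            yield [node]
            return
        for nxt in nbrs.get(node, ()):
            if nxt not in seen:
                for rest in walk(nxt, seen | {nxt}):
                    yield [node] + rest
    yield from walk(src, {src})

def is_open(path, p, given):
    """Open iff every collider on the path is conditioned on and every
    chain/fork node is not (collider descendants are ignored here)."""
    E = edges(p)
    for a, b, c in zip(path, path[1:], path[2:]):
        collider = (a, b) in E and (c, b) in E
        if collider and b not in given:
            return False
        if not collider and b in given:
            return False
    return True

for path in all_paths(parents, "T", "Y"):
    print(path, "open" if is_open(path, parents, set()) else "closed")
```

Listing every path and naming the status of every middle node, which is this week's deliverable, is exactly what the two small functions do; the criterion's conditions are applied to the list they produce.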

Weekly DAG Assignment #6 — 25 pts · Reading Response #5 — 30 pts
Week 10

The Backdoor Criterion (Part 2)

The minimal valid adjustment set: why less is sometimes more, and what it means when no valid adjustment set exists. Students derive the full adjustment set for a complex DAG, verify both conditions, and state whether the effect is identifiable — and why. The case where no valid adjustment set exists is not a failure. It is an honest result.

Weekly DAG Assignment #7 — 25 pts
Week 11

Defending Your DAG

A causal model is not complete until it can be defended. The three-part defense structure — every arrow stated as a causal claim, missing arrows listed and ranked by plausibility, unmeasured confounders named with bias directions — is the standard. Students produce this defense in two registers: technical, for a statistician, and plain-language, for a decision-maker. The DAG Draft Checkpoint is the Act Two gate.

DAG Draft Checkpoint — 100 pts
Act Three — Apply · Weeks 12–15

Act Three stops giving well-formed problems and starts giving the kind of problems students will actually encounter. The identification toolkit is deployed against cases that recombine earlier concepts rather than introducing new structural ones. That is not easier — it is harder in the way that matters. Students translate their defended DAG into an estimation specification, evaluate tool output against their assumptions, conduct sensitivity analysis, and produce a qualified conclusion that states precisely what the analysis supports and what it does not.

Week 12

From DAG to Data: What the Machine Needs

The handoff to a causal estimation tool is where identification decisions most often get lost. Students learn what a complete estimation specification contains — treatment variable, outcome variable, adjustment set with justification, identification assumptions, the explicit "do not add" list — and why the tool's default behavior is a threat to every decision they made in Act Two.
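One way to keep identification decisions from getting lost in the handoff is to make the specification a first-class object. The sketch below is illustrative only: the field names and example variables are invented, not the API of any particular causal library.

```python
from dataclasses import dataclass

@dataclass
class EstimationSpec:
    """A minimal sketch of a complete estimation specification."""
    treatment: str
    outcome: str
    adjustment_set: dict   # variable -> one-line justification
    assumptions: list      # identification assumptions, stated explicitly
    do_not_add: dict       # variable -> why adjusting for it would bias

spec = EstimationSpec(
    treatment="deploy_frequency",
    outcome="incident_rate",
    adjustment_set={"team_size": "common cause of deploy cadence and incidents"},
    assumptions=["no unmeasured confounding after adjusting for team_size"],
    do_not_add={"rollback_count": "mediator: lies on the causal path to incidents"},
)
print(spec.do_not_add)
```

The explicit "do not add" field is the point: a tool's default behavior may happily adjust for a mediator or collider unless the specification says, in writing, why it must not.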

Specification Checkpoint — 100 pts
Week 13

Reading the Output: What to Trust and What to Interrogate

Narrow confidence intervals and clean formatting are not evidence that identification was performed correctly. The three-question diagnostic — applied to any causal estimation output — asks what the output can and cannot support as a causal claim. Four things causal estimation output cannot tell you are named explicitly, because those four things are where overconfidence lives.

Weekly DAG Assignment #8 — 25 pts
Week 14

When the Assumptions Don't Hold

The E-value — introduced as a seed in Week 9 and answered here — quantifies how strong an unmeasured confounder would need to be to explain away a causal estimate. Students calculate it, compare it against domain knowledge about likely confounders, and write a qualified conclusion. The honest account of when a causal analysis should not be reported as definitive is the week's central claim.
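The calculation itself is one line. For a risk ratio RR ≥ 1, the E-value (VanderWeele and Ding) is RR + √(RR·(RR−1)); for RR < 1, the reciprocal is taken first. The example numbers below are illustrative, not from a course case.

```python
import math

def e_value(rr):
    """E-value for a risk ratio. For RR < 1, invert first."""
    rr = max(rr, 1 / rr)
    return rr + math.sqrt(rr * (rr - 1))

# An observed risk ratio of 2.0 would require an unmeasured confounder
# associated with both treatment and outcome by a risk ratio of ~3.41
# to fully explain it away.
print(round(e_value(2.0), 2))
```

The number is only half the deliverable. Comparing it against domain knowledge about plausible confounder strengths, and writing down the comparison, is the qualified conclusion.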

Weekly DAG Assignment #9 — 25 pts
Week 15

The Full Analysis: One Problem, Every Decision

The terminal deliverable: a complete causal analysis plan for the student's own domain problem. Every identification decision made explicitly. Every limit named honestly. A qualified conclusion in two registers. Peer review using the three-question diagnostic precedes final presentations. The course ends when the student can state, in writing, what their analysis supports and what it does not — and defend that distinction to a skeptical reader.

Final Project Submission — 250 pts

DAG Workshop participation (100 pts) is assessed continuously across all 15 weeks. The lowest-scoring DAG Assignment is dropped — 8 of 9 count toward the final grade.

Irreducibly Human: What AI Can and Can't Do — Causal Reasoning · Graduate seminar · College of Engineering · Northeastern University · 4 credit hours · Instructor: Nik Bear Brown · ni.brown@neu.edu