"We were never taught structured clinical reasoning. We were taught disease-based reasoning - but never what was going on at a metacognitive level."
That observation, from a senior New Zealand professor of medical education, captures something most clinical teachers recognise but few systems address openly. Medical school spends most of its instructional bandwidth on what to know. The part that matters most - how to think through a case under uncertainty, with incomplete information, in front of a real patient - is left largely to the apprenticeship: ward rounds, bedside teaching, a registrar quietly correcting a student's differential, one case at a time.
That model is now under structural pressure. As we wrote in the first post in this series, the world is training more doctors than at any point in history - and the bottleneck is no longer student numbers but the supervised reasoning practice that turns students into doctors. This post is about what that reasoning actually is, why it does not scale the way content does, and what good practice looks like when supervision is stretched thin.
What is clinical reasoning?
Clinical reasoning is the process by which a clinician gathers information about a patient, interprets it under uncertainty, builds and refines a differential diagnosis, identifies red flags, and decides on a course of action when the data is incomplete. It is the cognitive work of medicine - distinct from the recall of medical facts.
Recall is what you know. Reasoning is what you do with what you know when a patient sits down and a story starts unfolding in pieces.
The dominant academic model is dual-process. Croskerry's framing describes clinical thinking as the interplay between fast, pattern-based intuition (System 1) and slower, analytical reasoning (System 2). Experienced clinicians use System 1 most of the time - clinical gestalt, the rapid pattern match before formal diagnosis - and reach for System 2 deliberately when a case feels atypical, when the pattern does not fit, or when the stakes are too high to trust a quick read. Both systems are trained. Neither is delivered by a textbook.
The 2024 British Journal of Hospital Medicine review of clinical reasoning teaching describes reasoning as the integration of knowledge, experience, and metacognition under conditions of uncertainty - a separate construct from knowing the facts, and one that demands a separate pedagogy.
Why clinical reasoning is the part of medical education that doesn't scale
Content delivery scales beautifully. A first-year medical student today has on-demand access to more high-quality lectures, MCQ banks, structured reading, and reference material than the entire faculty of a medical school had thirty years ago. The per-student cost of delivering content has collapsed.
Reasoning practice has not. It is taught one student at a time. A registrar at the bedside listening to the student's differential. A consultant on a ward round asking "why did you start there?" A supervisor cross-examining the plan: "what would change your mind?" That is what builds reasoning. And it is exactly what is in short supply across every health system simultaneously.
The structural problem laid out in the previous post lands directly here. Every major health system is rapidly expanding medical school intake. Supervisors are drawn from the same workforce the expansion is meant to fix. Placement and supervision bottlenecks are now visible in New Zealand, the United Kingdom, and Australia. As intakes rise, reasoning teaching - being supervision-shaped - is the part that thins out per student first.
Content delivery scaling is not the problem. Reasoning practice failing to scale alongside it is.
Supervision-grade reasoning: the standard we are building for
Good reasoning practice needs a public name. We call it supervision-grade reasoning.
Supervision-grade reasoning is the standard of clinical reasoning that holds up under a supervising clinician's scrutiny. Not "passes an MCQ". Not "got the right answer". The differential is built top to bottom and defensible at each level. The red flags have been surfaced and named. The mechanism behind the leading diagnosis can be explained out loud. The plan can be justified, and the candidate can say what would change their mind.
The point of the term is that it cuts both ways. Supervision-grade reasoning is also the standard your reasoning has to reach before you can exercise that scrutiny yourself - over a junior on your team, over a medical student you are teaching, or over an AI tool that hands you a differential. Reasoning that can defend itself and reasoning that can scrutinise other reasoning are the same standard pointed in different directions.
Recall-grade is not the same bar in either direction. A student can sit a great MCQ paper and still fail to construct a defensible differential under live conditions - and, on the day they become responsible for a team's reasoning, will be no better at scrutinising someone else's.
If reasoning practice is the part of medical education that does not scale, supervision-grade reasoning is the standard that practice must reach - in both directions - to be worth anything.
What's actually being taught - and what's left to ward rounds
Most medical-education platforms optimise the content side of training. The process side - clinical reasoning - is the part that does not scale.
The bulk of the major commercial offerings in the medical-student space consists of question banks, video lectures, structured reading, and recall-focused testing. These are necessary; they are scaffolding for reasoning. They are not reasoning itself.
What students report, consistently, is that the surface elements of history-taking and examination are manageable. The hard part is synthesising into a differential - building the list, ordering it, defending it. That is the part that has been hardest to deliver at scale through traditional medical-education models, and the part students most reliably ask for more of.
A 2017 Medscape survey of medical educators found that 75% believed clinical reasoning should be taught across all years of medical school - but 57% reported no dedicated reasoning courses at their institutions. The pattern is well known to the people teaching it. The infrastructure to address it has not yet been built.
Where the reasoning curriculum exists, it is largely apprentice-shaped: scattered through ward rounds, clinical supervisions, OSCE rehearsal, and bedside encounters. That is the part that breaks first when supervision is stretched.
Why "just ask the AI" makes reasoning worse, not better
Generic AI tools can answer a clinical question. But answering the question is exactly what stops a student from developing the reasoning to answer it themselves.
The mechanism is direct. A student facing a difficult differential opens a generic LLM, types in the presentation, and gets an answer in seconds. The cognitive work - building the differential from scratch, weighing alternatives, ruling out red flags, justifying the leading diagnosis - is the work the student needed to do. Skipping it does not accelerate learning. It removes learning from the loop entirely.
A 2025 paper in Advances in Physiology Education named the mechanism plainly: students using generative AI without intentional reflection shortcut the reasoning process and show shallower clinical thinking on later assessment. Turney and colleagues, reviewing AI use across medical training in Advances in Medical Education and Practice (2026), put it in similar terms - AI is useful for efficiency and recall, weaker for higher-order reasoning, and creates a measurable risk of deskilling and upskilling inhibition when it is used without scaffolding.
Liu and colleagues' 2025 scoping review of generative AI for clinical reasoning is consistent: the emerging evidence is positive about scaffolded AI-based simulation - practice where the AI plays the patient, holds the case, and forces the student to build the reasoning - and weaker for raw AI used as an answer engine. Liu and colleagues are specific about a further requirement: in complex clinical scenarios, AI-generated feedback needs expert validation to be accurate and pedagogically useful. A simulation is only worth the time if the reasoning the student is being measured against is itself clinically grounded - not whatever the underlying model happens to produce.
The implication for medical education is not "ban the AI". It is "design the practice so the reasoning still has to happen". Answer-fetching short-circuits reasoning. Scaffolded simulation forces it. Two tools that look superficially similar can be doing very different things to a student's cognition.
How clinical reasoning is actually built - deliberate, scaffolded, repeated
Reasoning is built by reps under feedback.
The pedagogical research is consistent. Cognitive Load Theory applied to clinical reasoning recommends worked examples first, then faded guidance: show how an expert reasons through the case, then progressively remove the scaffolding as the learner becomes more competent. The 2024 British Journal of Hospital Medicine review of clinical reasoning teaching summarises the consensus as identifying learner knowledge gaps, using worked examples to prevent cognitive overload, promoting the use of key features, and practising the construction of accurate problem representations.
Croskerry's analysis of diagnostic error goes further: structured cognitive supports - explicit prompts to consider alternatives, surface red flags, check the differential against the data - consistently outperform unaided intuition in junior clinicians. Practice should not just produce reps; it should produce structured reps that train the cognitive habits experienced clinicians use without thinking.
And the reps have to be many, varied, and longitudinal. A 2023 MedEdPortal paper on bridging the gap between knowledge and reasoning found that teaching which combines knowledge and process produces better outcomes than process-oriented teaching alone - and that longitudinal exposure beats one-off sessions.
That is the shape good practice has to take. Worked examples, then independent attempts, then feedback. Varied cases, increasing complexity, over time. Structured prompts that train the habits. Reps under feedback, not answers handed down.
What good clinical reasoning practice looks like
Good clinical reasoning practice is built around the standard reasoning needs to reach - supervision-grade - and the cognitive work needed to get a student there.
Concretely, the student does the cognitive work of building the differential, supported by scaffolding rather than handed an answer. They practise their reasoning at each step rather than just producing the right endpoint. They receive structured feedback on the process - the order, the alternatives weighed, the red flags caught or missed - not only on whether the final diagnosis matches a key. They practise across varied presentations of increasing complexity, with deliberate exposure to atypical cases that force System 2 engagement rather than pattern-matching shortcuts. They practise often enough, and long enough, that the cognitive habits become automatic - the kind a senior clinician demonstrates without conscious effort on a ward round.
That is the work bedside teaching has historically done. It is what registrars and consultants do when they ask a student "why did you start there?" and "what would change your mind?" It cannot be replaced by content delivery, and it is exactly what every health system is short of right now.
The world is training more doctors than at any point in history. The thing those doctors need most, and the thing the system supplies least, is structured reasoning practice under feedback. Building it is the central pedagogical question in front of medical education today.