Why Jennifer Nagel's Work on Knowledge Is Relevant for Educational Thinking

Jennifer Nagel observes that in ordinary life, knowledge seems easy. We constantly attribute it to ourselves and others. We rely on it. We organise conversation around it. Yet the moment we begin to reflect in a certain way, knowledge seems to evaporate. Merely mentioning the possibility of error, the broken clock, the unseen bus, the hidden defeater, is often enough to make us retract earlier attributions. Nagel’s Locke Lectures, “Recognizing Knowledge: Intuitive and Reflective Epistemology”, take this divergence seriously as a phenomenon to be explained rather than as a nuisance to be brushed aside. 

Her approach is distinctive because it refuses to treat epistemology as sealed off from psychology. She makes clear that her epistemology draws on developmental and comparative psychology, linguistics, the sociology of conversation, artificial intelligence, and reinforcement learning theory. This methodological stance reflects the conviction that our practices of attributing knowledge are rooted in evolved and learned capacities that can be studied empirically, and that philosophical reflection should be constrained by what we discover about those capacities. The “intuitive” side of the divergence is anchored in everyday social cognition. Nagel points to our extraordinary sensitivity to gaze direction and attention. We can tell what others are looking at, and from that we rapidly infer what they are in perceptual contact with. Perception is one of the core sources of knowledge, and the capacity to map what others can see gives us a ready-made set of positive cases of knowledge attribution. We say that she knows there is a fire because she saw it, that he knows what was written on the board because he was looking at it. 

Developmental research strengthens this picture. Infants around their first year begin to track who has seen what and to use that information for selective social learning. Social referencing experiments show that an adult’s emotional reaction to an ambiguous object influences the infant only if the adult is in a position to see the object. In other words, very young children already discriminate between epistemically well-placed and poorly placed informants. Other work suggests that children across cultures converge on a basic conception of knowledge in the preschool years. Knowledge attribution is neither a late-developing nor a culturally specific philosophical luxury; it is a central tool of social coordination. 

Linguistic evidence shows that many languages have grammatical evidentials, markers that indicate whether a claim is based on perception, inference, testimony, or inner experience. Even languages without obligatory evidentials, such as English, routinely mark source lexically. This cross-linguistic pattern suggests that tracking how someone knows is a deep structural feature of human communication. The social world is organised around what conversation analysts call “territories of knowledge”: who is entitled to speak authoritatively about which domain. 

Nagel aligns this intuitive ease with the idea that knowledge is factive. If someone knows that p, then p is true. She explicitly invokes Timothy Williamson’s thesis that knowledge is the most general factive mental state, the common core of seeing that p, noticing that p, remembering that p, and so forth. This provides a simple abstract characterisation: knowledge is a state whose content must be true. The Nyāya tradition, which she discusses in support of her claim that this is a biological rather than a merely cultural feature of knowledge, reinforces the point by treating perception, inference, and testimony as success terms. Failures are not instances of knowledge gone wrong but pseudo-cases. The success orientation underscores that knowledge attributions are attempts to mark genuine contact with reality, not merely high confidence. 

Against this stands the “reflective” side. Experimental work, including Nagel’s own studies on Gettier cases and on the effect of mentioning possible error, shows how easily knowledge attributions can be downgraded when participants are prompted to consider background risks. Even when a case stipulates that the clock is working, reminding people that clocks are sometimes broken can reduce willingness to attribute knowledge. Similar sceptical pressures appear across cultures. Philosophical arguments, such as Harvey Lederman’s challenge to classical common knowledge, intensify the sense that reflection can undermine the very idea of shared knowing. 

Nagel proposes that this divergence reflects two different cognitive systems. The intuitive mode resembles what psychologists call type one processing, fast, automatic, and opaque to introspection. The reflective mode resembles type two processing, controlled, scenario simulating, and sensitive to imagined defeaters. She draws an analogy with spatial navigation, where model free and model based strategies coexist, and with reinforcement learning distinctions between habitual and planning systems. The key question is why, in epistemic evaluation, merely imagining a possible obstacle can alter our map of who knows what, whereas imagining a hypothetical roadblock does not usually alter our map of the city. 

For educationalists, this framework is powerful because schools are institutions built around the production, recognition, and distribution of knowledge. Teachers constantly make rapid, intuitive judgements about whether pupils know something. These judgements are not arbitrary. They draw on the same capacities for tracking attention, coherence, and epistemic territory that Nagel describes. The developmental and linguistic evidence suggests that such recognition is a genuine skill grounded in shared human capacities. 

At the same time, schools are also saturated with reflective prompts. Assessment policies, safeguarding cultures, accountability frameworks, and audit regimes continually raise the possibility of error. In Nagel’s terms, they prime the reflective system. When that system is activated, it simulates unseen defeaters and penalises overconfident attribution. The result can be epistemic inflation, ever-increasing demands for explicit evidence, documentation, and hedging before anyone is willing to say that a pupil knows. Nagel’s analysis explains why this happens without portraying either side as irrational. Both modes are natural and functional. The problem is institutional balance. 

Her work on outcome sensitivity in knowledge attribution has direct relevance to assessment. If people are prone to hindsight bias and to allowing outcome information to contaminate judgements about whether someone knew, then teachers and examiners are vulnerable to systematic distortions. Awareness of this empirical background can inform the design of moderation processes and blind marking practices that aim to reduce bias. 

The Nyāya discussion of testimony also resonates educationally. If trustworthy testimony involves both knowledge and a desire to communicate faithfully, then teacher credibility depends on more than subject mastery. It depends on recognisable epistemic sincerity. Pupils are adept at detecting whether an adult is speaking from genuine understanding or merely reciting. The concept of “territories of knowledge” maps neatly onto classroom norms about who should answer which questions and whose testimony should be deferred to. Educational leadership, in this light, is partly about managing these territories wisely. 

What Nagel ultimately offers educationalists is a psychologically grounded epistemology. Knowledge attribution is a real, early-emerging, cross-culturally robust social capacity. Reflective simulation of defeaters is also real and indispensable. The art of educational practice lies in orchestrating these capacities so that intuitive recognition of understanding is supported rather than paralysed by reflective audit, and reflective scrutiny is applied where it genuinely improves epistemic reliability rather than where it merely induces chronic doubt. In this sense, Nagel’s work restores depth to the idea of “knowledge in schools”. It is not simply a curriculum list or a set of exam specifications. It is a dynamic social practice of recognising, trusting, challenging, and stabilising factive mental states within a community. By grounding that practice in both philosophy and empirical research, she provides educationalists with a framework that is at once conceptually rigorous and psychologically realistic. 

Nagel’s central move is to take a familiar thought, that learning depends on discovering patterns, and to give it sharper edges by borrowing examples from animal behaviour and from a branch of artificial intelligence called reinforcement learning. The educational significance is that this lets us describe curiosity as a functional mechanism that can be supported or damaged by how we design classrooms, tasks, feedback, and assessment. 

She begins by noting that many researchers describe curiosity in an old-fashioned Aristotelian way, as a desire to know for its own sake. She names cognitive scientists who do this, and she points to developmental psychologists who show that curiosity appears very early in infancy. The important point is the claim that curiosity is a basic drive that shows up in babies, and in many animals too. That creates a philosophical puzzle: how can an octopus or a rat “desire knowledge” if knowledge sounds like an abstract concept that requires reflection and language? The philosopher Peter Carruthers presses that worry by arguing that genuine desire for knowledge should require the capacity to represent knowledge as such, which many animals probably lack. Nagel’s response is to suggest that the right way to understand curiosity is to look for simpler signals and learning mechanisms that could function as a stand-in for knowledge gain. Nagel then turns to an analogy designed to clean up a common confusion between information and knowledge. 

She uses a striking example from computer science. Imagine a television that is not showing a programme, but only random static, a flickering field of black and white pixels with no pattern. In one technical sense, this is “information rich”. The idea comes from an influential theory of information associated with Claude Shannon. In that framework, a signal has more information when it is less predictable. A totally predictable signal, like a black screen, contains almost no information because there is nothing to learn from it. But a totally unpredictable signal, like perfect random static, contains a huge amount of information because every moment could be one of an astronomically large set of possibilities. Nagel’s point is that if curiosity were simply a desire to consume information, then a curious creature should love staring at random static, because it is maximally information rich. But we do not behave like that, and neither do animals, at least not in any stable way. So curiosity must be aimed at something more structured than raw unpredictability. 
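Shannon's contrast can be made concrete in a few lines of code. The sketch below is my illustration, not anything from Nagel's lectures: it estimates the entropy, in bits per symbol, of a perfectly predictable "black screen" and of random static, showing why static is "information rich" in the technical sense.

```python
import math
import random
from collections import Counter

def entropy_bits(signal):
    """Shannon entropy in bits per symbol, estimated from symbol frequencies."""
    counts = Counter(signal)
    n = len(signal)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

# A "black screen": every pixel identical, totally predictable, nothing to learn.
black_screen = [0] * 1000

# "Random static": each pixel independently black or white, maximally unpredictable.
random.seed(0)
static = [random.randint(0, 1) for _ in range(1000)]

print(entropy_bits(black_screen))  # 0.0 bits per pixel
print(entropy_bits(static))        # close to 1 bit per pixel
```

The point of the sketch is exactly Nagel's: by this measure the static maximises information, yet no curious creature wants to watch it, so curiosity cannot simply be an appetite for information.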

The insight she borrows from the AI researcher Jürgen Schmidhuber is that what curious agents seek is often better described as the chance to make progress in finding patterns, compressing what they experience into a manageable model. A striped pattern on a screen is easy to compress: you can describe it quickly, and after a moment it becomes boring because there is no further progress to make. Pure static is impossible to compress: there are no stable regularities to capture, so it is also unsatisfying. Curiosity, on this picture, peaks in the middle, where the world is not fully predictable but not hopelessly chaotic either, where there are patterns you can plausibly discover. A curriculum can be like the black screen: too simple, too repetitive, nothing new to model, so the student can “do it” without learning anything further. Or it can be like the static screen: too complex, too unstructured, too many disconnected facts, tasks that feel like noise, so the student cannot see what would count as progress. In both cases attention collapses. The most curiosity-sustaining teaching lives between these extremes, where the student can form expectations, have them challenged, and then revise them in ways that feel like genuine progress. 
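The compressibility contrast can be demonstrated directly, using a general-purpose compressor as a crude stand-in for the learner's model. This is my own illustrative sketch, assuming only the standard `zlib` library, not a method Nagel or Schmidhuber describes.

```python
import random
import zlib

# A striped pattern: highly regular, so a compressor captures it in a tiny description.
stripes = bytes([0, 255] * 5000)  # 10,000 bytes of strict alternation

# Random static: no regularity for the compressor to exploit.
random.seed(0)
static = bytes(random.randint(0, 255) for _ in range(10_000))

def ratio(data):
    """Compressed size as a fraction of original size (lower = more compressible)."""
    return len(zlib.compress(data)) / len(data)

print(f"stripes compress to {ratio(stripes):.1%} of their original size")
print(f"static compresses to {ratio(static):.1%} of its original size")
```

The stripes collapse to a few dozen bytes while the static stays essentially full size. On Schmidhuber's picture, neither extreme sustains curiosity: the stripes offer no further compression progress to make, and the static offers no compression progress at all.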

Nagel links this middle-zone idea to a robust pattern in humans and infants. She reports that curiosity often follows an inverted-U shape. If you are completely certain of an answer, you are not very curious. If you are totally clueless, you are also not very curious. Curiosity peaks when you have a middling level of confidence, when you almost know but not quite. In developmental psychology there is a related finding sometimes called the Goldilocks Effect: infants look longest at events that are neither too predictable nor too surprising relative to what they have already seen. The important point for education is that curiosity is about the relationship between the learner’s current model and the environment’s demands. Good teaching tunes that relationship. 
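One crude way to formalise the inverted U, offered here as my own illustrative assumption rather than Nagel's model, is to score curiosity by the entropy of the learner's current confidence. Note that this simple version puts the peak exactly at the midpoint, whereas the empirical peak she reports sits closer to "almost knowing".

```python
import math

def curiosity(confidence):
    """Toy model: curiosity as the entropy of a binary confidence level,
    zero when you are certain either way, maximal in the middle."""
    p = confidence
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"confidence {p:.2f} -> curiosity {curiosity(p):.2f}")
# Rises from 0, peaks at middling confidence, falls back to 0: an inverted U.
```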

She reinforces all this with a behavioural study in adults. Participants were left alone in a room with pens. Some pens were labelled as definitely safe, some as definitely giving a mild electric shock, and some were uncertain, a mix of shockers and duds. People spent more time clicking the uncertain pens than the safe and known shock pens combined, even though the information gained had no practical value, and even though they could have entertained themselves more safely by clicking only the safe ones. The basic lesson is that uncertainty itself can be alluring, and that people will pay costs, even discomfort, to resolve it. In classrooms, something similar happens when students obsessively check answers, ask to see mark schemes, or become fascinated by a puzzle whose stakes are low but whose uncertainty is gripping. The educational task is to harness that appetite so it feeds understanding rather than anxiety or compulsive checking. 

For Nagel, then, reinforcement learning is a way of modelling learning that is simple enough to be mathematically explicit but rich enough to reveal structural issues. Imagine a creature, or a robot, or a piece of software, interacting with an environment. The environment presents observations, the agent takes actions, and occasionally the agent receives a reward signal. Reward here does not mean praise; it means a numerical “score” that the agent is built to maximise. In a video game, the reward might be points. In a maze, it might be reaching food. The agent begins with no idea what actions are good, so it tries things at random. Over time it learns a policy, a way of acting that tends to lead to higher total reward. 
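This loop, random trial, occasional reward, gradually improving policy, can be sketched as a tiny tabular Q-learning agent in a five-state corridor where reward arrives only at the far end. The corridor, the parameter values, and the algorithm choice are my illustrative assumptions, not a system Nagel discusses.

```python
import random

random.seed(1)

# A toy corridor: states 0..4, start at state 0, reward only at state 4 (sparse).
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)  # step left or step right

def step(state, action):
    nxt = max(0, min(GOAL, state + action))
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

# Tabular Q-learning: the agent starts with no idea which actions are good.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.2

def greedy(s):
    best = max(Q[(s, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(s, a)] == best])

for episode in range(200):
    s = 0
    for _ in range(50):
        # Epsilon-greedy: mostly exploit current estimates, sometimes explore.
        a = random.choice(ACTIONS) if random.random() < epsilon else greedy(s)
        nxt, r, done = step(s, a)
        best_next = max(Q[(nxt, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = nxt
        if done:
            break

# The learned policy: at every non-goal state the agent now heads right, toward reward.
policy = {s: greedy(s) for s in range(GOAL)}
print(policy)
```

Even in this tiny world the sparse-reward structure is visible: the agent learns nothing until exploration first stumbles into the goal, which foreshadows the failures described below.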

Two ideas matter for the educational analogy. First, many environments are only partially observable. In chess, the whole board is visible, so the “state” is clear. In real life, or in complex video games, you do not know what is behind the door, you cannot see the whole causal structure, and you get only fragments of information. Second, rewards in complex environments are often sparse. In many video games you get no points for a long time until you complete a chain of actions. In life, you do not get immediate “points” for learning how to read, or for building the habits that later make you successful. Schools are sparse reward environments in exactly this sense. The real payoffs of understanding are delayed, and the path is long. 

Nagel uses an example from AI to show what can go wrong. AlphaZero, an AI system, became extraordinarily good at chess, shogi, and Go by playing huge numbers of games against itself with a simple reward function: win good, lose bad, draw neutral. It learned without being spoon-fed human strategies because the environment is fully observable and feedback is frequent enough that learning can proceed reliably. But when similar methods are applied to games that require long-range exploration with delayed rewards (classic examples are large maze-like games where the first reward might come only after many steps), the same kinds of systems can fail spectacularly. They never stumble upon the first reward often enough to learn anything, so they get stuck in repetitive loops. 

In educational terms, an agent that learns only from extrinsic rewards struggles in environments where those rewards are rare and delayed, which is why many learners cannot rely only on grades or praise to sustain the long action chains involved in real mastery. This is the point of adding curiosity as an intrinsic motivation. In reinforcement learning research, you can give an agent a second kind of reward, not for external success, but for learning something new, often operationalised as being surprised, meaning that what happens violates its predictions. If you have a model of what usually happens when you act, and then something different happens, that mismatch is a prediction error. In both brains and machines, prediction error is a key driver of updating, because it flags that your current model is wrong or incomplete. 
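The idea of paying the agent for surprise can be sketched in a few lines. The forward model and the `beta` weighting here are my illustrative assumptions, not details from Nagel's lectures; the point is only the shape of the mechanism, in which prediction error is converted into an intrinsic reward.

```python
class ForwardModel:
    """A minimal forward model: remembers what followed each (state, action) pair.
    Its prediction error is the 'surprise' signal."""
    def __init__(self):
        self.pred = {}  # (state, action) -> last observed next state

    def surprise(self, state, action, actual_next):
        predicted = self.pred.get((state, action))
        error = 0.0 if predicted == actual_next else 1.0  # wrong or missing prediction -> surprised
        self.pred[(state, action)] = actual_next          # update the model from experience
        return error

model = ForwardModel()

def total_reward(state, action, next_state, extrinsic, beta=0.1):
    # The agent is paid for external success AND for being surprised:
    # prediction error marks exactly the places where its world model can improve.
    return extrinsic + beta * model.surprise(state, action, next_state)

# First time a transition is seen: no prediction yet, so a pure curiosity bonus.
print(total_reward(0, 1, 1, extrinsic=0.0))  # 0.1
# Second time: the transition is now predicted, so the bonus vanishes.
print(total_reward(0, 1, 1, extrinsic=0.0))  # 0.0
```

The design choice matters: because the bonus decays once a transition is predicted, the agent is steered toward the not-yet-modelled parts of the world rather than rewarded for confusion as such.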

Nagel’s claim is that curiosity can be understood as an appetite for this kind of prediction error, not because organisms love confusion, but because prediction error is where model improvement is possible. The earlier white noise example returns here in a more precise way. If you reward an agent simply for having high prediction error in raw sensory terms, you create a trap. The agent might stare at things that are unpredictable but irrelevant, like leaves flickering chaotically in the wind, because it cannot predict them, so it stays “curious” forever. That is a perfect analogy for school distractions, situations that are highly stimulating but do not build domain understanding. The solution in AI research is to reward not raw surprise about pixels but surprise in a more meaningful feature space, a representation that captures the stable, action-relevant structure of the environment. 

Translate that into education: the teacher’s job is often to build the right features, the right categories, the right questions, so that students are surprised by the things that matter, not hypnotised by noise. The deeper philosophical claim, which Nagel wants to bring back to epistemology, is that this learning picture helps us understand knowledge as a “factive” mental state, a state whose content must be true. In the reinforcement learning setting, a well-trained agent develops stable action-guiding representations because those representations latch onto real regularities. If they were merely coincidentally true, they would not survive further experience. Continued interaction would expose their fragility and extinguish them. So the truth of the content is what explains why the representation stabilises and continues to guide action successfully. 

This is why she thinks the learning model can illuminate the idea that knowledge is a mental state whose very existence depends on truth. It is not just that knowledge happens to be true. It is that the mechanisms that generate and stabilise knowledge select for success under variation, which filters out lucky guesses. Once you see the structure, the educational relevance becomes clear. Curiosity is one of the mechanisms that lets biological learners thrive in environments like ours, environments where rewards are sparse, feedback is delayed, and the true structure is partly hidden. Schools recreate that kind of environment, sometimes intentionally and sometimes by accident. 

The question for educational thinking is therefore what kinds of school design treat curiosity as a genuine accelerator of knowledge gain. This yields several practical implications that follow Nagel’s analogies closely. First, classrooms should be designed so that students can form expectations, have them challenged, and then revise them. That is where meaningful surprise lives. If tasks are so routine that nothing violates expectation, there is no updating. If tasks are so chaotic that expectations cannot form, there is also no updating. The teacher’s sequencing of examples, explanations, and practice is essentially a way of managing prediction landscapes. 

Second, much of what we call “engagement” can be redescribed as attraction to high-entropy stimuli, and that is not automatically learning. A student can be intensely engaged with something that is effectively informational static relative to the domain: lots of local novelty, no compressible structure. The Nagelian test is whether the student can increasingly summarise, predict, and explain, in other words whether the representation is becoming more compact and more powerful. 

Third, assessment and feedback should be timed and structured so that they do not make the learning world purely sparse. If the only feedback that matters comes at the end, many learners will never discover the early building blocks, because the action chain is too long. This is not an argument for constant grading, it is an argument for frequent signals of progress in model building, so that the learner can feel the difference between noise and pattern, between lucky success and stable understanding. 

Fourth, Nagel’s view invites educationalists to take questions seriously as the most characteristic human expression of curiosity. Unlike most animals, humans can publicly mark surprise and can ask one another for the missing piece. That makes the classroom not only a place where individuals explore, but a place where a group can pool cognitive maps. If students are punished for surprise, treated as careless when they say “I didn’t see that coming”, or if questions are treated as interruptions rather than as the engine of inquiry, then schools blunt one of the most powerful mechanisms humans have for accelerating knowledge. Conversely, when teachers treat questions as evidence of the learner sitting in that Goldilocks zone, moderate confidence with open uncertainty, they are using curiosity as a guide rather than a distraction. 

Read as a piece of educational theory, Nagel therefore gives a philosophically serious account of why curiosity should matter to knowledge, and a psychologically and computationally grounded account of why curiosity is not just about liking a subject. Curiosity is a steering mechanism for exploration that helps learners build maps rather than routes, in worlds where the rewards of understanding are real but delayed. Schools can either align with that mechanism or fight it. Nagel’s value is that she helps us see which is which. 

Nagel then adds the thing that changes everything for education, namely other minds. Curiosity already accelerates learning by making an organism seek out situations where its expectations are likely to be challenged and improved. But the moment you put another sentient creature into the scene, learning no longer runs only along the axis of “me versus environment”. It becomes “me plus you versus environment”, and also, unavoidably, “me versus you” in the sense that other creatures have agendas, competition, and timing. Nagel’s big thought is that psychologists have spent decades obsessing over the “me versus you” side, the attempt to predict another agent’s action by attributing to them beliefs, desires, intentions, and so on. That is the mind reading thesis. But she thinks the “me plus you versus environment” side is at least as foundational and has been systematically underappreciated, and this matters because school is precisely a setting whose primary educational advantage is that other people are present and can be used as epistemic instruments. Arguments for replacing schools with learning systems devoid of other people founder on this key reality.

To see what she is driving at, it helps to separate two functions that easily blur together. One is agent focused mind reading. Here the other creature is treated as a puzzle: its behaviour does not simply follow the “boring” regularities of folk physics, like a log drifting downstream. It is organised by internal states, and those internal states can interact with time in complex ways. A creature might learn something now and act on it much later, not because the environment pushed it mechanically, but because it was waiting for a chance to capitalise on what it knew. So if you want to predict the agent, you need to track something like an agenda unfolding across time. Nagel cites work that stresses this temporal structure of agency, the way actions are often delayed, opportunistic, and sensitive to long term goals rather than immediate pushes. 

That matters in education because teaching is full of time lag. Understanding a concept may not show up as the right move until later, when the student sees the opportunity to use it. A classroom that only rewards immediate performance is like a mind reader who treats all action as immediate response and therefore misunderstands what is going on. 

The classic psychological way of testing agent focused mind reading is the unexpected transfer task. You can picture it like this. A desirable object is put in one container while the target watches. Then, when the target cannot see, the object is moved to another container. The question is where the observer expects the target to search. If the observer expects the target to search where the target last saw the object, the observer is tracking the target’s perspective, even though it is now wrong. That is the famous false belief test. Nagel’s point is that, under strict conditions, humans pass it and other animals do not, at least not robustly. Children only pass the explicit version around four or five years old. 

This has a direct educational moral that often gets missed. The classroom is full of situations where a teacher thinks a student “must realise” that something has changed, but the student’s map has not been updated. To treat that as laziness is to forget that representing someone’s false belief is cognitively demanding, and it develops slowly even in humans. 

But Nagel does not want us to stay fixated on false belief, because it pulls our attention to the wrong educational issue. She wants us to notice the second function of social cognition, what she calls world focused mind reading. Here the other creature is not primarily a puzzle whose hidden inner states we want to model for their own sake. The other creature is an instrument, a living sensor, a mobile vantage point, a competitor who can reveal where the value is, a warning system for danger, a guide to what matters. The other creature’s gaze, recoil, sudden shift of attention, or purposeful movement can be used as evidence about the world that is currently beyond my own sensory reach. This is what makes social living such a learning advantage. If you can read another creature’s orientation to the world, you can piggyback on its contact with reality. That is already an educational thesis. 

A classroom is not just an arrangement for individual learners to receive information. It is a way of creating an epistemic environment where students can use one another and the teacher as cues to the structure of the domain. In good seminars you can watch this happen. Someone’s face tightens, someone leans forward, someone laughs, someone looks confused, the teacher pauses, a student asks a question. These are not merely “behavioural” events; they are public traces of where the cognitive action is. They tell the group where the difficulty, novelty, or consequence lies. A school that treats these as noise and insists that learning is purely private and internal loses one of the most powerful accelerators of knowledge gain. 

Nagel illustrates world focused mind reading by discussing gaze tracking in animals. The most basic case is easy to imagine. If one bird in a flock suddenly looks up at the sky, others look up too. You can understand this without assuming anything like a sophisticated concept of belief. The flock members are not necessarily thinking, that bird believes there is a hawk. They are doing something more primitive and more practical: deferring attention to a conspecific’s orientation because it is a clue that there may be something worth noticing in the world. 

The asymmetry matters. If gaze tracking were merely a desire to align with the group, the bird that looked up would soon look back down, since most of the flock is looking down for food. But what happens is the opposite: the unusual, potentially informative orientation attracts the followers. It is curiosity in action. Educationally, this is the basic logic of collective attention in classrooms. The student who suddenly notices a pattern, the one who asks a sharp question, the one who hesitates at a step, can pull the group’s attention to a feature that would otherwise remain invisible. That is distributed sensing. Nagel then shows that “seeing” is itself a learnable structure. Some animals learn only crude projections of gaze direction, and some learn something closer to line of sight, including obstructions. 

The experimental set-up she describes is simple enough to visualise. Put two animals in adjacent spaces where they can see each other, separated by a barrier that is opaque above but open below, so that an animal can walk around to the other side. Shine a laser dot on the wall on one side. One animal looks up at the dot. The other animal can see its partner looking up, but cannot see the dot from where it stands. A basic gaze follower will simply look up too, even though there is nothing on its side. A more sophisticated one will go around the barrier to the other side to check what the partner is looking at. That “go around to see what you see” move is a genuine cognitive upgrade, because it encodes the idea of occlusion. It is not enough to know the direction of another’s face. You have to grasp that the world might be structured by barriers, and therefore that what is seen depends on location. 

The educational analogy is immediate and practical. Early learners often have something like the ibis stage of understanding. They track overt cues, the teacher is looking at the board so the important thing is on the board. They may not yet grasp the occlusion structure, the difference between what is visible to one person and not to another, the way a line of reasoning can be blocked by a missing premise, or the way a diagram affords one perspective and hides another. More mature learners go around the barrier. They change perspective, ask for the missing assumption, request the earlier step, reconstruct the hidden part of the argument, or find the data that was not in view. 

Teachers sometimes call this “metacognition” or “independent learning”, but Nagel’s lens suggests it is a very specific kind of competence, a competence in modelling visibility and occlusion across minds and across representations. It is not just confidence, it is a learned grasp of what can and cannot be known from a given vantage point.

She then connects this to knowledge and ignorance in animals, and this is where her educational point starts to bite more sharply. Many species show behaviour that looks like tracking who knows what, even if they cannot represent false belief. Scrub jays that cached food while observed are more likely later to move their caches when alone. That can be explained in a simple reinforcement way: caches made under observation are risky. But later raven studies push toward a more mentalistic interpretation. Ravens avoid revisiting a cache when a competitor could see them unless that competitor already watched them make that cache earlier. That pattern makes little sense unless the raven is treating the competitor as having, or lacking, knowledge of a specific fact about the current world, where the cache is now. 

This supports Nagel’s claim that many animals track a distinction between knowledge and ignorance even if they do not manage the full decoupled representation required for false belief. This becomes philosophically and educationally important when she introduces the idea that many animals might have a model of the world that represents only how things are now, not how they were, might have been, or are believed to be by someone else in a way that conflicts with the present. She discusses a proposal, associated with work by psychologists studying animal memory, that animals update a present centred world model over time rather than representing time as an explicit dimension. A cached worm “fades out” of the world model as it becomes less likely to be there. The animal’s model changes, but without representing the change as a change. 

Humans, on this view, add something extra, a capacity for explicit temporal reasoning and for decoupled content, the ability to hold in mind representations that are not simply the current world, including past events, future possibilities, counterfactuals, and other people’s misrepresentations. For education this is an extraordinarily fruitful distinction. A great deal of schooling is precisely training in decoupled content. Historical thinking asks students to represent past worlds that are not now. Scientific modelling asks students to represent unobservable structures and counterfactual interventions. Mathematics asks students to manipulate abstract structures that are not present in perception. Literature asks students to track what characters believe, including false beliefs, and what readers know that characters do not. Even ordinary classroom management is saturated with decoupled content, the teacher models what students might misunderstand, students model what the teacher expects, everyone models what will happen later, tests, deadlines, consequences. 

If Nagel is right that decoupled representation is hard and develops late, then it is no wonder that so many educational failures involve a mismatch between adult assumptions about what students can represent and what students can actually hold steady. It is also no wonder that knowledge talk becomes fragile under reflection, because reflection is exactly the activity of moving into decoupled possibility space.

This sets up her discussion of Gettier style patterns in animal mind reading. She describes a monkey experiment in which a monkey watches a human agent see a piece of fruit go into one box. Then the agent’s view is blocked. During that occlusion, either the fruit merely jiggles out and back in, or the box itself opens and closes without the fruit moving. When the agent’s view returns and the agent reaches, the monkeys behave as if their expectation of correct reaching is intact only when nothing changed while the agent could not see. If the fruit moved while the agent’s view was blocked, even if it ended up back where it started, the monkeys no longer expect a correct reach. Nagel’s interpretation is that the monkeys are not tracking “true belief” in a thin behavioural sense. They are tracking knowledge as a factive relation to current reality. As soon as the fruit moves while unseen, the agent loses knowledge of its current location. Even if it returns, the agent did not witness the return, so knowledge is not restored. The agent is now, in the relevant way, ignorant, and therefore the monkey withdraws confidence in the agent’s action. 

Why does this matter for educational thinking? Because it reveals something about what it is to treat someone as a reliable guide to the world. If you are using other agents as instruments for learning, world focused mind reading, then what matters is not their internal phenomenology, not what it feels like from their side, and not whether they have something that looks like a belief that happens to be true. What matters is whether they are currently positioned as an expert on that patch of reality, which depends on whether their access has remained intact across changes. In classrooms, this is exactly how students treat teachers and peers, often without articulating it. A teacher who missed a key change, a new policy, a correction, a conceptual shift, can instantly lose local epistemic authority in the eyes of students. Conversely, a student who happened to get the right answer but whose method involved an unseen error can be treated by peers as “not really knowing”, even if they are confident. Teachers themselves do this all the time. They are willing to trust a student’s future performance only if the student’s grasp was formed under conditions that would remain stable under relevant variation. 

Nagel’s deeper point is that Gettier problems, which in philosophy often look like arcane puzzles about knowledge versus justified true belief, arise naturally from a social learning system whose job is to decide when another agent is a reliable sensor for the world. If the world can change while an agent lacks access, then their apparent success may be luck. A system tuned to avoid being misled will sometimes cancel deference when access was broken, even if luck restored truth on the ground. That is a rational strategy for a creature that wants to harvest others’ knowledge. In education, this helps explain why students can be remarkably unforgiving about teacher mistakes, and why teachers can be remarkably unforgiving about superficial correct answers. It is an epistemic defence mechanism rooted in how social learning works. 

Nagel’s contrast between agent focused and world focused mind reading is therefore also a framing for schooling. Many educational models assume that the central problem is getting inside the learner’s head, modelling misconceptions, tracking beliefs, diagnosing mental representations. That is agent focused. It is vital, but it is only half the story. The other half is that classrooms are environments where learners must constantly decide whom to treat as a guide to reality, and where that decision is updated in real time as access changes. The teacher has a different angle of view. The student has a different history of experience. Peers have seen things you have not. Someone’s sudden gaze shift toward a diagram, someone’s hesitation before a step, someone’s surprise at a counterexample, these are not just behaviours. They are public cues about the world and about what is currently knowable. The classroom is, in that sense, a social sensing network. 

Nagel points out that human animals do not merely use other agents as cues. We can ask them. We can demand reasons. We can request the missing perspective. We can create explicit norms of testimony. We can build institutions, like schools, that specialise in constructing and transmitting knowledge across time, across occlusion, across generations. Nagel has already shown that knowledge attribution is not just an inward looking judgement about someone’s private state. It is also a practical, outward looking way of using other minds to get traction on a world you cannot fully see for yourself. If education is about anything, it is about making that outward looking use of other minds more accurate, more generous, and less vulnerable to the easy cancellations that come from broken access, misinformation, and mere luck.

So lots of animals can “read minds” in limited ways, in the sense that they track what others can see and what others know, and they use that to learn faster. Humans do something extra. We do not just benefit from other creatures as accidental “mirrors” of reality, we deliberately turn each other into mirrors, and we also become curious about the state of the mirror itself. We are interested in what you know, what you do not know, what you might be wrong about, and what would change your mind. That curiosity, when it is shared and made public in talk, is what lets learning become collegial. We build a shared map of the world and a shared map of each other’s maps, and those two maps are constantly being adjusted together. 

She begins by reminding us of examples of “selective social learning”. On the perceptual side there was geometrical gaze following, the case where you notice another creature’s gaze shifting behind you and you treat it as about something in the world, maybe a threat, maybe a resource. This is the “rear view mirror” idea. On the epistemic side there were competitive cases where animals track who has knowledge and who is ignorant, and they either follow the knowledgeable agent’s cues or ignore them, as in the primate “Gettier style” scenarios where an agent’s being right can come apart from their actually knowing because the truth changed while they were not looking. 

Those earlier examples already point to something educational: learning is quicker when you have ways of deciding whose cues to treat as reliable, and when to stop treating them as reliable. Then she splices that into the reinforcement learning story from the curiosity idea. She assumes a fairly standard picture from reinforcement learning theory: behaviour is guided by reward, “policies” are patterns of action that tend to get reward, and “value functions” store what is expected to pay off. She uses the familiar contrast between model based and model free reinforcement learning, and she makes it intuitive with the maze story and the driving home story. Model free is the “autopilot” system. You store a cached value, turn left here because it usually works, switch lanes there because you have done it a thousand times. Model based is the “map and simulation” system. You carry a structured representation of the environment, and you can run mental what ifs, like hearing on the radio that your exit is closed and simulating a new route. 
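
The contrast can be made concrete with a deliberately tiny sketch; the routes and travel times below are invented for illustration. The model free system keeps cached action values that lag behind when the world changes, while the model based system edits its map and replans.

```python
# Toy contrast between model free and model based control, in the spirit
# of the driving home example. All names and numbers are illustrative.

# The world: two routes home, with travel times (lower is better).
world = {"highway": 10, "back_roads": 20}

# Model free "autopilot": a cached value per action, learned from reward.
cached_value = {"highway": -10, "back_roads": -20}  # negative travel time

def model_free_choice():
    # just pick the action with the best cached value
    return max(cached_value, key=cached_value.get)

# Model based "map and simulation": an explicit, editable model of the
# environment that can be revised when the radio says the exit is closed.
model = dict(world)

def model_based_choice():
    # simulate each route using the current model, pick the fastest
    return min(model, key=model.get)

# Before the closure, both systems agree.
assert model_free_choice() == model_based_choice() == "highway"

# News: the highway exit is closed, making that route very slow.
world["highway"] = 60
model["highway"] = 60    # the model based system revises its map

print(model_free_choice())   # still "highway": cached values lag behind
print(model_based_choice())  # "back_roads": replanned from the map
```

The cached value will eventually be corrected by bad experience on the highway, but only the map carrying system can respond to the news before suffering the consequences.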

She attributes the imperative versus declarative contrast to Peter Dayan, and she uses it as a hinge: model free control is like a set of action commands, do this now, whereas model based control stores facts about how the environment is structured and uses those facts to infer what you should do. The point of bringing this in is an argument about kinds of mind reading. Many animals do something like model free social cognition. They learn stable associations and habits about others’ cues, and those habits can be very good, but they do not look like flexible, explicitly revisable models that can be used for planning and for deliberately changing the other’s state. That is why she mentions the well worn point about non human primates being weak at strategic deception, and she cites work in the orbit of Laurie Santos and Alia Martin on the limits of “tactical deception”, the absence of clear evidence that primates deliberately plant false beliefs by supplying false information. The claim is not that animals never mislead, but that the human style, in which you represent what the other believes and you intentionally intervene on that belief, is not well supported. 

From there she goes back to perception and gives you the “distinctively human” discontinuities. The first discontinuity is the so called nine month revolution, associated with work by Rechele Brooks and Andrew Meltzoff. The basic observation is that human infants start with a coarse sensitivity to head direction, but around nine to ten months there is a striking jump in sensitivity to eyes, to the difference between the head turning and the eyes being open or closed, and to the way gaze actually fixes attention. She contrasts this with the more gradual improvements you see in chimpanzees and monkeys, and she adds an anatomical ingredient, the cooperative eye hypothesis associated with work by Hiromi Kobayashi and Shiro Kohshima. 

Humans have a conspicuous white sclera that makes gaze direction easy to read. Many other primates have darker sclera and less contrast, which makes gaze direction harder to read at a glance. Conspicuous gaze seems, at first, like an odd design choice if you are competing, because it gives away what you are attending to. The proposed explanation, pushed especially by Michael Tomasello, is that humans are unusually cooperative and coordinative, and in cooperation you want your partner to read you. Tomasello’s contrast is simple and memorable: in competition you read minds against the other’s will, in cooperation you advertise your mind because joint action depends on it. 

Humans do not merely detect gaze, we manipulate each other’s gaze and attention. The emblem of that is pointing. Around ten to fourteen months, infants begin to point and to coordinate pointing with checking the caregiver’s face and checking the object, and this becomes a turn taking structure. She uses a cross cultural study led by Ulf Liszkowski, with infants in rural Papua New Guinea, Kyoto, and an Atlantic Canadian community. The methodological detail that matters is that the experimenters did not tell caregivers to point. They simply left caregiver infant pairs in a room with objects, and still found pointing, turn taking, and what they call “conversation like structure” in pre linguistic gestural interaction. This is not a local Western parenting quirk, it looks like a deep pattern. 

She draws a line between this human pattern and what you find in other primates. Chimpanzees can do mutual gaze and they can follow gaze, but they do not integrate these into a stable practice of joint attention around a third object, and they do not naturally point to each other. She mentions classic work on how difficult it is to train chimpanzees to use cooperative pointing cues in hidden reward tasks, compared to how quickly twelve month old human infants do. She mentions that dogs and goats can respond to pointing, and dogs can alternate gaze between owner and desired object, but she stresses the missing piece: in humans, shared attention itself is rewarding, it is not only a means to getting the sausage. In dogs, gaze alternation may be instrumental learning or expectation checking rather than a drive toward jointly attending for its own sake, and one diagnostic she gives is that dogs do not obviously stop once the human is looking correctly, because the goal is still the food. 

Nagel connects joint attention to curiosity. The “surprising object” in the room, the balloon, the flashing light, the feather boa, is the kind of stimulus that would engage a curious animal. But humans do a social thing with surprise that other animals do not. We overtly signal surprise. We say “oh”. We get extra reward by making your surprise happen and watching it, and we refine our sense of what you know and do not know by hearing your surprise markers. That is the pivot to conversation analysis. She invokes John Heritage’s classic finding that “oh” functions as a change of state token, a public marker that the speaker has undergone an epistemic shift, and she complains that developmental coding schemes often strip out “expressives” like oh, hey, uh oh as if they are semantically empty filler. 

She cites recent corpus work, especially the Candor corpus released in 2023, which includes video and tools for turn segmentation and back channel analysis. The point of the corpus statistics is that back channel is a massive, regular, structured part of conversation, and it does epistemic work. “Oh” and “yeah” are not just noise, they are signals about whether information was new, whether it was expected, whether it is being taken up as knowledge. If Nagel is right, then anyone who cares about learning should also care about the public signals by which learners show that something has landed, that it was new, that it corrected a prior expectation, or that it did not. 

In classrooms we often focus on the content of answers and ignore the micro signals of uptake, but those micro signals are how a teacher tracks whether the shared map is converging, and whether the class has common ground. Nagel likes the analogy with philosophy Q and A: one person offers a theory, another offers a counterexample as a surprise trigger, and the group moves by a social economy of expectancy violation. It’s a good picture of a learning environment.

Explicit belief attribution comes online at around four to five years. Nagel works with a model of progression associated with Henry Wellman’s milestone sequence: early sensitivity to goals and desires, then tasks that are often misleadingly described as “diverse belief”, then robust competence at knowledge access around three to four, and then explicit false belief around four to five. 

She is sceptical about the literature claiming infants have implicit false belief sensitivity, and she treats replication worries as serious. Her intended conclusion is: humans start with a knowledge centred framework, like other animals, and then later add belief. And this helps us see the role of belief. Belief is where decoupled content shows up, representations that can depart from current reality, and once you can represent those, you can predict behaviour driven by misconception, not just ignorance. She uses Gettier style tasks in development to separate “belief as such” from mere “falsity is hard”, and she cites work by William Fabricius and colleagues using a Gettier twist on the unexpected contents task: the child sees an M and M bag, expects candy, is shown a pencil, then the pencil is removed and candy is put in. When Elmo arrives, what will he think is inside, and why? 

The weird performance dip, where children can answer correctly when younger, get worse around four and a half, and then recover later, is used to argue that there is a phase where children are strongly tracking knowledge and ignorance and treating absence of knowledge as central, but they do not yet have a stable, adult like grip on belief as a state that can be true, false, or accidentally true without knowledge. She mentions older work like Brad Pillow’s 1989 findings that young children can recognise that someone else knows what is in a container even when the child has not formed a belief about it, which pressures the idea that early mind reading is belief matching. 

She mentions Alan Leslie’s belief first account and criticises it on the grounds that Gettier style difficulties should not arise if the problem were merely inhibiting a “true belief default”.

All of that can sound distant from education until you notice what she keeps returning to: the learning value of tracking who knows what, and the distinctively human practice of actively intervening on what others know and believe. In school, teachers and students are not merely observers of each other’s attention and knowledge. They constantly intervene. Teachers point, ask, prompt, name, rephrase. Students signal uptake, confusion, surprise, and they also strategically manage what they reveal. Nagel's approach is telling us that the classroom is not just a place where individual brains update, it is a place where the group constructs a shared model of the topic and a shared model of the distribution of knowledge in the room. Good teaching is, in large part, skilful steering of joint attention plus skilful reading of epistemic signals plus skilful creation of safe surprise. The “safe surprise” idea borrows from the Schmidhuber television static example mentioned earlier: random noise is not surprising in the learning relevant way, because you cannot form stable expectations. That translates into a practical constraint: if everything is unpredictable, students cannot build a model, and if everything is over predictable, nothing updates. The art is to pitch tasks so that expectations can form and then be productively violated. 

Nagel also gives a neat way to think about habits versus flexible understanding in education. Model free learning is what you get when students can do the procedure, the imperative, but cannot explain it or adapt it when the situation changes. They can “drive home on autopilot” through standard exercises. Model based learning is what you get when students can represent the structure of the domain and simulate consequences, so they can adapt when a familiar cue is missing, when a problem is posed in an unfamiliar way, when a constraint changes. A teacher who wants deep understanding is trying to build model based cognition, not merely reinforce cached values. 

But Nagel adds a social twist: the route to model based mind reading, and perhaps by analogy the route to model based subject understanding, runs through interactive feedback. In infants that feedback comes from pointing and naming, from the caregiver’s responses, from the child’s ability to actively test what the other sees and knows. In classrooms the analogue is dialogue that is genuinely responsive, not merely performative, where students can test their assumptions, witness each other’s surprises, and refine not only their own model of the content but also their model of what counts as knowing in that domain. You can see the warning sign too. Once belief attribution and decoupled content come online, conversation becomes a high powered engine for learning, but also for coordinated error.

Nagel gestures at social media as an amplification system for decoupled content, where mistaken beliefs can be broadcast, rewarded, iterated, and compounded. Educationally, that implies that teaching cannot only be about transmitting facts, it must be about building learners’ capacity to manage decoupled content responsibly: distinguishing knowledge from mere assertion, tracking sources, recognising when they are relying on cues from someone who does not know, and learning when to switch off deference. In her terms, the human mind reading system is astonishing, but it is also vulnerable, because the very tools that let us build common knowledge also let us build common misconception. So for Nagel learning is fastest when curiosity, joint attention, and publicly legible epistemic signals are put together, because then we can deliberately coordinate who attends to what, who knows what, and what needs to change in the shared model. The embedded examples, from infants pointing in Kyoto and Papua New Guinea, to the cooperative eye anatomy, to the “oh” that marks a knowledge shift, to the Gettier flavoured sweet bag that shows a child struggling with belief, are all there to make that line plausible. They are showing you that distinctively human education is not just more information, it is a distinctive social technology for making knowledge states visible, contestable, and shareable, and then using that visibility to steer both individual and collective cognition. 

Nagel then asks about how knowledge actually moves between minds, and why that movement sometimes yields knowledge and sometimes yields something weaker, even when it yields truth. The organising contrast is between knowledge possession and knowledge transmission, but the deeper claim is that the same kind of learning machinery that lets us become safely accurate about the world also lets us become safely accurate, often without noticing how, about who knows what. Once you see that, testimony starts looking like a hard won achievement of our animal learning systems. 

Nagel begins by fixing the target: knowledge as a factive state, a state that “never deviates from its object”, in the Nyāya formulation attributed here to Vācaspati Miśra, and as “the most general factive mental state” in the Williamsonian idiom. The point of the pairing is to insist that knowledge is essentially tied to truth not as an extra condition bolted on afterwards, but as a feature of the kind of state it is. That can sound as if knowledge should be limited to pure deduction, to logic and mathematics, but Nagel resists that narrowing: as she has been arguing all along, ordinary reinforcement learning, the kind of prediction error driven adaptation shared widely across animals, can stabilise representations that are safely accurate, and therefore can count as knowledge in this factive sense. 

The “safety” issue matters because it is not enough to land on the truth by luck. The system must be tuned so that nearby, slightly different situations would not easily have produced error. Curiosity then becomes a reward structure, a tendency to find expectancy violation satisfying, which drives exploratory behaviour that challenges and strengthens the agent’s model of reality. This makes the basic motivational picture both more concrete and less moralistic. You do not need to assume that learners are naturally virtuous truth seekers. You need to design environments where surprise is informative, where the mismatch between expectation and outcome carries a usable error signal that can reshape the learner’s model. 

Nagel insists again and again on the difference between meaningful surprise and mere novelty, which is why the earlier “random static” TV screen example remains in the background. If there are no stable expectations, then violation carries no structure, and the stream becomes boring rather than educationally productive. A good lesson, on this picture, is not a firework display. It is a calibrated engine for updating. Ravens and chimpanzees are her recurring exemplars: creatures under competitive pressure who need to distinguish the knowledgeable from the ignorant. They can use others’ behaviour as cues, selectively taking information from those better placed. Humans go further because we do not merely exploit what happens to be visible in others’ behaviour. We intervene. We show and tell. We draw attention. We ask questions. We explore each other’s minds in conversation. 
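
One hedged way to render the surprise versus static point computationally, in the spirit of Schmidhuber’s proposal rather than anything Nagel herself specifies, is to score a stream by learning progress: how much a simple predictor’s error drops as it trains on that stream. A learnable stream yields progress; pure static yields essentially none. All numbers below are illustrative.

```python
import random

random.seed(0)

def learning_progress(stream, steps=400):
    """Score a stream by learning progress: how much a very simple
    predictor (a running mean) improves while training on it."""
    pred, lr = 0.0, 0.1
    errors = []
    for _ in range(steps):
        x = stream()
        errors.append(abs(x - pred))
        pred += lr * (x - pred)      # error driven update
    early = sum(errors[:25]) / 25
    late = sum(errors[-25:]) / 25
    return early - late              # positive means the stream was learnable

structured = lambda: 5.0 + random.gauss(0, 0.1)  # stable pattern, small noise
static = lambda: random.uniform(-5, 5)           # "television static"

print(learning_progress(structured))  # clearly positive: expectations form
print(learning_progress(static))      # near zero: violation without structure
```

A curious learner rewarded by this signal would seek out the structured stream and abandon the static one, which is the sense in which randomness is boring rather than maximally surprising.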

That difference from other animals is what Nagel wants to explore: how does our knowledge transmission happen? Three transmission routes are set out in ascending social difficulty, and they already map neatly onto classroom practice. First, I can transmit knowledge by directing your perceptual attention to the world in a way that makes you independently safe. A hidden spotlight illuminates the spider by your ankle, or I point at it, and once you see it you do not need to trust me. Even if you dislike me or suspect a prank, your own perception locks you onto the fact. This is an elegant model for much of teaching: the teacher is often, at their best, not a source whose word must be taken on faith, but an organiser of perception and attention. They arrange the diagram, the apparatus, the text, the worked example, the physical demonstration, and the learner comes to see. Classroom trust still matters, because learners may not look where a distrusted person points, but when the method works, perception supplies an independent route to the truth. 

Second, I can transmit knowledge by activating your inferential capacities through argument. Nagel uses Augustine’s De magistro and the image of an Epicurean connoisseur reciting arguments for the immortality of the soul as an example. The connoisseur does not believe what he says, but the passer by, “spiritually open”, can gain knowledge if they recognise the soundness of the reasoning from premises they know. Again, the trust requirement is minimised. What matters is the audience’s grip on the premises and their capacity to follow the proof. A classroom analogue is the way a student can come to know a theorem, or a historical inference, even if they are suspicious of the teacher’s motives, provided the reasoning is transparent and the premises are secured. This route models a certain ideal of education, knowledge through reasons one can own, but it is also limited. Not everything important is available this way, and it is slow. 

Third, and most importantly, knowledge is usually transmitted by testimony. Here your attitude to the source becomes vital. If you take someone’s word, you are not simply being guided to look, and you are not simply being guided through a proof. You are accepting content because it comes from them. Nagel anchors this in Nyāya as well, invoking the idea of a trustworthy authority, and the description, associated with Vātsyāyana, of a good instructor as someone who knows directly and has the desire to communicate faithfully as it is known. But the crucial twist is that testimony can yield true belief without yielding knowledge if the trust is irrational. Trusting someone because of hair colour would usually be irrational. You can get lucky. You can reach the truth. Yet you have not judged in a way that ensures truth. If, by contrast, your trust is itself knowledgeable, if you know that they know, then your acceptance is safe. You arrive at the truth in a way that could not easily have led you wrong. 

It would be great if we had a general way of knowing when someone knows something we do not. Do we? The answer is that in practice we behave as if we do. This is brought in through John Heritage’s notion of “epistemic territory”, the rough map we carry of what our conversational partners know. Heritage’s claim is that relative access to epistemic domains is treated as more or less settled in ordinary interaction, and yet this should seem like a problem of daunting complexity. In education the analogue is immediate. Teachers and students are constantly making micro judgements about who is entitled to say what, who should be asked, who should be told, who is guessing, who is competent, who is bluffing, who is out of their depth. Classroom life runs on these tacit maps. 

Nagel’s ambition is to explain how such maps can be learned at all. The proposed strategy is to treat knowledge recognition as parallel to face recognition rather than long division. Face recognition is both familiar and puzzling. We know we can recognise friends at a glance, but we lack introspective access to how we do it. The suggested reason is computational. The task is too large for working memory and conscious step by step rule following, so it must be done by massively parallel, largely unconscious computation. The empirical details matter because they ground the philosophical picture. 

Nagel cites estimates that adults recognise around 5,000 faces, including acquaintances and celebrities, and that a familiar face triggers a distinctive neural response at around 140 milliseconds, with identification typically under a second. The challenge is twofold. Faces are broadly similar, so individual differences are subtle, and yet photographs of one person vary hugely across lighting, pose, expression, hair, age. We must tell people apart and tell people together, as Rob Jenkins puts it. The “telling together” problem is especially vivid in classroom terms, because it resembles conceptual learning. Novices often treat different instances as different things, and experts see them as the same underlying structure. A new student in algebra sees each problem as a new face. A fluent student sees the invariant.

Older, more descriptive approaches, associated with Vicki Bruce and Andy Young, imagined something like face recognition units and person identity nodes, with a kind of checklist of features. But decades of work struggled to find a compact set of descriptive dimensions that would do the job. Nagel argues that when data and complexity are high, you stop trying to specify the dimensions in advance and you let a large parameter system learn them through error driven adjustment. 

DeepFace, trained on millions of images, and FaceNet, which maps images into a 128 dimensional embedding space and uses triplet loss, are presented as emblematic. Their successes illustrate how a system can generalise across variation by carving a high dimensional space into regions where each identity occupies its own zone, separated from others. The Voronoi tessellation image, the idea of prototypes and cells, supplies an intuitive geometry: a new image is recognised when it falls closer to the cluster for that person than to any cluster for someone else. 
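
The geometry is easy to see in miniature. In the sketch below the “embeddings” are hand made two dimensional points, nothing like a real FaceNet output: each identity gets a prototype at the centre of its cluster, and a new image is assigned to whichever prototype’s cell it falls inside.

```python
# Toy nearest prototype recogniser over an embedding space, a sketch of
# the Voronoi picture. Real systems learn the embedding; these are fakes.

def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Several "images" (embedding vectors) per known identity.
gallery = {
    "alice": [(0.9, 1.1), (1.1, 0.9), (1.0, 1.0)],
    "bob":   [(4.0, 4.2), (3.8, 4.0), (4.1, 3.9)],
}

# One prototype per identity: the centre of that person's cluster.
prototypes = {name: centroid(pts) for name, pts in gallery.items()}

def recognise(embedding):
    # a new image falls in whichever identity's Voronoi cell is closest
    return min(prototypes, key=lambda name: dist2(embedding, prototypes[name]))

print(recognise((1.2, 0.8)))  # a new, varied photo still lands in alice's cell
print(recognise((3.9, 4.1)))  # and this one in bob's
```

The hard work in a real system is making the embedding itself, so that varied photographs of one person land close together while different people stay apart; once that is done, recognition is just this nearest cell lookup.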

Think of a learning system, such as a neural network, as something that tries to recognise patterns from examples. A problem arises when the system becomes too narrowly tuned to the exact details of the training data. This is called “overfitting”. The system looks accurate during training, but it has really just memorised quirks of the examples rather than learning the general pattern. Because of this, a tiny, irrelevant change in the input can easily make it give the wrong answer. 

A technique called dropout, introduced in work by Geoffrey Hinton, Nitish Srivastava, and colleagues in deep learning, helps prevent this problem. During training, dropout randomly turns off some of the artificial “neurons” in the network. Since parts of the system are constantly switched off, the network cannot rely on very specific combinations of neurons working together. Instead, it is forced to learn patterns that still work even when different parts of the network are missing. In effect, it has to learn features that are reliable across many slightly different versions of itself. This makes the system more robust. 
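
A minimal sketch of the mechanism, in the “inverted dropout” form commonly used in practice, where survivors are scaled up so the expected total activation stays the same. This is illustrative, not an excerpt from any real framework.

```python
import random

random.seed(1)

def dropout(activations, p_drop, training):
    """Inverted dropout: during training each unit is zeroed with
    probability p_drop and survivors are rescaled; at test time the
    layer is left untouched."""
    if not training:
        return list(activations)
    keep = 1.0 - p_drop
    return [a / keep if random.random() < keep else 0.0 for a in activations]

layer = [0.5, 1.2, 0.8, 2.0, 0.1]

# Each training pass sees a different thinned network, so no unit can
# count on any particular partner being present.
print(dropout(layer, 0.5, training=True))
print(dropout(layer, 0.5, training=True))

# Test time uses the whole network unchanged.
print(dropout(layer, 0.5, training=False))
```

Because the thinned networks differ from pass to pass, the features that survive training are the ones that work across many slightly different versions of the network, which is the robustness point in the text.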

A poorly trained system that has overfit the data is fragile. Small, meaningless details can mislead it. For example, imagine a model that recognises pictures of an actor partly because of a single coincidental pixel pattern in the training images. If that pixel changes, the system might suddenly misidentify the person. This kind of mistake shows that the system has not really learned the relevant features. A well-regularised system trained with dropout behaves differently. Small, irrelevant changes to the input do not easily lead it astray, because its judgments depend on broader and more stable patterns. 
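That contrast can be made concrete with a toy check: perturb the input slightly, many times, and ask whether the verdict ever flips. Everything below is invented for illustration, a classifier keyed to a broad pattern (the mean of its input) against one keyed to a single coincidental feature.

```python
import random

def survives_perturbation(classify, x, radius=0.05, trials=200, seed=0):
    """Return True if classify(x) gives the same verdict on every
    randomly perturbed neighbour of x within the given radius."""
    rng = random.Random(seed)
    verdict = classify(x)
    for _ in range(trials):
        nearby = [v + rng.uniform(-radius, radius) for v in x]
        if classify(nearby) != verdict:
            return False
    return True

# Robust: keyed to a broad, stable pattern across the whole input.
robust = lambda x: sum(x) / len(x) > 0.5
# Brittle: keyed to one coincidental "pixel" sitting at an exact value.
brittle = lambda x: abs(x[0] - 0.5) < 0.01

point = [0.5, 0.9, 0.9, 0.9]
print(survives_perturbation(robust, point))   # True: nearby cases agree
print(survives_perturbation(brittle, point))  # False: a tiny nudge flips it
```

Nothing about the brittle classifier looks wrong on the original input; only the neighbourhood test exposes how little its success can be trusted.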

Philosophers describe this desirable property using the idea of epistemic safety. A belief is “safe” if it could not easily have been false in nearby situations. In the machine learning example, safety becomes something we can see directly: a system is safe when small, irrelevant variations in its inputs do not push it into error. In other words, safety shows up as a concrete feature of how sensitive the system is to changes in the data. 

Nagel widens the methodological frame again by discussing the bias–variance trade-off and the puzzle of why massively overparameterised models can perform well. To understand what she’s getting at, it helps to start with a common problem in machine learning. 

When a model learns from data, it must balance two risks. If it is too simple, it may miss important patterns in the data. If it is too complex, it may start fitting noise and accidental details rather than the real structure. This tension is often called the bias–variance trade-off. Simple models have high bias, meaning they miss complexity. Very flexible models have high variance, meaning they can become unstable and overly sensitive to the data they were trained on. For many years, researchers believed that adding too many parameters to a model would inevitably make it worse, because it would overfit the data. But modern deep learning has produced a surprising result. Very large models with far more parameters than training examples can still perform extremely well. Work by Mikhail Belkin and colleagues helped explain this pattern using what is called the interpolation threshold and double descent. 

The idea is this. As models become more complex, their performance first worsens because they overfit. But if complexity keeps increasing, performance can improve again. This produces a curve that drops, rises, and then drops again, which is why it is called “double descent.” In these very large models, the system can fit the training data perfectly while still capturing useful patterns. One explanation for this comes from work by Uri Hasson and collaborators, who describe modern neural networks as using a direct-fit strategy. Instead of discovering simple universal laws from a small amount of data, the model learns by adjusting itself to a huge number of examples. In effect, it builds a very detailed map of the patterns present in the data it has experienced. This means that such models are often interpolating, not extrapolating. 

They perform well because they have seen many examples that fill in the space of possibilities. When a new example appears, it is usually similar to something already in the training data. The model can therefore respond by drawing on nearby cases rather than by applying a simple rule that it has discovered. A useful metaphor is to imagine stretching a sheet of fabric over a very uneven landscape. The sheet settles into the contours of the terrain. In the same way, a large learning system adapts itself to the complicated structure of the data it encounters. Philosophically, this picture suggests a particular kind of intelligence. It can be extremely powerful within the domain it has experienced, because it captures many subtle patterns. 

But it is also local and bounded. Its success depends on having dense experience of the environment. It does not necessarily reveal simple underlying laws of the world, nor does it guarantee insight far beyond the range of the data it has encountered. This idea helps explain a common problem in education. There are different kinds of knowledge, and they behave differently when we try to learn or teach them. Some knowledge is law-like and highly compressible. Once you understand the underlying structure, you can apply it very widely. A classic example is Einstein’s equation E = mc². When you understand the principle behind it, the same idea can be used to explain phenomena far beyond the specific example that first introduced it. Knowledge of this sort resembles what researchers call an ideal-fit model. A few clear rules or principles allow you to reason far beyond the examples you have seen. 

But much other knowledge does not work like this. Many practical skills, and much social understanding, are learned through many examples rather than a single rule. You get better at recognising patterns, judging situations, or performing a skill by encountering many cases. This kind of learning is closer to what machine learning researchers call direct fit. It works well within the range of situations you have experienced, but it does not automatically transfer to very unfamiliar contexts. This difference often causes confusion in teaching. When students fail to apply something they have learned to a new situation, teachers sometimes assume the student has not learned the material at all. But that is not always true. The student may have learned the concept locally, meaning they can use it in situations similar to the examples they studied. The difficulty arises when they are asked to apply it in a situation that is too far removed from those examples. 
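A sketch of the simplest possible direct-fit learner makes the contrast vivid: one-nearest-neighbour recall over invented data. It answers well near its experience and can only snap to an edge case far outside it.

```python
def predict(memory, query):
    """Answer a query by recalling the most similar stored case.

    No rule is extracted from the data; the system interpolates
    across the experience it has banked.
    """
    nearest = min(memory, key=lambda pair: abs(pair[0] - query))
    return nearest[1]

# Dense experience: inputs paired with observed outcomes.
memory = [(0.0, "cold"), (0.3, "cool"), (0.6, "warm"), (0.9, "hot")]

print(predict(memory, 0.55))  # -> warm: close to seen cases
print(predict(memory, 5.0))   # -> hot: far beyond experience, it can
                              #    only snap to the nearest edge case
```

Within the sampled range the answers look like understanding; outside it, the same mechanism produces confident but unguided output, which is the failure teachers see when a problem drifts too far from the practised examples.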

The solution is not always simply to explain the concept again. Often the real need is more varied experience. Students benefit from seeing the same idea applied across a wider range of cases. By working through many slightly different examples, they gradually build a more flexible understanding. Their “conceptual map” becomes richer and more stable, allowing them to recognise when a principle applies even in unfamiliar settings. 

Nagel also points out that learning does not depend only on the structure of the information being presented. It is also shaped by what we pay attention to and what we are rewarded for noticing. Our cognitive systems learn best when they are repeatedly required to make meaningful distinctions. A well-known example comes from research on face recognition. People are often better at recognising faces from their own racial group than from other groups. This is called the “other-race effect.” A similar pattern appears with age groups, known as the “own-age bias.” These effects are not usually the result of conscious prejudice. Instead, they arise because people tend to have more practice distinguishing individuals within the groups they interact with most. When we regularly need to tell individuals apart, our perceptual system becomes very good at noticing subtle differences. But when we mostly encounter people as members of a broad category, our system does not develop the same level of fine-grained discrimination. 

This insight connects directly to education. What students learn depends heavily on what the environment encourages them to pay attention to. If assessments mainly reward recognising broad categories or repeating familiar labels, students will learn to operate at that coarse level. They will become good at identifying the general type of problem or the expected terminology, but they may not develop the ability to make more subtle distinctions. Similarly, if classroom culture rewards appearing knowledgeable rather than genuinely understanding, students quickly learn strategies for display. They may focus on sounding confident, reproducing expected phrases, or signalling agreement with what they believe the teacher wants to hear. 

A more productive environment rewards something different: the ability to notice and articulate meaningful differences. When students are encouraged to distinguish between closely related ideas, interpretations, or problem types, their cognitive system becomes better at telling things together when they belong together and telling them apart when they differ in important ways. Over time, this builds deeper competence in whatever field they are studying, whether that means interpreting literature, reasoning scientifically, or constructing mathematical proofs. 

Nagel treats knowledge recognition as a higher dimensional version of face recognition. We learn not only the regularities of the world, but the regularities of how agents like us get adapted to the world. We learn the patterns in which perceptual access yields knowledge, and the patterns in which ignorance is likely. The coin in the palm and the coins in your pocket are simple illustrations of how quickly we can place an agent proposition pairing near a prototype of knowledge or a prototype of ignorance. Importantly, Nagel insists that many pairings will not yield a swift verdict, and that this too is part of our competence. We can ask, do you know that person’s name, as distinct from, what is that person’s name? That question presupposes that someone can often know whether they know, which is itself a kind of metacognitive mapping. 

A central question in epistemology is why truth alone isn't enough for knowledge. Philosophers such as Ernest Sosa and Timothy Williamson have argued that something stronger is required. Sosa expresses the idea in terms of safety. Roughly speaking, if you truly know something, then you would not easily have been wrong about it in slightly different circumstances. Williamson presents a similar idea within his broader view that knowledge itself is the key standard governing belief and assertion. In both cases, the thought is that knowledge must be stable across nearby possibilities, not just correct by coincidence. 

To see why this matters, Nagel asks us to imagine a traveller who sees what appears to be water in a distant valley. By luck, there really is water there, so the traveller gives correct advice about where to find it. Or consider Plato’s famous example of a guide who happens to point correctly toward the city of Larissa even though he is merely guessing. In both cases, the advice happens to be right. If you followed it, you would reach your destination. But she thinks most of us would hesitate to say the traveller or the guide knew the correct answer. Their success was too fragile. A small change in circumstances could easily have led them to give the wrong advice. 

The key point Nagel is making is that this demand for safety reflects a practical feature of how learning systems work. Judgements that remain correct across small variations are far more useful than judgements that succeed only by chance. Stable judgements allow individuals and groups to rely on what they have learned and to coordinate their actions effectively. This is easy to see in education. A student might give the correct answer to a question once, perhaps by guessing or memorising a specific example. But teachers want to know whether the understanding is robust. That is why they ask follow-up questions, request explanations, present slightly modified problems, or test whether the student can apply the idea in a new context. What they are checking is exactly what philosophers call safety: whether the student’s answer would remain correct under small changes. A student who answers correctly only in the exact situation they practised has a fragile performance. A student who can handle variations in the problem has something closer to genuine knowledge. 

Nagel also briefly distinguishes her own prototype based picture from other prototype accounts, such as Goldman’s 1994 style reliabilist prototypes of reliably formed belief, and claims that the infallibilist, truth anchored framing yields a sharper error signal. The educational relevance is that error signals matter. Students need feedback that is diagnostic. If the classroom treats wrong answers as merely unfortunate outcomes of a generally good method, the system can struggle to adjust. If, however, the environment makes clear that certain kinds of mistakes indicate that one did not yet know, then the learner has a cleaner gradient for improvement. This means designing feedback so that errors can be used rather than feared, and so that the learner can see the difference between being right by luck and being right because their model tracks the structure. 

Once the parallel between face recognition and knowledge recognition is in place, Heritage’s epistemic territory notion becomes an emergent map built by ordinary learning mechanisms operating on social data. We track who tends to be right in which domains, who has perceptual access, who has a history of reliable uptake, who corrects themselves when challenged, who can answer “how do you know”, who can guide us to independent checks. These are the social equivalents of robust features in a deep learning system. They are not perfect. They are local. They can fail at boundaries, just as face recognition fails with lookalikes, twins, dim lighting, and partial familiarity. But crucially, their local failures do not undermine their global usefulness. Discovering that Mary Kate has a twin does not threaten your recognition of everyone else. Similarly, discovering that someone is unreliable about one topic need not collapse your trust in their entire epistemic territory, if your map is well formed. 

Via this Nagelian lens we can see that teaching is not only the transmission of content. It is the cultivation of safe ways of judging, and the cultivation of reliable maps of who to learn from and how. In a classroom, the teacher is both a source and an engineer of independent access. Sometimes they should function like the spotlight, arranging perceptual contact with the world so that students do not have to take anything on faith (as in the spider example we mentioned earlier). Sometimes they should function like Augustine’s passer by, guiding learners through reasons that can be owned. Sometimes they must rely on testimony, because much of what is taught cannot be independently verified in the moment. In those cases, the educational task includes building rational trust, not blind trust, which means helping students develop the skill of locating epistemic territory. 

That has practical consequences. It suggests that teachers should make the criteria of epistemic territory visible. Who is likely to know, and why. What counts as evidence of knowing in this domain. What kind of access matters, having seen, having measured, having read, having practiced, having a track record of correcting errors. It suggests that classrooms should include routines that train students to ask the epistemic questions that ordinary conversation uses so smoothly, do you know, how do you know, what would change your mind, what did you see, what are you inferring. It suggests that peer learning succeeds when students have enough shared experience to build local maps of one another’s strengths, and fails when the environment encourages over attribution of knowledge, the very tendency the earlier developmental discussion associated with younger children. 

It also suggests that assessment should be designed to test safety, not just correctness. Vary the surface features, change the context, ask for transfer, introduce near misses, because those are the educational equivalents of distinguishing Keanu Reeves from a pixel coincidence, or distinguishing a real face cluster from an overfit trick. Nagel’s direct fit versus ideal fit contrast delivers a clear message about what education can reasonably promise. Some domains support elegant extrapolation from a few principles. Others require dense experience and careful interpolation. 

If educators assume that all understanding is ideal fit, they will misread learners who have built workable local models. If they assume all learning is direct fit, they will fail to teach the compressive power of theory where it is available. The art of teaching is to know which is which, and then to design environments in which curiosity yields structured surprise, error yields usable signals, trust is earned and domain sensitive, and testimony is continuously supported by opportunities for independent checking and reason giving. 

Nagel’s theory of “knowledge transmission” is not about pouring content from one mind into another. It is about building a shared ecology in which learners can become safely accurate, and can become safely accurate about who else is safely accurate, which is exactly what makes testimony, and therefore most human learning, possible at all. 

One of the things Nagel does is help sharpen our understanding of why our everyday epistemic life does not collapse into scepticism, even though we can always manufacture a “could have been wrong” story. She argues that the positive and negative intuitions are not rival verdicts about one single, undifferentiated capacity, but are outputs of two different control regimes that evolved for two different jobs. 

Your epistemic instincts are state dependent in the same way your action tendencies are state dependent. In the positive mode you see a person glance at a normal looking clock and you attribute knowledge almost immediately. You do not run through a checklist, you do not consciously compute error rates, you simply register a familiar pairing of agent and proposition as landing near a prototype of perceptual access. In the negative mode you recall the stopped clock possibility, or the “not long enough to check it is working” possibility, and you start to treat the same person as being near a different prototype, one that centres on decoupling, luck, and failure of discrimination. Nagel is insisting that both patterns are psychologically real, both are rationally intelligible, and neither by itself should be allowed to bully the other into global scepticism. 

Behaviour can be driven by cached values tied to a state, and it can also be driven by a richer model that allows simulation. The habit system is quick and usually good enough, but brittle in unusual conditions. The goal directed system is slower and more computationally expensive, but flexible in novel conditions. In the reinforcement learning literature this is usually framed as model free versus model based control, and Nagel uses work associated with Peter Dayan and others in that tradition to motivate the distinction. The important point is that a single animal can carry both control styles, and the style that is active can depend on cues, context, and internal condition. 

Nagel suggests that our epistemic evaluations work in a similar way. We have an intuitive, prototype driven, direct fit style of attribution that is trained up through enormous amounts of ordinary interaction, and we have a reflective, strategic, model based style that is trained up in contexts where distrust, conflict, deception, or institutional stakes make it worth paying the extra computational cost. When you are in intuitive mode you treat many channels as basically reliable, including ordinary clocks in ordinary rooms, testimony from trusted sources inside their usual domain, and the routine stability of objects and situations. When you are in reflective mode you treat all of this as potentially adversarial, unstable, or noisy, and you start asking for warrants, for error checking, for counter possibilities. 

Nagel says the sceptical manoeuvre in the clock case is often a category mistake in practice. It applies the reflective mode, which is designed for hard social conditions, to an intuitive agent operating inside a normal environment where direct fit heuristics are the point. That is why the negative intuition feels clever and yet also strangely sterile. It is clever because it is using a genuinely powerful capacity, the capacity to decouple content, to imagine counter possibilities, and to treat the agent as an object of strategic scrutiny. It is sterile because, in the easy case, that scrutiny does not improve action and it does not align with the way the knowledge attribution system was trained to guide ordinary life. Nagel does not deny that clocks can be broken, but she denies that the mere availability of that thought should force us to withdraw knowledge attributions in the ordinary case, any more than the existence of twins forces us to withdraw face recognition in general. 

To illustrate her point, Nagel goes down a level to something simpler than knowledge: mapping visual fields. We are exceptionally good at tracking what others can see, but we are rarely aware of the computational complexity of what we are doing. “A person looking at an object” is not a trivial relation to recover from pixels. Text to image systems can be excellent at style and global semantics while still being poor at gaze and attention because what humans do there is not just passive pattern matching over static images. It is learned in interactive contexts where gaze matters for action, where mistakes get corrected, and where the cost of getting it wrong is sometimes immediate. 

The direct fit idea reappears here as a way to avoid demanding a perfect generative theory of optics, occlusion, and geometry. You do not need to draw the exact boundary of a person’s visual field to act well, you need a locally reliable discrimination between clearly seen, clearly unseen, and borderline cases where you sensibly withhold judgement. That local, prototype guided discrimination can be safe because small perturbations of the input would not easily flip the classification in the central zone. 

Epistemic territory is introduced as an analogue of visual field, but vastly higher dimensional. Here Nagel is explicitly following John Heritage’s idea that everyday conversation presupposes a running, tacit map of who knows what, what he calls epistemic territory. The key is that this map is not only about facts, it is also about rights to tell, rights to ask, and what counts as socially intelligible action. Conversation analysis has shown that the grammatical form of a sentence does not always determine what that sentence is doing in an interaction. Researchers studying ordinary conversation noticed that the morphosyntax of an utterance, whether it is grammatically a statement or a question, often matters less than the distribution of knowledge between the participants. 

Morphosyntax simply refers to the grammatical structure of a sentence. A declarative is the form normally used for statements, such as “You’re going fishing today,” while an interrogative is the form normally used for questions, such as “Are you going fishing today?” In formal grammar these forms are associated with different communicative functions: declaratives provide information and interrogatives request it. But real conversations do not always follow this neat pattern. What actually determines how an utterance functions is the epistemic relationship between the speakers, that is, who is presumed to know what. Conversation analysts have shown that a declarative can operate like a question when the speaker is seeking confirmation rather than providing new information. 

Anita Pomerantz’s well known example of “fishing” illustrates this point. Someone might say, “You’re going fishing today.” Grammatically this is a statement, yet in context it may be designed to elicit a reply such as “Yes, I am,” or “No, not today.” The speaker is not simply asserting a fact but inviting the other person to confirm or correct it. In this sense the declarative functions much like a question even though its grammatical form is that of a statement. 

Work by John Heritage and others has developed a framework for understanding why this happens. Heritage distinguishes between epistemic status and epistemic stance. Epistemic status refers to the knowledge a person actually has or is socially expected to have. Someone typically has higher epistemic status regarding their own plans, experiences, or intentions than anyone else does. Epistemic stance, by contrast, refers to how a speaker presents themselves in a particular moment, whether they speak as someone who knows, as someone who is unsure, or as someone who is merely guessing. The two do not always align. A speaker may possess the knowledge but adopt a cautious stance, or they may speak confidently despite lacking full information. Because of this, conversation is filled with small, moment by moment negotiations over knowledge. Speakers continually adjust their wording to display what they know, to test what others know, or to invite confirmation. 

These micro negotiations explain why interrogatives can sometimes function as tellings and declaratives as questions. What ultimately governs the interaction is not the grammatical clothing of the sentence but the participants’ shared understanding of how knowledge is distributed between them and how that distribution is being managed as the conversation unfolds. Classrooms are full of intentionally engineered mismatches between stance and status. 

A teacher often adopts a K minus stance, pretending not to know, to elicit a student’s articulation. A student sometimes adopts a K plus stance, performing certainty to protect face or status, even when they are unsure. Examinations are institutionalised versions of this, as are seminar questions where a question is grammatically interrogative but pragmatically a challenge, a display of competence, or a bid for epistemic authority. Once you see the Heritage point, you stop treating “question” and “telling” as linguistic forms and you treat them as social actions coordinated by epistemic mapping. 

You can then also see why back channel matters so much in teaching. The small responses, the oh, right, yeah, the nods, the puzzled looks, are not polite noise, they are the feedback signal that lets a speaker calibrate whether they have misjudged the students’ epistemic territory. Nagel suggests that responses to turns are overwhelmingly common, and she uses this to push back against a picture of conversation as proceeding without uptake. The educational upshot is that if you strip away the uptake channel, for example by reading slides in a monotone or by designing online interactions that suppress micro feedback, you are not merely making teaching less engaging, you are removing the mechanism by which epistemic territory gets mapped and repaired in real time. 

A familiar classroom phenomenon is the contest over who gets to be K plus on a topic. A novice teacher may try to “tell” an expert student something the student already knows, and the student will respond with a form that reasserts authority, perhaps by adding extra detail, or by producing an “oh I know” that frames the teacher’s contribution as mere prompting. Conversely, a teacher might treat a student’s correct contribution as if it were a lucky guess, and in doing so the teacher does not just mark the student down, they reposition the student in epistemic space, with downstream effects on who speaks, who stays silent, and who is treated as credible by peers. When teachers talk about “classroom climate”, a lot of what they are tracking is precisely this: the stability and fairness of epistemic territory assignments and the possibilities of repairing misassignments. 

Nagel returns to the original worry about the clock. The negative move, the “but clocks can be broken” move, is like treating every conversational partner as a potential deceiver and every familiar object as a potential trap. Sometimes that is rational, and Nagel gives examples: conflict, deception, institutional high stakes, or domains where the agent has reason to manipulate you. But applying that mode indiscriminately would be like a rat that insists on simulating every possible route for every trivial trip, or like a speaker who refuses to treat any back channel as adequate uptake and therefore never permits closure. The system would grind to a halt. 

Nagel’s defence of ordinary knowledge attribution is therefore functional. We need a fast mode that is trained on the regularities of a mostly stable world, and we need a slower mode for the pockets of instability. Knowledge attributions in the easy cases are not defeated by the logical possibility of hard cases, for the same reason safe face recognition in central zones is not defeated by the logical possibility of twins at the borders. 

This is also where Nagel’s earlier appeal to safety returns in a new light. Safety is what regularisation is doing in machine learning, and what error driven calibration is doing in conversation. Dropout, noise injection, early stopping, these are ways of preventing a system from latching onto brittle coincidences. In human interaction, the analogue is the way we continually test, repair, and refine our epistemic maps through feedback. A student answers, the teacher’s face registers surprise or confirmation, the student adjusts, the teacher adjusts, and the micro loop continues. When the loop is functioning, both parties end up not only with shared content but with a shared sense of where the content sits in each other’s territory. That is why Nagel says the mapping is not handed over as an explicit proposition, it is achieved as a symmetry, more like mutual eye contact than like a sentence added to a list. 

Radford’s puzzle is that if you build common knowledge out of infinite iterations of knows that, it looks impossible for finite minds. The Letterman mirror metaphor pushes the sceptical version: tiny imperfections compound, the image fades, so genuine common knowledge never arrives. Nagel flips the metaphor. In practice, humans are not passive mirrors, they are active calibrating systems. Conversation is not a brittle chain of iterated beliefs, it is a coupled process with dense feedback. If the coupled process is real, and if the epistemic mapping equipment is broadly similar across humans, then a thin, empirical notion of common knowledge might be achievable in ordinary life, not as an infinite stack, but as a stable coordination state maintained by uptake, repair, and closure. 

She offers the image of lamps shining together on a shared stage to capture her proposal about what human social cognition is for: not merely to model others as problems, but to recruit them as collaborators in stabilising a shared grip on reality. 

Applied directly to education, the whole arc suggests three practical claims. First, teaching is not only transmission of content, it is transmission and continual remapping of epistemic territory. A teacher who ignores this will often misdiagnose failure. The student may not be failing to understand the content, they may be failing to locate themselves appropriately in the epistemic game, unsure whether they are being asked for knowledge, for conjecture, for recall, for interpretation, or for performance. 

Second, many classroom conflicts are conflicts about epistemic status disguised as conflicts about facts. When a student says, “You never told us that”, they are not always making a factual claim about an utterance, they are challenging the legitimacy of a test as a move that presupposes they ought to have had the point in their territory. When a teacher says, “You knew this last week”, they may be asserting not only past performance but a right to treat the student as K plus now. If you treat these as merely motivational or behavioural issues, you miss the structure that is actually driving the interaction. 

Third, reflective scepticism has its place, but it should be taught as a tool for specific ecological conditions, not as a default posture. Otherwise you train students into a kind of epistemic paranoia, where every source is treated as suspect, every ordinary inference is treated as unsafe, and the result is not critical thinking but slowed thinking and flattened trust. 

A better educational ideal, on this picture, is bilingual competence across modes: students learn when it is rational to rely on fast, socially trained, direct-fit epistemic mapping, and when the context demands the slow mode of reasons, warrants, and strategic scrutiny. 

References 

Arcaro, Michael J., Paul F. Schade, and Margaret S. Livingstone. 2019. “What You See Is What You Get: Experience-Dependent Development of Face Processing.” Annual Review of Vision Science 5: 231–252. 

Augustine. De Magistro (On the Teacher). ca. 389. 

Bavelas, Janet B., Linda Coates, and Trudy Johnson. 2000. “Listeners as Co-Narrators.” Journal of Personality and Social Psychology 79 (6): 941–952. 

Belkin, Mikhail, Daniel Hsu, Siyuan Ma, and Soumik Mandal. 2019. “Reconciling Modern Machine-Learning Practice and the Classical Bias–Variance Trade-Off.” Proceedings of the National Academy of Sciences 116 (32): 15849–15854. 

Berlyne, Daniel E. 1966. “Curiosity and Exploration.” Science 153 (3731): 25–33. 

Bruce, Vicki, and Andy Young. 1986. “Understanding Face Recognition.” British Journal of Psychology 77 (3): 305–327. 

Gettier, Edmund L. 1963. “Is Justified True Belief Knowledge?” Analysis 23 (6): 121–123. 

Goldman, Alvin I. 1994. “Naturalistic Epistemology and Reliabilism.” 

Gordon, Robert M. 1986. “Ascent Routines for Propositional Attitudes.” Mind and Language 1 (2): 142–156. 

Gottlieb, Jacqueline, and Pierre-Yves Oudeyer. 2018. “Towards a Neuroscience of Active Sampling and Curiosity.” Nature Reviews Neuroscience 19 (12): 758–770. 

Hasson, Uri, Samuel A. Nastase, and Ariel Goldstein. 2020. “Direct Fit to Nature: An Evolutionary Perspective on Biological and Artificial Neural Networks.” Neuron 105 (3): 416–434. 

Heritage, John. 2012. “Epistemics in Action: Action Formation and Territories of Knowledge.” Research on Language and Social Interaction 45 (1): 1–29. 

Heritage, John, and Geoffrey Raymond. 2005. “The Terms of Agreement: Indexing Epistemic Authority and Subordination in Talk-in-Interaction.” Social Psychology Quarterly 68 (1): 15–38. 

Hurley, Susan, and Teresa McCormack. 2005. “Keeping Track of Time: Development and Evolution of Temporal Cognition.” Trends in Cognitive Sciences 9 (11): 528–535. 

Inoue, Victor H. 1978. “On the Structure of Backchannel Behaviour.” Journal of Pragmatics 2 (1): 25–41. 

Kang, Min Jeong, et al. 2009. “The Wick in the Candle of Learning: Epistemic Curiosity Activates Reward Circuitry and Enhances Memory.” Psychological Science 20 (8): 963–973. 

Labov, William, and David Fanshel. 1977. Therapeutic Discourse: Psychotherapy as Conversation. New York: Academic Press. 

Lederman, Harvey. 2015. “Uncommon Knowledge.” Mind 124 (496): 1019–1059. 

Leslie, Alan M. 1987. “Pretense and Representation: The Origins of ‘Theory of Mind.’” Psychological Review 94 (4): 412–426.

Leslie, Alan M. 1994. “ToMM, ToBy, and Agency: Core Architecture and Domain Specificity.” In Mapping the Mind: Domain Specificity in Cognition and Culture, edited by Lawrence A. Hirschfeld and Susan A. Gelman, 119–148. Cambridge: Cambridge University Press.

Leslie, Alan M., Tim P. German, and Pamela Polizzi. 2005. “Belief–Desire Reasoning as a Process of Selection.” Cognitive Psychology 50 (1): 45–85.

Mercier, Hugo, and Dan Sperber. 2017. The Enigma of Reason. Cambridge, MA: Harvard University Press. 

Nagel, Jennifer. 2013. “Knowledge as a Mental State.” Oxford Studies in Epistemology 4: 275–310. 

Nagel, Jennifer. 2017. “The Psychological Basis of the Knowledge Norm of Assertion.” In Assertion: New Philosophical Essays, edited by Sanford C. Goldberg, 191–213. Oxford: Oxford University Press. 

Nagel, Jennifer. 2023. Recognizing Knowledge: Intuitive and Reflective Epistemology. John Locke Lectures, University of Oxford. 

Nagel, Jennifer, Valerie San Juan, and Raymond A. Mar. 2013. “Lay Denial of Knowledge for Justified True Beliefs.” Cognition 129 (3): 652–661. 

Parde, Natalie, and Rodney D. Nielsen. 2023. “CANDOR: The Corpus of Annotated Natural Dialogue for Open Research.” In Proceedings of the International Conference on Language Resources and Evaluation (LREC-COLING). Marseille: European Language Resources Association.

Phillips, Stephen H., and Matthew R. Dasti, trans. 2017. The Nyāya-Sūtra: Selections with Early Commentaries. Indianapolis: Hackett. 

Plato. Meno. 

Pomerantz, Anita. 1984. “Asking for Information: Declarative Questions in Conversation.” In Structures of Social Action, edited by J. Maxwell Atkinson and John Heritage, 351–368. Cambridge: Cambridge University Press. 

Radford, Colin. 1969. “Knowing and Telling.” Philosophy 44 (169): 226–236. 

Rumfitt, Ian. 2015. “Knowledge Attribution and Development.” In The Oxford Handbook of Philosophy of Cognitive Science, edited by Eric Margolis, Richard Samuels, and Stephen P. Stich, 485–507. Oxford: Oxford University Press. 

Shannon, Claude E. 1948. “A Mathematical Theory of Communication.” Bell System Technical Journal 27 (3): 379–423. 

Sosa, Ernest. 1991. Knowledge in Perspective. Cambridge: Cambridge University Press. 

Speas, Peggy. 2004. “Evidentiality, Logophoricity and the Syntactic Representation of Pragmatic Features.” Lingua 114 (3): 255–276. 

Srivastava, Nitish, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. “Dropout: A Simple Way to Prevent Neural Networks from Overfitting.” Journal of Machine Learning Research 15: 1929–1958. 

Stalnaker, Robert. 2006. “On Logics of Common Knowledge.” In Handbook of Epistemic Logic, edited by Hans van Ditmarsch, Wiebe van der Hoek, and Barteld Kooi, 169–190. Dordrecht: Springer. 

Stivers, Tanya. 2010. “Question-Response Sequences in Interaction across Ten Languages.” Journal of Pragmatics 42 (10): 2772–2781. 

Sutton, Richard S., and Andrew G. Barto. 2020. Reinforcement Learning: An Introduction. 2nd ed. Cambridge, MA: MIT Press. 

Taigman, Yaniv, Ming Yang, Marc’Aurelio Ranzato, and Lior Wolf. 2014. “DeepFace: Closing the Gap to Human-Level Performance in Face Verification.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 

Tomasello, Michael, Josep Call, and Brian Hare. 1998. “Five Primate Species Follow the Visual Gaze of Conspecifics.” Animal Behaviour 55 (4): 1063–1069. 

Williamson, Timothy. 2000. Knowledge and Its Limits. Oxford: Oxford University Press. 

Wimmer, Heinz, and Josef Perner. 1983. “Beliefs About Beliefs: Representation and Constraining Function of Wrong Beliefs in Young Children’s Understanding of Deception.” Cognition 13 (1): 103–128.