Introduction 

This book begins from a simple classroom fact. Many of the words we use to judge work are vague. Clear, elegant, relevant, original, insightful. They have obvious cases and they have neighbours that resist tidy separation. Philosophers have argued for decades about what such vagueness is and what to do with it. At first glance this looks like an esoteric quarrel. In truth it goes to the heart of how we teach, how we assess, and how we speak to the public about standards. If meaning lives in use, and if objectivity in education is a public practice of giving and asking for reasons, then a truthful account of vagueness is not an ornament. It is a tool for daily work.

The early chapters set the ground. They show why the old scientistic hope that we can remove discretion from teachers and pupils by tightening rubrics and itemising everything mistakes what educational judgement is. The discussion moves through construct and criterion referencing and explains how both tried to protect meaning while keeping reliability alive. We meet Wittgenstein to keep our eye on use, Brandom to give that use the shape of a space of reasons, and Toulmin to give a plain grammar for claims, data, warrants, rebuttals and qualifiers. We then follow Hart and Dworkin into law to see what happens when rules meet hard cases and when principles must guide. Endicott helps us see why higher order vagueness persists even when rules are tidy. Sorensen shows what it feels like at a real borderline where there is no hidden fact to be found.

The middle of the book brings the lens into education. We look at how removing coursework and narrowing tasks thinned the evidence base. We examine the turn to linear examinations and to headline numbers that travel easily between schools and years. We ask whether what is gained in surface uniformity is lost in meaning. We read Messick and Kane on validity so that talk about rigour returns to the chain from construct to task to scoring to use. We keep Hacking in view when we ask what happens once a token becomes the coin of the realm. We give classroom examples so the argument is never abstract.

A cluster of chapters turns from theory to method. They show what reliable marking looks like when reliability is the disciplined education of judgement rather than its removal. They describe moderation as a practice of reasons around shared exemplars. They show how award meetings can hold standards steady by using both statistical comparators and real scripts as anchors and by writing minutes that say when reasons decided and when policy spoke. They recover the history of the NEAB so that we have a concrete picture of a culture that tried to do these things well.

Later chapters take up wider culture. They ask whether a world that prizes visibility, speed and scale has trained us to live by proxies that are easy to count and easy to mimic. They set that beside a school culture that has narrowed tasks, trimmed judgement and leaned on numbers. The family resemblance is striking. We look for counter currents in public life where reasons still rule, from citizens’ assemblies on climate to patient groups defending services. We ask how schools can partner with these places so that pupils get daily practice in owning reasons with particulars in view.

A substantial chapter faces artificial intelligence. The issue is not whether a generator can produce a fluent string. It can. The issue is where we locate evidence so that ownership of reasons is visible. We redesign tasks in literature, science, mathematics and history so that pupils must make choices that matter for the construct and must defend them in small public acts. We separate discovery from policy at the edge and we speak honestly when policy must act. We show that this is not a retreat from objectivity. It is its proper form in evaluative domains.

The penultimate chapters draw out the values that follow if we take vagueness seriously. Authenticity of task and object. Improvement as the centre of judgement. The discipline of public reasons. Candour about borders. Breadth of excellence. Fairness through design. Restraint about claims. Apprenticeship of seeing. Clarity about language. Courage about risk. Institutional memory. Trustworthy speech. Each value is made concrete in ordinary routines, in moderation notes, in task design, and in the way we write to pupils and to users. The point is simple. If we want our standards to be stable and our speech to be true, we must build these values into how we work.

The final chapters return to policy and to culture. They argue that recent reforms promised certainty by thinning the work and delivered fragility in return. They do not call for nostalgia. They call for practices that let knowledge show itself in use and that make reasons visible and testable. They describe how numbers can help us watch ourselves without taking over content. They describe how departments can store exemplars with commentary so that new colleagues inherit living standards rather than slogans. They show how to write public reports that separate discovery from policy.

Along the way the book keeps the philosophical debate in plain sight. Williamson helps us understand why sharp cuts can exist even when no one can know them. Sorensen helps us speak honestly when a border presents as a tautology and decisions must still be made. Endicott keeps us sober about the persistence of higher order vagueness. These are not museum pieces. They are working parts in the craft of teaching and examining. They help us design tasks that are hard to fake, mark work in communities that can explain themselves, and speak in a register that the public can trust.

If you are a teacher, a head of department, an examiner, or a policymaker, none of this is optional. Vagueness lives in our practices because evaluative concepts live there. We can pretend it away and lean harder on proxies, or we can learn from the best of the philosophical literature and build institutions that are both honest and strong. The chapters that follow try to make that learning practical. They ask for small daily routines and for a larger public language. They ask for decisions that are stable enough to live with and honest enough to deserve trust.


Chapter 1 Objectivity, Reliability and the Scientistic Impulse

In the first chapter I set out the problem that haunts the rest of the book and I did so with help from those who have tried to think clearly about knowledge, rules and institutions. I wanted to say why high stakes assessment so often drifts toward a style that looks like science and why that style keeps failing the very goods it promises to protect. I began by naming the scientistic impulse as the habit of importing a certain picture of scientific objectivity into education and then demanding that assessment imitate what the picture seems to show. Weber’s analysis of rationalisation explains the lure of calculability and procedural control. If one can display a procedure that yields stable numbers one appears modern and accountable. Hacking’s reflections on styles of reasoning and on looping effects show how once a style is installed the world begins to present itself in forms that the style can register and reward. In this picture reliability is the sign of objectivity and discretion is the enemy of reliability. The fewer chances there are for candidates to shape their response the safer. The more that examiners can be replaced by rules the more secure the result. The opening task was to lay bare the attractions of this picture and to show why they are so strong in systems that must certify and sort at scale. I began with reliability because it is the most compelling promise and I placed it in a frame given by Messick’s unified view of validity. When large numbers of people must be awarded grades that carry life consequences the public wants stability and it wants the assurance that like cases will be treated alike. In the Weberian idiom reliability stands for fairness as predictability. Yet Messick reminds us that validity is about the meaning of our interpretations and the consequences of our uses. Reliability can therefore be a servant of validity or it can become a rival to it when the quest for stable numbers trims away the very qualities that define the construct. This is the hinge of the chapter. The trouble appears when reliability is purchased by eliminating precisely those features of performance that make the subject worth learning. Bruner’s insistence that school knowledge should preserve the structure of the discipline helps formalise the complaint. If reliability is secured by thinning tasks until only narrow tokens remain then the structure of the discipline is lost in transit. 
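A minimal sketch, on invented grades, of the narrow statistical sense of reliability at work in this picture: agreement between two markers over the same scripts, reported both raw and corrected for chance as Cohen's kappa. The marker names and grades are hypothetical, and the point is only to show what the stable numbers of the scientistic picture typically measure.

```python
# An invented, minimal illustration of "reliability" in its narrowest sense:
# agreement between two markers over the same scripts, summarised as raw
# agreement and as Cohen's kappa (agreement corrected for chance).
# The grades below are hypothetical.

from collections import Counter

marker_1 = ["A", "B", "B", "C", "A", "C", "B", "A", "C", "B"]
marker_2 = ["A", "B", "C", "C", "A", "B", "B", "A", "C", "B"]

def cohens_kappa(x, y):
    """Chance-corrected agreement between two lists of categorical grades."""
    assert len(x) == len(y)
    n = len(x)
    observed = sum(a == b for a, b in zip(x, y)) / n
    # Expected agreement if each marker's grade distribution were independent.
    counts_x, counts_y = Counter(x), Counter(y)
    categories = set(x) | set(y)
    expected = sum((counts_x[c] / n) * (counts_y[c] / n) for c in categories)
    return observed, (observed - expected) / (1 - expected)

if __name__ == "__main__":
    raw, kappa = cohens_kappa(marker_1, marker_2)
    print(f"raw agreement: {raw:.2f}, Cohen's kappa: {kappa:.2f}")
```

Nothing in such a figure says whether the tasks and descriptors that produced it preserved the construct, which is the burden of the chapter.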

From reliability the argument turned to discretion and I used Wittgenstein’s reminder that meaning is use to locate the work of the examiner. Discretion is the capacity to interpret a performance in its proper grain. It is the freedom to decide what matters here. It is the power to weigh qualities that do not share a common metric. The scientistic turn treats discretion as a threat to reliability. Give a candidate discretion and they will choose when and where and how to display their competence in ways that are not easily standardised. Give an examiner discretion and the mark will seem to depend on temperament. The most visible casualty of this suspicion has been coursework and portfolio work. Where once candidates could gather evidence of achievement over time in varied forms, many systems have cut away such tasks or tamed them until they resemble short written examinations done under controlled conditions. Popham’s writing on test design shows why such trimming feels rational to managers who must defend decisions. The chapter traced what is lost when the discretion of candidate and examiner is exchanged for the discretion of the item writer.

To explain that exchange I drew on Hacking’s account of reification and reflexivity. Reification is the habit of treating an artefact of a system as if it were a natural kind. A grade begins to look like a property of a person that exists prior to the act of judging rather than the outcome of a practice. Reflexivity is the loop by which measures alter what they measure. When an assessment device becomes the currency of value the teaching that aims to secure that currency shifts to fit the device. Goodhart’s law is the policy echo of this thought. Once a measure becomes a target it ceases to be a good measure. In high stakes settings these tendencies combine. Once a measure of written accuracy backed by easily counted tokens is installed in the name of reliability the classroom bends toward those tokens and the subject begins to look like a drill in the tokens. The system then takes the stability of the tokens as proof of objectivity. A circle is completed. Hacking helps us see why the circle is resilient.

I then analysed objectivity as the scientistic picture conceives it and I set it against Raz’s service conception of authority. On the scientistic picture objectivity is independence from the idiosyncrasies of judges. Instruments are the paradigm. If two thermometers disagree the fault lies in the thermometers. There is a true temperature and we will build better devices until we capture it. The transfer of this figure to educational assessment encourages a hunt for instruments that displace judgement. The more the mark can be read off a pattern of ticks the more objective the process looks. Raz offers another route. Authority is justified when it helps subjects better conform to the reasons that already apply to them. Translated into assessment this means procedures are authoritative when they help readers track the reasons internal to the practice. On this view objectivity cannot be achieved by banishing judgement. It must be achieved by disciplining it inside a community that can say what counts as a good reason. To prepare the ground for that shift I looked closely at how discretion was redistributed by technocratic reforms. The rhetoric claimed that discretion was being squeezed out across the board. In fact it was being relocated. The discretion of examiners and candidates was reduced while the discretion of those who select and design test items expanded. 

Hart’s image of open texture shows why this relocation is never benign. Rules cannot anticipate every case. The attempt to anticipate by narrowing the space of possible performances masks a new power to select what will count as the domain. That power can produce reliability by design but it can do so by fixing the result in Hacking’s sense. The discretion that serves validity is suppressed while the discretion that serves stability is enlarged. Sincerity is compromised when the public face claims discovery and the private face knows that the domain has been trimmed for ease. From this diagnosis I turned to validity again and brought in Kane’s argument based approach. Validity is not a property of a test but a property of the inferences we draw and the uses to which we put them. The inferences travel through a chain that must be defended. The link between task and construct must be argued on content grounds. The link between performance and score must be shown to be a practice of reasons that trained readers can reproduce. The link between scores and decisions must respect the meaning that earlier links established. Where the construct contains incommensurable virtues and where family resemblance to exemplars is the right model of application a device built for reliability will tend to trim away what gives the construct its truth. Kane’s structure makes the failure visible. A tidy score that travels easily beyond the room may have lost its warrant because the earlier links were weakened in order to please the last link. At this stage I introduced the two broad ways systems have tried to manage meaning and I set them against the literature on judgement. Criteria referencing lists attributes that count as signs of quality. Construct referencing treats the quality as a family of virtues taught and stabilised through exemplars. The scientistic impulse leans toward criteria as lists of necessary and sufficient conditions. It wants the checklist to replace the judge. Sadler’s work on formative assessment warns that improvement requires a conception of quality and the ability to recognise it in concrete cases and the ability to close the gap. Lists do not give novices that capacity. Polanyi explains why. Much of the knowledge that matters is tacit. It is learned through guided attention to salient features in real cases. Toulmin reminds us that good judgement travels as argument. A claim supported by data linked by warrants with acknowledged rebuttals and qualifiers. In many domains lists either become so short that they are useless or so long that they mislead. When that happens the list does not remove discretion. It hides it. Examiners still decide which items on the list are salient and how to weigh them when they trade off. Both acts are interpretive and both need public discipline rather than denial. Having named the labour I showed its inevitability by appeal to vagueness and I gave the outline of the argument that later chapters develop with Endicott, Sainsbury and Edgington. Evaluative language has clear cases and clear non cases and long penumbral stretches in which neighbour pairs look the same while the ends look different. One can demand a cut point along that series. One cannot always pretend that the cut is the discovery of a hidden line. Experienced readers recognise the phenomenon. They know when two scripts sit in a neighbourhood in which either outcome can be defended and neither is decisively demanded by the construct. A system that must be decisive can still be honest. 
It can design to make such moments rarer and it can name them when they cannot be avoided. The scientistic habit does neither. It announces a precision the concepts do not support and designs tasks that snip away the complexity in order to secure the appearance. Endicott’s later insistence that higher order vagueness is truculent gives the theoretical warrant for what teachers already know in practice. A major part of the opening chapter therefore addressed task design and I used Bruner again to fix the principle of authenticity. If one wants to preserve validity in the presence of large cohorts the first duty is to offer tasks that draw on the same kinds of choice the real practice draws on. Where the practice values control and risk in tension the task must leave room for both. Where the practice values audience awareness the task must set a real audience and not a technical proxy that can be satisfied by key words. Where the practice values reasoning with evidence the task must present materials that permit several legitimate routes so that judgement can discriminate among them. Every such move increases the need for interpretive judgement and reduces the tightness with which a checklist can grip the case. Dewey’s view that aims should be immanent in activity underwrites the same thought. The temptation is to retreat to a task that produces stable ticks. That retreat yields systems that are beautifully consistent about the wrong thing. From tasks I moved to the classroom because the loop between assessment and teaching is the main engine of reflexivity. When the device rewards little tokens classrooms begin to teach to those tokens. When the device rewards obedient reproduction of stock forms classrooms begin to treat those forms as the subject. One can see this in writing where the paragraph becomes a five sentence unit regardless of the demands of the thought. One can see it in mathematics where a named technique is displayed even when it was not needed. One can see it in subjects of interpretation where a checklist turns inference about meaning into a hunt for rhetorical cues. The system then mistakes the stability of the tokens for objectivity and claims success. Habermas’s idea that legitimacy arises from public justification offers the counterpoint. If the community can say what it values and why and can show how its procedures serve those reasons then it can resist the drift into ritual. To avoid a merely negative stance I sketched what an alternative route to fairness would require and I placed it beside Raz’s and MacIntyre’s accounts of practice. If examiners must exercise discretion they must do so in ways that can be taught, checked and reproduced. Communities need banks of exemplars with commentary that name what counts and explain why. Moderation must be the engine room rather than the last patch. Reasons not merely numbers must be the currency by which a panel keeps itself aligned. Objectivity then appears as the discipline of public reasons inside a tradition rather than as the absence of human judgement. MacIntyre’s picture of a practice sustained by standards of excellence and narratives of internal goods helps explain why the labour must be continuous. The alternative is a managerial emulation that forgets what the goods were. The chapter also made room for statistics while refusing to let them rule and here I leaned on Deming’s spirit of statistical control and on Tukey’s modesty about what numbers can do. Numbers are indispensable as checks. 
They can reveal drift within a judge and across a panel. They can surface anomalies that ask for an explanation. They can help set sample sizes and identify centres where training has not taken. None of this justifies allowing numbers to determine content. Where a statistical expectation conflicts with a content based explanation the conflict should trigger inquiry not automatic override. Sometimes the numbers will reveal bias that content talk had hidden. Sometimes the content will reveal that two tasks that were supposed to be equivalent were not. The point is to keep authority with the subject and to keep numbers in their proper role as instruments of self criticism. I also cleared the air about fairness and I used Rawls and Scanlon to fix the tone. Fairness rises when standards are public, when tasks are authentic yet varied enough to permit more than one route to success, when panels are mixed and marking is blind where possible, when reasons are recorded and audited, and when patterns of outcome are checked for correlations with irrelevant characteristics. This is a Scanlonian discipline of justification. It does not deny that bias and idiosyncrasy are dangers. It addresses them by building institutions that constrain and educate judgement rather than by pretending that judgement can be replaced by a device. A small but important section examined the language we use when we explain decisions and I previewed an ethic that later chapters defend with Austin and Sorensen. Where reasons drawn from the construct support a decision decisively we can speak in the voice of discovery. Where reasons are strong but not decisive we can speak in the voice of judgement that can be defended to an informed stranger. Where reasons tie in a true neighbourhood we must acknowledge that the decision is an enactment of policy made for institutional reasons that have been set out in advance. Austin’s distinction between verdictives and exercitives keeps our grammar straight. Sorensen’s unease about sincerity at absolute borders warns us against the rhetoric of discovery where there is none. The first chapter asked readers to live with this register as the language of a truthful practice. I closed by making vivid the human shape of the argument and I allowed Shulman’s account of pedagogical content knowledge to colour the scene. There is a room in which people sit with scripts, with a set of anchors and with a timetable. In one world they open a long rubric and begin to tick. They keep pace. They converge. They go home with a pile of numbers that look clean. In the other world they begin with silent reading. They speak in short turns about reasons. They use anchors as arguments rather than as idols. They check themselves with numbers rather than obeying them. They change their minds when another reader shows them something they missed. They record the reasons that moved them so that the next cycle is taught by this one. The first room is quiet and efficient and produces a comfort that feels like objectivity. The second room is slower, sometimes tense, and produces a stability that comes from shared attention. The theorists I have leaned on explain why the second room is the right room for evaluative domains. Wittgenstein keeps us with use. Messick and Kane keep us with validity. Raz and Habermas keep us with reasons. Hacking warns about reification. Bruner keeps us with authenticity. Sadler and Polanyi keep us with learning to see. 
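The kind of statistical self-check described above, drift within a judge set against the panel, can be pictured with a small sketch. The judges, scripts, marks and threshold below are all invented; the output is meant to trigger inquiry in moderation, not to override a content based explanation.

```python
# A minimal, invented illustration of the kind of statistical self-check
# described above: marks awarded by several judges on a common set of
# scripts, used to surface a judge who is drifting from the panel.
# Judge names, scripts, marks and threshold are all hypothetical.

from statistics import mean, median

panel_marks = {
    # script_id: {judge: mark}
    "s1": {"judge_1": 15, "judge_2": 16, "judge_3": 20},
    "s2": {"judge_1": 11, "judge_2": 12, "judge_3": 16},
    "s3": {"judge_1": 17, "judge_2": 16, "judge_3": 21},
    "s4": {"judge_1": 10, "judge_2": 9,  "judge_3": 14},
}

def drift_report(marks, threshold=1.5):
    """Mean deviation of each judge from the script-level median mark.

    A judge whose average deviation exceeds the threshold is flagged for
    discussion, not automatically corrected: the point is to prompt
    inquiry, not to replace content-based judgement.
    """
    deviations = {}
    for script, by_judge in marks.items():
        reference = median(by_judge.values())
        for judge, mark in by_judge.items():
            deviations.setdefault(judge, []).append(mark - reference)
    report = {}
    for judge, devs in deviations.items():
        avg = mean(devs)
        report[judge] = (avg, abs(avg) > threshold)
    return report

if __name__ == "__main__":
    for judge, (avg_dev, flagged) in drift_report(panel_marks).items():
        note = "flag for moderation" if flagged else "within tolerance"
        print(f"{judge}: mean deviation {avg_dev:+.2f} ({note})")
```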
The first chapter asked readers to accept that truthfulness about what we can and cannot claim is not a luxury in assessment. It is the ground of any durable trust.

References

Bruner, J. S. (1960). The process of education. Harvard University Press.
Deming, W. E. (1986). Out of the crisis. MIT Press.
Dewey, J. (1938). Experience and education. Macmillan.
Dworkin, R. (1977). Taking rights seriously. Harvard University Press.
Dworkin, R. (1986). Law’s empire. Belknap Press.
Fuller, L. L. (1964). The morality of law. Yale University Press.
Habermas, J. (1996). Between facts and norms. Polity.
Hacking, I. (1990). The taming of chance. Cambridge University Press.
Hart, H. L. A. (1961). The concept of law. Clarendon Press.
Kane, M. (2013). Validating the interpretations of test scores for proposed uses. Journal of Educational Measurement, 50(1), 1–73.
Kelsen, H. (1991). General theory of norms. Oxford University Press.
MacIntyre, A. (1981). After virtue. Duckworth.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). Macmillan.
Popham, W. J. (2001). The truth about testing: An educator’s call to action. ASCD.
Rawls, J. (1971). A theory of justice. Harvard University Press.
Raz, J. (1986). The morality of freedom. Clarendon Press.
Sadler, D. R. (1989). Formative assessment and the design of instructional systems. Instructional Science, 18(2), 119–144.
Scanlon, T. M. (1998). What we owe to each other. Harvard University Press.
Shulman, L. S. (1987). Knowledge and teaching: Foundations of the new reform. Harvard Educational Review, 57(1), 1–22.
Sorensen, R. (2001). Vagueness and contradiction. Oxford University Press.
Toulmin, S. (1958). The uses of argument. Cambridge University Press.
Tukey, J. W. (1977). Exploratory data analysis. Addison-Wesley.
Weber, M. (1978). Economy and society (G. Roth & C. Wittich, Eds.). University of California Press.
Wittgenstein, L. (1953). Philosophical investigations. Blackwell.

Chapter 2 Discretion, Judgement and Fairness

In the second chapter I turned directly to discretion and to the fear that has trailed it through high stakes assessment. The fear is simple. If discretion is allowed, arbitrariness will enter by the back door. If arbitrariness enters, reliability will fall and fairness will be lost. The chapter tried to show that this fear is a confusion. It confuses disciplined interpretation with whim. It confuses visible judgement with hidden fixing. It confuses the work that belongs to a living practice with the fantasy that a device can do the work on its own. I wanted to say what discretion is for, how it is constrained from within a practice, and how it can be taught and audited without being abolished. To do that I set assessment beside legal theory where the issues have been faced in public. I read Hart on open texture and judicial discretion, Dworkin on right answers and interpretive integrity, Endicott on higher order vagueness, and I brought these arguments home to the examiner who must read, speak reasons, and live with neighbours at a cut. I began with Hart because his picture is a clean starting point. Rules, even carefully written ones, cannot anticipate every case. There will be penumbral cases where the application of a rule is unsettled. In such cases, Hart tells us, judges have a limited but real discretion. They are authorised to decide in the light of the aims of the rule and the wider purposes of the law. The analogue in assessment is familiar.
Criteria and mark schemes cannot specify every turn of excellence or every combination of strengths and weaknesses. There will be scripts in which the fit with the stated descriptors is partial or cross grained. The examiner must decide. She does not step outside the practice when she does so. She exercises a licensed power in service of the point of the assessment. Hart gives us the courage to admit this. Dworkin challenges Hart, not to deny discretion everywhere, but to shrink its territory by giving interpretation its full weight. On Dworkin’s view, a conscientious reader can often arrive at a right answer by reading the practice in its best light, by bringing principles of fairness and fit into conversation with rules. The law is not a pile of separate orders. It is a practice animated by principles. I took that lesson into assessment. A script is not met by a shopping list of ticks. It is read under a construct that has a shape and a history. The aims of the course and the discipline guide weightings. The best light reading often resolves what at first looks ambiguous. An apparent gap shrinks once the examiner articulates how this work realises audience awareness, or how it shows control rather than mere neatness, or how it takes a risk that matters for the subject. Dworkin therefore rescues us from an unthinking appeal to discretion whenever we feel uncertain. Much that appears to need discretion requires instead a richer account of the practice. Endicott then reminds us of a limit that both positions must face. Some cases remain unsettled even after rules are read in the best light and principles are brought to bear. Vagueness does not vanish when we interpret well. It appears in neighbour pairs where each is sufficiently like the paradigms in different ways and where the differences that remain are not differences that matter for the construct. Endicott’s insistence on higher order vagueness is decisive here. Even if we carve bands within a scale to fix borderlines, the decision about which band is apt is itself vague at the margins. The examiner then meets a true neighbourhood in which reasons tie. The discretion that remains is not a licence to invent. It is the authority to decide by policy when reasons run out. That is a different act and must be named as such if sincerity is to be kept. With these three voices set, I turned to the exam room. Discretion first shows itself as attention. Polanyi helps here. Experts know more than they can say, yet what they know can be brought to speech through exemplars. When an examiner says that a paragraph has a coherent progression or that an argument shows evidential responsibility she is not mouthing a rubric. She is picking out features that are salient for the construct. Sadler’s account of evaluative expertise supports this. Improvement rests on possessing a conception of quality, on recognising it in concrete cases, and on knowing what moves will close the gap. Discretion is the skill that moves between the conception and the case. It is not a free choice. It is a responsiveness to the object under a shared aim. To keep this responsiveness from sliding into idiosyncrasy the chapter insisted on reasons. Toulmin gives the grammar. A judgement is a claim supported by data linked by warrants, with rebuttals and qualifiers in view. In moderation we listen for that grammar. 
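As a purely illustrative aside, that grammar can be written down as a structured moderation record, so that what a panel stores is reasons rather than bare marks. The field names and the example below are hypothetical, a sketch of one possible shape rather than a prescribed schema.

```python
# Purely illustrative: one way a moderation note might record Toulmin's
# grammar so that reasons, not just marks, are what a panel stores.
# The field names and the example are hypothetical, not a prescribed schema.

from dataclasses import dataclass, field
from typing import List

@dataclass
class ModerationNote:
    claim: str                     # e.g. "this script sits in the higher band"
    data: List[str]                # passages or features cited in support
    warrant: str                   # why those features count under the construct
    rebuttals: List[str] = field(default_factory=list)  # countervailing features
    qualifier: str = "warranted judgement"  # discovery / warranted judgement / policy

    def summary(self) -> str:
        lines = [f"Claim: {self.claim} ({self.qualifier})",
                 f"Warrant: {self.warrant}"]
        lines += [f"Data: {d}" for d in self.data]
        lines += [f"Rebuttal: {r}" for r in self.rebuttals]
        return "\n".join(lines)

note = ModerationNote(
    claim="Script 214 merits the higher band for control of register",
    data=["opening paragraph sustains a formal register for a lay audience",
          "technical terms are glossed without condescension"],
    warrant="control of register is shown by fitting lexis and syntax to audience",
    rebuttals=["closing paragraph lapses into an informal tone"],
    qualifier="warranted judgement",
)

if __name__ == "__main__":
    print(note.summary())
```

The qualifier field carries the most weight, since it is where the record says whether the verdict speaks in the voice of discovery, of warranted judgement, or of policy.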
When a reader says that a script deserves the higher band she should be able to cite passages and features, to explain how they count under the construct, and to consider countervailing features. This is how discretion becomes public. It is how a community disciplines itself. Brandom’s inferentialism gives the same lesson in another idiom. To possess the concept coherence is to know what follows from calling a passage coherent and what would defeat that label. The community keeps discretion honest by rehearsing those inferential roles aloud and by recording reasons that persuaded competent peers. I then examined two administrative manoeuvres that pretend to cure discretion while actually relocating it. The first is the long checklist. It appears to replace judgement with counting. In fact it hides two acts of discretion. Examiners still decide which items are salient for the present case, and they still decide how to trade off strengths against weaknesses when the items pull in different directions. The list becomes a mask that conceals the real work and leaves it undisciplined. The second manoeuvre is the choice of items. A regime can select tasks that compress legitimate variation so that a short rubric can cover the field. This produces tidy numbers. It also fixes the domain. Hacking’s warning about reification and reflexivity applies. The system then teaches to the narrow task and mistakes the stability of the tokens for the stability of the quality. Discretion has not been removed. It has moved upstream to item selection, where it is less visible and less accountable. The chapter argued for construct referencing as the counterweight. Messick’s unified view of validity and Kane’s argument based approach make clear that meaning must be preserved at every link. A construct is a principled conjecture about the structure of achievement. It is taught and stabilised through exemplars with commentary. Within that settlement criteria do not disappear. They change job. They become lenses that direct attention to virtues such as control of register or evidential responsibility. They are not necessary and sufficient conditions. They are prompts to look. They help novices to see. As fluency grows, exemplars and reasons carry more of the load. In this way discretion is housed within a living theory of the construct rather than being left to temperament. The question of fairness came next. Critics say that discretion invites bias. The chapter accepted the risk and answered with design. Fairness is not achieved by shrinking discretion to zero. It is achieved by putting discretion to work in institutions that constrain it and expose it to audit. That means blind marking wherever possible. It means mixed panels in which minority voices are not marginal ornaments but sources of information. It means comparative judgement methods that use many local choices to build a stable order, a method with roots in Thurstone and Bradley Terry that respects the psychology of expert seeing. It means statistical checks that surface drift and anomalous centres. It means audits for disparate impact across characteristics that should be irrelevant. Raz’s service conception of authority fits this picture. A procedure earns authority when it helps the community conform better to the reasons that already apply. Discretion earns authority when it is exercised through such a procedure. I returned to law to clarify the border between judgement and policy. 
Kelsen’s image of a legal norm as a frame reminds us that many decisions are correct within a permitted range. Fuller’s inner morality of law reminds us that legitimacy depends on publicity, clarity, congruence and non retroactivity. Translated into assessment these thoughts give us a rule for the edge. Where reasons settle the matter, the examiner speaks in the voice of discovery, with reasons that an informed stranger could accept. Where reasons are weighty but not conclusive, the examiner speaks as a judge whose verdict stands because it is warranted, even though another warranted verdict was possible. Where reasons tie in a true neighbourhood, the examiner does not pretend. She enacts a published policy and says that she is doing so. Sorensen’s unease about sincerity at absolute borders is answered in this register. We keep sincerity by calling policy policy. I then took up the complaint that such candour will not satisfy users who want crisp signals. The reply in the chapter was practical. Users can learn to read two kinds of statement. They can read discovery where the construct supports decisive reasons. They can read policy enactments where reasons tie and the system must still allocate places. The candour can be defended because the system has invested in the disciplines that give its reasons force. Williams on truthfulness gave the tone. Institutions survive when their public speech matches what they can honestly claim. That is better protection than a rhetoric of scientific precision that collapses at the first serious appeal. Throughout the chapter I kept returning to the classroom because reflexivity never sleeps. If we design out discretion by trimming tasks and by turning criteria into little tickets, teaching will follow. If we install discretion within a culture of exemplars and reasons, teaching will follow that as well. Bruner urged that school knowledge should preserve the structure of the disciplines. Dewey insisted that aims should be immanent in activities. A settlement that houses discretion in constructs and exemplars supports both. It encourages teachers to teach toward real goods. It discourages teaching to tokens that travel easily as numbers but badly as reasons. I closed by drawing the line that the rest of the book follows. Discretion is not the enemy of objectivity in evaluative domains. It is the form that objectivity must take when the goods at stake are plural and incommensurable. Wittgenstein keeps us with use. Hart keeps us honest about open texture. Dworkin keeps us ambitious about interpretation. Endicott keeps us sober about higher order vagueness. Messick and Kane keep us with validity as argument rather than as a property of a device. Raz keeps our authority tied to reasons. Hacking keeps us wary of the ways in which instruments reshape their worlds. Polanyi and Sadler show how judgement is learnt. Toulmin gives it a public grammar. Kelsen and Fuller give it institutional sense. Sorensen keeps our conscience awake at the border. With these companions discretion ceases to be a threat word. It becomes the name for trained and answerable judgement, exercised in public, checked by peers and by numbers that know their place, and paired with policy that is spoken plainly when reasons fall silent.

References

Bradley, R. A., & Terry, M. E. (1952). Rank analysis of incomplete block designs: I. Biometrika, 39(3–4), 324–345.
Brandom, R. B. (1994). Making it explicit. Harvard University Press.
Bruner, J. S. (1960). The process of education. Harvard University Press.
Dewey, J. (1938). Experience and education. Macmillan.
Dworkin, R. (1977). Taking rights seriously. Harvard University Press.
Dworkin, R. (1986). Law’s empire. Belknap Press.
Endicott, T. (2000). Vagueness in law. Oxford University Press.
Fuller, L. L. (1964). The morality of law. Yale University Press.
Habermas, J. (1996). Between facts and norms. Polity.
Hacking, I. (1990). The taming of chance. Cambridge University Press.
Hart, H. L. A. (1961). The concept of law. Clarendon Press.
Kane, M. (2013). Validating the interpretations of test scores for proposed uses. Journal of Educational Measurement, 50(1), 1–73.
Kelsen, H. (1991). General theory of norms. Oxford University Press.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). Macmillan.
Polanyi, M. (1966). The tacit dimension. Routledge & Kegan Paul.
Raz, J. (1986). The morality of freedom. Clarendon Press.
Sadler, D. R. (1989). Formative assessment and the design of instructional systems. Instructional Science, 18(2), 119–144.
Sorensen, R. (2001). Vagueness and contradiction. Oxford University Press.
Thurstone, L. L. (1927). A law of comparative judgment. Psychological Review, 34(4), 273–286.
Toulmin, S. (1958). The uses of argument. Cambridge University Press.

Chapter 3 Meaning, Use and Objectivity in Assessment

In the third chapter I turned to meaning and to the worry that if meaning floats then standards float with it. I wanted to know whether we can have a practice in which words like coherence, elegance, audience awareness or evidential responsibility carry real weight without pretending that they have the same sort of determinacy as height or mass. I worked with Wittgenstein on use, Quine on the web of belief, Davidson on interpretation, Brandom on inferential roles, Putnam on internal realism, McDowell on second nature and Crispin Wright on cognitive command. Around them I placed Toulmin on reasons, Gadamer on tradition, Habermas on public justification and Raz on authority so that the argument kept a practical shape. The claim of the chapter was plain. We can secure objectivity enough by articulating how our words operate inside a practice and by binding ourselves to public reasons. We do not need a fantasy of meanings that are fully fixed outside use. We need disciplined use that is answerable to exemplars, aims and arguments. I began with Wittgenstein because the hinge of the chapter is his remark that meaning is use. The remark is not a licence for whim. It is an instruction to look at what competent speakers do when they employ a term inside a practice. When a reader calls a paragraph coherent she is not merely reporting a feeling. She undertakes a set of material commitments. She has noticed that references are resolved in a way that allows recovery of who or what is in play, that the sequence of sentences sustains a line of thought, that the transitions do not jar, and that apparent digressions turn out to be necessary. In other words she is ready to answer certain challenges and to concede others. To show this I borrowed Brandom’s idea that concepts are nodes in a space of reasons. To count as using the word coherently is to be able to take up and give down inferences appropriately. This is what it means for a criterion to function as a lens rather than as a checklist. It directs attention to that network of inferences and it trains novices to inhabit it.
Quine and Davidson helped me resist the opposite mistake which is to suppose that because use is social the meanings we need must dissolve into flux. Quine’s image of a web of belief shows how revision propagates. There is no single sentence whose meaning is insulated from the whole. Yet the web is not anarchy. It is constrained by experience and by the pressure for overall coherence. Davidson’s principle of charity then adds a discipline to interpretation. If we are to understand another judge we begin by assuming that she is mostly right about the obvious and we search for the point at which our inferential habits diverge. In moderation this takes the concrete form of short reason exchanges where two readers who disagree try to locate whether the disagreement is about what is in the text or about what ought to count under the construct. Often the dispute turns out to be about the latter and can be reduced by clarifying the aim of the task. Quine and Davidson thus keep us from despair while accepting that meaning is not a set of labels pasted to the world. At this point I brought in Putnam and McDowell to secure the worldward face of the practice. Putnam’s internal realism denies a God’s eye mapping while insisting that truth is not mere consensus. The standard for truth is idealised rational acceptability within a scheme that is itself open to criticism. McDowell’s picture of second nature complements this. Through initiation into a practice we acquire a sensibility that renders the world available under concepts without the need for a non conceptual Given. In assessment this means that trained readers really can be corrected by features in the work. They can miss a subtle shift of voice and later be moved by it when a colleague points it out. They can overvalue neatness and later see that control is a different virtue. The world answers back through the practice because the practice is the medium in which the work is seen. This blocks the slide into subjectivism while refusing a view from nowhere. With these pieces I defined what a construct has to be if it is to do honest work. A construct is not a hidden variable that causes scores. It is a principled conjecture about the structure of achievement in a domain. It gathers virtues that the practice prizes. It is taught and stabilised through exemplars whose commentaries make inferential roles explicit. Toulmin gives the grammar for those commentaries. A judgement appears as a claim supported by data linked by warrants with rebuttals and qualifiers in view. When a panel says this script shows control of register it cites passages and audience cues as data it states a warrant about how choice of lexis and syntax track audience awareness and it acknowledges a rebuttal for example that the ending lapses into a tone that does not fit. In this way the construct is no longer an abstraction. It is a set of articulated roles that readers can learn and rehearse. I then faced the charge that such a settlement does not deserve the name objectivity. Crispin Wright’s notion of cognitive command answered that charge. In some discourses disagreement is diagnostic of failure of understanding rather than of the openness of the subject. Mathematics is the usual example. In evaluative domains convergence is not guaranteed and yet that does not mean that anything goes. There are still cases in which disagreement betrays incompetence. A reader who praises a paragraph as coherent while missing a basic pronoun clash is simply wrong. 
Wright thus allows us to mark off areas where competence enforces near convergence without claiming that the whole domain will converge. The idea of cognitive command at local points is important for training. It explains why some feedback is categorical. It also explains why much feedback is invitational and comparative. The structure of the domain requires both tones. The middle of the chapter dealt with meaning holism which can feel like a friend and a threat at once. Holism captures the truth that the sense of a term is given by its place in a network. It threatens to make agreement impossible by making that network too wide to coordinate. The cure I argued for is local holism plus articulate practice. A community can make explicit the material inferences that govern key terms in the setting at hand. It can write short glossaries of use that are not dictionaries but notebooks of inferential commitments. It can curate anchors that display several legitimate ways a virtue can be realised. It can rehearse reasons in moderation so that the inferential habits are surfaced and corrected. Gadamer’s picture of tradition as an argument in which the past addresses the present gave the tone. We do not legislate meanings from scratch. We inherit exemplars and we continue the conversation by revising what counts as salient in the light of new work. In teacher education this practice oriented holism is the only way novices actually learn to see. I then turned to criteria because they are the most visible artefacts in assessment rooms. On the picture I am defending criteria are not lists of necessary and sufficient conditions. They are lenses that cue families of inferences. They stabilise talk. They are more like Wittgensteinian rules of a practice than like laws of nature. Their authority is earned in Raz’s sense when they help readers track the reasons that already apply to the domain. This is why criteria must be tied to exemplars. Untethered criteria drift into slogans. Tethered criteria act as handles for the inferential labour. In training we begin with the handle and the exemplar together. As fluency grows the exemplar and the reasons do more of the work and the verbal handle recedes into the background as a prompt. At this point I confronted the managerial hope that we can secure the meanings we need by adopting a more precise vocabulary. The hope is understandable. If ambiguity confuses, why not fix the language and be done. The chapter distinguished three phenomena so that labour is not wasted. Ambiguity is a pre propositional problem. It is cured by rewriting prompts and by cleaning syntax. Generality is a resource. It allows a term to travel across many cases while being made precise by reasons in context. Vagueness is a structural feature of evaluative concepts that shows itself in neighbour pairs where no sharp difference that matters can be found. Precision in wording cures the first. Examples and reasons husband the second. Only a truthful practice can meet the third. Habermas’s public justification came in here. What makes a verdict legitimate is not that it hides judgement behind sharper words. It is that it shows the reasons by which a trained community reached its conclusion and the policies it enacted when reasons tied. I brought comparative judgement into the frame as a technique that takes holism seriously. People are more consistent at local comparisons than at assigning absolute magnitudes. 
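The next sentences take up Thurstone and Bradley Terry, and a small sketch on invented data may fix the idea in advance. Given a pile of local preferences between pairs of scripts, a Bradley Terry style fit recovers a single ordering with a strength for each script; the script names, the recorded wins and the number of iterations below are all hypothetical.

```python
# A minimal, self-contained sketch of Bradley-Terry aggregation for
# comparative judgement. All data below are invented for illustration.

from collections import defaultdict

# Hypothetical record of pairwise decisions: (winner, loser) means a judge
# preferred the first script over the second in one comparison.
judgements = [
    ("script_A", "script_B"), ("script_B", "script_C"),
    ("script_C", "script_D"), ("script_A", "script_C"),
    ("script_B", "script_D"), ("script_A", "script_D"),
    ("script_D", "script_A"), ("script_C", "script_B"),
]

def bradley_terry(pairs, iterations=100):
    """Estimate a strength for each item from pairwise wins (MM algorithm)."""
    items = sorted({x for pair in pairs for x in pair})
    wins = defaultdict(int)        # total wins per item
    meetings = defaultdict(int)    # comparisons per unordered pair
    for winner, loser in pairs:
        wins[winner] += 1
        meetings[frozenset((winner, loser))] += 1

    strength = {item: 1.0 for item in items}
    for _ in range(iterations):
        new_strength = {}
        for i in items:
            denom = 0.0
            for j in items:
                if i == j:
                    continue
                n_ij = meetings[frozenset((i, j))]
                if n_ij:
                    denom += n_ij / (strength[i] + strength[j])
            new_strength[i] = wins[i] / denom if denom else strength[i]
        total = sum(new_strength.values())
        strength = {k: v / total for k, v in new_strength.items()}
    return strength

if __name__ == "__main__":
    for script, s in sorted(bradley_terry(judgements).items(),
                            key=lambda kv: -kv[1]):
        print(f"{script}: {s:.3f}")
```

The order is only as good as the comparisons and the reasons spoken alongside them, which is why the prose insists that the routine be seeded with anchors and bathed in public sense.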
Thurstone and the later Bradley Terry models capture how many pairwise decisions can produce a stable order. Used well this technique respects the structure of the domain. It asks judges to articulate reasons as they go so that the order is bathed in public sense. It must be seeded with anchors so that inferential roles remain visible. The mathematics then serves the practice rather than replacing it. Used badly comparative routines can become a black box that persuades managers that the instrument has absorbed the judgement. Used well they are the mechanism by which a network of reasons yields a usable ordering without the pretence that a hidden metric exists. From here I returned to the classroom because meaning travels through teaching before it returns through assessment. Bruner’s call to preserve the structure of the disciplines becomes operational when we teach criteria as lenses and exemplars as arguments rather than as templates. Students then learn to see virtues rather than to count features. They learn to speak reasons in Toulmin’s grammar so that the inferential roles become theirs. McDowell’s second nature gives this an anthropological cast. The aim is not to stuff definitions but to cultivate a perceptual capacity under concepts. When students acquire that capacity they become better writers and better readers and the later assessment can be conducted in the language that taught them. Two worries occupied the final movement of the chapter. The first is drift into parochialism. MacIntyre warns that practices can forget their internal goods and become dominated by external rewards. A subject community can slide into rewarding the shadow of a virtue such as neatness in place of control or formulaic novelty in place of real originality. The antidote is dialogue across neighbouring practices. Historians can learn from scientists how explanation earns its keep. Mathematicians can learn from literary scholars how voice can be managed without sentiment. Such conversations loosen parochial habits and return a community to its reasons. They also enrich the bank of exemplars so that the family of excellence is not narrowed by custom. The second worry is scepticism about reliability without a device. I answered by shifting the meaning of reliability from sameness of ticks to stability of reasons. Messick and Kane both pointed the way. On a Messickian view validity is about the meaning of our interpretations and the consequences of their use. Reliability earns its place when the reasons that carry verdicts can be reproduced by trained readers working under shared aims. On a Kanean view the argument for an interpretation must be built station by station from construct to task to scoring to use. At each station the discipline of reasons secures the meaning that numbers alone cannot. Once this is seen reliability becomes a property of a community that has built its memory rather than a property of a checklist. The work is slower but it is truer to the subject. I closed by returning to Williams on truthfulness and to Habermas on public reason because meaning talk that cannot be spoken aloud is of little use in institutions. A profession owes the public a language that says what it can really claim. It can claim discovery where inferential roles and exemplars support decisive reasons. It can claim warranted judgement where reasons are strong though not exclusive. It can claim a policy enactment where reasons tie at a true border. 
These three kinds of speech require a practice that has made meanings visible in use. Without that work meaning talk becomes a screen for power or a fog that covers caprice. With that work it becomes the condition for fair decisions that match what the subject is. The result of the chapter is modest and demanding. We do not need meanings frozen outside use in order to have standards that bind us. We need to articulate the use that binds us and to build institutions that store and refresh that articulation. Wittgenstein keeps us at the scene of use. Quine and Davidson keep the scene from collapsing into private talk. Brandom shows how to make the roles explicit. Putnam and McDowell keep the world available to a trained eye. Wright allows us to mark true failures of competence without demanding convergence everywhere. Toulmin gives our reasons a public shape. Gadamer ensures that exemplars are not relics but arguments. Habermas and Raz protect the authority of procedures that serve reasons. With such companions the examiner can say what she means and mean what she says. The words on the rubric are not charms. They are handles for inferential labour carried out in public. That labour is how meaning earns its keep in assessment and how reliability and validity are made to live together without pretence.

References

Bradley, R. A., & Terry, M. E. (1952). Rank analysis of incomplete block designs: I. Biometrika, 39(3–4), 324–345.
Brandom, R. B. (1994). Making it explicit. Harvard University Press.
Davidson, D. (1984). Inquiries into truth and interpretation. Oxford University Press.
Gadamer, H.-G. (1975). Truth and method. Sheed & Ward.
Habermas, J. (1996). Between facts and norms. Polity.
Kane, M. (2013). Validating the interpretations of test scores for proposed uses. Journal of Educational Measurement, 50(1), 1–73.
McDowell, J. (1994). Mind and world. Harvard University Press.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). Macmillan.
Putnam, H. (1981). Reason, truth and history. Cambridge University Press.
Quine, W. V. O. (1960). Word and object. MIT Press.
Raz, J. (1986). The morality of freedom. Clarendon Press.
Sainsbury, R. M. (1995). Paradoxes (2nd ed.). Cambridge University Press.
Thurstone, L. L. (1927). A law of comparative judgment. Psychological Review, 34(4), 273–286.
Toulmin, S. (1958). The uses of argument. Cambridge University Press.
Wittgenstein, L. (1953). Philosophical investigations. Blackwell.
Wright, C. (1992). Truth and objectivity. Harvard University Press.

Chapter 4 Sincerity, Verdicts and the Duty to Decide

In the fourth chapter I tried to show how an interpretive settlement might carry the burdens that high stakes systems place upon it and why the settlement is always under pressure at the borders where decisiveness is demanded. I did this by bringing legal reasoning into closer conversation with educational assessment, not because the two are the same, but because both are public practices that must produce reason guided decisions under conditions of disagreement and time. I wanted to see how far the jurisprudential repertoire could help us articulate a disciplined discretion and where the limits of that repertoire were likely to appear. I began with Hart’s image of open texture. Hart’s reminder that rules cannot anticipate every future case speaks to the ordinary life of a classroom and to the extraordinary life of an examinations meeting. The insight is modest and exact.
There will be penumbral cases in which the extension of a concept is unsettled, and in such cases officials must exercise discretion. The teacher who faces an essay at the pass border is a kind of official. She is asked to apply a general standard to a particular case. Hart gives her permission to judge without pretending that the decision can be read off a rule. I argued that this permission is the first element of a humane assessment culture. It does not yet solve our central difficulty. It acknowledges it with candour. Dworkin contests the reach of such permission with his insistence that there are right answers in hard cases. The right answer thesis is not a boast about infallibility. It is a claim about the resources of interpretation. Principles as well as rules make up the law, and the judge who reasons in the best light of the practice may find an answer that is true of that practice though not extractable from any single text. This is Dworkin’s Hercules. I used his challenge to press the hermeneutical paradigm to say more than that judgement is needed. It must say that interpretation can be objective without reducing to a checklist. In educational terms, if a marker can draw on a network of reasons that includes construct level considerations, coherence with exemplars, and the purposes the assessment serves, then the verdict can be more than a hunch. It can be a best account of what the practice demands. I also recorded the cost. The more we lean on interpretation to deliver decisiveness, the more we are tempted to pretend that every borderline is resolvable if only one looks hard enough. Here Endicott’s account of higher order vagueness returns as a corrective. There are cases in which interpretation runs out because what we are asking the concept to do lies beyond its structure. Raz helped me recast objectivity for this terrain. His service conception of authority claims that an authority is justified when it helps subjects better conform to reasons that already apply to them. The translation into assessment is natural. Procedures and communities have authority when they help markers track the reasons that genuinely count in the domain. The authority is not mystical. It is instrumental. Understood in this way, moderation, comparative judgement and the curation of exemplars are not gestures toward collegiality. They are the machinery through which a community serves the reasons internal to the practice. Raz is also helpful on discretion. He distinguishes discretion that fills a gap from discretion that is deliberately engineered by a norm. Exam systems deliberately create zones in which decision makers are to exercise judgement, for example by inviting holistic weighing rather than strictly additive counting. That deliberate design can be justified if it helps decisions track the reasons that matter in the domain. It cannot be justified if it is a means of exporting inconvenient judgements into a fog and then declaring them objective because the fog was planned. Kelsen’s image of the legal order as a frame within which many decisions can be valid provided a useful mirror. He hoped to show that within the frame there are no gaps, only latitude. I argued that this picture is illuminating and also finally misleading for assessment. It is illuminating because systems do set frames that license ranges of outcomes as valid. It is misleading at the borders where higher order vagueness bites. 
The licence to choose within a frame does not transform an indeterminate matter into a fully determined one. The decision can be valid without being true in the way the paradigm sometimes pretends. Fuller’s inner morality of law helped here too. Publicity, congruence, prospectivity and clarity are conditions of legal legitimacy. Their analogues in assessment are clarity about standards without degradation of content, advance notice without teaching to a caricature, congruence between policy and practice, and public reasons that can be scrutinised. These are conditions of legitimacy for interpretive assessment. They do not guarantee that the last cut will be principled in every case. They make it more likely that, where a cut must be made for institutional reasons, the manner of making it is fair enough to command confidence. Schauer’s talk of rule fetishism and generality served as a warning. Rules are attractive because they promise to discipline discretion and to increase predictability. They also mislead when their domain of safety is exceeded. The more complex the judgement the more the apparent precision of an elaborate rule invites evasion or distortion. Wittgenstein’s reminder that understanding a rule is a matter of a practice rather than a private interpretation grounded this warning. When we act under a rule we are not consulting an inner scheme. We are moving within a form of life that teaches us what going on correctly looks like. In assessment this means that reassurance will come less from longer checklists and more from the training of vision through exemplars and calibrated dialogue. Austin added the further caution that verdictives are speech acts that stand in relations of fit with the world. A grade is not constitutive in the sense that a promise is. It must be answerable to the facts as we understand them. The performative turn in some assessment theory risks blurring this difference. The promise to be decisive cannot turn a judgement into a truth when higher order vagueness blocks any such claim. I then brought Hacking’s reflections on looping effects into conversation with educational practice. Classifications and measures alter the things they classify. Once a fine grained rubric is imposed, teaching and learning adapt themselves to its categories. This can be helpful where the categories are well tuned to the practice. It is harmful when the categories are built for reliability rather than authenticity. The legal system manages its looping through a culture of reason giving, precedent and revision. Assessment must find its own version of this culture. Without it the refinements designed to secure fairness can begin to produce the very distortions they were meant to prevent. Toulmin’s emphasis on argument fields helped to explain why reasons in one field cannot simply be ported into another. The warrants that make sense in mathematics do not map directly into the warrants that carry weight in literature. Validity has a field dependent logic, and that logic must be respected if interpretive assessment is not to collapse into eclectic assertion. I examined the strong claim that communities can carry objectivity through time. Here MacIntyre’s account of traditions as extended arguments afforded both courage and caution. Courage because standards are sustained in living arguments about goods internal to a practice. Caution because traditions can decay into managerial emulation if their argumentative centre collapses. 
A tradition of judgement in teaching lives through patient cultivation of exemplars, through the telling of cases, through the correction of bias by exposure to other schools of reading. It dies when performance is reduced to compliance with a schema. Habermas was useful for articulating the communicative requirements of such a culture. Discourses that aim at reason giving under conditions of equality and absence of coercion are rare in institutions, yet assessment communities can approximate them in moderation rooms if they treat dissent as evidence rather than as error. The point is not to produce endless talk. It is to secure the legitimacy that follows from the right kind of talk. At this stage I returned to the pressure to decide. The obligation to declare a grade on time is not a vice but a condition of public life. The question is how a hermeneutical settlement holds together when it meets that obligation at a sharp border. I used Dworkin again, this time to acknowledge a strength and to demarcate a limit. His interpretive method helps us resist defeatism. It directs us to look for the best justification of the practice and to make our decision answer to it. Yet if Endicott is right, there are cases where the best justification will still underdetermine a neighbour pair. There the final decision is institutional rather than discoverable. It is better to say so. Raz’s discussion of exclusionary reasons was helpful. An authority can give a second order reason to decide here rather than there in order to serve the first order reasons of the practice over time. A cut policy can be such a reason. It does not make the cut true. It makes it justified as a means of securing fairness in the round. The virtue then is sincerity about what the policy can and cannot claim. The vice is the semblance of discovery where there is only fiat. I noted that some will reply with an epistemic optimism. Perhaps Williamson is right that there are hidden sharp boundaries. Perhaps our reluctance to name them is a sentimental fidelity to tolerance. The soup argument from Endicott blocks this reply for evaluative terms. Norms that guide action must be knowable by agents. A secret cut cannot be a reason that an agent could have adopted. Crispin Wright’s commonsense indeterminism allowed me to keep tolerance and determinacy for clear cases while refusing the fantasy of universal sharpness. Sainsbury’s and Dummett’s analysis of sorites reasoning gave further support. If transitivity fails in patterns of comparative judgement across incommensurables, the demand that we produce a single total order is a demand that we misdescribe the object of our interest. With these materials in place I described, in more practical terms, the shape of a disciplined interpretive settlement. It begins with authenticity. This is Bruner’s old insistence that the structure of the discipline be taught and assessed in a form that preserves its significant structures. It continues with exemplars that are treated as arguments about the construct rather than as samples to be mimicked. It builds routines of standardisation and moderation in which reasons are rehearsed, recorded and revised. It uses comparative judgement to produce reliable orderings where absolute scaling would provoke pretence. It accepts statistics as a check on human consistency and as a way to surface anomalies, while refusing to let statistics dictate revisions that would hollow out content. 
It writes reports that explain what made a judgement hard and how the community resolved the difficulty. It publishes its anchors and commentaries so that outsiders can see the standards at work. It trains new judges by apprenticeship into this practice. It refuses to pretend that every cut is a discovery. It defends some cuts as policy choices made under acknowledged uncertainty in order to secure fairness at system level. The more I developed this settlement the more clearly I saw two temptations. One is the managerial temptation to add further layers of checklists whenever a controversy arises. The other is the romantic temptation to invoke ineffable quality whenever reasons run short. Both must be resisted. Sellars’s attack on the myth of the given helps with the second. Perception in appraisal is concept laden and can be taught. No one sees quality by gift alone. Brandom’s inferentialism helps with the first. To grasp a concept is to grasp the inferential role it plays within a network of claims. A good assessment community habituates its members into those roles. It does not mistake the possession of many labels for conceptual mastery. This is the nerve of the settlement. It lives by reasons that can be taught and checked. It dies when it sinks into either bureaucracy or mysticism. There was one further lesson from law that mattered for education. Law protects its integrity by distinguishing the truth of a verdict from its finality. A verdict can be final for the purposes of action while remaining open in its truth status to criticism and reform. Assessment can copy that distinction. A grade can be final for the purposes of admission while still being open to critique in the internal discourse of the profession. This is not a counsel to destabilise outcomes. It is a recognition that the health of a practice depends on a memory that is longer than a results day. When the next cycle of anchors is built, the discussion of earlier misjudgements and near misses is recycled into the standards. The institution gains stability by learning from its own fallibility. I concluded by returning to authority. Weber’s taxonomy of authority types is a useful cautionary tale. Charismatic authority is brittle, traditional authority can be complacent, legal rational authority can hide behind forms. The authority that interpretive assessment needs is earned authority. It is the respect granted to a community that demonstrates competence, candour and a willingness to explain itself. That authority is not automatic. It is built case by case in the long work of sharing exemplars, writing reasons, auditing patterns, and tuning constructs. It is defended not by slogans but by the cumulative persuasiveness of decisions that match the goods internal to the practice. The fourth chapter does not claim that such an authority silences all challenge. It claims only that it is the best we can do when we refuse the consolation of spurious sharpness and when we accept, with Hart, that rules will run out, with Dworkin, that interpretation can still do a great deal, and with Endicott, that some borders remain undecidable because the world of our evaluative concepts does not contain them.

References

Austin, J. L. (1962). How to do things with words. Clarendon Press.
Dworkin, R. (1986). Law’s empire. Belknap Press.
Edgington, D. (1997). Vagueness by numbers. In R. Keefe & P. Smith (Eds.), Vagueness: A reader (pp. 294–316). MIT Press.
Endicott, T. (2000). Vagueness in law. Oxford University Press.
Fuller, L. L. (1964). The morality of law. Yale University Press.
Hart, H. L. A. (1961). The concept of law. Clarendon Press.
Kane, M. (2013). Validating the interpretations of test scores for proposed uses. Journal of Educational Measurement, 50(1), 1–73.
Kelsen, H. (1991). General theory of norms. Oxford University Press.
Raz, J. (1986). The morality of freedom. Clarendon Press.
Sainsbury, R. M. (1995). Paradoxes (2nd ed.). Cambridge University Press.
Sorensen, R. (2001). Vagueness and contradiction. Oxford University Press.
Williamson, T. (1994). Vagueness. Routledge.
Wright, C. (1992). Truth and objectivity. Harvard University Press.

Chapter 5 Building Reliable Judgement Communities

In the fifth chapter I set myself the task of giving connoisseurship its due without turning it into an excuse for caprice. The earlier chapters had already shown why evaluative terms resist full codification and why attempts to replace judgement with machinery end by distorting the thing being judged. What remained was to say what expert judgement looks like when it is disciplined, teachable and answerable, and to show how criteria and constructs can serve that judgement rather than pretend to replace it. I wanted to reclaim a word that has often been used to belittle teachers, as if connoisseurship named little more than taste. My claim is that connoisseurship, when rightly understood, is a skilled practice rooted in knowledge of a domain, refined by apprenticeship to exemplars, articulated in reasons, and stabilised by a community. It is not the opposite of objectivity. It is the form that objectivity must take when the goods at stake are complex and incommensurable. Elliot Eisner’s account of educational connoisseurship and criticism provided a natural starting point. Eisner insisted that educational value is disclosed in the fine texture of performances, and that teachers learn to notice and to value qualities that are not reducible to checklists. Scriven’s work on evaluative reasoning made a complementary point. Evaluation is a structured activity in which criteria are selected, evidence is marshalled, and warrants are given for conclusions about merit and worth. These two strands meet in the classroom, where teachers read a piece of work in its living grain and then write a comment that names what they saw and explains why it matters. I tried to make that routine visible as a practice with its own norms. The teacher is not free to like what she likes. She is obliged to see what is there to be seen given the construct under which the work is read, and she is obliged to say why her seeing counts as a good reason for the conclusion she draws. Polanyi’s notion of tacit knowledge helped me describe the inner shape of such seeing. Experts know more than they can say, yet what they know is not ineffable. It is the residue of thousands of cases, internalised through guided attention to features that novices cannot yet pick out. Berliner’s studies of expert and novice teachers confirm the point. Experts do not consult longer rules. They perceive patterns and saliencies that are invisible to the beginner. The Dreyfus model of skill acquisition also illuminated the path from rule bound performance to fluid responsiveness. None of this implies that words have no place. As Sellars taught us, perception in human affairs is concept laden. What is taken in as salient is shaped by the language of the practice. Hence the insistence, throughout the chapter, that connoisseurship must be talked into existence.
Apprenticeship without language breeds guild secrets. Language without apprenticeship breeds empty slogans. From here I returned to criteria. In the scientistic lexicon criteria are often imagined as a set of necessary and sufficient conditions. Their attraction is clear. If the list is right and complete, then the judge becomes a checker. The list does the work and discretion seems to disappear. The chapter argued that this picture is wrong in principle for many educational goods and misleading in practice even when it appears to work. It is wrong in principle because many of the goods we care about have internal complexity that cannot be expressed as a set of independent items. It is misleading in practice because long lists invite two pathologies. They either become so detailed that they privilege trivialities that can be ticked, or they become so general that they function as vague prompts rather than rules. The remedy is not to abandon criteria but to change their job. Criteria should be understood as lenses and prompts, not as algorithms. A criterion names a quality, for example control of register, and it directs the judge toward features that typically disclose that quality, for example shifts in tone that fit audience and purpose. It helps the judge to look. It does not settle the matter. In this sense criteria serve the connoisseur by organising attention and by providing a common vocabulary for reasons. They are scaffolds for judgement rather than substitutes for it. This reconception of criteria leads naturally to constructs. Messick’s integrative view of validity and Kane’s argument based approach gave me the tools I needed. A construct is not a metaphysical entity. It is a principled conjecture about the structure of the ability or achievement under assessment, supported by theory and by patterns of evidence. In writing we might speak of audience awareness, coherence, control of syntax, flexibility in genre, persuasiveness, and creativity. The educational question is how to hold such qualities together without either pretending that they can be collapsed onto a single scale or collapsing into an unfettered holism. I argued for a middle path. Treat the construct as a family of connected virtues, teach it through exemplars that display different ways of being good, and ask judges to state which strands they are weighting and why. Toulmin’s model of argument was helpful here. A good judgement sets out a claim, cites data, states warrants that link the data to the claim, and acknowledges rebuttals and qualifiers. When a community shares such arguments it builds a living theory of the construct that is more than a list, and less than a theory that pretends to banish judgement. At this point I confronted the worry about incommensurability. Raz and Sen both remind us that many of the values we care about cannot be measured on a single common scale. Elegance is not a subset of accuracy, imagination is not a subset of coherence, nor are they simple sums. The scientistic impulse is to force an ordering, often by hidden or arbitrary weightings. The romantic impulse is to abandon ordering, often by declaring that everything depends on a mysterious gestalt. The chapter refused both impulses. It proposed partial orderings where the materials allow it, and ranked comparisons where the aim is to choose between neighbours rather than to assign a single absolute magnitude. 
Comparative judgement, with its roots in Thurstone and Bradley Terry models, is a useful instrument at this point, not because its statistics solve the philosophical problem, but because its routine matches the shape of expert perception. People are good at telling which of two essays better realises a construct. They are much worse at assigning each essay a number on a long scale that pretends to be the one true ordering. The method can be abused, but used within a community that argues about its reasons it gives a workable reliability without destroying content. The interplay between criteria, constructs and connoisseurship is not an abstract dance. It lives in concrete practices. I described a moderation room in which judges meet over a set of anchors, read, talk, and write reasons. The procedure begins with silent reading, so that initial impressions are not contaminated by the first voice to speak. It moves to a round of short statements, in which each judge names what she saw and identifies the main reasons for her tentative grade. It then concentrates on disagreement, especially at borders. The facilitator presses for warrants. What in this passage makes the leap more than rhetorical flourish. What in that passage shows control rather than mere neatness. References to criteria are welcomed as scaffolds for attention. References to exemplars are welcomed as ways of locating the present case within a family resemblance space. Data on spreads and drift are introduced as checks, not as masters. Over time a case bank is built. These cases become the living memory of the community. They capture not only outcomes but the reasons that persuaded peers. New judges are inducted through guided participation in these conversations. Novices learn to see what counts and to say why it counts. In this way connoisseurship is taught rather than trusted to temperament. I then addressed the charge that such practices cannot be objective. The charge rests on a thin picture of objectivity as independence from human judgement. A thicker picture, which I drew from Raz and from Habermas, ties objectivity to public reason under conditions that reduce the force of arbitrary power. A judgement is objective to the extent that it can be defended with reasons that a competent and impartial audience could accept, given the purposes of the practice. On this view objectivity is a social and epistemic achievement, not a property of a device. Communities achieve it by publishing their standards, by inviting criticism, by checking themselves for bias, and by submitting patterns to statistical audit that can reveal drifts and blind spots. There is no paradox in saying that objectivity in evaluative domains is inseparable from connoisseurship. The paradox only appears if one has first assumed that all genuine knowledge must look like measurement. The chapter also confronted the Macnamara fallacy in its educational form. When we turn criteria into little boxes we can tick, we begin to reward what is easily counted and we teach to what can be easily reproduced. William James warned against the human impulse to seek the clean relief of the abstract over the sticky reality of the concrete case. Goodhart’s law is the modern policy echo of the same warning. A construct that is held in the mind of a community through exemplars and reasons retains its pressure on the real qualities that matter. A construct that is reduced to a series of easily scored tokens collapses into a target that is soon gamed. 
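Since the chapter appeals to Thurstone and Bradley Terry models for the comparative judgement routine, a minimal sketch may help to fix ideas. The scripts, judgements and fitting loop below are invented for illustration and assume nothing about any particular awarding body's tooling; the point is only that a set of local pairwise decisions can be pooled into a relative ordering without pretending to an absolute scale.

```python
# A minimal sketch, with invented data, of pooling paired comparisons into a
# Bradley-Terry ordering. Script names and judgements are hypothetical.
import numpy as np

scripts = ["A", "B", "C", "D"]
# Each tuple records (winner, loser) from one judge's paired comparison.
comparisons = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("A", "D"),
               ("B", "D"), ("B", "A"), ("C", "B"), ("D", "B")]

idx = {s: i for i, s in enumerate(scripts)}
n = len(scripts)
wins = np.zeros((n, n))                       # wins[i, j]: times i beat j
for winner, loser in comparisons:
    wins[idx[winner], idx[loser]] += 1

strength = np.ones(n)                         # Bradley-Terry parameters
for _ in range(200):                          # simple fixed-point updates
    total_wins = wins.sum(axis=1)
    pairings = wins + wins.T                  # comparisons between each pair
    denom = (pairings / (strength[:, None] + strength[None, :])).sum(axis=1)
    strength = total_wins / np.where(denom > 0, denom, 1.0)
    strength /= strength.sum()                # fix the arbitrary scale

# A relative ordering with strengths, not marks on a hidden absolute metric.
for script, value in sorted(zip(scripts, strength), key=lambda p: -p[1]):
    print(script, round(float(value), 3))
```

The statistics settle nothing philosophical. They only summarise the local judgements that the moderation room must still defend with reasons.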
There is an ethical point here too. Students learn to write for the rubric rather than to write for a reader. They learn to hunt triggers rather than to persuade. In this way the life of a subject is thinned by the very procedures that promised to protect it. To forestall a slide into subjectivism I spent time on the discipline of reason sorting. Sadler has long argued that students acquire quality by learning to recognise it and by calibrating their own work against examples. The same is true of judges. Reason sorting is a practice in which reasons are gathered, grouped, tested for relevance, and weighed for sufficiency. Toulmin again is useful, as is Brandom’s inferentialism, which ties the meaning of a concept to the moves that it licences in reasoning. To say that a paragraph is coherent is to license expectations about how its parts hang together, about how references are made and resolved, about how claims are supported. In a good moderation room such reasons are made explicit and tested for their inferential roles. A reason that licenses anything licenses nothing. A reason that can be defeated by a trivial variation is a weak reason. Over time judges build a sensitivity to strong and weak warrants. This is how connoisseurship resists whim. I then returned to the problem of borders. There will always be neighbours at a cut for whom both outcomes can be defended. Here I sought to separate three tasks. First, the community must minimise the frequency of such cases through good design. Multiple tasks that display different facets of the construct help. Sequencing that allows growth to be seen helps. Opportunities for students to repair weak elements help. Second, the community must speak candidly when it meets a true neighbour pair. Endicott’s analysis of higher order vagueness remains pertinent. Some concepts do not support a decisive cut at the granularity we demand. The right response is to choose for institutional reasons while admitting that the reasons of the construct ran out. Third, the community must recover its integrity by learning from such cases. When anchors are renewed, the near misses and the hard calls are brought back into view. The aim is not to eliminate all future neighbours. The aim is to refine the shared sense of the construct and to redesign tasks so that fewer cases fall exactly at the point least supported by reasons. I also examined the relation between validity and reliability under this settlement. Messick’s insistence that validity is a unified concept matters here. Reliability serves validity by ensuring that inferences are not hostage to noise. But reliability that is purchased by trimming the construct harms validity. The chapter proposed a practical discipline. Begin with authenticity, as Bruner urged. Ask whether the task requires the same kinds of choice and control that living practice requires. Specify the construct as a family of virtues and teach it through exemplars. Build a regime of connoisseurship through moderation, commentary and comparative routines. Use statistics to check for coherence, drift and bias. Reserve hard boundaries for those parts of the domain where reasons are decisive. Use zones and confidence language where reasons peter out. Explain decisions in public in a way that shows the hierarchy of reasons used. Under this discipline reliability is not a property of a device. It is a property of a practice that has learned to reproduce its reasons with sufficient stability to sustain public trust. 
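The statistical check named in that discipline, using statistics to surface drift and bias without letting them dictate, can be sketched in equally modest terms. The judges, anchor marks and tolerance below are invented; the output is a prompt for conversation in the moderation room, not an automatic correction.

```python
# A minimal sketch, with hypothetical data, of checking judges for drift
# against agreed anchor scripts. Positive deviation means more generous.
from statistics import mean

agreed = {"anchor_1": 14, "anchor_2": 22, "anchor_3": 9}     # agreed anchor marks

awarded = {
    "Judge A": {"anchor_1": 15, "anchor_2": 22, "anchor_3": 10},
    "Judge B": {"anchor_1": 12, "anchor_2": 19, "anchor_3": 7},
    "Judge C": {"anchor_1": 14, "anchor_2": 23, "anchor_3": 9},
}

TOLERANCE = 1.5                                              # illustrative threshold

for judge, marks in awarded.items():
    deviation = mean(marks[s] - agreed[s] for s in agreed)   # signed mean drift
    verdict = "discuss with judge" if abs(deviation) > TOLERANCE else "within tolerance"
    print(f"{judge}: mean deviation {deviation:+.2f} ({verdict})")
```

Nothing here decides whether a flagged judge is severe or simply reading a different strand of the construct. It only marks the place where reasons must be exchanged.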
The last movement of the chapter returned to pedagogy. If assessment is to be an ally of learning, the language of connoisseurship must be shared with students. Shulman’s notion of pedagogical content knowledge captures the need. Teachers must understand not only the subject but the ways in which learners come to grasp the subject. In writing, that means sharing exemplars and commentaries that make visible the choices writers make and the reasons readers have for rewarding those choices. In mathematics, that means teaching elegance as a virtue that sits alongside correctness, and giving students chances to compare two correct solutions and to argue why one is better. When students are inducted into such language, they become partners in the evaluative practice. They can aim their work at real goods rather than at proxy tokens. They can also learn to critique evaluative authority with intelligence rather than with cynicism, because they can see what warrants look like when they are well formed. The chapter closed by gathering a virtue vocabulary for connoisseurship. Beyond knowledge of the domain and fluency in its exemplars, the judge needs humility in the face of higher order vagueness, courage to record reasons that may be challenged, fairness in the willingness to be moved by a form of excellence not yet familiar, and constancy in the repetition of disciplined routines that keep a community aligned without freezing it. These are virtues of practice rather than flashes of genius. They are teachable. They are also fragile. They depend on institutions that fund time for moderation, that reward commentary and dialogue, that treat exemplars and reasons as public goods, and that refrain from announcing as discovery what was really policy. Where such institutions are built, connoisseurship ceases to be a polite word for everything the algorithm cannot do. It becomes the name for the kind of objectivity that evaluative practices require if they are to remain truthful about what they value and fair to those whose work they judge.

References

Bradley, R. A., & Terry, M. E. (1952). Rank analysis of incomplete block designs: I. The method of paired comparisons. Biometrika, 39(3–4), 324–345.
Bruner, J. S. (1960). The process of education. Harvard University Press.
Deming, W. E. (1986). Out of the crisis. MIT Press.
Kane, M. (2013). Validating the interpretations of test scores for proposed uses. Journal of Educational Measurement, 50(1), 1–73.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). Macmillan.
Polanyi, M. (1966). The tacit dimension. Routledge & Kegan Paul.
Sadler, D. R. (1989). Formative assessment and the design of instructional systems. Instructional Science, 18(2), 119–144.
Thurstone, L. L. (1927). A law of comparative judgment. Psychological Review, 34(4), 273–286.
Tukey, J. W. (1977). Exploratory data analysis. Addison-Wesley.

Chapter 6 Teaching Constructs Exemplars and Reasons

In the sixth chapter I set out to say what kind of objectivity can be honestly claimed for interpretive assessment once we set aside the hope that convergence is guaranteed. I did not want a consolation prize. I wanted to know whether we can speak of better and worse judgements where disagreement is persistent and in some cases rational. The chapter therefore worked through three knots. First, the relation between conventionalism and objectivity. Second, the attraction and cost of meaning holism. Third, the possibility of a perspectival stance that avoids relativism.
Throughout I tried to keep the discussion tethered to the practical scene of teachers and examiners who must read a performance, state reasons, and accept scrutiny. I began with the thought that conventionalism seems the natural ally of the hermeneutical turn. If meaning is use, and if standards are social artefacts, might not objectivity reduce to whatever a well organised community decides. Here Joseph Raz offered a different route. Raz’s service conception of authority treats authoritative norms as devices that help agents conform better to the reasons that already apply to them. Transposed into assessment, the authority of standards arises when they help readers track the reasons internal to a practice. This already loosens the grip of simple conventionalism. A convention is not authoritative simply because it is shared. It is authoritative because it serves the pre existing reasons that define the practice. In this sense, when a community of readers agrees that audience awareness counts in writing, or that elegance counts in mathematics, or that responsiveness counts in discussion, their convention has weight when and because it shadows reasons that are independent of the vote. The vote may be needed to settle a procedure, but it is not the ultimate ground of the claim. This is not to deny that conventions matter. They set default coordinates for interpretation and they reduce needless friction. David Lewis taught us how conventions can arise from coordination problems, and how stable patterns can emerge when actors expect each other to follow a rule that solves a common problem. In assessment there are many such patterns. We expect to number pages, to cite sources, to state claims clearly before developing them. These expectations are social and they ease the work of reading. Yet the chapter warned against sliding from coordination to truth. Conventions fix expectations. They do not fix the nature of the goods at stake. Raz helps hold that line. So does the practice of reason giving. When a moderator asks why a passage works, the answer cannot be that everyone we know thinks it works. The answer must locate a reason that would persuade an informed stranger who shares the aims of the practice. In this way the community honours reasons that are not simply the sediment of habit. The second knot entered here. Meaning holism tempts those who honour reasons because it captures an important insight. Donald Davidson, W V Quine and later Robert Brandom argue that the meaning of any given term is fixed by its place in a large network of inferences. To grasp the meaning of a concept is to grasp the role it plays in a space of reasons. This is a helpful way to think about evaluative concepts. To call a paragraph coherent is to undertake commitments about how references are resolved, how claims hang together, and how transitions do their work. The inferential roles are learnt in the practice. This helps us avoid a picture in which words carry fixed meanings that can be applied mechanically. It also explains why novices falter. They do not yet inhabit the network in which the words live. Holism carries a cost. If every term draws its sense from a wide network, the hope of stable criteria looks fragile. Meaning would seem to be hostage to the entire system, and two speakers could never be sure that they share the same concept. Hilary Putnam’s externalism complicates matters further. If meanings are not in the head, then communal and environmental facts enter the story. 
For assessment this sounds like trouble. It feeds a sceptical worry that no two readers can mean the same thing by elegance or coherence, and that attempts to fix shared standards are idle. I argued that we can keep the insight while refusing the despair. The remedy is local holism and articulate practice. We do not need to secure the entire language to secure enough of the network for the purpose at hand. Brandom’s account of inferential roles helps here. A community can make explicit the material inferences that govern the use of a term in a particular field. It can state what follows from calling a move elegant in a proof, what would count as a defeater, and what would count as a supporting consideration. These local articulations give content to the standards without pretending to step outside language. Crispin Wright and John McDowell both helped the case that this local objectivity is not second best. Wright defends a notion of cognitive command that falls short of guaranteed convergence but still allows that some disagreements are failures of competence rather than mere difference of taste. McDowell defends a quiet realism for values, in which the world is available to a well trained sensibility without the need for a non conceptual given. These lines converge in the picture I wanted. A trained reader may be answerable to how the work is, and may be corrigible by further attention and by better reasons, without the promise that all trained readers will eventually agree. The absence of a proof procedure does not reduce the practice to preference. It demands a different kind of discipline. The chapter then turned to Gideon Rosen and T M Scanlon to test whether reasons can carry normativity across perspectives. Scanlon’s contractualism anchored the thought that we owe each other justifications that no one could reasonably reject. In assessment the point translates into a discipline of public reasons. The judge who awards a mark owes classmates and candidates an explanation they could not reasonably reject, given the aims of the course and the character of the subject. This owes something to Habermas as well. He insists that legitimate norms are those that would win acceptance in an ideal speech situation. Classrooms are not ideal speech situations, but moderation rooms can approximate their spirit. They can display sincerity, responsiveness to argument, and willingness to be moved. The result is not convergence for its own sake. It is the construction of a space in which reasons count. Bernard Williams added an important caution. His distinction between absolute and relative conceptions of truth, his warning about the limits of the absolute conception, and his account of internal reasons reminded me to stay close to the first person standpoint of the judge. A reason counts for a judge when it can be integrated into her outlook under critical reflection. This is not licence for subjectivism. It is a test of sincerity and coherence. In practice it means that a community should expect judges to own their reasons and to be able to show how those reasons fit with the aims of the practice. It also means that communities should expose judges to other outlooks so that parochial patterns are challenged. This is how perspectival objectivity is built. It is built by the friction between sincere outlooks that take each other seriously. This takes us to parochial concepts. There are ways of talking that make deep sense within a tradition and look thin from the outside. 
Alasdair MacIntyre argues that such forms of life sustain goods that are internal to their practices, and that understanding travels with initiation into those practices. In teaching we see this when subject communities develop a thick sense of what counts as flair in writing, or of what counts as mathematical neatness, or of what counts as historical sense. Outsiders sometimes misread the internal standards as taste. Insiders sometimes mistake their own customs for universal truths. The chapter urged a double discipline. First, communities must keep returning to the goods they claim to serve. If a rubric begins to reward a shadow of the good, for example tidy but lifeless writing in place of clear and engaged prose, the community must correct itself. Second, communities must place their standards in dialogue with adjacent traditions. A mathematician who learns how historians talk about evidence, or a literary scholar who learns how scientists talk about explanation, returns with a refreshed view of what counts in her own field. This is not a dilution. It is a safeguard against parochial drift. The middle part of the chapter worked in detail with meaning holism as it appears in classrooms and exam rooms. Quine’s image of a web of belief helps us see how revision propagates. If we learn that a particular marker has been treating a rhetorical flourish as a sign of depth when it is only ornament, we may not need to revise the whole cloth. We can adjust a small part of the web. Davidson’s principle of charity also does work. When readers disagree, we begin by assuming that the other is mostly right about the obvious, and we search for the locus of disagreement that matters. Often it is not a verbal dispute. It lies in a tacit weighting of values. One judge is giving more weight to originality, another to control. The trick is to bring the weighting to speech and to see whether the purpose of the task recommends one weighting over the other. At this point Raz returns. Authority can set a second order reason to weight originality more in this module because the aim is to push risk taking, and to weight control more in the exam because the aim is to check secure competence. This is not manipulation. It is the explicit alignment of reasons with aims. Stephen Davies and Allan Gibbard were useful interlocutors for value objectivity without convergence. Davies shows how aesthetic reasons can be genuine and shareable even when taste diverges. Gibbard explores the norm expressing character of value discourse without collapsing into non cognitivism. Both encourage a stance in which statements in assessment can be truth apt in the thin sense that they answer to norms that can be challenged, while also acknowledging that acceptance of norms is shaped by upbringing in a practice. I tried to show what this means when a panel debates whether an essay has flair. The reasons that are offered, such as unexpected yet apt imagery, or bold but supported claims, are not merely reports of liking. They are invitations to see features as reasons. The disagreement can then move on two tracks. One track asks whether the features are really there. The other asks whether they should count. Progress on either track is real progress. I then brought in Putnam’s work against a God’s eye view. Putnam’s internal realism insists that truth is not a matter of mapping words to a world conceived from nowhere. It is a matter of idealised rational acceptability within a conceptual scheme. 
For assessment this underwrites the practice of public standards and reasons. We do not seek a view from nowhere. We seek a view that is answerable to the facts of the work as grasped within a well articulated scheme of appraisal. The scheme can be criticised and improved. It can be compared with rival schemes. It cannot be replaced by a scheme free view. This makes humility a central virtue. Judges should be ready to revise their scheme when it persistently fails to make sense of the best work in a domain. Hilary Kornblith and Ruth Millikan provided a further line, namely that of naturalistic normativity. They point out that classification and evaluation can be answerable to stable patterns in the world without being forced into a single metric. The thought encourages us to look for the stable features that good performances share across contexts, the way a good argument tends to minimise gratuitous leaps, or a good explanation tends to track difference makers. These are not timeless rules. They are patterns that can be taught, contested and refined. They give the community something firmer than fashion to hold. The chapter then faced the worry that a perspectival stance will slide into relativism. I distinguished five senses of relativism. There is the banal thought that contexts matter. There is the sociological thought that standards are made by communities. There is the sceptical thought that truth is nothing over and above acceptance. There is the normative thought that no one can criticise an alien practice. There is the quietist thought that legitimate disagreement is all we can hope for. I rejected the sceptical and the normative forms. I accepted the first two, and I tempered the quietist form by insisting on the work of reasons. Iris Murdoch helped here. She reminds us that attention can be trained toward reality. In moral life this means that better vision can dispel self serving pictures. In assessment it means that better attention can reveal quality that a lazy reader misses. If attention can be trained, and if reasons can be given, then not all perspectives are equal, and progress is possible without convergence. At this point I returned to classroom design. If objectivity without convergence is our destination, classrooms must stage the practice that builds it. Students must learn to offer reasons for claims about quality. They must learn to receive criticism as a gift. They must see exemplars as arguments, not as templates. They must learn that criteria name virtues, they do not replace judgement. They must watch teachers disagree with civility and be shown how the disagreement is adjudicated by appeal to the aims of the task and to the shared goods of the subject. In such rooms students begin to speak the language in which objectivity is made. They also learn why some decisions remain in the space of policy rather than truth. When a cut point must be set for system reasons, the teacher can explain that this is a collective choice designed to serve fairness over time. Students see that honesty about limits and fidelity to reasons can live together. I closed the conceptual loop by returning to validity. Samuel Messick’s integrative view says that validity is about the meaning we can read off scores and the consequences of their use. 
Under the present settlement, validity becomes a question about whether the network of reasons we employ actually tracks the construct we claim to be judging and whether the procedures we use support those reasons rather than substitute for them. It also becomes a question about whether the institutional uses of the judgements remain faithful to the reasons that justified them. If a panel reasons that a particular essay shows rare flair because of the risks it takes, and if the system later penalises that script because some elements are untidy, we have a failure of validity at the level of use. To prevent such failures, the chapter recommended explicit statements of the hierarchy of reasons in reports to those who will use the results. This is where Scanlon’s discipline of justification returns in practical form. Users of assessments must be able to see why the decisions deserve uptake in their setting. A few final figures helped me to keep the tone. Thomas Nagel writes about a view from nowhere that is not really available to human agents. We pretend to occupy it when we collapse many perspectives into a single figure without saying how the collapse was done. Bernard Williams writes about truthfulness as a virtue of institutions. Truthfulness here is not the bare alignment of word with world. It is the practice of telling the truth about what we can and cannot claim. If we cannot claim decisive precision at a border because the concept does not support it, we should say so, and we should pair that candour with a policy that reduces the harms of decision. Christine Korsgaard writes about the authority of norms as arising from reflective endorsement. In assessment that means teachers and students can own standards when they see how those standards are justified by the goods of the practice. Ownership of that kind builds durability. It also builds a culture that can correct itself. The sixth chapter therefore argued that interpretive assessment does not need the crutch of eventual convergence in order to speak of objectivity. It needs communities that articulate inferential roles, that curate exemplars as arguments, that require reasons that could command acceptance by informed strangers, that keep conventions aligned with pre existing reasons, that use authority to serve those reasons rather than to conceal fiat, that accept higher order vagueness at borders, and that confess when policy rather than truth must decide. This is a demanding ethic. It is also a practicable one. It fits what the best teachers already do when they read with attention and teach students to see what counts. It resists the reduction of educational goods to tokens that travel well as numbers but badly as reasons. It invites a public that is willing to live with the candour that some disagreements are honest and some decisions are institutional. In that candour the profession earns authority of the right sort.

References

Bruner, J. S. (1960). The process of education. Harvard University Press.
Dewey, J. (1938). Experience and education. Macmillan.
Kane, M. (2013). Validating the interpretations of test scores for proposed uses. Journal of Educational Measurement, 50(1), 1–73.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). Macmillan.
Polanyi, M. (1966). The tacit dimension. Routledge & Kegan Paul.
Sadler, D. R. (1989). Formative assessment and the design of instructional systems. Instructional Science, 18(2), 119–144.
Tukey, J. W. (1977). Exploratory data analysis. Addison-Wesley.

Chapter 7 Vagueness Borders and Legal Analogies

In the seventh chapter I turned to Timothy Endicott to sharpen the account of vagueness that has been at work throughout and to test it against the demands of assessment. I wanted to show why the usual strategies for taming vagueness fail in evaluative domains and to make clear what follows when we stop imagining that a better wording or a clever rule can dissolve the trouble. Law offered the richest set of tools for this exploration, not because education should imitate it, but because jurists have wrestled with public decision under uncertainty with a care that education often lacks. Endicott’s work gave the argument a backbone. Around it I placed Hart’s open texture, Dworkin’s right answers, Williamson’s epistemicism, Sainsbury’s treatment of the sorites, Crispin Wright’s tolerance principle, Dorothy Edgington’s critique of sharp boundaries, Kelsen on frames, and a handful of other signposts. The result is a picture of vagueness that is both humbling and useful. It is humbling because it tells us where reasons run out. It is useful because it tells us how to design and speak when they do. I began with Endicott’s central distinction between semantic vagueness and pragmatic vagueness. Semantic vagueness is a feature of words and concepts. Terms such as heap, tall, bald, elegant, fluent and good have clear positive cases and clear negative cases and they also give rise to series in which each neighbour looks much the same as the next while the ends are plainly different. Pragmatic vagueness is a feature of use. Even when language is crisp, what is appropriate to do with the language in a situation can be unsettled by context and by norms that are themselves not sharp. Endicott insists that the two are entangled in practice. We would like to separate truth conditions from questions of appropriateness, but in real decision the two are not easily prised apart. This is a sober starting point for assessment. A criterion may be written with precision. Its application will still require judgements about relevance and sufficiency that are not precise. An invitation may state arrive at five o’clock. The etiquette of arrival for tea may mean around five, and to arrive exactly on the hour may be impolite. Content can be neat while action remains vague. With that frame set, I placed Endicott’s positive model in view. He rejects the boundary picture in favour of a similarity model. On the boundary picture there is a scale and a sharp cut, even if we do not know where it lies. On the similarity model a vague term applies where a case is sufficiently like paradigm cases for the purposes at hand. We do not draw a fine line. We move by family resemblances and saliences, and we argue about what is sufficiently like in the present context. This is not a refusal to discipline judgement. It is a re description of how discipline actually works. To understand tree is to see a family of features that belong together in many ways and to use the family intelligently. To understand a C grade in English is to have a repertoire of exemplars and to see enough kinship to apply the grade without pretending to a single scale that ranks every script with precision. The similarity model suits assessment better than the boundary model because it respects incommensurables and because it names the work that communities actually do when they justify verdicts. The most important part of Endicott’s work for education concerns evaluative language.
He argues that terms which express value are not only vague, they are resistant to a certain kind of cure. The epistemic cure tells us that there is a hidden sharp boundary and that vagueness is ignorance of that boundary. On this story the right answer exists for every case. We simply cannot always know it. Endicott’s chef and soup argument shows why this is wrong for evaluative terms. The chef knows that it matters that the soup is good. The chef also knows that a single grain of salt does not matter. The chef then knows that a single grain cannot make the difference between good and not good. The point is normative, not psychological. A norm that is to guide action for an agent must be knowable by that agent as a reason. A secret cut cannot do any guiding. It cannot be the reason why the chef should revise the verdict. To insist that there is nevertheless a tiny fact that flips the status is to detach evaluation from the agent who must own it. Educational judgements are of this kind. A teacher who knows that it matters that an essay is good and who knows that missing one letter never matters knows that removing one letter cannot be the thing that flips the verdict. There will be neighbour pairs that resist a decisive cut. No plea to a hidden spike of reality can save us from that. Crispin Wright’s tolerance principle sits close to this thought. The folk idea is that small changes do not make a difference in cases that are governed by vague terms. Epistemic cures often claim that the principle cannot be universally true. Somewhere there must be a pair of neighbours that differ in status. Endicott’s point is that for evaluative terms the principle can be upheld in all ordinary uses. The difference that matters is one that an agent can treat as mattering. To obey the norm is to track that kind of difference. The more we cling to a universal sharpness beneath the surface, the more we are tempted to delusions that cannot guide action. The chapter moved next to higher order vagueness. Endicott calls it truculent. If we try to manage first order vagueness by drawing a line, we discover that the line drawing itself becomes a site of vagueness. If we try to manage that by drawing a second line that marks true borderline cases, the second line is also vague, and so on. Many valued logics and supervaluationist schemes give formal expression to these attempts. The upshot for Endicott is bleak and helpful. There is no principled non arbitrary way to stamp out higher order vagueness. That is why solutions that tidy the surface of a problem leave us with the same trouble in the next layer. For a teacher this means that a precise rubric with many levels does not eliminate vagueness. It just moves it one level up, where the decision about which level is apt becomes vague. For an examiner this means that very long lists do not abolish discretion. They invite a new discretion under a heavier burden of paperwork. At this point I brought legal examples to life. Hart’s penumbral cases remind us that a rule cannot anticipate every circumstance. In the legal literature these are the cases at the margin where the applicability of a concept is unsettled. The judge must exercise discretion. Dworkin replies with a more ambitious account of interpretation. He claims that by reading the legal practice in its best light, by attending to principles as well as rules, the judge can find a right answer even in hard cases. Endicott grants the power of interpretation while marking its limits.
Some disagreements are not about ambiguity in the concept. They are about vagueness that remains after disambiguation. Dworkin’s point that disagreements often track differences about the core rather than the circumference is important, but it does not capture all that matters. There are cases where both readers share the core and have cleared away confusions, yet a neighbour pair still resists a decisive cut. Two jurisprudential devices helped me mark the shape of the impasse. Kelsen imagines the legal order as a frame that legitimates a range of choices. Inside the frame many decisions can be correct. There are no gaps. In assessment this translates into a policy choice that licences a cut point as valid. It is a useful picture and it makes organisational life possible. Endicott adds the warning that higher order vagueness means that the presence of a frame does not transform an indeterminate matter into a fact of nature. The decision is valid as policy. It is not true in the sense that some want to claim. Fuller’s inner morality of law also travelled well into the assessment context. Publicity, congruence, clarity, and non retroactivity are conditions of legitimacy. Their educational analogues include accessible standards that do not degrade content, alignment of outcomes with policy and with teaching, and reasons that can be scrutinised. These conditions do not remove vagueness. They tell us how to live with it without losing trust. I then brought Endicott’s million raves to the table. Imagine a series from a monstrous loud event to a silent event, each differing by one decibel, such that no pair of neighbours is noticeably different to those affected. The law must punish the outrageous ones and acquit the harmless ones. How can a court convict one and acquit the next when all along the line the neighbours are indistinguishable for the purposes that matter. If the legal commitment is to treat like cases alike, the demand to be decisive creates a paradox. Assessment has its own version. From a script that is plainly a pass down to a script that is plainly a fail, there is a very long series of neighbours, none of whom differ in a way that meets the standard of a decisive difference, while the ends are clearly different. A system that insists on a single sharp cut at this scale invites arbitrariness disguised as discovery. The practical lesson is not to live without cuts in public systems. It is to speak honestly about what a cut is doing and to design regimes that minimise the frequency with which a decision must be made at points that our reasons do not support. Sainsbury’s treatment of the sorites helped me consolidate the argument. If we insist on transitivity and a total order across an evaluative field with many values in play, we will be driven into paradox. Edgington adds that any supposed sharp line will place two cases either side of the line that do not differ in a way that matters. To insist that an ordered scale exists in such domains is to insist on a fiction that will deform the domain. In assessment the deformation appears as the pressure to treat unlike cases as if they belonged on a single line and to teach to the tokens that travel easily along that line. Endicott’s similarity model allows us to say no to that pressure. It authorises partial orderings and families of good cases that cannot be placed in one true sequence. It treats the claim that A is better than B as a local comparative judgement with reasons, rather than as an absolute measurement on a hidden metric. 
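The structure of the million raves example, and of its assessment analogue, can be made concrete with a toy series. The numbers and the threshold of a noticeable difference below are invented; the point is structural rather than empirical.

```python
# A toy sorites series: no adjacent pair differs noticeably, the ends plainly do.
STEP = 1              # each neighbour differs by one unit (one decibel, one slip)
NOTICEABLE = 5        # smallest difference anyone affected could treat as mattering
levels = list(range(0, 101, STEP))                     # from harmless to outrageous

gaps = [b - a for a, b in zip(levels, levels[1:])]
print(all(gap < NOTICEABLE for gap in gaps))           # True: neighbours indistinguishable
print(levels[-1] - levels[0] >= NOTICEABLE)            # True: the ends clearly differ

cut = 50                                               # any single sharp cut...
print(levels[cut] - levels[cut - 1] < NOTICEABLE)      # ...splits an indistinguishable pair
```

Whatever value is chosen for the cut, the pair it separates fails the test of a difference that matters, which is why the choice must be owned as policy rather than announced as discovery.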
From there I turned to the relation between vagueness and interpretation. It is tempting to believe that interpretation is the extra resource that will resolve indeterminacy. Marmor’s account of interpretation as an appeal to real or counterfactual intentions is one form of this temptation. Dworkin’s reliance on a best light reading is another. Endicott shows why such appeals often move the problem rather than solve it. If interpretation disambiguates a concept and locates the core, that is progress, but vagueness remains after disambiguation. If interpretation adds principles that guide choice, that too is progress, but the principles will be stated in language that admits higher order vagueness. If interpretation appeals to context, it may resolve some uncertainties, but the judgement that this context is the one that matters is also vague in many cases. The upshot for assessment is modest and strong. Interpretive resources can reduce error and can make many hard calls intelligible. They cannot abolish absolute neighbourhoods. When we meet them we must choose for institutional reasons and say that we are doing so. The chapter had to confront sincerity. Sorensen’s discussion of absolute borderline cases and insincere verdicts is awkward and important. When a judge or a teacher knows that a case is an absolute borderline case, the judge or teacher cannot sincerely say that the case is of one kind rather than the other as a matter of truth. To make a decisive judgement while acknowledging that there is no truth to be discovered at that granularity is to act under a norm that demands decision for institutional reasons while refusing to pretend that discovery has occurred. There is nothing ignoble in such candour. The ignoble move is to present the decision as if it were a discovery. Endicott does not make a spectacle of this. He simply shows that a practice that values truthfulness will minimise occasions for this form of speech and when it cannot do so will speak in the register of policy rather than in the register of truth. At this point I returned to assessment design. Endicott’s similarity model and his insistence on the reality of higher order vagueness push us toward constructs and exemplars rather than long checklists. A construct carries a family of virtues taught through cases. Communities reason by pointing to similarities that matter and by arguing about weight. Comparative judgement methods sit naturally within this world, because they ask judges to make local choices that reflect expertise without pretending to a single universal scale. Moderation becomes the place where reasons are sorted in public, where neighbourhoods are explored, and where cut policies are agreed as policies, with an account of their purpose. Validity becomes a question about whether the reasons employed track the construct and whether the uses to which decisions are put remain faithful to those reasons. Reliability becomes a question about whether the community can reproduce its reasons with sufficient stability to sustain confidence. The shape of the settlement that emerged in earlier chapters now has a philosophical warrant. It is not a compromise born of impatience. It is an answer to the structure of the phenomena. I then returned to Dworkin and Hart to place the settlement in their light. Hart is right that discretion is needed in penumbral cases. Dworkin is right that interpretation can do more than timorous administrators suppose. 
Endicott is right that some borders are simply not there to be found and that higher order vagueness will defeat any hope of a universal cure. The educational lesson is immediate. A teacher who cannot make a decisive cut at a border is not incompetent. A system that pretends she can always do so is dishonest. A profession that refuses to design out needless encounters with absolute neighbours is careless. A culture that explains decisions in the language of confidence and zones when reasons run out is truthful. I wanted to allow that pragmatic vagueness is not a vice. It is often a way in which language carries the wisdom of practice. People know that an instruction to arrive at five has a surrounding aura of appropriateness that language could try to capture but never fully succeeds in capturing. Teachers and markers live by norms of relevance and sufficiency that are not fully specifiable but that are teachable through cases. Endicott shows that one cannot finally separate pragmatic from semantic vagueness in much human practice. For assessment this means that attempts to purge all pragmatic judgement from marking are not examples of scientific rigour. They are examples of philosophical confusion. The effort should be to build and maintain communities that can exercise pragmatic judgement responsibly and in public, and to design tasks and scales that limit the frequency with which such judgement must take the weight of a system. The end of the chapter returned to the idea of objectivity. After Hart and Dworkin and Endicott, what remains of it. Enough remains. Objectivity is not the fantasy of a single scale with a hidden sharp line. It is the discipline of giving reasons that could persuade an informed and impartial audience that a conclusion is warranted for the aims at hand. It is the discipline of building exemplars that teach the construct, of curating a memory of cases that stabilises standards, of checking for bias and drift, of publishing anchors and commentaries so that those outside the circle can see what is being counted as a good reason. It is the discipline of writing reports that explain why a decision was hard and how it was resolved. It is the discipline of living with higher order vagueness at borders without lying about it. It is the discipline of policy that declares a cut as a device for serving fairness over time rather than as a discovery. If one asks why any of this matters beyond theory, the answer is that lives lie near these borders. When a script is just below and another just above, and there is no difference that matters for the construct, the decision to split them must be owned as a choice that serves the system rather than as a truth about the work. The teacher must be able to say that out loud without being thought incompetent. The student must be able to hear it without losing faith in the practice. That candour is not cheap. It must be paid for by the prior investment in constructs, exemplars, moderation, comparative routines, and a culture of reasons. Endicott’s jurisprudence shows why this is the right price. It refuses the false comfort of secret lines. It honours the practice by telling the truth about its limits. It points the way to a form of objectivity that keeps its promises.

References

Dworkin, R. (1986). Law’s empire. Belknap Press.
Edgington, D. (1997). Vagueness by numbers. In R. Keefe & P. Smith (Eds.), Vagueness: A reader (pp. 294–316). MIT Press.
Endicott, T. (2000). Vagueness in law. Oxford University Press.
Fuller, L. L. (1964). The morality of law. Yale University Press.
Hart, H. L. A. (1961). The concept of law. Clarendon Press.
Kelsen, H. (1991). General theory of norms. Oxford University Press.
Raz, J. (1986). The morality of freedom. Clarendon Press.
Sainsbury, R. M. (1995). Paradoxes (2nd ed.). Cambridge University Press.
Sorensen, R. (2001). Vagueness and contradiction. Oxford University Press.
Williamson, T. (1994). Vagueness. Routledge.
Wright, C. (1992). Truth and objectivity. Harvard University Press.

Chapter 8 Policy at the Border and Institutional Candour

In the eighth chapter I set out to examine sincerity at the boundary where institutions require a decision and concepts refuse to yield one. The question is simple to ask and hard to honour. What should a judge or an examiner say and do when the case is genuinely a neighbour pair with no difference that matters for the construct. The chapter takes up Hart and Dworkin once more, folds in Endicott on higher order vagueness, draws on Sorensen’s analysis of absolute borderlines and on Austin’s account of verdictives, and then tries to say what an ethic of decision looks like when truth and policy part ways in the final millimetre. I began with the pressure to decide. Public practices are not optional games. They allocate goods and burdens and they must do so on time. The law cannot rest on a shrug. The examination system cannot end with a silence. Hart’s image of open texture frames the scene with honesty. Rules will not have anticipated every case. There will be penumbral cases in which the extension of a concept is unsettled. In these cases officials must exercise discretion. The permission is limited and real. It is limited because officials remain bound by the point of the rule and by the wider purposes of the practice. It is real because no further algorithm is available. The teacher at the pass border is in this sense a public official. Dworkin challenges the reach of this permission by arguing that interpretation can often deliver a right answer. Hercules is his emblem for the judge who reads the practice in its best light by bringing principle into conversation with rule. I took this challenge very seriously. It rescues us from an easy cynicism and it insists that a conscientious reader can do more than a checklist would suggest. In educational terms this means that a reader may move from the text to the construct, from the construct to the aims of the course, and from the aims of the course to a considered view of where this piece belongs. Much that appears marginal can be stabilised when that network of reasons is brought to bear. Yet if Endicott is right, there are limits. There are cases where disagreement survives the removal of ambiguity. There are cases where higher order vagueness defeats the hope of a principled cut. The border is not merely hard to find. It is not there to be found at the demanded granularity. Here Sorensen’s argument about absolute borderline cases becomes unavoidable. In an absolute borderline case there is no fact of the matter at the level of precision demanded by the institution. The case is neither determinately in nor determinately out. To make a decisive statement about such a case is to speak as one who does not believe that there is a truth at that level. Sorensen calls attention to the tension with sincerity. To say this is a pass in the tone of discovery when you also know that there is no discoverable fact at this margin is to invite the paradox of saying p while also holding that there is no truth of the matter about p.
The force of this is not a philosopher’s trick. It is a reminder that the voice of a judge or examiner carries an authority that claims a certain relation to how things are. If that relation cannot be claimed without pretence, the ethics of the practice require that we change the register. Austin’s distinctions among speech acts help us here. A promise is a commissive that creates a new normative fact. A verdict is not like that. A verdict purports to deliver a finding on evidence and reasons as to value or fact. Austin calls such acts verdictives and he notes their obvious connections with truth and fairness. To collapse grading into the performative class that makes truth by declaration is to mistake its grammar. A referee who awards a goal when video shows the ball never crossed the line creates a result but does not create a truth. A teacher who calls a pass declares an outcome that the institution will enact, but the declaration aspires to fit a practice governed by reasons. That aspiration anchors the examiner in the ethic of sincerity. She must not tell herself that the act makes it so. She must either give reasons that purport to reveal how things are for the construct or she must signal that the decision is of a different kind. This is where Dworkin’s challenge returns in a new light. The best light method protects sincerity in many hard cases because it requires that the judge articulate how the decision fits the practice as a whole. The decision is not a bare assertion. It is a claim about interpretive truth within a tradition. The disinfecting work is done by reasons. The worry is at the very border where reasons tie. Endicott’s work on higher order vagueness makes the worry sharper. The judge tries to disambiguate. She tries to add principles. She tries to settle context. Each move can reduce uncertainty. Each move also reintroduces vagueness at the next order. Eventually one meets a point where the insistence on a decisive cut would turn reasons into decoration for a fiat. If the cut must still be made for institutional reasons, the voice must change. Raz helps to shape this change. His service conception of authority makes room for exclusionary reasons. An authority can give a second order reason to decide here rather than there in order to help subjects better conform to the first order reasons that apply to them over time. In assessment a cut policy can be such a reason. It can be a policy that places the cut at the place that best preserves fairness across cohorts, or that best serves the purposes of the course, or that reduces harm in the round. The policy does not discover a hidden truth at the knife edge. It declares a standing choice that serves the practice. To speak in this register is to be sincere. The examiner can say that the work falls in the neighbourhood where reasons do not separate neighbours and that the pass or fail follows from the standing policy that serves the system. The examiner keeps faith with truth by not pretending to discovery. The examiner keeps faith with fairness by honouring a policy that was chosen in view of the first order reasons of the practice. This shift of register matters for public trust. Fuller’s inner morality of law gives the institutional conditions for legitimate rule. Publicity, congruence, clarity, and non retroactivity are key. Educational analogues ground an honest assessment culture. The standards must be public and intelligible without becoming reductive. The use of grades must be congruent with the purposes that justified them.
The policies that tilt decisions in true neighbourhoods must be stated in advance and applied without favour. In this way the institution can keep the duty to decide while refusing the temptation to mystify its choices. The cost is the loss of a certain drama. The gain is durability. I then returned to Dworkin’s distinction between criterial concepts and interpretive concepts. It is a fruitful distinction that protects us from a lazy appeal to vagueness whenever we meet disagreement. In some quarrels the parties have ceded the concept and are arguing about its extension. In others they disagree about the point of the concept. Dworkin’s own example concerns justice and democracy as interpretive concepts that live in networks of principle. He urges us to follow the argument where it leads. That advice carries directly into assessment. A debate about flair may be a debate about which features count as signs of it and why. A debate about coherence may be a debate about which patterns of connection matter for this task and why. If disagreement can be traced to differences of interpretation at the core, then the work of articulation can often restore unity of judgement. I argued that much that passes for border uncertainty is in truth core ambiguity. The remedy in such cases is argument and exemplification. The caution is that even when the core is clarified, higher order vagueness can remain. Dorothy Edgington’s critique of sharp boundaries strengthens the caution. A line that pretends to honour the virtue will always place two neighbours on either side who do not differ in a way that matters for the virtue. This is not a quirk. It is the structure of vague predicates. Crispin Wright’s defence of tolerance protects a corresponding norm for evaluative language. Small differences must not be treated as decisive where the practice itself treats them as insignificant. These thoughts tell against the impulse to engineer ever finer scales as if a scale could chase away vagueness by more levels. Endicott’s warning about the truculence of higher order vagueness shows why such engineering is false comfort. The vagueness moves up a level and returns as a question about which level is apt. With these materials in place I considered the accusation of insincerity in a more explicit key. Sorensen presses hard. When one knows that a case is an absolute borderline case, to announce a discovery is to lie. Some reply that one can avoid the charge by redescribing the act as constitutive. The teacher says pass and thereby makes the student a passer. Austin blocks the reply. Verdictives are not like promises. They purport to fit reasons. Others reply that one can fix the world by stipulation in such cases and hence avoid falsehood. The reply begs the question. The stipulation must be presented as policy rather than as discovery if sincerity is to be preserved. A further reply says that institutions must honour the duty to decide and that to speak of truth at the boundary is a category error. I allowed that there is a duty to decide. I denied that this duty dissolves the need to speak truthfully about the status of the decision. The right language is policy language. The right ethic is to minimise such decisions by design and to own them when they cannot be avoided. From theory I moved to design. The first design principle is to widen the evidence base so that fewer cases sit in the thinnest part of the wedge. 
Multiple tasks that display different facets of the construct recover reasons that are silent in a single artefact. Sequencing that allows growth to show turns some neighbours into separates. Opportunities for self correction reveal control that a snapshot hides. The second principle is to build constructs that acknowledge families of excellence. A family of exemplars with commentaries allows a reader to see kinship in more than one direction. Narrow constructs invite artificial sharpness. Broad constructs hold coherence without forcing single line rankings. The third principle is to use comparative routines that ask for local choices rather than absolute magnitudes. These routines reduce pressure to enact a spurious scale and they generate reliable orders without claiming more than they can support. The fourth principle is to surround the act of deciding with the practice of reason giving in moderation. Reasons do not only justify. They educate the eye and they build a memory that makes future neighbours rarer. I then addressed the institutional need to explain outcomes to users who require firm signals. Universities and employers are often cast as impatient with nuance. They want to know who is above and who is below and they want to know this in numbers. The chapter suggested a different compact. It is possible to provide firm signals while being truthful about their grounds. Where the construct supports decisive reasons, the signal can be presented as a discovery. Where the construct does not support decisive reasons, the signal can be presented as a policy enactment made under acknowledged uncertainty. Users can be educated to read both. In time the candour becomes a resource. It allows institutions to defend their practice when challenged because they can show what they know and how they know it and where they act by policy. Bernard Williams’ virtue of truthfulness supplies the tone. Truthfulness here is not a metaphysical boast. It is institutional integrity in speech. The chapter also took up the objection that policy at borders invites unfairness. Does not an honest policy simply institutionalise arbitrariness. The reply is that the choice is not between honest policy and decisive truth. The choice is between honest policy that is designed to reduce harm and pretended truth that distorts the domain in order to maintain appearances. A good policy can be audited. It can be tested for disparate impact. It can be revised in the light of evidence. It can be made prospectively and publicly. It can also be paired with remedial design that reduces the rate at which it must be invoked. In this way policy becomes a device for fairness rather than an admission of defeat. I returned briefly to the classroom, since the ethic of sincerity has a pedagogic face. Students can be taught why some decisions are reasons based and others are policy based. They can be shown how reasons work, how exemplars anchor a construct, and how a policy serves the practice. When students meet a border they can be told what would have moved the work out of the neighbourhood. This is more helpful than a fiction about a unique hidden line. It preserves respect for the subject and it trains students in a form of intellectual honesty that is valuable beyond any one course. The conclusion of the chapter drew the threads together. Hart keeps us from the fantasy that rules will suffice. Dworkin keeps us from the laziness that treats all hard cases as empty. 
Endicott keeps us from the hubris that treats all hard cases as solvable by interpretation. Sorensen keeps us from the comfort of speech that hides policy under the costume of discovery. Austin keeps our grammar straight so that we do not call a verdict a promise. Raz keeps our authority in order so that when we must act by policy we know what kind of reason we are offering. Fuller keeps our institutional conscience awake so that we attend to the conditions under which policy can be legitimate. With these companions we can say what sincerity demands. Design to avoid needless border decisions. Use interpretation to the full extent of its power. When reasons run out, decide by a standing policy that serves the first order reasons of the practice. Speak in the register that names this as policy. Publish the policy and audit its effects. Teach the community to live with this candour. In such a culture the duty to decide and the duty to tell the truth can live together.

References

Austin, J. L. (1962). How to do things with words. Clarendon Press.
Dworkin, R. (1986). Law’s empire. Belknap Press.
Endicott, T. (2000). Vagueness in law. Oxford University Press.
Fuller, L. L. (1964). The morality of law. Yale University Press.
Hart, H. L. A. (1961). The concept of law. Clarendon Press.
Raz, J. (1986). The morality of freedom. Clarendon Press.
Sorensen, R. (2001). Vagueness and contradiction. Oxford University Press.
Williams, B. (2002). Truth and truthfulness. Princeton University Press.

Chapter 9 Absolute Borderlines Truthmakers and Validity

In the ninth chapter I faced the hardest claim in the book and tried to show why it matters rather than frighten us into denial. The claim is that absolute borderlines exist for many evaluative terms and that educational grading meets them. This is Sorensen’s thesis reframed for our setting. It is not the familiar thought that some cases are hard. It is the deeper thought that at some thresholds there is no fact of the matter at the demanded precision. The chapter set out the structure of that claim, answered the incredulity it provokes, tested rival accounts that try to secure a hidden exactness, and traced the consequences for validity and for the ethics of decision. My aim was not to celebrate indeterminacy. It was to tell the truth about it and to show how a truthful practice can live with it. I began with Sorensen’s contrast between truth value gaps and truthmaker gaps. Many strategies for the sorites deny that a borderline statement has a truth value. Sorensen wants a different picture. Propositions all have truth values in the thin sense that each is either true or false. The trouble at the boundary is not a gap in truth value but a gap in what would make the truth. A claim about the first moment a craft leaves the atmosphere is true or false, but there may be no single event that makes it true because the world is not jointed there at the demanded granularity. The analogy is helpful for educational cases. A claim that this script is a pass is either true or false in the thin sense that a system must treat it one way or the other. Yet at a certain border there is nothing in the work that could make one verdict as against the other the single correct discovery. The verdict is still needed for institutional purposes, but the need cannot conjure a truthmaker where the structure of the construct does not supply one. This way of speaking can feel like a trick, so I tried to meet incredulity head on.
The first incredulity says that evaluative talk must always be anchored in reasons and that a verdict without a reason would be a confession of incompetence. The chapter answered by repeating the central distinction of the book. A neighbour pair with no decisive reason at the demanded precision is not a confession of incompetence. It is the footprint of higher order vagueness in a practice whose concepts are shaped by family resemblances and incommensurables. Endicott’s chef knows that a grain of salt does not make the difference that matters. A teacher can know that a single trivial feature does not matter and can still be confronted with a pair in which nothing that matters separates the neighbours. The absence of a decisive reason for this pair therefore is not a private failing. It is a public fact about the concepts in use. The second incredulity says that even if some domains are vague in this way, assessment deals with small superlative differences precisely because the stakes are high. Surely a profession cannot tolerate a zone in which nothing separates the neighbours. I answered that careful professions meet such zones all the time. The pathologist can meet two tissue samples that present no decisive difference at the threshold of a staging system, and yet the clinic must act. The law can meet a nuisance case at the edge of tolerability and yet must reach a verdict. The honest response is not to pretend that the threshold is a natural cut but to act by a standing policy that serves the reasons of the practice over time. The point is not that there are no reasons in the neighbourhood. There are many. The point is that none of them singles out this border case as determinately one side rather than the other. That is the sense in which the borderline is absolute. The third incredulity says that absolute borderlines make nonsense of comparison. If there is no fact of the matter at the cut, how can a system claim to be valid. Here I leaned on Messick’s integrative view and on the earlier chapters. Validity is about the meaning of decisions and the consequences of their use. In many parts of the domain, reasons are decisive. There the meaning is discovery and the consequences can be interpreted as such. In the thin neighbourhood where reasons tie, the meaning is policy that serves the first order reasons in the round. If this is stated candidly, and if design reduces the rate at which policy must speak alone, then validity is preserved. It is preserved by telling the truth about which voice is speaking. I then turned to the rivals. Epistemicism promises relief. Williamson argues that vague terms have sharp borders hidden from us by principled ignorance. If that were so, there would be no absolute borderlines in Sorensen’s sense. The teacher would always be speaking to a reality that contains a last pass just as there is a last leaf on a tree. I drew again on Endicott’s argument about action guiding norms. A norm that makes the difference must be available in principle to the agent as a reason. Hidden borderlines cannot meet that condition in evaluative domains. The chef who knows that it matters that the soup is good and who knows that a grain does not matter, knows that no hidden grain sized spike can be the reason to flip the judgement. A teacher who stands in the same posture toward a script stands in the same posture toward hidden cuts. The epistemic picture may have attractions where the property is independent of our norms for action. It cannot explain the grammar of evaluation. 
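Stated as a gloss, in the notation used earlier for the sorites series, the epistemicist commitment is that the tolerance premise fails at some particular though unknowable point:

\[
\exists k \in \{1, \dots, n-1\}\ \bigl(\mathrm{Pass}(s_k) \wedge \neg\mathrm{Pass}(s_{k+1})\bigr), \qquad \text{with the value of } k \text{ in principle beyond our knowledge.}
\]

It is this hidden k that the action guiding requirement rules out as something an examiner could ever offer as a reason.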
Supervaluationism promises a different relief. It says that a sentence is true if it is true on all acceptable ways of making the vague term precise and false if false on all. Borderline cases fall into a truth value gap. Endicott’s worry about higher order vagueness returns here. The space of acceptably precise ways to draw the line is vague. Attempts to set a crisp family of precisifications simply push the problem one order up. The chapter pressed a further educational worry. Supervaluationist talk can tempt us into thinking that decisive precision could be achieved if only we agreed a sufficient scheme. That way lies the pathology of ever longer rubrics that promise a precision they cannot secure. Many valued logics promise yet another relief. They replace bivalence with a graduated truth value for borderline statements. There is a tidy completeness in these systems. The educational trouble is that they misdescribe the practice. Teachers do not treat the status of a script as partly true and partly false. They treat it as undecided at the demanded cut and then decide for policy reasons if a decision is required. The semantics of many valued logics do not fit the speech acts of verdicts. Austin’s warning keeps us straight. A verdict purports to fit reasons. Where reasons do not resolve the truth, the voice of policy must announce itself for what it is. Having checked these rivals, I returned to Sorensen’s family of examples. Impossible objects such as the colour spectrum are instructive. We teach the spectrum as if it contained crisp borders between colours. We know that where one shade gives way to the next is a matter of borderless transition. The concept remains useful and true enough for many purposes. We draw illustrative lines across what is in reality a continuous gradation. In education we perform similar acts when we rank scripts on a single scale and when we suggest discrete grade categories across a field that is in truth a family of resemblances. The point is not that we should abandon scales. The point is that we should hold them as tools rather than as discoveries and that we should educate users to read them as such. Sorensen’s critique of legal realism also helped me block a cynical slide. Realism is tempted to say that judges simply pick outcomes they prefer and then dress them in reasons. A similar cynicism is sometimes heard about teachers. The chapter rejected this. Raz’s service conception again supplies the alternative. A community earns authority when its procedures help it track the reasons that already apply in the practice. The presence of absolute borderlines does not license caprice. It licenses candour about policy at the points where reasons tie. Everywhere else the duty to reason remains. The existence of some policy decisions does not turn all decisions into policy. I considered the bureaucratic face of incredulity. Bureaucracies love superlatives and sharpness because they travel well in reports and tables. A system that admits that there are neighbourhoods in which policy rather than discovery speaks seems to invite a loss of managerial control. The chapter replied by showing the opposite. A system that denies absolute borderlines distorts its own materials until numbers say more than they mean. Trust is then lost when the distortion is exposed, for example when two scripts a whisker apart are reversed on appeal with equally good reasons. A system that admits border zones can design procedures for them and can defend those procedures explicitly.
That defence can include audits for disparate impact, reviews of the frequency with which policy must speak, and redesigns that widen the evidential base. This is better management because it is truer to the phenomena. I then asked what all this does to the idea of knowledge in education. Does absolute borderline talk mean that knowledge about quality is fake. The answer is no. It means that real knowledge is often scruffy and local and that improvement is better pursued by learning to see salient features than by chasing a mythical universal metric. Polanyi’s tacit dimension returns with weight. We can train perception. We can grow shared repertoires of reasons. We can reduce noise. We can enlarge the space in which decisive reasons operate. We cannot turn every cut into a discovery, and we do not need to in order to be truthful and fair. To put pressure on the point I described two machines. The first was an inconsistent machine. It applies a long rubric to complex work and produces numerical verdicts that vary wildly with small changes in wording and with the presence of trivial cues. Its reports look clean and its stamps of reliability are earned by statistical artifice. It impresses those who want to see crisp figures. It fails the test of validity because it has been designed to ignore the very features that matter. The second was a forced analytic machine. It disaggregates a rich construct into sub scores that can be counted evenly, and then reaggregates them with hidden weights. The process produces a total number that looks authoritative. It bears little relation to the way expert readers judge the same work holistically. Both machines exist in practice. Both flourish in organisations that cannot live with absolute borderlines. Both produce decisions that are weakly reasoned even when they are stable. A different design is available. It begins with authenticity of tasks, as before. It continues with broad constructs taught through exemplars, so that families of excellence are visible and so that more of a script can be seen by more reasons. It uses comparative routines to place neighbours without pretending to a scale that solves the sorites. It builds moderation as a public practice of reason giving, so that the reasons that carry weight are rehearsed and recorded. It states cut policies as policies, including the aims those policies serve. It monitors the use of policy, both for fairness and for frequency. It writes reports that show where discovery spoke and where policy spoke. It trains students and users to read the reports in that register. Under such a design the presence of absolute borderlines is no longer a scandal. It is one of the conditions within which honesty and fairness must be achieved. The chapter spent time on superlatives because bureaucracies love them and because students feel them keenly. When a grade category is called distinction or outstanding there is a natural tendency to imagine that there must be a decisive frontier between distinction and the next level. Sorensen reminds us that superlatives are fragile at borders. The language of excellence can be retained if it is paired with a language of reasons and a language of policy where reasons tie. It can also be tempered by a shift in assessment culture from classification toward guidance wherever purposes allow. In many settings the more valuable message is what moved this work towards excellence and what would move it further, not the precise position on a manufactured line. 
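To make the monitoring concrete, here is a minimal sketch in Python of what such a record might support. Nothing in it is drawn from any particular awarding body; the field names, the group labels, and the function name are invented for illustration, and a real audit would go much further than two summary figures.

```python
from collections import Counter, defaultdict

def policy_report(decisions):
    """Summarise where discovery spoke and where policy spoke.

    `decisions` is a list of dicts with illustrative fields:
      'basis'   : 'discovery' if reasons of the construct decided,
                  'policy' if a standing cut policy had to speak
      'group'   : a characteristic that should be irrelevant to outcomes
      'outcome' : 'pass' or 'fail'
    Returns the rate at which policy had to speak and the pass rate by
    group, the two figures an audit of frequency and disparate impact
    would start from.
    """
    basis_counts = Counter(d["basis"] for d in decisions)
    total = sum(basis_counts.values())
    policy_rate = basis_counts.get("policy", 0) / total if total else 0.0

    by_group = defaultdict(lambda: [0, 0])  # group -> [passes, decisions]
    for d in decisions:
        by_group[d["group"]][1] += 1
        if d["outcome"] == "pass":
            by_group[d["group"]][0] += 1
    pass_rates = {g: passes / n for g, (passes, n) in by_group.items()}
    return policy_rate, pass_rates

if __name__ == "__main__":
    sample = [
        {"basis": "discovery", "group": "A", "outcome": "pass"},
        {"basis": "policy", "group": "A", "outcome": "pass"},
        {"basis": "discovery", "group": "B", "outcome": "fail"},
        {"basis": "policy", "group": "B", "outcome": "pass"},
    ]
    rate, rates = policy_report(sample)
    print(f"policy spoke in {rate:.0%} of decisions; pass rates by group: {rates}")
```

The figures such a routine returns settle nothing on their own. On the view defended here they are prompts for inquiry into design and bias, not verdicts.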
I then returned to the theme of sincerity. The profession owes candidates and users a truthful account of what it can claim. Bernard Williams calls this virtue truthfulness and he connects it to the credibility of institutions. An assessment system that says plainly where its verdicts are discoveries and where they are policies earns a kind of trust that systems which pretend cannot earn. The cost is a loss of false comfort. The gain is resilience when controversies arrive, because the system can show the reasons it used and the policies it enacted. I closed with a short reckoning of consequences. Absolute borderlines do not license laziness. They demand better design to avoid needlessly forcing decisions at points where reasons do not separate neighbours. They do not license relativism. They cohabit with a strong culture of reasons in the vast majority of cases where reasons decide. They do not license mystification. They require public policies for the zones in which policy must speak. They do not undermine validity. They refine it by forcing us to say what our inferences mean at different points in the field. They do not make educational knowledge empty. They shift our ambition from mythical precision everywhere to disciplined perception, shared memory, and careful speech. In this way Sorensen’s hard thesis helps rather than harms. It removes an illusion that tempts systems into dishonest design. It drives us back to constructs and exemplars, to moderation and comparative judgement, to reports that speak in the right voice. It sharpens the difference between discovery and policy. It permits a profession to keep its promises by making promises it can keep.

References

Endicott, T. (2000). Vagueness in law. Oxford University Press.
Kane, M. (2013). Validating the interpretations of test scores for proposed uses. Journal of Educational Measurement, 50(1), 1–73.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). Macmillan.
Sorensen, R. (2001). Vagueness and contradiction. Oxford University Press.
Thurstone, L. L. (1927). A law of comparative judgment. Psychological Review, 34(4), 273–286.
Williamson, T. (1994). Vagueness. Routledge.

Chapter 10 A Practical Settlement for Fair Assessment

In the tenth chapter I tried to gather the threads into a working settlement that can live inside institutions without pretending that the troubles described earlier can be magicked away. I wanted a design that respects the structure of evaluative concepts, that earns public trust without cosmetic certainty, that gives teachers something they can actually do on Tuesday morning, and that allows regulators to sleep. The chapter therefore moved from philosophy to craft. It laid out the elements of a practice that can deliver fair enough decisions while telling the truth about their grounds. It is a design in which constructs and exemplars carry the meaning, moderation carries the discipline, comparative routines carry much of the ordering, statistics check rather than rule, and policy speaks candidly when it must. I began with the object of judgement. A construct must be broad enough to contain the family of excellences that the subject values. It must be taught through living examples, not through abstract lists. It must be revisited so that it does not congeal into a convenience. The best way to specify a construct is to curate a bank of anchors with commentary that names the salient features and explains their weight. The commentary is not a script for scoring.
It is a record of reasons that persuaded competent readers. The bank is not a museum. It is a working memory that is refreshed each cycle. The measure of a good bank is that a new reader can learn to see as the community sees by reading the anchors and their reasons, that experienced readers can recalibrate themselves quickly, and that dissent has somewhere to live. A construct that will not admit more than one route to success is too narrow. A construct that cannot guide a novice at all is too vague. The bank is the device that keeps the balance. From the object I moved to task design. Authenticity is the first criterion. The task should demand the kinds of choice that the practice values in the world. It should present occasions for excellence in more than one direction, so that a single stylistic pathway does not tyrannise candidates. It should be long enough to let judgmental vision gather, but not so long that stamina and accident overwhelm intent. It should be hard to answer well by rote and easy to answer adequately by honest work. A corollary follows. If the task is authentic, it will produce variation that does not fit a single narrow template. That is not a flaw. It is the life of the subject arriving in the assessment room. The next element is moderation. In the settlement I propose moderation is not a last minute patch. It is the engine room that makes everything else possible. There are three phases. Before marking begins, a standard setting meeting studies anchors and agrees public reasons. During marking, live sampling and short conferences catch drift and distribute seeing. After marking, a review lingers over borders and over unusual patterns, with an eye both to errors and to lessons for the next cycle. The facilitator is not a boss. She is a steward of reasons. She keeps the discussion tied to constructs and exemplars. She invites minority views, not as irritations, but as sources of information about what the bank does not yet express well. Minutes matter. A discipline of recording the reasons that resolved disputes builds the memory of the community and reduces the need to fight the same fight in the next season. Comparative judgement takes a central role for ordering where absolute magnitudes are fragile. Judges see pairs and choose the better. The method respects the reality that people are more reliable at local comparisons than at assigning long scale numbers. It also respects the subject because it lets many forms of excellence argue directly with each other. The mathematics that aggregates pairwise choices into a stable order is well known. What matters for the chapter is the pedagogy around it. Comparative sessions must be seeded with anchors so that the pairwise work is tethered to the construct. Disagreements are expected and used as prompts for short reason exchanges that are folded back into the bank. The output is a rank order and if needed an estimated scale. These outputs should be treated as instruments, not as truths. They earn authority by the quality of the reasons that surround them. Criteria do not vanish in this design. Their job changes. Criteria act as lenses that help readers look. They name virtues such as control of register, coherence of argument, evidential responsibility, or inventive form, and they point to typical signs. They do not claim to be necessary and sufficient. Their language must be plain, but their meaning is fixed by their use in reasons around anchors, not by their text alone. In training, criteria are a scaffold. 
They give novices a first vocabulary for seeing. As fluency grows, the weight shifts to exemplars and to reasons that explain how a virtue was realised here rather than as a list of checkable moves. Statistics are present, but they are servants. They check for coherence within and across judges. They surface drift. They flag centres whose distributions are implausible given prior patterns. They help to set sample sizes for moderation. They help equate cohorts where tasks change. They do not decide disputes of content. They do not decide the meaning of the construct. They do not ask the subject to contort itself into a form that the statistic finds convenient. When a statistical signal collides with content, the collision triggers inquiry rather than an automatic override. Sometimes the inquiry ends with the content yielding, for example where bias has crept in. Sometimes the statistic yields, for example where two tasks were not equivalent and content reasons explain why. The point is to keep the hierarchy straight. Numbers help us watch ourselves. They do not know what we value. Policy enters twice. It enters early whenever authority sets second order reasons that align the practice with the purposes of the institution. A course that values risk and voice may have a policy that weights originality more in coursework and security more in the terminal exam. A system that wants to reduce harm near borders may have a policy that splits a borderline band toward the better funded next step. These policies must be public and prospectively justified. Policy enters again at the knife edge where reasons tie. There the voice of policy must be named for what it is. The cut is enacted to serve fairness in the round, or to maintain a promised rate, or to avoid distorting the construct by forcing a spurious precision. It is never presented as a discovery. The language matters. Reports must show which decisions were justified by reasons of the construct and which were enacted by standing policy in neighbourhoods where reasons could not separate neighbours. Appeals are the test of sincerity. In the settlement I propose an appeal is not a demand to re run the sorites. It is a demand to show the reasons that were used and to show that policy, where invoked, was applied as published. If an appeal reveals that a reason was misapplied, the remedy is correction. If it reveals that the work sat in a neighbourhood where reasons tie, the remedy is to check whether policy was correctly enacted. Appeals therefore become instruments for maintaining integrity rather than devices for gaming points. The tone of the appeal process must match the tone of the whole design. It should be plain, calm and precise. It should not invite the hope that hidden lines can be discovered if only the right person looks. Fairness is secured by design and by audit. By design, the construct is broad and the task varied so that more kinds of excellence can find expression. By design, marking is blind wherever this is possible. By design, panels are mixed, and dissent is used as information. By design, comparative routines dilute idiosyncratic preferences. By audit, distributions are checked for patterns that correlate with characteristics that should be irrelevant. By audit, reasons in the bank are inspected for parochial drift. By audit, policy effects are tested for disparate impact. The ethic is continuous improvement without denial. A finding of bias is not a scandal if the system is built to look for it and to correct it. 
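Since the chapter leans on the claim that the mathematics of aggregating pairwise choices is well known, it may help to show how short its core is. The sketch below, in Python, is one standard iterative scheme for the Bradley and Terry model cited in this chapter’s references; the judgement data, the script identifiers, and the function name are illustrative only, and production use would add convergence checks and handling of ties.

```python
from collections import defaultdict

def bradley_terry(comparisons, iters=200):
    """Estimate relative strengths from pairwise judgements.

    `comparisons` is a list of (winner, loser) pairs, each recording which
    of two scripts a judge preferred. Returns a dict of strengths whose
    order, not whose absolute values, is the object of interest. Each
    strength is updated to the item's win count divided by the sum over
    opponents j of n_ij / (strength_i + strength_j), the usual iterative
    scheme for the Bradley-Terry model.
    """
    wins = defaultdict(int)          # total wins per script
    pair_counts = defaultdict(int)   # comparisons per unordered pair
    items = set()
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
        items.update((winner, loser))

    strength = {i: 1.0 for i in items}
    for _ in range(iters):
        new = {}
        for i in items:
            denom = 0.0
            for j in items:
                if i == j:
                    continue
                n_ij = pair_counts.get(frozenset((i, j)), 0)
                if n_ij:
                    denom += n_ij / (strength[i] + strength[j])
            # items never compared keep their current strength
            new[i] = wins[i] / denom if denom else strength[i]
        total = sum(new.values()) or 1.0
        strength = {i: v * len(items) / total for i, v in new.items()}
    return strength

if __name__ == "__main__":
    judgements = [("s1", "s2"), ("s1", "s3"), ("s2", "s3"), ("s1", "s2"), ("s3", "s2")]
    ranked = sorted(bradley_terry(judgements).items(), key=lambda kv: -kv[1])
    print(ranked)  # rank order of scripts, strongest first
```

Nothing in the routine asks a judge for an absolute magnitude; every input is a local choice between two pieces of work, which is the feature the chapter values.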
Training is where this culture is made or lost. A one day briefing on a rubric will not do. Training begins with reading anchors and writing reasons under light supervision. It continues with live moderation in which novices speak early so that their seeing can be corrected in the moment. It includes short exercises in reason sorting, in which mixed quality reasons are grouped and weighed for relevance and sufficiency. It includes comparative sessions in which each choice must be justified aloud in a sentence or two. It includes periodic re reading of anchors in the light of shifts in the domain. Above all it gives novices time to watch experts read, not as a display of taste, but as a demonstration of how the construct is seen and named. The apprenticeship is cognitive and moral. It teaches attention and it teaches sincerity. Communication outward is not a cosmetic afterthought. The way we talk to students and users shapes what they expect and therefore shapes what is possible. Reports to candidates must show the hierarchy of reasons. They should name the virtues that were realised, cite passages and moves that display them, and indicate the family resemblances to relevant anchors. They should say what would move the work out of a neighbourhood where policy must speak. Reports to users must distinguish between results that answer to decisive reasons and results that enact policy. They must be explicit about the purposes served and the limits honoured. Over time users can learn to read these genres. Doing so raises the level of public discourse about assessment and reduces the appetite for numbers that pretend. Governance matters because this settlement depends on institutions that value the right things. A governing board must endorse the priority of content over convenience. It must fund time for moderation and training. It must protect the bank of anchors and reasons as a public asset rather than as a fragile internal memory. It must require audits that look for drift and bias and that test policy for disparate effects. It must discourage the hunger for spuriously fine rankings when the subject does not support them. It must insist that claims made in public match the grounds in private. This is the institutional version of truthfulness. It is not an ornament. It is the condition of durable trust. Technology is a help and a risk. It can manage comparative routines, surface disagreements quickly, track drift, and house the bank in a form that is searchable and teachable. It can support blind marking and reduce clerical error. It can also tempt managers to believe that the instrument has absorbed the judgement. The design I am defending treats technology as a tool for the human practice, not as a replacement for it. Tools are chosen because they serve constructs, exemplars, and reasons, not because they produce more digits. Research has a place in the life of the settlement. There are questions that can be asked and answered within the frame. How quickly do novices converge on the community view under different training regimes. Which comparative designs balance reliability and cost best in different domains. How often do policy cuts need to speak in a given subject and what design changes reduce that rate without loss to authenticity. How do students respond to reason rich feedback over time. What patterns of bias recur despite blind marking and mixed panels, and what changes dislodge them. These are empirical questions that can be asked with care. 
Their answers can be folded back into practice. I returned in the chapter to limits, since any design that does not admit them has already failed. There are tasks that will not bear mass evaluation without loss. There are domains in which high stakes public grading is a bad guide to learning. There are corners of every subject where the structure of excellence is so plural that classification will be more crude than helpful. The honest system says so and shifts ambition. It uses certification cautiously. It relocates energy from ranking to guidance where this serves the purpose better. It keeps in view the truth that the highest good of education is improvement rather than classification and that the best feedback is often a reasoned invitation to see. The settlement ends where it began, with objectivity reconceived. Objectivity here is not the absence of human judgement. It is the presence of disciplined public reasons in a community that has learned how to see together. It is the stability of those reasons across judges who are answerable to each other. It is the accuracy of fit between task and construct. It is the refusal to distort the subject to flatter the instrument. It is the courage to speak policy when reasons tie and to call policy by its name. It is the humility to admit higher order vagueness without embarrassment. It is the patience to build and tend a bank of exemplars that keeps the practice honest. I closed by returning to the ordinary scene. A group of teachers gather in a room with a pile of scripts, a bank of anchors on a screen, and a schedule that looks optimistic. They read in silence, talk in short bursts, disagree, change their minds, write comments, check a sample against the bank, catch a drift in one judge and a blind spot in another, and settle a small policy point that will be written up. It is not glamorous work. It is the craft through which a profession earns authority. The tenth chapter argues that if we want a system that is both fair and truthful, the centre of our investment should be here, where reasons are made public and where judgement is taught. The reward is a culture that can keep its promises. It will deliver stable and defensible decisions without pretending to a precision that the subject cannot support. It will teach students to recognise the goods it names. It will allow the public to see what is being valued and why. It will survive controversy because it is built on speech that matches what it can know.

References

Bradley, R. A., & Terry, M. E. (1952). Rank analysis of incomplete block designs: I. The method of paired comparisons. Biometrika, 39(3–4), 324–345.
Bruner, J. S. (1960). The process of education. Harvard University Press.
Deming, W. E. (1986). Out of the crisis. MIT Press.
Fuller, L. L. (1964). The morality of law. Yale University Press.
Kane, M. (2013). Validating the interpretations of test scores for proposed uses. Journal of Educational Measurement, 50(1), 1–73.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). Macmillan.
Polanyi, M. (1966). The tacit dimension. Routledge & Kegan Paul.
Raz, J. (1986). The morality of freedom. Clarendon Press.
Sadler, D. R. (1989). Formative assessment and the design of instructional systems. Instructional Science, 18(2), 119–144.
Thurstone, L. L. (1927). A law of comparative judgment. Psychological Review, 34(4), 273–286.
Tukey, J. W. (1977). Exploratory data analysis. Addison-Wesley.
Chapter 11 Ambiguity Generality and Vagueness Distinguished

In the eleventh chapter I tried to clear the last major source of confusion that haunts assessment whenever people attempt to fix everything with better wording or longer lists. The confusion is the conflation of ambiguity and generality with vagueness. If these three are not kept distinct, systems waste energy in the wrong places and end by suppressing exactly the kinds of judgement they most need. I wanted to show, with ordinary examples and with help from the philosophers who have thought hardest about language, how each phenomenon behaves, what each permits by way of remedy, and how a wise assessment culture addresses all three without pretending they are one thing. I began with ambiguity because it is the easiest to see and the quickest to cure. Ambiguity arises when an expression admits two or more meanings before a proposition is even fixed. Polysemy gives us bank of a river and bank as a financial house. Amphiboly arises when syntax allows two parses, as in students saw teachers with telescopes. Grice taught us that context and cooperative principles often disambiguate in ordinary life. Kaplan reminded us that indexicals and demonstratives require context to fix reference. None of this is mysterious once named. In an assessment setting, ambiguity is a design failure that can and should be removed by rewriting, by examples, by clarifying scope, or by controlled use of context. There is no honour in preserving an ambiguous prompt. A good assessment room is alert to this and uses item review, pilot trials and peer edits to strip ambiguity where it undermines the aim of the task. When candidates must not be steered, we can still give a consistent disambiguation to markers by anchoring scripts that display the intended reading. Ambiguity is therefore a hygiene problem, important but not existential, and it rewards the craft of careful writing rather than heroic philosophy. Generality is different. A general term does not tell us everything it could tell, and it does not try. It trades detail for reach. Lewis shows how conventions grow around such terms and how they earn their keep by solving coordination problems. Law and teaching depend on generality because life is too various for us to list every admissible case. Hart’s open texture is in part a recognition of this fact. We say write clearly rather than list each manoeuvre that might count as clarity. We say behave professionally rather than enumerate every posture of mind and body that belongs to the idea. Generality can be handled well or badly. It is handled badly when a general term is treated as if it were a vague predicate and is then pushed to a sharp boundary it does not claim. It is handled well when users accept the gift that generality offers, namely discretion to make sense of the term in light of purpose and situation, and when that discretion is constrained by reasons and exemplars. In assessment this means that criteria like audience awareness, evidential responsibility, or elegance function as lenses. They focus attention without pretending to exhaust the domain. They are taught through cases. They are policed by reasons in moderation. If this is done, generality becomes a resource for judgement, not a licence for whim. Vagueness is different again.
It shows itself not at the pre propositional stage of ambiguity, nor in the reach of a general term that demands interpretation, but in the presence of neighbour pairs for which there is no decisive reason to place one on one side of a sharp line rather than the other. Sainsbury’s discussion of the sorites, Edgington’s critique of sharp boundaries, and Endicott’s insistence on higher order vagueness give the shape of the matter. No amount of rewriting cures it. No expansion of a checklist eliminates it. At some points the structure of the concept refuses a decisive cut at the demanded granularity. To confuse this with ambiguity is to keep calling the editor when one needs a jury of peers. To confuse this with generality is to keep lengthening the rubric when one needs exemplars and the discipline of reasons. When the three are untangled, institutional energy can be directed with a steadier hand. After laying out this map I attended to the interfaces, since in practice the phenomena overlap. Pragmatic vagueness, which Endicott treats with care, springs from the appropriateness of acts and the elasticity of norms. Grice helps here too. A statement may be semantically precise and yet pragmatically indeterminate because the maxims of quantity, relation and manner leave room for play. In assessment this appears when a succinct criterion is semantically clear and yet invites application that depends on unspoken expectations. Arrive at five o’clock is clear as clock time and yet socially it may invite something like around five. The cure is not to multiply clauses until the life goes out of the instruction. The cure is to teach the practice in which the instruction makes sense and to surround the instruction with exemplars that display good compliance. This is why a bank of anchors matters. It does not only cure vagueness. It makes general terms and pragmatic cues tractable by showing them at work. With the distinctions in hand I turned to exam design and to the routines that keep language straight. There is a reason why item writing has its own craft literature and why review panels are structured. They are the institutions that catch ambiguity before it does harm. There is a reason why subject communities keep glossaries that are not dictionaries but notebooks of use, filled with short examples and counter examples. They are the places where generality is husbanded rather than strangled. There is a reason why moderation asks for reasons and not only for ticks. It is the practice that acknowledges vagueness and steers it with public judgement rather than with private feeling. If a system mixes these up, it shouts at ambiguity with sermons about integrity, it tries to fix generality with a straitjacket of minutiae, and it denies vagueness by announcing spurious precision. The result is the familiar combination of brittle instruments and disenchanted professionals. I then explored how these distinctions alter feedback to students. Sadler argues that improvement rests on three pillars: a conception of quality, the ability to recognise quality in concrete cases, and actionable knowledge of how to close the gap. The first pillar requires a language of generality taught through exemplars and reasons. The second requires attention that is freed from ambiguity by clear prompts and clean tasks. The third requires honesty about vagueness at borders so that students are not asked to chase mythical micrometric moves. Brandom’s inferentialism helps translate this into classroom discipline.
To understand coherence as a criterion is to grasp what follows from calling a paragraph coherent and what would defeat that claim. Students can be shown these inferential roles through short reason sorting exercises in which they must match comments to features, identify underminers, and write a one sentence warrant. Such exercises honour generality and cure ambiguity. They also build the habit of public reasons that later allows a community to admit vagueness without embarrassment. At this point I brought in Sperber and Wilson to address a worry about under specification. Relevance theory reminds us that communication depends on a trade between effort and effect. A rubric that tries to say everything either collapses into vagueness by using larger and larger umbrella expressions or collapses into ambiguity by multiplying technical terms that candidates and novice markers will not share. Better to say a little clearly, surround it with anchors, and spend the institutional budget on occasions for shared reading. This is not an invitation to looseness. It is a strategy for precision of the right kind, precision in use rather than precision in idle definition. I then faced a persistent managerial hope, namely that one can engineer away judgement by narrowing language. Williamson’s epistemicism can appear to encourage that hope, since it promises hidden sharpness. In the realm of evaluative norms Endicott’s response holds. The norms must guide agents and cannot do so if their decisive boundaries are in principle unknowable to the agents. A parallel thought helps with generality. If we crush write clearly into a hundred micro items, we will have protected ourselves from some forms of ambiguity but we will have destroyed the capacity of the term to guide ambitious writing. Dewey’s insistence that aims should be immanent in activities, not tacked on from outside, is apposite. A construct like clarity must live in the acts that make writing clear for an intended reader. That life travels through examples and reasons, not through a heap of tokens. The chapter also tried to show how these distinctions matter for fairness. Many bias audits look for correlations between outcomes and characteristics we hold to be irrelevant. That is necessary and insufficient. Bias also creeps in when general terms are protected only by private exemplars, when ambiguity is allowed to stand because we imagine that only the best students will resolve it, and when vagueness is hidden by a rhetoric of discovery that is then used to resist appeals. A clean design will publish glossaries of use with examples, will show how ambiguous prompts were cleaned or how a standard reading was fixed for markers, and will state openly where judgements work in neighbourhoods rather than at single lines. Habermas’s sense of legitimacy as the upshot of public justification returns. If you can show the steps, you can defend the door. This also affects standard setting. Kane’s argument based validity offers a scaffold. The interpretive use argument in a hermeneutical system runs through four stations. The construct is presented and illustrated. The link between task and construct is defended on content grounds. The marking practice is shown to be a practice of reasons that trained readers can reproduce. The use to which the outcomes will be put is shown to respect the meaning of the decisions. Ambiguity, generality and vagueness appear at each station in different guises. The antidotes are different. 
At the construct station we embrace generality but we insist on exemplars that fix what counts as family resemblance. At the task station we abolish ambiguity and accept pragmatic indeterminacy that will be resolved in moderation. At the marking station we turn vagueness into disciplined connoisseurship and we write down the reasons that had to carry the decision. At the use station we split outcomes into those that answer to determinate reasons and those that enact a policy within a neighbourhood, and we tell users how to read them. To ensure the argument was not merely conceptual I worked through a concrete case. A literature examination asks for an analysis of a poem’s voice. Ambiguity appears if the prompt uses voice to mean narrator in one place and tone in another. That is cured by rewriting and by aligning the reading list and teaching materials. Generality appears in the criterion sensitivity to voice. That is made tractable by a set of anchors that display sensitivity in more than one way, with commentaries that name features and weigh them. Vagueness appears at the pass border where two scripts are equally sensitive in different registers. Moderation uses reasons to test whether one in fact displays a decisive control of register that the other lacks. If not, the scripts sit in a neighbourhood. A policy that reduces harm near borders tilts the pair toward the higher outcome if a margin of error principle is adopted at system level. The report to the student names the reasons and, where applicable, names the policy. The report to users distinguishes between a discovery and a policy enactment. In this small way the three phenomena are treated as they should be. I then brought the distinctions back to teacher education. Shulman’s pedagogical content knowledge can be read as the capacity to hold a subject’s generalities, its typical ambiguities, and its genuine vagueness in a form that can be taught. Novices should be trained to write prompts that avoid ambiguity, to use general terms as lenses rather than as walls, and to recognise the signs of a true borderline. They should practice writing short public reasons that could be defended to an informed stranger, since that is the habit that builds legitimacy. They should be taught to read rubrics as living instruments, not as incantations, and to treat anchors as arguments about the construct that can be revised in the light of better seeing. Finally I returned to the ethics of speech. Bernard Williams reminds us that institutions survive by truthfulness. Truthfulness in assessment speech begins with naming phenomena correctly. To call an ambiguous prompt a hard case is to mislead. To insist that a general term must be replaced by a long enumeration is to diminish the practice while claiming to protect it. To announce that all borderlines can be resolved by superior interpretation is to claim a mastery that the concepts do not support. A truthful culture cures ambiguity, conserves and tutors generality, and meets vagueness with reasons where they exist and with policy when reasons tie. It speaks to students in that register. It writes to users in that register. It keeps a record that allows both communities to see what was done. The payoff is clarity of labour. Editors and item writers fight ambiguity. Communities of practice carry generality through exemplars and reasons. Moderators and comparative judges carry vagueness with disciplined connoisseurship and with candid policy at the borders. 
When these tasks are sorted, teachers can return their attention to the heart of the work, which is to teach students to see and to make the moves that constitute excellence in their field. The eleventh chapter claims that such sorting is not pedantry. It is the map by which a system can avoid exhausting itself on the wrong mountains and can reserve its courage for the ascent that matters.

References

Brandom, R. B. (1994). Making it explicit. Harvard University Press.
Endicott, T. (2000). Vagueness in law. Oxford University Press.
Gadamer, H.-G. (1975). Truth and method. Sheed & Ward.
Grice, H. P. (1975). Logic and conversation. In P. Cole & J. L. Morgan (Eds.), Syntax and semantics 3: Speech acts (pp. 41–58). Academic Press.
Habermas, J. (1996). Between facts and norms. Polity.
Kane, M. T. (2013). Validating the interpretations of test scores for proposed uses. Journal of Educational Measurement, 50(1), 1–73.
Kaplan, D. (1989). Demonstratives. In J. Almog, J. Perry, & H. Wettstein (Eds.), Themes from Kaplan (pp. 481–563). Oxford University Press.
Lewis, D. (1969). Convention. Harvard University Press.
McDowell, J. (1994). Mind and world. Harvard University Press.
Putnam, H. (1981). Reason, truth and history. Cambridge University Press.
Sainsbury, R. M. (1995). Paradoxes (2nd ed.). Cambridge University Press.
Sperber, D., & Wilson, D. (1986). Relevance: Communication and cognition. Harvard University Press.
Williamson, T. (1994). Vagueness. Routledge.
Wittgenstein, L. (1953). Philosophical investigations. Blackwell.
Wright, C. (1992). Truth and objectivity. Harvard University Press.

Chapter 12 Educational Value

In this twelfth chapter I tried to say what a truthful view of vagueness commits us to valuing in education and what it asks us to make explicit. The earlier chapters argued that evaluative concepts have clear cases and neighbours and that the neighbours can resist decisive separation without ceasing to be meaningful. I drew on Endicott to show why higher order vagueness is stubborn. I used Sorensen to describe the ethics of decision at the cut. I leaned on Wittgenstein to keep meaning tied to use and on Brandom and Toulmin to give judgement a public grammar. I leaned on Messick and Kane to keep validity joined to interpretation and consequence. I worked with Raz and Habermas to keep authority tethered to reasons that can be shown. In this chapter I asked what follows when we take that settlement seriously as a guide to educational value. The answer is not a list of fashionable aims. It is a map of what must be honoured if learning is to grow inside institutions that speak truthfully about what they can and cannot know.

The first value is authenticity of task and object. Bruner taught that school knowledge should keep the structure of the discipline alive. Dewey taught that aims should be immanent in activity rather than bolted on from outside. If we accept that our evaluative concepts are learned through use and exemplars then a curriculum should present learners with occasions that carry the real tensions of the practice. A scientist must choose between speed and certainty. A writer must balance control and risk. A historian must weigh evidence under audience and purpose. A musician must hold technique and voice in one act. Vagueness reminds us that such goods are not neatly ordered on a single scale. Therefore tasks must make room for more than one route to success. Teaching must raise the salience of choices that matter.
Assessment must be designed so that reasons can track those choices. Authenticity is not a slogan. It is the condition under which our words mean what they claim. The second value is improvement as the centre of judgment. Sadler’s triad remains the best guide. Learners need a conception of quality, the ability to recognise it in concrete cases, and the know how to move their work toward it. Vagueness does not undermine this. It clarifies it. Borderlines are least helpful when learning is framed as pursuit of a tiny move that no one can name. Borderlines become productive when feedback names virtues, cites passages, and invites the next move that would matter. Putnam’s internal realism gives the tone. Truth in these settings is what would be fixed by idealised rational acceptability within a scheme that we can criticise and improve. McDowell’s second nature shows how learners can acquire a sensibility that lets the world answer back under the right concepts. Improvement therefore depends on the cultivation of attention rather than on ever longer lists of micro moves. That is what our practice should value and our systems should make explicit. The third value is the discipline of public reasons. This runs through the earlier chapters like a spine. A judgement earns authority when it is not merely the output of an opaque device but the outcome of a practice of giving and asking for reasons that competent peers can reproduce. In Wittgenstein’s register meaning is use. In Brandom’s register concepts are roles in an inferential space. In Toulmin’s register arguments have claims, data, warrants, rebuttals and qualifiers. In Raz’s register authority serves reasons that already apply. In Habermas’s register legitimacy arises from public justification. All of this comes to ground in moderation and in the way we talk to learners. We should value speech that cites features in the work, links them to the construct, weighs countervailing features, and states its limits. We should value institutions that record such speech, train it, and audit it. We should make explicit that this is what objectivity looks like in evaluative domains. The fourth value is candour about policy and borders. If Endicott is right that higher order vagueness cannot be eliminated by clever language or by more categories, and if Sorensen is right that the duty to decide can tempt us into insincere speech, then a truthful profession learns to speak in two voices. Where reasons decide we speak in the voice of discovery. Where reasons tie we speak in the voice of policy and we publish the policy in advance. Williams calls this virtue truthfulness. Fuller’s inner morality of law adds that publicity, clarity and congruence are not ornaments but conditions of legitimacy. In education this means that we value and make explicit the difference between a warranted discovery and a policy enactment. We build procedures that reduce the rate at which policy must speak and we write reports that name when it did. We do this not because we like caveats but because we owe the public an account that matches what our concepts can sustain. The fifth value is breadth of excellence. Hacking warned us about reification and reflexivity. Once a narrow token is installed as the currency of value the world reshapes itself to fit the token. Vagueness alerts us to a different way. 
If virtues are often incommensurable and if family resemblance is the right picture, then a healthy subject community values plural routes to success and guards against parochial narrowing. Gadamer reminds us that a tradition is a conversation in which the past addresses the present. MacIntyre reminds us that practices can forget their internal goods and chase external rewards. A programme that honours breadth will curate exemplars that display multiple forms of excellence, will invite adjacent subjects to test its sense of value, and will teach learners to recognise more than one way of meeting an aim. This is not relativism. Crispin Wright’s idea of local cognitive command still lets us mark incompetence. It is a refusal to collapse a living field into one style. We should make this breadth explicit so that teachers and learners are not misled by a single template. The sixth value is fairness through design rather than through denial. Rawls and Scanlon keep the tone steady. Fairness is not achieved by pretending that judgement can be removed. It is achieved by designing institutions that constrain and educate judgement. That means blind reading where possible. It means mixed panels in which dissent is used as information. It means comparative routines that exploit the human strength for local comparisons rather than asking for mythical absolute magnitudes. It means statistical checks that surface drift and disparate impact and that trigger content inquiry. Deming’s sense of statistical control and Tukey’s modesty about inference help here. Numbers check and illuminate. They do not rule. A fair system says these things aloud and publishes how it will act when a signal appears. Fairness becomes the visible discipline of reasons and procedures, not the invisible hand of a device. The seventh value is restraint about what we claim. Kane’s argument based validity requires that we defend each link from construct to task to scoring to use. Messick’s integrative view requires that we account for consequences. Vagueness teaches humility in both frames. There are decisions we can defend as discoveries. There are uses we can defend because the task and the construct match and the scoring practice carries reasons that trained readers can reproduce. There are also uses that outrun what our speech can support. A truthful system learns to say so and to redirect ambition. It might refrain from fine rank orders where the domain will not support them. It might choose to report families or levels with rich commentary. It might move energy from classification to guidance when that better serves the aim. We should value this restraint and make it a public norm. The eighth value is the apprenticeship of seeing. Polanyi taught that much knowledge is tacit and is learned through guided attention. Shulman taught that teachers need pedagogical content knowledge, the know how to make the structure of a field learnable. In our settlement the apprenticeship is to a community of reasons around exemplars. Learners read anchors and commentary. They try to write reasons and have those reasons weighed. They practise short acts of comparison and explanation. They build Brandom’s inferential roles in their own speech. They absorb McDowell’s second nature as the world becomes available under the right concepts. We should value classrooms where these routines are normal and we should make explicit that this is how the subject is taught to be seen, not just performed for marks. The ninth value is clarity about language. 
Chapter eleven separated ambiguity, generality and vagueness. Grice and Kaplan helped with the first. Lewis helped with the second. Endicott helped with the third. A truthful programme values prompt writing that removes ambiguity. It values criteria as lenses that conserve generality while being fixed by use with exemplars. It values moderation that handles vagueness by reasons and by policy when reasons tie. We should make this linguistic housekeeping explicit in our design documents and in our training. Otherwise we waste labour trying to cure vagueness with edits and trying to cure ambiguity with more categories. The tenth value is courage about risk. If authenticity is honoured, tasks will invite forms of excellence that cannot be fully captured by rules stated in advance. Risk will then appear. Some attempts will fail. Some will succeed in ways we did not foresee. Dworkin’s insistence that interpretation reads a practice in its best light tells us how to deal with this. Judges and examiners should look for how a work realises an aim well rather than punishing it for travelling a new path. Hart’s reminder about open texture sits beside this. Rules cannot foresee every case. The right response is not to outlaw variation but to educate judgement and to record reasons. We should value risk that serves the aim and we should explain to learners that such risk is expected and welcome. The eleventh value is institutional memory. Hacking’s stories about styles of reasoning show that once a style is installed it shapes what we see. If we want to keep the right style alive we must store its arguments. That means a living bank of exemplars with commentary. It means minutes of moderation that record reasons which moved competent peers. It means training that begins with anchors and reasons and that returns to them often. It means appeal procedures that ask for reasons rather than for a new device. It means audits that review the bank for drift and parochial narrowing. Gadamer’s sense of tradition becomes a concrete practice. We should value these quiet archives and make their maintenance an explicit responsibility rather than an afterthought. The twelfth value is speech that the public can trust. Habermas and Williams give the measure. Our reports to learners should name virtues realised, cite passages and moves, and invite the next piece of work. Our reports to users should separate discoveries from policy enactments and should explain the purposes served by the policies that were applied. Our public claims should match our private grounds. Where numbers are used they should be presented as checks and summaries of a practice of reasons, not as replacements for it. We should value this register and make it the house style for the profession. These values do not live as abstractions. They live as concrete choices. A department chooses to spend time in moderation rather than to purchase a thicker rubric. A school chooses to design tasks that allow more than one route to success rather than tasks that compress variation. An awarding body chooses to publish its cut policies rather than to present every border decision as a discovery. A teacher chooses to give feedback that names reasons rather than to give ticks. A student chooses to revise by addressing reasons rather than by counting features. At each point vagueness and the settlement built upon it push us toward practices that treat human judgement as a trainable and answerable capacity rather than as a problem to be engineered away. 
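The statistical checks named under the sixth value can be very plain. The sketch below, in Python, is an illustration only. The marker names, the deviations from moderated marks and the threshold are invented for the purpose, not drawn from any awarding body, and the flag it raises opens a content inquiry rather than deciding anything.

```python
# A minimal sketch, not a prescribed procedure: flag markers whose average
# deviation from moderated marks drifts beyond a simple control limit,
# so that a content inquiry, a re-reading of scripts with reasons, follows.
from statistics import mean, stdev

# Hypothetical data: for each marker, the differences between their marks
# and the marks agreed in moderation for the same scripts.
deviations = {
    "marker_a": [0, 1, -1, 0, 2, 1, 0],
    "marker_b": [3, 4, 2, 3, 5, 4, 3],   # consistently generous
    "marker_c": [0, -1, 0, 1, -2, 0, 1],
}

def flag_for_inquiry(deviations, k=2.0):
    """Return markers whose mean deviation lies outside k standard errors.

    The flag opens a conversation about scripts and reasons. It does not
    change any marks on its own.
    """
    flagged = []
    for marker, diffs in deviations.items():
        centre = mean(diffs)
        standard_error = stdev(diffs) / len(diffs) ** 0.5
        if abs(centre) > k * standard_error:
            flagged.append((marker, round(centre, 2)))
    return flagged

print(flag_for_inquiry(deviations))   # here only marker_b is flagged
```

The same shape of check, run over outcomes grouped by centre or by a characteristic we hold to be irrelevant, is one way to surface disparate impact. In every case the number prompts a reading of scripts. It never replaces one.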
It is worth asking again whether this settlement asks for too much. Managers worry about cost. Parents worry about consistency. Students worry about fairness. The answer begins by refusing a false choice. The choice is not between whim and machine. It is between secrecy and publicity, between tokens and reasons, between cosmetic certainty and durable trust. Deming and Tukey showed that numbers can serve as instruments of self criticism without becoming idols. Messick and Kane showed that validity can guide design and public use. Raz and Habermas showed that authority can be earned by procedures that serve reasons and that are shown to those who are asked to trust them. Wittgenstein showed that meaning sits in what we do. Once these lessons are drawn together the settlement becomes both principled and workable.

There is a final value that runs under all the others. It is respect for persons. Rawls and Scanlon keep this in view. To respect a learner is to give reasons that she can use, not merely scores. It is to design tasks that allow her to show understanding in more than one way. It is to keep her from being trapped by a device that demands a form of precision that our concepts cannot support. It is to admit when policy spoke so that she is not misled by rhetoric. It is to provide routes to appeal that ask for reasons and not for miracles. It is to treat teachers as reason giving professionals rather than as operatives of a machine. This respect is not sentiment. It is the ethic implied by the way our concepts work.

So the answer to the question of what we should value and make explicit is now clear. We should value authenticity, improvement, public reasons, candour about borders, breadth of excellence, fairness through design, restraint about claims, apprenticeship of seeing, clarity about language, courage about risk, institutional memory, and trustworthy speech. We should make each of these visible in the ordinary documents and routines of our institutions. None of this makes vagueness go away. It turns vagueness into a teacher. It teaches us what the goods of education demand and it guards us against false comfort. If we carry these values forward we can make decisions that are stable enough to live with and honest enough to deserve trust.

References

Brandom, R. B. (1994). Making it explicit: Reasoning, representing, and discursive commitment. Harvard University Press.
Bruner, J. S. (1960). The process of education. Harvard University Press.
Deming, W. E. (1986). Out of the crisis. MIT Press.
Dewey, J. (1938). Experience and education. Kappa Delta Pi.
Dworkin, R. (1986). Law’s empire. Belknap Press.
Endicott, T. (2000). Vagueness in law. Oxford University Press.
Fuller, L. L. (1969). The morality of law (Rev. ed.). Yale University Press.
Gadamer, H.-G. (2004). Truth and method (J. Weinsheimer & D. G. Marshall, Trans., 2nd rev. ed.). Continuum. (Original work published 1960)
Grice, H. P. (1989). Logic and conversation. In Studies in the way of words (pp. 22–40). Harvard University Press. (Original work published 1975)
Habermas, J. (1996). Between facts and norms: Contributions to a discourse theory of law and democracy (W. Rehg, Trans.). MIT Press.
Hacking, I. (1990). The taming of chance. Cambridge University Press.
Hart, H. L. A. (1961). The concept of law. Clarendon Press.
Kane, M. T. (2013). Validating the interpretations of test scores for proposed uses. Journal of Educational Measurement, 50(1), 1–73. https://doi.org/10.1111/jedm.12000
Kaplan, D. (1989). Demonstratives. In J. Almog, J. Perry, & H. Wettstein (Eds.), Themes from Kaplan (pp. 481–563). Oxford University Press.
Lewis, D. (1970). General semantics. Synthese, 22(1–2), 18–67.
MacIntyre, A. (1981). After virtue. Duckworth.
McDowell, J. (1994). Mind and world. Harvard University Press.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). Macmillan.
Putnam, H. (1981). Reason, truth and history. Cambridge University Press.
Rawls, J. (1999). A theory of justice (Rev. ed.). Harvard University Press.
Raz, J. (1990). Practical reason and norms (2nd ed.). Oxford University Press.
Sadler, D. R. (1989). Formative assessment and the design of instructional systems. Instructional Science, 18(2), 119–144.
Scanlon, T. M. (1998). What we owe to each other. Harvard University Press.
Sorensen, R. (2001). Vagueness and contradiction. Oxford University Press.
Toulmin, S. (1958). The uses of argument. Cambridge University Press.
Tukey, J. W. (1977). Exploratory data analysis. Addison-Wesley.
Williams, B. (2002). Truth and truthfulness: An essay in genealogy. Princeton University Press.
Wittgenstein, L. (1953). Philosophical investigations (G. E. M. Anscombe, Trans.). Blackwell.
Wright, C. (1992). Truth and objectivity. Harvard University Press.

Chapter 13 AI

In this chapter I take up some of the issues for educationalists that AI raises. The issue is not whether artificial intelligence can mimic what students produce. It can. The issue is how we make ownership of reasons explicit in teaching and assessment so that mimicry alone cannot meet the aim. Everything that matters follows from this. If meaning lives in use then the meaning of our evaluative words is shown in what competent participants are prepared to do next when a reason is asked for. If objectivity in education is the public discipline of giving and asking for reasons under shared aims then our tasks and standards must be built so that reasons are visible and testable. In this frame artificial intelligence is not a foe to be hunted. It is a pressure that forces clarity about design, about validity, about fairness, and about our public speech at borders where reasons tie.

I begin with use because Wittgenstein prevents us from drifting back to tokens. When a student calls a paragraph coherent or a method apt the student is not placing a label on an inert surface. The student undertakes a set of commitments and avows a readiness to defend them. Brandom gives this a precise form. To apply a concept is to take up inferential roles, to know what follows and what would defeat, to stand ready for challenge. Toulmin gives the grammar by which this readiness becomes public. A claim is backed by data, linked by warrants, qualified in full view of live rebuttals. Putnam and McDowell secure the worldward face. In a real practice features in the work can oblige a reader or writer to change a verdict. A generator can produce strings that look like reason talk. It does not stand inside a community that makes and keeps commitments and can be moved by the object in the way that trained second nature allows. This difference is not a metaphysical flourish. It is the main design instruction. If we want to protect learning we must require visible reasons, with particulars in view, responsive to challenge, and answerable to the aims that we actually teach.

Validity then becomes the daily work. Messick reminds us that validity is the meaning of our interpretations and the consequences of our uses.
Kane requires that every link from construct to task to scoring to use is defended with an argument, not a wish. Artificial intelligence tightens these demands. We can no longer rely on the assumption that a fluent string is evidence of learning. Constructs must be presented as families of virtues, taught through exemplars with commentary that make the inferential roles explicit. Tasks must elicit choices that matter for the construct and reveal the route by which a student reached a claim. Scoring must be a practice of reasons that trained readers can reproduce and audit. Uses must say when a verdict is discovery and when it is policy. The main issue for educationalists is to build these requirements into ordinary practice so that a polished imitation cannot count as achievement unless it is owned by reasons in a way that can be checked here and now. Set this on the ground with literature, because it is a subject where fluent imitation appears most persuasive. A thin task asks for an essay on the role of memory in a well known novel. A generator writes a plausible structure with a formal tone and proper quotation. If the standard is the presence of tokens it will pass. Redesign the task so that reasons must be owned. Give an unseen passage. Tell the student to write a commentary for next term’s new readers that sets out a provisional reading and explains what the reader should look for as they move through the chapter. Ask for a short monologue in the voice of a minor character who appears in the extract. Require a justification that ties two or three stylistic choices in the monologue to features of the passage and to the course aim of understanding how voice shapes meaning. End with a two minute oral defence recorded in class where the student must answer a pointed challenge that conflicts with one step in the written argument, using lines from the text to test and refine the claim. The generator can help draft. It cannot make the defence survive a live challenge to a neglected clause. It cannot show in the justification how a choice of rhythm echoes a local tic that matters for the audience named. The design makes the main issue visible. Only ownership of reasons can satisfy the aim. Now score it with public reasons. A reader writes a short note in Toulmin’s grammar. The commentary displays audience awareness through the scaffolding metaphor in the opening, which teaches without patronising. The shift on the word perhaps is taken as a cue to free indirect style and is traced to lines three to five, which matters for new readers, because it warns against treating the narrator as a stable witness. The monologue echoes the narrator’s asymmetrical sentence length and preserves moral distance. Warrant. Control of voice in commentary is shown when a reading guides a newcomer and when the creative piece adopts local lexis and rhythm that can be traced back to the extract. Rebuttal. The oral defence conceded the final sentence, which partially undercut the earlier certainty. Qualifier. The strengths outweigh the incomplete defence. In moderation peers point to the lines and weigh the warrants. Reasons that moved competent readers are recorded as anchors. The standard lives as a memory of reasons, not as a list of tokens that can be mimed. Move to science. A thin task asks for a method to measure the specific heat capacity of a metal and a table of results. A generator writes a spotless report. Redesign the task so that reasons must work with the awkward world. 
Give a kettle, two thermometers, a balance, string, and a block. Ask the student to choose a method that serves the aim for a younger cohort next year and to defend the choice in a short plan that names the two largest sources of error under these constraints and shows how the design reduces them. Run the trial. Write a one page note to the technician who will choose a class method for next year, with a comparison against a method not used and a condition under which the choice would change. Hart’s open texture is visible in the choice. Dworkin’s integrity appears when the student reads the aim in the best light and justifies a method that fits that aim. Putnam and McDowell come to life when the student is moved by the actual drift of the thermometer and revises the plan in the note. Score again with reasons. The method serves the aim because the insulation with available materials reduces the dominant loss to air, and because timing accommodates latent heat in the kettle. The comparison shows that a seemingly more precise sensor would fail for the named audience because the procedure becomes fragile under classroom constraints. The generator cannot cash this on contact. The student who owns reasons can. Move to mathematics. A thin task demands a named technique. A generator fills the page. Redesign it. Present a problem with two clean paths. Ask for two solutions. Ask for a short advisory note to a peer under time pressure that explains which path to choose and why. Then remove the hidden convenience in a near variant and ask the student to apply the note to the variant. Crispin Wright’s sense of local cognitive command appears. The student who owns reasons can revise the note in view of the variant and explain the revision. The generator can produce two paths. It often collapses on the revision because it did not commit to the warrant that governs the choice under pressure. Move to history. A thin task asks for causes of a revolt. A generator assembles a tidy list. Redesign it. Ask the student to assemble a dossier that records three rival theses with two sources that support and one that resists each thesis. Ask for a two page brief for a museum that must choose a framing for a school exhibition. Ask the student to name what will be lost by that framing and what would change if the audience were civic leaders who allocate funds. Then place a new document in front of the student that partially undercuts the chosen frame and ask for a two minute revision of the brief. Dworkin’s integrity and Hart’s open texture walk together. The student who owns reasons shifts the frame while keeping faith with the earlier warrants and the resistant source. The generator often stumbles when the new document must be placed within an argument network that has been taught in this course for this audience. The main issue becomes sharper when we face the border cases and detection. Endicott warns that higher order vagueness is truculent. There are neighbour pairs where reasons refuse to separate. Sorensen brings the central lesson. At a true border there is no truthmaker to be found. In a world of fluent generators there will be scripts where origin is not knowable and sometimes not even a fact. If we treat every border as a hidden discovery we invite insincere speech and unfair treatment. The response is to reduce the rate at which such borders arise by design, and to publish a policy for the remainder. The policy must be part of the standard. 
Students declare any assistance and retain prompts and drafts and recorded defences. Where evidence of undeclared assistance is overwhelming a formal process follows. Where evidence is insufficient, reasons do not decide, and the design has elicited a record of process, the case is resolved by a published rule that gives the benefit of the doubt at a first offence, paired with a supervised make up task for the same aims. Fuller’s publicity and congruence govern the wording. Williams’s truthfulness governs the tone. Habermas’s public justification governs our relations with those whom our decisions bind. We tell students and users when policy spoke and why. We stop pretending that every edge hides a discovery.

Fairness must be visible. Rawls and Scanlon keep the standard steady. We blind read wherever possible. We mix panels and give minority voices time early. We train readers through exemplars and reasons rather than verdict labels. We use comparative judgement to harness many local comparisons, each made with a one sentence reason, to build a stable view of work at scale. The models of Bradley and Terry and of Thurstone give the mathematics that helps us aggregate such choices. Deming and Tukey give the sense of what numbers can do as checks that surface drift and disparate impact. Numbers trigger inquiry. They never decide content on their own. Raz helps us keep the order of authority straight. Procedures earn authority when they help the community track the reasons that apply to the domain.

Teacher education must bend to the main issue. Shulman’s account of pedagogical content knowledge is the craft we need. Polanyi’s account of tacit knowledge is the apprenticeship of attention we must arrange. If ownership of reasons is the centre then novices must learn to see salience by doing small acts of comparison and defence. Two literary commentaries are placed side by side. Students speak for a minute on which better serves a named aim and why, citing lines. Every claim in a lab book carries a one sentence warrant. After each mathematics problem a short reflection explains the choice of method for a peer under different constraints. In history students practise the move from dossier to brief and back again when a new document arrives. These small routines build the habit of giving reasons with particulars in view. They also make later assessment more robust in a world of fluent strings.

Institutional memory is how standards survive and improve. Gadamer reminds us that a tradition is an argument in which the past speaks to the present. MacIntyre warns that practices forget internal goods when external rewards dominate. We must curate a bank of exemplars with commentary that display several forms of excellence within the construct. We must record minutes of moderation that capture reasons that persuaded competent peers. We must audit for parochial narrowing and drift across centres and seasons. This is slow work. It is the only way to stabilise meaning without freezing it. It is the only way to induct new colleagues into standards that live as reasons rather than as slogans.

A brief return to the classroom shows what all of this looks like when the main issue is kept explicit. In a literature class two pieces are near neighbours. Both are fluent. One includes a monologue that catches a local cadence from the extract and a justification that admits a weak close. The oral defence uses the last clause of the extract to refine a claim about point of view.
The other uses poetic diction and a generic mood label and says that the final clause adds sadness. In moderation reasons separate them. The first shows a family resemblance to the construct at several points. The second shows surface tokens. A third case is harder. The commentary is good. The monologue style looks like a sample circulated online. The student declares early brainstorming with a generator. The recorded defence answers two pointed challenges with precise lines and revises a claim in real time when a reader highlights a neglected sentence. Origins cannot be known. Reasons do not decide. Sorensen’s truthmaker gap is acknowledged. Policy speaks. The student completes a supervised make up piece on a new passage for the same aims. The report to the student names the reasons and the policy. The report to users records that policy spoke. Trust is kept because speech matches ground. The same visibility is possible in science. In a constrained investigation the data wobble. A student explains that the wobble is too large to salvage a precise estimate but that the method still serves the aim for younger students because it teaches error sources clearly. The student offers a small adjustment that reduces the wobble without introducing fragile steps. In moderation that reason persuades. A second student reproduces a beautiful graph with no account of a drift that everyone in the room saw during the run. The string is polished. The reason is not owned. The difference is visible because the aim is explicit and the design makes contact with the awkward world. In mathematics the near variant removes a convenience. A student’s advisory note survives because it was built on a warrant that weighed search time against algebraic cost for this cohort. Another student’s note collapses because it was a tidy preference rather than a rule tied to constraints. In history the arrival of a new document forces a change that keeps faith with the earlier reasons rather than a wholesale swing to a new frame. These are small moves. They are the practice. The temptation to reach for a detection device or to shrink tasks to thin tokens will remain. It will be defended as efficient and fair. It is neither in the long run. A narrow token regime will be easy to administer and easy to imitate. It will teach less and it will corrode trust once the imitation is noticed. The settlement I am arguing for is demanding but durable. It values authenticity, improvement, public reasons, candour at borders, breadth of excellence, fairness through design, restraint about claims, apprenticeship of seeing, clarity about language, courage about risk, institutional memory, and trustworthy speech. It redesigns tasks so that these values are performed not announced. It trains judgement so that reasons can be given and checked. It uses numbers to help it watch itself. It writes to students and to users in a language that matches what it can honestly claim. With such a culture artificial intelligence becomes a pressure toward precision in aims and generosity in design rather than a solvent of trust. The main issue therefore is settled in practice and in speech. We require students to own reasons in public, with particulars and aims in view. We design tasks that make ownership visible and useful. We score by giving reasons that peers can test. We separate discovery from policy at the edge and we say when policy spoke. We teach by rehearsing small acts of comparison and defence until second nature takes hold. 
We store and renew the arguments that keep standards alive. This is how we protect what matters in education when fluent imitation is cheap. It is also how we become better at saying what our standards are. The arrival of fluent generators unsettles many of the familiar routines that teachers have used as proxies for genuine learning. A single right answer has often been taken as a reliable sign of mastery. A fluent well argued discursive essay has often been taken as evidence of understanding and judgement. Even a clean proof in mathematics has stood as a token that skill has been acquired. These proxies worked tolerably well when fluency and formal correctness were scarce and when producing them demanded the slow accumulation of reasons under a teacher’s eye. They work less well when a system can produce them in seconds. The problem is not only cheating. It is that these proxies can now be separated from the ownership of reasons that gives them their meaning inside the practice. The question for classrooms is where to relocate evidence so that it tracks the thing we care about. The answer requires a change of design. We must put weight on records of discussion and on short acts of defence. We must use in room writing without access to assistants for crucial moments. We must add small elements that are inaccessible to assistants. We must educate judgement so that the reasons by which a claim lives are visible and testable. The frame for this is the one already developed. Wittgenstein keeps meaning in use. To call a reading sensitive or a proof elegant is to undertake commitments that can be called in. Brandom shows that to apply a concept is to take up inferential roles and to know what follows and what would defeat. Toulmin gives the public grammar by which a claim is backed by data linked by warrants and qualified in view of rebuttals. Putnam and McDowell keep the world in the picture so that a feature in the work can oblige a change of view. Messick and Kane keep validity joined to interpretation and consequence so that design and use must answer for themselves. Hart and Dworkin remind us that rules and principles must walk together. Endicott shows that higher order vagueness persists and Sorensen shows that at some true borders there is no truthmaker to be found. Raz and Habermas keep authority tied to public reasons. With these in view we can say plainly why certain proxies fail. A right answer can be produced without grasp of the route that made it apt. A fluent essay can be produced without commitments that the writer can bear. A proof can be assembled without the sense of where it was going or why this route was chosen. The classroom must therefore ask for what a mimic cannot own. It must ask for the visible exercise of reasons tied to particulars in a community that can test them. Take the English literature teacher who wants sixteen year olds to examine Macbeth’s motivations. A thin routine asks for an essay on ambition in Macbeth with three quotations. A fluent assistant can supply a formal structure a chain of claims and even a gesture at the language. If the proxy is fluency it will pass. The teacher who relocates evidence begins by making the aim explicit. The students are to give a reading that shows how a motive is built out of language situation and audience and to test that reading against scenes that resist it. The task is then staged to surface reasons. Students work in class on a short unseen exchange between Macbeth and Lady Macbeth. 
They write by hand a one page commentary aimed at a new reader in the next year group. They then choose one speech and compose eight lines of an interior monologue as if Macbeth were speaking to himself just before the decision at the end of the scene. They add a short justification that ties two stylistic choices in the monologue to features of the scene and to the aim. The lesson ends with two minutes of discussion in pairs in which each student must answer a pointed challenge. For example your reading says fear of lost honour drives him more than hunger for power. Use the last two lines of the scene to test that claim. The teacher listens for reasons and records a sentence or two that cites the line and shows the effect on the claim. This design moves the centre of evidence from the finished string to the public exercise of reasons. The hand written commentary constrains access to assistants at the crucial moment and invites audience awareness rather than a theme list. The creative piece requires control of voice as a way to show understanding of the scene. The justification requires a small Toulmin shape. The pair defence introduces an element that cannot be prepared by a fluent assistant because the line used as a challenge is placed on the table now and must be handled here. The teacher’s judgement is not a feeling. It is written as short reasons. The commentary guides a newcomer without turning the extract into a catalogue of themes. The monologue borrows Macbeth’s clipped cadence when he is unsettled and resists purple diction. The defence used the word if in the last line to show that honour remains a live consideration and updated the claim from fear to ambivalence. In moderation another teacher can weigh these reasons because they cite features that are in public view. It helps to be concrete. Imagine two students. The first writes a smooth essay at home. In class this student’s hand written commentary uses general mood words and the monologue uses generic tragic diction. In the defence the student says that the last two lines add sadness. The reason is not owned. The second student’s home essay is rough. In class the commentary teaches the new reader to watch the shift on the word perhaps and ties it to a change in pronouns. The monologue echoes the scene’s rhythm and keeps Macbeth’s moral distance ambiguous. In the defence the student uses the last two lines to refine the claim and says that the if shows that honour is not a mask for naked ambition and that this matters for sympathy. The second student has shown the thing we value. The proxy is not the sparkle of the string. It is the willingness and ability to bear a claim in public with particulars in view and to amend it when the scene demands it. Turn to the second case. A professor teaches a second year course on Nietzsche with a unit on Leiter’s naturalist reading of the Genealogy of Morals. A thin routine asks for an essay on ressentiment. A fluent assistant can produce a tidy summary of revaluation of values genealogy method and the speech of noble and slave. If the proxy is fluency it will pass. The professor relocates evidence by building the assessment around inquiry in public. Students must produce three artefacts. First a short map of two live disputes about Leiter’s approach that the class has met. For example whether the naturalist reading collapses normativity into causal explanation or whether the realist strand that Leiter attributes to Nietzsche is defensible. 
Second a hand written précis of a paragraph from the First Essay chosen on the day without devices. Third a five minute seminar turn in which the student must use the précis to answer a live challenge. For example if genealogical debunking can undercut moralised guilt by revealing its etiology why does this not also undercut honesty as a virtue when honesty itself is genealogically explained. The student must take a position that is either a Leiterrian reply or a departure and must name the consequence for the reading of the text. Again the design asks for what a mimic cannot own. The dispute map shows whether the student has a sense of where interpretation turns. The précis without devices shows whether the student can follow Nietzsche’s syntax and compress it faithfully. The seminar turn requires the student to bear an inference in front of peers and to use a specific paragraph as data. Toulmin’s grammar and Brandom’s roles are in play. Claim. The genealogical move debunks only where the function of a practice is presented as truth apt and where the explanation shows that truth was not among the selection pressures. Data. The paragraph on priestly revaluation shows a shift in the economy of affects not a discovery of truth. Warrant. A naturalist genealogy that shows function without truth removes epistemic entitlement for moralised guilt but not for honesty as a regulative ideal in inquiry because the latter earns its place in a different practice. Qualifier. This works if we accept that practices can house different goods without a single measure. Rebuttal. If honesty is genealogically explained as group advantage with no claim to truth then the reply fails. The professor listens for this structure and writes reasons that cite the précis. Be concrete again. Two students speak. The first gives a polished summary of ressentiment and brave nobles and turns by saying that Nietzsche would say honesty is just a tool for power. The précis shows no grip on the paragraph and the challenge is not engaged. The second student is awkward at first. The précis is faithful. The student uses the paragraph to show how priestly revaluation alters the rank order of affects. The student then offers a Leiterrian line. Debunking bites when a practice claims a certain rational authority for its verdicts and the genealogy shows that this authority could not have guided its formation. Moralised guilt is such a case. Honesty as a virtue of inquiry is not. Its authority is tied to success at tracking the world in practices of explanation. The reply may be wrong. It is a position in the space of reasons. The student owns it and can be asked for consequences. If the professor pushes by asking what this does to Nietzsche’s own rhetoric and whether Nietzsche smuggles in a value he cannot justify the student adjusts or concedes. The evidence is now located in the bearing of reasons with the text in view. Sorensen’s analysis of vagueness is vital once these designs are in place. There will still be neighbours that resist separation. A student in literature may give a charming monologue with a justification that cites lines but the defence is thin. Another may give a dry commentary with a sharp defence. Which better realises the aim may be indeterminate at the edge. There is no truthmaker waiting to be discovered that would settle the decision. The presence of fluent assistants makes such borders more visible because origin is sometimes unknowable and sometimes not even a fact. At these borders we must stop pretending. 
Policy must speak. We publish a rule that when reasons tie the higher mark is awarded only if two different kinds of evidence are strong. For example a strong defence and a strong justification. If only one is strong the student is set a short supervised make up that targets the weaker element. In the Nietzsche class a neighbour pair may appear when one student gives a brilliant précis and a mediocre seminar turn while another gives a modest précis and a bold well defended position. The policy is written in advance and used sparingly. The professor records that policy spoke. That candour keeps trust and prevents a drift to detection fantasies. Fairness must be held in view. Oral routines must be short anchored to the submitted work and focused on reasons rather than performance. Panels must be mixed and trained on exemplars with commentary. Numbers can be used to watch for drift and for disparate impact and must trigger content inquiry rather than impose decisions. A minute of defence in literature and five minutes in philosophy are not theatre. They are the smallest moves that make ownership visible. They also teach the habits that our subjects require. Bruner and Dewey are served when aims are immanent in activity and when the structure of the discipline survives translation into the classroom. Shulman and Polanyi are served when novices are inducted into seeing salience by many short acts of comparison and defence. Williams is served when our speech to students and users matches what the grounds can bear. Habermas and Raz are served when authority rests on public reasons. The familiar proxies will not disappear. There will be occasions where a clean proof is the right focus and where a well argued essay is the best form. The point is not to outlaw them. It is to stop treating them as sufficient evidence when a fluent assistant can supply their surface. We must place decisive weight on what a mimic cannot bear. We must make design do that work as a matter of routine. The English teacher who listens for how a line is used to test a claim is not being sentimental about orality. The teacher is making meaning show itself in use. The philosophy professor who asks a student to locate a position inside a dispute and to face a live objection is not indulging a taste for seminars. The professor is asking for the public form of reasons that a practice recognises. When in doubt we return to the principle. Evidence for learning must be located where the ownership of reasons is visible to competent others and where the world can answer back. There will be cost. Time must be found for short defences and for moderation notes that record reasons. There will be temptation to return to tokens. The temptation will be defended as efficient and fair. It will not be either in the long run. A narrow proxy regime will be easy to administer and easy to imitate. It will teach less and it will corrode trust once the imitation is noticed. The culture that relocates evidence is harder at first and cheaper later because it builds memory. A bank of anchors with commentary grows. Short routines become habits. Students learn to give reasons and to hear them. The public speech of the institution improves because it now names discoveries and policy correctly. When neighbours arise we admit that there is no truthmaker and we use the policy that we wrote in the open. When the assistant is used we ask the student to show how it served the aim. 
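The comparative judgement described earlier in this chapter turns many local preferences, each carried by a one sentence reason, into a provisional scale. The sketch below shows one standard way of aggregating such preferences in the tradition of Bradley and Terry, fitted by simple iterative scaling. The script names and the preferences are invented for illustration, and the routine is a bare sketch rather than the procedure of any marking system.

```python
# A minimal Bradley and Terry sketch: aggregate pairwise preferences between
# scripts into log strengths, where a larger value means more often preferred.
# The data are hypothetical; each pair records that the first script was
# preferred to the second in one comparison.
from collections import defaultdict
from math import log

comparisons = [
    ("script_a", "script_b"),
    ("script_a", "script_c"),
    ("script_b", "script_c"),
    ("script_a", "script_b"),
    ("script_c", "script_b"),
    ("script_b", "script_a"),
]

def bradley_terry(comparisons, iterations=200):
    """Estimate a strength for each script by iterative scaling."""
    scripts = sorted({s for pair in comparisons for s in pair})
    wins = defaultdict(int)        # times each script was preferred
    meetings = defaultdict(int)    # times each unordered pair was compared
    for preferred, other in comparisons:
        wins[preferred] += 1
        meetings[frozenset((preferred, other))] += 1

    strength = {s: 1.0 for s in scripts}
    for _ in range(iterations):
        new = {}
        for s in scripts:
            denom = 0.0
            for t in scripts:
                pair = frozenset((s, t))
                if t != s and pair in meetings:
                    denom += meetings[pair] / (strength[s] + strength[t])
            new[s] = wins[s] / denom if denom > 0 else strength[s]
        total = sum(new.values())
        strength = {s: v * len(scripts) / total for s, v in new.items()}
    return {s: round(log(v), 2) for s, v in strength.items()}

print(bradley_terry(comparisons))   # script_a comes out strongest here
```

The estimates give an order. The one sentence reasons recorded with each comparison remain the evidence that moderation weighs, and the numbers prompt a reading rather than replace one.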
The teaching of Macbeth becomes a practice of guided attention to language and motive. The teaching of Nietzsche becomes a practice of guided entry into a dispute with a text. The heart of both is the same. Learning shows itself when a person can bear a claim in public with particulars in view and amend it when the object demands it. That is the thing we must ask for and the thing we must reward.

References

Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 610–623. https://doi.org/10.1145/3442188.3445922
Bradley, R. A., & Terry, M. E. (1952). Rank analysis of incomplete block designs, I: The method of paired comparisons. Biometrika, 39(3–4), 324–345.
Brandom, R. B. (1994). Making it explicit: Reasoning, representing, and discursive commitment. Harvard University Press.
Bruner, J. S. (1960). The process of education. Harvard University Press.
Deming, W. E. (1986). Out of the crisis. MIT Press.
Dewey, J. (1938). Experience and education. Kappa Delta Pi.
Dworkin, R. (1986). Law’s empire. Belknap Press.
Endicott, T. (2000). Vagueness in law. Oxford University Press.
Fuller, L. L. (1969). The morality of law (Rev. ed.). Yale University Press.
Gadamer, H.-G. (2004). Truth and method (J. Weinsheimer & D. G. Marshall, Trans., 2nd rev. ed.). Continuum. (Original work published 1960)
Habermas, J. (1996). Between facts and norms: Contributions to a discourse theory of law and democracy (W. Rehg, Trans.). MIT Press.
Hart, H. L. A. (1961). The concept of law. Clarendon Press.
Kane, M. T. (2013). Validating the interpretations of test scores for proposed uses. Journal of Educational Measurement, 50(1), 1–73. https://doi.org/10.1111/jedm.12000
Leiter, B. (2015). Nietzsche on morality (2nd ed.). Routledge.
MacIntyre, A. (1981). After virtue. Duckworth.
McDowell, J. (1994). Mind and world. Harvard University Press.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). Macmillan.
Nietzsche, F. (2006). On the genealogy of morality (C. Diethe, Trans., 2nd ed.). Cambridge University Press.
Polanyi, M. (1966). The tacit dimension. Routledge & Kegan Paul.
Pollitt, A. (2012). The method of comparison. Assessment in Education: Principles, Policy & Practice, 19(3), 281–305. https://doi.org/10.1080/0969594X.2012.714737
Putnam, H. (1981). Reason, truth and history. Cambridge University Press.
Rawls, J. (1999). A theory of justice (Rev. ed.). Harvard University Press.
Raz, J. (1990). Practical reason and norms (2nd ed.). Oxford University Press.
Scanlon, T. M. (1998). What we owe to each other. Harvard University Press.
Shulman, L. S. (1986). Those who understand: Knowledge growth in teaching. Educational Researcher, 15(2), 4–14.
Sorensen, R. (2001). Vagueness and contradiction. Oxford University Press.
Thurstone, L. L. (1927). A law of comparative judgment. Psychological Review, 34(4), 273–286.
Toulmin, S. (1958). The uses of argument. Cambridge University Press.
Tukey, J. W. (1977). Exploratory data analysis. Addison-Wesley.
Williams, B. (2002). Truth and truthfulness: An essay in genealogy. Princeton University Press.
Wittgenstein, L. (1953). Philosophical investigations (G. E. M. Anscombe, Trans.). Blackwell.
Chapter 14 The Knowledge Curriculum

In this chapter I set out the claim of the knowledge curriculum as its defenders present it and I show why it runs against the educational values and practices defended in my thesis. I use Sorensen’s lens to keep the focus sharp. Where the knowledge curriculum pretends that every edge between knowing and not knowing hides a fact that can be revealed by better tests or tighter lists, Sorensen reminds us that at a true borderline there is no truthmaker to be found. Once this is clear the case for redesigning curriculum and assessment around fixed inventories weakens, not because knowledge is unimportant, but because the way knowledge lives in practices cannot be reduced to tokens without loss of truthfulness about what we do.

The knowledge curriculum is presented as a principled settlement. Its advocates speak of cultural literacy, common entitlement, and equity secured by a sequenced body of facts and concepts taught through explicit instruction and checked through regular testing. They point to the work of Hirsch on core knowledge and to a list of key works in the tradition. They appeal to cognitive load theory to argue for the advantage of teacher led exposition and rehearsal. They cite Rosenshine’s principles to support small steps, checks for understanding, and frequent review. They draw on Christodoulou’s critique of myths to challenge practices that ask for analysis without prior knowledge. They sometimes invoke Michael Young’s phrase powerful knowledge to argue that curriculum should privilege disciplinary concepts that allow students to think beyond their immediate experience. In public policy speech this becomes a programme of canonical texts, knowledge organisers, low stakes quizzes, and standardised assessments that aim to measure whether the knowledge has been secured. A ministerial voice claims that such a curriculum is neutral, rigorous, and fair. It promises that every child will be given the best that has been thought and said. It promises that teaching will return to its proper centre by focusing on the transmission of knowledge rather than the management of activities. It suggests that judgement will become more reliable once we look less at vague performances and more at whether students can recall and use what they have been taught.

There is much in this that is attractive. Dewey reminds us that empty activity is not learning. Bruner insists that disciplines have structure and that teaching should make that structure available to novices. No one in this argument rejects the importance of propositional knowledge. What is at issue is the picture of how knowledge lives in a practice and how we might justly claim that it has been learned. Here Sorensen is a useful guide. The knowledge curriculum trades on a sharp line between knows and does not know, and it trades on the idea that every questionable edge can be sharpened by more precise items and tighter rubrics. Sorensen’s account of vagueness warns that there are true borderlines where no further fact will settle the matter, and that to insist that every such edge hides a discoverable truth is to drift into insincere speech. Once a curriculum is built on the promise of sharp lines at every frontier it is likely to misdescribe the goods of education and to tempt assessment into reifying tokens that can be mimicked without the ownership of reasons that gives them meaning. Wittgenstein keeps us inside use.
To count as knowing a concept is to be able to go on in ways that a community recognises as apt in context. Brandom gives this a public form. To apply a concept is to take up inferential roles, to know what follows and what would defeat, and to stand ready for challenge. Toulmin supplies the grammar by which this readiness is shown. A claim is backed by data and linked by warrants, and the speaker acknowledges live rebuttals. Putnam and McDowell keep the world in view so that features of an object can oblige a change of verdict. On this settlement knowledge is not simply possession of propositions. It is membership in a practice where those propositions do explanatory and justificatory work under aims that are known and shared. The knowledge curriculum tends to treat knowledge as inert inventory. The more that inventory is reified the less the curriculum invites the public exercise of reasons that shows what the knowledge is for. There is a second pressure. Hacking warns about reflexivity and reification. Once a token becomes the currency of value the world reshapes itself to fit the token. A system that pays out on the presence of declarative items will generate teaching that optimises for short recall and familiar forms. That may raise scores. It may not raise understanding. Messick and Kane help us keep the argument honest. Any interpretation of a test must be defended from construct to task to scoring to use. If the construct is framed as possession of inventories, and if the tasks are short prompts that elicit those inventories, and if the reporting claims understanding or mastery, then the validity argument fails. The interpretation outruns the grounds. The uses are likely to mislead students and the public about what has been taught. Legal voices add clarity. Hart reminds us that rules have open texture. Dworkin argues that principles guide judgement when rules cannot settle a case. Endicott shows that higher order vagueness persists even when language is tightened. If a policy speech about curriculum treats every frontier as a rule case and denies the need for principle and interpretation it will drift into poor practice and poor justification. Raz and Habermas set the standard for authority. Procedures earn authority when they serve reasons that already apply to the domain and when they can be justified in public. A curriculum that hides policy choices behind the rhetoric of neutrality fails this test. To make the critique concrete consider a unit in English literature on Macbeth. The knowledge curriculum might present a list of key facts and quotations. It might require that students can define ambition, guilt, and fate, and can recite pivotal lines. It might include a sequence of quiz questions that check recall. These are not wrong. They are insufficient. A student can recite lines about vaulting ambition and yet have no grip on how the play builds motive as language and situation shift within an audience’s horizon. A student can recall that Macbeth is persuaded by his wife and yet miss the way the pronouns in one scene move from we to I and with that move rebuild responsibility. The goods of the subject live in the ability to use knowledge to make and revise a claim in view of words on the page and a listener’s needs. The line between knowing and not knowing here is not everywhere sharp. There are neighbours who will resist separation and there will be no truthmaker that could settle every edge. 
A curriculum that pretends otherwise will train teachers to overclaim and will train assessors to treat quizzes as measures of understanding. It will not help a sixteen year old to learn what it is to examine a motive in a tragedy. Consider an example that shows the difference. The knowledge curriculum check asks for three quotations that show Macbeth’s ambition and a short account of how they support the theme. A student with a good memory succeeds. A curriculum shaped by the values in this thesis asks for a commentary for new readers that helps them watch a shift in voice in a short extract. It asks for a small creative reconstruction that forces attention to diction and rhythm. It asks for a justification that ties stylistic choices to features of the extract. It asks for a short defence in which a challenge is answered with a line from the text. Here knowledge is in use. The distinction between the two approaches is not anti knowledge. It is a refusal to mistake inventory for the living thing. Take a second case in science. The knowledge curriculum may specify the formulae for specific heat capacity and a method with numbered steps. It may require recall of the words conduction, convection, and radiation, and set questions that ask for definitions and substitutions into the formula. Again none of this is wrong. It is insufficient. In a laboratory with awkward materials and time pressure the student must choose which loss matters most for this setup and for this audience. The student must justify a method for a technician who will adopt it for next year. The student must revise an ideal plan in the face of a drifting thermometer. The goods of science live in such choices ruled by purposes. The frontier between knowing and not knowing here is neither wholly sharp nor wholly measurable by inventories. Sorensen’s warning returns. At some edges there is no hidden fact that will tell us whether a student knows. There is only the public exercise of reasons and the design that elicits it. Defenders of the knowledge curriculum offer two replies. They say first that recall is a necessary precondition for reasoning. This is true and irrelevant to the criticism. No one denies the need for knowledge. The point is about how knowledge is shown and assessed. They say second that a knowledge curriculum is more equitable because it gives all students access to the same cultural capital. This claim is serious. It needs a serious answer. Equity is not secured by the promise that one set of inventories is the best that has been thought and said. Equity is secured when a curriculum and its assessments enable students to enter practices as reason giving participants and when the public reports match what has been taught and tested. A policy that installs a canon without public reasons for its selection will reflect the dispositions of those in power. A policy that reduces assessment to inventories will favour those who can deliver tokens under pressure. Habermas and Raz urge us to build legitimacy by public justification rather than by slogans. Fuller reminds us that publicity and congruence are conditions of a just order. If the knowledge curriculum is to claim equity it must meet these standards. It rarely does. Michael Young’s phrase powerful knowledge deserves a careful note. In its richer sense it points toward disciplinary concepts that allow students to think beyond immediate experience. It can be read as an invitation to teach explanation rather than lists. 
In policy speech however it often becomes a new name for the old token regime. If powerful knowledge is taught as the capacity to bring a concept to bear with reasons in a practice then it sits easily with the settlement defended here. If it becomes a banner under which lists are promoted, it will not. A further reply says that the knowledge curriculum is measurable and therefore reliable while performance judgements are subjective and unreliable. Messick and Kane give the answer. Reliability that is purchased by changing the construct is a false economy. If what we want to claim is understanding in use, then checklists of tokens are not measuring the construct. If what we want to report is recall, then say so, and stop implying more. The settlement I defend does not reject reliability. It locates reliability in the training and auditing of public reasons and in designs that constrain and educate judgement. Comparative routines and modest statistics help. They do not replace the subject. Sorensen’s lens reveals a final weakness. The knowledge curriculum assumes that every border can be discovered and that when two neighbours look the same to competent teachers a better item or a tighter mark scheme will separate them. There are real neighbours. There is no truthmaker to be found there. A just system designs to reduce how often it meets such borders. It also writes a policy for the remainder and speaks it aloud. A token regime will pretend. It will misdescribe policy choices as discoveries and it will discipline dissent by calling it error. Two short imagined classrooms make the argument visible. In a literature class the knowledge curriculum yields a lesson on quotations and a quiz on definitions. The assessment claims that students understand ambition in Macbeth because they can recite lines. In a class shaped by my settlement the assessment records how a student used a line to test a claim in a live defence and how a student adopted a cadence from the extract in a short reconstruction and justified that choice for a named audience. The report names the virtues shown and invites the next move. In a science class the knowledge curriculum yields a lesson on method steps and a test on substitutions. The assessment claims that students can do specific heat capacity. In a class shaped by my settlement the assessment records how a student identified convection as the dominant loss in this setup, how a student chose a method under constraints for a younger cohort, and how a student revised a claim when the instrument drifted. The report names the reasons and the limits. In both cases knowledge is visible. It is shown in use rather than counted as tokens. I return to values. Authenticity of task and object matters because meaning lives in use. Improvement as the centre of judgement matters because learning is growth in the public grasp of reasons. Public reasons matter because objectivity in these domains is the discipline of giving and asking for reasons under shared aims. Candour at borders matters because there are true neighbours. Breadth of excellence matters because incommensurables live in many subjects. Fairness through design matters because equity is not achieved by pretending that judgement can be removed. Restraint about claims matters because our reports must match our grounds. Apprenticeship of seeing matters because novices need guided attention, not only lists. Clarity about language matters because we must not mistake ambiguity and generality for vagueness. 
Courage about risk matters because authentic tasks invite forms of excellence that no checklist can foresee. Institutional memory matters because standards live as argued exemplars. Trustworthy speech matters because the public deserves the truth about what we can and cannot know. The knowledge curriculum is strongest when it demands that teachers name what is to be taught and sequence it with care. It is weakest when it treats knowledge as inventory, when it repackages political choices as neutral canons, and when it relies on tokens to stand for living achievement. The settlement defended in this thesis accepts the centrality of knowledge and refuses to detach it from the practices that give it point. With Sorensen in view we stop pretending that every edge hides a fact to be found. With Wittgenstein, Brandom and Toulmin in view we design curriculum and assessment so that knowledge shows itself in use. With Messick and Kane in view we stop overclaiming what tests can say. With Hart, Dworkin, Endicott, Raz and Habermas in view we build policy that admits where it speaks and why. If we do that we will protect both truth and fairness better than any regime of lists can hope to do. References Brandom, R. B. (1994). Making it explicit: Reasoning, representing, and discursive commitment. Harvard University Press. Bruner, J. S. (1960). The process of education. Harvard University Press. Christodoulou, D. (2014). Seven myths about education. Routledge. Dewey, J. (1938). Experience and education. Macmillan. Dworkin, R. (1986). Law’s empire. Belknap Press. Endicott, T. (2000). Vagueness in law. Oxford University Press. Hacking, I. (1990). The taming of chance. Cambridge University Press. Habermas, J. (1996). Between facts and norms: Contributions to a discourse theory of law and democracy (W. Rehg, Trans.). MIT Press. Hart, H. L. A. (1961). The concept of law. Clarendon Press. Hirsch, E. D., Jr. (1987). Cultural literacy: What every American needs to know. Houghton Mifflin. Hirsch, E. D., Jr. (2016). Why knowledge matters: Rescuing our children from failed educational theories. Harvard Education Press. Kane, M. T. (2013). Validating the interpretations of test scores for proposed uses. Journal of Educational Measurement, 50(1), 1–73. https://doi.org/10.1111/jedm.12000 Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). Macmillan. McDowell, J. (1994). Mind and world. Harvard University Press. Putnam, H. (1981). Reason, truth and history. Cambridge University Press. Rosenshine, B. (2012). Principles of instruction: Research-based strategies that all teachers should know. American Educator, 36(1), 12–19. Sorensen, R. (2001). Vagueness and contradiction. Oxford University Press. Sweller, J. (1988). Cognitive load during problem solving: Effects on learning. Cognitive Science, 12(2), 257–285. https://doi.org/10.1207/s15516709cog1202_4 Toulmin, S. (1958). The uses of argument. Cambridge University Press. Williams, B. (2002). Truth and truthfulness: An essay in genealogy. Princeton University Press. Wittgenstein, L. (1953). Philosophical investigations (G. E. M. Anscombe, Trans.). Blackwell. Young, M. (2008). Bringing knowledge back in: From social constructivism to social realism in the sociology of education. Routledge. Young, M., & Muller, J. (2013). On the powers of powerful knowledge. Review of Education, 1(3), 229–250. 
https://doi.org/10.1002/rev3.3017
Chapter 14 The NEAB
In this chapter I recover the history of the Northern Examinations and Assessment Board and show why its assessment culture realised the reliability and validity requirements argued for in this thesis. I take reliability to be disciplined consistency across markers, tasks and occasions and I take validity to be a defended chain from construct to task to scoring to use. Wittgenstein keeps us inside use so standards live as shared ways of going on. Brandom gives that a public shape in the space of reasons. Toulmin gives the grammar that connects claims to data by warrants with qualifiers and rebuttals. Messick and Kane require that interpretations and uses are argued and constrained. Hart and Dworkin remind us that rules and principles must walk together. Endicott shows that higher order vagueness persists even after codification and Sorensen keeps us honest at the true border where there is no hidden fact to be found. On this settlement the old NEAB system looks less like a loose federation of examiners and more like a practice designed to educate and constrain judgement in public. The bare chronology is clear. NEAB was created in 1992 by merging the Joint Matriculation Board with the schools boards of the North West, the North, the Associated Lancashire board and the Yorkshire and Humberside board. It served England, Wales and Northern Ireland through the nineties and in 2000 merged with AEB and SEG to create AQA. Kathleen Tattersall led most of the NEAB period and is a central voice in the national history of comparability. Contemporary summaries record that NEAB quickly became the largest provider and that it entered an AQA alliance in 1997 before the formal merger in 2000. These facts matter because scale and federation shaped the methods chosen to hold standards steady across many subjects and regions. The national comparability literature of the period shows the scaffolding into which NEAB fitted its practice. The regulator sponsored a large programme on comparability and codified the idea that comparability was a fundamental requirement that had to hold between providers, between subjects, between tasks, within assessor judgement and across time. Tattersall’s historical chapter marks how regulation tightened through the late eighties and nineties with codes of practice, accreditation and explicit expectations on coursework moderation, grading and award meetings. She notes the fear that competition might degrade standards but also records a Broadcasting Standards Commission ruling that sensational claims of secret grade rigging were unfair, which in turn pushed the system toward visible codes and audits. In that chapter there is an explicit citation to an NEAB statement from 1996 committing to common and unchanging standards and the narrative situates that statement inside a broad move to regulated comparability. Methods matter more than slogans. The technical work on comparability around NEAB included cross moderation studies, syllabus scrutiny and the use of archive scripts to anchor grade definitions from year to year. The regulator’s comparability volume records the shift to explicit cross moderation methods after 1988 and the continuing use of expert comparisons of question papers and mark schemes to check relative demand. The special issue from Cambridge on comparability gathers the methods of the same period and shows how common item designs and judgement studies were used to steady standards.
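To make the idea of a judgement study concrete, the paired comparison models of Thurstone and of Bradley and Terry that sit behind such work can be sketched in a few lines of Python. Everything below is illustrative only: the script labels and judgements are invented, and the routine is a minimal version of the standard iterative estimate, not a reconstruction of any board’s actual software.

# A minimal Bradley-Terry fit for paired comparisons of scripts (hypothetical data).
# Each tuple records that a judge preferred the first script to the second.
from collections import defaultdict

judgements = [
    ("script A", "script B"), ("script A", "script C"), ("script B", "script C"),
    ("script C", "script D"), ("script B", "script D"), ("script D", "script A"),
    ("script B", "script A"), ("script A", "script D"),
]

scripts = sorted({s for pair in judgements for s in pair})
wins = defaultdict(int)       # comparisons won by each script
meetings = defaultdict(int)   # how often each pair of scripts was compared
for winner, loser in judgements:
    wins[winner] += 1
    meetings[frozenset((winner, loser))] += 1

strengths = {s: 1.0 for s in scripts}     # start every script at equal strength
for _ in range(200):                      # simple fixed point iteration (Zermelo/Ford)
    updated = {}
    for s in scripts:
        denominator = sum(
            meetings[frozenset((s, t))] / (strengths[s] + strengths[t])
            for t in scripts if t != s
        )
        updated[s] = wins[s] / denominator
    scale = len(scripts) / sum(updated.values())  # normalise so the mean strength is one
    strengths = {s: v * scale for s, v in updated.items()}

for s in sorted(scripts, key=strengths.get, reverse=True):
    print(s, round(strengths[s], 2))      # a provisional rank order, not a verdict

The output is a provisional rank order built from many small acts of comparison. In the settlement defended here such a number is an aid to the community of judges, not a substitute for it; each comparison that feeds it should still carry a sentence of reason that others can inspect.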
A contemporaneous guide from the same Cambridge source clarifies the vocabulary and methods in use and makes clear that comparability meant more than equating scores. It meant aligning judgements of performance to defensible conceptions of demand. Reliability of marking was not treated as a metaphysical promise. It was treated as an outcome of training, standardisation, feedback and method. Classic studies on variability in essay marking had long warned that unguided judgement drifts, and the early two thousands literature reports that reliability is improved by more specific mark schemes, examiner training and active communities of practice. These are the very levers NEAB and its successor invested in through subject meetings, standardisation scripts and iterative feedback to examiners. Later summaries of marker effects that motivated Ofqual’s reliability programme confirm the same levers and show why communities of practice and structured feedback raise consistency. The technical summary on GCSE marking tools from the period collects studies that report the positive effect of specific mark schemes and training on reliability. These are not NEAB only numbers but they describe the ecosystem in which NEAB worked and the levers it used. Coursework moderation is the clearest meeting point between my validity claims and NEAB practice. A national review from the period concludes that consensus moderation procedures, in which teachers and moderators negotiate standards using exemplars, are most effective for securing consistency and for professional learning, and that purely autocratic or purely statistical moderation is less effective. That is exactly the practice many NEAB subjects ran through visiting and postal moderation with exemplar sets and feedback reports. The national summary of moderation and controlled assessment processes later codifies the same logic. Here the values of the thesis are visible. Validity lives where the community can cite and share exemplars and where reasons are recorded in language that others can inspect. Grading and awarding under NEAB also show the mixed rule and principle model that Hart and Dworkin demand. Award meetings combined statistical predictions based on cohort evidence with judgement against archive scripts and grade descriptions. The regulator’s volumes from the period describe the emergence of these dual anchors. The statistical comparator protected comparability of outcomes and the script based anchor protected comparability of meaning. That is a Wittgensteinian settlement. Meaning is use, so a grade means what a community takes it to mean when it looks again at real work. It is also a Brandomian settlement. To award is to undertake commitments that can be challenged by archive exemplars and by cohort evidence. The award record in a subject file is a Toulmin case. Data and warrants and qualifiers are written down. We can be more concrete. Consider a NEAB A level English literature award in the late nineties. Principal examiners convened with archive scripts that had anchored the previous years’ grade boundaries. Marker reports from standardisation meetings recorded features that distinguished a borderline A from a high B. Statistical predictions from prior attainment and entry mix were tabled. The awarders first placed a provisional A, B and C boundary against the archive anchors and then checked the consequences against the prediction.
Where the consequences were materially out of line, they looked again at the live scripts around the provisional boundary and asked whether the cohort had changed in a way that justified a shift in standard. The meeting minutes captured the reasons for any divergence from prediction. This is Messick’s consequential strand in action. Uses were considered and reasons recorded. This is Kane’s interpretive argument written down as a working file rather than a theory paper. The validity claim was not that the number is perfect. It was that the number is the outcome of a disciplined argument whose steps can be retraced. A parallel account can be given for GCSE subjects with coursework. NEAB issued exemplar sets ahead of the season. Teachers marked to the exemplar standards and internal moderation within centres built a common voice. Visiting or postal moderators then sampled and, if necessary, adjusted the centre rank order. Reports were sent back to centres with cited reasons. A national review later confirmed that this form of negotiated moderation was the most effective both for reliability and for the professional education of judges. That is the practice this thesis defends. Reliability is not the elimination of judgement but the education of judgement in public. Of course there were pressures. Tattersall records the public suspicion in the mid nineties that competition between boards could lead to easier standards and notes the television programme that alleged secret fixing. She also records the formal ruling that these allegations were unfair and that no persuasive evidence supported them. The regulatory response was not to deny the possibility of drift but to formalise codes of practice, accreditation and audit. NEAB lived through that tightening and contributed to it. The consequence was a culture in which local subject communities and national regulators shared the labour of holding standards. That is exactly the mixed model of rule and principle that this thesis asks for. It is important to keep Sorensen and Endicott in view. No amount of codification eliminates higher order vagueness. There are true neighbours at the borderline where no truthmaker can be found to separate scripts decisively and where a forced exactness would be insincere. What marks the NEAB culture at its best is not a denial of this fact but an economy of candour. Award meetings wrote down reasons and noted when policy spoke. Moderation reports explained rather than concealed adjustments. Cross moderation studies admitted uncertainty and triangulated methods rather than resting on a single proxy. The comparability literature of the time insists that different methods license different claims and that no single technique can settle all questions. That is precisely the stance this thesis defends. There is also the Hacking lesson about reflexivity. When a system pays on a token the world reorganises to deliver that token. The NEAB period still had coursework in many subjects and used extended responses even in unseen examinations. That diversified the evidence base and reduced the risk that a single short answer proxy would dominate. Where short answer tests were appropriate they were used. Where extended argument or practical work carried the construct they were retained and moderated. That kept validity in view while the comparability machinery did its work. One might object that later headlines about re-marks and reliability call all this into question. They do not.
They show that the problem of reliability is permanent and that periods of policy compression can starve the community of practice that makes reliability better. Ofqual’s later programme on marker effects and reliability was launched precisely because the system knows that reliability must be measured and improved rather than asserted. The literature I have cited shows the levers. Training, standardisation, specific mark schemes, feedback and shared exemplars. Those were the levers NEAB used and that is why its culture fits the values of this thesis. What then is the claim I am making? Not that NEAB was flawless. Not that standards never drifted. The claim is that NEAB at its best exemplified the disciplined public practice that makes reliability and validity live in subjects. It built subject communities around exemplars. It mixed statistical comparators with script based anchors at awards. It used cross moderation and syllabus scrutiny to check relative demands. It recorded reasons. It educated judgement. It did this while acknowledging the persistence of vagueness at some borders and while writing procedures that prevented that vagueness from becoming either an alibi or a scandal. The national histories and method guides of the period record the architecture of that settlement and they show why the merger that created AQA inherited a culture already shaped by these commitments. Return to the values of the thesis. Authenticity of task and object is preserved when the construct drives the choice of task. NEAB sustained coursework and extended responses where they were the right vehicle and moderated them through communities of practice. Improvement as the centre of judgement is preserved when moderation visits and feedback reports educate centres rather than simply police them. Public reasons are preserved when award rationales and moderation reports are written and archived. Candour at borders is preserved when committees record the point at which policy speaks because neighbours cannot be separated by further facts. Breadth of excellence is preserved when subjects can show different routes to similar standards and when comparability work respects incommensurables. Fairness through design is preserved when standardisation and exemplars are open and when appeals processes allow challenge. Restraint about claims is preserved when comparability reports match their methods and when public communications avoid pretending that one index can settle all questions. Apprenticeship of seeing is preserved when schools and examiners learn to see salience together around real scripts. Clarity about language is preserved when the system distinguishes ambiguity and generality from vagueness. Courage about risk is preserved when extended tasks and practical work are retained despite administrative cost. Institutional memory is preserved when archive scripts and award files accumulate and are used as living anchors. Trustworthy speech is preserved when boards and regulators resist both complacency and theatre and instead show their work. The fusion of local subject expertise with national method is the real lesson. A reliable and valid assessment system is not a machine. It is a practice that takes meaning in use seriously and then designs procedures that make that meaning publicly testable and reasonably stable across time and providers. In the nineties NEAB helped build that practice. The national record around it describes the comparability programme, the codes and the methods.
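The dual anchor routine described above can also be made concrete with a small sketch, again in Python and again entirely hypothetical: the marks, the provisional boundaries, the predicted outcomes and the tolerance are invented, and the sketch models only the logic of a number that sends awarders back to live scripts, not any board’s actual procedure.

# Hypothetical check of judgemental grade boundaries against a statistical prediction.
# marks: raw marks for the live cohort; boundaries: provisional cut scores placed by
# awarders against archive scripts; predicted: expected cumulative percentages at each
# grade from cohort modelling. All numbers are invented for illustration.

marks = [23, 31, 35, 38, 41, 44, 44, 47, 49, 52, 55, 57, 60, 63, 66, 68, 71, 74, 78, 83]
boundaries = {"A": 70, "B": 58, "C": 45}        # provisional, from the script based anchors
predicted = {"A": 20.0, "B": 45.0, "C": 75.0}   # cumulative percent expected at grade or above
tolerance = 3.0                                 # percentage points before a flag is raised

def cumulative_percent(cut_score):
    """Percentage of the cohort at or above a provisional boundary."""
    return 100.0 * sum(mark >= cut_score for mark in marks) / len(marks)

for grade, cut_score in boundaries.items():
    outcome = cumulative_percent(cut_score)
    gap = outcome - predicted[grade]
    if abs(gap) > tolerance:
        # The flag does not move the boundary. It sends the committee back to the live
        # scripts around the cut, and the minutes must say whether reasons or policy
        # settled the final decision.
        print(f"Grade {grade}: outcome {outcome:.1f}% vs predicted {predicted[grade]:.1f}%, review scripts near {cut_score}")
    else:
        print(f"Grade {grade}: outcome {outcome:.1f}% is within tolerance of the prediction")

Nothing in this loop awards a grade. Its only work is to show where the statistical comparator and the script based anchor disagree enough that the committee must look again and write down why.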
The theorists used in this thesis show why those methods are not add ons but constitutive of objectivity in education. When we look for models that honour reliability without destroying the goods of subjects we find that we do not need to start from nothing. We need to remember what was built and we need to keep the reasons alive. References AQA. (2001). Annual report and accounts 2000–01. AQA. AQA. (2017). AQA: Our history. AQA. Baird, J.-A., Cresswell, M., & Newton, P. E. (2000). Would the real gold standard please step forward? Assessment in Education: Principles, Policy & Practice, 7(2), 125–148. https://doi.org/10.1080/713613328 Bramley, T., & Gill, T. (2010). Evaluating the rank-ordering method for standard maintaining. Research Papers in Education, 25(3), 293–317. https://doi.org/10.1080/02671522.2010.498148 Brandom, R. B. (1994). Making it explicit: Reasoning, representing, and discursive commitment. Harvard University Press. Cambridge Assessment. (2007). Comparability of examination standards (Special issue). Research Matters, 4. Cambridge Assessment. (2010). Comparability: A framework for evaluating standards across examinations. Cambridge Assessment. Dworkin, R. (1986). Law’s empire. Belknap Press. Endicott, T. (2000). Vagueness in law. Oxford University Press. Gill, T., & Bramley, T. (2013). How comparable are GCSE grades across subjects? Cambridge Assessment Research Report. Cambridge Assessment. Habermas, J. (1996). Between facts and norms: Contributions to a discourse theory of law and democracy (W. Rehg, Trans.). MIT Press. Hacking, I. (1990). The taming of chance. Cambridge University Press. Hart, H. L. A. (1961). The concept of law. Clarendon Press. Joint Matriculation Board. (1992). Scheme of amalgamation creating the Northern Examinations and Assessment Board. JMB/NEAB. Kane, M. T. (2013). Validating the interpretations of test scores for proposed uses. Journal of Educational Measurement, 50(1), 1–73. https://doi.org/10.1111/jedm.12000 Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). Macmillan. Newton, P. E. (2007). Clarifying the purposes of educational assessment. Assessment in Education: Principles, Policy & Practice, 14(2), 149–170. https://doi.org/10.1080/09695940701478321 Ofqual. (2015). Marking consistency metrics and marking reliability. Office of Qualifications and Examinations Regulation. Ofqual. (2016). An evaluation of the reliability of marking in GCSEs, AS and A levels. Office of Qualifications and Examinations Regulation. QCA. (1998). Standards in public examinations 1975–1996. Qualifications and Curriculum Authority. QCA. (2001). Code of practice: GCSE, GCE, and GNVQ. Qualifications and Curriculum Authority. QCA. (2004). Moderation in teacher assessment: A review of research. Qualifications and Curriculum Authority. Pollitt, A. (2012). The method of comparison. Assessment in Education: Principles, Policy & Practice, 19(3), 281–305. https://doi.org/10.1080/0969594X.2012.714737 Raz, J. (1990). Practical reason and norms (2nd ed.). Oxford University Press. Sorensen, R. (2001). Vagueness and contradiction. Oxford University Press. Tattersall, K. (2007). The evolution of regulation and comparability in English examinations, 1988–2007. Qualifications and Curriculum Authority. Toulmin, S. (1958). The uses of argument. Cambridge University Press. Wittgenstein, L. (1953). Philosophical investigations (G. E. M. Anscombe, Trans.). Blackwell. Wikipedia contributors. (n.d.). Northern Examinations and Assessment Board. 
In Wikipedia, The Free Encyclopedia. Retrieved [insert retrieval date]
Chapter 15 Where Education Is Now
In this chapter I set out how assessment systems and school culture have moved away from the values defended in this thesis and why the result is not only less valid but in practice less reliable. The arc is familiar. Accountability has been tightened. Qualifications have been redesigned around linear examinations. Coursework has been stripped out in many subjects or reduced to endorsements that do not count for the grade. Question formats have been rationalised. Data has become the way schools and teachers are judged. Each step was defended as a move toward rigour and reliability. Each step has also thinned the evidence of learning and displaced the public discipline of reasons with tokens that are easier to count and easier to mimic. Wittgenstein keeps me inside use. Meaning in our subjects is shown in what competent participants are prepared to do next when reasons are asked for. Brandom gives that the shape of a space of reasons. Toulmin gives the grammar by which claims are linked to data with warrants and with attention to rebuttal and to the scope of the claim. Messick and Kane demand that the interpretations we place on scores and the uses to which we put them are defended from construct to task to scoring to use. Hart reminds me that rules have open texture and Dworkin that principles must guide when rules leave a gap. Endicott shows why higher order vagueness persists. Sorensen closes the door on a tempting hope by reminding us that at a real borderline there is no hidden fact to be found. Raz and Habermas insist that authority rests on reasons that can be shown. Against this settlement the new culture looks like a retreat into a measure that promises certainty and delivers only a narrower picture and a higher chance of insincere speech. The withdrawal of coursework is one clear turn. It was justified by concerns about authenticity and about uneven internal moderation. The remedy removed a central site where knowledge shows itself in use and where communities of practice educate judgement. In English the project work that once asked students to build an argument across texts and audiences has been replaced by timed commentary on unseen passages and by essays that can be drilled by templates. In science the practical became an endorsement that often sits outside the grade. The laboratory remains but the mark that matters is now decided at a desk. In modern languages the oral has been squeezed by standardisation costs and timing. The result is a contraction of the evidence base. If validity requires that the task fits the construct then removing the task that best houses the construct cannot leave validity untouched. The new system promises reliability by shrinking what must be judged. It forgets that reliability about the wrong thing is not reliability at all. The move to linear examinations is a second turn. The case for linearity is elegant. It reduces game playing with modular resits. It fosters sustained study. In practice it concentrates risk and narrows the room for showing growth. If improvement is the centre of judgement then a culture that cannot record improvement except as a final surge will undervalue the learning that happened and will encourage teachers to teach to the last paper. The linear paper also invites a rationalisation of mark schemes toward itemised criteria that can be trained in short bursts. Marking can then be scripted more tightly.
The paradox is that such mark schemes are more brittle in the face of genuine excellence and more open to gaming by mimicry. What feels like reliability is often only uniformity. The data regime is a third turn. School accountability has been tied to headline indicators that give the impression of clean comparability across schools and years. The intention is defensible. The effect is reflexive. Hacking warned us that tokens reshape the world that uses them. Once a single number is the currency of value teaching will reorganise to deliver that number. Weber described the comfort of calculability and procedure. We now see the comfort and the cost. The more a school must live by one number the more it must invest in reducing the variance that comes from authentic tasks and human judgement. The result is an assessment diet that is easier to count and easier to rehearse and more vulnerable to fluent imitation. Examples make the change vivid. In literature the older model asked students to read a play or a novel across a term and to write for different audiences. They learned what a claim must do in public. They learned to use lines to test a reading under challenge. They learned to hold a live ambiguity. The newer model frequently rewards the rapidly produced discursive essay that can be templated. The template instructs the student to assert a theme to quote a line to name a device to repeat. The voice sounds like school English. It is easier to mark by ticks. It is easier to mimic without owning reasons. A generator now does it in an instant. The proxy is fragile because it never was the thing. In science the older model held the practical within the grade and asked students to choose under constraints and to explain why a method serves a purpose for a given audience. The newer model often tests definitions and substitutions into a formula. A student can now secure a high mark without showing that a drift in an instrument changes a plan or that a simpler method with larger error is the wiser choice for a younger cohort. The proxy looks cleaner. It is less about science. The irony is that these changes do not necessarily improve reliability. Of course agreement rises when markers count small items. Agreement also rises when the work is so constrained that only one route is permitted. But the quality of agreement matters. If two trained readers once read an extended argument and agreed on the rank order around a boundary through reasons they could cite then the agreement spoke to the construct. If two readers now agree on ticks around a rubric for a templated paragraph the agreement is cheaper and less about the construct. Where extended responses remain, the reliability of marks still depends on training, exemplars, and the habits of public reasons. These were the levers that the older culture invested in. They are the levers that a narrowed regime quietly starves. The new culture has also changed how grade standards are maintained. Statistical comparators now carry more weight in award meetings because script based anchors are thinner where extended tasks have been reduced and where coursework is gone. The comparator protects outcome comparability but can pull a boundary away from meaning if the only anchors to meaning are short items. Committees then write reasons that speak the language of policy rather than the language of subject sense. That is the drift into insincere speech that Sorensen warns about. At some borders reasons tie because the neighbour scripts really are neighbours. 
In the older settlement committees admitted that fact and recorded where policy spoke. In the newer settlement policy speaks through the comparator as if it were a discovery. The public is told that standards were held without being told how. Teacher assessment has not been protected from the same narrowing. In the absence of moderated coursework many systems now use frequent internal tests that mimic the final paper. The argument is that more testing raises reliability and helps pupils practise. In practice it takes time from teaching, enshrines a monoculture of one style, and feeds a data machine that motivates interventions aimed at the number rather than at the construct. The classroom becomes a rehearsal for a proxy. Apprenticeship of seeing suffers. Polanyi’s account of tacit knowledge and Shulman’s account of pedagogical content knowledge are crowded out by termly point chases. The pandemic years made all this plain. When examinations were cancelled systems turned to centre assessed grades and then to teacher assessed grades with quality assurance. The shift was defended as necessary. It was also an unplanned experiment in what happens when a community lacks a living memory of shared exemplars and moderation routines. Where those routines had been sustained departments could give reasons with confidence and audit one another fairly. Where they had been starved practices fell back on rank orders built from internal tests designed for accountability rather than for meaning. Public trust was shaken because speech did not match ground. The rush back to examinations was presented as a return to rigour. It also returned to a narrow evidence base. None of this denies that knowledge matters or that examinations have a place. It insists that knowledge lives in use and that examinations are valid when they elicit choices that matter for the construct and when marks are the outcome of public reasons that trained readers can reproduce. It insists that reliability is not the absence of judgement but the education and constraint of judgement in communities that share exemplars and record reasons. It insists that policy must speak openly at borders and that we should design to reduce how often we meet those borders by making process visible and by asking for short acts of defence. The way back is the same as the way forward. Restore tasks that let knowledge show itself in use. Keep practical work in the grade where the subject requires it. Specify constructs as families of virtues taught through exemplars with commentary. Train and audit markers through the public giving of reasons. Use comparative routines to harness many local choices while writing a sentence of reason for each. Use numbers as checks that trigger inquiry rather than as rulers of content. Publish award rationales that separate discovery from policy. Teach students to own reasons in small routines. Ask for short defences that link claims to particulars. Speak honestly at borders. When these values are put to work reliability rises because readers share a living sense of the construct and because their reasons can be tested. Validity rises because tasks fit the thing we care about. Fairness rises because decisions can be explained and because appeals can ask for reasons rather than for magic. Trust rises because the public hears speech that matches what the ground can bear. The path we have been on has moved us in the other direction. It promised certainty by thinning the work. It delivered fragility by making proxies into ends. 
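One of the levers just listed, the use of numbers as checks that trigger inquiry rather than as rulers of content, can be put very plainly in a short sketch. The marks, the tolerance and the routing rule below are all invented; the only point is that a large disagreement between two readers is not averaged away but sent to a third reader who must record reasons.

# Hypothetical double marking check. A gap beyond tolerance is not reconciled by
# arithmetic; it triggers a third reading with written reasons.
pairs_of_marks = {
    "candidate 01": (24, 25),
    "candidate 02": (18, 26),
    "candidate 03": (30, 29),
    "candidate 04": (12, 21),
}
tolerance = 4   # largest acceptable gap between two readers, in marks (invented)

for candidate, (first_reading, second_reading) in pairs_of_marks.items():
    gap = abs(first_reading - second_reading)
    if gap > tolerance:
        print(f"{candidate}: gap of {gap} marks, send to a third reader with reasons recorded")
    else:
        agreed = round((first_reading + second_reading) / 2)  # close readings reconciled directly
        print(f"{candidate}: agreed mark {agreed}")

The check is cheap; the inquiry it triggers is the valuable part, and it only educates judgement if the third reader’s reasons go back to both markers.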
If we wish to be truthful about education we must reverse the logic of the proxy. We must protect the places where knowledge is used and judged in public. Only there can reliability and validity live together.
References
Baird, J.-A., Cresswell, M., & Newton, P. E. (2000). Would the real gold standard please step forward? Assessment in Education: Principles, Policy & Practice, 7(2), 125–148. Brandom, R. B. (1994). Making it explicit: Reasoning, representing, and discursive commitment. Harvard University Press. Department for Education. (2013). Reforming GCSEs and A levels in England. Department for Education. Dworkin, R. (1986). Law’s empire. Belknap Press. Edgington, D. (1995). On the vagueness of ‘vagueness’. Mind, 104(415), 305–329. Endicott, T. (2000). Vagueness in law. Oxford University Press. Habermas, J. (1996). Between facts and norms: Contributions to a discourse theory of law and democracy (W. Rehg, Trans.). MIT Press. Hacking, I. (1990). The taming of chance. Cambridge University Press. Hart, H. L. A. (1961). The concept of law. Clarendon Press. Kane, M. T. (2013). Validating the interpretations of test scores for proposed uses. Journal of Educational Measurement, 50(1), 1–73. https://doi.org/10.1111/jedm.12000 Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). Macmillan. Newton, P. E. (2007). Clarifying the purposes of educational assessment. Assessment in Education: Principles, Policy & Practice, 14(2), 149–170. Ofqual. (2014). GCSE, AS and A level reforms in England. Office of Qualifications and Examinations Regulation. Ofqual. (2015). Regulating GCSEs, AS and A levels: Summer 2015. Office of Qualifications and Examinations Regulation. Ofqual. (2016). An evaluation of the reliability of marking in GCSEs, AS and A levels. Office of Qualifications and Examinations Regulation. Ofqual. (2020). Summer 2020 grades for GCSE, AS and A level: Guidance for teachers, students, parents and carers. Office of Qualifications and Examinations Regulation. Ofqual. (2021). Guidance: Teacher assessed grades, summer 2021. Office of Qualifications and Examinations Regulation. Polanyi, M. (1966). The tacit dimension. Routledge & Kegan Paul. Pollitt, A. (2012). The method of comparison. Assessment in Education: Principles, Policy & Practice, 19(3), 281–305. Putnam, H. (1981). Reason, truth and history. Cambridge University Press. Raz, J. (1990). Practical reason and norms (2nd ed.). Oxford University Press. Shulman, L. S. (1986). Those who understand: Knowledge growth in teaching. Educational Researcher, 15(2), 4–14. Sorensen, R. (2001). Vagueness and contradiction. Oxford University Press. Toulmin, S. (1958). The uses of argument. Cambridge University Press. Weber, M. (1978). Economy and society (G. Roth & C. Wittich, Eds.). University of California Press. Wittgenstein, L. (1953). Philosophical investigations (G. E. M. Anscombe, Trans.). Blackwell.
Chapter 16 Wherever Education Is, Culture Goes and Meets It
In this chapter I speculate about the wider cultural shifts of the past decade and ask whether they connect with the new kind of schooling that has replaced the earlier settlement of values and practices defended in this thesis. I do not claim a single cause; I trace family resemblances. A culture that prizes visibility, speed and scale often builds proxies that feel objective while loosening their tie to the goods they once measured. A school culture that has narrowed tasks, trimmed judgement and leaned on numbers that travel easily sits beside this.
Where the wider culture valorises the influencer, the hot take and the stream, schools have come to valorise the checklist, the linear examination and the sortable distribution. In both domains reasons recede and tokens dominate. The result is a shared drift toward performances that are easy to imitate and hard to ground. Begin with attention. Children and teenagers now live inside feeds where authority is carried by persona, frequency and reach. The influencer becomes a proxy for knowledge, the algorithm becomes a proxy for taste. News consumption floats in the same currents, younger audiences often meet public questions through short clips and creator commentary. At the level of politics many polities have become more comfortable with strongman assurances and less patient with slow public reasons. The informational environment looks noisy, fragile and performative, while the political environment looks more willing to accept certainty without argument. Now set beside this the schooling we have built. Qualifications have been redesigned around terminal examinations, coursework has been stripped from many subjects or reduced to endorsements that do not count, question formats have been rationalised, and accountability has been tied to headline indicators. Each change was defended as rigour and reliability. Each has also thinned the evidence of learning and displaced the public discipline of reasons with tokens that are easy to count and easy to mimic. The family resemblance is clear. A system that must speak through one number prefers tokens that travel. Coursework is hard to standardise at scale and hard to compress into a single digestible figure. Extended oral and practical work are messy. The linear paper with itemised mark schemes is neat. The proxy glitters, yet it is also easier to imitate and to drill. Sorensen’s lens makes the common logic visible. In culture and school the promise is that a sharper proxy will reduce uncertainty. The influencer reduces the complexity of expertise to a trusted face. The headline measure reduces a school to a percentile. At each border where reasons tie, systems act as if more data or a tighter rubric will reveal a hidden fact. Sorensen reminds us that some borders are real and carry no truthmaker that could settle them. At those borders the honest move is to acknowledge policy. The contemporary drift is to deny the border and to keep the theatre of certainty running. Hacking cautions that tokens create the worlds that measure them. Once brands pay out on reach and engagement, the ecology reorganises itself around formats and personae that maximise those metrics. The emergent truth of a claim becomes less salient than the traction it gains. The classroom analogue is the template essay, the thin proof, the revision pack of small items. Once schools pay out on the presence of exam ready tokens, teaching reorganises to deliver them. What can be counted at scale replaces what must be argued in a room. The link to the wider culture is not a simple cause, it is a homology. In both spaces the proxy takes the place of the thing. There is evidence that lends weight to this picture even if it cannot complete a causal chain. Time and attention are concentrated in platforms built to optimise retention. Advertisers and institutions follow the attention, public debate fragments under the pressure of short formats, and political actors reward performance that travels. 
In England the assessment record shows deliberate simplification away from coursework and modularity and toward terminal papers for portability and comparability. It is reasonable to ask whether habits that flourish in such an environment make it harder for schools to sustain the public discipline of reasons required by this thesis. It is also reasonable to ask whether schooling has trained the public to prize tokens over reasons and so to be more vulnerable to surface fluency. Wittgenstein and Brandom help explain why this matters. Meaning lives in use, concepts are roles in a space of reasons. In a culture saturated with performance the next move after a claim is often not a reason but a gesture, a share or a mood. In a school culture saturated with performance the next move after a claim is often not a reason but a device spotter, a term of art or a memorised step. Toulmin gives a way back. A claim should be backed by data, linked by warrants, and qualified in view of rebuttals. Putnam and McDowell keep the world in the picture so that features of an object can oblige a change of mind. In both domains the repair is the same. We must ask for reasons and keep particulars in view. The rise of fake news is a stress test of this habit. A polished string does not guarantee ownership of reasons and a viral post does not guarantee truth. The classroom version is the homework essay that sounds right and says little. Both look plausible until a line is put on the table and the claim must be tested aloud. Authoritarian drift provides a second test. Authoritarian politics rely on simplified narratives and on discrediting institutions that ask for reasons. A school culture that has muted teacher judgement in favour of machine readable tokens inadvertently trains a public that is less practised at asking for and weighing reasons. This is not conspiracy, it is a shared erosion of the same muscles. There is, however, another current that runs against these trends. Many people continue to live the values of this thesis even when schools and platforms do not encourage them. In the climate change debate, citizens’ assemblies have shown that ordinary people will listen to evidence, weigh trade offs, and give public reasons for difficult choices. When participants ask why a carbon tax should be coupled with dividend payments, and when they require that claims about grid stability be tied to specific data about storage and demand, the habit of reasons becomes visible. In opposition to authoritarian politics, people march, petition, and organise local forums. They insist on due process, they read legislation, they publish annotated guides for neighbours. They ask elected officials to attend town halls where questions are specific and answers are recorded. In defence of the National Health Service and welfare protections, patients’ groups and professional bodies gather case data, publish waiting time analyses, and compare alternative funding models. They do not only tell stories, they present warrants, they accept rebuttals, and they qualify their claims. In these spaces the public continues to want reasons and continues to practise the habits we teach when education is at its best. There are many small examples. Community energy projects publish open dashboards that show generation and demand in real time, then invite residents to adjust behaviour and watch the effect. Mutual aid groups record requests and deliveries, then hold short meetings where choices are justified and revised. 
Local planning campaigns share primary documents, invite residents to annotate clauses, and host short teach ins on legal thresholds before votes. Hospital trusts and patient advocates co produce leaflets that explain risks and benefits for specific treatments, then invite questions and publish follow ups. Food banks partner with schools to study local price data, then ask pupils to present reasons for alternative purchasing strategies, and record which reasons persuaded the group. These are not elaborate spectacles, they are routine acts of public reasoning, anchored in particulars, and lived in common. Such practices suggest a path for schools. Where the wider culture supplies the stream and the proxy, schools can partner with community sites that still demand reasons. A literature class can co host a public reading on climate narratives, then ask pupils to defend a claim about a passage to an audience that includes activists and engineers. A science class can adopt a local air quality sensor, then ask pupils to present a plan for placement and to justify it to residents who live near busy roads. A mathematics class can analyse appointment data from a general practice, then write a short brief for the surgery on scheduling trade offs, with a one minute defence in front of staff. A history class can study the rhetoric of a recent referendum leaflet, then present a two page dossier that weighs claims against primary sources, and answers a live objection offered by a local journalist. The classroom does not abandon knowledge, it relocates evidence for learning to places where knowledge must be used, reasons must be owned, and the world can answer back. Schools can also protect spaces inside their own walls where public reasons are the norm. Short, frequent routines help. Two short texts are read side by side, pupils speak for one minute on which better serves a stated aim and why, while citing lines. Every claim in a lab book carries a one sentence warrant. After each problem in mathematics a brief note explains the choice of method for a peer under different constraints. In history pupils practise the move from dossier to brief, then revise when a new document arrives. These routines build the habit of giving reasons with particulars in view. They also make later assessment more robust, since work that is owned in speech is harder to mimic. Policy must match these moves with honest speech. Where borders are real and reasons tie, systems should say when policy speaks. Where evidence is insufficient to decide origin, and where design has elicited process, a published rule can give the benefit of doubt and require a supervised make up for the same aims. Where numbers are used, they should be checks that trigger inquiry, not rulers of content. Where reports are written, they should separate discovery from policy and match claims to grounds. Such candour keeps trust and invites the public back into the practice of reasons. If there is a link between cultural shifts and school changes it is not that one simply causes the other, it is that both have leaned toward proxies that promise certainty while eroding the public practices that make claims answerable. The influencer becomes a proxy for trust, the viral post becomes a proxy for truth, the headline measure becomes a proxy for learning. Yet the counter evidence is strong. 
In climate assemblies, in campaigns against authoritarian abuses, in defences of the health service and social security, people still assemble reasons, still accept rebuttals, and still qualify claims. The values of this thesis are not absent, they are dispersed across civil life. By naming these places and by partnering with them, schools can help those values re emerge inside education. We do not need to wait for platforms to change. We can give pupils daily practice in owning reasons, we can make tasks that bring the world into the room, and we can speak honestly when policy must act. If we do this, the culture of schooling will become a counterweight to the stream, and the practice of education will again match the goods it claims to teach. References Brandom, R. B. (1994). Making it explicit: Reasoning, representing, and discursive commitment. Harvard University Press. Department for Education. (2010). The importance of teaching: The schools White Paper 2010. The Stationery Office. Department for Education. (2013). Reforming GCSEs and A levels in England. Department for Education. Dworkin, R. (1986). Law’s empire. Belknap Press. Economist Intelligence Unit. (2025). Democracy index 2024: Age of conflict. Economist Intelligence Unit. Endicott, T. (2000). Vagueness in law. Oxford University Press. Freedom House. (2025). Freedom in the world 2025. Freedom House. Hacking, I. (1990). The taming of chance. Cambridge University Press. Influencer Marketing Hub. (2025). Influencer marketing benchmark report 2025. Influencer Marketing Hub. Kane, M. T. (2013). Validating the interpretations of test scores for proposed uses. Journal of Educational Measurement, 50(1), 1–73. https://doi.org/10.1111/jedm.12000 King’s Fund. (2024). NHS performance and waiting times: Quarterly monitoring. The King’s Fund. McDowell, J. (1994). Mind and world. Harvard University Press. Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). Macmillan. Ofcom. (2024). Children and parents: Media use and attitudes report 2024. Ofcom. Ofqual. (2015). Regulating GCSEs, AS and A levels: Summer 2015. Office of Qualifications and Examinations Regulation. Ofqual. (2018). Marking consistency metrics and marking reliability. Office of Qualifications and Examinations Regulation. Putnam, H. (1981). Reason, truth and history. Cambridge University Press. Reuters Institute for the Study of Journalism. (2025). Digital news report 2025. University of Oxford. Sorensen, R. (2001). Vagueness and contradiction. Oxford University Press. Toulmin, S. (1958). The uses of argument. Cambridge University Press. UK Climate Assembly. (2020). The path to net zero: Climate assembly UK full report. UK Parliament. Wittgenstein, L. (1953). Philosophical investigations (G. E. M. Anscombe, Trans.). Blackwell. Appendix Williamson and Sorensen  This appendix keeps the frame simple and adds what it means for policy to speak. Many words are vague. Heap, tall, bald, rich, beautiful. When we move in tiny steps no single change seems to force a flip, yet somewhere the verdict must turn. The epistemicist answer says there is a sharp cut and that every statement is true or false. Our sense of blur comes from principled ignorance rather than a hole in reality. Timothy Williamson builds this into a general theory of knowledge. Roy Sorensen keeps the core and stays with the room where people must act when the edge is reached. They agree on much. Both keep classical logic. Both keep bivalence. 
They agree on much. Both keep classical logic. Both keep bivalence. Both say that borderline cases feel indeterminate because we cannot know where the true boundary lies, not because nothing is true there. Both distrust easy repairs that change the subject.

The parting of ways shows in how they talk about omniscience and about the public voice at the edge. For Williamson an omniscient knower could locate every sharp boundary, since all truths are fixed. For Sorensen the very look of a vague boundary is peculiar. At the cut our best statements collapse into tautology. This is bald because it is bald. This is still red because it is still red. Such verdicts do not present a recognisable truthmaker as a reason. The appearance is not a flaw of rhetoric. It shows why some borders defeat even ideal enquiry. On Sorensen’s telling not even an omniscient being could know where the cut is, since there is no more to be known that would turn the tautology into a reasoned discovery. Williamson’s omniscient being could know. Sorensen’s cannot.

Consider next what it means for policy to speak. In a practice we try to separate neighbours by reasons that point to features in the object and to the aim that governs the practice. Sometimes the reasons tie. A decision is still required. When reasons tie and the practice must act, the honest move is to say so and to enact a standing policy that was published in advance. That public enactment is policy speaking. It is not the discovery of a hidden fact. It is a rule of action that the community has adopted for exactly these moments. We keep Wittgenstein’s reminder that meaning lives in use. We keep Brandom’s space of reasons. We keep Toulmin’s grammar for claims and warrants. We keep Endicott’s lesson that higher order vagueness persists. We keep Sorensen’s warning that the border presents as a tautology and so cannot be known as a discovery. Policy speaks so that the practice can move without pretending to have found what cannot be found.

A concrete example makes this clear. Consider an award meeting in literature. Two scripts sit at the provisional pass boundary. Both readers write reasons. Script A offers a clean commentary with thin evidence of audience awareness. Script B offers a riskier reading with fine audience sense but a loose close. The committee lays the lines on the table. The features that favour A and the features that favour B are of comparable weight for the taught construct and for the stated aim. No further feature decides. A discovery is not available. The board has a published rule for such ties. Where two independent kinds of evidence are strong, for example strong commentary reasons and a strong oral defence, the higher grade may be awarded. Where only one is strong, the candidate completes a short supervised task that targets the weaker element. The chair records that reasons tied, that the standing rule was applied, and that the student will complete a supervised make-up next week on a new passage for the same aim. That is policy speaking. The minute does not claim that a hidden fact was found. It names the reasons, it names the tie, and it cites the policy that governs the action.
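The standing rule in this example can also be written out as a small decision procedure, which makes the division of labour visible: reasons first, and a cited rule when reasons tie. The sketch below is an illustration under assumed names, in the same spirit as the earlier one; no board publishes its rule in this form, and the two categories of evidence are simply lifted from the example above.

# Minimal sketch of a published tie rule of the kind described above.
# The categories, names, and recorded outcome are illustrative.

from dataclasses import dataclass

@dataclass
class TieCase:
    commentary_strong: bool  # strong written commentary reasons
    defence_strong: bool     # strong oral defence of the reading

def apply_standing_rule(case: TieCase) -> dict:
    """Apply the published rule when reasons tie at a boundary.

    The record never claims that a hidden fact was found. It names
    the tie and cites the rule that governed the action.
    """
    if case.commentary_strong and case.defence_strong:
        outcome = "higher grade may be awarded"
    else:
        outcome = "short supervised task targeting the weaker element"
    return {
        "reasons": "tied at the provisional boundary",
        "rule": "published standing rule for ties",
        "outcome": outcome,
        "minute": "policy spoke; no discovery is claimed",
    }

# Script B from the example: a strong oral defence, but commentary
# reasons judged thin at the boundary, so the rule sets a supervised task.
print(apply_standing_rule(TieCase(commentary_strong=False, defence_strong=True)))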
A second example comes from detection. A student submits a polished essay. In class the student gives a one-minute defence and handles a pointed challenge with modest success. The teacher cannot know the origin of the home essay. The border is real and presents as a tautology. It is authentic because it is authentic. The department has a published policy. Students must keep prompts and drafts. Where origin cannot be determined and where the design has elicited visible process, the default is to give the benefit of the doubt at a first uncertainty and to require a supervised in-room task that serves the same aim. The report to the student names the reasons that were visible, states that origin could not be known, cites the rule, and sets the supervised task. The report to users records that policy spoke. Trust is kept because speech matches ground.

A third example comes from colour grading in a museum. A very long strip moves from red to yellow in fine steps. The gallery must place labels. Curators bring trained eyes and reasons about hue and function. The room still faces a join where no reason decides. The catalogue has a published note. Labels at such joins are policy labels that serve visitor navigation. They are placed after a comparative session and are reviewed when lighting changes. The label says red here for visitors. The catalogue says that policy spoke. Again the action is clean because the speech is clean.

These examples show why policy speaking is not a retreat from objectivity. It is the form that objectivity must take when Endicott’s higher order vagueness endures and when Sorensen’s tautological border appears. It protects truthfulness in Williams’s sense and satisfies Fuller’s demands of publicity and congruence. It keeps Raz’s order of authority straight. Procedures earn authority when they serve reasons that already apply and when they declare the point where reasons end and policy begins. It also fits Williamson’s counsel about safe decision under ignorance. We do not claim to know what cannot be known. We install a fair policy for when we must still decide.

Bring both theorists back into view and the settlement is clear. Williamson explains why our eye will never know the exact place where the verdict flips and why sharp cuts can still exist. Sorensen explains why the edge often looks like a tautology and why honest institutions must sometimes speak in the voice of policy. Education and law need both: the engine that keeps logic and knowledge steady, and the practice that keeps speech and trust steady.

References

Brandom, R. B. (1994). Making it explicit: Reasoning, representing, and discursive commitment. Harvard University Press.
Endicott, T. (2000). Vagueness in law. Oxford University Press.
Fuller, L. L. (1969). The morality of law (Rev. ed.). Yale University Press.
Habermas, J. (1996). Between facts and norms: Contributions to a discourse theory of law and democracy (W. Rehg, Trans.). MIT Press.
Raz, J. (1990). Practical reason and norms (2nd ed.). Oxford University Press.
Sorensen, R. (2001). Vagueness and contradiction. Oxford University Press.
Toulmin, S. (1958). The uses of argument. Cambridge University Press.
Williams, B. (2002). Truth and truthfulness: An essay in genealogy. Princeton University Press.
Williamson, T. (1994). Vagueness. Routledge.
Wittgenstein, L. (1953). Philosophical investigations (G. E. M. Anscombe, Trans.). Blackwell.