

Talk of “borderlines” in educational assessment very often slides between at least three different phenomena, and each phenomenon licenses a different kind of response. One is ambiguity, where the meaning or intended standard is unclear (for example, “critical evaluation” could mean “include counterarguments” or “evaluate evidence quality”, and different markers hear different instructions). A second is measurement noise, where the target is in principle determinate but the procedure is unreliable (fatigue, marker drift, context effects, random errors). The third is vagueness proper, where the target concept itself is not the kind of thing that yields crisp boundaries under competent use (classic cases: “heap”, “bald”, “book”, and, in education, “insightful”, “coherent”, “excellent”). My claim is that assessment cultures routinely treat vagueness as if it were ambiguity: as if there must be a hidden, stipulable rule that would make borderline cases disappear, and the job were just to find it and implement it. I shall argue that the Williamson and Sorensen epistemic tradition, and Fine’s 2020 “global” turn, can be used together, rather than as rivals, to show why this is a mistake, and that much follows from it.
Williamson’s[1] and Sorensen’s[2] epistemicism begins from a hard thought: if a predicate is vague, it still obeys classical logic and bivalence, so there is a fact of the matter about each case, even though we cannot know it. Williamson’s familiar route to this conclusion runs through the margin-for-error[3] idea: if your classificatory knowledge were too “tight” around the boundary, you would get a sorites-style contradiction; so knowledge must be “buffered”, and that buffering makes exact boundary knowledge unavailable in principle. (This is the broader family of arguments around anti-luminosity and safety constraints on knowledge: if you know you are F at a point, nearby points must also be F, and a sorites series then forces a collapse.)
Sorensen is an epistemicist for broadly kindred reasons, but he is especially interested in the idea of absolute borderline cases: cases that resist every method of settling, not just ordinary empirical checks, and so expose a conceptual limit to inquiry rather than a practical limitation. He frames the localist picture exactly in those terms: vagueness is understood as the possession of local, absolute borderline cases that “resist all means of inquiry,” and he notes the epistemicist’s reaction: bivalence guarantees a hidden truth-value anyway.
If you want to put pressure on educational assessment engineers, epistemicism is rhetorically sharp because it grants them something they like: there really is a right grade, even when we cannot know it. That sounds like it vindicates the examiner’s aspiration to determinacy. But it also pulls the rug from under their feet. If the boundary is sharp but unknowable in principle, then systems that pretend to “remove borderline problems” through ever more detailed rubrics or “tightened descriptors” are engaged in a category mistake. They are treating the problem as ambiguity or noise, when on epistemicism it is constitutive ignorance. More training and finer criteria may reduce ambiguity and noise, but they cannot remove the epistemic blur created by the very structure of a vague predicate. That turns the assessment obsession with “pinpointing the borderline” into a kind of pseudo-engineering: it encourages overconfidence in instruments that cannot, even in principle, deliver what they promise.
Fine’s 2020 view[4] complicates and, for education, arguably improves the diagnosis. Fine’s key move is that vagueness is a kind of indeterminacy that is global rather than local. The striking claim is that you cannot coherently deny a single instance of excluded middle, but you can coherently deny certain conjunctions of instances (so ¬(p ∨ ¬p) is inconsistent, but ¬((p ∨ ¬p) ∧ (q ∨ ¬q)) can be consistent in the right setting), which models the idea that a sorites series exhibits vagueness although no member does so “in isolation.”
To unpack that last bit for readers unfamiliar with the symbolic logic, the idea can be put like this. Take a single, clear yes-or-no question, such as “Is this particular object a book or not?” On its own, it is incoherent to deny that there must be an answer. It cannot be neither a book nor not a book. That much still holds. What Fine’s approach challenges is something subtler. Suppose instead you are confronted not with one object, but with a long series of very similar objects, arranged so that they gradually change from clear non-books to clear books. Now consider the claim that every single item in the series has a perfectly definite yes-or-no answer. Fine’s point is that it can be coherent to deny that claim as a whole, even though you cannot deny it for any one item taken by itself. In other words, there need not be any single object that is “indeterminate in isolation”, yet the series as a whole can still resist having a sharp boundary. Vagueness shows up not because any individual case lacks a determinate status, but because there is no coherent way of assigning determinate statuses across the entire range without something going wrong. This captures the familiar sense in which vague concepts fail to have a cutoff, even though no particular case looks problematic when viewed on its own.
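The formal contrast can be put schematically. The following is a compressed restatement of the point just unpacked, in notation of my own choosing; it is a sketch, not a presentation of Fine’s own compatibility semantics:

```latex
% Locally: no single instance of excluded middle can be denied.
\[
\neg\,(p_i \lor \neg p_i) \vdash \bot \qquad \text{for each item } i \text{ in the series,}
\]
% Globally: denying determinacy across the whole sorites series
% can nevertheless be consistent in the right setting.
\[
\neg \bigwedge_{i=1}^{n} (p_i \lor \neg p_i) \nvdash \bot .
\]
```

Reading \(p_i\) as “item \(i\) is a book”, the first schema says that each item, taken alone, must be a book or not; the second says that the series as a whole can still coherently resist a complete assignment of determinate statuses.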
Sorensen[5] takes Fine’s global account of vagueness very seriously and agrees that it captures something real that local, case-by-case theories struggle to explain. He accepts that vagueness often shows up as a feature of ranges and series rather than as a defect in individual cases, and he is persuaded by Fine’s diagnosis of forced-march scenarios and relational borderlines. On the other hand, Sorensen does not think this automatically rules out the existence of local borderline cases. He reconstructs the localist intuition in a careful way.
Imagine standing back from a long series of gradually changing items. From that vantage point, you might be confident that there must be at least one borderline case somewhere in the range, even though you have no idea which one it is. Sorensen’s point is that this kind of knowledge does not depend on being able to identify a particular culprit. The localist can appeal to classical logic and say that the global phenomenon Fine describes is compatible with there being a definite borderline item, hidden from us by our epistemic limitations. According to this view, global vagueness does not eliminate local borderlines, it merely conceals them.
Sorensen does not simply endorse this response, but he treats it as a serious challenge. This matters because it suggests that one need not choose between acknowledging holistic, range-based vagueness and retaining a role for epistemic ignorance about individual cases. Sorensen’s position leaves open the possibility that both are in play: there may be genuine global structure that generates vagueness, and yet also facts about individual cases that are unknowable in principle.
This is precisely why the focus of this essay is not on settling the metaphysics of vagueness once and for all, but on asking which resources best illuminate the specific phenomena we encounter in educational assessment. The dispute between Fine, Williamson, and their respective allies is often framed as a competition to identify the true nature of vagueness in general. My project is more modest and more targeted. The question is not whether vagueness is ultimately global or local, epistemic or structural, but which account explains why assessment systems behave as they do, why borderlines proliferate in predictable places, and why attempts to eliminate them through finer criteria or procedural safeguards repeatedly fail.
From this perspective, Sorensen’s position is especially useful. His willingness to acknowledge the force of Fine’s global diagnosis while resisting the conclusion that it exhausts the phenomenon mirrors the situation in assessment. In marking essays, we often know that some scripts must fall at the boundary between grades even if we cannot say which ones, and this ignorance feels real and constraining in a way that epistemicism captures well. At the same time, the pressure points in assessment are not random. They arise systematically where holistic intellectual qualities resist decomposition, where local strengths do not fuse into a coherent whole, and where the act being judged changes the question it was meant to answer. Those patterns are far more naturally explained by Fine’s structural apparatus than by an appeal to hidden but determinate facts alone. My argument, then, is not that epistemicism is false or that global indeterminacy is the whole story, but that educational assessment is a domain in which global, relational, and hyperintensional[6] features do explanatory work that purely local models leave mysterious.
The most defensible stance is therefore a fusion strategy: use Williamson and Sorensen to legitimise the sense of unavoidable ignorance and the persistence of sharp decision points, while using Fine to explain why those decision points cluster where they do and why treating vagueness as mere ambiguity or noise mischaracterises what assessors are actually responding to.
But Sorensen raises a further, more speculative challenge that goes beyond the standard opposition between local and global accounts of vagueness. Fine defines global vagueness as indeterminacy that only arises when a predicate is applied across a range of cases, and he suggests that at least two cases are required for vagueness to get a foothold. Sorensen asks whether this requirement might be too restrictive. He introduces the idea of a “lonely” object: an object that exists in isolation, with no other relevant objects around it for comparison. Now consider the predicate “lonely” itself. If the object exists in a situation where it is unclear whether there are any other objects in the domain at all, or unclear whether certain neighbouring entities count as distinct objects, then it may be indeterminate whether the object is lonely. The indeterminacy here does not seem to arise from comparing multiple clear cases in a range, but from uncertainty about the surrounding conditions that make the predicate applicable in the first place.
Sorensen’s point is not that this decisively refutes Fine, but that Fine’s framework may actually invite a broader view of vagueness than the local versus global distinction captures. If indeterminacy can arise from unclear domains, unclear boundaries of existence, or unclear conditions of application, then vagueness might not always require a sequence or range in the straightforward sense Fine emphasises. In that way, Sorensen suggests that Fine’s work, rather than closing the door on alternative forms of vagueness, may open it to new and less familiar ones.
If we push Sorensen’s “lonely” case a little further, it begins to suggest a class of assessment phenomena that do not sit comfortably within either the standard local or the standard global pictures of vagueness, but which nonetheless feel very familiar to experienced examiners. The core of the “lonely” thought experiment is that indeterminacy can arise not from comparison across a range of similar cases, but from uncertainty about the surrounding field in which a judgement is meant to apply. The object is not borderline because it sits between two clear cases, but because it is unclear whether the conditions that would make the predicate applicable are even in place. Transposed into educational assessment, this points to situations where a student’s work is hard to classify not because it lies between two grades, but because it unsettles the framework that makes grading intelligible at all.
Consider an unconventional Macbeth essay that does not merely answer the question well or poorly, but reshapes what the question appears to be asking. Perhaps the student rejects the usual character-based or historical framing and instead offers a sustained analysis of dramatic time, theatrical space, or moral disorientation, drawing on the play’s structure rather than its themes. The essay may be compelling, coherent, and clearly the product of serious understanding, yet it does not sit comfortably within the established domain of comparison constituted by the mark scheme. It is not obviously excellent in the standard sense, but neither is it deficient. The indeterminacy here is not well described as a borderline between, say, a high A and a low A*. Rather, it arises from uncertainty about whether this performance belongs to the assessed domain as currently defined.
In Sorensen’s terms, the essay is “lonely”: it does not coexist neatly with other scripts against which it can be straightforwardly compared, and this very isolation generates the vagueness. This kind of case is often handled poorly by assessment systems. The response is either to force the essay back into the existing categories, penalising it for not exhibiting expected features, or to treat it as an anomaly that must be normalised through moderation. What the “lonely” thought experiment suggests is that such responses may be misdiagnosing the problem. The difficulty is not that the essay is borderline with respect to a clear standard, but that the standard itself is being tacitly challenged. The assessor faces a question not of degree, but of applicability: does this work instantiate the kind of achievement the assessment is meant to recognise, even though it does so in an unexpected way?
Seen in this light, Sorensen’s speculation encourages us to look for forms of vagueness in assessment that arise from unclear domains rather than unclear cutoffs. These include cases where a task implicitly assumes a certain form of engagement, but a student’s work reveals an alternative form that is intelligible and valuable, yet not anticipated by the rubric. Such cases generate discomfort precisely because they expose the dependence of assessment on background assumptions about what counts as a legitimate performance. They are not errors or ambiguities, but signals that the space of acceptable intellectual acts may be richer than the assessment design allows. If this is right, then exploring these “lonely” cases could be especially fruitful for rethinking assessment, not by refining thresholds, but by reflecting on how assessment frameworks delimit the kinds of understanding they are capable of recognising.
A “lonely” response is awkward for assessment because it brings into view something rubrics usually hide: the construct[7] being assessed is not merely a list of features but a structured whole, a kind of intellectual act, and the rubric is a technology for approximating that act through parts.
Fine’s metaphysical apparatus is useful here because it lets you describe, with some precision, where that approximation breaks, without immediately redescribing the breakdown as mere subjectivity. Start with mereology[8] and anti-additivity. Most rubrics are built on an additivity picture: the whole grade is a function of part scores, and the parts are treated as if they can be improved independently. Fine’s mereological way of thinking encourages a different question: do the listed criteria function as genuine parts of one unified performance, or are they merely co-present features whose fusion into a single coherent act is not guaranteed? A “lonely” Macbeth essay often looks lonely precisely because it does not fuse cleanly with the rubric’s expected part structure. It may have extraordinary interpretive drive and conceptual control, but it withholds or explicitly disavows “historical context” in the standard curricular sense, or it uses context in an oblique way that is not recognisable as the rubric’s intended part. It may even explicitly argue that historical context is irrelevant. The additivity model then forces a trade: either you reward the act quality and tolerate the missing part, or you reward part compliance and miss the act quality.
Fine’s apparatus says: stop treating that as a discretionary compromise. Treat it as evidence that the part structure is mis-specified for the construct. This is where truthmakers[9] help. In Fine’s hyperintensional spirit, two essays can satisfy the same descriptor “shows understanding of Macbeth” and yet do so in different internal ways. One essay “makes it true” by assembling familiar points with correct quotations and context, another “makes it true” by reorganising the reader’s sense of what the play is doing. World-based or extensional thinking collapses these: if both meet the descriptor, they are equivalent. Truthmaker thinking refuses the collapse: what verifies the claim differs in structure.
That gives you a diagnostic language for rubric design. If your construct is meant to be disciplinary understanding, then the rubric should be sensitive not only to whether the essay hits outcomes, but to the mode of achieving them, the internal organisation of reasons, the explanatory profile. Otherwise you end up rewarding “truth without the right makers”, success that is extensionally correct but structurally thin, and punishing structurally powerful work that does not match the expected surface route.
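The truthmaker point admits a schematic statement too. In the truthmaker tradition a sentence is associated not with the set of worlds at which it is true, but with the states that exactly verify it; the notation below is my sketch in that spirit, not a quotation of Fine’s formal system:

```latex
% Let \phi stand for the descriptor "shows understanding of Macbeth".
% Two distinct states can both exactly verify \phi:
\[
s \Vdash \phi, \qquad t \Vdash \phi, \qquad s \neq t .
\]
```

Extensionally the two essays are equivalent, since both satisfy the descriptor; hyperintensionally they differ, because the state that makes \(\phi\) true has a different internal structure in each case.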
Two further elements from Fine are fusion and non-closure. The Finean thought is that a domain may not be closed under fusion: you can have local goods without there being a possible whole that is their coherent sum. In assessment terms that means that a rubric may list “context”, “quotation”, “structure”, “argument”, “critical vocabulary”, “alternative interpretations” and a student can display these locally, yet the essay may not be a coherent intellectual performance, because the parts interfere. Context becomes a bolt-on paragraph that breaks the line of thought, quotations become ornaments rather than evidence, alternative interpretations become a list rather than a pressure on the main claim.
In a naive rubric this can still add up to a high score. Fine’s non-closure lens says: that is a predictable failure mode of additivity. It is not that the marker is being fussy, it is that the construct you care about is not fusable from those parts in that way. Conversely, a “lonely” essay may have fewer of the listed parts, but its fusion is exceptionally strong: every element is doing work, the whole is tight, and the interpretation is generative. In a fusion-sensitive framework, that essay is not “missing a criterion”, it is showing that the criterion list is not the right decomposition of the whole.
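Non-closure under fusion can be given the same schematic treatment. Let \(G\) mark a locally good part (a criterion met) and \(G^{*}\) mark the holistic property of being one coherent performance; the symbols and the fusion operation \(\sqcup\) are my gloss, assuming a standard mereological setting:

```latex
% Local goods need not compose into a holistic good:
\[
G(x) \wedge G(y) \;\not\Rightarrow\; G^{*}(x \sqcup y) .
\]
```

The fusion \(x \sqcup y\) always exists as a mere aggregate, but nothing guarantees that it is a coherent whole: locally good parts can interfere rather than compose.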
Non-compossibility then becomes the right contrast class. Incommensurability and incommensurate values make assessors fear that comparisons are impossible or merely political. Non-compossibility is sharper. It does not say the essays are beyond comparison, it says certain combinations of virtues are not jointly realisable within one coherent performance of the relevant kind. That claim is testable and disciplinarily intelligible. In a Macbeth essay, a highly original conceptual take and a rigidly template-driven structure may be non-compossible in a single essay, because the template forces moves that prevent the conceptual take from unfolding. The problem is not that originality and structure are incommensurate goods, it is that the particular form of structure imposed by the assessment ecology blocks the possibility of the act you want. That immediately turns into a design question rather than a metaphysical lament: if you want to assess originality as a live intellectual achievement, do not require a format that systematically prevents it from being compossible with the other demanded features.
All of this bears directly on the “lonely” response. A lonely script is one that appears to instantiate the construct by a route not anticipated by the rubric’s decomposition. It is lonely because the rubric has carved the space of performances too narrowly, so the script does not sit on the familiar comparison shelf. Fine’s apparatus suggests that the loneliness is not merely epistemic, not merely “we do not know what to do with it”. It may be metaphysical in the modest sense relevant here: the script does not share the same part structure, so the rubric cannot represent its content without distortion. The temptation is to treat it as an outlier and suppress it for reliability. The better move is to ask whether your reliability is being purchased by construct underrepresentation.
That brings us to validity and reliability[10]. The immediate worry is that making room for lonely responses will explode inter-rater reliability, because it asks markers to exercise judgement rather than apply rules.[11] But the Finean point is not “abandon rules”, it is “stop pretending the rules already capture the construct”. Reliability and validity have to be pursued together, and a rubric can be made more sensitive to structural wholes without collapsing into idiosyncrasy, if you build the right scaffolding.
One design implication is to separate two functions that rubrics currently conflate. They try to be both measurement instruments and explanations of value.[12] A Fine-sensitive rubric should admit that at higher levels it is tracking whole-act properties that are not reducible to a checklist. Instead of pretending that the grading bands are the sum of parts, the rubric can explicitly treat the parts as defeasible evidence for the whole. Concretely, you can keep analytic criteria[13], but add a structurally prior holistic criterion that is not a “catch-all”, but a disciplined judgement about the essay’s explanatory organisation. Think of it as an “integration and necessity” criterion: do the claims, quotations, and moves hang together in a way that makes them mutually supporting, with little slack?
This is where truthmaker talk cashes out: does each component play a verifying role for the central interpretation, or is it inert surplus? Markers can be trained to look for surplus and to ask whether the essay would still be the same act of understanding if a part were removed.[14] That is an informal analogue of exactness, and it is surprisingly reliable when anchored in examples. A second implication is to build in an explicit “construct-preserving exception” pathway for lonely scripts. In many exam systems, special cases are handled informally through senior examiner judgement.[15] The Finean move is to legitimise that pathway and specify its conditions.
For instance, a rubric might say: “historical context is normally required evidence of interpretive understanding, but in exceptional scripts, a deep structural reading of the play’s dramatic logic can discharge the same construct requirement by an alternative route.” That is not “anything goes”, it is a controlled recognition that the construct can be verified by different kinds of evidence. The key is to define those alternative routes in construct language, not in feature language. The goal is still to assess interpretive understanding, not to reward rule breaking. This preserves validity, because you are not expanding the construct arbitrarily, you are acknowledging multiple ways of making it true.
A third implication is moderation design. If lonely scripts are rare but high-stakes, reliability can be protected by procedural rather than purely rubric-based means. You can require that any script flagged as “lonely” be double-marked by a specialist panel, with an explicit written rationale framed in construct terms.[16] That converts private discretion into public reason-giving. It also creates a corpus of exemplars that can later be used to train markers and refine the rubric, which is how validity improves over time without abandoning standardisation.
A fourth implication is to rethink exemplars. Educational practice often uses exemplars as if they eliminate vagueness by providing fixed anchors. Fine’s and Sorensen’s shared sensitivity suggests a more nuanced view. Exemplars can stabilise judgement, but they can also hide global indeterminacy by making the shelf look smoother than it is. A Fine-informed exemplar policy would deliberately include “lonely exemplars”, scripts that do not fit the expected route but that are judged to instantiate the construct strongly. The point is not to encourage eccentricity, it is to teach the marker community what counts as genuine understanding when it arrives in an unexpected shape. This can increase reliability, paradoxically, because it reduces the panic response to outliers.
A fifth implication is about criterion design in terms of compossibility. If you find that the highest-quality performances routinely violate a particular criterion, that is evidence that the criterion is not a constitutive part of the construct at that level, or that it is non-compossible with the kind of thinking you are trying to elicit under the current constraints. “Historical context” in Macbeth is a good example. If context is treated as a required paragraph, it may be non-compossible with an essay that is doing fine-grained analysis of language and dramatic action, because the time and space needed to do both properly is not available. A Fine-style diagnosis would say: either redesign the task to make the fusion possible (longer time, different format), or redesign the construct operationalisation so that context is one possible truthmaker among others, not a mandatory part in every case. That is a design decision that can be justified empirically by studying which essays most convincingly instantiate the learning aims.
At this point the challenge is to show that this is not simply a licence for subjectivity. The crucial move is that Fine’s distinctions can be operationalised as marker questions with surprisingly high agreement when supported by training and examples. Ask whether a piece of evidence is doing explanatory work or is decorative. Ask whether a paragraph could be removed without changing the central intellectual achievement. Ask whether the essay’s claims are mutually supporting or merely sequential. Ask whether the essay’s interpretation changes what the question looks like and whether that change is argued for with textual control. These are not idiosyncratic tastes, they are discipline-anchored assessments of structure.
Fine’s apparatus does not ask you to choose between reliability and validity. It tells you that current rubrics often achieve reliability by flattening structure, which yields a kind of false reliability: agreement about the wrong construct. A design that acknowledges non-additivity and makes space for non-standard truthmakers can preserve reliability through process and shared exemplars, while improving construct validity by being honest about what the construct actually is: not a bag of separable parts, but a coherent act of disciplinary understanding. The “lonely” response becomes, in this view, not a threat to assessment, but a diagnostic signal that your rubric is measuring a simplified surrogate rather than the achievement you claim to value.
We also need to stop treating “lonely” as a phenomenon that only appears at the top, where originality outruns the checklist. In classrooms, teachers often report something else: a piece of work that, by rubric lights, is weak or even failing, and yet it carries a kind of intellectual pressure, a signal of genuine grip, that the teacher cannot easily legitimise. The system invites the teacher to redescribe that signal as sympathy, halo effect, or a preference for a certain voice. The teacher, if conscientious, then suppresses it, because reliability demands that the rubric win. What Fine’s apparatus makes thinkable is that this suppression is not always virtuous. Sometimes it is a systematic blindness built into additive rubrics, and it can be diagnosed without abandoning rigour.[17]
To see the shape of the claim, keep two distinctions in view. The first is the difference between local defects and global structure. Additive rubrics tend to assume that if enough local defects are present, the global performance must be poor. Fine’s global approach to vagueness suggests a different possibility: the indeterminacy may live at the level of the whole series or range, not at the level of any one local item. Translate that into assessment terms. A student may be unable to meet many local descriptors in stable ways, but may nevertheless be exhibiting a globally intelligible intellectual trajectory, a developing form of sense-making that is not captured by the local checklist. The second is Fine’s hyperintensional point, that content is not exhausted by extension. Two scripts can “fail” according to the same rubric and yet fail in different ways, because what they are trying to do, the internal organisation of reasons, differs. One is simply empty, the other is trying to say something real but cannot yet make it come out in the expected format.
A rubric that only recognises the extension, the observable features, will treat them as the same. A hyperintensional lens insists they are not the same object. Now bring in truthmakers and exactness in a deliberately classroom-minded way. Suppose a rubric has a criterion like “use quotations to support points”. The extensional reading is simple: do quotations appear and are they roughly relevant? A truthmaker reading asks a different question: what in the student’s work is doing the job of making their interpretive claim responsible to the text? Quotations are one common truthmaker, but not the only one. A student might describe a scene’s timing, a shift in address, an interruption, a repeated image, a rhythmic pattern, and show sensitivity to dramatic function without being able to deploy quotations neatly. In traditional marking, that can look like lack of evidence. In truthmaker terms, the student may have evidence, but of a different kind, and the criterion has been operationalised too narrowly.
Exactness adds a further diagnostic. A weak essay often has surplus, lots of inert material, context dropped in, quotations as ornaments. A “lonely success” at low rubric levels may show the opposite pattern: not much material, but unusually little slack. The student has very few moves, but each one is doing work, and the work is directed at something real. Teachers often experience this as, “there is something there, but I cannot justify it”. Exactness gives you a way to justify it: the script contains a small set of components that are tightly fitted to a claim, even if the claim is not yet expressed in the canonical way.
Imagine a student writes a short response to Macbeth that barely mentions historical context, does not use critical vocabulary, and misquotes a line. By rubric, it collapses. But the student fixates on a staging fact, perhaps the way Macbeth’s “Is this a dagger” speech is not merely introspection but an address to an absent object, almost a rehearsal of action in the mode of perception. The student writes, in plain language, something like: Macbeth is practising murder before he does it, he is trying to make the thing real by looking at it, and that is why later he cannot make the blood unreal, because he has already crossed into a world where imagining and doing are stuck together. This is not an exam perfect claim. It may be poorly evidenced and unpolished. But it is conceptually hooked into the play’s dramatic logic.
The lonely feature here is that the student is tracking a modal shift, a change in what is possible for Macbeth after a certain point, without having the vocabulary to present it as “tragic inevitability” or “psychological disintegration”. The rubric punishes the surface failures and never even notices the underlying achievement, namely that the student has located a structural pivot and is reasoning about necessity and possibility in the play. Fine’s modal apparatus helps you say what the student has grasped without romanticising it. The student is implicitly distinguishing what is merely possible, what is becoming necessary given Macbeth’s character and situation, and what is non-compossible, what cannot be held together once certain choices are made.
In Fine’s terms, compossibility is not just logical consistency; it is joint realisability within a coherent state. Macbeth’s world after the murder is not merely one in which guilt is added to a prior life; it is one in which certain kinds of rest become non-fusable with his self-understanding. The student is gesturing at this, even if clumsily. That is a cognitive achievement of the relevant construct, interpretive understanding of the drama’s structure, even though the script is locally defective.
The lonely quality is that the student has a truthmaker for genuine understanding, sensitivity to structural modality, but lacks the standard truthmakers that the rubric recognises: polished quotation use, terminology, neat paragraphing, grammar, and spelling.
The sceptic will say: you are just praising an interesting idea and ignoring basic competence. The Fine-oriented reply is to distinguish two questions that additive rubrics conflate. First, does the script instantiate the targeted understanding in any recognisable way, even minimally? Second, is the script a socially acceptable token of that understanding under exam norms, with the required conventions? A rubric often treats failure on the second as decisive evidence against the first.
Fine’s framework warns you that this inference is unsafe because the mapping from underlying content to recognised features is not one to one. Different internal structures can produce the same surface features, and the same internal structure can sometimes appear with atypical surface features, especially for disadvantaged students or students with atypical linguistic resources. If you care about validity, about whether you are measuring the construct you say you are measuring, you need a way to detect underlying structure even when the conventional packaging is missing.
Fusion and non-closure are especially useful here. Many low-band scripts fail by being a mere heap. They contain pieces, but no coherent fusion. That is a real deficiency in the construct. But some low-band scripts fail in the opposite way. They are not heaps; they are tight, but they are tight around something idiosyncratic, a line of thought that does not fuse with the rubric’s expected decomposition. A student may produce a coherent little structure that is simply not closed under the rubric’s fusion operation. When the rubric tries to fuse it with “context paragraph”, “terminology”, “alternative reading”, the fusion fails, and the script is judged as incomplete rather than as coherent but non-aligned. This is exactly the kind of failure that teachers often sense but cannot name. It is not that the student did not think; it is that the assessment design is not closed under the kind of thinking the student can currently do.
A history example makes this even clearer, because the conventional truthmakers are well policed. Imagine an essay question about the causes of the First World War. A low-band student lists a few facts, some wrong dates, no historiography, weak structure. But there is a paragraph where the student says something like: it was not one cause, it was a situation where each country acted as if the others would back down, and that made backing down impossible without losing face, so the system produced a kind of trap. That is a game-theoretic insight in ordinary language. It is not yet supported by evidence in the expected way, and the student cannot name the “security dilemma” or “credibility”, or cite scholars. But the underlying structure is recognisable. It is a claim about a modal dynamic: what becomes possible and impossible when certain commitments are made publicly. Fine’s talk of necessity as an explanatory shadow of essences can be translated, cautiously, into history talk: not that the war was metaphysically necessary, but that given the structural roles and commitments of the actors, certain trajectories became locked in.
Again, a lonely success emerges: the student has grasped a structural mechanism but cannot yet supply the conventional academic apparatus. Additive marking may bury that under missing parts.
A maths example is the most delicate, because the culture of right answers is strong. But even there teachers often see “wrong but good”. A student gives an incorrect final answer to a geometry problem but has actually understood the important idea in it. In geometry, many problems depend on spotting a hidden pattern or relationship. For example, you might need to notice that two triangles are similar, meaning they have the same shape even if they are different sizes. Or you might need to see that a particular angle stays the same even when the diagram changes. If the student drew a sensible diagram, marked equal angles correctly, and used the right reasoning steps, that shows they had grasped the key structure of the problem. They saw what really matters. The final arithmetic slip does not cancel that understanding. They understood the geometry, even though they made an error at the end.
The conventional scoring often gives part marks, but the radical point is different. Sometimes a student gives a wrong answer because they are pursuing a more general method than required and they overreach. Their work shows conceptual control, an ability to set up a proof strategy, but their execution fails. Another student gives the right answer by pattern-matching a memorised procedure, with no real grasp.
Extensional[18] scoring privileges the second because the output matches. A truthmaker lens privileges the first because the verifying structure for mathematical understanding is present. Exactness matters again. The first student’s reasoning may be minimal and tightly connected, even if the final numerical output is wrong. The second student’s reasoning may be bloated with surplus steps that cancel out, because the procedure is being applied blindly. If what you want to assess is understanding, not mere production of an answer, the Fine-oriented diagnosis says the wrong answer can be a better token of the construct.
At this point, the sceptic’s strongest objection is reliability. If you allow teachers to reward lonely successes at low levels, you risk systematic bias, inconsistency, and grade inflation. This is where the “controlled and regimented” part matters, and where Fine’s apparatus can actually help, because it gives you disciplined distinctions rather than just permission. You can build a protocol that distinguishes three categories, and the categories are not psychological, they are structural.
One category is heap failure. The script contains elements that do not fuse into a coherent performance. There is no stable central claim or method, the parts do not support each other, there is lots of inert surplus, and removing parts makes no difference because nothing is doing work. This is a genuine failure of the construct.
A second category is aligned low competence. The script is trying to play the rubric game, but lacks the knowledge and skill to do it. It has the right kind of parts in the right order, but they are thin. This is a common kind of failure and the rubric captures it fairly well.
The third category is lonely coherent misalignment. The script has a coherent internal structure, minimal slack, and an identifiable intellectual achievement, but that structure does not match the rubric’s decomposition, and therefore the standard evidence signals are missing. This is the lonely success zone. The protocol’s job is not to turn category three into an automatic pass. It is to make it visible as a distinct object so that decisions about it can be made with reasons. You can require that any marker who wants to treat a low-scoring script as lonely coherent misalignment must write a short construct-based justification, describing what the student has achieved, what makes it coherent, and what truthmakers in the script support the judgement. You can then require second marking or sampling moderation for that category. Over time, exemplar libraries can be built, including lonely low-band scripts that were later judged to contain genuine achievement, as well as scripts that initially seemed promising but were revealed to be heaps. This is how you protect reliability without erasing the phenomenon.
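The admissibility rule for the three categories can be sketched as a small data check. The category names follow the text, and the rule for category three (a written warrant plus second marking) is the one just described; the field names and the exact shape of the check are illustrative assumptions, not a specification:

```python
# Sketch of the three-category protocol as an auditable data check.
# Category names are the document's; field names are hypothetical.

from dataclasses import dataclass
from enum import Enum, auto

class Category(Enum):
    HEAP_FAILURE = auto()                  # parts do not fuse; nothing does work
    ALIGNED_LOW_COMPETENCE = auto()        # right parts, right order, but thin
    LONELY_COHERENT_MISALIGNMENT = auto()  # coherent, but non-aligned structure

@dataclass
class MarkingDecision:
    category: Category
    warrant: str = ""          # short construct-based justification
    second_marked: bool = False

    def is_admissible(self) -> bool:
        """Category three is admissible only with a written warrant
        and second marking; the other categories need neither."""
        if self.category is Category.LONELY_COHERENT_MISALIGNMENT:
            return bool(self.warrant.strip()) and self.second_marked
        return True
```

The shareable part is the warrant string: moderation can audit it, which is what keeps category three from becoming a loophole rather than a recognised object.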
Fine’s global vagueness point also helps explain why lonely success is not rare noise but a predictable consequence of how we force judgement. In a forced-march classroom, where students must always give a definite answer and must always write in a particular format, there may be no single “borderline” script that is intrinsically borderline. The borderline is in the range. Teachers can sense that the series from very weak to very strong contains no clean cut-off, and that some students occupy unstable positions, because their work flips depending on the demands imposed. Fine’s denial of local indeterminacy, in the educational translation, would be a warning against treating any one script as the essence of the borderline.
The issue is the ecology of scripts and the imposed decision procedure. A student’s loneliness is often revealed only against that ecology. This is why a teacher might be confident that there is something real even if they cannot point to a single criterion that captures it. The teacher is responding to a pattern in the range, not to an isolated property.
Many disadvantaged students are precisely those whose work is most likely to be lonely. They may have genuine conceptual resources, sensitivity, intellectual courage, but lack the cultural and linguistic packaging that rubrics reward, including the tacit genre knowledge of what counts as evidence and how to display it. My Fine-oriented approach lets you say that the construct may be present even if the expected display is absent. That is not a licence to inflate grades for reasons external to merit. It is a challenge to the operationalisation of merit. If the construct is genuine understanding, then the absence of conventional markers of understanding is not decisive evidence against understanding. It is evidence about access to the conventions.
This also complicates the idea of “powerful knowledge”. If powerful knowledge is heard as a stock of facts, then lonely successes will be treated as failures until the facts arrive. But if powerful knowledge is heard in a more Finean way as connoisseurship of disciplinary possibility space, the ability to see what follows from what, what is ruled out, what can be combined, what kind of move changes the field, then some lonely scripts are early signs of powerful knowledge in formation. They show sensitivity to modal structure even when factual coverage is patchy. A teacher who has deep disciplinary training is often better at recognising this because they can see the underlying structure that a novice cannot.
The policy implication is not simply “more subject knowledge”. It is that assessment design should not assume that knowledge is only factual and that reasoning is only a generic skill. It should build channels for recognising structural understanding in disciplined ways. The heavy lifting, then, is to insist that this is an alternative model of what you are doing when you assess. You are not merely checking whether a list of features is present, you are trying to judge whether a student has produced a coherent act that verifies a claim about understanding. Additive rubrics behave as if verification is feature possession and as if fusion is always possible.[19] Fine’s apparatus says verification has structure and fusion can fail. Once you accept those two points, it becomes not only possible but professionally responsible to recognise that some low scoring scripts are failures of packaging rather than failures of understanding, and that some high scoring scripts are successes of packaging rather than successes of understanding.
The radical shift is not to reward all loneliness. It is to create a controlled mechanism for detecting lonely coherence, and to treat it as evidence about validity, about whether the rubric is actually tracking the construct, especially for students who do not naturally speak the rubric’s language.
Currently, a student may produce an E-grade Macbeth response that fits the descriptor almost perfectly, and yet the teacher recognises that the work contains a kind of intellectual achievement that is structurally closer to a higher band. The rubric is not merely insensitive; it is capturing the script illicitly, in the sense that the script is being treated as an instance of the low-grade kind when it is, in its internal organisation, a different kind of object. We need to explain what illicit capture is, why it happens, and what constraints can prevent it.
Extensional fit is what rubrics are built for. You match surface features against descriptors and you assign a band. Hyperintensional structure is about how the reasoning is put together: what is doing work, what grounds what, what is essential to the piece as a performance rather than incidental. Illicit capture happens when extensional fit is treated as sufficient evidence for hyperintensional kind membership. The student’s script looks like E-grade work because it is short, ungainly, light on quotation, missing context, awkward in expression, and the rubric is designed to treat these as jointly diagnostic. But the inner structure may be closer to a higher-grade kind because it contains an organising insight that governs the rest, even if the rest is thin. The teacher is responding to that governing insight as something like an essential feature of higher-band work, namely the ability to reframe what the question is really asking, or to locate a structural pivot in the text.
To make this non-handwavy, I want to use Fine’s idea of exactness in a deliberately deflationary way. A typical E response often has slack of a particular sort: it is a heap of loosely connected comments and plot summary, with stock moral phrases and a few gestures to themes. Remove a sentence and nothing changes; add a sentence and nothing changes. That is a structural sign that nothing is grounding anything.
A lonely, E-looking script can be the opposite. It may have very little content, but what it has is exact in the sense that the few elements it contains are all serving one organising claim. Each sentence, though badly formed, is doing work. There is minimal surplus. The script is not a heap; it is a small but coherent state. This is why it feels higher than the rubric allows. The rubric is mistaking thinness for incoherence because it uses quantity and conventional markers as proxies for structure.
Take a Macbeth essay example that shows illicit capture. The question asks about ambition. The student writes two short paragraphs. No context, one short quotation half-remembered, clumsy expression, poor spelling. But the student’s central move is: Macbeth’s ambition is not wanting to be king, it is wanting the act to feel inevitable so he can avoid owning it, and that is why he keeps talking as if the future is already decided. This is a claim about modality, about how Macbeth tries to convert a choice into a necessity. The student then links it to the dagger speech as rehearsal, and to the later “sleep no more” as evidence that the necessity he tried to manufacture has changed what is possible for him. If that is really what the script is doing, then it is tracking the play’s deep structure. Yet the rubric captures it as E because the student cannot display the move in canonical exam form.
The illicitness is structural. The rubric’s decomposition (evidence, context, terminology, paragraphing) has become a fusion test. It assumes that higher understanding must fuse with those display conditions. But the student’s understanding may be compossible with the play and the question while being non-compossible, at that stage of their schooling and language development, with the full display package. So the rubric is not measuring understanding plus conventions. It is measuring the ability to produce a fused performance in which understanding and convention are jointly realised. That might be a legitimate construct, but it is not the one teachers often think they are assessing when they say they are assessing literary understanding.
If we accept that, we can state a constraint designed to block illicit capture while still acknowledging lonely successes. You require a demonstration that the script belongs to a different hyperintensional kind. Concretely, you can require three checks. First, a grounding check. The marker must be able to say what the script’s central claim is and how other sentences function as grounds for it. If the marker cannot identify any grounding structure, then the sense of promise is probably projection. This check filters out the romantic reading of a confused script.
Second, an exactness check. The marker must identify at least one place where the student’s sentences are minimal for the claim, where removing a sentence would break the reasoning. That forces the marker to show that the script is not a heap but a small coherent object. It is a testable claim because another marker can try the removal and see if the work collapses.
Third, a non-compossibility check. The marker must specify which rubric demands the student fails and argue that those failures are not failures of the construct being claimed. For example, the student lacks context. Does the insight require context to be what it is? If not, then the absence of context is a display deficit, not a construct deficit. But if the insight actually depends on historical understanding and the student has none, then the claim to higher-band understanding is illicit. This check forces the lonely claim to be disciplined by a separation of construct from display.
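The three checks can be sketched as an explicit gate on any "lonely success" claim. The checks themselves (grounding, exactness, non-compossibility) are the ones just described; the field names and data shapes are hypothetical conveniences:

```python
# Sketch: the three checks on a "lonely success" claim as one gate.
# Structure is illustrative; the checks follow the text.

from dataclasses import dataclass

@dataclass
class LonelyClaim:
    central_claim: str        # grounding check: the governing claim
    grounds: list             # sentences functioning as grounds for it
    breaks_on_removal: bool   # exactness check: did the removal test break the work?
    missing_features: list    # rubric demands the script fails (e.g. "context")
    construct_relevant: dict  # feature -> does the claimed insight require it?

def passes_checks(c: LonelyClaim) -> bool:
    # 1. Grounding: a statable claim with identifiable grounds,
    #    filtering out the romantic reading of a confused script.
    if not c.central_claim or not c.grounds:
        return False
    # 2. Exactness: removing a sentence must break the reasoning,
    #    so another marker can re-run the removal and check.
    if not c.breaks_on_removal:
        return False
    # 3. Non-compossibility: every missing feature must be a display
    #    deficit, not a construct deficit (default to construct-relevant).
    return all(not c.construct_relevant.get(f, True)
               for f in c.missing_features)
```

Each field corresponds to something a second marker can re-check, which is what makes disagreement track identifiable structural properties rather than taste.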
These checks turn Fine’s apparatus into a gatekeeping mechanism against misuse. They do not abolish disagreement, but they make disagreement track something more specific than taste. They also create the possibility of reliability, because the judgements are no longer private. They are about identifiable structural properties: presence of a governing claim, identifiable grounding relations among sentences, minimality, and a stated argument about why certain missing features are not part of the relevant construct.
Modern assessment regimes often deform constructs by sampling only what can be reliably marked with low disagreement using additive rubrics. In English, the dropping of creative poetry writing is a perfect example. The official story is subjectivity. The deeper story is that creative work has hyperintensional features that are hard to capture extensionally. If you treat the construct as what can be operationalised through checklistable features, you will inevitably shrink it until it fits the method. Then you say you are measuring “English ability” but you have quietly redefined English ability as “the set of performances that survive extensional marking”. The method becomes the metaphysics.
Fine helps you expose that without relying on a subjective, affective defence of creativity. The point is that the internal organisation of a poem, its economy, its exactness, its way of making one line ground another, and its capacity to shift what is possible to say within a constrained form, are structural properties. They are not reducible to crude feature counts, but they are not mere feelings either. They are the sorts of properties that a truthmaker and grounding approach is built to respect. A poem can be thin but exact, small but coherent, and those can be robust markers of achievement. A poem can also be rich but slack, full of imagery with no necessity. That difference is not captured by extensional proxies like “uses metaphor” or “varied vocabulary”.
If you take this seriously, “appropriate sampling” of a construct changes. You stop thinking of sampling as choosing a set of tasks that cover content areas and are easily scoreable, and you start thinking of sampling as choosing tasks that expose the space of possible performances in the domain. In Fine’s language, you want tasks that reveal the compossibility constraints of the discipline, what combinations of virtues can be jointly realised in a coherent act. In English, that means that interpretive essays alone may not sample the construct of literary understanding, because the ability to produce language that carries necessity, that makes each part do work, is part of what literary understanding is. Excluding poetry writing is like excluding proof writing from mathematics because it is hard to mark, then claiming you still assess mathematical understanding. You might be assessing something, but the construct has been bent to fit the scoring machinery.
When you remove tasks because they are “too subjective”, you often remove tasks whose success conditions are hyperintensional. You then build an assessment ecology that selects for extensional display competence. Students who can play the display game look strong, students with lonely exactness look weak, and the discipline itself is represented as a set of decomposable skills rather than as a space of structured performances. That is a metaphysical distortion, not merely a technical adjustment.
Fine can change this across the curriculum. He gives you a principled way to argue that tasks should be included because they are construct essential, even if they reduce extensional reliability, and then to compensate by redesigning reliability around structural diagnostics rather than around crude agreement on feature counts. In maths, you include proof and explanation, not just answers, because the construct is not answer production. In history, you include narrative synthesis and interpretive argumentation, not just source questions, because causal reasoning is not reducible to spotting features in a text. In science, you include modelling and explanation, not just recall, because understanding is about the structure of reasons. In each case, the Finean move is to insist that validity requires preserving the internal structure of the domain, and to treat reliability as a design problem to be solved with better representational tools, not as a veto that allows you to amputate parts of the construct.
Educational “Connoisseurship”[20] is expertise at recognising the difference between heaps and fused wholes, between slack and exactness, between local feature possession and global coherence, between merely consistent part scores and compossible virtues in a single act.
That expertise is discipline-specific[21]. It depends on knowing what counts as a real move in Macbeth criticism, what counts as a real explanation in history, what counts as a real proof strategy in maths. So it pressures the skills-based teacher discourse in a very direct way. If the system wants to recognise lonely successes fairly, it needs assessors whose training enables them to see the internal structure, and it needs moderation practices that can check those structural claims.
The “lonely E that deserves higher” case is the canary for construct deformation. If your sampling is narrow and your rubrics are additive, the only achievements you can recognise are those that fuse with the display package. Anything that is coherent but misaligned will be captured as low. The system will then teach students to avoid lonely moves, because lonely moves are punished. Over time, you train out the very capacities you should be cultivating, and you call the result rigour. Fine gives you a language to say, in a way that can be operationalised, that this rigour is in part a formal artefact of extensional assessment design, and that a more faithful assessment ecology must include tasks and criteria that are sensitive to internal structure, even at the cost of forcing us to become better at training, moderating, and evidencing judgement rather than pretending that judgement can be replaced by decomposition.
Finally, just to show that a metaphysics from analytic philosophy has direct practical application in current philosophy of education, here is a concrete Macbeth assessment package that implements the “lonely success” idea in a disciplined, reliability-minded way, without collapsing back into vibes or into pure additivity. You can think of it as three layers: tasks, rubric architecture, and moderation protocol. The key is that “structural coherence” is assessed by explicit diagnostics (grounding, exactness, non-compossibility) rather than by impressionistic holism, and that the package is designed to surface lonely cases rather than accidentally punishing them.

Task design

Task 1: Interpretive essay (standard form, but with an explicit invitation to reframe)
Prompt: “Macbeth is often described as a play about ambition. Discuss, but you may challenge the question’s framing if you can justify the alternative.”
Length: exam length or coursework length, depending on your setting.
Why: This keeps comparability with existing regimes but creates a legitimate pathway for the student who changes the question. It stops the system treating reframing as off-task when it is actually a high-level achievement.

Task 2: Micro-commentary plus transformation (short, designed to expose structure)
Part A (commentary): Choose one short passage (for example the dagger speech, or “Tomorrow and tomorrow and tomorrow”). Write x words explaining what the passage does in the play, focusing on how it changes what is possible for Macbeth to say, think, or do.
Part B (transformation): Rewrite the passage as (i) a police interview transcript, (ii) a clinical case note, or (iii) a prayer, in y words. Then add 3–5 bullet points explaining what your transformation reveals about the original.
Why: This task is very good at surfacing “lonely” brilliance that cannot yet perform the full essay conventions. It also gives you a second, independent window on the same construct. Importantly, it is not “creative writing for its own sake”; it is construct-aligned, because the transformation is used to test interpretive control and modal sensitivity (what the language makes possible or impossible).

Rubric architecture

Instead of a purely additive rubric, use a two-track rubric.

Track A: Display and disciplinary conventions (still assessed, still matters)
A1 Use of textual evidence (quotation, reference, accuracy)
A2 Organisation and clarity (paragraphing, signposting, coherence at the level of presentation)
A3 Context and scholarship (historical, theatrical, critical frames, where relevant)
A4 Technical control (expression, terminology, accuracy)

Track B: Structural achievement (the Finean part, explicitly diagnosed)
B1 Governing claim and grounding
Can the marker state the script’s governing claim in one or two sentences, and identify which parts of the script function as reasons for it?
Markers must write a short “grounding map” note: “Main claim: … Grounds offered: … Link: …”

B2 Exactness and necessity
Is there evidence that parts of the response are doing non-redundant work, meaning that if you removed them the central claim would collapse or change? This is assessed by an explicit removal test.
Marker note: “If we remove sentence X, does the argument break? If yes, how?”

B3 Compossibility of virtues
Do the local strengths fuse into a coherent act, or are there local virtues that cannot be jointly realised in this performance?
Marker note: “Strengths present: … Tension: … Does the tension block the central claim, or is it merely a convention deficit?”

B4 Reframing and modal control
Does the student legitimately re-specify the question, showing control over what the question is really asking and what the play makes it possible to claim?
Marker note: “Reframe: … Justification: … Consequence for the rest of the analysis: …”

Crucial scoring rule to prevent illicit capture

Do not let Track A simply average with Track B. Use a gating structure.
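Since the package stipulates only that Track A must not simply average with Track B, one possible gating rule can be sketched as follows; the band scales, thresholds, and lift arithmetic are illustrative assumptions, not anything the package fixes:

```python
# Sketch of a gating scoring rule (illustrative thresholds only).
# Track A = display and conventions, Track B = structural achievement,
# both assumed here to be scored 0-5.

def final_band(track_a: int, track_b: int) -> int:
    """Combine the two tracks with gates rather than an average."""
    # Gate 1: no structural achievement means display cannot lift the
    # script above a low ceiling (blocks the box-ticking heap).
    if track_b <= 1:
        return min(track_a, 2)
    # Gate 2: strong structure with weak display makes the script
    # eligible to be lifted (the "lonely success" route), but only by
    # a bounded amount, and in practice only with a moderation warrant.
    if track_b >= 4 and track_a <= 2:
        return track_a + 2
    # Otherwise, combine the tracks conventionally.
    return round((track_a + track_b) / 2)
```

The point of the gates is structural: a heap cannot be rescued by display polish, and a lonely script is lifted only within bounds and under audit, which is what makes the rule resistant to both romanticism and box-ticking.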
This structure makes the system resistant to both romanticism and box ticking.
How this looks in practice

Consider the E-grade example discussed earlier. A typical E script might score low on B1 because the marker cannot state a governing claim beyond “Macbeth is ambitious”, and the removal test shows nothing breaks. That is a heap. A lonely, E-looking script might score high on B1 and B2 because it has one real move, for example “Macbeth converts choice into destiny to avoid responsibility”, and each sentence supports that. It might score low on A1 and A3 because evidence and context are thin. Under the gating rules, it is eligible to be lifted, because it is a different kind of object from the heap even though it shares surface deficits.

Moderation protocol for reliability

To keep reliability and fairness, you make structural judgement auditable.
The marker notes are short, but they create a common language for moderation.
What changes in teacher expertise

This package makes discipline expertise matter in a defendable way. A teacher with strong disciplinary formation tends to be better at stating a script’s governing claim, at running the removal test honestly, and at judging whether a missing feature is a display deficit or a construct deficit.
But the system does not merely rely on that expertise privately. It forces the expertise into a shareable form through the written warrants and a calibration set of exemplar scripts. That is what turns “connoisseurship” into something the assessment system can responsibly use.
I think the abandoned NEAB 100% coursework assessment protocols of the late 1980s and early 1990s in the UK operationalised something very like this.
Conclusion
Williamson’s and Sorensen’s local vagueness moves are useful even if you ultimately side with Fine on globalism, because they help you explain to sceptical assessors why “global indeterminacy” is not a psychological story about marker hesitation, and why it is not refuted by the fact that we can sometimes be sure there is some problematic case without locating it.
The pairing also gives you a caution: if Fine’s denial of local indeterminacy is too strong, you do not have to accept it wholesale to get the educational payoff. You can adopt the operational moral that “borderline” is typically relational and range-governed, while allowing that there may be special constructs or domains (or special quantificational predicates, in particular Sorensen’s “lonely”) where the pressure to globalise looks less natural.
That is the fusion you want for education: lead with Williamson and Sorensen to win agreement that (i) assessment predicates can be perfectly meaningful yet inherently resistant to boundary knowledge, and that (ii) local rule-tightening cannot, in principle, deliver the elimination narrative; then expand with Fine to show that the right picture of competent application is range-based, exemplar-based, and structurally constrained, and that “borderline” is often not a locally intrinsic defect to be engineered away but a predictable manifestation of global indeterminacy in a practice that nevertheless has to act.
References
Black, Paul, and Dylan Wiliam. 1998. “Assessment and Classroom Learning.” Assessment in Education 5, no. 1: 7–74.
Cresswell, Michael J. 2000. “Maintaining Standards: The Role of Awarding Bodies.” Assessment in Education 7, no. 3: 347–358.
Cresswell, Michael J. 2008. “The Role of Professional Judgement in Awarding.” Ofqual working paper.
Eisner, Elliot W. 1976. “Educational Connoisseurship and Educational Criticism: Their Form and Functions in Educational Evaluation.” Journal of Aesthetic Education 10, no. 3/4: 135–150.
Eisner, Elliot W. 1991. The Enlightened Eye: Qualitative Inquiry and the Enhancement of Educational Practice. New York: Macmillan.
Fine, Kit. 2020. Vagueness: A Global Approach. Oxford: Oxford University Press.
Glaser, Robert. 1963. “Instructional Technology and the Measurement of Learning Outcomes.” American Psychologist 18, no. 8: 519–521.
Kane, Michael T. 1992. “An Argument-Based Approach to Validity.” Psychological Bulletin 112, no. 3: 527–535.
Marshall, Bethan. 2000. “Constructing the Subject: English in the National Curriculum.” Changing English 7, no. 2: 199–209.
Marshall, Bethan. 2004. “English Teachers’ Conceptions of the Role of Literature in English.” Changing English 11, no. 2: 231–242.
Marshall, Bethan. 2010. “Constructing the Subject: English and the Question of Knowledge.” English in Education 44, no. 3: 213–228.
Messick, Samuel. 1989. “Validity.” In Educational Measurement, 3rd ed., edited by Robert L. Linn, 13–103. New York: Macmillan.
Sadler, D. Royce. 1989. “Formative Assessment and the Design of Instructional Systems.” Instructional Science 18, no. 2: 119–144.
Sadler, D. Royce. 2005. “Interpretations of Criteria-Based Assessment and Grading in Higher Education.” Assessment and Evaluation in Higher Education 30, no. 2: 175–194.
Sorensen, Roy. 2001. Vagueness and Contradiction. Oxford: Oxford University Press.
Sorensen, Roy. 2022. “Vagueness: A Global Approach.” Notre Dame Philosophical Reviews, January 3.
Williamson, Timothy. 1992. “Inexact Knowledge.” Mind 101, no. 402: 217–242.
Williamson, Timothy. 1994. Vagueness. London: Routledge.
[1] Williamson, Timothy. 1994. Vagueness. London: Routledge.
The canonical epistemicist defence of sharp but unknowable boundaries. Supplies the irremovable ignorance lever against assessment hubris.
[2] Sorensen, Roy. 2001. Vagueness and Contradiction. Oxford: Oxford University Press.
Rich development of epistemicist themes, higher-order vagueness, and paradox pressure. Helpful in reconstructing the localist temptation; more on that later.
[3] Williamson, Timothy. 1992. “Inexact Knowledge.” Mind 101, no. 402: 217–242.
Early statement of the margin-for-error principle. Crucial for explaining why boundary knowledge is structurally blocked.
[4] Fine, Kit. 2020. Vagueness: A Global Approach. Oxford: Oxford University Press.
The central text for my global indeterminacy framework. Key for rejecting local borderline cases and modelling vagueness at the level of ranges.
[5] Sorensen, Roy. 2022. “Vagueness: A Global Approach.” Notre Dame Philosophical Reviews, January 3.
Authoritative review of Fine (2020). Introduces the forced-march sorites, the misshelved book analogy, and the “lonely” case as a challenge to Fine’s minimum range condition.
[6] A simple way to grasp Fine’s idea of the hyperintensional is to use Superman. Standard possible-world logic says this: if two sentences are true in exactly the same possible situations, then they count as saying the same thing, at least as far as logic is concerned. So consider:
“Superman can fly.”
“Clark Kent can fly.”
Since Superman is Clark Kent, these sentences are true in exactly the same possible worlds: wherever one is true, the other is true. So in the ordinary possible-world sense they are equivalent. But clearly they are not the same in meaning. Lois Lane might believe that Superman can fly while denying that Clark Kent can fly, because she does not know they are the same person. So although the two sentences match in truth across all possible situations, they differ in cognitive content: they present the same individual under different descriptions. Fine’s point is that logic should be able to notice that difference. Possible-world logic is “too coarse-grained”: it only checks whether statements line up in all possible scenarios, and that misses something important. We often care not just about whether two statements are necessarily equivalent, but about how they are true, what they are about, or what makes them true. Here is another simple comparison:
“Superman is Superman.”
“Superman is Clark Kent.”
Both are necessarily true, if true at all: in every possible world where Superman exists, he is himself, and in every world where the identity holds, he is Clark Kent. Yet the first is trivial and uninformative, while the second is a significant identity discovery. It explains something; it connects two ways of thinking about the same person. Possible-world logic treats them as equally necessary. Fine says that is not enough. We need a finer tool that distinguishes:
• trivial necessity
• informative identity
• explanatory connections
• what is essential to something
That finer tool is what he calls hyperintensional. It allows two statements to be necessarily equivalent but still different in content, structure, or grounding. So, in short: possible-world logic asks whether two statements are true in the same situations; Fine asks whether they say the same thing, whether they are true for the same reason, and whether they involve the same essence or explanation. With Superman, the answer is clearly no, and Fine thinks logic should be sensitive to that difference.
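The contrast can be put in a compact notational sketch (my gloss, not Fine’s own formalism): write Int(φ) for the intension of φ, the set of possible worlds at which φ is true.

```latex
% Because Superman = Clark Kent, and identities between proper names
% hold necessarily, the two flight-sentences share an intension:
\[
\mathrm{Int}(\text{``Superman can fly''}) \;=\; \mathrm{Int}(\text{``Clark Kent can fly''})
\]
% Yet a belief operator can separate them:
\[
B_{\mathrm{Lois}}(\text{``Superman can fly''}) \;\wedge\; \neg\, B_{\mathrm{Lois}}(\text{``Clark Kent can fly''})
\]
```

Any operator that distinguishes sentences with the same intension in this way counts as hyperintensional; possible-world semantics, by construction, cannot see the difference.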
[7] Messick, Samuel. 1989. “Validity.” In Educational Measurement, 3rd ed., edited by Robert L. Linn, 13–103. New York: Macmillan.
The classic modern validity framework, vital for my “maintain the construct” requirement. Helps keep the Finean innovations tied to validity arguments, not just metaphysical elegance.
[8] Mereology: formal theories of part–whole relations.
[9] Fine’s idea of truthmakers starts from a very ordinary thought: if something is true, there must be something about reality that makes it true. Take the sentence “Superman can fly.” Why is that true? It is not true just because the sentence exists. It is true because of something about Superman himself, namely his powers: his Kryptonian physiology, his exposure to the yellow sun, whatever story you prefer. That feature of reality is what makes the statement true; that feature is the truthmaker. Now compare “Clark Kent can fly.” Since Clark Kent is Superman, this is also true. But notice something interesting: the truthmaker is not “Clark Kent being a journalist at the Daily Planet”, for that would not explain flying. The truthmaker is still the Superman-power aspect of the person. So even though the names differ, the truthmaker is the same underlying reality. Now let’s make things slightly more subtle. Consider these two statements:
“Superman is Superman.”
“Superman is Clark Kent.”
Both are necessarily true if true at all. But what makes them true? The first is made true simply by identity: anything is identical to itself. That is very thin, very trivial. The second is made true by a much richer fact: that the superhero and the journalist are one and the same person. That is a substantive identity fact; it connects two ways of presenting the same individual. So even though both sentences are necessary, their truthmakers differ in structure: one is trivial self-identity, the other an informative identity fact. Fine’s point is that truth depends not just on which worlds a statement holds in, but on what in reality explains it. Now here is the key contrast with ordinary possible-world logic. Possible-world logic says: two sentences are equivalent if they are true in exactly the same possible situations. Truthmaker theory says: two sentences can be true in exactly the same possible situations, yet be true for different reasons. Let’s make this even clearer. Compare:
“A and A.”
“A.”
These are logically equivalent: if A is true, both are true; if A is false, both are false. Possible-world logic says: no difference. But Fine says “A and A” is true because A is true. It is true in virtue of A, once. The duplication in wording does not add a new truthmaker; the truthmaker for “A and A” is just the truthmaker for A. The sentence has extra structure, but reality does not duplicate itself to make it true twice. Truthmaker thinking asks: what in the world must exist or obtain in order for this statement to be true? With the Superman examples, this becomes intuitive: “Superman can fly” is made true by his powers; “Superman works at the Daily Planet” is made true by his job; “Superman is Clark Kent” is made true by an identity fact. Different truths, different truthmakers. And sometimes, and this is the really important part, different sentences can be necessarily equivalent yet have different truthmakers. That is why Fine thinks truthmakers reveal differences that possible-world equivalence hides. So in very simple terms: possible worlds ask where a statement is true; truthmakers ask what makes it true. Fine thinks the second question goes deeper.
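The “A and A” point can be sketched using the conjunction clause of exact truthmaker semantics (a simplified gloss on my part; s ⊩ φ means that state s is an exact truthmaker for φ, and ⊔ is fusion of states):

```latex
% A state verifies a conjunction exactly when it is the fusion of
% an exact verifier of each conjunct:
\[
s \Vdash A \wedge B \;\iff\; \exists t\, \exists u \,\bigl( t \Vdash A
  \;\text{and}\; u \Vdash B \;\text{and}\; s = t \sqcup u \bigr)
\]
% Setting B = A: any verifier t of A gives t \sqcup t = t, since
% fusion is idempotent, so the exact truthmakers of "A and A" are
% just those of A:
\[
\{\, s : s \Vdash A \wedge A \,\} \;=\; \{\, s : s \Vdash A \,\}
\]
```

This is the formal counterpart of the claim that the duplication in wording adds no new truthmaker: the sentence gains syntactic structure, but reality does not duplicate itself.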
[10] Marshall, Bethan. 2010. “Constructing the Subject: English and the Question of Knowledge.” English in Education 44, no. 3: 213–228.
Addresses construct issues in English as a discipline, closely aligned with my argument about deformation under assessment regimes. Given that I bang on about Macbeth essays it is particularly apt.
[11] Cresswell, Michael J. 2000. “Maintaining Standards: The Role of Awarding Bodies.” Assessment in Education 7, no. 3: 347–358.
Explores standards maintenance and comparability in awarding. Useful for connecting vagueness sensitivity to real examination board practice.
[12] Black, Paul, and Dylan Wiliam. 1998. “Assessment and Classroom Learning.” Assessment in Education 5, no. 1: 7–74.
Landmark study on formative assessment. Essential for linking assessment design to learning consequences.
[13] Sadler, D. Royce. 2005. “Interpretations of Criteria-Based Assessment and Grading in Higher Education.” Assessment and Evaluation in Higher Education 30, no. 2: 175–194.
Critical examination of criterion referencing. Directly relevant to my critique of additive rubric assumptions.
[14] Sadler, D. Royce. 1989. “Formative Assessment and the Design of Instructional Systems.” Instructional Science 18, no. 2: 119–144.
Introduces the idea that quality judgement depends on access to standards and tacit discrimination. Important bridge to connoisseurship.
[15] Cresswell, Michael J. 2008. “The Role of Professional Judgement in Awarding.” Ofqual working paper.
Addresses professional judgement in standard setting. Helpful for legitimising structured connoisseurship.
[16] Kane, Michael T. 1992. “An Argument-Based Approach to Validity.” Psychological Bulletin 112, no. 3: 527–535.
Provides a warrant-based framework for validity claims, aligning well with my “structural warrant” moderation idea.
[17] Marshall, Bethan. 2004. “English Teachers’ Conceptions of the Role of Literature in English.” Changing English 11, no. 2: 231–242.
Explores how teachers conceptualise the construct of English, relevant to my construct deformation argument.
[18] Very simply, “extensional” means: only the final result matters, not how you got there. Two things count as the same if they are true in exactly the same situations, or give the same answer, even if they are written differently or reached by different reasoning. For example:
“Clark Kent is Superman.”
“Superman is Clark Kent.”
Extensionally, these say the same thing: they are true in exactly the same circumstances, so from an extensional point of view there is no difference between them. Or in maths:
2 + 2 = 4
1 + 3 = 4
Extensionally, these are the same because they give the same result, 4; the route does not matter. Fine criticises purely extensional thinking because it ignores structure. It treats A and A ∧ A as the same, since both are true in exactly the same situations. But Fine says there is a difference, because the second has extra internal structure: it repeats A, and that structure can matter when we ask about explanation or grounding. So: on the extensional view, only the outcome matters; on Fine’s view, internal structure can matter too.
[19] Glaser, Robert. 1963. “Instructional Technology and the Measurement of Learning Outcomes.” American Psychologist 18, no. 8: 519–521.
One of the early anchors in the criterion-referenced tradition. Useful for tracing how criterion referencing became operationalised in ways that smuggle in additivity.
[20] Eisner, Elliot W. 1976. “Educational Connoisseurship and Educational Criticism: Their Form and Functions in Educational Evaluation.” Journal of Aesthetic Education 10, no. 3/4: 135–150.
Classic article introducing the concepts. Connoisseurship is the art of appreciation; criticism is the art of disclosure. Both are necessary for valid educational judgement.
Eisner, Elliot W. 1991. The Enlightened Eye: Qualitative Inquiry and the Enhancement of Educational Practice. New York: Macmillan.
Explains connoisseurship as a form of expert perception. Eisner argues that educational evaluation depends on the trained eye that can notice subtle qualitative features that standard metrics miss.
[21] Marshall, Bethan. 2000. “Constructing the Subject: English in the National Curriculum.” Changing English 7, no. 2: 199–209.
Explores how assessment regimes construct the subject itself, in this case, English. Important for construct validity debates.