AN APPLICATION OF SORENSON’S SOLUTION TO VAGUENESS TO EDUCATIONAL GRADING ASSESSMENT PARADIGMS
Submitted for the degree of Doctor of Philosophy
Richard Henry Marshall

Department of Educational Foundations and Policy Studies
Faculty of Culture and Pedagogy
Institute of Education, University of London
London WC1H 0AL, UK
January 2011
Richard Henry Marshall 2011
 
DECLARATION
I declare that this thesis is a presentation of my original research work and has not previously been submitted for a degree from any university. To the best of my knowledge, this thesis does not contain any material previously published or written by another person except where duly acknowledged in the text.
Word count (excluding appendices and bibliography): 93,436 words
Richard Marshall

ABSTRACT
The thesis examines how a particular theory of vagueness affects six requirements of any competent educational grading assessment system. It is assumed that all such systems have to be rational, reliable, valid and universally decisive, have to produce superlatives, and have to be simple. The first three are about consistency. The fourth and fifth are about completeness. Simplicity is about understandability, usability and believability. Any examination system is a formal answering system: one that tidies up question and answer systems in a way that allows for the mechanical determination of statements made by the system. Informal systems are natural but according to Sorenson they have two layers of obscurity (Sorenson 2001). One is whether an answer speaks to the question; the second, if it does, is which answer it gives. Vagueness is about the first layer. It is not about what the answer to any question is but about whether an answer speaks to the question at all. The thesis explains six consequences of Sorenson’s epistemic theory: that indeterminism is ignorance, that forced analytic errors are common, that language systematically passes off contradictions as tautologies, that there are infinitely many pseudo-tautologies, that competent speakers ought to be permanently fooled by them, and that precedent justifies functional, massive inconsistency (Sorenson 2001, p68). Its meta-problem is vast incredulity. The thesis examines two assessment paradigms. One is a psychometric paradigm that models itself on abstract science and Behaviourism. The other is a hermeneutical approach involving a species of conventionalism which assumes that meaning is use, open texture and ambiguity, that language contains no hidden meanings, that meanings are known either a priori or a posteriori, and that answer systems can shift from asking ‘what is the answer?’ to ‘what should count as an answer?’ Vagueness adds sincerity as a seventh significant new constraint on any educational grading assessment system.
 
TABLE OF CONTENTS 

  • Introduction
  • Introduction to Vague Assessment
  • Introduction to Grading and Assessment
  • The Vagueness of Grades
  • The Riddle of the Vague Grade
CHAPTER 2: INTRODUCTION – THE RIDDLE OF THE VAGUE GRADE
  • Introduction
  • Cresswell’s Vagueness
  • Credibility as Motivation for Universal Decisiveness
  • Stipulation
  • Conclusion: Sincerity
CHAPTER 3: THE SCIENTISTIC PRECISION OF THE PSYCHOMETRIC IDEAL
  • Introduction
  • Scientism as an Assumption that Logical Equivalence Implies Truth Equivalence
  • Psychometric Assessment and Ambiguity
CHAPTER 4: THE DECLINE OF PHYSICS ENVY?
  • Introduction
  • A New Paradigm
  • The Skinner Box
CHAPTER 5: CONNOISSEURSHIP
  • Introduction
  • Criteria and Reason Sorting
  • The Macnamara Fallacy
  • A New Voice. A New Paradigm
  • Wiliam’s New Paradigm for Assessment
  • Grading Consensus as Social Fact
  • Intersubjectivity
  • Moss’s Hermeneutical Approach
  • Generalised Trust
  • Issues for Construct Referencing
  • The Function of Fuzziness
  • Intersubjectivity and its Sources
  • Fuzziness Modelled as Relative Vagueness
  • The Incompleteness of Admissible Sharpenings
CHAPTER 6: RAZ, DAVIS AND CONVENTIONALISM
  • Introduction
  • Conventionalism
  • Andrew Davis’s Complaints
  • Kind of Natural Kinds
  • Interpretative Communities
  • Anti-Essentialism
  • Holism and Reflexivity
  • Interactive Kinds
  • The Impossibility of Precise Enough Similarity
  • The Assumption of Necessary Resistance to Essentialist Reification
  • Unpredictability Denied
  • Wittgensteinian Meaning Holism
  • Raz and Value Objectivity
  • Objectivity as Absent Partiality
  • Raz’s Domain Objectivity
  • Objectivity, Disagreement and Falsehood
  • Parochial Concepts and the Objectivity of Perspectivism
  • Raz’s Anti-Convergence Argument
  • Epistemic Luck
  • Inventing Values
  • Accessing Values
  • Universal Values
  • The Seven Constraints on Construct Referencing
CHAPTER 7: LEGAL VAGUENESS
  • Introduction
  • Endicott’s Legal Vagueness
  • The Irremovable Indeterminacy of Law
  • Endicott’s Good Soup Argument Against Williamson’s Epistemic Solution to Vagueness
  • Pragmatic Vagueness
  • Similarity Supplants Boundary Modelling of Vagueness
  • The Incoherence of Semantic Autonomy
CHAPTER 8: THE SINCERITY AND PURPOSE OF VAGUENESS
  • Introduction
  • Hart vs. Dworkin
  • Discretion as the Function of Vagueness
  • Penumbral Cases
  • Insincerity
  • Family Resemblance
  • Dworkin’s Criterial and Interpretive Concepts
CHAPTER 9: SORENSON’S ABSOLUTE VAGUENESS
  • Introduction
  • Dogmatic Assertion of Superlatives – The Educational Context
  • Vagueness and a New Research Programme for the Philosophers of Language
  • Sorenson’s Absolute Vagueness
  • The Dogmatic Falsehood of Language
  • The Motivation for Retaining Classical Logic
  • Opinionated Spotty Knowledge
  • Science not Scientism
  • Sorensonian Futility
  • Absolute Futilitarianism
  • The Wrong Type of Borderline
  • Two Kinds of Proof
  • What Vagueness Tells us to Believe about Language
  • Conventionalism
  • Vague Assessment
  • Decisiveness, Dogmatism, Bureaucracy and Obscurity, and Validity
  • Is Vagueness All in the Mind?
  • Vagueness isn’t Meaninglessness
  • Analytic/Synthetic
  • Williamson’s Relative Vagueness
  • Spotty, Untidy and Scruffy: What Real Knowledge Looks Like
  • Forced Analytic Error
  • Grading and Inconsistent Machines
CHAPTER 10: ANTI INCREDULITY
  • Introduction
  • The Absolute Unknowability of Vague Borderlines
  • Fodor and LOT
  • Sincerity
  • Impossible Objects
  • Supertasks Etc
  • Grades and Spectra
  • The Difference Made by High Stakes
CHAPTER 11: CONCLUSION
  • Introduction
  • To Silence
REFERENCES

 
 
ACKNOWLEDGEMENTS
 
PREFACE

Chapter 1
The introduction outlines why assessment is increasingly important in education. The rise of high stakes has meant that there is increasing pressure on assessments in summative examinations to produce valid and reliable results. A key element is the way assessments decide boundaries between different grades, especially at those boundaries where the stakes are particularly high. The chapter begins by contextualising the issue, using the recent assessment history of the UK as an example of the general issues. The chapter then argues that there are philosophical difficulties with identifying boundaries. A particular and virulent type of difficulty concerns what philosophers have labelled vagueness. In this way the thesis introduces a rather abstract and seemingly rarefied idea from academic philosophy into the heart of a very practical and non-trivial issue. The introduction then aims to briefly sketch the main argument of the thesis. There are three elements. Firstly, the thesis is concerned to outline what philosophers think vagueness is. Although the literature on this subject is vast, there is only one explanation of vagueness that seems to fully explain it without reverting to some sort of logical deviancy. As might be expected with such a subject, the issue isn’t quite as clear cut as this suggests, but reasons for preferring this position are argued for. Secondly, the thesis attempts to analyse two competing ways of making summative assessments. This may strike assessment experts as too schematic to be relevant. Assessment systems are highly refined and various. There are certainly more than two theories of assessment. There are certainly more than two actual systems in the world. However, I argue that the decision is defensible. I argue that the two assessment systems considered are the dominant prototypes for current thinking about summative assessment. I argue that considering the philosophical issues of vagueness in relation to these two prototypes gives a clear picture of what the issues are for both approaches. I suggest that any assessment system will involve variations on one or both of the prototypes and so the discussion will generalise to mongrel cases. The third element is the recognition that although in practice one of the paradigms, labelled the ‘psychometric’ prototype, is still widely used, especially in the United States and in the context of high stakes, the second paradigm, labelled the ‘hermeneutical’ paradigm, is widely regarded as having better theoretical credentials. That the first paradigm is considered theoretically redundant is largely to do with arguments that are independent of direct considerations of vagueness. The scientistic vision that I argue this prototype instantiates is largely discredited because any social science claiming the kind of reliability and validity of the abstract sciences (physics) lacks credibility. The term ‘scientistic’ is therefore a term of disparagement, indicating the hubristic self-image involved in such comparison. However, the second paradigm, though very much thought of as a powerful, successful and viable alternative generally, is not able to credibly supply fine grained decisions at grade boundaries for high stakes. Put succinctly, the suggestion is that the powerful and successful theories of language and thought embedded in this prototype can’t justify the fine-grained distinctions at borderlines that high stakes assessments require.
So this third element in the thesis is a discussion of how these two prototypes are used in tandem in education and whether this cross-fertilisation is a genuine option in dealing with the vagueness inherent in all assessments. In this first chapter these issues are discussed in a general way.

Chapter 2
The chapter examines the first of the prototypical approaches to assessment, labelled the psychometric paradigm, an approach that has been dominant in high stakes assessment grading systems in recent years. It instantiates the dream of ‘physics envy’ exemplified by thinkers like Carnap and Quine (Carnap 1950, Quine 1951). This approach is labelled a scientistic approach because it tries to model a scientific paradigm of objectivity associated with abstract science. Wiliam sees a close connection between this and psychometrics, hence the label (Wiliam 1994). It is represented as a system that eradicates vagueness but at the expense of validity. The expressiveness of language produces vagueness, and restricting the expressiveness of grading to sharp borderlines through a rigid bureaucratic answering system removes the human face of language and thought from assessment. It is a system that denies the scruffy nature of knowledge. It adopts a false idea about language as being imprecise and inconsistent and therefore systematically goes about remedying its imprecision. It assumes that its indeterminism is about ambiguity and so many of its attempted remedies are disambiguations. This links it with supervaluationist attempts to understand vagueness as a hyper-ambiguity (Lewis 1999). Williamson’s version of Epistemicism is also connected with such an approach, where the ignorance of vagueness is ignorance of which language we are using. Absolute vagueness as modelled by Sorenson shows that the attempt to remedy language is wrong-headed. Language is able to express anything. It is complete. It is fully determinate. Its concepts have sharp boundaries. This system wrongly mistakes the appearance of indeterminacy for a true representation of language. It therefore develops a system that conceptually violates natural language. In doing so it fails to do justice to the expressiveness of language. It fails to do justice to the limitations on what we can know; in particular it misunderstands the unknowability of absolutely precise borderlines. Again, it is a system that doesn’t cope with the first layer of obscurity: whether an answer speaks to the question.
Chapter 3
The chapter examines the alternative to the old paradigm discussed in chapter 2, labelled the hermeneutical paradigm. Criticism of the scientistic paradigm of education focused on its failure to adequately model the expressiveness of language and accused it of being both invalid and distorting of pedagogical practices (Gipps 1994, Wiliam 1994, Resnick and Resnick 1992, Berlak et al 1992, Goldstein 1992, 1993). The key points of criticism involved attacking assumptions of decontextualised learning, assumptions about the universality of results based on such assessments, assumptions about the reality of the construct traits that the approach assumed were available for such testing, assumptions of the uni-dimensionality of these constructs, its ignoring of the incommensurate nature of tested items, and assumptions that test performance was totally individualistic. It shows that historically these criticisms were the basis of change in educational assessment. It is linked to developments of criticisms found within a broader psychometric community in the 1950s, exemplified in a dispute between Thurstone and Burt. In education, Glaser’s introduction of criterion-based assessment shifts the paradigm (Glaser 1963, Wood 1986). The chapter examines how the presumed objectivity of the scientistic paradigm was replaced by an intersubjective paradigm found in various theories of language such as social constructivism, constructivism and constructionism (Wiliam 1994, Vygotsky 1987, Rogoff 1990, Gredler 1997, Prawat & Floden 1994, Lave & Wenger 1991) which emphasised the role of social norms in the construction of knowledge and belief. I think these are all versions of conventionalism, which holds ‘…that conventions are ‘up to us’, undetermined by human nature or by intrinsic features of the non-human world’ (Riscoria 2010, p1). Assessments that broadly draw on these theories for their assessment models are labelled Cognitive Diagnostic Assessments (CDAs). This is discussed. I discuss the first of two models of conventionalism that I think underpin these approaches in educational assessment. One model of conventionalism is Millikan’s idea that conventions aren’t underpinned by rational beliefs. This contrasts with another model, discussed in the next chapter, which thinks that conventions do require rational beliefs (Hume 1777/1975; Lewis 1969; Searle 1969; Sellars 1963). The chapter argues that the motivation for adopting a version of CDA and conventionalism is an attempt to maintain the expressive flexibility and open-endedness of language in opposition to the scientistic paradigm. The chapter then suggests that there are two divergent approaches to CDA. The chapter thinks that CDA has sometimes attempted to constrain the role of judgment in evaluations. This implies an approach like Millikan’s conventionalism (Wiliam 2000, Leighton & Gierl 2007, Leighton et al 2010). This approach is summarised by Wiliam when he writes, ‘To put it crudely, it is not necessary for the examiners to know what they are doing, only that they do it right’ (Wiliam 2000, p10). The chapter considers attempts to use the hermeneutical paradigm to include judgment and to model more clearly than CDA the ‘hermeneutic turn’ in assessment that Wiliam identified as being common to all modern assessment systems (Wiliam 1994).
The approach takes a conventionalist view of assessments, understood in a way that assumes that ‘…assessments … make explicit the test developer’s substantive assumptions regarding the processes and knowledge structures a performer in the test domain would use, how the processes and knowledge structures develop, and how more competent performers differ from less competent performers…’ (Nichols 1994, p578). The new paradigm no longer merely requires content specifications to describe its objectives because ‘…efforts to represent content are only vaguely directed at revealing mechanisms test takers use in responding to items or tasks’ (Nichols 1994, p585). Inferences from tests are required to understand and reflect differences between groups, changes over time and processes of learning (Cronbach & Meehl 1955, Kane 2001).

Chapter 4
This chapter takes two powerful proponents of the hermeneutical paradigm, one working in law and the other a leading philosopher of educational assessment, to show what practitioners of the theory take to be its limits. I argue that although there are large areas of continuity between them, Davis is suspicious of any attempt to conclude that the approach can justify the fine grained borderline decisions required for high stakes assessments. Raz is less pessimistic. This chapter examines Raz’s theory of ‘parochial concepts’ as a way of reconceptualising objectivity in the domain of legal jurisprudence (Raz 1999). Raz thinks that subjectivism is criticised because a key element, perspectivism, is linked to relativism. He thinks he can remove relativism. Davis thinks the new paradigm for assessment is still too infected with assumptions imported from the earlier paradigm. Both are used to help navigate the geography of the new paradigm and help exemplify the issues.

Chapter 5
This chapter looks at how both education and law have pragmatically responded to the challenge of vagueness. Both have done so by assuming that vagueness can’t be solved theoretically and that a pragmatic heuristic is therefore the correct response to the genuine challenge of indeterminacy posed by vagueness. Mike Cresswell is important for two reasons in respect of the issues of the thesis. Firstly, he has written an important and explicit theoretical paper about high stakes educational assessment and vagueness. This is rare. Secondly, he is actually in charge of the largest examination board in the UK, which means that his philosophical pragmatism is not just a theoretical ideal but a practical reality. The chapter examines his approach. The chapter then provides evidence that in law a similar model for legal vagueness is theorised by Timothy Endicott (Endicott 2000). The chapter shows how Endicott’s theory is very similar to Cresswell’s, adopting a model of similarity to prototypes to avoid the geometric metaphor of sharp borderlines. He thinks of his approach as Wittgensteinian and cites with approval Sainsbury, a philosopher Cresswell also finds influential (Wittgenstein 1954, Sainsbury 1990). The chapter examines the role of interpretation in the theory (Marmor 1992, Schauer 1991, Dworkin 1997, Kelsen 1991) and finds that Endicott’s model contains the same fundamental errors as Cresswell’s and those modelling construct referenced assessments. The chapter concludes that the meta-problem of vagueness, that of incredulity in the face of its solution, is more pervasive than just affecting educational high stakes judgments.
Chapter 6
This chapter takes a brief detour to show how the hermeneutical paradigm need not think of vagueness as an insoluble dilemma. Dworkin thinks that vagueness is trivial. He denies the denial of universal decisiveness. In order to examine the claim I look at the dispute between Ronald Dworkin and HLA Hart in the field of jurisprudence (Hart 1962, Dworkin 1986). The chapter shows how the attempt to model borderlines as a way of granting discretion is relevant for high stakes assessments in education too. In doing so it raises issues of sincerity involved in making judgments in borderline cases, applying the definition that “A lie is a statement made by one who does not believe it with the intention that someone else shall be led to believe it” (Isenberg 1964, p466).

Chapter 7
This chapter sets out the particular view of vagueness that I apply to educational assessment grading for high stakes. This is Sorenson’s ‘epistemic’ solution. Unlike the assumption embedded in the hermeneutical paradigm, Sorenson follows Timothy Williamson in claiming that there is a simple logical solution to vagueness. It follows that borderline decisions in high stakes assessments are impossible. The simple pragmatic heuristics of education and law result in dissembling. The chapter attempts a schematisation of any good assessment system in order to make the issues clear. I think the two dominant prototype paradigms have six requirements. I generalise and claim that they are required by any competent educational grading assessment system.
1. They have to be rational.
2. They have to be reliable.
3. They have to be valid.
4. They have to be universally decisive.
5. They have to produce superlatives.
6. They must be simple.
The first three are about consistency. The fourth and fifth are about completeness. Simplicity is a constraint on how the whole system is understood and whether it is usable. It also supports believability. A formal answering system is one that tidies up question and answer systems in a way that allows for the mechanical determination of statements made by the system. Informal systems are natural but according to Sorenson they have two layers of obscurity. One is whether an answer speaks to the question; the second, if it does, is which answer it gives. Vagueness is about the first layer. It is not about what the answer is to any question but is about whether an answer speaks to the question. The chapter then explains the five elements of Sorenson’s epistemic position of absolute vagueness.
1. Forced analytic errors are common.
2. Language systematically passes off contradictions as tautologies.
3. There are infinitely many pseudo-tautologies.
4. Competent speakers ought to be permanently fooled by them.
5. Precedent justifies this functional, massive inconsistency (Sorenson 2001, p68).
Sorenson’s Epistemicism solves the sorites puzzle that characterises vagueness. This thesis is about the way this theory interacts with assessment theories. It shows that commonly held beliefs about language and assessment are not able to model vagueness properly and are therefore false. It is suggested that once absolute vagueness is understood, its implications lead to a meta-problem of vagueness, that of incredulity. This is a more insidious and intractable problem to solve than vagueness itself. Vagueness is solved using very little technical logical apparatus and a modicum of common sense. It is simple. The incredulity it brings about is vast.
The solution requires that many commonly held beliefs be revised. The chapter discusses the ones that most interest educational grading assessments. The chapter concludes by summarising the position. The failure of current assessment systems to model absolute vagueness at best makes them incomplete and at worst wrong.

Chapter 8
This chapter returns to the meta-problem of vagueness, the incredulity of people in the face of its solution. It recognises that the consequences for assessment as examined in previous chapters are large enough to make people refuse to accept the proof, despite its logical validity. Firstly, the chapter addresses the thought that incredulity is based on thinking that vagueness implies an irremediable breakdown in communication. The chapter argues that vagueness is not a function of miscommunication by showing that if absolute vagueness is true then it must be true of a Language Of Thought (LOT) as modelled by Fodor too (Fodor 1983). It shows that, because vagueness is not a type of ambiguity as some theorists think, assessment systems that attempt to disambiguate will still be left facing vagueness afterwards. Then it argues against the idea that incredulity may be based on resisting the accusation that decisiveness about absolute borderline cases is a form of lying. The chapter examines sincerity and stipulation and shows that the accusation is justified. The chapter draws further distinctions between relative and absolute borderline cases. It discusses the way absolute vagueness might be avoided by graders. It challenges Wiliam’s reading of Austin in developing his theory of assessment (Austin 1962, Sorenson 2001b). It examines ways vagueness might be confused with various forms of ambiguity. This may motivate the thought that absolute vagueness can’t be true, which would lead to incredulity at the solution to the sorites puzzle, motivated by the belief that disambiguation is usually possible, at least in principle. The chapter examines various ways in which we have to accept unknowability in order to suggest that the unknowability thesis is not a unique case. The chapter examines supertasks as examples of epistemic hostility (Earman and Norton 1996; Laraudogoitia 2009; Black 1950; Benacerraf 1962; Thomson 1954, p55). It examines impossible objects as another kind of resistance to enquiry (Marcus 1981, Stalnaker 1984). It examines Moorean counterprivacy as another example (Moore 1950).

Chapter 9
Absolute borderline cases exist. Neither of our educational grading systems acknowledges this. I have argued that Sorenson’s solution to the sorites puzzle requires that we are condemned to ignorance about grade boundaries. Sorenson’s epistemic theory solves the puzzle of vagueness. In doing so it creates a meta-problem. Upon hearing the solution, hearers are incredulous. The proof of the solution is logically valid. To deny the solution would require a rejection of the classical logical system. Sorenson considers this too expensive a solution. Instead, the beliefs about language held by many philosophers are less essential and so require revising. Two conclusions are drawn. The revision of beliefs about language acquisition suggests a new paradigm not just for educational assessment but for education more generally. Indeed, it generalises beyond just educational high stakes assessments. The theoretical agreement between leading theorists of vagueness in education and law supports this view.
The second is that applying vagueness to assessment suggests that sincerity is a further, seventh, constraint on any assessment. This is perhaps the more significant conclusion of the two, as it makes explicit the requirement that every assessment grade is assertable by whoever awards the grade. It implies that the pragmatic practice of hybridisation suggested by Cresswell, where both hermeneutical and psychometric paradigms are used, is replaced by judgments being made in all cases. In order to do this, adjusting assessments to avoid vagueness will be required. Avoidance is vague. The tolerance of vagueness (and the lies it brings about) will be a matter of calculating degrees of acceptability, not a return to the psychometricians.
 
 

CHAPTER 1: INTRODUCTION - THE RIDDLE OF THE VAGUE GRADE

Alfred Pennyworth: Know your limits, Master Wayne.
Bruce Wayne: Batman has no limits.
Alfred Pennyworth: Well, you do, sir.
Bruce Wayne: Well, can't afford to know 'em.
(The Dark Knight, 2008)
 
1.1 INTRODUCTION
Education is increasingly being dominated by its assessment systems (Hargreaves 2005; Harlen 2005; Clarke 2005; Butler 1988; Black 2001; Black and Wiliam 2003). Stakes for success and failure are high. There are clear and measurable impacts on the average well-being of individuals linked to educational success and failure. There are sociological reasons for the increased importance of educational achievements. At least one major sociological theory of nationalism links modernity to universal education in order to induct a state’s citizens into a higher culture that is the universal idiom of science, technology and knowledge (Gellner 1973; 1983). Gellner writes: ‘The employability, dignity, security and self-respect of individuals, typically, and for the majority of men now hinges on their education; and the limits of the culture within which they were educated are also the limits of the world within which they can, morally and professionally, breathe. A man’s education is by far his most precious investment, and in effect confers his identity on him’ (Gellner 1983, p36).

If modern societies require an increasingly homogenised standard of education for all, then differentiations in order to produce positional gain will be found in the generalised educational arena. Small differentiations are all that are required to achieve the necessary differentiation. A single mark in a single examination can determine huge differentials. In the UK it is possible for a sixteen year old to fail her Mathematics GCSE by a single mark, even if she has A grades in nine other GCSEs, and therefore be barred from university. Calculations of average earnings, life expectancy and general well-being indicators suggest that on average a person going to university is richer, lives longer and is more well than the average person who doesn’t. Consequentialism provokes the thought that such a fine grained distinction is unjust. Justice requires distinctions be proportionate. A small difference cannot lead to such a huge differentiation justly, according to this theory.

The problem can be recast as a problem of borderlines. The idea of a borderline being so finely drawn to the unit of a single mark suggests that the issue is not one of borderlines per se but of sharply drawn borderlines. The prescription of sharp borderlines is a function of the growth of the populations being assessed and the high stakes being applied to this expanded group. For example, in the UK it was not until the reforms of the 1980s that conditions for the proliferation of ‘sharp borderlines’ in assessments became linked with high stakes. Before the educational reforms of 1988 only 20% of pupils (mainly in grammar and independent schools) sat the O levels and the rest sat the CSE. CSEs had been introduced in 1963, which had at least ensured that all pupils were entitled to sit some sort of an examination. The raising of the school leaving age to sixteen a few years later, coupled with this entitlement that all could sit an exam, indicates a general trend towards universal educational entitlement predating the 1980s. Reforms have tended towards enacting a more inclusive educational system (AQA 2006, p4). Reforms aimed at bringing about a widening participation in schools. A two-tiered assessment system was replaced by a single tier for all in secondary education. The GCSE examination was introduced to replace the O level and CSE exams to create a unified system.
The 20% who sat the ‘O’ level certificate were those who attended grammar and independent schools, schools that selected their pupils either by the use of an entrance examination, the 11 plus (for grammar schools), or by other entrance examinations set by the independent schools. Although the top grade for the CSE was considered the equivalent of the C grade at O level, it was assumed that those sitting the O level were more able than those sitting the CSE. The different exams reflected the class system that ran through the educational system, and although participation in education was now greater than ever before there was still an assumption running through it that high educational achievement was only appropriate for a minority of people, mainly those who attended the grammar and independent schools.

The introduction of the GCSEs in 1984, which were first assessed in 1988, was a bold attempt to dismantle the two tier system and develop a universal assessment for all pupils using the same scale of attainment. Boards of the O level and the CSE were asked to deliver the assessments and the work would be monitored by the Schools Education Council. For the first time there would be national criteria which would be used to develop the syllabuses of each exam subject and fix the grades. From the very beginning it was understood that it was important for there to be reliability and validity of grades and procedures if the public were to be confident in the new system. Grades were to be on a common seven point scale, from A to G. Below a G a candidate would be unclassified and receive a U grade, which would not appear on any certificate. Criteria-related grades were to be introduced as soon as possible. Grades were developed so that a C grade at GCSE was to be the equivalent of a C grade at O level and a top grade CSE. There were 20 subject specific criteria for nearly all subjects which made clear what was to be assessed, and how these objectives were to be assessed. For many of the subjects a coursework element was introduced. Different weightings were placed on the coursework elements for different subjects. Subjects where what was to be assessed was not easily or well assessed in a short paper examination used the coursework element to assess those elements. Subjects like PE and Music used coursework a great deal in this way.

The new examination was a success in terms of promoting staying-on rates for post-16 education. Whereas between 1982 and 1987 the proportion of 16 year olds who stayed on in full time education was between 47% and 52%, after that it rose steadily. In 1993 it was 73% and by 2003 it had levelled off to between 70% and 72%. This increased democratic provision reflected an alternative point of view about the potential of the children in the schools. It seemed to encourage the idea that potentially anybody who was prepared to work hard was entitled to educational success. It established an argument against selection and division based on old prejudices and sat well with the then Tory government’s idea of a meritocracy. The result of the changes was a focus on standards and on accountability. These were to be patrolled and ordered through summative tests. One unforeseen consequence of this emphasis was the development of precision in testing. Borderlines were required to be sharp in order to fulfil the requirement that assessments generated confidence. There is much in the literature about the negative effects of narrowing assessments to accommodate this requirement.
Harlen has written persuasively of the detrimental effects on learning and school culture. ‘Throughout the 1990’s, evidence was accumulating of the detrimental effect of frequent testing on students’ enjoyment of school, their willingness to learn, other than for the purposes of passing tests or examinations and their understanding of the process of learning’ (Harlen 2005). Black has written about the lack of evidence for thinking that learning was being improved by approaching tests in such a way (Black 2001). Wiliam writes of the incoherence of the tests as they developed in line with the accountability model (Wiliam 2003). Criticism of sharp borderlines is part of a general critique of the assessment system that appears to violate the proportionality-principle constraint on justice. Sharpness is being understood here as ‘too sharp for justice.’ This suggests that sharpness is a matter of degree. This in turn implies a scale, running from sharp to not sharp. If we label not-sharp as blurry then blurry borderlines, like sharp ones, can be degree variable. We want blurry enough for justice. It is at this point that the rarefied philosophical analysis of borderlines becomes important. The analysis has a long history and is labelled in the modern era ‘the problem of vagueness.’ Vagueness is therefore at the heart of concerns over justice and universal educational high stakes assessment wherever the use of sharp borderlines is introduced.

1.2 INTRODUCTION TO VAGUE ASSESSMENT

Vagueness can be presented as a species of slippery slope argument. Slippery slope arguments, ‘…seemingly innocuous when taken in isolation, may yet lead to a future host of similar but increasingly pernicious events’ (Schauer 1985, p361-2). Walton thinks there are three basic types (Walton 1992). The sorites is his second type, in between the wedge type and the domino type. He thinks that the ‘… basic form of sorites is simple and elegant. If one takes one grain of sand away from a heap, it’s still a heap. Repeat multiple times, each time removing a single grain. But although each individual removal does not move the heap into the realm of unheapness, eventually one is left with nothing. The problem is that “heap” is a vague term, and it is impossible to draw an objective line between “heapness” and “non-heapness”’ (Walton 1992, p37-38). Drawing a sharp borderline seems impossible to Walton and to many theorists of vagueness. These philosophers argue that slippery slopes are not defeated by a sharp borderline but rather that our language and thought tolerates small differences which eventually grow too large and become intolerable. The blurriness is what collects over the tolerance period. Volokh insists on focusing on the practical aspect of slippery slope arguments (Volokh 2003). Law is interested in these kinds of argument (e.g. Sternglantz 2005; Schauer 1985; Meulemanns 1999; Lamb 1988; van der Burg 1991). Economists also think they are important (e.g. Rizzo & Whitman 2003). Usually slippery slope arguments are rhetorical devices used to change something by showing that a present course will end badly. Roy Sorenson thinks that ‘…the point of exhibiting a slippery slope is to influence a decision. Usually this is done by presenting a slope that has a bad bottom. The arguer tries to dissuade us from taking the first step that will send us tumbling to the bottom. ...Hypothetical slippery slope arguments dissuade by convincing the audience that an apparently acceptable state will lead (by degrees) to an obviously unacceptable state. Once the audience assents to this consequence, the choice becomes an all or nothing affair’ (Sorenson 1988, p398-438). Lithwick shows the importance of these arguments in motivating legal disputes: ‘Anyone else bored to tears with the “slippery slope” arguments against gay marriage? Since few opponents of homosexual unions are brave enough to admit that gay weddings just freak them out, they hide behind the claim that it’s an inexorable slide from legalizing gay marriage to having sex with penguins outside JC Penney’s’ (Lithwick 2004). There are two comments to make about understanding vagueness as a slippery slope. Firstly, the slope metaphor can help grasp some of the difficulties of the puzzles of vagueness but it might mislead too. Volokh makes this point when he writes, ‘The slippery slope is in some ways a helpful metaphor, but as with many metaphors, it starts by enriching our vision and ends by clouding it’ (Volokh 2003, p1137). Secondly, vagueness is usually mishandled. Most experts in the field, including Walton above, retreat from the consequences of a simple logical proof that solves the sorites puzzle. This puzzle is the definitive puzzle for vagueness. Sorenson doesn’t retreat. He is one of the brave few.
‘Retreat is often wise… But we should not retreat from standard logic to rescue speculative hypotheses about how language operates. Change in the web of belief should be made at the most peripheral portion available. Beliefs about how language works are far more peripheral than beliefs about logic’ (Sorenson 2001, p8). So rather than retreat, Sorenson holds to the logical and epistemological arguments for Epistemicism along with early pioneers of the position (e.g. Cargile 1969, Campbell 1974, Scheffler 1979 and Williamson 1994). He refines the position into one of absolute vagueness. Applied to assessment grading in education, vagueness isn’t a rhetorical device but a feature bearing on the requirement of threshold decisiveness. All assessment grading systems require universal decisiveness at sharp thresholds between grades. They require the identification of exactly where thresholds begin and end. A precise threshold between grades would be a fine-grained small difference that vagueness seems to deny (because it seems to imply that only large differences can mark a change of threshold). Vagueness then strikes at the heart of what a good assessment system should be able to do. It makes universality of decisiveness impossible because universality would require identifying differences within any range of acceptable tolerance. High stakes educational assessment commonly faces decisions about thresholds or borderlines. Indeterminacy of borderlines would threaten the integrity of assessments. Doubt as to whether any line has been correctly drawn inevitably raises questions about the competency of the assessment system causing such doubt.
Policing thresholds is a key element of any plan arranging assessments in education, and, as with all policing, it comes at the price of imposing constraints. Vagueness’s riddling, paradoxical nature threatens to bring only the fairness of anarchy to any system. The phenomenon is prototypically introduced through examples. Achille Varzi finds a passage in Saul Bellow’s novel ‘Herzog’ helpful in this respect: ‘Remember the story of the most-most? It’s the story of that club in New York where people are the most of every type. There is the hairiest bald man and the baldest hairy man; the shortest giant and the tallest dwarf; the smartest idiot and the stupidest wise man. They are all there, including honest thieves and crippled acrobats. On Saturday night they have a party, eat, drink, dance. Then they have a contest. “And if you can tell the hairiest bald man from the baldest hairy man-we are told-you get a prize”’ (Varzi 2001, p135; Bellow 1964, p295-296). The absurdity of the ‘Most-Most Club’ develops out of two competing assumptions that seem equally balanced: the thought that if there is something then there must be a point where it stops, and the equally compelling thought that there isn’t any such point between, say, a bald man and a non-bald man. The thesis assumes that any grading system for high stakes assessment is a ‘Most-Most Club’. Its absurdity generalises.
The sorites puzzle seems to deny the possibility that superlatives exist (e.g. the cleverest stupid candidate) and it seems to deny the possibility of universal decisiveness in borderline cases because it seems to deny the possibility of sharp borderlines. The vagueness literature includes considerations of how the absurdity might be resisted. A common response is to think that instead of having two categories, ‘clearly bald’ and ‘clearly not bald’, a third is proposed, ‘borderline bald’ and ‘borderline not bald.’ The trouble is that proposing a further borderline merely multiplies the initial problem. Distinguishing the clear cases of ‘borderline bald’ from ‘borderline borderline cases of bald’ repeats the initial problem and, for consistency’s sake, requires repeating the solution ad infinitum. This is the phenomenon of ‘higher order vagueness’ and the infinite regress is taken to show that proposing revisions of classical bivalent logic to introduce alternative many-valued logics is no solution to the problem. Vagueness as discussed in the philosophical literature, then, is characterised as being about indeterminacies of borderlines. The sorites puzzle is vagueness’s puzzle (Dummett 1975; Williamson 1994; Keefe & Smith 1996; Sainsbury and Williamson 1997; Tye 1995; Hyde 2000; Greenough 2003; Graff 2001; Sorenson 2001). It is an argument that begins with a true premise but which, through repeatedly applying an apparently sound logical principle, modus ponens, results in a false conclusion. The prototypical sorites is the problem of the heap, the first version attributed to Eubulides of Miletus (Williamson 1994, Sainsbury and Williamson 1997, Keefe and Smith 1997, Sorenson 2005). The first proposition is the true statement that a thousand grains of sand make a heap. The first induction step is that if a thousand grains of sand make a heap then nine hundred and ninety nine grains of sand make a heap. Repeated application of modus ponens leads to the false conclusion that one grain of sand makes a heap. Varzi’s example is more vivid: ‘(1) Upon removing 1 hair, the count of Montecristo is still hairy. (2) For every n: if the count of Montecristo is still hairy upon removing n hairs, then he is still hairy upon removing n + 1 hairs. (What difference can a single hair make?) On the other hand, we certainly want to deny the statement (3) Upon removing all hairs, the count of Montecristo is still hairy.’ (Varzi 2001, p2) The philosophical literature explores the issue of the sorites puzzle. Theories of vagueness that think vagueness is caused by indeterminacy resist the obvious solution because of the unbelievable consequences that follow. Wright thinks that the ‘principle of tolerance’ is at the heart of the problem (Wright 1976). This is the principle that says we tolerate insignificant differences, but these add up to significant differences that can’t be tolerated. This explanation takes seriously the apparent indeterminacy of borderline cases. Sainsbury describes the indeterminacy as borderless transition and this is taken at face value to be true (Sainsbury 1991, p167). This is a view shared by Cresswell, who explicitly raises the problem of vagueness in high stakes educational assessment (Cresswell 2003). If the sorites puzzle is explained by denying the existence of a borderline then systems of adjudication that require discovering a line are fatally compromised. There are various ways in which this problem is dealt with.
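Before turning to them, the structure of the argument just described can be set out schematically. What follows is a minimal sketch in standard logical notation rather than a formulation taken from any of the authors cited: H(n) abbreviates 'n grains of sand make a heap', and the particular numbers are purely illustrative.

\[
\begin{aligned}
&\text{(P1)}\quad H(1000)\\
&\text{(P2)}\quad \forall n\,\big(H(n) \rightarrow H(n-1)\big)\\
&\text{Repeated modus ponens yields } H(999),\ H(998),\ \ldots\\
&\text{(C)}\quad H(1)
\end{aligned}
\]

(P1) is clearly true, (P2) is the tolerance premise that no single grain makes a difference, each step is a classically valid application of modus ponens, and yet the conclusion (C) is plainly false. The epistemic solution discussed later in the chapter locates the fault in (P2).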
One way is to deny the alleged indeterminacy thesis and argue that the principle of bivalence can be universalised. This is a position that can result in the denial of vagueness as well as of the theory of vagueness as indeterminacy. All competent assessment systems for high stakes present themselves as being able to decide all cases. Because of this, assessment systems operate a bivalent system. Bivalence is the principle that every proposition about a candidate’s grade in a high stakes assessment, including borderline cases, can be determined as being true or false.
Vagueness is used here in a technical sense developed by philosophers and concerns borderlines, slippery slope arguments about tolerance of small differences that accumulate into intolerably large differences, and epistemic hostility. According to Sorenson this accounts for less than twenty per cent of uses of the term (Sorenson 2001). The ancient sorites puzzle is the defining puzzle of vagueness (Williamson 1994, Sorenson 2002, Hyde 2000, 2002, Weatherson 2003, Greenough 2003, Hookway 1990, Bobzien 2002). The prototypical example of the sorites puzzle involves a heap of sand. As already noted, the puzzle seems to involve a true premise, the correct application of simple rudimentary logic and a conclusion that is false. The vagueness of ‘heap’ is constitutive of it being a term that is sorites susceptible. As we have also already noted, the feature of ‘higher order vagueness’ is another constituent of the puzzle that has to be solved by any theory (Sorenson 1985, Burgess 1990, Sainsbury 1991, Wright 1992, Williamson 1994, 1999, Hyde 1994, Tye 1994, Graff 2003, Varzi 2003). Higher order vagueness is a structural feature of vagueness. It is the idea that not only are borderlines vague but that the borderline of the borderline is vague as well. Proposed solutions to the vagueness problem have been various. Some argue it is a feature captured by many valued, degree theoretic or fuzzy logic (Wright 1987, 2001, 2003, Haack 1987, Sainsbury 1988-9, Tye 1989, 1994, Kosko 1993, Williamson 1994, 1996, Edgington 1996, Hyde 1997, Weatherson 2002, Field 2003, Varzi 2003). It is an approach that can’t handle the idea that ‘closely similar to dead’ is absolutely different from ‘being dead’. Any substitution of similarity relations for truth relations suffers from this catastrophe. This is a problem for theories of grading that propose similarity to grade prototypes as a way of understanding the meaning of grades and as a solution to how to correctly apply grades to individual cases. Supervaluationism is another approach (van Fraassen 1969, Lewis 1975, 1993, Kamp 1975, Fine 1975, Unger 1980, Tye 1989, Rasmussen 1990, Field 1994, Williamson 1994, 1995, 2002, Fodor and Lepore 1996, Hyde 1997, Schiffer 1998, Varzi 2000, 2001, 2002, Keefe 2000). It is an approach that seems to be about sorting out the ambiguity of different propositions rather than the vagueness that occurs after disambiguation. Vagueness becomes analysed as a sort of hyper-ambiguity (Lewis 1975, Sorenson 2001). This misrepresents vagueness, as vagueness is not the same as ambiguity. Vagueness has also been taken to show that language, thought and perception are incoherent; this is the nihilist response (Goodman 1951, Wright 1975, 1987, 1991, Dummett 1975, Unger 1979, 1979a, Quine 1981, Sainsbury 1988-9, Williamson 1994, Varzi 1995, Graff 2001, Sider 2003). This approach responds to the pervasiveness of vagueness in ordinary natural languages and thought by denying that ordinary language and thought is really a language of propositions. Natural languages, like English and Japanese, are too vague for logical relations to work. Quine thought that only a language of mathematical science, free of vagueness, could actually express any propositions (Quine 1981). Natural languages were rough translations that didn’t really mean anything.
Frege, the father of modern logic, expressed this thought when he wrote, ‘A definition of a concept (of a possible predicate) must ... unambiguously determine, as regards any object, whether or not it falls under the concept (whether or not the predicate is truly assertible of it). Thus there must not be any object as regards which the definition leaves in doubt whether it falls under the concept ... We may express this metaphorically as follows: the concept must have a sharp boundary’ (Frege 1903, S56). Dummett acknowledges that vagueness is spread over language ‘like dust’ (Dummett 1995, p207). Removing vagueness removes so much of our language that the cost of precision is a radical loss of expressiveness (Varzi 2001, p2), yet it is the contention of this thesis that these thoughts drive grading assessment for high stakes towards precision. Frege’s and Quine’s dream of a scientific exact language drives a kind of ‘physics envy’ in educational assessment circles that creates a crisis of expressiveness. This is discussed in the educational literature as the conflict between ‘reliability’ and ‘validity’ (Wiliam 1994, Cherryholmes 1989, Carver 1974, Messick 1980, 1989, Embretson 1983, Cronbach & Meehl 1955). The thesis refers to this project as ‘scientistic’. The approach is often considered to be self-contradictory. Its scepticism makes vagueness itself incoherent alongside everything else, yet it requires vagueness to disprove the coherence of vague terms. It was Dummett’s paper of 1975 that is often cited as the historical catalyst for renewed interest in the sorites puzzle and vagueness (Dummett 1995). His approach was considered broadly nihilistic. Epistemicism is the theory that the thesis thinks best captures non-controversial assumptions about thought, including, crucially, maintaining the principle of bivalence, the law of the excluded middle and the law of non-contradiction. It denies indeterminism (Cargile 1969, Sorenson 1988, 1988a, 1995, 2000, 2001, Williamson 1992, 1994, 1995, 1996, 1996a, 1997, 1999, 1999a, 2000, 2000a, 2001, 2002, Wright 1995, Hyde 1995, Sainsbury 1995, Bobzien 2002, Beall 2002). Williamson and Sorenson are the two main proponents of this view. Sorenson presents a version of epistemic vagueness that makes vagueness absolute, whereas Williamson’s version makes vagueness a condition of the medical limits of human beings. By denying indeterminism it argues that the unknowability of precise vague borderlines is due to human ignorance. For Sorenson this is an a priori necessary ignorance (Sorenson 2001). The appeal of this theory is that it is able to capture the compulsiveness of the sorites paradox. It is a cognitively resistant illusion. The other solutions fail to capture this absurdist aspect of vagueness. Although the logic involved in the sorites paradox is simple and uncontroversial, many philosophers have resisted the obvious epistemic solution, that its second premise (the induction step) has to be false, because that would imply something unbelievable: a heap of sand would stop being a heap at the removal of a single grain; a person would turn from being not bald to bald at the removal of a single hair; there would be a precise second at which a person’s childhood ended, and a precise mark achieved by a candidate that moved them from being valued not good enough to being valued good enough. Back in the ‘Most-Most Club’, there really is a fattest thin man, and a baldest hairy man. In assessment, there really is a stupidest clever candidate. But they are forever unknowably thus.
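The epistemicist diagnosis just sketched can be stated compactly. The rendering below is my own shorthand rather than a formulation taken from Sorenson or Williamson: H(n) again abbreviates 'n grains make a heap' and K(p) abbreviates 'it can be known that p'; a pass mark and a grade predicate can be substituted for the heap without changing the structure.

\[
\exists n\,\big(H(n) \wedge \neg H(n-1)\big) \qquad \text{and, for that } n,\qquad \neg K\big(H(n) \wedge \neg H(n-1)\big)
\]

Classical logic secures the first conjunct, that a sharp cut-off exists; on Sorenson's account the second holds necessarily, so the location of the cut-off is a priori unknowable rather than merely unknown.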
The thesis thinks that this position has great merit because it preserves the idea that every proposition is either true or false and that people are fallible and ignorant and should proceed with humility. It also holds that vagueness is an inescapable part of language and thought and that it is a phenomenon which causes a priori and false analytical beliefs (Sorenson 2001, p59). The compulsion to have such beliefs is psychological, but because they are based in language they are not merely psychological but have normative force. Their normativity is derived from the normativity that is required of any induction into any language. If we are language users then the conditionals that vagueness commits us to are conditionals all language users are committed to (Sorenson 2001, p58). What the thesis discusses is what implications follow from introducing assessments for high stakes as a ‘Most-Most Club’. If vagueness is absolute resistance to knowledge of borderlines then this would seem to be a most dangerous sort of club. Grading involves sorting and ranking candidates fairly. Not knowing where the borderline is between those who have been successful and those who haven’t threatens the very purpose of the system. An odd feature of this is that vagueness is not a new phenomenon. It may well be wondered why this issue hasn’t been treated as a catastrophe for assessments. It is a contention of the thesis that examination of what has been taken to be the problem of vagueness, whenever it has been identified as a problem, has fatally misrepresented the phenomenon. So, for example, a ‘scientistic’ approach has been thought unproblematic but in reality has attempted to make assessment precise whilst not accepting that to do so removes nearly all of the expressiveness required to convey ordinary meanings and thoughts. It has also tended to assume that vagueness is a form of ambiguity, and so the process of precision has actually been one of attempting to disambiguate. The thesis argues that attempts to analyse vagueness as a type of ambiguity are fatally flawed and so any process of disambiguation, no matter how successful, will still be left to face the challenge of the sorites. The alternative to the scientistic approach to high stakes grading is more versatile, psychologically astute and insists on being able to use the expressiveness of language in order to make its judgments. However there is again a further misrepresentation of the challenge that vagueness throws down. The tendency is to deny that language and thought can overreach our own cognitive capacities. This results in a surprisingly stiff-necked approach that denies that, because of our own cognitive and psychological apparatus, there are thoughts that we are compelled to have which are inconsistent and therefore false. The idea of having a priori necessary falsehoods structuring the way we use language and thought is a result of conceiving vagueness as absolutely resistant to cognitive investigation. A grading system is the representation of an impossible object. It is analogous to a colour spectrum, which also represents an impossible object (Ramachandran 1992; Sorenson 2001). A colour spectrum gives a perceptual illusion of borderless transition, where red turns to non-red without there being a precise point where this takes place. Yet red cannot be not-red, so we think there must be a precise borderline. So we are perceptually committed to both not having and having a precise borderline between red and not red.
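The two commitments can be set side by side to make the contradiction explicit. The sketch below is my own schematic rendering, not a formulation from Sorenson or Ramachandran: R(x_i) abbreviates 'patch x_i is red' for an ordered series of patches x_0, ..., x_N running from clearly red to clearly not red.

\[
\begin{aligned}
&\text{What perception suggests:}\quad \neg\exists i\,\big(R(x_i) \wedge \neg R(x_{i+1})\big)\\
&\text{What classical logic requires, given } R(x_0) \text{ and } \neg R(x_N):\quad \exists i\,\big(R(x_i) \wedge \neg R(x_{i+1})\big)
\end{aligned}
\]

The grading case described next has exactly the same shape, with 'clever' in place of 'red' and marks or candidates in place of colour patches.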
A grading system gives a cognitive illusion of similar transition, where clever turns to not clever without there being a precise threshold where this happens. Yet clever cannot be not-clever. So there must be a precise borderline between clever and not-clever. Again, this commits us to a contradiction, and it is one that is unified in one thought. This is not a question of deciding what we mean when we say clever. That would be deciding which language we were speaking. The problem comes after we know that. Sorenson’s solution to vagueness is a logical proof that entails there are precise borderlines and that we have a priori reason for never knowing where they are. The solution separates where many anticipate unity. The approach acknowledges the conundrums of linguistic philosophy found in the later Wittgenstein and yet offers an anti-Wittgensteinian solution. The Wittgensteinian unifies meaning and use. For the Wittgensteinian the meaning of something is exhaustively given in its mode of use, so that if use is vague then its meaning is too. Sorenson accepts that we use most terms vaguely but insists that, contrary to use, the meaning of any vague expression is unknowably precise. This should be taken as a warning that navigating the terrain through the lens of Sorenson’s solution to vagueness can be disconcerting. Presuppositions of familiar arguments are often inverted. His approach separates belief from truth. His argument, which logically implies that vague predicates pick out sharp borderlines, is unpersuasive. Its being unpersuasive is the real issue with vagueness. We know what it is but can’t believe that it is so. He thinks vagueness is a species of ‘blindspot’. Blindspots are cognitively inaccessible propositions. Many, vagueness amongst them, are strikingly weird and appear nonsensical. They fascinated Wittgenstein, who developed a philosophical approach designed to neuter the nonsense. Wittgenstein’s ‘Philosophical Investigations’ argued that understanding meaning as use evaporated the nonsense by showing that apparent beliefs were rooted in misunderstandings about the role of the words in a language game. Sorenson wrote a self-proclaimed ‘…anti-Wittgensteinian study of this kind of nonsense…’ in 1988 and has continued to do so over an ever expanding array of subjects (Sorenson 1988, p1). Although there are serious philosophical arguments concerning theories of meaning and truth involved in this, educationalists without a background in such arguments can grasp the issue easily. If we can’t use precise borderlines in our ordinary language, how is it possible that the language picks them out anyway, unbeknown to us? This is the core thought of philosophers who consider the anti-Wittgensteinian approach baffling. For example, John Burgess (2001, ‘Vagueness, Epistemicism and Response-Dependence’, Australasian Journal of Philosophy, 79: 507-24) attacks the epistemic solution to vagueness by arguing that certain principles or widely agreed assumptions about the ‘metaphysics of content’ are violated by the account. Burgess’s view is summarised by Sider in ‘Epistemicism, Parasites and Vague Names’: ‘Burgess argues that epistemicists owe us a theory of how terms like “clever in English literature” get to have the precise meaning they apparently have given that the facts about use do not seem to generate a precise meaning.’ Why do Wittgensteinians insist that meaning has to be fixed by facts about use?
Because they rightly think that there are not enough facts about 'heap' in the world to decide the content of the word 'heap' – if there were then it wouldn't be vague – they conclude that for paradigmatically vague terms like 'heap' we must rely on facts about use. So the issue for the proponents of epistemic vagueness is: how do they account for the precise content of the term 'heap', and of all other vague terms, when there are not enough facts about use to do this? If epistemic vagueness is true then either there are facts about the use of vague terms which show that having a precise borderline is true, or else what Burgess calls the 'platitudes about the metaphysics of content' must be false when it comes to vague terms.

Returning to assessment and the issue of drawing a precise, fine-grained borderline between grade thresholds, we find that this seemingly rarefied dispute within linguistic philosophy is pertinent. Schematically, the situation looks something like this. One argument runs: (1) language is used in a way that suggests that meanings are vague; (2) threshold boundaries seem blurry, not sharp; (3) if boundaries are blurry then assessment thresholds that require sharp boundaries contravene normal use and therefore normal meaning; (4) contravention of this kind is to change the meaning of the terms being used; (5) assessment systems do not claim to have to use a different language to assess using grade boundaries; (6) grading takes place in the natural language of choice and not a deviant one; (7) therefore it is impossible to grade using sharp thresholds without changing the language being used. This is relevant to educational assessments because if the argument is accepted there are at least two obvious responses. One is to agree and stop worrying about the deviancy from natural language. The other is to agree and stop worrying about sharp thresholds.

What I argue in the thesis is that one of the major assessment prototypes is more inclined to deviate from natural language in order to achieve greater reliability, while the other major assessment prototype inclines to saving natural language at the expense of reliability. I argue that this explains why, even after the first prototypical approach has been largely dismantled as a philosophy of meaning and learning, it is still given a role in high stakes assessment: it delivers reliability, albeit of something inaccessible in any natural language. It also explains why the second assessment prototype, which is still thought to be powerful as an assessment and learning theory, is sometimes discredited in a high stakes context. Explanatory power is not exhaustive, nor is it justificatory.

The decision to discuss high stakes educational assessment in terms of just two assessment prototypes is largely based on the two possible responses to the argument about the relationship between meaning, language and use. The first assessment prototype, which I label the 'psychometric paradigm', is the approach that makes reliability a key issue and entertains few worries about deviating from actual natural language usage in order to achieve precision. The second, which is labelled the 'hermeneutical paradigm', emphasises the interpretative resources believed to be a fact about actual language use. This approach believes that there is a pragmatic heuristic available for achieving the required borderlines without distorting beliefs about language. I take this approach to be broadly Wittgensteinian in the sense described above. 
The first paradigm tends to model itself on abstract science. It is a paradigm that is sympathetic to ideal language approaches to the problem of vagueness in natural languages. This approach is linked to the philosophers Frege, Russell and Quine (Frege 1880; Russell 1923; Quine 1981). They assume that vagueness is an inherent aspect of all natural languages and that an alternative language is required if it is to be eradicated. An ideal language stipulates recursive syntactic rules which determine how certain well-formed formulas can be combined. This enables the language to contain infinitely many well-formed sentences. These rules run in parallel with semantic rules that define whether a sentence is true or not. Syntax and semantics must correspond. For logical purposes the only relevant aspects of meaning are those that determine the sentence's truth value. This is the referent. In a logically perfect language every well-formed expression has one and only one referent. This is context invariant because rules of inference have to be checked from expression to expression. Semantic reference cannot change or else it might change the syntactical inferential value.

For example, a simple sentence, '7 is prime', involves a name and a predicate. '7' is the name of the number 7, which is an object; 'prime' is the predicate of which 7 partakes. The sentence is true because 7 falls under the concept predicated by 'is prime'. The role of a predicate is to divide everything into two classes – primes and the rest. So '7 is prime' is true because 7 falls into the class of primes; '8 is prime' is false because 8 falls into the other class, of non-primes. A predicate therefore refers its object to either the True or the False. The predicate refers the object of which it is predicated to its truth value. That is its function: to map out what fits with it and what doesn't. The predicate 'is prime', therefore, maps out everything that is a prime as true, and everything that isn't prime as false. Frege uses 'concept' to mean any predicate whose function is to decide the truth value of its referent. An object falls under a concept if the concept maps out the truth value for the object. Later, Frege introduced the idea of an expression having both a referent and a sense. Sense is how we understand the referent. The referent is what sense presents. So sense is prior to referent.

On this model vagueness is an incomplete stipulation. For example, a rule stipulating that all positive integers were true and all negative ones were false would be vague because the rule failed to stipulate the value for zero. It is not that a decision was made not to stipulate; rather, no decision at all was made. A predicate functions by picking out a referent, but in a case of incomplete definition there is no referent to be identified. If predicate-involving formulas are to be recognisable as logically valid by their syntactic structure then incompleteness is a deficit in language. Frege understood vagueness in these terms. Vague terms were incomplete definitions. Russell and Quine similarly argued that vagueness was a deficit for any language. Mathematics was taken as the prototype of an ideal language where incomplete definitions were excluded. Abstract science's use of mathematical models meant that beliefs about the language of science imported this bias against natural language. Natural language was considered inadequate for genuine knowledge. 
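Frege's picture of a predicate as a function from objects to truth values, and of vagueness as incomplete stipulation, can be made concrete with a small sketch. The sketch is purely illustrative (the function names are mine, not Frege's notation): a fully stipulated concept returns a truth value for every object, while an incompletely stipulated one simply has nothing to say about some objects.

```python
def is_prime(n: int) -> bool:
    """A fully stipulated Fregean concept: every integer is mapped to the True or the False."""
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, int(n ** 0.5) + 1))


def is_positive(n: int) -> bool:
    """An incompletely stipulated 'concept': the rule covers positive and negative
    integers, but no decision at all has been made about zero."""
    if n > 0:
        return True
    if n < 0:
        return False
    raise ValueError("the rule makes no stipulation for this case")


print(is_prime(7), is_prime(8))          # True False: every object gets a truth value
print(is_positive(3), is_positive(-3))   # True False
try:
    is_positive(0)
except ValueError as gap:
    print("zero:", gap)                  # the rule simply gives no verdict
```

On Frege's diagnosis, a vague natural-language predicate behaves like the second function: its rule of use leaves some objects without any route to a truth value, which is why he treats vagueness as a deficit to be legislated out of an ideal language rather than as a feature to be accommodated.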
The psychometric paradigm modelled itself on science and so it too imported the bias against natural language. This accounts for the suspicion of relying on judgment in high stakes assessments: judgments require the use of natural language. Recapping the argument linking vagueness to the psychometric paradigm of assessment, then: considerations of vagueness led major philosophers, from the end of the nineteenth century through to the fifties, to the conclusion that natural languages were deficient for conceptualising genuine knowledge. The idea that science (along with logic and mathematics) required an ideal language was developed, which in turn led assessment systems that modelled themselves on science to downgrade natural languages in order to avoid the indeterminacies of vagueness.

The contrasted assessment paradigm reverses the bias. The expressive power of natural language is central to this paradigm. If it is conceded that there are indeterminacies at borderlines, and it isn't a universal concession, these are thought to be outweighed by the benefits brought by its sensitivity to the myriad facts about use that are used as its resource. Vagueness is not a source of anxiety nor a defect but merely a side-effect of expressiveness, a price we pay for linguistic power. The manifestability constraint inherent in this position is crucially important and worth making explicit. Wittgenstein in the Philosophical Investigations argues that meaning is dependent on the truth conditions for any proposition being made manifest to public scrutiny (Wittgenstein 1957, §§109a, 111, 125, 307-9). If accepted, this presumed necessary condition seems to imply that language can't have unknowable truth conditions. It has also led some philosophers to argue that rules for 'use' need not be capable of imposing systematic rules of interpretation on users. Wittgenstein, Quine and Davidson are three exponents of this argument. Hume's argument against inductive laws is applied to prove that there can be no necessary inference from past use to present use. Past usage can't constrain future use. Each new use extends the use in unconstrained, unpredictable ways. Arguments that suggest that a calculus according to definite rules is required to explain how language is learned are countered by Wittgensteinian arguments to the effect that rules are themselves open to different interpretations. Each rule would be in need of an interpretation according to a rule that would itself require a further interpretation of that rule. An infinite regress threatens. Rather than attempt to discover some calculus, proponents tend to argue that meanings tend to converge because contexts, circumstances and uses are standard.

This educational assessment paradigm exploits these arguments. Meanings are taken to supervene on use. This means that you can't change the meaning of an expression without changing the use. The approach is influential although there is no theory explaining how the supervenience relationship works. The contrast between this assessment paradigm's attitude to the indeterminacies of natural language and the first paradigm's can be observed if we examine J.O. Urmson discussing grading. Urmson agrees with Frege, Russell and Quine that understanding facts about use doesn't generate sharp boundaries, but doesn't concede that this is a crisis requiring linguistic reform. In his paper 'On Grading' (Urmson 1950) Urmson argued that grading labels require criteria. Criteria are the reasons for assigning the label. 
Criteria function as the standards of evaluation or appraisal. "…'Good' is a grading label applicable in many different types of contexts, but with different criteria for employment in each" (Urmson 1950, p174). Disputes about whether a grade is correctly assigned or not revert, on this view, either to whether the criteria being applied fit the subject to which they are being applied or to whether these are the right criteria to apply in this case. That criteria, understood as reasons, are required for grading is important in that it places the activity squarely in the sphere of reason. It follows that if this is accepted then it can't be the case that grading can take place without using criteria. Urmson thinks: "…the way to find out what criteria are being employed is to ask why the [object] has been graded thus" (Urmson 1950, p183). But Urmson doesn't conclude that this implies that a completely determinate language is required in which these reasons can be expressed. Nor does he feel that indeterminate natural language cannot make the necessary discriminations. Urmson writes that, '…when there are differences of opinion about what grading criteria to adopt in a given situation is there not a right and wrong about it; can we not say that these are the right, these are the wrong criteria; or are we to say that the distinction, for example, between higher and lower, enlightened and unenlightened, moral codes is chimerical? In some cases we would perhaps be content to admit that there was no right or wrong about it; the differences in criteria arise from different interests, different environments, different needs; each set is adequate to its own sphere. But in others we certainly do not want to say this; the distinction, for example, between higher and lower moral codes cannot be lightly brushed aside' (Urmson 1950, p184). Urmson here reflects the approach's sensitivity to subtleties about linguistic use, which includes sensitivity towards interests, environments and needs, as well as its confidence that sources of indeterminacy can be resolved to a degree of satisfaction from within ordinary language use.

The criteria have to be about the facts that make it true that something is of a certain grade. If the truth-maker is this fact then criteria are the relevant propositions about this fact (Armstrong 2004). 'For those who believe in truthmakers truthmaking is a relation. The relata are truthmakers and truthbearers. The truthmakers may belong to different ontological kinds: ordinary particulars, tropes, states of affairs, properties. The truthbearers are typically thought of as propositions rather than sentences and belief tokens. Truthmaking is a cross-categorial relation in the sense that it can obtain between entities belonging to different kinds: an entity that is not a proposition and a proposition. For instance, it allegedly obtains between Socrates and the proposition <Socrates exists>' (Roderiguez-Pereyra 2006). If the notion of a truth-maker is rejected then whatever replaces it is what motivates criteria. So, for instance, there are philosophers who reject truth-makers but retain the idea of truth-making. They make a distinction between true propositions being made true by how things are rather than by whether they are (Lewis 2001; Dodd 2002; Melia 2005). 'Milad is a C grade candidate' is not made true by a certain entity, Milad's-being-a-C-grade. Rather it is how Milad is that makes the proposition true. How a thing is is not an entity. So what does the truth-making, on this theory, is not an entity. 
A truth-maker is an entity, so this theory rejects truth-makers. Against this view is the idea that 'how a thing is' will reify into an entity. If so, then it collapses into a truth-maker theory (Roderiguez-Pereyra 2005a). In a borderline case there is a problem identifying the 'truth-maker'. There are no 'facts of the matter' that make an object's value start at any precise point. We can choose to use different criteria if we like but we can't choose whether the facts apply or not. If the truth-maker for grading is the agreed criteria then grades are dependent on those criteria being what they are, not on the way we choose to reason about them. This situation is read as a deficit in the first paradigm. How is it possible to reason to truth if the language is incapable of picking out precisely the truth-maker? In this paradigm, the strategy is to work out a more piecemeal, pragmatic approach. This is partly motivated by the fact that its defenders don't think that there is actually any option but to use natural language. The dream of an ideal language is just that, a dream. But it is also motivated by the idea that there is nothing illegitimate in doing so. Conventionalism is the view that language is best understood in terms of facts about conventions. Knowledge of conventions delivers the relevant facts about meaning. So understanding facts about grading conventions exhausts the meaning of grading. It underwrites the belief that language can be changed by acts of will. Conventions can be changed to suit the needs of their users. Conventionalism understood in this broad sense is a stance that strongly underpins many of the responses to vagueness found within this assessment paradigm.

To the recognition that natural language fails to deliver sharp borderlines for the values inherent in any assessment system, there are responses that argue that criteria are not needed (e.g. Browning 1960). One is that criteria are inexpressible. This is not an argument against criteria, however. So long as there are criteria for saying why certain characteristics make a thing good, and the characteristics are known, then an awarder of grades can still be applying criteria even if it is beyond her powers to express them. It may be difficult to know whether this is really happening, but that is an epistemological question about whether we can ever be certain about what another person really thinks. And it isn't rare for people to make evaluations about things and have their reasons even if they can't explain themselves, even to themselves. Not being able to say which criteria are being applied is not an argument against applying criteria in making evaluations. It could be an argument against thinking that criteria are always expressible.

Urmson's response to inexpressibility arguments is to argue that it is easier to employ criteria than to recognize them. He agrees that criteria are not always explicit or recognisable. Implicit criteria can remain very murky even when we apply them consistently and passionately. Implicit criteria are usually recognizable in certain conditions but not in others. Many Rom-Com plotlines exploit this to bring about the comedic resolution. A lover may not recognize that he is in love until a situation reveals it to him, as in the Richard Curtis film 'Notting Hill' (1999), when it suddenly dawns on the Hugh Grant character that he has made a catastrophic error in thinking he didn't love the Julia Roberts character. The revelation doesn't cause the fact that he is in love. 
He may have died without ever having the truth about himself revealed. Some think that self-knowledge is rare. An awarder may apply criteria in every case she judges and never know that she is. Rom-Coms that fail can be ones where the audience is left feeling that they know more about the true motivations of the characters than the writer does. So in another Richard Curtis film, 'Four Weddings and a Funeral' (1994), Hugh Grant is thought by some to have clearly got the wrong girl at the end, which spoils the comedic resolution and creates in the audience a sense of sad misalliance that will spool out long after the closing credits. Epistemic privilege is what this example helps identify: the audience know more about what the characters are doing than the characters themselves.

Unknowingly using criteria is still using criteria. Critics such as Browning find this unbelievable (Browning 1960). The philosopher of law Timothy Endicott, when discussing Timothy Williamson's epistemic solution to vagueness, also finds it unbelievable that principles guiding judgment can remain hidden from the person's consciousness and yet still have a guiding role (Endicott 2000). Such critics don't deny the denial of the KK Principle for many states of knowledge, but where knowledge is used for guiding they can't believe that hidden knowledge can have a guiding role. Their unbelief is itself unbelievable in the light of examples which seem uncontroversial. A grader who awards according to criteria she doesn't know she knows is not acting irrationally or arbitrarily when she applies these criteria. Work colleagues may well know more than she does, in that they recognize what she knows even though she doesn't. They can see that the hidden criteria guide her grading awards even though she can't. This seems to be a common feature of performative speech acts. In fact, it is often because a person is behaving as if they are being guided by something that the idea of non-transparent intentions guiding their behaviour seems plausible.

What this discussion shows is how this second assessment paradigm is capable of finding resources from detailed knowledge about facts about language use to address the challenges of indeterminacy inherent in natural languages. Unlike the first paradigm, it is optimistic about natural language's ability to fulfil the requirements of evaluation. We can see how nuanced this optimistic approach can be by again examining Urmson's approach to grading. Grading is taken to be a performative speech act by Urmson: "...We must say firmly ... that to describe is to describe, to grade is to grade, and to express one's feelings is to express one's feelings, and that none of these is reducible to either of the others; nor can any of them be reduced to, defined in terms of, anything else" (Urmson 1953, p171). JL Austin used 'performative' to cover speech acts such as 'I promise', 'I know' and 'guilty'. Their performance brings into being a social fact. Urmson finds an analogy between 'grading' and 'choosing'. RM Hare argues grading statements are prescriptive, like commands and requests. Dylan Wiliam argues that grading is an illocutionary performative speech act that creates a Searlean social fact (Searle 1995; Wiliam 2000). Urmson argues grading is objective and correct when criteria are established for grading and the grader follows the criteria accurately. 
"The first thing which seems clear is that the question whether this is X is, granted the acknowledged criteria, as definitely decidable as are the empirical questions whether this is A, or B, or C ...The point is that if this has the empirical characters A, B, C, then it merits the grading label X, and if not, not; and this, in the required sense, is a decidable issue’ (Urmson 1953, p169). A grading is not reducible to criteria satisfaction because he argues a grading is an imperative. RM Hare suggests that grading statements are imperatives that are closer to a proposition than other imperatives such as ‘Choose this’ because of the criterial requirement. "All value-judgments are covertly universal in character, which is the same as to say that they refer to, and express acceptance of, a standard which has an application to other similar instances... Whenever we commend, we have in mind something about the object commended which is the reason for our commendation'' (Hare1952. p129-130). The example Urmson gives is that of sorting apples in an apple-packing shed. The apple grader follows criteria to sort good apples from bad ones. The criteria are the reason for sorting the apples as they are. However, a sorter of apples may use the criteria to sort good apples from bad without the criteria being her own criteria. She could just be reading the criteria from a script she has been given. She is using someone else’s commendation. This is a feature of Urmson’s account of grading that some critics think shows that application of criteria doesn’t always enable grading. Browning thinks Urmson fails to distinguish ‘grading’ from ‘sorting’ and ‘ranking’ (Browning, 1960). Browning thinks grading doesn’t require reference to criteria nor is a grade justified by criteria fulfilment. He thinks sorting is differentiated from grading because sorting use criteria and grading doesn’t. Browning considers the film ‘The Maltese Falcon’ and asks how it might be graded. He agrees with Urmson that justification of a grade is to give reasons and considerations for the grade but he denies that these are criteria. He thinks that ‘grading statements are logically prior to the designation of good reasons or proper considerations for such judgments. He thinks it is only by inspection of movies known to be great that the proper sort of considerations can be adduced. To say this is to say that considerations are not criteria, for a criterion is logically prior to a judgment made by its means’ (Browning 1960, p239). But what Urmson contends is that what browning is showing is how criteria are used and developed. Criteria don’t stand in a single relationship with final jugments being made. Nor do they remain unchanged by use. A person will apply some criteria to an initial judgment of the film, for exmple, and then apply further considerations and reasons to check and assess the judgment made initially. The process involves the idea that knowledge changes and that criteria change through application. That some of the criteria are unknowingly known is no hinderance. Nor does it support the objection that hidden knowledge cannot guide decisions. If a person has knowedge then it can guide their actions unbeknown to them. 
This is Urmson's argument when he writes, in response to Browning's scepticism about criteria, that 'it is easier to employ criteria than to recognize them' (Urmson 1953, p185). Urmson thinks that there can be no precise list of criteria guiding reason in grading, saying 'no one can give a precise list' (Urmson 1953, p175), but nevertheless thinks that there can be a list. Browning thinks that without a precise list there can be no list. This contrasts again with assumptions in the first paradigm, where the agreement that there is no possibility of a precise list is profoundly unsettling. The first paradigm requires a precise list and so invents a new language. In the second paradigm, the response to the absence of a precise list is either to make do or to dismiss the idea of any list. Browning takes the latter position. He argues for a distinction between grading and sorting. Sorting activities can produce precise lists and for this reason sorting is criterial. Grading activities cannot (Browning 1960, p241). He denies that grading could be criterial because he argues that '…no species of ranking … will be adequate to the job of grading… [because] …in calling a movie "great" I disavow by my act any bindingness to pre-established standards' (Browning 1960, p241).

Browning may be making a psychological point about how it may seem that there is a delay between making a grading judgment and thinking of the reasons justifying the grade (Taylor 1962). But it is possible to argue that Browning is recognising that criteria themselves are part of the evidence for grading, and that he draws an incorrect inference from this insight. That criteria don't bind a grader as they do a sorter is not evidence that criteria are not involved in grading. Rather, as a grader considers her evidence for a grade, criteria are joined by new evidence brought about by the actual case being studied. These considerations create new knowledge. Criteria for a grader are not fixed lists but dynamic. The incompleteness of a criterial list is a response to the richness of perspectives that may usefully guide any grader. It is an essential part of the expressive power of natural language to be able to respond even to new and previously unknown situations. This explains the puzzle as to why Browning denies that a grader is constrained by criteria. He is arguing from assumptions commonly found in the other assessment paradigm, where a precise criterial list is the ideal. Browning is arguing that grading activity can present reasons that haven't been thought of before. A fixed, unchangeable criterial list cannot bind in such circumstances because there is always the possibility of additional reasons. Browning's complaint about fixed criterial lists is that they would have to outstrip our linguistic or cognitive abilities: they would have to encode the reasons and considerations for applying a value to something even in cases where those reasons and considerations could not have been known in advance. This is a core argument against the first paradigm for those arguing for the second paradigm. And it is also an argument against those who fail to recognise Browning's distinction between sorting and grading.

Browning treats the imprecision of criterial lists for grading as a reason for making a distinction between grading and sorting. Browning's distinction isn't just that sorting lists are precise and grading lists aren't. Browning draws attention to the idea that sorting is a stipulatory act. It requires no intentionality at all. Its precision is at the cost of intentionality. 
Stipulating a complete, precise, fixed criterial list replaces grading with sorting. It changes the subject. Why? On this model, grading expresses purpose. A fixed and complete criterial list enables sorting but gives no reason for preference. Fixed, complete lists of criteria do no more than sort. Ranking requires sorting preferences and so is a species of grading, not sorting. In Urmson's example, the apple grader in the shed sorts apples into grades according to the sorting criteria but may do so by blindly following the precise criteria for sorting. Analogously, students marking assessment papers may correctly sort papers into grades using sorting criteria but not understand the meaning of the grades. In both cases, a non-intentional sorting tool is able to sort. An example of such a sorting tool is a sieve. A sieve sorts, but its purpose has to be supplied elsewhere by an intentional being. Someone else has already graded according to the purpose of the assessment and then set up sorting criteria to ensure sorting takes place according to the grading system. However, if the sieve or sorter is using precise criteria then what the grader intended to be graded cannot be fully realised. No precise sieve or sorter has the expressive resource whose side-effect is vagueness.

Chess-playing computers illustrate this issue. A chess-playing computer can only sort out future moves by running through an enormous number of possible games. It is limited by the amount of computer memory it has, but chess-playing computers like Deep Blue are able to survey many more possible future moves than any human. It is the sheer brute power of this ability that enables them to beat humans. However, they are incapable of making generic strategic decisions. Deemter draws attention to this in relation to the Cuban grandmaster Capablanca's strategic rule 'Before launching an attack, close the pawn formation' (Deemter 2010, p223). Deemter comments: 'For them, chess is all tactics and no strategy. Essentially they make the most of their decisions by means of brute force search: going through all possible moves, all possible responses to these moves, and so on' (Deemter 2010, p223). Chess-playing computers are just massively successful at searching, but humans, who don't have the mental power to deploy anything like such a tactic, have developed other ways to overcome the search limitations of human cognition. Grading is analogous to strategic rule application in chess. It isn't blind sieving; each application involves increasing knowledge of what guides the preferences determining grading. This new knowledge is additional evidence that a sieve or sorter cannot access. We may summarise the distinction between the first and second assessment paradigms: the first designs assessment's Deep Blue, the second its Capablanca.

Browning argues that ranking criteria are not ordinary criteria but rather standards that give significance to an ordering. As such, rankings are a species of grading. Much assessment in education is ranking understood in this way, where preference is given in terms of a particular end or purpose. These preferences are like standards captured in 'criteria' that are logically prior to certain judgements. Grading is when preference is expressed without reference to such standards or purpose. Browning writes that 'By the word "criteria" we understand standards which are logically prior to certain judgments. 
This means only that when the fulfilment of certain empirical conditions are necessary for the assertion of a judgment, we may call such conditions "criteria". But in grading proper there are no such conditions' (Browning 1960, p12). He likens grading to connoisseurship, such as is used in wine tasting, where the actual purpose of the grading is to capture qualitative distinctions in taste but which allows short cuts, presented as if they are a criterial list, that refer to correlations between these qualitative distinctions and empirical characteristics such as colour, place of origin and so forth. Expedience may allow a criterial list of these accidental correlations, but at any point where taste is found elsewhere, or not found using the list, the criteria are ignored or altered. Dullards and machines can use criteria but according to Browning they cannot sincerely grade, because the reasons for a grading judgment are prior to any defence or specification of such a judgment. Standards follow grading insights, not vice versa. Ranking is similar in this respect to grading. Browning thinks that many grading words began as ranking words that functioned in both a regulative and a preferential manner. This might preclude the idea that there could be a grading system that doesn't rank. Sorting, by contrast, follows standards. Lying behind this idea is the thought that knowledge itself doesn't require a prior criterial check list. Browning doesn't think that a theory of knowledge is prior to knowledge itself, denying '… that true knowledge is a question of checking off certain steps or criteria' (Browning 1960, p245). 

1.4 THE VAGUENESS OF GRADES

Urmson can agree with this by pointing out that he never argued that the use of criteria implied criteria had to be explicit or transformed into a check list. But Urmson does think that criteria for grades can't be precise. Why would Urmson claim that precise criterial lists for grades are impossible? He thinks that grading criteria are vague. Urmson agrees with Frege, Russell and Quine in thinking that vagueness is a species of incompleteness. He is optimistic about natural language, however. He considers the vagueness to be a function of the expressive power of languages rather than a defect. He argues that vagueness captures the complex process involved in any grading activity and the complexity of the subject of any evaluation, a dual function for complexity that any imposed non-vague criteria would be incapable of: 'The writings of some philosophers seem to suggest that pleasant taste is the only criteria of goodness in apples, but this is surely false. Other criteria are size, shape, keeping quality, nutritive value, pleasing appearance and, perhaps, feel. Now we have already noticed vagueness and open texture within one criterion. But the list itself has the same properties. No one can give the precise list; some will omit a criterion I have given, add another, vary the emphasis, and none of them need be wrong (though we could produce a list which would be certainly wrong). And it is always possible to think of something else which might be taken as a criterion or which has been implicitly used as such and not been noted. But surely as long as we recognize this it need not worry us any more than the vagueness of the criteria for the use of descriptive adjectives. "Good" is very vague – so is "bald", or "middle-aged"' (Urmson 1950, p175).

Urmson thinks that the vagueness of criteria determines that a criterial list is always incomplete. Lying behind Urmson's argument is Frege's idea that vague concepts are terms that haven't been completed. Vagueness as incompleteness joins up closely with the idea of concepts being 'open textured' (Cardozo 1921; Hart 1961). Vagueness and 'open texture' present problems for the application of criteria whilst at the same time endorsing the need for criteria generally. Borderline cases are a requirement of criteria, however they are defined, and not an argument for abandoning criteria wholesale. Hart, in the context of discussing the effects of 'open texture' in law, wrote that 'There will indeed be plain cases… to which general expressions are clearly applicable … but there will also be cases where it is not clear whether they apply or not' (Hart 1961, p126). This 'legal realist' view of the scope of indeterminacy caused by 'open texture' holds that 'the law is rationally indeterminate locally not globally' (Leiter 1996, p265).

For educational assessment, Browning's argument that '…grading statements are logically prior to the designation of good reasons or proper considerations for such judgments. That is to say, it is only by inspection of movies known to be great that the proper sort of considerations may be adduced' (Browning 1960, p239), understood in the terms explicated above, is also a core argument for another central assumption within this assessment paradigm. His approach is a version of prototype assessment, where grading judgments are made based upon similarity relations between a prototype and an individual instantiation (Baird 2000, p91-100; Cresswell 2003, p10-11). Use of prototype assessment is particularly dominant in this paradigm. 
The Hidden Relevance of Vagueness to Assessment Revealed

The point of all this is to show how deeply embedded in both the psychometric and hermeneutical paradigms is a general theory about vagueness. Vagueness is theorized in both as incompleteness of conceptual definition. In the first this is a source of pessimism about the state of natural language and leads to attempts to stipulate an ideal language that eradicates the indeterminacies caused by incompleteness. In the second it is a source of optimism, providing resources for flexibility in the context of change and unpredictability. If we go back to the introductory section about vagueness, all of the theories surveyed there respond to vagueness in terms that both paradigms would understand. Deviant logics such as fuzzy logic and many-valued approaches argue that the open texture of natural languages is ill served by the assumptions of Fregean, Russellian and Quinean classical logic and argue for new inference rules that don't assume the idealized classical rationalist thinker. Dialetheists argue for truth-value gluts, allowing inferences that tolerate contradiction. Supervaluationists don't go so far as these deviant logicians; they modify the Fregean approach by withdrawing decidable truth values for borderline cases. Rather than gluts, they argue for gaps. Incoherentists withdraw rational credentials from all our thinking in vague language and support a radical nihilism about meaning. The psychometric paradigm points to these alternatives and warns that any desired universal reliability of assessment is inconceivable from any of these perspectives. Assessment systems can't have gaps of applicability. Nor can they have gluts. Candidates can't be failing and passing. Nor can they be to some degree a pass. Decisiveness and reliability require the eradication of vagueness. The hermeneutical paradigm disagrees and can accept versions of some of these theories so long as they resource grading activity.

Urmson and Browning were considered above as generally agreeing, but a variant reading of Browning's argument highlights how an assumed or implicit theory of vagueness can be important in determining what can resource an assessment system involving vagueness. Taken literally, Browning argues that grading can happen without a reason. An alternative reading of Browning's position is that he is arguing that it is possible to refer without specifying a referent. Grading on this reading becomes successful reference without any contribution of a sortal from speaker or audience. A grader may not know what she had in mind when she graded something good. Browning then is not arguing for a conception of vagueness that involves semantic incompleteness. Rather, he is arguing that vagueness is a function of our ignorance (Williamson 1994; Sorenson 2001). On this view, intending to grade is enough to grade. It is a position that denies the assumption that intentions are always transparent to an agent. The intention to grade may conflict with linguistic intention. The strangeness of vagueness is sometimes part of the strangeness of this conflict. When a grader says she is grading a paper according to what she thinks about a comparable paper, and then judges the paper not to be good when the comparable paper indicates it is, she both believes that the paper is not good and believes that the paper is good. Demonstratives and linguistic intention come apart and so can be spaces for inconsistent beliefs. 
A grader might dismiss criteria as being able to justify a grade award and then later forget this belief and justify her grade award using the criteria she doesn't believe in. She has both justified and not justified the grade award. She holds inconsistent beliefs. However, it would be wrong to characterise her as being irrational. But Browning seems to have graders who say 'This is a grade A' without indicating what 'grade A' means. Rather than remove incompleteness, this presents a radical incompleteness. But incompleteness itself is vague, so having an awarder grade without criteria raises uncertainty as to whether there is a grade being awarded at all. It is always possible to construct a sorites about the precise point at which no criteria are being applied. This changes the direction of vagueness, from raising uncertainties about what was being referred to by the grade to raising uncertainties about whether there was any reference at all. But the point of this discussion is to show how theories about vagueness resource assessment systems in different ways. That most assessment theories don't make explicit the theory of vagueness they embody does not preclude the fact that there is one with which they are consistent. And given that all theories of vagueness conflict with each other, the assumptions made by any assessment system will conflict with most theories of vagueness.

Criteria are complex to the degree to which the object being valued is complex. A novel is complex; therefore the criteria by which it is judged are complex. Sometimes the complexity is considered part of its identity, so that a poor novel may be considered as not really being a novel, or a poor poem as not being a poem. This is true whether one argues that criteria are complete lists or partial evidence. Schools tend to need to simplify the objects being assessed in order to simplify the criteria being applied. Too often criteria are used as check boxes where the engagement of thought is minimized. Criteria broken up into discrete elements may well make it hard for the whole to be detected, and the ticking off of criteria may well become a mindless sorting exercise. Browning thinks something similar. He thinks criteria developed in this way seem to be the opposite of anyone having a reason. Some people think that removing criteria from grading would paradoxically allow for more reasoning. Browning thinks along these lines when he writes: '…Close consideration and the determination of the fulfilment of a criterion appear, on the face of it, to be mutually exclusive enterprises. There is no need for consideration if one is merely checking off criteria; one cannot give his consideration to that which has a relevance completely decided for him, as a characteristic that fulfils a criterion does' (Browning 1960, p239-240). Clearly it is the case that criteria, as a descriptive term, may well truthfully pick out a device for simple checking off. But criteria as reasons require that the complexity of the object being evaluated be similarly reflected in the complexity of the criteria being applied to those characteristics that are judged good-making.

Urmson thinks that the criteria used by a person have to be criteria chosen and accepted by that person. They have to be her reasons. To be motivated by reason is not to say that it is justifiable to be motivated by anyone's reason. They have to be one's own reasons. A new teacher who is learning her assessment skills may well at first take all her criteria from others. 
But she is a mere apprentice until she finally comes to adopt the criteria as her own. Until this happens, Urmson thinks, "There would be some point . . . in saying that the apprentice is not really grading" (Urmson 1950, p161). 

1.5 THE RIDDLE OF THE VAGUE GRADE

The puzzle of vagueness was introduced through Bellow's 'Most Most Club'. To conclude the chapter the thesis returns there, because it wants to remind readers that vagueness is not understood if its paradoxical, absurdist nature is not recognized and foregrounded. Thinking about it as part of designing educational grading systems may deceive us into thinking the issue is rather more tame and easily domesticated than it is. Timothy Williamson's theory of vagueness, for example, has been accused of being an account of a linguistic miracle. Incredulity is the default response to epistemic solutions. Fellow epistemicist Roy Sorenson argues that the real problem of vagueness is this meta-problem, incredulity (Sorenson 2001). Paradoxes have by their nature the air of trickery about them, isolated party tricks that are compelling for a short while but might seem ultimately trivial. But paradoxes are often the starting place for investigations that gradually become incorporated into non-philosophical enquiry. Rather than understand philosophy as being qualitatively different from science, the two can be understood as being on the same continuum of enquiry. This is the explicit general thesis of Sorenson's theory of thought experiments (Sorenson 1992). There he subscribes '… to a gradualistic metaphilosophy: Philosophy differs from science in degree, not kind. Understand science, understand the parameters to be varied, and you understand philosophy' (Sorenson 1992, p3). Philosophical paradoxes gradually come to be incorporated into scientific and other research programmes. The thesis claims that the sorites paradox, for example, far from being trivial, is a key to understanding underlying features of current grading systems in education. How to answer its riddle correctly is of vital importance.

The first paradox recorded was Anaximander's paradox of a beginningless individual (Anaximander, c. 610–546 BC). Sorenson recounts Wittgenstein's joke that ends with a haggard old man counting '5, 1, 4, 1, 3' and then exclaiming 'Done!' When asked what he had been doing, the old man says that he has just counted the complete decimal expansion of pi backwards (Sorenson 2005, p1). Paradoxes are riddles (Sorenson 2005). Seduction riddles try to create the illusion of bad answers being good ones. For example, 'Is blackness the colour of the pitch darkness a blind person sees?' The trick is to make the answerer think that because pitch darkness is black the answer is affirmative. Yet a blind person doesn't see, so she doesn't see any colour at all. Mystery riddles are riddles that create the illusion of there being no answer by offering a description of something that seems contradictory. For example, 'What is black and white and read all over?' is a mystery riddle that works verbally but not on paper. So is, 'When is a door not a door? When it's ajar.' Oedipus's tragedy is a literary version of such a mystery riddle. Unable to answer Tiresias's riddle, he kills his father and sleeps with his mother. There are riddles that remain mysteries until the answer is revealed, and there are riddles that remain mysteries because even the riddler is in the dark. Sorenson, who has written extensively about all this, says that 'When the Mad Hatter asks Alice, "Why is a raven like a writing desk?" he has no idea what the answer is. Neither did the creator of the Mad Hatter, the logician Lewis Carroll' (Sorenson 2005, p4). 
In the film 'The Dark Knight' (2008), directed by Christopher Nolan, the chaos unleashed on the city of Gotham by Batman's psychopathic enemy the Joker is characterised as a mystery beyond comprehension. In an exchange with District Attorney Harvey 'Two Face' Dent, the Joker remarks, 'Do I look like a guy with a plan?' Some mystery riddles later become solved and stop being riddles. 'Which came first, the chicken or the egg?' was first a riddle. When it became known that the egg came before the chicken, because biological laws explain that a non-chicken egg can mutate into a chicken egg but a non-chicken can't mutate into a chicken, the mystery disappeared along with the riddle. A paradox is where two equally compelling but contradictory answers compete. As with the chicken and egg paradox, the evidence for such a conflict is often unstable. Once the equality of the responses is dispelled, the paradoxical compulsion disappears too. Evidence need not be a reason. It might be based on seeing something or on common sense. Sorenson thinks that 'The history of philosophy becomes visible through the prism of paradox' (Sorenson 2005, p10). This makes paying attention to paradox important. Much of what strikes the present as paradoxical will later be incorporated into mainstream science. But the sorites is a puzzle that has remained powerfully compelling even after such a long time in circulation. If grading is vague then high stakes assessment is confronted with one of philosophy's greatest mystery riddles. That something as serious as high stakes educational assessment is inescapably bound up in such a riddle is intriguing, and it sends out a warning to those who would 'solve it' breezily and confidently: the riddle has remained very resistant to most solutions.

This introductory discussion of vagueness and grading assessment has suggested that there are certain requirements a functioning grading assessment system has to achieve which vagueness makes problematic. Gradings have to be rational: they have to be decisions taken for salient reasons. They have to be reliable. This is a key term in the educational assessment literature; it is the consistency constraint on assessment, requiring that all assessments treat like as like irrespective of context. It links to validity, which requires that the system speaks to the right subject. The rationality, reliability and validity constraints are all connected and may be thought of as consistency constraints on all good grading assessment. Also, there is a requirement that these formal answer systems are universally decisive, and in being so they have to be able to produce superlatives of the sort encountered in Varzi's 'Most-Most Club'. These are requirements that any competent grading assessment system is complete. Finally, they have to be simple. This is a constraint on how the system is understood and used. It links to the element of the consistency constraint that requires rationality: the system has to be rational in a way that is accessible to those using it. This is about believability, legitimacy and credibility. A system that was too complex for humans, but nevertheless consistent and understandable to a God-like mind, would not satisfy this constraint. The completeness constraints are severely threatened by considerations of vagueness. Universal decisiveness in borderline cases, producing superlatives, is precisely what vagueness seems to prevent. 
It is a challenge to assessment grading that remains even if assessment grading is rational, reliable and valid. The sorites paradox deepens the difficulty of assessment grading by suggesting that what formal systems claim to achieve is impossible. 
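The threat to universal decisiveness can be made concrete with a toy sketch of the alternative responses to vagueness surveyed above. Everything in the sketch is an illustrative assumption of my own (the pass mark, the penumbra and the function names); it is not a model of any real awarding procedure. It simply shows how the same borderline mark is treated by classical bivalence, by a fuzzy, degree-theoretic logic, and by a supervaluationist's truth-value gaps.

```python
PASS_MARK = 50          # a stipulated classical cut-score (illustrative only)
LOWER, UPPER = 45, 55   # an assumed penumbra of admissible sharpenings


def classical_pass(mark: float) -> bool:
    # Bivalence: every mark is determinately a pass or a fail.
    return mark >= PASS_MARK


def fuzzy_pass(mark: float) -> float:
    # Degree theory: borderline marks receive intermediate degrees of truth.
    if mark <= LOWER:
        return 0.0
    if mark >= UPPER:
        return 1.0
    return (mark - LOWER) / (UPPER - LOWER)


def supervaluation_pass(mark: float):
    # Supervaluationism: true if every admissible sharpening counts the mark a
    # pass, false if none does, otherwise a truth-value gap (returned as None).
    if mark >= UPPER:
        return True
    if mark < LOWER:
        return False
    return None


for m in (40, 50, 60):
    print(m, classical_pass(m), fuzzy_pass(m), supervaluation_pass(m))
```

Each alternative gives up something the chapter has argued a competent grading system needs: the fuzzy candidate is 'to some degree a pass', the supervaluationist verdict on the borderline mark is a gap rather than a decision, and only the bivalent function is universally decisive, at the price of a stipulated borderline that no one can defend as the discovered one.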

 

CHAPTER 2: INTRODUCTION - THE RIDDLE OF THE VAGUE GRADE

 'The heresies we should fear are those which can be confused with orthodoxy' (Borges, 'The Theologians') 

2.1 INTRODUCTION

Cresswell recognizes vagueness as a major challenge that assessment grading systems have to confront but refuses to accept that there is a solution to the sorites (Cresswell 2003). He accepts at face value the illusion of borderless transition and reconceptualises grades as vague sets (Sainsbury 1991). If his approach is not to be considered absurd then the borderline cases he discusses can only be relative borderline cases, and therefore his pragmatic solution to vagueness fails to address absolute borderlines. His approach is therefore criticised as being incomplete and as misrepresenting vagueness as unsolvable. In so doing it violates necessary conceptual norms of natural language. His approach also requires that beliefs be voluntary. But beliefs are involuntary, and so the heuristic of using statistical reasoning to be decisive in borderline cases fails. The resulting decisions stipulating grades are accused of being inventions rather than discoveries, and therefore misrepresentations. Vagueness is about the obscurity of answering the question, 'does this answer system speak to the right question?' Cresswell's invented answer system fails this test. 

2.2 CRESSWELL'S VAGUENESS

Mike Cresswell is Director General of the Assessment and Qualifications Alliance (AQA), the largest examination board in the UK, which was formed through the combining of two exam boards including the old NEAB, the board which used to run the most successful English 100% GCSE course in the late eighties and through the nineties. He thinks that vagueness is an important issue for all high stakes assessments. He thinks vagueness is an absence of precise borderlines. His thoughts about how to accommodate vagueness in high stakes assessment are implemented by this exam board. This fact alone makes his treatment of vagueness important. Cresswell thinks that the sorites is a puzzle that can't be solved (Cresswell 2003). Rather, he thinks that the best pragmatic response is to model vagueness by implementing Mark Sainsbury's proposal of 'vague sets' (Sainsbury 1995, 2002). He thinks that no competent examiner is able to distinguish value, where value is expressed as a grade, on the basis of merely a single mark. He thinks that because of this the precise location of the borderline between grades is inaccessible to such an examiner. He doesn't always make it sound impossible, although he thinks it is: 'If you think you could, you are in a small minority: few of those who have actually tried to do it claim that they can' (Cresswell 2003, p8). He should add that those who claim they can are always mistaken, because he doesn't think there are any borderlines. He reconceptualises grades because, even after the mark boundaries of grades have been sorted, '… the paradox of the heap still lies in wait and the grades need to be conceptualised as rough sets to avoid it' (Cresswell 2003, p18). He sorts using a similarity-to-'performance prototypes' principle, which he thinks is a key to understanding how grading decisions are actually made (Baird 2000; Baird, Cresswell & Newton 2000, p213-229). Prototype similarity creates vague sets, like magnets and iron filings: most filings are attracted to the nearest magnet, but there are those for which the similarity isn't certain enough.

He thinks that vagueness is borderless transition. 'Borderless transition' is Sainsbury's term capturing the idea of something gradually changing from being one thing to being another thing while there is no precise point at which the change happens. A colour spectrum is an example of this: it seems that we see red gradually change to orange but without there being a precise point where red stops and orange begins. Vague sets remodel the vagueness so that he can introduce his simple heuristic of statistical reasoning to sort out the uncertainty of borderline cases. 'Statistical approaches replace …prototypes with an alternative heuristic for providing defensible sharp boundaries for grades which do not, in reality, have such boundaries' (Cresswell 2003, p18). Cresswell encourages this analogy between a grading system and a colour spectrum. A high stakes assessment uses a grading system that embeds a self-contradiction, just like the colour spectrum. The self-contradiction is simple: transition is about change and borderlessness is about no change. Sainsbury's term involves having the thought that something both changes and doesn't change at the same time. Usually contradictions function as full stops. In arguments, it is mandatory to concede defeat if it is shown that you are committed to a contradiction. A common response when a person finds out that they hold self-contradictory beliefs is to revise their beliefs. 
A contradictory belief is usually a sign that a mistake has been made. But Cresswell and Sainsbury are highly competent thinkers who don't retreat from the absurdity of their thought. Vagueness is a phenomenon that has encouraged highly competent thinkers to believe in contradictions. If grading systems are vague and vagueness is the phenomenon of borderless transition, then everyone believing the grading system is a valid system believes at least one inconsistent thought. If vagueness is borderless transition and grading systems require vagueness, then believers in such systems believe in the inconsistency of borderless transition. If vagueness isn't borderless transition, then they believe something inconsistent with that. If values are expressed in terms of grades and values are vague, then, on the assumption that vagueness is semantic incompleteness of some sort, there can't be a sharply defined borderline between values. Cresswell thinks that values can be ordered in vague sets and that a single mark doesn't designate a borderline between values. A mark scheme which makes sharp discriminations can only vaguely map onto the values being assessed. Marks give the illusion of precise borderlines but values elude this appearance. Reconceptualising grades as vague sets is an attempt to deny that adjacent marks can be understood as a precise demarcation between grades.

Cresswell thinks examiners assess by comparing each candidate to a prototype. The 'prototype' may be interpreted as a stand-in for Urmson's 'criteria'. In educational assessment the development of prototypical assessment has been understood as replacing criterial assessment. Urmson's understanding of criteria is here being taken as referring to the broader idea of 'justifying reasons' and so accommodates this replacement. An alternative interpretation of 'prototype' would be to contrast it with 'concept'. A concept is the basic unit of propositional thought. A prototype isn't. A feature of a concept is its compositionality (Fodor & Lepore 2002). Prototypes aren't compositional and therefore aren't concepts. The standard example is that of the goldfish, the prototypical pet fish, which is neither a prototypical pet nor a prototypical fish. If thought is conceptual then Urmson's criteria refer to concepts, not prototypes. This would be to contrast Urmson's 'criteria' with Cresswell's 'performance prototypes' rather than use them interchangeably.

Cresswell draws two implications from this proposed solution of using prototypes and then applying statistical reasoning to the uncertain cases. First, if grades are determined solely by their score then some candidates will be awarded the wrong grade. Secondly, the grades awarded cannot determine precise grade boundaries, because they are vague sets and are therefore unable to coincide with any distinct ranges of precise marks. Cresswell thinks that nothing can determine precise grade boundaries: vagueness prevents the discovery of grade boundaries because they don't exist. Yet a grading system typically has many grade boundaries identified by a distinction of a single mark. Statistical thinking applied to the candidates whose marks are uncertain replaces the use of the performance prototypes when grading borderline cases, but 'In the general case, the statistical procedures in question are not sufficiently precise to identify a single mark as the boundary between two grades' (Cresswell 2003, p18). 
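Cresswell does not give an algorithm, but the two-stage heuristic he describes can be sketched roughly as follows. Everything concrete in the sketch is my own illustrative assumption (the prototype marks, the confidence margin, the fallback cut-scores and the function names); it is not AQA's procedure, only a way of making the structure of the heuristic visible: clear cases are settled by similarity to performance prototypes, and the residue of borderline cases is settled by a separate statistical stipulation.

```python
GRADE_PROTOTYPES = {"A": 80.0, "B": 65.0, "C": 50.0}        # prototype marks (assumed)
CONFIDENCE_MARGIN = 5.0        # how much nearer the closest prototype must be (assumed)
STATISTICAL_CUT_SCORES = {"A": 72.0, "B": 58.0, "C": 45.0}  # stipulated cut-scores (assumed)


def grade_by_prototype(mark: float):
    """Return a grade when the script is clearly most similar to one prototype,
    or None when it falls in the uncertain borderline region."""
    distances = sorted((abs(mark - p), g) for g, p in GRADE_PROTOTYPES.items())
    (d1, g1), (d2, _) = distances[0], distances[1]
    return g1 if (d2 - d1) >= CONFIDENCE_MARGIN else None


def grade_by_statistics(mark: float) -> str:
    """Fallback for borderline scripts: apply stipulated statistical cut-scores."""
    for grade, cut in sorted(STATISTICAL_CUT_SCORES.items(), key=lambda kv: -kv[1]):
        if mark >= cut:
            return grade
    return "U"


def award(mark: float) -> str:
    # Stage one: prototype similarity; stage two: statistical stipulation.
    return grade_by_prototype(mark) or grade_by_statistics(mark)


for mark in (81, 73, 66, 57, 44):
    print(mark, award(mark))
```

The sketch makes visible the point Cresswell concedes: the second stage does not discover the boundary between two grades, it stipulates one, which is why he insists that the resulting grades must still be conceptualised as vague, or 'rough', sets.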
Sorting decided in this way is then conceptualised using the notion of vague sets to avoid the idea that a single mark can be the difference between grades. He thinks borderlessness means that a good assessment system doesn’t always have to answer to the question. Instead, concern for the reputation and credibility of the assessment system as a whole, or of the particular exam, trumps having to answer to the question in every case. This fulfils the requirement that the system makes a decision in every case, but decisions about borderlines are inventions made in the face of their non-existence. Of typical concern to exam systems is what in the UK press has become labelled ‘grade inflation’, the idea that exam grades of a high value are easier to achieve now than in the past. This is linked with the idea that there is too much dissimilarity between exams taken now and those of the past to make meaningful comparisons. This is often conjoined with the contradictory idea that current values are degraded. Cresswell complains that ‘… it is well worth keeping in mind when reading anything which claims to pronounce on whether exam standards are rising or falling (like the recent QCA five-yearly review report)…’ (Cresswell 2003, p6). Another concern is that some subjects are easier to achieve good grades in than others. Cresswell thinks an exam system that was not able to rebut these charges robustly would be incompetent and lack legitimacy. The invention of grade boundaries and the engineering of their location is a way to dissuade critics by engineering perceptions of reliability. Yet Cresswell acknowledges that ‘because the curriculum changes to reflect current values about what people should learn, there is an essential sense in which objective comparisons between examination standards over time are impossible’ (Cresswell 2003, p6). Although he thinks there are no precise grade boundaries, the system works as if there are by inventing them. Cresswell is confident that although vagueness requires the invention of grades applying to borderline cases, most assessments using the system are both valid and reliable. Just as people asked to identify red and orange can do so in most cases despite the illusory nature of a colour spectrum, so too with most grading activity. He thinks that the subjective judgements of competent judges are capable of ensuring that in the majority of cases awards of grades are accurate. In order to defend this view he argues against those who think subjectivity undermines reliability and validity. He thinks judgements based on the subjective decisions of awarders are standardly criticised as being nothing more than the feelings of the people making the judgements. Cresswell agrees with Toulmin, who thinks that the properties of a judgement are not the subjective feelings of the judges nor the properties of the evidence being judged, but are rather properties of an interaction between the two (Cresswell 2003, p7). The issue for a judgement is not the fact of subjective properties of a judge nor objective properties of evidence to be judged, but rather the reasons given for making a judgement. Cresswell’s gloss on Toulmin’s position is: ‘The practical issue – the issue of real interest – is not whether ethical judgements are objective or subjective but whether they are supported by sound reasoning and so provide a convincing basis for action… Precisely the same argument can be applied to the setting of examination standards’ (Cresswell 2003, p7). 
In this way Cresswell is indifferent to the subjectivity or objectivity of judgements made by assessors, and clear that he thinks the core issue for the credibility of an examination system’s standards is the reasons supporting such standards. He thinks that in most cases a competent examiner is able to provide good reasons for assigning value and expressing value using grades that a marking system can track. But he thinks that there are some cases where this isn’t so. When the difference between scripts being judged is invisible to any assessor, when the difference is so small, perhaps just a single mark, then Cresswell argues that to make a distinction between scripts that seem indistinguishable requires more than the subjective judgements of awarders. As he says, ‘…fixing grade boundaries simply as value judgements seems unsatisfactory…’ (Cresswell 2003, p8). If grading is that, then Cresswell is saying that for some candidates it is impossible to grade them. Statistical reasoning allows him to sort them, and to do so in a way that creates an illusion of grading. He thinks that in cases at the boundary it isn’t possible to rely on subjective judgements of value. There are no reasons capable of fixing such a fine grained distinction. He takes this as a reason for thinking that there is no borderline. Cresswell introduces the idea of ‘mental prototypes’ (Baird 2000, Fuhrmann 1988) to help establish his claim that awarders of grades work in a way suggested by his pragmatic solution to the vagueness paradox, and thinks that this is typically the way that grading decisions are made. Awarders carry with them a mental representation of a prototypical candidate’s work of a certain value, ‘with lists of features and exemplars’ (Cresswell 2003, p10), and use these, rather than marks, to classify candidates. He makes two claims for the idea of mental prototypes of grades: firstly, that Baird has presented evidence to show that this is what awarders actually do when awarding grades; and secondly, that such mental prototypes fit in with the idea of grades being vague sets without sharp precise boundaries. He also stresses that the mistakes made in awarding grades to candidates purely on the basis of marks are not a question of the reliability of marking. Vagueness causes the grade boundaries to be such that inevitably some candidates will be wrongly graded even if every mark awarded is reliable, so long as marks are used as the sole determinant of grade boundaries. However, in the context of awarding meetings where grade boundaries are decided on the basis of marks, Cresswell notes the unreliability of individual judgements in awarding these marks. Cresswell thinks that over a range of marks awarded to candidates by a range of awarders there will be unanimity about which marks correspond to the award of a certain grade. This follows from his earlier claim that ‘…very good performances in examinations are recognised as such by all observers, in much the same way that very good athletic performances are recognised by everyone’ (Cresswell 2003, p8). There will also be unanimity about which marks correspond to the award of the next grade. But in between the highest number of marks agreed by all to be needed for the award of one grade and the lowest number of marks agreed by all to be needed for the award of the next grade, there is a range of marks where there is no such unanimity, because no awarder can make a value distinction based on one mark, which is what is required to make a sharp boundary. 
For Cresswell this is because the grades expressing value are vague and so cannot correspond with a precise mark boundary. The reliable judgements of awarders, because they use vague grades which, for Cresswell, have no precise boundaries, will cut across any proposed precise boundary identified by marks alone. Cresswell thinks that in these cases, where there are no precise borderlines to any grades, statistical expectations can be used to invent boundaries that allow comparability. Boundaries can be reasonably adjusted in order to ensure that there is comparability either of performance standards or of outcomes, both between subjects in the system (so an English C grade is equivalent to a Maths C grade) and between different times. There are of course limits to how far this adjustment can be made without damaging the credibility of the exam. Cresswell gives the example of the maths performance standards in the summer 2001 AS exams in the UK. This was a new exam and as such it was thought that it would be harder than previous exams. Therefore performance standards were lowered to accommodate the increase in difficulty. However, the performance of candidates was so low that there would have been too much damage to the credibility of the exam if candidates performing so badly had been awarded a pass. Cresswell thinks that the subjective judgments of awarders are reliable and valid when they are certain about the value being awarded. But the certainty of a value is as vague as the value. The set of certain value is as fuzzy as the set of the value. Cresswell faces the problem of deciding which is the first mark that is unreliable because it is far enough away in similarity from the prototype to be uncertain. If he decides it is the first mark where there is disagreement between competent awarders, he must explain why this is not itself subject to vagueness. Using his own criteria, it is unlikely that the grade chosen will be picking out the value he needs to be able to pick out. His own argument denies the possibility of any mark tracking the limit of any value with precision. The borderline of certainty is vague, as is the borderline of the borderline of certainty. This is an intractable problem for theorists of vagueness. Higher order vagueness is the phenomenon of the vagueness of vagueness (Sorenson 1985, Burgess 1990, Williamson 1994, 1999, Graff 2003, Varzi 2003). Borderline cases are vague, as are borderline borderline cases. There is no principled way of preventing an infinite regress. Cresswell is committed to denying that there are precise borderlines between different grades. By admitting his ‘simple heuristic’ he admits that there is a need to invent marks that can seem to sort borderline cases in the zone of uncertainty between values. The result is a convenient fiction. He thinks marks can’t pick out precise borderlines because there aren’t any. He thinks marks that differ by a single mark can’t be discriminated in terms of that mark by competent graders, because vagueness involves borderlessness. But an assessment requires dogmatic decisions identifying superlatives. By reverting to statistical analysis of those pupils deemed borderline cases of similarity to the prototype, Cresswell makes the case for doing this even though he doesn’t believe in the existence of superlatives. 
His solution enables the invention of a means of systematically categorising scripts with adjacent marks as having different values, even though no subjective judgement could, on the grounds of the marks alone, make such a distinction. His solution is to reconceptualise the resulting grades as vague sets. This ‘simple heuristic’, as he calls it, brings about a satisfactory solution because it retains comparability of standards and the robustness of result data, which in turn helps retain people’s confidence in the system as a system, and it means that adjacent marks are not conceptualised as being a precise border between values labelled by grades. 
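Cresswell’s two-stage heuristic, as described above, can be sketched in outline. The following is a minimal illustrative sketch only, not AQA’s actual procedure: the similarity function, the threshold for being ‘certain enough’, and the particular boundary mark supplied to the statistical stage are all assumptions introduced purely for illustration.

# Illustrative sketch only: not AQA's procedure. The similarity measure,
# the certainty threshold and the supplied boundary mark are assumptions.

def similarity(script_mark, prototype_mark, spread=10.0):
    """Toy similarity of a script to a grade prototype, based on marks alone."""
    return max(0.0, 1.0 - abs(script_mark - prototype_mark) / spread)

def classify(script_mark, prototypes, certainty=0.7):
    """Stage 1: compare the script to each 'mental prototype' (the magnet).
    Return a grade if the attraction is certain enough, else None (borderline)."""
    best_grade, best_sim = max(
        ((g, similarity(script_mark, p)) for g, p in prototypes.items()),
        key=lambda pair: pair[1],
    )
    return best_grade if best_sim >= certainty else None

def sort_borderline(script_mark, boundary_mark):
    """Stage 2: the statistical turn. A boundary chosen on statistical grounds
    (here simply given) sorts the cases the prototypes leave uncertain."""
    return "C" if script_mark >= boundary_mark else "D"

prototypes = {"C": 60, "D": 45}          # assumed prototype marks
for mark in (62, 58, 53, 47):
    grade = classify(mark, prototypes) or sort_borderline(mark, boundary_mark=54)
    print(mark, grade)

The point of the sketch is only that the second stage sorts rather than grades: the mark that settles the uncertain cases is supplied from outside any judgement of value.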

2.3 Credibility as Motivation for Universal Decisiveness

 Cresswell is interested in the business of maintaining an examination system’s credibility. He thinks it is possible and advisable for the subjective judgement of expert awarders to judge the values of most cases. He recognises that these subjective decisions attempt to discriminate multi-stranded dimensions of a subject and capture these strands under a single value (usually a grade of some sort, or a pass/fail distinction), and that the use of ‘mental prototypes’ captures these despite their subjectivity. He is clear that reliable, valid judgements from awarders are not undermined by vagueness until judgements are made about the borderline cases. In decisions about borderline cases, statistical reasoning is used to ensure that results remain comparable, robust and retain everyone’s confidence. This statistical turn, which Cresswell thinks enables a high stakes assessment system to survive the challenge of vagueness, appeals to non-subjectivity to overcome unreliability. But Cresswell is not committed to thinking that the statistical turn he takes is a better source of reliability and validity generally. He is only committed to saying that the usual source of valid and reliable assessments is inapplicable in borderline cases. The source of the vagueness is the vagueness of the value. Cresswell thinks that there is no sharp borderline to values. Therefore a distinction of a single mark between grades is impossible. But he faces the uncomfortable position of needing the appearance of sharp grade boundaries in order to sustain the appearance of a competent assessment system. Yet if there is no boundary then statistics cannot discover it. Neither can reliability, nor validity. Invention rather than discovery can replace the vague subject matter with precise subject matter. But though apparently similar to the original value system, the replacement is not identical, because it is precise. The mark that is derived as being the borderline on the basis of statistical analysis gives the impression of a sharp boundary. Although it gives the impression that there is a definite mark identifying the boundary, it is only the appearance of one. It is more accurate to say that it is round about that mark that one grade ends and the next begins. The mark can only be known after all the marks are in, because the decisive mark is arrived at by statistical analysis of all mark distributions, not by any judgement of value. For this reason the mark used to make the borderline distinct is merely a stipulated, invented one. It is a post hoc result of statistical invention. This explains why the precise mark needed to pass an exam can only be known after the event. And thinking of grades as vague sets denies the appearance of a sharp borderline picked out by a single mark. Statistical reasoning enables the stipulating of grades, and uses statistics to constrain the stipulation. Where stipulation is arbitrary, stipulation is merely invention, and applied to semantics, invention changes the subject. Changing the subject means that what is being assessed is at best a homophone of the original. If statistical reasoning is sorting not grading, then Cresswell’s simple heuristic changes the subject so that assessments that appear to be grades are merely the result of sorting. It is as if ‘tall’ were somehow stipulated to mean ‘six feet and over’. Resources for expressiveness and range of communicative flexibility, including indexicality, would have been lost. You would have switched language. 
The stipulation would make ‘tall’ unthinkable. Similarly, stipulation of the exact spot where red ended and orange began would change the meaning of red. ‘Red’ would be unthinkable. If grades describe values and values are vague, then stipulation using statistical measurements in borderline cases changes the meaning of the grades. Changing the subject is not a trivial thing to do. Cresswell thinks that the pragmatic reasons for doing this justify doing it. He thinks he is just offering a pragmatic solution. Fodor and Lepore think that this kind of approach is insincere: ‘It is unclear to us how the vagueness of English expressions is to be illuminated by investigating the homophonic expressions in a language that is not English and none of whose terms are vague’ (Fodor and Lepore 2002, p77). Cresswell construes vagueness in terms of borderless transition and vague sets, and resolves the problem of determining different values in borderline cases through a form of psychometric measurement. He thinks vagueness sets limits to what subjective judgments are able to determine, but thinks there is no practical problem in using psychometric-type measurement for sorting out the borderline cases to maintain reliability and validity. His solution is a pragmatic one justified in terms of the broader needs of any robust and legitimate assessment system in a modern context of mass education. It is an approach that pragmatically invents rather than discovers borderlines because it doesn’t believe there are any borderlines to be found. His approach assumes that for most assessments a subjective judgment from a competent assessor is able to produce reliable and valid results. But he thinks that a form of psychometric measurement is required in cases where similarity to a paradigm becomes unreliable. Vagueness is on this view solved pragmatically by the use of a kind of psychometric objectivity. If a system is required to answer to the question and preserve truth, then Cresswell’s ‘simple heuristic’ is not a solution. Cresswell resists the ‘physics envy’ that motivates appeals to the psychometric paradigm throughout educational assessment. He thinks subjective judgement can deliver reliable and valid results in all but borderline cases. Only in borderline cases do assessments need to resort to the statistical turn and apply some sort of statistical norm referencing. Cresswell’s simple heuristic attempts to accommodate vagueness by minimising the use of scientistic objectivity, applying it only to borderline cases. Vagueness can be seen as a consequence of subjective assessments that objective assessment can be used to pragmatically solve. Cresswell thinks vagueness itself is unsolvable. He does not think vagueness is a general sceptical project that denies the possibility of subjective reliability and validity. So Cresswell, in acknowledging vagueness, asserts the knowability of value. Sceptics about all subjective knowledge are required either to explain the apparent existence of vagueness from their position or to deny the existence of vagueness. But because Cresswell thinks vagueness produces borderless transition, he invents precise borderlines as a simple heuristic to deal with those cases that could threaten the overall authority and legitimacy of an assessment system. Understanding the context of the use of psychometric measurement in educational assessments over the last century indicates why Cresswell thinks using that approach in the small area of borderline cases is legitimate. 
If there is no actual borderline, but there has to be one for the sake of assigning grades to all candidates in an examination system, then his pragmatic solution is understandable. It imposes decisive discrimination on an indeterminate reality. 
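The ‘tall’ example above can be made concrete with a small sketch. It is only an illustration of the general point about stipulation; the 72-inch cut-off is an assumption introduced for the example and comes from neither Cresswell nor Sorenson.

# Illustration only: an arbitrary stipulated cut-off (72 inches) is assumed.

def tall_stipulated(height_in_inches):
    """The stipulated predicate: it decides every case mechanically."""
    return height_in_inches >= 72

# The stipulation sorts every height without remainder...
print([tall_stipulated(h) for h in (70, 71.9, 72, 75)])   # [False, False, True, True]

# ...but it thereby changes the subject: 71.9 and 72.0 are treated as differing
# in 'tallness' even though no competent speaker could discriminate them by
# that criterion. The vague predicate 'tall' admits no such mechanical test.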

2.4 Stipulation

 In many cases the use of stipulation to resolve difficulties is acceptable. Sorenson used to think that vagueness is one of these cases (Sorenson 1992). At that time he thought that the vagueness of grades in high stakes assessment is resistant to enquiry and irresolvable, and so in the face of this an ‘ersatz response’ was acceptable (Sorenson 1992, p175). By switching to statistical measurement Cresswell changes the subject by inventing a more accessible replacement. ‘The newcomer can be cannibalized from the originals or be made from scratch’ (Sorenson 1992, p176). Urmson, when thinking about grading, thought stipulators present their inventions as old news (Urmson 1950, p145-69). Sorenson glosses Urmson thus: ‘… persuasive speakers blur the distinction between proposing criteria and applying already acceptable criteria’ (Sorenson 1992, p176). Thoughts of this kind avoid the intractable nature of vagueness, and the examples tend to be about ambiguity not vagueness. But Sorenson used to think that vagueness was open to invented solutions. He cites the riddle of whether the falling tree in an uninhabited forest makes a sound when it falls. ‘Sound’ is vague between meaning auditory sensation and physical wave, and so disputes over the answer are purely verbal. The issue is not caused by ambiguity, because after ‘sound’ has been disambiguated there remains vagueness between the two meanings, and vagueness resists further investigation. There is a way of inventing that uses a relativisation clause to justify itself. For example, a value is discussed in terms that relativise it, perhaps to some folk psychology idea of ‘general intelligence’. Psychometric measurements merely relativise this in a more formal way. Sorenson contrasts this with the approach of other high stakes assessors, such as lawyers, who refuse the scientistic approach: ‘Lawyers share the scientist’s affection for relativisation but are prone to relativise in a more higgledy-piggledy fashion following history’s palseyed finger of precedent’ (Sorenson 1992, p178). The scientistic turn that relativises the vague to gain precision can be accused of removing the vagueness to elsewhere rather than eliminating it. Defining a term as a quantity raises the question ‘quantity of what?’, and the vagueness of the definiens is introduced. Cresswell may think he is not really replacing anything, or that if he is, he is preserving the spirit if not the letter. However, understanding the enormous appeal of the pragmatic solution does nothing to answer the question whether Cresswell is right to think of vagueness in terms of borderless transition. The strong incoherence of the position is surely a warning that something has gone wrong. As indicated at the beginning, self-contradiction is a strong indicator of incoherence, and usually the discovery of such incoherence leads to revision to eradicate what is assumed must be a mistake in the thinking somewhere along the line. To do this is to think that the idea of vagueness as borderless transition is an illusion. Cresswell uses the sorites paradox to establish his belief that there are no actual sharp borderlines between grades. He argues that rather than sharply bounded grades we instead have rough sets that in fact don’t have boundaries. The metaphor of the boundary is rejected and replaced by the idea of the ‘mental prototype’, which Cresswell claims is well established empirically in studies by Baird (2000), as we noted above. 
He rejects the geometric metaphor of ‘conceptual space’, first introduced by Frege as part of his project to invent a formal logic, and prefers to think of concepts as prototypes that are then used as objects of comparison (Frege 1903). As we saw, instead of trying to work out the extension of a concept, that is, how far in conceptual space it can be stretched before it no longer applies, which is the typical way the metaphor of conceptual space is applied, he thinks of a concept as something fully understood as a mental prototype. He thinks we can then compare any object to this prototype and decide whether it is similar enough to the prototype for the concept to apply. It has been noted above that this mistakes prototypes for concepts, a move common in many semantic theories explaining learning and thought. He thinks the spatial metaphor gives the sorites puzzle its appeal (Hyde 2002). We imagine a monotonic sequence of scripts beginning on the right of the sequence with a D grade and ending on the left of the sequence with a C grade, each script imperceptibly different from its adjacent script in the sequence. There seems no reason, given that the extreme right-hand script is a D, why all of the scripts are not D grades. This of course contradicts the initial statement that the script on the far left of the sequence was a certain C grade, hence the paradoxical nature of the puzzle. And of course, if we begin from the left side of the sequence then similarly we prove the reverse, that all the scripts are C grades, which again contradicts the initial statement that the script on the far right of the sequence was a certain D grade. Further, taken together we are in the position of claiming that all the scripts are both D grades and C grades. What is required is a point along the sequence where transition from one grade to the other takes place. But there is apparently no such place. Cresswell’s assumptions and intuitions about how to solve the puzzle are not mandatory. Like Sainsbury, who coined the phrase ‘borderless transition’, Cresswell takes the sorites to have shown that there is no border. It therefore makes no sense to try to find a border. He introduces the idea of the ‘mental prototype’ to avoid the requirement of even looking for a borderline and to make the situation seem less paradoxical. Endicott proposes the same approach when discussing vagueness in law (Endicott 2000). Such prototypes are well-understood exemplars, held by the users of any concept, to which examples are then compared. So, for example, awarders of grades hold a mental prototype of a C grade and of a D grade. They then sort out the assorted scripts before them in terms of the prototype, deciding whether any one script is comparable enough to count as a C grade or a D grade. The C grade is just a rough set of scripts to which the awarders think their mental prototype applies. It is a vague set because ‘comparable’ is vague. It too is open to sorites puzzles. However, using the ‘mental prototype’ to set standards of what counts as a C grade and a D grade avoids the sorites problem of grade boundary fixing because it doesn’t attempt to fix any such boundary. The model is an avoidance strategy which enables standards to be fixed without engaging with the problem of boundaries at all. Cresswell illustrates this by asking us to imagine awarders deciding to give a set of scripts a range of marks. 
The scripts have to be awarded grades, and Cresswell asks us to imagine that there will be a minimum mark somewhere along the scale, from zero to whatever the maximum possible mark is, which all the awarders agree is required for a certain grade. He also imagines that there will be a maximum mark which all the awarders judge to merit only the grade below. In between these marks will be a range of marks where awarders disagree about whether the scripts awarded these marks should be judged as being of the lower or the upper grade. Similarity to mental prototypes cannot be precise, because the prototypes are not bounded by conceptual space but conceived in terms of each item’s similarity to the mental prototype. If marks alone were used to set a boundary between the grades, then scripts would be wrongly placed on either side of the border. This would happen because the ‘mental prototypes’ used by awarders are vague and so can’t have a one to one correlation with marks. In this he thinks that the objectivity of psychometric measurement is generally unnecessary. His use of the idea of ‘mental prototypes’ establishes that he thinks reliable and valid assessments can be achieved through a subjective method. Though subjective, in that it is ‘mental’, it is not purely a matter of what judges think, as he puts it, a kind of ‘I don’t know much about art, but I know what I like’ approach. Nor does it claim to be purely about the objective characteristics of what is being judged. Rather, it is about reasons for making the judgement, Urmsonian reasons which are the way mind interacts with things. A mental prototype contains the reasons that a person identifies things as something. A competent awarder of grades therefore knows the reasons that make a C grade, and is therefore someone able to judge, even with a new subject (introduced perhaps to keep up with new technological advances) or with a radically altered subject (such as when aspects are removed because of changing ethical concerns about old practices, e.g. dissection in biology), whether a standard being set is of the same standard. They can do this by judging whether or not the new standard is close enough to the old one to count as being the same. The idea of a ‘sufficient match’ (Cresswell 2003, p11) is all that is required here. Cresswell draws on the psychometric paradigm in a very limited way. Subjective judgment is his preferred mode. He thinks that in a small number of cases the robustness of subjective judgement is too fragile. In this situation the mental prototype cannot make the precision judgements required to react to this unpredictability. Universal agreement about marks corresponding to grades based on mental prototypes can handle the clear cases, but in unclear cases the use of statistical reasoning completes the process. If the exam is harder than last year’s, then statistics about last year’s entries can be used to set a boundary. Because the values are understood as rough sets, the justification for setting any particular precise grade borderline can be given in terms of the statistical requirements of ensuring performance comparability or attainment comparability. In fact Cresswell sees this as the way to determine any indeterminate grade boundary. The value judgements of awarders set the appropriate standards for all grades, are authoritative, and are conceptualised on an ontology of rough sets. 
This business is different from determining the precise borderline between grades, not because of the unreliability of awarders’ judgments but because the model used to make such judgements is based on vague sets understood as having no precise borders at all. Their ability to draw precise borders, captured by a sequence of marks, for example, is therefore practically impossible using such a tool. Cresswell is concerned that the potential for disagreement between competent awarders is too high and would be too disruptive of any examination system’s authority to be tolerated. Potential disagreement would be high because the requirement would entail awarders making the difference of one mark change the value of a judgement even though no awarder would be able to recognise any difference in value between scripts differing by only one mark. The use of statistical data to set the precise grade boundary, combined with the awarders’ judgements to set standards using the model of the ‘mental prototype’, is the way Cresswell proposes to ensure that standards are comparable, robust and defensible. However, he does not claim that statistical data is able to provide precise enough data to determine a sharp boundary. Even here, a certain amount of interpretation is required. As he admits, ‘In the general case, the statistical procedures in question are not sufficiently precise to identify a single mark as the boundary between two grades’ (Cresswell 2003, p18). Grades are still conceived as being rough sets without sharp borders, and the statistical heuristic can only be justified in terms of the practical requirement to have such a precise boundary, not in the establishment of anything beyond that practical need. Using this approach, which he asserts is the way AQA approaches the assessment of scripts, he offers a lucid and powerful rebuttal of critics who would doubt that standards were being upheld. It also offers a rebuttal of those who further claim that, because of changes in exams and context, real comparisons between exam standards are impossible. Cresswell thinks a continuity of use of the mental prototypes, alongside the use of statistical data about both performance and achievement perspectives, enables assessment systems using these heuristics to offer reasons for rebutting such claims. What is not being claimed by Cresswell is that his use of ‘mental prototypes’ actually solves the challenge of vagueness as he conceives it. If grades are rough sets then the drawing of a boundary that appears sharp can only be justified as a pragmatic necessity. What then becomes important is what constitutes that practical need. Cresswell finishes by saying that this pragmatism should be informed by ‘…an awareness of the effect which examination results have on students’ life chances and the ethical imperatives which imposes upon all those who are involved in setting standards’ (Cresswell 2003, p18). Cresswell thus brings into focus the implications of the fact that the values required for summative high stakes grading assessment are vague. He inducts us into the world of high stakes assessment by showing that even when marking is totally reliable there is no avoiding the challenge that vagueness poses to the setting of grade boundaries. He thinks borderlines are ineliminably fuzzy to us and therefore any precise borderline is a concoction. The reasons for such a concoction and their consequences are the only justifications that can be sought for drawing such lines. 
Where to draw such lines is arbitrary only in the sense that it is not constrained by the world itself – there are no values with precise borders – and so how to make the decisions is inevitably a political decision. 
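The statistical turn described in this section can be given a minimal sketch. Matching this year’s cumulative mark distribution to last year’s outcomes is one standard form of attainment comparability; the marks, the awarders’ zone of disagreement and the matching rule below are all assumptions for the sake of illustration and are not Cresswell’s or AQA’s published procedure.

# Illustration only: marks, zone of disagreement and matching rule are assumed.

def boundary_by_outcomes(marks, last_year_pass_rate, zone):
    """Pick a boundary mark inside the awarders' zone of disagreement so that
    this year's pass rate is as close as possible to last year's."""
    n = len(marks)
    best_mark, best_gap = None, float("inf")
    for candidate in range(zone[0], zone[1] + 1):
        passed = sum(1 for m in marks if m >= candidate)
        gap = abs(passed / n - last_year_pass_rate)
        if gap < best_gap:
            best_mark, best_gap = candidate, gap
    return best_mark

marks = [38, 41, 47, 50, 52, 53, 54, 55, 57, 60, 63, 66, 70, 74, 78]  # toy data
zone = (50, 56)   # all awarders agree 57+ merits the grade and 49 or below does not
print(boundary_by_outcomes(marks, last_year_pass_rate=0.6, zone=zone))   # prints 54

The point is that the chosen boundary is a post hoc artefact of the whole mark distribution: on this sketch nothing about any individual script determines that 54 rather than 53 is the borderline, which is why the decisive mark can only be known after all the marks are in.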

2.5 Conclusion: Sincerity

 Cresswell begins with a question: how many marks do you need to pass? He senses the challenge posed by an answer that can only give an approximate mark rather than a precise figure before the exam is taken. It is that ‘about such and such a mark’ answer that causes him to ponder why no one knew the precise mark that would be needed to pass before the test had been marked. His solution at least explains the fact that you can’t know it beforehand but can afterwards. By identifying vagueness as the answer he raises a new challenge to contemporary high stakes educational assessment systems. Cresswell’s heuristic fails to make a distinction between sorting on the one hand and grading on the other. Candidates are undoubtedly sorted by Cresswell’s system, but they can’t be said to be graded. Grading requires that someone knew the criteria that justified the grading, and the system used denies this possibility. It purports to grade to a degree of detail that is beyond human cognitive capacity. Cresswell can’t argue that the grades assigned in any borderline case are sincere, because sincerity requires that salient criteria have been applied by someone. There is no subject using her beliefs to reach a decision about correct grading. Vagueness denies the possibility of her doing this in borderline cases. Cresswell doesn’t believe there is a borderline. His simple heuristic is to sort when grading is impossible. Reconceptualising grades as rough sets allows him to present adjacent marks as separating grades without the inference that grades can be distinguished in respect of a single mark. But the reconceptualisation is not apparent to those using the system, in particular those asking how many marks they need to achieve a certain grade. And the heuristic seeks to pretend continuity of process for all assessment. Pretending in this context is equivalent to lying, because it is a pretence that aims to deceive. The motivation for the lie is honourable. Sainsbury writes that ‘In general, only a pragmatic justification could be found for drawing a legal line where there are no relevant boundaries’ (Sainsbury 2002, p83). Cresswell justifies the lie by saying ‘… our pragmatism must be informed, above all, by an awareness of the effect which examination results have on students’ life chances and the ethical imperatives which that imposes upon all those who are involved in setting examination standards’ (Cresswell 2003, p18). Cresswell presents his exposition as if he were offering the received wisdom about vagueness: ‘Like all classic paradoxes, this has kept everybody busy for a few thousand years (albeit not in the examination context), but recent work on vague concepts offers a reasonably satisfactory solution to it’ (Cresswell 2003, p9). He actually doesn’t think there is a solution, but rather a practical way of handling its unsolvability. But his use of vague sets is not the received wisdom of all philosophers working in the field, and he doesn’t present alternatives. As such, he is being somewhat disingenuous. As was noted earlier, there are several approaches towards the sorites. I think Sorenson’s approach solves the sorites puzzle and so is preferable to Cresswell’s approach, which doesn’t. The epistemic approach that rejects the second premise of the sorites implies that the appearance of borderless transition is an illusion. This would then establish that there are precise borderlines that mark the distinction between two values. 
The vagueness of grades would be a function of absolute ignorance and a constraint on the universal decisiveness of any system involving vagueness. If Epistemicism is true it would mean that the largest examination board in the UK is currently modelling its grading system on a false understanding of vagueness. At the bottom of Cresswell’s system is his assumption of semantic indeterminism, which assumes that predicates are both inductive and discriminative. But rejecting the inductive principle and accepting the discriminative is a viable alternative that turns vague grades into genuine predicates that create the illusion of obeying the inductive principle (Sorenson 1988, 2001, Williamson 1994). Sorenson accuses indeterminists of misrepresenting the phenomenon. Epistemicists claim that they are making discoveries about the very behaviour of words. Cresswell’s convoluted attempt to avoid semantic indeterminacy is pointless because there is no such thing. It results in an exercise in revisionary rather than descriptive metaphysics. ‘Descriptive metaphysics is content to describe the actual structure of our thought about the world, revisionary metaphysics is concerned to produce a better structure’ (Strawson 1959, p9). The actual conceptual structure of thought can’t help but be classical if the sorites is to be solved. This leads to revealing the deep structure of thought, which requires that ‘…vague words must have discriminative powers that far exceed the recognitional powers of speakers’ (Sorenson 1996, p211). Cresswell’s proposed solution to the sorites is condemned as misrepresenting the vagueness of grades. Grades are discriminatory but not inductive. There is no semantic indeterminism because the inductive principle is illusory. The illusion is exposed through the application of simple logic. Cresswell’s and Urmson’s accounts of vagueness are convicted of playing fast and loose with words. Words have the metaphysical constraints of things because they are things. Because of this it is possible to have beliefs about words that conflict with their actual properties. Cresswell’s approach denies the conflict. Words appear to be inductive and discriminative and so they are taken to be as they appear. The sorites puzzle works as a riddle that reveals the mistake. Sorenson thinks that this is a familiar mistake of the metaphysics of words, because words are not thought about as much as things, and, as Strawson said, the conceptual structure of words ‘does not readily display itself on the surface of language, but lies submerged’ (Strawson 1959, p10). Sorenson thinks that there is less anxiety about the ontology of words than of things described by ‘displaced speech’. He thinks that this is because it seems we are able to produce a word whenever we doubt its existence, whereas other objects are less easily brought into purview. We have lower standards for accepting the existence of words than of objects. He thinks ‘… we are more permissive about words than things’ (Sorenson 1996, p193). But he reminds us that words can not only refer to things that don’t exist, such as unicorns, they can also refer to words that don’t exist, such as ‘heterological’ (Grelling 1908). Heterological is a predicate that applies to all and only those predicates that do not apply to themselves. The word ‘heterological’ indubitably doesn’t exist, and could never exist in any possible world, because it is a logical contradiction. But this is not the same as incompletion. 
Williamson stipulated a definition of ‘dommal’: all dogs are dommals and all dommals are mammals (Williamson 1990). He then asks whether cats are dommals. The question is undecided because of the stipulated incompleteness. The assumption that incomplete predicates can be stipulated without constraint is the error of Urmson’s and Cresswell’s assumption of semantic indeterminacy. Just as stipulating the reality of any object is constrained, so too the stipulation of words is constrained. The sorites puzzle flushes out constraints on what can and cannot be stipulated. It denies the many forms of conventionalism. Cresswell’s solution stipulates pseudo-grades. It assumes that, unlike other objects, words don’t have to conform to classical logic. The sorites is solved if it is assumed that they do. All predicates are either true or not. The principle of bivalence this expresses is what is psychologically obvious when we discuss cars and cats. Violating classical logic changes language. A reconsideration of the sorites helps us see Cresswell’s mistake. If 54 marks is graded a pass then, given that the grade is too rough to be sensitive to the difference of a single mark, a single mark less is also a pass mark. This is a consistent conjunction. The inductive principle seems to track correctly both what we think and genuine linguistic competence. Yet the predicate ‘pass mark’ is discriminative, and so it isn’t the case that a mark 53 marks below a pass mark is a pass mark. Linguistic competence mandates the sorites and commits us to widespread a priori acceptance of analytical falsehoods. We can’t investigate them because we don’t know which of our beliefs they are. We are doomed. 
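Set out schematically (a standard formalisation of the sorites, not notation drawn from Cresswell or Sorenson), the argument of the preceding paragraph runs as follows, where $P(n)$ abbreviates ‘a script awarded $n$ marks is a pass’:

\[
P(54), \qquad \forall n\,\bigl(P(n) \rightarrow P(n-1)\bigr) \;\vdash\; P(1)
\]

From the base case $P(54)$ and fifty-three applications of the inductive premise we derive $P(1)$, which contradicts the evident truth that one mark is not a pass. The epistemic response sketched above rejects the inductive premise: somewhere in the descent there is a sharp but unknowable cut-off, so the premise is one of the analytic falsehoods that competent speakers are nonetheless disposed to accept.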

 

CHAPTER 3: THE SCIENTISTIC PRECISION OF THE PSYCHOMETRIC IDEAL

 Batman: Sometimes the truth isn’t good enough, sometimes people deserve more. Sometimes people deserve to have their faith rewarded…(The Dark Knight 2008) 

3.1 Introduction

 The chapter examines an approach that has been dominant in assessment grading systems until recently. It instantiates the dream of ‘physics envy’ exemplified by thinkers like Carnap and Quine (Carnap 1950, Quine 1951). This approach is labelled scientistic because it tries to model a paradigm of objectivity associated with abstract science. It is linked with psychometrics because this is the historical form it has prototypically taken in the field of education (Wiliam 1994). It is represented as a system that eradicates vagueness but at the expense of validity. The expressiveness of language produces vagueness, and restricting the expressiveness of grading to sharp borderlines through a rigid bureaucratic answering system removes the human face of language and thought from assessment. It is a system that denies the scruffy nature of knowledge. It adopts a false idea of language as inconsistent and therefore systematically goes about remedying its imprecision. It assumes that its indeterminacy is a matter of ambiguity, and so many of its attempted remedies are disambiguations. This links it with supervaluationist attempts to understand vagueness as a hyper-ambiguity (Lewis 1999). Williamson’s version of Epistemicism is also connected with such an approach, where the ignorance of vagueness is ignorance of which language we are using. Absolute vagueness as modelled by Sorenson shows that the attempt to remedy language is wrong-headed. Language is able to express anything. It is complete. It is fully determinate. Its concepts have sharp boundaries. This system wrongly mistakes the appearance of indeterminacy for a true representation of language. It therefore develops a system that conceptually violates natural language. In doing so it fails to do justice to the expressiveness of language. It fails to do justice to the limitations on what we can know; in particular it misunderstands the unknowability of absolutely precise borderlines. It is, again, a system that doesn’t cope with the obscurity connected with answering the question of whether it is a system able to speak to the question. 

3.2 Scientism as an Assumption that Logical Equivalence Implies Truth Equivalence

 In the last chapter Cresswell accepted the illusion of language being full of borderless transitions. He abandoned human grading judgments in borderline cases and applied statistical reasoning to produce dogmatic superlatives. Other approaches to assessment have typically attempted to remove vagueness and achieve precision completely. Assessment that tries to achieve scientific status assumes that logical equivalence implies equivalence in truth value. Psychometric assessment theory is the prototypical approach to assessment that assumes this. But some statements are closer to the truth than others without varying what they entail (Sorenson 2007). Even meaningless statements can be closer to the truth than other meaningless statements. Educational assessments that attempt scientific standards of reliability and validity tend to overlook this. Yet science, including physics, accommodates the idea. So any answer system that fails to accommodate the idea that variance in closeness to truth need not entail variance in logical equivalence misrepresents the scientific ideal. Logical equivalence is derived from the entailments of whatever criteria are being used. Popper thinks verisimilitude is a matter of having true consequences and avoiding false ones (Popper 1963, p397). Logically equivalent statements have the same consequences. Counting consequences is difficult. Goodman’s new problem of induction used predicates working like ‘grue’ and ‘bleen’ to make such calculations difficult (Goodman 1954). Criteria in one language can be further from the truth than in another. Scientistic assessment theorists agree that languages should be ranked in terms of how good they are at eradicating difficulties. Carnap thought indexicals and ambiguity were a problem and rejected languages that used them in order to remove the difficulty (Carnap 1962). Quine thought that only a language cutting nature at the joints would do. Language would have to be precise (Quine 1969). Urmson’s vague grades would need to be made precise if they were to function properly. The attempt to remove imprecision from grading criteria has been a central feature of assessment. Goodman wanted entrenched predicates that have a good track record of inductions. Goodman therefore endorsed a type of continuity constraint. Such a principle is used to guide some grading in high stakes assessment, although continuity in itself isn’t considered constitutive of the sense of a grade. Britton thinks that mereological criteria which prevent double counting will work (Britton 2004). Breaking criteria down into relationships between component parts and the whole is prototypically the way criteria have developed for assessments in high stakes exams. Winner-takes-all theorists suggest we wait and see which criteria prevail over time to preserve the truth best. But this allows the winner to legislate on the criteria for winning retrospectively. It is now, not in the future, that we need to be able to judge truth values. However, Cresswell’s solution to the vagueness problem in assessments is retrospective. It is only after marking that the grade boundary is established, using statistical reasoning justified by maintaining reliability, validity and standards norms (Cresswell 2003). David Lewis thought that: ‘A theory is close to the truth to the extent that our world resembles some world where that theory is exactly true. A true theory is closest to the truth, because our world is a world where the theory is true. 
As for false theories, the ones that can come true in ways that involve little dissimilarity to the world as it really is are thereby closer to the truth than those that cannot’ (Lewis 1986, p24). Criteria work like theories to ensure that grades are accurately applied. Vagueness creates problems for deciding boundaries between grades. But vague criteria don’t lead to universal scepticism. The sorites patterns difficulties in a zone of uncertainty between two zones of certainty. So it makes sense to talk about some criteria as being more vague than others. It also makes sense to say that there can be too much or not enough vagueness (Endicott 2000). Grading criteria are good when they pick out what they intend to pick out. They are better than rivals if they do this better than the rivals, even if they are not perfect at doing it. Strict logical implication would render any criterion with a false conclusion logically equivalent to any other that also implied a falsehood. But this would be uninformative. We count near misses as better than big misses. We count fewer misses as better than more misses. Scientists use the discourse of ‘closer to the truth’, which endorses this approach even for scientistic assessment theorists. Scientists know there are five kingdoms of complex organisms. A scientist agrees that saying there are six is closer to the truth than saying there are sixty. In maths the same type of data exists to justify the same approach (Sorenson 2007, p5). The approach to psychometric assessment uses maths and science to achieve greater reliability. ‘Closer to the truth’ discourse allows errors within margins of acceptability to be established on the grounds that some errors are better than others. Salsburg explains this when he writes about the way data overlaps from maths into empirical science: ‘The numbers we get from this random sample are most likely wrong, but we can use the theorems of mathematical statistics to determine how to sample and measure in an optimum way, making sure that, in the long run, our numbers will be closer to the truth than any others’ (Salsburg 2001, p172). We notice that this thought substitutes probabilities for beliefs. Progress in science can be measured in terms of getting closer to the truth. In maths truth seems to be more stable even though errors are made. Maths progress is therefore best measured in terms of full truth, though not exhaustively. Statistical and analogical reasoning figure in maths even though deductive truth overrules both. In assessment, statistical reasoning is used. Cresswell’s approach to sorting out the vagueness of borderlines uses statistical reasoning. This argument shows that in terms of mathematics it is not an invalid way to proceed. Statistical reasoning does not violate mathematical standards, and given that maths and science are largely intertwined there is no violation of scientific standards purely from his using statistical reasoning. Criteria that get closer to the truth than other criteria are to be preferred in high stakes assessments. There are inconsistent theories that get closer to the truth than consistent ones. Logical equivalence cannot be the sole determining factor in the concept of ‘closer to the truth’. This is true in physics, the science that the assessment theorists we’ve called ‘scientistic’ most want to emulate. Schott gives as an example Bohr’s theory, which in order to predict greater truth became more inconsistent than it had been initially: 
‘Bohr’s theory of the Balmer series is based upon several novel hypotheses in greater or less contradiction with ordinary mechanics and electrodynamics, ...yet the representation afforded by it of the line spectrum is so extraordinarily exact that a considerable substratum of truth can hardly be denied to it. Therefore, it is matter of great theoretical importance to examine how far really it is inconsistent with ordinary electrodynamics, and in what way it can be modified so as to remove the contradictions’ (Schott 1918, p243). The accommodation of inconsistency in order to be closer to the truth is not found in assessment criteria. Consistency is seen as something that is required to achieve the truth. Yet if verisimilitude in physics can accommodate contradiction and inconsistency, then educational assessment systems modelled on physics should be able to as well. They are further from the model of physics than they would be if they could accommodate inconsistency. If educationalists found that an inconsistent theory achieved better levels of truthful inference than a consistent theory then, using physics as their model, they should accept this. Norton thinks that this is what happens in empirical science: ‘If we have an empirically successful theory that turns out to be logically inconsistent, then it is not an unreasonable assumption that the theory is a close approximation of a logically consistent theory that would enjoy similar empirical success. The best way to deal with the inconsistency would be to recover this corrected, consistent theory and dispense with the inconsistent theory. However, in cases in which the corrected theory cannot be identified, there is another option. If we cannot recover the entire corrected theory, then we can at least recover some of its conclusions or good approximations to them, by means of meta-level arguments applied to the inconsistent theory’ (Norton 2002, p193). This is not a denial that consistent theories are available. But it suggests that if they can’t be easily found, approximate theories that work quite well are legitimate. Some theorists do deny that consistency is always available, however. If they are right then the failure to use an inconsistent theory that is close to the truth becomes an act of folly. For example Shapere writes, with classical electrodynamics in mind, ‘…there can be no guarantee that we must always find a consistent reinterpretation of our inconsistent but workable techniques and ideas’ (Shapere 1984, p235). Frisch agrees (2005): ‘If acceptance involves only a commitment to the reliability of a theory, then accepting an inconsistent theory can be compatible with our standards of rationality, as long as inconsistent consequences of the theory agree approximately and to the appropriate degree of accuracy. Thus, instead of Norton’s and Smith’s condition that an inconsistent theory must have consistent subsets which capture all the theory’s acceptable consequences, I want to propose that our commitment can extend to mutually inconsistent subsets of a theory as long as predictions based on mutually inconsistent subsets agree approximately’ (Frisch 2005, p42). Cresswell’s system may be inconsistent because it models false beliefs about vagueness and language and stipulates grade boundaries whilst claiming not to believe in them, but it may yet be an approximately truthful answer system. Inconsistent statements could be as close as each other to the truth. 
Assessment criteria that deny this change our natural ways of thinking about truth. Attempts to impose a consistency that cannot accommodate inconsistent statements having the same truth value therefore misrepresent the way we think about truth, even as scientists and mathematicians. ‘It is ten to twelve’ is as close to the truth of ‘it is about noon’ as is ‘it is ten past twelve’. Yet they are mutually inconsistent statements. They are part of an acceptable repertoire of thinking about truth proximity. Assessment criteria that are meaningless may be closer to the truth than others to the extent that they resemble meaningful statements that do have a proximity to the truth. Yet meaningless statements are degenerate in terms of logical consequence: there are no logical consequences to a meaningless statement. Yet some nonsense can be used to get close to the truth. Meaninglessness can have implicature, which may allow us to decide that, though strictly equivalent, one meaningless statement is closer to the truth than another. So with time: noon and midnight are singularities, and strictly ‘12.00 PM’ is meaningless. But if it is noon, the report ‘12.00 PM’ is closer to the truth than the report ‘12.10 PM’ (Sorenson 2007, p12). Educational assessments that attempt to mirror science tend still to be beholden to the logical positivists and ordinary language philosophers, who supposed that there were many more meaningless statements than there actually are. If we are no longer too concerned about the original meaning of the Latin ‘P.M.’ then the meaninglessness of the time example can be ignored. But some incomplete criteria may be closer to the truth than others, so they should not be precluded. Urmson thinks that all criteria for grading are incomplete and wrongly concludes that it is this incompletion that makes them vague. But this does not preclude him from rightly thinking that some incomplete criteria are better than others because they get closer to the truth than others. An awarder of a grade may not have decided fully what the reason was for the award. She has a reason but it isn’t fully formed. However, it may be that the incomplete reason is good enough to make the grade because it gets her close enough to the truth. This is the case even if, by being unfinished, the reason is strictly meaningless. In the same way, someone who thinks it is minus forty degrees but has not decided whether it is centigrade or Fahrenheit she is talking about still gets us close enough to the truth, because at minus forty the two scales coincide. Her unfinished thought about temperature was meaningless but close enough to the truth. Assessment theorists trying to rank preferences face Arrow’s claim that cardinal utility under certainty is meaningless (Arrow 1950). But the meaningless principles of utility had analogous meaning when rendered in the alternative, ordinal preference ranking. Although meaningless, they were close enough to the truth to be useful. Wittgenstein wrote a whole book that claimed to be meaningless but was close enough to the truth to be useful. In the ‘Tractatus’ he wrote: ‘6.54 My sentences are illuminating in the following way: to understand me you must recognize my sentences – once you have climbed out through them, on them, over them – as senseless (You must, so to speak, throw away the ladder after you have climbed up on it). 
You must climb out through my sentences; then you will see the world correctly’ (Wittgenstein 1922). Donald Davidson thinks language meaning requires a principle of charity which constrains users always to interpret so as to maximize agreement with the author of language statements (Davidson 1984, p169). So Wittgenstein can be read as meaningless and still fulfil Davidson’s principle. If Frisch is right in thinking that there may not be a consistent substitute for inconsistency then Wittgenstein’s meaningless text may be as close to the truth as you can get. Wittgenstein’s text, like many works of art, is an attempt to transcend the limits of language and thought. An assessment theory that claims to be grading genuine thought is incapable of doing so if it restricts itself to thinking that strict logical consistency is the only and best way to assess.
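The temperature example above can be checked with a short derivation. Assuming only the standard conversion between the two scales, they coincide at exactly minus forty, so the unfinished thought loses nothing by remaining unfinished:

```latex
% Fahrenheit as a function of Celsius, and the point at which the two scales agree
F = \tfrac{9}{5}C + 32, \qquad F = C \;\Rightarrow\; C = \tfrac{9}{5}C + 32
\;\Rightarrow\; -\tfrac{4}{5}C = 32 \;\Rightarrow\; C = F = -40.
```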

3.3 psychometric assessment and ambiguity

 A scientistic methodology using statistical reasoning, predominantly the psychometric approach, prototypically restricts any ambiguity that threatens consistency. Ambiguity is used here to mean that there are at least two options available when interpreting criteria for grading. If disambiguation is the motivation for psychometric assessment then vagueness will only be overcome by such a method if vagueness is a type of ambiguity or its problems are derived from ambiguity somewhere. Sorenson doesn’t think that vagueness is a form of ambiguity and therefore thinks that generally any approach that supposes disambiguation will help solve the problem of vagueness is mistaken. It is a subtext of his approach to vagueness that vagueness can’t be solved by disambiguation and that therefore the prototypical approach to assessment that has assumed this is mistaken. An argument from the previous chapter links ambiguity to perspectivism. Urmson calls this the ‘final problem’ for grading theory. ‘Now for the final problem; when there are differences of opinion about what grading criteria to adopt in a given situation is there not a right and wrong about it; can we not say that these are the right, these are the wrong criteria; or are we to say that the distinction, for example, between higher and lower, enlightened and unenlightened, moral codes is chimerical? In some cases we would perhaps be content to admit that there was no right or wrong about it; the differences in criteria arise from different interests, different environments, different needs; each set is adequate to its own sphere. But in others we certainly do not want to say this; the distinction, for example, between higher and lower moral codes cannot be lightly brushed aside’ (Urmson 1950, p184). This draws attention to some of the special features of grading. There may be more than one justifiable reason for grading an object, and decisions between reasons often require grading the reasons. Grading does not on the face of it preclude a situation where there is a ranking tie and the different reasons justify different grades. The contextualisation that Urmson thinks generates some grading criteria, those ‘… different interests, different environments, different needs, each set … adequate to its own sphere...’ (Urmson 1950, p184), may well be sources of criteria that resist a monotonic grading along a single scale of rank. Contextualisation risks ambiguous grading values. In Urmson’s account of grading, reasons are criteria by which we test whether a grade has been correctly applied. ‘…’Good’ is a grading label applicable in many different types of contexts, but with different criteria for employment in each…’ (Urmson 1950, p174). Urmson is right to think that grades are not just characterized by their vagueness. They can also be ambiguous and perspectivism is a source of this. A dispute about whether the grade is correct may therefore be about whether the criteria are satisfied by the work. It may also be unclear which criteria are being applied in a case. Urmson takes criteria to be the reasons given to justify the application of a grading label to an object. He writes, ‘Roughly the way to find out what criteria are being employed is to ask why the [object] has been graded thus’ (Urmson 1950, p183). Further disputes arise after it is agreed which criteria are being applied. These can be about whether the criteria actually do apply. Cresswell’s problem is vagueness.
Disambiguation is only relevant to the problem of vague grade thresholds if vagueness is a species of ambiguity. Scientific precision is often characterized as complete disambiguation. The hope of achieving scientific standards of reliability and validity can therefore be characterized as an attempt to ensure the complete disambiguation of assessments. Vagueness considered as ambiguity is vagueness explained in terms of human insensitivity to small changes in meaning. This is a principle of ‘tolerance’. We are blind to small shifts in meaning until the differences become too large to ignore. Vagueness is ‘ambiguity on a grand and systematic scale’ (Fine 1975, p282). This insensitivity conjures up an illusion of univocal continuity where it doesn’t really exist. In an assessment at the borderline between two grades there may be a dispute about different meanings. The assumption is that ‘good’, for example, is ambiguous and is causing the dispute. For one teacher, ‘good’ is replaced with ‘good for a dyslexic’. For the other it isn’t. This may be a matter of assuming the meaning depends on its practical usefulness, its purpose. Judgment as to what is salient to practicality and purpose is itself dependent on contexts that may be disputed. Meaning varying from speaker to speaker causes complex patterns of misfit between judgments. Where a borderline is required that converges on a single meaning the removal of the different meanings is imperative. Insensitivity to ambiguity is a sign of incompetence. A competent assessment is one that should be sensitive to ambiguity where it occurs. If vagueness is caused by ambiguity on a massive scale then a competent assessor will be overwhelmed. But competence is learnt and therefore ambiguity is limited by constraints of learnability. There cannot be an infinite number of choices because these would be unlearnable. Incoherentist theorists like Peter Unger think that if this is the case then nothing really exists, including grades. For any standard set, there will be a better version of the standard. A flat hand is not as flat as a flat table top which is not as flat as… etc. (Unger 1975, p65-68). He takes this type of slippery slope argument to prove that nothing is really flat because flat is incoherent. He concludes that natural languages are incoherent and so everything they try to express is incoherent too. Critics say that Unger is ‘changing the score on you’ (Lewis 1983, p245) by changing standards of precision. If the original grading is correct then that has occurred in a context that should be used to stabilize the understanding. Lewis thinks that Unger is being too free with drawing the line. Once chosen, he must stick to the choice. By roaming from context to context without any constraint the change of standards of precision equivocates. The awarder of the grade is obliged to choose where the borderline is and then stick to the choice in order to prevent equivocation between different senses, according to Lewis. One of the signs of a good assessor is reliability and a sign of reliability is continuity. Goldman thought this helped secure a best model for achieving ‘closeness to truth’ in a world where the same criteria in different languages could achieve different levels of closeness to truth. An awarder is guided by looking for ‘good continuity’ of assessments (Goldman 1989). She awards a grade and for good continuity settles on the same grade for a candidate that strikes her as being saliently similar.
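The tolerance principle that the sorites trades on can be made concrete. The sketch below is a minimal illustration only, using a hypothetical hundred-point mark scale and an invented starting mark rather than any real grading scheme; it shows how insensitivity to one-mark differences, applied step by step, erases a grade boundary.

```python
# A minimal sketch of the sorites applied to grading (hypothetical mark scale).
# Tolerance principle: if a script with mark m deserves grade A, then a script
# with mark m - 1, differing imperceptibly in quality, also deserves grade A.

def marks_forced_to_grade_a(starting_mark, steps):
    """Apply the tolerance principle repeatedly and collect every mark it
    forces us to count as grade A."""
    marks = [starting_mark]
    for _ in range(steps):
        marks.append(marks[-1] - 1)  # each step is a 'negligible' one-mark drop
    return marks

forced = marks_forced_to_grade_a(80, 79)
print(forced[0], forced[-1])  # 80 1 -- a mark of 1 ends up counted as 'grade A'
```

Each individual step looks harmless; only the accumulation of steps produces the absurd final verdict, which is why the insensitivity is so difficult to detect from within.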
Maintaining a statistical continuity is the motivation behind Cresswell’s pragmatic solution to the vagueness of borderline grades. By using statistical reasoning a continuity of statistical pattern is preserved. This is helpful when trying to avoid accusations of grade inflation. Norm referenced assessment based on a psychometric approach generalizes this stabilization technique. Some continuities need not be strict continuities if we accept that closeness to the truth is independent of logical equivalence. There may be continuities by analogy or approximation (Frisch 2005). The use of analogy and approximation is part of the repertoire of anyone with linguistic competence. Goldman thinks that vague borderlines are not detected because we tend to overapply this competence to ensure good continuity. A small difference in sense obliges us to treat it in the same way in order to maintain good continuity (Goldman 1989). Others think that we are psychologically committed to gestalt switches. Small differences agglomerate until we reach a limit. At the limit of holding small differences under a single concept a switch of perspective occurs and we treat further differences as something else. The induction step of the sorites only applies over a small scale that is limited by this human psychology (Raffman 1994). Sorenson shows that Raffman’s idea condemns good arguments (Sorenson 1998, p7). A sensitivity to perspective seems compatible with not switching perspective. Induction steps over the same scale as in a sorites can be constructed to preserve the truth of the first step and the final step, which Raffman’s solution can’t explain. The scientific grader requires that ambiguity is eliminated. Precision is the ideal (Carnap 1950). Ambiguous language needs to be precisified under cumulative sharpenings. In a scientifically ideal language the sorites could not be a sound argument. All terms would be precise. There would be a precise borderline between grades. Historically, grading was developed with this ideal of precision in mind. It was not thought realizable, but grading intelligence in terms of a precise number nevertheless created the desired illusion (Medawar 1977). The idea of grades expressing subjective value is replaced in this ideal with something akin to the discovery of brute, objective facts about people. Grades as Urmsonian subjective values were considered tainted with ambiguity and imprecision and so were not fit for purpose. Writing in the mid nineteenth century, Agassiz expressed the ideal of a discourse-free, and therefore precise, social science modeled on natural science: ‘Naturalists have the right to consider the questions growing out of men’s physical relations as mere scientific questions, and to investigate them without reference to either politics or religion.’ (Agassiz, 1850, p111) Brigham argued against immigration on the grounds that the immigrants had been scientifically proved to be not smart enough: ‘The steps that should be taken to preserve or increase our present intellectual capacity must of course be dictated by science and not by political expediency’ (Brigham 1923). This approach embeds ‘…a tendency to assume biological causation without question, and to accept social explanations only under the duress of a siege of irresistible evidence. In political questions, this tendency favoured a do-nothing policy’ (Myrdal 1944).
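The statistical stabilisation described at the start of this passage can be given a toy form. The sketch below uses invented marks and an invented prior-year figure, and is not offered as Cresswell’s actual procedure; it simply shows how a grade boundary can be fixed so that this year’s distribution of grades reproduces last year’s, by the shape of the distribution alone and without any judgment about individual scripts.

```python
import random

# Invented marks standing in for this year's cohort; not real data.
random.seed(0)
marks = [min(100, max(0, random.gauss(55, 12))) for _ in range(10_000)]

# Invented prior-year figure: 18% of candidates were awarded grade A.
previous_a_rate = 0.18

# Statistical continuity: place the boundary at whatever mark cuts off the top
# 18% of this year's candidates, wherever that mark happens to fall.
ordered = sorted(marks, reverse=True)
a_boundary = ordered[int(len(ordered) * previous_a_rate)]

print(round(a_boundary, 1))  # this year's grade A boundary, fixed by the distribution
```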
Precision ranking is a key objective of psychometric measurements lying at the heart of the traditional assessment paradigm. Ranking is simplified by assigning simple numbers to complex entities, ‘… the illusion embodied in the ambition to attach a single number valuation to complex quantities’ (Medawar 1977, p13). There is an extensive literature about the arguments for ranking and the modern history of the phenomenon. Craniometry was ‘the leading numerical science of biological determinism during the nineteenth century’ (Jay Gould, 1981, p25). Intelligence testing in the twentieth century has become what craniometry was for the nineteenth. Eugenics, racism and sexism all used numbers to rank and discriminate. The literature emphasises how early theorists of intelligence measurement considered their work as of an empirical nature. Binet very early on wrote that: ‘It matters very little what the tests are so long as they are numerous’ (Binet 1911, p329). Binet assigned age levels to tasks and then tested people against this level. Successful performance of the task meant that the person was working at that age level. This was recorded in 1908. This is the criterion for IQ levels used since then. In 1912 Stern adjusted the number; mental age was to be divided by the chronological age, not subtracted from it, and this figure has become the intelligence quotient. Binet didn’t think that intelligence was a single entity but knew that the illusion created by a single number would be powerful. ‘We feel it necessary to insist on this fact because later, for the sake of simplicity of statement, we will speak of a child of 8 years having the intelligence of a child of 7 or 9 years; these expressions, if accepted arbitrarily, may give place to illusions’ (Binet 1911). The literature shows that it was in the USA that the illusions were taken as reality, grafting a spurious hereditary theory onto the idea of a measurable intelligence quotient. The literature identifies L.M. Terman, R.M. Yerkes and H.H. Goddard as doing this. It was Yerkes who persuaded the US army to test 1.75 million soldiers in WW1. Goddard is linked to the beginning of the eugenics movement. Intelligence was linked with crime and perversion as well as the alleged feeble-mindedness and volatility of women and non-white people. ‘The intelligence controls the emotions and the emotions are controlled in proportion to the degree of intelligence…It follows that if there is little intelligence the emotions will be uncontrolled and whether they be strong or weak will result in actions that are unregulated, uncontrolled and, as experience proves, usually undesirable. Therefore, when we measure the intelligence of an individual and learn he has so much less than normal as to come within the group that we call feeble-minded, we have ascertained by far the most important fact about him’ (Goddard 1919, p272). The literature identifies the high stakes that came to depend on intelligence scores in the USA. The literature charts the impact of psychometric testing on the populations tested. Terman, for example, linked the requirement of high IQ scores to modern technological society and as a result of his approach high status jobs were largely closed to individuals with low IQ scores (Terman 1919). This has largely remained the case ever since. The removal of the sociopath was another use of IQ testing (Terman 1916). The view that social ranking was a natural outcome of IQ was also justified by these ideas.
‘…should we not naturally expect to find the children of well-to-do, cultured and successful parents better endowed than the children who have been reared in slums and poverty? An affirmative answer to the above question is suggested by nearly all the available scientific evidence’ (Terman 1917, p99). Yet not all psychometricians believed in the illusion of precision ranking and numerical grading results. They believed that context and perspective couldn’t be discounted. The literature shows that there were psychometricians from the beginning who thought that IQ evidence was unreliable because it changed as the circumstances of people tested changed. If intelligence were a fixed hereditary biological fact then this couldn’t happen. For some the tests were affected by cultural knowledge such as language use and knowledge of customs embedded in the tests. The literature also records the way intelligence became reified into a single entity not merely through the age scale methods of the Binet IQ test but also through the correlational methods of Spearman’s factor analysis. ‘Each of these two lines of investigation furnishes a peculiarly happy and indispensable support to the other… Great as has been the value of the Simon-Binet tests, even when worked in the theoretical darkness, their efficiency will be multiplied a thousand-fold when employed with a full light upon their essential nature and mechanism’ (Spearman 1914). Spearman’s statistical correlation has been shown in the literature to be an important feature of assessment (e.g. Jensen 1979). It has not always been clear that those making assessment inferences have understood that ‘correlation’ has nothing to do with ‘causality’. Inferences drawn from successful achievement of a task that suggest a causal relationship with a ‘correlated’ factor are guilty of a category error. Two reasons support this: firstly, no set of factors can be exclusively correlated in a single way. There can always be a conflict of interpretations. Secondly, any single set of factors is open to multiple interpretations. Reality cannot arbitrate if this is true. The literature records that the reification of intelligence has been derived from factor analysis (Spearman 1904, Jay Gould 1981, ch6). The reduction of complex entities to a single factor has been described as ‘physics envy’ attributed to those working in softer sciences. ‘…we must venture to hope that the so long missing genuinely scientific foundation for psychology has at last been supplied, so that we can henceforward take its due place along with the other solidly founded sciences, even physics itself’ (Spearman 1923, p355). Spearman recanted in his last book and thought that reification ‘… is mostly illumination by way of metaphor and similes’ (Spearman 1950, p25). The psychometric test has been enormously important for assessment in schools. Burt, chief psychologist for London schools between 1913 and 1932, wrote his ‘The Backward Child’ using Spearman’s factor analysis theory (Burt 1937). In it he concludes, ‘…the backwardness seems due chiefly to intrinsic mental factors; here, therefore, it is primary, innate, and to that extent beyond all hope of cure’ (Burt 1937, p110). Burt played a key role in assessment culture in the early years of the century. The 11-plus examination was designed on his vision of a single ranking based on inherited ability. The high stakes of the 11-plus exam are clear; it was through these tests that children were sorted into different ranks of schools.
The test attempted to apply Spearman’s theory to children at the ages of ten and eleven. 20% went to grammar schools. The rest were regarded as unfit for university. Burt wrote at the time: ‘It is essential in the interests alike of the children themselves and of the nation as a whole, that those who possess the highest ability – the cleverest of the clever – should be identified as accurately as possible. Of the methods hitherto tried out the so-called 11+ exam has proved to be by far the most trustworthy’ (1959, p117). The exam was the result of reports from the mid 1920s onwards that embodied the hierarchical theory of intelligence. The Hadow Reports of 1926 and 1931, Spens of 1938, Norwood of 1943 and the Board of Education’s White Paper on Educational Reconstruction led to the Butler Act of 1944, which in turn led to an assessment system in the UK that embedded Spearman’s ideas about hereditary intelligence. The reports leading to the Butler Act discussed children and education in terms of a ‘general intelligence’, a reification of Spearman’s g factor that was just a correlation factor. For example, in the Hadow Report the author writes: ‘During childhood, intellectual development progresses as if it were governed largely by a single, central factor, usually known as ‘general intelligence’, which may be broadly defined as innate, all round, intellectual ability, and appears to enter into everything the child attempts to think, say, or do: this seems the most important factor in determining his work in the classroom.’ They explicitly used factor analysis theory to justify the views of education and assessment they espoused, including justifying the age at which the test was to be applied (Hadow 1931). Thurstone presented a different version of factor analysis that did not result in an interpretation showing that there was a general intelligence to be measured. Thurstone’s approach was taken to show that children can excel at different abilities and independent qualities of mind. A consequence is the impossibility of a unilinear ranking of pupils. Thurstone wrote: ‘Even if each individual can be described in terms of a limited number of independent reference abilities, it is still possible for every person to be different from every other person in the world. Each person might be described in terms of his standard scores in a limited number of independent abilities. The number of permutations of these scores would probably be sufficient to guarantee the retention of individualities’ (Thurstone 1935, p53). The primary aim of ranking people is threatened by this approach. Burt attacks Thurstone’s approach as one that threatened to ruin the 11+ tests ‘…on the principle of the caucus-race in Wonderland, where everybody wins and each get some kind of prize’ (Burt 1955, p165). Plato’s ‘Republic’ gives an early pithy rationale for the need to classify people into different categories of worth. Socrates says that there is no basis in biology for ranking people differently and that the people need to be told lies about inherent qualities bestowed in the womb by God in order to ensure that distinctions of rank and fortune were maintained. Glaucon, to whom this scandal is revealed, comments, ‘You had good reason to be ashamed of the lie which you were going to tell,’ (in Jay Gould 1981, p19) and observes that the lie couldn’t possibly work immediately.
However he adds, ‘Not in the present generation; there is no way of accomplishing this; but their sons may be made to believe in the tale, and their sons, and posterity after them’ (Jay Gould 1981, p19). As we have seen, psychometric approaches to norm referencing invented in the nineteenth century formed the basis of attempts to classify people using some sort of statistical distribution curve and assumed that a single general intelligence factor had been identified. The theory embedded in the psychometric approach impacted on teaching and learning theory as well. It was on the basis of statistical reliability that Galton (1869) thought of intelligence as ‘a natural ability.’ So too was it the basis of Binet and Simon (1916) talking about intelligence as ’a fundamental faculty’. So too Terman (1921), who thought that intelligence was the ability to think in abstract terms, and from there a notion of IQ developed which could be uncovered, classified and then tested. Jensen (1979) thought that the statistical reliability measured by IQ could be correlated to measurements of the speed of neural processing, and proposed ways in which this speed could be measured. Jensen’s proposal highlights the way that statistical measurements of easily measured and seemingly objective data, such as speed of neural processing, were seriously thought of as delivering objective facts about mental traits of people. Early in the twentieth century a theory of intelligence dominated educational thinking (Resnick and Resnick, 1992). It assumed that intelligence was something that could be located in each individual and that it could be quantified numerically. The precision of numbers gave the illusion of precision to whatever measurement an assessment resulted in. Therein lay its appeal because it was assumed that the prototypical precise methodology was science, in particular physics. Yet we have shown that although precision is a quality that the scientific method achieves, it is also accommodating of approximation and ‘closeness to truth’, which are largely eradicated by the psychometricians. Ambiguity was thought of as being eliminated because there seemed to be a precise numerical value that located the precise threshold of any grade. This precise threshold was thought to correspond to an exact measurement of a similarly precise intelligence trait. This match is hardly surprising if the intelligence trait is a reification of the measurement (Jay Gould 1981). A further criticism of this position is that it commits a grader to grades that she didn’t intend to commit to. In this respect the psychometric method is open to the same criticism as supervaluationists. The supervaluationist thinks that vagueness is a species of ambiguity and that it can be disambiguated in order to resolve its truth value (Fine 1975). But the critics of this process think that disambiguation needs to take place before truth value can be determined (Tye 1989). Ambiguity is about indecision between propositions. The supervaluationist solution resolves the ambiguity after the truth value has been resolved. The psychometric solution determines sharp thresholds by statistical reasoning and draws conclusions before graders resolve ambiguities. They are thresholds that come about by fiat. The grader should be deciding which sense she means when she awards a grade. Sorenson thinks that ‘Ambiguity gives the speaker control over meaning’ (Sorenson 1989, p24).
The critics of the psychometric attempt to establish precision argue that the process removes this control. The psychometric method chooses the sense. Cresswell’s use of a psychometric solution to decide precise thresholds for grades is a pragmatic response to vagueness. Decisions about borderline cases are made by numerical calculations that decide grader intentions. The process is therefore a violation of sincerity. The marker doesn’t get a say in whether she agrees with the sense being given to a grade through this method and yet the system presents itself as determining a precise truth about the grade and co-opts her into this. Yet it is always possible for a grader to think that although reliable and valid the grade her marking resulted in is not the grade she intended, and so it is insincere because a sincere statement requires that it expresses her intention. And in a borderline case she cannot have an opinion, not even one derived from statistical correlations. The development of assessments using this psychometric paradigm assumes that ambiguity is the key problem that assessment theory has to solve. Linked with this theory is a theory of learning. It assumes that people learn discrete skills separately and only later use them as a whole. The controlling metaphor is that of learning being like a process of building blocks being built up into a complex whole structure. As Shepard (1991) showed, this was coupled with a Behaviourist, stimulus/response model of teaching and learning dominated by a linear and sequential learning programme. Linked to this was an assumption of what Shepard calls ‘grade retention’ whereby a learner cannot work on complex learning until basics are learned first. It paid no attention to the effects of motivation and self-esteem on the learner, or to the fact that higher order skill practice strengthens basic skills. Psychometrics and behaviourism both emphasised decontextualisation (Resnick and Resnick, 1992, p43) despite evidence of the limited transferability of decontextualised learning (Wolf et al, 1990). They emphasised isolated individuals as the only site of learning, engaged in ‘isolated activities focused on symbol manipulation divorced from experience’ (Gipps 1994, p22). The decontextualisation was emphasised as a key element in the objectivity of the process of assessment. As Gipps explains, the conception of teaching developed within this paradigm emphasises practice, repetition and basic skills testing where the learner is viewed as a largely passive absorber of information and facts (Gipps 1994, p22). Its ‘scientific’ credentials were based on what it assumed assessment was about, which was largely measurement of a learner’s attributes. Scores were interpreted in terms of their relation to norms and technical issues associated with assessment became primary. Assessment was able to measure learners against each other and so technicalities assuring fairness, reliability, validity and standardisation were at a premium (Gipps 1994, p5). Objectivity was reinforced by the requirement of testing to be accurate and meaningful. In turn, this requirement led to assessment being a powerful method of categorising learners. IQ scores, reading ages and rankings were developed out of this approach and became dominant frames of reference for understanding rates of learning success in populations. Numerous critics have commented on other features of this approach to assessment. It assumes, falsely, the universality of its results
(Berlak et al 1992, Goldstein 1992, 1993). The claim of universality is that the results have the same meaning for all individuals, that the test is using the same ‘construct’. ‘Construct’ is a label for the underlying attributes and skills that are being assessed. The argument against the universality of results is simple. Standardised tests reduce a construct to a single dimension but do not make it knowable which particular aspect of the construct is being assessed. A test result is therefore an epistemic mystery. It tests something, but what that something is is unknown. This links to a second feature of this approach to assessment, that of unidimensionality. This is just the assumption that any test should be of just one underlying dimension. Goldstein thinks that this approach is illogical given that nearly everything we want to assess is multi-dimensional (Goldstein 1993). However, the requirement of unidimensionality is an attempt to avoid the problem of comparing incommensurate dimensions. For example, if the construct of a particular piece of good writing requires both good secretarial skills and imagination, it isn’t clear how the two dimensions could be assessed on a single scale. This is the issue identified above in the dispute between Burt and Thurstone. It wasn’t until the 1950s that psychometric approaches to assessment were being challenged by an alternative paradigm. This was partly due to difficulties with the psychometric and behaviourist approaches as well as the strength of alternative theories. One difficulty identified was the realisation that perhaps not all the defects of a test performance were individualistic. The influence of home, parenting, school and teacher, in short environmental influences, was increasingly being identified as problematising the results of test scores. It was also recognised that standardisation of the administration of assessments, requiring standardisation of bureaucratic organisation, administration and tasks, was needed on top of the standardisation of scoring, which had been up until then the main locus of interest. The influence of alternative approaches also weakened the psychometric paradigm. Glaser published his paper on criterion-based assessment in 1963, which separated educational from psychometric assessment and was influential (Wood 1986). The purpose of assessment changed from ranking to giving ‘executive advice’ to both students and teachers. The notion of authentic, performance-based assessment emerged, where assessments were designed as actual examples of what was being assessed rather than proxies. Gipps claims that the SATs in the UK introduced in 1988 by the DES were examples of this (Gipps 1994). A new paradigm of learning based on new cognitive and constructivist models has been developed since the 1990s. Learning is more likely to be understood as a network of connections where knowledge is constructed, is situational and where learning is knowledge dependent. However, it is entirely possible that, though psychometrics was a far more influential movement in the USA than in the UK, it offered a clearer answer to the question of how assessments can be made reliable (Wiliam – unpublished work in progress). Gipps and Stiggins both ask how to combine authentic assessment with reliability (Gipps 1994, p12, Stiggins 1992).
It seems to be a ‘paradigm clash’ (Gipps 1994, p12). It is because of the difficulty of resolving this that the legacy of psychometrics still retains influence (Mehrens 1992, Wiggins 1982a, Miller and Seraphine 1992). Further problems arise around the issue of equity, where it is difficult to resolve the issue of ensuring that ‘like is treated as like’ (Baker and O’Neil 1994). So in high stakes assessments critics like Black, Wiliam and Hilton think there is still an emphasis on relative rankings rather than actual accomplishment, the privileging of easily quantified displays of skills and knowledge, individual rather than collaborative forms of cognition and an implicit though false assumption that the Bell curve distribution of assessment performance gives information about what people are capable of learning, reinforcing the essentialist, fixed intelligence theory derived from a psychometrics based theory of learning (Black & Wiliam 1999, Hilton ?). All these approaches produce precision by fiat. They produce precision by ignoring the intended meaning of a grader. Just as a sieve can sort without any comprehension of what is being sorted, the psychometric paradigm can ignore intended meanings of graders to decide grade thresholds (Sober 1993). Ambiguity is the wrong kind of obscurity. Vagueness remains a problem after ambiguity is removed. Cresswell recognizes this. He thinks that after validity and reliability have been secured the sorites lies in wait. The chapter has shown that models of assessment based on physics envy and spurious attempts to achieve scientific levels of precision as found in the abstract sciences are poor models to adopt. They fail to produce a full model of potential blocks to good assessment and in particular don’t model the absolute vagueness of Sorenson. Applying Sorenson to this model shows that such systems are at best capable of addressing relative borderline cases, cases where it isn’t absurd to agree to a stipulated answer. In the Batman film ‘The Dark Knight’ the Joker is a character who is not interested in getting results that anyone can understand. He even burns the loot because he isn’t after anything so mundane and reasonable. Psychometric measurements are like the Joker, quite prepared to see validity disappear in order to achieve an effect of total consistency. Both are hideous.
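Sober’s sieve can also be given a toy form. The sketch below uses invented balls rather than anything drawn from assessment practice; it shows a mechanism that sorts perfectly well without ‘knowing’ which property it is selecting for, a point taken up again in the discussion of counterfactuals in the next chapter.

```python
from dataclasses import dataclass

@dataclass
class Ball:
    size: float   # diameter, in arbitrary units
    colour: str

def sieve(balls, hole_size=1.0):
    """Mindlessly pass every ball smaller than the holes; no judgment involved."""
    return [b for b in balls if b.size < hole_size]

# An invented population in which all and only the small balls are red.
population = [Ball(0.5, 'red'), Ball(0.7, 'red'), Ball(1.5, 'green'), Ball(2.0, 'blue')]
passed = sieve(population)

# Everything that passed is both small and red; nothing in the output says
# which property the mechanism selected for -- that needs a counterfactual judgment.
print([(b.size, b.colour) for b in passed])
```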

 

chapter 4: THE DECLINE OF PHYSICS ENVY?

‘No one realized that the book and the labyrinth were one and the same…’ (Borges: The Garden of Forking Paths)

4.1 introduction

The chapter examines the alternative to the old paradigm discussed in chapter 3. Criticism of the scientistic paradigm of education focused on its failure to adequately model the expressiveness of language and accused it of being invalid and of distorting pedagogical practices (Gipps 1994, Wiliam 1994, Resnick and Resnick 1992, Berlak et al 1992, Goldstein 1992, 1993). The key points of criticism involved attacking assumptions of decontextualised learning, assumptions about the universality of results based on such assessments, assumptions about the reality of the construct traits that the approach assumed were available for such testing, assumptions of the uni-dimensionality of these constructs, its ignoring of the incommensurate nature of tested items, and assumptions that test performance was totally individualistic. It shows that historically these criticisms were the basis of change in educational assessment. It is linked to developments of criticisms found within a broader psychometric community in the 1950s exemplified in a dispute between Thurstone and Burt. In education Glaser’s introduction of criterion based assessment shifts the paradigm (Glaser 1963, Wood 1986). The chapter examines how the presumed objectivity of the scientistic paradigm was replaced by an intersubjective paradigm which was found in various theories of language such as social constructivism, constructivism and constructionism (Wiliam 1994, Vygotsky 1987, Rogoff 1990, Gredler 1997, Prawat & Floden 1994, Lave & Wenger 1991) which emphasised the role of social norms in the construction of knowledge and belief. I think these are all versions of conventionalism, which holds ‘…that conventions are ‘up to us’, undetermined by human nature or by intrinsic features of the non-human world’ (Rescorla 2010, p1). Any assessment that broadly draws on these theories for its assessment model is labeled a Cognitive Diagnostic Assessment (CDA). This is discussed. I discuss the first of two models of conventionalism that I think underpin these approaches in educational assessment. One model of conventionalism is Millikan’s idea that conventions aren’t underpinned by rational beliefs. This contrasts with another model, discussed in the next chapter, which thinks that conventions do require rational beliefs (Hume 1777/1975; Lewis 1969; Searle 1969; Sellars 1963). The chapter argues that the motivation for adopting a version of CDA and conventionalism is an attempt to maintain the expressive flexibility of the open-endedness of language in opposition to the scientistic paradigm. The chapter then suggests that there are two divergent approaches to CDA. The chapter argues that CDA has sometimes attempted to constrain the role of judgment in evaluations. This approach implies something like Millikan’s conventionalism (Wiliam 2000, Leighton & Gierl 2007, Leighton et al 2010). This approach is summarised by Wiliam when he writes, ‘To put it crudely, it is not necessary for the examiners to know what they are doing, only that they do it right’ (Wiliam 2000 p10). The chapter criticises this version of CDA, again arguing that it makes assumptions about language that cannot model absolute vagueness. It also uses arguments about innateness, modularity and the use of counterfactuals that are required for any judgment involving grading (Fodor 1970, 1983, Sober 1993, Gould and Lewontin 1979, Fodor and Piattelli-Palmarini 2010).

4.2 a new paradigm

What was argued in the last chapter was that the paradigm was designed to remove indeterminate borderlines through disambiguation. However, most predicates that are ambiguous are also vague. The problems that ambiguity raises for making decisions about borderline cases are not solved by a system of disambiguation if vagueness is not a species of ambiguity. Sorenson thinks that it isn’t and that the challenge to assessment from vagueness remains even after the issues of reliability and validity have been met. Cresswell makes this point explicitly: ‘Even when the marking is perfectly reliable, we can be sure that some candidates in the range of marks where the boundary … lies will, by definition and unavoidably, be wrongly graded’ (Cresswell 2003, p11). Cresswell rejects a purely psychometric approach. In order to maintain the resources of expressive power a new paradigm has been adopted. The psychometric paradigm and its spurious ‘scientific objectivity’ is replaced. The new paradigm contrasts itself with the old by being a species of ‘subjectivity.’ The subjectivity is contrasted with individual subjectivity and draws on notions of ‘intersubjectivity’ from various theories about social constructivism, constructivism and constructionism (e.g. Rogoff, 1990; Vygotsky, 1987) where social factors shape and evolve the construction of knowledge (Gredler, 1997; Prawat & Floden, 1994, Lave & Wenger, 1991). In this paradigm assessment validation is understood in terms of hermeneutic practices. Moss thinks a hermeneutic approach to assessment ‘would involve holistic, integrative interpretations of collected performances that seek to understand the whole in light of its parts, that privilege readers who are most knowledgeable about the context in which the assessment occurs, and that ground those interpretations not only in the textual and contextual evidence available, but also in a rational debate among the community of interpreters’ (Moss, 1994, p7). Scriven thinks that a community of interpreters arises out of social convention and practice: ‘The community of inquirers must be a critical community, where dissent and reasoned disputation (and sustained efforts to overthrow even the most favoured of viewpoints) are welcomed as being central to the process of inquiry’ (Scriven, 1972, p30-31). Wiliam labels a Moss/Scriven approach to assessment ‘construct-referenced’ (Wiliam 1994). The learning theories associated with construct referenced assessment (constructivism, social constructivism and constructionism) are prototypically conventionalist in that they all assume that assessment conventions are ‘…up to us’ (Rescorla 2010, p1). The three theories are similar in that they take beliefs about the world as part of the way reality is constructed and that belief formation is a social activity. Members of a social group invent the world rather than discover it, according to this approach (Kukla, 2000). Knowledge is socially and culturally constructed (Ernest, 1999; Gredler, 1997; Prawat & Floden, 1994). Learning is similarly social. Rather than taking place inside the head of an individual it is something that happens when there is meaningful social engagement between individuals (Derry, 1999; McMahon, 1997). Constructivism differs from social constructivism in that it states that individuals can construct meaning from direct interaction with reality. Social constructivism denies this and thinks that meaning can only be constructed through social interaction (Crotty, 1998).
There is an assumption that communication generally shares common interests and assumptions and this is the basis of what these theories think of as inter-subjectivity (Rogoff 1990). Intersubjectivity is a key to the way beliefs are formed in groups (Gredler, 1997; Prawat & Floden, 1994). Intersubjectivity is the mechanism by which people extend their knowledge and their beliefs (Rogoff, 1990; Vygotsky, 1987). Intersubjectivity is formed by cultural and historical factors that form the context in which social interaction takes place (Gredler, 1997; Prawat & Floden, 1994; Schunk, 2000; McMahon, 1997). Context presents perspectivism as an important part of any learning theory based on social constructivism. Context has been theorized in terms of social constructivist learning in at least four distinct ways (Gredler 1997). One insists on group project work designed to produce a product which the group makes meaningful through the interaction (e.g. Prawat & Floden, 1994). The second approach focuses on prioritizing key concepts that are used as foundation concepts for learners (e.g. Gredler, 1997, p59; Prawat, 1995). A third emphasizes the emergent nature of learning. It is a pragmatic approach that implements social constructivism as the need arises. For emergent social constructivists there are both individual and social points of view regarding learning (Cobb, 1995; Gredler, 1997). A fourth common perspective is a situated cognition point of view which insists that the mind and environment interact in such a way that changing the environment necessarily changes the nature of a task. From this perspective decontextualised testing of learning is impossible because of the necessary link between a task and its context (Bredo 1994; Gredler 1997). Social constructivist models of learning are collaborative and insist on a strong relationship between individual and society in its broadest sense. Learning is practical because knowledge is situated among practitioners who are described in the literature as ‘communities of practice.’ Teaching and learning that opposes the old paradigm involves peer collaboration, peer assessment, problem based instruction and other methods involving learning with others (Lave & Wenger 1991, Schunk 2000). Vygotsky may be read as a social constructivist. His development of the ideas of a ‘Zone of Proximal Development’ (ZPD), the ‘More Knowledgeable Other’ (MKO) and ‘Scaffolding’ can be understood in terms of how carefully managed social interaction can lead to learning (Vygotsky 1978). This contrasts with Piaget’s Constructivism which is much more a part of the older paradigm, basing itself on a developmental theory of thinking. It asserts that there are common learning stages that all thinkers must pass through and that these can be observed in a detached scientific way. The stages used by Piaget are based on psychometric theory and Associationism, the empirical learning theory of the old paradigm. Some readings of Vygotsky have understood him as being closer to Piaget’s position than the reading above suggests (e.g. Fodor 1975). The constructionism described by Kafai and Resnick (Kafai & Resnick 1996) focuses on the dynamics of individuals’ interaction with their environment and peers and eschews the role of instruction. Learning is acquired through building artifacts.
It is an approach associated with Seymour Papert who writes that: ‘The … knowledge is built by the learner, not supplied by the teacher… that happens especially felicitously when the learner is engaged in the construction of something external or at least sharable’ (Papert, 1991, p3). All of these (except Piaget as explained above) contrast with the understanding of ‘intersubjectivity’ by those sharing in the scientific objectivity paradigm of psychometrics. There, ‘intersubjectivity’ ‘…assumes the very strict form of a more or less mechanical process of experimentation whose results can be established by relying on little more than simple perception (assuming of course a rich background knowledge)’ (Raz 1999, p119). Conventionalism draws attention to what underpins the conventions of meaning constructed in these various ways. There is a contrast between philosophers who think conventions rest on rational and irrational underpinnings (e.g. Hume 1777/1975; Lewis 1969; Burge 1975; Miller 2001) and those who think rationality plays no part (Millikan 2005). This distinction is mirrored in the new paradigm where there is a contrasting position regarding the role of belief and judgment. The learning theories outlined above offer explanatory information about constructs to be assessed in order to make the validity of assessments secure in the new paradigm. Inferences from tests are required to understand and reflect differences between groups, changes over time and processes of learning (Cronbach & Meehl, 1955, Kane 2001). In respect of this new demand, that ‘These new assessments …make explicit the test developer’s substantive assumptions regarding the processes and knowledge structures a performer in the test domain would use, how the processes and knowledge structures develop, and how more competent performers differ from less competent performers…’ (Nichols 1994, p578), the new paradigm no longer merely requires content specifications to describe objectives, because ‘…efforts to represent content are only vaguely directed at revealing mechanisms test takers use in responding to items or tasks’ (Nichols 1994, p585). CDA is based on learning theories discussed by constructivists, social constructivists and constructionists and is conceived as a form of construct referenced assessment as defined above. However, some of its proponents believe that value judgments by assessors are not important (e.g. Leighton et al 2010). Rather than such judgments, assessors are to understand the CDA learning theory to an extent that allows them to interpret data from traditional tests in terms of the theory. The approach adopts a form of Millikan’s view of conventionalism where blind reproduction of a pattern is enough to be a convention. Actors need have no beliefs about either the origins or reasons for such conventions but are required merely to reproduce them (Millikan 2005). Millikan thinks that lack of imagination, conformism, and playing it safe with what has worked all eliminate the need for beliefs to guide actions. The rational decisions of rational agents are not required. She agrees with Burge who thinks that ‘the stability of conventions is safeguarded not only by enlightened self-interest, but by inertia, superstition, and ignorance’ (Burge 1975, p253) but is more extreme than Burge because she thinks there is no need for any rational underpinnings.
In this way the reliability and consistency of the old paradigm is joined by a different learning theory: reliability and validity are increased by such a procedure without the need to involve the value judgments of teachers. This version of CDA is very much closer to a Piagetian approach where there are assumptions of fixed levels of intellectual formation that can be identified by a valid test. It tends towards representing itself as a branch of scientific psychological theory, involved specifically in the psychology of test taking (Leighton & Gierl p7). The CDA based approach mixes the old psychometric approach with hermeneutics, and claims to be able to deliver higher standards of reliability and validity than the older paradigm. There is an extensive literature about educational assessment that is concerned with the objectivity of high stakes summative assessment in education. Much of the literature presents teacher assessments as being necessarily subjective (e.g. Sadler 1985a) because they are qualitative judgments and such judgments ‘rely on the human brain as the primary decision-making instrument’ (C. Wyatt-Smith et al, 2010). Teacher based assessment is considered contentious in the literature because of this subjectivity. A contrast is often drawn between teacher based assessments for low stakes assessments and those of a large scale, centralised curriculum and testing initiative for high stakes, which are thought to require different assessment arrangements in order to eradicate the subjectivity of qualitative judgment (Freebody and Wyatt-Smith 2004, Maxwell 2002, Hattie 2005). The literature records pressure to legitimate subjective assessments through ensuring that quality judgments are made that are consistent with each other and with agreed standards. Validity and reliability are key concerns for legitimate assessments for high stakes according to the literature. Shared criteria and standards and the use of moderation are key components of the methods used to achieve valid and reliable assessments of value (Maxwell 2007, Linn 1993, Gipps 1994, Wilson 2004). The literature contends that the intra-subjectivity of these methods produces greater validity than reliability (Maxwell 2001). Wiliam contends that there are methods of intra-subjectivity that produce greater reliability as well (Wiliam 2001). The intra-subjectivity links with the philosophy of conventionalism that argues ‘…that conventions are ‘up to us’, undetermined by human nature or by intrinsic features of the non-human world’ (Rescorla 2010, p1). Conventions are voluntarily chosen. They result in inventions. Goodman writes that ‘… the conventional is the artificial, the invented, the optional, as against the natural, the fundamental, the mandatory’ (1989, p80). Lewis, for example, thinks about conventionalism underpinned by either practical or epistemological reasons where ‘…the expectation of conformity to the convention gives everyone a good reason why he himself should conform’ (Lewis 1969, p167). Lewis approaches decision making using a game-theoretic orientation. It is a particular approach to solving the coordination of beliefs, a key issue for assessment where there are disputes. Not all conventions are designed to solve coordination disputes (Davis 2003; Marmor 1996; Miller 2001; Sugden 1986; Vanderschraaf 1998). Grading coordination is a situation where we do care that conventions help sort out disputes, and moderation meetings are prototypical examples of where these disputes occur.
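Lewis’s game-theoretic picture can be illustrated with a toy coordination game. The two borderline conventions and the payoffs below are invented for illustration, not drawn from Lewis or from any moderation scheme; the point is only that either shared convention is stable once each grader expects the other to follow it.

```python
# A toy coordination game: two graders each gain only if they adopt the same
# convention for treating a borderline script (payoffs invented for illustration).
conventions = ['round_up', 'round_down']
payoff = {(a, b): (1, 1) if a == b else (0, 0)
          for a in conventions for b in conventions}

def is_equilibrium(a, b):
    """Neither grader gains by unilaterally switching convention."""
    best_a = all(payoff[(a, b)][0] >= payoff[(alt, b)][0] for alt in conventions)
    best_b = all(payoff[(a, b)][1] >= payoff[(a, alt)][1] for alt in conventions)
    return best_a and best_b

print([(a, b) for a in conventions for b in conventions if is_equilibrium(a, b)])
# Both matched pairs are equilibria; which one a community settles on is 'up to us'.
```

Which of the two equilibria a community of graders settles into is, in Goodman’s terms, optional rather than mandatory, and that is what makes it a convention.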
Training graders to coordinate grading decisions is important in the literature of this approach. Removing grader value-judgments is an important part of the training. For example, Brookhart (Brookhart 1993) thinks that teachers make value judgments when assigning grades and that this is a source of unreliability that training in assessment would help eradicate. Brookhart noted that it wasn’t the only source of unreliability in teacher assessments. There is much literature about how training enhances reliability in various ways. Some of the literature writes about being ‘assessment literate’ and argues that a key element of such literacy is knowing about measurement theory, which is concerned to a significant extent with reliability and validity, as well as other things such as being able to evaluate the diagnostic information of assessments (Stiggins 1991, 2001, Black and Wiliam 1998). In this literature certain features of how assessment practices are learned are discussed. For example, there are studies that show that moderation is more important than a standards referenced framework alone. Hutchinson and Hayward (2005) have produced evidence to show that moderation is essential to support teacher judgments. Cooksey, Freebody and Wyatt-Smith (2007) have produced evidence that a standards referenced framework alone cannot regulate judgments to produce validity or reliability, nor reveal how any judgment was made (Wyatt-Smith 2010). There is also an extensive literature showing that teachers have negative attitudes towards large-scale assessment (e.g. Natriello 1984; Stiggins, Conklin and Bridgeford 1986; Schafer and Lissitz 1987; Guskey 1988; Sparks 1988; O’Sullivan and Chalnick 1991; Stiggins 1991, 2001; Brookhart 1993, 1994, 1997; Plake and Impara 1997; McMillan 2001; Childs and Lawson 2003; Skwarchuk 2004; Wiliam et al. 2004; Mulvenon, Stegman and Ritter 2005). There is some literature that suggests that teachers of low achieving students distrust large scale assessments more than those serving high achieving students (e.g. Skwarchuk 2004). There is also literature that claims that teachers of high achieving pupil populations make better use of assessment information in a diagnostic way than those working with less successful learners (Black and Wiliam 1998; see also Guskey 1988; Allinder 1995). Some of the literature indicates that teachers don’t fully understand principles of assessment (e.g., Stiggins, Conklin and Bridgeford 1986; Schafer and Lissitz 1987; O’Sullivan and Chalnick 1991; Black and Wiliam 1998; McDonald 2002; Childs and Lawson 2003). Other strands of the literature are about different kinds of assessment theories. Current literature reflects a discussion between ‘traditional’ and ‘cognitive diagnostic’ (CDA) approaches to assessment (NRC 2001). CDA assessment is designed to provide information about how well a learning objective has been mastered (e.g. Brookhart 1997; Black and Wiliam 1998; Shepard 2000). Traditional assessment is designed to categorise students and emphasise differences using tests designed to produce statistical norms and Bell curves on the lines of intelligence tests and IQ measurements. CDA emphasizes learning theories and empirical studies (Anderson et al.
1990, Hunt and Minstrell 1994, Siegler and Shipley 1995, Snow and Lohman 1989; Nichols 1994; Popham 1999; NRC 2001; Ercikan 2006; Gorin 2006; White 2005, 2006; Mislevy 2006; Schafer 2006; Leighton and Gierl 2007a, 2007b). Traditional assessments are not based on such studies of, or concerns with, learning theory (Leighton and Gierl 2007a). However the distinction between CDA and traditional approaches is not a distinction between formative and summative assessments because CDA can be used for both formative and summative purposes. The engagement with learning theory is the heart of the real distinction between the two approaches. An approach to assessment that is focused on learning theory can nonetheless be used summatively, where there is no diagnostic purpose and the underlying pressure remains selective and norm referenced (Wiliam 1994). Reliability and validity are central concerns of any such assessment, especially for high stakes assessments, which are usually large scale (Linn 1989). Much of the literature concerns standards of validity and reliability in assessments that attempt to incorporate teacher assessments, portfolios and other practices derived from adopting ideas developed from understanding learning theory (e.g. Stiggins 1991; Lukin et al. 2004). There are studies of how developments in incorporating learning theory focused practices into assessments are able to achieve valid and reliable standards, as well as of the belief that traditional assessments can in turn deliver diagnostic information (e.g. Gierl et al. 2005; Tatsuoka, Corter and Tatsuoka 2004; Briggs et al. 2006; Huff and Goodman 2007). The current literature emphasises that assessment literacy has become a matter both of mastering measurement theory, involving validity and reliability, and of mastering the CDA information of assessments. There is some literature that suggests that CDA practice is not well understood by most teachers (Black and Wiliam 1998; Stiggins 1991; Leighton et al 2010). This change of emphasis in assessment theory, from traditional forms of assessment designed merely to relativise test scores against a norm towards a recognition that educational assessment needs to be more than that if the validity of inferences from test scores is to be established, doesn’t replace the scientistic model but adds to it (Cronbach 1957, Embretson 1983, Loevinger 1957, Pellegrino & Glaser 1979). This isn’t always made clear. Some write as if it were a complete break. As a representative sample of this thinking, Gipps writes: ‘What we are observing in assessment is a shift in practice from psychometrics to educational assessment, from a testing culture to an assessment culture. However, it is not just that we wish to move beyond testing and its technology, but that the shift involves a much deeper set of transformations, hence the paradigm shift: our underlying conceptions of learning, of evaluation and of what counts as achievement are now radically different from those which underpin psychometrics’ (1994, p158). There is hedging in this thinking. The psychometric norm-referencing still underpins criteria and construct referenced assessments, and traditional assessment systems are thought capable of being meaningfully used to deliver at least some of the required validity and reliability of assessments requiring diagnostic feedback. So, for example, Mislevy writes: ‘A paradigm shift redefines what scientists see as problems, and reconstitutes their tool kit for solving them.
Previous models and methods remain useful to the extent that certain problems the old paradigm addresses are still meaningful, and the solutions it offers are still satisfactory, but now as viewed from the perspective of the new paradigm' (Mislevy 1993, p4). The plausibility of this thinking depends on how far it is credible that a new paradigm of assessment need only change interpretive practices rather than the constitutive elements of the whole practice. This actually contradicts what criterion and construct referenced assessments were thought to be doing. When Glaser introduced criterion-based assessments in the 1960s they were explicitly intended to replace psychometric, norm-referenced assessments. 'What I shall call criterion-referenced measures depend upon an absolute standard of quality, while what I term norm-referenced measures depend upon a relative standard' (Glaser 1963, p519). So criterion assessment was supposed to assess '…student achievement in terms of a criterion standard thus provide information as to the degree of competence attained by a particular student which is independent of reference to the performance of others' (Glaser 1963, p520). By 1994 it was clear that this was not the case, as standards were being artificially created using non-random test samples to secure reliability of test result distribution (Gipps 1994, pp79-80; Wiliam 1994).

The hedging can partly be explained by the scientific objectivity that psychological theories claim in order to legitimise themselves. The old paradigm of assessment based on psychometrics was largely influenced by psychologists. Galton, Binet, Spearman and Burt were psychologists. Skinner was a psychologist (Leighton & Gierl 2007, p4). There are some who argue that assessment has become a domain for psychologists rather than educationalists and therefore 'those psychologists specializing in psychometrics have been devoting more and more of their efforts to refining techniques of test construction, while losing sight of the behaviour they set out to measure' (Anastasi 1967, p297). Mislevy writes that 'it is only a slight exaggeration to describe the test theory that dominates educational measurement today as the application of 20th century statistics to 19th century psychology' (Mislevy 1993, p19). The history of the English school curriculum and assessment supports this (White 2005, 2006).

A feature of the literature regarding the new construct referenced assessment is a bifurcation in attitudes towards the role of value judgment in assessment. Construct referencing is based on various learning theories and is a kind of CDA. There are two strands to this developing field. One set of literatures emphasises the eradication of value judgments from CDA. A good representative of this type of CDA is Leighton and colleagues (Leighton et al 2010), who summarise the position: 'CDAs are not based on value judgments about student cognition, but on empirically-based models of learning of how students of different knowledge levels think and reason in response to assessment features' (Leighton et al 2010; Leighton and Gierl 2007a). A result of this is an emphasis on replicable techniques designed to reveal the processes that inform student learning. Training teachers in techniques that supposedly help assessors scrutinise their own beliefs about the empirical evidence needed to make an assessment rests on the belief that values can be read off from empirical data.
This coheres with Millikan's unreflective conventionalism (Millikan 2005). It also replicates some of the thinking about learning it is supposed to replace. Resnick, writing about the old learning theories on which the traditional norm referenced psychometric testing of the 1920s was based, noted that there was an embedded assumption '… that knowledge and skill can be analyzed into component parts that function in the same way no matter where they are used' (Resnick 1989, p3). Resnick noted that this 'building block' model of learning assumed that '… complex competencies could be broken down into discrete skills learnt separately, through developing individual stimulus-response bonds' (Gipps 1994, p18). This encouraged teaching the isolated components. In recent literature about training teachers a similar building-block model seems to be at work, with techniques suggested for discrete assessment skills. Techniques are taught for making sense of empirical evidence, which is chunked down into discrete parts. The techniques are designed to eradicate the need for making value judgments about pupils' work and instead to identify empirical evidence interpreted as levels of learning in the light of a dominant learning theory (Leighton et al 2010). Wiliam writes that assessors are able to follow the procedures and therefore can make the correct assessments without knowing why. An evaluation is thought to be irrelevant. Consistent identification of replicable conditions understood in terms of knowledge levels is what makes a good assessor. Dylan Wiliam summarises the position: 'To put it crudely, it is not necessary for the examiners to know what they are doing, only that they do it right' (Wiliam 2000, p10).

This is the crucial error of a diagnostic approach to summative assessment. Diagnosis may not require a value judgment, because its purpose is to secure future action. But grading is evaluative, and so to remove value judgment is disastrous. Grading without judgment raises the notorious problem of counterfactuals. Counterfactuals license incorrect inferences that, without further thought, fit the facts as well as the correct ones do. A sieve can sieve the smallest balls mindlessly. The balls automatically fall through the sieve holes. But if all and only the smallest balls are coloured red then we can ask whether size or colour was selected by the sieve (Sober 1993, p98-100). This is the problem of 'freeriders' brought about by considering the role of counterfactuals in selecting (Gould and Lewontin 1979; Fodor and Piattelli-Palmarini 2010). Freeriding is the idea that some qualities are accidentally attached to qualities that are being selected for a purpose. Free-riding is a phenomenon of counterfactuals. Counterfactuals are 'what if' propositions. Only judgment can accommodate counterfactuals. This is the fact that undermines all attempts to smuggle teleology into any process that claims to be automatic and judgment free. An awarder in high stakes assessment is faced with each candidate's performance. The problem is like that facing the evolutionist asking about the whiteness of a polar bear. Did the colour evolve because of whiteness or because of camouflage? There is no a priori law that can answer the question. There are many physical laws that can explain everything about a polar bear, although probably not an overarching law that explains all those laws.
Only by reasoning it out and looking at the facts, in particular at how the polar bear actually became white, can the answer be arrived at. The explanation is not a law but a history. It is decidedly post hoc. Similarly, why Napoleon lost the battle of Waterloo requires a post hoc explanation. There are physical laws explaining everything that happened in the battle, but no law explaining the battle of Waterloo that can tell us why Napoleon lost (Fodor & Piattelli-Palmarini 2010, p137). By proposing an assessment technique that doesn't require judgment, Wiliam moves from post hoc explanation to the equivalent of claiming that an overarching law has been identified, one that the assessment technique's mechanisms can instantiate without the need for judgment. Grading conceived of as giving a plausible account of the quality of a particular object doesn't give an account of a covering law of value for such an object. It doesn't because it doesn't assume there is one.

Wiliam's account of construct referenced assessment as requiring no judgement, like that of Leighton et al who make the same claim, begins by developing a theory of assessment that seems to replace the behaviourist, scientistic paradigm of psychometrics with a hermeneutical approach, but then goes further. It seems to reject 'historical' for 'nomological' explanation. Nomological explanations have a powerful role to play in empirical science. But nothing has a nomological explanation unless it involves natural kinds. And for such explanations to work, the initial state of the world has to be stated, along with the laws, so that predictions can be deduced from those initial conditions and the application of the laws to them. Specifying the initial conditions is difficult and tends to require simplified modelling in order to remove 'noise' from 'information'. But assessment theory assumes grades are conventions, not natural kinds. Nomological explanations are inapplicable. For example, there are no nomological generalisations about what makes a good essay. The point isn't that there aren't facts that contribute to any essay's goodness. It's rather that there aren't any general laws about these facts. A further point against thinking that there are generalisable laws about the facts that make an essay good is that these facts tend not to make sense, in terms of the goodness of the essay, when considered in isolation. Contributory facts tend to appear in clusters, like syndromes. Good spelling considered in isolation doesn't make a good essay. This point can be strengthened. Fodor and Piattelli-Palmarini make an exactly analogous point about the selection of phenotypes. Phenotypes are the cluster of facts that collectively make up what is selected. No single fact in the cluster is selected, because in isolation they serve no purpose. They give the example of size: 'size affects fitness; but it doesn't follow that there are laws that determine the function of a creature as a function of its size' (Fodor & Piattelli-Palmarini 2010, p126). What size contributes depends on how it is fused with other variables. So too with the variables that make a good essay. Goodness chooses the whole fusion of variables, not good examples of each of the variables in isolation. The whole is more than the sum of its parts. This is what makes it unlikely that there are general laws determining the goodness of anything being assessed.
There are perhaps general laws about each of the variables taken in isolation, but the goodness of the whole need not be determined by anything like a generalisable law. The sort of mechanism that Wiliam proposes is one that needs to be able to use such a law so that it can be applied without the need to consider counterfactuals. Historical narrative '…starts with an event for which it seeks to provide an empirically sufficient cause (it was for want of a shoe that the horse was lost)' (Fodor & Piattelli-Palmarini 2010, p136-7). Post hoc though its explanations are, they are not ad hoc. It is wrong to think that an assessment system that requires subjectivity, perspectivism, interpretation and so on is anti-scientific. Physics isn't the only model of science; 'many paradigm scientific theories are…best understood as historical narratives … theories about lunar geography, theories about why dinosaurs became extinct, theories about the origin of the Grand Canyon, or of the Solar System or, come to think about it, the Universe' (Fodor and Piattelli-Palmarini 2010, p137).

Only minds can be aware of counterfactuals. Any system of grading must be able to appeal to distinctions between counterfactuals. Counterfactuals can have causal effects only through the mediation of minds. Grades of goodness awarded for different reasons that are coextensive in some but not all worlds raise the difficulty, for Wiliam, of justifying which world counts automatically, without the use of a mind. The CDA system is like the position taken by evolutionary psychology. Steven Pinker writes: 'Was the human mind ultimately designed to create beauty? To discover truth? To love and to work? To harmonise with other human beings and with nature? The logic of natural selection gives the answer. The ultimate goal that the mind was designed to attain is maximising the number of copies of the genes that created it. Natural selection cares only about the long-term fate of entities that replicate...' (Pinker 1997, p43). Pinker's 'natural selection' in this passage is being used in the way Wiliam uses his notion of techniques for construct referencing. It is supposedly a non-intensional system, but when discussed in terms of how it can achieve its ends it becomes intensional. As Fodor comments, 'The human mind wasn't created, and it wasn't designed, and there's nothing that natural selection cares about; natural selection just happens' (Fodor & Piattelli-Palmarini 2010, p213). Wiliam denies the need for intensionality in grading and so denies himself the resources he requires: intensionality and its ability to give counterfactuals a causal role in decision making.

Earlier it was contended that CDA uses its learning theory as a kind of Sober's sieve (Sober 1993, p98-100). A sieve is a mindless sorting mechanism. The sieve can be used to select without anyone knowing why or how. The example from Sober is a sieve that selects the smallest marbles. They happen to be the red ones. The colour of the marbles free-rides on this selection purpose. Complete specification of the sieve is required to discriminate free-riders. A complete description of the sieve includes specifying its design purpose. The counterfactual problem is therefore part of the process of design. The purpose of the sieving has been designed by a mind aware of counterfactuals.
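Sober's example can be made concrete in a short illustrative sketch (the sketch and its data are my own, offered only as an illustration and not as part of Sober's or Wiliam's argument). A mindless mechanism passes the smallest balls; because, in this contrived sample, all and only the smallest balls are red, the output alone cannot say whether size or colour was selected for. That question is counterfactual, and is settled only by the mind that specified the sieve's purpose.

```python
# A minimal, hypothetical sketch of Sober's sieve (Sober 1993):
# the mechanism sorts blindly; only a mind can say what it sorts FOR.
from dataclasses import dataclass

@dataclass
class Ball:
    size_mm: int
    colour: str

# Illustrative data in which size and colour are coextensive:
# all and only the smallest balls happen to be red.
balls = [Ball(5, "red"), Ball(5, "red"), Ball(12, "blue"), Ball(15, "green")]

def sieve(balls, hole_mm=8):
    """Mindless mechanism: a ball falls through iff it fits the hole."""
    return [b for b in balls if b.size_mm < hole_mm]

selected = sieve(balls)

# Both descriptions fit the actual outcome equally well...
assert all(b.size_mm < 8 for b in selected)      # 'selected for smallness'
assert all(b.colour == "red" for b in selected)  # 'selected for redness'

# ...so the mechanism alone cannot discriminate the free-rider (colour)
# from the selected-for property (size). That requires the counterfactual
# question 'what would have passed had the small balls been blue?', which
# belongs not to the sieve but to the mind that designed its purpose.
print([f"{b.size_mm}mm {b.colour}" for b in selected])
```

The point of the sketch is only that the specification of what the sieve is for is nowhere inside the mechanism itself.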
If Wiliam is proposing that the awarders of grades are users of sieves, then he is proposing that, so long as awarders grade according to the assessment theory, they will award according to the grading judgment of those who designed the theory. They are in the position of the Urmsonian apple selectors who are forced to follow the judgements of others. They may not know the meaning of the sorting brought about by following the sorting mechanisms they have been given, but there is someone else who does. A further point, in terms of Sober's sieve, is that even the complete specification of the mechanism doesn't determine how the function of the mechanism is to be understood. A sieve for flour lets the good stuff fall below the holes. A sieve for gold leaves the good stuff above the holes (Fodor & Piattelli-Palmarini 2010, p129-130). So all sieving, even mere sorting, has to be the property of minds. The argument isn't that distinctions between freeriders and non-freeriders can't be made. Mill's 'method of difference' is an early description of how it can be done (Mill 1846). The point is just that to run differences there has to be a conscious mind running the differences. Counterfactuals are what minds use to run Mill's differences. Minds can use what doesn't exist to cause things. The past, the future and the imagination are all ways in which the mind can make counterfactuals cause things to happen. A system purporting to run inferences without judgment is claiming it can assess counterfactuals without a mind. The issue isn't that distinctions can't be made and correct inferences drawn from a candidate's performance. It is rather the idea that there can be a way of doing so blindly, settling intensional distinctions between what is selected for and what is merely coextensive with it. An intensional context is one where coextensive terms don't necessarily have the same truth value. Frogs snap at flies. Frogs snap at airborne black nuisances. But there are imaginable possible worlds where one proposition is true and the other false even though in our world they are coextensive (Fodor & Piattelli-Palmarini 2010).

Wiliam's claim that grading can take place without the awarder of the grade knowing what they are doing either relies on someone else knowing what they are doing or else is disingenuous. Construct referencing makes much of the mind dependency of grading. It also advertises itself as a contrast to the older psychometric paradigm that, in idealising a spurious view of physics, hoped to achieve a precise and discourse-free objectivity. In proposing that graders can grade correctly even when they don't know what they are doing, Wiliam seems to be contradicting both of these claims for construct referencing. Construct referencing requires intensionality for the same reason that any grading system needs intensionality. Selecting for good is not the same as selecting. 'Selecting for' requires intensionality to identify counterfactuals. It is consequently mind dependent and for that reason is subjective. Theorists who treat thinking and language as largely social think that the constructs used for assessment are likewise social facts.

The oddness of Wiliam's idea of intensionality-free assessments is made more emphatic when learning theories largely opposed by CDA theorists are criticised on similar grounds. Skinner's operant learning theory is analogous to the psychometric ideal of achieving an explanation according to scientific principles. His theory blended empiricism with positivism.
His Behaviourism is Empiricism's tested model of learning (Gellner 1975). Wiliam explicitly criticises this approach to assessment measurement in terms of its behaviourism when he writes, 'it has become almost commonplace to argue that changes in assessment methods are required because of changing views of human cognition, and in particular, the shift from 'behaviourist' towards 'constructivist' views of the nature of human learning' (Wiliam 1994, p1). In fact Wiliam rarely lapses into the CDA-orientated thinking of Leighton, Gierl et al, writing that '… changes are still firmly rooted within the psychometric paradigm, since within this perspective, the development of assessment is an essentially 'rationalist' project in which values play only a minor (if any) role. The validation of an assessment proceeds in a 'scientific' manner, and the claim is that the results of any validation exercise would be agreed by all informed observers' (Wiliam 1994, p1). Methodological positivism denied the explanatory legitimacy of invisibles, notably intensionality. So 'folk theory' intensional terms like 'belief', 'knows' and the like were prohibited in any explanation. Learning was to be understood as purely the result of environmental stimulus reinforcing associations between inputs and behavioural outputs. The Skinner Box is its prototypical experimental apparatus (Fodor 1970). The point of drawing attention to the Skinner Box is the similarity between the Skinner Box and the way standard high stakes tests are organised.

A Skinner Box is a closed box in which the test subject is placed. The box is closed to prevent the experimenter from observing the rat and biasing her views about its behaviour. The rat's behaviour is recorded by a neutral, electronic machine. Thus the set-up is supposed to be neutral. The box is empty apart from its test subject. The rat is made hungry by not being fed. Once the rat is hungry, a bar is introduced which, when pressed, drops food pellets into the box. The recorded behaviour is converted into a graph measuring the frequency of the rat's response as a function of time. The graph is able to show how the frequency of the rat pressing the bar changes over time. Prototypically, for a long while nothing happens. Then the rat presses the bar and then, for a while, nothing happens. After this period it presses the bar again. Now the frequency of the bar pressing starts to rise. The Associationist Behaviourist assumes that each time the bar is pressed and the food dropped in, the association between bar pressing and food becomes reinforced. There comes a point in the experiment when the rat gets the connection and the frequency flattens out at a very high level. By manipulating this very restricted and controlled situation (increasing or decreasing the rate of reinforcement, the hunger of the rat, the force required to press the bar and so on) the resulting distribution curve can be varied. The learning that is brought about in the rat in such circumstances is labelled operant learning. The situation is an operant paradigm. A scientistic model of assessment shares features with the operant paradigm. It requires an artificially controlled environment, it needs to be able to record and graph patterns of responses to identical stimuli, and it is able to adjust and manipulate the curves to suit whatever distribution curve is required.
Statistical norms can be constructed out of this, and statistical measurements of learning are based on such norms. Learning is conceived of as a set of habits, each habit a response to certain conditions. For Skinner there are two kinds of habit when the learning is language learning. One is the habit of verbal response to stimulus: a person has the habit of saying 'dog' when in the presence of a dog, for example. The second is the habit of making characteristic non-verbal responses in the presence of certain verbal stimuli. The habits can be used to shape behaviour. This Skinnerian operant shaping, stimulus/response reinforcement, can bring about new habits that otherwise might never have been learnt. This is done by gradually leading the learner towards a required behaviour through reinforcing behaviours that take it towards the desired end. So, for example, a parent observes their baby responding in the environment of a glass of water. Whenever a random sound produced by the baby approximates to some degree the word 'glass', the parent reinforces the association of that sound by rewarding the baby. Once the sound is reinforced, further reinforcement is used to bring the sound successively closer to the required word-sound 'glass'. Non-verbal responses are supposed to be operantly shaped in a similar fashion. Further to this is a principle of generalisation, which allows Skinner to account for how habits can work in new situations, using a notion of 'similarity'. So an upside-down glass still elicits the response that a right-way-round one did.

This scientistic approach attempts to explain all mental activity in terms of quantitative stimulus/response behaviours that are all observable. It is a model that purports to present objective material for scientific research. It attempts to give a complete description of the mechanism that produces its results. The machine-like quality enables high levels of reliability. Subjectivity is formally excluded both from the experiment itself and from the generation of its results. The data generate statistical norm distributions mechanically. Psychometrics and Behaviourism/Associationism have combined to produce classical test theory. Scientific objectivity motivates the way this testing model works. Its reliability, its interpretation-free data, its suitability for repeated application and the statistical distribution curve norms generated by the approach all appeal to scientific principles of mathematical precision, rigour and objective, mechanised, repeatable procedures. I think this systematic educational mechanism links with current educational debates around Lyotard's performativity. Lyotard writes: 'The true goal of the system, the reason it programs itself like a computer is the optimization of the global relationship between input and output: performativity' (Lyotard 1984, p11). The issue is broader than assessment. Learning and teaching have been thought of in terms of reinforcement and stimulus/response mechanisms. Learning as the development of habits in response to various stimuli is key to operant processes of learning. Rote learning models, with the teacher as stimulus provider reinforcing required formulas through repetition and drill, are modelled on a Behaviourist/Associationist operant learning model and minimise the role of understanding and belief in achieving clarity and precision.
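The bar-press frequency curve at the centre of this operant set-up can be caricatured in a few lines. The toy simulation below is my own illustration (the numbers are arbitrary and it is not offered as a model of any real experiment): each reinforced press slightly raises the probability of pressing again, so for a long while little happens, then the response rate climbs and flattens out, which is all the operant record contains.

```python
# A toy, hypothetical simulation of an operant 'learning curve':
# each reinforced bar press slightly raises the probability of pressing again.
import random

random.seed(1)

def skinner_box_run(trials=300, start_p=0.02, boost=0.15, ceiling=0.95):
    p_press = start_p            # initial chance of pressing the bar per interval
    presses_per_interval = []
    for _ in range(trials):
        pressed = random.random() < p_press
        if pressed:
            # a food pellet drops: the association is 'reinforced'
            p_press = min(ceiling, p_press + boost)
        presses_per_interval.append(int(pressed))
    return presses_per_interval

run = skinner_box_run()
# Frequency of pressing in successive blocks of 50 intervals:
blocks = [sum(run[i:i + 50]) for i in range(0, len(run), 50)]
print(blocks)   # typically low at first, then rising towards the ceiling
```

Nothing in the record distinguishes a rat that has formed a belief about the bar from one whose 'association' has been 'strengthened'; the graph is all there is.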
Behaviourism/Associationism doesn’t work because as a matter of fact there are no reliable responses to any stimulus situation in the actual world that are like those produced by the Skinner Box rat. Reality is too complex, too full of unpredictable variables for any kind of operant theory of learning to be plausible. Chomsky showed that the theory of associationism was too feeble to explain verbal behaviour (Chomsky 1959). The simple input-output mechanism couldn’t account for the productivity of language. 

4.3 THE SKINNER BOX

The Skinner Box operant-learner-friendly environment is a stipulated, artificial context. Anything learned from being in such a box would be highly atypical. The problem comes when operant learning is treated as prototypical. Chomsky criticised Skinner's 'Verbal Behaviour' by arguing that operant learning couldn't generalise to non-artificial, non-operant contexts (Chomsky 1959). Language has properties of productivity and freedom from stimulus input that operant models deny. The impoverishment of the operant model is critical to this criticism. The impoverishment misdescribes both the actual environments in which language, thought and learning take place and the behaviour even within the operant environment. A rat in a Skinner Box is recruited to perform with only a few strategies available. Fodor (Fodor 1970) compares rat behaviour in a Skinner Box with rat behaviour in its natural ecology, building a nest. Fodor thinks that what a rat does in a Skinner Box is rather close to what a human would do. It makes a reasoned assessment of the situation and recognises that the only solution is one of brute force, the bar-pressing solution. The rat understands the connection between bar pressing and food. Fodor thinks that the response rates for typical rats in Skinner Boxes make this a better inference than the Behaviourist one. Rats typically do nothing for a while, then they try out the bar a couple of times, and from then on the curve takes off. Fodor thinks this confirms his belief that the rat is forming beliefs that lead to its behaviour rather than the other way round. Fodor thinks that rat behaviour inside the Skinner Box is misinterpreted by the Behaviourists (Fodor 1970).

Behaviourists think that generalisation is a principle that explains how the rat transfers knowledge of bar pressing in one Skinner Box to bar pressing in another when the bar is in a different place in the box. The skill learnt in the first box is productive; it can be used in the non-identical second box. Generalisation explains this feature, according to Behaviourists. But the principle of similarity on which generalisation relies is question-begging. For example, a maze-running rat masters the maze to get the food. In a second scenario it has to swim the maze to get the food. Swimming requires completely different motor responses. What explains the ability of the rat to master the flooded maze? What principle of similarity is identifiable that explains this ability? This problem of productivity affects all Behaviourist/Associationist explanation. Chomsky asks how it is possible to understand a sentence we've never heard before (Chomsky 1959). What follows is a summary of Chomsky's criticisms of Skinner. Chomsky thinks there are no 'principles of relevant similarity' that can explain this ability, and yet operant learning requires exactly such principles. In a sense, everything is similar to anything in some way, so by what principle is something relevantly similar? Slippery slope formulations of this problem are relevant to understanding the force of this objection to the Behaviourist/Associationist model. The training of responses in an operant environment has nothing to say about the principles of relevant similarity operational in any habit-forming response. The principle of generalisation is therefore an empty principle and as such cannot be an explanation of productivity.
The only way that habit forming would explain language learning would be if every possible sentence were learned through habit forming. But there are infinitely many possible sentences. Therefore we would have to learn all of them, which is impossible. At some point we have to generalise, and at that point the operant theory of learning has no account to offer. Chomsky thinks that the response to any input is accounted for not only by the input but also by the inner state of the responder. Skinner's account seemed to presuppose a model of speech perception as responding to acoustic cues. But Artificial Intelligence engineers have failed to build a machine that can process acoustic cues as phonetic symbols. There are reasons for this that are well understood. Sounds produced in words are not determined sequentially, so a sound at the beginning of a word will be modified by a later sound in the word. Because of this, instructions for the transmission of sound can't be transmitted sequentially, and non-sequential strategies are needed for processing the system. Behaviourism can't supply these. Chomsky thinks that because of this what language requires is nothing like the learning of a habit based on stimulus/response operant processes. There must be an inbuilt non-sequential processing strategy already running for language to be processed. (And other things too; the non-sequential processing is just one feature of language that seems to necessitate a non-operant system.) The idea is that children have such a system working by the age of three months, maybe even earlier. Unless the Skinnerites argue that in a very short space of time this facility is itself a learnt habit (but of course the same problems arise), Skinner's explanations don't work.

Chomsky thinks a generative grammar is required to explain the productivity of language and thinking. He thinks a very complex set of grammatical rules, syntactic and semantic, operates to bring about the required productivity. A computational theory of mind is the approach that offers this. Syntactic rules govern how elements of language can be combined. Semantic rules show which combinations are meaningful. There are limits imposed on what kinds of grammars can be used by biological beings. For Chomsky, Fodor and other enthusiasts of a computational, modular theory of the mind, people acquire language in the same way as they grow arms and have the skin colour they do. It is an inbuilt facility, probably genetically programmed, literally independent of IQ, culture, training, the actual language being acquired or anything else. Operant theories of education are common, from theories of rote learning and stimulus-response methods to the application of applied psychological theory to educational issues. Fodor, for instance, thinks that where education for Plato and Spinoza was about the problem of selecting the content of education, given the obvious fact that a person can't learn everything, the view of education since Dewey has been one of deciding which technique to use based on some theory of applied psychology (Fodor 1970). Fodor takes a radical position, arguing that '…it is a completely open question whether any teacher is better than none. Basically I believe there is no reason to think that there is a theory that can provide a technique for teaching, and that an undue amount of time has gone into searching for such a theory' (Fodor 1970, p148).
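Returning to the productivity point, the contrast with habit-learning can be illustrated with a toy generative grammar (an invented fragment of my own, not Chomsky's): a handful of finite rules, one of them recursive, generates sentences of unbounded variety, none of which need ever have been encountered before.

```python
# A tiny, illustrative generative grammar: finite rules, unbounded output.
import random

random.seed(0)

RULES = {
    "S":  [["NP", "VP"]],
    "NP": [["the", "N"], ["the", "N", "that", "VP"]],   # recursion lives here
    "VP": [["V", "NP"], ["V"]],
    "N":  [["rat"], ["examiner"], ["essay"]],
    "V":  [["graded"], ["pressed"], ["saw"]],
}

def generate(symbol="S"):
    """Rewrite a symbol using the rules until only words remain."""
    if symbol not in RULES:
        return [symbol]
    expansion = random.choice(RULES[symbol])
    words = []
    for part in expansion:
        words.extend(generate(part))
    return words

for _ in range(3):
    print(" ".join(generate()))
# Possible outputs include 'the examiner saw the rat' or 'the examiner that
# saw the essay graded the rat': sentences no one need have heard before,
# produced by finitely many rules. No finite stock of learned habits could
# cover everything the recursion makes available.
```

The sketch is only a caricature of the point about finite means and infinite use; it takes no stand on which particular grammar human beings actually use.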
Sorenson's solution to the sorites has a logical proof, but its explanatory proof involves accepting a theory of the modularity of mind. The necessary and absolute ignorance of boundaries is a function of the innate architecture of the mind. CDA assessment can't justify removing value judgments from grading. I have linked attempts within the alternative to the psychometric paradigm to remove grade awarders' judgments from assessment to the same errors as Behaviourist-based assessments such as psychometrics (Wiliam 2000; Leighton et al 2010). CDA mishandles knowledge. It proceeds as if it can remove judgment by fiat. It misunderstands the normative constraints of grading. It assumes a conventionalism that thinks that what we think is up to us. So the new paradigm overlaps in some respects with the old. It is focused on reliability and validity. It attempts to minimise the role of belief and judgment in decision making. It assumes that there are indeterminacies of language. It assumes we can invent determinacies by removing judgment, the source of indeterminacy. A theory of an innate mental architecture gives explanatory support to Sorenson's logical proof that solves the sorites puzzle. The proof shows that language is not inconsistent, and so indeterminacy is ignorance rather than a feature of natural language. The absolute nature of the ignorance disproves conventionalism. Therefore the CDA of Leighton et al, which removes judgement from all cases, misrepresents the complexity of knowledge in the same way that Behaviourism/Associationism and psychometric approaches to assessment do.

 

CHAPTER 5: CONNOISSEURSHIP

Two-Face: 'Unbiased. Unprejudiced. Fair' (The Dark Knight 2008)

5.1 INTRODUCTION

The chapter examines a version of assessment that attempts to include the role of judgement. It models better than CDA the 'hermeneutic turn' in the assessment paradigm that Wiliam identified as common to all modern assessment systems (Wiliam 1994). The approach is conventionalist about assessments, understood in a way that assumes that '…assessments … make explicit the test developer's substantive assumptions regarding the processes and knowledge structures a performer in the test domain would use, how the processes and knowledge structures develop, and how more competent performers differ from less competent performers…' (Nichols 1994, p578). The new paradigm no longer merely requires content specifications to describe its objectives, because '…efforts to represent content are only vaguely directed at revealing mechanisms test takers use in responding to items or tasks' (Nichols 1994, p585). Inferences from tests are required to understand and reflect differences between groups, changes over time and processes of learning (Cronbach & Meehl 1955; Kane 2001). Value judgments are given a high profile in this approach. The chapter examines how it has developed. It agrees that the theory models well the apparent indeterminism of language, but at best it can only model relative borderlines. It is therefore an assessment theory that is incomplete.

5.2 CRITERIA AND REASON

In the last chapter a version of assessment was sketched which suggested that a version of conventionalism without reason was an alternative to the scientistic paradigm. The assertion that '…it is not necessary for the examiners to know what they are doing, only that they do it right' (Wiliam 2000, p10) is taken as a prototypical example of this approach. But Wiliam also thinks that '…recently … it has become almost commonplace to argue that changes in assessment methods are required because of changing views of human cognition, and in particular, the shift from 'behaviourist' towards 'constructivist' views of the nature of human learning. However, these changes are still firmly rooted within the psychometric paradigm, since within this perspective, the development of assessment is an essentially 'rationalist' project in which values play only a minor (if any) role' (Wiliam 1994, p1). To drop grader value-judgement from explanations of grading verdicts is to revert to Behaviourist views of learning. This chapter offers a different version that assumes similar commitments to assessment theory and conventionalism but allows a role for rational belief in grading verdicts in order to escape the behaviourist paradigm more fully (Hume 1777/1975; Lewis 1969; Searle 1969; Sellars 1963). Dylan Wiliam gives descriptions of the hermeneutical assessment process that require intentionality and intersubjectivity (Wiliam 1994, 1996, 1998). This approach shares a similar base of learning theory but, in contrast to the versions discussed in the last chapter that downplay the intentionality and intensionality constraints of grading, it emphasises skills of appreciation and connoisseurship in assessors and their acquisition of guild knowledge (Sadler 1986, 1989; Eisner 1985; Marshall 2000; Wiliam 1994, 1996, 1998; Black 1993; Black & Wiliam 1998; Harlen 2004, 2006). Judgement and belief are essential in this approach. It is sufficient that graders know well the subject matter being graded (English Literature graders knowing well English Literature, Mathematics graders knowing well Mathematics and so on) and make judgments based on this knowledge. Intentionality and intersubjectivity are essential to it.

Scientistic approaches to assessment have been labelled norm or cohort referencing and criteria referencing (Wiliam 1994). Both norm/cohort referencing and criteria referencing are forms of assessment rooted in the scientistic paradigm of objectivity and focused on ensuring reliability and validity. They achieve universal decisiveness and the production of superlatives. These are completeness conditions for any competent assessment system. The simplicity of the approach tidies up the question and answer system. However, a chief source of obscurity in any such formal system is whether it speaks to the question being asked. The complaint against such systems is that they don't: they consistently reply to the wrong question (Messick 1989, p31). Norm referencing was used in the UK for assessing the now defunct 'O' level tests and is still used in 'A' level assessment, which is regarded as the 'gold standard' for assessments in the UK. As an aside, this fact supports the contention that the scientistic paradigm remains the accepted, default position for high stakes assessments despite all the criticisms. Most reading tests compare a set of test results against those of a test, control or normative group.
Using this information the test agencies work out the percentages of candidates who would be expected to pass and fail a test in order for that test to be the equivalent of the first one. A test that is subsequently engineered to produce a similar distribution curve is counted as a reliable test. In each subsequent test, candidates are ranked and distributed along a similar curve. Only a set percentage of candidates can pass the test, as determined by the norm group. As Wiliam points out, '… sabotaging someone else's chances improves your own. Such a test is truly competitive' (Wiliam 2001, p8).
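What norm referencing does can be sketched schematically (a hypothetical illustration; the grade quotas and names are invented): candidates are graded purely by rank against fixed proportions, so the same share receives each grade whatever the cohort actually achieved, and any candidate's grade depends on how everyone else performed.

```python
# A hypothetical sketch of norm-referenced grading: grades are fixed quotas
# applied to the rank order, not judgments about any particular performance.
def norm_reference(scores, quotas=(("A", 0.10), ("B", 0.25), ("C", 0.40), ("D", 0.25))):
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    grades, i = {}, 0
    for grade, share in quotas:
        n = round(share * len(ranked))
        for name, _ in ranked[i:i + n]:
            grades[name] = grade
        i += n
    for name, _ in ranked[i:]:          # anyone left over after rounding
        grades[name] = quotas[-1][0]
    return grades

cohort = {"Ann": 71, "Bo": 68, "Cal": 64, "Dee": 55, "Ed": 54,
          "Fay": 53, "Gus": 40, "Hal": 39, "Ida": 22, "Jo": 21}
print(norm_reference(cohort))
# Raise every score by 20 marks and the grades are unchanged: only relative
# position matters, which is why 'sabotaging someone else's chances improves
# your own' (Wiliam 2001, p8).
```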

5.3 SORTING

But rank ordering can be achieved reliably without knowledge of what is being ranked. Two further problems with norm referencing are identified in the critical literature. The first is that the norm group may become unrepresentative. Dylan Wiliam makes this point when he notes that '… until recently, the performance of every single student who took the American Scholastic Aptitude Test (SAT) was compared with a group of college-bound young men from the east coast of the United States who took the test in 1941' (Wiliam 2001, p9). The second is that because norm referencing stipulates that a certain percentage of candidates will be awarded each grade each year, standards will shift (Marshall 1997, p103). In response to these criticisms criteria referencing was introduced in the 1960s and 1970s. Marshall gives the driving test as an example of criteria referenced assessment (Marshall 1997, p102). The examiner tests each candidate against a set of criteria. There is no question of there being a predetermined quota of people who could pass the test. All candidates or none might pass or fail, because criteria are specifiable, transparent and open to all, allowing '…a process of inquiry into the adequacy and appropriateness of interpretations and actions based on test scores' (Messick 1989, p31). Criteria specifications needed to be precise enough that items for assessment could be selected automatically (Popham 1980). But criteria are underpinned by norm-referenced assumptions which require judgments that go beyond the criteria themselves (Angoff 1974; Wiliam 2001). The paradox of criteria is that you must already understand them if you are to interpret them correctly. Their supposed use as guides to understanding is undermined by this. Criteria bring 'criterion-referenced clarity' (Popham 1994a) to what is being assessed. The norm referenced test doesn't do this; it only tells you how you compare to the norm. Cohort referencing tells you how you rank relative to the cohort. Both reliably assess mysteries. Wiliam is succinct: 'How do we know that the items we are choosing are discriminating between students on the basis that we think they are? The answer is we don't. All we have done is created an artefact which homes in on something and then items that converge are deemed to be measuring the same thing' (Wiliam 1994, p15). In a later paper he writes that 'Hill and Parry (1994) have noted in the context of reading tests, it is very easy to place candidates in rank order, without having any clear idea of what they are being put in rank order of and it was this desire for greater clarity about the relationship between the assessment and what it represented that led, in the early 1960s, to the development of criterion-referenced assessments' (Wiliam 1998, p4). But if criteria are the equivalent of Polanyi's maxims, as Wiliam argues they are (Wiliam 2001, p9), then criteria can't deliver clarity: 'Maxims cannot be understood, still less applied by anyone not already possessing a good practical knowledge of the art. They derive their interest from our appreciation of the art and cannot themselves either replace or establish that appreciation' (Polanyi 1958, p50). In reality criteria were underpinned by norm-referenced assessments to offset this problem (Angoff 1974). Criteria referencing also brings other problems. '…Interpretation of the criteria can become too rigid and inflexible' (Marshall 1997, p103). Overdependence on a mark scheme led to inaccuracy in the Key Stage 3 tests of 1995 (Marshall 1997, p103).
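By contrast, criterion referencing of the driving-test kind can be sketched as a fixed checklist (the criteria below are invented for illustration): a candidate passes if and only if every criterion is judged to be met, and there is no quota, so all candidates or none may pass. The sketch also makes the paradox of criteria visible, since each 'met/not met' entry is itself an interpretive judgment.

```python
# A hypothetical sketch of criterion referencing (cf. the driving test):
# a candidate passes iff every criterion is judged to be met; there is no quota.
CRITERIA = ["moves off safely", "uses mirrors", "reverse parks", "obeys signals"]

def criterion_reference(candidate_checks):
    """candidate_checks maps each criterion name to a judged True/False."""
    return all(candidate_checks.get(c, False) for c in CRITERIA)

candidates = {
    "Ann": {c: True for c in CRITERIA},
    "Bo":  {**{c: True for c in CRITERIA}, "reverse parks": False},
}
print({name: criterion_reference(checks) for name, checks in candidates.items()})
# {'Ann': True, 'Bo': False} -- all candidates could pass, or none could. But
# note that the True/False entries are already verdicts: whether 'uses mirrors'
# was satisfied is itself an interpretive judgment, which is the paradox of
# criteria noted above.
```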
Popham's 'hyperspecification' leads to easier predictability, and so to teaching to the test (Popham 1994; Smith 1991). Examiners slavishly following precise mark schemes missed rewarding innovative, unpredictable answers. The examiners reverted to being sorters rather than graders. Added to these problems is the related question of whether any assessment has left something vital out, so that the assessment under-represents the construct being assessed and results in an inaccurate picture of what a candidate has achieved. This is a central issue for assessment using the scientistic model where, as in a Skinner Box, honing test items down to a controllable minimum distorts the normal ecology of a construct domain and requires inferences based on untypical and abnormal evidence.

5.4 THE MACNAMARA FALLACY

The Macnamara fallacy expresses the problem that the focus on reliability brings to educational assessment. Charles Handy thinks such precision leads to a disastrous expressibility deficit: 'The Macnamara Fallacy: The first step is to measure whatever can be easily measured. This is OK as far as it goes. The second step is to disregard that which can't easily be measured or to give it an arbitrary quantitative value. This is artificial and misleading. The third step is to presume that what can't be measured easily really isn't important. This is blindness. The fourth step is to say that what can't be easily measured really doesn't exist. This is suicide' (Handy 1994, p219). Reliability mechanisms organise patterns of convergence, enabling universal decisiveness by removing information about the content of the decisiveness. It is sorting without specifying purpose. The Macnamara fallacy reminds us of the dangers that follow from assuming that only items that can be readily assessed reliably are important. In education, debates about the pros and cons of terminal and non-terminal assessments often turn on this issue. The validity of construct representation is a key issue for the validity of any assessment. If, for example, group work, leadership and redrafting skills are an essential part of a construct being assessed, then a short terminal test which cannot assess these things will under-represent the construct (Wiliam 1997) and compromise the validity of the test. However, it may be a very reliable test. The assessment is a reliable assessment of something, but not of what it purported to assess. This is Fodor's point about the description of rat behaviour in a Skinner Box. It isn't typical behaviour and so can't be generalised to rat behaviour generally (Fodor 1975).

Wiliam (1994) argues that hedging about the replacement of the psychometric paradigm of assessment is a mistake. He rightly thinks that intensionality is incontestably required for assessments. Wiliam's understanding of the paradigm shift is as a development from within the old paradigm. Wiliam describes how assessment engineers in education moved from understanding validation as a combination of content, criterion and construct (Guion 1980) to its being just a matter of construct (Loevinger 1957, p636). Wiliam thinks this new view of construct validity was interpreted initially from a positivist standpoint in order to retain scientific credentials (Cronbach & Meehl 1955, p290). Cronbach later thought his approach had been '… pretentious to dress up our immature science in the positivist language; and it was self-defeating to say ...that a construct not a part of a nomological network is not scientifically admissible' (Cronbach, p159). Wiliam thinks Messick (1989) broadened the scope of assessment intensionality so that the crude positivist stance was no longer prevalent. He presented constructs not as facts but as models. Models were to be assessed as fit for purpose rather than as true or not, to be judged appropriate rather than correct. Validation became a matter of judging how well models were fit for purpose (Embretson 1983). Messick (1980) developed the idea that construct validation involved the meaning, the consequences, the interpretation and the use of an assessment. In introducing these into assessment validation Messick introduced values into educational assessment.
Wiliam comments that 'The widespread adoption of Messick's framework as the 'canonical' method for classifying validity argument has been extremely important because it has forced ethical and value considerations and social consequences onto the agenda for validity researchers' (Wiliam 1994, p11). The broadening out of construct validity, from considering only a value-free evidential base to allowing value-based judgments of consequences, still maintains aspects of a scientistic model to the extent that it assumes the value-free content of the evidential base. Yet how is it possible to judge whether an assessment adequately represents the construct of interest, or to judge its predictive power, without bringing in values? Wiliam thinks that reliability is often a trade-off with validity. Yet to assume that reliability is the primary criterion of a good assessment is itself a value-laden assumption. 'Constructs, measurements, discourses and practices are objects of history' (Cherryholmes 1989, p115). Wiliam points out that the consequence of this is that maximising reliability is not a given but something that is opted for on the basis of an evaluation. The purpose of the test will determine what makes a good test, not whether reliability has been maximised (Carver 1974). The demand that reliability be maximised is a consequence, for Wiliam, of the scientistic imperative, which is a disguised value of the scientistic position. He writes, 'The fact that construct validation has for so long been taken to be value-free testifies to the power of the discourse in which it has been conducted. Indeed, Gramsci's notion of 'hegemony', as a situation in which any failure to embrace whole-heartedly the prevailing orthodoxy is regarded as irrational or even insane, seems to describe the situation rather well' (Wiliam 1994, p16).

5.5 A NEW VOICE. A NEW PARADIGM

He thinks that it '… is time that educational assessment stopped trying to 'be a science', and found its own voice' (Wiliam 1994, p18). This follows from his developing an alternative to the educational assessment model he has so far described. 'Construct referencing' is an approach that ensures universal decisiveness because it radically reconceptualises the idea of evaluation for assessment (Sadler 1989, 1998; Black & Wiliam 1998; James et al 2006).

5.6 WILIAM'S NEW PARADIGM FOR ASSESSMENT GRADING

Wiliam begins by drawing the distinction between perlocutionary and illocutionary speech acts (Austin 1962). The perlocutionary speech act describes what has been, is or will be. The illocutionary brings about a 'social fact' (Searle 1995). An illocutionary speech act is 'performative' (Butler 1997). A jury's verdict of guilty is an illocutionary speech act, saying it makes it so; what the judge says about the crime is a perlocutionary act. The philosophically interesting thing about Wiliam's argument is that he is attempting to move educational assessments away from the scientistic paradigm of propositions, captured by the perlocutionary speech act, to a different paradigm, captured by the illocutionary speech act. He thinks assessment must move away from the psychometric paradigm (Wiliam 1994). He writes: 'Developing on the work of Samuel Messick, in this paper it is argued that no such 'rationalist' project is tenable, but that validation of assessments must be directed by a framework of values that is external to the technology of assessment' (Wiliam 1994). The argument that educational assessments are not perlocutionary, that they do not assert anything about the candidate's ability, knowledge or expertise, means that they cannot be mapped onto a measurement scale of anything at all. Wiliam thinks that instead of measuring anything, an educational assessment is an illocutionary speech act, akin to a priest's proclamation of a marriage or a jury's verdict of guilty. In education the purpose is the inauguration of individuals into a community of practice (Wiliam 1997, p1). Wiliam thinks that this is not based on any objective assessment. 'The assessment is not objective, in the sense that there are no objective criteria for a student to satisfy' (Wiliam 2001, p10). When in this mood, Wiliam thinks there '… is therefore no such thing as an 'objective' test. Any item, and certainly any selection of items, entails subjectivity, involving assumptions about purpose and values that are absolutely inescapable. Value-free construct validation is quite impossible' (Wiliam 1994, p16).

Some people think consensual agreement is reached because the assessors have a threshold standard in mind upon which they all converge (Christie & Forest 1981). This is a version of limen-referencing. It links up with Sadler's idea that assessors have a notion of standards in mind (Sadler 1987). Assessors come to agree on a judgment either because they have clear notions of grade thresholds in mind, or because they have clear paradigm cases of certain grades in mind even though they have no clear view of thresholds. The construct referencing idea avoids both of these ideas, and it has good reasons for doing so. Thresholds are difficult to know. Paradigm cases are also problematic. The idea of a threshold and a typical case suggests that there is a single scale of factors being assessed, whereas what may typically be involved is something more like a syndrome, where many factors are collected together in different ways. A construct is a shared conception of quality that is then applied by assessors to whatever is being assessed. The approach doesn't rely on operant environmental contexts. Unlike norm and criteria referenced assessments, where hyperspecification requires the separation of learning and testing from their natural settings, just as a Skinner Box separates the rat from its natural environment, the construct referencing idea dispenses with such artificial constraints and can apply to natural situations.
Proponents of construct referencing think it a subjective or intersubjective process. Some, like Carver, think that maximising reliability is an option, not a requirement, of a grading system, thinking that the goodness of a test is decided by its purpose (Carver 1975). But reliability is a requirement of any such system because it is a requirement of grading. The dominant view of assessment is that when an assessment is made, evidence is presented and the assessor infers the ability, knowledge or expertise of the candidate from the evidence (Wiley & Haertel 1996). 'One does not validate a test, but only a principle for making inferences' (Cronbach & Meehl 1955, p297). Assessment results are perlocutionary on this view: they are statements about the candidates. (The statements could be predictions about what the candidate might be able to do in the future, or claims about what they have done in the past, but however understood they are assertions about something (Anastasi 1982, p145).) The scope of what assessments are about has required broadening in order to address the issue of under-representation of the constructs being assessed. Before construct referenced assessment, what Wiliam calls the 'reverse engineering' by assessment engineers to eliminate unreliability had failed, because only by taking out essential features of the construct could reliability be assured. Wiliam thinks this alternative approach achieves better reliability as well as validity.

5.7 CONSENSUS AS SOCIAL FACT

Construct referencing approaches reliability purely in terms of whether a consensus is reached. It isn't interested in how it is reached. The reasons for each individual agreeing are likely to be different. The roads to agreement are likely to vary from person to person. However, there is evidence that the reliability of this method exceeds that of the more traditional scientistic methods of securing reliability. Subjective reliability doesn't answer the question of what is being assessed. Rather, it creates a social fact that has important consequences, just as when a judge declares a verdict or a priest a marriage. It decides what is to count as an answer, rather than attempting to track truths. An example of construct referencing is the award of the old-style PhD (Wiliam 1994). In those days the assessment looked for a 'contribution to original knowledge.' Candidates tended to offer a thesis and be involved in an oral examination. A person or a panel of people would then decide whether to award the PhD or not. The award is negligibly perlocutionary because, argues Wiliam, asking what the award tells you about the person receiving it reveals that it tells you almost nothing, because the variety of PhDs is so vast. To know someone has a PhD doesn't give you many facts, beyond the establishment of a certain social fact. That fact is that the individual has entered 'into a community of practice' (Wiliam 1997). The idea of a 'community of practice' is one introduced by Jean Lave to describe a speech community that, to a greater or lesser extent, 'does things the same way' (Lave & Wenger 1991). It is a crucial concept for Wiliam in his attempt to move educational assessment culture away from scientistic reliability models, which cannot deal with full representation of constructs without losing reliability (Wiliam 1994a). The authenticity of work produced by candidates outside the Skinner Box environment of traditional external tests often gets mixed in with the issue of whether a test is legitimate. The use of coursework portfolios has been understood as an alternative to external testing because of the dominance of the scientistic model. This is a contingent feature of what a coursework portfolio can secure. Validity requires proper construct representation, and this can be secured in conditions that ensure that students are not cheating.

5.8 INTERSUBJECTIVITY

Wiliam considers these judgments to be neither subjective nor objective but what he calls 'intersubjective' (Wiliam 1997). The creation and maintenance of a social fact depends on the legitimacy of the social group whose intersubjective consensus sustains the fact. Where that legitimacy diminishes, as when a legal system becomes tainted by dictatorship, for example, trust in the law may well diminish to the point where it can no longer create its social facts. Wiliam argues that so long as there is trust in the education system, the social facts of assessments in high stakes tests will be maintained. In the UK, the teachers of the English Language NEAB 100% coursework GCSE exams approached assessment using construct referencing in the late 1980s. This exam system combined internal and external assessment. The candidates' own teachers marked the work, but their judgments were rigorously checked by at least four other assessors: colleagues in the same English department/exam centre and a range of expert assessors paid by the exam board.

The procedure ran as follows. A teacher would assess the folders of all her candidates. She would put them in a rank order and grade them on an eight-point scale, from A to G (any folder failing to achieve an award of a G would be assigned a U grade, signifying 'Unclassified'). The folders would then be swapped with a colleague who would check the rank ordering and the grades. Where there were disagreements, the two teachers would discuss their reasons and try to resolve the differences. If a folder was deemed to be close to a grade boundary, the rest of the department would scrutinise the folder and help make a decision. The whole school entry would then be moderated by the department. Teachers would check borderline cases to ensure that the borderlines were in the right place and that folders were in the right rank order. Results would be sent off to the exam board. A percentage of the folders would then be sent to an exam board appointed review panel of teachers, chosen for their expertise and accuracy in trial marking. Another small sample would be sent to a further exam board appointed assessor, the 'inter-school assessor', who would also spot check the assessments of the school. The findings of this assessment and the folders were then also sent on to the review panel. The review panel members worked in pairs on school centres. The panel was able to change a school's grades if it felt that over 50% of the grades awarded at a particular grade were incorrect. All folders were then returned to schools with detailed comments from the review panel about the quality of the work and details of any adjustments made. This process was supported by a trial marking process which enabled centres to test out their judgments with local consortia of schools, feeding into a national network organised by the exam board. This process enabled teachers nationally to become inducted into 'a shared meaning amongst a community of interpreters, a model consistent with the best literature on hermeneutics' (Marshall 1997, p106). The shared meaning is what we have been calling the 'construct' of value. Wiliam, writing about this, focused on the aspect of the process which enabled teachers to hone their understanding of this construct, in this case the construct of English. The aspect he considered crucial was the way teachers discussed borderline cases.
Wiliam thinks that looking at difficult cases clarifies assumptions of interpretation that might well be hidden in easy cases, in the way that the duck-billed platypus, a furry, beaked, egg-laying beast, is a good animal to study when trying to sort mammals from birds.

5.9 Moss's Hermeneutical Approach Generalised

This assessment focused on the shared understanding of the assessors, who in turn made their expertise a core value. Moss calls this a hermeneutical approach: 'A hermeneutic approach to assessment would involve holistic, integrative interpretations of collected performances that seek to understand the whole in light of its parts, that privilege readers who are most knowledgeable about the context in which the assessment occurs, and that ground those interpretations not only in the textual and contextual evidence available, but also in a rational debate among the community of interpreters' (Moss 1994, p7). Without the shared expertise one could not be part of the community. The induction of teachers into these shared meanings, through the process of trial marking and moderation of work, meant that the approach worked like a sort of apprenticeship scheme. But the assessment system was also about inducting pupils into this shared understanding. There was an assumption of a seamless web of connections that would induct pupils gradually into the shared understandings of the community, understood crudely here as 'practitioners of the study of English.' Wiliam makes this connection between pupils and apprentices when he writes: 'An apprentice carpenter, nearing the end of her apprenticeship, will be asked to demonstrate her capabilities in making (say) a chair or a chest, and a student nearing the end of a particular phase of their mathematical education could be asked to assemble a portfolio of their work' (Wiliam 1994). Because what was being assessed was a broad portfolio of coursework, including pieces that had to be done under the supervision of a teacher, the portfolio was broad enough to be a valid representation of the construct being assessed. The merging of the assessment with the learning environment, rather than its radical disconnection from that environment, is part of how this form of assessment no longer conforms to the scientistic model, yet there seems no obvious merit in retaining the latter. If the scientistic objectivity such a disconnection accomplishes diminishes validity and reliability, then only if scientistic objectivity is seen as a good in itself, or if it can be shown to deliver other, overriding benefits, can such a form of assessment be justified. The subjective approach to assessment is more reliable than traditional, scientistic tests (Wiliam 2001, p10). It avoids the underrepresentation of the construct in the assessment, and so inferences from the test are more valid. In an interview Cresswell is quoted as saying that he would '…much prefer to see a return to an approach where coursework is embedded in students' learning, with rigorous moderation rather than the withdrawal of coursework. Proper exam coursework provides choice for students about their approach to learning, helping them to enjoy it and motivating them to achieve success…lets us assess important things such as practical skills, speaking and listening and the ability to do research, gather information from a range of sources and construct a coherent argument with it' (Cresswell 2006). Cresswell's approach to vagueness fits with the Wiliam model of 'construct referencing'. The emphasis on 'rigorous moderation' in particular picks up the process of assessment that inducts learners into a community of shared meanings.
'In order to safeguard standards, teachers were trained to use the appropriate standards for marking by the use of 'agreement trials'. Typically, a teacher is given a piece of work to assess and when she has made an assessment, feedback is given by an 'expert' as to whether the assessment agrees with the expert assessment. The process of marking different pieces of work continues until the teacher demonstrates that she has converged on the correct marking standard, at which point she is 'accredited' as an assessor for some fixed period of time' (Wiliam 2001, p10). This process is noteworthy in that it points to one aspect of this type of assessment that may well be decisive in explaining why it has not been universally adopted. It is an expensive system, involving a great deal of teacher time devoted to developing skills of discrimination. The scientistic method is, by comparison, relatively cheap.

5.10 Trust Issues for Construct Referencing

Cresswell thinks the issue of authenticating a candidate's work as their own has caused trust in non-external assessment to fall. However, in the AQA review of coursework of 2006, plagiarism was an issue for higher educationalists rather than secondary ones, and concerns over the usefulness of coursework were expressed mainly by teachers of RE who, in the same report, seemed to be using coursework to assess 'fluency of language' rather than key issues of their RE construct (AQA 2006). It is important to disconnect the notion of construct referencing from coursework portfolios. In the NEAB coursework model, some of the work had to be done in the classroom, and teachers were able to judge whether the whole of the portfolio was authentically a candidate's or not by using this work as a control of sorts. Wiliam actually thinks that consigning coursework to outside the classroom is another sign of the dominance of the scientistic model. Removal of this flawed objectivity would enable coursework to be integrated into classroom practice. In such a case, 'Coursework would be coursework – the vehicle for learning rather than an addition to the load' (Wiliam 2001, p11). Construct referencing addresses the problems of under-represented constructs being assessed and of de-contextualised interpretations inherited from the scientistic paradigm. It mobilizes a learning theory that can be conceived as being derived from insights in constructivist, social constructivist and constructionist theories. The approach has been influenced by Sadler's contention that written standards have necessarily fuzzy borders (Sadler 1989). Sadler contends that sharp numeric cut-offs between grade boundaries are therefore difficult to capture in written criteria. This has been understood in the literature as merely resisting the possibility of capturing standards in words. A typical example of this reading of Sadler's contention is in Wyatt-Smith, where she writes: 'Sadler's (1989) writing on the formulation and promulgation of standards, specifically that standards written as verbal descriptors are necessarily fuzzy (as distinct from sharp, such as numeric cut-offs). As such, they have boundaries or demarcation points that defy efforts to capture them precisely in words. For this reason, standards acquire meaning, or have meaning ascribed to them through use over time, and as understandings develop within communities of users.' (Wyatt-Smith 2010, p63)

5.11 The Function of Fuzziness

The fuzziness is given a function. It resists precise verbal definition, but nevertheless there is no absolute barrier to knowing where the borderline is. A community of users can come to know where the borderline is. This is presumably because they make the decision as to how the term is to be understood on the basis of agreed convention, and as such they are in charge of where the borderline is: wherever they say it is, there it is. It is a stipulative model of meaning. The community of users collectively stipulates away the fuzziness. In this way the literature connects Sadler's fuzziness to important issues within the assessment literature. It links to the issue of standards. It links to social theories about learning (Shepard 2000, Wenger 1998). Such theorizing links learning with different aspects of social processes: community, as in 'a community of learners' or 'a community of interpreters', where learning is connected with belonging; identity, where learning is linked with the idea of becoming part of a social group, a type or a model; meaning, where learning is linked to the types of social experience that become meaningful to the learner, something bound up with the learner's motivation and setting; and finally learning as practice, where learning is connected with theories of doing and activity (see Wenger 1998, p5). Fuzziness is also connected, by the social stipulation reading, to a particular socio-cultural framing of assessment issues. It is connected with a view of language that takes language as inherently social and cultural, drawing on theorists such as Voloshinov and Kress, so that the fuzziness of standards, and the subsequent understanding of them, is a matter of language (Voloshinov 1986; Kress 1989). Practices of assessment from this perspective are construed as practices involved in understanding language; moderation is about textual practices stabilizing interpretation through talk and interaction (Wyatt-Smith 2010, p64). Again, the appeal of social constructivist theories can be detected in this. I take them to be species of conventionalism. The literature critical of assumptions about high stakes assessment tends to focus on how procedures set out by assessment boards interfere with the interpretive processes assumed by this sociocultural framework. Inter-subjectivity is a term used to explain that evaluative judgments are subjective but valid and reliable. Consistency can be learned, however, in which case inter-subjectivity may be descriptive of social habit. Harlen (2005, p213) writes about the way '…teachers share interpretation of criteria and standards…' to develop common understandings. Wyatt-Smith cites a teacher who describes the process in these terms: 'When we were stuck whether to give them like a D or an E, or a C or a D, um, someone in the group was always able to pluck out um, the pertinent point that would get us across the line one way or another and everyone else would just, would just go 'yep, that's right, that's it' and so that was really good and I don't think that you can come to that by yourself' (Wyatt-Smith 2010, p66). In a later chapter I examine how Raz mounts a sophisticated defence of these processes. Sadler's identification that qualitative standards for assessments are necessarily vague or fuzzy (Sadler 1987, 1989) has led to an extensive literature about how explicit specification of standards can be achieved.
Much work on the need for descriptive statements, criteria, exemplars, and their interactions to produce the required sharp specifications guiding judgment has arisen from this source and from the sociocultural framework. What the literature assumes is that there are a number of ways in which social practices can overcome the problem of the fuzziness of standards. Sadler notes that the '...use of natural-language descriptions together with exemplars is unlikely to provide a complete substitute for, or render superfluous, the tacit knowledge of human appraisers, simply because external formulations cannot be exhaustive and cover every conceivable case [nor would we want them to]' (Sadler 1987, p201). There are references to extra-textual considerations operating to decide where precise borderlines are. Harlen discusses bias in teacher judgments when making assessments and thinks that it is caused by teachers bringing in irrelevant considerations (Harlen 2005, p213). However, arguments and insights from the constructivist, social constructivist and constructionist theories are frequently mobilized to suggest that bias is not an inherent, necessary flaw. Assessment literature about judgment discusses the role of additional resources that are used to help make judgments. A key issue is how these additional resources contribute to the function of standards in making judgments (Wyatt-Smith and Castleton 2005; Cumming et al. 2006; Cooksey et al. 2007). This literature discusses various types of extra resource, such as the textual support materials provided by assessment boards to supplement assessment processes (guides, annotated samples), as well as other resources such as 'unstated standards', teacher curriculum knowledge and prior experience, and teacher knowledge of and attitudes to student characteristics. There is also literature about the best ways of establishing the reliability of teacher assessments, such as '…the extent to which teachers share interpretation of criteria and standards' (e.g. Harlen 2005, p213). Sadler thinks that the fuzziness of descriptive statements about the boundaries of standards can be overcome by adopting a form of social constructivism. I have argued that this response is a species of conventionalism, that standards are conventions that are up to us (Riscoria 2010, p1). But understanding Sorenson's solution to the sorites shows that conventionalism is false. Even if considered legitimate, the hermeneutical, constructivist approach may strike some as being conservative, unable to provide adequate resources for innovation. But grading verdicts understood as illocutionary speech acts creating Searlean 'social facts' are only as conservative as the conventions are. Ceremonial illocutionary speech acts of weddings have been extended to apply to gay as well as straight weddings, suggesting there is no absolute ban on innovation (Austin 1955, Searle 1995, Wiliam 1997).
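The contrast at work here, between a sharp numeric cut-off and Sadler's fuzzy verbal standard, together with the stipulative reading on which a community of users simply fixes the borderline, can be set out in a minimal sketch. The notation is mine, not Sadler's, and it idealises by treating the quality of a piece of work as if it were projectable onto a single scale s:

\[ \text{Sharp standard:}\quad \mathrm{Pass}(x) \iff s(x) \geq c, \quad \text{where } c \text{ is a fixed numeric cut-off.} \]
\[ \text{Fuzzy standard:}\quad s(x) \geq c_2 \Rightarrow \mathrm{Pass}(x), \qquad s(x) \leq c_1 \Rightarrow \neg\mathrm{Pass}(x), \qquad \text{with } c_1 < s(x) < c_2 \text{ left undecided by the written descriptor.} \]
\[ \text{Stipulative reading:}\quad \text{the community selects some } c^{*} \in [c_1, c_2] \text{ and thereby makes } \mathrm{Pass}(x) \iff s(x) \geq c^{*} \text{ true by convention.} \]

On Sorenson's epistemic view, by contrast, there already is a sharp cut-off; what the community cannot do is know where it lies, so the stipulative reading mistakes ignorance for licence.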

5.12 Intersubjectivity and its Sources

A further suspicion is that somehow subjectivism is to blame, that there is something inherently unsafe about making subjective rather than objective judgments for high stakes. If objectivity is understood as 'scientific objectivity', the new paradigm's subjectivity is a further contrast with the scientistic assessment paradigm. Its subjectivity is also contrasted with individual subjectivity by drawing on notions of 'intersubjectivity' from theories of social constructivism, constructivism and constructionism (e.g. Rogoff 1990; Vygotsky 1987), where social factors shape and evolve the construction of knowledge (Gredler 1997; Prawat & Floden 1994; Lave & Wenger 1991). Intersubjectivity is a species of subjectivity. It refers to the sharing of subjective states (Scheff 2006). It has roots in Husserl's philosophical analysis of phenomenology and has been important in psychological theories based in phenomenology (e.g. Stolorow 1987, 1997). But even when characterized as subjective, grading and assessment is not a purely descriptive exercise. It is normative. It is normative because it is partly explanatory. A belief is presented as the best explanation of what an awarder of a grade thinks. The awarder is in the same position as onlookers. To be rational is to ascribe to oneself the beliefs that best explain the decision being made.

5.13 Fuzziness Modelled as Relative Vagueness

Proponents of the subjective approach to grading don't discuss absolute vagueness but treat sorites-fuzziness as relative vagueness. Characterising vagueness of this relative kind, Williamson says that the unclarity concerns which grade is actually being applied in any particular case. The unclarity is caused by the chaotic usage that fixes the meaning: each time a grade term is used there is uncertainty about exactly what is meant by it, and no human could keep track of the usage. But absolute borderline cases are uniform, and therefore relativising them to human limits on discriminability is parochial. There are also counterexamples to the instability of meaning supposedly brought about by use. Natural kind terms stabilize irrespective of use. Nor is it clear how algorithms stabilize meanings. Relative vagueness seems closely aligned with supervaluationist approaches. So Williamson writes that: 'As a first approximation, for the supervaluationist, definiteness is truth under all sharpenings of the language consistent with what speakers have already fixed about its semantics ('admissible sharpenings'); for the epistemicist, definiteness is truth under all sharp interpretations of the language indiscriminable from the right one. In both cases, we hold everything precise constant as we vary the interpretation' (Williamson 1999, p128). Wiliam thinks that Sadler's fuzzy borders are examples of merely relative vagueness and that interpretation is the resource required to deal with it. He presents a sorites involving a driving test requiring '…among other things, that the driver "Can cause the car to face in the opposite direction by means of the forward and reverse gears". This is commonly referred to as the 'three-point-turn', but it is also likely that a five point-turn would be acceptable. Even a seven-point turn might well be regarded as acceptable, but only if the road in which the turn was attempted were quite narrow. A forty-three point turn, while clearly satisfying the literal requirements of the criterion, would almost certainly not be regarded as acceptable. The criterion is there to distinguish between acceptable and unacceptable levels of performance, and we therefore have to use norms, however implicitly, to determine appropriate interpretations' (Wiliam 1997, p2). Yet he avoids saying where, between the seven-point turn and the forty-three-point turn, the superlative 'last acceptable turn' lies. If he doesn't know, his incomplete knowledge is not due to measurement error. When the stakes are low we don't care; when the stakes are high, indifference is more difficult.
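Wiliam's driving-test example has the standard sorites structure, which can be set out in a minimal sketch (the notation is mine, not Wiliam's). Let A(n) abbreviate 'an n-point turn is acceptable', with n ranging over the odd numbers:

\[ A(3), \qquad \forall n\,\bigl(A(n) \rightarrow A(n+2)\bigr), \qquad \text{therefore } A(43). \]

The tolerance premise seems irresistible because no one can cite the n for which A(n) holds and A(n+2) fails; yet the conclusion is plainly false. On the epistemic view there is such an n, and Wiliam's silence about where the last acceptable turn lies is ignorance of it, not evidence that there is nothing to be ignorant of.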

5.14 The Incompleteness of Admissible Sharpenings

Supervaluationists can't know whether all admissible sharpenings have been considered. The vagueness of 'admissible sharpening' is a barrier to completeness of interpretation. Vagueness is not a purely semantic barrier. Sorenson insists that the completeness concern is gritty and not over-intellectualised; it is common sense when the stakes are high (Sorenson 2001, p52-53). Faced with a man with a gun, for example, you want to check and re-check that he really has used all his bullets before you decide to fight him. The barrier to assigning a grade in a borderline case is not exhaustively explained by the human unknowability of which precise concept of grade is being used, because 'any representational source of inquiry resistance will do' (Sorenson 2001, p56). Knowledge has a complex structure. It is because of this structure that vagueness is enquiry resistant and that the semantic approaches to borderline cases (truth value gaps, gluts, intermediate truth values and so on) fail. Given that this alternative assessment paradigm is largely hermeneutical, it has been content to downplay, even ignore, the normativity of the knowledge claims bound up with assigning grades. It has tended to characterize its response to the enquiry resistance of borderline cases and fuzziness as intersubjective, conventionalist invention rather than discovery.
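The completeness worry can be put more formally. On the supervaluationist picture in Williamson's remark quoted earlier, the schematic definition is roughly (my rendering, not a quotation):

\[ \mathrm{Definitely}\,\varphi \iff \varphi \text{ is true on every admissible sharpening } s \in S. \]

The verdict quantifies over the whole set S of admissible sharpenings, but 'admissible' is itself vague, so an assessor can never verify that every member of S has been surveyed. A 'definitely grade A' verdict therefore inherits the incompleteness, which is the gritty, high-stakes version of the concern Sorenson presses.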


CHAPTER 6: RAZ, DAVIS AND CONVENTIONALISM

The Joker: 'I try to show the schemers how pathetic their attempts to control things really are' (The Dark Knight 2008)

6.1 Introduction

This chapter examines Raz's theory of 'parochial concepts' as a way of reconceptualising objectivity in the domain of jurisprudence (Raz 1999). Raz thinks that subjectivism is criticised because a key element, perspectivism, is linked to relativism. He thinks he can remove the relativism. Davis thinks the new paradigm for assessment is not yet established because of assumptions imported from the earlier one. Both are good examples of the geography of the new paradigm and help exemplify the issues.

6.2 Conventionalism

I think the complex of ideas governing the new assessment paradigm mishandles the complexity of knowledge and belief. Applying vagueness to assessment systems shows that this conventionalism is a mistake: a conventionalism which assumes that meaning is use, is open texture, is a form of ambiguity between different languages, that language contains no hidden truths, that meanings are known either a priori or a posteriori, and that we can shift from asking 'what is the answer?' to 'what should count as an answer?' I argue that the species of conventionalism that governs assessment practices is a thesis that includes all of the above assumptions. Approaches involving communities of interpreters, speech communities, social context, perspectivism and anti-essentialism combine, so that this species of conventionalism can be taken to be a sub-species of meaning holism, which denies that language can have meaning in units smaller than a sentence. This chapter supports the claim that this is the geography of the new assessment paradigm by examining how the elements are prototypically used. Andrew Davis and Joseph Raz present sophisticated versions of such use. Davis works in the field of educational assessment and Raz in jurisprudence, and both are concerned with clarifying the limits of high stakes adjudication. This chapter shows how plausible this approach is.

6.3 Andrew Davis's Complaints

Davis agrees with those who think that psychometric paradigms of assessment are still too prevalent. He orchestrates conceptual reasons against its assumptions in order to support the new paradigm. He thinks that most assessment systems still assume that they are testing for decontextualised, essentialist objects. This is assumed in the psychometric paradigm discussed earlier. Galton (1869) thought of intelligence as 'a natural ability'. Binet and Simon (1916) talked about 'a fundamental faculty'. Terman (1921) thought that intelligence was 'the ability to think in abstract terms', and from there a notion of IQ was developed which could be uncovered, classified and then tested. Jensen (1979) thought that there was a link between IQ and speed of neural processing, and proposed ways in which this speed could be measured. Recent fashion has given Howard Gardner's idea of different types of intelligence some prominence in these discussions. Gardner (1993) divides intelligence into several autonomous abilities, which are then deemed to explain why David Beckham is a genius at football but no Euler at maths. Sternberg (1999) discusses 'information processing skills'. Alongside this is the urge to classify humans using statistical analysis and the invention of norm referencing (Hacking 1990). From this grew general ideas about how statistical patterns reified taxonomies of human categorization (Davis and Cigman 2009).

6.4 Kinds of Natural Kinds

The genealogy of educational assessment's common-sense belief in essentialist constructs thus runs from eighteenth-century Lockean natural kind philosophy of essences, through nineteenth- and early twentieth-century essentialist thinking about the statistical reifications of norm referencing, to developed 'scientistic' thinking around the quantification and measurement of traits, most obvious in assumptions about IQ but also present in discussions of traits and abilities which assume essentialist thinking about such constructs. So Davis is suspicious when he reads talk of students developing their self-assessment abilities, as he finds in Black (1999) in the context of maths education. When he reads Hiebert, Carpenter et al (1999) talking about students learning 'how to construct strategies and how to adjust strategies to solve new kinds of problems', Davis asks, 'Precisely what do our educators have in mind when they make references of this kind?' (Davis 2009).

6.5 Interpretative Communities

The phrase 'situated cognition' names a particular perspective that takes up the general position Davis is arguing for (Lave and Wenger 1991), capturing the thought that meaning holism embodies. Davis links this idea in particular to Donald Davidson's idea of 'anomalous monism' (Davidson 1970, Davis 2009). The idea is that constructs like 'intelligence' don't refer to anything independent of the social context and the interpretive community using the term. The picture of intelligence as something that could exist independently of such things is incoherent. Putnam's twin earth argument (Putnam 1975) also impresses Davis. Putnam proposed a thought experiment designed to question what identity conditions are required to apply the concept of similarity. Davis reassigns the issue to his own educationalist agenda. He asks us to imagine a child reading aloud a book in English and then another child on a completely different planet reading aloud the same text. He argues that only by importing a whole set of social contextual aspects from our setting into the distant planet setting could the child there be said to be reading the book aloud. Davis proposes that the conclusion is conceptual, not empirical: 'Without the existence of the practices associated with written and spoken English it is not possible (logically) to specify fully the content of those beliefs required for her to perform the task of reading aloud an English text in the full knowledge of what she is doing. Without the existence of the practices in question, the movements of her mouth, tongue, larynx and lungs, her eye movements, the inclination of her head towards the page, the consequent sounds and so on will not actually be reading an English text. The description of her performance as an act of reading aloud an English text logically requires the social environment outlined above' (Davis 2009). He thinks that if a construct has a necessary identity condition tied to its social environment, then failing to reproduce that environment is failing to reproduce the same construct. Davis agrees with Margaret Archer, who writes of social structures as being 'relations into which people enter (which) pre-exist the individuals who enter into them, and whose activity reproduces or transforms them' (Archer 1998, p359; see also Porpora 1998).

6.6 Anti-Essentialism

He thinks his anti-essentialism is supported by the work of Boaler (1997) and Cooper and Dunne (2000). They think about situated cognition in mathematics, where the issue of transferring maths skills from one environment to another is problematised in terms of the different contexts in which the maths 'abilities' are situated. Cooper and Dunne place emphasis on the Wittgensteinian 'rules of the game' involved in classroom maths, whereby the maths abilities of pupils are identified with these perceived rules, which often involve social class differences between pupils. Davis thinks that these elements generalize to all learning, and concludes that high stakes exams are of necessity invalid, because they are disconnected from the social context that is their necessary identity condition.

6.7 Meaning Holism and Reflexivity

Holism is simply the view that it isn't possible to construct a meaningful term so that it can be identified separately from the language in which it is situated. The idea of reflexivity allows Davis to argue that, because the educational constructs assessed for high stakes have this aspect, any idea that there are things that can be tested without reference to a 'community of practice' is further problematised. Davis cites Hacking on this: 'Interactive kinds involve 'looping effects': We think of these kinds of people as given, as definite classes defined by definite properties... But... they are moving targets because our investigations interact with the targets themselves, and change them... That is the looping effect. Sometimes our sciences create kinds of people that in a certain sense did not exist before. That is making up people' (Hacking 2006, p2). Rather than being 'indifferent kinds' analogous to the objects of scientific investigation (indifferent in the sense of giving no role to a community of interpretive practice), they are 'interactive kinds'.

6.8 Interactive Kinds

Davis's six 'complaints' (Davis 2008) against current high stakes testing are all derived from this argument and the vision of interactive kinds. The 'fine grained relationships between… capabilities on the one hand, and test scores and test preparation on the other…requires logically flawed conceptions of transfer and of psychological traits' (Davis 2008, p12). A decontextualised test, purporting to home in on the essential commonalities of an individual's psychological ability to do or know something, is in fact destroying the conditions in which genuine skill and knowledge can be identified and presented.

6.9 The Impossibility of Precise Enough Similarity

A second strand of his argument is that teachers, by teaching to the test, inevitably distort the genuine learning objectives that a learning construct was aiming for. 'Good at maths' refers to a construct that is far too entangled in a community of practice to be understood in the fine-grained way high stakes tests require. Genuine knowledge and skills are all situated in communities of practice; they aren't pure psychological capacities or traits. Therefore Davis can argue that high stakes tests fail to deliver what they aim to deliver: a test of genuine knowledge and skills. This links to his third 'complaint', which is about the connection between teaching to the test and assessment criteria. He states the problem simply: 'Elements of achievement which cannot be assessed reliably... will not be assessed at all. In a high stakes system teachers know exactly what is, and what is not, assessed and teach accordingly, thus limiting the curriculum and its delivery' (Davis 2008 p13).

6.10 The Assumption of Necessary Resistance to Essentialist Reification

Tests designed to discover whether learners are improving in a certain area, say maths, will not be able to give the necessary information if they are taken away from the social practice context which is a necessary part of the construct. League tables and other mechanisms of accountability that use test scores derived in such a way are therefore deeply flawed.

6.11 Unpredictability Denied

For Davis the idea of a prescribed methodology presupposes a different conceptualisation of learning and teaching, one which roughly envisages a much less entwined construction, able to separate the entwined elements of Davis's model, and closer to the Skinner Box models discussed in earlier chapters. Without this, fine-grained universal decisiveness is impossible.

6.12 Wittgensteinian Meaning Holism

He thinks the arguments against the psychometric paradigm are connected to a species of meaning holism that explicitly picks up Wittgenstein's adage that to know the meaning of anything is to know the whole language ('Only in the context of a sentence does a word have a meaning', Frege, 'On Sense and Reference'; '…to understand a sentence is to understand a language', Wittgenstein, Philosophical Investigations, para 199; 'Only in the context of the language does a sentence (and therefore a word) have meaning', Davidson, 'Truth and Meaning', p22). The first of the key arguments is Frege's argument that the meaning of a word can't be identical with its reference. 'Morning star' and 'evening star' have different meanings and yet refer to the same object. Davis agrees with Frege and the later Wittgenstein, who thought that what did make 'dog' mean dog (in English) was the role the word played in the sentential context, the sentence in turn gaining its meaning from its role in the language as a whole and not from its reference. The word 'dog' becomes meaningful as a result of inferences made possible through other elements in the language, such as 'animal'. Quine's question was whether there were necessary definitional inferences that could be identified that would explain why we knew that 'dog' meant dog in English and not anything else. Quine's answer was that there were no principled ways of making a distinction between inferences that were obligatory and those that weren't. In principle there was no word that had to be defined in a particular way. The inference from 'bachelor' to 'unmarried male' was no longer necessarily justified but was rather a matter of constant revision, as language itself was continually being revised. All inferences changed the whole of the language system and all its meanings. Instances of unmarried males who were not bachelors (e.g. the Pope) meant that there couldn't be analytic meanings. Davidson's model of this was to see language as a network of inferences, a semantic conceptual space where everything in the language was connected up. Davis thinks that assessment that relies on notions of tests being the same, of answers in tests being clear evidence of identifiable thoughts, and of comparability and similarity relations between test scores, performance, marker judgments and so on, is undermined by the difficulty of constraining similarity in the context of this meaning holism.

6.13 Raz and Value Objectivity

Raz thinks that this conventionalist, meaning holist, anti-essentialist position of Davis's can be objective because it can be impartial (Raz 1999). Raz thinks universal decisiveness is problematic, although, like Cresswell, he believes the indeterminacy of vagueness is only a relative indeterminacy. Raz thinks that objectivity is possible only in domains that allow it. He calls this 'domain-objectivity' (Raz 1999, p120). According to Raz, a domain that enables something to be objective is a domain whose propositions can be known. If knowledge implies truth, then the domain of objectivity requires that its propositions be governed by truth valuations. He thinks that there are propositions that we can't know, so domain-objectivity can't be defined in terms of our ability to know the propositions in it. Epistemic objectivity and domain objectivity are linked. To have an objective belief about a pupil's performance in a test, for instance, is to form beliefs about the pupil that are free from bias and are impartial. This is possible because test performances are in a domain that enables beliefs which can be expressed in propositional form and assigned truth values.

6.14 Objectivity as Absent Partiality

Raz thinks that epistemic objectivity is a matter of repressing any bias or partiality in the forming of a judgment. Epistemic responsibility requires epistemic objectivity if the aim of belief is knowledge. This requirement is inappropriate if the domain is not one that allows for objectivity of this kind. Raz thinks that working in the objective domain requires that subjects work in an epistemically responsible way. They should not be partial and biased in their judgments. What they believe is subject to the classical reflex (e.g. subject to bivalence, the law of non-contradiction and so on), and so they must be correct or incorrect in their beliefs. Being fallible, an assessor can be objective even when wrong, so long as she is being epistemically responsible in an objective realm. She may be wrong because she was pressed for time, miscalculated, or misunderstood what was being assessed.

6.15 Raz's Domain Objectivity

Raz has seven conditions of domain objectivity (Raz 1999, p123). Domain objectivity has to have the possibility of a knowledge condition; it has to have the possibility of an error condition (which includes the possibility of being correct or incorrect, of beliefs being contradictory and of being able to change one's mind); it has to have the possibility of epistemic objectivity; and it has to have a relevance condition. Further to these, domain objectivity has an independence condition. This is the condition that the world exists in some way independent of minds. Bernard Williams expresses it as '… knowledge is … of what there is anyway' (Williams 1978, p64). But Raz thinks this doesn't commit him to Wiggins's idea that a statement's truth value is independent of anyone's ability to appraise it (Wiggins 1980). Realism of some kind is therefore a condition, and this links with the sixth condition of the objectivity domain, which is the condition of a single world, a single framework in which the objective domain is presupposed to operate. The seventh condition is the possibility of irrationality. The list is not closed. It is a list composed to argue the pressing case; as new challenges arise the list is amended to meet them. Relevance is not always about having reasons. Relevance is complex. Relevance is always 'relevance to whom and to what purpose?' Deciding what is relevant to testing Shakespeare's 'Hamlet', for example, is not reducible to a priori principles.

6.16 Objectivity, Disagreement and Falsehood

Teachers disputing the grade judgement of a pupil's performance in a test can be objective in their judgements and still disagree. Objectivity is no guarantee of truth or of agreement. Arguments that assume only subjectivity can explain inconsistent belief are undermined by this approach. People can disagree, but they can't both be right. Raz thinks objective beliefs, being free from bias, partiality and so on, therefore have this important feature: they are claims to knowledge. This distinguishes them from claims that aren't. Raz denies coherentism because he thinks a coherent set of beliefs may not be objective 'because of how things are' (Raz 1999, p131).

6.17 Parochial Concepts and the Objectivity of Perspectivism

Raz thinks that perspectivism can be rescued from relativism. He thinks that most of our thinking uses 'parochial concepts.' These are concepts that require a certain perspective to be operational but which nevertheless can be objective. He defines them as being '…concepts which cannot be mastered by everyone, not even by everyone capable of knowledge. 'Non-parochial' concepts can be mastered by anyone capable of knowing anything at all' (Raz 1999, p132). Interests, imagination and emotions have concepts that can't be mastered by creatures that can know things in general but cannot know these things. A concept that is only knowable by a knower with a given interest is therefore a parochial concept, and is unknowable to a creature without such interests. Raz thinks most of our natural languages are made of parochial concepts. He thinks that only universally knowable concepts are non-parochial. If interest-related concepts are parochial concepts then so are evaluative and normative ones. These are key concepts in construct referencing and the very concepts that scientistic objectivity objected to because they were parochial. Some concepts are 'thick concepts' in that they are rich with perspectivism. The excellence of a novel or the power of a piece of music are thick concepts. Behaviourism/Associationism was an attempt to redescribe these concepts without perspectivism. Construct Referencing denies the possibility of such reformulation. Nagel is an example of someone who thinks that parochial concepts are not objective but agrees that they allow us to have knowledge we couldn't have without them. He thinks that they are only able to deliver knowledge of our mental states. He thinks that 'Although there is a connection between objectivity and reality – only the supposition that we and our appearances are part of a larger reality makes it reasonable to seek understanding by stepping back from the appearances in this way – still not all reality is better understood the more objectively it is viewed. Appearance and perspective are essential parts of what there is, and in some respects they are best understood from a less detached standpoint. Realism underlines the claims of objectivity and detachment, but it supports it only up to a point' (Nagel 1986, p3). Raz disagrees because he thinks that when we think a sunset is beautiful, for example, the beauty is objectively known because of the evaluative parochial concept being used. It could not have been known by non-parochial conceptual means. Construct referencing also supposes that Nagel is wrong, because it needs to claim that we can use parochial concepts to know more than just the mental states of assessors. An assessment has to be more than part of an assessor's autobiographical reflex. This mistake is what Raz calls the 'fallacy of equivocation' (Raz 1999, p135). Parochial concepts allow objective knowledge of more than subjective mental states. Raz thinks that they are ineliminable because without them much of an objective reality would not be accessible. Parochial concepts are largely productive of other concepts, and because of this they are not tied to a single perspective. Parochial concepts rely to some extent on an individual's perspective but are not exhausted by it. So mastery of parochial maths concepts enables a person access to scientific concepts. Colour concepts enable many art concepts to be developed. Nagel thinks that the less a concept needs parochial concepts the more objective it is.
Nagel takes the concept of thought to be less parochial than that of Christianity (Raz 1999, p136). An alien who can think cannot, merely on the basis of being able to think, grasp the concept of Christianity, because Christianity involves parochial concepts such as love, salvation and redemption. These are concepts that need not be available to all who are able to know.

6.18 Raz's Anti-Convergence Argument

The scientistic vision opposed by construct referencing is the idea that we can describe the world using concepts that are not parochial concepts. Bernard Williams writes about this project in Peircean terms: 'The suggestion is that there are possible descriptions of the world using concepts that are not peculiarly relative to our experiences. Such a description would be that which would be arrived at, as CS Peirce put it, if scientific enquiry continued long enough; it is the content of that 'final opinion' …independent not indeed of thought in general, but of all that is arbitrary and individual in thought' (Williams 1985, p244). On this view, parochial concepts are merely products of a particular cultural milieu and have no means of conceptualising the world beyond their own framework. Raz thinks philosophers such as Putnam (1992) and McDowell (1983) think that the view is inconsistent. Williams himself agreed with them later (Williams 1985, p140). 'Convergence' is an important idea in this argument. Raz thinks that convergence is a suspect condition. He thinks that there will be degrees of convergence but that disagreement and dispute will always remain where parochial concepts are used. Competence of use is thought to deliver convergence. But in a trial, two people may hear a testimony and one believe it to be true while the other believes it to be false. Both may insist that they are rational, and also that the other is rational in their belief, even though each believes the other's belief to be false. In an educational assessment two teachers may disagree about the qualities they find in a pupil's essay without ascribing madness to each other. Parochial concepts may strike some as being more suspect than non-parochial ones because they seem limited in scope. They are merely parochial. The accusation of parochialism is only philosophically serious if it shows that parochial concepts are incapable of delivering truths about the world that couldn't be known without them. Grading may be parochial. Williams thinks that non-parochial concepts can be used to test the reality of parochial ones, and that this justifies giving them objective status and parochial concepts non-objective status. But only some parochial concepts can be tested like that. Freeing parochial concepts of local bias may be a function of the non-parochial, but it is limited to concepts that can be so tested. McDowell makes the point that this is hardly a good reason for making non-parochial concepts the primary concepts, or for giving them a special status hierarchically above the parochial in setting out the conditions for knowledge. The idea of convergence seems to run through much of the thinking linking objectivity to epistemic absolutes. It is also a prototypical part of universal decisiveness requirements in assessment systems. Two things seem to be motivating the thought that objectivity requires convergence. One is that there must be a universality of knowability: in principle at least, genuine and therefore objective knowledge is of concepts knowable to all knowers simply in virtue of their being able to know anything. Parochial concepts are by definition not able to satisfy that condition. They are not available to all knowers because they depend on special conditions obtaining. To see the green sea the knower must have eyes, and so on; so 'green' is parochial, as is 'sea'. The other thought is that everyone will agree about such concepts.

6.19 Epistemic Luck

Williams talks about luck as a key condition of knowers like ourselves. We are subject to both moral and epistemic luck. As thinking subjects we all have our own personal histories: we are born in a particular place and time, with particular and unique events, thoughts, feelings and circumstances shaping the possibility of access to certain concepts. It is because we are able to access certain concepts that we are able to think about certain things. If concepts are not universally present to all knowers at all times then inevitably we are not going to be able to know everything that it is possible for a knower to know. This is the predicament of 'Neurath's boat'. Neurath wrote of ships that have to be repaired at sea, using the best materials to hand, rather than being afforded the luxury of returning to dry land to rebuild. Raz thinks it doesn't follow from this condition that objective knowledge cannot be obtained under conditions of epistemic luck. Although the parochial is unavoidable if epistemic luck is unavoidable, it doesn't follow that objectivity is impossible. What it does suggest is that the truth and falsehood of parochial claims are dependent on social facts. This in turn suggests that shared agreements, relying on shared judgements, are required for objective facts to be established. Evaluative reasons for things require mastery of shared judgements and understandings. It is these shared understandings and judgements that '…incline us to accept the legitimacy of their use' (Raz 1999, p146). But if this is the case, then what objectivity can there be to a thick evaluative judgement such as 'Tarantino is a great director'? If it is totally dependent on sharing the judgements and understandings of a particular contingent social group, then the evaluation itself is contingent. The reasons for making the judgement will be arbitrarily linked to membership of a particular group. The normativity of the truth claim of the statement is then undermined by being merely a description of some fact about a certain social group. This is a key objection to construct referencing. If it hopes to justify its assessments in terms of its ability to make objective knowledge claims, then it must be able to do more than describe the agreed judgments and values of a particular social group. Raz thinks parochial concepts are not descriptive. They invent and they access value.

6.20 Inventing Values

Raz thinks philosophers draw a distinction between conditions that sustain and create goods and values and conditions that give access to them. Raz has four examples of how social values are created and maintained and how they are made available. 'Fashion' is something that is socially constructed and highly localised; it is an extreme case of a parochial concept. Producing concepts of fashion depends on what people do and on their shared reactions and attitudes. Fashion is not stable and is highly contested. Another kind of socially created good is that of a game. Chess was invented in a certain place and time. There was a time when chess didn't exist, and there will be a time when it doesn't exist anymore. It isn't as parochial as fashion, however, because it can transfer itself from locality to locality. Chess in New York now is like chess in India in the fifteenth century. Chess is a parochial concept that is less tied to a specified context. In some respects chess, once invented, can exist forever. Even if chess is eventually forgotten it nevertheless makes sense to say that it is relearnable, unlike a fashion, which is more rooted in its temporal and cultural setting. According to Raz, chess can be understood in the same way outside its social group of origin to a degree greater than fashion, manners and other extreme parochialisms.

6.21 Accessing Values

However, there are other goods which themselves aren't socially created but which require parochial concepts for accessibility. Raz gives a sunset as an example. Sunsets can be supposed to have looked pretty much as they do now for many years before humans were around to look at them, and it was perhaps only at a certain time in history that people started seeing that they were beautiful. So here we have a case of an aspect of reality that was made accessible via parochial concepts. Culture brought access to the beauty of sunsets that had always existed. This fleshes out the ideas about 'aspect seeing' from Scruton and Wittgenstein that Davis finds helpful above.

6.22 Universal Values

There are some values that are universally accessible even if not accessed by the same parochial concepts. Thick evaluative concepts tend to be concepts that are interdependent and which have consequences if they aren't accessible. A person cannot be responsible for something that she is incapable of knowing. Yet the inability to access a value can still have bad consequences, even though the lack of understanding defeats any attribution of responsibility. Everybody living before the concept 'killing innocents is always wrong' was developed could have understood this, even before concepts such as 'innocence', 'intention' and 'killing' were developed and combined (Raz 1999, p152). Raz thinks this argument resists an argument that would say that all thick evaluative concepts, such as the 'subtle charm' of a play, for example, are constituted by contingent social facts. On that view, the most that can be said of such an evaluation is that the members of the particular social group able to access the concept agree to think this when faced with such a play. It is judged to be subtly charming because the group understands subtly charming as something they agree to think in such circumstances. The circularity of the argument is defeated by an analysis that argues that the evaluation is not constituted by any social facts. The social facts merely give access to the value of the play through the concepts available to the group. Access is what is importantly done by the social facts, not constitution. The play would still be subtly charming even if no one had access to the concepts, and furthermore, some may be recognised (by people with the concepts) to have made such judgements without knowing that they were in fact doing so (perhaps because they were expressing their judgement in terms that were religious, or derived from a different set of parochial concepts).

6.23 The Seven Constraints on Construct Referencing

Construct Referencing has been required to answer the challenges of objectivity. If construct referencing is to be considered a form of domain objectivity then it should meet the seven conditions of domain objectivity discussed above (Raz 1999, p123). Firstly, it should meet the possibility of a knowledge condition. That it is a social fact rather than a natural kind fact doesn't preclude knowledge: skepticism about the knowability of social facts would imply that economic facts were equally unknowable. The beauty of sunsets is also made accessible via concepts based on social facts, but the beauty is the sunset's, and it existed before humans, let alone sunset-beauty-appreciating humans. The comic timing and spry insights into the tragic nature of a character's situation in a pupil's short story are knowable to assessors even if the concepts involved are parochial, socially created ones. If they are knowable then they have to have the possibility of an error condition, which includes the possibility of being correct or incorrect, of beliefs being contradictory and of being able to change one's mind. Inexperienced graders are often error prone, inconsistent and change their minds. Induction into the guild knowledge of the community of interpreters is a process of exposure to critical judgment-making and a deepening of one's knowledge of the relevant concepts; a teacher learns more about the construct that these concepts give her access to. Experts disagree amongst themselves. An expert can be mistaken without being incompetent. No assessment is valid if biased or partial. Construct referenced assessments therefore can meet the requirement of the possibility of epistemic objectivity. Similarly, only relevant knowledge is allowed to be used in such an assessment if it is to be objective. If domain objectivity has an independence condition, does construct referencing meet this requirement? Constructs are learnt values, existing independently of any individual grader. Access to some parts of reality is available through parochial concepts. Although parochial concepts are perspectival they nevertheless track reality, as in the case of fashion, of chess, of the beauty of a sunset and of an enriched life. If they couldn't, one would have to believe that minds can alter sunsets, or adopt a skepticism about the existence of beautiful sunsets. It's possible that even without concepts of beauty there are other ways of knowing that sunsets are beautiful, in the same way that legal practices may be used before a concept of law understands them as legal concepts (Raz 2009). Raz's sixth condition of the objectivity domain is the condition of a single world. This requires that assessments using construct referencing presuppose a single world in which they operate. This can be construed as a requirement for a ready-made world, which many philosophers think is not possible (Sellars, Rorty). It is also a familiar argument against relativist claims that are often associated with an 'interpretivist', 'perspectivist' philosophical position. If objective claims are largely based on conventions, and these are relative to contingent social contexts, then objective claims are ultimately contingent, relativist claims, relative to a potentially limitless number of constituent worlds. The single world thesis seems undermined by this argument. But Raz doesn't presuppose that there are not differences of perspective and so forth.
All he requires is that they are different perspectives and so forth in the same presupposed world (Raz 1999, p125-6; Williams 1978, p68). Raz's seventh condition is the possibility of irrationality. The possibility of absurdity is something that construct-referenced assessments guard against. For example, if transitivity principles are being applied to assessments, then it is absurd in any construct referenced assessment to award A a higher grade than B and B a higher grade than C and yet think that C is better than A. The threat of this kind of absurdity typically comes about when the number of candidates is larger than a single examiner can remember, or when the time taken to make the assessments is longer than a memory can deal with. It is for the system to build in checks to remove as many of these absurdities as possible. However, the fact that there is a threat of absurdity ensures that construct referencing fulfils this last requirement of objectivity. Raz's idea of parochial concepts places explicit constraints on the intersubjectivity that Construct Referencing instantiates. It enables the hermeneutical assessment paradigm to be understood as being objective without being parasitic on the discourses of the exact sciences. Raz doesn't think the required universal decisiveness of an assessment or legal system is possible, even though objectivity is possible. Like Davis, his futilitarianism is based on his sophisticated model of parochial concepts and their role in accessing and constituting facts and values. Davis's arguments are those we have examined in relation to educational assessments. Raz's arguments are important in discussions of how legal decisiveness is achieved. The next chapter examines some important legal discussions in order to show how educational and legal requirements for universal decisiveness are comparable.
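The transitivity requirement invoked above can be stated as a simple constraint on any construct-referenced ranking (a schematic rendering; the notation is mine):

\[ g(A) > g(B)\ \wedge\ g(B) > g(C)\ \Rightarrow\ g(A) > g(C). \]

An awarding pattern that places A above B and B above C while judging C better than A violates the constraint, and it is precisely the pattern that large entries and long marking sessions make possible unless the system builds in the checks described above.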

 

CHAPTER 7: LEGAL VAGUENESS

[Batman slams The Joker's head on the table] The Joker: 'Never start with the head, the victim gets all fuzzy' (The Dark Knight 2008)

7.1 INTRODUCTION

Educational assessment is not alone in overruling logical proof and substituting illusion for actuality. This chapter provides evidence that a similar model of vagueness is theorised in law by Timothy Endicott (Endicott 2000). The chapter shows how Endicott's theory is very similar to Cresswell's, adopting a model of similarity to prototypes to avoid the geometric metaphor of sharp borderlines. He thinks of his approach as Wittgensteinian and cites with approval Sainsbury, a philosopher Cresswell finds influential (Wittgenstein 1953, Sainsbury 1990). He refuses to accept the logical solution to the sorites. Incredulity at this epistemic exit motivates his refusal, although it is Williamson's relative vagueness that he targets. The chapter examines the role of interpretation in the theory (Marmor 1992, Schauer 1991, Dworkin 1997, Kelsen 1991) and finds that Endicott's model contains the same fundamental errors as Cresswell's and those modelling construct referenced assessments. The chapter concludes that the meta-problem of vagueness, that of incredulity in the face of its solution, is more pervasive than just affecting educational high stakes judgments.

7.2 ENDICOTT’S LEGAL VAGUENESS

Decisiveness is required in law as well as in education. Indeterminism is a threat to universal legal decisiveness. Raz's views are influential in law. Like Davis, he thinks indeterminism is a natural condition of language. Vagueness, as a source of indeterminacy, is modelled in law (Endicott 2000). Raz's pupil Timothy Endicott's model of vagueness is very similar to Cresswell's, using a model of similarity to prototypes and avoiding the geometric metaphor of sharp borderlines. He thinks of his approach as Wittgensteinian and cites with approval Sainsbury, a philosopher Cresswell finds influential (Wittgenstein 1953, Sainsbury 1990). The law, like education, has a systematic duty to make high stakes decisions. Endicott also thinks that there is no solution to vagueness. Endicott thinks that it is literally true that vagueness is an untheorisable case of borderless transition (Endicott 2000). Endicott thinks that 'there is no better way to account for the application of vague expressions than in terms of resemblances to paradigm cases' (Endicott 2000, p48). 'Resemblance' is an indefinite requirement which leaves the language user to decide what counts as sufficient resemblance. Vagueness is built into this notion, and it is because of this that he thinks it more useful than trying to picture vagueness using something so unvague as a sharp borderline.

7.3 THE IRREMOVABLE INDETERMINACY OF LAW

He thinks that cases of vagueness are those where people cannot decide whether a concept applies in a certain case because they cannot tell if it is similar enough to a paradigm of the case. He thinks that the indecision this causes is irredeemable. A decision in any such case is therefore arbitrary. He disagrees with Dworkin's solution that interpretation can deliver a determinate answer. Endicott argues that interpretation cannot eliminate indeterminacy, even if it is interpretation outside the law. He thinks that this is merely another way of insisting upon the 'technical tool' of assuming bivalence, for example that there is always a right answer to any question of law, where 'right answer' means 'as constrained by the law.' It is impossible to achieve predictability in the law in all cases, and Endicott argues that it is therefore impossible to treat like cases alike. What Endicott is not saying is that controversial questions cannot be answered. Only where the controversy is caused by vagueness is indeterminism truculent.

Endicott claims that the law is vague and that the vagueness is not just a feature of the way it is represented in legal language. So goodness is vague, not just the word 'good.' This claim comes from his insistence that even where there is semantic determinacy there may be pragmatic vagueness, and where there is pragmatic vagueness the law itself is vague. The conclusion he wants to draw is that the law and the rule of law are not undermined by the idea that there is vagueness involved in them: he claims that the ideals of law and the rule of law cannot be understood properly if they are made to rest upon determinacy in the requirements of law. Endicott argues that there are no solutions to the sorites puzzle that are not unprincipled and arbitrary. Lawyers, like teachers having to make judgements, are fundamentally faced with questions that, because of vagueness, have no determinate answer.

For Endicott the existence of 'higher-order' vagueness is a crucial barrier to theories attempting to overcome the paradox presented by the sorites. If the interpretive resources of the law could overcome higher-order vagueness then he feels that the law would not be vague and all questions as to what judgement to make would be determinate (or at least, any indeterminacy would not be caused by vagueness). It would seem that if there are interpretive resources in education, just as there are in law, then again, if these can overcome the threat of vagueness, educational assessment can be rid of vagueness. Endicott thinks there are no reasons, however, for thinking that the law has such interpretive resources; similarly there seem to be no such resources in education. If this is so then educational assessments are irredeemably vague. Endicott's claim is that 'higher-order vagueness is truculent' (Endicott 2000, p77). Many-valued logics and supervaluationism both suffer from higher-order vagueness. There is no possibility of a non-arbitrary assignment of sharp borderlines using these solutions; they merely substitute many vague borderlines for one, and their failure leads Endicott to conclude that the truculence of higher-order vagueness means that any theory should not deny it, should not try to designate a number of orders of vagueness, and should not assert that ordinary vague terms are vague at all orders. Endicott thinks that no theory should even try to solve the sorites paradox (Endicott 2000, p78). He has two arguments.
The first is that vague terms do not draw sharp boundaries; the second is that any theory that solves the paradox must portray vague words as if they did. In doing so it would have to misrepresent vagueness (Endicott 2000, p78). In this fundamental sense he thinks such theories are inauthentic. The 'epistemic' theory explicitly does this and so he thinks such a theory is deeply inauthentic. Endicott claims that the tolerance principle can be true in all cases of the ordinary use of a vague term. He cites Crispin Wright's commonsense indeterminist, whose position is '…to accept both the coherence of vague expressions – their possession of at least some determinate positive and negative instances – and their limited sensitivity [i.e. their tolerance]' (Wright 1994). This leaves us with an unresolved paradox. Endicott thinks that value terms are counter-instances to any epistemic approach to vagueness such as those proposed by Williamson and Sorenson. The epistemic solution to vagueness is the claim that there is a counter-instance to every sorites series. Educationalists do not need to worry about all sorites series, however; they can restrict themselves to worrying about sorites series attaching themselves to evaluative terms. Assessment is about evaluation. So even if there are some cases where the epistemic solution works, it is important that these include evaluative terms for it to be relevant to educational assessment. Endicott highlights this group of vague terms – normative, aesthetic and value terms – as being both irredeemably vague and largely ignored by philosophers. He comments that despite being 'one of the most interesting aspects of vague language… philosophers concerned with vagueness have little to say about it' (Endicott 2000, p127).
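The unresolved paradox can be set out schematically. What follows is a minimal sketch, assuming a vague predicate F and a sorites series a_1, …, a_n running from a clear positive to a clear negative case; the notation is illustrative and is not Endicott's or Wright's:

\[
\begin{aligned}
&\text{(1)}\quad F(a_1) \quad \text{(clear positive case)}\\
&\text{(2)}\quad \forall i\,\bigl(F(a_i)\rightarrow F(a_{i+1})\bigr) \quad \text{(tolerance: one tiny step never matters)}\\
&\text{(3)}\quad \therefore\ F(a_n) \quad \text{(by repeated modus ponens, although $a_n$ is a clear negative case)}
\end{aligned}
\]

Classical logic forces the rejection of (2), that is, acceptance of \(\exists i\,(F(a_i)\wedge\neg F(a_{i+1}))\) – a sharp, if unknowable, cut-off, which is the epistemicist's counter-instance. Wright's commonsense indeterminist accepts (1), (2) and the falsity of (3), which is why, on that view, the paradox remains unresolved.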

7.4 ENDICOTT'S GOOD SOUP ARGUMENT AGAINST WILLIAMSON'S EPISTEMIC SOLUTION TO VAGUENESS

 Endicott’s argument asks us to imagine a chef making a good batch of soup. The chef knows that it matters that the soup is a good soup and knows that to make a good batch of soup adding one grain of salt will not make a difference as to whether it is a good batch of soup or not. Even if it makes a discriminable difference, he knows that the difference between a good soup and a not good soup is not determined by a grain of salt. Williamson construes vagueness as being ignorant of the small difference that actually does make the difference. Endicott thinks that the difference between a good soup and a non-good soup cannot be about a marginal difference, even if noticeable, because the distinction between good and not good is one that marks out a known difference that matters. The margin for error principle used by Williamson’s epistemic solution can’t apply because it claims that the difference between a vague term and its contrary is hidden in an unnoticeable or non-important difference, a difference that doesn’t seem to matter but actually does. But the chef knows that it matters that the soup is good, knows that a grain of salt doesn’t matter and therefore knows that a grain of salt cannot make the material difference between a good soup and soup that isn’t good. So our chef knows that adding one grain of salt to a soup she knows is good will not alter the fact that it is good. Endicott thinks that evaluative terms do not allow for the margin of error principle because he thinks the principle only makes sense if small differences can matter. The evaluation that the soup is good matters and to change it also has to matter. In the situation of evaluation, tolerance itself is evaluative. To say that we can tolerate a small difference is to say that we evaluate something as being too small to matter. As Endicott puts it, ‘If two batches of soup are not significantly different, one cannot be good if the other is not good, because that is a significant difference’ (Endicott 2000, p128). Williamson may deny that we always know why a good thing is a good thing. A soup may seem to be identical with another soup considered good but itself not be considered good. Endicott thinks that this ‘…would make evaluation unintelligent’ (Endicott 2000, p128). He thinks this would amount to saying that there was a significance that was unknown to the evaluator that was yet significant to them. Endicott thinks that a secret law would not create an unknowable norm for action. Endicott thinks that ‘…publicity of laws is one of the requirements of the rule of law’ (Endicott 2000, p125). Claiming that there was something like a law guiding the evaluator, unknown to the evaluator and yet determining a norm for her would be incoherent. Evaluation is about setting and using norms guiding action. Endicott thinks that even if epistemicism is right in saying that there are secret rules of use determining sharp cut-off points and thus conserving bivalence for some terms this cannot be the case for evaluative terms. Evaluative terms are about setting norms, standards, giving guidance for decisions, and these have to be known. Endicott thinks that any distinction that attempts to draw a non-trivial distinction must be justified by something that is known to matter by the person making the distinction. 
He thinks the ‘margin for error’ principle of Williamson cannot therefore be applied here because the usefulness of a term in drawing a non-trivial distinction cannot then be found in any trivial distinction between the term and its contrary. So an essay that is a good essay can’t be changed into being a non-good essay by anything trivial. In order for an essay to be judged non-good there would have to be something that mattered that was the reason for making the distinction between the two essays. If missing a single letter never mattered then a teacher who knew that it mattered whether an essay was good or not, and who knew that missing a single letter never mattered would therefore know that taking away a single letter couldn’t make a difference between a good essay and a non-good essay. She therefore knows that a good essay will still be a good essay even if a single letter is taken away. A sorites from this ends up falsely showing that a blank sheet of paper is also a good essay. Whatever the paradox comes from, therefore, it isn’t ignorance caused by the ‘margin for error’ principle because our teacher knows that a trivial difference never makes a difference that matters to whether the essay is good or not, and she also knows that it matters whether the essay is good or not. Williamson might argue that there need not be a material difference between good soup and not good soup, a good and a non-good essay. This would depend on allowing that a non-good soup need not be judged to be different from a good soup in any way that mattered. It would matter if it meant that a grain of salt changed the good soup into a disastrous soup. However, the judgement that it is a good soup is a judgement that claims that not only the soup is good but that it matters that it is good. Typically this is true. And yet ‘…it never matters whether an extra grain of salt has been added’ (Endicott 2000, p129). So too the essay and the single letter. It typically never matters whether a single letter is taken from an essay. Endicott’s conclusion is that: ‘we cannot make sense of the meaning of evaluative expressions if we attempt to assert bivalence for them’ (Endicott 2000, p129). Endicott’s investigation of vagueness and law draws careful attention to the fact that although the meaning of legal expressions are vague he thinks law itself is vague as well because much of law is based on decisions being made, decisions and actions based on both written legal statutes and rules on the one hand and customary rules too. So what the law does is vague too. Educational assessment is similarly vague. It isn’t just that the meaning of educational assessment language is vague but because it is about practical decision making of various orders, it is intrinsically vague in itself. 

7.5 PRAGMATIC VAGUENESS

Pragmatic vagueness ensures that educational assessments are vague even when the language in which they are couched is precise. For example, it may be semantically precise to say 'Arrive at five o'clock' but pragmatically vague, because the customary rules governing the correct etiquette for arrival at the tea party may well be vague. A semantically precise-looking injunction in a mark scheme or criteria could well be pragmatically vague. Endicott thinks that pragmatic vagueness arises out of 'the semantic vagueness of terms such as "appropriate" or "reasonable"' (Endicott 2000, p51). The vagueness is not about the meaning of the words as such but about what it is appropriate to say and do. Endicott thinks there is no clear distinction between the two. He thinks an important reason for not drawing a sharp distinction between semantic vagueness and pragmatic vagueness is that 'it is often impossible to isolate questions of truth of statements, given a particular state of affairs (which we might call semantic questions) from pragmatic questions about what is appropriate or reasonable' (Endicott 2000, p51). This point emerges strongly from Endicott's understanding that what we mean is determined by how we use the word.

Endicott locates two possible ways of understanding the Wittgensteinian injunction that 'meaning is use'. One is to interpret this as looking at how people in fact actually use the word. This is a descriptive interpretation of use, and there are obvious questions that such an interpretation raises. How can we have the idea of a mistaken use if the way we know something's meaning is by what people do with the word and nothing else? And we can ask: which people? All people? Are there no people who know better than others about how to use the word? The vagueness of the identity of the speech community is one problem. And what of this? Jim stopped applying the word 'dark' at 8.30, Sally at 8.50, but those facts alone cannot give either of them a reason for agreeing to apply the word until 8.40. Williamson would say that neither of them knew what their whole pattern of use was. This was what was hidden from each of the speakers, and so in a genuine sense the 'use' of the word 'dark' is hidden from them (Endicott 2000, p124). But what Endicott presses is the question of what would be a reason for a speaker to use the word 'dark' in any way. Williamson would have to argue that the thing that makes the speaker use the word is that it is dark. Darkness must have sharp boundaries because of the logical argument, which requires bivalence. But Endicott thinks that to argue this '…would abandon the notion that use determines meaning' (Endicott 2000, p124). Endicott thinks that a descriptive understanding of 'use' cannot be used to provide reasons for speakers to use words as they do.

This brings in a second way of interpreting 'meaning is use'. In this interpretation, 'use' is understood in its prescriptive sense: it concerns the way in which it is useful to understand the word. Regularity of behaviour is treated as a reason for using the expression. For this reason Endicott says that 'dispositions matter just in so far as they justify applications of expressions, and they can justify applications of expressions only to the extent that they are intelligible to speakers as providing a justification' (Endicott 2000, p125).
Endicott is keen to rebut the claim that what this boils down to is a verificationist position in which there can be no unknowable truths about meaning. Endicott draws a distinction between knowing a fact and knowing a norm. A fact that cannot be known would be subjected to a form of verificationism if, because it could not be known, we denied that we could say it was true or false. But that is not the claim here regarding the meaning of an evaluative term. Endicott thinks it is not intelligible for a person to have a reason for doing something that is hidden from that person. 'Use' would guide her in applying the word. It would be useful, as in helpful. This normative function of 'use' undermines the idea that precise but hidden boundaries help determine meanings and avoid vagueness.

We can begin to see the importance of Endicott's argument against epistemic theory. Williamson would argue that there was a hidden trivial difference between, for example, a good folder and a non-good folder, a difference that conserves bivalence and prevents the logical sorites. Further, he could argue that the trivial thing might matter enormously because what matters is context dependent. In a context where every little letter matters, the removal of a single letter could indeed be the difference between a good and a non-good essay. In such contexts there is no vagueness. Endicott's reply to this line of reasoning is a version of the 'you changed the meaning' argument that Williamson himself uses against stipulation of sharp boundaries. He argues that whereas we would be able to generalise the use of the expression 'good folder' to paradigm cases of good folders, we would not be able to do so with the precise version, even if they could be applied to the same cases. This is because there are no precise paradigm cases of good folders. Knowing how to apply the term 'good folder' to a known example is to judge that the known case is similar enough, in a salient way, to the paradigm of what a good folder looks like. To say that the sharp boundary version of 'good folder' is similar enough in a salient way is an inexplicable idea. It would fail to be generalisable in the way that the vague term was; we would not be able to point to the way the folder is like the paradigm, because in the sharp version the way of determining applicability would have to be different from that. It would require us to make something that is trivial on the vague use of the term non-trivial. What did not matter would now matter. It would therefore have to be presented in a different way. And this is fatal for the claim that a sharp line can be drawn using trivial differences. Dorothy Edgington summarises this point: 'The difference between a true and a false judgement is meant to be a difference that matters. Yet for any putative line, there will be no significant difference – no difference that matters – between things just either side of it' (Edgington 1997, p299).

7.6 SIMILARITY SUPPLANTS BOUNDARY MODELLING OF VAGUENESS

Endicott reconceives vagueness in terms of a 'similarity model' rather than in terms of what he calls the 'boundary model'. Endicott thinks that if use determines meaning then the correct application of words depends on the dispositions of the speakers to use the words correctly. This is what Endicott calls 'the boundary model', and it is what he thinks characterises Williamson's view about meaning and use. The 'boundary model' explains 'the application of vague words as determined by a social choice function (from dispositions of speakers to correct and incorrect applications of words)' (Endicott 2000, p3). Endicott thinks that Williamson endorses this view. Endicott disagrees with it. Endicott argues for what he calls a 'similarity model' of vagueness. This claims that vague terms are vague because they apply to objects that are 'sufficiently similar to paradigms.' This is not so much a theory as a model. Linked to this Endicott adds two controversial claims: that the meaning of a word can be seen as a rule for its use, or as sharing fundamental normative characteristics with rules; and that all general evaluative and normative expressions are necessarily vague. Endicott thinks that no social choice can determine sharp boundaries, but that this does not have to mean that the location of boundaries is indeterminate. Rather, Endicott concludes that it just means that, necessarily, they cannot be precisely located. The social choice argument can be revamped to claim that the location of boundaries is roughly determined by social choice, but this is rejected because it leads to the equivalence of incommensurate options. He thinks to do this is unprincipled.

Pragmatic vagueness is an important feature in law, and Endicott gives as an example the way lawyers work with the notion of 'interpretation'. This is clearly relevant for our examiner, who has to interpret what she is examining according to a set of criteria, or a mark scheme, or a construct. Endicott has a legal example, the 'valiant beggar', to illustrate the point. Edward III prohibited the giving of alms to a valiant beggar as a measure against labour shortage after the Black Plague. The legal question was then what one should do if confronted with a valiant beggar in the cold who was likely to die. Christopher St Germain argued that the law would punish the alms-giver but equity would exempt the alms-giver from the operation of the law through an interpretation of the statute. There is no sharp boundary as to where the statute applies or not, even though there is no vagueness about the words and what they mean. They mean that no alms are to be given to valiant beggars, and this is a sharp, determinate meaning. The applicability of the precise legal statute is pragmatically vague. Relevance and sufficiency are vague. Endicott thinks theories of either are impossible because they are not theoretical notions. From this he concludes that there will be no such thing as a theory of semantic or pragmatic vagueness (Endicott 2000, p53). For Endicott such notions are inextricably bound up with the application of vague language. What happens when we have to make a decision about a borderline case? In such a case it is not clear what the right thing to do is. Endicott illustrates the sorites problem for legal judgements with 'the case of a million raves'. A judge is confronted with a million raves. The first rave is clearly causing great distress to neighbours.
The one next to it is a decibel lower, a change too small to be discriminable by the neighbours, and so is more of the same. As we move along, each rave is a decibel lower, right down to the millionth rave, which is so quiet that no one even knew it was happening. Between any two successive defendants in the series there is no difference that the local inhabitants can perceive. The law must prosecute the painful raves and acquit the rest. But how can a court convict one and dismiss the next given that they are in all material senses the same? If the law has to treat like as like and justify its decisions then we seem to be in an impossible position (Endicott 2000, p58). Hans Kelsen (Kelsen 1991) argues that the norm works as a frame in which any decision made within the frame is within the law. There are no gaps, no borderline cases. If there is vagueness in something then the court has discretion to apply the law to it in one way or another. But the discretion still makes the decision legal. The trouble is that its successful application depends on being able to know where the clear case ends and the vague case begins. But that is vague too. Higher-order vagueness prevents Kelsen's idea from working. When to use discretion is a vague question, and so the completeness of law fails. The failure to understand the depth of the problem vagueness poses is a common one, as we have seen, and higher-order vagueness is one of its deepest traps.

The attack on indeterminacy is to say that 'the law has resources that prevent gaps.' This is also the hope of the educationalist assessor who, as we saw, strives to ensure that every judgement is both valid and reliable. Dworkin's 'Right Answer Thesis' claims that there is always a right answer to any dispute (Dworkin 1991). Dworkin's solution to the threat of vagueness argues that the rules of construction could require that a rule only be applied in cases in 'the indisputable core of the languages' (Dworkin 1977, p67-69). Borderline cases are therefore not subjected to the same rule. This is similar to Cresswell's solution in that he too argues that borderline cases be treated differently from the core cases. Raz criticises Dworkin for drawing two sharp lines instead of one – between definitely true and borderline, and between borderline and definitely false. A borderline case for Dworkin says 'p is neither true nor false'. If truth is disquotational (i.e. 'snow is white' is true if and only if snow is white) then the claim is self-contradictory and therefore absurd. This is the argument that Cresswell's simple heuristic also falls foul of. But Endicott thinks the simpler argument is that no one can assert either side of the claim. Why not? Because in a borderline case it is not clear whether or not it is appropriate to assert p or not-p. It is not that there is definitely no true answer; it is just that we cannot be sure what that answer is. When did Apollo 11 leave the earth's atmosphere? Endicott thinks that to say there was no last moment in which it was in the atmosphere is not to say that there was a moment when it was neither in nor out of the atmosphere. Endicott thinks Dworkin wants to construct laws without indeterminacy using a rule of construction that eliminates vagueness. Dworkin is therefore accused of illegitimately inventing rather than discovering what the law requires. Such a rule might not exist, and in any case the canons of construction tend to be formulated in vague language.
So we would need rules for constructing those canonical rules of formulation to eradicate vagueness. And they would use vague language, so they in turn would need rules. The argument runs into an infinite regress. It seems clear that there must be indeterminacy somewhere. It is ineradicable unless epistemicism is true and there are precise borderlines. It would seem that the same arguments Endicott develops apply equally to any attempts to formulate educational assessment without gaps. Endicott's thought is that it is not just the language of laws that is the issue, even though laws get applied. This is because principles do not get applied; they just suggest a direction for the application of law. And principles are concepts that admit different conceptions. Merely generalising that solution to all concepts does not eradicate vagueness, because even if it gets you out of the problem of semantic vagueness it still leaves you with the pragmatic vagueness attached to application.

Endicott's edict to reject the Fregean spatial metaphor of the border in order to understand vagueness – don't get hypnotised by borderlines – is something he recommends because the boundary model is itself vague. A particular reading of Wittgenstein's 'Philosophical Investigations' supports this (Wittgenstein 1953, p71). This reading suggests that a boundary is vague because of the vagueness of the way in which idiolects are aggregated, the vagueness about which slice of language precisely is being aggregated, about which dispositions count (and therefore which don't) and who precisely counts as a member of the community, as well as the vagueness arising from ethnolinguistic variation, the inability to discern when a particular disposition is mistaken, and what the context is and what effect context has on the location of boundaries. Endicott agrees with Wittgenstein that the problem is that there is no justifiable way of organising vague concepts precisely. The boundary model makes language look as if it is inept in creating boundaries. But as Wittgenstein was keen to emphasise, language works; it is essentially vague rather than inept. Endicott thinks we must resist some of the conclusions that line-drawing metaphors seem to imply, especially the idea that we can provide precisely ordered scales.

Endicott's use of the similarity model is his response to this. This model construes vagueness as flexibility in the normative use of paradigms. Vagueness is a feature of the creative use of language rather than a deficit. So we can ask: to what does a vague expression refer? Endicott's answer is that we are referring to objects that speakers treat as paradigms and to objects that are sufficiently similar to the paradigms. He thinks that 'vague words apply to objects to which it is generally useful to apply them; the indeterminacy of vagueness lies in indeterminacy of purposes and needs' (Endicott 2000, p155). To use an example from Wittgenstein: 'If I am standing in a square and I ask you to stand roughly over there, pointing my hand, I am not drawing a boundary and asking you to stand within that particular spot. Rather, I am giving an example and expecting you to take what I say and do in a certain way' (Wittgenstein 1953, p71). Language does not fail to set precise boundaries but rather is about being '…able to use paradigms, and vagueness is flexibility in their use' (Endicott 2000, p155). Endicott considers three criticisms of this model, taken from Sainsbury.
Firstly, it seems to presuppose that every boundaryless concept must be instantiated. Secondly, there is variation of paradigms for a concept, so we could know a concept without knowing any one particular paradigm. Thirdly, the sorites paradox is not solved. The first criticism disregards the flexibility at the heart of the notion of a paradigm case. We can learn the paradigm for a dragon or a unicorn without there being dragons or unicorns. The second criticism captures something essential and powerful. There may well be several paradigms of excellence at English. There can be nothing in the notion of similarity to a construct or paradigm that necessitates that the same paradigm or construct must be useful in the same way to everyone using the concept. All that is required is that similarity to the same construct or paradigm is being used to understand the concept. Any paradigm will always be incomplete, and its application is a choice made by the user in terms of how useful certain things are to the relevant understanding. This conjoins Urmson's necessarily incomplete criteria, Sadler's idea of fuzzy boundaries and Wittgenstein's idea of meaning as 'usefulness'. Therefore not everything in the construct can be, nor need it be, useful in any particular case. The issue is then how restrictive a construct is.

Endicott thinks the boundary model suggests vagueness is just an obstacle to classifying something. But if we think about other sorts of questions, the idea of a precise location in a scale of ordering is not to the point. The art teacher typically asks if a child's painting is more beautiful than another's, and we do not need to accept that there is a required determined scale ordering objects from beautiful to non-beautiful in order for her to do this. To a connoisseur, small things in the painting will help make the decision in the comparison, but no one small material difference between paintings will make a difference to the judgement. Endicott thinks that this is because the evaluative judgement that the painting is beautiful is not dependent on any small material differences. He thinks deciding whether Mary's painting is more beautiful than Alisha's is not about deciding on a scale and then trying to place the paintings on it. Rather, what makes beauty, and what makes something more beautiful than something else, involves incommensurates, and therefore no single ordering will be possible. In a borderline case we might be able to say that Alisha's painting is more beautiful than Mary's but not more beautiful than Hilary's, and that Hilary's is more beautiful than Mary's. Endicott thinks that the boundary model pushes us into thinking this is absurd because it pushes us into thinking that the problem is placing things in conceptual space on a single scale. But if incommensurates make such a scale impossible then we need to resist the picture fixed by the boundary model. Endicott argues that typically such orderings are necessarily incomplete because the sources of vagueness, such as immeasurability, incommensurateness and so on, do not allow for a complete ordering on a single scale. If this is true then the epistemic version of vagueness cannot work, because it claims that there is a complete ordering on a single scale and that the vagueness stems from ignorance of this. If we think about what might be useful as a construct of beauty we can see that there could not be a single determinate ordering of beautiful through to non-beautiful.
The vagueness of ‘beauty’ caused by incommensurates insists that there is always more than one way of deciding on an order. There could be no principled way of settling on one single ordering as the right one. Endicott thinks that vagueness comes about because no aggregate of people’s dispositions to use vague terms could result in a single determinate and precise ordering. This is one reason why the scientific approach to making evaluative judgements involving vagueness is doomed because science requires that we can always represent whatever is being evaluated on a single scale of measurements. If we return to the idea of measurement, one way of thinking about this term is to think in terms of ordering items in a conceptual space. Yet once incommensurates and immensurates are factored in then the idea of such a complete ordering becomes impossible. Utilitarianism liked to think that it could compare the utility of different individuals on a common scale. It assumed that you could plot everything out on a single line. But with essays and any ‘multi-dimensional’ vague thing we can’t plot everything out on a single line. Or rather, we can but there can’t be just one way of doing this, because of the multi-dimensionality. We might have one essay that is full of imagination. (Essay A) We might also have one with lots of good secretarial skills in evidence. (Essay B) Another essay might be just like essay B but with an extra secretarial skill. We might well judge both essay A and B as equal borderline cases of good essays. Consistency would suggest that C should be ranked slightly above B because it is slightly better. Yet even so, there is nothing about C that makes it a better essay than A. Transitivity does not apply in this case because being better on one scale doesn’t effect measurement on another. And if this is right then we can’t have precise orderings generated by aggregating scores and rankings. Endicott is arguing with Dummett and others about ‘…the penumbra of application of vague positive adjectives, and of the application of comparative adjectives’ (Endicott 2000, p141). He denies transitivity. He agrees with Dummett that if Milad is a C grade and Ashot isn’t as good a C grade as Milad then, if Ashot is better than David, then Milad is better than David. But if it is indeterminate whether Milad is better than Ashot it would then not follow that if Ashot was better than David then Milad would have to be better than David. The transitivity relation that Dummett insists on (Dummett 1978, p262) is just a tautology: Transitivity follows from having a precise ordering in place. If this is impossible, for example when vagueness creates indeterminacies, then because we can’t have the required classification ordering we don’t get the condition from which transitivity follows. Cresswell recognises this and admits that his heuristic for attaching value to borderline cases ensures that there will be incorrect assignments of value that deny transitivity constraints. Pupils will necessarily be awarded wrong grades. Endicott is sharp on the reason why people nevertheless ignore this: ‘The ordered individual rankings of the boundary model are pseudo-orderings produced by a theoretical requirement of determinacy’ (Endicott 2000, p147). The actuality of indeterminacy caused by vagueness for Endicott (and Cresswell) makes sense of the notion of disagreement and flexibility and the need for a discursive approach to assessment. 
Competent users of the concept of a C grade in English will have a broad knowledge of the concept through having a wide set of salient factors that might be combined to make up the construct of C grade being used for comparison in a particular case. And they will understand that choosing a certain paradigm or construct for comparison is not always the only legitimate choice. Another competent user of the concept may choose a different way of understanding the paradigm or construct. For vague assessment concepts there are no absolute paradigms or constructs. In Razian terms, we might say that there is more than one parochial concept useful for accessing a value. Paradigms vary, and therefore assessment procedures need to accommodate this flexibility due to vagueness. Endicott thinks that vagueness gives licence to disagreement over how to decide cases. It might seem that if this is the case then different concepts are being used. But that would only follow if there were a way of having a precise determinate paradigm and a precise relation of comparison. But it is not possible to have a single paradigm of heap because there is no single way of understanding 'heap.' Because of this, there is no absolute paradigm of heapdom. There are many ways in which one could compare something to the paradigm, and several paradigm cases that could count as the paradigm to which a comparison of similarity can be made. The notion of a 'paradigm case' is itself vague and this accounts for this variation. Sadler and Wiliam's notion of 'construct', like Endicott's 'paradigm', is therefore vague. And this allows for variation of constructs. Sainsbury's third point is accepted. The sorites problem, and with it vagueness, remains unsolved because the similarity relation it requires is vague, and so is the notion of a paradigm. Endicott thinks that the '…best thing that can be said for it is that its vices and virtues correspond to the nature of the subject matter' (Endicott 2000, p157).

If Endicott thinks that there is a licence to disagree, then does he think that interpretation can rescue a judgment from the indeterminacy of vagueness? The idea that the law is not the written rules but the interpretation of the written rules suggests that one could insist that until a decision was made to determine the meaning of the rule nothing was indeterminate. Because a ruling one way or the other always has to be made, bivalence may be said to prevail in all cases, and everything is, once so interpreted one way or the other, determinate. This is a position taken up by Dworkin. It is a claim that there are resources besides the meaning of words that can eliminate vagueness. For this argument to work there must be resources besides the meaning of words, and these resources must be precise, not vague. If by 'resources besides the meaning of words' we are saying that educational language use can be contrasted with uses of language without such resources, then the argument is illegitimate, because there are no such uses.

7.7 THE INCOHERENCE OF SEMANTIC AUTONOMY

Endicott thinks that the idea of the 'semantic autonomy' of language is incoherent. A proponent of this idea is Frederick Schauer, who claims, for example, that three shells on a shoreline that make a pattern that looks like the word 'cat' carry the meaning '…independent of those who use it on particular occasions' (Schauer 1991, p102). Endicott thinks the autonomy is spurious because it depends on a language user using the language (Endicott 2000, p18). As well as this, for it to be properly autonomous the word 'cat' could not have anything to do with cats, and Endicott says, 'no one can know the English language, and know nothing else' (Endicott 2000, p19). If it is conceded, as it is by Schauer, that context is a bare minimum requirement for knowing a language, then it might be that vagueness occurs when we try to decontextualise our judgements. The resource beyond the meaning of words is then the context of use. But context is not a remedy for vagueness: context supports a process of disambiguation, and vagueness is not ambiguity. So, if we do not mean by 'resources besides the meaning of words' anything more than what can be done with ordinary, conventional language (e.g. when a policeman coherently says 'Clear all vehicles away from the scene so the ambulance can get through' (Endicott 2000, p164)), then we might concede that all language use has resources besides the words. But if that is the case, we know that these resources do not help resolve the sorites paradox, because in ordinary, conventional language vagueness is commonly found.

But maybe the extra resources of legal and educational language are capable of precision. However, Endicott points out that these resources are generally considerations of principle and lack precision as a special structural feature. Strict construction is vague because of higher-order vagueness. General principles of consistency are vague as they are forms of analogical reasoning, and sufficiency and relevance of similarities are vague, as was argued above. And even if we could identify a decisive phenomenon in a particular context through using a principle, we could not justify the polar difference in treatment between it and the next case in the sorites series. What would be wrong is that using such a technique would lack integrity. This is a crucial aspect of the case of the million raves in Endicott's example regarding the law. No matter what interpretive devices one wanted to draw on, Endicott thinks there could be no principled way of treating adjacent candidates very differently in a sorites series. Would not the assumption and application of educational bivalence (e.g. the requirement that a decision must be reached, one way or the other, over what grade is to be awarded) restore integrity to such a decision? This is a principled way of dividing things up at precise points. The 'duty to decide' is a feature of assessment that is distinct from the resources of other forms of communication (except maybe law), which gives a reason for saying that the resources of education are determinate. 'We make sense of them [indeterminacy claims], if there is any sense to make, by treating them as internal, substantive positions based, as firmly as any other, on positive theories or assumptions about the fundamental character of the domain to which they belong.
In law, for example, the functional need for a decision is itself a factor, because any argument that the law is indeterminate about some issue must recognise the consequences of that being true, and take these into account' (Dworkin 1996, p137). Endicott thinks this argument fails because a duty to give a decision cannot be a reason for saying that we are required to give any one particular decision. Will a creative notion of interpretation be the extra resource that resolves indeterminacy? Endicott examines Andrei Marmor's theory about the way interpretation helps the law sort out indeterminacy. Marmor's theory rests on two theses: that every interpretation is an attribution of actual or counterfactual communication intentions to the real or fictitious author (the Intention Thesis), and that interpretation is an exception to the standard way of understanding language (the Exception Thesis) – the latter a reading of Wittgenstein's idea that 'There is a way of grasping a rule that is not an interpretation' (Wittgenstein 1953, p201). So interpretation is different from understanding, and it happens when understanding is underdetermined by rules or conventions. Following a rule is acting, not interpreting. The simple view requires just something like the Exception Thesis, not the Intention Thesis. Endicott argues that once one concedes the second point, enough has been conceded to deflate any argument that interpretation would resolve indeterminacy. If someone is not simply understanding what a concept means but is having to devise an exception, then one is conceding the vagueness of the concept and agreeing that only something else can prevent this. The process moves from discovery to invention.

The idea that there is a gap between a rule and its applicability is something that Wittgenstein's later philosophy denied, and so to argue that interpretation can bridge the gap is to misconceive the problem posed by vagueness. There is no gap. If there were, and interpretation filled it, then we would be committed to an infinite regress. The problem is not about there being a gap which interpretation can fill, because what is lacking is not an interpretation but understanding. Understanding is about action in the sense that when we do not understand we do not act. Hence in ordinary circumstances it would be a joke if I drove up to a red light and asked 'what do you make of that?' (Endicott 2000, p174). Any interpretation that intended to clear up semantic vagueness would merely push the problem of vagueness into the domain of pragmatic vagueness instead. An assessor might decide on a precise interpretation of a construct or standard or paradigm or criterion, but there would be nothing principled to restrict the vagueness of the practical application of that precise construct, standard, paradigm or criterion. Indeed, where there are semantically precise standards, ingenuity of application is likely. There are many ways in which interpretation of application can create wriggle space for itself. The precise standard that says that only a candidate who ends every sentence with a full stop can pass an examination can be subverted by the argument that the purpose of the semantically precise standard was to capture something vague, namely that a candidate must be able to use full stops normally and adequately in a way that shows that they are full participants in a community of practice.
How many full stops could be missing, and in what circumstances this might be acceptable, is vague (think again of the example of the three point turn, which is semantically precise but pragmatically vague), and this subverts any semantic precision. Decisiveness cannot be the discovery of what does not exist, so it becomes an enquiry into what should count as an answer in the light of this fact. Endicott endorses this approach (Endicott 2000) and would agree with the limitations that Cresswell's solution enjoins (Cresswell 2003). Endicott's theory can be seen as a pragmatic response in the face of relative borderline cases. But this model is condemned because it allows incredulity at the implications following the solution to the sorites to override logical proof. Examining the model of legal vagueness provides no new resources beyond those already modelled by Cresswell in educational high stakes grading. Considerations of how the law treats vagueness seem to end in the same place as educational ones. Decisiveness about borderline cases requires reverting to a process of invention. As such, it is necessarily an unprincipled and arbitrary decision presented as principled and truthful. Legal vagueness and educational assessment vagueness are both used as justifications for lying.


CHAPTER 8: THE SINCERITY AND PURPOSE OF VAGUENESS

 The Joker: ‘How about a little magic trick?’(The Dark Knight 2008) 

8.1 INTRODUCTION

Dworkin thinks that vagueness is trivial. He denies the denial of universal decisiveness. In order to examine this claim I look at the dispute between Ronald Dworkin and HLA Hart in the field of jurisprudence (Hart 1962, Dworkin 1986). The chapter shows how the attempt to model borderlines as a way of granting discretion is relevant for high stakes assessments in education too. In doing so it raises issues of sincerity involved in making judgments in borderline cases, applying the definition that 'A lie is a statement made by one who does not believe it with the intention that someone else shall be led to believe it' (Isenberg 1964, p466).

8.2 HART VS. DWORKIN

One of the great legal disputes of the last half century has been partly about whether there is legal discretion. This debate is personified as a dispute between HLA Hart and Ronald Dworkin. Scott J Shapiro is dramatic: 'For the past four decades, Anglo-American legal philosophy has been pre-occupied – some might say obsessed – with something called the "Hart-Dworkin" debate' (Shapiro 2007, p22). Dworkin took Hart to be saying that there was discretion, and he has continually argued against this view (Hart 1962, Dworkin 1986). This is a context that partly explains why legal philosophers like Endicott dismiss Dworkin's ideas refuting indeterminism in law (Endicott 2000). Dworkin denies indeterminism and Endicott does not. The hermeneutical paradigm of assessment is one that mirrors the same concerns. The vagueness of criteria may or may not be thought to offer discretion on the part of users. Endicott agrees with Hart, who thinks indeterminism caused by legal vagueness grants discretion to judges. Dworkin thinks there is no indeterminism caused by vagueness and so there is no discretion for judges. Endicott thinks discretion amounts to judges being licensed to make arbitrary decisions. He thinks that where there is such arbitrariness the gain in expressiveness justifies the loss in reliability. This suggests that vagueness has a function. It licenses discretion.

8.3 DISCRETION AS THE FUNCTION OF VAGUENESS

Construct referencing and criteria referencing were both responses to the perception that discretion gave assessment systems the resources to maintain the validity of assessments and be decisive. Discretion is conjoined with the possibility of decisive interpretation. Dworkin makes an explicit appeal to this when he writes: 'We make sense of them [indeterminacy claims], if there is any sense to make, by treating them as internal, substantive positions based, as firmly as any other, on positive theories or assumptions about the fundamental character of the domain to which they belong. In law, for example, the functional need for a decision is itself a factor, because any argument that the law is indeterminate about some issue must recognize the consequences of that being true, and take these into account' (Dworkin 1996, p137). But a duty to be decisive cannot authorize any particular decision. 'A duty to decide is a reason to give a decision, but is not a reason to conclude that the law requires one decision' (Endicott 2000, p167).

Interpretation raises the issue of constraint and freedom. Too much constraint restricts expressiveness. Too little constraint leads to anarchy. Discretion is part of an interpretive license. Paul Standish has shown that this is an orthodoxy of the hermeneutical approach to language, arguing that the '…widespread acknowledgement within philosophy of the narrative structure of human experience – for example, in Paul Ricoeur, Alasdair MacIntyre and Charles Taylor…' (Standish 2008, p1) requires an approach to reading this complex narrative in ways that acknowledge the '…opacity of language, its thickness, so to speak, its recalcitrance…' (Standish 2008, p11). He references Derrida and discusses the way that language always requires a reading that defers a final interpretation. 'As Jacques Derrida has extensively shown, words are such that their meaning can never be fully realized but is always endlessly deferred: they are open to new citation, relocation, reinterpretation, in contexts that we cannot possibly foresee, and they come to us out of infinite histories of use, the extent of which we cannot possibly recover' (Standish 2008, p11). Standish is largely critical of the use to which the hermeneutical turn has been put in education because he thinks it largely ignores its indeterminism. He thinks a solution to this incoherence is to focus on learning that resists the idea of the possibility of high stakes assessment grading and instead to develop the subject matter of education, which '…implies … the need for an initiation into those traditions of thought and understanding characterized by texts that are resistant to reading' (Standish 2008, p12). Standish is consistent in thinking that radical indeterminism makes universal decisiveness impossible. Standish thinks the attempt to use hermeneutical resources to eradicate indeterminism is unprincipled. However, the 'Construct Reference' approach to assessment of Sadler, Wiliam, Cresswell et al is an attempt to reconceive assessment in terms that give a role to interpretive resources and to the discretion interpretation offers for being universally decisive. It gives purpose to the indeterminism that makes any interpretative act necessary.
It is an approach to assessment that attempts to resist the old scientistic paradigm of assessment and learning, with its '…inflated demands for accountability and its voyeuristic scrutiny of performance indicators' (Standish 2005, p5), and that instead requires acknowledgement of the normal human relationship with the world, which finds epistemic hostility rife. The hermeneutical turn in assessment is a version of what Standish is writing about when he talks about the world 'sneaking back' after the attempt to achieve certainty. 'I pare down my thoughts but the world sneaks back in. This then is the epistemologists denial. We deny facts, people, aspects of ourselves, the world itself – here, in the anxious grasping for security or grounding, with its complementary suppressions and repressions, is an amplification of skepticism's existential truth' (Standish 2005, p4). Standish observes that the hermeneutical turn cannot be universally decisive and complains about the bad faith of practitioners who think the indeterminism invites interpretative practices to dispel it. What he thinks it requires is something more akin to acknowledgement. This is the problem for assessment systems required to be universally decisive. For construct referenced assessment systems that adopt an interpretative role akin to Eisner's connoisseurship, vagueness, as a source of indeterminism, becomes a resource for the discretion needed to be universally decisive. Vagueness was first considered as a hindrance to the precision required in logic and mathematics. Vagueness generated the sorites puzzle, which made classical logic seemingly justify false conclusions from true propositions. To some, vagueness showed that natural language concepts required elements of discretion to ensure that meaning was maintained in borderline cases. Vagueness had a function, which was to embed discretion in all languages except those artificially created to be precise, such as logic and mathematics.

8.4 penumbral cases

 So I cast the famous dispute between HLA Hart and Ronald Dworkin in terms relevant to discussions about vagueness. Hart thought that there are penumbral cases where there is radical uncertainty as to whether a concept applies in a certain case (Hart 1961). Cardozo wrote before him of ‘… the borderland, the penumbra, where controversy begins’ (Cardozo 1921, p130). Glanville Williams wrote that ‘Since the law has to be expressed in words, and words have a penumbra of uncertainty, marginal cases are bound to occur’ (Glanville Williams 1945, p302). Hart wrote that ‘We may call the problems which arise outside the hard core of standard instances or settled meaning “problems of the penumbra”…’ (Hart 1958, p607), and in The Concept of Law he writes of ‘… a core of certainty and a penumbra of doubt’ (1961, p123). Standish and radical indeterminists don’t think there is any core of certainty; for them everything is a twilight zone. But the views expressed here match prototypical expressions of vagueness. Russell, for example, wrote, ‘All words are attributable without doubt over a certain area, but become questionable within a penumbra, outside of which they are again certainly not attributable’ (Russell 1923, p87). Quine talks about penumbral meanings in Word and Object (Quine 1960, p123), where he argued that the analytic/synthetic distinction does not hold. Hart’s thoughts about law were being written at the same time as Quine’s. Hart thought that this gap in the assumption that all cases either do or do not fall under a certain concept gives a judge discretion to decide a case outside the resources of the law. He thought that judicial decisiveness ‘…in cases where it is not clear…’ (Hart 1961, p126) required an act of invention. HLA Hart thinks the resources of the law in these penumbral cases have been exhausted. Dworkin disagrees (Dworkin 1986). He draws a distinction between concepts that are criterial, where disagreement is not genuine and applicability is just a matter of making an arbitrary decision as to where a concept’s extension stops, and concepts that are interpretive. For Dworkin the decision about when baldness begins is a triviality because it is about a criterial concept. Contestability over a concept is non-trivial when people dispute its core meaning rather than the limits of its applicability. A genuine dispute over baldness for Dworkin would be if one person judged baldness to be mostly about the ratio of hair to head area and someone else judged it to be mostly about masculine aesthetics. The dispute would then be about the core concept itself rather than about whether any case was an instantiation of it. Dworkin thinks there are always resources available to discover the source of a dispute. Dworkin thinks that two people with radical differences will argue because they are disagreeing about some interpretation they have of some key element in the subject under discussion. ‘The law never is at a loss to discover the disagreement and bring about a resolution. The law is at no loss because language is never at a loss’ (Dworkin 1986b, p119; 1986, p41-42). Dworkin cannot conceive of a non-trivial dispute that doesn’t involve ambiguity. Dworkin thinks that ultimately disputes happen because people are talking past each other. Resolution of a dispute requires discovering where this is happening. The criterial concept dispute is one that arises once the pivotal case has been decided and the two people no longer dispute what the concept is. 
They merely disagree about its extension. A dispute over a pivotal-case concept is a dispute caused by ambiguity. Once it is resolved, the only difficulty is deciding where to apply the concept, and there is discretion in this. This could come down to being simply a matter of personality or purpose. Dworkin thinks that this is trivial. Dworkin thinks that how things are understood is identical to their meaning. A disagreement about a pivotal case is therefore a disagreement about what people think they are talking about. Disagreement is, as in supervaluationism, prior to agreement about which proposition is being discussed. It is disagreement about which proposition is being used. Propositions can be vague, and so vagueness can’t be identical with ambiguity for this reason. Vagueness occurs after disambiguation. In law judges make assertions about absolute borderline cases. They have to, because they have to make decisive judgments. Dworkin thinks decisiveness about core examples is all that is required. He thinks criteria are not definitions of core concepts but just ways of testing for them. There may be different operational definitions but they aren’t constitutive definitions. HLA Hart used Wittgensteinian arguments to say that law made use of vagueness. Hart taught Raz, whose theory of epistemic objectivity was used in chapter three, and Endicott was a pupil of Raz, so there is an impressive genealogy to this Oxford jurisprudence: Hart and Endicott both think that vagueness has a purpose in law. If vagueness were merely relative vagueness then it could have a purpose. Thinking that borderline cases are merely relative borderline cases allows for the stipulation of a truth that is otherwise undiscoverable. Cargile thinks that the future could reveal borderlines that are currently unknowable, so absolute unknowability can’t be guaranteed (Cargile 1969). The fact that our teacher doesn’t know when to enter Milad doesn’t guarantee that everyone in the future would be similarly ignorant. Scheffler agrees with this and believes that the belief in limits on enquiry that defines an absolute borderline case rests on the challenged distinction between analytic and synthetic truths (Scheffler 1972, p72-78). Williamson (1994) thinks that a priori limits on enquiry are problematic too and sees no reason for postulating them. His ‘margin for error’ principle is how he explains relative borderline cases. Sorenson, however, thinks that there are a priori reasons for absolute borderline cases. Sorenson thinks that all propositions have truth values in the same way as all objects have shapes (Sorenson 2000). Objects have shapes even if there is no particular shape that they clearly have; similarly, propositions have truth values even when there are no truthmakers. The absence of a truthmaker means that no one is in a position to know what makes the proposition true. Sorenson replaces truth-value gaps with truthmaker gaps. If so, then absolute borderline cases have to exist. The teacher making a decision about an absolute borderline case is drawing the line in the face of her absolute ignorance rather than in the face of a truth-value gap. Her decision is arbitrary and insincere because there is a truth of the matter. Sorenson thinks that most lawyers think that there are absolute borderline cases. Arbitrariness cannot be eliminated and so is justified. Raz thinks that borderline cases in law are relative. 
He thinks that the law allows for discretion to use non-legal standards to decide a legal case. He thinks that discretion is designed into a well-functioning legal system. Raz thinks that moral systems have relative indeterminacies just as legal systems do. He writes: ‘The fact is that the discretion allowed in most legal systems is much in excess of that required to deal with inevitable indeterminacy of any legal system. Most legal systems introduce deliberate indeterminacy into many of their rules in order to leave certain issues to the discretion of the courts. This practice should not come as a surprise. We know that in many matters individuals may, and are often encouraged by law, to agree to refer their disputes to arbitrators who are often allowed to apply non-legal standards. Similar considerations would suggest that on certain issues it is best to leave the law indeterminate and compel the parties to litigate before courts, which will be bound by law to apply nonlegal standards. On other occasions the legislator may lack the political power or will to decide an issue and may prefer to leave it for judicial determination’ (Raz 1984, p83). Raz thinks these indeterminacies are caused by truth-value gaps (Raz 1984, p81). But Sorenson argues that these deliberately engineered indeterminacies are merely relative borderline cases. The legal system deliberately sets up relative borderline cases so that they can be determined by extra-legal resources. They are not absolute borderline cases. They are engineered to help a decision be reached. Relative borderline cases are useful for change, but absolute ones are functionless. 

8.5 insincerity

 Gallie thinks that some absolute borderline cases are ‘essentially contested concepts’ (Gallie 1955-1956, p172). These are concepts that are constituted by endless disputes about what they mean. But Gallie is wrong. An absolute borderline case is not a concept that licenses endless dispute. Once it is known to be an absolute borderline case, what is being denied is that there can be further investigation to resolve the dispute. Attempts to dilute the implications of absolute borderline cases fail for Moorean reasons: it is impossible to assert propositions that are explicitly detached from the truth. Such assertions are self-contradictory and can’t be sincerely asserted. Dylan Wiliam, when arguing for the role of the performative aspect of adjudication, assumes the performative aspect of a judgment is constitutive. He draws on J.L. Austin (1962) to justify this. Austin, however, distinguishes different types of performatives. Some are commitments that don’t require evidence, such as promising and proposing. Others, such as appointing and proclaiming, are exercises of power and are labelled ‘exercitives’. They too require little evidence. Wiliam draws on these to free grading from evidential constraint. But this is not justified by Austin, who categorises grading as an example of verdictives (grouping them with diagnosing and acquitting) and writes that they ‘consist in the delivering of a finding, official or unofficial, upon evidence or reasons as to value or fact, so far as these are distinguishable’ (Austin 1962, p153). A verdictive is a judicial act as distinct from legislative or executive acts, which are both exercitives. But some judicial acts, in the wider sense that they are done by judges rather than, for example, juries, really are exercitive. Verdictives have obvious connections with truth and falsity, soundness and unsoundness, and fairness and unfairness. That the content of a verdict is true or false is shown, for example, in a dispute over an umpire’s calling ‘Out’, ‘Three strikes’ or ‘Four balls’ (Austin 1962, p153). Wiliam argues that a teacher making an adjudication is like a referee making a decision. When the referee decided that France had scored a goal in a World Cup qualifier against Ireland in 2009, the goal stood even though it was disputed at the time. France went through to the World Cup finals and Ireland didn’t because of the goal. The referee, however, made a wrong decision even though he was sincere at the time of making it. He didn’t see the foul that led to the decisive goal. The referee can’t call upon the fact that he called the decision in a certain way as the reason for the goal standing. His reason has to be the reason for which he awarded the goal, which was the erroneous judgment that there was no foul. The report of the match records a goal. But part of that report also includes the fact that the referee should not have awarded the goal. Wiliam’s reading of the situation makes the reason the referee had for making the judgment he did irrelevant to the decision. Similarly, Wiliam thinks that the reasons teachers have for making their decisions are irrelevant to the decisions they make. This misrepresents the truth about the situation. A referee makes sincere mistakes. Teachers make sincere mistakes. Wiliam’s idea prevents this from being possible. A teacher, like a judge, faces absolute borderline cases and has to insincerely make a decision. 
This is not a mistake, nor is there any possibility of it being sincere once she knows she is dealing with an absolute borderline case. The difficulty facing a referee is that he is obliged to be decisive in the heat of the moment, whereas a teacher and a judge have more time to decide a question. It may be argued that teachers don’t have to face any absolute borderline cases, because they can always design assessments that keep students away from borderlines. One way they might do this is to make the assessments precise. This is a solution offered by Sorenson (Sorenson 2000, p404). The idea would be to steer students away from borderline thresholds. But this solution runs into the problem of inauthenticity discussed earlier. If vagueness is a side effect of assessing real knowledge and most real knowledge is vague, then vagueness is unavoidable. Dworkin opposes the claims of Hart and Endicott that there are cases that are indeterminate. He argues that there are always discoverable answers to any legal question, denying Hart’s idea that penumbral cases are indeterminate and require invention rather than discovery within the law. High stakes assessment systems have confused generality, ambiguity and vagueness. Open-endedness is considered by Hart to be a cause of ‘open texture’ and indeterminacies. But open-endedness is a feature of generality rather than vagueness. Generality is not to do with borderline cases but helps deal with the inexhaustiveness of description. A criterion that says that a candidate should ‘write well’ is handier than one that tries to list all the disjuncts that might make up ‘writes well’. Hart thinks that ‘writes well’ is more versatile than any list and that this is because it is vague. Sorenson thinks that the versatility is due not to its having absolute borderline cases but to its generality. 

8.6 family resemblance


 Hart uses Waismann’s idea of the penumbra, and Waismann developed his thinking about indeterminacies from Wittgenstein’s idea of ‘family resemblance’ terms. These are terms, such as ‘games’, that cluster a diverse group of things without identifying any one thing that is common to them all. These terms are not a mode of vagueness, however, but of generality. This is because the fact that there are no necessary and sufficient conditions for the application of a family resemblance term does not automatically generate borderline cases for that term. If vagueness is understood not in terms of absolute borderline cases but as underspecific generalization, then vagueness can be given a function within an assessment system. Assessors can have room to stipulate and to use discretion. A good assessment system should attempt to avoid overconformity and ensure that underspecificity doesn’t grant individuals too much discretionary power. However, vagueness as absolute borderline cases related to the sorites puzzle is of a different order. It has no function. It causes arbitrary and insincere judgments in systems demanding universal decisiveness. Grading systems are contradictory systems. They are, like colour spectrums, impossible objects. For some, the idea of using a contradictory system to guide behaviour is absurd. Sorenson thinks that the colour spectrum is useful even though it is an impossible object. It is a plethora of absolute borderline cases. The transition from red to non-red presents itself as a case of borderless transition even though such a thought is self-contradictory. It is still a useful object to guide people into learning about colours and making distinctions between, for example, red and not-red. Although there is an absolute barrier to knowing precisely where red ends on the spectrum, it still gives us knowledge about red. Red is vague because it has absolute borderline cases. It is also a general term, saving us the trouble of discriminating between scarlet and crimson. Sorenson thinks that generality can be functional. In particular, he thinks that the specific discretion that Raz and Endicott take to be an essential part of a well-functioning system of law derives from generality rather than vagueness. He agrees with them that interpretation is a key element in discretion. Generality requires that an interpretation is made to decide how to satisfy a description. The discretion granted by generality is discretion granted to the speaker. Disputing a speaker’s use does not overrule the right of the speaker. Generality is a truth about multitudes and as such gives the speaker the discretionary right to use a term across the multitudes as she chooses. Ambiguity also generates the possibility of discretion. If I hear an ambiguous term I have discretion as to which of its alternative meanings I choose to assign to it. The discretion is generated by the ambiguity and it is the listener’s discretion. Even if a speaker insists that they meant another meaning, I still have discretion, based on the ambiguity of the term used, to assign it the meaning I wish to. In an exam it is in the interest of the examiners that questions and criteria be unambiguous, because this discretion could undermine the purpose of asking the question or seeking a certain answer. It could elicit an irrelevant but justifiable answer relative to those purposes. An exam system that asked questions insensitive to the discretion generated by ambiguity would be a poor exam system. There are different kinds of ambiguity. 
Polysemy is when a word has more than one sense. Amphiboly is ambiguity caused by syntax. Ambiguity is a dispute about the correct identification of propositions. Empson wrote about seven types. However, an exam system that contains ambiguity is not necessarily a poor exam system. Ambiguity can be removed, but sometimes the cost of removing it is over-explicitness. So long as the discretion caused by ambiguity is not too great, ambiguity may be tolerated. There are ways of disambiguating every case, however. Arguments from meaning holists focus on ambiguity. Arguments from Wittgenstein, Davidson, Davis and Putnam conclude that ambiguity results in a situation where no two speakers could ever mean the same thing. Any ambiguity would ensure that what was meant would be in the uncontrollable and unsystematic hands of interpreters. In law this is the gist of Dworkin’s argument against Hart. Hart took law to be open to discretion. Dworkin takes this to be an argument that the semantics of law is indeterminate. Dworkin argues that the law is not indeterminate because ambiguity can always be disambiguated using a rule of construction. If vagueness is considered a matter of ambiguity or generality, then vagueness has a discretionary element. Kit Fine and David Lewis think that vagueness is a species of ambiguity, albeit ‘ambiguity on a grand and systematic scale’ (Fine 1975, p282). But absolute borderline cases are a function of neither ambiguity nor generality. Propositions can be vague. Ambiguity is not about propositions. Ambiguity occurs at the pre-propositional stage. Ambiguity arises when one isn’t yet clear just what the proposition is. This is the critical argument against those who think vagueness can be sharpened and truth-value gaps closed by such sharpening. Sharpening, or precisification, occurs before the proposition is agreed, and vagueness remains in whatever is generated. Higher-order vagueness is this phenomenon. Ambiguity doesn’t suffer from it. Concepts are vague but not ambiguous. As Sorenson argues (Sorenson 1998), ambiguity arises when one isn’t sure which concept is being used. The concepts themselves are more than likely to be vague but are not in themselves ambiguous. Unlike generality and ambiguity, then, vagueness does not generate discretion. Sorenson thinks vagueness is unavoidable ignorance. It is often confused with ambiguity and generality, which are themselves often confused with each other. But generality gives a speaker discretion, ambiguity gives a listener discretion, and vagueness gives no discretion at all; it is just an unavoidable side effect of language. It may be useful to compare it with the blackness of shadows (Sorenson 2008). Shadows are black and the blackness is caused by an absence of light. A black painting is also black, and so its blackness may be thought to be identical with a black shadow’s black. But the black of a painting is caused by the presence of colour-suppressing light. The causes of the two blacks are not identical even though their effects may strike the eye as identical. The similarity is therefore misleading. So too with vagueness. Vagueness is often taken to be like ambiguity and generality in some respects. However, the similarity is misleading. 
Although vagueness seems to be ambiguity about the meaning of words, thoughts and concepts, it is caused by a priori ignorance, just as a shadow’s colour may seem to be caused by the presence of a certain kind of light (light that suppresses colour) but is in fact caused by an absence of any kind of light. This apparent similarity may have contributed to the attempts of exam boards to suppress vagueness through strategies that are best fitted to removing ambiguity and generality. The close supervision of examination procedures, leading to precise, unambiguous and specific questions that allow for largely predictable, controllable responses, which in turn generate the data needed to produce carefully calibrated statistical distribution curves, requires disambiguated questions and a use of generality that gives discretion to the questioner as to which test items are to be used in any assessment. Generality gives power to the questioner. Disambiguation removes discretion from the answerer and the marker. Yet vagueness cannot be removed by controlling the scope of generality and ambiguity. And vagueness doesn’t give discretionary powers to anyone involved in the assessment, be they the examiner or the candidate. Vagueness just generates an absolute ignorance. What the inquiry resistance does is produce a situation where a decisive judgment is made insincerely, as discussed above. The assessor is just guessing, or doing the equivalent of tossing a coin. But an assessment system that admitted that would be accused of being incompetent. As noted above, a judge who decided borderline cases by tossing a coin was punished even though, in being forced to be decisive, he was being forced to make an arbitrary judgment. Some might argue that all decisions are actually motivated by arbitrary factors external to the actual admissible reasons for making a decision. Legal realism, for Sorenson, is ‘…an exaggerated recognition of judicial bad faith’ (Sorenson 2001, p416). The legal realist accepts that all judgments are made for external reasons produced by sociology and economics, the personal politics, personality and prejudices of judges. Legal justifications are merely predictions. Dworkin doesn’t think that reasons are illegitimately imported into the decision making of judges. He believes that there are always legitimate legal answers to any legal question. He denies the possibility of absolute borderline cases. The problem for him is that meanings must be the same if there is to be a genuine disagreement. Thick concepts of value, where we might disagree about, say, the beauty of a painting, the power of a piece of fiction or the substantial rightness of a law, don’t provide us with anything like the biological DNA of a natural kind, such as a purple lion, to decide the right answer. Dworkin thinks experts disagree about thick concepts of value, even in law, whereas scientists don’t disagree, because DNA will decide whether a purple lion is a lion or not, irrespective of whether non-experts think lions can be purple. There is an agreed method and agreed criteria for deciding the matter. But experts in thick concepts disagree. 

8.7 Dworkin’s criterial and interpretive concepts

He thinks this is because concepts are not all the same. Umbrellas and books are criterial concepts. He thinks we share the criteria as to what they are and that these criteria exhaust the extent to which we can disagree about them. Where agreement about applicability breaks down, as with baldness, the disagreement is no longer genuine. An arbitrary decision can be made to settle applicability, but because he thinks the disagreement is no longer genuine, the arbitrariness is trivial. The purple lion case is a case of Lockean ‘natural kinds’, where agreement and disagreement are settled by reference to natural kinds. A third kind of concept is an ‘interpretive concept’. These designate concepts of value. They have their life in evaluation and criticism. They serve to designate appropriateness of conduct and action. So when we say that justice forbids progressive income taxes, we are talking about appropriate behaviour when it comes to approving or disapproving of behaviours. The function of interpretive, critical concepts is to provide resources for reasoning further. We might disagree about the criteria for approving behaviours. Paradigm cases allow for agreement, and they allow for a certain kind of discussion. They change over time; they function not as axioms but as shared paradigms, such as ‘traffic laws are constitutional’, ‘convicting an innocent man is wrong’, ‘stealing from a blind man is wrong’ and so on. They resource further arguments about what justifies the paradigms. Disagreement is about the justification of agreed paradigms. The arguments about best justifications are again the kind of arguments that might be illusory. For example, we might agree about why a paradigm is valid, but the agreement disguises disagreement at a deeper level. So, for example, I say the paradigm represents the best understanding of democracy, giving power to the people and so forth, and you may have a different understanding of democracy, saying that democracy upholds individual dignity instead. Our conceptions of democracy differ. They may converge in paradigm cases, but they lead to different applications and conclusions in contested cases. Unlike what he calls criterial concepts, these are interpretive concepts because we can continue to argue about them. We can find standard paradigm cases of democracy. They licence disagreement. Reasons never run out, according to Dworkin, even if the energy to argue might. This is how we should understand value argument: as tracing roots across vast networks of values. Paradigm cases are contested and call for the best reasons in justification of them. This is what the relevant concepts are like for Dworkin. This is a process that is recognisable in English assessment moderation meetings and critical discourse. It denies the existence of absolute borderline cases. All significant concepts present only relative borderline cases. This approach is not assigning a function to vagueness. Dworkin just doesn’t think important conceptual disputes are vague. But Sorenson’s remodelling of vagueness shows this assumption badly underestimates the complexity of knowledge and belief. It is now time to examine exactly what Sorenson’s model is. 

 

CHAPTER 9: SORENSON’S ABSOLUTE VAGUENESS

 ‘I've only read Kafka in German - serious reading - except for a few things in French and English - only The Castle in German. I must say it was difficult to get to the end. The Kafka hero has a coherence of purpose. He's lost but he's not spiritually precarious, he's not falling to bits. My people seem to be falling to bits. Another difference. You notice how Kafka's form is classic, it goes on like a steamroller - almost serene. It seems to be threatened the whole time - but the consternation is in the form. In my work there is consternation behind the form, not in the form.’

Samuel Beckett, interviewed by Israel Shenker, New York Times, 5 May 1956, Section II, 1, 3 

9.1 INTRODUCTION

 This chapter sets out the particular view of vagueness that I apply to educational assessment grading for high stakes. The chapter signals six requirements of any competent educational grading assessment system. 1. They have to be rational. 2. They have to be reliable. 3. They have to be valid. 4. They have to be universally decisive. 5. They have to produce superlatives. 6. They have to be simple. The first three are about consistency. The fourth and fifth are about completeness. Simplicity is a constraint on how the whole system is understood and whether it is usable. It also guarantees believability. A formal answering system is one that tidies up question and answer systems in a way that allows for the mechanical determination of statements made by the system. Informal systems are natural, but according to Sorenson they have two layers of obscurity. The first is whether an answer speaks to the question; the second is, if it does, which answer it gives. Vagueness is about the first layer. It is not about what the answer to any question is but about whether an answer speaks to the question. The chapter then explains the five elements of Sorenson’s epistemic position of absolute vagueness. 1. Forced analytic errors are common. 2. Language systematically passes off contradictions as tautologies. 3. There are infinitely many pseudo-tautologies. 4. Competent speakers ought to be permanently fooled by them. 5. Precedent justifies this functional, massive inconsistency (Sorenson 2001, p68). Sorenson’s epistemicism solves the sorites puzzle that characterises vagueness. The thesis is about the way this theory interacts with assessment theories. It shows that commonly held beliefs about language and assessment are not able to model vagueness properly and are therefore false. It is suggested that once absolute vagueness is understood, its implications lead to a meta-problem of vagueness, that of incredulity. This is a more insidious and intractable problem to solve than vagueness itself. Vagueness is solved using very little technical logical apparatus and a modicum of common sense. It is simple. The incredulity it brings about is vast. The solution requires that many commonly held beliefs be revised. The chapter discusses the ones of most interest to educational grading assessment. The chapter concludes by summarising the position. The failure of current assessment systems to model absolute vagueness at best makes them incomplete and at worst wrong. 

9.2 DOGMATIC ASSERTION OF SUPERLATIVES – THE EDUCATIONAL CONTEXT

 Assessment grading systems are serene in the same way Beckett says Kafka’s form is serene. The consternation is in the form. The argument of the thesis is about a consternation arising from the unassailable reasonableness, even benevolence, of the form. Educational grading assessment systems for high stakes are prototypically committed to various things. Importantly, they have to be universally decisive. Every candidate entered for an assessment has to be dogmatically awarded a grade or else labelled ungraded. A grading system that failed to classify everyone entered for it would be deemed incompetent. They are also committed to producing superlatives. These are the denizens of Bellow’s ‘Most-Most’ club discussed in the first chapter. Examples of superlatives are candidates who are the best worst candidates or the worst best candidates. These are grades that require decisions at the very boundary of a grade. Grades, understood by these systems as labels for values, are therefore sharply bounded and determinate. The idea of values having sharp boundaries is considered unacceptable in contemporary philosophy. There is not thought to be a sharp boundary identifying the upper limit on goodness. In fact most words are thought to lack determinate boundaries of this kind. We can think about this by considering a word such as ‘noonish’. It seems absurd to claim that there is a last noonish second. If someone claimed that a thousand seconds past twelve marks the boundary between noonish and non-noonish we would accuse the person of misunderstanding the term. The mistake would not be about whether they had identified the correct second but about thinking that there could be such a thing as the last second. The claim that someone has identified the superlative ‘last noonish second’ is rightly taken as a failure to understand how to use language (Sorenson 2001). This difficulty is generalisable to most words used in natural languages. If Fodor’s Language of Thought (LOT) exists, it would also be true of thoughts in that language (Fodor 1975). The absurdity of asserting superlatives and sharp boundaries, coupled with the dogmatism required for universal decisiveness, both of which are considered non-negotiable for a competent grading system in education, is the concern of the thesis. The absurdity of superlative assertion, such as ‘a thousand seconds after noon is the last noonish second’, is typically explained in terms familiar to the philosophy of language. The theories used to explain the indeterminism of language to which the absurdity of superlatives points model the indeterminism as semantic. In these models language is conceived of as in some way indeterminate. Words and concepts are thought of as being somehow incomplete or open. Theories work from this assumption to try to model ways in which the indeterminism of language can achieve various levels of completion. Interpretation is given a leading role in this. There is a diversity of solutions presented within this approach, and in educational theory philosophical theories from both Continental and Analytic traditions are familiar. It is a programme of work often associated with the names of Wittgenstein, Davidson, Rorty, Sellars, McDowell and Putnam from the Analytic tradition, Derrida, Foucault, Gadamer, Heidegger and Hegel from the Continental tradition, with Cavell, Dewey and Peirce bridging the divide. 
What these thinkers have in common is the dogmatic belief that indeterminism in language is real and that language and thought, especially the concepts that are the constituents of thought, need to be understood in a way that does justice to this fact (see the previous chapter). Certain key features are identifiable in these attempts to explain indeterminism as a genuine feature of our thought and language systems. Assumptions of ‘open texture’, context dependence and holism are key elements of theories approaching the problem. The term ‘open texture’, applied to both thought and language, was made familiar by its use in Hart’s jurisprudence classic The Concept of Law, first published in 1961. It was a term used by Waismann, who in turn took it from Wittgenstein, to capture language’s resistance to superlatives and its resultant indeterminism in borderline cases (Waismann 1945; Wittgenstein 1953). Hart himself developed the idea in a context similar to that of educational grading. The law seemed to require dogmatic assertion of superlatives, thereby taking dogmatism into an area that normal thought and language treat as absurd. Hart’s theory of law argued that these extreme cases of indeterminism required allowing discretion so that the required decisiveness of a legal answer system was maintained. Context dependence is used as a way of denying that language and thought can be fixed by essentialist commitments. In this way the indeterminism of language and thought is understood as a product of ambiguity about which contexts are relevant to determining meaning. In education the construction of meaning and learning depends on modifications of this basic idea. The idea generally develops the thought that context can dispel ambiguity, with varying degrees of success, and that meaning is always determined by context. The imprecision of words and thoughts is both diagnosed and treated by this approach. Radical indeterminists such as Derrida deny that indeterminism can ever be fully treated away. A typical example of the sophistication of this approach is given by McDowell, for whom the correct response to the embeddedness of language in the context of a tradition is to recognize it: ‘The feature of language that really matters is rather this: that a natural language, the sort of language into which human beings are first initiated, serves as a repository of tradition, a store of historically accumulated wisdom about what is a reason for what. The tradition is subject to modification by each generation that inherits it. Indeed, a standing obligation to engage in critical reflection is itself part of the inheritance... But if an individual human being is to realize her potential of taking her place in that succession, which is the same thing as acquiring a mind, the capacity to think and act intentionally, at all, the first thing that needs to happen is for her to be initiated into a tradition as it stands’ (McDowell 1996, p126). In education the idea of the situatedness of belief and learning is powerful and often used to criticise pictures of rationality as situation-independent (Dreyfus 2007). Discussions about how context may be non-conceptual and yet contribute to ordinary coping also derive from this general approach (e.g. Taylor 2002, p113). The indeterminism of language is also a motivation for attempts to find non-conceptual ways of modelling rationality. 
For example, Dreyfus gives a phenomenological account of the Heideggerian idea of ‘coping’ in order to account for meanings not accounted for in pure conceptual analysis. Dreyfus thinks that indeterminism is like the lower floors of the edifice of knowledge, suggesting that meaning is not conceptual from top to bottom but only in the upper floors (Dreyfus 2005, p2). I think Dreyfus and others like Taylor (Taylor 2002, p113) and Standish (2008) are drawn to non-conceptual ideas of rationality because they think that language and concepts are intrinsically indeterministic. I think it is the belief that language, when independent of context, is defective that motivates discussions about rationality in education. Derry thinks that ‘… the Myth of Disembodied Intellect, a myth involving a particular conception of rationality…’ presents a picture of disembodied intellect that is flawed. She thinks that ‘Conceptions of rationality particularly within education relating to the practice of teaching and learning have played a part in the rejection of the sort of curricula content which is perceived to exemplify a rationality of the form that is detached and then brought to bear on individual situations’ (Derry 2008). Holism is the theory that denies that words and thoughts can make sense in isolation from the whole language system in which they are embedded. Wittgenstein thought that to understand a sentence required knowledge of the whole language system. This approach has influenced critics of various attempts to construct education and learning theories modelled on non-holistic approaches to thought and language. In assessment, Andrew Davis explicitly develops his critique of much current educational assessment from a perspective of meaning holism. Heidegger is also characterised as an important figure in this: Standish describes his position as one that asserts that ‘… there is a mutual appropriation of ‘man’ and world through language. There is a fit between our ways of thinking and the contents of our thoughts, and this is generated through the nature of our needs and our purposive behaviour. It is on the strength of, and out of, this holistic being in the world that more abstract or theoretical forms of thought become possible, including such abstract thought as of the realm of law’ (Standish 2008, p7). It should be noticed that a metaphysics is presupposed in such discussions, one which disputes attempts to disentangle facts from values. This is a metaphysics that Standish links not only to Heidegger but also to McDowell, Putnam, Emerson, Thoreau, Cavell, Wittgenstein and Nietzsche. The approach is also deeply suspicious of positivism and ‘thin’ conceptions of language, such as that put forward by Michael Dummett, who sees language as just a means of communication and a vehicle for concepts (Standish 2009, p5), because the theorists are correctly aware that normal language use and thought determine that the dogmatic assertion of superlatives is absurd. Positivism’s attempt to apply meaning exclusively to language and concepts that could be dogmatically assertive of superlatives ended up reducing most language and thought to meaninglessness, which was a cure worse than the disease. Clearly, any philosophical theory needs to be able to explain the absurdity of dogmatic superlatives. Standish’s understanding of Stanley Cavell’s relevance for educational concerns about this matter is illustrative of the sophistication of such approaches. 
For Standish, ‘Cavell attempts to diagnose that human tendency to seek something beyond the criteria that are ordinarily available to us, as if there were something necessarily deficient or disappointing in the circumstances of our lives. It is easy to see a manifestation of this within educational practice where the ordinary judgments of the teacher are disparaged in favour of what is supposedly the greater rigour of objective testing’ (Standish 2008, p8). Standish takes Cavell to hold that determinate criteria under-represent the open-endedness of rationality, and he takes that open-endedness to be linguistic. Standish thinks that determinate criteria in education likewise under-represent proper rationality. I think the dogmatic assertion of superlatives has been characterised by some philosophers of education as a manifestation of the way principles of efficiency and effectiveness are treated in atomized, artificially standardized concepts, as if they were components of a smooth-running machine. This view holds that there is a radical indeterminacy in language, so that determinate language with sharp boundaries misrepresents rationality. I think that in this way arguments against atomized, quantificatory assessment in education are linked to views about language. I think that in systematic educational mechanisms the dogmatic assertion of superlatives ignores the absurdity of such assertions. I think that in this way the discussion of superlatives links with current educational debates around Lyotard’s performativity. Lyotard writes: ‘The true goal of the system, the reason it programs itself like a computer is the optimization of the global relationship between input and output: performativity’ (Lyotard 1984, p11). This is a very live theme in contemporary educational philosophy (for a discussion see e.g. Dhillon and Standish, 2000). Standish has characterised it as being part of a fantasy about objectivity being essentially secured by quantificatory measures alone. I think it is a common position that ‘The interpretation of everyday language is a Sisyphean task, a task without end and without progress, for the other is always free to make what he wants differ from what he says he wants’ (in Fischer, 1989, p50), and that indeterminism is ‘a curse laid on us... by the impossibility of making any expression match what we want to express. Instead of a transparent medium, language resembles a dense screen that we have to hide behind even when we may think we do not want to’ (Fischer 1989, p50). So I think there is a vast philosophy of education literature that holds that the indeterminism of our concepts and the apparent absurdity of asserting superlatives are due to our having an indeterminate language, and that the philosophy of language and concepts is therefore in the foreground of this literature. It reveals how central to educational values and practice the issue is. Theories of the open texture of language, theories of meaning holism, and theories requiring the anti-decontextualisation of rationality are common in this literature. 

9.3 vagueness and a new research programme for the philosophers of language

 Vagueness is important because Sorenson’s solution to the problem of vagueness requires that this view of language is false. As Sorenson puts it, our language is not inconsistent because it is unopinionated. Yet this should not be thought to offer succour to those who think language and concepts can be decontextualised, atomized and used to assert superlatives. That would be to mistake user meaning for system meaning. The meta-problem of vagueness is the resultant position that even though the solution to vagueness holds that language is fully consistent and determinate, it is nevertheless impossible to know superlatives. Assertion requires knowledge; therefore it is still impossible to assert superlatives. The dogmatism required by educational grading is absurd, and the resultant resistance to enquiry is absolute in this new setting. 

9.4 sorenson’s absolute vagueness

 Sorenson thinks that the problem of indeterminism is not caused by language’s having indeterminate meanings. He thinks that languages are fully determinate. Examination of vagueness shows why. Vagueness is a key source of the indeterminism of borderlines, and the absurdity of asserting dogmatic superlatives is due to vagueness. Sorenson thinks that all natural language and its concepts are fully determinate. He thinks that the absurdity of believing them to be so is caused by the over-dogmatic beliefs of our language system. These compel belief in false a priori ‘tautologies’. He thinks our inability to believe in determinate sharp boundaries for most natural language words and concepts is due to ignorance rather than to inbuilt deficiencies or incompleteness in our language system. He thinks there is a simple solution to the sorites paradox at the heart of the problem of vagueness. The solution pushes him to think language is determinate (Sorenson 2001). Sorenson thinks the real problem of vagueness is its meta-problem rather than vagueness itself. The solution is straightforward and uncomplicated. The response to the proof is the meta-problem of vagueness. The solution to the sorites causes incredulity. Even Sorenson is incredulous. He hopes that familiarity will lead to acceptance. He thinks about cases where this has happened before, as with other paradoxical realities, such as the ‘horseless carriage’ (Sorenson 2005). I think incredulity is a predictable response because not only does the solution contradict refined theoretical positions in the philosophy of language, it also violates common-sense views about our beliefs that are based on seemingly good first-hand data about our own linguistic and conceptual use. And the illusion of vagueness survives its exposure. He thinks that the solution supports a view that regards indeterminacy as a natural phenomenon but locates its source in our own ignorance. He thinks that words are determinate but that their boundaries are necessarily unknowable to us in certain cases. So although superlatives are real, we can’t know them. No thinker could, because the ignorance is a side effect of correct language use rather than part of the purpose of correct language use. So he thinks that there is a last second of noonish, that there is a hairiest bald man and a baldest hairy man, a fattest thin man and a thinnest fat man, and, in educational grading contexts, a cleverest C grade candidate and a dumbest B grade candidate. Sorenson’s solution to vagueness is elegant and simple, requiring just ‘lightweight common sense and textbook logic’. He thinks Bellow’s ‘Most-Most’ club exists. He thinks that all vague terms are sensitive to small differences. The classic sorites puzzle asserts a base step, such as that a thousand grains of sand make a heap. Then follows the induction step: if n grains of sand make a heap, then n-1 grains of sand make a heap. It follows that a single grain of sand makes a heap. This is valid by mathematical induction; equivalently, it is valid by applying modus ponens 999 times. Boolos (1991) proves the validity of the argument that if a collection of 1,000 grains is a heap then so is a collection of 2. Sorenson thinks that there is a simple solution to this. He rejects the induction step. This blocks the sorites. However, it results in a proof of the existence of sharp thresholds. Knowable thresholds would mean that vagueness didn’t exist. If the thresholds were knowable by aliens or by God then again he thinks that vagueness wouldn’t exist, because they could tell us where the thresholds were. 
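The structure of the argument can be set out schematically. This is a minimal formalisation for illustration only; the predicate H(n), read as ‘a collection of n grains is a heap’, is introduced here and is not Sorenson’s own notation.

\[
\begin{aligned}
&\text{Base step:} && H(1000)\\
&\text{Induction step:} && \forall n\,\bigl(H(n) \rightarrow H(n-1)\bigr)\\
&\text{Conclusion:} && H(1)
\end{aligned}
\]

Rejecting the induction step while retaining classical logic yields its negation,

\[
\neg\,\forall n\,\bigl(H(n) \rightarrow H(n-1)\bigr) \;\equiv\; \exists n\,\bigl(H(n) \wedge \neg H(n-1)\bigr),
\]

that is, there is some particular n at which a heap gives way to a non-heap. Blocking the sorites in this way therefore commits one to a sharp threshold, which is the conclusion that generates the incredulity discussed in this chapter.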
McGinn’s thinks humans lack the required concepts to understand consciousness (McGinn 1995). Sorenson thinks that Pinker thinks that people are healthily limited (Pinker 1997, p562). Fodor thinks that a representational theory of mind blending modularity and computationality is our most fruitful theory of mind. Sorenson thinks modularity and computational theories of mind help explain the dogmatism of beliefs about language and concepts, including the dogmatic belief in indeterminate boundaries. Sorenson thinks understanding structures of belief and knowledge helps dissipate incredulity at the solution to the sorites puzzle. A key part of the incredulity is puzzlement about what could specify the sharp boundary. If 1000 seconds after noon is actually the last noonish second, and no one can know this, and the false idea that borderlines are imprecise, then what makes it so? Sorenson uses perceptual beliefs about colour to illustrate dogmatic beliefs. He thinks that bands of colour don’t correspond to objective discontinuities in light wavelengths. Yet we persist in seeing them even after we know this (McGinn 1983, Peacocke, 1994, Johnston 1992). Though physics tells us that there is no boundary between green and yellow I think there must be a boundary. Colour scientists think this is old news. They think that we have developed minds that enhance and suppress aspects of what can be seen in order to maximise useful perceptual beliefs. Sorenson thinks that this double mechanism means that we not only miss the boundary between yellow and green but we also see the absence of a boundary. 

9.5 the dogmatic falsehoods of language

 Sorenson thinks language is also subject to dogmatic falsehoods. Just as perceptual beliefs are subjected to contradictory instructions, so too are our language-based beliefs. Sorenson thinks that the logical proof solving the problem of vagueness is overridden by dogmatic beliefs generated by linguistic homunculi. Biology trumps logic just as it trumps the physics of colour. Sorenson thinks that the illusion of boundarylessness is produced by dogmatic omission and commission arising from the modularity of the mind. I think that grading works in the same way as colour spectra and linguistic categorization. I think that educational grades are labels for values, and values have to be thought of as having no sharp boundaries because they are vague. I think that this is an illusion and that grades do have sharp but unknowable boundaries. I think therefore that any grading system that is dogmatic about locating a sharp borderline is necessarily unbelievable. I think that no theory that overwrites the illusion can hide the illusion. Sorenson thinks that in this way vagueness is like probability. A person may try to abandon belief and substitute probabilistic reasoning in its place. Assessment that calculates grades in terms of probabilities attempts to overwrite believability. But probability theory can’t overwrite belief. Sorenson thinks that illusions can be functional. Gestalt principles of continuity ensure that we see rapidly sequenced still pictures as continuous motion, as we do in cinema (Raffman 1994). This is a dogmatic effect of these visual principles. Once in place and activated, these principles have automatic effects. He thinks that the illusion of boundarylessness has developed in order to prevent time-wasting searches for boundaries. He thinks that the cost of such searching would be too expensive. He thinks that the structure of the mind creates, as a basic principle, the norm of only making distinctions that can be perceived. I think that because an educational grading system, especially for high stakes, tends to demand higher standards than this norm allows us to apply, we are faced with unbelievable overrides of the necessary illusions. Sorenson thinks that because most words appear under the illusion, people think the illusion is part of their meaning. This is the core thought that much current philosophy adopts. Vagueness doesn’t prevent all knowledge and so is not a reason for universal skepticism. I know that some people are cleverer than others but I don’t accept the superlatives ‘cleverest stupid person’ and ‘stupidest clever person’. Sorenson thinks that many things we take at face value are never tested and so their truth is assumed. This is a useful short-cut in managing belief systems, especially when reality is largely complex and hostile to knowledge. Sorenson thinks that economics makes false anti-superlatives productive. Education does so too. Generalizations such as ‘there’s always someone cleverer than you’ and ‘an interpretation never ends’ are anti-superlatives and are therefore false, even if their consequences may nevertheless be productive. If the source of the anti-superlative thought is vagueness then the denial is based on a dogmatic illusion. Milton Friedman, the economist, thought that economic axioms were not true but that their consequences were (Friedman 1953). He said he accepted the true consequences of economic axioms but not the truth of the axioms. As short-cut rules of thumb, anti-superlatives can be useful and are therefore pragmatic. But they are false. 
Anti-superlatives are useful if their use is restricted. They can help us know other things. This theory of vagueness denies any objective indeterminacy. Vagueness is insurmountable ignorance. Sorenson calls it ‘absolute ignorance’. Contemporary philosophers of language and concepts think that indeterminacy is inherent in the system itself. So when faced with indeterminacy they are content to say that although they cannot answer the question, there may be nothing that we don’t know (Parfit 1984, p213). But if every statement is either true or false then we need to change our opinion about how language works. 

9.6 the motivation for retaining classical logic

 Sorenson thinks that because classical logic is at the heart of science and maths we should change our beliefs about language rather than about classical logic. He also thinks that no alternative to classical logic has worked. So he turns to our beliefs about language. His views are largely consistent with Fodor’s Representational Theory of Mind (RTM), which models mental operations as computations over representations in a modular structure (Fodor 1985, 2009). He takes seriously the innateness of a universal grammar as proposed by Chomsky (1957). In so doing, Sorenson’s account of vagueness has consequences that challenge many common views about how people think and learn. He thinks, like Fodor, that everyone is a linguistic genius by the age of five. He thinks we learn language fast because of evolutionary pressures. He thinks that to learn fast we have had to cut down the possible search space and the content of this space. We don’t learn about how we learn our language. We can speculate that this is because such a fine-grained, nuanced reflexivity would have been too expensive for our hunter-gatherer ancestors. So he thinks we are very good at learning language quickly but pretty poor at acquiring meta-knowledge about how we do it. He says that this meta-knowledge is ‘spotty’ (Sorenson 2001, p9). 

9.7 opinionated spotty knowledge

 He thinks that we are over-opinionated about how we learn language. For educationalists, awareness of this tendency to dogmatise our ‘spotty’ knowledge suggests we should opt for a policy of caution and humility when trying to understand these things. In philosophy ‘spotty knowledge’ is sometimes labelled ‘intuition’, and Timothy Williamson thinks there are no good reasons for adhering to intuitions in the face of contradictory proof (Williamson 2000). Sorenson’s belief may be generalized. That we are not innately equipped to learn about learning suggests that most thoughts that seem spontaneously obvious about how we learn are probably false. Fodor thinks that there are no true theories about how language is best learnt. He suggests that we should be cautious in proposals about how we might teach language. He opts for a pragmatic approach: do whatever seems to bring about good results; stop doing things that don’t; don’t be too confident that generalities based on these heuristic practices are true (Fodor 1975). An example given by Sorenson is how the spontaneous learning of sign language by deaf people was misrepresented as the learning of a merely pictorial sign language (Kyle and Woll 1985). In the 1880s a Milan conference banned sign language and forced deaf people to try to learn the language of people with hearing. It led to years of pointless pain. Deaf people were forced to learn something that, being deaf, was largely inaccessible to them, and they were stopped from learning a language that had as much ‘semantic closure’ as any other natural language. A language with ‘semantic closure’ is one that can express anything. It wasn’t until the work of Stokoe in the 1960s that sign language was recognized as a language. Meta-linguistic immodesty is analogous to immodest views about the characteristics of vagueness. The spotty knowledge we have about meaning and concepts is often used to trump logical proof. If language is not indeterminate, then learning theories that assume that it is stand accused of immodesty. The consequences of immodesty in the case of deaf language are a warning for educationalists. 

9.8 science not scientism

 A distinction can be drawn between science proper and pseudo-science, labelled scientism, using this framework. I think science (and maths) is based on classical logic. I think that alternatives to classical logic are not as productive. Some people want to argue that physics has changed what we know about reality so that classical logic is disconfirmed. This is to misunderstand logic. All logic can do is give rules for deciding consequence relations. It can't be confirmed or disconfirmed by reality. Therefore the fact that physics uses theories of quantum mechanics and relativity makes no difference. Choices between different kinds of logic can only occur if we ask the different logics to model the same consequence relations. But the logics that have been used as if they are rivals to the classical scheme are all modelling different consequence relations. Therefore they are not rivals. This may strike educationalists as being dry and irrelevant. But bivalence is a principle at the heart of Sorenson's solution to the sorites. He solves vagueness and the sorites puzzle by stipulating bivalence. The stipulation is justified by the success of maths and science. Genuine science is run on the consequence relations of classical logic. The weakest part of the scientific method is the empirical premise, so when proof and observation conflict it is the empirical premise, not the logic, that should be revised; conflict with classical logic is, according to Sorenson, usually fatal. This justifies applying classical logic. By classical logic I am talking about principles such as bivalence, the law of the excluded middle, the law of non-contradiction and so on. These are simple, familiar principles. They are not esoteric. Sorenson's astonishing conclusion is a consequence of holding on to much that seems uncontroversial and normal. The reason for maintaining these principles is that they have proved to be the model that produces maths and science. Scientism is a label for activity that attempts to reproduce scientific methodology in domains where it is not appropriate, because the initial empirical conditions can't be stabilised without rendering them inauthentic. Empirical premises in a scientific proof are improved if superfluous premises are deleted. In scientism the removal of 'noise' from 'signal' also removes signal. In order to achieve consistency, empirical premises are selected to ensure that the rational proof of the scientific process is achieved. Scientism is where the legitimate business of removing superfluous premises is replaced by the illegitimate removal of non-superfluous premises. It is also where superfluous premises are illegitimately added. 
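 The classical principles appealed to here can be stated compactly. The following is a minimal formal sketch in standard notation (my presentation, not Sorenson's own formulation):

\[
\begin{aligned}
&\text{Bivalence:} && \text{every statement } \varphi \text{ is either true or false.}\\
&\text{Excluded middle:} && \vdash \varphi \lor \neg\varphi \text{ for every statement } \varphi.\\
&\text{Non-contradiction:} && \vdash \neg(\varphi \land \neg\varphi) \text{ for every statement } \varphi.
\end{aligned}
\]

Bivalence is a semantic principle about truth values; the other two are theorems of the object language. Sorenson's solution keeps all three in force even for borderline statements, which is what generates the unknowable sharp thresholds discussed below.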

9.9 sorensonian futility

 The claim that concepts have sharp borderlines that cannot be known is absolute in Sorenson's explanation of vagueness. Sorenson seeks precedents for his futilitarianism. He cites John Locke, who writes: 'I suppose it may be of use to prevail with the busy mind of men to be more cautious with things exceeding its comprehension: to stop when it is at the utmost extent of its tether: and to sit down in quiet ignorance of those things which, upon examination, are found beyond the reach of our capacities' (Locke 1690, I, p28). Colin McGinn takes a futilitarian approach to the philosophy of consciousness (McGinn 1993). McGinn thinks that the standard model of knowledge puts consciousness beyond human knowledge. He thinks that humans have a biological reason for not being able to understand it. 

9.10 absolute futilitarianism

 McGinn’s futilitarianism is relative to human biology. He can imagine aliens with different conceptual capacities able to grasp consciousness. Sorenson’s futilitarianism is not relative to any answer system. The enquiry resistance of borderline cases is absolute. Neither aliens nor an all knowing God could know where the sharp borderlines are. Knowing which was last noonish second, or the lowest mark needed to gain any grade is absolutely unknowable. This is the claim that astonishes people. In educational grading teachers and graders point to the mark that is the superlative mark for a grade. This is taken to be knock-down discomfirmation of Sorenson’s futilitarianism. Teachers and graders may think that even in cases where they can’t do this there is the possibility that they might be able to. They might say that ‘anything is possible’ and draw the conclusion that if something is possible, then it can’t be ruled out as happening at a future date. Sorenson’s thinks that if anything really is possible then it is possible that borderline cases are absolutely enquiry resistant. His point is that although it is true that there is a last noonish minute, it is inaccessible. He thinks that there are real things that are necessarily inaccessible to us. He thinks that vagueness is a phenomenon that reveals things that are always hidden from us. He compares it to the assertion that there is a successor number of the largest known integer. It is a consistent statement but it nevertheless requires that the number is forever inaccessible. Assertions using indexicals can take a similar form, such as, ‘tomorrow never comes’. 

9.11 the wrong type of borderline

 The incredulity response to Sorenson's solution to vagueness is labelled the meta-problem of vagueness. Filling the explanatory gap created by the logical proof that there are unknowable sharp borderlines is not mandatory. But in the philosophical literature about vagueness, dealing with the incredulity has been a major preoccupation. Timothy Williamson thinks that the proof needs no additional explanation. As Sorenson puts it, when people stare at him in disbelief, he just stares back. Knowing the power of classical logic to produce scientific and mathematical knowledge overrides intuitions that contradict logical conclusions. Williamson thinks the ignorance is explained by a 'margin for error' principle (Williamson 1994, Sect 6, p4). The principle is that knowledge requires a margin for error: a belief counts as knowledge only if it could not easily have been mistaken. He thinks our powers of discrimination are limited, and therefore when close to a sharp boundary we are unable to achieve the fine-grained discriminations needed to know any sharp borderline. The threat of measurement error is therefore a source of vagueness for Williamson. However, this is an explanation that relativises the indeterminacy to human powers of discrimination. These will vary from person to person. A borderline case is therefore a matter of not being certain which particular concept is being applied. Vagueness is relative to the speech community (Williamson 1994, p217). This is a relative borderline case and is therefore the wrong kind of borderline. Sorenson's vagueness is a necessary semantic feature. 
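 Williamson's principle can be put schematically. A minimal sketch in standard notation (my formulation, not a quotation from Williamson):

\[
\text{If } x \text{ and } x' \text{ are indiscriminable to the speaker, then } K\,F(x) \rightarrow F(x').
\]

One knows that x falls under F only if everything indiscriminable from x also falls under F. Near a sharp boundary that condition cannot be met, so the boundary's location cannot be known. Because the ignorance is indexed to human powers of discrimination, Sorenson treats the resulting borderline as relative rather than absolute.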

9.12 two kinds of proof

 Sorenson thinks that there are two kinds of proof. There is a proof of existence. There is an explanatory proof. Only the first is required. The second is merely a desideratum. The meta-problem of vagueness, which is to explain the incredulity, is not about logic but about psychological and epistemological matters. Logic is straightforwardly a feature of the existence proof. Sorenson takes issue with Williamson's explanatory proof because he thinks restricting the unknowability of vagueness to humans is provincial. Sorenson makes a universalist claim. He thinks that reality is impervious to explanatory cravings. Sorenson thinks the meta-paradox is constructed out of compelling intuitions that are then allowed to override logical proof. The inconceivability of a last noonish second is taken as proof that there is no last noonish second. Williamson and Sorenson think that we should not trust intuitions that are ruled out by logic. Sorenson likens the meta-paradox of vagueness to the proof that minds and brains are only contingently identical, or that the number of odd numbers equals the number of all numbers. Sorenson thinks there are two kinds of epistemic puzzles about ignorance, the Chomsky problem and the Orwell problem. Chomsky wonders why we know so much from so little evidence. He answers this question by saying that there must be innate knowledge. Orwell's problem is how we remain ignorant in the face of overwhelming evidence. Sorenson thinks that this is the problem of vagueness. It is the Orwell problem applied to arguments. The depth of the problem of vagueness is illustrated by Sorenson with considerations about evolution. He thinks there is overwhelming evidence for evolution and yet it is widely disbelieved; most biologists, however, believe it. In the case of vagueness, not only do non-philosophers disbelieve the epistemic solution, so do most philosophers. Sorenson thinks that they have let intuitions cloud their judgment. I think that in educational assessment the implications for other accepted theories are a barrier to acceptance too. For example, positivist and scientistic approaches to assessment grading are typically criticised by rejecting the theories of abstract rationality assumed to underpin them. So, for example, Jan Derry writes that: 'Abstract rationality has increasingly been a target of attack in contemporary educational research and practice and in its place practical reason and situated thinking have become a focus of interest.' (Derry 2008, p i) Andrew Davis, a leading critic of current grading assessment, is clear in linking the mistakes of current practice to flawed philosophical theories about the nature of language, conceptualization and rationality. He writes that '…the conception of knowledge and of psychological states seemingly required by the demands of high stakes testing is flawed. So my scepticism relates to the possibility of viewing mental content through certain conceptual lenses' (Davis 2006). This lens is presumed to be a form of meaning atomism which he opposes with meaning holism. He cites the philosopher Donald Davidson with approval: '... it is impossible to take an atomistic approach, because it is impossible to make sense of the idea of having only one or two beliefs. Beliefs do not come one at a time: what identifies a belief and makes it the belief that it is is the relationship (among other things) to other beliefs ... because of the fact that beliefs are individuated and identified by their relations to other beliefs, one must have a large number of beliefs if one is to have any. Beliefs support one another, and give each other content...' (Davidson, 2001, p124). Meaning holism is a theory that has traction for critics of assessment theory and practice that embed atomistic views of language. What Wiliam has labeled the 'hermeneutic turn' in educational assessment agrees with Davis's thought that a good way of criticizing assessment practices built on the scientistic paradigm is to adopt a new paradigm of language and rationality. Davis says this explicitly when he writes, 'I am contending that if high stakes testing is to be defended, we need fine-grained judgments about the contents of other minds of a kind which would only be possible if learning could be "parcelled into discrete testable bits or units"' (Davis 2006). I think this link between conceptions of language and rationality is why educationalists opposed to the kind of assessment systems Davis opposes have strong opinions against accepting Sorenson's solution. It leaves their alternative paradigm vulnerable to counter-attack. 

9.13 what vagueness tells us to believe about language

 Sorenson wrote a book about philosophical blindspots in 1988. Blindspots are consistent propositions that can't be rationally accepted. GE Moore's paradox is a prototypical example of this phenomenon. Moore was puzzled by our inability to assert propositions such as 'I went to the cinema last Monday but I don't believe I did.' The sentence is consistent and meaningful but unassertable. It was a puzzle that intrigued Wittgenstein, and Wittgenstein's approach to language and rationality is important to the 'hermeneutical' philosophical paradigm. Moore's puzzle is a puzzle because it is a case of unacceptable consistency. The solution to vagueness is a deeper puzzle because it requires that threshold statements be construed as inconsistent. We are compelled to round off insignificant differences, and so competent language use requires us to accept that John is a C grade if he is insignificantly different from Bob, who is a C grade. We are compelled to believe that if 1000 seconds after noon is noonish then so is 1001 seconds after noon. The compulsion arises because the proposition expressing the thought is construed by anyone competent in English as a tautology. Therefore to express the proposition that 1000 seconds after noon is noonish but 1001 seconds after noon isn't is to express what is construed as a contradiction. The superlative is construed as a contradiction by competent speakers of English, but English is not inconsistent. It is free of contradictions. This is the puzzle that should detach us from theories that treat language as inconsistent. The theories held by Davidson and others explain language as a system that licenses contradictions. The contrary, precisionist paradigm agrees but thinks it can eradicate the contradictions by developing precision. Vagueness shows that both these positions are fatally flawed. Language is free from the contradictions that one approach seeks to precisify away and the other wants to exploit. I think a tautology is something that owes its truth to its own meaning and not to anything external to that, such as the world. Sorenson thinks that even if tautology is narrowed to apply only to statements owing their truth to the meaning of logical words, competence in English (and any natural language) requires us to regard many tautologies as contradictions. Sorenson thinks that because of this there are valid arguments that we should consider invalid and invalid ones we should judge to be valid. And further, he thinks the errors are irredeemable because we are unable to precisely identify them. Sorenson thinks that a Fodor-like or Pinker-like modularity of mind theory helps explain this. But he doesn't think that the truth of what he says relies on this. He thinks he is offering an explanatory proof only when he refers to modularity of mind. The proof is logical. Sorenson thinks that the belief that words mean whatever people, communities of interpretation or society stipulate them to mean is incompatible with this view. This view lies behind significant theories. Poincaré thought that the axioms of geometry were conventionally chosen. It has been the standard position in linguistics since Ferdinand de Saussure. Logical Positivists such as AJ Ayer and Carl Hempel developed it as a way of defeating rationalism. Pierre Duhem thought that scientific laws should be valued for their predictive power and correspondence with observations because he didn't think humans could ever know true metaphysical reality. 
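 Using the notation introduced earlier for 'noonish' (or, equally, for a letter grade), the two statements at issue in this section can be set side by side. A minimal sketch, my presentation rather than Sorenson's:

\[
\begin{aligned}
&\text{Tolerance:} && \forall n\,\bigl(N(n) \rightarrow N(n+1)\bigr) && \text{construed as tautologous, yet false.}\\
&\text{Threshold:} && \exists k\,\bigl(N(k) \land \neg N(k+1)\bigr) && \text{construed as contradictory, yet true.}
\end{aligned}
\]

The two are classical negations of one another, so a competent speaker who treats the first as a tautology thereby treats the second as a contradiction. This is the sense in which competent speakers are, as Sorenson puts it, permanently fooled by pseudo-tautologies.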
In law, Ronald Dworkin in 'Law's Empire' considers three rival conceptions of law, one of which is conventionalism. This view maintains that state coercion is justified if legal institutions contain clear social conventions on which the rules of law depend. The scope of the applicability of law has to be demarcated clearly by these conventions. Dworkin famously criticises the position himself because he thinks that there are many cases where clearly applicable legal rules are absent. 

9.14 conventionalism

 Conventionalism is often taken to have declined since Poincaré. However, the radical extrapolation of conventionalist ideas by Wittgenstein, Quine and Carnap is taken as foundational to many themes in contemporary philosophy, including those approaches to education that I have labelled 'hermeneutical' (Ben-Menahem 2006). But Rescorla writes: 'Conventionalism surfaces in virtually every area of philosophy, with respect to such topics as property (Hume's Treatise of Human Nature), justice (Hume's Treatise again), morality (Gilbert Harman (1996), Graham Oddie (1999)), geometry (Henri Poincaré (1902), Hans Reichenbach (1922), Adolf Grünbaum (1962), Lawrence Sklar (1977)), pictorial representation (Nelson Goodman (1976)), personal identity (Derek Parfit (1984)), ontology (Rudolf Carnap (1937), Nelson Goodman (1978), Hilary Putnam (1987)), arithmetic and mathematical analysis (Rudolf Carnap (1937)), necessity (A. J. Ayer (1936), Alan Sidelle (1989)), and almost any other topic one can imagine' (Rescorla 2010). I think that if conventionalism were true then vagueness would be hard to explain. Conventionalism makes all meaning a matter of stipulation, so that words mean whatever we conventionally take them to mean. If this were true then taking a statement to be tautologous would make it so. But the solution to vagueness requires that this isn't the case. Therefore we can't stipulate words to mean whatever we conventionally take them to mean. I think this makes Wittgensteinian ideas about how meanings derive from embeddedness in 'language games' suspect. To use language correctly is to deny sharp boundaries to vague terms. If Wittgenstein thinks that correct use means something like 'follows the correct move in a language game' then Wittgenstein is wrong. Correct use of vague terms requires a person not to believe in sharp boundaries for vague terms, and commits us to believing that vague words have no sharp boundaries in their definitions. But these are false beliefs about the meaning of vague terms. Wittgensteinian conventions of use therefore fail to generate true meanings. 

9.15 vague assessment

 The vagueness literature uses geometric metaphors for illustration. It connects to the picture of grading and assessment as sorting. Sorting things into categories is pictured as pigeon-holing. The borderline between two pigeon holes is the prototypical picture of vagueness. A borderline case is one where there is difficulty about which of the boxes something belongs in. Grading is prototypically thought of like this. Where there are two grades, grades can be pictured as boxes. A borderline case is a candidate where it is not clear which of the boxes the candidate belongs to. The borderline could be pictured as a new box lying between the two other boxes. But it is better to picture the borderline cases as being equally in both of the two existing boxes. Easy borderline cases might be pictured as belonging to one of the boxes but only just. The troubling borderline cases are those that menace logic. We can ask of borderline cases: who decides which box they should be put in? Who draws the line? Some assessments cause borderline cases because they ask questions that are resistant to answers. In law Hart famously thought about the question whether ambulances were allowed in the park if all vehicles were banned (Hart 1961). He thought that this was an example of the open texture of language. He thought that there was no decisive answer and that this licensed discretion. Crispin Wright thinks that vagueness can be modeled in a way that licenses discretion (Wright 1995). If there is permission to dissolve a question through discretionary means, maybe by appointing an institution or office with such a power, then the vagueness is relative to that answering mechanism. I think in law the dispute between Hart and Dworkin is about whether vague law gives discretion to judges to decide borderline cases (see Ch 8). In education there is no formal recognition of powers held by any institution or persons to license discretion. Perhaps there are informal ones. Relative borderline cases are cases where there is a need to complete the meaning of the criteria for deciding a borderline case. Teachers are given local permission to disambiguate. In such cases new answer systems are introduced to mop up the incompleteness of the initial one. So where a candidate is showing borderline C grade aptitude in her response according to initial criteria, further criteria are introduced to mop up the ambiguity one way or the other. Decisiveness is reached, although it is a decisiveness that is not extremely dogmatic. In these cases the candidate is considered a case of C gradedness, but not a clear case. Nevertheless, the local permission to disambiguate by introducing supplementary criteria grants permission for enough dogmatism. These disambiguations only work in cases of relative vagueness. They work in borderline cases of borderline cases. But there is a futility attached to absolute borderline cases. In these cases the question asked is so epistemically hostile that its recognition as such should signal the end of further attempts to answer. The epistemicist doesn't think this is because something is incomplete in the language being used that needs rectification. It is because the truth of the answer is always unavailable. In educational assessment absolute vagueness should be minimised. Because assessment systems require universal decisiveness or dogmatism, any such system that embeds absolute vagueness will be in danger of losing credibility. 
It is the propensity to respond in cases of absolute vagueness that undermines any claim to the reliability that a system of educational assessment requires. The inability of a system to answer cases of absolute vagueness is not a confession of the incompetence of the system or its graders. The ignorance is not a personal quality of anyone. It is a purely impersonal feature of the epistemic situation. The silence required by such a situation has several features that show this. It is inevitable and so obligatory, unlike situations where silence might be a chosen preference. Sorenson gives as an example of non-inevitable silence the case of blind refereeing. Referees are told to withhold their knowledge of a writer if they recognize her from reading the article they are being asked to judge. In the NEAB 100% English coursework assessment, folders were sent blindly to assessors. Information about their agreed rank ordering and the grades agreed by the school assessment was withheld from the graders. Some ignorance is chosen, as when anthropologists doing field work learn well the unreliable answer systems of another culture, such as witchcraft. Teachers sometimes use answer systems that they don't claim to be reliable in order to help learning. Randomised marking used for purely pragmatic ends, such as motivation, can be impressively decisive because it isn't even trying to give correct answers. Where it can go wrong is if there is some sort of mechanical breakdown, as when a teacher's hastily scrawled C looks like a D or where a mark is incompletely decipherable. In absolute borderline cases there can't be a stipulated answer. This can be a good test of whether a case is an absolute borderline case or not. If a decision made by fiat is thought to be accurate then that proves the case was only a relative borderline. An answer to an absolute borderline case is incoherent. So if Jane and Jim are indiscriminable then to award them different grades is incoherent, even if Jane is the worst case of a grade and Jim, being indiscriminably worse than Jane, is actually a different grade. 

9.16 decisiveness, dogmatism, bureaucracy and obscurity, and validity

 Systems of assessment that require universal dogmatism or decisiveness are tidy systems. The level of tidiness is artificial because universal decisiveness is impossible in natural languages. Educational assessment systems requiring reliability must be able to regiment their systems of questions and answers so that each question is answered by means of a clear clerical task. Sorenson labels this a 'formal system', which tidies up question and answer systems in a way that allows for the mechanical determination of statements made by the system. These are non-natural. Informal systems are natural but according to Sorenson they have two layers of obscurity. Educational assessment discusses this issue in terms of validity. Reliability is about consistency, where we ask: does the system always give the same answer, regardless of who answers or where? Validity in educational assessment is about whether an answer speaks to the question and, if it does, which answer it gives. Vagueness is not about what the answer is to any question but about whether an answer speaks to the question. This has enormous implications for assessment systems that propose answer systems that answer in cases of absolute borderlines. Formal assessment systems may appear to have formal proof that their answers speak to the issue being assessed. The norm referencing and the psychometric paradigm of the scientistic approach to assessment that assessment experts like Wiliam seek to replace are formal (Wiliam 1994). Such systems are silent about disputed grades because their system decides by fiat and therefore precludes disputes at grade boundaries. The hermeneutical systems of assessment that replaced the earlier paradigm tend to offer more pluralistic systems where unofficial answer systems can be added. In order for both approaches to be efficient there is a need for bureaucratic virtues to prevail. In both cases the systems adhere to some approach that treats answer systems as voluntary. Acts of interpretation, for example, as used by skilled teachers to discuss the grading of complex, multi-dimensional essays, are usually taken to be outcomes of deliberation and voluntary thinking. I think that the intrusion of involuntary answer systems is under-recognised in both these situations. So for the rigid norm referenced grading answer systems, there is no guarantee that even if reliable and valid the grades are believable. This is because belief is not wholly voluntary. Involuntary answer systems may be at work that conflict with the findings of any formal one. This leads to a legitimate disagreement, and in this disagreement someone is making a mistake. This conflicts with those who think that absolute borderlines license disagreement, so that two competent teachers who disagree over a borderline case are licensed to disagree and endlessly dispute (Wright 1995, p138). This extends the thought of someone like Gallie, who thinks that some concepts are properly defined in terms of the endlessness of disputes about what they mean (Gallie 1955-6, p172). Wittgenstein thought there was general agreement about how the world is and how we behave. Gallie and Wright invert this. Sorenson disagrees with this inversion because he thinks beliefs about borderline cases should always be retracted. He thinks that to have a belief about a borderline case is like saying 'Pudding is not a solid but I believe it is.' The epistemic approach to vagueness is not alone in thinking that belief about borderlines is impossible. 
Supervaluationists think that a borderline statement is neither true nor false. They can't have a belief about a borderline case because of this. Educational assessment systems require consistency, completeness and simplicity to be adequate answer systems (Sorenson 2001, p29). Sorenson thinks we tend to overestimate the ability of answer systems to achieve these. We are used to being embarrassed by false determinacy because it is more common. Arguments against scientistic assessment, such as norm referenced answer systems, tend to accuse such systems of producing false determinacy. But false indeterminacy is an embarrassment too. 

9.17 is vagueness all in the mind?

 Some philosophers think that the sorites is just a psychological phenomenon. If so then the normativity of the sorites is due to purely psychological and not logical mechanisms. Goldman thinks this when he writes: '… they are predictable from a psychological perspective' (Goldman 1989, p150). But it can't be purely psychological. Goldman thinks that there are two psychological mechanisms at work simultaneously when the mind considers a sorites puzzle. One mechanism considers binary categories. Each adjacent pair in a sorites sequence is indiscriminably different and therefore taken to be identical. Another mechanism instantiates a principle of good continuation over the whole sequence. This leads to a contradictory belief in borderless transition. Yet this psychological patterning of beliefs is not restricted to vague predicates. So it is possible for precise ones to generate the same belief in borderless transition. Sorenson once thought that 'electron' was vague. He thought that electrons would each vary slightly in the way in which they fitted the term. Then he learned that all electrons are identical. He was wrong to think that the term was vague. He thinks that subjective vagueness isn't vagueness either. Scientists in the past believed in entities that subsequent science has proved don't exist. The old scientists' belief that these entities were vague is therefore a purely subjective thing. Vagueness is about things that have possible absolute borderline cases. Empty metaphysical things only have impossible ones. The principle governing this is simple: you can't think vagueness into existence. There are also cases where hidden vagueness doesn't trigger the same psychological mechanisms. And vice versa. Russell's famous barber paradox about the barber who shaves all and only those people who don't shave themselves is often thought to be vague but is actually precise. It isn't enough to reduce vagueness to behaviour either (Sorenson 2001, p30). Diane Raffman thinks that the psychology of the sorites is explained by Gestalt psychology. Wittgenstein was interested in this psychological mechanism whereby the mind switches from seeing something as one thing and then as another. 'Seeing as' is used to explain how in a sorites sequence the mind sees something as one thing until a point is reached and then it switches to seeing it as something else. The epistemic solution to the sorites shows that Raffman is confusing speaker meaning with statement meaning. Psychology makes someone believe things which contradict what is actually meant. Vagueness is also attributable to thoughts that are too complex to be thought. It is also a feature of logical falsehoods that are too complex to be detected without the brute force of supercomputers. A person just hasn't got the physical capacity to behave in the typical way towards sorites of this kind. Psychological and sociological accounts of vagueness fail. This is an important claim when considering assessment grades. Real vagueness is about correct categorization. Absolute vagueness comes about when there is a commitment to this. If a candidate is placed in a grading category there has to be a commitment to the correctness of that placement that makes the categorization absolutely obligatory. The genuine devil at the heart of a case of absolute vagueness is an absolute, obligatory overcommitment. 
Sorenson thinks that anyone caught in a genuine sorites is '… like a handyman who has promised to install a ten by ten carpet in a nine by nine room' (Sorenson 2001, p34). Relative borderline cases are genuine borderline cases; only here the over-commitment is relative to an answer system that can theoretically resolve the problem, rather than being absolute. 

9.18 vagueness isn’t meaninglessness

 Some argue that vague statements are meaningless. Incomplete criteria or incomplete meanings are meaningless because they are beyond our knowledge. In this way meaninglessness does seem to track accurately enough an absolute borderline case. But it misrepresents it too, because the distinction between 'what is the last C-ish mark?' and 'how long is the tail of a zong?' is lost. The extremism of this position is handled by other approaches to vagueness that reject the solution of Epistemicism. Supervaluationism approaches vagueness in terms of thinking about language as maturing through use. Meaning grows into vague language so that the potential for meaninglessness is mitigated by language users working it into sense. Supervaluationists such as David Lewis and Kit Fine agree that classical logic is too important to change (Lewis 1999; Fine 1975). They claim to accommodate classical logic but critics say they are disingenuous. They argue that 'Pudding is a solid' is neither true nor false but 'Pudding is either a solid or not' is true. This helps model absolute vagueness correctly to the extent that it models a person partially understanding a vague statement. It resists the nihilism of denying meaning. But it is disingenuous about retaining classical logic. By having truth value gaps it presents a picture where, instead of just two boxes for everything, one for true and one for false, it places a third where there is a gap in truth values, i.e. neither true nor false. This requires a mishandling of the ordering of precisification. Truth values in classical logic are assigned once we are clear which proposition is being asserted; supervaluationists settle which proposition is asserted only afterwards, via precisification. Therefore the approach mishandles the complexity of truth. Disagreement between grading experts may seem to have this form, however. The obscurity of whether any system is speaking to the question often leads to clarifications and supplements to initial statements in order to discover whether it does or not. But vagueness is about the obscurity of whether the system speaks to the issue at all, and this happens after it has been made clear what is being said. 
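 The supervaluationist picture can be set out compactly. A minimal sketch in standard terms (my presentation, not tied to Lewis's or Fine's own formulations):

\[
\begin{aligned}
&\text{`Pudding is a solid'} && \text{true on some admissible precisifications, false on others: no truth value.}\\
&\text{`Pudding is a solid or not a solid'} && \text{true on every admissible precisification: supertrue.}
\end{aligned}
\]

Supertruth is truth on all admissible precisifications. A borderline statement falls into the gap while classical tautologies remain true, which is why critics say the retention of classical logic is only apparent: excluded middle is preserved while bivalence fails.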

9.19 analytic/synthetic

 Current philosophy of language is suspicious of the distinction between statements that are made true by what they mean and statements that are made true by the world. Quine didn't think there was any need for the distinction (Quine 1951). Sellars didn't think there was the right kind of world to do the job the distinction required. Schiffer thinks that logic can change to accommodate physics finding out that the world is a quantum universe (Schiffer 1979). Quine wanted to replace truth with a pragmatic notion, such as 'convenient for science'. Schiffer thinks that any case of absolute vagueness can be defeated because it can be '…dissolved into a contextual counterpart signifying a particular indecision theoretically soluble by further enquiry' (Schiffer 1978, p78). Sorenson thinks that science has limits. The Universe is too vast for anyone to know much about it. The modularity of mind has pessimistic implications for anyone proposing that potentially we can know anything. Fodor (1983) and McGinn (1993) follow Chomsky in thinking that the mind has an analytic bedrock given to us by biology. If Chomsky, Fodor and McGinn are right then even if the analytic/synthetic distinction fails there are still a priori analytic limits to what we can think. These limits are not merely medical limits of human discriminability but are conceptual, pre-programmed by our modular mental architecture. 

9.20 williamson’s relative vagueness

 Williamson thinks vagueness is relative only to the speech community of humans (Williamson 1994). He thinks that his 'margin for error' principle gives us slippery slope arguments without committing us to absolute vagueness. A supreme knower could know the last noonish second. He thinks drawing a line is an attempt to avoid the instability of the chaotic nature of language use. He thinks it is an important defence against the instability of not being able to know which concepts are being used at any time. It is an approach that only makes sense if belief is voluntary. If all beliefs are involuntary then it can't handle the idea of absolute vagueness. The involuntary behaviour of belief helps explain why grading is normative. Sorenson thinks that belief attribution, including self-attribution, is 'part of an explanatory enterprise' (Sorenson 2001, p44). When grading we are in the process of explaining the world, and what I believe is influenced by what I ought to believe. I can't help but believe what I consider the best belief. I can't know a belief is the best belief but choose to believe something else. Rationality is when someone chooses. If I am grading someone I have to believe whatever best satisfies the desires relative to my belief about the situation. This is an important constraint on grading and is violated by any answer system that fails to be consistent with it. Williamson thinks vagueness is about not knowing which concept is being used (Williamson 1994). Where grades are precise then we can ask if any individual fits that category. Vagueness raises a new complexity where we aren't clear which category we are supposed to be using. This inexact knowledge is not enough to produce a sorites slippery slope because we might find ways of using other inexact knowledge to sort it out through immersion in a term's practices. The vagueness explained is only for artificial cases, not natural ones. 'Clever at maths' is not vague because of measurement errors. It is through chaos that vagueness arises naturally in our language, according to Williamson. His theory of language use is Wittgensteinian. The way different people use a word contributes to its unknowability because no one can track which meaning is actually being used at any one time. Rather like a weather forecast, predictions are affected by the butterfly effect. That meaning supervenes on use is how Williamson explains why we cannot discriminate precisely between concepts. In grading, we may both be saying that Milad is an 'A' grade but my 'A' grade concept draws a line before yours does. Williamson is accused of being anthropocentric. Tattersall thinks that we once shared the planet with 15 other types of hominid (Tattersall 2000). Williamson thinks that the human speech community is the only one he currently needs to be concerned with. He thinks that pockets of precision can be explained by natural kinds of the sort studied by science that stabilize boundaries. Measurement and algorithms also help to stabilize potentially chaotic situations. But Fodor thinks there could be a science of Sundays, or doorknobs. Williamson thinks there is a link between his Epistemicism and Supervaluationism. Both assume that vagueness is about not knowing exactly which language we are using at any time. The idea of precisifying concepts is the method recommended by supervaluationists to complete a language in a way that doesn't contradict past decisions by speakers. 
But the flaw in this approach is that I can't know whether all precisifications have been completed. Without knowing that, I can't know the proposition. Sorenson thinks that without knowing that the process of precisification is complete I can't know that something is definitely true (Sorenson 2001, p50). This is not to set the standard for knowledge artificially high. Animals have reliable expectations of complete enumeration. In assessment there is exasperation when criteria are not complete and loopholes and misunderstandings occur. Complete enumeration is required for contracts, deals and the maximization of quality controls. Similarly, Sorenson thinks that jokes, recipes, compliments, direction giving, instructions and craft proposals all exhibit the requirement of complete enumeration. Grading is no exception to this. And complete enumeration is impossible. If I can know without having to enumerate completely what I know, then it implies that I know that I know. The principle that knowing requires that one knows one knows is the KK principle. It is false, and it is an easily demonstrated falsehood. I know that Sorenson is a philosopher of vagueness. I know this when I am sleeping. But I don't know that I know it then. Williamson thinks it is impossible to enumerate completely because he thinks for every predicate there are an infinite number of precisifications. A finite being could never enumerate all of them. This approach blocks completeness for finite thinkers like humans by proposing that there are too many options for such a creature to handle. Sorenson thinks that the problem of complete enumeration is oversimplified from this point of view. Williamson and the supervaluationists reduce the problem to a single element. But Sorenson thinks that there are many blocks to complete enumeration and completion. Circularity, unbelievability and inexpressibility are also blocks on completion. But none of these is usually enough for absolute vagueness. 
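The KK principle referred to here is standardly written as follows. A minimal sketch in epistemic-logic notation (my presentation):

\[
\text{KK:} \quad K\varphi \rightarrow KK\varphi \qquad \text{(if one knows that } \varphi\text{, then one knows that one knows that } \varphi\text{).}
\]

The sleeping example is the counterexample: while asleep I still know that Sorenson is a philosopher of vagueness, so the antecedent holds, but I do not then know that I know it, so the consequent fails.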

9.21 spotty, untidy and scruffy: what real knowledge looks like

 The key idea, connected with the innateness of language and thought and supported by the ideas of Chomsky, Fodor (1983) and Pinker (1994), is that knowledge is not neat and tidy as Williamson assumes. Williamson assumes a tidy logic and, largely, an ideal knower. But innateness suggests that language and thoughts are evolutionary reflexes and are likely to be 'scruffy'. The 'scruffy nature of human knowledge' is a key element of Sorenson's ideas about knowledge (Sorenson 2001, p55). Sorenson thinks it is the untidiness of knowledge that vagueness inherits. The human face of vagueness is at odds with the tidy logic of unclarity that Williamson and supervaluationists espouse. Grading also inherits this general untidiness. The vagueness of a grade can't be tidied up and made neat. Alternatives to the epistemic approach to vagueness fail to model the completeness requirement. Sorenson thinks that what the alternatives do is only indirectly represent the indeterminacy of vagueness, proposing truth value gaps or gluts or the intermediate truth values of fuzzy logics. 

9.22 forced analytic error

 The scruffy nature of knowledge combines with a language that is consistent. Competent language use requires that we accept some contradictions as analytic truths. The sharp boundaries of things like grades cannot be known, and so no one claiming to know the superlative point of any grade will be believable. Sorenson thinks that this is a case where 'the epistemology of language collides with its semantics' (Sorenson 2001, p59). The grading system of any educational assessment system will have sharp borderlines. Yet anyone competent enough to understand what a grade is will be obliged to believe that the borderline is not precise. The obligation will not come from the world but from the fact of competent language use. It is an a priori obligation of what appears to be an analytic truth. Empirical warrant is not required. Being incorrigible in this way means that such beliefs can't be shown to be falsehoods. But falsehoods they must be. And they must be for anyone thinking about grades. Even God, thinking about our thoughts of grading, has to embed our own perspective and so is committed to believing a priori analytic falsehoods too. The ignorance of where the precise borderline is isn't about conflict between a theory and competence. This is a source of ignorance in assessment when competent assessments are overruled by a theoretical perspective that contradicts them. There are theorists of assessment who think that teachers can't properly assess without knowing a theory of assessment. This can set up the sort of conflicts I am thinking of here. But vagueness's ignorance is not a conflict of this sort. It is the competence of language use itself that produces the ignorance and generates belief in contradictions. There is no need either to think that there are obligations to produce probabilities for beliefs. There is no need for anything to back an assertion except competence of language use. Many things can have objective, external reasons supporting a belief. A grader may apply a rule by intuition but also draw on a critical level of thought to find justifications for the application of the rules applied intuitively. A grader thinks that Milad is an A grade and has a critical meta-belief about that belief that gives the basis of her belief. These are acceptance rules. These rules are subjective when it comes to language because, as Sorenson says, 'what makes a language my language is my relationship with it' (Sorenson 2001, p67). With vagueness nothing can prevent an intuition being wrong-headed nor the acceptance rule from being an analytic falsehood. Attempts to avoid the problem of belief by assigning probabilities instead are too costly computationally. There are too many beliefs to calculate probabilities for. 

9.23 grading and inconsistent machines

Sorenson has argued five main theses. Forced analytic errors are common. Language systematically passes off contradictions as tautologies. There are infinitely many pseudo-tautologies. Competent speakers ought to be permanently fooled by them. Precedent justifies this functional, massive inconsistency (Sorenson 2001, p68). Grading systems need to be universally dogmatic. If Sorenson is right then the sharp borderlines between grades are non-natural. They are a double bluff. Natural competence in language use obliges us to think contradictions are tautologies and therefore denies knowledge of sharp borderlines. Grading systems in education stipulate sharp boundaries, forcing the appearance of tautology denial. But the involuntary nature of belief means it is impossible for this appearance to be true. No one can believe the denial of a tautology and be competent in their language use. Grading systems therefore impose systematic disbelief on graders. If passed off as belief to deliberately deceive, then such activity amounts to equivocation. Equivocation to deceive is lying. Grading systems need to be decisive and so need to control the expressiveness of language. Tarski thought that the liar paradox was solved by inventing different levels of language, none of which were able to express anything that wasn't scientifically worth saying. In this hierarchy of language levels the liar sentence 'This sentence is a lie' can't be expressed. Ideas of restricting language's expressive power appeal to assessment engineers. The desire to model assessment on science is what has been called 'physics envy' and I have labeled scientism. Carnap's variety of 'physics envy' took the form of an extreme dream of sharpening language so that vagueness wasn't possible (Carnap 1950, ch1). The problem with all such dreams is that they produce a fatal expressiveness deficit. Such a language would not even be able to express its satisfaction in its achieved precision, because 'precise' is a vague term and would therefore have been eliminated. Frankly, the language would not be able to express much. Too much would be repressed. Assessments that aim to repress the expressive power of language are open to the same accusation. When, for example, Andrew Davis discusses the relationship between validity and reliability in assessment he is drawing attention to this point. He thinks that the price of good reliability is too much inexpressibility. Schiffer agrees. He thinks that English is 'semantically closed' and that 'When philosophers shut down part of the system for the sake of consistency, they violate conceptual norms' (Schiffer 1998, p75). Kant's antinomies in his 'Critique of Pure Reason' presented the idea that rationality was two-faced. When taking reason for a walk about the nature of the world, Kant thought there were always competing reasons for accepting conclusions that, taken on their own, would be decisive. So, for example, there was a good reason to think that time had a beginning. Equally, there was a good reason for thinking that time was infinite. His left-handed escape was to think that reason wasn't something that could be applied to the noumenal world but was only appropriate for the phenomenal world. Kant was a totalitarian sceptic about the possibility of knowing anything about the actual world. Gellner thinks that Kant's Romantic conception of rationality was one that detailed the structure of modern rationality (Gellner 1975). 
Schiffer’s approach to paradox is to accept the futilitarianism of Kant’s position without accepting the totalizing skepticism. Sorenson’s epistemic vagueness accepts Schiffer’s denial of skepticism. But he thinks Schiffer’s attempt to refuse scepicism is wrong because it requires conceptual impossibility. Schiffer introduces Chihara’s ‘Secretary Liberation’ paradox to explain his attempt to escape Kantian skepticism (Chihara 1979). Chihara asks us to consider a club for all and only those who are not permitted to join clubs for which they are secretaries. The club becomes so popular it requires a secretary. The question that seems to raise the paradox is whether the secretary would be allowed to join the club. Like the liar and barber paradoxes, we seem to be overcommitted to saying both yes and no. Schiffer’s point is to raise the possibility that language rules are like the rules of the Secretary Liberation club. If language is a system of rules and rule-following is a way of understanding what we mean when using language, then the obscurity of meaning, or its over commitment as analysed by Kant, is not because of misapplication to the wrong sort of world. It is in the nature of language to be perversely two-faced. If Schiffer is right then the incompleteness of criteria for rules is explained in terms of them leaving open options for decisiveness that can never be closed. Radical indeterminists about language, familiar in the field of post-modernist literary studies and prototypically associated with Derrida, are one way of construing the radical indeterminism. Wittgenstein’s ‘Philosophical Investigations’ may be read as a way of trying to show that language will never sharp boundaries for criteria for use. The idea of language games is designed to loosen belief in such a conception of language. In showing the problem with trying to fix a single sharp atomized context independent meaning on any sign which he thinks is fatally required by modern conceptions of assessment, Dworkin’s interpretive concepts presuppose the sort of determinism that Schiffer’s idea about the rules of language would support (Davis 2006). But Schiffer’s analogy between the rules of language and the rules for the Secretary Liberation Club show the flaw in the whole idea of dismissing absolute borderlines. The paradox shows that there couldn’t be such a club, and so there couldn’t be a secretary. Dwokin’s approach is to stipulate that the secretary of the club is excluded from the constitution of the club. But such a stipulation would involve a massive contradiction between the stipulation and the rest of the discourse. And it assumes that such a club would be even possible in the first instance. But such a club would imply that there could be such a secretary. There couldn’t. Therefore it is impossible (Sorenson 2001, p76-77). I think that this is a key issue for over-formal assessment systems. I think grading requires the radical violation of conceptual norms. I think that they result in substituting sharp homonyms for natural language. But I think that typical assessment systems are embarrassed in two directions. Over-formal systems, such as the prototypical psychometric approaches, are embarrassed by the indeterministic nature of language that makes them too unreliable. They try developing a system that revises language to constrain this alleged openness. 
Alternatives such as criteria referenced assessment and construct referenced assessment (Wiliam 1994) also take language to be indeterminate, but think that by stipulating notions such as constitutive context norms they are better at not violating the conceptual norms of such a language. Neither approach recognizes the possibility of absolute vagueness. This suggests that the idea of language presupposed by both conceptions is mistaken. Language is not indeterminate and in need of sharpening to produce reliability. This is a deep threat to formal systems such as psychometricism and any other forms of assessment assuming a form of 'physics envy'. But it is also a threat to assessment systems that think they are modelling genuine conceptual norms by assuming the openness and indeterminism of language. Considerations of absolute vagueness show that language is fully determinate. Therefore the contrary idea is an illusion. This suggests that both conceptions of assessment are at best incomplete and at worst just plain wrong. Neither of them can model absolute vagueness. Consequently they require either revision, to minimise their flaws, or abandonment altogether. 

 

CHAPTER 10: ANTI-INCREDULITY

 'To refute him is to become contaminated with unreality.' Borges, 'The Avatars of the Tortoise' 

10.1 INTRODUCTION

 This chapter returns to the meta-problem of vagueness, the incredulity of people in the face of its solution. It recognizes that the consequences for assessment examined in previous chapters are large enough to make people refuse to accept the proof, despite its logical validity. Firstly, the chapter addresses the thought that incredulity is based on thinking that vagueness implies an irremediable breakdown in communication. The chapter argues that vagueness is not a function of miscommunication by showing that if absolute vagueness is true then it must be true of a Language of Thought (LOT) as modelled by Fodor too (Fodor 1983). It shows that, because vagueness is not a type of ambiguity as some theorists think, assessment systems that attempt to disambiguate will still be left facing vagueness afterwards. Then it argues against the idea that incredulity may be based on resisting the accusation that decisiveness about absolute borderline cases is a form of lying. The chapter examines sincerity and stipulation and shows that the accusation is justified. The chapter draws further distinctions between relative and absolute borderline cases. It discusses the way absolute vagueness might be avoided by graders. It challenges Wiliam's reading of Austin in developing his theory of assessment (Austin 1962, Sorenson 2001b). It examines ways vagueness might be confused with various forms of ambiguity. Such confusion may motivate the thought that absolute vagueness can't be true, and so lead to incredulity at the solution to the sorites puzzle, motivated by the belief that disambiguation is usually possible, at least in principle. The chapter examines various ways in which we have to accept unknowability, in order to suggest that the unknowability thesis is not a unique case. The chapter examines supertasks as examples of epistemic hostility (Earman and Norton 1996; Laraudogoitia 2009; Black 1950; Benacerraf 1962; Thomson 1954, p55). It examines impossible objects as another kind of resistance to enquiry (Marcus 1981, Stalnaker 1984). It examines Moorean counterprivacy as another example (Moore 1950). 

10.2 THE ABSOLUTE UNKNOWABILITY OF VAGUE BORDERLINES

 Sorenson thinks that applying the principle of bivalence to a borderline case commits us to saying that a borderline statement is an unknowable truth or falsehood, because there is no accessible fact of the matter that could make it known. So 'coffee is food' is either true or false but no one can know which. A borderline pass student is either a pass or not but no one can know which. A borderline guilty person in the dock either is or isn't guilty but no one can know which. Other theorists think that this is wrong. For them a borderline case is one where one makes it true or false. These theorists replace discovering the truth with inventing it. What makes something true? Some think facts, or states of affairs. Critics think that this is a version of logical atomism that can lead to a type of verificationism and logical positivism. Most are unwilling to be logical positivists but say that there will be a truth maker of some stripe. There are many who conclude that a deviant logic is required to cover the need for something between bivalence and truth makers. Supervaluationists reject bivalence (Fine 1975) and propose a truth value gap. Fuzzy logicians think that a borderline statement is true and false to a degree (e.g. Machina 1976; Sanford 1975). Some think borderline statements are meaningless and so need a logic of meaninglessness (Hallden 1949). Intuitionists think they can explain borderline cases by giving up double negation elimination (Putnam 1983). Paraconsistent logicians think that instead of a gap there is a truth glut. Borderline cases are both true and false and so require a logic able to restrict the inferential implications (Priest 1987). Some philosophers think language, not logic, is the problem. Carnap wanted to eradicate inconsistency from language. Russell thought that this was impossible and so thought logic only applied to a Platonic language. There is general agreement that all natural languages are vague. Dummett thinks it covers all natural languages like dust. There are examples of precise languages. Official maths language, chess notation and programming languages are all precise. 
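 The rival treatments of a single borderline statement, say 'coffee is food', can be displayed schematically. A minimal sketch using standard characterisations (my summary, not quotations from the authors cited):

\[
\begin{aligned}
&\text{Epistemicism:} && \text{true or false, but unknowably so; bivalence holds.}\\
&\text{Supervaluationism:} && \text{neither true nor false; a truth value gap.}\\
&\text{Fuzzy logic:} && \text{true to an intermediate degree, for example } 0.5.\\
&\text{Paraconsistency:} && \text{both true and false; a truth value glut.}\\
&\text{Logic of meaninglessness:} && \text{neither, because the statement lacks meaning.}
\end{aligned}
\]

Only the first keeps classical logic intact, which is the consideration Sorenson uses to motivate it.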

10.3 fodor and lot

 Fodor thinks that language is not learned but acquired. His Language of Thought (LOT) is an explanation that rejects meaning holist views of language. It is of interest to philosophers of vagueness because many proposed explanations of vagueness gain credibility by making vagueness an effect of the inferential language learning and communicative acts that meaning holists invoke. If LOT is vague then the attractiveness of theories requiring holism to explain vagueness is diminished. This is a strategy Sorenson follows (Sorenson 1991). He thinks that LOT is likely to be vague because it has to have the same expressive power as natural language. By analogy it is vague. Evidence for LOT has to be that there is something natural-language-like in the head; if it is there, the evidence is likely to show a vague LOT. LOT sentences are propositional, although they allow for indexicality (Fodor 1981, p177-203). Propositions are vague, so LOT propositions must be vague. If I think Jerry is borderline clever it is wrong to think I sincerely believe something precise of Jerry. Saying I think Jerry is a borderline case of clever is a denial that I have an opinion that there is a precise n such that I now believe Jerry to have it. If we say that no LOT sentence is vague then we are left not knowing nearly all of our beliefs. Without vague thoughts logical service is lost. In some inferences it is only through understanding the vagueness of terms that one can detect the validity or invalidity of an argument. For this reason the Frege/Russell thought that logic only applies to Platonic languages is false. LOT sentences can't be ambiguous but they can be vague. Ambiguity is about deciding between different propositions. Vagueness is about deciding within a single proposition. But some think that LOT makes minds inhuman because a machine conception of 'minds as computers' would mean that they can't handle vagueness. Dreyfus argues that a language of precise definitions would create an unsustainable tension with the fuzziness of human concepts (Dreyfus 1972). But the fuzziness is vagueness and not ambiguity. Where a definition is vague, rules of applicability can be vague too without tension. Fodor says: 'The most that can be inferred from the existence of open texture is that if a formula expresses the truth conditions on P, then its truth value must be indeterminate wherever the truth value of P is indeterminate' (Fodor 1975, p63). Evidence for LOT is also evidence that LOT is vague. If someone tries to express a borderline case of a LOT concept then the person has difficulty finding the word, which sits 'on the tip of her tongue'. Introspection suggests that we have many vague thoughts. Sorenson thinks it unlikely and unattractive to think that vagueness only exists in natural language and not in LOT. It would mean that animals and people without speech had precise thoughts and that highly articulate people were vague. Definitions resist precision because LOT is vague and the original thought has to be preserved. Gist memory also suggests vague LOT sentences. We tend to remember the gists of sentences rather than their precise syntax, and these gists conserve vagueness (Sachs 1967, p437-42). LOT has to preserve our psychological 'poor balance on slippery slopes' (Sorenson 1991, p395). We are vulnerable to the fallacies of vagueness because we think in vague thoughts. If LOT is true then vagueness has to be explained in a way that doesn't require explanations only possible using social, public resources. There are bogus explanations of where vagueness comes from. 
One explanation thinks vagueness is about underspecificity rather than borderline cases (Sorenson 1989, p174-83). Thirty-something is not vague but is unspecific because ‘there are no borderlines afoot’ (Sorenson 1991, p396). Underspecificity is a function of generality. Generality is sometimes explained as a trade-off paying for communication. It doesn’t explain borderline cases. Russell thinks underspecificity is caused by diminishing information but again this does nothing to explain borderline cases (Russell 1923, p84-92). LOT sets conditions for vagueness that can’t be met by explanations that might seem to be reasonable for ordinary languages. Frege thinks that vagueness is caused by under-definition (Frege 1980, p139-52). However, LOT doesn’t have anything introduced by definition. Fodor thinks that most concepts are indefineable (Fodor, p285). LOT causes problems for Wittgensteinian ideas comparing languages to cities. Cities have some planning but some districts are ad hoc. Language similarly is only incompletely intergrated over time as accretions develop. LOT ‘has no collective aspect‘ (Sorenson 1991, p367). LOT requires methodological solipsism (Fodor p225-53). Language is not a growing thing but something given. Vagueness conceived as Epistemicism is the only theory of vagueness consistent with LOT. Sorenson is pleased that ‘I am one of the epistemic theorists… LOT implies the truth of my position but (thank goodness) not vice versa’ (Sorenson 1991, p389). Wright thinks we need tolerant predicates and this causes vagueness (Wright 1975, p325-66). From this Wright thinks we can learn language from paradigms. Some theorists think that we learn concepts from paradigms which are reinforced by each case. Weakening reinforcement causes vagueness (e.g. Quine 1960 p125-6). Prototype theorists can’t explain LOT like this because it isn’t learned. Also, a prototype can explain degrees of obscurity but most obscurity is remedial and not connected to possession of borderline cases. Finally, LOT is computational and prototypicality isn’t computational so tracing vagueness to prototypicality is going to fail to explain the vagueness in LOT. It might be argued that vagueness is caused by communication problems in primary language acquisition but this can’t work for LOT as LOT isn’t used for communication. Vagueness cannot be explained by fading usage in LOT because the meaning of LOT predicates doesn’t depend on use. LOT is triggered by environment but use can’t develop a concept that is hardwired. Sorenson thinks there are two kinds of borderline cases, neutral and opinionated kinds. A neutral case is one where the stakes are low and no one really cares. Cases are trivial, such as whether we should call a baby a surviving twin or not if a woman aborts one of her two babies. Opinionated borderlines are where people are competent but in dispute. However LOT isn’t amenable to the kind of explanation that says it is caused by different linguistic communities drawing different boundaries. Multiple language ideas are irrelevant because LOT is universal and purely solipsistic. Partisan borderline cases are the result of collective decisions which can’t affect LOT. If language expands to meet new needs through metaphor then metaphor might be thought to be the source of vagueness. Resemblance is a vague notion and metaphor is based on resemblance. So metaphor is a source of vagueness. But LOT is not metaphoric but literal because it isn’t a communication tool. 
LOT is also maximally expressive and so has no need to expand its expressive power. Therefore metaphor can’t be a source of vagueness in LOT. If LOT was reducible to mental images then the vagueness of mental images may be thought to be a source of vagueness. But LOT is not imagistic because ‘images never have truth values and so cannot be objects of belief’ (Sorenson 1991, p400). If evolution hasn’t completed LOT then perhaps vagueness is accounted by this unfinished, under-developed state. This can’t be used to revive partisan borderlines because evolution works on the individual rather than the group. If this explanation suggests that LOT has evolved slightly differently in each individual this would not be a cause of vagueness but of ambiguity because partisanship would be between different languages. LOT is committed to methodological solipsism and so it might be argued that its vagueness is found outside in the world and its history. LOT is also maximally expressive to us. An evolutionary approach postulates intermediary advantages at each stage of development but for LOT evolutionary explanation requires immediate payoff. There can’t be any artificial language that expresses our beliefs better than LOT because every proposition has to be accessible to us in LOT. Inefficiency is not a source of vagueness. If it was then selective pressure against waste would mean we’d evolve towards greater precision and less vagueness. Supervaluationism is a form of explanation that thinks vagueness is caused by fading use. Meanings are decided by the users deciding them, based on need and purpose. Supervaluationist logic adapts classical logic to accommodate this ‘legislative economy’ (Sorenson 1991, p401). They propose truth value gaps. But LOT isn’t a language depending on use. Nor is it a language that has a use for vagueness because it isn’t a language that progresses and revises. It is complete at all times. Vagueness loses the function assigned to it by supervaluationists. There is no piecemeal fixing of meanings. There is no open texture in LOT. Supervaluationism fails to handle the distinction between ambiguity and vagueness because disambiguation makes a vague statement true as well as precisification. And when dealing with the sorites their solution is one that turns on equivocation between small different meanings of a word rather than contradiction within a single meaning. Fine thinks that supervaluationism can sustain the distinction: he thinks vagueness is underdetermined meaning and ambiguity is over determined meaning (Fine 1975, p 266). But this solution still depends on interpreting propositions as being different propositions using the same word and so there is no tension. Once brought under a conflict of application of a single proposition then ambiguity is irrelevant and the problem is vagueness. The gap/glut usage is about vagueness not ambiguity. Supervaluationists think that intentions of users or degree of similarilty between readings are relevant to the logic of vague terms but these are not reasons for different logical treatment. LOT makes a distinction between vagueness and ambiguity not just with logical consequences but also with empirical consequences. Ambiguity tends to trigger longer reaction times the more ambiguous a proposition becomes. Vague statements don’t have longer reaction times as they become vaguer. There is therefore a psychological difference between ambiguity and vagueness. Incoherentists think that vagueness eradicates the usual rules governing use. 
They think that borderline statements are inconsistent, so that statements that seem invalid are actually valid and some that seem valid are invalid. Unger and Quine think that because language is acquired in an ad hoc way there are conflicts between rules of use. LOT is not so acquired and so incoherentism is ruled out. Any supplemental explanation drawing on the idea of a divided linguistic community is also inapplicable because of the solipsistic nature of LOT. Many-valued logics fail to sustain classical inferences and so are again inapplicable to LOT (for example, many-valued conjunctions are standardly evaluated with the minimum rule, so that if p is assigned the value 0.5 then not-p is assigned 1 − 0.5 = 0.5 and the straight contradiction ‘p and not-p’ receives min(0.5, 0.5) = 0.5 rather than the classical value 0). Psychiatrists interested in this approach appeal to the idea of ‘clear cases’, but the contrast between clear cases and vagueness is not the right contrast. Some things, prime numbers for instance, can be assigned degrees of typicality without vagueness (Armstrong, Gleitman & Gleitman 1983, p263-308). Any theory that associates vagueness with use is inapplicable to LOT. Partisan borderline cases and fading usage are the commonest approaches and they fail because LOT does not depend on use for meaning. An epistemic approach to explaining vagueness is applicable to LOT because it holds that borderline statements conform to standard logic and that many of them are true. The theory implies that there is semantic uncertainty at borderline cases. Sorenson thinks that this indicates that language is a source of many hidden truths. Putnam’s twin earth discussion challenged the idea that meaning was a priori. After Putnam, theories of meaning assumed that meaning was known a posteriori. Epistemic theory holds that borderline cases are unknowable and so known neither a priori nor a posteriori. Many people think that the amount of semantic uncertainty implied by the theory is too much and that the theory is too extreme. Sorenson thinks that the situation is analogous to other once unimaginable theories that gradually became imaginable; the theory will gradually gain in appeal until it is finally accepted. It is like Darwinism, Alfred Wegener’s theory of continental drift, heliocentric astronomy, non-Euclidean geometry and subatomic physics. When faced with an unknowable, people like to ask why it is unknowable. There are things we don’t know because it is useless to know them. Knowledge is expensive, so when there is no benefit we revert to ignorance. Where useless knowledge exists it is understood as a by-product of useful knowledge. Useless knowledge is anomalous, ignorance is the cheapest state, and knowledge, being expensive, needs explanation, usually in terms of usefulness. Sorenson thinks ‘knowledge is diagnostic of utility, not a cause of it’ (Sorenson 1991, p408). In these terms the ignorance of borderlines is expected. Threshold knowledge would have to be a by-product of knowing, and the epistemic theory suggests that no such by-product arose. Epistemic theory supports the idea that even if there were sharp thresholds we would not know them, and this would be systematic ignorance. Another reason for accepting epistemic theory is that we tend towards minimising error rather than maximizing truth. Mistakes are worse than failures to know things. When dealing with tampered evidence the protocol tends to be to throw away the whole body of evidence and start again, even though there is likely to be truth in the flawed evidence. It is cost effective to do this. 
Similarly, threshold hunting is likely to be costly and error strewn. It is cost effective for organisms to throw away knowledge of thresholds in order to maintain efficiency and keep down the costs of reliable knowledge utility. This sort of thinking is reflected in acceptance rules in science journals, in theories of colour error in seeing (Wald 1972, p94-103) and in the thinking of Duhem. Ignorance could be explained as a way of limiting search space. Chomsky when discussing how language is acquired thinks that there are too many theories that would fit incoming data. A completely open mind would not be able to select a unique theory. Because of this he thinks there must be an inbuilt constraint, a mental architecture. ‘Intrinsic principles of mental organisation permit the construction of rich systems of knowledge and belief on the basis of scattered evidence. Such principles, which constitute an essential part of human nature, also determine which systems will be more accessible to the inquiring mind, and may, indeed, impose absolute limits on what can be known’ (Chomsky 1971, p49). Universal grammar is his proposal for engineered absolute ignorance so that acquisition of language is possible. The ignorance proposed by epistemic theory can be seen as a similar blindspot required for meanings to work with the flexibility and range they have. Game-theorists also show that there is a role for ignorance. It can be in everyone’s interest to maximize ignorance. In the horror film ‘The Midwich Cuckoos’ the children are destroyed through the professor realizing that the children were able to mind-read. He had to force himself to become ignorant by thinking about a wall so that the children were unable to know what he had planned. It could well be that knowledge of thresholds is strategically in everyone’s interest to be in the dark about. If it were harmful we might expect thresholds to become knowable over generations. There may be social benefits. Women have developed biologically to ensure that men are ignorant of when they are most fertile. This brings about the social benefit of ensuring that men adopt a strategy of fidelity (Halliday 1980). Ignorance of borderlines may well be of benefit; strategic judgment may be enhanced because of ignorance of where a line should be draw. Ignorance may be developed to prevent misadventures. Too much fine-grained knowledge may prove to cause difficulties that ignorance prevents. Artificial Intelligence models of computerized theories of mind require programs that forget information in order to achieve required levels of human ignorance. Vagueness may be connected to randomness. Randomness can be linked with the creation of stability in neurological processing. Apparently, healthy people have turbulent brain waves and people having a fit have smooth patterns. This is a sort of resonance effect explained using catastrophe theory, of the sort that explains why soldiers march out of step over bridges. The above were biological reasons for ignorance but there is a sense in which vagueness is about logical impossibility. Sheffler disagrees and thinks that there is no reason to think that in the future thresholds may be discovered. Sorenson’s epistemic theory doesn’t think that this is possible because the limit is logical and a priori. The theory resists the way other theories deny that there is a fact of the matter to be known in a borderline case. 
It resists the switch from asking ‘What is the answer?’ to ‘What should count as an answer?’ It denies the switch from seeking the truth to inventing a pragmatic response, from moving from a process of discovery to one of invention. In assessment this is a denial of the supposition that threshold identity is absolutely indeterministic. An assessor who genuinely thought this would be lying if she then made a decision. Stipulation and invention are disingenuous in cases of indeterminacy. The problem is one of sincerity of the Moorean kind. Wiliam and construct reference adherents cannot create truth value gaps by putting truth value statements under our control. An umpire who calls a decision has to call it sincerely. She cannot say: ‘I believe the batswoman is out but I have no belief one way or the other about whether the batswoman is out’. An educational assessor is similarly constrained not to lie. But the assessment system requires a verdict. From the perspective of the system an assessment has to be made. There are no explicit justifications for the system requiring assessments to be insincere and dissembling. Sincerity is consistent with a decision that the assessor may not wish to endorse. If the system is using criteria rules that she disapproves then she may follow the rules whilst wishing heartily that the rules be changed. Teachers are often in the situation where they are having to assess their students using rules of assessment that they disapprove of. In such circumstances they are acting sincerely. They are committed to failing a student even where they think she is a pass and vice versa in such circumstances where they disagree with the rules. 

10.4 Sincerity

 A common problem is the confusion of the test for a property with the property itself. The test for ‘good at English’ should not be identified with the property ‘being good at English’. If this happens then asking which test is a more reliable test is pointless. Wiliam reads J. L. Austin’s theory of performatives so that insincerity is impossible both in terms of commissive insincerity (believing the opposite of what one says) and omissive insincerity (not saying what one really believes to be the case). Stipulation precludes lying if saying makes it so. Fiat precludes absolute vagueness. Therefore there is no vagueness. If fiat is the way decisions are made in high stakes assessments then there is no vagueness in these assessments. It is not the case, however, that stipulations can serve to eradicate absolute vagueness. Borderline cases are part of the meaning of the term, and stipulation of a sharp borderline changes the subject. Stipulation in high stakes assessment is a case of lying: “A lie is a statement made by one who does not believe it with the intention that someone else shall be led to believe it” (Isenberg 1964, p466). If you believe coffee is an absolute borderline case of food then you cannot sincerely believe coffee is food. There are experiments showing that deciding to decide even when you cannot know the answer produces a psychological response Ekman calls ‘duping delight’ (Ekman 1985, p76-79). According to Ekman and Sorenson (Sorenson 2001b, p390), such people exhibit the same behavioural cues as are found in liars. High stakes educational assessment systems have created the need for sincerity and decisiveness in all cases, including absolute borderline cases. This creates a predicament characterized by moral conflict between the duty to be decisive in all cases and the duty to be sincere in all cases (Sinnott-Armstrong 1988, p39-52). In absolute borderline cases sincerity and decisiveness conflict. Decisions made about absolute borderline cases are lying verdicts. Assessors like to suppress vagueness because they do not want to lie. Decisiveness trumps sincerity. Assessments for high stakes are conducted in an atmosphere of high trust and morality. The moral conflict at the heart of the system makes its high moral atmosphere hypocritical. Some teachers refuse to assess because they want to avoid this conflict. Some will see the benefits of assessment but still want to minimise adjudication of absolute borderline cases. Recall from the previous chapter the idea that relative vagueness can have a purpose. So, for example, Raz thinks the indeterminacies are relative borderline cases caused by truth value gaps (Raz 1984). For Raz, the indeterminacy that law creates is relative. Asking questions that allow for relative borderline cases is a way of controlling the way a question is answered. It is something that can be used in an educational context also. It allows for flexibility in the answer. But this is not a matter of absolute borderline cases. A question will prefer to ask whether enough salient knowledge has been shown by a candidate to pass rather than ask exactly how much knowledge was required. It may be suggested that the purpose of this thesis is to throw absolute borderline cases into educational high stakes assessment systems in order to make current assessment systems break down. 
It may be thought that this would be a good thing at the moment. But this would hardly justify saying that absolute borderline cases had a function in such a system. There may be ways of using vagueness politically. But systematic utility is not functionality. If relative borderline cases can be useful it is through the recognition that supplementary procedures and premises might resolve undecidable cases. Absolute borderline cases are resistant to this sort of tinkering. They do not respond to new information. An assessment that is decisive in an absolute borderline case could have the force of precedent but cannot be reliable. The assessment is not based on evidence and due consideration because it is determining something that is incapable of such decisiveness. In an assessment it is where there is a dispute that a borderline case is easily recognized. Sorenson thinks that in cases where it is recognized the best strategy for an assessment panel is to avoid answering any question about its borderline status. The best tactic is to recast the questions so that the question about the borderline case is avoided. This is analogous to the question of when a fetus is a baby. This is a question about an absolute borderline case and so has an unknowable answer. In the US, the Supreme Court in Roe v. Wade changed the question about abortion into an issue about privacy to avoid having to rule on an unknowable issue. Teachers cannot answer unknowable questions decisively and sincerely because no one can, not even God. But because decisiveness trumps sincerity, teachers have to answer insincerely when faced with borderline cases. There are irrelevant considerations that might be used to make the decision, such as working out which decision has the best pedagogical consequences, which will make the overall grade distribution look best, or which decision will constrain grade inflation. None of these reasons justify the decision in terms of which grade should be assigned to the student. The teacher has to lie in an absolute borderline case. The arbitrary decision is based on other things that can be relevant to the preservation of the system. Endicott thinks that this is why tossing a coin to decide the fate of the student is unacceptable even though it is no more arbitrary in terms of answering the question as to which grade the student’s performance merits. A teacher making a decision in a borderline case might tell the student that her decision was insincere. In this way there would be no deception. It would be a high risk strategy, especially in a high stakes assessment. Some people hope that asserting something does not commit one to the truth of the assertion. Woozley writes: ‘Reasonable, hardheaded lawyers can properly discuss the question (and disagree with each other in their answers to it) what the right answer to a question of law is, even though they agree that there is not a right answer to it – yet. So having grounds for asserting p does not imply that p has a truth-value – unless to say that p is true (false) just means the same as saying that we have better (worse) grounds for asserting p than for asserting not-p. But it quite obviously does not’ (Woozley 1979, p30). However, this would lead to asserting contradictions. It would imply that one could assert ‘Milad passed the exam and it is neither true nor false that Milad passed the exam’. It is also usually thought that belief aims at the truth. 
Wiliam uses the performative aspect of adjudication to resist the claim that in absolute borderline cases the assessor is lying. Saying makes it so. But Austin thinks that verdicts, including grading, ‘…consist in the delivering of a finding, official or unofficial, upon evidence or reasons as to value or fact, so far as these are distinguishable. A verdictive is a judicial act as distinct from legislative or executive acts, which are both exercitives. But some judicial acts, in the wider sense that they are done by judges instead of for example, juries, really are exercitive. Verdictives have obvious connections with truth and falsity, soundness and unsoundness and fairness and unfairness. That the content of a verdict is true or false is shown, for example, in a dispute over an umpire’s calling “Out,” “Three strikes,” or “Four balls”’ (Austin 1962, p153). Disputes over whether a call was correct do not focus on the performative aspect of the decision but rather on the evidence used to justify it. In such cases an umpire can lie. If she thinks that the batswoman was not out then calling her out is a lie. Wiliam ignores the bookkeeping aspect of performative acts. Umpires, assessors and judges make calls that are empirical reports. Parallel to these are the unofficial histories of such verdicts. This is what Sorenson calls ‘unregimented history’ (Sorenson 2001b, p403). When faced with absolute borderline cases an assessor is not able to rely on performative act theory to help them out. It would be better to work hard to avoid being put in a position where such cases have to be decided. Educational assessment systems might be organised to ensure that no students end up near thresholds. The use of precise grading systems is an attempt to do this. It is something that legal systems of adjudication cannot do as easily, according to Sorenson (Sorenson 2001b, p404). But a precise grading system cannot avoid being an insincere recording device if what is being graded is vague. Some teachers do not think that this is the case. They agree with Dworkin that all vagueness is relative (Dworkin 1977) and that there is always the possibility of a known right answer. Much discussion about the functionality of vagueness in assessment is not about vagueness but generality. Generality is where a single term is used to cover several things. It has the same functionality in assessment as it has in ordinary discourse. Vagueness can be mistaken for indexicality. Indexicality is a form of ambiguity and so has the same sort of functionality as ambiguity. The meaning change of indexicals is systematic but is not a function of vagueness. Polysemy (where a term has many senses) and amphiboly (ambiguity through syntax) are unsystematic kinds of ambiguity. The trade-off between explicitness and efficiency is at the heart of the functionality of these kinds of ambiguity. Again, they are not relevant when discussing vagueness. Vagueness in educational assessment arises because assessments use vague predicates, not because we want the vagueness. Much functionality assumed to be a function of vagueness is actually not so. Hart discusses the way vague terms allow for extension of knowledge but his examples are about generality not vagueness. Generality can bring about flexibility in the use of a term and this is not due to borderlines. Wittgenstein thought family resemblance was a kind of vagueness but it is actually a mode of generality where borderline cases are not the point. 
Where vagueness is understood as a species of generality then vagueness can serve to have the same functionality as generality but in such cases we are not discussing absolute vague cases. Borderline cases are not purely psychological but are associated with epistemology. If it were just psychological then vague behaviour would suffice. But Sorenson imagines cases where psychologically we sort a predicate into clear positive cases, clear negatives and borderline cases in between shaded appropriately without the predicate being vague. So vagueness requires a normative aspect as well as the psychological. A subjectively vague assessment system is not an actual vague assessment system. Is current high stakes assessment based on a subjectively vague system where assessment grading is subjectively vague but not actually vague? If a grade is actually self contradictory, then the laws implying a grading system’s existence only have subjective vagueness. This mirrors Sorenson and his argument from the contradictory nature of God’s existence. ‘If God’s existence is contradictory, then the laws that imply His existence only have subjective vagueness’ (Sorenson 2001b, p409). But Raz has shown that grades are not subjective and that they have normative force. Vagueness is a feature of a Razian objective system. The development of the summative assessment system for the UK’s school system sketched above is familiar to educationalists. The thought of this thesis is that it is the development of an impossible object or impossible figure. Impossible figures were introduced to scientific literature in 1958 (Penrose & Penrose, 1958). Visual illusions have presented these figures through numerous publications since (e.g. Ernst 1986, Escher 1967, Uribe 1978, Mandelbrot 1977, Hoffman 1998). 

10.5 Impossible Objects and Supertasks

 The resulting classification system resembles a colour spectrum, another standard way of classifying objects. A colour spectrum is an impossible object even though it is both a familiar and an acceptable way of classifying. An impossible object does not have to be unfamiliar or impracticable. In language a contradictory statement can be grammatical and therefore meaningful, but necessarily false. Chomsky makes a related point: ‘Colorless green ideas sleep furiously’ is perfectly grammatical yet violates semantic rules. Hume’s prototypical theory of mind, his Theory of Ideas, was thought by Hume to be constrained by the impossibility of forming the idea of impossible things, such as mountains without valleys (Hume 1739-40, p32). But thinking a thing and its being possible are different. The mountain without a valley may be an impossible object but the sentence represents its logical impossibility well. It is necessarily false, and an assertion of its existence is also false, because the world is not something that can be self-contradictory. The Moorean sentence that so intrigued Wittgenstein is both grammatical and semantically consistent. ‘I went to the cinema on Friday but I don’t believe I did’ is possibly true but impossible to assert. The assertability derives from the assertability conditions of Gricean implicature, whereby although ‘…some maxim is violated at the level of what is said, the hearer is entitled to assume that maxim… is observed at the level of what is implicated’ (Grice 1981, p33). Although the semantics of the sentence are logically consistent, what an assertion of it implies is a hidden contradiction of belief. Sorenson notes that in an episode of Star Trek, Captain Kirk shows he is aware of the systematic use of inconsistency in our thinking when he exhorts his men to investigate the possible and then, failing that, the impossible. A grading system may also be likened to a supertask (Laraudogoitia 2009). The prototypical example of a supertask is the one imagined by James F. Thomson with his puzzle of Thomson’s lamp (Thomson 1954-55). It is often considered a variation on Zeno’s paradox. He imagines a task that requires a quantifiable infinite number of operations occurring sequentially within a finite period. The lamp has a switch with just two positions, on and off, and Thomson imagines switching it on and off, beginning with a one-minute interval and then halving the previous time interval between switches each time, all within a two-minute period. It is indeterminate what position the switch is in at the end of the two minutes, and indeterminate whether the light is on or off. Thomson thought that the working of the lamp is contradictory and so his lamp is an impossible object. Black’s infinity machine is a machine that can carry out an infinite number of tasks within a finite time, and so he thinks it is an impossible object (Black 1950-51). Benacerraf thinks that neither Thomson’s lamp nor Black’s infinity machine is an impossible object, because the end of an infinite series does not have to share the properties shared by the series prior to its completion (Benacerraf 1962). For example, each truncation of the decimal expansion of 1/3 – 0.3, 0.33, 0.333, and so on – is a number less than 1/3, but the limit of the series is exactly 1/3. However, epistemological supertasks are restricted to the domain of possible psychologies, and these do not justify eradicating the perception of the situation as part of the problem. 
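Benacerraf’s point, and the reason the supertask fits into a finite time at all, can be put in a single worked line. The arithmetic is standard and is added here only as an illustration of the argument just described:

\[
1 + \tfrac{1}{2} + \tfrac{1}{4} + \tfrac{1}{8} + \cdots \;=\; \sum_{n=0}^{\infty} 2^{-n} \;=\; 2 \text{ minutes},
\qquad
0.3,\; 0.33,\; 0.333,\; \ldots \;\longrightarrow\; \tfrac{1}{3}.
\]

Infinitely many switchings are completed within the two minutes; yet just as the limit 1/3 is not itself less than 1/3 although every truncation is, the state of the lamp at the two-minute mark need not inherit any property shared by every stage before it.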
Black and Thomson, contra Benacerraf, assume that the state of the system is a logical consequence of states it has been in before, that properties shared by the partial sums of a series are shared by the limit of those sums and that properties shared by a succession are shared by the limit. In systems requiring human thought, Sorenson thinks that these assumptions are hard-wired even if they are a priori false. A grading system is an epistemological object that is not intrinsically impossible but is one that has to be conceived as such by intentionality of any sort. Hence its quality of absolute unknowability of thresholds. Supertasks are paradoxical but not because of inherent inconsistency of the notion of a supertask per se. They are paradoxical in the way Moorean sentences are. This view of supertasks is adhered to by Earman and Norton (1996) according to Laraudogoitia 2009. The fault is in ourselves. I stay with considerations of supertasks to illustrate more about the grading system comparison. There exist Thomson lamps in reality. The lamps yield different states at the end of the process depending on the model employed to and this is taken to show that Benacerrafic reasons work: the final state is independent of the previous sequences. Sorensonian absolute vagueness doesn’t allow for physical determinacy to emerge out of its indeterminacy because it proposes an absence of any fact that would make a threshold true. The perceived absence of a natural limit is contradicted by the principle that all propositions in natural language are necessarily bivalent. Black’s infinity machine seems physically impossible because it would require objects to be simultaneously both permanent and continuous when these principles seem to be contradictory in this context. In the classic film ‘The Fly’ objects are transported by going out of existence and then reappearing elsewhere. This contradicts the principle of permanence. The principle of continuity requires that there is a precise position for every object. Black’s infinity machine, like the teleportation machine in ‘The Fly’, seems physically impossible. In the novel ‘Tristam Shandy’ the diarist hero is involved in a supertask: it takes him a year to write an account of each day in his life: as a mortal his task will never be completed but if he lives forever every day of his life will be completed because there will be a year that corresponds with each day. He won’t live forever and so he won’t complete his task. An impossible object embeds an illusion, however. Escher’s picture ‘The Waterfall’ is an example of an illustration of an impossible object. Ordinary onlookers see the picture as giving contradictory evidence of perspective simultaneously. The result is that the onlooker is perpetually baffled by the picture. The perplexing nature of the illusion continues even after it has been explained. The illusion is resistant to cognition. This contradicts people who think that once an inconsistency is understood as such it can’t be believed. Marcus thinks that explaining inconsistency dispels belief in it (Marcus 1981). Stalnaker thinks belief can only be in non-empty possible worlds (1984). This is a familiar but false attitude. Sorenson has a Cartesian infallibility argument proving that everyone believes at least one contradiction. Essentially he argues that to mistakenly believe that it is impossible to believe the impossible would itself be a belief in an impossibility (Sorensen 1996). 
Literature, film and the visual arts have many examples of impossible objects. The Aleph in the Borges story of that name is an object that contains all perspectives of the world. The Disc of Odin is a coin with only one side. The Library of Babel may be an impossible object, containing as it does every arrangement of every language; if some arrangements are inconsistent then impossibility will result. In Kafka’s The Trial the bureaucratic system the protagonist finds himself trapped inside is a labyrinth which suggests a never-ending maze. In The Castle the road leading to the castle always seems to be advancing upon the castle and yet getting no closer to it, like a version of Zeno’s paradox. In the Polish film The Saragossa Manuscript, an old favourite of the directors Buñuel and Scorsese, the protagonist finds himself in scenes that have not yet happened. The Czech animator Jan Švankmajer’s Alice is a sequence of impossible worlds, faithful to the impossible worlds of Alice in Wonderland and Alice Through the Looking Glass. As with his Czech counterpart in literature, Franz Kafka, his impossible object is sinister. The painter Bruegel the Elder painted an impossible gallows on which sits a magpie. Jos de Mey has a painting of two Bruegel-like peasant characters running towards a large impossible arch of stone. Marcel Duchamp’s Apolinère Enameled (1916-1917) depicts a little girl enamelling an impossible bed. There are whole websites devoted to illustrations of impossible objects. In music too there are impossible objects. Roger Shepard has created an ever-rising tone; it can be heard at the illusion-works.com website. Diana Deutsch creates auditory illusions and paradoxes (Deutsch 1972, 1978, 1979). There are impossible mathematical objects; Roger Penrose’s triangle is one such. Impossibility is a condition of a priori inconsistency. A priori inconsistencies strike us as more peculiar than a posteriori inconsistencies because we are less likely to patrol them. Inconsistency is familiar and at times inconvenient. Sometimes it is not noticed. The world is not inconsistent, because worlds are not the kind of thing that can be inconsistent. Inconsistency is therefore a feature of representation. A grading system is a classificatory representation. It is not surprising, on reflection, that it turns out to be inconsistent. But inconsistency is constrained because it is considered inaccurate. This constraint drives the reliability constraint in grading systems. Detected inconsistencies are embarrassments to high stakes assessment systems. These illusions survive our knowing that they are illusions, as does the colour spectrum. Fodor thinks that our mind is made up of a rag bag of old stratified reflexes (Fodor 1983). Sorenson (2002, p78-93) thinks that these reflexes help us to understand why we are subject to some illusions and why they remain even after we know that they are illusions. Modularity of mind is the idea that each reflex is made up of homunculi, little people, each assigned a specific task. Functional theories of the mind, drawing on computation, find this an appealing way of explaining how the mind works. Daniel Dennett uses this approach to say that the consciousness of consciousness is an illusion. Consciousness is an impossible object for Dennett. In these theories, each homunculus is assigned a task simpler than the overall task to be completed. The theory hopes to be able to delegate tasks to simpler and simpler homunculi until all that is left is physics. 
This has not happened yet, so it remains a research programme, but one which is very much alive (Fodor 2009). However, the tasks assigned to the homunculi have evolved, and the coordination of tasks is rather haphazard and imperfect. Richard Dawkins thinks that evolution is haphazard and imperfect because evolution cannot start with a clean slate and develop the optimum design but has to adapt pre-existing materials. The mind is built using the same evolutionary principles. To explain how a colour spectrum is understood on this account, we are asked to imagine that the spectrum is being processed by competing homunculi, each concluding its different task. In the case of a colour spectrum, for example, one homunculus is tasked with making sense of the perception of the whole spectrum and concludes that there is red at one end and non-red at the other, and so specifies a transition between the two. Another is tasked with registering at the micro level and notes that each adjacent part of the spectrum is the same colour, concluding that there is no transition. Contradiction! The two messages together conclude that there is a transition and that there is no transition (a toy sketch of this two-module set-up is given at the end of this section). There is no coordinator to sort out these two different thoughts about the spectrum, and so the mind is permanently thinking contradictions. The perception is therefore one that sees both transition and no transition (Hurvich 1981). Being a reflex, this perception is persistent even after the problem is explained. Explaining is not ‘explaining away’ because there is nothing to remove the a priori working of the mind that is structuring the perception. Being reflexes of our thought, these are examples of a priori belief in contradictions. Sorenson thinks that if we accept a priori thoughts then there are good reasons for believing in a priori contradictions. If we assume our language is modular then it is explicable using this computational model just as perception is (Fodor 2009). The homunculi working out the best grammar with which to understand sentences will not always be coordinated with each other, owing to the accidental, ad hoc nature of evolutionary design. Just as with perception, Sorenson thinks we should expect linguistic a priori illusions caused by homunculi failing to coordinate their findings. The way we think is therefore likely to be full of a priori illusions that are cognitively resistant. Some philosophers disagree. They think that the inconsistent belief in colour spectra is a special feature of perceptual belief and cannot be generalized. The fact that the illusion of borderless transition persists is taken as proof that perception cannot utilise concepts and so is separate from other types of believing (Crane 1988). Other philosophers disagree with this because they think that the idea of contradiction does not make sense without involving concepts. They argue that there are indeed two concepts competing to make sense of the spectrum but that there is no need to think that we ever assign one content to the competing understandings. What we see is disagreement between two consistent representations. So, in his discussion of the Waterfall Illusion, which occurs when a person stares at a waterfall for a time and then looks at rocks, causing the perceptual illusion that the rocks are moving, Mellor writes: ‘One of these two perceptual experiences gives us the corresponding belief, say that a doesn’t move, which then suppresses the rival inclination to believe that it does’ (Mellor 1988, p149). Sorenson’s position disagrees with both Mellor and Crane. 
The inconsistent process just is the normal process of perceiving the rocks. It is exactly the same process at work when we come to believe in the grading system. What is striking is the abnormality of detecting the inconsistency inherent in the process. Such inconsistencies are common but usually remain undetected. The abnormality of the assessment system makes us detect incoherencies that are systematic. Sorenson thinks there is evidence for believing that all animals (as well as humans) tolerate inconsistency systematically. He notes that no animal camouflages itself using inconsistency dissonances; if animals did not tolerate inconsistency, then inconsistency could provide cover from potential predators (Sorenson 2002, p21). Psychologically we are prone to round off in the same way pocket calculators round off 1 divided by 3 to a number less than a third: we treat 0.3333 as a third even though 0.3333 multiplied by 3 is 0.9999, not 1. Although this psychological heuristic is usually undetected and harmless, it is a source of inconsistency. When the stakes are raised, mere inconvenience becomes more serious. Our psychologies are not available for modification to accommodate the impossible. A pupil of mine once denied that the Penrose Triangle was impossible because his dad had built one out of wood. His devious brilliance was a matter of treating the drawing as merely a two-dimensional set of lines on a paper plane and then converting them into a three-dimensional interpretation. This is always possible (e.g. Uribe 1986). Resistance to the idea of grading systems as inconsistent representations can take this approach. The attainment of total reliability is achieved by psychometric means or by complete interpretation, so that the indeterminate becomes determinate. The availability of this solution to the sorites is compromised by a psychological constraint: the absolute borderline cases of grade transition are cognitively impenetrable. Sorenson discusses experiments that have calculated what an alien might perceive if capable of perceiving four-dimensional topographies. The experimenters understand the topographies geometrically but cannot visualize the objects themselves (Kim 1978). Sorenson thinks that the fact that we can think about impossible objects and take seriously some version of the research programme of computational modularism should help us believe that we necessarily hold inconsistent beliefs. Philosophers find this an incredible belief, but it helps explain why the development of validity and reliability and a summative assessment system such as the one described above appears to be a legitimate approach to classifying educational performances. 
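The two-module treatment of the colour spectrum sketched above can be put in the form of a deliberately crude toy model. Nothing in it is drawn from Fodor or Sorenson: the module names, the numerical hue scale and the discrimination threshold are all invented for illustration, and the sketch claims only to show how two independently reasonable verdicts can jointly contradict one another.

# Toy illustration (hypothetical names and numbers): two independent 'modules'
# inspect the same discretised spectrum and return individually plausible but
# jointly contradictory verdicts.

# A spectrum sampled as hue values from 0 (clearly red) to 100 (clearly non-red).
spectrum = list(range(0, 101))

# Smallest hue difference the perceiver is assumed to notice.
JND = 2


def whole_spectrum_module(samples):
    """Macro-level homunculus: compares only the two ends of the spectrum."""
    return abs(samples[-1] - samples[0]) > JND  # True: 'there is a transition'


def adjacent_patch_module(samples):
    """Micro-level homunculus: compares each patch with its neighbour."""
    no_visible_step = all(abs(b - a) <= JND for a, b in zip(samples, samples[1:]))
    return not no_visible_step  # False: 'no transition anywhere'


transition_at_macro_level = whole_spectrum_module(spectrum)   # True
transition_at_micro_level = adjacent_patch_module(spectrum)   # False

# With no coordinating module, both verdicts stand: a transition is registered
# at the macro level and denied at the micro level.
print(transition_at_macro_level, transition_at_micro_level)   # True False

The point of the toy is only that each verdict is locally defensible; the contradiction emerges from the absence of any coordinating level, which is the structure Sorenson attributes to our perception of the spectrum.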

10.6 Grades and Spectra

 This thesis suggests that the current grading systems for high stakes tests, developed and refined over the last fifty or so years as described above, are impossible objects like a colour spectrum. We can see why the thesis believes this if we simplify the situation of grading. Imagine a system in which what is being tested is the smartness of candidates. ‘Smartness’ is understood in a non-technical sense that everyone understands, both in and out of the education system. The system has invented an assessment that can validly and reliably detect smartness. It recognizes that there are degrees of smartness and that candidates range from very smart to not smart at all. Just as a colour spectrum can capture degrees of red running from very red to not red at all, our grade system of smartness is equally capable of showing degrees of smartness running from smart to not smart. Of course, the big difference is the source of the knowledge. A colour spectrum depicts impossible perceptual knowledge. A smartness spectrum reflects knowledge about smartness that is derived from knowing what smartness entails. Controversy of course surrounds the nature of smartness but, for the sake of the illustration, let us assume that there is general agreement about smartness and how we know it. However this is accomplished, assume everyone agrees that the grade spectrum accurately captures the knowledge of smartness and its degrees. The difference between the colour spectrum and the smartness spectrum is not that one has a sharp transition between red and non-red, or smart and non-smart, and the other does not: they both do. The difference is that the grade spectrum used for high stakes grading claims knowledge of exactly where the transition between smart and non-smart lies. Where the colour spectrum seems fuzzy at the borderline, so that it is not clear where the border is, or whether there even is a border, the grade boundary used in schools is precisely and sharply drawn, to the fine grain of a single mark. The strangeness of this is often overlooked. That strangeness lies at the heart of this thesis. We can imagine a teacher presented with two candidates’ work that differ by only a single mark. The teacher is unable to discriminate between the two candidates because the difference is too small to be noticed (Cresswell 2003; Baird 2000). Yet the grading system, which is supposed to truly reflect ‘smartness’, is able to make the distinction. The strangeness derives from this fact (a schematic sketch of the point follows below). Imagine that everyone in the world is asked to differentiate between the two candidates and no one can; this merely emphasizes the problem. A distinction is being made that no one on earth can recognize or make. An assessment system able to make distinctions that no human can make can be likened to an inhuman or superhuman machine, transcending the limited powers of fallible humankind to deliver truths our minds cannot grasp. If we return to the brief history of the assessment system in the UK, there is nothing there to suggest that the engineers of the current summative testing arrangements were attempting to construct anything with such powers. Some might argue that to describe the system as an inhuman machine is to overstate an important feature of the system. 
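The single-mark point can be made schematic with a toy example. The pass mark, the mark scale and the marker’s discrimination threshold below are all hypothetical, chosen only to show that a sharp cut-off necessarily separates scripts that no marker could tell apart.

# Toy sketch (hypothetical numbers): a sharp grade boundary draws a distinction
# finer than any human marker's power of discrimination.

PASS_MARK = 50          # stipulated sharp boundary on a 0-100 mark scale
DISCRIMINATION = 3      # smallest mark difference a marker is assumed to notice


def system_grade(mark):
    """The grading system's verdict: sharp to the grain of a single mark."""
    return "pass" if mark >= PASS_MARK else "fail"


def marker_can_distinguish(mark_a, mark_b):
    """Whether a human marker could tell the two performances apart."""
    return abs(mark_a - mark_b) >= DISCRIMINATION


# Two scripts one mark apart, straddling the boundary.
script_a, script_b = 49, 50

print(system_grade(script_a), system_grade(script_b))   # fail pass
print(marker_can_distinguish(script_a, script_b))       # False

Whatever values are chosen, so long as the marker’s power of discrimination is coarser than a single mark there will be adjacent scripts that receive different grades although no human judgment can separate them; the sharpness belongs to the instrument, not to any recognisable difference in ‘smartness’.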
The requirement to make tests ‘teacher proof’ and of eradicating all idiosyncratic, local, idiomatic features that would prevent the establishment of a completely neutral, universal, standardized and bureaucratic system means that ordinary understanding of ‘smartness’ would have to be sharpened. A stipulated sense of ‘smartness’ is an inevitable consequence of standardization and this sharpening inevitably fails to exactly correspond to natural, ordinary uses of smartness. This is a case of ‘conflict vagueness’ according to Sorenson 1992. (Sorenson 1992) ‘Vagueness’ is being used here as meaning that there is difficulty in locating a borderline for a term. He points out that we assume that we are normally using just one meaning of a term. Sorenson says in ‘Dead Ringers’ a character who finds herself sleeping with both twins when she assumes she is only sleeping with one has assumed a false sense of unity (Sorenson 1992, p168). In the case of our grade system there are now two definitions of ‘smartness.’ One is the normal one and the other is an artificial, technical, invented educational, test definition. Assuming that there is only one meaning is a mistake. Finding out which one is being used in any situation is an epistemological question that usually can be discovered through disambiguating questions. When faced with conflicting meanings Sorenson 1992 doesn’t agree with Sorenson of 2002. The earlier Sorenson believes that when confronted with two versions of ‘smart’ the term has come apart. This early Sorenson thinks we may alternate between the two different meanings. He likens this to conflict behaviour in animals. A herring gull has a reflex to remove red objects from its nest. It has a reflex to roll egg shaped objects into its nest. Confronted with a red egg shaped object the gull will roll the object in and then out, back and forth (Eibl-Eibesfeldt, 1975). When a teacher explains to a parent that their child is smart but not in a sense relevant for the assessment but the next day puts the child through the lesson for the assessment she is showing similar conflict behaviour, moving from one meaning of smart to the other without having the means to resolve the conflict. Early Sorenson thinks alternatively we can average out the use of alternatives. Oxymorons capture this tendency in terms like ‘brunch’ and ‘blind sight’. Cats when they are faced with fight or flee conflict both advance with their back feet and retreat with their front, creating their Halloween arched back. In education some think Diploma qualifications in the UK share this feature. In the debate over academic against vocational educational purpose the diploma presses features from both into one assessment. Candidates for these exams are asked to be prepared for academivocationalism. A criticism of this averaging response is that it disadvantages candidates who prefer just one of the alternatives and find themselves being penalized by having to achieve in the less favoured aspect. Supporters argue that averaging provides a more rounded sense of educational achievement. Or we can use redirection. Robins will attack the ground when angry with a foe that they calculate will kill them. In debates between nature/nurture explanations of smartness, conflicting parties in the debate often accuse the opposition of redirection. Talk of poverty is just redirecting the debate away from the true cause of underachievement which is genetic. 
Talk of genetic determinants of smartness is in turn accused of being a redirection strategy, turning scrutiny away from the true cause of underachievement, which is social context. Another strategy is to try to ensure that there is good overlap between the old meaning and any newly stipulated one, so that the meanings blur together. The idea would be to make ‘smart’ mean the same thing in clear cases and diverge only in the borderline zone. But reasons for choosing the sharp borderline use of ‘smart’ would be justified only by evidence of expediency. Sorenson 1992 thinks that we can stipulate away the difficulty, sometimes with brute power, often using reasons. This is his ersatz solution to the problem. He thinks we can invent our way out of the difficulty ‘…by abandoning the original concept in favour of its more acceptable replacement. Old inconsistent beliefs will just fade away because our credence can now be realigned around the rectified terminology’ (Sorenson 1992, p175). He agrees that this solution is opportunistic and utilitarian rather than ontological and principled. If ‘smart’ is stipulated to mean what the assessment system says it means then it cannot be assigned a truth value, according to Sorenson 1992. However, this need not be a permanent feature of stipulation. If, as Sorenson 1992 suggests, the new use becomes standard then truth values will eventually be assigned to it. Setting grade standards for the grade spectrum requires stipulation. Sorenson 1992 thinks that, like flattery, this is done most effectively when done with discretion. J. O. Urmson wrote that setting grade standards requires blurring the distinction between the old definition of ‘smart’ and the stipulated one being used in the assessment system. The difference between proposing criteria and applying already accepted criteria is managed effectively using blurring. The story of how grading systems were established in the UK suggests that a combination of these means was used. Change is often introduced effectively by drawing on familiar terms and practices, emphasizing traditional methods and values, whilst also suggesting that new developments will be transformative. Recent reforms in the UK have proposed criteria whilst applying older versions, just as Urmson recommends. So a ‘back to basics’ message involving ability setting, traditional teaching methods and arguments for the ‘Gold Standard’ of A levels is fused with reforms requiring an ‘education for the twentieth century’ that emphasize the need for innovation and change, and the difference between two meanings of educational value is blurred. The debate over standards can be conceived in the light of this strategy. In the UK there is much discussion as to whether school assessments are comparable with assessments from previous years. The curious nature of these discussions highlights how stipulated new meanings are presented both as innovative and transformative and as a continuation of old meanings. Those supporting the assessment system argue that comparisons are possible, while also pointing out that new meanings for what is being assessed have been stipulated in order to maintain relevance. Those attacking it tend to say that the stipulated new meanings are unwelcome because they change the subject, and also that they can nevertheless compare standards of the newly invented terms with the old ones. Sorenson 1992 thinks that sometimes it is not worth explicitly sharpening the border between the two meanings. 
Sometimes scientists do not stipulate a sharp borderline, in order to protect themselves from making a premature sharpening based on too little justification. So ‘virus’ is left to be classified by biologists as an organism even though it is a borderline case. Withdrawing ‘organism’ from the classification of viruses is not judged to bring advantages that make the revisionist precisification worthwhile. In education, examiners may be asked to focus on the brute fact of decision making without attention being drawn to the problem of vague borderlines. This strategy is intended to force examiners to proceed as if they can make a decision even if they cannot justify it. Such a decision would be open to the accusation that it is arbitrary. Decisions that are arbitrary are no more informative than decisions based on the tossing of a coin. Some people think that an arbitrary decision is justified so long as it is presented in a way that does not undermine the assessment system as a whole. Timothy Endicott (2000, p201) accepts that in some legal cases judges make decisions that are arbitrary in that they are not guided by the law in any sense. As we have noted, he argues that a judge would not be permitted to decide by tossing a coin. Tossing a coin would not be a disciplined way of coming to a resolution. Endicott believes that judicial discipline is required to resist corruption, prejudice and wilfulness. So even in a case where the stakes are very high but there are no better reasons for hanging an accused than for not doing so, the judge facing the arbitrariness of the decision should ‘not at any stage give up and say that there is no answer. The need for judicial discipline is a conclusive reason not to flip a coin, even in secret, even when (if ever) it is clear to the judge that the law does not resolve the matter’ (Endicott 2000, p201). Timothy Williamson argues that there is no justification at all for such stipulation (Williamson 1994). Red gradually becomes non-red and smart gradually becomes non-smart. Needing a sharp borderline for a decisive ruling about a borderline case, one is invented. The invention has at a stroke removed the concept. The concepts of red and smart have been replaced by homonyms. Whatever is being assessed is not what was advertised. The newly minted school-smart is not smart at all. In the TV show ‘The Wire’ a character remarks that a lie is not a different edge of the truth, it is just a lie. Sorenson 2002 agrees with Williamson and disagrees with Endicott and Sorenson 1992. Sorenson now believes that Endicott’s judge is being asked to believe that x is vague while also believing that it is precise. This is a version of Moore’s paradox. Moore thought that there are some things that are impossible to believe even if they are not logically contradictory. He gave as an example someone who says ‘It is not raining outside but I believe it is.’ There is no formal logical contradiction in supposing both that it is not raining outside and that the speaker falsely believes that it is. But no person can be in a position to believe such a statement. The older Sorenson now thinks that even the judge who hedges by saying that he is making a decision in a grey area is attempting the impossible. By so doing he attempts to protect himself from charges of both arbitrariness and error. The hedging is part of what Endicott includes in his notion of ‘judicial discipline’, and cannot be accomplished by coin tossing or any other obvious concession to the arbitrary. 
Similarly, a teacher might seek to protect herself from charges of incompetence by saying that she is assessing a borderline case and so, although she can make an honest attempt at judgment, she may well be wrong. These are both cases of Moorean counterprivacy absurdities. The judge and the teacher are saying that the decision can’t be justified whilst claiming to believe that it is justified. They are claiming both to believe and not to believe something at the same time. If educational assessments require the stipulation of sharp borders then they require that decisions violate the epistemological constraint revealed by Moore’s paradox. The greater the number of grade boundaries, the greater the number of violations required. Rather than identifying degrees of smartness in candidates, the system is identifying a purely invented quality that has no independent existence outside of the test instrument itself. This is not merely ‘teacher proof’; it is ‘everything other than the exam proof’. But it is also putting itself outside of the possibility of any belief. No believer can think that something is the case but think it isn’t. This psychological constraint is an a priori reflex of our minds and language. The accusation is that the assessment system violates conditions of possible belief. No one can believe that they know they have identified a sharp borderline for a vague term. Believing is constrained by epistemic concerns, not ontological ones. Even if there exists a sharp borderline between red and not red, we are not permitted to know it because we are constrained by a priori reflexes to think of red as vague. To attempt to overcome these constraints is to misconceive the nature of linguistic constraints. Language is normative, conditioning not just what we are actually able to think but also what we ought to think. This is the position of the later Sorenson (Sorenson 2002).

The high stakes of some of these summative assessments add to the problem of using an impossible object as a tool of assessment. Where the stakes are low, arbitrary and unbelievable judgments can be ignored. Lack of focus, lack of caring, hedging, lying to oneself and other tactics accommodate genuine self-contradiction and elide the definite, necessary confusion. People say what they say without believing what they say in borderline cases and it doesn’t matter. Levels of toleration for contradiction and impossibility are very high in cases where nothing much is at stake. A person can never believe that it’s not raining and believe it is, but can say they do. A teacher or examiner can never believe that a candidate is a borderline case of a pass and believe the candidate has passed. If she says she does then she is lying. If she says it to herself then she is lying to herself. If she says it to others then she is committing a bald-faced lie. If someone believes that all lies are morally reprehensible then the teacher will be condemned as morally reprehensible in such cases. Aristotle would be one who condemned the teacher. He thought that “Falsehood is base in its own right and deserves blame” (Nicomachean Ethics, Bk. IV, 1127a28-30). Aquinas would also condemn the teacher. He thought that the semantic constraint that Moore’s paradox reveals is applicable in all cases. He wrote that “Words by their nature being signs of thought, it is contrary to their nature and out of order for anyone to convey in words something other than what he thinks” (Summa Theologiae, 2a2ae, 110, 3).
If, however, she is faced by people who only care when a decision matters, then in a case where the stakes are low she may be left alone. In this case it may be accepted that she is fooling no one, least of all herself, with her remark, but that it doesn’t matter enough to be censured. The teacher may not be authorizing the hearer to agree with her. Voicing her blatantly self-contradictory belief may be an expression of her wanting the hearer to disagree with her, and may not be an assertion at all. The teacher may be inviting contradiction to help expose her own confusion. The teacher’s motivation may be an invitation to clarify the difficulty of making a decisive decision in the face of vagueness rather than an assertion of belief in a fact. Sorenson agrees with Aristotle and thinks that the teacher would not be making an assertion. In the Topics, Aristotle codified debating games. One such game was to present a view and then defend it until forced to contradict oneself. The teacher in this case begins with the defeated position and so can’t be asserting anything (Sorenson 2003, p204-206).

Timothy Williamson thinks that a constitutive rule for assertion is that you only assert what you know (Williamson 1996). The teacher then can’t say ‘I don’t believe Milad is a pass but I believe Milad is a pass.’ This isn’t an assertion. She can assert, however, ‘I know you won’t believe me but Milad is a pass.’ Some people might argue that a grading system using sharp borderlines forces assertions from markers and teachers that everyone knows are not believed. Michael Dummett thinks this. He argues that when Christians were being persecuted they were forced to make assertions that no one believed. But uttering the forced statements was enough. ‘[H]ere it is the saying that counts. The victim may know that his persecutors will be quite aware that, even if he says what they want him to, he will not believe it: what is important to both of them is whether he says it or not’ (Dummett 1981, p331). This contradicts Grice’s claim that to assert something is to intend to say something true and to intend the listener to believe it (Grice 1989). Robert Brandom agrees with Dummett. He compares assertions with bets (Brandom 1983). So the teacher is committing herself to future evidence to support the assertion that Milad is a pass. If Milad’s case is vague and she knows this, then she knows there can’t be any future evidence to support her assertion. The claim is made under duress: the assessment system itself is forcing her to make an assertion. JL Austin thinks that in this case duress eliminates the assertive force of her statement. He thinks that where there is duress forcing a statement “we may even say the act was ‘void’ (or voidable for duress or undue influence) and so forth” (Austin 1962, p21). Kenyon thinks that “in a wide class of cases, duress seems to eliminate not merely the force of assertion, but also the relevance of content altogether, turning the utterance into a semantically structureless act of capitulation” (Kenyon 2003, p245). The result of having to make what seems like a decisive judgment about a vague case is to use statements with ‘semantic inertness’ (Sorenson 2004). But it doesn’t seem to be true that people put under duress can’t make assertions. Galileo was forced to recant his theory of heliocentrism and to agree to refrain from spreading it. He agreed to do so. Had he reneged later he would have been killed.
According to Sorenson, Catholics differ from Protestants in this respect: Catholics only burn heretics who return to their heresies. The duress of the assessment system may be forcing the teacher to move from one understanding of smart to the newly stipulated one in the course of a single statement. She may not acknowledge this, but she is asserting a contradiction nevertheless. Thus she is asserting a falsehood. She can’t believe her assertion. Nor can anyone believe that she believes her assertion. Moorean counterprivacy forces us to think that the invention of known sharp borders for grades forces us to assert beliefs we cannot have. This is close to saying that the grading component of the assessment system forces bald-faced lies. This may be habit-forming. The assessment system forcing the lies is backed by legislative powers and peer pressure. This may result in the phenomenon of the acceptable lie, where a lie is recognized as false by everyone but is nevertheless accepted. Brute force can lead to this. In a regime of political dictatorship people get used to saying false things knowing that everyone knows they are lying.

Assessment systems legitimate certain practices and make others illegitimate. In the first section of this chapter we saw the huge machinery of coercion that a high stakes assessment system has at its disposal. Some think that brute power trumps coherence and truth. In such a system insincerity is one option. When a teacher decides, near to the time of a test, to abandon normal teaching and teach to the test, she can be understood as being cynical and insincere. She believes that genuine learning is not captured in the assessments but knows that she has no power to alter the situation. Such lying can be a symptom of immorality but need not be immoral. A teacher may sincerely believe that teaching her pupils in their street-language idiolect would benefit them more than trying to teach them the standard idiolect of power (Trudgill 1983; Coard 1971). However, she may nevertheless teach them the standard idiolect because she knows that otherwise funds would be withdrawn from her project, and that this would have a greater negative impact than making the concession. She might tell the sponsors that she believes teaching the standardised idiolect is the best way to proceed. Even though everyone hearing her knows she is being insincere, they may nevertheless find her stance morally good.

We know that Endicott rebukes those who suggest that, even when faced with an arbitrary decision, tossing a coin wouldn’t undermine the integrity of the institution. The discipline required for making genuine decisions means that even in borderline cases a judge must proceed as she would in any other case. The judge is, like the teacher, asked to act like the cricket umpire who makes an arbitrary decision in order to fulfil the requirement of decisiveness. But knowing that this is happening is easier to accept if one is not enmeshed in the decision. If the stakes are high then, even if we are not personally involved in the case, the acceptability of such a procedure is questionable. In such cases we know for Moorean reasons that the judge doesn’t believe in her decision, and for the same reason we don’t either. The inauthenticity of the judgment can perhaps be dismissed as merely annoying in a situation where the stakes are low, but high stakes make it less easy to be so sanguine. High stakes make the rewards for success huge and the consequences of failure devastating.
There is evidence that success in summative exams in the UK increases average earnings and life expectancy. High stakes tests change what is taught and how it is taught. As we have seen, Harlen (2005), for example, discusses the impact of ‘high-stakes’ tests and asserts that they lead to four damaging practices: teachers focusing on the content of the tests; the frequent administration of practice tests; the training of students in answering test questions to the exclusion of genuine learning; and the adoption of a ‘transmission’ teaching style. Each of these prevents genuine learning. Additionally, the role of the school as a transmitter of values and practices was becoming less effective than required as more pupils in the system found schooling increasingly alienating. ‘Throughout the 1990’s, evidence was accumulating of the detrimental effect of frequent testing on students’ enjoyment of school, their willingness to learn, other than for the purposes of passing tests or examinations and their understanding of the process of learning’ (Harlen 2005). Another important critic, Dylan Wiliam (2003), asserts that schools have also increased their emphasis on the core (reported) subjects at the expense of other important curriculum activities. For example, the Mathematical Association published a paper in 2005 in which it was stated that: ‘The current assessment system [backed by the accountability structure] encourages a mode of preparation for tests and examinations which focuses solely on the standard questions that appear on papers...[This] leads to the exclusion of more interesting and challenging problems and applications at all levels. These are the very things that are of importance to employers and higher education, because they stimulate interest and encourage independent thinking’.

The evidence is that a consequence of high stakes testing is a narrowing of curriculum content in order to ensure test-score success, which in turn undermines the underlying purpose of the modern education system, understood as giving people the genuine generic knowledge that equips them for later and continuous specialised learning. Coupled with Harlen’s research showing that pedagogical approaches have also become narrowed and circumscribed by the over-focus on test scores, the validity of high stakes testing is called into question by these and other like-minded critics. High stakes cause these distortions. Another problem with the adoption of high stakes for summative tests is that they tend to obliterate recognition that the difference between success and failure is often a matter of luck rather than an intrinsic difference in what a candidate is capable of doing, or even in what a candidate has actually done. People who think this point out that in many contexts failing and succeeding aren’t contraries.

High stakes are connected to boundary setting in Theodore Sider’s paper ‘Hell and Vagueness’ (Sider 2002), where he argues that there is no morally just way of designing a system in which consequences are vastly different for differences that are very small. In a theological setting, he argues that if God is always just then God is committed to the principle of proportional consequences, which requires that like cases be treated in the same way. He argues that a punishment and reward system involving Heaven and Hell violates the principle because it is possible to imagine two people whose moral worth differs only in a small respect who are borderline cases.
If one went to Heaven then the principle of proportionality would require that the other did too. Sider imagines a line of all souls being judged, with everyone standing next to someone who differs from them in only a tiny degree of moral worth. The principle of proportionality would ensure that if the most morally worthy soul went to Heaven then all did. Equally, if the least deserving went to Hell, then all would. Unless God can do the impossible and send everyone to both Heaven and Hell at the same time, the contradiction would prove that God does not operate a Heaven and Hell regime of reward and punishment.
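Sider’s argument has the familiar sorites shape. The reconstruction below is schematic and mine rather than Sider’s own formulation: the souls are ordered by moral worth as s_1, ..., s_n, H(s) abbreviates ‘s goes to Heaven’, and Neg(s_i, s_{i+1}) abbreviates ‘s_i and s_{i+1} differ only negligibly in moral worth’.

\begin{align*}
&\text{(P)} && \forall i\,\bigl[\bigl(H(s_i) \wedge \mathrm{Neg}(s_i, s_{i+1})\bigr) \rightarrow H(s_{i+1})\bigr] && \text{proportionality: like cases treated alike}\\
&\text{(1)} && H(s_1) && \text{the most worthy soul is saved}\\
&\text{(2)} && \forall i\;\mathrm{Neg}(s_i, s_{i+1}) && \text{each soul differs only negligibly from the next}\\
&\text{(3)} && \therefore\ \forall i\; H(s_i) && \text{by repeated application of (P): everyone is saved}
\end{align*}

Running the analogous induction from the other end, starting from the premise that the least worthy soul is damned, yields the conclusion that everyone is damned. Since no one can be in both places, a just God cannot combine the proportionality principle with vastly different consequences for negligibly different cases; that is the contradiction Sider exploits.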

10.7 The Difference Made by High Stakes

This example of a sorites points to what is distinctive about high stakes. A theologian could draw a different conclusion from Sider’s. The argument proves that Heaven and Hell cannot be so different that they would violate the principle of proportionality. This revisionist theology would use the sorites to propose that we must assume Heaven and Hell are saliently similar in nearly all respects if God respects the principle of just proportionality. The distribution of people into either Heaven or Hell would then not face the charge of injustice through violation of that principle. The revisionist would, however, be attacked by traditionalists, who would argue that flattening out the differences between Heaven and Hell is blasphemous and so intolerable. The traditionalists argue that the idea motivating the distinction in the first place is one that seeks to insist on an absolute gulf between God’s reward and God’s punishment. The revision undermines the motivation for having the distinction in the first place, albeit in order to avoid contradicting the image of God as proportionate in His justice. The revisionist might reply that the gulf is absolute but small.

‘High stakes’ is about the relationship between the consequences of success and failure. Arguments to flatten the difference between the consequences of summative assessments in education face the same form of objection as the revisionist theologian faces. People arguing for a revisionist reading of assessment and its attitude towards success and failure might point out that in many contexts failing and succeeding aren’t contraries. So, for example, Michael Raynor argues that in business the profiles of many market leaders share more with businesses that fail than with those that don’t fail but never become market leaders (Raynor 2007). The high stakes of tests based on success and failure demotivate risk-taking and ambition. A mediocre candidate avoids failure. Raynor cites as examples Sony’s Betamax VCR and its MiniDisc music player. These were products that failed and nearly destroyed the company: Sony showed extraordinary ambition and sound business strategy but was hit by bad luck (Raynor 2007, ch2). Analogously, a risk-taking candidate sitting a test may be ruined by bad luck rather than by poor strategy and knowledge. Performance may be altered on the basis of risk and reward calculations. A calculation may be that there is more risk in spelling unfamiliar words wrongly than familiar ones. This may lead a candidate to decide to avoid using unfamiliar words to decrease the risk of being penalized. The resulting performance may be deemed mediocre because of this restricted vocabulary. This may cover up the fact that she is a more sophisticated language user than another candidate who performs similarly but has not learned more sophisticated language. Raynor agrees with Theodore Roosevelt, the twenty-sixth President of the USA, whose speech ‘Citizenship in a Republic’, delivered at the Sorbonne in Paris on April 23rd, 1910, and more commonly called ‘The Man in the Arena’, expressed the thought that credit in victory depends on risking failure. Just entering the arena and taking a risk deserves credit. Similarly, we recognize the contribution of soldiers who die in an overall victory rather than merely rewarding those who lived through to see the success. Teachers often want to commend those students who are ambitious and therefore risk making more mistakes than the students who play it safe and take no risks.
A student trying to use a sophisticated vocabulary in her essay is likely to make more errors than the student who plays it safe. Harlen’s and Wiliam’s criticisms of current testing policy reflect this line of thinking. Similarly, people applaud risk-takers like Blondin or the ‘Man on Wire’ walking a rope suspended above Niagara or between the Twin Towers in New York rather than someone walking along a rope lying on the ground. An assessment that failed to understand the difference between the two accomplishments would be an assessment that misunderstood the tasks and the role of assessment. The motivation for such a revision in educational assessment is to encourage risk taking and ambition. Dan Gilbert (2006) makes a case for linking sexual attraction to reactions to stress. Gilbert investigates the psychology of attraction by noting that the ‘…same neurocircuitry and neurochemistry triggered in response to stressful events (“flight or fight”) are also triggered in response to sexual arousal’ (Raynor 2007, p2). Gilbert’s experiments suggest that love and hate are not linked contraries. Elie Wiesel, the Nobel Peace Prize winner, thinks that the opposite of love is not hatred but indifference. Raynor and Roosevelt are arguing that the opposite of success is not failure but mediocrity.

Traditionalists counter that measuring learning is the motivation behind having pass and fail grades. To flatten out the difference between success and failure deactivates that motivation. They might argue that without rewards massively different from failure everyone would settle for mediocrity, because mediocrity takes less effort than excellence. Varzi, however, thinks that there are formal difficulties in proposals to reward mediocrity (Varzi 2004, p107-109). To reward mediocrity formally would be like trying to have a prize for coming third. Varzi imagines an academy trying to introduce such a prize. He thinks that if the second best can also compete for third prize, this candidate can afford to ignore the possibility of first prize and settle for third. If the reward for third is similar to first but less costly and risky to obtain than first place, then a prize for third place acts as a disincentive. It would pay for the second candidate to flunk and come third. Second place would be worth less than either first place or third.

Traditionalists could argue that a prize for third is not the same as rewarding mediocrity. They could argue that the only motivation for having assessment is to find out how much a candidate had learned about a subject. Anything that undermined this was what they opposed. Mediocrity would be the measurement of a candidate who had not excelled but hadn’t flunked. The purpose of the revisionists is to use assessment as a motivation for learning, but this confuses summative assessment with an element of formative assessment. Assessment of learning requires the kind of grading system that can truthfully rank order all candidates hierarchically and indicate whether the candidate has learned enough. A pass/fail border is motivated by these requirements and this motivation is sufficient. Traditionalists would then have to notice that the stipulation of grade boundaries undermines the objective motivating the stipulation in the first place. The gradualism of smartness, for example, presents a spectrum, like a colour spectrum, in which there seems to be the inconsistency of borderless change.
Stipulation of known sharp borderlines changes the subject and so fails to measure degrees of smartness (Williamson 1994). What use is an impossible object like an assessment’s grading system? Such a system may be practically good enough to regulate educational achievement. Even if the grading systems used by high stakes assessment systems are self-contradictory because of vagueness, grades can be used in a way that ignores the contradictions as much as possible. Muddling along and ignoring the contradictions is a better option than abandoning the whole system if the system produces high value. An assessment system attempts to track actual concepts, such as intelligence, in an individual. Concepts are not ambiguous but they are almost all vague. If assessment is of concepts, then it is objective rather than merely subjective vagueness that has to be faced. Can assessment language adjust itself to avoid contact with borderline cases? There may be cases where this is possible, but there is a constraint on the relevance of the adjustment. Adjustment can lead to an assessment missing the point of assessment. A precisification of a vague term, or a move to a type of assessment that avoids vagueness, could mean that the assessment is alienated from the purpose of assessment. It risks inauthenticity. And all such adaptive strategies result in insincerity. An assessment system aware of vagueness should minimise questions that raise the possibility of answers at borderlines. It should ensure that insincerity is minimised, because sincerity is a key educational value.

Forcing borderlines to be relative to an answering system is arbitrary and leads to inauthentic assessments. Grades have absolute borderline cases because there is no matter of fact that could be known. All bivalent concepts without facts making them so have unknowable borderlines. This is not a case of abnorance, which is ‘… the failure to know something that is not an appropriate object of propositional attitudes’ (Axinn and Axinn 1976; Sorenson) and is what many taking the hermeneutical stance assume. The proper attitude towards this discovery should be agnosticism. Examiners should withhold judgment in absolute borderline cases and assume that the matter is closed: no evidence will be forthcoming. Sorenson thinks this attitude is his own invention, because it is not agnosticism as suspension of judgment (Cargile 1967), nor Stoic agnosticism based on an elitism that thinks borderline cases don’t express propositions (Bobzien 2002), nor Leibnizian agnosticism that bases itself on a nihilism denying that vague predicates apply to anything (Levey 2002, p33), nor even Williamsonian agnosticism based on ‘margin for error’ principles (Williamson 1994). Sorenson actually thinks absolute borderline cases exist. His agnosticism is based on a priori reasons for the impossibility of knowing their sharp borderlines. The resistance to such knowledge presents itself as a false analytic truth. It follows that we are committed to believing self-contradictions. The logic of self-contradiction is that everything is logically implied by a single contradiction (a derivation is sketched at the end of this section). Yet the chaos of everything being permissible is constrained by psychology. It is at this point that Fodor’s adaptation of Hume’s theory of mind is helpful (Fodor 2006). If our mental architecture is modular then errors remain localized. Sorenson concludes that many constraints on the entailment of contradiction are psychological. But he thinks language reflexes are different from other reflexes in that they are normative.
Therefore the constraint on belief is absolute. 
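The claim above that everything is logically implied by a single self-contradiction is the classical principle of explosion (ex falso quodlibet). A minimal derivation, in ordinary natural-deduction steps rather than any notation of Sorenson’s, runs as follows.

\begin{align*}
1.\quad & p \wedge \neg p && \text{assumption: the contradiction}\\
2.\quad & p && \wedge\text{-elimination, 1}\\
3.\quad & \neg p && \wedge\text{-elimination, 1}\\
4.\quad & p \vee q && \vee\text{-introduction, 2, for an arbitrary } q\\
5.\quad & q && \text{disjunctive syllogism, 3 and 4}
\end{align*}

Since q is arbitrary, one contradiction classically licenses everything; on the account given above, it is psychological, and in the case of language normative, constraints that keep the damage localised.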


CHAPTER 11: CONCLUSION

‘I’m working with impotence, ignorance [....] My little exploration is that whole zone of being that has always been set aside...’ Samuel Beckett to Israel Shenker (5/5/1956), reprinted in Graver and Federman, eds., Samuel Beckett: The Critical Heritage, p148.

11.1 Introduction

Absolute borderline cases exist. None of our educational grading systems acknowledges this. I have argued that Sorenson’s solution to the sorites puzzle requires that we are condemned to ignorance about grade boundaries. Sorenson’s epistemic theory solves the puzzle of vagueness. In doing so it creates a meta-problem: upon hearing the solution, hearers are incredulous. The proof of the solution is logically valid, and to deny the solution would require a rejection of the classical logical system. Sorenson considers that price too high. Instead, the beliefs about language held by many philosophers are less essential, and so it is they that require revising. Two conclusions are drawn. The revision of beliefs about language acquisition suggests a new paradigm not just for educational assessment but for education more generally. And applying vagueness to assessment suggests that sincerity is a further constraint on any assessment.
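The shape of the puzzle, and of what the epistemic solution asks us to accept, can be displayed compactly. The instance below uses ‘smart’ purely for illustration and the notation is mine, not Sorenson’s: c_0, ..., c_n are candidates ordered so that each performs only marginally worse than the one before, and S(c) abbreviates ‘c is smart’.

\begin{align*}
&\text{(1)} && S(c_0) && \text{the strongest candidate is clearly smart}\\
&\text{(2)} && \forall i\,\bigl(S(c_i) \rightarrow S(c_{i+1})\bigr) && \text{one marginal step cannot unmake smartness}\\
&\text{(3)} && \therefore\ S(c_n) && \text{by } n \text{ applications of modus ponens}
\end{align*}

The conclusion (3) is plainly false, yet the argument is classically valid. The epistemicist keeps classical logic and denies premise (2): there is some i for which S(c_i) holds and S(c_{i+1}) does not, a sharp threshold whose location we are constitutionally unable to know. It is that denial which provokes the incredulity just described.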

11.2 To Silence

In applying this revisionist position to current models of assessment grading, I argue that they have been shown to be incapable of modelling the absolute borderline cases of vagueness. I conclude that this makes current grading systems incomplete. In attempting to force completeness, current assessments compromise themselves by claiming to make grading judgments about absolute borderline cases. This is impossible. Because the impossibility takes the form of self-contradiction, it is a form of the absurdity that any commitment to logical inconsistency entails.

The thesis has argued that a good assessment system has several minimal requirements. Consistency constraints are one kind of requirement: assessment systems must be rational, reliable and valid to ensure consistency. They are also required to be complete: universal decisiveness and the production of superlatives are requirements of completeness. Simplicity is the requirement that the system be comprehensible to users. Formal assessment systems have attempted to fulfil the three elements of consistency by tidying up answer systems into bureaucratic procedure. In terms of universal decisiveness and the production of superlatives, success has been achieved at the cost of ignoring the obscurity about whether any answer speaks to the question. Norm-referenced, psychometric-based answer systems ignore the problem by changing language from a vague language to a precise one. I have labelled these systems scientistic because they attempt to achieve the formality of abstract science. Criteria- and construct-referenced assessments more closely mimic informal natural systems, but this mimicry is incomplete because they insist on decisiveness about absolute borderline cases. These beliefs prevent the assessment systems from modelling thresholds correctly. In particular, they deny the idea that vagueness implies absolute borderlines. At best, such systems assume that all vagueness is at most relative vagueness.

Claims of objectivity and subjectivity are linked to the formality of the system. But the informality of natural systems results in the paradoxical illusion of vagueness. This commits competent users to making forced analytic errors and passing off contradictions as tautologies. Because what makes my language mine is its relationship with me, this can be characterised as subjective. However, because these are forced errors of competent natural language use, the errors have the normative force of objectivity. They are normative commitments of any competent natural language user. The analytic a priori nature of these commitments nonetheless survives.

Both the formal and the informal systems assume some kind of conventionalism. Yet conventionalism can’t accommodate absolute vagueness. It fatally confuses user meaning with statement meaning. Consideration of issues of counterprivacy, famously illustrated by the Moorean paradox, shows that the complexity of intentional states is mishandled by conventionalism. I think that assessment systems that model the informal systems used in natural language have been shown to achieve greater reliability and validity than formal systems. But if they are to be complete they need to be able to accommodate absolute borderline cases. Decisiveness in such cases is absurd. Passing off decisiveness as a sincere judgment with a view to deceiving is morally wrong. Considerations of vagueness therefore highlight a moral dimension to assessment grading systems.
Universal decisiveness and the production of superlatives are features constrained by the need for sincerity. Sincerity places moral constraints on a grader that conventionalism cannot model. The epistemic solution to vagueness requires greater epistemic modesty about what we can believe and know. The metaphysics of words suggested by vagueness requires that their truth values are independent of facts. We can’t know the last noonish second because there is no fact of the matter to decide it. The gap in facts is not decisive, however, because the epistemic solution says that something can have a truth-value without a truthmaker (Sorenson 2001, ch7). This explains why identification of something as an absolute borderline case entails that further investigation is absurd.

I think the thesis has shown how the two dominant paradigms of assessment so far used in education have differed. But I think the application of vagueness to this field has also pointed up fatal similarities. I have drawn a distinction between assessment using a scientistic model and assessment using a hermeneutical model. I criticise both because they fail to model absolute vagueness. I connect the scientistic project to Behaviourism, which reduces talk of belief and intentionality to talk of probabilities of behaviour. Dummett characterises Quine as a philosopher who thinks talk of beliefs, meanings and intentions is meaningless because they can’t be reduced to the behaviour of assent and dissent (Dennett 1979, p53-70). For Quine, logical truth is characterised in terms of certain patterns of behaviour. Logical laws become well-established hypotheses with pragmatic justification, and for this reason the analytic/synthetic distinction (e.g. between truths by virtue of definition, such as ‘all bachelors are unmarried males’, and truths in virtue of the facts, such as ‘there are some bachelors who like butter’) cannot be drawn (Quine 1961b; 1960, p57-67). Sorenson’s explanatory proof of absolute vagueness requires analytic truths but is able to find the resources for them even if Quine’s position held. The hard-wiring of a representational theory of mind using a computational modularity model generates the required analyticity (Fodor 1983, 2003, 2010).

But the argument against this approach is one that supports the belief that assessment requires reasoning rather than responses to stimuli. A grading judgment requires that inferences be drawn, and this cannot be reduced to behavioural responses. Nor is it probability reasoning. Substituting probabilities for truth draws grading with the wrong crayon. Grading is making an epistemic judgment, an intentional state. This is the thrust of the chapter against psychometric assessment, which removes intentionality. And the world isn’t caused by probabilities. Probabilities are used to model states of affairs which are too complicated in reality to know. But stuff causes stuff, not probabilities. No actual coin can be known to have an actual 50/50 probability of coming up heads. It’s just that the actual material conditions of a coin are unknown, so we substitute probabilities. Physics gets great results from this. But assessment isn’t physics. Yet the alternative to this approach to assessment, once Chomsky, Fodor, Cronbach and others have pointed out the flaws, imports similar mistakes despite advertising its opposition to the older model.
Crudely, the thesis has not been convinced that the Quine/Skinner model of norm referencing has been completely replaced by models of criteria and construct referencing. Prototypical philosophers connected with this alternative assessment paradigm are Wittgenstein, Dewey, Davidson and Heidegger. I have characterised a whole cluster of learning theories in education that influence how assessments are conceived as versions of a common approach. Assessment has developed around the idea that knowing something is a kind of knowing how to do something. For some, so long as a grader is able to apply a grade correctly it doesn’t matter whether they know what they are doing. Where the emphasis was on behaviour modification, constraints on grading awards, such as sincerity, were being ignored.

Wiliam’s work on construct assessment moves away from the measurement paradigm of the psychometric approach but applies a model of conventionalism that interprets a grade as ‘… a statement that the performance is adequate to inaugurate the student into a community of practice’ (Wiliam 1998, p7). Wiliam claims that this is a common practice in Europe where ‘In the European tradition of examining, examination authorities create social facts by declaring the results of the candidates, provided that the community of users of assessment results accept the authority of the examining body to create social facts. That is why, in a very real sense, that as far as educational assessment is concerned, there is no measurement error in Europe!’ (Wiliam 1998, p7). The idea of creating social facts by fiat is a form of conventionalism. However, conventionalism is not able to model absolute vagueness, and so either absolute vagueness doesn’t exist or conventionalism isn’t true. I think absolute vagueness exists and therefore think conventionalism is false. Therefore I think that Wiliam’s account of construct referencing is false.

I connect conventionalism (the view that meanings are constituted by our conventional behaviours and uses, and that we control them) with the general idea that behaviour is prior to thought in the order of analysis. Psychologists too prototypically think this, writing things like, ‘These accounts all share the assumption that knowing the meaning of x involves being able to tell the differences between those things that are x and those things that are not’ (Bloom 2000, p18). Fodor thinks that ‘Wittgenstein … held … that having a concept typically involves knowing a criterion for applying the concept; one that is, at a minimum, reliably satisfied by good instances, in favorable conditions, etc’ (Fodor 2002, p15). I have argued that Sorenson requires Fodor’s idea of concept possession as an intentional state rather than an epistemic state. In other words, being able to think about grades comes before knowing how to apply them in any particular case. So thinking that Jim’s is an A grade piece of work requires that the thinker is able to think about A grades and about Jim, and being able to do that means having the concepts for them. Thinking that Jim is an A grade is a mental act or episode. It isn’t a disposition and it isn’t something that comes about through social interaction and agreed conventional usage of terms or behaviours. It is incompatible with the meaning holist’s idea that ‘…It takes two to … provide an objective test of correctness and failure…’, that ‘...The possibility of thought comes with company’ (Davidson, p88).
88)" This was the thrust behind the chapter criticizing assessment theories based on various constructivist, social constructivist and constructionist theories, of ideas about context dependency and perspectivism as a way of explaining concept possession. (Maybe one or other of these accounts is probably the best theory we have about how we learn concepts, but concept possession is not reducible to that. ) Instead of having an interpreter (convention) to help you think about grades, the thought is given full ontological autonomy, of the kind given to things like mountains. These are deep waters. The relevance of this deep water is the claim that assessment grading requires intentionality and this places a new constraint on a grader. It isn’t that graders must ‘know’, ‘believe’ and ‘select’ but that they ‘know that’, ‘believe that’ and ‘select for.’ Davidsonian and Wittgensteinian concept pragmatism is accused of getting concept possession the wrong way round. Understanding some sentence involves having the relevant concepts first and drawing inferences from it rather than the other way round. Sorenson’s explanation of vagueness denies that concepts are convention reliant or have their meanings by dint of inferences drawn from the role they play in sentences. A grader must have the concepts she claims to be using. Sorensonian vagueness relies on the fact that people with full concept possession are still ignorant of truth-values in borderline cases. Fodor’s approach, which Sorenson explicitly evokes, explains why this is possible. At least, it explains why the denial of the epistemic priority of sentences to words and/or thoughts/propositions to concepts allows room for thinking that concepts have meanings independently of what people think they are. Fodor thinks that ‘Conceptual role semantics is afflicted with holism and with failures of compositionality. And there are no convincing instances, including `and', where a conceptual role analysis of a word's (/concept's) content provides a plausible and uncircular formulation of its possession conditions’ (Fodor 2002, p25). A grader is committed to possessing the relevant concepts if she is going to think about grading and each proposition. But she can’t change the meaning of the concept. The concept is what constrains what she can think about. It is this that is the source for the sincerity requirement. She must be sincere. That means that she must think the concept applies if she says she thinks it applies. She must be telling the truth. If she can’t tell, then she shouldn’t say she can. Or vice versa. And she can’t invent concept meanings. If she has the concept of a grade and then wonders whether it applies in a particular case this wondering is constrained by the concept itself. If she could decide the conceptual meaning from the way she applied it isn’t obvious how concept possession could guide her decision-making. This prevents the sort of assessment system where fine grading makes discrimination impossible for anyone. Replacing similarity for identity of concept possession fails (Fodor and Lepore 2002, ch8). Assuming that every concept involving inference constitutes concept possession is holism, and this implies nobody shares the same concepts because everyone has eccentric beliefs about everything in some respect. The failure of the analytic/synthetic is the failure to discover enough complex concepts to justify saying that thought is made up of combined primitives. 
So what a concept is, is a primitive that carries no inferential commitment. ‘Unmarried man’ is no part of the meaning of ‘bachelor’, despite convention. Conventionalism (which is an epistemic account of how we make meanings mean what we want them to mean) can’t answer the question of whether Jim is an A grade: what answers it is the actual ontological fact of whether Jim is an A grade, except in cases of absolute borderlines where the truthmaker is missing. The typical account, which relies on epistemic capacities to compose our meanings, is compromised by this explanation of absolute borderline cases given by Sorenson (Sorenson 2001, ch11), but it was already incapable of sustaining a distinction between concept possession and constructs. Relativising epistemic capacities (such as possession of concepts like the ability to think about A grades) relativises them to things that can’t compose. Concepts compose. Therefore prototypes, good instances, favourable conditions, conventionally agreed examples and the like can’t substitute for concepts. As Fodor explains, ‘This is a version of what cognitive scientists call “the pet fish problem”. Good instances of pet fish aren't good instances either of pets or of fish; and (because pet fish generally live in bowls, but typical pets and typical fish generally don't) the best conditions spotting pet fish are generally bad for spotting fish or pets per se’ (Fodor 2002, p13). I’ve argued that being able to sort A grade candidates is about having the concept ‘A grade’. Having the concept doesn’t mean that I can sort all A grades from non-A grades. It doesn’t mean being able to sort out good instances of A grades from non-A grades given favourable circumstances. This approach also means that grading isn’t grading behaviour. Grading is what you do when you have the concept of grading. And sometimes you can’t know if it applies.

The epistemic condition of ignorance requires that concept possession is independent of epistemic states. And it requires that concept content is independent of how concepts behave in sentences and thoughts. Concepts can behave in certain ways but their content belies that behaviour. The sorites paradox is a prototypical example of this phenomenon. Its solution requires that all propositions have a determinate truth-value even though borderline cases behave in a way that invites belief in indeterminacy. This is why incredulity is an appropriate response to Sorenson’s epistemic solution. It makes the familiar strange. So the application of vagueness to assessment gives a reason for thinking that there can be no cases of grading that are not either true or false. The conditions for asserting a grade are those of any truthful assertion, not those of the invention of a social fact. The impossibility of sincerity, because of the absolute ignorance in absolute borderline cases, blocks pragmatic solutions to the requirement of universal decisiveness. These include Cresswell’s ‘simple heuristic’ and Wiliam’s illocutionary analysis. This is the new news that vagueness brings to assessment systems. Graders need to be sincere in their judgments. Sincerity of belief is complex, and modelling it correctly will be important for a competent assessment system that values completeness. It is an additional requirement, alongside rationality, reliability and validity, for the system’s overall consistency. It places severe constraints on the ability of a system to be universally decisive and to produce superlatives. These are constraints on the ability of any system to be complete.
But competence is undermined if an answer system attempts to answer when it is required to be silent. Sincerity gives a system the resources to avoid being undermined for that reason. Like the Ancient Stoa’s, a refusal to be universally decisive and to identify superlatives is rational and a sign of understanding. The obscurity of absolute borderline cases requires silence, because grading requires sincerity.

 


 

