Category Tweaks: A Case for Clarifying Recursion and Enlarging FLN

Language is in some sense uniquely human, and this demands explanation. From that simple stance a torrent of theory has poured forth for centuries, with no grand unification in sight. Any nontrivial account of the origins and parameters of human language is inherently massively cross-disciplinary, and here we run into problems — experts in one discipline may lack precision in others, and thus fail to properly contextualize their findings, either by misappraising evidence from other fields or by ignoring it entirely. The prudent strategy is agnostic and exploratory — look for isomorphisms across disparate spheres, lash them together, and see what sticks.

In the latter half of the twentieth century, Noam Chomsky made a series of observations about the operation of grammar that ignited tremendous debate. Noting that all human languages share certain properties that no other system of animal communication does, he presented a case for sharpening the distinction between uniquely human capacities — the faculty of language in the narrow sense, or FLN — and any mechanisms we share with other species — the faculty of language in the broad sense, or FLB — then using the resulting models to guide language research in other areas. After decades pursuing progressively simpler and more general models of syntax, he eventually came to believe he had pinned down the most essential abstract operations distinguishing human from animal communication.

Chomsky’s Minimalist Program aims for the strongest possible economy of derivation and representation, but this is not necessarily a valid basis for investigating language. Johnson and Lappin (1997) lambasted the MP as divorced from empiricism and vague to the point of uselessness. Pinker and Jackendoff (2004) objected that Chomsky had thrown too many babies out with the bathwater, writing off systemic adaptations for language as nonessential in a manner that did not conform to any coherent timeline of evolutionary biology. As much ink has been spilled in mutual admonishment for misinterpretations and strawmen as on these foundational differences (cf. Roberts 2000, Holmberg 2000, Lappin et al. 2001, Fitch et al. 2005, &c.).

The concept of recursion is central to Chomsky’s description of language, and over the years he has become more adamant that it is the primary indispensable component of FLN. Many objections to the MP hinge on the inadequacy of existing explanations for how the implications of this hypothesis can be reconciled with what we know about evolution and language acquisition, and many counter-objections contend that recursion is being misunderstood. It behooves us to examine it closely.

Recursion is best defined as a property of definitions. Recursive definitions are those that precisely describe an infinite set by specifying both a finite subset of its elements and a finite number of rules sufficient to describe all other elements in terms of that subset. For a definition to meet all of these criteria fundamentally implies that the set of rules must be self-referential <1>. This is because they are finite in number, yet must be used to discriminate whether any given element is a member of the infinite set in question — they must specify a procedure that can be applied to the finite subset of pre-defined base cases to produce members of their infinite complement, and also be applied to any such member to produce a different member, such that the same procedure can be repeated an arbitrary number of times to produce an ordered series of valid member elements beginning with a base case and terminating with the element in question. It is strictly equivalent to state that the rules must specify a procedure which can be applied to any element of the infinite complement, or to its own output, and by indefinite repetition produce a series of valid members of the full set that terminates in a base case.

These terms are cumbersome in the extreme, but have the virtue of exactness — this is easiest to see with purely computational examples. The infinite set of natural numbers may be defined recursively by specifying that 1 is a member and that adding 1 to any member produces another member — this is sufficient information to determine whether any given number is natural or not, by the simple expedient of seeing whether you can count to it. The Fibonacci sequence, another infinite ordered set, can be similarly defined by specifying the first two elements (conventionally 1 and 1) and defining any subsequent element as the sum of the previous two. In more formal notation, we can completely describe the entire infinite sequence using a finite subset of base cases and a self-referential successor function as follows:

F(1) = 1; F(2) = 1; F(N) = F(N-2)+F(N-1) ∀ N>2.
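These definitions translate directly into executable form. The following Python sketch (an illustration of the math above, not part of any cited formalism) implements both examples: a membership test for the naturals that works by "counting down to" the base case, and the Fibonacci successor function.

```python
def is_natural(n):
    """Recursive membership test for the natural numbers:
    1 is a member; n is a member iff n - 1 is."""
    if n == 1:                      # base case
        return True
    if n < 1 or int(n) != n:        # cannot be reached by counting
        return False
    return is_natural(n - 1)        # self-reference

def fib(n):
    """F(1) = F(2) = 1; F(N) = F(N-2) + F(N-1) for N > 2."""
    if n in (1, 2):                 # finite subset of base cases
        return 1
    return fib(n - 2) + fib(n - 1)  # self-referential successor rule
```

Note that every call chain bottoms out in a base case, mirroring the requirement above that the series of rule applications terminate in a pre-defined element.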

Less trivially, entire mathematical proof systems can be defined recursively — with a finite number of axioms and a self-referential ruleset for combining them into valid expressions (and those into still larger expressions, &c.), we have accurately described the set of all expressions which can be proved within that system <2>.
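To make this concrete, consider a deliberately tiny toy system (my illustration, not any standard proof calculus): the axiom is the empty string, and the two rules are "wrap a theorem in parentheses" and "concatenate two theorems." The provable expressions are then exactly the balanced-parenthesis strings, and membership can be decided by running the rules in reverse:

```python
def provable(s):
    """Membership test for a toy proof system over '(' and ')':
    axiom: the empty string is a theorem;
    rule 1: if t is a theorem, so is '(' + t + ')';
    rule 2: if t and u are theorems, so is t + u.
    The theorems are exactly the balanced-parenthesis strings."""
    if any(ch not in "()" for ch in s):
        return False
    if s == "":                      # the axiom (base case)
        return True
    depth = 0
    for i, ch in enumerate(s):
        depth += 1 if ch == "(" else -1
        if depth < 0:                # unmatched ')'
            return False
        if depth == 0:               # first balanced prefix found:
            # undo rule 1 on the prefix, rule 2 on the remainder
            return provable(s[1:i]) and provable(s[i + 1:])
    return False                     # unmatched '('
```

A finite axiom set and a self-referential ruleset thus fully determine an infinite class of valid expressions, just as in the numeric examples.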

It cannot be overstressed that recursive definitions, and the self-referential functions they include, are mathematical abstractions, and like all mathematical abstractions only have explanatory value for real-world phenomena insofar as they accurately predict those phenomena. This means that declaring a system or process in nature to be recursive without elaborating is ambiguous and imprecise — it is necessary to specify how and why it might be useful to model it using an infinite set as defined, tersely, by a finite subset and a self-referential successor function.

What sorts of systems are useful to model this way? Real-world phenomena that produce detectable patterns over time are well-modeled by recursive definitions if the space of potential patterns is large, the number of basic components that comprise the patterns is relatively small, and the state of the system at any given time consistently depends on a recurring elaboration of the prior states. As a rule of thumb, we tend to find it easy to conceive of these processes as tree-like.

Chomsky’s central insight is that the syntax of every human language <3> appears to involve such a process. They all make a practical distinction between grammatically well-formed and grammatically ill-formed sentences (independent of truth or intelligible meaning, hence the infamous well-formed but meaningless sentence “colorless green ideas sleep furiously”), and they all seem to arrange words in a binary hierarchy, embedding short sentences into longer ones by merging exactly two elements into a nested set that embeds like one of its members. These universal properties of grammar seem to imply a self-referential ruleset that maps a finite set of known words to an infinite class of sentences.
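The binary embedding operation can be sketched in a few lines of Python. The representation as frozensets is my illustrative choice, matching the idea that merging forms an unordered set of exactly two elements which can itself be merged; the example sentence is hypothetical:

```python
def merge(a, b):
    """Binary Merge sketched as set formation: combine exactly two
    syntactic objects into an unordered set that can itself serve
    as an input to further merges."""
    return frozenset([a, b])

# A nested binary hierarchy for "the dog chased the cat":
np1 = merge("the", "dog")
np2 = merge("the", "cat")
vp = merge("chased", np2)       # np2 embeds like a single member
sentence = merge(np1, vp)       # the whole sentence is one nested set
```

Repeated application of the same two-place operation over a finite vocabulary yields an unbounded hierarchy of possible structures.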

Of course, in reality, there are no infinitely long sentences, and the number of sentences ever composed is and always will be finite. Infinitude is a property of the model, not the phenomenon. Nevertheless, the most parsimonious way to explain the way we classify sentences is to suppose that something in the human brain is applying a recursive definition (Hauser et al. 2002) — nobody does it by memorizing a large set of sentences. Since various properties and implications of recursive definitions have been formally derived in a set-theoretical context, this presents a tempting opportunity to see if known math can point the way to unknown biology.

So, the sequences of words produced by humans using language appear to belong to infinite sets of grammatically valid sentences, and membership in those sets is reasonably well-described by self-referential definitions that involve embedding a finite vocabulary of words into sentences that can, in turn, be embedded like words in a larger sentence, ad infinitum. Thus, humans must, by whatever mechanism, be repeatedly performing some generative procedure that composes words into sentences before actually outputting them. Could this really be the unique, centrally human trait that underlies our capacity for language and abstract thought?

Not on its own. Nature abounds in generative systems whose output can be classified accurately with recursive definitions — motor sequences in rodents (Berridge 1990) and avians (Berger-Tal 2015) are well-described with such models (Bruce 2016). Even sunflowers, not often hailed for their sophisticated communicative abilities, grow seeds in a pattern predicted precisely by the Fibonacci sequence, with each concentric ring (outside the innermost two, the base cases) containing exactly as many seeds as the sum of the previous two layers. That we produce vocalizations that tend to match a self-referential formula says nothing at all about our mental capacities; it is quite easy to describe an obviously non-conscious machine that outputs an unbounded variety of syntactically correct sentences (Searle 1980). That a process can be correctly predicted by applying a concise recursive definition implies only that it involves some sort of iterative feedback — it crops up everywhere we look. That may, in fact, imply strongly that the ability to recognize iterative feedback is much more interesting than the feedback itself.

To call human syntax recursive without further clarification conflates at least two things — one very broad, and the other very narrow. The pattern of syntax is recursive in that it can be accurately described by a finite set of words and a finite set of self-referential functions: for a simplified example, one could tersely describe a valid sentence as a noun phrase and a verb, which noun phrase may consist of a single atomic word or a subsentence (Carstairs-McCarthy 2000). Thus, a starting vocabulary and a relatively simple embedding function can accurately classify an infinite hierarchy of possible valid sentences, in the same way that two seed values and a self-referential summation function can accurately classify an infinite number of possible valid sunflower ring counts. Syntax is, in the productive sense, trivially recursive.
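The simplified rule just described can be written out as a short Python recognizer; the toy vocabulary and the exact rule are hypothetical simplifications for illustration:

```python
# A hypothetical toy vocabulary; plural nouns and bare verbs keep the
# example free of agreement morphology.
NOUNS = {"dogs", "cats", "ideas"}
VERBS = {"sleep", "sing", "annoy"}

def is_sentence(tokens):
    """A sentence is a noun phrase followed by a verb; a noun phrase is
    either a single noun (base case) or an embedded subsentence
    (self-referential case)."""
    if len(tokens) < 2 or tokens[-1] not in VERBS:
        return False
    noun_phrase = tokens[:-1]
    if len(noun_phrase) == 1:
        return noun_phrase[0] in NOUNS      # atomic word
    return is_sentence(noun_phrase)         # recursive embedding
```

Because the noun-phrase slot accepts either an atomic word or a whole subsentence, two rules and a finite word list suffice to classify an infinite hierarchy of strings.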

The really surprising thing about humans is not that we can produce patterns that are well-modeled by recursion, but that we can model them. We are not only able to generate recursive syntax, but to interpret it — we hear a finite number of sentences in our lives, yet somehow converge on simple rules that allow us to instantly agree whether a completely novel sentence is syntactically correct or incorrect. No other creature appears to be able to classify sentences in this manner — indeed, no other creature appears to be able to classify anything in this manner.

It is possible to express a hypothesis in purely behavioral terms: modern humans are the only animal that can adapt to a recursive reward schedule. That is, presented with the output of a recursive process, we can divine the successor function behind it well enough to predict further output with arbitrary precision, while any other species is unable to do the same. Crucially, in principle, it should be possible to test this without using any purported components of FLB whatsoever, and compare results meaningfully across clades.

Although no experimental paradigm has proceeded explicitly on this basis, several studies suggest it may be worth investigating. Human music embeds (non-semiotic) phrases and is well-modeled by recursive definition, but other primates cannot predict it well enough to clap along (Honing et al. 2012). No nonhuman species can reliably complete any analogue of the Tower of Hanoi puzzle, which human children can solve by deriving a general rule that applies across more steps than they have short-term memory to recall (Martins 2016). Some avians produce calls that are center-embedded in a way that is reminiscent of syntax, but they appear to recognize and interpret calls from conspecifics on a purely phonological level (Corballis 2007). Animal numeracy is extremely limited and its accuracy decreases precipitously as numbers get larger — they cannot learn except by ‘brute-forcing’ the problem space, as opposed to inferring the underlying rules. Avians can be painstakingly trained to recognize sets of up to six items (Pepperberg & Gordon 2005), and chimpanzees have some ability to recognize relative quantity and proportion (Woodruff & Premack 2001), but no amount of training seems to induce a schema adequate to the task of simple counting. Human children, in contrast, generally derive the successor function for counting by around age four (Feigenson et al. 2004), despite far less intense training on a far smaller sample set. Rosenberg (2013) finds that human infants likely organize their memories into binary hierarchies — isomorphic to universal grammar — and can use them to model the future. If no extant non-humans can make predictions based on self-referential definitions, it seems reasonable to speculate in terms of an exclusively human Faculty of Recursive Prediction, or FRP.

Two factors may excuse adding FRP to the already-overflowing bowl of alphabet soup that is contemporary linguistic acronym space: distinguishing recursion in a productive sense from recursion in a predictive sense, and reconciling whatever is useful about the MP with a coherent timeline of human evolution.

Proponents of the MP often suggest that recursive syntax is intimately related to our capacity and inclination to categorize in nested binary hierarchies — what Fitch (2014) calls our ‘dendrophilia’ — but the nature of the relationship is not always clear. A predictive account may clarify matters: a reductive definition of prediction might be phrased as the reparative correlation of schema with environment over time, and the evolutionary value of FRP as the ability to map a territory that includes other mapmakers.

In some sense prediction is most of what nervous systems do: they allow an organism to adapt its behavior to its environment quickly, rather than wait for a hard-coded adaptation to evolve. Mechanisms to minimize predictive error over time are built into brains at every conceivable level, and we have a good idea how many of them work (cf. Clark 2013, Corlett et al. 2009); prediction-based descriptions have a good track record of usefully informing biological research.

Chomsky’s Poverty of Stimulus argument (1980) states that children’s ability to learn a universal grammar from a very limited set of exemplars implies that such grammar is inborn. This must be true in some sense — if humans can recognize and classify an unbounded quantity of novel sentences, and no other species can do so, some biological mechanism must account for it. Since we are necessarily descended from entirely alinguistic species, however, this raises the obvious question of how and when such a mechanism emerged, and the MP’s failure to provide an adequate answer is one of its most glaring flaws. Once convinced they could not ignore natural selection entirely, proponents have generally hypothesized a single mutation that spread through the human population around seventy thousand years ago, when widespread evidence of abstract art and rapid technological innovation begins to appear in the fossil record. Detractors rightly point out that mutations do not spread through an entire species for no reason, and that the capacity to interpret syntax does not provide any obvious advantage to an individual unless born into a world where it is already in widespread use — instead, many argue, it must have emerged gradually in concert with incremental refinements to the plethora of systems the MP lumps under FLB.

It is nevertheless worth trying to salvage the idea of a small, central language innovation in recent human lineage that is more than the sum of FLB, because a pure gradualist account does not explain modal independence very well. The communicative faculties of the modern human brain seem to use any intervening machinery they can get their hands on — this is why Hauser, Chomsky and Fitch (2002) argue for a narrow language faculty ‘at the interface’ of those systems. The modern language faculty shows a remarkable tendency to adopt entirely novel modalities on sub-evolutionary timescales — writing, for example, is quite recent, but does not depend on any new physiological systems. A human may lose their entire vocal apparatus and still use sign language, which exhibits the same properties as spoken language (including de-novo formation of syntax, cf. de Vos & Pfau 2015). Persons who can blink but retain no other voluntary motor control are still able to communicate in language, if painstakingly; we should pay close attention to whatever it is in that situation that is fundamentally indispensable. It makes sense to draw some kind of distinction between peripheral capacities of language and central ones, even if the line between FLN and FLB in the MP may be drawn in the wrong place.

If FRP is a coherent notion, and reflects some underlying biological trait unique to modern humans, it is tempting to suppose it might be useful to a single individual even in the absence of conspecifics with the same trait, and thereby resolve the Promethean paradox. If we were steelmanning the MP, we could imagine that FRP was enabled by some small mutation that spread through a world of silent apes on its own merits, and language was later exapted once the mutation had become common enough. On reflection, however, this is unlikely: if it is so universally advantageous, and simple enough to arise in one step, why hasn’t a homologous mutation occurred in other clades? We would need to explain why it provided a selective advantage in human ancestors but nowhere else.

This problem could potentially be resolved by enlarging, very slightly, the set of capacities we consider most central to language. If FRP only provides selective advantage in combination with some other indispensable trait, and that trait is very rare, we have a convenient way to explain why it has no equivalent in other species — for most of them, it would be useless. FRP enables the efficient reception of grammatical sentences; what if we could describe an environment where their transmission was already common?

Displaced reference is not unique to humans, but only barely. Honeybees famously dance to direct their hives to food, but this behavior does not generalize and is likely the hard-coded result of many millions of years of eusocial haplodiploidy. Some evidence exists that corvids can recruit others to a distant carcass (Heinrich 1998), but aside from this we are alone in our ability to refer to distant phenomena — even chimpanzees trained to correlate objects with arbitrary symbols or motor patterns appear to use them exclusively in reference to immediate stimuli, and only under prompting (Terrace 1979). Displaced reference has little obvious bearing on syntax, but is crucial for the use of words. Several theories of protolanguage are predicated on words emerging before syntax, noting that even a small vocabulary is immediately useful. However, they tend to have a hard time explaining why the specific universal grammar humans appear to use — which Bare Phrase Structure (cf. Chomsky 1994) describes with impressive parsimony — was adopted identically across the species.

Can we combine these ideas? As noted above, it’s much more common for an organism to make a series of movements that is well-described by a recursive definition than to be able to apply one predictively. If we do get some aspect of embedding syntax ‘for free’ from initially asyntactic protolanguage (cf. Heine & Kuteva 2007), that opens up the intriguing possibility of an implicit syntax — one that is entirely an artefact of generative functions, but initially could only be interpreted by simpler heuristics than FRP. This would explain the latter’s rarity and utility: if and only if your conspecifics are already displacers who talk, FRP lets you predict them better, and vastly expands the range of what you can talk about in the future. In such a model, focusing on displacement and FRP as separate but central traits would sacrifice the minimal amount of minimalism necessary to avoid the pitfalls of the MP’s magic bullet mutation. In essence, it would mean a two-factor FLN.

The ultimate object of reorganizing categories like this is to reconcile with natural history — the territory to be mapped is made of eras and events, not logical categories. Since current language behavior is a proper subset of the history of life, the former must conform to the latter in any model that aspires to be more than domain-specific. Cross-disciplinary speculation is asymmetrical; in the same way that chemistry is downstream of physics, linguistics is downstream of evolutionary neurobiology. A useful model in physics constrains chemistry, whereas a useful model in chemistry can at best suggest fruitful avenues of investigation in physics.

To turn two-factor FLN into a coherent speculative timeline means properly situating two events — the emergence of displacement and the later emergence of FRP — somewhere in the six million years or so since our last common ancestor with chimpanzees, and also describing what happened before, between, and afterwards — roughly speaking, three different era-specific slices of FLB development. Every trait involved in the account must confer plausible selective advantage; moreover, if they emerged suddenly, they must be plausibly attributable to a single mutation, or plausibly exapted as a spandrel of one or more traits with independent advantages to explain their prior development. The best estimate for the emergence of displaced reference is just over one million years ago, exapted from a limited call system achieving immediate reference and a capacity for intersubjectivity driven by alloparenting; the best estimate for the emergence of FRP is around seventy thousand years ago, triggered by a single mutation in otherwise anatomically modern humans who were already highly dependent on a mature protolanguage.

After their divergence from chimpanzees, the various hominid species adopted bipedal gait, encephalized rapidly, and lost much of their body hair. These traits are often considered adaptations to climate change or rapid deforestation — there exist explanations based on other pressures, but as gross anatomical adaptations they are in any case not fundamentally difficult to explain individually in straightforward gradualist terms. In combination, however, they created a reproductive bottleneck: in order to pass progressively larger heads through progressively narrower pelvises, these species had to give birth to progressively more premature offspring, which could no longer cling to hair and had to be carried and cared for for long periods before they could begin to feed themselves. Sarah Hrdy (2009) marshals a compelling case for a sudden shift to alloparenting in erectus between one and two million years ago, driven by the caloric demands of these helpless infants, and suggests that leaving children in the care of other adults allowed for a leap in the efficacy of foraging. She attributes the rapid subsequent development of social intersubjectivity and joint attention faculties to the comparative survival rates of infants better able to monitor and engage with multiple caretakers, and her account dovetails very well with the rhythmic, reparative dyadic and triadic interactions in early ontogeny described by Meltzoff, Trevarthen, and Stern (Beebe 2003). We may reasonably suppose that something more like modern human childhood than chimpanzee childhood emerged around this time: it was the first point at which infants had to be fed and held for many months before they were capable of even rudimentary locomotion, and were in the main only able to interact with the alloparents on whom they depended by vocalizing or by moving their eyes and faces.

Modern human infants typically begin to use simple words for displaced reference soon after they become capable of triadic interactions but before they begin to use any appreciable syntax. Before uttering any words, they hear quite a few — the stimulus is impoverished, not destitute — and babble in return, articulating phonemes stochastically and noting their effects in a manner that predicts later language ability (McConnell 2008). Erectus infants had no words to hear, so we are faced with another chicken-and-egg problem, but it may be resolved if we suppose that they were hearing something sufficiently word-like.

As with hard-wired alloparenting, hard-wired call repertoires are displayed by various primates but not by great apes. They are generally inborn, involuntary, and immutable, but despite this serve well for implicit immediate reference — groups of vervets, for example, will quickly coordinate their behavior en masse in response to a predator-specific alarm call, and may sometimes combine calls (Fischer 2013). Whatever pressures produce such systems, they are evolutionarily unremarkable for primates, and something homologous could plausibly have emerged ‘from scratch’ in our genus between one and six million years ago. If so, an erectus infant, like a modern infant, would have been cared for by adults who changed their behavior noticeably in response to certain phonemes, while simultaneously having very few other means available to affect their environment in any way — ideal pressures under which to voluntarize vocal production and generate new labels for shared referents. Thus armed with a repertoire of calls establishing immediate reference and primed to synchronize attention with conspecifics, erectus would have had all the prerequisites at hand to exapt displaced reference and gain the immediate advantages of a minimal, asyntactic vocabulary, one that could expand on sub-evolutionary timescales by virtue of their close imitative coordination and in contrast with the glacial pace of inborn call development.

Derek Bickerton’s admirably detailed account of protolanguage (2009) presupposes a discontinuity with animal communication; if the preceding model of displaced reference as a spandrel of vocal immediate reference (rare in nature but common in primates) and a cluster of helpless infancy adaptations (unique in primates but not in nature) is enough to explain that discontinuity, we have arrived at his starting point. He hypothesizes that displaced reference was a crucial element in constructing a high-end scavenging niche, initially enabling recruitment to distant carcasses by galvanizing conspecifics with an imitation of the animal in question. This usage is iconic rather than properly semantic, but over time, he contends, the imitation sounds would become decontextualized, evoking associated memories of the referent independent of any call to action, and thereby be transmuted gradually into full-fledged conceptual words.

Bickertonian protolanguage is a reasonable explanation for the success of erectus and subsequent members of our clade in expanding their range out of Africa and across the Old World. Simple words, strung together in short utterances without any particular structure, are more than sufficient to account for a collective advantage in scavenging and, later, in hunting — displaced reference allows for unprecedented coordination of activity. Quite a lot of communication is possible using words without any embedding syntax, and Bickerton compares such protolanguages to pidgins, which share this feature.

There is a problem with this comparison. Pidgins may spontaneously develop into creole languages, which do use embedding syntax, in a single generation — as soon as modern human children learn them, they acquire universal grammar and start accumulating the bells and whistles of any full language. Yet it seems unlikely that erectus protolanguage did the same — tool industries stayed relatively stagnant across all hominids for the next million years, and there is little evidence of any representative art or symbolic physical culture. If this reflects a protolanguage that was stuck in a pidgin-like state, we need another evolutionary event to account for the leap to modern syntax and subsequent cultural takeoff. Bickerton thinks they were waiting for Merge, the abstract operation with which Chomsky (1995) describes the binary embedding process of his purportedly universal Bare Phrase Structure grammar; Gilles Fauconnier’s notion of blending (1999) might also fit the bill.

Modern human languages all exhibit duality of patterning — they compose a finite number of meaningless phonemes into a larger finite number of meaningful words, and also compose those words into an unbounded <4> variety of syntactic sentences. A long period of protolanguage before some sort of mutation enabling syntax helps account for this, but we still have to explain why the latter occurred only in the context of the former. Merge in particular was not originally developed as a concept under any assumption that a form of protolanguage was necessary to explain the advantage it conferred — we either lack an explanation of that advantage, or we lack an explanation for why it was not selected for in species without displaced reference already operating.

An alternative approach to protolanguage attempts to derive embedding syntax bottom-up from pure phonology. Andrew Carstairs-McCarthy (1999) has an account along these lines that describes the exceedingly gradual self-organization of something like universal grammar production from bare syllables, using such heuristics as call blending, synonymy avoidance, and implicit topicalization. If this model or any like it holds up, Bickerton’s pidgin-like protolanguage would, in fact, become gradually more like a creole language over time, at least in terms of production.

This is where FRP fits. If we accept that syntax could have arrived implicitly, strictly as an artefact of production or modulation and not necessarily a capacity of reception or demodulation, we get a built-in reason why the later key mutation for which Bickerton is searching has to be unique to humans. Early protolanguage would have resembled the asyntactic ‘string of beads’ he describes, while late protolanguage would sound, to a modern human, as though it had syntax — but the hominids using it would still be interpreting it like a string of beads, severely limiting their ability to predict what their conspecifics were going to say or do next, at least as compared to a modern human. This sets up enormous selective pressure for FRP to catch on — a recursive predictor would be immediately privileged reproductively to the extent protolanguage was already productively syntactic, allowing them to adapt to already-crucial patterns of information in their environment better than anyone else around them. They would also get several uses for the same capacity — recursive prediction does not just allow for modeling, e.g., sentences of the form “If I say that she said that he said that… [&c.], then [X]” but also for predictively modeling the social situation that underlies such a sentence. They would be the first individuals capable of deliberate story-spreading or behavior manipulation — essentially, they would be capable of domesticating their peers, in a manner somewhat reminiscent of Julian Jaynes’ ‘bicamerals’ (1976). This holds as long as they had a ready-made productive embedding syntax with which to work.

By contrast, in the absence of such an implicit grammar, FRP is not useful: it confers no advantage if there is no ready-made system for it to predict. Indeed, it could easily be disadvantageous — disruptions of predictive mechanisms in humans often result in severe apophenia, a tendency to see patterns that aren’t really there. There is general consensus that abstract symbolic artifacts in the fossil record are plausibly indicative of behaviorally modern language, but by evolutionary standards they are expensive — every hour spent, e.g., carving sympathetic magic figurines under the mistaken impression they will summon what they resemble is one that might be spent avoiding starvation. FRP is tremendously useful to an ultrasocial species with a highly developed protolanguage, because the advantages outweigh the costs — anywhere else it emerges, it should be expected to die out <5>.

A “just-so story” stands or falls on falsifiability. The preceding sketch depends on the exaptations described to account for two modally independent core language faculties emerging roughly a million years apart; if any aspect of the evolutionary biology is definitively disproven, everything after that point should be thrown out, and if either of the two core faculties is shown to be common in other species, or inextricable from broader faculties, it does not belong in a two-factor FLN. It is also possible to attack it with direct experimentation — a nonhuman species with FRP, or a species with FRP but no displacement, would knock the concept down.

As for gathering evidence, it might be interesting to design a test for FRP simple enough to be administered identically to animals and humans. Something along the lines of Stephen Wolfram’s elementary cellular automata might work — subjects would be trained to classify patterns of two colors as valid or invalid according to an ultra-simple embedding rule, and receive a reward for correct answers. If FRP is a meaningful concept, animal performance should decline as pattern length grows, while humans above some critical age should be able to maintain accuracy indefinitely. If that critical age turns out to fall invariably after syntactic language is already in use, that would imply that syntactic language enables FRP rather than the other way around; if not, however, we would have an ontogenetic window in which to look for physical structures to investigate, one we could potentially narrow by investigating which types of aphasia do or do not knock out all of FRP.
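To make the proposed test concrete, here is a minimal sketch of such an embedding rule. The specific rule chosen — a pattern is valid only if it is A-runs mirrored by B-runs, the classic AⁿBⁿ grammar — is an illustrative assumption, not part of the proposal; any rule that forces the subject to track nested structure rather than local transitions would serve.

```python
# Toy validator for a center-embedding rule over two 'colors', A and B.
# The A^n B^n rule here is an illustrative assumption: it cannot be
# decided by purely local (finite-state) cues at arbitrary lengths.

def is_valid(pattern: str) -> bool:
    """Return True iff pattern is A^n B^n for some n >= 1."""
    n = len(pattern)
    if n == 0 or n % 2 != 0:
        return False
    half = n // 2
    return pattern[:half] == "A" * half and pattern[half:] == "B" * half

# Counting or transition-spotting strategies can mimic success at short
# lengths (cf. Van Heijningen et al. 2009), which is why the diagnostic
# is performance as length grows, not raw accuracy at any fixed length.
print(is_valid("AB"))        # True
print(is_valid("AAABBB"))    # True
print(is_valid("AABBB"))     # False (odd length)
print(is_valid("ABAB"))      # False (no embedding)
```

The point of the sketch is only that validity is trivial to generate and score mechanically, so the identical stimulus set could be administered to any species.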

The real holy grail, as ever, is a well-understood set of neurobiological mechanisms accounting for the most central aspects of language, whatever they may be: mechanisms we can thoroughly interrogate down to the level of developmental genetics. Definitively pinning them down would necessitate a bizarre megaproject, one that would dwarf less ambitious scientific endeavors (e.g. CERN, the Manhattan Project, the Apollo Program): to really understand how language evolved, we would have to evolve it again from scratch. It should in principle be possible to test a model of language origin by artificially selecting a very large population of apes through each stage of language acquisition in the last six million years of human lineage over the course of a scant few tens of thousands of generations — in the case of the above model, from referential vocal production to alloparenting to displaced reference to protolanguage to syntax to FRP — and wind up with a nonhuman being capable of true speech. Until we have met such a being, any universality in human language means we are necessarily extrapolating speculatively from a single species-wide exemplar, an N of 1. Barring a fortuitous quintillion-dollar grant to make linguistic uplift a reality, we will have to wait for incremental improvements in neurobiology to shed light on an incomplete narrative, or hope for more unforeseen revelations from disciplines not previously consulted.

<1> This is a consequence of the Recursion Theorem. Self-referential rules and their base cases, as described, are often simply referred to as recursive functions; this is deliberately avoided here to reduce conflation of terms.

<2> The set of all decidable statements in such an axiomatic system is, notably, not generally identical to the set of all necessarily true statements (cf. Gödel 1931).

<3> There is some (hotly contested) evidence that the Múra-Pirahã language may lack this quality, but the Pirahã are readily capable of learning other languages that have it, and thus do not on their own invalidate Chomsky’s hypothesized biological adaptation for universal grammar.

<4> It is important to distinguish the unbounded from the infinite — the number of actual sentences spoken cannot be infinite, but the schema we use to interpret novel sentences has to involve discrete infinity.

<5> Some findings suggest limited abstract art in prehuman hominids, such as purported Neanderthal burials in Europe and Erectus carvings on Java — these could be brief local flare-ups of the FRP trait that fizzled in the absence of a sufficiently advantageous application.

Beebe, B. (2003). A Comparison of Meltzoff, Trevarthen, and Stern. Psychoanalytic dialogues, 13(6), 777-804.

Berger-Tal, O. (2015). Recursive movement patterns: review and synthesis across species. Ecosphere, 6(9), 1-12.

Berridge, K. (1990). Comparative Fine Structure of Action: Rules of Form and Sequence in the Grooming Patterns of Six Rodent Species. Behaviour, 113, 21-56.

Bickerton, D. (2009). Adam’s Tongue: how humans made language, how language made humans. New York: Hill and Wang.

Bruce, R. (2016). Recursion in fixed motor sequences: Towards a biologically based paradigm for studying fixed motor patterns in human speech and language. CEUR Workshop Proceedings, 272-282.

Carstairs-McCarthy A. (2000) The distinction between sentences and noun phrases: an impediment to language evolution? The Evolutionary Emergence of Language: Social Function and the Origins of Linguistic Form: 248-263. Cambridge: Cambridge University Press.

Chomsky, N. (1980). Rules and representations. Oxford: Basil Blackwell.

Chomsky, N. (1995). Bare Phrase Structure. Oxford: Basil Blackwell.

Clark, A. (2013). Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences, 36, 181–253.

Corballis, M. (2007). Recursion, Language, and Starlings. Cognitive Science, 31: 697–704.

Corlett, P. et al. (2009). From drugs to deprivation: a Bayesian framework for understanding models of psychosis. Psychopharmacology, 206(4), 515–530.

De Vos, C., and Pfau, R. (2015). Sign Language Typology: The Contribution of Rural Sign Languages. Annual Review of Linguistics, 1, 265-288.

Feigenson, L. et al. (2004). Core systems of number. Trends in Cognitive Sciences, 8(7), 307-314.

Fischer, J. (2013). Vervet Alarm Calls Revisited: A Fresh Look at a Classic Story. Folia primatologica, 84(3-5), 273-274.

Fitch, W. (2014). Toward a computational framework for cognitive biology: Unifying approaches from cognitive neuroscience and comparative cognition. Physics of Life Reviews, 11(3), 329–364.

Fitch, W. et al. (2005). The evolution of the language faculty: Clarifications and implications. Cognition, 97, 179–210.

Hauser, M. et al. (2002). The Faculty of Language: What Is It, Who Has It, and How Did It Evolve? Science, 298(5598), 1569-1579.

Heine, B. and Kuteva, T. (2007). The Genesis of Grammar: A Reconstruction. Oxford: Oxford University Press.

Heinrich, B. (1998). Winter foraging at carcasses by three sympatric corvids, with emphasis on recruitment by the raven. Behavioral Ecology and Sociobiology, 3, 141-156.

Holmberg, A. (2000). Am I Unscientific? A Reply to Lappin, Levine, and Johnson. Natural Language & Linguistic Theory, 18, 837–842.

Honing, H. et al. (2012). Rhesus Monkeys (Macaca mulatta) Detect Rhythmic Groups in Music, but Not the Beat. PLoS ONE, 7(12), e51369.

Hrdy, S. (2009). Mothers and Others: the evolutionary origins of mutual understanding. Cambridge, Massachusetts: Belknap Press.

Jaynes, J. (1976). The Origin of Consciousness in the Breakdown of the Bicameral Mind. Boston, Massachusetts: Houghton Mifflin.

Johnson, D. and Lappin, S. (1997). A Critique of the Minimalist Program. Linguistics and Philosophy, 20, 273–333.

Lappin, S. et al. (2001). The Revolution Maximally Confused. Natural Language and Linguistic Theory, 19, 901–919.

McConnell, E. (2009). From baby babble to childhood chatter: predicting infant and toddler communication outcomes using longitudinal modeling. Kansas: ProQuest.

Gödel, K. (1931). Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme, I. Monatshefte für Mathematik und Physik, 38(1), 173-198.

Pinker, S. and Jackendoff, R. (2004). The faculty of language: what’s special about It? Cognition, 95, 201–236.

Roberts, I. (2000). Caricaturing Dissent. Natural Language & Linguistic Theory, 18, 849–857.

Rosenberg, R. (2013). Infants hierarchically organize memory representations. Developmental Science, 16(4), 610-621.

Terrace, H. (1979). How Nim Chimpsky changed my mind. New York: Ziff-Davis.

Van Heijningen, C. et al. (2009). Simple Rules Can Explain Discrimination of Putative Recursive Syntactic Structures by a Songbird Species. Proceedings of the National Academy of Sciences of the United States of America, 106(48), 20538-20543.

Woodruff, G. and Premack, D. (1981). Primative mathematical concepts in the chimpanzee: proportionality and numerosity. Nature, 293(5833), 568–570.

Reason and the Ripper

A seven year old girl and a six year old boy — best friends who live in the same apartment building — set out together for their neighborhood bodega to buy some ice cream. In the building’s elevator, a man suddenly and brutally attacks them with a knife, killing the boy and wounding the girl. He flees. In the aftermath, as bloodstains are photographed, police assure gathered crowds of neighbors they are diligent and inexorable. At a nearby hospital, the boy’s mother falls to her knees in the street and cries aloud to God.

This crime is neither abstract nor hypothetical; it occurs on Schenk Avenue in Brownsville on the sunny first of June, 2014. The victims are PJ Avitto and Mikayla Capers (Celona and Wilson, 2014).

Within a week, the NYPD arrest the man responsible: Daniel St. Hubert, whom tabloids dub the Brooklyn Ripper. He is quickly implicated in two more stabbings, of a homeless man and of a teenage girl. His history is long, violent, and marginal, and his family say they have tried without success to find him psychiatric help, even as he attacks them and others. As courts debate his fitness to stand trial, he displays no remorse, and menaces personnel at every facility in which he is confined. He claims to hear the voice of the devil. His only apparent motive was quieting the children down (Santora, 2014).

What are we to make of Daniel St. Hubert? What are we to do with men like him? The general question is, perhaps, the more tractable, but the particular drives home the stakes. Legal and medical systems intervened in his life at several points, yet their failure is carved in the bodies of children. Definitive confinement has come at enormous risk and expense, compounded by ambiguity about what constitutes an appropriate setting. What assumptions underlie this state of affairs?

The phrase mens rea — guilty mind — is at least as old as Augustine, and the concept is older still: as far back as Hammurabi’s laws, many punitive legal traditions have maintained distinct punishments for similar crimes, conditioned on the criminal’s mental state. The shades of capacity and intent considered relevant are shaped by prevailing local philosophy, but also inevitably by the practical limits of what it is possible to know about another person’s mind. Thus we face the unenviable prospect of engaging with the interior life of a child killer.

It’s worth noting that other traditions have simply sidestepped these questions — it is entirely possible to run a legal system that makes no such distinctions at all. What might that look like?

On that night in 2014, the police were not the only ones who spoke to the neighborhood. A block down Schenk Avenue, another group of men in matching jackets gave a remarkably similar series of assurances — this is our home, this will not stand, we will not stop until we find the man who did this. They were called the Tomahawks: a gang with deep roots, and many friends, in the area. Their exact words were not committed to print, but it does not take an enormous amount of imagination to predict what they would have done to Daniel St. Hubert if they had identified and apprehended him before the police did.

This brand of justice is the human norm. That doesn’t quite make it a default — the null case in any individual act of violence is simply escape, and the general result a monster-haunted world. This is the problem that the notion of licit punishment exists to solve, and even the most elaborate civilizations ultimately rely on the same basic threat as any Dunbar-sized tribe: if you transgress, a group too big for you to fight alone will find you and compel what they deem proper. Mob violence isn’t a fundamentally different system from policing in this sense — just a simpler one. In one primally satisfying punishment, it provides vengeance, prevents the target from offending again, and deters similar acts in the future. Courts, cops, and corrections facilities exist to accomplish the same things, if somewhat less efficiently.

This is, of course, a patently unfair comparison on a number of levels. The American justice system that captured and tried St. Hubert is laden with centuries of hard-earned lessons, safeguards against the vagaries of hasty retaliation. Yet scratch the patina of 21st-century social attitudes a bit and it’s quite recognizable as a hand-me-down from Britain, which was not at all shy about hangings in the era when America forked its common law; as late as the 19th century it prescribed death for such crimes as treason, pickpocketing, bridge defacement, and poaching. Dig further and that system, before it matured and hypostatized to the point it could execute kings, was a hodgepodge amalgam of antique Roman and tribal Germanic traditions. The latter, interestingly, were pointedly unsympathetic to mens rea distinctions, mandating identical punishment not only for the mad and the sane but for the intentional and the accidental — like the Tomahawks, the question of prior intent seemed beside the point to them.

Are the courts of Brooklyn today historical accidents? They attempt to distinguish murder from manslaughter from negligent homicide, and to tease apart deliberate malice from passionate lapses from the incapacity to deliberate at all. One can imagine history going differently, or point to parts of the world where different paradigms prevailed — in many, we may imagine, Daniel St. Hubert would have been executed years before for far pettier crimes, in others confined more decisively, in others even somehow rehabilitated before his atrocity. Was the modern form entirely latent in the ancient ones, or did it break away somewhere along the line? One may as well ask when St. Hubert himself passed the point of no return — it is the same fundamental question. Did he decide to stab children on the spot, or plan it? Did voices in his head compel him, and if so where did they come from? When, exactly, did tragedy become inevitable?

The hard truth, of course, is that it always was. There is only one world, and the Brooklyn Ripper is an inextricable part of it; any counterfactuals we construct are maps of a territory that does not exist.

For 18th-century philosophers dreaming of just republics, it was not outside the realm of reasonable speculation to suppose that free will really meant Free Will — the poetic ideal of freedom, the freedom of an immaterial soul intervening divinely with base matter, formally causeless and utterly unfettered by any force save conscience. We no longer have this luxury; wishful substance dualism died an ignominious death even as Descartes gestured vainly at pineal glands. Every scrap of evidence accumulated since has led us inexorably to conclude that nature is rule-governed in ways that absolutely do not allow for causal chains to begin ex nihilo with a human mind.

Yet the mirage dies hard. Despite the conspicuous absence of Free Will, most of life seems to run perfectly well on prosaic, everyday free will. The concept is inherently useful; attributing most actions to conscious choice is pragmatically sufficient to explain most of what the people around us do. Pursue their decisions further upstream and you run into infinite regress, as well as the dehumanizing prospect of total irresponsibility — why punish the killer if he could not possibly have done otherwise? If the degree of backward extrapolation is arbitrary, we might just as easily condemn the jailers who freed him, the parents who raised him, or the entire past history of the universe in aggregate, since none of those could have done otherwise either. As we do not exist in a vacuum, we draw those lines in different places for different reasons, depending on the circumstances; they are historical accidents, not objective universals, and it bears remembering that they change over time.

The strain of thought known as compatibilism aims to rescue colloquial free will without challenging material determinism, pointing out that our apparent freedom to act is more a matter of pragmatics than metaphysics, and thus not much affected by scientific revelation. On reflection, though, it is somewhat suspicious that compatibilists so often salvage exactly the consequences of free will that they already happened to enjoy, without having to revise existing attitudes about anything really surprising. Epictetus argued compatibilism (Bobzien 2001), and wound up arriving at roughly the same virtue ethics his pagan forebears already endorsed on more mystical terms; Augustine did the same (Couenhoven 2007), and derived from first principles the same Christianity he had already practiced. Is it any wonder, then, that more modern apologia steer unerringly for the same ideas — rights, agency, responsibility — that so fascinated modernity’s immediate predecessors, and on which they founded institutions we still use?

The impulse to defend free will is instinctively understandable. As modern compatibilist philosopher Daniel Dennett puts it, “the distinction between responsible moral agents and beings with diminished or no responsibility is coherent, real, and important.” It’s also at least partially inborn — infants begin to distinguish their own movements from those of others before they are a year old (Sodian 2016). We all know what it feels like to plan and act, and what it feels like to be constrained. It is not intuitive to reconcile the apparent independence of human action with the truth that we are, fundamentally, machines. It behooves us to examine what that apparent independence is really made of. In what sense, if any, was Daniel St. Hubert responsible for his actions? Did he plan them rationally, acting as though free, or did he act as though compelled? Do these questions even make sense?

Dennett, to his credit, does delve into the areas where evidence diverges from praxis. Where libertarian incompatibilists tend to fall back on an atomic, indivisible consciousness in the general humanist mien of the past several centuries, he contends that modern neuroscience shows us consciousness is not unitary. In his own words (1984):

“The model of decision making I am proposing has the following feature: when we are faced with an important decision, a consideration-generator whose output is to some degree undetermined, produces a series of considerations, some of which may of course be immediately rejected as irrelevant by the agent (consciously or unconsciously). Those considerations that are selected by the agent as having a more than negligible bearing on the decision then figure in a reasoning process, and if the agent is in the main reasonable, those considerations ultimately serve as predictors and explicators of the agent’s final decision.”

These are not the terms with which we usually debate intent. They are cumbersome and unintuitive, but have the singular advantage of being entirely descriptive — an excellent habit. It’s also implicitly relative to the modeler — if something about the human mind is “to some degree undetermined,” the immediate question that comes to mind is “undetermined by whom?” By abolishing subjunctive teleology, we are forced to recognize that when we talk about free will, we are often really talking about prediction.

Much of what a brain accomplishes is fundamentally predictive. We know we tend to initiate actions before any conscious decision is made, and justify them after the fact (Libet et al. 1983). We know that our perceptions rely on internal models, and have some idea how this is implemented chemically (Corlett et al. 2009). We know that we make snap judgements of our own agency based on how well our sensory feedback matches our predictions — and that sometimes those judgements are wrong, as with the famous rubber hand illusion. We know that mechanisms to minimize predictive error over time are built into our brains at every conceivable level (Clark 2013), and that we share most but not all of them with other species.
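As a toy illustration of that error-minimization idea — the delta rule and the numbers here are expository assumptions, not a claim about neural implementation — an estimator that repeatedly nudges its prediction toward what it observes will converge on the statistics of its environment:

```python
# Minimal error-driven estimator: each observation corrects the
# prediction by a fraction of the prediction error. The learning
# rate and the data are arbitrary illustrative choices.

def update(prediction: float, observation: float, rate: float = 0.1) -> float:
    error = observation - prediction      # prediction error
    return prediction + rate * error      # error-driven correction

prediction = 0.0
for obs in [1.0] * 50:                    # a stable environment
    prediction = update(prediction, obs)

print(round(prediction, 3))               # 0.995 -- near-perfect prediction
```

The residual error shrinks geometrically but never quite vanishes, which is one crude way to see why such machinery must be tuned: too little correction and the world stays surprising, too much and noise is mistaken for signal.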

Without a metaphysics of the soul to fall back on, it’s harder to pin down exactly what makes us unique (the general ability to adapt to a recursive reward schedule, perhaps?) but certainly we are better able to perceive and exploit subtle patterns in nature than any other species we’ve met, hence our monopoly on things like agriculture and medicine and atomic bombs. Humans are, in fact, so good at prediction that they become very difficult to predict — we can improvise, lie, even suss out another human’s predictions about our own behavior and deliberately violate them.

The same faculties underlie our ability to trace causes backwards — mental time travel runs both ways. However well we may know that causal chains stretch back indefinitely, the evidence for them is never perfect, and the trail usually runs cold in a human mind. PJ Avitto died because he was stuck with an eight-inch knife; the knife was propelled by an arm muscle, triggered by a motor neuron, triggered by… what, exactly? Other neurons, of course, but that doesn’t really tell us much. We can speculate about the wielder’s desires, his self-control, his upbringing, but it’s difficult to say for sure. In practice we cannot read his mind, or trust his explanation — whether or not he really hears command hallucinations, “the devil made me do it” was not a get-out-of-jail-free card even in places and times when demonic possession was taken quite seriously. The predictive value of any explanation we can come up with falls off a cliff at the point where we have to speculate about his invisible, internal state of mind — not because that state of mind came from nowhere, but because we can’t reliably trace causation further upstream yet.

Can we articulate a prediction-based account of human violence in general? The proverbial Martian anthropologist might note that, although we cannot seem to eliminate it, we have tended, over time, to put a lot of effort into making it less surprising. In situations where we are suddenly attacked, we agree on counterattacks: that guy keeps mugging people, so let’s get the whole village together and beat him up. Where the counterattacks themselves become unpredictable, we agree on standards: we sometimes beat up the wrong guy, so let’s write down the rules and demand explicit evidence. Now the muggers can accurately predict punishment and avoid it by not mugging people, and the rest of us can walk down the street with a justifiable expectation of safety. So far so good.

The problem is that some violence is inherently unpredictable. We can understand the mugger — he wants money. We can put ourselves in the mugger’s shoes, model the mugger’s incentives, and act accordingly. But what is the child killer’s incentive? How could anyone know ahead of time not to get into an elevator with him?

We declare some violent behavior insane, then, because we cannot predict it — equivalently, we cannot assign any narrative in retrospect that makes sense to us. Worse still, we also cannot predict whether we will suddenly become insane ourselves, and so in recent centuries we have set up a system to treat us as we would wish to be treated if we did: more as patients than as prisoners. Much as criminal courts have gradually accumulated methods of establishing motive, mental hospitals have painstakingly taxonomised dysfunction and tried to establish common etiologies.

Daniel St. Hubert was originally diagnosed with paranoid schizophrenia, but beyond this wide and potentially outdated categorization little information is publicly available. If anyone wants to trace the roots of his crime in Brownsville deeper than the moment of its commission, they will have to do it the hard way — get in a room with him, learn what they can about his life, and add a grain of evidence to the common pile. Taking the time to study motivation and pathology in such matters, rather than operating solely on deterrence and restraint, is what lets us intervene in novel ways in similar future cases. This occasionally leads to spectacularly effective society-wide heuristics, such as “don’t hit your kids” and “use lithium to dampen mood swings.”

This project is, of course, hard to reconcile with the fact that brutal retribution is enormously, eternally popular. When St. Hubert was frog-marched out of the 75th Precinct, a crowd was there waiting for him, chanting the word “Death!” He did not appear perturbed by this in the least.

Shakespeare wrote in a world where criminals were publicly tortured to death. Would the prospect of drawing and quartering have stopped the Brooklyn Ripper? It’s impossible to say, but the existing threat of incarceration clearly wasn’t enough. We could speculate that he simply couldn’t think far enough ahead for any kind of threat to make a difference, or lacked the executive function to restrain himself if he did. It’s also possible he thinks more clearly than he lets on, acted pragmatically in service of a monstrous goal, and simply didn’t think the consequences were worth worrying about. Humans do not have an easily readable utility function, which limits how useful utilitarianism is as a model of real human behavior.

If we are to operate on a consequential basis, then we can only judge our violence by its outcome. If we, as a culture, are using confinement effectively, it should make violence more predictable, and that predictability should make it rarer. If we are using it ineffectively, the expected result is more unforeseen attacks. It is tempting to think some evidence-based technocracy could craft policy on this basis alone, judging the results dispassionately and making adjustments.

The problem is, consequentialism is a black hole. If, for example, we admit that unpredictable violence is even a partially heritable tendency, we cannot ignore the prospect that our own society owes what humaneness it has directly to the millennia its predecessors spent executing the most impulsively violent portion of every generation — it would be hard to argue this had no effect on population genetics. Game-theoretically the picture is even darker: endemically high trust may inevitably incentivize enough defection to bring the institutions that foster that trust crashing down.

Put another way, what would you think of a justice system that forcibly sterilized the families of criminals, regardless of whether they participated in the crime? How about one that pre-emptively imprisoned children they could prove had a high risk of violent behavior in the future, or one that doped the public water supply with psychoactive drugs?

If you want to make consequential arguments against those, you have to appeal to something like relative suffering — you have to predict such systems would result in more misery than they cured. Are you sure, though? Have you measured? Would you precommit to supporting one such regime if someone presented you sufficient evidence (cf. Blüml et al. 2013) that it would, in fact, lead to a more eudaimonic future, one that would outweigh any measurable costs? Or are there means no end can justify? Deontology is out of fashion, but in practical terms most of us operate on virtue ethics — if we didn’t, those potentially dystopian scenarios would not give us pause before we saw hard numbers.

If there is a criticism of our justice system inherent in the gruesome story of Daniel St. Hubert, it is that it sometimes errs on the side of ignoring available evidence. His victims did not have enough information to conclude with any confidence that he was dangerous — but civil authorities surely did. His pathology was known and his pattern of psychotic violence very well-established when he was released from prison nine days before the elevator murder, with no medication and no psychiatric referral. His arrest record was a litany of brutality; he had strangled his own mother with an electrical cord. His parole officer recommended he be committed, but was ignored. Neighborhood cops knew about him, but could not intervene until the deed was done. Were some reasonable individual confronted with the same evidence, and somehow given sole and unanswerable responsibility for his disposition by fiat, they might well have concluded that he was overwhelmingly likely to do harm, and that this far outweighed the injustice of confinement or exile or even execution, due process be damned.

Our modern legal system is not built this way, precisely because of the problems arbitrary case-by-case penalties once created. Individuals are capricious and corruptible, so we have gradually separated the roles of judge, jury, and executioner, not to mention lawyer and bailiff and peace officer and forensic psychologist. That they hew collectively to any standards makes the personal violence of justice more predictable in most situations, but those standards can conflict and backfire on edge cases.

It is famously impossible to derive ‘ought’ from ‘is’ — oughts must be plucked from the evolutionary winnowing of received tradition, from pure aesthetic preference, or from the dreams of conscience. Few would not prefer to inhabit a world where PJ Avitto and Mikayla Capers had lived their childhoods, but such a world does not exist — it is a dream. That doesn’t make it useless; contemplating unreality may lead us to dwell on a past we cannot change, but it also helps us predict the future. How likely is it that somewhere downstream of here we will resolve a few institutional scleroses, and a similar tragedy will be averted by a person with enough information to evaluate the stakes, enough authority to act decisively, and enough proper incentive to exercise it? We cannot know for certain — all we can say is that, unlike any map of a world where PJ had a seventh birthday, we cannot yet rule it out.

Cold comfort indeed. Is there no firm ground anywhere in this nightmare? To name one is to editorialize, beyond the bounds of any detachment or objectivity; at least let that be explicit.

Mikayla survived the attack, and eventually recovered from her wounds. She survived because PJ died first — because, in the last seconds of his life, he interposed his body between hers and the knife. It is unseemly to quibble over how and whether he ‘chose’ in the scant time he had — such abstractions are for his killer, and for us, not for him. They are mechanical blueprints, operating diagrams for practical safety, utterly insufficient to describe the profundity of his ἀρετή. His fate, and all fates, were sealed at the moment the universe began — but what an incomparable honor it is to live in a universe where a six-year-old boy laid down his life for his friend.


Blüml, V. et al. (2013). Lithium in the public water supply and suicide mortality in Texas. Journal of Psychiatric Research, 47(3), 407-411.

Bobzien, S. (2001). Freedom and That Which Depends on Us: Epictetus and Early Stoics. Determinism and Freedom in Stoic Philosophy, 330-358.

Celona, L. and Wilson, T. (2014, June 1). Maniac wielding butcher knife kills child in elevator. The New York Post, 1/5.

Clark, A. (2013). Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences, 36, 181–253.

Corlett, P. et al. (2009). From drugs to deprivation: a Bayesian framework for understanding models of psychosis. Psychopharmacology, 206(4), 515–530.

Couenhoven, J. (2007). Augustine’s rejection of the free-will defence: an overview of the late Augustine’s theodicy. Religious Studies, 43, 279–298.

Dennett, D. (1984). Elbow room: the varieties of free will worth wanting. Cambridge, Massachusetts: MIT Press.

Libet, B. et al. (1983). Time of Conscious Intention to Act in Relation to Onset of Cerebral Activity (Readiness-Potential) – The Unconscious Initiation of a Freely Voluntary Act. Brain, 106, 623–642.

Santora, M. (2014, June 6). Before Arrest, a Long String of Violent Acts: Daniel St. Hubert Served Prison Time for Attacking His Mother. The New York Times, A17.

Sodian, B. (2016). Understanding of Goals, Beliefs, and Desires Predicts Morally Relevant Theory of Mind: A Longitudinal Investigation. Child development, 87(4), 1221-1232.

Bayes’d and Con-fused

On Salience, Semiotics, and Schizophrenia

You’re lying in your bed, drifting off to sleep, when suddenly you realize there’s someone in the room with you. “Surely not,” you think to yourself in semisomnolent confusion — “I’m alone in the house!” But there in the corner of the room, shrouded in shadow, is a tall thin figure, and it is reaching towards you.

Suddenly you’re sitting bolt upright in bed, heart pounding, wide awake, poised to sprint for the door — or, if you’re brave, to fight off this mysterious home invader. That’s when you hear the noise of a passing car and realize that what you’re looking at is not in the room with you at all: it’s only the shadow of a tree, moving as the headlights beyond pass by. You mutter an obscenity or two about Plato’s cave, close the curtains, and drift off back to sleep just as soon as your heart slows down.

You probably remember an incident or two like this, an adrenaline rush followed hard upon by chagrin at your own mistake. What exactly happened to you? You thought you were seeing one thing, and subsequently realized you were really seeing another — but what does that process actually entail?

You saw a shadow moving, and the longer you watched it, the more you knew about it. When you had a little information, you concluded it was Freddy Krueger coming to get you, but when you had more information you concluded it was a tree. Simple! All we have to do is define ‘concluding.’ As it happens, this is not as simple as it sounds.

Let’s start again. You see the shadow moving — this is pure perception, straight from the eyeballs to the brain. As time goes on, all that changes outside your head is that you see more of the shadow moving. If you were some sort of an idealized empirical thinking machine, you’d start out completely agnostic, gather data from your senses, form a hypothesis, and test it against new data as it comes in.

On reflection, though, that’s not how it feels: you start out convinced there’s someone in the room with you, and a moment later, with barely any more information, you are suddenly convinced you’re looking at photons beamed out of a moving vehicle as partially blocked by the limbs of a tree. It doesn’t happen gradually at all, it’s as sudden as when the old lady’s nose becomes the young woman’s chin in the famous optical illusion.

The missing piece here is the notion of a schema. If sensory input were all you had to work with, you wouldn’t see Freddy Krueger or a tree; you would only see arbitrary patterns of light and shadow. But you already have a model in your head for two different sorts of spindly half-lit limbs whose shadows you might see on your wall — instead of building a theory on top of your sensory impressions, you’re matching your perception to the models that you’ve already built up.

So we’ve got two different systems in play: a bottom-up system that’s looking at black shapes on a white background right now, and a top-down system that’s recognizing letters and words you already have models for — dpseite teh fcat taht semotmies tehy aner’t a preefct mtach ot teh mdoels.

Psychologist Richard Gregory recognized that top-down judgements were important to perception in the 1970s, challenging the then-dominant paradigm of direct realism — the notion that we experience the world pretty much as it is — with the notion that we’re actually experiencing hypotheses.

After a close study of optical illusions, he concluded that what we see is about 10% actual visual stimulus and about 90% deductions made from memory; one could haggle over the precision of those numbers, but subsequent research has generally borne out the basic idea.

Thus, when you saw the shadows moving on your wall in low light, you didn’t have much information to work with, so you filled in the gaps with your memories of what human figures and trees, in general, look like. Mystery solved.

But wait: why did you end up with both? Nothing in that story accounts for the fact that you managed to make the switch. The top-down and bottom-up systems had already gotten their accounts close enough to “shake hands” and agree on a map of reality with Freddy Krueger in it; how did that transmogrify into a map with a tree and a passing car in it?

It’s not just that you learned, it’s that you judged your own learning — otherwise you would remember seeing Freddy Krueger turn into a tree. The second schema is nothing like the first, so our story is incomplete — we’ve accounted for perception and cognition, but not metacognition. How did you manage to conclude the prior map was just a tree-related illusion (dendrapophenia, perhaps) and update it?

If you felt your schema-sense tingling at the word ‘prior’ you’re probably familiar with Bayesian analysis, that keystone of conditional probability and scourge of undergraduate statistics students everywhere. If not, content yourself with knowing it’s a mathematical formalization of the idea that to get an accurate model of the world you have to take into account both prior knowledge and current experience, adjusting the former as you go (as with nearly everything here, this is a gross oversimplification).

Even if you see a shadow on your wall that looks about halfway between a tree and a bogeyman, that doesn’t imply there’s a 50/50 chance there’s a bogeyman in your room, because you have prior knowledge that there are lots of trees and very few bogeymen. This is the metacognitive judgement you made when you realized the likeliest explanation for what you were seeing was that it was a tree you mistook for something else.
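For the arithmetically inclined, here is that judgement in miniature. The numbers below are frankly invented for illustration (a one-in-a-thousand bogeyman prior is an assumption, not a measurement), but the shape of the calculation is the point:

```python
# Bayes' rule over two hypotheses, with invented illustrative numbers.
# The shadow fits either schema equally well (equal likelihoods),
# but the priors are wildly lopsided.

def posterior(prior_h, prior_alt, likelihood_h, likelihood_alt):
    """P(hypothesis | evidence) in a two-hypothesis world."""
    joint_h = prior_h * likelihood_h
    joint_alt = prior_alt * likelihood_alt
    return joint_h / (joint_h + joint_alt)

p_bogeyman = posterior(
    prior_h=0.001,       # bogeymen are rare
    prior_alt=0.999,     # trees near windows are not
    likelihood_h=0.5,    # the shadow looks halfway between the two
    likelihood_alt=0.5,
)
print(round(p_bogeyman, 4))   # 0.001: a 50/50-looking shadow, a 1-in-1000 monster
```

A shadow that is perfectly ambiguous between the two schemas leaves the odds exactly where the priors put them.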

It turns out human brains are superb at matching what they see to existing schemas, but hilariously terrible at judging the prior likelihoods of those schemas and adjusting them when they aren’t making sense, especially when they are asleep.

Yale researcher Philip Corlett thinks the human brain implements Bayesian reasoning on perception in a fairly direct chemical fashion. In his model, bottom-up processing depends on AMPA glutamate receptor activity, top-down processing depends on NMDA receptor activity, and dopamine codes for the level of prediction error — the amount of difference between the NMDA-modulated information about the map and the AMPA-modulated information about the territory. He makes a convincing case that the cognitive effects of several psychoactive drugs fit this paradigm, noting for instance that PCP, which blocks NMDA receptor transmission, gives you exactly the sort of delusions and perceptual weirdness you might expect under such a paradigm.
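As a cartoon of the arrangement (my own toy illustration, not Corlett’s model, with numbers that mean nothing in particular): let a single value stand in for the top-down expectation, incoming evidence for the bottom-up stream, and their mismatch for the dopaminergic error signal.

```python
# Toy predictive-coding loop. The error term plays the role of the
# prediction-error signal; large errors pull the model toward the
# evidence, and the reported surprise shrinks as the model catches up.

def perceive(model, evidence, learning_rate=0.3):
    error = evidence - model          # "dopamine": prediction error
    model += learning_rate * error    # model drifts toward the evidence
    return model, abs(error)

model = 0.0                           # expecting "tree"
for evidence in [1.0, 1.0, 1.0]:      # persistent not-a-tree input
    model, surprise = perceive(model, evidence)
    print(round(model, 2), round(surprise, 2))
# surprise starts large and shrinks as the model adapts
```

The qualitative behavior, not the numbers, is the analogy: persistent mismatch drags the model toward the input, and the salience signal decays as it does.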

This is an elegant framework, if it holds up — time will tell. Yet it still doesn’t do much to explain how we consciously update our priors. How is a tricky question, but there’s a fairly good bet as to where.

The anterior cingulate, which collars the front of the corpus callosum, seems to be key in making such judgements about perceptual model fit and adjusting accordingly. It also happens to dampen some of its activities during sleep.

This could account for your tendency, when dreaming, to perceive an immersive environment as real despite it having features whose prior likelihood should constitute a dead giveaway that the world in which you find yourself is not, in fact, real — sudden ability to fly, extra rooms in your house you’ve never noticed, highly improbable sex, &c. We can navigate our dreams, and cogitate a bit about what’s happening, but we never seem to evaluate how strange it all is until we wake up.

And there’s your bedroom monster: you hadn’t quite woken up yet, so the prior-adjusting part of your Bayes loop is out of whack, but the rest of your chain of reasoning is intact. You’re awake enough to identify a complicated visual pattern as something that might be a home invader and might be a tree, but not yet awake enough to realize that the former is so much less likely it’s not worth getting bent out of shape about.

So, if all this holds water, why should our reason be organized in this particular way? Why have a distinct bit of brain for evaluating models if it’s so easy to turn off? Why the separation of function?

One explanation might lie in the classic parable explaining the prevalence of anxiety. Three ancestral hominids are walking across the Serengeti when they spy a beige rock in the distance. 99 times out of 100 the beige rock is just that, but the 100th time it’s actually a lion waiting to pounce.

The first hominid is a Panglossian optimist, and always assumes it’s a rock; he’s right 99% of the time. The second is a perfectly calibrated Bayesian, and judges correctly that there’s a 1% chance it’s a lion and a 99% chance it’s a rock. The third is a nervous wreck, and assumes it’s a lion every time — he’s wrong 99% of the time. Every single human being alive is descended from the third hominid, the others having been eaten by lions, so we have inherited a tendency to spook easily.
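The payoff structure is doing the work in that parable. Under invented numbers (a false alarm costs a sliver of foraging; a missed lion costs everything), a quick simulation shows how accuracy and fitness come apart:

```python
import random

def lifetime_fitness(flee_probability, encounters=10_000, lion_rate=0.01):
    """Fitness of a hominid who flees each beige rock with the given
    probability. Invented payoffs: fleeing forfeits one unit of
    foraging; failing to flee an actual lion is fatal."""
    fitness = 0
    for _ in range(encounters):
        is_lion = random.random() < lion_rate
        flees = random.random() < flee_probability
        if is_lion and not flees:
            return 0                      # eaten: no descendants
        fitness += 98 if flees else 99    # forage, minus any flight cost
    return fitness

random.seed(1)
print(lifetime_fitness(1.0))   # the nervous wreck: 980000, survives everything
print(lifetime_fitness(0.0))   # the optimist: almost surely eaten, scoring 0
```

Being wrong 99% of the time is a bargain when the remaining 1% is the only case with teeth.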

Pat though that may be, it bears considering that it’s actually quite a feat for the third hominid to maintain their ability to perceive the rock bottom-up, make a snap top-down judgement that it conforms to their model of a lion, and yet never revise that model to let their guard down despite the fact that 99 out of every 100 memories they have of similar incidents involve them panicking over nothing at all. They would have to notice the rock and find it salient every time — implying a notable distance between their top-down schema and their bottom-up perception — and yet somehow avoid letting their sky-high error rate decondition their response to it over time. That sounds like a little more than a simple leophobia.

As it happens, there is a whole class of human beings notorious for their inability to update their prior models of the world: schizophrenics. Paranoid schizophrenics in particular famously suffer from intractable delusions of reference, believing strange things despite overwhelming evidence to the contrary. They seem partially unable to distinguish between symbols of reality and reality itself — they tend to confuse the thought of a voice with the sound of one, and will often fixate on seemingly irrelevant objects or phenomena and impute profound meaning to them. They also tend to have too much dopamine, hypofunctioning NMDA receptors, and abnormalities in their anterior cingulate cortex: adjust your prior models accordingly.

Could the cluster of symptoms in schizophrenia represent an archaic or atavistic form of consciousness? Psychologist Julian Jaynes explored this idea in depth, and put forth a theory too fascinating not to mention.

In his account, humans were schizophrenic by default until the late Bronze Age, with societies generally organized either as small hunting bands or into literate theocracies through which they moved as though in a waking dream, their actions in daily life dictated and coordinated by shared command hallucinations that they attributed to the voices of their ancestors and their gods — heady stuff.

Among many other points, he cites as evidence the (apparently) universal lack of metacognition in ancient literature, the privileged role accorded (apparently) schizophrenic prophets and sibyls in subsequent centuries, and the (apparent) pattern of Norman-Bates-esque corpse-hoarding in ancient Mesopotamia evolving via the veneration of dead kings to the worship of deities.

Several of his predictions have not worn well — he was convinced, for instance, that metacognition was predominantly an innovation in inter-hemisphere communication in the corpus callosum, and conceived of schizophrenia as something more reminiscent of split-brain epileptics — but it’s interesting enough just to think of the possibility there was a time before and after which the capacity to be un-schizophrenic existed.

A deeper evolutionary timescale might make more sense than positing self-reflective consciousness came from some kind of speech-catalyzed plague of hypertrophied cingulates (The Cingularity, or, Buscard’s Murrain) so recently as Jaynes argued, but the idea that we’re evolving away from apophenia both culturally and genetically deserves close examination.

Although their order and overlap are open questions, there likely exist a point before which animism was still the universal norm, a point before which no vocabulary to describe consciousness existed, and a point before which the neurological capacity for consciously correcting a false belief was simply not physically present, in whatever form.

The beings that lived without those things were either outright human or close enough for government work, and we catch tantalizing glimpses of how they must have experienced the world when our capacity for reflectivity is occluded in illness, on the edge of sleep, in mystic states, and in our childhoods.


How should we categorize intelligence?

A useful high-level division of species by category would be one that reflected both evolutionary and behavioral reality well enough to make valid predictions. Since behavior is immediately observable and evolutionary history generally involves more indirect inference, it makes sense to categorize behavior first and then look for evolutionary conditions necessary to produce it.

The first and most obvious line that may be drawn is between species with and without intra-generational learning, which is to say with and without neurons. The behavior of species without neurons depends on genome and circumstance — two (e.g.) sea cucumbers with identical genomes in identical circumstances will behave identically, and large changes in behavior can only be produced over multiple generations by natural selection. In contrast, species with neurons are capable of learning — their behavior is mediated by long-term potentiation of neurons in response to past events, such that two (e.g.) dogs with identical genomes in identical circumstances may respond differently to the same stimulus if they have received different conditioning.

Although creatures with more developed brains have more nuanced heuristics available, this capacity for learning is broadly evident even in species with extremely simple nervous systems, like cockroaches (Watanabe and Mizunami, 2007). This suggests two categories, or more properly a category and a subcategory: life, and neuronal life.
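The line between the category and the subcategory can be sketched in code. A Rescorla-Wagner-style associative update (a standard textbook model, used here purely as illustration, with arbitrary numbers) is enough to make two identical "genomes" diverge in behavior:

```python
# Hard-wired behavior vs. minimal neuronal learning.

def hardwired_response(stimulus):
    """No neurons: output is a fixed function of genome plus input."""
    return 0.5 * stimulus

class Learner:
    """A single stimulus-response association, strengthened by a
    Rescorla-Wagner-style update: the weight moves toward the outcome
    actually experienced, so past events change future behavior."""
    def __init__(self, rate=0.2):
        self.weight = 0.0
        self.rate = rate
    def respond(self, stimulus):
        return self.weight * stimulus
    def condition(self, stimulus, outcome):
        error = outcome - self.respond(stimulus)
        self.weight += self.rate * error * stimulus

trained, naive = Learner(), Learner()   # identical "genomes"
for _ in range(50):
    trained.condition(1.0, 1.0)         # only one receives conditioning
print(trained.respond(1.0), naive.respond(1.0))
# identical stimulus, divergent responses: roughly 1.0 vs 0.0
```

The hard-wired function answers the same way forever; the learners start out identical and end up behaving differently purely as a function of history, which is the whole point of having neurons.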

Within the neuronal subcategory, adult modern humans use complex language that can direct and influence the behavior of other humans, including those not immediately present. They are capable not just of associating an arbitrary symbol with an object, but of distinguishing symbols as a category from objects as a category. This requires a theory of mind — for a human to understand that a novel series of symbols will be interpreted correctly by another mind, it is necessary that they understand both that other humans are similar enough to them to interpret the same symbols in the same way, and also that other humans are different enough from them to lack information they have or have information they lack. These abstract linguistic capacities appear to be unique to humans, and so humans can be placed in a third subcategory within neuronal life: conscious linguistic life, a set which currently contains only the human species.

Although complex language and theories of mind appear to be unique to adult humans, they do not develop immediately. Children fail to verbally identify differences in objects present in their own visual fields versus those of other people until they are around 6 years old (Piaget, 1969), do not begin to use complex elaborative syntax until they are around 2 years old, do not use simple word labeling until they are around 1 year old, and do not engage in communicative coordination of regard with another person and an external object until they are about 6 months old (Striano and Reid, 2006).

However, even before 6 months they are capable of protoconversations, mirroring the expressions on other human faces at a delay and coordinating the length of pauses between facial shifts (Beebe, 2014). This behavior implies both that the infant must be storing some kind of representation of another person’s face for the length of the delay and also that they can map this representation to their own face in order to mimic it. Do these pre-linguistic capacities exist in any other species?

Great apes become mobile much more quickly than humans do, and so infant great apes do not spend much time on the face-to-face protoconversations that immobile human infants engage in. However, they are able to pass mirror tests, which involve looking at their reflection and deducing the presence of a mark on their own forehead, about as well as human infants under the same circumstances (Bard et al., 2006). This strongly implies that they must also possess enough of a self-representation to map their own movements to observed movements over time, since they must determine that the movements of the ape in the mirror correspond exactly to their own and are not simply produced by another ape behind glass.

Great apes can also follow gaze and understand opacity (Povinelli and Eddy, 1996) in a manner reminiscent of human infants, and can use this to preferentially steal food that they can tell another ape is unable to see (Hare et al., 2000). Other primates can preserve abstract representations of sequence that simple stimulus-response chaining is inadequate to explain (Terrace 2005). The vast majority of animal species do not display these capacities.

For this reason it makes sense to posit a fourth behavioral category, within neuronal life and containing adult humans, which also contains human infants and arguably contains some other primates and hominids — a preconscious or semiconscious category, with great apes on the low end, human infants on the high end, and extinct hominids in the middle. In this category, organisms can store persistent representations and map their perceptions to internal models, but are unable to produce language or model differences between the states of knowledge of multiple individuals: they have some heuristics available for primary intersubjectivity, but none for secondary intersubjectivity (Beebe, 2003).

Do these four putative nested categories — organisms, neuronal organisms, semiconscious organisms, and fully linguistic organisms — correspond well to the evolutionary record? They appear to map to known clades; all species share a common ancestor, all species with brains share a more recent common ancestor, primates one more recent still, and humans one more recent than all of the others.

To the extent that ontogeny recapitulates phylogeny, therefore, the putative semiconscious category should predict a long period of time in which the human lineage developed and elaborated on preconscious representational abilities already partially present in apes, but did not display the abilities of modern humans to use complex language or elaborate theories of mind. It should also predict that the appearance in non-primate clades of any behaviors that appear to imply stored representation beyond simple behavioral conditioning but do not produce complex language will be produced by broadly similar evolutionary conditions. Moreover, if the structural capacity for speech and theory of mind evolved separately and significantly later than the capacity for symbolic representation, it should be possible to disrupt the former in adult humans while leaving the latter intact.

Dyadic interactions can be described conservatively as imitation at a delay. Infants are capable of initiating complex protoconversations with their mothers by around 6 months, of engaging in protoconversations initiated by the mother by around 3 months (Striano and Reid, 2006), and of subtle but measurable matching adjustments of facial expressions over time almost immediately after birth (Meltzoff and Moore, 1994). Protoconversations are characterized by two-way coordination of facial expressions and vocalizations, with infants first responding to and soon after initiating exchanges of mimicry that involve both partners matching not only their movements but their pace (Beebe).

This implies the capacity to store an abstract representation of a face in working memory, such that observed movements and timing can be recapitulated without an immediate stimulus. Unresponsive faces trigger distress behaviors in infants, which implies the ability to predict a mirrored movement on the part of another and register deviations from that prediction. Dyadic interactions require an ability to detect differences between sensory data and a stored model, for which simple behavioral conditioning cannot account.

As early as 6 months, infants are capable of sustaining mother-initiated interactions involving a third object, alternating attention between the mother and the object; by around 9 months they are initiating such interactions (Striano and Reid, 2006). They not only follow gaze — projecting a ray through three-dimensional space based on the mother’s eye movements and fixating on an object in that direction — but also check back after looking in the same direction, refocusing their attention if they have picked the wrong object as indicated by the mother’s use of indicative gestures or simple verbal labels (Baldwin, 1991). Multiple acts of gaze-following over time require not just a persistent representation of a face or body plan that maps to their own, but of a mind whose state may differ from the state of their own.

Triadic interactions require both the ability to note differences between sensory data and a stored model and the ability to adjust existing models on the fly to match a different model in someone else’s mind, for which simple model storage based on past experience cannot account. The latter capacity is a necessary prerequisite for complex language, as opposed to simple labels, because novel messages are ineffective if the other party cannot be relied upon to understand them; unsurprisingly, triadic proficiencies correlate with later language proficiencies (Brooks and Meltzoff, 2008). Infant-initiated triadic interactions may involve gaze redirection with deliberate communicative intent (Stern, 1971), implying that triadic infants have some capacity to model other humans as lacking information they have or having information they lack. The ability to compare a model of self with a substantially different model of another does not appear to be present in any other species.

Wallace questioned how Darwin’s theory of natural selection could account for human language and consciousness, given that only humans possess these features and that human minds seemed to him much more powerful than could be accounted for by simple selective pressure. After posing this question, he became a spiritualist and concluded that providence had intervened in evolution three times: once to produce life from inorganic matter, once to produce consciousness in animals, and once to produce the higher mental faculties of humans.

It is no longer considered prudent to speculate on divine intervention in evolutionary history, and so Wallace’s Problem boils down not to whether but to how the physical capacity for language evolved. Our current picture is incomplete, but seems to involve two major leaps in cognitive capacity — one from mirrored representations to differential representations, and one from differential representations to full-fledged language. To address Wallace’s Problem in substance requires us to explain what specific selective pressures produced these developments, what accounts for their apparent sudden appearance, and why they did not occur elsewhere in nature.

The environment of humanity and its immediate antecessors went through several major ecological changes in relatively short succession. The first of these, the deforestation of East Africa around 4 million years ago, produced bipedalism; this major anatomical shift can be explained in the traditional model of gradual change, as incrementally more bipedal individuals would gain incremental advantages in food-gathering by increasing their range and decreasing their energy expenditure (Rodman and McHenry, 1980). This incremental shift is well-attested in the fossil record, and occurs around the same time as the split between the Pan and Australopithecus lineages.

In addition to opening up new frontiers in foraging, bipedalism produces narrow pelvises through which it is difficult to pass an infant. Increased bipedalism therefore tends to produce infants born incrementally earlier in development, which require longer periods of care before being able to feed themselves. This means more pressure to find novel foraging strategies in order to feed infants, which in turn advantages infants with even larger brains, born even more helpless.

This feed-forward loop is enough, all on its own, to eventually produce the most premature possible infant with the largest possible brain; every time a population with slightly larger brains managed to secure more food, that would remove some of the metabolic pressure to keep brain size low, resulting in a population with even larger brains, resulting in pressure to find even more novel methods of securing food. The development of abstract representation more advanced than that displayed by the great apes and the subsequent development of language both occurred within the context of this ongoing process, and accelerated it by temporarily removing some food pressure, allowing smarter and more premature infants to be born and drive food pressure back up again, necessitating further development of novel scavenging and later hunting strategies.

One objection Wallace might have raised to this model is that it posits the sudden emergence of new and complex behaviors without a correspondingly sudden anatomical change — skull size increase was gradual, but the emergence of alloparenting and long-distance hunting were not. Where are the sudden anatomical changes to match the sudden behavioral changes? There are two answers to this, the simplest being that such anatomical changes did occur, but in soft tissue, which does not show up in fossils and for which the only preserved proxy available is skull size.

On reflection, however, there is a more basic explanation: the principal evolutionary advantage of having neurons at all is that they allow an organism to adapt to change faster than trial-and-error by reproduction can allow. The range of potential behaviors that a particular critical mass of neurons can allow for is necessarily much, much wider than the range of behavior it has produced to date — the capacity must evolve before the behavior can emerge. Canids existed for a long time before anyone taught them tricks, and humans were anatomically capable of building steam engines long before it became common behavior for them.

By the time Erectus developed alloparenting, the species had already existed for roughly 800 thousand years; the innovation then produced a marked increase in foraging efficiency, and therefore in the calories available to further maximize brain size and prematurity. If the model is correct, the rate of change in skull size between Australopithecus afarensis and Homo erectus should be less than the rate of change in skull size between Homo erectus and Homo sapiens. The fossil record supports this: from afarensis to erectus, cranial capacity increased from an average of 430 to 850 cubic centimetres over roughly 2 million years, and from erectus to sapiens average cranial capacity increased from 850 to 1400 cubic centimetres over about the same span of time.
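The prediction is easy to check against the figures just given, treating the two spans as roughly equal, as the passage does:

```python
# Rates of cranial capacity change implied by the figures in the text.
afarensis, erectus, sapiens = 430, 850, 1400   # average cc, from the text
span_my = 2.0                                  # ~2 million years per interval

rate_early = (erectus - afarensis) / span_my   # afarensis -> erectus
rate_late = (sapiens - erectus) / span_my      # erectus -> sapiens

print(rate_early, rate_late)   # 210.0 vs 275.0 cc per million years
assert rate_late > rate_early  # consistent with the feed-forward model
```

A roughly 30% acceleration in the later interval, on these averages, is what the feed-forward account predicts.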

So the two answers to Wallace are first that selective pressures do, in fact, account for the development of capacities known to underlie language once anatomical feed-forward loops are taken into account, and second that large brains are physically capable of implementing new behaviors long before those behaviors actually appear, such that they may emerge spontaneously and reproductively privilege the individual in which they occur.

Moreover, the ability to imitate observed behaviors (which likely emerged with alloparenting) and the ability to communicate novel ideas by combining existing words (which possibly emerged with big game hunting) both enable a given technique to spread to other individuals with the same cognitive capacities immediately, rather than privileging only the offspring of the individual who invented them, further accounting for the sudden emergence and spread of things like tool cultures on sub-evolutionary timescales.

The principal similarity between computer memory as currently implemented and biological memory is that information and the methods of processing that information are stored in the same medium. In a computer, any string of bytes could represent a program or data or both, depending on context; in brains, memories seem to be stored and retrieved in a fashion inextricable from processing context.

In most other respects, computer memory is more reminiscent of the operation of individual cells than of any inter-cellular process like a brain. In a computer, a program composed of a pattern of binary bits — 1s and 0s — is copied from storage into working memory, interpreted by a processor, and outputs data that in turn can sometimes affect the program’s own future execution somewhere down the line; in a cell, a gene composed of a pattern of quaternary nucleotides — A’s, C’s, T’s, and G’s — is copied from DNA to RNA, interpreted by a ribosome, and outputs a protein that in turn can sometimes affect the DNA’s own future structure somewhere down the line. The original abstract conception of computation (Turing, 1936) — an interpreter which iterates along an infinitely long two-symbol tape — bears more than a passing resemblance to the operation of ribosomes reading four-symbol sequences from a 3-billion-base-pair genome.
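The resemblance is plainest with Turing’s abstraction written down. Below is a generic minimal interpreter; the particular machine it runs (a unary incrementer) is an invented example, not anything from Turing’s paper:

```python
# A minimal Turing machine: an interpreter iterating along a tape of
# symbols, as in Turing (1936) -- structurally not unlike a ribosome
# iterating along a nucleotide sequence.

def run(tape, rules, state="start", pos=0):
    """rules maps (state, symbol) -> (symbol_to_write, move, next_state)."""
    cells = dict(enumerate(tape))     # sparse dict: effectively unbounded tape
    while state != "halt":
        symbol = cells.get(pos, "0")  # blank cells read as "0"
        write, move, state = rules[(state, symbol)]
        cells[pos] = write
        pos += 1 if move == "R" else -1
    return "".join(cells[i] for i in sorted(cells))

# Example machine: scan right over 1s, turn the first 0 into a 1, halt.
increment = {
    ("start", "1"): ("1", "R", "start"),
    ("start", "0"): ("1", "R", "halt"),
}
print(run("1110", increment))   # "1111": unary three becomes unary four
```

Swap the two-symbol alphabet for a four-symbol one and the write head for a peptide-bond-forming one, and the structural analogy with the ribosome is reasonably direct.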

Biological memory as implemented in neurons differs in that there appear to be no atomic engrams — no one has isolated a quantum of change in brains equivalent to a single-base-pair mutation or a single-bit flip. The simplest form of neuronal memory is behavioral conditioning, which is demonstrable by long-term potentiation in response to repeated stimuli even in extremely simple nervous systems. This preconscious neuronal learning is entirely nonsymbolic, and behaviors produced are generally entirely predictable from the conditioning stimuli, but every retrieval of a response via a stimulus changes the action potentials involved — computer memory becomes ‘sticky’ like this only in cases of extreme malfunction.

In human memory, there is a second system in play, one that maps stored representations onto perceptual input. It operates in a way that bears some resemblance to hypothesis testing, in that low levels of difference between the internal model and the sensory data result in the model being projected onto the data to fill in any gaps, and high levels of difference result in behaviors associated with salience and surprise. Bottom-up processing appears to depend on AMPA glutamate receptor activity, and top-down processing on NMDA receptor activity; dopamine codes for the level of predictive error (Corlett, Frith, and Fletcher, 2009).

The cognitive effects of several psychoactive drugs fit this paradigm — for instance, PCP, which blocks NMDA receptor transmission, produces exactly the sort of delusions and perceptual apophenia such a model predicts. Dyadic human infants are already capable of this two-system comparison between reality and stored representations, and the fact that some primates can be shown to store representations (Terrace, 2005) suggests they have some glimmering of the same capacity. This is not semiotics in the sense the term is normally used, because it does not require explicit communication between organisms, but it does allow for a feedback loop that can generate novel behaviors by projecting abstract representations onto perceived reality in a manner more complex than conditioned memories in simpler animals can manage.

Computer memory does not behave like this on a small scale, but large networks of computers can implement somewhat analogous processes, whose deficits can point to the other systems needed to explain adult human memory. Google Deep Dream, an experimental computer vision project, was built to recognize objects by projecting pre-existing internal models trained on a very large dataset of categorized images. The system quickly became famous for its apophenia — for instance, after associating millions of pictures of dogs from various angles with the general shape of a dog, it started seeing dogs everywhere, mapping them onto vaguely dog-shaped objects in the scenes with which it was presented.

Missing from this two-system picture, in a Bayesian sense, is the capacity to update prior models reliably. That capacity is fundamental to triadic interactions in human infants — they are constantly checking back with their mother to see if they are schematizing the external object to her satisfaction. It is also notably impaired in adult humans with damage to their anterior cingulates — they retain the ability to judge whether what they are seeing matches their internal schema, but they have trouble updating the schema (Mars, Sallet, and Rushworth, 2011). If this function was not present in early hominids, patients with this kind of brain damage may be engaging in essentially atavistic cognition, the cognition of the semiconscious category. Updating priors based on input from another mind requires storing an abstract representation of minds sufficiently complex to account for differences in knowledge between them.
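The missing third capacity is, in Bayesian terms, just the update step. A minimal sketch of what the two-system comparator lacks — the hypothesis names and numbers here are invented for illustration:

```python
def update_prior(prior, likelihood):
    """Bayes' rule over discrete hypotheses:
    posterior is proportional to prior * likelihood, renormalized."""
    posterior = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(posterior.values())
    return {h: p / total for h, p in posterior.items()}

# A comparator without this step can judge fit but stays stuck with its
# prior; with it, belief shifts toward the hypothesis the evidence favors.
prior = {"dog": 0.5, "not-dog": 0.5}
print(update_prior(prior, {"dog": 0.1, "not-dog": 0.9}))  # dog falls to 0.1
```

The anterior cingulate patients described above are, on this caricature, running the comparison but never executing the reassignment.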

Schizophrenics, and paranoid schizophrenics in particular, famously suffer from intractable delusions of reference, believing strange things despite overwhelming evidence to the contrary. They seem partially unable to distinguish between symbols of reality and reality itself — they tend to confuse the thought of a voice with the sound of one, and will often fixate on seemingly irrelevant objects or phenomena and impute profound meaning to them, or hallucinate things they have schemas for onto sensory data that doesn’t really match them. They also tend to have too much dopamine, hypofunctioning NMDA receptors, and abnormalities in their anterior cingulate cortex (Coyle, 2006), all in line with the interrupted Bayesian model.

Schizophrenics have arguably lost secondary intersubjectivity and retained primary intersubjectivity — they can no longer verify their perceptions of an external object with another person, but can still manage to store persistent representations of abstract schemas (often to the point where it takes years to convince them their schemas are wrong). They also retain language, which might seem to imply that secondary intersubjectivity actually developed long after language did (Jaynes, 1976) — however, schizophrenia typically emerges in adolescence or early adulthood, after secondary intersubjectivity and language have already been present for years, so it is impossible to tell whether schizophrenics could have developed language had they lacked secondary intersubjectivity from birth. It is not necessary to posit a speech-catalyzed plague of hypertrophied cingulates to explain the leap from primary to secondary intersubjectivity in primates.

If there were some other clade whose history included habitual bipedalism, rapid adaptation to major ecological change, a reproductive bottleneck leading to helpless infants, complex social behaviors, and the dyadic ability to mimic complex sequences at a delay, it would be possible to argue that parallel evolution puts them in the same preconscious category as human infants and early hominids — that is, to present a case that they may possess some potential for primary but no potential for secondary intersubjectivity.

Certain avians fit these criteria: they are descended from dinosaurs which independently developed bipedalism in the Triassic, survived major climatic change and migrated to many novel climes, lay eggs which are necessarily small enough not to preclude flight, hatch unable to fly or feed themselves, and are capable of mimicking complex songs and sometimes human speech at a delay and with conversational pacing. Some even alloparent (Anctil and Franke, 2013).

Corvids in particular are capable of solving very complex puzzles — they also pass the mirror test, use simple tools, and will re-hide cached food when they notice another bird watching them if and only if they themselves have stolen a cache in the past (Clayton and Dally, 2007). This strongly implies they are capable of storing models of the world and comparing them with current sensory input in a way analogous to semiconscious primates.

There are, broadly speaking, two forms of language: single-word associations, and complex recursive syntax. The former is already present in great apes, which can readily be conditioned to associate a hand sign or computer symbol with a particular object. However, these symbols are always imposed from the outside — apes do not generate new symbols or sequences of symbols. The particular advantage of language is not in the ability to use labels, which apes and dogs do in ways that can be explained by simple behavioral conditioning, but in the ability to generate arbitrary new labels.

Human words are different from, for example, variegated warning calls specific to particular predators as seen in e.g. Campbell’s monkeys (Schlenker et al., 2014), in that new ‘calls’ for new referents can be generated at will and spread among a group, rather than standardizing over evolutionary timescales. This capacity is a prerequisite for more complex grammatical language, and requires secondary intersubjectivity. Secondary intersubjectivity involving the representation of multiple differing states of mind is thought to have emerged with Homo erectus roughly 1.2 million years ago, on the basis that a sudden increase in the efficacy of scavenging could be attributed to alloparenting, which would at once allow more adults to engage in foraging unencumbered by infants and privilege infants capable of making distinctions between caregivers (Hrdy 2009).

Alloparenting exists in many species, including some primate species, but not in any of the great apes — for it to emerge in the human lineage so quickly suggests that the behavior in this case was a neurological innovation and not a genetic one, an innovation made possible by the relentless feed-forward loop of bipedalism and extra cranial capacity. Somewhat contra Jaynes, the triadic capacity likely preceded and was necessary for language to begin to develop beyond simple labels — sentences with recursive grammar communicate novel ideas, and to transmit a novel idea by a series of symbols implies a persistent model of another mind with a notably different state of knowledge. To make an argument about when such nontrivial language emerged along the same lines as the argument for alloparenting would require describing a behavior that could not be accomplished without nontrivial language.

Speculatively, long-distance persistence hunting, which emerged later than simple group scavenging, might be a candidate: bipedalism is great for endurance running, but extends the range so much that the hunters might wind up very far away from the band they intend to feed, and it would make much more sense to send someone back to fetch the band than to drag a large kill back to them. That would require communicating novel information in a relay. Sending a messenger would arguably require at least enough language to relate something like “the others sent me to come bring you to [a place you have never been]” — the messenger would need to convey the state of mind of the hunters still with the kill to the remainder of the band, and hold that message in a form persistent enough to survive a lengthy journey reliably.

This is somewhat analogous to the difference between triadic but preverbal infants and verbal children — the triadic infant can distinguish between two minds, but has little means available to convey a thirdhand message about an absent party. Once present, language would also allow tighter social coordination of hunting behaviors and enable less primitive forms of hunting to emerge.

Anctil, A., & Franke, A. (2013). Intraspecific Adoption and Double Nest Switching in Peregrine Falcons (Falco peregrinus). Arctic, 66(2), 222-225.

Baldwin, D. (1991). Infants’ Contribution to the Achievement of Joint Reference. Child Development, 62(5), 875-890. doi:10.2307/1131140

Bard, K. A., Todd, B. K., Bernier, C., Love, J., & Leavens, D. A. (2006). Self-Awareness in Human and Chimpanzee Infants: What Is Measured and What Is Meant by the Mark and Mirror Test?. Infancy, 9(2), 191-219. doi:10.1207/s15327078in0902_6

Beebe, B. (2003). A Comparison of Meltzoff, Trevarthen, and Stern. Psychoanalytic dialogues, 13(6), 777-804.

Beebe, B. (2014). My journey in infant research and psychoanalysis: Microanalysis, a social microscope. Psychoanalytic psychology, 31(1), 4-25. doi:10.1037/a0035575

Brooks, R., & Meltzoff, A. (2008). Infant gaze following and pointing predict accelerated vocabulary growth through two years of age: A longitudinal, growth curve modeling study. Journal of Child Language, 35(1), 207-220.

Clayton, N. S., Dally, J. M., & Emery, N. J. (2007). Social cognition by food-caching corvids. The western scrub-jay as a natural psychologist. Philosophical Transactions of the Royal Society B: Biological Sciences, 362(1480), 507–522. doi:10.1098/rstb.2006.1992

Corlett, P. R., Frith, C. D., & Fletcher, P. C. (2009). From drugs to deprivation: a Bayesian framework for understanding models of psychosis. Psychopharmacology, 206(4), 515–530. doi:10.1007/s00213-009-1561-0

Coyle, J. T. (2006). Glutamate and schizophrenia: Beyond the dopamine hypothesis. Cellular and Molecular Neurobiology, 26, 363-382. doi:10.1007/s10571-006-9062-8

Hare, B., Call, J., Agnetta, B., & Tomasello, M. (2000). Chimpanzees know what conspecifics do and do not see. Animal Behaviour, 59(4), 771–785.

Hrdy, S. (2009). Mothers and Others: The Evolutionary Origins of Mutual Understanding. Boston: Harvard University Press.

Jaynes, J. (1976). The Origin of Consciousness in the Breakdown of the Bicameral Mind. Boston: Houghton Mifflin.

Mars, R. B., Sallet, J., & Rushworth, M. F. S. (2011). Neural Basis of Motivational and Cognitive Control. Cambridge: The MIT Press.

Meltzoff, A., & Moore, M. (1994). Imitation, memory, and the representation of persons. Infant Behavior and Development, 17(1), 83-99. doi:10.1016/0163-6383(94)90024-8

Povinelli, D., & Eddy, T. (1996). Chimpanzees: Joint visual attention. Psychological Science, 7, 129-135.

Piaget, J. (1969). The psychology of the child. Basic Books.

Rodman, P. S., & McHenry, H. M. (1980). Bioenergetics and the origin of hominid bipedalism. American Journal of Physical Anthropology, 52, 103–106. doi:10.1002/ajpa.1330520113
Schlenker, P., Chemla, E., Arnold, K., Lemasson, A., Ouattara, K., Keenan, S., . . . Zuberbühler, K. (2014). Monkey semantics: Two ‘dialects’ of Campbell’s monkey alarm calls. Linguistics and Philosophy, 37(6), 439-501. doi:10.1007/s10988-014-9155-7

Stern, D. (1971). A microanalysis of mother–infant interaction. Journal of the American Academy of Child Psychology, 19, 501–517.

Striano, T., & Reid, V. M. (2006). Social cognition in the first year. Trends in Cognitive Sciences, 10(10), 471–476. doi:10.1016/j.tics.2006.08.006

Terrace, H. S. (2005). The simultaneous chain: A new approach to serial learning. Trends in Cognitive Sciences, 9(4), 202-210.

Turing, A. M. (1936). On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, 2(42), 230–65.

Watanabe, H., & Mizunami, M. (2007). Pavlov’s cockroach: Classical conditioning of salivation in an insect. PLoS One, 2(6). doi:10.1371/journal.pone.0000529

Our Chi-val

We have a pathetically tiny corpus of texts that predate the Roman collapse, and fewer still from before the Bronze Age Collapse. The ones we do have generally survived because they were recopied more often than the rest. Most of the works we have from classical antiquity derive from copies made in Charlemagne’s era, and countless more are referenced that have never been found. Aristotle’s Poetics is a good example — the volume on tragedy survived by blind luck via an Arabic translation, and the volume on comedy is lost forever (ecce Eco). Confucius’ Analects only survived the Qin dynasty because someone hid a copy behind a wall, most of his contemporaries’ work having been burned and buried along with said contemporaries their own selves.

Human history is rife with examples of literary canons largely destroyed by the simple attrition of civilizations rising and falling in their usual messy ways. The things that survive various Yugas Kali are obsessively copied and recopied, like the Masoretic Text. Modern technology means many, many orders of magnitude more copies of modern data, which one might think bodes well for their survival. The new problem is encoding.

Linear A documents are indecipherable — the Mycenaeans adapted the old Minoan script to write their own archaic Greek (that's Linear B, cracked precisely because the language underneath was Greek), but the Minoan language beneath Linear A was lost, and the subsequent Greek dark ages forgot writing altogether. So when you pick up a Linear A tablet, you lack the techniques required to read it, even though the medium has survived for millennia. Minoan decoder rings are not common these days.

Modern information storage does not survive for millennia, as Brewster Kahle would affirm. But say it did. If you find a thousand-year-old hard drive in the 31st century (unwittingly used as a brick in a 24th-century temple, say), how do you decipher it? Assuming you have reinvented the scanning electron microscope, I mean. Once you’ve destroyed a few hundred figuring out just how we stored data back at the dawn of the American Empire, all you have is a sequence of bits. You might, if you’re clever, figure out that ASCII corresponds to the Latin alphabet. You’d be much less likely to figure out, say, Unicode from first principles.
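Granted the framing and the character map, that last step is mechanical. A sketch of the archaeologist’s easy case — the 8-bit frame size and ASCII itself are exactly the assumptions they would have to guess first:

```python
def bits_to_ascii(bits):
    """Group a raw bit string into 8-bit frames and decode each as ASCII.
    This only works once the frame size and the character map are known,
    which is the future archaeologist's actual problem."""
    usable = len(bits) - len(bits) % 8  # drop any trailing partial frame
    return "".join(chr(int(bits[i:i + 8], 2)) for i in range(0, usable, 8))

print(bits_to_ascii("0100100001101001"))  # prints Hi
```

Everything hard lives outside this function: recovering the bits, guessing the frame, guessing the code.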

Now how many old DVDs would it take you to derive the DVD video format standard? Won’t be a problem for long; they’ll only last a couple hundred years in perfect storage, and you’d have to keep 200-year-old machinery in working order to recopy them… I’m sure you could build a DVD player from the specs, but they’re probably stored in whatever CAD format was in vogue at the time.

Lots of contemporary data, lots of the accumulating history of our time, is stored in ways that require special programs to decipher — proprietary file formats, faddish databases… Urbit, heaven forfend. These programs are stored on the same essentially ephemeral media as the data itself. Losing a text is not a matter of forgetting an alphabet over a thousand years, it’s a matter of forgetting an obsolete program over a decade. Ever tried to read a WriteNow file from 1987 you stored on floppy? Even better, the programs you need might run on architectures that haven’t existed for a long time; can you read TERNAC assembly?

Perhaps you can find an intact binary for that CAD program to read the specs to build a DVD player on its… original DVD installation media! Good luck finding the source code, that was a trade secret and there weren’t many copies. And then you need a computer that will emulate a computer old enough to emulate a computer old enough to run it, of course.

Continual recopying takes effort and energy. Even if there is no collapse — and I challenge anyone to find a Holocene millennium in which there was nothing that deserves the name collapse — much falls by the wayside. Most early silent films are already lost forever. Without Alan Lomax most early American folk music would be lost, and without the Internet Archive much of the early web… Empirically, most information anywhere gets discarded.

(A book that builds its own translator is called a genome.)

Class Rules

Americans: supplemental

American PCs get +3 charisma, -5 wisdom.

Characters start with an extra 1000gp, but must pay 1gp for every 1hp healed.

Mounts will consume 3x more feed than normal and may explode.

American rangers’ weapons do 2x damage, and cannot be stolen while the wielder remains alive or before their corpse cools to <60° (Fahrenheit, of course).

American fighters gain +3 to-hit versus enemies associated with the color red (skins, coats, ideologies, &c.).

American mages may spend 2e9 gp to summon a legendary fire demon, who will raze target city or fortress and poison the surrounding land, but can only do so twice, as the third summoning will trigger Ragnarok.

American clerics may sacrifice freedom at any lawful shrine to obtain a security bonus.

American bards gain reputation at 4x the normal rate, and may roll a skill check to steal songs from other races.

American rogues past level 10 are considered too big to fail any skill checks.