Language is in some sense uniquely human, and this demands explanation. From that simple stance a torrent of theory has poured forth for centuries, with no grand unification in sight. Any nontrivial account of the origins and parameters of human language is inherently massively cross-disciplinary, and here we run into problems: experts in one discipline may lack precision in others, and thus fail to properly contextualize their findings, either by mis-appraising evidence from other fields or by ignoring it entirely. The prudent strategy is agnostic and exploratory: look for isomorphisms across disparate spheres, lash them together, and see what sticks.

In the latter half of the twentieth century, Noam Chomsky made a series of observations about the operation of grammar that ignited tremendous debate. Noting that all human languages share certain properties that no other system of animal communication does, he presented a case for sharpening the distinction between uniquely human capacities — the faculty of language in the narrow sense, or FLN — and any mechanisms we share with other species — the faculty of language in the broad sense, or FLB — then using the resulting models to guide language research in other areas. After decades pursuing progressively simpler and more general models of syntax, he eventually came to believe he had pinned down the most essential abstract operations distinguishing human from animal communication.

Chomsky’s Minimalist Program aims for the strongest possible economy of derivation and representation, but this is not necessarily a valid basis for investigating language. Johnson and Lappin (1997) lambasted the MP as divorced from empiricism and vague to the point of uselessness. Pinker and Jackendoff (2004) objected that Chomsky had thrown too many babes out with the bathwater, writing off systemic adaptations for language as nonessential in a manner that did not conform to any coherent timeline of evolutionary biology. As much ink has been spilled in mutual admonishment for misinterpretations and strawmen as on these foundational differences (cf. Roberts 2000, Holmberg 2000, Lappin et al. 2001, Fitch et al. 2005, &c).

The concept of recursion is central to Chomsky’s description of language, and over the years he has become more adamant that it constitutes the indispensable central component of FLN. Many objections to the MP hinge on the inadequacy of existing explanations for how the implications of this hypothesis can be reconciled with what we know about evolution and language acquisition, and many counter-objections contend that recursion is being misunderstood. It behooves us to examine it closely.

Recursion is best defined as a property of definitions. Recursive definitions are those that precisely describe an infinite set by specifying both a finite subset of its elements and a finite number of rules sufficient to describe all other elements in terms of that subset. For a definition to meet all of these criteria, its set of rules must be self-referential <1>. This is because the rules are finite in number, yet must be used to discriminate whether any given element is a member of the infinite set in question. They must specify a procedure that can be applied to the finite subset of pre-defined base cases to produce members of their infinite complement, and also be applied to any such member to produce a different member, such that the same procedure can be repeated an arbitrary number of times to produce an ordered series of valid member elements beginning with a base case and terminating with the element in question. It is strictly equivalent to state that the rules must specify a procedure which can be applied to any element of the infinite complement, or to its own output, and by unlimited repetition produce a series of valid members of the full set that terminates in a base case.

These terms are cumbersome in the extreme, but have the virtue of exactness — this is easiest to see with purely computational examples. The infinite set of natural numbers may be defined recursively by specifying that 1 is a member and that adding 1 to any member produces another member — this is sufficient information to determine whether any given number is natural or not, by the simple expedient of seeing whether you can count to it. The Fibonacci sequence, another infinite ordered set, can be similarly defined by specifying the first two elements (conventionally 1 and 1) and defining any subsequent element as the sum of the previous two. In more formal notation, we can completely describe the entire infinite sequence using a finite subset of base cases and a self-referential successor function as follows:

F(1) = 1; F(2) = 1; F(N) = F(N-2) + F(N-1) ∀ N > 2.
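As a rough illustration only, the two recursive definitions above can be transcribed almost word for word into Python. The function names (`is_natural`, `fib`, `is_fibonacci`) are hypothetical labels chosen for this sketch. Note that `is_natural` works top-down, reducing a candidate until it reaches the base case, while `is_fibonacci` works bottom-up, generating members from the base cases until the candidate is reached or passed; these are the two equivalent directions described earlier.

```python
from functools import lru_cache


def is_natural(n):
    """Return True if n can be reached by counting up from 1."""
    if not isinstance(n, int) or n < 1:
        return False           # nothing below 1 can be counted to
    if n == 1:
        return True            # base case: 1 is a member by fiat
    return is_natural(n - 1)   # self-reference: n is in iff n - 1 is in


@lru_cache(maxsize=None)
def fib(n):
    """F(1) = 1; F(2) = 1; F(N) = F(N-2) + F(N-1) for all N > 2."""
    if n < 1:
        raise ValueError("the sequence is indexed from 1")
    if n <= 2:
        return 1               # the two base cases
    return fib(n - 2) + fib(n - 1)


def is_fibonacci(x):
    """Classify x by generating members upward until x is reached or passed."""
    n = 1
    while fib(n) < x:
        n += 1
    return fib(n) == x


print([fib(n) for n in range(1, 11)])   # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
print(is_natural(7), is_natural(0))     # True False
print(is_fibonacci(21), is_fibonacci(4))  # True False
```

In practice `is_natural` fails on very large inputs because the interpreter caps recursion depth. That is a limit of the machine running the definition, not of the definition itself.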

Less trivially, whole mathematical proof systems can be defined recursively — with a finite number of axioms and a self-referential ruleset for combining them into valid expressions (and those into still larger expressions, &c.), we have accurately described the set of all expressions which can be proved within that system <2>.

It cannot be overstressed that recursive definitions, and the self-referential functions they include, are mathematical abstractions, and like all mathematical abstractions only have explanatory value for real-world phenomena insofar as they accurately predict those phenomena. This means that declaring a system or process in nature to be recursive, without elaboration, is ambiguous and imprecise — it is necessary to specify how and why it might be useful to model it using an infinite set as defined, tersely, by a finite subset and a self-referential successor function.

What sorts of systems are useful to model in this way? Real-world phenomena that produce detectable patterns over time are well-modeled by recursive definitions if the space of potential patterns is large, the number of basic components that comprise the patterns is relatively small, and the state of the system at any given time consistently depends on a recurring elaboration of the prior states. As a rule of thumb, we tend to find it easy to conceive of these processes as tree-like.

Chomsky’s central insight is that the syntax of every human language <3> appears to involve such a process. They all make a practical distinction between grammatically well-formed and grammatically ill-formed sentences (independent of truth or intelligible meaning, hence the infamous well-formed but meaningless sentence “colorless green ideas sleep furiously”), and they all seem to arrange words in a binary hierarchy, embedding short sentences into longer ones by merging exactly two elements into a nested set that embeds like one of its members. These universal properties of grammar seem to imply a self-referential ruleset that maps a finite set of known words to an infinite class of sentences.

Of course, in reality, there are no infinitely long sentences, and the number of sentences ever composed is and always will be finite. Infinitude is a property of the model, not the phenomenon. Nevertheless, the most parsimonious way to explain the way we classify sentences is to suppose that something in the human brain is applying a recursive definition (Hauser et al. 2002) — nobody does it by memorizing a large set of sentences. Since various properties and implications of recursive definitions have been formally derived in a set-theoretical context, this presents a tempting opportunity to see if known math can point the way to unknown biology.

So, the sequences of words produced by humans using language appear to belong to infinite sets of grammatically valid sentences, and membership in those sets is reasonably well-described by self-referential definitions that involve embedding a finite vocabulary of words into sentences that can, in turn, be embedded like words into a larger sentence, ad infinitum. Thus, humans must, by whatever mechanism, be repeatedly performing some generative procedure that composes words into sentences recursively before actually outputting them serially. Could this really be the unique, centrally human trait that underlies our capacity for language and abstract thought?

No! Absolutely not! Not on its own.

Nature abounds in generative systems whose output can be classified accurately with recursive definitions: motor sequences in rodents (Berridge 1990) and avians (Berger-Tal 2015) are well-described with such models (Bruce 2016). Even sunflowers, not often hailed for their sophisticated communicative abilities, grow seeds in a pattern predicted precisely by the Fibonacci sequence, with each concentric ring (outside the innermost two, the base cases) containing exactly as many seeds as the sum of the previous two layers. That we produce vocalizations that tend to match a self-referential formula says nothing interesting at all about our mental capacities; it is quite easy to describe an obviously non-conscious machine that outputs an unbounded variety of syntactically correct sentences (Searle 1980). That a process can be correctly predicted by applying a concise recursive definition implies only that it involves some sort of iterative feedback, which crops up everywhere we look. This strongly suggests that the ability to recognize iterative feedback is much more interesting than the feedback itself.

To call human syntax recursive without further clarification conflates at least two things — one very broad, the other very narrow. The pattern of syntax is recursive in that it can be accurately described by a finite set of words and a finite set of self-referential functions: for a simplified example, one could tersely describe a valid sentence as a noun phrase and a verb, which noun phrase may consist of a single atomic word or a subsentence (Carstairs-McCarthy, 2000). Thus, a starting vocabulary and a relatively simple embedding function can accurately classify an infinite hierarchy of possible valid sentences, in the same way that two seed values and a self-referential summation function can accurately classify an infinite number of possible valid sunflower ring counts. Syntax is, in the productive sense, trivially recursive.
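The simplified grammar just described (a sentence is a noun phrase plus a verb, and a noun phrase is either an atomic word or a whole sentence) can be sketched as a pair of mutually self-referential functions. This is a toy under stated assumptions, not a model of any real language: the two-word vocabularies are invented, sentences are represented as nested pairs rather than flat strings of words, and all names are hypothetical.

```python
# Hypothetical toy vocabulary: the finite set of base elements.
NOUNS = {"dogs", "ideas"}
VERBS = {"bark", "sleep", "matters"}


def is_noun_phrase(x):
    """A noun phrase is a single known noun or an embedded sentence."""
    if isinstance(x, str):
        return x in NOUNS
    return is_sentence(x)      # self-reference via the sentence rule


def is_sentence(x):
    """A sentence is a pair: (noun phrase, verb)."""
    return (
        isinstance(x, tuple)
        and len(x) == 2
        and is_noun_phrase(x[0])
        and isinstance(x[1], str)
        and x[1] in VERBS
    )


def nest(depth):
    """Embed a base sentence as the noun phrase of a larger one, depth times."""
    sentence = ("dogs", "bark")
    for _ in range(depth):
        sentence = (sentence, "matters")
    return sentence


print(is_sentence(("dogs", "bark")))                # True
print(is_sentence((("dogs", "bark"), "matters")))   # True
print(is_sentence(("bark", "dogs")))                # False
print(is_sentence(nest(50)))                        # True
```

Five vocabulary items and two short rules suffice to classify structures of any depth, which is the sense in which the productive side of syntax is cheap to describe.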

The really surprising thing about humans is not that we can produce patterns that are well-modeled by recursion, but that we can model them. We are not only able to generate recursive syntax, but to interpret it — we hear a finite number of sentences in our lives, yet somehow converge on simple rules that allow us to instantly agree whether a completely novel sentence is syntactically correct or incorrect. No other creature appears to be able to classify sentences in this manner — indeed, no other creature appears to be able to classify anything in this manner.

It is possible to express a hypothesis in purely behavioral terms: modern humans are the only animal that can adapt to a recursive reward schedule. That is, presented with the output of a recursive process, we can divine the successor function behind it well enough to predict further output with arbitrary precision, while any other species is unable to do the same. Crucially, in principle, it should be possible to test this without using any purported components of FLB whatsoever, and compare results meaningfully across clades.

Although no experimental paradigm has proceeded explicitly on this basis, several studies suggest it may be worth investigating. Human music embeds (non-semiotic) phrases and is well-modeled by recursive definition, but other primates cannot predict it well enough to clap along (Honing et al. 2012). No nonhuman species can reliably complete any analogue of the Towers of Hanoi puzzle, which human children can solve by deriving a general rule that applies across more steps than they have short-term memory to recall (Martins 2016). Some avians produce calls that are center-embedded in a way that is reminiscent of human syntax, but they appear to recognize and interpret calls from conspecifics on a purely phonological level (Corballis 2007). Animal numeracy is extremely limited, and its accuracy decreases precipitously as the numbers get larger; animals cannot learn except by ‘brute-forcing’ the problem space, as opposed to inferring the underlying rules. Avians can be painstakingly trained to recognize sets of up to six items (Pepperberg & Gordon 2005), and chimpanzees have some ability to recognize relative quantity and proportion (Woodruff & Premack 2001), but no amount of training seems to induce a schema adequate to the task of simple counting. Human children, in contrast, generally derive the successor function for counting by around age four (Feigenson et al. 2004), despite far less intense training on a far smaller sample set. Rosenberg (2013) finds that human infants likely organize their memories into binary hierarchies (isomorphic to universal grammar) and can use them to model the future. If no extant non-humans can make predictions based on self-referential definitions, it seems reasonable to speculate in terms of an exclusively human Faculty of Recursive Prediction, or FRP.

Two factors may excuse adding FRP to the already-overflowing bowl of alphabet soup that is contemporary linguistic acronym space: distinguishing recursion in a productive sense from recursion in a predictive sense, and reconciling whatever is useful about the MP with a coherent timeline of human evolution.

Proponents of the MP often suggest that recursive syntax is intimately related to our capacity and inclination to categorize in nested binary hierarchies — what Fitch (2014) calls our ‘dendrophilia’ — but the nature of the relationship is not always clear. A predictive account may clarify matters: a reductive definition of prediction might be phrased as the reparative correlation of schema with environment over time, and the evolutionary value of FRP as the ability to map a territory that includes other mapmakers.

In some sense prediction is most of what nervous systems do: they allow an organism to adapt its behavior to its environment quickly, rather than wait for a hard-coded adaptation to evolve. Mechanisms to minimize predictive error over time are built into brains at every conceivable level, and we have a good idea how many of them work (cf. Clark 2013, Corlett et al. 2009); prediction-based descriptions have a good track record of usefully informing biological research.

Chomsky’s Poverty of Stimulus argument (1980) states that children’s ability to learn a universal grammar from a very limited set of exemplars implies that such grammar is inborn. This must be true in some sense: if humans can recognize and classify an unbounded quantity of novel sentences, and no other species can do so, some biological mechanism must account for it. Since we are necessarily descended from entirely alinguistic species, however, this raises the obvious question of how and when such a mechanism emerged, and the MP’s failure to provide an adequate answer is one of its most glaring flaws. Once convinced they could not ignore natural selection entirely, proponents have generally hypothesized a single mutation that spread through the human population around seventy thousand years ago, when widespread evidence of abstract art and rapid technological innovation begins to appear in the fossil record. Detractors rightly point out that mutations do not spread through an entire species for no reason, and that the capacity to interpret syntax does not provide any obvious advantage to an individual unless born into a world where it is already in widespread use; instead, many argue, it must have emerged gradually in concert with incremental refinements to the plethora of systems the MP lumps under FLB.

It is nevertheless worth trying to salvage the idea of a small, central language innovation in recent human lineage that is more than the sum of FLB, because a pure gradualist account does not explain modal independence very well. The communicative faculties of the modern human brain seem to use any intervening machinery they can get their hands on — this is why Hauser, Chomsky and Fitch (2002) argue for a narrow language faculty ‘at the interface’ of those systems. The modern language faculty shows a remarkable tendency to adopt entirely novel modalities on sub-evolutionary timescales — writing, for example, is quite recent, but does not depend on any new physiological systems. A human may lose their entire vocal apparatus and still use sign language, which exhibits the same properties as spoken language (including de-novo formation of syntax, cf. de Vos & Pfau 2015). Persons who can blink but retain no other voluntary motor control are still able to communicate in language, if painstakingly; we should pay close attention to whatever it is in that situation that is fundamentally indispensable. It makes sense to draw some kind of distinction between peripheral capacities of language and central ones, even if the line between FLN and FLB in the MP may be drawn in the wrong place.

If FRP is a coherent notion, and reflects some underlying biological trait unique to modern humans, it is tempting to suppose it might be useful to a single individual even in the absence of conspecifics with the same trait, and thereby resolve the Promethean paradox. If we were steelmanning the MP, we could imagine that FRP was enabled by some small mutation that spread through a world of silent apes on its own merits, and language was later exapted once the mutation had become common enough. On reflection, however, this is unlikely: if it is so universally advantageous, and simple enough to arise in one step, why hasn’t a homologous mutation occurred in other clades? We would need to explain why it provided a selective advantage in human ancestors but nowhere else.

This problem could potentially be resolved by enlarging, very slightly, the set of capacities we consider most central to language. If FRP only provides selective advantage in combination with some other indispensable trait, and that trait is very rare, we have a convenient way to explain why it has no equivalent in other species — for most of them, it would be useless. FRP enables the efficient reception of grammatical sentences; what if we could describe an environment where their transmission was already common?

Displaced reference is not unique to humans, but only barely. Honeybees famously dance to direct their hives to food, but this behavior does not generalize and is likely the hard-coded result of many millions of years of eusocial haplodiploidy. Some evidence exists that corvids can recruit others to a distant carcass (Heinrich 1998), but aside from this we are alone in our ability to refer to distant phenomena: even chimpanzees trained to correlate objects with arbitrary symbols or motor patterns appear to use them exclusively in reference to immediate stimuli, and only under prompting (Terrace 1979). Displaced reference has little obvious bearing on syntax, but is crucial for the use of words. Several theories of protolanguage are predicated on words emerging before syntax, noting that even a small vocabulary is immediately useful. However, they tend to have a hard time explaining why the specific universal grammar humans appear to use, which Bare Phrase Structure (cf. Chomsky 1994) describes with impressive parsimony, was adopted identically across the species.

Can we combine these ideas? As noted above, it’s much more common for an organism to make a series of movements that is well-described by a recursive definition than to be able to apply one predictively. If we do get some aspect of embedding syntax ‘for free’ from initially asyntactic protolanguage (cf. Heine & Kuteva 2007), that opens up the intriguing possibility of an implicit syntax — one that is entirely an artefact of generative functions, but initially could only be interpreted by simpler heuristics than FRP. This would explain the latter’s rarity and utility: if and only if your conspecifics are already displacers who talk, FRP lets you predict them better, and vastly expands the range of what you can talk about in the future. In such a model, focusing on displacement and FRP as separate but central traits would sacrifice the minimal amount of minimalism necessary to avoid the pitfalls of the MP’s magic bullet mutation. In essence, it would mean a two-factor FLN.

The ultimate object of reorganizing categories like this is to reconcile with natural history — the territory to be mapped is made of eras and events, not logical categories. Since current language behavior is a proper subset of the history of life, the former must conform to the latter in any model that aspires to be more than domain-specific. Cross-disciplinary speculation is asymmetrical; in the same way that chemistry is downstream of physics, linguistics is downstream of evolutionary neurobiology. A useful model in physics constrains chemistry, whereas a useful model in chemistry can at best suggest fruitful avenues of investigation in physics.

To turn two-factor FLN into a coherent speculative timeline means properly situating two events — the emergence of displacement and the later emergence of FRP — somewhere in the six million years or so since our last common ancestor with chimpanzees, and also describing what happened before, between, and afterwards — roughly speaking, three different era-specific slices of FLB development. Every trait involved in the account must confer plausible selective advantage; moreover, if they emerged suddenly, they must be plausibly attributable to a single mutation, or plausibly exapted as a spandrel of one or more traits with independent advantages to explain their prior development. The best estimate for the emergence of displaced reference is just over one million years ago, exapted from a limited call system achieving immediate reference and a capacity for intersubjectivity driven by alloparenting; the best estimate for the emergence of FRP is around seventy thousand years ago, triggered by a single mutation in otherwise anatomically modern humans who were already highly dependent on a mature protolanguage.

After their divergence from chimpanzees, the various hominid species adopted bipedal gait, encephalized rapidly, and lost much of their body hair. These traits are often considered adaptations to climate change or rapid deforestation — there exist explanations based on other pressures, but as gross anatomical adaptations they are in any case not fundamentally difficult to explain individually in straightforward gradualist terms. In combination, however, they created a reproductive bottleneck: in order to pass progressively larger heads through progressively narrower pelvises, these species had to give birth to progressively more premature offspring, which could no longer cling to hair and had to be carried and cared for for long periods before they could begin to feed themselves. Sarah Hrdy (2009) marshals a compelling case for a sudden shift to alloparenting in erectus between one and two million years ago, driven by the caloric demands of these helpless infants, and suggests that leaving children in the care of other adults allowed for a leap in the efficacy of foraging. She attributes the rapid subsequent development of social intersubjectivity and joint attention faculties to the comparative survival rates of infants better able to monitor and engage with multiple caretakers, and her account dovetails very well with the rhythmic, reparative dyadic and triadic interactions in early ontogeny described by Meltzoff, Trevarthen, and Stern (Beebe 2003). We may reasonably suppose that something more like modern human childhood than chimpanzee childhood emerged around this time: it was the first point at which infants had to be fed and held for many months before they were capable of even rudimentary locomotion, and were in the main only able to interact with the alloparents on whom they depended by vocalizing or by moving their eyes and faces.

Modern human infants typically begin to use simple words for displaced reference soon after they become capable of triadic interactions but before they begin to use any appreciable syntax. Before uttering any words, they hear quite a few — the stimulus is impoverished, not destitute — and babble in return, articulating phonemes stochastically and noting their effects in a manner that predicts later language ability (McConnell 2008). Erectus infants had no words to hear, so we are faced with another chicken-and-egg problem, but it may be resolved if we suppose that they were hearing something sufficiently word-like.

As with hard-wired alloparenting, hard-wired call repertoires are displayed by various primates but not by great apes. They are generally inborn, involuntary, and immutable, but despite this serve well for implicit immediate reference — groups of vervets, for example, will quickly coordinate their behavior en masse in response to a predator-specific alarm call, and may sometimes combine calls (Fischer 2013). Whatever pressures produce such systems, they are evolutionarily unremarkable for primates, and something homologous could plausibly have emerged ‘from scratch’ in our genus between one and six million years ago. If so, an erectus infant, like a modern infant, would have been cared for by adults who changed their behavior noticeably in response to certain phonemes, while simultaneously having very few other means available to affect their environment in any way — ideal pressures under which to voluntarize vocal production and generate new labels for shared referents. Thus armed with a repertoire of calls establishing immediate reference and primed to synchronize attention with conspecifics, erectus would have had all the prerequisites at hand to exapt displaced reference and gain the immediate advantages of a minimal, asyntactic vocabulary, one that could expand on sub-evolutionary timescales by virtue of their close imitative coordination and in contrast with the glacial pace of inborn call development.

Derek Bickerton’s admirably detailed account of protolanguage (2009) presupposes a discontinuity with animal communication; if the preceding model of displaced reference as a spandrel of vocal immediate reference (rare in nature but common in primates) and a cluster of helpless infancy adaptations (unique in primates but not in nature) is enough to explain that discontinuity, we have arrived at his starting point. He hypothesizes that displaced reference was a crucial element in constructing a high-end scavenging niche, initially enabling recruitment to distant carcasses by galvanizing conspecifics with an imitation of the animal in question. This usage is iconic rather than properly semantic, but over time, he contends, the imitation sounds would become decontextualized, evoking associated memories of the referent independent of any call to action, and thereby be transmuted gradually into full-fledged conceptual words.

Bickertonian protolanguage is a reasonable explanation for the success of erectus and subsequent members of our clade in expanding their range out of Africa and across the Old World. Simple words, strung together in short utterances without any particular structure, are more than sufficient to account for a collective advantage in scavenging and, later, in hunting — displaced reference allows for unprecedented coordination of activity. Quite a lot of communication is possible using words without any embedding syntax, and Bickerton compares such protolanguages to pidgins, which share this feature.

There is a problem with this comparison. Pidgins may spontaneously develop into creole languages, which do use embedding syntax, in a single generation — as soon as they are learned by modern human children, they pick up universal grammar and start accumulating the bells and whistles any language does. Yet it seems unlikely that erectus protolanguage did the same — tool industries stayed relatively stagnant across all hominids for the next million years, and there is little evidence of any representative art or symbolic physical culture. If this reflects a protolanguage that was stuck in a pidgin-like state, we need another evolutionary event to account for the leap to modern syntax and subsequent cultural takeoff. Bickerton thinks they were waiting for Merge, the abstract operation with which Chomsky (1995) describes the binary embedding process of his purportedly universal Bare Phrase Structure grammar; Gilles Fauconnier’s notion of blending (1999) might also fit the bill.

Modern human languages all exhibit duality of patterning — they compose a finite number of meaningless phonemes into a larger finite number of meaningful words, and also compose those words into an unbounded <4> variety of syntactic sentences. A long period of protolanguage before some sort of mutation enabling syntax helps account for this, but we still have to explain why the latter occurred only in the context of the former. Merge in particular was not originally developed as a concept under any assumption that a form of protolanguage was necessary to explain the advantage it conferred — we either lack an explanation of that advantage, or we lack an explanation for why it was not selected for in species without displaced reference already operating.

An alternative approach to protolanguage attempts to derive embedding syntax bottom-up from pure phonology. Andrew Carstairs-McCarthy (1999) has an account along these lines that describes the exceedingly gradual self-organization of something like universal grammar production from bare syllables, using such heuristics as call blending, synonymy avoidance, and implicit topicalization. If this model or any like it holds up, Bickerton’s pidgin-like protolanguage would, in fact, become gradually more like a creole language over time, at least in terms of production.

This is where FRP fits. If we accept that syntax could have arrived implicitly, strictly as an artefact of production or modulation and not necessarily a capacity of reception or demodulation, we get a built-in reason why the latter key mutation for which Bickerton is searching has to be unique to humans. Early protolanguage would have resembled the asyntactic ‘string of beads’ he describes, while late protolanguage would sound, to a modern human, as though it had syntax — but the hominids using it would still be interpreting it like a string of beads, severely limiting their ability to predict what their conspecifics were going to say or do next, at least as compared to a modern human. This sets up enormous selective pressure for FRP to catch on — a recursive predictor would be immediately privileged reproductively to the extent protolanguage was already productively syntactic, allowing them to adapt to already-crucial patterns of information in their environment better than anyone else around them. They would also get several uses for the same capacity — recursive prediction does not just allow for modeling, e.g., sentences of the form “If I say that she said that he said that… [&c.], then [X]” but also for predictively modeling the social situation that underlies such a sentence. They would be the first individuals capable of deliberate story-spreading or behavior manipulation — essentially, they would be capable of domesticating their peers, in a manner somewhat reminiscent of Julian Jaynes’ ‘bicamerals’ (1976). This holds as long as they had a ready-made productive embedding syntax with which to work.

By contrast, in the absence of such an implicit grammar, FRP is not useful: it confers no advantage if there is no ready-made system for it to predict. Indeed, it could easily be disadvantageous, since disruptions of predictive mechanisms in humans often result in severe apophenia, a tendency to see patterns that aren’t really there. There is general consensus that abstract symbolic artifacts in the fossil record are plausibly indicative of behaviorally modern language, but by evolutionary standards they are expensive: every hour spent, e.g., carving sympathetic magic figurines under the mistaken impression they will summon what they resemble is one that might be spent avoiding starvation in some more time-tested manner. FRP is tremendously useful to an ultrasocial species with a highly developed protolanguage, because the advantages outweigh the costs; anywhere else it emerges, it should be expected to die out <5>.

A “just-so story” stands or falls on falsifiability. The preceding sketch depends on the exaptations described to account for two modally independent core language faculties emerging roughly a million years apart; if any aspect of the evolutionary biology is definitively disproven, everything after that point should be thrown out, and if either of the two core faculties is shown to be common in other species or inextricable from broader faculties it does not belong in a two-factor FLN. It is also possible to attack with direct experimentation — a nonhuman species with FRP or a species with FRP but no displacement would knock the concept down.

As for gathering evidence, it might be interesting to design a test for FRP simple enough to be administered identically to animals and humans. Something along the lines of Stephen Wolfram’s elementary cellular automata might work: subjects would be trained to classify patterns of two colors as valid or invalid according to an ultra-simple embedding rule, and receive a reward for correct answers. If FRP is a meaningful concept, animal performance should always decline as the length of the pattern increases, while humans above some critical age should be able to maintain accuracy indefinitely. If that critical age turns out to fall invariably after syntactic language is already in use, that would imply that syntactic language enables FRP rather than the other way around; if not, however, we would have an ontogenetic window in which to seek physical structures to investigate, one we could potentially narrow by investigating which types of aphasia do or do not knock out FRP.
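To make the proposed test a little more concrete, here is a minimal sketch of how its stimuli and scoring might be simulated. Everything in it is an assumption made for illustration: the essay does not specify the embedding rule, so the sketch borrows the nested pattern AⁿBⁿ (n cells of one color followed by exactly n of the other), and `bounded_judge` is a purely hypothetical stand-in for a subject without FRP, one that tracks patterns exactly up to a fixed span and guesses beyond it.

```python
import random

def is_valid(pattern: str) -> bool:
    """True if the pattern is n 'A' cells followed by exactly n 'B' cells (n >= 1).
    The A^n B^n rule is an assumed stand-in for the 'ultra-simple embedding rule'."""
    n = len(pattern) // 2
    return n >= 1 and pattern == "A" * n + "B" * n

def make_trial(depth: int, valid: bool, rng: random.Random) -> str:
    """Build one two-color stimulus of the given nesting depth.
    Foils keep the same length but have one cell flipped, so they always break the rule."""
    cells = list("A" * depth + "B" * depth)
    if not valid:
        i = rng.randrange(len(cells))
        cells[i] = "B" if cells[i] == "A" else "A"
    return "".join(cells)

def bounded_judge(pattern: str, span: int, rng: random.Random) -> bool:
    """Hypothetical subject without FRP: exact up to `span` cells, guessing beyond that."""
    if len(pattern) <= span:
        return is_valid(pattern)
    return rng.random() < 0.5

def accuracy(judge, depth: int, trials: int = 200, seed: int = 0) -> float:
    """Fraction of correct valid/invalid judgments over alternating valid and foil trials."""
    rng = random.Random(seed)
    correct = 0
    for t in range(trials):
        valid = (t % 2 == 0)
        correct += judge(make_trial(depth, valid, rng), rng) == valid
    return correct / trials

if __name__ == "__main__":
    recursive = lambda p, rng: is_valid(p)             # idealized subject with FRP
    bounded = lambda p, rng: bounded_judge(p, 6, rng)  # span of 6 cells is arbitrary
    for depth in (1, 2, 3, 5, 10, 20):
        print(depth, accuracy(recursive, depth), accuracy(bounded, depth))
```

Under these assumptions the idealized recursive judge stays perfectly accurate at every depth, while the bounded judge is accurate only for short patterns and falls to roughly chance once they exceed its span. That is the qualitative contrast the proposed experiment would look for; real subjects would of course produce far noisier curves.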

The real holy grail, as ever, is a well-understood set of neurobiological mechanisms accounting for the most central aspects of language, whatever they may be: mechanisms we can thoroughly interrogate down to the level of developmental genetics. To pin them down once and for all would necessitate a genuinely hubristic megaproject, one to dwarf the scale of less ambitious scientific endeavors (e.g. CERN, the Manhattan Project, the Apollo Program): to really understand how language evolves, we would have to evolve it again from scratch. It should, in principle, be possible to test any given account of language origins empirically by artificially selecting a very large population of apes through each purported stage of language acquisition in the last six million years of human lineage (in the case of the above model, from referential vocal production and bipedal neoteny to alloparenting to displaced reference to protolanguage to syntax to FRP) and wind up with a conscious nonhuman being capable of true speech. Until we have met such a being, any universality in human language means we are necessarily extrapolating speculatively from a single species-wide exemplar, an N of 1. Barring alien contact or a monumental commitment to linguistic uplift, we will have to wait for incremental improvements in neurobiology to shed light on an incomplete narrative, or hope for more unforeseen revelations from disciplines not previously consulted.

<1> This is a consequence of the Recursion Theorem. Self-referential rules and their base cases, as described, are often simply referred to as recursive functions; this is deliberately avoided here to reduce conflation of terms.

<2> The set of all decidable statements in such an axiomatic system is, notably, not generally identical to the set of all necessarily true statements (cf. Gödel 1931).

<3> There is some (hotly contested) evidence that the Múra-Pirahã language may lack this quality, but the Pirahã are readily capable of learning other languages that have it, and thus do not on their own invalidate Chomsky’s hypothesized biological adaptation for universal grammar.

<4> It is important to distinguish the unbounded from the infinite — the number of actual sentences spoken cannot be infinite, but the schema we use to interpret novel sentences has to involve discrete infinity.

<5> Some findings suggest limited abstract art in prehuman hominids, such as purported Neanderthal burials in Europe and Homo erectus carvings on Java; these could be brief local flare-ups of the FRP trait that fizzled in the absence of a sufficiently advantageous application.

Beebe, B. (2003). A Comparison of Meltzoff, Trevarthen, and Stern. Psychoanalytic Dialogues, 13(6), 777-804.

Berger-Tal, O. (2015). Recursive movement patterns: review and synthesis across species. Ecosphere, 6(9), 1-12.

Berridge, K. (1990). Comparative Fine Structure of Action: Rules of Form and Sequence in the Grooming Patterns of Six Rodent Species. Behaviour, 113, 21-56.

Bickerton, D. (2009). Adam’s Tongue: how humans made language, how language made humans. New York: Hill and Wang.

Bruce, R. (2016). Recursion in fixed motor sequences: Towards a biologically based paradigm for studying fixed motor patterns in human speech and language. CEUR Workshop Proceedings, 272-282.

Carstairs-McCarthy, A. (2000). The distinction between sentences and noun phrases: an impediment to language evolution? The Evolutionary Emergence of Language: Social Function and the Origins of Linguistic Form, 248-263. Cambridge: Cambridge University Press.

Chomsky, N. (1980). Rules and representations. Oxford: Basil Blackwell.

Chomsky, N. (1995). Bare Phrase Structure. Oxford: Basil Blackwell.

Clark, A. (2013). Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences, 36, 181–253.

Corballis, M. (2007). Recursion, Language, and Starlings. Cognitive Science, 31, 697–704.

Corlett, P. et al. (2009). From drugs to deprivation: a Bayesian framework for understanding models of psychosis. Psychopharmacology, 206(4), 515–530.

De Vos, C., and Pfau, R. (2015). Sign Language Typology: The Contribution of Rural Sign Languages. Annual Review of Linguistics, 1, 265-288.

Feigenson, L. et al. (2004). Core systems of number. Trends in Cognitive Sciences, 8(7), 307-314.

Fischer, J. (2013). Vervet Alarm Calls Revisited: A Fresh Look at a Classic Story. Folia Primatologica, 84(3-5), 273-274.

Fitch, W. (2014). Toward a computational framework for cognitive biology: Unifying approaches from cognitive neuroscience and comparative cognition. Physics of Life Reviews, 11(3), 329–364.

Fitch, W. et al. (2005). The evolution of the language faculty: Clarifications and implications. Cognition, 97, 179–210.

Hauser, M. et al. (2002). The Faculty of Language: What Is It, Who Has It, and How Did It Evolve? Science, 298(5598), 1569-1579.

Heine, B. and Kuteva, T. (2007). The Genesis of Grammar: A Reconstruction. Oxford: Oxford University Press.

Heinrich, B. (1998). Winter foraging at carcasses by three sympatric corvids, with emphasis on recruitment by the raven. Behavioral Ecology and Sociobiology, 3, 141-156.

Holmberg, A. (2000). Am I Unscientific? A Reply to Lappin, Levine, and Johnson. Natural Language & Linguistic Theory, 18, 837–842.

Honing, H. et al. (2012). Rhesus Monkeys (Macaca mulatta) Detect Rhythmic Groups in Music, but Not the Beat. PLoS ONE, 7(12), e51369.

Hrdy, S. (2009). Mothers and Others: the evolutionary origins of mutual understanding. Cambridge, Massachusetts: Belknap Press.

Jaynes, J. (1976). The Origin of Consciousness in the Breakdown of the Bicameral Mind. Boston, Massachusetts: Houghton Mifflin.

Johnson, D. and Lappin, S. (1997). A Critique of the Minimalist Program. Linguistics and Philosophy, 20, 273–333.

Lappin, S. et al. (2001). The Revolution Maximally Confused. Natural Language and Linguistic Theory, 19, 901–919.

McConnell, E. (2009). From baby babble to childhood chatter: predicting infant and toddler communication outcomes using longitudinal modeling. Kansas: ProQuest.

Gödel, K. (1931). Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme, I. Monatshefte für Mathematik und Physik, 38(1), 173-198.

Pinker, S. and Jackendoff, R. (2004). The faculty of language: what’s special about it? Cognition, 95, 201–236.

Roberts, I. (2000). Caricaturing Dissent. Natural Language & Linguistic Theory, 18, 849–857.

Rosenberg, R. (2013). Infants hierarchically organize memory representations. Developmental Science, 16(4), 610-621.

Terrace, H. (1979). How Nim Chimpsky changed my mind. New York: Ziff-Davis.

Van Heijningen, C. et al. (2009). Simple Rules Can Explain Discrimination of Putative Recursive Syntactic Structures by a Songbird Species. Proceedings of the National Academy of Sciences of the United States of America, 106(48), 20538-20543.

Woodruff, G. and Premack, D. (1981). Primative mathematical concepts in the chimpanzee: proportionality and numerosity. Nature, 293(5833), 568–570.

