A useful high-level division of species by category would be one that reflected both evolutionary and behavioral reality well enough to make valid predictions. Since behavior is immediately observable and evolutionary history generally involves more indirect inference, it makes sense to categorize behavior first and then look for evolutionary conditions necessary to produce it.
The first and most obvious line that may be drawn is between species with and without intra-generational learning, which is to say with and without neurons. The behavior of species without neurons depends on genome and circumstance: two (e.g.) sea cucumbers with identical genomes in identical circumstances will behave identically, and large changes in behavior can only be produced over multiple generations by natural selection. In contrast, species with neurons are capable of learning: their behavior is mediated by long-term potentiation of neurons in response to past events, such that two (e.g.) dogs with identical genomes in identical circumstances may respond differently to the same stimulus if they have received different conditioning.
Although creatures with more developed brains have more nuanced heuristics available, this capacity for learning is broadly evident even in species with extremely simple nervous systems, like cockroaches (Watanabe and Mizunami, 2007). This suggests two categories, or more properly a category and a subcategory: life, and neuronal life.
Within the neuronal subcategory, adult modern humans use complex language that can direct and influence the behavior of other humans, including those not immediately present. They are capable not just of associating an arbitrary symbol with an object, but of distinguishing symbols as a category from objects as a category. This requires a theory of mind — for a human to understand that a novel series of symbols will be interpreted correctly by another mind, it is necessary that they understand both that other humans are similar enough to them to interpret the same symbols in the same way, and also that other humans are different enough from them to lack information they have or have information they lack. These abstract linguistic capacities appear to be unique to humans, and so humans can be placed in a third subcategory within neuronal life: conscious linguistic life, a set which currently contains only the human species.
Although complex language and theories of mind appear to be unique to adult humans, they do not develop immediately. Children fail to verbally identify differences in objects present in their own visual fields versus those of other people until they are around 6 years old (Piaget, 1969), do not begin to use complex elaborative syntax until they are around 2 years old, do not use simple word labeling until they are around 1 year old, and do not engage in communicative coordination of regard with another person and an external object until they are about 6 months old (Striano and Reid, 2006).
However, even before 6 months they are capable of protoconversations, mirroring the expressions on other human faces at a delay and coordinating the length of pauses between facial shifts (Beebe, 2014). This behavior implies both that the infant must be storing some kind of representation of another person’s face for the length of the delay and also that they can map this representation to their own face in order to mimic it. Do these pre-linguistic capacities exist in any other species?
Great apes become mobile much more quickly than humans do, and so infant great apes do not spend much time on the face-to-face protoconversations that immobile human infants engage in. However, they are able to pass mirror tests, which involve looking at their reflection and deducing the presence of a mark on their own forehead, about as well as human infants under the same circumstances (Bard et al., 2006). This strongly implies that they must also possess enough of a self-representation to map their own movements to observed movements over time, since they must determine that the movements of the ape in the mirror correspond exactly to their own and are not simply produced by another ape behind glass.
Great apes can also follow gaze and understand opacity (Povinelli and Eddy, 1996) in a manner reminiscent of human infants, and can use this to preferentially steal food that they can tell another ape is unable to see (Hare et al., 2000). Other primates can preserve abstract representations of sequence that simple stimulus-response chaining is inadequate to explain (Terrace, 2005). The vast majority of animal species do not display these capacities.
For this reason it makes sense to posit a fourth behavioral category, within neuronal life and containing adult humans, which also contains human infants and arguably contains some other primates and hominids — a preconscious or semiconscious category, with great apes on the low end, human infants on the high end, and extinct hominids in the middle. In this category, organisms can store persistent representations and map their perceptions to internal models, but are unable to produce language or model differences between the states of knowledge of multiple individuals: they have some heuristics available for primary intersubjectivity, but none for secondary intersubjectivity (Beebe, 2003).
Do these four putative nested categories — organisms, neuronal organisms, semiconscious organisms, and fully linguistic organisms — correspond well to the evolutionary record? They appear to map to known clades; all species share a common ancestor, all species with brains share a more recent common ancestor, primates one more recent still, and humans one more recent than all of the others.
To the extent that ontogeny recapitulates phylogeny, therefore, the putative semiconscious category should predict a long period of time in which the human lineage developed and elaborated on preconscious representational abilities already partially present in apes, but did not display the abilities of modern humans to use complex language or elaborate theories of mind. It should also predict that the appearance in non-primate clades of any behaviors that appear to imply stored representation beyond simple behavioral conditioning but do not produce complex language will be produced by broadly similar evolutionary conditions. Moreover, if the structural capacity for speech and theory of mind evolved separately and significantly later than the capacity for symbolic representation, it should be possible to disrupt the former in adult humans while leaving the latter intact.
Dyadic interactions can be described conservatively as imitation at a delay. Infants are capable of initiating complex protoconversations with their mothers by around 6 months, of engaging in protoconversations initiated by the mother by around 3 months (Striano and Reid, 2006), and of subtle but measurable matching adjustments of facial expressions over time almost immediately after birth (Meltzoff and Moore, 1994). Protoconversations are characterized by two-way coordination of facial expressions and vocalizations, with infants first responding to and soon after initiating exchanges of mimicry that involve both partners matching not only their movements but their pace (Beebe).
This implies the capacity to store an abstract representation of a face in working memory, such that observed movements and timing can be recapitulated without an immediate stimulus. Unresponsive faces trigger distress behaviors in infants, which implies the ability to predict a mirrored movement on the part of another and register deviations from that prediction. Dyadic interactions require an ability to detect differences between sensory data and a stored model, for which simple behavioral conditioning cannot account.
As early as 6 months, infants are capable of sustaining mother-initiated interactions involving a third object, alternating attention between the mother and the object; by around 9 months they are initiating such interactions (Striano and Reid, 2006). They not only follow gaze — projecting a ray through three-dimensional space based on the mother’s eye movements and fixating on an object in that direction — but also check back after looking in the same direction, refocusing their attention if they have picked the wrong object as indicated by the mother’s use of indicative gestures or simple verbal labels (Baldwin, 1991). Multiple acts of gaze-following over time require not just a persistent representation of a face or body plan that maps to their own, but of a mind whose state may differ from the state of their own.
Triadic interactions require both the ability to note differences between sensory data and a stored model and the ability to adjust existing models on the fly to match a different model in someone else’s mind, for which simple model storage based on past experience cannot account. The latter capacity is a necessary prerequisite for complex language, as opposed to simple labels, because novel messages are ineffective if the other party cannot be relied upon to understand them; unsurprisingly, triadic proficiencies correlate with later language proficiencies (Brooks and Meltzoff, 2008). Infant-initiated triadic interactions may involve gaze redirection with deliberate communicative intent (Stern, 1971), implying that triadic infants have some capacity to model other humans as lacking information they have or having information they lack. The ability to compare a model of self with a substantially different model of another does not appear to be present in any other species.
Wallace questioned how Darwin’s theory of natural selection could account for human language and consciousness, given that only humans possess these features and that human minds seemed to him much more powerful than could be accounted for by simple selective pressure. After posing this question, he became a spiritualist and concluded that providence had intervened in evolution three times: once to produce multicellularity, once to produce brains, and once to produce human consciousness.
It is no longer considered prudent to speculate on divine intervention in evolutionary history, and so Wallace’s Problem boils down not to whether but to how the physical capacity for language evolved. Our current picture is incomplete, but seems to involve two major leaps in cognitive capacity — one from mirrored representations to differential representations, and one from differential representations to full-fledged language. To address Wallace’s Problem in substance requires us to explain what specific selective pressures produced these developments, what accounts for their apparent sudden appearance, and why they did not occur elsewhere in nature.
The environment of humanity and its immediate antecessors went through several major ecological changes in relatively short succession. The first of these, the deforestation of East Africa around 4 million years ago, produced bipedalism; this major anatomical shift can be explained in the traditional model of gradual change, as incrementally more bipedal individuals would gain incremental advantages in food-gathering by increasing their range and decreasing their energy expenditure (Rodman and McHenry, 1980). This incremental shift is well-attested in the fossil record, and occurs around the same time as the split between the Pan and Australopithecus lineages.
In addition to opening up new frontiers in foraging, bipedalism produces narrow pelvises through which it is difficult to pass an infant. Increased bipedalism therefore tends to produce infants born incrementally earlier in development, which require longer periods of care before being able to feed themselves. This means more pressure to find novel foraging strategies in order to feed infants, which in turn advantages infants with even larger brains, born even more helpless.
This feed-forward loop is enough, all on its own, to eventually produce the most premature possible infant with the largest possible brain; every time a population with slightly larger brains managed to secure more food, that would remove some of the metabolic pressure to keep brain size low, resulting in a population with even larger brains, resulting in pressure to find even more novel methods of securing food. The development of abstract representation more advanced than that displayed by the great apes and the subsequent development of language both occurred within the context of this ongoing process, and accelerated it by temporarily removing some food pressure, allowing smarter and more premature infants to be born and drive food pressure back up again, necessitating further development of novel scavenging and later hunting strategies.
One objection Wallace might have raised to this model is that it posits the sudden emergence of new and complex behaviors without a correspondingly sudden anatomical change: skull size increase was gradual, but the emergence of alloparenting and of long-distance hunting was not. Where are the sudden anatomical changes to match the sudden behavioral changes? There are two answers to this, the simplest being that such anatomical changes did occur, but in soft tissue, which does not show up in fossils and for which the only preserved proxy available is skull size.
On reflection, however, there is a more basic explanation: the principal evolutionary advantage of having neurons at all is that they allow an organism to adapt to change faster than trial-and-error by reproduction can allow. The range of potential behaviors that a particular critical mass of neurons can allow for is necessarily much, much wider than the range of behavior it has produced to date — the capacity must evolve before the behavior can emerge. Canids existed for a long time before anyone taught them tricks, and humans were anatomically capable of building steam engines long before it became common behavior for them.
When Homo erectus developed alloparenting it had already been around for ~800 thousand years, but it then suddenly had a marked increase in foraging efficiency, and therefore in the calories available to further maximize brain size and prematurity. If the model is correct, the rate of change in skull size between Australopithecus afarensis and Homo erectus should be less than the rate of change in skull size between Homo erectus and Homo sapiens. The fossil record supports this: from afarensis to erectus, average cranial capacity increased from 430 to 850 cubic centimetres (a gain of 420) over roughly 2 million years, and from erectus to sapiens it increased from 850 to 1400 cubic centimetres (a gain of 550) over about the same span of time.
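The comparison of rates can be made explicit with a short calculation. The sketch below uses only the figures quoted in the text and treats both intervals as exactly 2 million years, which is an assumption, since the text gives the spans only approximately. It compares absolute rates of change; the function and variable names are illustrative.

```python
# Average cranial capacities quoted in the text, in cubic centimetres.
AFARENSIS_CC = 430
ERECTUS_CC = 850
SAPIENS_CC = 1400

# Assumption: both intervals are treated as exactly 2 million years,
# although the text describes them only as "roughly" that long.
SPAN_MYR = 2.0


def rate_cc_per_myr(start_cc, end_cc, span_myr):
    """Average absolute change in cranial capacity per million years."""
    return (end_cc - start_cc) / span_myr


early_rate = rate_cc_per_myr(AFARENSIS_CC, ERECTUS_CC, SPAN_MYR)
late_rate = rate_cc_per_myr(ERECTUS_CC, SAPIENS_CC, SPAN_MYR)

print(f"afarensis -> erectus: {early_rate:.0f} cc per million years")
print(f"erectus -> sapiens:   {late_rate:.0f} cc per million years")
```

Under these assumptions the earlier interval averages 210 cc per million years and the later one 275 cc per million years.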
So the two answers to Wallace are first that selective pressures do, in fact, account for the development of capacities known to underlie language once anatomical feed-forward loops are taken into account, and second that large brains are physically capable of implementing new behaviors long before those behaviors actually appear, such that they may emerge spontaneously and reproductively privilege the individual in which they occur.
Moreover, the ability to imitate observed behaviors (which likely emerged with alloparenting) and the ability to communicate novel ideas by combining existing words (which possibly emerged with big game hunting) both enable a given technique to spread to other individuals with the same cognitive capacities immediately, rather than privileging only the offspring of the individual who invented them, further accounting for the sudden emergence and spread of things like tool cultures on sub-evolutionary timescales.
The principal similarity between computer memory as currently implemented and biological memories is that information and methods of processing that information are stored in the same medium. In a computer, data and instructions share one memory, so any string of bytes could represent either a program or data or both, depending on context; in brains, memories seem to be stored and retrieved in a fashion inextricable from processing context.
In most other respects, computer memory is more reminiscent of the operation of individual cells than of any inter-cellular process like a brain. In a computer, a program composed of a pattern of binary bits (1s and 0s) is copied from storage into working memory, interpreted by a processor, and outputs data that in turn can sometimes affect the program’s own future execution somewhere down the line; in a cell, a gene composed of a pattern of quaternary nucleotides (A, C, T, and G) is copied from DNA to RNA, interpreted by a ribosome, and outputs a protein, which in turn can sometimes affect the DNA’s own future structure somewhere down the line. The original abstract conception of computation (Turing, 1936), an interpreter which iterates along an infinitely long two-symbol tape, bears more than a passing resemblance to the operation of ribosomes reading four-symbol sequences from a 3-billion base-pair long genome.
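The tape-reading analogy can be illustrated with a toy interpreter. The sketch below is a deliberate simplification: the gene is a made-up sequence, the codon table is only the fragment of the standard genetic code needed to read it, and transcription is reduced to swapping T for U on the coding strand. Real cells involve a template strand, splicing, and much else that is ignored here.

```python
# Fragment of the standard genetic code: only the codons used below.
CODON_TABLE = {
    "AUG": "Met",  # methionine, also the usual start codon
    "UUU": "Phe",  # phenylalanine
    "GGC": "Gly",  # glycine
    "AAA": "Lys",  # lysine
    "UAA": None,   # stop codon: the reader halts here
}


def transcribe(dna_coding_strand):
    """Copy DNA to mRNA (simplified: replace T with U on the coding strand)."""
    return dna_coding_strand.replace("T", "U")


def translate(mrna):
    """Step along the 'tape' three symbols at a time until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid is None:
            break
        protein.append(amino_acid)
    return protein


# Hypothetical five-codon gene, invented for illustration.
gene = "ATGTTTGGCAAATAA"
mrna = transcribe(gene)
protein = translate(mrna)
print(mrna, protein)
```

The reader moves in one direction over a fixed alphabet and emits output determined by a lookup table, which is the sense in which the ribosome resembles a simple tape interpreter.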
Biological memory as implemented in neurons differs in that there appear to be no atomic engrams — no one has isolated a quantum of change in brains equivalent to a single-base-pair mutation or a single-bit flip. The simplest form of neuronal memory is behavioral conditioning, which is demonstrable by long-term potentiation in response to repeated stimuli even in extremely simple nervous systems. This preconscious neuronal learning is entirely nonsymbolic, and behaviors produced are generally entirely predictable from the conditioning stimuli, but every retrieval of a response via a stimulus changes the action potentials involved — computer memory becomes ‘sticky’ like this only in cases of extreme malfunction.
In human memory, there is a second system in play, one that maps stored representations onto perceptual input. It operates in a way that bears some resemblance to hypothesis testing, in that low levels of difference between the internal model and the sensory data result in the model being projected onto the data to fill in any gaps, and high levels of difference result in behaviors associated with salience and surprise. Bottom-up processing appears to depend on AMPA glutamate receptor activity, and top-down processing on NMDA receptor activity; dopamine codes for the level of predictive error (Corlett, Frith, and Fletcher, 2009).
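The comparison described above can be sketched as a small function. This is an illustration of the logic only, not a model of receptor activity: the feature vectors, the mean-absolute-difference error measure, and the threshold value are all assumptions made for the example.

```python
def perceive(model, sensed, surprise_threshold=0.25):
    """Compare a stored model with sensory input.

    model:  list of expected feature values
    sensed: list of observed values, with None marking a gap in the input
    Returns (percept, prediction_error, surprised).
    """
    pairs = [(m, s) for m, s in zip(model, sensed) if s is not None]
    if not pairs:
        # Nothing was observed, so the model is all there is to go on.
        return list(model), 0.0, False
    error = sum(abs(m - s) for m, s in pairs) / len(pairs)
    if error <= surprise_threshold:
        # Low mismatch: project the model onto the data to fill the gaps.
        percept = [m if s is None else s for m, s in zip(model, sensed)]
        return percept, error, False
    # High mismatch: leave the data as it is and flag it as salient.
    return list(sensed), error, True


stored_model = [1.0, 0.0, 1.0, 1.0]
close_input = [0.9, 0.1, None, 1.0]  # near the model, one value missing
far_input = [0.0, 1.0, None, 0.0]    # contradicts the model

close_percept, close_error, close_surprised = perceive(stored_model, close_input)
far_percept, far_error, far_surprised = perceive(stored_model, far_input)
print(close_percept, close_surprised)
print(far_percept, far_surprised)
```

With the close input the missing value is filled in from the model and no surprise is signalled; with the contradictory input the gap is left unfilled and the mismatch is flagged.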
The cognitive effects of several psychoactive drugs fit this paradigm — for instance, PCP, which blocks NMDA receptor transmission, gives you exactly the sort of delusions and perceptual apophenia you might expect under such a paradigm. Dyadic human infants are already capable of this two-system comparison between reality and stored representations, and the fact that some primates can be shown to store representations (Terrace, 2005) suggests they have some glimmering of the same capacity. This is not semiotics in the sense the term is normally used, because it does not require explicit communication between organisms, but it does allow for a feedback loop that can generate novel behaviors by projecting abstract representations onto perceived reality in a manner more complex than conditioned memories in simpler animals can manage.
Computer memory does not behave like this on a small scale, but large networks of computers can implement somewhat analogous processes whose deficits can point to the necessity of other systems to explain adult human memories. Google Deep Dream, an experimental computer vision project, was built to recognize objects in videos by projecting pre-existing internal models based on a very large dataset of categorized images it trained on. The system quickly became famous for its apophenia: for instance, after associating millions of pictures of dogs from various angles with the general shape of a dog, it started seeing dogs everywhere, mapping them onto vaguely dog-shaped objects in the scenes with which it was presented.
Missing from this two-system picture, in a Bayesian sense, is the capacity to update prior models reliably. That capacity is fundamental to triadic interactions in human infants — they are constantly checking back with their mother to see if they are schematizing the external object to her satisfaction. It is also notably impaired in adult humans with damage to their anterior cingulates — they retain the ability to judge whether what they are seeing matches their internal schema, but they have trouble updating the schema (Mars, Sallet, and Rushworth, 2011). If this function was not present in early hominids, patients with this kind of brain damage may be engaging in essentially atavistic cognition, the cognition of the semiconscious category. Updating priors based on input from another mind requires storing an abstract representation of minds sufficiently complex to account for differences in knowledge between them.
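The difference between a system that can revise its priors and one that cannot can be shown with a few lines of Bayes' rule. The sketch below is a toy: the prior of 0.8, the likelihoods, and the `can_update` switch are invented for illustration and are not drawn from the cited studies.

```python
def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Posterior probability of a hypothesis after one observation."""
    evidence = prior * likelihood_if_true + (1 - prior) * likelihood_if_false
    return prior * likelihood_if_true / evidence


def final_belief(prior, observations, can_update=True):
    """Run a sequence of observations through an updating or a fixed believer."""
    belief = prior
    for likelihood_if_true, likelihood_if_false in observations:
        if can_update:
            belief = bayes_update(belief, likelihood_if_true, likelihood_if_false)
        # With can_update=False the mismatch is still there to be noticed,
        # but the stored schema never changes.
    return belief


# Hypothetical scenario: an initial guess held with probability 0.8,
# followed by three corrective signals, each four times more likely
# if the guess is wrong (0.8) than if it is right (0.2).
corrections = [(0.2, 0.8)] * 3

updating_belief = final_belief(0.8, corrections, can_update=True)
fixed_belief = final_belief(0.8, corrections, can_update=False)
print(round(updating_belief, 3), fixed_belief)
```

The updating believer ends with a probability of roughly 0.06 for the original guess, while the fixed believer stays at 0.8 regardless of how much corrective evidence arrives.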
Schizophrenics, and paranoid schizophrenics in particular, famously suffer from intractable delusions of reference, believing strange things despite overwhelming evidence to the contrary. They seem partially unable to distinguish between symbols of reality and reality itself: they tend to confuse the thought of a voice with the sound of one, and will often fixate on seemingly irrelevant objects or phenomena and impute profound meaning to them, or hallucinate things they have schemas for onto sensory data that does not match those schemas very well. They also tend to have too much dopamine, hypofunctioning NMDA receptors, and abnormalities in their anterior cingulate cortex (Coyle, 2006), all in line with the interrupted Bayesian model.
Schizophrenics have arguably lost secondary intersubjectivity and retained primary intersubjectivity — they can no longer verify their perceptions of an external object with another person, but can still manage to store persistent representations of abstract schemas (often to the point where it takes years to convince them their schemas are wrong). They also retain language, which might seem to imply that secondary intersubjectivity actually developed long after language did (Jaynes, 1976) — however, schizophrenia develops late in life after secondary intersubjectivity and language have already been present for years, so it is impossible to tell whether they could have developed language if they had developed without secondary intersubjectivity from birth. It is not necessary to posit a speech-catalyzed plague of hypertrophied cingulates to explain the leap from primary to secondary intersubjectivity in primates.
If there were some other clade whose history included habitual bipedalism, rapid adaptation to major ecological change, a reproductive bottleneck leading to helpless infants, complex social behaviors, and the dyadic ability to mimic complex sequences at a delay, it would be possible to argue that parallel evolution puts them in the same preconscious category as human infants and early hominids — that is, to present a case that they may possess some potential for primary but no potential for secondary intersubjectivity.
Certain avians fit these criteria: they are descended from dinosaurs which independently developed bipedalism in the Triassic, survived major climatic change and migrated to many novel climes, lay eggs which are necessarily small enough not to preclude flight, hatch unable to fly or feed themselves, and are capable of mimicking complex songs and sometimes human speech at a delay and with conversational pacing. Some even alloparent (Anctil and Franke, 2013).
Corvids in particular are capable of solving very complex puzzles; they also pass the mirror test, use simple tools, and will re-hide cached food when they notice another bird watching them if and only if they themselves have stolen a cache in the past (Clayton, Dally, and Emery, 2007). This strongly implies they are capable of storing models of the world and comparing them with current sensory input in a way analogous to semiconscious primates.
There are, broadly speaking, two forms of language: single-word associations, and complex recursive syntax. The former is already present in great apes, which can readily be conditioned to associate a hand sign or computer symbol with a particular object. However, these symbols are always imposed from the outside — apes do not generate new symbols or sequences of symbols. The particular advantage of language is not in the ability to use labels, which apes and dogs do in ways that can be explained by simple behavioral conditioning, but in the ability to generate arbitrary new labels.
Human words are different from, for example, the variegated warning calls specific to particular predators seen in Campbell’s monkeys (Schlenker et al., 2014), in that new ‘calls’ for new referents can be generated at will and spread among a group, rather than standardizing over evolutionary timescales. This capacity is a prerequisite for more complex grammatical language, and requires secondary intersubjectivity. Secondary intersubjectivity involving the representation of multiple differing states of mind is thought to have emerged with Homo erectus roughly 1.2 million years ago, on the basis that a sudden increase in the efficacy of scavenging could be attributed to alloparenting, which would at once allow more adults to engage in foraging unencumbered by infants and privilege infants capable of making distinctions between caregivers (Hrdy, 2009).
Alloparenting exists in many species, including some primate species, but not in any of the great apes — for it to emerge in the human lineage so quickly suggests that the behavior in this case was a neurological innovation and not a genetic one, an innovation made possible by the relentless feed-forward loop of bipedalism and extra cranial capacity. Somewhat contra Jaynes, the triadic capacity likely preceded and was necessary for language to begin to develop beyond simple labels — sentences with recursive grammar communicate novel ideas, and to transmit a novel idea by a series of symbols implies a persistent model of another mind with a notably different state of knowledge. To make an argument about when such nontrivial language emerged along the same lines as the argument for alloparenting would require describing a behavior that could not be accomplished without nontrivial language.
Speculatively, long-distance persistence hunting, which emerged later than simple group scavenging, might be a candidate: bipedalism is great for endurance running, but extends the range so much that the hunters might wind up very far away from the band they intend to feed, and it would make much more sense to send someone back to fetch the band than to drag a large kill back to them. That would require communicating novel information in a relay. Sending a messenger would arguably require at least enough language to relate something like “the others sent me to come bring you to [a place you have never been]” — the messenger would need to convey the state of mind of the hunters still with the kill to the remainder of the band, and hold that message in a form persistent enough to survive a lengthy journey reliably.
This is somewhat analogous to the difference between triadic but preverbal infants and verbal children: the triadic infant can distinguish between two minds, but has little means available to convey a thirdhand message about an absent party. Once present, language would also allow tighter social coordination of hunting behaviors and enable less primitive forms of hunting to emerge.
Anctil, A., & Franke, A. (2013). Intraspecific Adoption and Double Nest Switching in Peregrine Falcons (Falco peregrinus). Arctic, 66(2), 222-225.
Baldwin, D. (1991). Infants’ Contribution to the Achievement of Joint Reference. Child Development, 62(5), 875-890. doi:10.2307/1131140
Bard, K. A., Todd, B. K., Bernier, C., Love, J., & Leavens, D. A. (2006). Self-Awareness in Human and Chimpanzee Infants: What Is Measured and What Is Meant by the Mark and Mirror Test?. Infancy, 9(2), 191-219. doi:10.1207/s15327078in0902_6
Beebe, B. (2003). A Comparison of Meltzoff, Trevarthen, and Stern. Psychoanalytic dialogues, 13(6), 777-804.
Beebe, B. (2014). My journey in infant research and psychoanalysis: Microanalysis, a social microscope. Psychoanalytic psychology, 31(1), 4-25. doi:10.1037/a0035575
Brooks, R., & Meltzoff, A. (2008). Infant gaze following and pointing predict accelerated vocabulary growth through two years of age: a longitudinal, growth curve modeling study. Journal of Child Language, 35(1), 207-220.
Clayton, N. S., Dally, J. M., & Emery, N. J. (2007). Social cognition by food-caching corvids. The western scrub-jay as a natural psychologist. Philosophical Transactions of the Royal Society B: Biological Sciences, 362(1480), 507–522. doi:10.1098/rstb.2006.1992
Corlett, P. R., Frith, C. D., & Fletcher, P. C. (2009). From drugs to deprivation: a Bayesian framework for understanding models of psychosis. Psychopharmacology, 206(4), 515–530. doi:10.1007/s00213-009-1561-0
Coyle, J. T. (2006). Glutamate and Schizophrenia: Beyond the Dopamine Hypothesis. Cellular and Molecular Neurobiology, 26, 363-382. doi:10.1007/s10571-006-9062-8
Hare, B., Call, J., Agnetta, B., & Tomasello, M. (2000). Chimpanzees know what conspecifics do and do not see. Animal Behaviour, 59(4), 771–785.
Hrdy, S. (2009). Mothers and Others: The Evolutionary Origins of Mutual Understanding. Cambridge: Harvard University Press.
Jaynes, J. (1976). The Origin of Consciousness in the Breakdown of the Bicameral Mind. Boston: Houghton Mifflin.
Mars, R. B., Sallet, J., & Rushworth, M. F. S. (2011). Neural Basis of Motivational and Cognitive Control. Cambridge: The MIT Press.
Meltzoff, A., & Moore, M. (1994). Imitation, memory, and the representation of persons. Infant Behavior and Development, 17(1), 83-99. doi:10.1016/0163-6383(94)90024-8
Piaget, J. (1969). The psychology of the child. Basic Books.
Povinelli, D., & Eddy, T. (1996). Chimpanzees: Joint visual attention. Psychological Science, 7, 129-135.
Rodman, P. S., & McHenry, H. M. (1980). Bioenergetics and the origin of hominid bipedalism. American Journal of Physical Anthropology, 52, 103–106. doi:10.1002/ajpa.1330520113
Schlenker, P., Chemla, E., Arnold, K., Lemasson, A., Ouattara, K., Keenan, S., . . . Zuberbühler, K. (2014). Monkey semantics: Two ‘dialects’ of Campbell’s monkey alarm calls. Linguistics and Philosophy, 37(6), 439-501. doi:10.1007/s10988-014-9155-7
Stern, D. (1971). A microanalysis of mother–infant interaction. Journal of the American Academy of Child Psychology, 19, 501–517.
Striano, T., & Reid, V. M. (2006). Social cognition in the first year. Trends in Cognitive Sciences, 10(10), 471 – 476. doi:10.1016/j.tics.2006.08.006
Terrace, H. S. (2005). The Simultaneous Chain: A New Approach to Serial Learning. Trends in Cognitive Sciences, 9(4), 202-210.
Turing, A. M. (1936). On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, 2(42), 230–265.
Watanabe, H., & Mizunami, M. (2007). Pavlov’s cockroach: Classical conditioning of salivation in an insect. PLoS One, 2(6). doi:10.1371/journal.pone.0000529