The Language Mosaic and its Evolution
James R. Hurford
Edited by Morten H. Christiansen and Simon Kirby
Oxford University Press 2003
It is natural to ask fact-demanding questions about the evolution of language, such as 'Did Homo erectus use syntactic language?', 'When did relative clauses appear?', and 'What language was spoken by the first Homo sapiens sapiens who migrated out of Africa?'
One function of science is to satisfy a thirst for such answers to questions comprehensible in everyday terms, summarized as 'What happened, and when?' Such questions are clearly genuinely empirical; there is (or was) a fact of the matter. A time‑travelling investigator could do fieldwork among the Homo erectus and research the first question, and then make forward jumps in time and research the other questions. I believe, however, that study in the evolution of language will not yield answers to such questions in the near future. Therefore, finding answers to such empirical‑in‑principle questions cannot be the purpose of language evolution research. The goal is, rather, to explain the present.
Evolutionary linguistics does not appeal to an apparatus of postulated abstract principles specific to the subject to explain language phenomena. Language is embedded in human psychology and society, and is ultimately governed by the same physical principles as galaxies and mesons. Not being physicists, or even chemists, we can take for granted what those scientists give us. In the hierarchy of sciences 'up' from physics, somewhere around biochemistry, and, on a parallel track, in mathematics and computational theory, facts begin to appear which can be brought to bear on the goal of explaining language. These facts are not in themselves linguistic facts, but linguistic facts are distantly rooted in them. The basic linguistic facts needing explanation are these: there are thousands of different languages spoken in the world; these languages have extremely complex structure; and humans uniquely (barring a tiny minority of pathological cases) can learn any of these languages. These broad facts subsume an army of more detailed phenomena pertaining to individual languages. Such facts are, of course, the standard goals of linguistics. But modern mainstream linguistics has ignored the single most promising dimension of explanation, the evolutionary dimension.
Linguistic facts reflect acquired states of the brains of speakers. Those brains were bombarded in childhood with megabytes of information absorbed from the environment through various sensory channels, and influencing (but not wholly determining) neurogenesis. The grown neurons work through complex chemistry, sending information at various speeds and with varying fidelity buzzing around the brain and out into the muscles, including those of the vocal tract. This is a synchronic description, reduced almost to caricature, of what happens in an extremely complex organism, a human, giving rise to linguistic facts, our basic explananda. Facts of particular languages are themselves partly the result of specific historical contingencies which we cannot hope to account for in detail. Collecting such facts is work at the indispensable descriptive coalface of linguistics. The theoretical goals of linguistics must be set at a more general level, accounting for the range, boundaries, and statistical distribution of language‑specific facts.
Much of biology, like most of linguistics, is devoted to wholly descriptive synchronic accounts of how living organisms work. But at one time in the world's history there were no living organisms. The evolutionary branch of biology aims to explain how the observed range of complex organisms arose. With the unravelling of the structure of DNA, evolutionary theory began the reductive breakthrough, still incomplete, from postulating its own characteristic abstract principles to a sound basis in another science, chemistry. Any evolutionary story of how complex organisms arose must now be consistent with what we know about the behaviour of molecules. The evolutionary story must also be consistent with knowledge from another new and independent body of theory, represented in the early work of D'Arcy-Thompson (1961) and recently by such work as Kauffman (1993; 1995) and West et al. (1997). This work emphasizes that the environment in which natural selection operates is characterized by mathematical principles, which constrain the range of attractor states into which evolution can gravitate.
The evolutionary biologists Maynard Smith and Szathmary (1995) have identified eight 'major transitions in evolution'. Their last transition is the emergence of human societies with language. Chomsky has stressed that language is a biological phenomenon. But prevalent contemporary brands of linguistics neglect the evolutionary dimension. The present facts of language can be understood more completely by adopting an evolutionary linguistics, whose subject matter sits at the end of a long series of evolutionary transitions, most of which have traditionally been the domain of biology. With each major transition in evolution comes an increase in complexity, so that a hierarchy of levels of analysis emerges, and research methods necessarily become increasingly convoluted, and extend beyond the familiarly biological methods. Evolution before the appearance of parasitism and symbiosis was simpler. Ontogenetic plasticity, resulting in phenotypes which are not simply predictable from their genotypes, and which may in their turn affect their own environments, further complicates the picture. The advent of social behaviour necessitates even more complex methods of analysis, many not susceptible to mathematical modelling, due to the highly non‑linear nature of advanced biosocial systems.
With plasticity (especially learning) and advanced social behaviour comes the possibility of culture, and a new channel of information transfer across generations. Cultural evolution, mediated by learning, has a different dynamic from biological evolution; and, to make matters even more complex, biological and cultural evolution can intertwine in a co-evolutionary spiral.
The key to explaining the present complex phenomena of human language lies in understanding how they could have evolved from less complex phenomena. The fact that human language sits at the end (so far!) of a long evolutionary progression certainly poses a methodological challenge. Nevertheless, it is possible to separate out components of the massively complex whole, and to begin to relate these in a systematic way to the present psychological and social correlates of language and to what we can infer of their evolutionary past. Modern languages are learned by, stored in, and processed online by evolved brains, given voice by evolved vocal tracts, in evolved social groups. We can gain an understanding of how languages, and the human capacity for language, came into existence by studying the material (anatomical, neural, biochemical) bases of language in humans, related phenomena in less evolved creatures, and the dynamics of populations and cultural transmission.
A basic dichotomy in language evolution is between the biological evolution of the language capacity and the historical evolution of individual languages, mediated by cultural transmission (learning). In the next section I will give a view of relevant steps in the biological evolution of humans towards their current fully‑fledged linguistic capacity.
Biological Steps to Language‑Readiness
In this section, I review some of the cognitive pre-adaptations which paved the way for the enormously impressive language capacity in humans. While these pre‑adaptations do not in themselves fully explain how the full, uniquely human ability finally emerged, they do give us a basis for beginning to understand what must have happened.
A pre‑adaptation is a change in a species which is not itself adaptive (i.e. is selectively neutral) but which paves the way for subsequent adaptive changes. For example, bipedalism set in train anatomical changes which culminated in the human vocal tract. Though speech is clearly adaptive, bipedalism is not itself an adaptation for speech; it is a pre‑adaptation. This example involves the hardware of language, the vocal tract.
Many changes in our species' software, our mental capacities, were necessary before we became language‑ready; these are cognitive pre‑adaptations for language. Pre‑adaptations for language involved the following capacities or dispositions:
1. A pre‑phonetic capacity to perform speech sounds or manual gestures.
2. A pre‑syntactic capacity to organize longer sequences of sounds or gestures.
3. Pre‑semantic capacities:
a. to form basic concepts;
b. to construct more complex concepts (e.g. propositions);
c. to carry out mental calculations over complex concepts.
4. Pre‑pragmatic capacities:
a. to infer what mental calculations others can carry out;
b. to act cooperatively;
c. to attend to the same external situations as others;
d. to accept symbolic action as a surrogate for real action.
5. An elementary symbolic capacity to link sounds or gestures arbitrarily with basic concepts, such that perception of the action activates the concept, and attention to the concept may initiate the sound or gesture.
If some capacity is found in species distantly related to humans, this can indicate that it is an ancient, primitive capacity. Conversely, if only our nearest relatives, the apes, possess some capacity, we can conclude that it is a more recent evolutionary development. Twin recurring themes in the discussion of many of these abilities are learned, as opposed to innate, behaviour and voluntary control of behaviour.
Voluntary control is a matter of degree, ranging from involuntary reflex to actions whose internal causes are obscure to us. Probably all vertebrates can be credited with some degree of voluntary control over their actions. In some sense, and in some circumstances, they 'decide' what to do. In English, 'voluntary' is reserved for animate creatures. Only jokingly do we say of a machine that it 'has a mind of its own', but this is precisely when we do not know what complex internal states lead to some unpredicted behaviour. 'Voluntary' is used to describe whole actions. If actions are simple, they may, like reflex blinking, be wholly automatic, and involuntary. If an action is complex, although the whole action may be labelled 'voluntary', it is likely to have an automatic component and a non‑automatic component. Both the automatic and the non‑automatic component may be determined by complex processes obscure to us. What singles humans out from other species is the capacity to acquire automatic control, in the space of a few years, of the extremely complex syntactic and phonological processes underlying speaking and understanding language. Such automatization must involve the laying down of special neural structures. It seems reasonable to identify some subset of these neural structures with what linguists call a grammar. The sheer size of the information thus encoded (languages are massive) testifies to the enormous plasticity, specialized to linguistic facts, of the human brain.
Human languages are largely learned systems. The more ways a species is plastic in its behaviour, the more complex are the cultural traditions, including languages, that can emerge. Our nearest relatives, the chimpanzees, are plastic in a significantly wider range of behaviours than any other non‑human animals; their cultural traditions are correspondingly more multifaceted, while falling far short of human cultural diversity and complexity. Combined with plasticity, voluntary control adds more complexity, and unpredictability, to patterns of behaviour. Much of the difference between humans and other species can be attributed to greatly increased plasticity and voluntary control of these pre‑adaptive capacities.
Chimpanzees cannot speak. They typically have little voluntary breath control. To wild chimpanzees, voluntary breath control does not come naturally. On the other hand, chimpanzees have good voluntary control over their manual gestures, although they are not as capable as humans of delicate manual work. A pre‑adaptation that was necessary for the emergence of modern spoken language was the extension of voluntary control from the hands to the vocal tract.
Learning controlled actions by observation entails an ability to imitate. Imitation involves an impressive 'translation' of sensory impressions into motor commands. Think of a smile. Without mirrors or language, one has no guarantee that muscle contractions produce the effect one perceives in another's face. Given the required voluntary control, and the anatomical hardware, imitation of speech sounds should be easier than imitation of facial gestures, because one can hear one's own voice. A capacity for imitation is found in a perplexing range of species. Some birds can imitate human speech, and many other sounds as well. Dolphins can be trained to imitate human movements. A capacity for imitation can evolve separately in different species, with or without the other necessary pre‑adaptive requirements for human language. A neural basis of imitation has been found in monkeys in the form of 'mirror neurons', which fire both when an animal is carrying out a certain action, such as grasping, and when it observes that same action carried out by another animal. A recurrent theory in phonetics is the 'motor theory of speech perception', which claims that speech sounds are represented in the brain in terms of the motor commands required to make them.
Although they cannot speak, our ape cousins have no trouble in recognizing different spoken human words. The capacity to discriminate the kinds of sounds that constitute speech evidently preceded the arrival of speech itself.
Syntax involves the stringing together of independent subunits into a longer signal. We are concerned in this section with what Marler (1977) calls 'phonological syntax', as opposed to 'lexical syntax'. In phonological syntax the units, like the letters in a written word, have no independent meaning. In lexical syntax the units, such as the words in an English sentence, have meanings which contribute to the overall meaning of the whole signal. Many bird species can learn songs with phonological syntax. Oscine birds, which learn complex songs, are very distant relatives of humans. Many other birds, and more closely related species, including most mammals, do not produce calls composed of independent sub‑units. Our closest relatives, the apes, do produce long calls composed of sub‑units. The long calls of gibbons are markers of individual identity, for advertising or defending territory. The subunit notes, used in isolation, out of the context of long calls, are used in connection with territorial aggression, and it is not clear whether the meanings of these notes can be composed by any plausible operation to yield the identity‑denoting meaning of the whole signal.
Male gibbon singing performances are notable for their extreme versatility. Precise copies of songs are rarely repeated consecutively, and the song repertoires of individual males are very large. Despite this variability, rules govern the internal structure of songs. Male gibbons employ a discrete number of notes to construct songs. Songs are not formed through a random assortment of notes. The use of note types varies as a function of position, and transitions between note types are nonrandom. (Mitani and Marler 1989: 35)
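The claim in the quoted passage that transitions between note types are nonrandom can be made concrete with a small sketch. The Python fragment below is only an illustration, not a method taken from Mitani and Marler: it assumes songs are given as lists of note labels (the labels in the usage note are invented, not gibbon data) and estimates how often each note type is followed by each other note type. The function name `transition_probabilities` is hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(songs):
    """Estimate P(next note | current note) from a list of songs.

    Each song is a list of note labels. The labels are purely
    illustrative placeholders, not real gibbon note types.
    """
    counts = defaultdict(Counter)
    for song in songs:
        # Pair each note with the note that immediately follows it.
        for current, following in zip(song, song[1:]):
            counts[current][following] += 1
    # Normalise the counts for each note into relative frequencies.
    return {
        note: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for note, followers in counts.items()
    }
```

Under these assumptions, if the songs were a random assortment of notes, every note would be followed by every other note at roughly its overall frequency; strongly skewed rows in the resulting table are what 'nonrandom transitions' would look like. For example, `transition_probabilities([["a", "b", "a", "b"], ["a", "b", "c"]])` records that the invented note `a` is always followed by `b` in this tiny sample.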
Although it is fair to call such abilities in apes 'pre‑syntactic', they are still far removed from the human ability to organize sequences of words into complex hierarchically organized sentences. Little is known about the ability of apes to learn hierarchically structured behaviours, although all researchers seem to expect apes to be less proficient at it than humans; see Byrne and Russon (1998) and Whiten (2000) for some discussion.
Basic Concept Formation
Many species lead simple lives, compared to humans, and even to apes, and so may not possess very many concepts, but they do nevertheless possess them. 'Perceptual categorization and the retention of inner descriptions of objects are intrinsic characteristics of brain function in many other animals apart from the anthropoid apes' (Walker 1983: 378).
The difference between humans and other animals in terms of their inventories of concepts is quantitative. Animals have the concepts that they need, adapted to their own physiology and ecological niche. What is so surprising about humans is how many concepts they have, or are capable of acquiring, and that these concepts can go well beyond the range of what is immediately useful. Basic concrete concepts, constituting an elementary pre‑semantic capacity, were possessed by our remote ancestors. (A good survey appears in Jolly 1985: ch. 18; see also Allen and Hauser 1991.)
Something related to voluntary control is also relevant to pre‑semantic abilities. We need not be stimulated by the presence of an object for a concept of it to be evoked. Some animals may have this to a limited degree. When an animal sets off to its usual foraging ground, it knows where it is going, because it can get there from many different places, and even take new routes. So the animal entertains a concept of a place other than where it currently is. But for full human language to have taken off, a way had to evolve of mentally reviewing one's thoughts in a much more free‑ranging way than animals seem to use.
Complex Concept Formation
The ability to form complex conceptual structures, composed systematically of parts, is crucial to human language. Logical predicate‑argument structure underlies the messages transmitted by language. The words constituting human sentences typically correspond to elements of a conceptual/logical representation. While apes may perhaps not be capable of storing such complex structures as humans, it seems certain that they have mental representations in predicate‑argument form. Simply attending to an object is analogous to assigning a mental variable to it, which functions as the argument of any predicate expressing a judgement made by the animal. The two processes of attending to an object and forming some judgement about it are neurologically separate, involving different pathways (dorsal and ventral) in the brain. This is true not only for humans but also for apes and closely related monkeys (see argument and references in Hurford 2003). It seems certain that all species closely related to humans, and many species more distantly related, have at least this representational capacity, which is a pre‑semantic pre‑adaptation for language.
Humans are not the only species capable of reasoning from experienced facts to predictions about non‑experienced states of affairs. There is a large literature on problem‑solving by animals, leading to ranking of various species according to how well they perform in some task involving simple inference from recent experience (see Krushinsky 1965 for a well‑known example). Apes and monkeys perform closest to humans in problem‑solving, but their inferential ability falls short of human attainment.
Mind‑reading and Manipulation
When a human hears an utterance, he has to figure out what the speaker intended; this is mind‑reading. When a human speaks, she does so with some estimation of how her hearer will react; this is social manipulation. Humans have especially well‑developed capacities for social manipulation and mind‑reading, and these evolved from similar abilities in our ancestors, still visible in apes. Social intelligence, a well‑developed ability to understand and predict the actions of fellow members of the group, was a necessary prerequisite for the emergence of language. Recent studies amply demonstrate these manipulation and mind‑reading abilities in chimpanzees (Byrne and Whiten 1988; de Waal 1982; 1989; Hare et al. 2001).
People can understand the intended import of statements whose literal meanings are somehow inappropriate, such as It's cold in here, intended as a request to close the window. To explain how we cope with such indirectness, traditional logic has to be supplemented by the Cooperative Principle (Grice 1975), which stipulates that language users try to be helpful in specified ways. The use of language requires this basis of cooperativeness. No such complex communication system could have evolved without reliable cooperativeness between users.
Humans are near the top of the range of cooperativeness. The basis of cooperation in social insects is entirely innate, and the range of cooperative behaviours is small. In humans, building onto a general natural disposition to be cooperative, cooperation on a wide range of specific group enterprises is culturally transmitted. Children are taught to be 'team players'. No concerted instruction in cooperation exists outside humans, but there are reports of cases where an animal appears to be punished for some transgression of cooperativeness (Hauser 1996: 107‑9). So the basis for cooperative behaviour, and for the instilling of such behaviour in others, exists in species closely related to humans. Chimpanzees and bonobos, in particular, frequently engage in reconciliation and peacemaking behaviour (de Waal 1988; 1989). Dispositions to cooperation and maintenance of group cohesion are pragmatic cognitive pre‑adaptations for language.
Cats are inept at following a pointing finger; dogs are better. Language is also used to 'point at' things, both directly and indirectly. Linguists and philosophers call this 'reference'. When a speaker refers to some other person, say by using a personal pronoun, the intention is to get the hearer to attend to this other person. Successful use of language demands an ability to know what the speaker is talking about. A mechanism for establishing joint attention is necessary. Human babies and children are adept at gaze‑ and finger‑following (Franco and Butterworth 1996; Morales et al. 2000; Charman et al. 2000). The fact that humans, uniquely, have whites to their eyes probably helps us to work out what other people are looking at.
Primates more closely related to humans are better at following the human gaze than those less closely related (Itakura 1996). Chimpanzees follow human gaze cues, while non‑ape species such as macaques fail to follow human gaze cues. But experiments on rhesus macaques interacting with other rhesus macaques show that these animals do follow the gaze of conspecifics (Emery et al. 1997). Spontaneous pointing has also been observed in captive common chimpanzees (who had not received language training) (Leavens et al. 1996) and in young free‑ranging orangutans (Bard 1992). It thus appears that animals close to humans possess much of the cognitive apparatus for establishing joint attention, which is the basis of reference in language.
Short greetings such as Hello! and Hi! are just act‑performing words; they do not describe anything, and they cannot be said to be true or false. We can find exactly such act‑performing signals in certain ritualized actions of animals. The classic example of a ritualized action is the snarling baring of the teeth by dogs, which need not precede an imminent attack, and is a sign of hostility. Human ritualized expressions such as Hello! are relics of ancient animal behaviour, mostly now clothed in the phonemes of the relevant language. But some human ritualized expressions, such as the alveolar click, 'tsk', indicating disapproval, are not assimilated into the phonology of their language (in this case English). The classic discussion of ritualization in animal behaviour is Tinbergen (1952), who noted the signal's 'emancipation' from its original context. This process of dissociation between the form of the signal and its meaning can be seen as the basis of the capacity to link signals arbitrarily with meanings, discussed in the next section. (See Haiman 1994 for a more extended argument that ritualization is a central process in language evolution.)
Elementary Symbolic Capacity
The sound of the word tree, for instance, has no iconic similarity with any property of a tree. This kind of arbitrary association is central to language. Linguistic symbols are entirely learned. This excludes from language proper any possible universally instinctive cries, such as screams of pain or whimpers of fear. In the wild, there are many animals with limited repertoires of calls indicating the affective state of the animal. In some cases, such calls also relate systematically to constant aspects of the environment. The best‑known example is the vervet monkey alarm system, with distinctive calls for different classes of predator. There is no evidence that such calls are learned to any significant degree. Thus no animal calls, as made in the wild, can as yet be taken as showing an ability to learn an arbitrary mapping from signal to message.
Trained animals, on the other hand, especially apes, have been shown to be capable of acquiring arbitrary mappings between concepts and signals. The acquired vocabularies of trained apes are comparable to those of 4‑year‑old children, with hundreds of learned items. An ape can make a mental link between an abstract symbol and some object or action, but the circumstances of wild life never nurture this ability, and it remains undeveloped.
The earliest use of arbitrary symbols in our species was perhaps to indicate personal identity (Knight 1998; 2000). They replaced non‑symbolic indicators of status such as physical size, and involuntary indexes such as plumage displays. In gibbons, territorial calls also have features which can indicate sex, rank and (un)mated condition (Cowlishaw 1992; Raemaekers et al. 1984).
The duetting long call behaviour of chimpanzees and bonobos, where one animal matches its call to that of another, indicates some transferability of the calls between individuals, and an element of learning. But such duetting is probably 'parrot‑like', in that the imitating animal is not attempting to convey the 'meaning' (e.g. rank, identity) of the imitated call. The duetting behaviour is not evidence of transfer of symbolic behaviour from one individual to another. Probably the duetting behaviour itself has some social/pragmatic significance, perhaps similar to grooming.
In humans the ability to trade conversationally in symbols comes naturally. Even humans have some difficulty when the symbol clashes with its meaning, for example if the word red is printed in green. Humans can overcome such difficulties and make a response to the symbol take precedence over the response to the thing. But chimpanzees apparently cannot suppress an instinctive response to concrete stimuli in favour of response to symbols. With few exceptions, even trained apes only indulge in symbolic behaviour to satisfy immediate desires. The circumstances of wild chimpanzee life have not led to the evolution of a species of animal with a high readiness or willingness (as with humans) to use symbols, even though the rudiments of symbolic ability are present.
All these pre‑adaptations illustrate cases where some ability crucial to developed human language was present, if to a lesser degree, in our prelinguistic ancestors. Note that the levels of linguistic structure where language interfaces with the outside world, namely phonetics, semantics and pragmatics, were (apart from motor control of speech) in all likelihood relatively closer to modern human abilities than the 'core' levels of linguistic structure, namely phonology and morpho-syntax. The elaborated phonology and syntax so characteristic of full human language came late to the scene. In modern humans, syntactic and phonological organization of utterances, though learned, is largely automatic, not under conscious control. In a sense then, language evolved 'from the outside in'; the story is of a widening gap bridged by learnable automatic processes, between a signaller's intentions (meanings) and the signal itself. Near the beginning, there were only simple calls analogous to English Hello, in which an atomic signal is directly mapped onto an atomic social act. Every human utterance is still a speech act of some sort. We now have the possibility of highly sophisticated speech acts, whose interpretation involves decoding of a complex signal into a complex conceptual representation, accompanied by complex calculations to derive the likely intended social force of the utterance. The crucial last biological step towards modern human language capacity was the development of a brain capable of acquiring a much more complex mapping between signals and conceptual representations, giving rise to the possibility of the signals and the conceptual representations themselves growing in complexity. In the first generations after the development of a brain capable of acquiring such a complex mapping, communication was not necessarily much more complex. The actual complex structures that we now find in the communication systems (i.e. languages) of populations endowed with such brains may have taken some time to emerge. The mechanisms by which languages grew in biologically language‑ready populations will be discussed in the next section.
Cultural Evolution of Languages
The Two‑Phase Nature of Language Transmission
I have referred earlier to the 'phenomena of human language'. Modern linguistics focuses equally, if not more, on the noumena of language. A noumenon/phenomenon distinction pervades linguistics from Saussure's langue and parole, through Chomsky's competence and performance to his later I(nternal)‑language and E(xternal)‑language. Chomsky's postulation of competence attributes psychological reality to the language system, held in individual minds. This contrasts with Saussure's characterization of langue as an entity somehow belonging to the language community. The move to individual psychological reality paved the way for an explanatory link between the evolution of language and biological evolution. Modern linguistics, preoccupied with synchronic competence, has yet to realize the potential for explaining both linguistic phenomena and linguistic noumena in terms of a cyclic relationship between the two, spiralling through time.
Spoken utterances and particular speech acts located in space and time are produced by speakers guided by knowledge of grammatical well‑formedness, paraphrase relations, and ambiguity. This knowledge was in turn formed in response to earlier spoken utterances and speech acts, as users acquired their language. Modern linguistics has tended to characterize the overt phenomena of language, the spatio‑temporal events of primary linguistic data (PLD), as 'degenerate', and of little theoretical interest. The burden of maintaining the system of a language, as it is transmitted across generations, has been thrust almost wholly onto the postulated innate cognitive apparatus, which makes sense of the allegedly chaotic data in similar ways in all corners of the globe, resulting in linguistic universals.
Clearly humans are innately equipped with unique mental capacities for acquiring language. Language emerges from an interaction between minds and external events. The proportions of the innate cognitive contribution and the contribution due to empirically available patterns in the stimuli remain to be discovered. Methodologically, it is much harder to study performance data systematically, as this requires copious corpus‑collecting, and it is not a priori obvious what to collect and how to represent it. In transcribing the linguistic data input to a child, it is highly probable that the transcriber imposes decisions informed by his own knowledge, and thus the true raw material which a child processes is not represented. This difficulty contrasts with the study and systematization of adult linguistic intuitions, accomplished from the armchair. But the intractability of the data giving rise to adult linguistic intuitions does not imply that the only proper object of study is linguistic competence. Because language emerges from the interaction of minds and data, linguistics must concern itself with both phases in this life‑cycle.
This view of language as a cyclic interaction across generations between I‑language and E‑language has been taken up by historical linguists. Rather than postulating abstract laws of linguistic change, they (e.g. Andersen 1973; Lightfoot 1999) appeal to principles relating the spoken output of one generation to the acquired knowledge of the next. This is a healthy development. Historical linguistics, however, is concerned with explaining language change as it can be observed (or deduced) from extant data, either ongoing changes or reconstructed changes hypothesized from comparing related languages and dialects. Historical linguistics is not, in general, concerned with accounting for the emergence of modern complex forms of language from earlier simpler forms. As such, historical linguistics typically makes 'uniformitarian' assumptions (see Newmeyer 2002; Deutscher 1999 for discussion of uniformitarianism). By contrast, one task of evolutionary linguistics is to work out how modern complex linguistic systems could have arisen from simpler origins, using the cyclic interaction between spatio‑temporal data and acquired grammars as its central explanatory device. This task has been undertaken from two quite different directions, by theorists of grammaticalization and computer modellers working with the 'iterated learning model' (ILM). I discuss them briefly below.
At the heart of the grammaticalization theory is the idea that syntactic organization, and the overt markers associated with it, emerges from nonsyntactic, principally lexical and discourse, organization. The mechanism of this emergence is the spiralling interaction of the two phases of a language's existence, I‑language and E‑language. Through frequent use of a particular word, that word acquires a specialized grammatical role that it did not have before. And in some cases this new function of the word is the first instance of this function being fulfilled at all, in the language concerned. Clear examples are seen in the emergence of Tok Pisin, the Papua New Guinea creole. In Tok Pisin, ‑fela (or ‑pela) is a suffix indicating adjectival function, as in niupela 'new', retpela 'red', gutpela 'good'. This form is clearly derived from the English noun fellow, a noun not originally identified with any particular grammatical function, other than those associated with all nouns. Grammaticalization occurs in the histories of all languages, not just in the creolization process.
Grammaticalization theory has largely been pursued by scholars concerned with relatively recent changes in languages (Traugott and Heine 1991; Hopper and Traugott 1993; Traugott 1994; Pagliuca 1994). In keeping with a general reluctance to speculate about the remote past, most grammaticalization theorists have not theorized about the very earliest languages and the paths from them to modern languages. Nevertheless, a recurrent central theme in grammaticalization studies is unidirectionality. The general trend of grammaticalization processes is all in one direction. Occasionally there may be changes in the opposite direction, but these are infrequent, and amply outnumbered by changes in the typical direction. It follows that the general nature of languages must have also changed over time, as languages accumulated more and more grammaticalized forms. Heine is one of the few grammaticalization theorists who has speculated about what this implies for the likely shape of the earliest languages.
... on the basis of findings in grammaticalization studies, we have argued that languages in the historically non‑reconstructible past may have been different in a systematic way from present‑day languages. We have proposed particular sequences of the evolution of grammatical structures which enable us to reconstruct earlier stages of human language(s).... such evolutions lead in a principled way from concrete lexical items to abstract morphosyntactic forms. [This] suggests, on the one hand, that grammatical forms such as case inflections or agreement and voice markers did not fall from heaven; rather they can be shown to be the result of gradual evolutions. Much more importantly, [this] also suggests that at the earliest conceivable stage, human language(s) might have lacked grammatical forms such as case inflections, agreement, voice markers, etc. so that there might have existed only two types of linguistic entities: one denoting thing‑like time stable entities (i.e. nouns), and another one for non‑time stable concepts such as events (i.e. verbs). (Heine and Kuteva 2002: 394)
To stimulate discussion, I will be at least as bold as Heine, and offer the following suggestions about what earlier stages of human languages were like, based on the unidirectionality of grammaticalization processes. The origin of all grammatical morphemes (function words, inflections) is in lexical stems. This leads one to hypothesize that the earliest languages had: no articles (modern articles typically originate in demonstratives, or the number one); no auxiliaries (these derive from verbs); no complementizers (which may originate from verbs); no subordinating conjunctions (also likely to derive from verbs); no prepositions (deriving from nouns); no agreement markers (deriving from pronouns); no gender markers (deriving from noun classifiers, which in their turn derived from nouns); no numerals (from adjectives and nouns); no adjectives (from verbs and nouns).
In addition, I speculate that the earliest languages had: no proper names (but merely definite descriptions); no illocution markers (such as please); no subordinate clauses, or hypotaxis; no derivational morphology; less differentiation of syntactic classes (perhaps not even noun and verb); and less differentiation of subject and topic. All this is characteristic of (unstable) pidgins and reminiscent of Bickerton's construct 'protolanguage', a crude pidgin‑like form of communication with no function words or grammatical morphemes. Still in the syntactic domain, Newmeyer (2000) has theorized that all the earliest languages were SOV (once they had the noun/verb distinction).
In keeping with ideas from grammaticalization theory about meaning, the earliest languages would have had, in their semantics: no metaphor; no polysemy; no abstract nouns; fewer subjective meanings (e.g. epistemic modals); less lexical differentiation (e.g. hand/arm, saunter/stroll/amble); fewer hyponyms and superordinate terms.
One can apply similar ideas in phonology. Probably the earliest languages had simple vowel systems and only CV syllable structure. See the next subsection for mention of computer modelling of the emergence of phonological structure, via the cyclic two‑phase mechanism of language transmission.
Computer Modelling of Language Evolution
Grammaticalization theorists work backward from modern languages, via known processes of linguistic change, toward earlier, simpler stages of language. By contrast, computer modellers of emerging language start from simulated populations with no language at all, and their simulations can lead to interesting results in which the populations have converged on coordinated communicative codes which, though still extremely simple, share noteworthy characteristics with human language. Some examples of such work are Batali (1998; 2002), Kirby (2000; 2002), Hurford (2000), Teal and Taylor (1999), and Tonkes and Wiles (2002). A survey of some of these works, analysing their principal dimensions, and the issues they raise, appears in Hurford (2002). Hurford refers to this class of computer models as 'expression/induction' (E/I) models; Kirby has rechristened this general class 'iterated learning models' (ILMs), a term which seems likely to gain currency. There is a noticeable trend in recent computer simulations of language evolution away from modelling of the biological evolution of features of the language acquisition device (e.g. Hurford 1989; 1991; Batali 1994). More recent simulations (such as those cited earlier in this paragraph) typically model the evolution of languages, via iterated learning. Such studies, moreover, do not typically attempt to 'put everything together' and reach a full language‑like outcome; rather they explore the interactions between pairs of strictly isolated factors relevant to the iterated learning model (e.g. Brighton and Kirby 2001).
Language has not always existed. Hence there is a puzzle concerning what behaviour the first speakers of a language used as a model in their learning. Computer modelling studies have addressed this problem, using simulations in which individuals have a limited capacity for random invention of linguistic forms corresponding to given (pre‑existing) meanings. Massive advances in computing power make it possible to simulate the complex interactive dynamics of language learning by children and their subsequent language behaviour as adults, which in turn becomes the model for learning by the next generation of children. It is now possible not only to simulate the learning of a somewhat complex communication system by a single individual, on the basis of a corpus of presented examples of meaning‑form pairs, but to embed such individual learning processes in a population of several hundred individuals (each of whose learning is also simulated) and to simulate the repetition of this population‑wide process over many historical generations.
The cited research has implemented such simulations with some success in evolving syntactic systems which resemble natural language grammars in basic respects. This research can be seen as a step up from the preceding paradigm of generative grammar. In early generative grammar, the researcher's task was to postulate systems of rules generating all and only the grammatical sentences of the language under investigation. Early generative grammars were somewhat rigorously specified, and it was possible in some cases to check the accuracy of the predictions of the grammar. But, whether rigorously specified or not, the grammars were always postulated. How the grammars themselves came to exist was not explained, except by the quite vague claim that they were consistent with the current theory of the innate Language Acquisition Device. The recent simulation studies, while still in their infancy, can legitimately claim to embody rigorous claims about the precise psychological and social conditions in which grammars themselves evolve.
This strand of computational simulation research has the potential to clarify the essentials of the interaction between (a) the psychological capacities of language learners and (b) the historical dynamics of populations of learners giving rise to complex grammars resembling the grammars of real natural languages. In such simulations, a population of agents begins with no shared system of communication. The agents are 'innately' endowed with certain competencies, typically including control of a space of possible meanings, an inventory of possible signals, and a capacity for acquiring grammars of certain specified sorts on exposure to examples of meaning‑signal pairs. The simulations typically proceed with each generation learning from its predecessor, on the basis of observation of its communicative behaviour. At first, there is no coherent communicative behaviour in the simulated population. Over time, a coherent shared syntactic system emerges. The syntactic systems which have been achieved in this research paradigm are all, of course, simpler than real attested languages, but nevertheless possess many of the central traits of natural language syntactic organization, including recursivity, compositionality of meaning, asymmetric distribution of regular and irregular forms according to frequency, grammatical functional elements with no denotational meaning, grammatical markers of passive voice and of reflexivity, and elementary partitioning into phrases.
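The generational loop just described can be made concrete with a deliberately minimal sketch. It is not a reconstruction of any of the cited models: the meaning space, the signal alphabet, the three-character signal length, and the function names (invent, learn, iterate) are all assumptions introduced here for illustration, and the learner is a pure rote learner with random invention, so no syntax can emerge from it. It shows only the skeleton of the cycle: each generation observes a sample of its predecessor's meaning‑signal pairs, stores them, and invents forms for whatever it was not shown.

```python
import random

MEANINGS = list(range(8))   # hypothetical space of possible meanings
ALPHABET = "abcd"           # hypothetical inventory of signal elements


def invent(rng, length=3):
    """Random invention of a signal for a meaning with no known form."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def learn(observations, rng):
    """Rote learner: store the observed pairs, invent forms for the rest."""
    lexicon = dict(observations)
    for meaning in MEANINGS:
        if meaning not in lexicon:
            lexicon[meaning] = invent(rng)
    return lexicon


def iterate(generations, sample_size, seed=0):
    """Run a chain of learners and return the lexicon of every generation."""
    rng = random.Random(seed)
    lexicon = learn([], rng)          # the first generation has no model to copy
    history = [lexicon]
    for _ in range(generations):
        shown = rng.sample(MEANINGS, sample_size)
        observations = [(m, lexicon[m]) for m in shown]
        lexicon = learn(observations, rng)
        history.append(lexicon)
    return history
```

Calling iterate(generations=5, sample_size=4) returns six lexicons, one per generation. In this sketch, forms that pass through the sample are inherited unchanged and the rest are reinvented; a learner able to generalize over recurrent patterns would have to replace learn before anything resembling grammar could arise.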
There has been less computer simulation of the evolution of phonological systems, but what exists is impressive. De Boer (2001) manages to approximate to the distribution of vowel systems in the languages of the world through a model in which individual agents exchange utterances and learn from each other. An early computational study (Lindblom et al. 1984) can be interpreted as modelling the processes by which syllables become organized into structured CV sequences of segments, where the emergent selected consonants and vowels are drawn from economical symmetrical sets, as is typical of actual languages.
Computer simulations, within the iterated learning framework, starkly reveal what Keller (1994) has called 'phenomena of the third kind', and Adam Smith (1786) attributed to an 'Invisible Hand'. Languages are neither natural kinds, like plants and animals, nor artefacts, deliberate creations of humans, like houses and cars. Phenomena of the third kind result from the summed independent actions of individuals, but are not intentionally constructed by any individual. Ant trails and bird flocks are phenomena of the third kind, and so, Keller persuasively argues, are languages. Simulations within the ILM framework strip the interaction between individuals down to a bare minimum from which language‑like systems can be shown to emerge. The key property of these models is that each new generation learns its language from a restricted set of exemplars produced by the preceding generation.
One of the most striking results of this work is this: in a population capable of both rote‑learning and acquisition of rules generalizing over recurrent patterns in form‑meaning mapping, a pressure exists toward an eventual emergent language that expresses meanings compositionally. No calculation of an individual agent's fitness is involved, nor does any consideration of the communicative efficacy of the language play a part. The convergence on 'efficient' languages is essentially a mathematical outcome of the framework, analogous to the hexagonal cells of honeycombs. At least some of the regular compositional patterning we see in languages is the result, not of humans having an inbuilt bias towards learning languages of a certain type, but of the simple fact that languages are passed on from one generation to the next via a limited channel, a 'bottleneck'. As Daniel Dennett has remarked (personal communication), this turns the familiar 'poverty of the stimulus' argument in relation to language acquisition on its head. The poverty of stimulus argument appealed to an intuition that human languages are learned from surprisingly scanty data. Work in the iterated learning framework shows that in order for regular compositional language to emerge, a bottleneck between the adult providers of exemplary data and the child learner is necessary. Interesting experiments show that in these models, overprovision of data (i.e. practically no bottleneck) results in no convergence on a regular compositional language.
These two strands of research, grammaticalization studies and computer modelling within the ILM, are at present quite distinct, and followed by non‑overlapping research communities. Computer modellers typically come from backgrounds in artificial intelligence, and know little Latin and less Greek (to put it kindly); grammaticalization theorists come predominantly from humanities backgrounds, and have difficulty conceptualizing computer models. These two research strands will ultimately converge. When they do converge, they should also converge on the attested facts of historical change and creolization.
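The role of the bottleneck can be explored in a toy sketch along the following lines. It is an illustration under strong assumptions, not a reproduction of any cited experiment: meanings are pairs of components, signals are two characters long, and the learner's only generalization strategy is hard‑wired (the first character is associated with the first meaning component, the second character with the second). The names learn, compositionality, and run, and all parameter values, are hypothetical. The learner rote‑learns every pair it is shown and generalizes only to meanings it was not shown.

```python
import random
from collections import Counter
from itertools import product

A_VALUES = range(3)     # hypothetical first meaning component
B_VALUES = range(3)     # hypothetical second meaning component
MEANINGS = list(product(A_VALUES, B_VALUES))
ALPHABET = "abcdefgh"   # hypothetical signal elements


def most_common_char(chars, rng):
    """Return the most frequent character, or a random one if none was seen."""
    if not chars:
        return rng.choice(ALPHABET)
    return Counter(chars).most_common(1)[0][0]


def learn(observations, rng):
    """Rote-learn the observed pairs; generalize to the unseen meanings."""
    language = dict(observations)
    for a, b in MEANINGS:
        if (a, b) in language:
            continue
        first = [s[0] for (x, _), s in observations if x == a]
        second = [s[1] for (_, y), s in observations if y == b]
        language[(a, b)] = most_common_char(first, rng) + most_common_char(second, rng)
    return language


def compositionality(language):
    """Share of meanings whose signal matches the majority character per component."""
    hits = 0
    for (a, b), signal in language.items():
        first = Counter(t[0] for (x, _), t in language.items() if x == a)
        second = Counter(t[1] for (_, y), t in language.items() if y == b)
        if signal == first.most_common(1)[0][0] + second.most_common(1)[0][0]:
            hits += 1
    return hits / len(language)


def run(generations, bottleneck, seed=0):
    """Transmit a random holistic language through a bottleneck of the given size."""
    rng = random.Random(seed)
    language = {m: rng.choice(ALPHABET) + rng.choice(ALPHABET) for m in MEANINGS}
    initial = dict(language)
    for _ in range(generations):
        shown = rng.sample(MEANINGS, bottleneck)
        language = learn([(m, language[m]) for m in shown], rng)
    return initial, language
```

When bottleneck equals len(MEANINGS), every pair is shown and rote‑learned, so the initial random language is handed on unchanged however many generations are run; this is the overprovision case. With a smaller bottleneck, unseen meanings are re‑expressed through the learner's generalizations in each generation, and compositionality(final) can be compared with compositionality(initial) across seeds to see how the language drifts.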
In the last two decades, new techniques, such as gene sequencing, massive computer simulation, and the various brain imaging methods, have flashed light on intriguing features scarcely contemplated before. But these flashlights are highly selective in their illumination, each gathering reflections from only a few dimensions of the hugely multidimensional space of language structure and use. Language evolution research must continue to feed, voraciously and eclectically, on the results from a very wide range of disciplines. The study of language origins and evolution is harder than molecular biology, physical anthropology, or language acquisition research, for example, because at various levels it draws on all of these, and more. We now understand far more about questions of language origins and evolution than has ever been understood before. But precisely because we can now begin to grasp the nature of the questions better, we also know that good answers are even more elusive than we thought.
As background readings relevant to many of the issues raised under the heading of 'pre‑adaptation' in this chapter, I suggest the chapters in the three edited collections, Hurford et al. (1998), Knight (2000), and Wray (2002). For work on grammaticalization and related theoretical positions, I suggest de Boer (2001), Hopper and Traugott (1993), Pagliuca (1994), and Traugott and Heine (1991). For work on computational simulations of language evolution, I suggest Briscoe (2002) and Parisi and Cangelosi (2001).