Robin Dunbar
The Human Story
A New History of Mankind's Evolution
Faber and Faber, 2004

pg 121

When did Speech Evolve?

If chimpanzees do not have the capacity for language and we do, when did language evolve? There are two approaches we can use to get at this question, though neither is entirely satisfactory on its own. The most obvious is to ask whether there are any anatomical correlates of language (or speech) that we might be able to identify in the fossil record. As it happens, there are, although they are somewhat indirect. The second is to exploit the relationships we found between neocortex size, group size and grooming time to ask when hominid group sizes would have become too large to sustain by grooming alone: that should be the point at which language had to evolve.

The first approach examines some of the neural correlates of speech. One of these is the size of the hole in the base of the skull through which the nerve to the tongue passes. The size of this hole (the hypoglossal canal) reflects the size of the nerve, and the size of the nerve reflects the amount of work it has to do. Speech depends on precise articulation, and this in turn depends on fine motor control of the tongue, jaw and lips to create exactly the right articulatory space in the mouth to produce particular sounds. Humans have a significantly larger hypoglossal canal than any of the African great apes (chimpanzees and gorillas). More important, all the fossil hominids after the appearance of archaic humans (the first members of our species Homo sapiens, who appeared about 500,000 years ago) have hypoglossal canals similar in size to those of modern humans - and this includes both the Neanderthals and the Cro-Magnons (our own immediate ancestors in Europe). In contrast, all the australopithecine skulls in which this feature could be measured have ape-sized holes. The real problem is that there is a dearth of suitable skulls from which to measure the canal between these two phases of our evolutionary history, so it is rather difficult to put an exact date on the transition other than to say that it occurred some time between two million and 300,000 years ago.

A second study, carried out by Ann McLarnon at the Roehampton Institute, focused on breathing control. Modern humans, but not living monkeys or apes, have a dramatic enlargement of the vertebral canal in the region of the thoracic vertebrae in the upper chest. The nerves from this region control the chest muscles and the diaphragm, and are thus important in the fine control of breathing that is necessary to produce speech. Speaking requires us to release a steady, slow exhalation of air over a much longer period than simple breathing does. None of our primate cousins can do this, and they lack the enlarged thoracic nerves that would be required to control it. Examination of the thoracic vertebrae of fossil hominids suggests that this conspicuous enlargement of the vertebral canal does not appear until about the same time as the enlargement of the hypoglossal canal. Older specimens, including both australopithecines and Homo erectus, all have thoracic vertebral canals that are, relatively speaking, no larger than those of living monkeys and apes. But Neanderthals and early modern humans from around 80,000 years ago all have canals that are indistinguishable in size from those of modern humans. Once again, however, we are left uncertain about the exact date because there are no fossil vertebrae from the intervening period. One thing we can conclude, however: since Neanderthals and early modern humans both had enlarged thoracic vertebral canals, the most parsimonious explanation is that they inherited this trait from their most recent common ancestor, archaic Homo sapiens, who first appeared around 500,000 years ago.

Taken together, these analyses bracket the date at which speech evolved. The size of the thoracic nerve canal places the earliest possible date at some time after 1.6 million years ago (the date of the last fossil in the sequence with an ape-like thoracic canal). Given that both Neanderthals and Cro-Magnons have modern-sized hypoglossal and thoracic nerve canals, the simplest explanation is that they inherited these traits from their common ancestor, archaic Homo sapiens. Hence, the latest possible date must be the appearance of that common ancestor, around half a million years ago.

An alternative approach to this problem is to see what we can learn from two relationships: the one explored in Chapter 3 between neocortex size and group size, and the fact that the amount of time Old World monkeys and apes spend grooming is a function of social-group size. I discussed this at considerable length in my book Grooming, Gossip and the Evolution of Language. The essence of my argument is that if we take the relationship between group size and neocortex size in primates and apply it to the fossil specimens, we can predict how group size changed over time across all the fossil hominids. Then, using these group sizes, we can exploit the relationship between group size and grooming time in Old World monkeys and apes to predict how much time each of these fossil populations would have had to spend grooming if it were to bond its groups in the conventional primate way.

What these analyses tell us is that required grooming time remains well within the limits for living monkeys and apes right throughout the australopithecine period of our evolutionary history. Only with the appearance of Homo erectus does it begin to rise above the figure of 20 per cent of total daytime that marks the upper limit for living non-human primates, and even then the rise is at first very slow. It is not until the appearance of the earliest members of our own species (archaic Homo sapiens) 500,000 years ago that the demand for grooming time really takes off, and only at this point does it seriously exceed the limits we find in other monkeys and apes. The fact that this coincides rather nicely with the conclusions we drew from the anatomical evidence for speech reinforces the suggestion that language is a uniquely human trait.
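The two-step calculation described above can be sketched in a few lines of Python. This is an illustrative sketch only: the regression coefficients, the neocortex-ratio input and all function names are assumptions, not figures given in this chapter. The coefficients follow the forms usually quoted from Dunbar's primate analyses (log10 N = 0.093 + 3.389 log10 CR for group size N and neocortex ratio CR, and G = -0.772 + 0.287 N for grooming time G as a percentage of the day); the 20 per cent ceiling is the one cited in the text.

```python
import math

# Assumed regression coefficients (not stated in this chapter); treat them as
# illustrative values in the form usually quoted from Dunbar's primate analyses.
GROUP_INTERCEPT = 0.093
GROUP_SLOPE = 3.389
GROOM_INTERCEPT = -0.772
GROOM_SLOPE = 0.287
PRIMATE_GROOMING_LIMIT = 20.0  # % of daytime: upper limit for living non-human primates


def predicted_group_size(neocortex_ratio: float) -> float:
    """Step 1 (hypothetical helper): mean group size from neocortex ratio, log-log regression."""
    return 10 ** (GROUP_INTERCEPT + GROUP_SLOPE * math.log10(neocortex_ratio))


def required_grooming_time(group_size: float) -> float:
    """Step 2 (hypothetical helper): % of daytime needed to bond a group of this size by grooming."""
    return GROOM_INTERCEPT + GROOM_SLOPE * group_size


def exceeds_primate_limit(neocortex_ratio: float) -> bool:
    """True if the predicted grooming demand is above the ~20% primate ceiling."""
    demand = required_grooming_time(predicted_group_size(neocortex_ratio))
    return demand > PRIMATE_GROOMING_LIMIT


if __name__ == "__main__":
    # Assumed input: 4.1 is the neocortex ratio usually quoted for modern humans.
    n = predicted_group_size(4.1)
    g = required_grooming_time(n)
    print(f"group size ~{n:.0f}, grooming time ~{g:.0f}% of the day")
```

With these assumed values, a neocortex ratio of 4.1 gives a predicted group of roughly 148 and a grooming requirement of roughly 42 per cent of the day, about twice the primate ceiling. Feeding each fossil's estimated ratio through the same two functions is, in outline, how a time series of grooming demand like the one described above could be built up.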

In sum, then, it seems that speech (and hence language) must have been in place by the appearance of Homo sapiens half a million years ago, at least in some form. Whether this would have been language as we know it today is a moot point. A plausible interpretation of the evidence suggests that speech/language did not evolve suddenly out of nowhere (as many linguists have assumed) but rather developed piecemeal to fill the bonding gap left by grooming once group size exceeded the size that could be bonded in the conventional primate manner. This raises the possibility that language actually went through a vocal phase that was not linguistic - in short, one that was musical rather than verbal.

Laughter: the Best Medicine?

Language has clearly been hugely successful in allowing us to get where we are. But at the same time, there is something missing from the story I have sketched out here, and it has to do with the way grooming creates a bond between two monkeys. Being groomed seems to have an extraordinarily relaxing effect on our non-human cousins. During grooming, the heart rate slows and the animal visibly relaxes. Indeed, if it is groomed for long enough, the animal may actually drop off to sleep. Grooming seems to have this soporific effect because it is remarkably good at stimulating the brain to release endorphins, the brain's own painkillers. Endorphins belong to the family of chemicals known as opioids: they have a chemical structure very similar to that of the more conventional opiates like opium and morphine, which explains why we become addicted to the latter so easily.

Experimental studies of monkeys have confirmed that grooming triggers a release of endorphins. Moreover, animals given artificial opiates lose interest in grooming; and when they are given opiate-blockers (chemicals like naloxone that lock onto the opioid receptor sites in the brain and prevent the body's natural opioids from producing their analgesic effect), they become increasingly restless and seek out grooming. Whatever else it does, grooming produces a sense of relaxation and wellbeing in its recipients, and this effect seems to be the proximate mechanism that enables grooming to act as a bonding agent. We do not really understand how this works, but it is very clear that grooming acts as the immediate reinforcer that allows partners to feel good in each other's company. In some way, this sense of wellbeing is translated into a willingness to support each other in conflicts. We seem to act in much the same way: we are more willing to support or help out those whose company we enjoy.

This raises a puzzle. What in human interaction provides the chemical kick that does the same work, allowing language to act as a bonding agent? Speech itself lacks the direct physical contact needed to stimulate the opioid system in the way that grooming or massage does. Of course, we do resort to grooming - or at least what amounts to grooming - in our more intimate relationships. But that kind of mutual mauling is conspicuously restricted to those relationships - in fact, to precisely the circumstances in which we abandon language altogether. This equivalent of grooming (petting, stroking, fondling) is something we do only with our most intimate associates: it is pretty much confined to mates, parents and children, less often to grandparents and one's very best friends, much less often still to more distant relatives like aunts, uncles, nieces, nephews and cousins, and almost never to anyone else (except other people's babies). Such attention directed at one's doctor, teacher, pupil or work colleague, or worse still a stranger in the street, is likely to raise eyebrows at best and produce a deep sense of outrage in the recipient at worst - and, these days, probably a court summons for harassment.

This brings us to an interesting question: why do we find intimate contact between strangers so disconcerting? My guess is that it has to do precisely with the fact that we find close physical contact deeply arousing and that, for us humans as for the bonobos, there is only a very fine line between physical contact of this kind and full-blown sex: the one spills all too easily over into the other because physical contact in a relaxed situation is emotionally arousing. So how do we create bonds of intimacy with those with whom we do not wish to have sex at the drop of a hat? The answer, I am going to suggest, is that we make them laugh.

Laughter, when you think about it, is a remarkably odd behaviour. Chimpanzees have something vaguely like it, and this is indeed thought to be the origin of the human laugh. But it is pretty much restricted to play situations. When inviting or engaged in play, chimpanzees often give a series of quiet, rapid pants with an open-mouthed expression (termed the 'relaxed open mouth' or ROM face). But this behaviour resembles only the most basic forms of human laughter: more the kind of genteel laughter that tinkles among the teacups in polite society, or the kind that young children use when inviting play. Humans also engage in much more forceful kinds of laughter (as in 'belly laughs'), and they do so in a far greater range of circumstances than chimpanzees (or, for that matter, young children). No other animal laughs (at least, not in the extensive way that we do).

So what is going on here?

Well, think about what happens when you laugh, and particularly when you laugh heartily, letting go of all your inhibitions and having a good old roar. You come out of it feeling . . . well, a little light-headed, certainly much more relaxed, and generally rather at peace with the world. Sound familiar? Of course it does: it is the endorphin story all over again. Laughter seems to be a good releaser of endorphins, and there is some indirect experimental evidence to support this. The evidence is indirect because it is difficult to measure endorphin production directly (it requires the rather unpleasant procedure known as a lumbar puncture, in which a rather large needle is pushed into the space between two adjacent vertebrae). Most studies have therefore used pain tolerance as a more easily assayed measure. The logic is that, if endorphins are part of the pain-control system, then the more endorphins are released, the more pain you will be able to put up with.

My students Julie Stow and Giselle Partridge carried out two separate experiments to test this idea. In these experiments, we asked subjects to see how long they could keep a frozen wine-cooler sleeve on their arm. They were then shown a video clip from either a documentary or a comedy programme, after which they were asked to try the wine cooler again. Subjects who were shown a boring documentary could stand the pain no longer afterwards than they had managed before. But those who watched a comedy were able to stand the frozen wine cooler for significantly longer than they had previously done. Moreover, the increase in tolerance was related to how much they had laughed during the video: those who laughed more became more tolerant of the pain than those who laughed less.

Perhaps this explains another odd feature of our conversational behaviour: the fact that we seem to spend a great deal of our time trying to make each other laugh. It seems rather odd that we have a mechanism (language) specifically designed to allow us to exchange information, yet we seldom seem to use it for such a solemn purpose. Indeed, in all but the most exceptional circumstances, we find it rather boring if those we engage in conversation insist on producing an unending stream of worthily stolid information. 'About that new red stop sign I noticed down at the crossroads . . .' 'Uh-huh? . . . Now, which way did you say the bar was?' But start talking to someone who feeds you one-liners or who peppers their conversation with witticisms and, all of a sudden, the bar seems to lose its magical appeal.

And this is exactly what we found in a study of conversations carried out by Feroud Seepersand. He listened in on natural conversations in bars and cafes, noting the topic being discussed every 30 seconds while also recording when the individuals laughed. His results showed that a pair continued talking about the same topic for significantly longer after one of them had laughed than if neither had laughed. Like grooming, it seems that laughter encourages you to stay put and continue the interaction with a particular partner. It floods the brain with endorphins and simply makes you feel positively disposed towards the other person.

In fact, some recent evidence puts all this in an even more interesting light. A study of patients with damage to different parts of their brains has revealed that a particular area in the right frontal lobe is crucial for the appreciation of humour. You can be missing almost any other bit of the brain (including bits on the left side) and yet still appreciate humour. More extraordinary still, this includes not just cartoons and other 'visual' humour but also verbal humour - the very thing we might suppose would be handled by the language areas in the left hemisphere, where our speech centres are located. People missing this same crucial bit of the right hemisphere also show greatly reduced laughter and smiling responses. Interestingly, this part of the right brain also has direct neural links down to the amygdala in the limbic system, the part of the brain especially involved in processing emotions and emotional cues.

Laughter is a ritualised activity that is highly contagious. We rarely laugh when on our own: indeed, those who do so invariably attract adverse comment. As convention has it, only the mad laugh on their own; the rest of us laugh because others laugh, or because social situations are particularly prone to triggering laughter. That's why canned laughter on TV gets us going, whereas without it, sitting alone in our room late at night, laughing is probably the last thing we would bother to do. It is also why, when someone tells a joke in a foreign language, we will laugh heartily with everyone else despite not having understood a word. What attracts my attention here is this chorusing feature of laughter: not so much the laughter itself as the chorusing it seems to involve.

It seems that, at some point during the course of human evolution, we borrowed the chimpanzee playface and its associated vocalisations and exaggerated them to provide the reinforcer for grooming at a distance. Since the brain areas involved in laughter and language seem to be very different - indeed, they are not even in the same hemisphere of the brain - laughter may well have evolved long before language. The fact that laughter is so contagious perhaps suggests that it was used in a kind of communal ritual alongside non-verbal vocalisations like conventional primate contact calls. Later, of course, the acquisition of language allowed us to use verbal constructions to stimulate laughter in others more effectively. Jokes, it seems, have a very ancient heritage, much older in all likelihood than anything else we do with language.

Trip the Light Fantastic

There seems to be something very fundamental (in a literal sense) about music and song. We rise to it emotionally in a way we seldom rise to mere words. Composers since time immemorial have recognised that they can stir our emotions by the way they order the sequences of toned sounds, producing now a sense of joy, now despair, or exciting us with the rhythms that set the feet a-tapping. There have inevitably been arguments as to whether this emotional manipulation is culture-specific or not. Do upturns in tone make all humans joyful and downturns make us all despair? Do major keys raise our hopes and minor keys dash them? Does fast music make us feel excited and roused, and slow music make us feel languorous? Or are these simply associations we have learned from the last dozen or so centuries of western music?

I am less interested in the answer to this question than in the fact that composers can manipulate our emotions at all, irrespective of the origins of the particular code they may use to do it. It seems to be a remarkable universal of human nature that we respond emotionally to music in this way, and that we are especially prone to do so when we experience it in groups. Communal singing, as almost all religions have long recognised, seems to have a particularly strong emotional hold on us.

Why should music do this to us and what role has it played in the story of human evolution?

The answer to the first question is still shrouded in mystery. But it seems that some musical tones do trigger deep responses somewhere in the brain. Aside from the more obvious activity to be expected in the auditory cortex, where all sounds are processed, the main responses are in the right hemisphere and in regions of the evolutionarily more ancient limbic system. Since that is the opposite side of the brain from where language has its main centres (the left hemisphere), it seems plausible to infer that music and language have had separate evolutionary histories. Indeed, the deeply emotional stirrings generated by music suggest to me that music has very ancient origins, long predating the evolution of language, and this perhaps gives us a clue as to how we might answer the second question, about music's role in our evolutionary history.

The answer, I think, lies in the fact that something similar to non-human primate contact calling must have bridged the gap between the first rise in group size above the conventional non-human primate limit (about 60-70 individuals) and the rise of true language (once group size had exceeded around 120). Given what we know about primate contact calls and their use in choruses, and about music, it seems increasingly likely to me that it was singing that filled this gap.

Singing is a form of vocal activity that lends itself to multitasking and the double use of time. We still do it. From the unique women's waulking songs of the Outer Hebrides to seamen's shanties, and from the marching songs of armies to football fans singing on soccer terraces, singing rouses emotions and binds members of the group while they are engaged in some other activity that prevents more intimate forms of contact. Of course, singing also helps while away the time and makes a hard task more bearable. But ask yourself: how does it achieve this? It is surely not just by keeping the mind busy while the hands haul on the ropes! My guess is that communal singing triggers the release of endorphins, and it is these that make the work seem lighter.

That endorphins might in fact be involved has been suggested for some time. In one experiment, subjects listened to tapes of music and indicated when they felt a thrill of excitement at a particular passage. The pattern of thrills was quite consistent from one day to the next for any individual subject, even though, as we might expect, people varied enormously in which particular passages triggered these thrill episodes. However, if subjects were given an injection of naloxone (the same opiate-blocker that suppresses monkeys' satisfaction responses to grooming) between successive auditions, they failed to show such marked thrills on the following audition as they had on the previous control exposure. Those given an injection containing nothing but saline showed no difference between successive auditions. This is strong circumstantial evidence that endorphins are involved.

Why and how singing has this effect is as yet a complete mystery; very little work has been done in this area to date. Nonetheless, the prima facie case for the endorphin hypothesis is very strong. And it feels right. Of particular significance here is the fact that we can induce these emotional effects by music alone, without the need for any words. Wordless songs and the pure tonalities of musical instruments produce the same effects as the most rousing lyrics. The Gregorian plainsong of the Catholic monastic tradition provides a particularly obvious example. It is the sounds of harmonious chant that we find so compelling, not the words - especially since most of it is in Latin, which few listeners understand. In fact, so important is the sound and so unimportant the words themselves that during the early polyphonic period of European music (around the twelfth to thirteenth centuries) composers often paid little attention to how they used the lyrics provided by poems or biblical texts. It was by no means uncommon for the lyric to end mid-sentence - sometimes even halfway through a word! - if that suited the music better.

These observations allow us to make sense of the kinds of phenomena we see in contexts like Pentecostal church services. Here, the musicians, choir and minister create an increasingly intense and rousing musical torrent that gradually draws the congregation, one by one, into the flow of excitement until everyone is waving their arms, jigging their bodies, and bursting into 'Amens!' and 'Hallelujahs!' at appropriate moments. Some even appear to get carried away into trance-like states. So compelling is the music that it is difficult even for sceptical non-believers to resist joining in - just as it is difficult to sit still while listening to an Irish jug band playing reels and jigs in a pub.

My guess is that, very early on, song became wrapped up with dance. We seem to respond with especial enthusiasm to the rhythmicity of dance, and dance is of course widely used in the rituals of both traditional societies (think of the trance dances of the !Kung San bushmen) and advanced religions (think of the way the priests danced before the Ark of the Covenant in the Judaism of King David's time, and still do, more than two and a half millennia later, in the swaying dance of the dabtara or deacons in the Ethiopian Coptic Church). Indeed, dance has been exploited very specifically to induce states of euphoria and trance among the 'whirling dervish' sect of Sufi Islam: in this case, the dancers spin round in unison, an effect accentuated by the long white over-garments they wear, until they collapse into a trance-like state.

Are these trance states some kind of self-induced opioid high? Is this why we so enjoy dancing, a phenomenon that probably ranks, along with smiling and laughter, as one of the most futile of all human universals? Were dance and singing, and perhaps the rhythmic clapping of hands that so often accompanies both of these, an early supplement to physical grooming that allowed Homo erectus to enlarge its groups beyond the limit imposed by the immediate time constraints on grooming?

Music-making using instruments was presumably a much later invention, one that occurred long after the rise of singing and perhaps even tens of thousands of years after the appearance of speech and true language. The vast array of musical instruments with which we are now familiar is, of course, of very recent origin. Stringed and brass instruments, as well as drums, do not appear in the archaeological record much before a few thousand years BC. However, simple wind instruments of the flute or recorder type have a much more ancient history. One beautiful example carved from deer bone was found in the accumulated debris of a Cro-Magnon occupation site in a cave floor in France dating from some 30,000-40,000 years ago. As it survives, this instrument has four holes on the front and two on the reverse, and was clearly designed to play a pentatonic scale (which it still does quite admirably). Another flute, made out of cave bear bone, was found at a Neanderthal site in modern Slovenia dated to 30,000 years ago. Reconstructions made from original materials (real cave bear bones) play well, and a competent flautist can coax out of them almost the full range of sounds that can be produced on a modern recorder. Making these instruments with contemporary tools is a laborious business, so our prehistoric ancestors must have considered the effort especially worthwhile.

We clearly differ from our ape and monkey cousins in our use of language. However, many of the core features of language, and the associated non-verbal components that make conversation possible, bear important similarities to the kinds of social communication we find in other primates. That we use language to exchange complex technical information is undoubtedly important, but it seems likely that this was a relatively recent development. Speech and language evolved to enable us to bond social groups that were getting too large to bond by conventional primate social grooming, and we still seem to use them mainly for this purpose. Moreover, to enable language to do this job effectively, we have to draw heavily on some non-verbal features (laughter and music) that take us straight back into the chemical processes that underpin grooming. With laughter and music, we are at last beginning to find elements which, if not uniquely human, do at least find expression among humans with a frequency and intensity that are perhaps unique.

Language and music raise for us another important feature of human nature, namely the whole complex business of culture. If culture can be said to be the hallmark of humanity, then language might be said to be its handmaiden. But what is this thing called culture? And are we the only species that can lay claim to it?

pg 159