Friday, January 2, 2009

My views on the NATURE-NURTURE debate – Part 5


Is an anti-nativist stance logical?

I should say at the outset that Chomsky and all the other nativists implicitly acknowledge that they could be wrong. This follows from the fact that they defend nativism from an empirical standpoint, which means to say that they infer their thesis from the ostensible fact that all the evidence seems to point in that direction. To illustrate this further, it is important to consider the distinction between analytic and synthetic statements.

If the nativist’s hypothesis were an analytic statement, then they would not even need to provide evidence for their case; we would be forced to accept it based simply on logic. One cannot call one’s thesis an empirical one, and still insist that it is the only logical one. In fact, this is what distinguishes an analytic statement from a synthetic one.

An analytic statement is a statement that merely paraphrases the subject of the sentence; in other words, there is no additional information that is provided by the predicate that was not to be found in the subject. Real-world knowledge plays a minimal role in this regard, since all one needs to establish the veracity of the statement is to have just a rudimentary understanding of the terms in the statement.

Consider an analytic statement like Bachelors are unmarried males. What makes this an analytic statement is:

1) The fact that there is no additional information about the world; the predicate says nothing more than what is in the subject, and
2) If we were to negate this sentence, it would lead to a logical contradiction; in other words, we cannot imagine a possible world where the negation of this sentence can be true without entailing a logical contradiction. It is meaningless to say Bachelors are married males or Bachelors are unmarried females.

A synthetic statement, on the other hand, tells us something about the subject of the sentence which is over and above a mere definition/paraphrase. We have to also know something about the state of affairs in the real world in order to ascertain the veracity of a particular statement. For example, with the statement There is a bag of potatoes in the pantry, one of the truth-conditions that would make this sentence true is if there is actually a bag of potatoes in the relevant pantry. To take one more example: Freddie Mercury is a singer. For this sentence to be true, there would have to be a person by the name of Freddie Mercury, and he would have to be a singer; if those conditions are met, then this sentence would be true.

What makes these sentences different from analytic sentences is the mere fact that we can imagine a possible world where the negation is true, without there being any logical contradiction. So even though it does not make sense to imagine a world where the statement Bachelors are married males/unmarried females is true, it would make perfect sense to imagine a world where There is no bag of potatoes in the pantry, or where Freddie Mercury is not a singer are sentences that happen to be true. So when I utter a synthetic sentence, like There is a computer in my room, what I am also saying, ipso facto, is that there is a possibility that there may not be a computer in my room; there is no contradiction in terms in saying There is no computer in my room.

So when the nativists say There is a Language Acquisition Device [LAD] in our minds which enables us to learn language, it is a statement of the non-analytic kind. We know this because they deem it necessary to quote empirical evidence for their stance, and conclude that all the evidence points towards their theory. If this statement were analytic, it would make no more sense to seek evidence for it than to seek evidence that the statement Bachelors are unmarried males is true.

It seems quite obvious that the nativist’s hypothesis is an empirical one, and that a statement like Humans possess an LAD is a non-analytic statement. However, some scholars, like Jerry Fodor, do actually try and argue that their manifesto is analytically based. Even Pinker sometimes confounds the analytic-synthetic distinction when he argues that ‘mentalese’ (the language of thought) cannot not exist, and that if it did not, we could not possibly learn a language, translate between languages, etc. Despite proposing that it is logically untenable to not postulate mentalese, he still dedicates an entire chapter to arguing for its viability. Once again, let me reiterate, statements that are analytically true need neither empirical verification nor empirical argument.

So despite Fodor arguing analytically, and despite Pinker blurring analytic-synthetic arguments, it is otherwise evident that the nativist manifesto is a synthetic one, which means that they acknowledge that there is no contradiction in the negation being true, viz. There is no LAD in our minds that enables language learning to take place.

As Stephen Cowley (in a paper published in 2001) tells us, Pinker, joining the other nativists, merely assumes his “radical, nativist theory is the only alternative to behaviourism”, and “that the work of Vygotsky, Piaget, Stern, Trevarthen, Bruner, Wertsch, Cole and Nelson [and Sampson] – to name a few – lends little understanding to language development.” This blinkered view is obviously unjustified.

When I asked what his thoughts on Educating Eve were, Pinker dismissively told me that he has not looked at “the Sampson book” because he was too busy writing other books! Either Pinker is being deliberately blinkered here, or he is saying that The Language Instinct needed to be updated anyway, but when I asked him if the latter was the case, he said: “That’s not what I’m saying at all!”. It is evident, then, since this is not the case, that criticisms should not be ignored.

The problem with prematurely accepting nativism

Research necessarily cannot be neutral and objective, because it is based on observations; and one’s observations are always theory-laden. When this is acknowledged at the outset, and one’s assumptions are made clear, then meaningful scientific inquiry can be conducted. However, when a theory is accepted prematurely as axiomatic truth, it can lead one to force interpretations onto data that one is not even aware of (as I believe Pinker is guilty of doing) [as in Pinker’s biased representation of the SLI data mentioned above]. Many red herrings and pseudo-problems may present themselves. A pseudo-problem is one that arises solely as a consequence of one’s theoretical assumptions, which may never have even been an issue had one been working in another framework. One such pseudo-problem arises as a result of Chomsky drawing a distinction between competence and performance. It is said that to postulate an innate predisposition for language in the mind is “an unnecessary hypothesis forced upon one by the previous postulation of the concept of competence”, to quote David Crystal. If the notion of competence had not been proposed to begin with, there would be no need to rephrase the question of “How do we acquire language?” as “How do we acquire competence?” The danger is that of “imposing too much grammatical structure on the mind of the child: it is very easy to get into the habit of talking about certain structures as being innately present, the only reason for wanting to do so being that such structures have an important role to play in the analysis of adult grammar.”

For Popper, scientific progress is based on problem-shifts. These problem-shifts can be progressive, or degenerating. A progressive problem-shift occurs when we update our hypotheses in light of new evidence, without tenaciously holding on to our theoretical constructs. A degenerating problem-shift can occur when one maintains a particular thesis, despite evidence to the contrary. I contend that this is the key problem with nativism. The nativists may be correct in their claims, but at the moment there is plenty of evidence, as well as methodological problems, that run counter to what they claim, and this is the central thesis of this dissertation.


Response to the “speed of acquisition” thesis

As scientists, one of the first things we need to isolate is potential falsifiers. If any hypothesis does not have falsifiable status, it cannot possibly have scientific status. With regard to the speed of acquisition thesis, it is not at all clear what state of affairs would contradict this thesis: how slow would acquisition have to be if we did not have the LAD that Chomsky postulates? It is not at all clear that having an elaborate linguistic structure hard-wired somewhere in our brains would yield a different time-frame than the theory of language-learning outlined by Sampson.

Aside from this problem, there is the fact that Chomsky compares two very different things when drawing an analogy between language acquisition and other kinds of learning. A more apt analogy would be to compare meta-linguistic skills to those of the physics student, and to compare our unconscious linguistic knowledge to certain expectations regarding the laws of physics.

To illustrate: any competent L1 speaker of English will agree that John said that he was sick is an ambiguous sentence, and yet the ambiguity disappears when you say He said that John was sick. Very few people will be able to explain why this is so, because their knowledge regarding these things is largely unconscious; the linguist, on the other hand, ought to have no problem in explaining it in terms of anaphor binding. Likewise, most people would “know” that once you burn a match-stick it would not be possible to somehow un-burn it, such that it reverts to being the way it was before it was burnt. Once again, most people would not be able to give an intellectually satisfying explanation as to why this is so. The physicist, on the other hand, ought to have no problem in explaining this phenomenon in terms of the second law of thermodynamics, which states that the entropy of an isolated system cannot decrease over time.

One cannot compare tacit knowledge of language with overt knowledge of physics. A more commensurable analogy would be to compare meta-linguistic knowledge (how to draw phrase-structure trees, how to depict things like question-formation via transformational rules…etc.) with knowledge of physics (how to calculate the gravitational pull between two bodies of mass… etc.), neither of which could possibly be said to be innate. Tacit grammatical knowledge would be more commensurable with certain tacit expectations like object-permanence, time running ‘forward’ not ‘backward’, etc.

Sometimes Chomsky argues that it would take an indefinite period of time to learn language if not for our LADs, and it would therefore not even make sense to talk about language acquisition. I cannot see the force of this argument, since Chomsky seems to be treating what ought to be an obviously empirical hypothesis analytically, which as we have seen is something that simply cannot be done (cf. section 5.1.). Note, this is not the same as having innate dispositions. Chomsky’s LAD is far more specific, and makes a much stronger claim than Popper’s belief that we are innately predisposed to learn certain things. A disposition, on the other hand, is not restricted to language. As mentioned at the outset, the fact that humans are genetically endowed with certain faculties is not at all in dispute – it is a question of how much is innate. An innate disposition is not the same as having an elaborate, abstract, and very specific LAD hard-wired into our minds.

Response to the “stages of acquisition” thesis

Firstly, it is not necessarily a fact that language is acquired in the same stages all over the world. From a Popperian point of view, the sensible thing to do would be to try and find instances where language was not learnt in these stages, where the babbling stage was skipped for some reason, etc., but this has not been done. Of course, from a practical point of view an experiment of this sort would take about three years at least, but if the nativist researcher really cared about the veracity of his claims, it would indeed be a worthy endeavour; after all, many theorists have spent numerous years trying to teach primates to use sign language – unsuccessfully.

Pinker makes the claim that each preceding stage is necessary in order to progress onto the next stage. Rosenbek et al., in a book entitled APHASIA: A CLINICAL APPROACH (published back in 1989), document instances of babies who were forced to skip their “babbling stage” due to problems like severe tonsillitis, yet acquired language without a problem; this shows Pinker’s claim to be contrary to the facts.

Nevertheless, back to the point: if instances that are not in consonance with one’s hypothesis are found, we would have to update our hypothesis; if not, we can tentatively assume that the hypothesis holds true. David Crystal, though without any references or data, claims that the stages of acquisition are not universal, and therefore cannot be part of our biological endowment. Even Fromkin and Rodman, in an introductory textbook for Linguistics freshmen, say that “Observations of children in different areas of the world reveal that the stages are similar, possibly universal […]” (my italics). Why do these evidently pro-nativist authors use the word ‘similar’, not ‘same’, and why are the stages ‘possibly universal’? This seems to be a tacit concession that this strand of their argument is flawed. Victoria Fromkin, by the way, was a student of Chomsky’s, and has always been a sycophantic follower of his – according to Dr. N. Thwala (former head of the School of Literature and Language Studies at the University of the Witwatersrand), who studied with her and knew her personally.

Nonetheless, even if it were true that these stages occur universally, it still would not make a good case for nativism; let us consider a parallel case:

Karate all over the world is learnt in a specific order: when learning to do karate, the beginner starts with basics, then incorporates these basic moves into a pre-set sequence of movements (called a kata), then moves on to basic three-step sparring, then to one-step semi-free sparring, then finally to free-sparring. This is literally true of every style of karate found around the globe.

Now, given that there are so many different varieties of karate all over the world, all started by different karate masters, why would each and every style follow the same steps when learning to do karate? I was quite intrigued to learn that even those who practise in the Kyokushin-kai school of karate learn it in essentially the same way. This is interesting because this style of karate is quite different from the others in the sense that it is a full-contact sport (also called “knock-down karate”), and therefore it is a lot less about an art form, and more about brute strength. Nonetheless, it still conforms to the general schema of karate-learning.

Also, let us take this one step further. Imagine a world where, for some reason, every style of karate was wiped off the planet. If people were to develop a fighting technique analogous to what karate was/would be, it is unlikely that it would be learnt in any order other than the (progressive) order that has always been. Even though this is speculative, the fact still remains that the various styles of karate all over the world conform to the aforementioned schema, and if it mattered to us and we wanted to explain this pattern, how would we do so? How would we explain the fact that this happens, seemingly universally? Would it make sense to say that we would have to be either at a loss or be willing to postulate a massive coincidence, if we are not prepared to accept some sort of innate karate gene?

Of course not, since there are other more sensible reasons for karate having to be taught and therefore learnt in this given order: quite simply, the respective steps are progressive, which means that the first step forms the basis for the next step, and so on. It does not make sense to go on to learning something more advanced when the simpler, more basic steps have not been mastered. Students of karate cannot learn to do flying kicks from their first lesson for obvious reasons. It is for the very same reason that a freshman would not be able to produce a qualitatively adequate Ph.D. thesis, even if he were given adequate time and resources. A two-year-old child cannot speak with the proficiency of an adult because he has not learnt how to – he has not had sufficient interaction with speakers of the language. Is this not a more reasonable explanation than the idea that the child cannot speak proficiently yet because his LAD has not yet instigated his genetic blueprint to move from the initial state to the steady state?

Why, then, is it a surprise that children do not speak fluently from the beginning? Simply because they need to master the basic steps first, and as with everything else, the fact that there are basic steps, which progress in a specific order, should really not be a surprise. One could object to this argument on the grounds that karate is consciously taught, whereas language is unconsciously acquired. Anecdotal evidence shows that children are actually taught to a certain degree. Pinker acknowledges this as part of the explanation as to why hearing children of deaf parents do not acquire language efficiently by watching TV and listening to the radio. They need to interact with real, live speakers, because speech per se will not suffice without the concomitant actions, according to Pinker. (This is also untrue, though, as I have a niece who now speaks fluent Hindi solely from watching Bollywood movies – both her parents speak English to her, and she has never been exposed to Hindi in a real-world context).

Anyway, in various works, like his book EMPIRICAL LINGUISTICS, Sampson presents data showing that language learning can be construed as a graded series of language lessons. If this is true, it has to be acknowledged that language learning is not as passive and ‘unconscious’ as is commonly claimed.

It is therefore evident that since we do not think it viable to postulate a karate gene because karate is learnt in uniform stages across the world, we are, by the same logic, also not obliged to accept the claim that there is some kind of unfolding of a genetic blueprint because language is learnt in the same stages across the world (barring the fact that even that is contentious).

Response to the “critical period” thesis

The critical period thesis can be contended on many grounds. Firstly, let us briefly consider what one would need to do in order to test this hypothesis. There would need to be two individuals, of the same age, and with identical genes; ideally, identical twins would suit us best. Then, one would need to serve as a control ‘group’, and the other would need to serve as the experimental ‘group’. We need to then keep them apart in identical environments for the same amount of time, and manipulate the relevant variable, that of linguistic input. In other words, the control would have a normal upbringing, and the experimental twin’s upbringing would also be normal, aside from the fact that nobody would be speaking to him. Of course we need to decide beforehand when this alleged critical period will end, so if we take it to be the age of ten, then we would, after ten years, put the twins in the same environment, and see if the experimental ‘group’ is able to acquire language competently. How long we give him to do so will depend upon his progress, but obviously with a maximum of ten years. If he is not able to acquire language, we may conclude that it may be because he has passed his critical period for language acquisition; if he does manage to acquire language, it could either mean that the critical period ends later than ten, or that there is no such thing as a “critical period”, in which case more research needs to be done.

The problem with this is blatantly clear: no ethics committee will allow research of this kind to be done. Aside from the fact that it is almost impossible to test this properly, there would still be problems even if we did conduct an experiment of this kind. We could not preclude other factors, like the fact that the experimental ‘group’ may feel somewhat neglected for not being talked to, and may even be rather traumatised because his/her brother/sister was talked to, and can converse so proficiently. This could lead to resentment, lack of motivation, etc.; the possibilities are endless, and an experiment of this kind is therefore both impractical and unethical.

Despite the fact that there are so many problems associated with this, the nativists use the example of Genie as evidence for the existence of a critical period as if the other factors did not even need to be considered. Obviously, any human being going through an experience like this would end up very traumatised, and therefore cognitively impaired in more ways than one. Despite this, Pinker and the other nativists ignore this fact; in fact, Derek Bickerton even says that there is no compelling evidence that Genie suffered any “lasting damage”. If this were true, it would eliminate some of the confounding variables, but most rational people would agree that it is not. As mentioned earlier, Genie grew up under rather traumatising circumstances, and such prolonged abuse would affect her development, cognitive and otherwise, in an adverse manner.

It is a well-known fact that learning generally slows down the older you get, and from a biological (or evolutionary) point of view, it would make sense that this would happen around the age of puberty. Up until puberty, we learn the general life-skills necessary for survival, and at around the age of thirteen, we are ready for autonomy. The human body is ready to bear children, and according to the psychologist Kohlberg, even our moral standards, by which the rest of our lives are governed, are established around this age, etc. It is not the case that these things are unchangeable, it just makes sense to establish certain things as standard so as to have a basis to live your life, reproduce, and teach your offspring to survive. Why should language be any different? I would find it difficult to leave a live human being out in the snow for polar bears to consume; an Eskimo does not – it is common practice to do so with invalids as they compromise the mobility (and therefore the survival) of the tribe. If I were raised as an Eskimo, and understood the reasoning behind it, I would accept this without any qualms.

My point is simply that what one was raised with is what has become part of one’s norms, and unless there is a good reason to change the status quo, it should not be changed. If learning a new language is perceived as pointless, it will obviously not be assimilated easily into one’s schema, and vice versa. Hence, motivation, or lack of it, is a more obvious explanation as to why adults do or do not learn language as efficiently as children do.

It is a fact that the ability to learn in general slows down with age, and that we are particularly predisposed to learn things faster up until puberty. For example, it is generally accepted that if a child starts karate before puberty he learns faster, and progresses more quickly than children who start later on. Some contend that this is because younger children (i.e. pre-puberty) are more supple, and generally more enthusiastic. Now, if this actually were the case, as it seems to be from casual observation, would we want to postulate a “critical period” for learning karate? Of course, the very thought is ludicrous. As Pinker acknowledges, brain metabolism is particularly high during these years. Human life is divided into five stages: infancy, toddlerhood, childhood, adolescence, and adulthood. During the first three stages, humans are generally helpless and in need of care. During adolescence, humans develop autonomy, and could quite easily survive on their own. During this stage, humans also peak in all their faculties: their sexual desire, metabolism, curiosity (intellectual and otherwise), etc., are at an all-time high. From a biological point of view, adolescents are designed to break away from dependence on their parents, find a mate and start a family of their own. Hence, it is neither surprising nor a coincidence that children around this age feel an irresistible urge to rebel against their parents and other authority figures. Aside from the prima facie plausibility of this theory, leading biologists also agree with it.

Now, given that this is the case, it makes sense that we would be predisposed to learning things faster up until this stage of life. If we think back to the days prior to civilisation, where formal schooling, a tertiary education, etc., would not have existed, humans probably did break away from their parents and form their own families at around this age. In fact, some contend that that is the very reason vestiges of initiation rites exist in many of the world’s practices; and older traditions less influenced by the dawn of civilised man still practise initiatory rites around this time for the sole purpose of allowing adolescents sexual passage, and in so doing allowing them to break away from the family and form their own. The details are not important, but the point is simply that human beings as a result would have had to learn all the necessary life-skills in order to survive. This would include being able to identify potential predators, being able to identify members of rival clans, being able to hunt effectively, being able to recognise and work with members of one’s own clan in order to attain a common goal otherwise not attainable, etc. Failure to develop these skills would result in failure of that given clan/family to survive. Hence, natural selection would have to see to it that they are able to pick up these skills quickly, and able to put them to use at the appropriate time. As we have seen, this ‘appropriate time’ is around the age of puberty at the onset of adolescence. It so happens that the acquisition of language is one of these skills, but that is not surprising given the profound survival advantage it gives.

Why would this general predisposition drastically slow down after puberty? Simply because it is energy consuming. It makes sense to not have this kind of predisposition for the rest of one’s life because having such a high metabolic rate would compromise functioning in other arenas, thereby stopping one from carrying out certain tasks effectively and ultimately compromising survival.

This explanation seems more plausible than postulating a critical period unique to language acquisition. Neither Chomsky nor the other nativists give us convincing reasons to believe that this phenomenon occurs in order to facilitate the process of language acquisition per se. Hence, the “critical period” thesis is not tenable.

Response to the “poverty of input” thesis

Chomsky takes the instance of yes-no question formation in an embedded clause to be confirming evidence for his poverty of stimulus thesis, but this is clearly an example of language being structure-dependent, and therefore becomes a question of whether structure-dependency is a matter of innateness or not. This issue will be discussed in more detail later.

As for the other evidence Chomsky cites in support of this particular hypothesis, Sampson gives numerous examples of documented studies which indicate that Chomsky’s claim that children are not corrected on grammatical issues, but matters of ethics, is quite simply false. I clearly remember as a very young child being overtly corrected on grammatical matters; at this very moment I know for a fact that my nephews and nieces are almost always corrected when they utter grammatically deviant statements (I obviously would not know about the times when I am absent, except to take the parents’ word that they “have to learn how to speak properly”). Aside from this, I am continually told by my students, year after year, that their “little sister” or their “nephew from Cape Town” was always rigorously corrected whenever he/she spoke in a deviant manner.

This is, of course, rebutting anecdotal evidence with even more anecdotal evidence. The critic would obviously say that if I have a problem with Chomsky’s use of anecdotal evidence, it is hypocritical to use such evidence myself. Of course, this is a fair criticism. However, Chomsky is the one making a novel claim, and hence the burden of providing systematic, empirical data lies on him. If his method of putting forth his argument is flawed, then the conclusions drawn from that argument should also be treated as flawed, or at least not taken at face-value – or as axiomatic truth, which is what Chomskyan tenets have become. Nevertheless, I would predict that if a systematic study were to be done on parental feedback, it would show that parents do, in general, care whether their children’s utterances are grammatical or not. This is based on the assumption that all the anecdotal claims, like the ones mentioned above, are veracious and worthy of investigation. Indeed those who have done studies of this kind show Chomsky to be either distorting the facts, or simply mistaken. Aside from Sampson’s 2000 paper, which documents research of this kind from various corpora, other scholars have done so as well. Academics like Roger Brown and William Labov have been documenting child-parent interaction since the early 1970s, confirming the common-sense view that parents do indeed systematically correct their children when they speak in a deviant manner.

One could object to this by saying that the point is not whether children make mistakes sometimes, or whether they are corrected by caretakers sometimes, but why children never make certain mistakes despite never being told not to. This is a valid point, and will be duly dealt with below.

Perhaps the main argument that grammar must arise in the individual from some special-purpose device, genetically coded and neurobiologically expressed, is that grammar is too arbitrary, subtle and quirky to arise otherwise. But if the influence on language acquisition is not only the language the infant hears, but all of narrative imagining, including all of the systems from which narrative imagining recruits, there would actually be an overabundance of sources for subtleties and quirks – without having to postulate a special device to introduce them, as per Mark Turner’s theories in books like THE LITERARY MIND.

Hence, the assumption Chomsky and the other nativists make in terms of poverty of input is somewhat inaccurate.

Response to the “convergence amongst grammars” thesis

Aside from the fact that it is indeed somewhat counter-intuitive to think that a rather well-read, well-educated, and rather eloquent orator has a level of language proficiency tantamount to that of a man with, for example, no formal education, Chomsky would say that the disparity lies in their performance, not their competence. In fact, it is not at all clear-cut what Chomsky means by these terms, since he uses them differently at different times.

Sometimes he uses the term ‘performance’ to mean what a person actually does, with ‘competence’ being whatever mental abilities enable him to do what he does. At other times, he uses ‘performance’ to include perceptual strategies and psychological processing abilities. A set of perceptual strategies is not part of what one does in terms of actual linguistic behaviour, and therefore should be considered as part of one’s competence. In the latter sense of performance, processing abilities and perceptual strategies are taken to be part of a human being’s general mental abilities, rather than being part of some particular language, like Zulu or Afrikaans. The ability to speak this particular language is referred to as competence. But then, Chomsky also speaks of particular languages having “performance rules”; for example, he speaks of free word order as being determined by “rules of performance”. Here, part of the grammar of some language, then, would fall under the rubric of performance, not competence. Now competence is defined as covering certain language-particular rules. So when Chomsky says that linguistics ought to concern itself with the study of competence, not performance, it is not even clear what he means; if free word order is determined by “rules of performance”, does it mean that the study of it is outside the domain of Chomskyan linguistics?

Nevertheless, let us ignore the problems inherent in Chomsky’s equivocal use of terminology. The question still remains: how would we measure the language proficiency (call it what you like) of a sample of members in a given community in a way that would show Chomsky’s assumptions to be correct?

To show that speakers in a given community, regardless of formal education, social background, etc., would be equal in terms of their language proficiency, what would we need to do? We would need to take a sample representative of a certain community and show them to be of equal linguistic proficiency. We would need to show that aspects of their grammar converge on points we have no viable reason to believe originate in environmental speech input. If Labov’s work is anything to go by, it is quite evident that native speakers differ systematically in their judgements on various aspects of grammar. If we take this to be performative, and therefore not indicative of competence, perhaps we could observe naturalistically the members of our speech community, and based on the data we gather draw conclusions on their various language proficiencies. But to do this would be to revert to Bloomfieldian linguistics, according to which the linguist makes generalisations from the corpus. As we know, Chomsky would be quick to remind us that this only allows us to make generalisations about performance.

As is evident, if one had to analyse the performance levels of various speakers in a community, most people would expect that there would be various levels of language proficiency based on numerous factors. To claim that this does not count as viable counter-evidence is little more than an objection to rationality, and a surreptitious way of immunising the hypothesis from refutation (some may notice that I am using Popperian terminology here).

If the relevant methods of investigating this in a scientific manner are not going to count, then we need not treat this as a scientific hypothesis, and therefore need not take it seriously.

Response to the “argument from universals”

Chomsky’s main argument for rationalism is by and large based on the existence of complex linguistic universals. Chomsky always cites examples of putative universals from transformational grammar, but just about every other theory of grammar that has been proposed has incorporated claims for extremely complex and sophisticated linguistic universals. Transformational grammar is typically seen as rationalistic, whereas structuralism is typically seen as behaviouristic, and as an upshot claims made from a structuralist’s perspective are stigmatised and not given due consideration. According to Lakoff (Parrett, 1974, pp. 170-171), Chomsky’s theory about the organisation of language, deep structures, transformations, etc., is consistent with strict empiricism, whilst structuralist linguistic theories about the organisation of language are consistent with rationalism. Hence, one doing research on language universals is not necessarily obliged to accept the other axioms that go with the Chomskyan school of thought. Pinker and Chomsky tend to mistakenly polarise theorists along these lines. Pinker, for example, places immense emphasis on the work of Joseph Greenberg, whose research was on universals of word order. Greenberg was not particularly interested in whether such universals reflected some kind of innate linguistic knowledge. Indeed it would be rather strange if he was. Pinker quotes Greenberg’s observation that “in languages with prepositions, the genitive almost always follows the governing noun, while in languages with postpositions it almost always precedes” [my italics]. The phrase “almost always” shows that not all languages adhere to this restriction, which means that it is not, strictly speaking, a linguistic universal. This in turn means that knowledge of this kind cannot be part of our biological endowment. A universal like the one quoted above applies statistically, with exceptions.
Norwegian and Kitharaka, for example, are obvious exceptions to this rule: they have both prepositions and postpositions (cf. section on “Principles and Parameters” above). One could also argue that the English word AGO is a postposition as well. Hence, the fact that Greenberg did not use his findings to bolster Chomsky’s line of thought (that it is universal, therefore innate) is not surprising; why Pinker makes that very claim in The Language Instinct is very surprising indeed. In explaining why, one would admittedly have to speculate, and perhaps Pinker and Chomsky feel the need to polarise theorists into either the “pro-Chomsky” school or the “anti-Chomsky” school.

Before we go on, I would like to allude to the logic of argument structure, in order to help us understand why Chomsky and his followers may be mistaken in inferring nativism from hierarchical structuring.

There are many ways to show an argument to be flawed. One of the ways to show a particular conclusion to be a non sequitur is to show a parallel argument with the same logical form, where the premises are true and the conclusion not necessarily true. For example, the following syllogism has both true premises and a true conclusion:

Pigs breathe
Animals breathe
Pigs are animals

(Which reads: Pigs breathe; animals breathe; therefore pigs are animals). We know that this argument is not valid, despite both the premises and conclusion being true, because we can easily think of an argument that espouses the same logical form, has premises we all agree to be true, and yet leads to a conclusion that we would all agree to be not true:

Dogs are four-legged
Cats are four-legged
Cats are dogs

So because this is an invalid logical form, we are not necessarily obliged to accept the veracity of any argument that makes use of this kind of form.

What makes an argument valid is the fact that the conclusion must necessarily follow from the premises. Hence, if it is a valid argument, and if we agree that the premises are true, then we are obliged to accept the conclusion as being true. Whether the premises are true or not is a separate issue, but the fact remains that if they were true, the conclusion would be true.
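The counter-example method described above can be made mechanical. What follows is only a minimal sketch of my own, not anything taken from the literature: it treats each term as a set of individuals, reads "Xs are Zs" as "X is a subset of Z", and searches a tiny universe for sets that make the premises true and the conclusion false. The names `subsets`, `find_countermodel`, `rex` and `tom` are invented for the illustration.

```python
from itertools import product

def subsets(universe):
    """Return every non-empty subset of `universe` as a frozenset."""
    items = list(universe)
    found = []
    for mask in product([False, True], repeat=len(items)):
        chosen = frozenset(x for x, keep in zip(items, mask) if keep)
        if chosen:
            found.append(chosen)
    return found

def find_countermodel(universe, premises, conclusion):
    """Look for sets x, y, z that satisfy the premises but not the conclusion."""
    for x, y, z in product(subsets(universe), repeat=3):
        if premises(x, y, z) and not conclusion(x, y, z):
            return x, y, z
    return None

universe = ["rex", "tom"]  # two hypothetical individuals

# Form of the pigs/dogs argument: all X are Z; all Y are Z; so all X are Y.
shared_property = find_countermodel(
    universe,
    premises=lambda x, y, z: x <= z and y <= z,
    conclusion=lambda x, y, z: x <= y,
)

# A different form for contrast: all X are Y; all Y are Z; so all X are Z.
chain = find_countermodel(
    universe,
    premises=lambda x, y, z: x <= y and y <= z,
    conclusion=lambda x, y, z: x <= z,
)

print("counter-model for the shared-property form:", shared_property)
print("counter-model for the chain form:", chain)
```

A counter-model turns up for the first form (one individual is the only "dog", the other the only "cat", and both are four-legged), whereas none turns up for the second. Note that finding no counter-model in a two-member universe does not by itself prove validity; the search can only demonstrate invalidity.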

We could put the argument Chomsky postulates into a logical form as follows:

If all known natural languages make use of hierarchical structuring, then this must be as a result of biological endowment
All known natural languages make use of hierarchical structuring
This is a result of biological endowment

Chomsky gives further supporting arguments, like the fact that it is logically possible to have a language that is structure-independent, yet this does not occur in the real world, analogous to the fact that it is logically possible that humans could have four arms, and the fact that we have two shows something about the nature of our genetic make-up.
If we isolate the logical form of the above-mentioned argument, it looks like this:

p -> q
p
q

This is a valid logical form, which means that if we accept the conditional p -> q as being true in this world, then we ought to accept the rest of the argument. So the way to show this argument to be flawed would be to think of a similar conditional, and show it to be not necessarily true due to the fact that it leads to a conclusion that seems at odds with reality – and uncontroversially so.
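For readers who want to check the claim about validity for themselves, here is a small truth-table sketch. It is my own illustration, and the helper names `implies` and `is_valid` are made up for the purpose. It confirms that the form "if p then q; p; therefore q" has no row in which the premises are true and the conclusion false, and contrasts it with the superficially similar form "if p then q; q; therefore p", which does have such a row.

```python
from itertools import product

def implies(p, q):
    """Material conditional: false only when p is true and q is false."""
    return (not p) or q

def is_valid(premises, conclusion):
    """Valid means: no assignment makes every premise true and the conclusion false."""
    for p, q in product([True, False], repeat=2):
        if all(premise(p, q) for premise in premises) and not conclusion(p, q):
            return False
    return True

# If p then q; p; therefore q.
modus_ponens = is_valid([implies, lambda p, q: p], lambda p, q: q)

# If p then q; q; therefore p.
affirming_the_consequent = is_valid([implies, lambda p, q: q], lambda p, q: p)

print(modus_ponens, affirming_the_consequent)
```

The check says nothing about whether the conditional premise is actually true. That is why the only way to resist the conclusion of a valid argument is to dispute one of its premises, which is what the rest of this section sets out to do.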

So, before concluding that we do not need to concede that structure-dependency is necessarily part of our genetic make-up, let us consider a few non-linguistic instances of hierarchical organisation.

The following arguments are adapted from a similar argument put forth by Sampson in Educating Eve. The reader may think I am over-doing it by using so many examples to illustrate essentially the same point, but I feel this is necessary since this is by far the most important and convincing argument that Chomsky puts forth for his case, and therefore needs to be argued against in quite some detail.

Imagine there are two computer programmers. Programmer One (henceforth P1) makes use of modules, such that each sub-part of the program can be saved and run autonomously, and yet work as one complex program when the entire program is completed. Programmer Two (henceforth P2) does not make use of modules, but still produces a program of equal complexity to that of P1.

P2’s program is in no way different from P1’s, insofar as the end-product is concerned. The main difference is that P1 works on his program bit by bit, making sure each constituent sub-program runs as it is supposed to, and is saved onto the hard-drive; only then does he go on to write the next sub-program, etc., until the entire program is completed. If there happens to be something wrong with the program, it will be easier to find out where the fault lies by re-checking each sub-program. P2, on the other hand, writes out his entire program from beginning to end, then saves and runs it. If P2’s program contains a bug, he would have to go through the entire program to see where the fault lies.

Now, imagine there were two rooms; one containing a group of ten P1-type programmers, and the other containing a group of ten P2-type programmers. Imagine the two groups are having a competition of some sort, in terms of which the room that produces the largest number of completed programs in a day does not self-detonate. In other words, the group that survives would be the one that works the most productively. The problem, however, is that the power supply keeps shorting out at unpredictable times, causing the computers in both rooms to shut down. Imagine also that only executable programs can be saved; if a program were not able to run autonomously then it would be impossible to save it. When the power supply begins to run again, the programmers in both rooms resume their respective tasks.

Obviously, this poses a serious problem for the P2 programmers. When they resume their programming, they would have to start all incomplete programs from scratch.

If we were to imagine a world like this, it is evident that P1-type programmers have more of a chance of surviving. The thing that sets P1 programmers apart from P2 programmers is the fact that P1 programmers make use of modularity. Aside from the fact that our P1 programmers stand a better chance of not self-detonating, there are also other advantages of using modularity in computer programming. To name just one of the many: if there were a problem with any given program, a P1 programmer would be able to find the glitch rather easily: he would just need to run each module separately and see which one does not run properly. This would narrow the scope of his troubleshooting very substantially. A P2 programmer, on the other hand, would need to peruse the entire program from start to finish in the hope of finding the glitch. Hence, it just makes more sense that programmers adopt the P1-type modus operandi.
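The thought experiment is easy to simulate. The sketch below is only an illustration under assumptions of my own: time is divided into ticks, one module is written per tick, a power cut wipes whatever has not been saved, and the numbers (ten modules per program, a one-in-ten chance of a cut per tick, a fixed random seed) are arbitrary. The function name `programs_completed` is likewise invented.

```python
import random

def programs_completed(outages, modules_per_program, modular):
    """Count finished programs, given one outage flag (True/False) per tick."""
    completed = 0
    saved = 0      # modules of the current program already on disk
    unsaved = 0    # modules written since the last save
    for outage in outages:
        if outage:
            unsaved = 0              # anything not yet saved is lost
            continue
        unsaved += 1                 # one more module written this tick
        if modular:
            saved += unsaved         # a module runs on its own, so it can be saved
            unsaved = 0
        if saved + unsaved == modules_per_program:
            completed += 1           # the whole program runs, so it can be saved
            saved = 0
            unsaved = 0
    return completed

rng = random.Random(2009)            # fixed seed so the run is repeatable
outages = [rng.random() < 0.1 for _ in range(1000)]

p1_total = programs_completed(outages, modules_per_program=10, modular=True)
p2_total = programs_completed(outages, modules_per_program=10, modular=False)
print("P1 (modular):", p1_total, " P2 (non-modular):", p2_total)
```

Both programmers face exactly the same sequence of power cuts. Under these assumptions the P1 programmer can never finish fewer programs than the P2 programmer, because every uninterrupted stretch long enough for P2 to finish a program also gives P1 at least that many saved modules.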

Another kind of “modularity” is to be found in the human body. We have different parts in our body which function in different ways, and yet work together as a whole. For example, we have a separate circulatory system, separate respiratory system, etc. If something were to go wrong with the human body, it is therefore easier to diagnose and treat illness; if something were to go wrong with the heart, it would make sense to also check the condition of the lungs, since they work together. It would not make sense to pay particular attention to, say, your left toe when treating a heart problem since there is no obvious structural/functional link between the two. Imagine if the human body were not somehow divided into constituent parts with separate functions. Even though this may not be easy to imagine, it is evident that if something did go wrong, it would be that much more difficult to diagnose any given illness since we would not be sure where to look. Normally, if something were wrong with the visual system, the doctor would check the actual eye-ball first, then failing to find any flaw he may examine the nerves connected to the eyes, along with the occipital lobe of the brain, etc., because they all function together to constitute the visual system. If the body did not function in this manner, we would have to check everything from A-Z in order to determine the problem.

Aside from computer programs and physical organisms, hierarchical structuring can also be found in social institutions, like universities and schools. Let us briefly consider how an educational institution like a school is hierarchically structured.

Amongst the learners there are the various grades. Amongst the staff members there are post level 1 teachers (PL1), post level 2 teachers (PL2), and post level 3 teachers (PL3). Post-level 1 teachers are teachers who do not belong to management, post-level 2 teachers are the HODs, and post-level 3 ‘teachers’ are the posts that the vice-principal and the principal hold.

(I now request you to imagine a hierarchical tree structure which illustrates the components mentioned above; I regret that I am not able to reproduce the one I have drawn on this blog...)

With a structure of this kind, let us consider what follows. Firstly, there are clear dominance relationships, with the principal presiding over the school. If the principal decides to, say, have a civvies day this coming Friday, he may do so simply by informing the others. If one of the HODs decides to do so, they can only get their wish granted with the permission of the vice-principal and principal, who have the authority to disagree and therefore not grant the wish.

If a student wanted to change some aspect of school policy, he would have difficulty doing so. If he were to follow protocol, the first thing he would need to do is go to his class-teacher, who would then approach his HOD, who would go to the vice-principal, who would go to the principal. Students are not allowed to go directly to the principal or vice-principal, even though this is possible in theory. They have to follow protocol.

The decision(s) (pertaining, of course, to the running of the school) made by those higher up affect all those lower down, yet the decisions made by those lower down can only affect those still lower down, and then too can only be implemented if those higher up are in agreement with them. Since those right at the bottom of the structure have no nodes to dominate, they have virtually no say in the running of the institution.
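Since I cannot reproduce the tree diagram here, the same relationships can be written down as a small data structure. This is a deliberately simplified sketch with one post per level; the class name `Post` and its methods are my own inventions for the purpose of illustration.

```python
class Post:
    """One position in a hypothetical, simplified school hierarchy."""

    def __init__(self, title, subordinates=()):
        self.title = title
        self.subordinates = list(subordinates)
        self.superior = None
        for post in self.subordinates:
            post.superior = self

    def dominated(self):
        """Return the titles of every post below this one in the tree."""
        titles = []
        for post in self.subordinates:
            titles.append(post.title)
            titles.extend(post.dominated())
        return titles

    def protocol_route(self):
        """Return the chain a request must climb, from this post to the top."""
        route = [self.title]
        post = self
        while post.superior is not None:
            post = post.superior
            route.append(post.title)
        return route

student = Post("student")
class_teacher = Post("class teacher (PL1)", [student])
hod = Post("HOD (PL2)", [class_teacher])
vice_principal = Post("vice-principal (PL3)", [hod])
principal = Post("principal (PL3)", [vice_principal])

print(principal.dominated())      # everyone the principal's decisions reach
print(student.dominated())        # empty: nothing to dominate
print(student.protocol_route())   # the route a student's request must take
```

Nothing in the code stipulates who has a say over whom. Those facts simply fall out of where each post sits in the tree, which is the point being made in this section.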

Note, once again, that schools do not necessarily have to be like this. It is quite possible, from a logical point of view, that schools need not employ structures of this kind; for example, it is quite possible to imagine a school which does not categorise its staff members into different levels. Either each staff member works autonomously, or all the staff members form a committee where each member has equal power. Given that this is a logical possibility, how do we explain the fact that this kind of structuring (or non-structuring) never occurs? The fact that schools throughout the world, which have had no significant connection to each other, make use of this kind of structuring requires explanation.

Most people would agree that schools have this kind of structure due to the fact that it makes school-management easier. Any other system would lead to veritable anarchy.
Now imagine I tried to explain this ‘phenomenon’ along the following lines:

The universal structuring of schools throughout the world occurs due to the fact that there must exist an “innate universal school structure-dependency gene” (IUSSDG, consider this an analogue to Chomsky’s LAD), which has to be a result of pre-programming in our genetic make-up, one which predisposes us to arrange institutions of this kind in a structure-dependent manner. The most obvious objection to this claim would be that not all schools are exactly the same all over the world. This quasi-contradiction can easily be explained in terms of parameters; we have a universal schema regarding the layout of schools, which can be adapted to various circumstances. If a school needs four deputy principals, then our parameters would have to be set in a relevant manner to accommodate this. (We would have to ignore the fact that an instinct, by definition, is something which is not adaptable, and therefore not variable; the webs that a black widow spins are the same all over the world, and always have been since the discovery of the black widow).

How would one refute my hypothetical IUSSDG? It is indeed not easy to think of a possible scenario that would disprove my claim. Whenever an alleged counter-example presents itself, we could very easily just put that down to a variable parameter. In other words, it is just about impossible to prove this thesis wrong.

The parallels with language are quite evident, so given that it would be somewhat odd to accept this argument, why would we think that the same argument holds true for language?

As a final non-linguistic instance of (contingent) hierarchical structuring, let us consider the following scenario:
Imagine a freshman linguistics programme in which there are two courses taught in the second semester concurrently, with the course codes LING114 and LING110, each course being taught by two lecturers: one in the first block and one in the second.

Once again, I ask you to imagine the following set-up in the form of a tree diagram. This would show two courses, further subdivided into the two blocks they are spread over, with the various course requirements the students need to fulfil in order to pass the course. Once again, it is necessary to point out that this tree structure would need to be much bigger, with many more nodes and further sub-divisions to accurately capture the various details of an actual course that was taught in this manner.

Let us consider what follows from this sort of set-up:

Notice that the person who teaches either course during block 1 is at liberty to do as he pleases for his block. He can add as many nodes as he likes (in fact, the number of nodes that can be added is theoretically infinite) by giving more tests, more assignments, projects, etc. Why is he at liberty to do so? Simply because he dominates that node in the tree. What he does in his block, by the same logic, will not affect what happens in the block 2 portion of the tree; this is because that portion is governed not by B1 but by B2, and B1 therefore has no say in that arena.

Also, let us say that the LING110 B1 lecturer has an urgent family crisis to attend to in England, and therefore will not be present to teach for most of his block. Upon consultation, the head of department decides to ask the B2 lecturer to just swap her course around with the B1 lecturer. After the necessary parties agree to this, and the necessary administrative work is sorted out, the B1 and B2 courses are duly swapped around. What would swapping the courses around entail? One of the obvious upshots of this kind of arrangement is that all B2’s powers, in terms of deciding her course content, assessment criteria, etc., cannot be somehow ‘left behind’ because she is changing positions on our tree. It is indeed logically possible that because B1 and B2 are now going to be transposed, some of B1’s decision-making powers may be left behind, giving B2 more power. Even though this seems like a somewhat outlandish speculation, the fact that it is a logical possibility, and universally does not occur, needs explaining. Why would it necessarily follow that if B1 and B2 are swapped around, both the B1 and the B2 lecturer would retain all their powers? Well, because it is more pragmatic to do this, as doing it any other way would cause unnecessary complications.

B1 and B2 cannot gain or lose any powers in virtue of the fact that they occur on the same level in the tree. If B1 were moved up the tree, to where LING110 stands, for example (in other words, if he became the course coordinator), he would take with him all the powers he had in his previous position, coupled with the additional power and responsibility that comes with being in a higher position. If it were the other way round, i.e. if the LING110 course coordinator had to swap with the B1 lecturer, he would have to dispense with some of his powers in virtue of the fact that he is now moving down on our tree structure.

This latter principle cannot fail to remind one of the “pied-piping principle” in syntax, which is defined by Andrew Radford as

a process by which a moved constituent drags with it all the properties,
along with its set of features, when it moves to another position on the
same level.

The stipulation regarding items moving only on the same level is unique to syntax since an NP obviously cannot somehow be ‘promoted’ to the S node in the same way it would be in our linguistics tree structure.

The point, however, is that the principles which are involved in cases like this have nothing to do with elaborate abstractions of innate knowledge, but follow from the fact of hierarchical structuring. To take this point further, let us assume someone notices that this phenomenon applies to all university courses taught all over the world, and say it matters enough to him to try and explain this seemingly universal characteristic. How amenable would we be to his explanation along the following lines?

Logically speaking, it is possible that this kind of “course structure” need not occur; it would not be contradictory in any manner to imagine a course that does not make use of this, or to imagine a school where the teachers are not categorised into different levels, etc., so the fact that no school, or no university course, makes use of linear organisation is something that ought to be in need of explanation.

Once again, the oddity of this argument accentuates the fact that it actually does not need any explanation, which shows that hierarchical structuring is actually something quite natural; something that we would expect to occur.

One could object to this argument by saying that the structure of language is something unconscious, whereas the instances mentioned above are a result of conscious decision-making. Let us consider this objection for a moment. Did we really decide that everything must be structure-dependent? If we look around us, and consider the nature of the world, we would notice that just about everything is, in some way or the other, hierarchically arranged; organisms, physical structures, societies, institutions, etc., all make use of hierarchical structuring. Even amongst groups of friends, sooner or later it comes to be that some members, or one member, is given leadership status, and the others fall into some kind of rank. It is hard to imagine that some time back, at the dawn of the human era, people made a conscious decision to start arranging things (like the nuclear family structure) in a hierarchical manner. Given that nature evolved that way, why should it be so surprising that social institutions naturally evolved that way? And that when the human species evolved more acute cognitive structures, our knowledge (linguistic and otherwise), also evolved in a hierarchical fashion? If social institutions did not evolve naturally, then there should have been a time in human history when societies quite comfortably dispensed with ranks of this sort. As far as we know, this has never been the case. Some of the most ancient texts known to mankind tell us that society cannot be ‘rankless’; we have to categorise its members using some kind of hierarchy (cf. Plato’s Republic and the ancient Hindu Scripture The Bhagavad Gita, for example, which explains how a utopian society ought to be ranked). Maybe the texts are inaccurate, but then we need to ask ourselves what other evidence there is. Certainly in the modern-day world it would be hard to even imagine what this would be like.
A rankless society, or a school where the staff members are all ‘equal’ is certainly a “logical possibility” (in much the same way a language that is structure-independent is logically possible), but its universal non-occurrence needs explaining. I contend that societies, like organisms and languages, naturally evolved this way because natural selection dictated that it just makes more evolutionary sense for things to adhere to a structure of this kind. It is true that we can consciously design school systems (and we can consciously design languages too; otherwise there would be no such thing as artificial languages and computer languages, for example), but all systems that evolve naturally have to adhere to this kind of structuring. School systems did not start off as homogeneous masses of teachers and students, but must have started with some kind of ranking, which became formalised later on. It is only by deliberate intervention that, say, a school system can dispense with this kind of arrangement, and then too it would be difficult to imagine how it would function effectively. Likewise, it is only by deliberate intervention that one can construct a language that does not make use of the principles of hierarchical inclusion relationships. If languages are allowed to ‘evolve naturally’, they would also have to adhere to this sort of structuring.

Nevertheless, if we are prepared to state that hierarchical structuring is one of the most important features that languages have in common, let us consider the following scenario:
Imagine a botanist is particularly interested in the study of weeping willows, and wants to write a paper on “What weeping willows have in common”. When he presents the thesis of his paper to his fellow botanists, they ask him what he has found that all willows have in common. “Well”, he replies, “they are all hierarchically structured.”

What is so odd about this scenario is that this so-called novel claim is actually quite a weak truism. We would expect something more concrete, like the number of leaves produced on average, etc. To extend our parallel to language, when trying to see what it is that languages have in common, we would expect more concrete examples, instead of something so abstract. It is not for lack of trying, as the nativists have been trying to pin these down for a long time, without much success (cf. examples discussed earlier).

Hitherto, the only concrete universal that seems to have survived refutation is that “all languages have nouns and verbs”, to quote Sampson. Surely this is not because we are innately predisposed to have these categories (whilst the others have to be learned), but because every human society needs to name things, and describe actions.

As an aside, Pinker gives numerous concrete instances of principles that follow from structure-dependency, to which Sampson duly presents blatant counter-examples. When I presented one such counter-example at a conference, organised by the South African Applied Linguistics Association, I was told by a certain pro-nativist syntactician, Dr Jochen Zeller, now the head of Linguistics at the University of KwaZulu-Natal, that whether this is a true counter-example or not depends on one’s representation. Hence, even though one may present a counter-example which is obviously grammatical, yet contrary to what an innate constraint would allow, the evidence can simply be ignored in light of the fact that one can “represent” the structure in a different way; obviously one which fits your theory. The sceptic will not concede that there are independent reasons for, say, adopting X-bar theory, and Pinker gives “no grounds for assuming that the universals posited have extra-theoretical reality”, to quote Cowley. Unfortunately, many of the nativists argue along unscientific terms like this, which makes it quite difficult to have any kind of intellectually meaningful debate with them.

Also, many other facts about language follow from the fact that languages are hierarchically structured. The fact that we do not make use of rules like “To form questions reverse the order of words”, as Pinker would say, follows from the fact that a sentence is not a linear string of words, but an organised tree-structure; the obvious upshot of this being that we can only manipulate the sentence in a particular way. The same reasoning applies to the fact that we cannot use the “middle” of something to form questions: “To turn a statement into a question, take the word in the middle of the sentence, or, if it has an even number of words, the word closest to the left of the middle”. Again, this is because a sentence is not a linear string of words. It is an organised tree-structure. Once we understand this, it is easy to see why picking out the middle car from a string of cars is not the same as picking out the middle word from a sentence, and more importantly, we also need to appreciate the fact that we are not necessarily obliged to accept that this is something specific to language that is hard-wired into our brains. Of course the human mind is responsible for arranging things in a hierarchical manner, but it is the same human brain that insists upon hierarchical organisation for just about everything else.
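The difference between treating a sentence as a string and treating it as a tree can be shown with a toy example. The sketch below is my own and makes several assumptions: the two sentences, the bracketing into S, NP and VP, and the helper names `leaves`, `middle_word` and `first_word_of` are all invented for the illustration and are not drawn from Pinker or Sampson. It applies Pinker's "middle word" rule and a constituent-based rule to two sentences that differ only in one added adjective.

```python
def leaves(tree):
    """Flatten a (label, child, child, ...) tree into its list of words."""
    if isinstance(tree, str):
        return [tree]
    label, *children = tree
    return [word for child in children for word in leaves(child)]

def middle_word(tree):
    """Linear rule: the middle word, or the one just left of the middle."""
    words = leaves(tree)
    n = len(words)
    return words[n // 2] if n % 2 else words[n // 2 - 1]

def first_word_of(tree, label):
    """Structural rule: first word of the first constituent carrying `label`."""
    if isinstance(tree, str):
        return None
    node_label, *children = tree
    if node_label == label:
        return leaves(tree)[0]
    for child in children:
        found = first_word_of(child, label)
        if found is not None:
            return found
    return None

short = ("S", ("NP", "the", "dog"),
              ("VP", "chased", ("NP", "the", "cat")))
longer = ("S", ("NP", "the", "big", "dog"),
               ("VP", "chased", ("NP", "the", "cat")))

print(middle_word(short), middle_word(longer))
print(first_word_of(short, "VP"), first_word_of(longer, "VP"))
```

The linear rule picks out the verb in one sentence and a noun in the other, simply because a word was added to the subject. The structural rule picks out the same element in both. This illustrates why position-counting rules are unworkable once sentences are trees; it does not, of course, show anything about where the tree structure comes from.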

Why, then, should we believe that this is something unique to language?

In fact, to even think that it makes sense to ask why it is we do not have a rule like “to form a sentence reverse the order of words” is somewhat odd. This is certainly a logical possibility, but only in the very strict sense that there is no contradiction in terms in asking that question. The fact of the matter is that it is practically impossible to manipulate a sentence in that way because the words we see on the piece of paper are not the psychologically relevant parts. The parts lower down on the tree cannot be taken into consideration when doing something that requires higher levels of cognition. To illustrate, imagine a friend comes over to your house for dinner (say you have a fairly new house), and he asks you for directions to the kitchen. Tacitly, what he is actually saying is that he wants you to tell him what he needs to do in order to get to the kitchen. Imagine you gave him directions along these lines: your brain needs to send a signal to your relevant muscles, and you need to wake up. Then you need to face the door, and after you begin to walk forward, continue until you get to a green door. When you get to the green door, turn the handle…and so on.

Now, after you give him directions, and he eventually follows them to his destination, you join him in the kitchen. There you ask him: Can you re-trace your steps back to where we were by reversing the directions I gave you? He most certainly can, from a logical point of view, but he will most certainly have a lot of trouble doing so.

What is odd about this? It is the fact that we are mixing high-level and low-level bits of information. Regarding the directions to the kitchen, the low-level bits are psychologically irrelevant (which leg goes first, the fact that you need to turn the handle, etc.), and what matters is the high-level bits: which passage to walk down, which door to turn in to…etc. Although it is logical to think otherwise, it just does not make practical sense.

We are doing exactly this when expecting the “reverse word-order rule” to be a viable alternative; we are mixing high-level (manipulation of a sentence) with low-level bits (manipulation of the actual words). The restriction, then, is not so much logical as psychological.

Philip Carr, in a paper he published in 2003, says that there is no empirical evidence offered to substantiate the claim that the child learning language is unable to exploit the capacities available in other domains like association, induction, conditioning, hypothesis formation and testing, generalisation, and so on. Nativists tend to hedge this obviously counter-factual claim by saying that there are some areas where this may play a restricted role, but in the absence of any evidence, why should this be seen as anything other than a cop-out?

Carr then goes on to say that “One possible middle-way approach is to argue that analogical generalisation is available in syntax, but that the capacity for forming analogical generalisation is constrained by the Principle of Structure Dependence, so that sets of expressions are taken to be analogous only if they are structurally analogous. It then remains to be established whether Structure Dependence is truly an innately endowed, specifically linguistic, principle. But even if it is, the possibility arises that the domain-specific capacity to form analogical generalisations operates in tandem with that principle”.

As we have already seen, this is not something which is unique to language, and we have no reason to assume that structure dependency is innate.

Regarding the “Principles and Parameters” branch of UG, the following criticism can be levelled:
Any putative universal that turns out to be untrue can simply be dismissed as “representing a new option on a parameter”, to quote Cowley again. This is certainly anything but scientific in the Popperian sense.

In addition to this, an explanation of language acquisition from a P&P framework necessitates the (ostensibly) innocuous simplification of instantaneity. This means that we should assume that the acquisition of language takes place in one fell swoop, as Chomsky puts it: from the initial state to the steady state, and that there is little variability thereafter. Chomsky believes this kind of simplification is not only innocuous, but also necessary for the purposes of meaningful scientific inquiry, in much the same way as one would, say, assume a runway to be frictionless when doing some calculations in kinetics. This in itself is problematic, in light of the fact that it not only precludes certain versions of learning theory by fiat, it also simply ignores phenomena which could lead to certain insightful revelations apropos language acquisition.

This approach also makes false predictions. For example, a phrase-structure rule like
PP -> P NP*

would predict that this particular language makes use of prepositions only, whereas another language may have a rule like

PP -> NP* P

which would make use of postpositions only. For these reasons, Chomsky would say that this strand of his argument should rather forcefully entice us to concede that we are endowed with a profound abstract structure that is genetically inherited. However, there are languages that have both prepositions and postpositions, like Norwegian, as I mentioned earlier. Pinker says that if these parameters were a matter of binarity, it could help explain the mystery of language acquisition, but if there are other instances like this, then the mystery is still a mystery, and the “principles and parameters” approach does not suffice as an explanation. In fact, many of the principles that are postulated as such turn out not to be principles as such, but contingent properties that happen to occur in many languages. Even (dialectal) native English speakers have apparent non-English constructions, like verb-final sentences.
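
The prediction can be made concrete with a minimal sketch in Python. The names build_pp, licensed and head_first are my own illustration and are not taken from Chomsky or Pinker; the sketch simply assumes that the parameter is a single binary switch deciding whether the adposition precedes or follows its noun phrase.

```python
# Illustrative sketch only: a binary "head direction" switch for PP.
# build_pp, licensed and head_first are hypothetical names.

def build_pp(adposition, noun_phrase, head_first):
    """Return the words of a PP under one setting of the switch."""
    if head_first:
        # PP -> P NP (prepositions)
        return [adposition] + list(noun_phrase)
    # PP -> NP P (postpositions)
    return list(noun_phrase) + [adposition]


def licensed(pp_words, adposition, head_first):
    """Check whether a PP is consistent with the chosen setting."""
    position = 0 if head_first else len(pp_words) - 1
    return pp_words[position] == adposition


prepositional = build_pp("in", ["the", "house"], head_first=True)
postpositional = build_pp("in", ["the", "house"], head_first=False)
```

Under either setting of head_first, only one of the two orders passes the licensed check. A language that shows both orders is therefore consistent with neither setting of a single switch, which is the difficulty raised here.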

Chomsky does admit that research in this area is still quite rudimentary, and that we are not sure how accurate this way of viewing things is. Hence, nativism, insofar as it depends on “principles and parameters” as an explanation, is not tenable. It may also be worth noting that Chomsky uses this theory to try and explain why languages all over the world are so variable. If this approach is shown to be flawed, then the nativists are left wanting with regard to explaining how it is that an alleged instinct is so variable, in much the same way as marriage customs are variable from culture to culture.

Also, if intuitions regarding grammaticality are to be explained by the violations of these P&P’s, then contrasting intuitions (even amongst native speakers) regarding grammaticality need to be explained. Carr points out the same problem I always encounter when I ask a group of students to make judgements on grammaticality; for example, when shown an “ungrammatical” sentence, “there is always a substantial minority of native speakers who find such expressions well-formed”. Once again, a better explanation is to be found in a paper by my former mentor Stefan Ploch on Link Phonology, published in 2001, who debunks the grammaticality hypothesis (GH) by explaining it in terms of attestation. The GH can be defined as the idea that a native speaker can take any well-formed input and judge whether it is grammatical or not. Since this is not true, it is evident that attestation provides a better explanation for this phenomenon. Sampson also opens his book, EMPIRICAL LINGUISTICS, explaining why the GH is methodologically flawed.

Response to “Descartes’ problem”

At numerous points, Chomsky says things like:

Descartes himself devoted little attention to language and
I will make no attempt to characterise Cartesian Linguistics as it saw itself.

And a little later on, he points out that it is indeed a well-known fact that “Descartes made only scant reference to language”.

In fact, the phrase “Cartesian Linguistics” was actually coined by Chomsky, and from the above-mentioned quotes, one once again wonders why Chomsky would attribute this position to Descartes, and, on a related point, why Chomsky would invoke his name in the first place.

It is indeed well known in philosophical circles that Descartes did not place much emphasis on the apparent problem Chomsky would like to ascribe to him. In his Meditations, Descartes was actually trying to make another point altogether. He was chiefly concerned with showing that humans are the only ones who possess a soul, and he even takes it further by saying that non-humans are not only lacking in a soul, but are also no more important than mere automata. Descartes wanted us to share his belief that a computer and a dog are actually equal in God’s eyes because they both do not possess a soul. One of his arguments leading to that conclusion was the fact that humans have language, and animals do not. Even though a parrot can utter what we hear as words, a computer can also be programmed to do so; with the parrot as with the computer, there is no understanding taking place. That was what Descartes was arguing, and we can see that this had nothing to do with creativity or how it lends more viability to linguistic nativism. (Descartes was actually building up to a rather odd theistic argument purportedly proving the existence of God. His conclusion is that God must exist because otherwise we would not be able to cognise his existence. We are only able to do so because God wills us to! Most philosophers are actually not convinced by his argument, but the details are not relevant here.)

George Lakoff also concurs that there has never been such a thing as ‘Cartesian linguistics’. Chomsky claims that Cartesian rationalism gave birth to a linguistic theory like transformational grammar in its essential respects. He bases this claim on Arnauld’s and Lancelot’s Grammaire Générale et Raisonnée, published in 1660. This was a series of grammars by Lancelot, the most extensive of which was his Latin grammar. Chomsky “never checked out his Latin grammar”, but Robin Lakoff did, and she published her findings in the journal Language. What R. Lakoff discovered was that in the introduction Lancelot credited all of his findings to Sanctius, a Spanish grammarian, whose work antedated Descartes by half a century. If this is true, then what Chomsky called “Cartesian Linguistics” had nothing to do with Descartes, but came directly from an earlier Spanish tradition. In Lakoff’s words, what is “equally embarrassing” for Chomsky is the fact that the theories of Sanctius, and the Port Royal grammarians generally, differ from the theory of transformational grammar in a crucial way: they do not acknowledge the existence of a syntactic deep structure, but assume that syntax is based on meaning and thought. This is a position which Chomsky vehemently opposes.

Aside from this, using the fact that language makes infinite use of finite media as evidence for innateness is not logical in any case. According to Lakoff, Chomsky’s use of the word ‘creative’ is “very strange”. There is nothing in transformational grammar that accounts for human creativity or that even pretends to. All that transformational grammar does is provide a recursive mechanism for generating sentences. There is nothing ‘creative’ about this, since in principle it is not any different from constructing a computer program to do mathematics. The program could perform an infinity of mathematical operations, “but no one would say that it accounted for mathematical creativity”.
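
The point about recursion can be sketched in a few lines of Python. This is a toy rule of my own devising, not a fragment of any actual transformational grammar: a single rule that may re-apply to its own output yields an unbounded set of distinct sentences, yet every one of them is produced mechanically.

```python
# Illustrative sketch only: a toy recursive rule, not a fragment of any
# actual transformational grammar.

def sentence(depth):
    """S -> 'it rains' | 'Mary says that' S, with the rule applied depth times."""
    if depth == 0:
        return "it rains"
    return "Mary says that " + sentence(depth - 1)


# Each extra level of embedding gives a new, longer sentence.
examples = [sentence(d) for d in range(3)]
```

Nothing in the function decides what is worth saying; it only keeps embedding one clause inside another, which is the sense in which the mechanism is not ‘creative’.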

This is evident since we can easily think of instances where finite media are used to produce what could be of a potentially infinite variety. For example, the number of letters in the English alphabet is twenty-six, yet these can be manipulated and combined in an infinite variety of ways. The number of keys playable on a keyboard is also finite, yet these keys can be combined to play a potentially infinite variety of songs. As a young child I used to wonder whether song-writers are going to have a problem sometime in future because all the songs would already have been written, but once I understood that song-writing also makes infinite use of finite media, this was no longer a concern. (Chomsky actually does believe that, for example, the number of art forms is limited, and may indeed one day have reached “saturation”!)

A musician who has been playing the piano for a while, and is rather impressively piano-literate in terms of being able to distinguish between the various extant genres, would be able to tell which genre a given piece of music belongs to. Sometimes he may not be sure for whatever reason, but chances are he would be able to accurately categorise various piano pieces based on the ‘rules’ that go with composing a particular type of musical score. Note that despite being rule-bound, in a sense, the composer can still produce a potentially infinite variety of musical scores, and recognise a particular score as part of an accepted genre, part of another genre, or not like any genre accepted according to the current standards of musicianship.

This is not unlike the native speaker’s ability to tell apart grammatical sentences, ungrammatical sentences, and sentences which they are not quite sure of. The only reason our pianist-composer is so skilled at telling the various genres apart is because he knows the rules that need to be complied with in order to fit into a particular paradigm. Likewise, speakers of a given language know the rules of their language, and they use this knowledge as a criterion to judge whether or not sentences fit in with their idiolect (cf. below, where I discuss Ploch and Jensen’s notion of “attestation” in more detail, which provides an alternative explanation for Chomsky’s ‘grammaticality hypothesis’).

Now, what has this got to do with being born with an elaborate linguistic structure hard-wired into our genotype? If we are not inclined to attribute specific innate knowledge to our pianist-composer, why attribute similar knowledge when it comes to language since these cases are actually the same in principle? One could object to this criticism by claiming that the rules of writing, piano-playing, etc. are consciously taught, whereas the rules of spoken language are not taught.

Whether rules are taught or not is itself a controversial claim. Sampson, for example, argues that parent-child interaction is analogous to a graded series of language lessons. Regardless, if we claim that language rules are not learnt, but part of our biological endowment, then why do different languages have different rules, and why do they differ even from dialect to dialect? The “principles and parameters” strand of Chomsky’s theory was postulated to answer precisely this objection, and as we have seen above, the “principles and parameters” theory is flawed. Before accepting the claim that actual linguistic rules are innate, we would have to ask what other evidence there is to lead us to accept this conclusion. Surely the burden of proof lies on the one making the claim, and until such evidence is forthcoming, we are not obliged to accept that rules are not formed from interaction with the environment, but that we are born knowing all the rules of language.

This strand of the argument, then, also breaks down.

Response to Pinker’s The Language Instinct

Phonotactic Constraints

The claim that English has rules of this kind seems quite false. I am told that there is an electric blanket brand in the UK called “fnug”. Sampson tells us that there is a houseplant called “vriesia”, and that he knows an Englishman by the name of Srawley, who sees his name as quite a natural, English name. I know of at least three adult monolingual English speakers (all of whom were privileged to have a tertiary education) who until very recently thought that the word “gnome” was pronounced as it was spelt. There was an L1 speaker of English who, until she passed away a while back, resided in an ashram in the south of Johannesburg. She happened to be of Scottish descent, and attended Victoria Girls’ High School in Cape Town. This is relevant because her school happened to be a very upper-class school for “whites only”. Long into her adulthood, and therefore long past her putative ‘critical period’, she joined a Hindu monastery, and as a result, learnt to say things like [kʌrma], and said it quite naturally. The “gnome” and “karma” instances are interesting because the former has an illicit onset, whereas the latter has an illicit post-vocalic r.

I myself am a monolingual speaker of English. However, I do not aspirate my (syllable-initial) bilabial stops. Hence, I found my freshman phonetics classes very frustrating, when John Rennison would insist that if we put our hands in front of our mouths, we would FEEL the difference between pin and spin, where the former is supposed to make us produce a burst of air. Simon Donnelly was just as confused when I pointed out that I do not ‘round my lips’ when I say the sh part of milkshake.

Their confusion stems from the fact that all this is meant to be “innate”, you see...

Of course it is true that English may be uncomfortable “in general” with sequences of this kind, but it makes more sense to explain this intuitive discomfort regarding certain phonotactic forms in terms of learning, exposure, and attestation, rather than innate machinery.

Infant Prodigies

It would indeed take too long to test for every possible variable when a child is trying to make sense of the world around him, like whether the person you are talking to has a higher social status than you, and whether the previous word happened to be ‘and’. But “if you innately know that these last two variables cannot be relevant, then you will be innately incapable of growing up as a speaker of Japanese and [Biblical] Hebrew, respectively”! In these languages, social status and the position of the word ‘and’ do matter.

It looks like Sampson may have been unfair here, because one can say that he is just taking two examples which he knows languages make use of, presenting counter-examples as “fruitless possibilit[ies]”, and saying “See, Pinker is wrong.” Sampson’s point is simply this: how are we to know what counts as a “fruitless possibility”? How do we know that some language somewhere does not consider whether a statement was uttered indoors or outdoors as a grammatical variable? Of the thousands of languages that are found in the world, how do we know? After all, all sorts of seemingly strange variables do matter, like the isihlonipho sabafazi practice amongst tribal Xhosa speakers, according to which the syllables in the names of the male members of a woman’s husband’s family have to be avoided when they are in each other’s company. Even if a language were found that makes use of the indoor-outdoor variable, the nativists would just say, ‘oh’, and try to pin other things down as “fruitless possibilities”, whilst maintaining that we have an innate disposition to know not to look for these things. If we are to continue to assert that there is this kind of innate restriction, surely it ought to be because the evidence points us in that direction; my point here is simply that the evidence does not point in that direction.

Karen Stromswold, whose claims were mentioned in the previous section, says that children never make mistakes by way of false analogy like:

I like → He likes; I can → *He cans

because our innate knowledge precludes that possibility. If this were true, then how is the following construction, taken from sixteenth century Shakespearean English, possible:

I like → Thou likest; I can → Thou canst

Perhaps Shakespearean English was not spoken in this way, perhaps this is just a written convention. The fact of the matter is we could never know for sure. My point in mentioning this is simply to accentuate the fact that a construction of this kind is not as logically impossible as Pinker would like it to be. There may also be many modern-day languages that also work like this. A representative sample has not been adequately analysed. If a language like German is shown to do exactly what Stromswold predicts to be impossible, what does that show? It would show that there is no innate restriction of that kind. Indeed the crux of this thesis centres around the fact that every specifically innate constraint of this kind applies usually only to English, which means it cannot be part of our biological endowment.

A more plausible explanation would be that some aspects of grammar seem to be easier to learn than others, but it is “not obvious what distinguishes the easier bits from the hard bits”, quoting Sampson. What is evident is that it is far less plausible to explain phenomena of this kind by appealing to innate restrictions.

Second Language Learning and the Critical Period

The ‘critical period’ thesis has already been discussed earlier, and it seems more obviously false in light of the fact that there is nothing obvious that precludes someone past the putative critical period from learning language with equal proficiency to that of a native speaker. (Let us ignore the fact that neither Chomsky nor Pinker gives us any meaningful criterion as to what actually constitutes “proficient”).

How do we explain the case of a student I once taught in Linguistics 114, who started learning French at the age of seventeen (in high school)? Upon hearing that she was selected as an exchange student to go to France for a year, she started attending French courses, reading books on French, etc. By the time she had to go overseas, she was speaking French rather fluently, so much so that most Frenchmen she met mistook her for a fellow citizen. If the facts are exactly as she explained them to me (I of course can only assume veracity), then Pinker is wrong in drawing the conclusion that he does. Newport and Johnson must have had a rather idiosyncratic notion of “motivation”, and either way they fail to explain the facts.

Hence, Pinker’s spin on this variation of the critical period is as unamenable to the facts as Chomsky’s.

The Language Mutants from Essex

A group of medical practitioners did a follow-up study on these so-called “language mutants” mentioned by Pinker. Firstly, they found that there is no consensus that SLI even exists, as those afflicted with it were anything but normal in other cognitive domains. Their IQ scores were substantially below the average, indicating that it is a more general impairment; this fact is not mentioned by Pinker.

Carr quotes from a subsequent work where he gives further details of Pinker’s inaccuracy. For example, “they do not cite the average verbal IQ of 75.” He also mentions that in Neil Smith’s pro-Chomskyan review, he “acknowledges that there are alternative interpretations of the data”. Once again, this is a cop-out, and as Carr rightly points out, this is not an issue of interpretation, but of whether SLI sufferers do or do not exhibit low IQs, and the aforementioned paper provides the relevant evidence. It might be normal to argue about the interpretation of data, but it certainly is “not normal practice to simply fail to discuss falsifying data”.

He then concludes, on a more general note, that “To the extent that Chomskyans fail to mention or discuss empirical evidence bearing on their claims, they do not deserve to have taken seriously their claim to be engaged in scientific investigation”.

This clearly shows that this deficiency was not unique to language, but instead indicative of a more general impairment. In fact, “it is not even clear that the term is more than a vague syndrome-type cover term for a range of deficits”, and “we are thereby forced to the general conclusion that Specific Language Impairment is not very specific” (quoting Carr again).

The Structure of Words

Firstly, regarding the restriction on inflexional-derivational suffixes, David Crystal expressed the preference of drawing a distinction between a “linguist” and a “linguistician”, such that a “linguist” should refer to a student of the discipline linguistics, whereas a linguistician should refer to someone who is involved in the discipline professionally. If someone had to encounter this distinction without any knowledge of the conventional usage of the term “linguist”, one would accept the terminology without any qualms, especially since a new-comer to the discipline is bombarded with so much jargon that he would expect to come across hitherto unfamiliar words. This example is a clear instance of what Pinker insists should “sound ridiculous” – we have an innate restriction which states that a derivational suffix like –ian can only attach itself to roots, and never to stems. According to Pinker, linguistician should also be an impossible construction which “sounds ridiculous”, since –ian attaches to the non-root stem linguist + ic. Would someone reading this accept it as an English word, or perceive it as something which violates our innate knowledge of word structure? Crystal certainly had no qualms about using it, and when I first read it, it did not even strike me as odd.

Later Pinker says that the restriction is that derivational suffixes cannot attach themselves to inflected roots. This rules out the counter-examples just mentioned, but Sampson refers to a film called Heathers, about a group of girls all called Heather, and asks us to imagine this film having some sort of cult following. Would there be anything wrong in describing this as Heathersian? Or think of a hypothetical spoof of the film, in the same way one could see Austin Powers as having a rather James Bondian paradigm, one could see this spoof as having a… Heathersian paradigm. I do not think this sounds ridiculous at all.

It might be true that we as English speakers prefer not to add derivational suffixes to inflexional suffixes, but there is a much better explanation than appealing to instinctive knowledge of word structure – one that provides an explanation based on culture and history. The –ian suffix is Latin, and words that use this suffix in English are taken from Latin. Later, –ian words were coined that had not existed in Latin, but the people who coined these words tried to make them sound as Latin as possible. Thinking in terms of Latin, it is not clear what it would mean to add an –ian suffix to a word inflected with –s. Hence, there was no possibility of this occurring, given that they were using Latin as their model. Now things have changed. Sampson says that when he went to university in the 1960’s, he had to provide proof of having passed a substantial Latin exam, otherwise his application would not have even been considered. When my father was in school, he was forced to take Latin right up to matric. Now, Latin is no longer seen as the language of the educated. In light of this fact, it seems viable that more words like Heathersian would be accepted, as they would not have to be based on the Latin model.

Pinker happens to be writing at a time when the study of classical languages is recent enough for these regularities to persist, yet obsolete enough for most people not to know the exact reason why there are these regularities amongst –ian words in English. Hence, it seems quite evident that an innate restriction is not what is at work in this instance.

Regarding the way compounds work in terms of headedness and percolation conduits, Pinker uses this explanation to apply to more than one phenomenon. He talks about how irregular verbs trigger “blocking” mechanisms which preclude the application of the rule. The way I understand it, this predicts that either a rule will be generated, or it will be blocked; if it is blocked, then it cannot be generated. For example, consider the incorrect past tense of the verb sing. This will be perceived as a morphologically complex word: sing+ed. When a child encounters the correct past tense, sang, which now has to be rote-learned as a listeme, it “blocks” the application of the “add -ed” rule to this word, since sang and singed both have the same semantic content. Therefore, whereas the past tense may have been taken to be singed, it will now be over-ridden (or blocked) by the new word.
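
As I understand it, the mechanism amounts to something like the following minimal Python sketch. The names LISTEMES and past_tense and the one-word lexicon are hypothetical, chosen only to illustrate the all-or-nothing character of blocking as described above; this is not Pinker’s own formulation.

```python
# Illustrative sketch only: the names and the tiny lexicon are hypothetical.

LISTEMES = {}  # rote-learned irregular past-tense forms


def past_tense(verb):
    """Use a stored irregular form if there is one; otherwise add -ed."""
    if verb in LISTEMES:
        return LISTEMES[verb]  # the stored form blocks the rule
    return verb + "ed"         # the default "add -ed" rule


before = past_tense("sing")    # no listeme yet, so the rule applies
LISTEMES["sing"] = "sang"      # the child encounters the irregular form
after = past_tense("sing")     # the listeme now blocks the rule
```

On this picture the switch is binary: once the listeme is stored, the regular form should never surface again.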

However, this does not happen. There is even a clear correlation between exposure to the irregular form and correct usage of it. Instead of saying that we have a “blocking mechanism” which precludes rules from applying, it makes far more sense to say that this is a result of exposure to stimuli from the environment, which, for example, can be explained by the notion of attestation (Ploch, 2001). Attestation refers to the fact that any form can be tested by a native speaker, and that phonological forms can be added and subtracted from the speaker’s set of attestations. Forms are “more normal” if they are retrieved from long-term memory and put into short-term memory. If this is not done, then the form will be “less normal”; this may occur either because the form is not retrieved, not retrieved properly, or because the form may be entirely foreign altogether. This explanation, besides sounding more plausible from an a fortiori point of view, is also more in consonance with common sense. It seems rather unlikely that this could be explained by a binary mechanism that either triggers a rule or not; to interpret the phenomenon in this way seems incoherent.

Also, with regard to headed compounds, the same criticism applies. Pinker explains this phenomenon in terms of “percolation conduits”, which either allow a rule to “filter” to the top of the morphological node or not. The determining factor here is that of headedness. If the compound has a head, then the rules that apply to that head percolate up the morphological tree and apply to the compound as a whole. If it is headless, then the default rule applies to the compound. How, then, does one account for the fact that some native speakers may say either at different times, and sometimes not even be sure of which one to use? To explain this, Pinker now says it is a matter of exposure, and whether the speaker perceives the compound as having a head or not; and some may not be sure! Surely speaking about innate “percolation conduits” and blocking mechanisms is far less plausible than saying that we employ a more general learning strategy. Clearly then, we are once again not obliged to take this as support for nativism.

The third instance may seem like a more convincing case, but upon analysis is just as vacuous. Despite having tried to do everything to save Gordon’s hypothesis from refutation, it seems like there is still a problem with his claim. The Brown and LOB corpus features a natural-speech sentence which reads as follows:

…the smaller European carriers, who have in the past been strong
opponents of fares-cutting airlines…

The compound we see here is fares-cutting, whose first part is an analytically complex word; this is precisely what Pinker and Gordon would predict not to be possible. According to Sampson, there are more instances of this usage in the UK than in the USA, where Gordon would have been working from, but this only accentuates the fact that it cannot be an innate restriction which guides the formation of these compounds, unless we are prepared to concede that there is some viable explanation regarding this aberration.

In addition to this quite blatant refutation, the validity of Gordon’s experiment can be questioned for other reasons. Firstly, his sample was based on a fairly homogeneous group of children, all of whom were L1 English speakers, and all of whom were Americans. To draw the conclusion that these restrictions are universal to all human beings may be a bit premature. The reliability of this experiment may also be questioned, since there were also no follow-up tests done; it is at least possible that six months later some of the children in his sample may find the word “rats-eater” a well-attested form.

On the Origin of Language

Pinker aims to show that we possess a database embodying knowledge of X-bar syntax, traces, etc., and this must have evolved as a result of Darwinian natural selection. Evolutionary processes could have given rise to a language instinct, but Cowley rightly tells us that “the very idea is muddled”. If this LAD is as technical and detailed as Pinker would require it to be, we would have to ask how an N-bar could count as useful both for the brain and in terms of evolutionary fitness. Why would “networks that represent formal categories” arise through natural selection? As will be mentioned below, Pinker also plays down the fact that evolution works by selecting inheritable variations. As will also be explained below, he “treats natural selection as if it were goal driven”, which is blatantly incorrect.

Mark Turner, in the book I referred to earlier, THE LITERARY MIND, proposes a rather interesting alternative explanation to the origin of language, an explanation which runs counter to the widely accepted Chomskyan claim that it is a result of genetic mutation (or natural selection for Pinker); the spontaneous development of a “grammar organ”. Before we go on to explain Turner’s theory, we would first need to understand some of the basic premises upon which his theory is based.

Chomsky was very emphatic on the point that syntax must necessarily be one of the cognitive primes of the human mind. By that he means that syntax is primary in the sense that it takes precedence over other levels of language structure. In fact, Chomsky even goes so far as to call us “syntactic creatures”, arguing along Aristotelian lines. Turner does not agree with this interpretation. In fact, he is not the only one who proposes that syntax cannot be as primary as Chomsky claims it to be. The psycholinguist, Joseph Kess, for example, explains the fact that when we recall something which was said to us, we tend to focus on getting the meaning across instead of the syntactic structure. In fact, it is almost never the case that we reproduce the exact syntax when relaying information. This shows that semantics takes precedence over syntax, a standpoint to which Chomsky has always been vehemently opposed. As an aside, this is one of the reasons George Lakoff divorced himself from the Chomskyan school in the first place, and Pinker also acknowledges this fact in his latest book, THE STUFF OF THOUGHT.

Likewise, Turner does not endorse Chomsky’s hypothesis, but instead suggests that parable constitutes the primary unit of language – not so much as a structural precedent, but certainly as a temporal and, more importantly, a cognitive one.

Turner defines parable as “the projection of story”. This is a much broader definition than the one conventionally used in the English language, where the word refers to an allegory of some kind used to illustrate a moral, and much narrower than its Latin-Greek origin, which refers to a comparison of some kind. Turner uses his definition to refer to a more general instrument of everyday thought that shows up everywhere, “from telling time to reading Proust”.

Turner wishes to address a fundamental misconception in the minds of the general educated public, viz that the everyday mind has little to do with literature. Although literary texts may be creations of art, the instruments of thought used to invent and interpret them are basic to everyday thought. The mental instrument which Turner refers to as “narrative story” is something which is actually basic to human thinking.

The bulk of Turner’s book is in fact dedicated to illustrating how the processes we go through when analyzing parables are exactly those we use in our everyday lives. For example, the use of mixed metaphors, which Turner refers to as “blending”, is explained as confounding two disparate things, be it physical attributes, mutually exclusive character traits, etc. To illustrate, Turner tells a story in which a donkey concocts a rather sophisticated plan to help his friend the ox. Both the ox and donkey discuss this plan in detail before it is implemented. The details of the actual story are not important. The point is that there is an element of anthropomorphism here, since these otherwise “a-linguistic” creatures are speaking rather eloquently. This is contradictory, since animals cannot communicate with that degree of complexity. Also, donkeys almost certainly do not have the cognitive capacity to plot and plan with any degree of complexity, nor do oxen for that matter; this is also contradictory. Hence, we call this process blending, since we are combining otherwise disparate traits.

This example may be literary in nature, but Turner’s point is that the processes involved in understanding the subtleties of this parable are something everybody goes through every day. To understand a sentence like
Roosevelt accomplished a great deal in his first one hundred days,
but Clinton has accomplished by comparison little
we must build two mental spaces and an intricate comparison between them. We can understand this as a conceptual metaphor according to which accomplishment is travelling along a path; hence, we could understand this more clearly by paraphrasing the above statement as
Roosevelt covered a lot of ground during his first one hundred days,
and Clinton covered comparatively little during his.
So in one blended space, Roosevelt is moving along a path where there are locations, and reaching this location is considered the accomplishment of a goal. In the other blended space, Clinton is starting to move along a similar path, just at a slower rate and reaching fewer locations.

When we say, then, that Clinton should have hit the ground running, we are saying that Clinton should have accomplished in his first one hundred days what Roosevelt achieved in his. Hence, Clinton has failed to “keep pace with” Roosevelt. Now, to “keep pace with” requires a conventional blend that has both presidents competing along the same track, but this is something that we would construct and use entirely unconsciously. We can force this blend into consciousness, so to speak, by saying that “Clinton was in a race with the ghost of Roosevelt”. But we also know unconsciously that this is an imposed construct, and therefore there is a temporal constraint, so there would be something anomalous in saying that “So far, Roosevelt has succeeded in keeping well ahead of Clinton”. The problem with this is that we could only say something like this when it pertains to a real race, not a parabolic blend like this one. However, it is not even as simple as that, since we could actually envisage a possible world where a blend of this sort is viable. Imagine there was a passage in Roosevelt’s diary in which he claims to set a record accomplishment during his first one hundred days of office; and part of his entry states that he will endeavour to set a record for all presidents, past and future. With these premises in mind, it now actually does make sense to say something like “So far, Roosevelt has succeeded in keeping well ahead of Clinton”.

Turner’s point here is that processes like blending are just that, cognitive processes which are not exclusively the product or the property of the literary mind, but of the everyday mind. Even contemporary neuroscience is acknowledging this fact, because at the most basic levels of perception and cognition, understanding and memory, “blending is fundamental”.

Scientific models of thought conventionally start with what they take to be basic, on the tacit assumption that scientists must do first things first. However, it is a well-accepted fact that common sense, or what we perceive with our rather limited senses, may not necessarily reflect reality. Latching on to a common-sense paradigm is analogous to clinging to a Newtonian view of the cosmos. It is not implausible that the concepts behind such models are wrong. Indeed, even a rather perfunctory read through the various discoveries in the physical sciences would confirm this, especially the findings and subsequent implications of quantum mechanics.

It is not implausible, then, that something like imaginative blending and integration are basic; an explanation that cannot handle the dynamics involved in a child pointing to a balloon and saying This is my imagination dog, cannot hope to explain the most ‘basic’ concepts like dog, or the meaning of The dog has four legs.

What Turner refers to as the processes of The Literary Mind, are usually considered not only different, but secondary to the processes of the everyday mind. On the contrary, processes that we have always considered literary are at the foundation of the everyday mind. Literary processes like blending make the functioning of the everyday mind possible. Turner then goes on to use this idea to explain the origin of language.

It is evident that human beings have the mental capacities Turner refers to as “parable”. Now, considering Ockham’s Razor, is it necessary to add something new to parable in order to explain the linguistic mind? Do we need to make the additional hypothesis that special autonomous instructions arose in the genetic material for building an autonomous black box that does the entire job? If parable gives us what we need for a satisfactory explanation, the answer would have to be negative. Cognitive mechanisms whose existence we must grant independently of any analysis of grammar can account for the origin of grammar. The linguistic mind is “a consequence and subcategory of the literary mind”. Let us now consider Turner’s explanation of this hypothesis in more detail.

Stories have structure that human vocal sound per se does not have. Stories have objects and events, actors and movements, viewpoint and focus, image schemas and force dynamics, etc. Hence, parable takes structure from story, which is thereby superimposed on to voice; the structure it creates is what we call grammar, and sentences come from story by way of parable.

Parable draws on the full range of cognitive processes involved in story. This would include spatiality, motor capacities, our sense organs, perceptual categorization and other basic cognitive instruments. Parable draws on all of this structure to create structure for vocal sound. Grammar, built from this structure, coheres with it.

For Turner, grammar would have arisen in a community that already had parable. The members of that community used parable to project structure from story to create rudimentary structure for vocal sound.

Turner asks us to consider an analogy not unlike my example on p. 45, but to illustrate a different point:
Once again, imagine a community of people who have trained themselves in rudimentary martial arts, and suppose all members of the community have this skill. No genetic specialization for martial arts exists; the skill depends instead on pre-existing capacities like muscle control, balance, vision, etc. Once the skill is acquired, it seems like second nature.

Now suppose an infant is born into this community with a little genetic structure (strong bone structure, good vision, etc.) which would help direct these pre-existing capacities to learning this particular form of martial arts. The members train him as normal, but he has a secret edge. And if the community is structured so as to confer reproductive advantage onto those who are more proficient at martial arts, then the community provides an environment of evolutionary adaptiveness for the genetic change. Here, each increment of further genetic specialization brings an increment of reproductive advantage. However, in this scenario, martial artistry arose without genetic specialization for it.

Along the same lines, now imagine a community of people who use parable to create rudimentary grammatical structure for vocal sound. Everyone in this community develops story and projection, has voice, receives training from his parents, and is assimilated into the work of creating grammar through parable. Now suppose a special infant is born with just a little genetic structure that helps him project story onto voice. The members (or his parents) train him as normal, but the child has a secret edge. If the community is structured so that greater facility with grammar confers reproductive advantage, then the community provides an environment of evolutionary adaptiveness for genetic change. This situation could plausibly give rise to a kind of genetic arms race in which each increment of further genetic specialization brings an increment in relative reproductive advantage. However, in this scenario, grammar arose without a genetic instruction for grammar – it arose by parable.

Let us consider a basic example of how story is projected to create grammar. Consider the small spatial story in which Chomsky throws a stone and Pinker flips a coin. These are both instances of the same basic abstract story.

This abstract story has certain kinds of structure. One kind of structure it possesses is distinction of certain elements. There is Chomsky, the act of throwing, and the perception of a stone; conceptually, we distinguish these three elements.

These elements have category structure. Chomsky, for example, would be placed in the category of animate agents; throwing is placed into the category of events; stone could be placed into the category of objects.

The story also has combinatorial structure. The distinguished elements of the story include Chomsky, the stone, the causal relationship between Chomsky and the movement of the stone, the event shape of the throwing, and our temporal viewpoint with respect to the throwing; all of which are combined as simultaneous: the act of throwing involves all of them at once. More importantly, this combination also has hierarchical structure, as it must in order to explain its universal occurrence.

So the abstract story has certain kinds of structure: reliable distinction of elements, distribution of elements into categories, simultaneous combination, hierarchy, etc. Other basic stories also show recursive structure: If Lakoff catches the stone that Chomsky threw, then one story (Chomsky throws a stone) feeds into a second story (Lakoff catches the stone).

Vocal sound per se does not have this structure. The elements of the story have structure, but the actual sound is merely a continuous stream which can be divided in any number of ways. The elements of the story also have reliable hierarchical structure. The elements are joined but conceptually distinct – this is something which the sound does not mirror. The elements of the story also have category structure which the sound does not mirror. If Chomsky throws, Pinker pushes, Jackendoff tosses, etc., then Chomsky, Pinker and Jackendoff all belong to a certain category in light of that, which is not at all evident from its phonological form. Also, the causal structure has nothing to do with the causal structure of the vocal sound. The temporal structure of the sound is always linear, whereas the temporal structure of this story involves highly complex simultaneity. So story and phonological form are two different things. Story structure is projected to create structure for vocal sound, which the latter does not intrinsically have.

The distinction of elements in the abstract story is projected to make “Chomsky”, “throws” and “a stone” precisely distinct not as sound but as grammatical elements. The category structure in the abstract story is projected to vocal sound to fit “Chomsky” and “a stone” into the same grammatical category: NP. Once again, the phonological form gives us no reliable way of putting them into categories. The different roles of Chomsky and the stone are projected to give these different grammatical relations (subject vs. object) and different semantic roles (agent vs. patient). The structure of temporal foci and viewpoint in the story is then projected to give the sentence tense.

For Turner, then, grammar is not the beginning point of language; parable is. Grammar arises from conceptual operations. Rudimentary grammar is a repertoire of related grammatical constructions established through parable. The backbone of any language consists of grammatical constructions that arise by projection from basic abstract stories.

Turner then goes on to explain how rudimentary language can be extended by other processes like metaphorical blending, resulting in the vast and intricate phenomenon we call language.

Turner’s view of language as arising through parable would allow for diversity of languages and grammars. Details of narrative structure vary from culture to culture and even person to person; what is universal is not all the specific narrative structures but rather stability of basic abstract stories. All cultures have stable repertoires of basic abstract stories. Some of them vary from culture to culture. Projection is widely variable in the actual structures it projects and the ways in which it projects them. Even when two different languages project the same basic abstract story, thereby giving rise to similar grammatical constructions, the details will often be strikingly different. In Latin, word-order does not matter; in English it does. Yet in both languages narrative relations project to create grammatical structure; it just so happens that in Latin it is a grammatical structure of case-endings in nouns, whereas in English the result is a grammatical structure of word-order.

This explains the diversity and universality of linguistic phenomena in a way that the Principles and Parameters theory fails to. Turner’s hypothesis that grammar arose by parable is not at odds with Sampson’s theory. What it is at odds with is the Chomskyan conception that grammar arose because there evolved, with no help from natural selection, and no help from pre-existing capacities, a species-specific, modular LAD that was specialized for grammar, just as the heart evolved from natural selection and became specialized for its functions.

From a Popperian point of view, a hypothesis of this sort is not tenable. It trades Ockham’s Razor for God’s Magic Hat. Against all odds, it makes the most cosmic and all-embracing extra hypothesis imaginable so as to solve everything at once.

This is a problem which Pinker actually acknowledges. Hence, he tries to explain the origin of language by natural selection. This may seem prima facie to put Pinker even further away from Turner than Chomsky, but if we look carefully at Pinker and Bloom’s practical attempts to explain how natural selection could have produced a genetically coded mental organ, Pinker actually implicitly embraces an account of language origins in which grammar begins from meaning. Pinker & Bloom say: “Language is a complex system of many parts, each tailored to mapping a characteristic kind of semantic or pragmatic function onto a characteristic kind of symbol sequence.”

Mapping is the critical term and concept in this assertion. It is usually the critical concept in any explanation of grammar as “encoding” something else, “signalling” something else, “mapping” certain structures, etc. However, Pinker does not have anything to say about the role of mapping. Mapping (which is similar to what Turner calls “projection”) is a mental competence. It does not come for free in an explanation. It is actually the principal process to be explained. To speak of mapping is to make a theoretical commitment to a mental capacity for projection; Pinker and Bloom give various examples of this kind of mapping throughout their paper.

Their explanation depends upon the existence of a mental capacity to project one kind of structure onto something entirely different. Pinker assumes that the mental capacity for projection precedes grammar, but underplays the importance and complexity of this mental capacity by perfunctorily alluding to it; this mental capacity is what principally needs explaining in an account of grammar.

Chomsky’s argument is weak because it asks us to accept an almost inconceivably unlikely event in the absence of any evidence for that event. Pinker’s argument is weak in a different way: it skips briefly and vaguely over its central step. For natural selection to be responsible for the origin of grammar, we must have two events: some genetic structure must arise to result in a trait of minimal grammar, and this trait must occur in an environment in which it confers reproductive advantage. Of course, this environment cannot be one where a grammatical community already exists. This first event is not hard to imagine, even though there is no compelling evidence for it. Let us now consider the second step, regarding the conferring of reproductive advantage.

Parable is a basic capacity of human beings. In a community where grammar arose by parable, grammatical speech would be a highly useful cognitive and cultural art. In this community, language would certainly confer reproductive advantage, making genetic specialization for language adaptive. In Turner’s scenario, the origin of (rudimentary) grammar happens before any genetic specialization for grammar. The grammatical expressions produced by the lone first genetically grammatical person are parsed at least in part as grammar by members of the community who have no special genetic instruction for doing so, but use parable to do so.

This is where Pinker faces a challenge. What survival advantage would the first person with language have? We know that natural selection is blind, and is only concerned with the here and now. The scenario to which Pinker is committed is not viable in light of this, whereas Turner’s is. Grammatical processing might assist this lone person’s memory, reasoning, or imagination, and so be adaptive in this very indirect way. But Pinker is precluded from proffering an explanation along these lines since he advocates the notion that grammar is modular and therefore autonomous from other cognitive faculties. Pinker would also not be able to offer a scenario in which members of the community (who have not evolved the relevant genetic instruction) use parable, together with other cognitive processes, to recognize and parse the grammatical utterances of our hypothetical first lone linguistic being.

Pinker provides no alternative to the scenario in which the adaptiveness of genetic specialization for grammar depends upon the presence of a grammatical community. All of his speculations concerning reproductive advantage depend upon a community of speakers with rudimentary grammar. Hence, there is no way in which a solitary grammatical person would have a reproductive advantage in a community whose other members are a-linguistic. The utterances directed at him would be a-grammatical; his own grammatical utterances would have no audience; communication would not be impossible, but there is no additional advantage conferred by the grammatical component.

Pinker is actually not unaware of this problem, and he tries to explain it away by suggesting that “One possible answer is that any such mutation is likely to be shared by individuals who are genetically related. Because much communication is among kin, a linguistic mutant will be understood by some relatives…”

This is nothing more than a cop-out, since it is not true that such a mutation is likely to be shared by individuals who are genetically related. Consider once again the first genetically grammatical person. None of his ancestors or siblings is genetically grammatical, ipso facto. If the genetic material expressed in his grammatical competence arose by mutation from his parents’ genetic material, it is very unlikely that his kinsmen would have that mutation too. If it arose because error-free sexual recombination of his parents’ genetic material finally put together the right package, then later siblings would receive quite different genetic packages, especially if they do not have the same parents (i.e., a different father or mother). There is also the important difficulty that even if a later sibling had the right package, the first genetically grammatical infant would nonetheless still live in “grammatical isolation”, so to speak. Conventional Mendelian genetics clearly states that mutations which have survival value are passed on to succeeding generations, not to siblings; in other words, mutations conducive to increased chances of survival are passed on vertically, not horizontally. According to Cowley, who concurs with Turner, what needs to be explained is what “reproductive benefits [are] accrued to individuals who…drop (or insert) pronouns”, and unless this is accounted for, it remains much more likely that talking co-evolved with “the neurophysiological mechanisms and cognitive skills typical of primate social behaviour”.

If evolution could think ahead, it would certainly see that producing a genetically grammatical community would be enormously useful to the members of that community; and if that were the case, Pinker’s theory would have some viability. But just about every exponent of Darwinian evolution agrees that evolution cannot think ahead. Chomsky is right in saying that we can tell the story as we like, but he is wrong in saying that we ought not to try to provide one; we should look at these “stories” in light of our scientific paradigms and evaluate them logically. A good natural selection story for the origin of language must show that the first genetically grammatical human being had an immediate reproductive advantage. If he was born into a community whose members had, by virtue of parable, a minimal capacity for grammar, then the immediate reproductive advantage is obvious. Pinker is obliged to show immediate benefit to the first genetically grammatical person who is born into a community none of whose other members is endowed with grammatical competence, and who lack the ability to recognize even the existence of grammar, let alone parse it.

It is my contention that Pinker has known his position to be untenable all along, and finally realised that he would have to radically revise his ideas if he were to maintain his standing in the scientific world. As mentioned, he tacitly rejects many of his previous ideas by embracing the fundamental tenets of Cognitive Linguistics. He also no longer refers to himself as a psycholinguist (in THE LANGUAGE INSTINCT, he specifically defined the word to refer to himself! – “...a psychologist by training, who studies how people understand, produce or learn language”), but actually explicitly says that he does not consider himself a linguist: he’s a cognitive psychologist. Calling himself a cognitive linguist would not be ... very convenient, and just about any linguist would know why.
Anyway, that’s a topic for another discussion.

My point was that the view that rudimentary grammar arises from parable provides an appropriate environment of evolutionary adaptiveness of the sort that any good natural selection story must have.

Admittedly, Turner’s account does not preclude the view that humans living now have some genetic specialization for grammar, but as already stated, this must be decided independently on empirical grounds.

Be that as it may, Turner’s story reverses the conventional Chomskyan view that out of syntactic phrase structures one builds up language. With story, projection, and their combination in parable, we have a cognitive basis from which language can originate. Language follows from these mental capacities as a consequence.

Converging evidence leads us to believe that claims about natural selection “provide no reason to hypothesise that neural networks embody special, language-specific algorithm-like forms”, according to Cowley.

Linguistic Relativity

This strand of the argument is dominated by both straw-man and ad hominem arguments in a way that the other arguments are not.

When I first learnt about Pinker’s version of this theory, even as a freshman, I felt very uncomfortable with what I heard, as it was quite evident that there was some propaganda at play. When I ventured to ask Dr Norine Berenz, the lecturer, if Whorf really believed in determinism the way Pinker portrayed it, she said that she had not checked Whorf’s original works, but that I should, and then conceded that Pinker is certainly guilty of rhetoric bordering on propaganda. I made a resolution then to cross-check Pinker, and having done so, it is clear that there is once again blatant misrepresentation.

Regarding Pinker’s ad hominem argumentation, Sapir is conceded to be a brilliant linguist, whilst Whorf is (merely) "an inspector for the Hartford Fire Insurance Company and an amateur scholar of Native American languages" (my italics). One plausible reason why Pinker would think these facts relevant is to make us believe that Whorf’s academic reputation is questionable, and therefore we should not take him seriously. He then goes on to say that "the more you examine Whorf's arguments, the less sense they make", followed by an analysis of Whorf's "empty gasoline drums", an example which invents facts and includes information which Whorf did NOT himself state, such as the worker tossing a cigarette into an “empty” drum filled with gasoline [sic] because his language led him to believe he could do so without any harm. This is an inaccurate representation of what Whorf said because he only talked about behavior in general around the full and empty drums. Whorf merely pointed out, based on his experience as a fire insurance investigator, that calling drums “empty” led workers to believe that they were, which is why they would see nothing wrong with tossing a cigarette butt into them. Whorf merely uses this fact as a starting point to ask the question: do behavioural patterns reflect linguistic patterns? When put this way, it certainly does not seem like an outlandishly vacuous endeavour. In a discussion of Hopi time, we find the most virulent attack of all: "No one is really sure how Whorf came up with his outlandish claims, but his limited, badly analyzed sample of Hopi speech and his long-time leanings toward mysticism must have contributed" (my italics).
Once again, these “outlandish” claims are inaccurate, making Pinker’s claim outlandish, and mentioning Whorf’s alleged association with mysticism is irrelevant since we should be focusing on whether his actual arguments make sense empirically and logically, using an accepted scientific paradigm, like Popper’s. Whorf does point out that our own notions of flowing time and static space are “mystical” to the Hopi, in whose language time disappears and space is altered, so that it is no longer the homogeneous and instantaneous timeless space of our supposed intuition or of classical Newtonian mechanics. Whorf’s actual field work was not only with Hopi, but he also researched Mayan writing systems, and did semantic analyses on language data collected by other linguists. His examples are often from Hopi and Shawnee, and he observed that Hopi seems to have little reference to time, and then merely posed the question: could it be that different, yet equally valid, representations of time are possible? In fact, there is nothing outlandish about a claim of this sort, and it is something worth thinking about, since quantum physicists have been telling us for almost a century that our particular cultural notion of time is a linguistic construct, and anyone who reads popular science books, like Stephen Hawking’s A BRIEF HISTORY OF TIME, would know this. Of course this fact does not make Whorf infallible, since everything is subject to empirical verification, but certainly dismissing Whorf’s thesis as outlandish prima facie should not be deemed acceptable in academia.
Pinker refers to “The linguistic determinism hypothesis closely linked to the names Edward Sapir and Benjamin Whorf"; and the use of passive voice is very telling here, since we cannot help but wonder who the deleted agent is. Either it is Pinker himself, or the camp of academics representing nativism, who essentially created, developed and promulgated this strawman “Linguistic Determinism” argument in the first place. Whorf did formulate a principle of linguistic relativity, but neither he nor Sapir ever formulated the Sapir-Whorf Hypothesis they are so indelibly linked with in Pinker’s book. Who is Pinker fighting here, when even Whorf and Whorfians agree that linguistic determinism is wrong? Actually, it is not implausible to assume that the distinction between Relativity and Determinism was something put forth by nativists like Pinker, since no distinction of that sort is to be found in the original writings of either Sapir or Whorf. Pinker does not understand that Whorf generally argued from a systems perspective inherited from quantum physics, where his arguments make sense, not from the Newtonian perspective, where monocausal determinist arguments make sense. In fact, Whorf would have been writing at a time when the implications of Einstein’s theories of relativity were just beginning to astound the academic world.
Finally, let's examine Pinker's colour words and Hopi time arguments. In showing the obvious absurdity of language having anything whatsoever to do with influencing perception, he contrasts the way physicists and physiologists look at colour: while to the former colour is a continuous wavelength dimension without our familiar delineations, to the latter it is a matter of three kinds of cones in the eyes wired to neurons, etc. "No matter how influential language might be, it would seem preposterous to a physiologist that it could reach down into the retina and rewire the ganglion cells."
Once again, this is a strawman argument: who actually said that language reaches down into the retina and rewires the ganglion cells? I e-mailed Pinker about the obvious absurdity of this claim, inviting him to respond with either some references or justification, but he never did. Granted, maybe he was too busy, or maybe he just did not want to reply, but it is worth noting that he replied quite readily to my earlier more sycophantic and adulatory e-mails! The very claim is so preposterous that one wonders why argumentation along these lines is still acceptable.


