Concordancing for English Language Teachers
Paper presented at the annual session of TESL Manitoba
February 15, 1999
Winnipeg, Manitoba, Canada
Garry N. Dyck
English Language Centre
The University of Manitoba
Abstract: Concordancing is a method of analysing language by studying structures found in effective communication. The process allows for the study of large bodies of text called corpora with a computer program. These studies may include word usage patterns and grammatical structures. This methodology is particularly useful for ESP instructors as the corpora can be limited to the target English (e.g., academic English). This paper will explore possible uses of concordancing by English language instructors. Select concordancing programs will be discussed as well as sources of corpora on the Internet.
There are three ways in which English language teachers can respond to questions concerning grammar and word meaning. First, a language teacher can give a prescriptive answer. By this method, the student is given an answer drawn from an expert grammarian or from a dictionary. Although this method is very fast, it is not very communicative or interactive. Nevertheless, some students prefer a quick answer, as demonstrated by their persistence in using bilingual dictionaries. Indeed, in situations where the answer to a question could detract from a larger learning objective, a prescriptive answer may be desirable.
In contrast to the prescriptive method, the second and third methods are both descriptive. These methods assume that the teacher is also studying the language and is an expert in English. Based on rational, introspective linguistics, the second approach, which I call the introspective approach, focuses on the intuition and training of the English language teacher. In this method, the language teacher decides if a particular word or expression "sounds" or "looks" like the language used in natural speech. The teacher then describes language qualitatively, using the data provided by a local sentence but relying on intuition and training.
Ordinarily, the prescriptive method should provide support for the introspective method and vice versa. A native English teacher intuits that an expression is inappropriate and then consults a grammar book or dictionary to verify the intuition. Conversely, a grammar book or dictionary may prescribe a particular rule or definition and based on intuition and training, the teacher is able to provide examples and teaching activities. In some cases, however, the two methods may not agree. An outdated book may not agree with the more current experience of the teacher. A book from another English dialect region may also not agree with the local experience of the teacher. In such cases, the teacher may be hard pressed to justify a particular intuition.
The third method, and the focus of this paper, is concordancing, which is a part of corpus linguistics. Concordancing is also a descriptive method, but rather than focussing on the data of a single expression, as the introspective approach does, it locates the expression in a large number of contexts and examines it quantitatively. In this way, meaning and grammar can be discovered by an examination of the trends and patterns in the examples. In contrast to the introspective method, there is a lesser focus on the teacher's intuition and a greater focus on the data.
What is Concordancing?
In 1897, Käding wrote about his work in determining German spelling conventions by establishing a corpus, or large body of text, comprising 11 million words and requiring 5000 technicians to analyse (in McEnery & Wilson, 1996). This type of study, based in corpus linguistics, has become increasingly popular since the 1950s. Computers have made the creation of a corpus, as well as its analysis, much quicker and less labour intensive. A modern concordancing effort, also based in corpus linguistics, requires a large computer-readable corpus and a concordancing program. The corpus may include a variety of genres of English or may focus on one type of English. The concordancing program completes a number of tasks on the selected corpus. The concordancer can find a selected word and list sentences or portions of sentences containing that word, a display called key word in context (KWIC). It can also identify collocations, or words most often found together with the key word. These results can show students patterns in sample sentences of real language.
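The two tasks described above, the KWIC display and the collocation count, can be sketched in a few lines of Python. This is only a minimal illustration and not the method of any program named in this paper: the function names (tokenize, kwic, collocates), the simple tokenizing rule, and the sample text are all assumptions made for the example.

```python
import re
from collections import Counter

def tokenize(text):
    """Split text into lowercase word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def kwic(tokens, keyword, width=4):
    """Return (left context, key word, right context) for every hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

def collocates(tokens, keyword, span=2):
    """Count the words occurring within `span` tokens of the key word."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts

# Hypothetical sample text standing in for a real corpus.
sample = ("The results were poor. Nevertheless, the team continued. "
          "It was late; nevertheless the work went on.")
tokens = tokenize(sample)

# Print the hits with the key word aligned in a centre column.
for left, key, right in kwic(tokens, "nevertheless"):
    print(f"{left:>30}  {key}  {right}")

# Show the three most frequent neighbours of the key word.
print(collocates(tokens, "nevertheless").most_common(3))
```

A full concordancer would also handle punctuation, sentence boundaries, and sorting of the context columns; the sketch ignores these in order to keep the central idea visible.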
Before spending money on a concordancer and time collecting a corpus, it may be easier to begin by examining a concordancing program on the Internet. The Cobuild site (http://titania.cobuild.collins.co.uk/) provides access to a corpus of over 330 million running words. Although full access to the corpus requires a paid subscription, it is possible to obtain sample sentences of text for a given word or phrase. In addition, the British National Corpus (BNC), a corpus of over 100 million running words, also allows for sample concordancing. The sample BNC concordancer, which requires a version 4.0 or higher web browser (either Netscape or Internet Explorer), is linked to the Cobuild page but has the following direct address: <http://sara.natcorp.ox.ac.uk/lookup.html>. The form allows for single words, phrases, and wild cards.
For example, if I select the word nevertheless, the results begin with an indication of the number of times that this word appears in the BNC. In my search of the BNC, the word nevertheless appears 7,236 times in a corpus of 100 million words; however, only fifty randomly selected sentences are provided. Each sentence is preceded by a three character code hyperlinked to a key. The key provides the bibliographic reference together with the number of sentence units and the number of words found in that document. These sentences may come from a variety of texts including both British and American English, textbooks, newspapers, and novels, as well as transcripts of oral discourse.
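The arithmetic behind such a result can be made explicit. The sketch below, which is an illustration under stated assumptions and not the BNC's own software, converts a raw hit count into a rate per million words and draws a random sample of matching sentences. The function names per_million and sample_hits, and the seed parameter, are hypothetical; the corpus size is taken as the round figure of 100 million words.

```python
import random
import re

def per_million(hits, corpus_size):
    """Normalise a raw hit count to a rate per million running words."""
    return hits * 1_000_000 / corpus_size

def sample_hits(sentences, keyword, k=50, seed=None):
    """Return (total hits, up to k randomly chosen sentences with the key word)."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    hits = [s for s in sentences if pattern.search(s)]
    rng = random.Random(seed)  # a fixed seed makes the sample repeatable
    return len(hits), rng.sample(hits, min(k, len(hits)))

# 7,236 occurrences in a corpus of 100 million words.
print(per_million(7236, 100_000_000))  # 72.36
```

Expressing the count as a rate (about 72 occurrences per million words) makes it possible to compare the frequency of a word across corpora of different sizes.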
Although the word nevertheless may be found in a variety of texts, the choice of word selected for a search may dictate a particular genre of text. For example, the word downsize would tend to be found in business related material. Furthermore, certain types of idioms would result in material that is less than formal, although with idioms, literal and idiomatic meanings have to be distinguished manually. (For example, a search of the idiom get a grip will result in both literal and idiomatic uses.)
The Internet also provides useful information for teaching English based on the results of concordancing. These can lead to classroom activities. From the Cobuild home page, Wordwatch provides corpus based answers to questions of syntax and semantics and Corpus Competition provides a list of sentences with the same word missing in each sentence. In this latter activity, the learner must identify the missing word from context. Inspired by Cobuild's Wordwatch, Tim Johns of Birmingham University's School of English has created Kibbitzer on his EAP page at the following address: <http://sun1.bham.ac.uk/johnstf/timeap3.htm>. The Kibbitzer focuses on corpus linguistics but includes introspective linguistics. Johns' article, Five Reporting Verbs in Nature, (found at the same address) is especially significant in EAP teaching. Other classroom activities may be found by linking from the above page to Johns' Virtual DDL Library.
Another type of sample concordancer may be downloaded from web sites and used on selected corpora. These are often demo versions of full concordancing programs which are discussed in the next section.
Although using a sample concordancer may serve some purposes, it is also useful to have a complete concordancer to use on a specific type of text for EAP students. For Macintosh users, Ball (1998) recommends two freeware programs: Conc, which is downloadable from the Summer Institute of Linguistics (http://www.sil.org/computing/conc/conc.html), and FreeText Browser, available from the UMich Mac Hypercard archive (http://ftp.sunet.se/pub/mac/umich/hypercard/organization/). Another freeware option is the Mac/Windows Shoebox, also from the Summer Institute of Linguistics (http://www.sil.org/computing/shoebox.html). For DOS based machines, Stevens (1995) suggests simple programming, and for Microsoft Word, Tribble and Jones (1997) suggest a rather complicated macro. For the computer literate, Biber, Conrad and Reppen (1998) suggest several reasons for writing one's own concordancing program.
Commercial versions, most of which have a downloadable demo version, are also available. McEnery and Wilson (1996) and Biber, Conrad, and Reppen (1998), in their books on corpus linguistics, list several concordancing programs. Both of these books include the following commercial programs for Windows or DOS: LEXA, Longman Mini Concordancer, MicroConcord, TACT, Wordcruncher. Johns (1998) recommends the use of Mike Scott's WordSmith Tools distributed by Oxford University Press. The examples which follow are from this concordancing program.
Creating a Corpus
According to McEnery and Wilson (1996), "a corpus in modern linguistics, in contrast to being simply any body of text, might more accurately be described as a finite-sized body of machine-readable text, sampled in order to be maximally representative of the language variety under consideration" (page 24). Tribble and Jones (1997) suggest that this definition be modified depending on the purpose for concordancing. Concordancing in order to determine word frequency in a particular genre would require a relatively large corpus. Such a corpus would be different from one where the purpose of the corpus is to create a vocabulary list for a particular novel or series of novels. How a corpus of text is different from a collection of text was recently discussed on corpora-l with no definitive conclusions. Indeed, the exact nature of an adequate corpus is discussed extensively in the literature (Biber, Conrad & Reppen, 1998; McEnery & Wilson, 1996; Tribble & Jones, 1997; Kettemann, 1998; Ball, 1998; Vihla, 1998; Stevens, 1995; Chandler-Burns, 1995; Rundell, 1996). The common thread in such discussions is that the purpose of the corpus should be foremost in any criteria for creating a corpus. Furthermore, any conclusions based on a specific corpus should always recognize the nature of the text that corpus contains.
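The two purposes mentioned above, determining word frequency and creating a vocabulary list, can be illustrated with a short Python sketch. It is a minimal example under assumed names (word_frequencies, vocabulary_list) and a hypothetical sample text; a real study would read the corpus from files and would need a more careful definition of a word.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count how often each lowercase word form occurs in the text."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(tokens)

def vocabulary_list(text, min_count=1):
    """Alphabetical list of word forms occurring at least min_count times."""
    freqs = word_frequencies(text)
    return sorted(word for word, n in freqs.items() if n >= min_count)

# Hypothetical sample text standing in for a novel or a genre corpus.
text = "The cat sat on the mat. The dog sat too."
freqs = word_frequencies(text)

print(freqs.most_common(2))                # the two most frequent word forms
print(vocabulary_list(text, min_count=2))  # forms that occur at least twice
```

The same code serves both purposes; what changes is the corpus. A frequency claim about a genre is only as reliable as the size and representativeness of the text that is counted.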
Although a growing number of commercial corpora may be purchased, they are often costly and may not meet the specific needs of an ESP situation. For example, sample texts used in an EAP class should be academic in nature (Dyck, 1995). Rademann (1998) has developed a corpus based on online electronic newspapers. He argues that newspapers reflect a common standard of English and that the electronic newspapers provide selected articles in the required electronic format over the Internet. Based on this idea, I was able to develop, in a relatively short period of time, a corpus of over 100,000 words taken from editorials of three Canadian national newspapers (The Globe and Mail, The National Post, and ChristianWeek). These are presently being used to analyze the language used to express an opinion.
Johns (1998) proposes that a corpus be built based on an electronic encyclopedia. Articles in a specific area can be selected and then cut and pasted into a text document to form a corpus. To show that the writing in an encyclopedia is not narrow in style, Johns determined that there are four types of writing genres found in an encyclopedia: descriptive essay, process description, physical descriptions, and biographies. Certainly other types of electronic text are available for a variety of ESP needs. ESP teachers should be encouraged to organize corpora specific to their students' learning goals.
Cobb (1997) determined that "a small but consistent gain was found for words introduced through concordances" (page 301). This kind of learning, which Johns appropriately refers to as data-driven learning, is most often teacher initiated but may in some instances be student initiated. This section will provide a small sampling of activities, all of which are teacher initiated.
As suggested by the Cobuild Web site, run a concordancer on a particular word and then delete the key word. Students should then try to guess the key word from context. This will encourage the students to determine the word from context and not by using the dictionary.
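Deleting the key word by hand is tedious for a long list, and the step is easy to automate. The following Python sketch is one possible way to do it, assuming the concordance lines are available as a list of sentences; the function name gap_fill and the sample sentences are hypothetical.

```python
import re

def gap_fill(sentences, keyword, blank="_____"):
    """Keep only sentences containing the key word and blank it out."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    return [pattern.sub(blank, s) for s in sentences if pattern.search(s)]

# Hypothetical sentences standing in for concordancer output.
sentences = [
    "The plan was risky; nevertheless, they agreed.",
    "She was tired.",
    "Nevertheless, the results were clear.",
]

for line in gap_fill(sentences, "nevertheless"):
    print(line)
```

The match ignores case so that a sentence-initial key word is also removed, and the word boundaries prevent a short key word from being blanked inside a longer word.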
Create a KWIC list for a particular word and have students determine the preposition that precedes or follows that key word (Tribble and Jones, 1997). A concordance search could also produce lists of contexts for various verb tenses which could be compared in their contexts (Kettemann, 1998).
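A teacher preparing this activity may wish to check in advance which prepositions actually follow the key word in the corpus. The Python sketch below counts them. It is an illustration only: the function name following_prepositions, the short preposition list, and the sample text are assumptions, and the list is far from exhaustive.

```python
import re
from collections import Counter

# Illustrative subset only; a real activity would need a fuller list.
PREPOSITIONS = {"about", "at", "by", "for", "from", "in", "of", "on", "to", "with"}

def following_prepositions(text, keyword):
    """Count the prepositions that occur immediately after the key word."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for current, nxt in zip(tokens, tokens[1:]):
        if current == keyword and nxt in PREPOSITIONS:
            counts[nxt] += 1
    return counts

# Hypothetical sample text standing in for a corpus.
text = ("Students depend on teachers. Prices depend on demand. "
        "They depend heavily on data.")

print(following_prepositions(text, "depend"))
```

Only the word immediately after the key word is checked, so the third sample sentence, where an adverb intervenes, is not counted. Such cases are better left for students to notice in the KWIC list itself.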
Words with more than one meaning
Create a KWIC list of a word that has more than one meaning and have students determine that more than one meaning exists and what those meanings are. This will work particularly well with words that appear in idiomatic and literal expressions.
Create KWIC lists for two or more words with a similar meaning; for example, create lists for over and above, or for see, look and watch, or for say and talk and have students discuss in small groups the unique characteristics of each term (Tribble and Jones, 1997). Using the same lists, delete the key words and have students determine from context which of the words would best fit the sentence.
Concordancing is an effective tool for the language teacher. The teacher through introspection clarifies the information provided by the data in the concordance. The data alone is an ineffective teacher. A combination of qualitative and quantitative information will assist the student in learning more effectively and perhaps more independently.
What makes any concordancing activity unique is its corpus. General corpora can be purchased; however, ESP corpora need to be developed, and ESP instructors are best able to create corpora for specific purposes.
References
Ball, C. N. (1998). Concordances and corpora. [An online tutorial] 18 June 1998 <http://www.georgetown.edu/cball/corpora/tutorial1.html>, </tutorial2.html/>, and </tutorial3.html>
Ball, C. N., & Taylor, K. B. (1995). MicroConcord and corpus collections. [Hypertext preprint; published in Computers and the Humanities, 1995] 17 November 1998 <http://www.georgetown.edu/cball/preprints/microconcord.html>
Berglund, Y. (1997). Future in present-day English: Corpus-based evidence on the rivalry of expressions. ICAME Journal 21, 7-19. <http://www.hd.uib.no/icame/ij21/>
Biber, D., Conrad, S., & Reppen, R. (1998). Corpus linguistics: Investigating language structure and use. Cambridge: Cambridge UP.
Chandler-Burns, R. M. (1995). English corpora for science and technology. In T. Orr (Ed.), English for science and technology: Profiles and perspectives (pp. 10-15). Aizuwakamatsu, Japan: U of Aizu.
Cobb, T. (1997). Is there any measurable learning from hands-on concordancing? System 25 (3), 301-315.
Dagneaux, E., Denness, S., & Granger, S. (1998). Computer-aided error analysis. System 26 (2), 163-174.
Dyck, G. N. (1995). Using journal articles to teach research writing. In T. Orr (Ed.), English for science and technology: Profiles and perspectives (pp. 27-30). Aizuwakamatsu, Japan: U of Aizu.
Flowerdew, L. (1998a). Integrating 'expert' and 'interlanguage' computer corpora findings on causality: Discoveries for teachers and students. English for Specific Purposes 17 (4), 329-345.
Flowerdew, L. (1998b). Corpus linguistics techniques applied to textlinguistics. System 26 (4), 541-552.
Grabowski, E., & Mindt, D. (1995). A corpus-based learning list of irregular verbs in English. ICAME Journal 19, 5-22. <http://www.hd.uib.no/icame/ij19/>
Johns, T. F. (1998). Improvising corpora for ELT: Quick-and-dirty ways of developing corpora for language teaching. 6 August 1998 <http://sun1.bham.ac.uk/johnstf/palc.htm>
Kettemann, B. (1998). On the use of concordancing in ELT. 10 June 1998 <http://gewi.kfunigraz.ac.at/~ketteman/conco.html>
Kettemann, B. (1998). Concordancing in English language teaching. 3 November 1998 <http://www-gewi.kfunigraz.ac.at/ed/project/concord1.html>, </concord2.html/>, and </concord3.html/>
Lager, T. (1995). A logical approach to computational corpus linguistics. Doctoral dissertation, Göteborg University, Sweden.
McEnery, T., & Wilson, A. (1996). Corpus linguistics. Edinburgh: Edinburgh UP.
Paltridge, B. (1995). Analyzing genre: A relational perspective. System 23 (4), 503-511.
Qiao, H. L., & Sussex, R. (1998). Using the Longman Mini-Concordancer on tagged and parsed corpora, with special reference to their use as an aid to grammar learning. System 24 (1), 41-64.
Rademann, T. (1998). Using online electronic newspapers in modern English-language press corpora: Benefits and pitfalls. ICAME Journal 22, 49-74. <http://www.hd.uib.no/icame/ij22/>
Rundell, M. (1996). The corpus of the future, and the future of the corpus. Paper presented at New Trends in Reference Science, Exeter. 19 October 1998 <http://www.ruf.rice.edu/~barlow/futcrp.html>
Stevens, V. (1995). Concordancing with language learners: Why? When? What? CAELL 6 (2), 2-10.
Swales, J. M. (1990). Genre analysis. Cambridge: Cambridge UP.
Swales, J. M., & Feak, C. B. (1994). Academic writing for graduate students. Ann Arbor MI: U Michigan P.
Thurstun, J., & Candlin, C. N. (1998). Concordancing and the teaching of the vocabulary of academic English. English for Specific Purposes 17 (3), 267-280.
Tribble, C., & Jones, G. (1997). Concordances in the classroom: A resource guide for teachers (2nd ed.). Houston TX: Athelstan.
Vihla, M. (1998). Medicor: A corpus of contemporary American medical texts. ICAME Journal 22, 73-80. <http://www.hd.uib.no/icame/ij22/>