Using Corpus Analysis Software to Analyse Specialised Texts
1. What is a corpus?
In corpus linguistics, a corpus can generally be defined as ‘a collection of naturally-occurring texts in a computer-readable format which can be retrieved and analyzed using corpus analysis software’ (Kennedy, 1998; McEnery & Wilson, 2001; O’Keeffe, McCarthy, & Carter, 2007; Teubert & Cermakova, 2007).
2. Sources of language corpora and corpus analysis software
· http://www.natcorp.ox.ac.uk/
· http://corpus.leeds.ac.uk/protected/query.html
· http://lextutor.ca/conc/eng/
· ‘AntConc’ (http://www.antlab.sci.waseda.ac.jp/software.html)
· ‘WordSmith Tools’ (http://www.lexically.net/wordsmith/)
· ‘ParaConc’ (http://www.athel.com/para.html)
3. Designing a specialized corpus
Corpus size
There are no fixed rules; the appropriate size depends on the research purpose, the availability of data, and the time available.
Large, general corpora may be less useful than small, focused corpora if
searches are made on context-specific terms.
However, corpora that are ‘too small’ have limitations, e.g. they may not contain enough instances of the concepts, terms, or patterns under investigation.
It is preferable to create a ‘monitor’ or ‘open’ corpus, i.e. one to which new texts are continually added, because specialized vocabulary and usage change over time.
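Since there is no fixed size threshold, a practical check is to measure the corpus and count how often the terms you intend to study actually occur: if a key term appears only a handful of times, the corpus is probably ‘too small’ for that purpose. The sketch below reports the number of tokens (running words), the number of types (distinct word forms) and raw hit counts for target terms. The function name `corpus_profile`, the lower-casing and the regular-expression tokenizer are assumptions, and it only counts single-word terms.

```python
import re
from collections import Counter

def corpus_profile(texts, target_terms):
    """Report tokens, types and raw hit counts for single-word target terms."""
    counts = Counter()
    for text in texts:
        # Naive tokenizer (an assumption): lower-cased runs of word characters.
        counts.update(re.findall(r"[\w'-]+", text.lower()))
    return {
        "tokens": sum(counts.values()),  # running words
        "types": len(counts),            # distinct word forms
        "hits": {term: counts[term.lower()] for term in target_terms},
    }

# Hypothetical two-text corpus.
texts = ["Insulin lowers blood glucose.", "Glucose levels rose after the meal."]
print(corpus_profile(texts, ["glucose", "insulin", "metformin"]))
```

A zero or very low hit count for a term you care about is a signal to add more texts before drawing conclusions.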
Text extracts vs. full texts
The choice depends on the aim of corpus compilation.
Full texts offer more coverage because the words or terms under investigation may be scattered throughout a text.
Extracts of specific sections may be helpful if we are looking for words or phrases in particular content areas or want to create purpose-built sub-corpora.
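Whether extracts are enough depends on where the terms of interest fall within a text. One way to check is to divide a full text into equal segments and count the occurrences of a term in each; if the hits cluster in one part, an extract taken from elsewhere would miss them. AntConc's Concordance Plot tool shows this kind of distribution visually. The Python sketch below only does the counting; the function name `dispersion`, the default of five segments and the tokenizer are illustrative assumptions.

```python
import re

def dispersion(text, term, segments=5):
    """Count occurrences of `term` in each of `segments` roughly equal parts of `text`."""
    # Naive tokenizer (an assumption): lower-cased runs of word characters.
    tokens = re.findall(r"[\w'-]+", text.lower())
    counts = [0] * segments
    target = term.lower()
    for i, tok in enumerate(tokens):
        if tok == target:
            # Map token position i (0..n-1) to a segment index (0..segments-1).
            counts[i * segments // len(tokens)] += 1
    return counts

# Hypothetical ten-word text: "risk" occurs at positions 0, 8 and 9.
toy_text = " ".join(["risk"] + ["word"] * 7 + ["risk", "risk"])
print(dispersion(toy_text, "risk"))
```

In this toy text the term occurs once in the first fifth and twice in the last fifth, so an extract taken from the middle would contain no occurrences at all.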
Number of texts
A choice can be made between collecting a few large texts or a larger number of smaller texts.
A choice can also be made between selecting texts by one or two key writers or sources and selecting texts from a range of sources or authors.
This depends on your research focus, e.g. whether you want to study overall language use or the idiosyncrasies and linguistic choices preferred by particular writers.
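When the sub-corpora being compared differ in size, for instance a few long texts by one writer against many short texts by others, raw frequencies cannot be compared directly. A common practice is to normalize each count to a rate per 1,000 (or per million) tokens. The sketch below assumes two hypothetical sub-corpora, `writer_A` and `writer_B`, built from repeated toy sentences, and uses a simple tokenizer as a stand-in for a real one.

```python
import re

def per_thousand(text, word):
    """Frequency of `word` per 1,000 tokens of `text` (a normalized frequency)."""
    # Naive tokenizer (an assumption), lower-cased for case-insensitive counting.
    tokens = re.findall(r"[\w'-]+", text.lower())
    if not tokens:
        return 0.0
    return 1000 * tokens.count(word.lower()) / len(tokens)

# Hypothetical sub-corpora of different sizes, built from repeated toy sentences.
subcorpora = {
    "writer_A": "we argue that the results suggest a clear trend " * 10,
    "writer_B": "the data show that the results are significant " * 40,
}
for name, text in subcorpora.items():
    print(name, round(per_thousand(text, "results"), 1))
```

Although ‘results’ occurs four times as often in writer_B's sub-corpus in raw terms (40 vs. 10), the normalized rates (about 111 and 125 per 1,000 tokens) are much closer, because writer_B's sub-corpus is also larger.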
Medium
The corpus can consist of spoken texts, written texts, or a mixture of both.
The choice depends on the research questions.
Practical factors should also be considered, e.g. compiling a spoken corpus can be time-consuming and requires special types of tagging.
Subject and text type
The corpus should mainly consist of texts from the specialized field under investigation, although the boundaries are less clear-cut in multidisciplinary subjects.
Texts may come from different subjects if the research focuses on particular language features rather than on term extraction.
Text types within a specialized subject field may vary from
‘expert-to-expert’ texts to ‘expert-to-non-expert’ texts, or in other words,
from technical to popular texts.
Other considerations
Authorship: Texts written by experts in a field tend to present more
reliable and authentic examples of specialized language.
Language: Specialized texts can be stored and retrieved in the form of
monolingual, comparable, or parallel corpora.
Publication date: Texts should come from recent publications unless the research concerns a particular period of time.
4. Sources of specialized texts
· Printed materials
· Word documents
· CD-ROMs
· Texts on the Web
· Online databases
5. Getting started with AntConc
Download the latest version of AntConc and watch the YouTube tutorials via
http://www.antlab.sci.waseda.ac.jp/antconc_index.html
1. Run the program.
2. Use Open Files (to browse and select the target files) or Open Dir (to select a target folder).
3. Choose the function (tool) you need.
4. Use Clear All Tools and Files before opening new files.
5. Use Save Output to Text File to save the output, e.g. concordance lines.
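The same workflow can also be approximated in a short script, which is convenient when one search has to be repeated on many folders. The Python sketch below is not AntConc and does not reproduce its behavior; it only mirrors the steps: load every .txt file in a folder (compare Open Dir), produce concordance lines for a keyword, and write them to a plain-text file (compare Save Output to Text File). The function names `load_dir`, `concordance` and `save_output`, the UTF-8 encoding, the four-word context window and the tab-separated output format are all assumptions.

```python
import re
import tempfile
from pathlib import Path

def load_dir(folder):
    """Read every .txt file in `folder`, sorted by name (compare Open Dir). UTF-8 is assumed."""
    return {p.name: p.read_text(encoding="utf-8") for p in sorted(Path(folder).glob("*.txt"))}

def concordance(files, keyword, window=4):
    """Yield (file name, KWIC line) for each case-insensitive hit of `keyword`."""
    target = keyword.lower()
    for name, text in files.items():
        # Naive tokenizer (an assumption): runs of word characters, apostrophes, hyphens.
        tokens = re.findall(r"[\w'-]+", text)
        for i, tok in enumerate(tokens):
            if tok.lower() == target:
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + 1:i + 1 + window])
                yield name, f"{left} [{tok}] {right}".strip()

def save_output(rows, out_path):
    """Write one tab-separated hit per line (compare Save Output to Text File)."""
    with open(out_path, "w", encoding="utf-8") as f:
        for name, line in rows:
            f.write(f"{name}\t{line}\n")

# Demo on two hypothetical files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("Corpus tools retrieve words in context.", encoding="utf-8")
    Path(tmp, "b.txt").write_text("A small corpus can still be useful.", encoding="utf-8")
    # The output uses a .tsv name so it is not re-read as a corpus file by load_dir.
    out_file = Path(tmp, "results.tsv")
    save_output(concordance(load_dir(tmp), "corpus"), out_file)
    print(out_file.read_text(encoding="utf-8"))
```

Keeping the file name in the first column makes it easy to trace each concordance line back to its source text when the corpus contains many files.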