ANTCONC
Design and Development of a Freeware Corpus Analysis Toolkit for the Technical
Writing Classroom
ABSTRACT
• AntConc is a freeware, multi-platform, and multi-purpose corpus analysis
toolkit, designed by the author for specific use in the classroom.
• It includes a powerful concordancer, word and keyword frequency
generators, tools for cluster and lexical bundle analysis, and a word distribution
plot.
• It also offers the choice of simple wildcard searches or powerful regular
expression searches, and has an extremely easy-to-use, intuitive interface.
BACKGROUND
• AntConc was first released in 2002. At the time, it was a simple KWIC (Key Word
In Context) concordancer program designed for use by over 700 students in a
scientific and technical writing course at Osaka University Graduate School of
Engineering.
• It was developed in a Windows environment using the Perl 5.8 programming
language, and the graphical user interface (GUI) was developed using the
Perl/Tk 8.0 toolkit. This enabled the program to be easily ported to a
Linux/Unix environment, which was necessary as the course was initially taught
in a Linux-based CALL (Computer Assisted Language Learning) laboratory
before being moved to a Windows-based CALL laboratory the following year.
CONCORDANCER TOOL
• The central tool used in most corpus analysis software, including
AntConc, is the concordancer.
• As Sun & Wang describe, concordancers have been shown to be
an effective aid in the acquisition of a second or foreign
language, facilitating the learning of vocabulary, collocations, grammar
and writing styles.
• Research has shown that new vocabulary can only be acquired through
meeting words in diverse natural contexts and in varied situations.
• A concordance program can find and display a huge number of examples
in varied contexts and situations quickly and efficiently using a reasonably
large corpus.
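To make the idea of a KWIC concordance concrete, here is a minimal sketch in Python. It is not AntConc's own code (AntConc was written in Perl); the function name `kwic`, the `width` parameter and the sample text are assumptions made for illustration only.

```python
import re

def kwic(text, pattern, width=30, ignore_case=True):
    """Return (left, hit, right) context tuples for every match of pattern."""
    flags = re.IGNORECASE if ignore_case else 0
    lines = []
    for m in re.finditer(pattern, text, flags):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Collapse newlines so that each hit prints on a single line.
        lines.append((left.replace("\n", " "), m.group(0), right.replace("\n", " ")))
    return lines

# Hypothetical sample text standing in for a corpus file.
sample = ("The results are shown in Table 1. In this paper, we show that "
          "the method works. The data show a clear trend.")

for left, hit, right in kwic(sample, r"\bshow\w*", width=20):
    # Right-align the left context so the search term lines up in a column.
    print(f"{left:>20} | {hit} | {right}")
```

Aligning the search term in a central column is what lets a learner scan many contexts at a glance.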
TOOLS AND FEATURES IN ANTCONC
• Multiplatform
– Windows 95 or later
– Unix / Linux
• Extensive set of text analysis tools
– KWIC Concordance
– Search Term Distribution Plot
– Original File View
– Word Clusters / Lexical Bundles
– Word lists
– Keyword list
• Powerful Search Features
– Regular Expressions (REGEX)
– Extensive Wildcards
• Multiple-Level Sorting
• Freeware License
• Small memory requirement (~2 MB of disk space)
• Easy-to-use, intuitive GUI
• Unicode Support
• HTML/XML Tag Handling
• The Concordancer Tool of AntConc has a wide range of features that make it an
effective tool not only for learners, but also for teachers and researchers.
• The features are:
1. Search terms can be substrings, words, or phrases, and can be either case
sensitive or insensitive. A wide range of wildcards is built in, and the user
can assign each one to any particular character or string of characters.
2. Search terms can be defined as full regular expressions (REGEX), offering
the user access to extremely powerful and complex searches.
3. Three levels of sorting of KWIC (Key Word In Context) lines are possible,
with user-definable highlight colours at each level.
4. If a user clicks on any search term in the KWIC results display, the program
automatically opens the View Files Tool (described later) and shows the search
term hit embedded in the original file.
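Two of the features above, wildcard searches and multi-level sorting, can be sketched in a few lines of Python. The wildcard assignments in `WILDCARDS` are hypothetical and are not AntConc's actual defaults; the level labels such as `"1R"` (first word to the right) and `"1L"` (first word to the left) are likewise naming assumptions for this sketch.

```python
import re

# Hypothetical wildcard assignments for illustration; not AntConc's defaults.
WILDCARDS = {"*": r"\w*", "?": r"\w"}

def wildcard_to_regex(term, whole_word=True):
    """Translate a wildcard search term into a regular expression string."""
    body = "".join(WILDCARDS.get(ch, re.escape(ch)) for ch in term)
    return rf"\b{body}\b" if whole_word else body

def sort_kwic(hits, levels=("1R", "2R", "1L")):
    """Sort (left, hit, right) tuples by up to three context positions."""
    def word_at(hit, level):
        n, side = int(level[:-1]), level[-1]
        # Left context is counted outwards from the search term.
        words = hit[0].split()[::-1] if side == "L" else hit[2].split()
        return words[n - 1].lower() if n <= len(words) else ""
    return sorted(hits, key=lambda h: tuple(word_at(h, lv) for lv in levels))

print(re.findall(wildcard_to_regex("analy*"), "analysis analyse analytic banal"))

hits = [("we", "show", "that it"), ("data", "show", "a trend"), ("they", "show", "a clear")]
for left, hit, right in sort_kwic(hits):
    print(f"{left:>6} | {hit} | {right}")
```

Sorting on a tuple of keys gives the three levels directly: ties on the first level are broken by the second, then by the third.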
Concordance Search Term Plot Tool
• The main purpose of the Concordancer Tool is to show how a search term is used in
a target corpus.
• The Concordance Search Term Plot Tool offers the same search term options as the
Concordancer Tool, but the results are displayed differently, as a plot of where
each hit occurs within each file.
• It is an effective aid, for example, in determining where words or phrases such as
‘we’ or ‘in this paper’ are used in research articles, or in determining which
research articles use a particular keyword or phrase.
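A search term plot of this kind can be approximated in plain text. The sketch below is an illustration under assumptions, not AntConc's implementation: the function names, the strip width and the file names and contents in `corpus` are all hypothetical.

```python
import re

def hit_positions(text, pattern):
    """Relative position (0.0 to 1.0) of each match within the text."""
    if not text:
        return []
    return [m.start() / len(text) for m in re.finditer(pattern, text, re.IGNORECASE)]

def plot_strip(positions, width=40):
    """Render positions as a fixed-width strip: '|' marks a hit, '.' marks none."""
    cells = ["."] * width
    for p in positions:
        cells[min(int(p * width), width - 1)] = "|"
    return "".join(cells)

PATTERN = r"\bwe\b"

# Hypothetical file names and contents.
corpus = {
    "article1.txt": "In this paper we propose a method. " + "Filler text. " * 10 + "We conclude.",
    "article2.txt": "Filler text. " * 12,
}

for name, text in corpus.items():
    print(f"{name:<14} {plot_strip(hit_positions(text, PATTERN))}")
```

Each file gets one strip, so it is easy to see both which files contain the term and whereabouts in each file it clusters.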
5. The KWIC results display of the Concordancer Tool is divided into columns,
in which the hit number, KWIC line, and file name are shown separately. Each
column can be either displayed or hidden, and standard selection methods can be
used to save data in the columns or rows to the clipboard or a text file.
VIEW FILES TOOL
• When a user clicks on a search term in the results display of the
Concordancer Tool, the View Files Tool is used to display the
search term in the original file.
• The View Files Tool can also be used independently to search for any
substring, word, phrase or regular expression in a target file, offering the
user a very powerful text search engine.
• All resulting hits are displayed in a user-definable highlight colour, and
buttons and keyboard shortcuts can be used to jump to a specified hit
anywhere in the file.
• Conversely, if the user clicks on one of the highlighted search terms, all
KWIC lines for that term are automatically shown in the Concordancer Tool.
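The file-view behaviour described above (find every hit, mark it, and jump to a given hit) could be sketched as follows. This is an assumption-laden illustration: the function names are hypothetical, and plain-text `[[ ]]` markers stand in for the highlight colour of a GUI.

```python
import re

def find_hits(text, pattern):
    """Return (start, end) character offsets of every match in the file text."""
    return [(m.start(), m.end()) for m in re.finditer(pattern, text)]

def highlight(text, spans, left="[[", right="]]"):
    """Wrap each hit in markers, standing in for a highlight colour."""
    out, last = [], 0
    for start, end in spans:
        out.append(text[last:start])
        out.append(left + text[start:end] + right)
        last = end
    out.append(text[last:])
    return "".join(out)

def line_of_hit(text, spans, n):
    """1-based line number of the n-th hit (n counted from 0), for jumping to it."""
    return text.count("\n", 0, spans[n][0]) + 1

text = "alpha beta\ngamma beta\n"
spans = find_hits(text, r"beta")
print(highlight(text, spans))
print("second hit is on line", line_of_hit(text, spans, 1))
```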
Word List / Keyword List Tools
• Word lists are useful as they suggest interesting areas for investigation and
highlight problem areas in a corpus.
• Bowker & Pearson describe how word lists can also be used to find families
of related word forms and lemmas in a corpus.
• Hockey states that an ideal word list generation program should be able to
sort words into alphabetical or frequency order.
• Users can specify the reverse of a stop list, that is, a list of only the words
that should be counted, and this can be specified either by direct input
from the keyboard or from a separate file.
• The Keywords Tool operates in an almost identical way to the KeyWords
tool in WordSmith Tools, calculating the ‘keyness’ of words using a statistical
measure and offering the user the option of displaying or hiding unusually
infrequent key words.
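A word list with alphabetical and frequency ordering, plus a keyness score, can be sketched as below. The slides do not name the statistic used for keyness, so the log-likelihood measure here is an assumption chosen for illustration; the function names and the simple tokenisation rule are also hypothetical.

```python
import math
import re
from collections import Counter

def word_list(text, stop_words=None, only_words=None):
    """Count word frequencies, honouring a stop list or its reverse (only_words)."""
    words = re.findall(r"[a-z']+", text.lower())
    if only_words is not None:
        words = [w for w in words if w in only_words]
    if stop_words:
        words = [w for w in words if w not in stop_words]
    return Counter(words)

def by_frequency(counts):
    """Most frequent first; ties are broken alphabetically."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

def by_alphabet(counts):
    return sorted(counts.items())

def log_likelihood(a, b, c, d):
    """Score for a word seen a times in c target tokens and b times in d reference tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the target corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a:
        ll += a * math.log(a / e1)
    if b:
        ll += b * math.log(b / e2)
    return 2 * ll

def keywords(target, reference):
    """Rank words by keyness; negative scores mark unusually infrequent words."""
    c, d = sum(target.values()), sum(reference.values())
    rows = []
    for word in sorted(set(target) | set(reference)):
        a, b = target.get(word, 0), reference.get(word, 0)
        score = log_likelihood(a, b, c, d)
        if a / c < b / d:
            score = -score
        rows.append((word, score))
    return sorted(rows, key=lambda row: -row[1])

target = Counter({"corpus": 5, "the": 5})
reference = Counter({"the": 50, "cat": 50})
for word, score in keywords(target, reference):
    print(f"{word:<8} {score:8.2f}")
```

Hiding the unusually infrequent key words then amounts to dropping the rows with a negative score.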
WORD CLUSTERS/BUNDLES TOOL
• In AntConc, multi-word units can be investigated using the Word
Clusters Tool, which displays clusters of words centred on a
search term and orders them alphabetically or by frequency.
• The search terms can be specified as a substring, word, phrase or regular
expression, as in the Concordancer, Plot and View Files Tools, and the
number of additional words to the left and right of the search term can also
be specified.
• AntConc includes lexical bundle searches as an option in the Word Clusters
Tool. Calculating all the lexical bundles for a particular set of criteria
can take a great deal of time. Therefore, as in all other tools in the
program, the processing can be halted by clicking on the ‘Stop’ button at
any time.
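The difference between clusters around a search term and lexical bundles across the whole corpus can be shown with a short sketch. It treats a lexical bundle simply as a word sequence that recurs at least `min_freq` times; that threshold, the function names and the whitespace tokenisation are assumptions for illustration.

```python
from collections import Counter

def ngrams(tokens, n):
    """All consecutive n-word sequences in the token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def lexical_bundles(tokens, n=3, min_freq=2):
    """Every n-word sequence in the corpus that occurs at least min_freq times."""
    counts = Counter(ngrams(tokens, n))
    return [(" ".join(gram), freq) for gram, freq in counts.most_common() if freq >= min_freq]

def clusters(tokens, term, left=0, right=2):
    """Count sequences centred on term, with left words before and right words after."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == term and i - left >= 0 and i + right < len(tokens):
            counts[" ".join(tokens[i - left:i + right + 1])] += 1
    return counts

tokens = "in this paper we show that in this paper we argue".split()
print(lexical_bundles(tokens, n=3))
print(clusters(tokens, "paper", left=1, right=1).most_common())
```

Bundles have to count every sequence in the corpus rather than only those around one term, which is why they are so much slower to compute.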
LIMITATIONS OF ANTCONC
• Concordancers can be divided into two main types:
1) those that first build an index which is used for subsequent search
operations
2) those that act directly on the raw text
• The first type has the advantage that it can operate on large corpora, but it
tends to be less flexible than the second type.
• AntConc fits into the second category, performing all processing on the raw data
files, and storing results in active memory.
• Most corpus analysis programs offer users the ability to see the collocates of a
search term in a table, where the frequencies of the most common words to the left
or right of the search term are indicated. AntConc does not yet offer this.
• One of the weakest areas of AntConc is in the handling of annotated data such as
data encoded in HTML/XML format.
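The trade-off between the two types of concordancer can be illustrated with a small sketch. It is a simplified model under assumptions, not a description of any particular program: the function names, the in-memory dictionary index and the sample files are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Yield (lowercased word, character offset) pairs."""
    return [(m.group(0).lower(), m.start()) for m in re.finditer(r"\w+", text)]

def build_index(files):
    """Type 1: build an inverted index once, mapping word -> [(file, offset), ...]."""
    index = defaultdict(list)
    for name, text in files.items():
        for word, offset in tokenize(text):
            index[word].append((name, offset))
    return index

def search_index(index, word):
    """Fast lookup, but limited to whole words that were indexed."""
    return index.get(word.lower(), [])

def search_raw(files, pattern):
    """Type 2: scan the raw text on every search; any regular expression works."""
    return [(name, m.start())
            for name, text in files.items()
            for m in re.finditer(pattern, text, re.IGNORECASE)]

# Hypothetical file names and contents.
files = {"a.txt": "We show results. We showed more.", "b.txt": "Nothing here."}
index = build_index(files)
print(search_index(index, "show"))    # exact word only
print(search_raw(files, r"show\w*"))  # pattern also finds "showed"
```

The indexed lookup does not re-read the files, which is what makes it suited to large corpora, while the raw scan accepts arbitrary patterns at the cost of reading every file on each search.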
FUTURE DEVELOPMENTS
• The first improvement will be a redesign of the View Files Tool to make it
operate with far greater speed. The current tool is able to handle files
with ambiguous line endings, but this comes with a heavy loss in speed.
• The next release will also include a tool to view collocates, and the ability
to sort word lists alphabetically from both the beginning and end of
words, a feature recommended by Hockey.
• AntConc will be improved to handle annotated data, in particular XML, in
a much more powerful and intuitive way. Such data also includes header
definitions that, if extracted, can be used as part of search criteria.
• A detailed user manual and accompanying tutorial video are planned for the
software, where the operation of each tool will be explained with concrete
examples and a step-by-step guide.
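Two of the planned features, a collocate view and sorting word lists from the end of words, are simple to sketch. The code below is only an illustration of the general idea and says nothing about how AntConc will implement them; the function names and the `span` parameter are assumptions.

```python
from collections import Counter

def collocates(tokens, term, span=1):
    """Count words within span positions to the left and right of term."""
    left, right = Counter(), Counter()
    for i, tok in enumerate(tokens):
        if tok != term:
            continue
        left.update(tokens[max(0, i - span):i])
        right.update(tokens[i + 1:i + 1 + span])
    return left, right

def sort_by_word_end(words):
    """Alphabetical order read from the end of each word, grouping shared endings."""
    return sorted(words, key=lambda w: w[::-1])

tokens = "the corpus analysis the corpus tool a corpus analysis".split()
left, right = collocates(tokens, "corpus")
print("left: ", left.most_common())
print("right:", right.most_common())
print(sort_by_word_end(["testing", "walked", "running", "talked"]))
```

Sorting from the end of words brings forms with the same suffix together, such as the "-ed" and "-ing" forms in the example.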
