(Full version) Detailed AntConc User Guide


AntConc usage


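I. Basic usage of AntConc

AntConc is an excellent corpus analysis tool with a clean, simple interface.

After opening AntConc, the first thing to do is load a corpus.

Think of it as opening a big chest full of treasures (text data).

It loads plain-text files, and they should be in .txt format.

For example, to study the vocabulary of a particular novel, convert the novel to .txt and load it.

Its search function is powerful.

You can type in a single word or a phrase and search for it directly.

It is like looking for a specific item in a large warehouse.

For example, if you enter the word "love", it quickly finds every sentence in the whole corpus that contains "love".

It can also show how often the word occurs and where it is distributed in the corpus, which is like knowing how many of that item sit in each corner of the warehouse.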
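The search described above can be pictured with a small keyword-in-context (KWIC) sketch. This is only an illustration of the idea, not AntConc's own code: the function name `kwic`, the `width` parameter, and the sample sentence are assumptions made for the example.

```python
import re

def kwic(text, term, width=30):
    """Return one keyword-in-context line per whole-word match of `term`."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        # Take up to `width` characters on each side of the hit.
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines

sample = "All you need is love. Love is all you need."
hits = kwic(sample, "love", width=12)
for line in hits:
    print(line)
print("hits:", len(hits))
```

The search is case-insensitive here, which mirrors the default behaviour the AntConc readme describes for its own searches.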

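Another very practical function is generating a word list.

Click the corresponding button and, like a super butler, it lists every word in the corpus in a given order (for example, by frequency).

This helps you quickly see which words make up the bulk of the corpus.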
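A frequency-ordered word list can be sketched in a few lines. This is a minimal illustration under stated assumptions: the tokenizer below only recognizes runs of ASCII letters, which is far simpler than AntConc's configurable token definition, and `word_list` is a hypothetical helper name.

```python
import re
from collections import Counter

def word_list(text):
    """Return (word, frequency) pairs, most frequent first."""
    # Assumption: a "word" is a run of ASCII letters, compared in lower case.
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens).most_common()

sample = "The cat sat on the mat. The mat was flat."
ranked = word_list(sample)
for rank, (word, freq) in enumerate(ranked, start=1):
    print(rank, freq, word)
```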

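II. Analyzing fixed expressions with AntConc

AntConc is very good at analyzing fixed expressions.

You can use its "N-Grams" function.

Suppose you want to study two-word combinations (bigrams) in English, such as the common fixed expression "hot dog".

Set the N-gram size to 2 (two words per group), and it will search the corpus for every such two-word combination.

It is as fun as picking out, in a crowd of friends, the ones who always turn up hand in hand.

Multi-word fixed expressions are no problem either.

Say you want to find three-word expressions like "in the end": set the size to 3, and it will pull every three-word combination out of the corpus.

It is like finding, in a complicated jigsaw puzzle, the few pieces that are always joined together.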
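The N-gram idea above comes down to sliding a window of N words over the text. The sketch below is illustrative only; `ngrams` is a hypothetical helper, tokens are split on whitespace, and the second sample sentence is made up. The first sample, "this is a pen", is the one the AntConc readme itself uses.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Size 2: all bigrams of a short sentence.
bigrams = ngrams("this is a pen".split(), 2)
print(bigrams)

# Size 3: count trigrams to see which ones recur.
corpus = "in the end it was fine and in the end we left".split()
trigram_counts = Counter(ngrams(corpus, 3))
print(trigram_counts.most_common(3))
```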

III. Bilingual example sentences (English - Chinese)

1. "I used AntConc to analyze the frequency of the word 'happiness' in this English novel. It was like having a magic key to unlock the secrets of the author's use of this important concept." (我使用AntConc来分析这个英语小说里‘happiness(幸福)’这个词的频率。

ant-cron usage


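ant-cron is a scheduled-task tool that helps users automate tasks such as backing up data, sending email, and updating the system.

Before using ant-cron, you need to understand its basic usage and its configuration file so that tasks run correctly.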

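I. Basic usage

1. Install ant-cron

First, make sure ant-cron is installed on the system.

It can be installed with the following command:

```shell
pip install ant-cron
```

2. Create a task file

Create a text file to hold the scheduled tasks; any text editor can be used to open it and write the task content.

At the top of the file, add a comment line that gives the task name, the execution time, and similar information.

3. Configure the scheduled task

Use the ant-cron command to create or edit a scheduled task.

The following command creates a new task or edits an existing one:

```shell
ant-cron -e <editor path> -f <config file path> -t <task file path>
```

Here `<editor path>` specifies the editor type and path, `<config file path>` is the path of the scheduled-task configuration file, and `<task file path>` is the path of the task file to edit or create.

Running the command opens the specified editor, where the task content can be edited.

4. Run the task

After editing the task, save and close the file.

Use the following command to run the task:

```shell
ant-cron -c <config file path> -t <task file path>
```

Here `<config file path>` is the path of the scheduled-task configuration file, and `<task file path>` is the path of the task file to run.

After the command is run, the system executes the specified task automatically according to the settings in the configuration file.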

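II. The configuration file in detail

The scheduled-task configuration file is usually an INI-format file containing a number of configuration options and variables.

Some common configuration options and variables are:

1. **TaskName**: the name of the task. This is the identifier that uniquely identifies a task.

2. **Schedule**: when the task runs, usually a time expression or the repeat frequency of a periodic task.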

Detailed AntConc User Guide, compiled by Ouyang Wen


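Instructions for Using AntConc 3.2.0 ¹

1. Extracting concordances

1.1 Setting the search term

(1) Single-term search

a) In the File drop-down menu, click "open files" and select the corpus files to open (to open a whole folder, choose open directory);
b) type the term to search for, such as go, in the "Search Term" box;
c) in the "Search Window Size" box, set how much context each concordance line shows;
d) click Start to run the search.

The search results are shown in Figure 1.1.

Figure 1.1 Single-term search results

(2) Multi-term search

● Setting a multi-term search

Besides searching for a single term, AntConc can search for several terms at once. To do so, type the "|" symbol between the search terms.

Example: to search for the various tense forms of the verb go, enter go|went|gone|goes in "Search Term".

● Setting a context-word search

To restrict a concordance search, you can require that a context word appear within a given range around the search term.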

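Date: 2021.03.12    Author: Ouyang Wen

¹ This manual was written by Zhang Xingjuan, a 2007 graduate student at the School of Foreign Studies, South China Normal University, and was revised and supplemented by supervisor He Anping. The method for range-restricted searching was provided by Dr. D. Lee of City University of Hong Kong, to whom thanks are due.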

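Example: to study phrases of the type a … of, all such items can be extracted with AntConc as follows:

a) type a in the "Search Term" box;
b) click the button next to "Search Term" to open the "Advanced Search" screen, shown in Figure 1.2. Tick "Use context words and horizons", type of in the "Context Words" box, and add it. To reset the context word, first clear the existing one and then repeat the steps above. The position of the context word relative to the search term must also be set: in this study of is the second word to the right of a, so the "Context Horizon" is set to that position (2R); then confirm the settings;
c) back on the concordance screen, click Start to run the search.

Figure 1.2 Advanced Search screen

The results include chunks such as a lot of and a bit of.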
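The context-word search above can be sketched as a check on a fixed range of positions around each hit. This is an illustration of the logic only, not AntConc's implementation: `context_search`, its `start` and `end` offsets (positive numbers count words to the right, so 2 corresponds to 2R), and the sample sentence are all assumptions made for the example.

```python
def context_search(tokens, term, context_word, start=2, end=2):
    """Return spans from `term` to `context_word` when the context word
    occurs between `start` and `end` positions to the right of the term."""
    chunks = []
    for i, token in enumerate(tokens):
        if token != term:
            continue
        for offset in range(start, end + 1):
            j = i + offset
            if j < len(tokens) and tokens[j] == context_word:
                # Keep the whole stretch from the search term to the context word.
                chunks.append(" ".join(tokens[i:j + 1]))
                break
    return chunks

tokens = "she ate a lot of rice and a bit of fish in a hurry".split()
chunks = context_search(tokens, "a", "of", start=2, end=2)
print(chunks)
```

Widening the range, for example `start=1, end=3`, would accept the context word anywhere in the first three positions to the right.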

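● Setting a multi-word search

When a study requires searching for several terms, the following method can be used as an alternative to "|"; it is especially suitable when the number of search terms is large.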

Processing Chinese with AntConc


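Concordance, wordlist, N-gram

I don't know what secret weapon laohong used! My method is this: in Token Definition I selected everything under the letter token classes, then selected the first item under Chinese Encoding, and that was all. I don't think I need to explain the rest.

I also found that, with the options I chose this afternoon, Chinese text that has not been word-segmented can still be searched and displayed in full.

Sorry, everyone: after posting this morning I went off to move house, and I was so worn out that I have only just got home and turned on the computer.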

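Here is what everyone has been asking about, how I process Chinese with AntConc:

1. Text format: did you notice that the text I posted above for testing AntConc contains Simplified Chinese, Traditional Chinese, and English? To display all three correctly in the same text, I saved all the texts as UTF-8. In other words, the corpus texts I process with AntConc are stored as UTF-8, not GB or Big5. The Chinese text has also been word-segmented. Search this site for automatic word segmentation and part-of-speech tagging tools: SegTag, ICTCLAS, NEUCSP, Hylanda, WinAT, and so on.

2. AntConc settings: under Language Encodings in Global Settings, I did not choose any option under Chinese Encodings; I chose Unicode (UTF-8) under Unicode Encodings. The other settings can be left at their defaults.

3. Functions: with these settings, all of AntConc's functions can handle Chinese text, so you can use AntConc for Concordance, Wordlist, Cluster, N-Gram, and so on with word-segmented Chinese.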
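Re-saving a GB or Big5 text as UTF-8, as step 1 describes, can also be scripted. The sketch below is one possible way to do it, not part of the original post: `convert_to_utf8` is a hypothetical helper, and the default source encoding `gb18030` is an assumption (Python's `gb18030` codec also reads GBK and GB2312 text; a Big5 file would need `source_encoding="big5"`).

```python
import tempfile
from pathlib import Path

def convert_to_utf8(src, dst, source_encoding="gb18030"):
    """Read `src` in its legacy encoding and write it back out as UTF-8."""
    text = Path(src).read_text(encoding=source_encoding)
    Path(dst).write_text(text, encoding="utf-8")

# Demonstration on a throwaway file so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "corpus_gb.txt"
    dst = Path(tmp) / "corpus_utf8.txt"
    src.write_bytes("语料库 corpus".encode("gb18030"))
    convert_to_utf8(src, dst)
    result = dst.read_text(encoding="utf-8")

print(result)
```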

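Wordsmith finally has a free competitor!

How do I display the chi-square and mutual information values?

1. The chi-square test is used for keywords; it needs a word list from a reference corpus and a word list from the corpus being analyzed.
2. Under Tool Preferences, choose Collocates Preferences, select the MI value or the T value under the statistical measure option, and then show the collocates.
3. 3.2.1w is the latest version, so it should not be a version problem.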
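For readers wondering what the MI and T values in the reply measure, the sketch below uses the forms of the two statistics commonly given in corpus linguistics textbooks. It is an assumption that these match AntConc's output exactly; in particular, how the collocation span enters the expected frequency may differ. The function names and the sample counts are made up for illustration.

```python
import math

def expected_cooccurrence(node_freq, collocate_freq, corpus_size):
    """Co-occurrences expected if the two words were independent."""
    return node_freq * collocate_freq / corpus_size

def mutual_information(pair_freq, node_freq, collocate_freq, corpus_size):
    """MI = log2(observed / expected)."""
    expected = expected_cooccurrence(node_freq, collocate_freq, corpus_size)
    return math.log2(pair_freq / expected)

def t_score(pair_freq, node_freq, collocate_freq, corpus_size):
    """T = (observed - expected) / sqrt(observed)."""
    expected = expected_cooccurrence(node_freq, collocate_freq, corpus_size)
    return (pair_freq - expected) / math.sqrt(pair_freq)

# Hypothetical counts: the pair occurs 20 times in a 1,000,000-word corpus.
mi = mutual_information(20, 100, 200, 1_000_000)
t = t_score(20, 100, 200, 1_000_000)
print(round(mi, 2), round(t, 2))
```

MI tends to favour rare but exclusive pairings, while the T-score favours frequent ones, which is why tools usually offer both.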

AntConc: a free, powerful, and easy-to-use local corpus search tool


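Laurence Anthony was a gifted student who, at a young age, was admitted to the physics department of Japan's famous Waseda University.

By the time he finished his undergraduate degree, however, he had found that what he really loved was not relativity, quantum mechanics, or the Higgs boson, but Shakespeare's sonnets, word origins, and linguistic research.

So he resolutely changed fields and went on to a master's degree and a doctorate in linguistics.

Laurence Anthony's doctoral research was on corpora, so he constantly had to work with large bodies of text, and he ran into quite a few bottlenecks along the way. This was back in 2000: Microsoft had not yet released Windows XP, Google had only recently been founded, Nokia's feature phones were starting to sweep the world, and corpus research was still at a slash-and-burn, half-manual stage.

Computer-assisted corpus research was still a fairly cutting-edge idea, and usable software was scarcer still.

Fortunately, Laurence Anthony had a science and engineering background and could write good code.

Someone who studies English and also knows technology is like a hooligan who knows martial arts.

After persistent effort, he succeeded in 2002 in developing a piece of software for corpus statistics, AntConc, and with its help he completed his doctoral research.

The software gradually became popular, was upgraded and refined over the following decade and more, and came to be used by many researchers of English.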

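Laurence Anthony also set up a dedicated website for it.

Usage examples:

1. (Original title: Build your own personal English corpus, and Mom will never have to worry about your writing again [2023-01-15]; the article introduces a little of AntConc's usage.) In a nutshell: prepare your own texts, search them with AntConc, and use your local corpus as an electronic dictionary, reference book, and bank of example sentences.

If you do not want to pay for an online corpus and do not mind the trouble of building your own, give this method a try.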


AntConc Software User Manual


AntConc (Windows, Macintosh OS X, and Linux)
Build 3.4.1

Laurence Anthony, Ph.D.
Center for English Language Education in Science and Engineering, School of Science and Engineering, Waseda University, 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
January 31, 2014

Introduction

AntConc is a freeware, multiplatform tool for carrying out corpus linguistics research and data-driven learning. It runs on any computer running Microsoft Windows (tested on Win 98/Me/2000/NT, XP, Vista, Win 7), Macintosh OS X (tested on 10.4.x, 10.5.x, 10.6.x), and Linux (tested on Ubuntu 10, Linux Mint). It is developed in Perl using ActiveState's PerlApp compiler to generate executables for the different operating systems.

Getting Started (No installation necessary)

Windows
On Windows systems, simply double click the AntConc icon and this will launch the program.

Macintosh OS X
On Macintosh systems, simply double click the AntConc icon and this will launch the program.

Linux
On Linux systems, change the permissions to allow AntConc to be run as an executable file. Next, double click the AntConc executable and it will launch.

Caution: Do not place user settings files from older versions of AntConc in the same folder as new versions. This can cause unexpected problems and may even prevent AntConc from starting. It is recommended that you delete your earlier settings file and export it again from the AntConc file menu.

Overview of Tools

AntConc contains seven tools that can be accessed either by clicking on their 'tabs' in the tool window, or using the function keys F1 to F7.

Concordance Tool: This tool shows search results in a 'KWIC' (KeyWord In Context) format. This allows you to see how words and phrases are commonly used in a corpus of texts.

Concordance Plot Tool: This tool shows search results plotted in a 'barcode' format. This allows you to see the position where search results appear in target texts.

File View Tool: This tool shows the text of individual files. This allows you to investigate in more detail the results generated in other tools of AntConc.

Clusters/N-Grams: The Clusters Tool shows clusters based on the search condition. In effect it summarizes the results generated in the Concordance Tool or Concordance Plot Tool. The N-Grams Tool, on the other hand, scans the entire corpus for 'N' (e.g. 1 word, 2 words, …) length clusters. This allows you to find common expressions in a corpus.

Collocates: This tool shows the collocates of a search term. This allows you to investigate non-sequential patterns in language.

Word List: This tool counts all the words in the corpus and presents them in an ordered list. This allows you to quickly find which words are the most frequent in a corpus.

Keyword List: This tool shows which words are unusually frequent (or infrequent) in the corpus in comparison with the words in a reference corpus. This allows you to identify characteristic words in the corpus, for example, as part of a genre or ESP study.

Concordance Tool

This tool shows search results in a 'KWIC' (KeyWord In Context) format. This allows you to see how words and phrases are commonly used in a corpus of texts.

The following steps produce a set of concordance lines from a corpus and demonstrate the main features of this tool.
1) Select one or more files for processing using the 'Open File(s)...' or 'Open Dir...' options in the 'File' menu. The list of selected files is shown in the left frame of the main window.
2) Enter a search term on which to build concordance lines in the search box.
3) Choose the number of text characters to be output on either side of the search term, using the increase and decrease buttons on the right of the button bar under the "Search Window Size" title (default value is 50 characters).
4) Click on the 'Start' button to start generating the concordance lines.
The concordance generation can be halted at any time by clicking on the 'Stop' button.
5) Use the Kwic Sort options to rearrange the concordance lines at three different levels. 0 is the search word, 1L, 2L... are words to the left of the target word, 1R, 2R... are words to the right of the target word.
6) Click on the 'Sort' button to start the sorting process.
7) Move the cursor over the highlighted search term in one of the concordance lines. The cursor will change to a small hand icon. Clicking on the highlighted search term will allow you to view the search term hit as it appears in the original file via the File View Tool (see below).
8) Click on the “Clone Results” button to create a copy of the results so that different sets of results can be compared.

The total number of concordance lines generated (Concordance Hits) is shown at the top of the tool window. This number will flash with the word "FINISHED" when processing has been completed, and will flash with the word "NO HITS" if no hits are generated for a particular search term.

Search terms can be specified as being "words" (default) or "character strings" by activating or deactivating the "Word" search term option. Also, searches can be either “case insensitive” (default) or “case sensitive” by activating or deactivating the "Case" search term option. Searches can also be made using full regular expressions by activating the "Regex" option. For details on how to use regular expressions, consult one of the many texts on the subject, e.g., Mastering Regular Expressions (O'Reilly & Associates Press) or type "regular expressions" in a web search engine to find many sites on the subject (e.g., /quickstart.html). AntConc supports Perl regular expressions.

By clicking on the "Advanced Search" button, more complex searches become possible. The first advanced search option allows you to import a set of search terms, either by typing them one per line, or by loading in a list of search terms from a file.
Here, each line will be treated as a separate search term. This feature allows you to use a large set of search terms without having to re-type them each time. The second advanced search option allows you to define context words and a context window within which the search term(s) must appear. For example, to search for "student" where it appears within three words to the left or right of the word "university," set the search term as "student," the context word as "university," and set the context window as 'From' 3L 'To' 3R.

A number of menu preferences are available with this tool. (See below).

Concordance Plot Tool

This tool shows concordance search results plotted in a 'barcode' format, with the length of the text normalized to the width of the bar and each hit shown as a vertical line within the bar. This allows you to see the position where search results appear in target texts. The tool also allows you to see which files include the target search term, and can also be used to identify where the search term hits cluster together. An example of the use of the Plot Tool is in determining where specific content words appear in a technical paper, or where an actor or story character appears during the course of a play or novel.

The number of hits and length of each text is shown to the right of the barcode plot, and the plot itself can be enlarged or reduced in size using the “Plot Zoom” buttons.

If you move the cursor over the highlighted search term in one of the concordance lines, the cursor will change to a small hand icon. Clicking on the highlighted search term will allow you to view the search term hit as it appears in the original file via the File View Tool (see below).

Search terms can be specified as being "words" (default) or "character strings", and searches can be “case insensitive” (default), “case sensitive,” or "Regex" based. Advanced searches are also available.
For details see the Concordance Tool explanation.

File View Tool

This tool shows the raw text of individual files. This allows you to investigate in more detail the results generated in other tools of AntConc.

The following steps produce a view of the original file and demonstrate the main features of this tool.
1) Select a file to view in the “Corpus Files” list on the left of the main window.
2) If a search term has been specified, the search term hits will be highlighted throughout the text. Search options are the same as for the Concordance Tool and Concordance Plot Tool.
3) Use the "Hit Location" buttons to jump to the appropriate hit in the file.
4) Change the search term and click on the 'Start' button to view other hits in the file.
5) Click on the highlighted text to generate a set of KWIC lines using the highlighted text as the search term.
6) Click on the “Clone Results” button to create a copy of the results so that different sets of results can be compared.

Search terms can be specified as being "words" (default) or "character strings", and searches can be “case insensitive” (default), “case sensitive,” or "Regex" based. Advanced searches are also available. For details see the Concordance Tool explanation.

The following shortcut is unique to the File View Tool:
CTRL-Click = Jumps to the nearest hit in the window

Clusters/N-Grams Tool

The Clusters Tool

This allows you to search for a word or pattern and group (cluster) the results together with the words immediately to the left or right of the search term. In effect it summarizes the results generated in the Concordance Tool or Concordance Plot Tool.

The clusters can be ordered by frequency, the start or end of the word, the range of the cluster (number of files in which the cluster appears), or the probability of the first word in the cluster preceding the remaining words. All list orderings can also be inverted by activating the “Invert Order” option.
Also, you can select the minimum and maximum length (number of words) in each cluster, and the minimum frequency of clusters displayed. It is also possible to select if the search term always appears on the left (default) or right of the cluster.

Note: In the current version, if more than one word is specified as the search term, only the first word will appear on the right if the "Search Term on Right" option is selected.

The following steps produce a set of cluster results and demonstrate the main features of this tool.
1) Choose the appropriate ordering options (see above for details).
2) Press the 'Start' button. At any time, the generation of the clusters list can be halted using the 'Stop' button.
3) Click on the cluster to generate a set of KWIC lines using the text as the search term.
4) Click on the “Clone Results” button to create a copy of the results so that different sets of results can be compared.

The N-Grams Tool

This allows you to scan the entire corpus for 'N' word clusters (e.g. 1 word, 2 words, …). This allows you to find common expressions in a corpus. For example, n-grams of size 2 for the sentence "this is a pen" are 'this is', 'is a' and 'a pen'.

All ordering options available in the Clusters Tool are also available in the N-Grams Tool. You can also select the minimum and maximum size (number of words) in each n-gram, and the minimum frequency and range of n-grams displayed.

The following steps produce a set of N-gram results and demonstrate the main features of this tool.
1) Click on the "N-Grams" option above the search entry box.
2) Choose the appropriate ordering options.
3) Press the 'Start' button.
At any time, the generation of the n-grams list can be halted using the 'Stop' button.
4) Click on the n-gram to generate a set of KWIC lines using the text as the search term.
5) Click on the “Clone Results” button to create a copy of the results so that different sets of results can be compared.

In both the Clusters Tool and N-Grams Tool, search terms can be specified as being "words" (default) or "character strings", and searches can be “case insensitive” (default), “case sensitive,” or "Regex" based. Advanced searches are also available for the Clusters Tool. For details see the Concordance Tool explanation. A number of menu preferences are available with this tool. (See below).

Collocates Tool

This tool allows you to search for collocates of a search term. This allows you to investigate non-sequential patterns in language.

The collocates can be ordered either by total frequency, frequency on the left or right of the search term, or the start or end of the word. They can also be ordered by the value of a statistical measure between the search term and the collocate. The value measures how 'related' the search term and the collocate are. Current possible statistical measures are listed below. All list orderings can also be inverted. Also, you can select the span of words to the left and right of the search term in which to find collocates, and the minimum frequency of collocates displayed. If only a one-word span is required, for example, to see which words appear directly on the right of the search term, check the "Same" box, to keep the minimum and maximum span size the same.

Statistical Measures:
(MI) Mutual Information: Using equations described in M. Stubbs, Collocations and Semantic Profiles, Functions of Language 2, 1 (1995)
(T-Score) T-Score: Using equations described in M. Stubbs, Collocations and Semantic Profiles, Functions of Language 2, 1 (1995)

The following steps produce a set of collocate results and demonstrate the main features of this tool.
1) Choose the appropriate ordering options.
2) Press the 'Start' button. At any time, the generation of the collocates list can be halted using the 'Stop' button.
3) Click on one of the collocates to generate a set of KWIC lines using the text as the search term.
4) Click on the “Clone Results” button to create a copy of the results so that different sets of results can be compared.

Search terms can be specified as being "words" (default) or "character strings", and searches can be “case insensitive” (default), “case sensitive,” or "Regex" based. Advanced searches are also available for the Collocates Tool. For details see the Concordance Tool explanation. A number of menu preferences are available with this tool. (See below).

Word List Tool

This tool counts all the words in the corpus and presents them in an ordered list. This allows you to quickly find which words are the most frequent in a corpus.

The words can be ordered either by frequency or the start or end of the word, and the ordering can be inverted. The word list can also be generated in case-insensitive mode, where words in upper and lower case are treated the same (default) or case-sensitive, where words in upper and lower case are treated separately.

The following steps produce a word list and demonstrate the main features of this tool.
1) Choose the appropriate ordering options.
2) Press the 'Start' button. At any time, the generation of the word list can be halted using the 'Stop' button.
3) Click on the word to generate a set of KWIC lines using the text as the search term.
4) Click on the “Clone Results” button to create a copy of the results so that different sets of results can be compared.

A number of menu preferences are available with this tool.
(See below).

Keyword List

This tool shows which words are unusually frequent (or infrequent) in the corpus in comparison with the words in a reference corpus. This allows you to identify characteristic words in the corpus, for example, as part of a genre or ESP study.

The following steps produce a keyword list and demonstrate the main features of this tool.
1) Select a set of target files.
2) Go to the 'Preferences' menu and choose the 'Keyword Preferences' option.
3) Choose the keyword generation method (a statistical measure) to calculate the 'keyness' of the target file words. The default setting of Log Likelihood is recommended. When using either Log Likelihood or Chi-squared as the statistical measure, the following significance values apply (see: /llwizard.html):
95th percentile; 5% level; p < 0.05; critical value = 3.84
99th percentile; 1% level; p < 0.01; critical value = 6.63
99.9th percentile; 0.1% level; p < 0.001; critical value = 10.83
99.99th percentile; 0.01% level; p < 0.0001; critical value = 15.13
4) Choose a threshold for the number of keywords to be displayed.
5) Choose whether or not to view 'Negative Keywords' (target file words with an unusually low frequency compared with the frequency in the reference corpus).
6) Choose one of the reference corpus options. Select "Use raw file(s)" when you will use raw text (.txt) files to serve as the reference corpus. Select "Use word list(s)" when you will use one or more word lists that are generated from a reference corpus. The "Use word list(s)" option allows you to generate keywords even when the original reference corpus is not available. The format for a word list is: RANK FREQUENCY WORD (separated by any type of white space, including spaces and tabs).
1 12838 the
2 11289 a
3 8583 of
...
Note that blank lines and lines beginning with # will be ignored.
Also, AntConc will check that the file(s) are correctly formatted and report any errors.
7) Load the reference corpus of text (.txt) files, in the same way that the target files are chosen.
8) The reference corpus directory will be shown (if appropriate), and the list of reference corpus files will appear at the bottom of the Keyword Preferences option menu.
9) Click ‘Apply’ in the Keyword Preferences menu and return to the main Keywords window.
10) Choose suitable options for displaying the list of generated Keywords (in a similar manner to the options for generating a Word List).
11) Press the 'Start' button. At any time, the generation of the keyword list can be halted using the 'Stop' button.
12) Click on the keyword to generate a set of KWIC lines using the text as the search term.
13) Click on the “Clone Results” button to create a copy of the results so that different sets of results can be compared.

A number of menu preferences are available with this tool. (See below).

MENU OPTIONS

Menu options are divided into three groups, "File", "Global Settings" and "Tool Preferences". The options available in each group will be described below.

<FILE>
Options here relate to reading files into AntConc and writing files to the hard disk containing data of various types. There are also options to export all current settings to a file and import user settings from a file. If a user settings file becomes corrupted for any reason, simply restart the program or use the "Restore Default Settings" option to return the program to its original state.

<GLOBAL SETTINGS>
Categories here will have an effect on multiple tools in AntConc:

<Character Encoding> AntConc is fully Unicode compliant, meaning that it can handle data in any language, including all European languages and Asian languages. The character encoding of the data to be read by AntConc should be specified here. For example, if you are working with data saved in a Western language, it will usually be encoded in iso-8859-1.
On the other hand, Japanese texts are usually encoded in shiftjis. By specifying the correct encoding, data from all languages can be processed correctly within AntConc. The default is Unicode UTF-8, which is an international standard designed to display all characters of the languages of the world in a single encoding. I recommend you use this encoding if you create any corpus.

<Colors> In the Color Settings category, you can edit the colors used to display results and other information.

<Files> In the File Settings category, you can choose to display the full path of a file or just the name.

<Fonts> In the Font Settings category, you can edit the font types, sizes, and styles used to display file names, results, and the search term.

<Tags> In the Tag Settings category, you can choose to display or hide any tags that are contained in the corpus files. You can also choose to search with tags but hide them in the results display. Embedded tags (e.g. book_NN1), non-embedded tags (e.g. <noun>book</noun>), and header tags can be shown or hidden by activating or deactivating the option.

<Token (Word) Definition> In the Token (Word) Definition category, you can choose which characters, numbers and so on will define a "word". For example, in some cases only letters will be considered words, but at other times, it might be desirable to include numbers, dashes and so on. AntConc is fully Unicode compliant, meaning that it can handle data in any language, including all European languages and Asian languages. For this reason, the default option refers to 'letters' in the broadest sense. Letters, for example, include all English letters (a to z, A to Z) but also all Japanese 'letter' characters.
It is also possible to define your own "token" definition, or append characters to the standard classes.

For more information on the Unicode standards see:
http://www.cs.tut.fi/~jkorpela/unicode/guide.html
/
/Public/5.0.0/ucd/UCD.html
/Public/UNIDATA/PropList.txt
/charts/

<Wildcard Settings> In the Wildcard Settings category, you can edit the default wildcard characters so that they do not clash with a search entry. For example, the "or" wildcard default character (a 'pipe' character | ) can be changed to a slash (/) here. There are special wildcards to deal with whitespace issues.

<TOOL PREFERENCES>
Each tool (with the exception of Concordance Plot Tool and File View Tool) has a preferences category, where settings can be fine-tuned. All tool preference categories allow you to show or hide the different frames in which the results are displayed. For example, you can choose to hide the frame showing file names in the Concordance Tool display window.

<Concordance> In addition to the above, the following settings can be made:
● Using the “Treat case in sort” option causes capitalized words to appear before lower-case words.
● Using the “Sort by characters instead of words” option, it is possible to arrange the results by CHARACTERS to the left or right of the first letter of the search term. This makes it possible to search for spelling differences.
● Using the “Hide search term in KWIC display” option, the search term can be hidden in the KWIC lines, allowing instructors to quiz students on possible words to fit the gap.
● Using the “Put delimiter around hits in KWIC display” option (the default), the chosen delimiter character is added around the hit in the KWIC display.
This makes it easier to see the hit and also eases later processing of the data in a spreadsheet software program.
● Use the “Delimiter” option to select the delimiter character.
● Use the “Line break replacement” option to select a character to replace line breaks with.

<Clusters/N-Grams> In addition to the above, the following settings can be made:
● Using the “Treat all data as lowercase” option (default) causes all words to be transformed to lower-case words. This is useful to get accurate counts of words in certain cases.
● Using the “Treat case in sort” option causes capitalized words to appear before lower-case words.

<Collocates Preferences> In addition to the above, the following settings can be made:
● Use the "Selected Collocate Measure" to choose the statistical measure for measuring collocate strength. Currently, two statistical measures can be used: Mutual Information (MI) and T-Score. See the tool explanation above for references to the statistics.
● Using the “Treat all data as lowercase” option causes all words to be transformed to lower-case words. This is useful to get accurate counts of words in certain cases.
● Using the “Treat case in sort” option causes capitalized words to appear before lower-case words.

<Word List Preferences> In addition to the above, the following settings can be made:
● Using the “Treat all data as lowercase” option causes all words to be transformed to lower-case words. This is useful to get accurate counts of words in certain cases.
● Using the “Treat case in sort” option causes capitalized words to appear before lower-case words.
● Use the lemma list options to select a lemma list. A 'lemma list' can be loaded from a file, which can then be used to generate a lemma list instead of a word list.
When the lemma list function is used, the 'lemma word form(s)' column will show the words in the corpus associated with each lemma. A lemma list can be created by specifying the 'lemma entry' followed by '->' followed by one or more 'words' that should be assigned to the lemma, separated by one or more non-tokens. See the example below:
be->is, are
play->play, plays, playing, played
Note that in the example above, commas and spaces are assumed to be NOT defined as tokens. For this reason, if the lemma list available on the AntConc webpage is used, a 'dash' needs to be added to the token (word) definition for the lemma list to be processed correctly, as hyphenated words are used to the right of the lemma definition.
● Using the "Word List Range" option, a wordlist can be generated using all words (“Use all words”), or a specific set of words (“Use specific words below”), or ignoring a certain set of words (“Use a stoplist below”). The range of words to be used (or ignored) can be entered directly, or can be stored in files which are then read by AntConc by pressing the 'Open' button. A combination of words in a file and words directly entered can also be used.

<Keyword List Preferences> In addition to the above, the following settings can be made:
● Using the “Treat all data as lowercase” option causes all words to be transformed to lower-case words. This is useful to get accurate counts of words in certain cases.
● Using the “Treat case in sort” option causes capitalized words to appear before lower-case words.
● Use the "Keyness Generation Method" to choose the statistical measure for measuring keyword strength. Currently, two statistical measures can be used: Chi-Squared and Log-Likelihood (the default). The default option for the 'keyness' measure is recommended.
● Use the "Threshold Value" option to choose a cut-off for the keyness values generated.
The default option (“All values”) for the threshold value is recommended.
● Use the "Show Negative Keywords" option to view words that are unusually INFREQUENT in the target corpus compared with the reference corpus.
● Use the "Use raw file(s)" option to use raw reference corpus file(s) as the reference corpus.
● Use the "Use word list(s)" option to use word list(s) that correspond to a reference corpus. The word list(s) should be formatted as described in the tool explanation.
● Click the "Add Directory" or "Add Files" buttons to select the reference corpus files.
● Click the "Swap with Target Files" button to swap the main and reference corpora. Note that this will only make sense when raw corpus files are being used.

<HELP>
The help menu provides access to this detailed readme file. It also shows some general information about AntConc, including the current version number and date of release.

SHORTCUTS

Here is a list of Shortcuts that apply to all tools using window panes for results.
CTRL-C = Copies the currently selected text
CTRL-A = Selects all text in the window pane
ALT-A = Selects all text in all window panes showing
Double click = Selects the current word
Triple click = Selects the current line in the window pane
SHIFT-click = Selects continuous lines across all window panes showing
CTRL-click = Selects discontinuous lines across all window panes showing
DELETE = This deletes any selected lines that span across all window panes
INSERT = This keeps any selected lines that span across all window panes, and deletes all others

For any 'spinbox' widgets (e.g. the search term entry box) the 'UP' and 'DOWN' arrow keys on the keyboard can be used to activate the up and down buttons.
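To make the keyness thresholds listed for the Keyword List tool more concrete, here is a minimal sketch of a log-likelihood comparison between a target corpus and a reference corpus. It uses the widely cited two-corpus form of the statistic; whether AntConc computes keyness in exactly this way is an assumption, and the function name and the frequencies are invented for illustration.

```python
import math

def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Log-likelihood keyness of one word across two corpora."""
    combined = freq_target + freq_ref
    total = size_target + size_ref
    # Frequencies expected if the word were equally common in both corpora.
    expected_target = size_target * combined / total
    expected_ref = size_ref * combined / total
    ll = 0.0
    if freq_target > 0:
        ll += freq_target * math.log(freq_target / expected_target)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * ll

# Hypothetical word: 100 hits in a 10,000-word target corpus,
# 50 hits in a 10,000-word reference corpus.
score = log_likelihood(100, 10_000, 50, 10_000)
print(round(score, 2))
print("significant at p < 0.0001:", score > 15.13)
```

A word with the same relative frequency in both corpora scores 0, so larger values indicate a stronger difference between the target and the reference.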

AntConc Software User Manual

AntConc (Windows, Macintosh OS X, and Linux)
Build 3.4.1
Laurence Anthony, Ph.D.
Center for English Language Education in Science and Engineering, School of Science and Engineering, Waseda University, 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
January 31, 2014

Introduction
AntConc is a freeware, multiplatform tool for carrying out corpus linguistics research and data-driven learning. It runs on any computer running Microsoft Windows (tested on Win 98/Me/2000/NT, XP, Vista, Win 7), Macintosh OS X (tested on 10.4.x, 10.5.x, 10.6.x), and Linux (tested on Ubuntu 10, Linux Mint). It is developed in Perl using ActiveState's PerlApp compiler to generate executables for the different operating systems.

Getting Started (No installation necessary)
Windows
On Windows systems, simply double click the AntConc icon and this will launch the program.
Macintosh OS X
On Macintosh systems, simply double click the AntConc icon and this will launch the program.
Linux
On Linux systems, change the permissions to allow AntConc to be run as an executable file. Next, double click the AntConc executable and it will launch.
Caution: Do not place user settings files from older versions of AntConc in the same folder as new versions. This can cause unexpected problems and may even prevent AntConc from starting. It is recommended that you delete your earlier settings file and export it again from the AntConc file menu.

Overview of Tools
AntConc contains seven tools that can be accessed either by clicking on their 'tabs' in the tool window, or using the function keys F1 to F7.
Concordance Tool: This tool shows search results in a 'KWIC' (KeyWord In Context) format. This allows you to see how words and phrases are commonly used in a corpus of texts.
Concordance Plot Tool: This tool shows search results plotted as a 'barcode' format. This allows you to see the position where search results appear in target texts.
File View Tool: This tool shows the text of individual files.
This allows you to investigate in more detail the results generated in other tools of AntConc.
Clusters/N-Grams: The Clusters Tool shows clusters based on the search condition. In effect it summarizes the results generated in the Concordance Tool or Concordance Plot Tool. The N-Grams Tool, on the other hand, scans the entire corpus for 'N' (e.g. 1 word, 2 words, …) length clusters. This allows you to find common expressions in a corpus.
Collocates: This tool shows the collocates of a search term. This allows you to investigate non-sequential patterns in language.
Word List: This tool counts all the words in the corpus and presents them in an ordered list. This allows you to quickly find which words are the most frequent in a corpus.
Keyword List: This tool shows which words are unusually frequent (or infrequent) in the corpus in comparison with the words in a reference corpus. This allows you to identify characteristic words in the corpus, for example, as part of a genre or ESP study.

Concordance Tool
This tool shows search results in a 'KWIC' (KeyWord In Context) format. This allows you to see how words and phrases are commonly used in a corpus of texts.
The following steps produce a set of concordance lines from a corpus and demonstrate the main features of this tool.
1) Select one or more files for processing using the 'Open File(s)...' or 'Open Dir...' options in the 'File' menu. The list of selected files is shown in the left frame of the main window.
2) Enter a search term on which to build concordance lines in the search box.
3) Choose the number of text characters to be output on either side of the search term, using the increase and decrease buttons on the right of the button bar under the "Search Window Size" title (default value is 50 characters).
4) Click on the 'Start' button to start generating the concordance lines. The concordance generation can be halted at any time by clicking on the 'Stop' button.
5) Use the Kwic Sort options to rearrange the concordance lines at three different levels. 0 is the search word, 1L, 2L... are words to the left of the target word, 1R, 2R... are words to the right of the target word.
6) Click on the 'Sort' button to start the sorting process.
7) Move the cursor over the highlighted search term in one of the concordance lines. The cursor will change to a small hand icon. Clicking on the highlighted search term will allow you to view the search term hit as it appears in the original file via the File View Tool (see below).
8) Click on the “Clone Results” button to create a copy of the results so that different sets of results can be compared.
The total number of concordance lines generated (Concordance Hits) is shown at the top of the tool window. This number will flash with the word "FINISHED" when processing has been completed, and will flash with the word "NO HITS" if no hits are generated for a particular search term.
Search terms can be specified as being "words" (default) or "character strings" by activating or deactivating the "Word" search term option. Also, searches can be either “case insensitive” (default) or “case sensitive” by activating or deactivating the "Case" search term option. Searches can also be made using full regular expressions by activating the "Regex" option. For details on how to use regular expressions, consult one of the many texts on the subject, e.g., Mastering Regular Expressions (O'Reilly & Associates Press) or type "regular expressions" in a web search engine to find many sites on the subject (e.g., /quickstart.html). AntConc supports Perl regular expressions.
By clicking on the "Advanced Search" button, more complex searches become possible. The first advanced search option allows you to import a set of search terms, either by typing them one per line, or by loading in a list of search terms from a file. Here, each line will be treated as a separate search term. This feature allows you to use a large set of search terms without having to re-type them each time. The second advanced search option allows you to define context words and a context window within which the search term(s) must appear. For example, to search for "student" where it appears within three words to the left or right of the word "university," set the search term as "student," the context word as "university," and set the context window as 'From' 3L 'To' 3R.
A number of menu preferences are available with this tool. (See below).

Concordance Plot Tool
This tool shows concordance search results plotted in a 'barcode' format, with the length of the text normalized to the width of the bar and each hit shown as a vertical line within the bar. This allows you to see the position where search results appear in target texts. The tool also allows you to see which files include the target search term, and can also be used to identify where the search term hits cluster together. An example of the use of the Plot Tool is in determining where specific content words appear in a technical paper, or where an actor or story character appears during the course of a play or novel.
The number of hits and length of each text is shown to the right of the barcode plot, and the plot itself can be enlarged or reduced in size using the “Plot Zoom” buttons.
If you move the cursor over the highlighted search term in one of the concordance lines, the cursor will change to a small hand icon. Clicking on the highlighted search term will allow you to view the search term hit as it appears in the original file via the File View Tool (see below).
Search terms can be specified as being "words" (default) or "character strings", and searches can be “case insensitive” (default), “case sensitive,” or "Regex" based. Advanced searches are also available.
For details see the Concordance Tool explanation.

File View Tool
This tool shows the raw text of individual files. This allows you to investigate in more detail the results generated in other tools of AntConc.
The following steps produce a view of the original file and demonstrate the main features of this tool.
1) Select a file to view in the “Corpus Files” list on the left of the main window.
2) If a search term has been specified, the search term hits will be highlighted throughout the text. Search options are the same as for the Concordance Tool and Concordance Plot Tool.
3) Use the "Hit Location" buttons to jump to the appropriate hit in the file.
4) Change the search term and click on the 'Start' button to view other hits in the file.
5) Click on the highlighted text to generate a set of KWIC lines using the highlighted text as the search term.
6) Click on the “Clone Results” button to create a copy of the results so that different sets of results can be compared.
Search terms can be specified as being "words" (default) or "character strings", and searches can be “case insensitive” (default), “case sensitive,” or "Regex" based. Advanced searches are also available. For details see the Concordance Tool explanation.
The following shortcut is unique to the File View Tool:
CTRL-Click = Jumps to the nearest hit in the window

Clusters/N-Grams Tool
The Clusters Tool
This allows you to search for a word or pattern and group (cluster) the results together with the words immediately to the left or right of the search term. In effect it summarizes the results generated in the Concordance Tool or Concordance Plot Tool.
The clusters can be ordered by frequency, the start or end of the word, the range of the cluster (number of files in which the cluster appears), or the probability of the first word in the cluster preceding the remaining words. All list orderings can also be inverted by activating the “Invert Order” option. Also, you can select the minimum and maximum length (number of words) in each cluster, and the minimum frequency of clusters displayed. It is also possible to select whether the search term always appears on the left (default) or right of the cluster.
Note: In the current version, if more than one word is specified as the search term, only the first word will appear on the right if the "Search Term on Right" option is selected.
The following steps produce a set of cluster results and demonstrate the main features of this tool.
1) Choose the appropriate ordering options (see above for details).
2) Press the 'Start' button. At any time, the generation of the clusters list can be halted using the 'Stop' button.
3) Click on the cluster to generate a set of KWIC lines using the text as the search term.
4) Click on the “Clone Results” button to create a copy of the results so that different sets of results can be compared.

The N-Grams Tool
This allows you to scan the entire corpus for 'N' word clusters (e.g. 1 word, 2 words, …). This allows you to find common expressions in a corpus. For example, n-grams of size 2 for the sentence "this is a pen" are 'this is', 'is a' and 'a pen'.
All ordering options available in the Clusters Tool are also available in the N-Grams Tool. You can also select the minimum and maximum size (number of words) in each n-gram, and the minimum frequency and range of n-grams displayed.
The following steps produce a set of N-gram results and demonstrate the main features of this tool.
1) Click on the "N-Grams" option above the search entry box.
2) Choose the appropriate ordering options.
3) Press the 'Start' button.
At any time, the generation of the n-grams list can be halted using the 'Stop' button.
4) Click on the n-gram to generate a set of KWIC lines using the text as the search term.
5) Click on the “Clone Results” button to create a copy of the results so that different sets of results can be compared.
In both the Clusters Tool and N-Grams Tool, search terms can be specified as being "words" (default) or "character strings", and searches can be “case insensitive” (default), “case sensitive,” or "Regex" based. Advanced searches are also available for the Clusters Tool. For details see the Concordance Tool explanation. A number of menu preferences are available with this tool. (See below).

Collocates Tool
This tool allows you to search for collocates of a search term. This allows you to investigate non-sequential patterns in language.
The collocates can be ordered either by total frequency, frequency on the left or right of the search term, or the start or end of the word. They can also be ordered by the value of a statistical measure between the search term and the collocate. The value measures how 'related' the search term and the collocate are. Current possible statistical measures are listed below. All list orderings can also be inverted. Also, you can select the span of words to the left and right of the search term in which to find collocates, and the minimum frequency of collocates displayed. If only a one-word span is required, for example, to see which words appear directly on the right of the search term, check the "Same" box to keep the minimum and maximum span size the same.
Statistical Measures:
(MI) Mutual Information: Using equations described in M. Stubbs, Collocations and Semantic Profiles, Functions of Language 2, 1 (1995)
(T-Score) T-Score: Using equations described in M. Stubbs, Collocations and Semantic Profiles, Functions of Language 2, 1 (1995)
The following steps produce a set of collocate results and demonstrate the main features of this tool.
1) Choose the appropriate ordering options.
2) Press the 'Start' button. At any time, the generation of the collocates list can be halted using the 'Stop' button.
3) Click on one of the collocates to generate a set of KWIC lines using the text as the search term.
4) Click on the “Clone Results” button to create a copy of the results so that different sets of results can be compared.
Search terms can be specified as being "words" (default) or "character strings", and searches can be “case insensitive” (default), “case sensitive,” or "Regex" based. Advanced searches are also available for the Collocates Tool. For details see the Concordance Tool explanation. A number of menu preferences are available with this tool. (See below).

Word List Tool
This tool counts all the words in the corpus and presents them in an ordered list. This allows you to quickly find which words are the most frequent in a corpus.
The words can be ordered either by frequency or the start or end of the word, and the ordering can be inverted. The word list can also be generated in case-insensitive mode, where words in upper and lower case are treated the same (default), or case-sensitive mode, where words in upper and lower case are treated separately.
The following steps produce a word list and demonstrate the main features of this tool.
1) Choose the appropriate ordering options.
2) Press the 'Start' button. At any time, the generation of the word list can be halted using the 'Stop' button.
3) Click on the word to generate a set of KWIC lines using the text as the search term.
4) Click on the “Clone Results” button to create a copy of the results so that different sets of results can be compared.
A number of menu preferences are available with this tool.
(See below).

Keyword List
This tool shows which words are unusually frequent (or infrequent) in the corpus in comparison with the words in a reference corpus. This allows you to identify characteristic words in the corpus, for example, as part of a genre or ESP study.
The following steps produce a keyword list and demonstrate the main features of this tool.
1) Select a set of target files.
2) Go to the 'Preferences' menu and choose the 'Keyword Preferences' option.
3) Choose the keyword generation method (a statistical measure) to calculate the 'keyness' of the target file words. The default setting of Log Likelihood is recommended. When using either Log Likelihood or Chi-squared as the statistical measure, the following significance values apply (see: /llwizard.html):
95th percentile; 5% level; p < 0.05; critical value = 3.84
99th percentile; 1% level; p < 0.01; critical value = 6.63
99.9th percentile; 0.1% level; p < 0.001; critical value = 10.83
99.99th percentile; 0.01% level; p < 0.0001; critical value = 15.13
4) Choose a threshold for the number of keywords to be displayed.
5) Choose whether or not to view 'Negative Keywords' (target file words with an unusually low frequency compared with the frequency in the reference corpus).
6) Choose one of the reference corpus options. Select "Use raw file(s)" when you will use raw text (.txt) files to serve as the reference corpus. Select "Use word list(s)" when you will use one or more word lists that are generated from a reference corpus. The "Use word list(s)" option allows you to generate keywords even when the original reference corpus is not available. The format for a word list is:
RANK FREQUENCY WORD (separated by any type of white space, including spaces and tabs).
1 12838 the
2 11289 a
3 8583 of
...
Note that blank lines and lines beginning with # will be ignored. Also, AntConc will check that the file(s) are correctly formatted and report any errors.
7) Load the reference corpus of text (.txt) files, in the same way that the target files are chosen.
8) The reference corpus directory will be shown (if appropriate), and the list of reference corpus files will appear at the bottom of the Keyword Preferences option menu.
9) Click ‘Apply’ in the Keyword Preferences menu and return to the main Keywords window.
10) Choose suitable options for displaying the list of generated Keywords (in a similar manner to the options for generating a Word List).
11) Press the 'Start' button. At any time, the generation of the keyword list can be halted using the 'Stop' button.
12) Click on the keyword to generate a set of KWIC lines using the text as the search term.
13) Click on the “Clone Results” button to create a copy of the results so that different sets of results can be compared.
A number of menu preferences are available with this tool. (See below).

MENU OPTIONS
Menu options are divided into three groups, "File", "Global Settings" and "Tool Preferences". The options available in each group will be described below.

<FILE>
Options here relate to reading files into AntConc and writing files to the hard disk containing data of various types. There are also options to export all current settings to a file and import user settings from a file. If a user settings file becomes corrupted for any reason, simply restart the program or use the "Restore Default Settings" option to return the program to its original state.

<GLOBAL SETTINGS>
Categories here will have an effect on multiple tools in AntConc:
<Character Encoding> AntConc is fully Unicode compliant, meaning that it can handle data in any language, including all European languages and Asian languages. The character encoding of the data to be read by AntConc should be specified here. For example, if you are working with data saved in a Western language, it will usually be encoded in iso-8859-1.
On the other hand, Japanese texts are usually encoded in shiftjis. By specifying the correct encoding, data from all languages can be processed correctly within AntConc. The default is Unicode UTF-8, which is an international standard designed to display all characters of the languages of the world in a single encoding. I recommend you use this encoding if you create any corpus.
<Colors> In the Color Settings category, you can edit the colors used to display results and other information.
<Files> In the File Settings category, you can choose to display the full path of a file or just the name.
<Fonts> In the Font Settings category, you can edit the font types, sizes, and styles used to display file names, results, and the search term.
<Tags> In the Tag Settings category, you can choose to display or hide any tags that are contained in the corpus files. You can also choose to search with tags but hide them in the results display. Embedded tags (e.g. book_NN1), non-embedded tags (e.g. <noun>book</noun>), and header tags can be shown or hidden by activating or deactivating the option.
<Token (Word) Definition> In the Token (Word) Definition category, you can choose which characters, numbers and so on will define a "word". For example, in some cases only letters will be considered words, but at other times, it might be desirable to include numbers, dashes and so on. AntConc is fully Unicode compliant, meaning that it can handle data in any language, including all European languages and Asian languages. For this reason, the default option refers to 'letters' in the broadest sense. Letters, for example, include all English letters (a to z, A to Z) but also all Japanese 'letter' characters. It is also possible to define your own "token" definition, or append characters to the standard classes.
For more information on the Unicode standards see:
http://www.cs.tut.fi/~jkorpela/unicode/guide.html
/Public/5.0.0/ucd/UCD.html
/Public/UNIDATA/PropList.txt
/charts/
<Wildcard Settings> In the Wildcard Settings category, you can edit the default wildcard characters so that they do not clash with a search entry. For example, the "or" wildcard default character (a 'pipe' character | ) can be changed to a slash (/) here. There are special wildcards to deal with whitespace issues.

<TOOL PREFERENCES>
Each tool (with the exception of Concordance Plot Tool and File View Tool) has a preferences category, where settings can be fine-tuned. All tool preference categories allow you to show or hide the different frames in which the results are displayed. For example, you can choose to hide the frame showing file names in the Concordance Tool display window.

<Concordance> In addition to the above, the following settings can be made:
● Using the “Treat case in sort” option causes capitalized words to appear before lower-case words.
● Using the “Sort by characters instead of words” option, it is possible to arrange the results by CHARACTERS to the left or right of the first letter of the search term. This makes it possible to search for spelling differences.
● Using the “Hide search term in KWIC display” option, the search term can be hidden in the KWIC lines, allowing instructors to quiz students on possible words to fit the gap.
● Using the “Put delimiter around hits in KWIC display” option (the default), the chosen delimiter character is added around the hit in the KWIC display. This makes it easier to see the hit and also eases later processing of the data in a spreadsheet program.
● Use the “Delimiter” option to select the delimiter character.
● Use the “Line break replacement” option to select a character to replace line breaks with.

<Clusters/N-Grams> In addition to the above, the following settings can be made:
● Using the “Treat all data as lowercase” option (default) causes all words to be transformed to lower-case words. This is useful to get accurate counts of words in certain cases.
● Using the “Treat case in sort” option causes capitalized words to appear before lower-case words.

<Collocates Preferences> In addition to the above, the following settings can be made:
● Use the "Selected Collocate Measure" to choose the statistical measure for measuring collocate strength. Currently, two statistical measures can be used: Mutual Information (MI) and T-Score. See the tool explanation above for references to the statistics.
● Using the “Treat all data as lowercase” option causes all words to be transformed to lower-case words. This is useful to get accurate counts of words in certain cases.
● Using the “Treat case in sort” option causes capitalized words to appear before lower-case words.

<Word List Preferences> In addition to the above, the following settings can be made:
● Using the “Treat all data as lowercase” option causes all words to be transformed to lower-case words. This is useful to get accurate counts of words in certain cases.
● Using the “Treat case in sort” option causes capitalized words to appear before lower-case words.
● Use the lemma list options to select a lemma list. A 'lemma list' can be loaded from a file, which can then be used to generate a lemma list instead of a word list.
When the lemma list function is used, the 'lemma word form(s)' column will show the words in the corpus associated with each lemma.
A lemma list can be created by specifying the 'lemma entry' followed by '->' followed by one or more 'words' that should be assigned to the lemma, separated by one or more non-tokens. See the example below:
be->is, are
play->play, plays, playing, played
Note that in the example above, commas and spaces are assumed to be NOT defined as tokens. For this reason, if the lemma list available on the AntConc webpage is used, a 'dash' needs to be added to the token (word) definition for the lemma list to be processed correctly, as hyphenated words are used to the right of the lemma definition.
● Using the "Word List Range" option, a word list can be generated using all words (“Use all words”), or a specific set of words (“Use specific words below”), or ignoring a certain set of words (“Use a stoplist below”). The range of words to be used (or ignored) can be entered directly, or can be stored in files which are then read by AntConc by pressing the 'Open' button. A combination of words in a file and words directly entered can also be used.

<Keyword List Preferences> In addition to the above, the following settings can be made:
● Using the “Treat all data as lowercase” option causes all words to be transformed to lower-case words. This is useful to get accurate counts of words in certain cases.
● Using the “Treat case in sort” option causes capitalized words to appear before lower-case words.
● Use the "Keyness Generation Method" to choose the statistical measure for measuring keyword strength. Currently, two statistical measures can be used: Chi-Squared and Log-Likelihood (the default). The default option for the 'keyness' measure is recommended.
● Use the "Threshold Value" option to choose a cut-off for the keyness values generated. The default option (“All values”) for the threshold value is recommended.
● Use the "Show Negative Keywords" option to view words that are unusually INFREQUENT in the target corpus compared with the reference corpus.
● Use the "Use raw file(s)" option to use raw reference corpus file(s) as the reference corpus.
● Use the "Use word list(s)" option to use word list(s) that correspond to a reference corpus. The word list(s) should be formatted as described in the tool explanation.
● Click the "Add Directory" or "Add Files" buttons to select the reference corpus files.
● Click the "Swap with Target Files" button to swap the main and reference corpora. Note that this will only make sense when raw corpus files are being used.

<HELP>
The help menu provides access to this detailed readme file. It also shows some general information about AntConc, including the current version number and date of release.

SHORTCUTS
Here is a list of shortcuts that apply to all tools using window panes for results.
CTRL-C = Copies the currently selected text
CTRL-A = Selects all text in the window pane
ALT-A = Selects all text in all window panes showing
Double click = Selects the current word
Triple click = Selects the current line in the window pane
SHIFT-click = Selects continuous lines across all window panes showing
CTRL-click = Selects discontinuous lines across all window panes showing
DELETE = Deletes any selected lines that span across all window panes
INSERT = Keeps any selected lines that span across all window panes, and deletes all others
For any 'spinbox' widgets (e.g. the search term entry box), the 'UP' and 'DOWN' arrow keys on the keyboard can be used to activate the up and down buttons.
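
As a rough illustration of what the N-Grams Tool described in this readme computes, the following Python sketch lists and counts the n-grams of a text. It is not AntConc's own (Perl) code: the function names tokenize, ngrams and ngram_counts are hypothetical, and the letters-only, lower-cased token definition is an assumption chosen to resemble the default settings described above.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and treat each run of letters as one token (assumed definition)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined by single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, n_min=2, n_max=2, min_freq=1):
    """Count n-grams of size n_min..n_max and keep those seen at least min_freq times."""
    tokens = tokenize(text)
    counts = Counter()
    for n in range(n_min, n_max + 1):
        counts.update(ngrams(tokens, n))
    return {gram: freq for gram, freq in counts.items() if freq >= min_freq}


# The readme's own example sentence.
print(ngrams(tokenize("this is a pen"), 2))  # ['this is', 'is a', 'a pen']
```

The n_min, n_max and min_freq parameters mirror the minimum and maximum n-gram size and minimum frequency options mentioned for the tool; the range (number of files) option is left out to keep the sketch short.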

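Counting words with AntConc

AntConc is a text analysis tool that helps users examine vocabulary use, word frequencies and similar information in a text.

To count the words in a text, first load it into AntConc: choose "Open File(s)..." from the "File" menu and select the text file to be analyzed.

Next, click the "Word List" tab and press "Start" to generate the word list. The total numbers of word types and word tokens are displayed together with the list.

The Word List tool can also be opened with its function key (the seven tools are mapped to F1 to F7).

In short, AntConc makes it easy to obtain word counts for a text, which supports more detailed analysis.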
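
The counting step can be sketched in Python as follows. This is only an approximation of what a word list reports, not AntConc's implementation: the function name word_list is hypothetical, and the sketch assumes that a word is any run of letters and that all data is treated as lowercase.

```python
import re
from collections import Counter


def word_list(text, lowercase=True):
    """Return (token count, type count, words ranked by frequency)."""
    if lowercase:
        text = text.lower()
    tokens = re.findall(r"[^\W\d_]+", text)  # assumed token definition: letters only
    freq = Counter(tokens)
    return len(tokens), len(freq), freq.most_common()


tokens, types, ranked = word_list("The cat saw the other cat.")
print(tokens, types)  # 6 4
print(ranked[0])      # ('the', 2)
```

Here "tokens" is the running word count of the text and "types" is the number of distinct words.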

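Using AntConc

A word frequency tool for English, developed in Japan by Laurence Anthony (Waseda University); it also supports Chinese. Example corpus: the word-segmented People's Daily corpus for January 1998 from Peking University. Tasks: compute word frequencies and generate a frequency list; compute the frequencies of n-grams; save the results.
Before processing Chinese text, set the character encoding; otherwise the text is displayed as garbled characters.
AntConc includes the following tools: Concordance; Concordance Plot; File View; Clusters; N-Grams (part of the Clusters tool); Collocates; Word List; Keyword List.
Move the pointer over the highlighted search term in one of the concordance lines and it changes to a hand icon; clicking the search term shows the hit as it appears in the original text. Note: the total number of concordance lines is shown under "Concordance Hits". When processing finishes, "FINISHED" is shown; if no concordance lines are generated, "NO HITS" is shown and the concordance window is not updated.
With the "Word" option above "Search Term", the search term can be treated as a word (default) or as a character string. The "Case" option controls whether the search is case sensitive, and "Regex" enables full regular expressions (see, e.g., /quickstart.html).
Pressing the Advanced button allows more complex searches.
There are two advanced search options. The first defines a set of search terms, either typed one per line or loaded from a file containing a list of search terms; this lets the user work with a large set of terms without retyping them each time.
The second defines context words and a context window within which the search term must appear.
For the Keyword List: choose the threshold for the number of keywords displayed; choose whether to show negative keywords ("Show Negative Keywords"), i.e. words that are unusually infrequent in the target corpus compared with the reference corpus; choose a reference corpus of text files; the list of reference corpus files is displayed in the window under the reference corpus options; click Apply to return to the main window; choose the ordering options for the keyword list; press Start (generation can be stopped at any time); click a keyword to generate a set of KWIC lines for it.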
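
To make the idea of "keyness" more concrete, here is a minimal Python sketch of the log-likelihood statistic in the form commonly used for comparing word frequencies across two corpora. The source does not show how AntConc computes the value internally, so the formula, the function names and the example counts are assumptions for illustration; only the critical values (3.84, 6.63, 10.83, 15.13) are taken from the readme.

```python
import math

# Critical values quoted in the AntConc readme, strictest first.
CRITICAL_VALUES = [
    (15.13, "p < 0.0001"),
    (10.83, "p < 0.001"),
    (6.63, "p < 0.01"),
    (3.84, "p < 0.05"),
]


def log_likelihood(freq_target, freq_ref, size_target, size_ref):
    """Log-likelihood of a word's frequency in a target corpus versus a reference corpus."""
    total = freq_target + freq_ref
    expected_target = size_target * total / (size_target + size_ref)
    expected_ref = size_ref * total / (size_target + size_ref)
    ll = 0.0
    if freq_target > 0:  # a zero count contributes nothing to the sum
        ll += freq_target * math.log(freq_target / expected_target)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2.0 * ll


def is_negative_keyword(freq_target, freq_ref, size_target, size_ref):
    """True when the word is relatively less frequent in the target corpus."""
    return freq_target / size_target < freq_ref / size_ref


def significance(ll):
    """Map a keyness value to the strictest level it reaches."""
    for critical, label in CRITICAL_VALUES:
        if ll >= critical:
            return label
    return "not significant at p < 0.05"


# Hypothetical counts: 50 hits in a 1000-word target corpus, 10 in a 1000-word reference corpus.
score = log_likelihood(50, 10, 1000, 1000)
print(round(score, 2), significance(score))
```

A word whose relative frequency is the same in both corpora scores 0; the larger the value, the more characteristic the word is of one corpus or the other, and is_negative_keyword tells which direction applies.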

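Basic AntConc Operations

Presenter: Li Guangwei
Contents: 01 About AntConc; 02 AntConc features; 03 AntConc demonstration

About AntConc
AntConc is a free corpus search tool developed by Professor Laurence Anthony of Waseda University, Japan. It is mainly used in corpus linguistics, translation studies and foreign language teaching.

Features
[Figure 1: AntConc main window]
As Figure 1 shows, AntConc provides the following tools: Concordance, Concordance Plot, File View, Clusters/N-Grams, Collocates, Word List and Keyword List.

◆ The software can extract concordances, collocate lists and word frequency lists. Each function is illustrated below using the English translation of Huangdi Neijing: Suwen.

Demonstration: extracting concordances
◆ To extract concordances with the Concordance tool, first click the File menu, choose Open Files and select the corpus file to open (to open a whole folder, choose Open Directory). Then type "Huangdi" in the box under Search Term.

◆ Click "Start"; the results are displayed in KWIC format.
[Figure 2: Concordance window for "Huangdi"]
As Figure 2 shows, the word "Huangdi" is highlighted in blue; it occurs 644 times in the English translation of Huangdi Neijing: Suwen.

AntConc can also search for several terms at once. To do so, type "|" between the terms, for example "do|does|did|doing|done" in Search Term (Figure 3). Alternatively, click "Advanced" and tick "Use search term(s) from list below".

Type the terms into the box below it (a txt word list can also be loaded directly), one word per line. When the settings are complete, click "Apply" and return to the concordance window.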
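
The multi-term search can be pictured as a regular-expression alternation. The Python sketch below builds simple KWIC lines for such a pattern; it is an illustration only, with a hypothetical function name (kwic), an invented sample sentence, and a character-based window loosely modelled on the "Search Window Size" setting.

```python
import re


def kwic(text, pattern, window=50, ignore_case=True):
    """Return (left context, hit, right context) for each whole-word match of pattern."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(r"\b(?:" + pattern + r")\b", flags)
    lines = []
    for match in regex.finditer(text):
        left = text[max(0, match.start() - window):match.start()]
        right = text[match.end():match.end() + window]
        lines.append((left, match.group(0), right))
    return lines


sample = "What did she do? She does what she has always done."
for left, hit, right in kwic(sample, "do|does|did|doing|done", window=10):
    print(f"{left:>10} [{hit}] {right}")
```

The word boundaries (\b) keep "do" from matching inside "does" or "done", which corresponds to searching for words rather than character strings.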

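Searching for word forms in AntConc

AntConc is a corpus analysis program that helps you search and process text data.

To retrieve the different forms of a word with AntConc, proceed as follows:
1. Open AntConc and load the corpus you want to analyze.

2. Check that the text files or folder to be analyzed appear in the file list of the main window.

3. In the Word List preferences, load a lemma list, i.e. a file in which each line gives a lemma followed by its word forms (for example, play->play, plays, playing, played).

4. On the "Word List" tab, press "Start". The list shows the number of occurrences of each lemma.

The 'lemma word form(s)' column shows the word forms found in the corpus for each lemma.

5. Sort the list as needed to see how often particular forms occur.

Note that the result depends on the texts contained in your corpus and on the lemma list supplied: AntConc does not identify inflected forms by itself.

If your corpus contains texts in several languages, the lemma list must cover the word forms of each of those languages.

If your corpus contains a single language, a lemma list for that language is sufficient. Individual forms can also be viewed in context by typing them into the Concordance tool separated by "|", for example go|went|gone|goes.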
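
Grouping word forms under a lemma can be sketched in Python using the lemma list format given in the AntConc readme (a lemma, then '->', then its forms). The function names parse_lemma_list and lemma_counts, the sample sentence and the letters-only token definition are assumptions made for this example, not AntConc's actual code.

```python
import re
from collections import Counter

TOKEN = r"[^\W\d_]+"  # assumed token definition: runs of letters


def parse_lemma_list(lines):
    """Map each word form to its lemma, reading lines such as 'be->is, are'."""
    form_to_lemma = {}
    for line in lines:
        line = line.strip()
        if not line or "->" not in line:
            continue  # skip blank or malformed lines
        lemma, forms = line.split("->", 1)
        for form in re.findall(TOKEN, forms.lower()):
            form_to_lemma[form] = lemma.strip().lower()
    return form_to_lemma


def lemma_counts(text, form_to_lemma):
    """Count tokens per lemma and record which forms were seen for each lemma."""
    counts = Counter()
    forms_seen = {}
    for token in re.findall(TOKEN, text.lower()):
        lemma = form_to_lemma.get(token, token)  # unknown words count as their own lemma
        counts[lemma] += 1
        forms_seen.setdefault(lemma, Counter())[token] += 1
    return counts, forms_seen


lemmas = parse_lemma_list(["be->is, are", "play->play, plays, playing, played"])
counts, forms = lemma_counts("She plays; they played. It is what they are playing.", lemmas)
print(counts["play"], sorted(forms["play"]))  # 3 ['played', 'playing', 'plays']
print(counts["be"], sorted(forms["be"]))      # 2 ['are', 'is']
```

Because hyphens are not part of the assumed token definition, hyphenated forms in a lemma list would be split here; this is the same issue the readme raises about adding a dash to the token definition.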

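AntConc 3.2.0 User Guide [1]

[1] This guide was written by Zhang Xingjuan, a 2007 graduate student at the School of Foreign Studies, South China Normal University, and revised and supplemented by the supervisor, He Anping. The method for range-restricted searches was provided by Dr. D. Lee of City University of Hong Kong, to whom thanks are due.

1. Extracting concordances
1.1 Setting the search term
(1) Single-term search
a) In the File menu, click "Open Files" and select the corpus to open (to open a whole folder, choose "Open Directory");
b) Type the term to be searched, e.g. go, in the "Search Term" box;
c) In "Search Window Size", set how much context (in characters) is shown on either side of the search term in each concordance line;
d) Click Start to begin the search.

The result is shown in Figure 1.1.
[Figure 1.1: Single-term search result]

(2) Multi-term search
● Setting a multi-term search
Besides single terms, AntConc can search for several terms at once by typing the "|" symbol between them.

Example: to search for the different forms of the verb go, type go|went|gone|goes in "Search Term".

● Setting a context-word search
To restrict a concordance search, a context word can be required to occur within a given range around the search term.

Example: to study phrases of the type a … of, all instances can be extracted with AntConc as follows:
a) Type a in the "Search Term" box;
b) Click the Advanced button beside "Search Term" to open the "Advanced Search" window (Figure 1.2).

Tick "Use context words and horizons", then type of in the "Context Words" box and add it.

To change the context words, first clear the existing ones and then repeat the steps above.

The position of the context word relative to the search term must also be set. In this example, of is the second word to the right of a, so the "Context Horizon" is set to 2R; finally click Apply;
[Figure 1.2: Advanced Search window]
c) Back in the concordance window, click Start to begin the search.

The results include chunks such as a lot of and a bit of.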
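
The context-word restriction in this example can be sketched in Python as follows. The function name context_search, the sample sentence and the signed-offset convention (negative for left, positive for right, so 2R is +2) are assumptions for illustration and do not reproduce AntConc's implementation.

```python
import re


def context_search(text, term, context_word, from_offset, to_offset):
    """Return the word spans where context_word occurs within the given
    offsets of term (negative offsets are to the left, positive to the right)."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    hits = []
    for i, token in enumerate(tokens):
        if token != term:
            continue
        for offset in range(from_offset, to_offset + 1):
            j = i + offset
            if offset != 0 and 0 <= j < len(tokens) and tokens[j] == context_word:
                start, end = min(i, j), max(i, j)
                hits.append(" ".join(tokens[start:end + 1]))
                break  # one hit per occurrence of the search term
    return hits


sample = "I ate a lot of rice and a bit of fish, then a piece of the cake of a friend."
print(context_search(sample, "a", "of", 2, 2))  # ['a lot of', 'a bit of', 'a piece of']
```

Setting both offsets to 2 corresponds to a horizon from 2R to 2R; widening the range, for example from -3 to 3, would accept the context word anywhere within three words on either side.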

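● Searching with a list of terms
When several terms need to be searched, the following method can be used instead of "|"; it is especially suitable when the number of search terms is large.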

(完整版)AntConc的详细使用说明.doc

(完整版)AntConc的详细使用说明.doc

AntConc3.2.0 的使用说明11.提取境共1.1 置索( 1)索a)点 file 下拉菜中的“ open files”,要打开的料(如果想打开整个文件,可以 opendirectory);b)在“ Search Term”一入要索的,如 go;c)在“ Search Window Size”一置每一共行出的数;d)点,开始索。

索果如 1.1 所示:图 1.1 单项检索结果(2)多索置多索除了索个以外, AntConc 具有索多个的功能,索方法在索入“ |”符号。

例:要索go 的各种形式,可在“Search Term”中入 go|went|gone|goes置境索了限制境共的索,可以定一个境在索周一定的境范内出。

例:如要研究 a ⋯ of 一,可通 AntConc 提取所有的,索方法如下:a)在“ Search Term”一入a;b)点“ Search Term”旁的,入“ Advanced Search”界面,如 1.2 所示。

点“ Use context words and horizons”,然后在“ ContextWords”一入 of,点。

如要重新置境,可先点清除原来境,后重复以上操作。

另外,需定境距离索的位置,如本研究中, of 在 a 的右二位置,所以“ Content Horizon”确定为,最后点击;c)回到语境共现的界面后,点击,开始检索。

结果可提取 a lot of, a bit of等词块。

设置多字语检索在研究中,如需检索多个词项,除了使用“|”以外,也可使用以下方法,尤其适合检索项数目较多的情况。

例:研究感官动词watch, sound, feel, hear, smella)在TXT文本中键入所有要检索的词项,可多达250 个词。

然后为该文本起名保存。

需注意:键入的词项需以列的形式排列。

如:feelfeelsfeltb)点击Search Term旁的,选择“ Use search term(s) from list below”。



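AntConc 3.2.0 User Guide [1]

1. Extracting concordances

1.1 Setting the search term

(1) Single-term search
a) Choose “open files” from the file drop-down menu and select the corpus files to open (to open a whole folder, choose “open directory”);
b) type the term to search for, such as go, in the “Search Term” box;
c) set the number of words shown in each concordance line in the “Search Window Size” box;
d) click the start button to begin the search.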
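What a concordance search produces can be illustrated outside AntConc with a minimal Python sketch. It is not AntConc's implementation: the function name `kwic`, the simple tokenizer and the sample sentence are assumptions made for illustration, and the window is counted in words, as step c) above describes.

```python
import re

def kwic(text, term, window=5):
    """Return (left, hit, right) tuples with `window` words of context per side."""
    tokens = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == term.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines

# Hypothetical two-sentence "corpus" used only for this sketch.
sample = "We go to school by bus. They go home at noon."
for left, hit, right in kwic(sample, "go", window=3):
    # Right-align the left context so the hits line up in one column.
    print(f"{left:>25}  {hit}  {right}")
```

Each tuple corresponds to one concordance line; the number of tuples plays the role of the hit count.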

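The search results are shown in Figure 1.1.

Figure 1.1 Results of a single-term search

(2) Multi-term search

● Searching for several terms
Besides single terms, AntConc can search for several terms at once: type the “|” symbol between the search terms.
Example: to retrieve the various tense forms of the verb go, enter go|went|gone|goes in “Search Term”.

● Searching with a context word
To narrow a concordance search, you can require that a context word appear within a set span around the search term.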

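Example: to study phrases of the type a … of, AntConc can extract all matching items as follows:
a) type a in the “Search Term” box;
b) click the button next to “Search Term” to open the “Advanced Search” dialog, shown in Figure 1.2. Select “Use context words and horizons”, type of in the “Context Words” box, and click [button]. To set a different context word, first click the clear button to remove the old one, then repeat the steps above. You also need to set the position of the context word relative to the search term: in this study of is the second word to the right of a, so set “Context Horizon” to [value], and finally click [button];
c) back in the concordance window, click the start button to begin the search.

Figure 1.2 The Advanced Search dialog

The results include chunks such as a lot of and a bit of.

[1] This guide was written by Zhang Xingjuan (张杏娟), a 2007 graduate student at the School of Foreign Studies, South China Normal University, and revised and supplemented by her supervisor He Anping (何安平). The method for searching within a specified range was provided by Dr. D. Lee of City University of Hong Kong, to whom thanks are due.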
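The context-word restriction described in this section can be sketched in Python. This is only an illustration of the idea: the function name `hits_with_context` is hypothetical, and the horizon of 0 words to the left and 2 to the right is an assumption, since the original setting is not legible in the source.

```python
import re

def hits_with_context(text, term, context_word, left=0, right=2):
    """Keep hits of `term` that have `context_word` within the given horizon."""
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    found = []
    for i, tok in enumerate(tokens):
        if tok != term:
            continue
        lo = max(0, i - left)
        hi = min(len(tokens), i + right + 1)
        # Words inside the horizon, excluding the search term itself.
        horizon = tokens[lo:i] + tokens[i + 1:hi]
        if context_word in horizon:
            found.append(" ".join(tokens[i:hi]))
    return found

# Hypothetical sample sentence.
sample = "She has a lot of books and a bit of time, but a dog."
print(hits_with_context(sample, "a", "of"))
```

The last occurrence of a is dropped because no of follows it within two words.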

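● Searching for many terms from a list
When a study needs several search terms, an alternative to “|” is the following method, which is especially suitable when the number of terms is large.
Example: studying the sense verbs watch, sound, feel, hear and smell
a) Type all the search terms, up to 250 words, into a TXT file, then name and save the file. Note that the terms must be arranged in a column, one per line, for example:
feel
feels
felt
b) Click the button next to “Search Term” and select “Use search term(s) from list below”. Click [button], browse to the drive and folder where the new file was saved, click the file name, and then click [button];
c) back in the concordance window, click the start button to begin the search.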
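Both multi-term methods, the “|” symbol and the one-term-per-line file, amount to searching for any one of a set of alternatives. A rough Python sketch of that idea follows; the helper name `pattern_from_terms` and the in-memory "file" are assumptions for illustration, not part of AntConc.

```python
import re

def pattern_from_terms(lines):
    """Build one case-insensitive whole-word pattern from a column of terms."""
    terms = [line.strip() for line in lines if line.strip()]
    alternation = "|".join(re.escape(t) for t in terms)
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)

# Stand-in for the saved TXT file: one search term per line.
term_file = "feel\nfeels\nfelt\n"
pattern = pattern_from_terms(term_file.splitlines())
print(pattern.findall("I felt cold, but now I feel fine."))
```

Typing go|went|gone|goes directly corresponds to passing those four terms to the same helper.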

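● Searching with wildcards

Symbol  Meaning                  Search term  Result
*       zero or more characters  book*        all words beginning with book, such as book, books, booking, bookshop
                                 *book        all words ending in book, such as book, notebook
                                 *book*       both of the above kinds of words
+       zero or one character    book+        words beginning with book and followed by zero or one letter, such as book, books
?       any one character        ?ough        words ending in the letter string ough and preceded by one letter, such as cough, rough
@       zero or one word         think@of     phrases containing think … of, such as think of, think highly of
#       any one word             look#        all collocations of look, such as look after, look at

● Searching tagged corpora
For research purposes, some corpora have been processed and annotated with various tag symbols; these are called “tagged corpora”. Examples are LOBTAG, which carries part-of-speech tags, and CLEC, which carries error-type tags. To search, simply type a tag symbol to retrieve every word carrying that tag.
Example: to extract all nouns from the LOBTAG corpus, just type *_NN (NN is the noun tag; for the detailed tags of other parts of speech, see the appendix on page 113 of He Anping, 2004, 《语料库语言学与英语教学》 [Corpus Linguistics and English Teaching]).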
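The character-level wildcards and the tag query above can be approximated with regular expressions. The sketch below is an assumption about how such a translation might look, not AntConc's own code: the name `wildcard_to_regex` is hypothetical, only *, + and ? are handled (the word-level symbols @ and # are left out), and the tagged sample line is invented.

```python
import re

# Character-level wildcards from the table, mapped to regex fragments.
CHAR_WILDCARDS = {"*": r"\w*", "+": r"\w?", "?": r"\w"}

def wildcard_to_regex(query):
    """Translate a query such as 'book*' into a whole-word regex."""
    parts = [CHAR_WILDCARDS.get(ch, re.escape(ch)) for ch in query]
    return re.compile(r"\b" + "".join(parts) + r"\b", re.IGNORECASE)

print(wildcard_to_regex("book*").findall("book books booking notebook"))
print(wildcard_to_regex("?ough").findall("cough rough through"))
# Tag search: every token whose tag is exactly NN in a word_TAG corpus line.
print(wildcard_to_regex("*_NN").findall("The_AT book_NN is_BEZ on_IN the_AT table_NN"))
```

Because `\w` also matches the underscore, the same translation covers word_TAG tokens such as book_NN.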

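(4) Searching within a specified range
a) In the concordance search window, select “Regex” (regular expression) and type \[.*\] as the search term to extract all text between the opening symbol “[” and the closing symbol “]” in the corpus; other delimiters work the same way.
b) In the concordance search window, select “Regex” (regular expression) and type \[.*write.*\] as the search term to extract the concordance lines in which “write” occurs between “[” and “]”; other terms work the same way.
The search term is case-sensitive, but the wildcard * can be used.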
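The two patterns above can be tried out in Python's `re` module, whose behaviour for these simple expressions is assumed here to be close to AntConc's; details of AntConc's own regex engine may differ. The helper name `find_spans` and the sample line are made up for the sketch. It also shows that `.*` is greedy, which matters when one line holds more than one bracketed span.

```python
import re

def find_spans(text, pattern):
    """Return every non-overlapping match of `pattern` in `text` (case-sensitive)."""
    return re.findall(pattern, text)

# Hypothetical corpus line with two bracketed spans.
line = "He said [laughs] that he would [write a letter] soon."

# The pattern from the text: greedy, so it runs from the first "[" to the last "]".
print(find_spans(line, r"\[.*\]"))
# Non-greedy variant: stops at the nearest "]".
print(find_spans(line, r"\[.*?\]"))
# Only bracketed spans containing "write"; [^\]] keeps the match inside one pair.
print(find_spans(line, r"\[[^\]]*write[^\]]*\]"))
```

If each line of the corpus contains at most one bracketed span, the greedy and non-greedy forms give the same result.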

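1.2 Analysing the search results

(1) Frequency and distribution
● Frequency is the number of times the search term occurs; it is shown in the “Concordance Hits” box.
● Click [button] to see how the search term is distributed across the corpus texts.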

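(2) Highlighting surrounding context words
For a particular purpose, such as teaching, you can highlight certain words around the search term. To do so, use “Kwic Sort”: R1 and L1 stand for the first word to the right and to the left of the search term. Up to three columns of highlighted words can be set at a time, each sorted alphabetically. The result is shown in Figure 1.3. To give the highlighted items the same colour, change the colours under “Color Settings” in the drop-down menu. If the part to highlight is not a whole word but letters within a word, choose “Sort by characters instead of words” under the “Concordance” option of the Tool Preferences drop-down menu, as shown in Figure 1.4.

Figure 1.4 The Tool Preferences drop-down menu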
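The effect of sorting concordance lines on positions such as R1 and L1 can be sketched as follows. The function names `kwic_tokens` and `sort_kwic` and the sample text are assumptions for illustration; AntConc's own sort options are richer than this.

```python
import re

def kwic_tokens(text, term):
    """Return (left_tokens, hit, right_tokens) for every hit of `term`."""
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    return [(tokens[:i], tok, tokens[i + 1:])
            for i, tok in enumerate(tokens) if tok == term]

def sort_kwic(lines, keys=("R1", "L1")):
    """Sort lines alphabetically on positions like R1 (first word right) or L2."""
    def word_at(line, key):
        left, _, right = line
        side, n = key[0], int(key[1:])
        seq = right if side == "R" else left[::-1]
        return seq[n - 1] if len(seq) >= n else ""
    return sorted(lines, key=lambda line: tuple(word_at(line, k) for k in keys))

# Hypothetical sample text.
sample = "They look at it. We look after him. You look at me."
for left, hit, right in sort_kwic(kwic_tokens(sample, "look")):
    print(" ".join(left[-2:]), f"[{hit}]", " ".join(right[:2]))
```

Sorting on R1 first groups the lines by the word that follows look, which is what makes recurring patterns visible.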

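Figure 1.3 Search results with highlighted context words

(3) Extracting a collocate list
Click the collocates tool in the main window to obtain a list of the search term's collocates; you can also set the position of the collocates, their minimum frequency, and the sort order of the list.
Example: examining the first collocate to the right of look
a) click the collocates tool in the main window;
b) type look in the “Search Term” box;
c) set the position of the collocates, e.g. [setting];
d) click the start button to begin the search; the result is shown in Figure 1.5;
e) click “Sort by Freq” to choose the sort order of the collocate list as needed, for example by frequency or alphabetically.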
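Counting the collocates at one fixed position, with a minimum frequency, can be sketched like this. The function name `collocates`, its parameters and the sample text are hypothetical; the sketch counts raw frequencies only and does not claim to reproduce AntConc's output.

```python
import re
from collections import Counter

def collocates(text, term, position=1, min_freq=1):
    """Count words at a fixed offset from `term` (1 = R1, -1 = L1)."""
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        j = i + position
        if tok == term and 0 <= j < len(tokens):
            counts[tokens[j]] += 1
    # most_common() sorts by frequency, highest first.
    return [(w, c) for w, c in counts.most_common() if c >= min_freq]

# Hypothetical sample text.
sample = "They look at it. We look after him. You look at me."
print(collocates(sample, "look", position=1))
```

Raising `min_freq` trims rare collocates from the list, much as the minimum-frequency setting does in the collocates tool.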

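Figure 1.5 Results of extracting a collocate list

(4) Extracting collocational phrases
The clusters tool can also be used to extract word clusters, and you can set where the search term stands within the cluster.
Example: searching for clusters that begin with ask
a) click the clusters tool in the main window;
b) type ask in the “Search Term” box;
c) set the position of the search term, e.g. select “On the left”;
d) set the cluster length, e.g. Min. Size: 3, Max. Size: 3;
e) click the start button to begin the search; the result is shown in Figure 1.6, with every ask at the left edge of its cluster.

Figure 1.6 Results of extracting collocational phrases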
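The cluster search above, with the search term fixed at the left edge and a cluster size of 3, can be sketched as below. The name `clusters` and the sample text are assumptions; note that this simple version lets clusters run across sentence boundaries.

```python
import re
from collections import Counter

def clusters(text, term, size=3, term_on_left=True):
    """Count word sequences of length `size` with `term` at the left or right edge."""
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    counts = Counter()
    for i in range(len(tokens) - size + 1):
        gram = tokens[i:i + size]
        edge_word = gram[0] if term_on_left else gram[-1]
        if edge_word == term:
            counts[" ".join(gram)] += 1
    return counts.most_common()

# Hypothetical sample text.
sample = "Please ask for help. You can ask for help or ask a friend."
print(clusters(sample, "ask", size=3))
```

Setting `term_on_left=False` gives the mirror case in which the search term closes the cluster.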

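(5) Hiding, classifying and deleting
“Hiding” blanks out the search term in the search results, which is useful for teaching or testing. The steps are:
a) type the term to search for, such as look, in the “Search Term” box;
b) open Tool Preferences, choose “Concordance”, then select “Hide search term in KWIC display”, and finally click [button];
c) click the start button to begin the search.
The result looks like this:
you always do your own homework? Do you ******* for help when you think it necessary? Do you help

2. Extracting word frequency lists

2.1 Single-word and N-word frequency lists
A single-word frequency list is the word list of the target corpus; it lists each word form together with its frequency.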

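The steps are:
a) select the target corpus for which to build the single-word frequency list;
b) open the word list window and set the sort order of the list, e.g. “Sort by Freq”; the list can also be sorted alphabetically by the beginnings or the ends of words;
c) click the start button to begin the search; the result is shown in Figure 2.1.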
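A single-word frequency list with the three sort orders mentioned in step b) can be sketched in a few lines. The function name `word_list`, the `sort_by` values and the sample sentence are assumptions for illustration.

```python
import re
from collections import Counter

def word_list(text, sort_by="freq"):
    """Return (word, frequency) pairs in the requested order."""
    counts = Counter(re.findall(r"[A-Za-z']+", text.lower()))
    if sort_by == "freq":        # highest frequency first, ties alphabetical
        return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    if sort_by == "word":        # alphabetical by word beginning
        return sorted(counts.items())
    if sort_by == "word_end":    # alphabetical by word ending
        return sorted(counts.items(), key=lambda kv: kv[0][::-1])
    raise ValueError(f"unknown sort order: {sort_by}")

# Hypothetical sample sentence.
sample = "the cat saw the dog and the bird"
print(word_list(sample))
print(word_list(sample, sort_by="word_end"))
```

Sorting by word ending compares the reversed spellings, which brings words with the same suffix together.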

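Figure 2.1 Results of a single-word frequency list

An N-word frequency list gives the frequencies of multi-word sequences in the target corpus. For example, the 2-word list for the sentence “This is a pen” consists of “this is”, “is a” and “a pen”. To extract an N-word frequency list:
a) select the target corpus for which to build the frequency list;
b) open the [tab] window, then click [option];
c) set the length of the N-word sequences, e.g. [setting];
d) choose the sort order of the list, e.g. “Sort by Freq”;
e) click the start button to begin the search; the result is shown in Figure 2.2.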
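The “This is a pen” example can be reproduced with a short sketch. The function names `ngrams` and `ngram_list` are hypothetical, and the tokenizer is a simple assumption.

```python
import re
from collections import Counter

def ngrams(text, n=2):
    """Return every run of n consecutive words, lower-cased."""
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_list(text, n=2):
    """Frequency list of n-word sequences, highest frequency first."""
    return Counter(ngrams(text, n)).most_common()

print(ngrams("This is a pen", n=2))
print(ngram_list("This is a pen. This is a book.", n=2))
```

A sentence of k words yields k - n + 1 sequences of length n, so the four-word example gives three 2-word sequences.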

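Figure 2.2 Results of an N-word frequency list

2.2 Regrouping word forms: lemmatizing
Lemmatizing strips the endings from all inflected forms of a word of a given part of speech and counts them together as one lemma. This shortens the frequency list and draws attention to word formation. To lemmatize a frequency list: after generating the list in the [tab] window, open the Tool Preferences menu, choose Lemma list options, click open and load to load the lemma1 file (downloadable from this website), and click Apply (see Figure 2.3). Part of the lemmatized result is shown in Figure 2.4.

Figure 2.3 The lemmatizing settings

Figure 2.4 The frequency list after lemmatizing (partial). In the figure, the 1142 instances of a and the 133 instances of an are grouped under the single lemma a, 1275 instances in all.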
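The grouping of a and an in Figure 2.4 comes down to adding up the counts of all forms mapped to one lemma. A minimal sketch follows; the function name `lemmatize_counts` and the dictionary form of the lemma list are assumptions (the format of the lemma1 file is not described in the source), while the two counts are the ones quoted above.

```python
from collections import Counter

def lemmatize_counts(counts, lemma_map):
    """Merge word-form frequencies under their lemma; unmapped forms stay as they are."""
    merged = Counter()
    for word, freq in counts.items():
        merged[lemma_map.get(word, word)] += freq
    return merged

counts = {"a": 1142, "an": 133}   # frequencies quoted for Figure 2.4
lemma_map = {"an": "a"}           # hypothetical lemma-list entry: form -> lemma
print(lemmatize_counts(counts, lemma_map))
```

Forms without an entry in the mapping keep their own line in the merged list.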

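3. Extracting keyword lists
A keyword list contains the words that are markedly more frequent in one corpus than in another when their frequency lists are compared. The first is called the target corpus and the second the reference corpus. The reference corpus is usually the larger one; comparing against it brings out the unusually frequent words of the target corpus and thereby its topics or distinctive content.

3.1 Highlighting words significantly more frequent in the target corpus than in the reference corpus
The steps are:
a) choose “open files” from the file drop-down menu and select the target corpus to compare (to compare a whole folder, choose “open directory”);
b) click [tab] in the main window;
c) open Tool Preferences and choose “Keyword List”, as shown in Figure 3.1;
d) select “Show negative keywords” to include in the results the words that are markedly more frequent in the reference corpus than in the target corpus;
e) click [button], select the reference corpus, and finally click [button];
f) click the start button to begin the search; the result is shown in Figure 3.2.

Figure 3.1 The Tool Preferences dialog

Figure 3.2 Results of extracting a keyword list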
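How a word comes to count as "markedly more frequent" can be illustrated with a keyness statistic. The source does not say which statistic AntConc applies, so the sketch below assumes the log-likelihood measure that is widely used for keyword comparison; the function name `log_likelihood`, the sign convention for negative keywords and all example figures are assumptions.

```python
import math

def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Signed log-likelihood keyness of one word.

    Positive: relatively more frequent in the target corpus.
    Negative: relatively more frequent in the reference corpus.
    """
    total = freq_target + freq_ref
    combined_size = size_target + size_ref
    expected_target = size_target * total / combined_size
    expected_ref = size_ref * total / combined_size
    ll = 0.0
    for observed, expected in ((freq_target, expected_target), (freq_ref, expected_ref)):
        if observed > 0:          # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    ll *= 2
    if freq_target / size_target < freq_ref / size_ref:
        ll = -ll                  # mark a "negative keyword"
    return ll

# Invented figures: a word seen 50 times in a 1,000-word target corpus
# and 100 times in a 10,000-word reference corpus.
print(round(log_likelihood(50, 1000, 100, 10000), 2))
```

Under this assumption, an absolute value above 3.84 corresponds to significance at the 0.05 level for one degree of freedom, and words with negative values are the ones that “Show negative keywords” would add to the list.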
