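Detailed Instructions for Using AntConc
AntConc usage
I. Basic usage of AntConc
AntConc is a really handy corpus analysis tool with a clean, uncluttered interface.
Once you open AntConc, the first thing to do is load a corpus.
Think of it as opening a big chest full of treasures (your text data).
You can load plain-text files, and they need to be in .txt format.
For example, if you are studying the vocabulary of a particular novel, convert the novel to .txt and load it.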
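Its search function is powerful.
You can type in a single word or a phrase and search for it directly.
It is like looking for one particular item in a big warehouse.
For example, if you enter the word "love", AntConc quickly finds every occurrence of "love" in the whole corpus and shows each one with its surrounding context.
It can also show how the word's occurrences are distributed across the corpus, which is like knowing how many of that item sit in each corner of the warehouse.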
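To make the idea of a search concrete, here is a minimal Python sketch of a keyword-in-context lookup. It is not AntConc's own code: the function name `kwic`, the sample text, and the choice of whole-word, case-insensitive matching with a context width counted in characters are all assumptions made for illustration.

```python
import re

def kwic(text, term, width=30):
    """Return (left, hit, right) tuples for each whole-word match of term.

    Hypothetical helper: matching is case-insensitive and the context
    width is counted in characters on each side of the hit.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append((left, m.group(0), right))
    return lines

# Made-up sample text; "Gloves" must not count as a hit for "love".
sample = "Love is patient. They love the sea, and the sea gave love back. Gloves are warm."
hits = kwic(sample, "love", width=15)
for left, hit, right in hits:
    print(f"{left:>15} [{hit}] {right}")
print(len(hits), "hits")
```

The number of tuples returned is the raw frequency of the word, and the start position of each match could be kept as well if you wanted to see where in the text the hits fall.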
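Another very practical feature is word list generation.
Click the corresponding button and, like a super-efficient housekeeper, AntConc lists every word in the corpus in a chosen order (for example, by frequency).
This helps you quickly see which words make up most of the corpus.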
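A frequency-ordered word list is easy to picture in code. The sketch below is only an illustration under stated assumptions: the function name `word_list` is made up, everything is lowercased, and a "word" is taken to be a run of the letters a to z, which is a simplification and not necessarily how AntConc defines a token.

```python
import re
from collections import Counter

def word_list(text):
    """Return (word, frequency) pairs, most frequent first.

    Assumption: a word is a run of ASCII letters, compared in lowercase.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens).most_common()

# Made-up sample text.
sample = "The cat saw the dog. The dog ran."
for rank, (word, freq) in enumerate(word_list(sample), start=1):
    print(rank, freq, word)
```

Printing rank, frequency, and word in that order gives a three-column list you can scan from the top to find the most common words.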
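II. Analyzing fixed expressions in AntConc
AntConc is very good at analyzing fixed expressions.
You can use its "N-Grams" feature.
Suppose you want to study two-word combinations (bigrams) in English, such as the common fixed expression "hot dog".
Set the n-gram size to 2 (meaning groups of two words) and AntConc will list every two-word sequence in the corpus.
It is as fun as spotting the pairs of friends in a crowd who always show up hand in hand.
Longer fixed expressions are no problem either.
Say you want to find three-word expressions like "in the end": set the size to 3 and AntConc pulls out every three-word sequence in the corpus.
It is like finding the few pieces of a complicated jigsaw puzzle that always sit together.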
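Here is a rough Python sketch of what counting n-grams involves. It is an illustration only: the function name `ngrams`, the sample sentence, and the letters-only lowercase tokenization are assumptions, and AntConc's own tokenization and options may differ.

```python
import re
from collections import Counter

def ngrams(text, n):
    """Count every run of n consecutive words in text.

    Assumption: a word is a run of ASCII letters, compared in lowercase.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )

# Made-up sample sentence.
sample = "In the end we ate a hot dog, and in the end the hot dog won."
bigrams = ngrams(sample, 2)
trigrams = ngrams(sample, 3)
print(bigrams["hot dog"])      # frequency of the bigram "hot dog"
print(trigrams["in the end"])  # frequency of the trigram "in the end"
```

Sequences that keep turning up with a high count are the candidates for fixed expressions; a sequence that appears once is usually just a chance neighbour.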
III. Bilingual example sentences (English - Chinese)
1. "I used AntConc to analyze the frequency of the word 'happiness' in this English novel. It was like having a magic key to unlock the secrets of the author's use of this important concept." (我使用AntConc来分析这个英语小说里‘happiness(幸福)’这个词的频率。
Detailed Instructions for Using AntConc, compiled by Ouyang Wen
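Instructions for Using AntConc 3.2.0 [1]
1. Extracting concordances
1.1 Setting the search term
(1) Single-term search
a) In the File drop-down menu, click "open files" and select the corpus files to open (to open a whole folder, choose "open directory");
b) Type the term to search for, such as go, in the "Search Term" box;
c) Under "Search Window Size", set how much context each concordance line shows;
d) Click Start to begin the search.
The results are shown in Figure 1.1.
Figure 1.1 Results of a single-term search
(2) Multiple-term search
● Searching for several terms
Besides single terms, AntConc can search for several terms at once: type a "|" symbol between the terms.
Example: to retrieve the various inflected forms of the verb go, enter go|went|gone|goes in "Search Term".
● Searching with a context word
To narrow a concordance search, you can require a context word to occur within a given range around the search term.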
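Date: 2021.03.12  Author: Ouyang Wen
[1] This manual was written by Zhang Xingjuan, a 2007 postgraduate student at the School of Foreign Studies, South China Normal University, with corrections and additions by supervisor He Anping. The method for restricted-range searches was provided by Dr. D. Lee of City University of Hong Kong, to whom thanks are due.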
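Example: to study phrases of the type a … of, AntConc can extract all matching items as follows:
a) Type a in the "Search Term" box;
b) Click the button next to "Search Term" to open the "Advanced Search" window, shown in Figure 1.2. Select "Use context words and horizons", type of in the "Context Words" box, and confirm the entry. To set a different context word, first clear the existing one and then repeat these steps. You also need to specify where the context word sits relative to the search term: here of is the second word to the right of a, so the "Context Horizon" is set to the second position on the right (2R); then confirm the settings;
Figure 1.2 The Advanced Search window
c) Back in the concordance window, click Start to begin the search.
The results yield chunks such as a lot of and a bit of.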
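The logic of a context-word search can be sketched in a few lines of Python. This is a sketch under stated assumptions, not AntConc's implementation: the function name `context_search`, the signed-offset arguments `lo` and `hi` (negative for left, positive for right, so 2R to 2R becomes 2, 2), the letters-only tokenization, and the sample sentence are all hypothetical.

```python
import re

def context_search(text, terms, context_word, lo, hi):
    """Find search terms that have context_word within a horizon.

    terms: one or more search terms separated by "|".
    lo, hi: signed word offsets from the search term; negative values
    are positions to the left, positive values positions to the right.
    Returns the word sequence spanning each term and its context word.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    wanted = set(terms.lower().split("|"))
    hits = []
    for i, tok in enumerate(tokens):
        if tok not in wanted:
            continue
        for off in range(lo, hi + 1):
            j = i + off
            if off != 0 and 0 <= j < len(tokens) and tokens[j] == context_word:
                start, end = min(i, j), max(i, j)
                hits.append(" ".join(tokens[start:end + 1]))
                break  # one hit per occurrence of the search term
    return hits

# Made-up sample; "a very old pen of" must not match a 2R-to-2R horizon.
sample = "She had a lot of books, a bit of time, and a very old pen of her own."
print(context_search(sample, "a", "of", 2, 2))  # ['a lot of', 'a bit of']
```

Because the terms argument is split on "|", the same helper also covers the multiple-term case, for example `context_search(text, "go|went|gone|goes", "to", 1, 1)`.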
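● Searching with a list of terms
When a study calls for searching many terms, you can also use the following method instead of "|"; it is especially suitable when the number of search terms is large.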
AntConc Software User Manual
AntConc (Windows, Macintosh OS X, and Linux)
Build 3.4.1
Laurence Anthony, Ph.D.
Center for English Language Education in Science and Engineering, School of Science and Engineering, Waseda University, 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
January 31, 2014

Introduction
AntConc is a freeware, multiplatform tool for carrying out corpus linguistics research and data-driven learning. It runs on any computer running Microsoft Windows (tested on Win 98/Me/2000/NT, XP, Vista, Win 7), Macintosh OS X (tested on 10.4.x, 10.5.x, 10.6.x), and Linux (tested on Ubuntu 10, Linux Mint). It is developed in Perl using ActiveState's PerlApp compiler to generate executables for the different operating systems.

Getting Started (No installation necessary)
Windows
On Windows systems, simply double click the AntConc icon and this will launch the program.
Macintosh OS X
On Macintosh systems, simply double click the AntConc icon and this will launch the program.
Linux
On Linux systems, change the permissions to allow AntConc to be run as an executable file. Next, double click the AntConc executable and it will launch.
Caution: Do not place user settings files from older versions of AntConc in the same folder as new versions. This can cause unexpected problems and may even prevent AntConc from starting. It is recommended that you delete your earlier settings file and export it again from the AntConc file menu.

Overview of Tools
AntConc contains seven tools that can be accessed either by clicking on their 'tabs' in the tool window, or using the function keys F1 to F7.
Concordance Tool: This tool shows search results in a 'KWIC' (KeyWord In Context) format. This allows you to see how words and phrases are commonly used in a corpus of texts.
Concordance Plot Tool: This tool shows search results plotted as a 'barcode' format. This allows you to see the position where search results appear in target texts.
File View Tool: This tool shows the text of individual files.
This allows you to investigate in more detail the results generated in other tools of AntConc.
Clusters/N-Grams: The Clusters Tool shows clusters based on the search condition. In effect it summarizes the results generated in the Concordance Tool or Concordance Plot Tool. The N-Grams Tool, on the other hand, scans the entire corpus for 'N' (e.g. 1 word, 2 words, …) length clusters. This allows you to find common expressions in a corpus.
Collocates: This tool shows the collocates of a search term. This allows you to investigate non-sequential patterns in language.
Word List: This tool counts all the words in the corpus and presents them in an ordered list. This allows you to quickly find which words are the most frequent in a corpus.
Keyword List: This tool shows which words are unusually frequent (or infrequent) in the corpus in comparison with the words in a reference corpus. This allows you to identify characteristic words in the corpus, for example, as part of a genre or ESP study.

Concordance Tool
This tool shows search results in a 'KWIC' (KeyWord In Context) format. This allows you to see how words and phrases are commonly used in a corpus of texts.
The following steps produce a set of concordance lines from a corpus and demonstrate the main features of this tool.
1) Select one or more files for processing using the 'Open File(s)...' or 'Open Dir...' options in the 'File' menu. The list of selected files is shown in the left frame of the main window.
2) Enter a search term on which to build concordance lines in the search box.
3) Choose the number of text characters to be output on either side of the search term, using the increase and decrease buttons on the right of the button bar under the "Search Window Size" title. (The default value is 50 characters.)
4) Click on the 'Start' button to start generating the concordance lines.
The concordance generation can be halted at any time by clicking on the 'Stop' button.
5) Use the Kwic Sort options to rearrange the concordance lines at three different levels. 0 is the search word, 1L, 2L... are words to the left of the target word, 1R, 2R... are words to the right of the target word.
6) Click on the 'Sort' button to start the sorting process.
7) Move the cursor over the highlighted search term in one of the concordance lines. The cursor will change to a small hand icon. Clicking on the highlighted search term will allow you to view the search term hit as it appears in the original file via the File View Tool (see below).
8) Click on the “Clone Results” button to create a copy of the results so that different sets of results can be compared.
The total number of concordance lines generated (Concordance Hits) is shown at the top of the tool window. This number will flash with the word "FINISHED" when processing has been completed, and will flash with the words "NO HITS" if no hits are generated for a particular search term.
Search terms can be specified as being "words" (default) or "character strings" by activating or deactivating the "Word" search term option. Also, searches can be either “case insensitive” (default) or “case sensitive” by activating or deactivating the "Case" search term option. Searches can also be made using full regular expressions by activating the "Regex" option. For details on how to use regular expressions, consult one of the many texts on the subject, e.g., Mastering Regular Expressions (O'Reilly & Associates Press), or type "regular expressions" in a web search engine to find many sites on the subject (e.g., /quickstart.html). AntConc supports Perl regular expressions.
By clicking on the "Advanced Search" button, more complex searches become possible. The first advanced search option allows you to import a set of search terms, either by typing them one per line, or by loading in a list of search terms from a file.
Here, each line will be treated as a separate search term. This feature allows you to use a large set of search terms without having to re-type them each time. The second advanced search option allows you to define context words and a context window within which the search term(s) must appear. For example, to search for "student" where it appears within three words to the left or right of the word "university," set the search term as "student," the context word as "university," and set the context window as 'From' 3L 'To' 3R.
A number of menu preferences are available with this tool. (See below).

Concordance Plot Tool
This tool shows concordance search results plotted in a 'barcode' format, with the length of the text normalized to the width of the bar and each hit shown as a vertical line within the bar. This allows you to see the position where search results appear in target texts. The tool also allows you to see which files include the target search term, and can also be used to identify where the search term hits cluster together. An example of the use of the Plot Tool is in determining where specific content words appear in a technical paper, or where an actor or story character appears during the course of a play or novel.
The number of hits and length of each text is shown to the right of the barcode plot, and the plot itself can be enlarged or reduced in size using the “Plot Zoom” buttons.
If you move the cursor over the highlighted search term in one of the concordance lines, the cursor will change to a small hand icon. Clicking on the highlighted search term will allow you to view the search term hit as it appears in the original file via the File View Tool (see below).
Search terms can be specified as being "words" (default) or "character strings", and searches can be “case insensitive” (default), “case sensitive,” or "Regex" based. Advanced searches are also available.
For details see the Concordance Tool explanation.

File View Tool
This tool shows the raw text of individual files. This allows you to investigate in more detail the results generated in other tools of AntConc.
The following steps produce a view of the original file and demonstrate the main features of this tool.
1) Select a file to view in the “Corpus Files” list on the left of the main window.
2) If a search term has been specified, the search term hits will be highlighted throughout the text. Search options are the same as for the Concordance Tool and Concordance Plot Tool.
3) Use the "Hit Location" buttons to jump to the appropriate hit in the file.
4) Change the search term and click on the 'Start' button to view other hits in the file.
5) Click on the highlighted text to generate a set of KWIC lines using the highlighted text as the search term.
6) Click on the “Clone Results” button to create a copy of the results so that different sets of results can be compared.
Search terms can be specified as being "words" (default) or "character strings", and searches can be “case insensitive” (default), “case sensitive,” or "Regex" based. Advanced searches are also available. For details see the Concordance Tool explanation.
The following shortcut is unique to the File View Tool:
CTRL-Click = Jumps to the nearest hit in the window

Clusters/N-Grams Tool
The Clusters Tool
This allows you to search for a word or pattern and group (cluster) the results together with the words immediately to the left or right of the search term. In effect it summarizes the results generated in the Concordance Tool or Concordance Plot Tool.
The clusters can be ordered by frequency, the start or end of the word, the range of the cluster (number of files in which the cluster appears), or the probability of the first word in the cluster preceding the remaining words. All list orderings can also be inverted by activating the “Invert Order” option.
Also, you can select the minimum and maximum length (number of words) in each cluster, and the minimum frequency of clusters displayed. It is also possible to select if the search term always appears on the left (default) or right of the cluster.
Note: In the current version, if more than one word is specified as the search term, only the first word will appear on the right if the "Search Term on Right" option is selected.
The following steps produce a set of cluster results and demonstrate the main features of this tool.
1) Choose the appropriate ordering options (see above for details).
2) Press the 'Start' button. At any time, the generation of the clusters list can be halted using the 'Stop' button.
3) Click on the cluster to generate a set of KWIC lines using the text as the search term.
4) Click on the “Clone Results” button to create a copy of the results so that different sets of results can be compared.

The N-Grams Tool
This allows you to scan the entire corpus for 'N' word clusters (e.g. 1 word, 2 words, …). This allows you to find common expressions in a corpus. For example, n-grams of size 2 for the sentence "this is a pen" are 'this is', 'is a' and 'a pen'.
All ordering options available in the Clusters Tool are also available in the N-Grams Tool. You can also select the minimum and maximum size (number of words) in each n-gram, and the minimum frequency and range of n-grams displayed.
The following steps produce a set of N-gram results and demonstrate the main features of this tool.
1) Click on the "N-Grams" option above the search entry box.
2) Choose the appropriate ordering options.
3) Press the 'Start' button.
At any time, the generation of the n-grams list can be halted using the 'Stop' button.
4) Click on the n-gram to generate a set of KWIC lines using the text as the search term.
5) Click on the “Clone Results” button to create a copy of the results so that different sets of results can be compared.
In both the Clusters Tool and N-Grams Tool, search terms can be specified as being "words" (default) or "character strings", and searches can be “case insensitive” (default), “case sensitive,” or "Regex" based. Advanced searches are also available for the Clusters Tool. For details see the Concordance Tool explanation. A number of menu preferences are available with this tool. (See below).

Collocates Tool
This tool allows you to search for collocates of a search term. This allows you to investigate non-sequential patterns in language.
The collocates can be ordered either by total frequency, frequency on the left or right of the search term, or the start or end of the word. They can also be ordered by the value of a statistical measure between the search term and the collocate. The value measures how 'related' the search term and the collocate are. Current possible statistical measures are listed below. All list orderings can also be inverted. Also, you can select the span of words to the left and right of the search term in which to find collocates, and the minimum frequency of collocates displayed. If only a one-word span is required, for example, to see which words appear directly on the right of the search term, check the "Same" box to keep the minimum and maximum span size the same.
Statistical Measures:
(MI) Mutual Information: Using equations described in M. Stubbs, Collocations and Semantic Profiles, Functions of Language 2, 1 (1995)
(T-Score) T-Score: Using equations described in M.
Stubbs, Collocations and Semantic Profiles, Functions of Language 2, 1 (1995)
The following steps produce a set of collocate results and demonstrate the main features of this tool.
1) Choose the appropriate ordering options.
2) Press the 'Start' button. At any time, the generation of the collocates list can be halted using the 'Stop' button.
3) Click on one of the collocates to generate a set of KWIC lines using the text as the search term.
4) Click on the “Clone Results” button to create a copy of the results so that different sets of results can be compared.
Search terms can be specified as being "words" (default) or "character strings", and searches can be “case insensitive” (default), “case sensitive,” or "Regex" based. Advanced searches are also available for the Collocates Tool. For details see the Concordance Tool explanation. A number of menu preferences are available with this tool. (See below).

Word List Tool
This tool counts all the words in the corpus and presents them in an ordered list. This allows you to quickly find which words are the most frequent in a corpus.
The words can be ordered either by frequency or the start or end of the word, and the ordering can be inverted. The word list can also be generated in case-insensitive mode, where words in upper and lower case are treated the same (default), or case-sensitive mode, where words in upper and lower case are treated separately.
The following steps produce a word list and demonstrate the main features of this tool.
1) Choose the appropriate ordering options.
2) Press the 'Start' button. At any time, the generation of the word list can be halted using the 'Stop' button.
3) Click on the word to generate a set of KWIC lines using the text as the search term.
4) Click on the “Clone Results” button to create a copy of the results so that different sets of results can be compared.
A number of menu preferences are available with this tool.
(See below).

Keyword List
This tool shows which words are unusually frequent (or infrequent) in the corpus in comparison with the words in a reference corpus. This allows you to identify characteristic words in the corpus, for example, as part of a genre or ESP study.
The following steps produce a keyword list and demonstrate the main features of this tool.
1) Select a set of target files.
2) Go to the 'Preferences' menu and choose the 'Keyword Preferences' option.
3) Choose the keyword generation method (a statistical measure) to calculate the 'keyness' of the target file words. The default setting of Log Likelihood is recommended. When using either Log Likelihood or Chi-squared as the statistical measure, the following significance values apply (see: /llwizard.html):
95th percentile; 5% level; p < 0.05; critical value = 3.84
99th percentile; 1% level; p < 0.01; critical value = 6.63
99.9th percentile; 0.1% level; p < 0.001; critical value = 10.83
99.99th percentile; 0.01% level; p < 0.0001; critical value = 15.13
4) Choose a threshold for the number of keywords to be displayed.
5) Choose whether or not to view 'Negative Keywords' (target file words with an unusually low frequency compared with the frequency in the reference corpus).
6) Choose one of the reference corpus options. Select "Use raw file(s)" when you will use raw text (.txt) files to serve as the reference corpus. Select "Use word list(s)" when you will use one or more word lists that are generated from a reference corpus. The "Use word list(s)" option allows you to generate keywords even when the original reference corpus is not available. The format for a word list is:
RANK FREQUENCY WORD (separated by any type of white space, including spaces and tabs).
1 12838 the
2 11289 a
3 8583 of
...
Note that blank lines and lines beginning with # will be ignored.
Also, AntConc will check that the file(s) are correctly formatted and report any errors.
7) Load the reference corpus of text (.txt) files, in the same way that the target files are chosen.
8) The reference corpus directory will be shown (if appropriate), and the list of reference corpus files will appear at the bottom of the Keyword Preferences option menu.
9) Click ‘Apply’ in the Keyword Preferences menu and return to the main Keywords window.
10) Choose suitable options for displaying the list of generated Keywords (in a similar manner to the options for generating a Word List).
11) Press the 'Start' button. At any time, the generation of the keyword list can be halted using the 'Stop' button.
12) Click on the keyword to generate a set of KWIC lines using the text as the search term.
13) Click on the “Clone Results” button to create a copy of the results so that different sets of results can be compared.
A number of menu preferences are available with this tool. (See below).

MENU OPTIONS
Menu options are divided into three groups, "File", "Global Settings" and "Tool Preferences". The options available in each group will be described below.

<FILE>
Options here relate to reading files into AntConc and writing files to the hard disk containing data of various types. There are also options to export all current settings to a file and import user settings from a file. If a user settings file becomes corrupted for any reason, simply restart the program or use the "Restore Default Settings" option to return the program to its original state.

<GLOBAL SETTINGS>
Categories here will have an effect on multiple tools in AntConc:
<Character Encoding> AntConc is fully Unicode compliant, meaning that it can handle data in any language, including all European languages and Asian languages. The character encoding of the data to be read by AntConc should be specified here. For example, if you are working with data saved in a Western language, it will usually be encoded in iso-8859-1.
On the other hand, Japanese texts are usually encoded in shiftjis. By specifying the correct encoding, data from all languages can be processed correctly within AntConc. The default is Unicode UTF-8, which is an international standard designed to display all characters of the languages of the world in a single encoding. I recommend you use this encoding if you create any corpus.
<Colors> In the Color Settings category, you can edit the colors used to display results and other information.
<Files> In the File Settings category, you can choose to display the full path of a file or just the name.
<Fonts> In the Font Settings category, you can edit the font types, sizes, and styles used to display file names, results, and the search term.
<Tags> In the Tag Settings category, you can choose to display or hide any tags that are contained in the corpus files. You can also choose to search with tags but hide them in the results display. Embedded tags (e.g. book_NN1), non-embedded tags (e.g. <noun>book</noun>), and header tags can be shown or hidden by activating or deactivating the option.
<Token (Word) Definition> In the Token (Word) Definition category, you can choose which characters, numbers and so on will define a "word". For example, in some cases only letters will be considered words, but at other times, it might be desirable to include numbers, dashes and so on. AntConc is fully Unicode compliant, meaning that it can handle data in any language, including all European languages and Asian languages. For this reason, the default option refers to 'letters' in the broadest sense. Letters, for example, include all English letters (a to z, A to Z) but also all Japanese 'letter' characters.
It is also possible to define your own "token" definition, or append characters to the standard classes.
For more information on the Unicode standards see:
http://www.cs.tut.fi/~jkorpela/unicode/guide.html
/Public/5.0.0/ucd/UCD.html
/Public/UNIDATA/PropList.txt
/charts/
<Wildcard Settings> In the Wildcard Settings category, you can edit the default wildcard characters so that they do not clash with a search entry. For example, the "or" wildcard default character (a 'pipe' character | ) can be changed to a forward slash (/) here. There are special wildcards to deal with whitespace issues.

<TOOL PREFERENCES>
Each tool (with the exception of Concordance Plot Tool and File View Tool) has a preferences category, where settings can be fine-tuned. All tool preference categories allow you to show or hide the different frames in which the results are displayed. For example, you can choose to hide the frame showing file names in the Concordance Tool display window.
<Concordance> In addition to the above, the following settings can be made:
● Using the “Treat case in sort” option causes capitalized words to appear before lower-case words.
● Using the “Sort by characters instead of words” option, it is possible to arrange the results by CHARACTERS to the left or right of the first letter of the search term. This makes it possible to search for spelling differences.
● Using the “Hide search term in KWIC display” option, the search term can be hidden in the KWIC lines, allowing instructors to quiz students on possible words to fit the gap.
● Using the “Put delimiter around hits in KWIC display” option (the default), the chosen delimiter character is added around the hit in the KWIC display.
This makes it easier to see the hit and also eases later processing of the data in a spreadsheet software program.
● Use the “Delimiter” option to select the delimiter character.
● Use the “Line break replacement” option to select a character to replace line breaks with.
<Clusters/N-Grams> In addition to the above, the following settings can be made:
● Using the “Treat all data as lowercase” option (default) causes all words to be transformed to lower-case words. This is useful to get accurate counts of words in certain cases.
● Using the “Treat case in sort” option causes capitalized words to appear before lower-case words.
<Collocates Preferences> In addition to the above, the following settings can be made:
● Use the "Selected Collocate Measure" to choose the statistical measure for measuring collocate strength. Currently, two statistical measures can be used: Mutual Information (MI) and T-Score. See the tool explanation above for references to the statistics.
● Using the “Treat all data as lowercase” option causes all words to be transformed to lower-case words. This is useful to get accurate counts of words in certain cases.
● Using the “Treat case in sort” option causes capitalized words to appear before lower-case words.
<Word List Preferences> In addition to the above, the following settings can be made:
● Using the “Treat all data as lowercase” option causes all words to be transformed to lower-case words. This is useful to get accurate counts of words in certain cases.
● Using the “Treat case in sort” option causes capitalized words to appear before lower-case words.
● Use the lemma list options to select a lemma list. A 'lemma list' can be loaded from a file, which can then be used to generate a lemma list instead of a word list.
When the lemma list function is used, the 'lemma word form(s)' column will show the words in the corpus associated with each lemma. A lemma list can be created by specifying the 'lemma entry' followed by '->' followed by one or more 'words' that should be assigned to the lemma, separated by one or more non-tokens. See the example below:
be->is, are
play->play, plays, playing, played
Note that in the example above, commas and spaces are assumed to be NOT defined as tokens. For this reason, if the lemma list available on the AntConc webpage is used, a 'dash' needs to be added to the token (word) definition for the lemma list to be processed correctly, as hyphenated words are used to the right of the lemma definition.
● Using the "Word List Range" option, a word list can be generated using all words (“Use all words”), or a specific set of words (“Use specific words below”), or ignoring a certain set of words (“Use a stoplist below”). The range of words to be used (or ignored) can be entered directly, or can be stored in files which are then read by AntConc by pressing the 'Open' button. A combination of words in a file and words directly entered can also be used.
<Keyword List Preferences> In addition to the above, the following settings can be made:
● Using the “Treat all data as lowercase” option causes all words to be transformed to lower-case words. This is useful to get accurate counts of words in certain cases.
● Using the “Treat case in sort” option causes capitalized words to appear before lower-case words.
● Use the "Keyness Generation Method" to choose the statistical measure for measuring keyword strength. Currently, two statistical measures can be used: Chi-Squared and Log-Likelihood (the default). The default option for the 'keyness' measure is recommended.
● Use the "Threshold Value" option to choose a cut-off for the keyness values generated.
The default option (“All values”) for the threshold value is recommended.
● Use the "Show Negative Keywords" option to view words that are unusually INFREQUENT in the target corpus compared with the reference corpus.
● Use the "Use raw file(s)" option to use raw reference corpus file(s) as the reference corpus.
● Use the "Use word list(s)" option to use word list(s) that correspond to a reference corpus. The word list(s) should be formatted as described in the tool explanation.
● Click the "Add Directory" or "Add Files" buttons to select the reference corpus files.
● Click the "Swap with Target Files" button to swap the main and reference corpora. Note that this will only make sense when raw corpus files are being used.

<HELP>
The help menu provides access to this detailed readme file. It also shows some general information about AntConc, including the current version number and date of release.

SHORTCUTS
Here is a list of shortcuts that apply to all tools using window panes for results.
CTRL-C = Copies the currently selected text
CTRL-A = Selects all text in the window pane
ALT-A = Selects all text in all window panes showing
Double click = Selects the current word
Triple click = Selects the current line in the window pane
SHIFT-click = Selects continuous lines across all window panes showing
CTRL-click = Selects discontinuous lines across all window panes showing
DELETE = This deletes any selected lines that span across all window panes
INSERT = This keeps any selected lines that span across all window panes, and deletes all others
For any 'spinbox' widgets (e.g. the search term entry box) the 'UP' and 'DOWN' arrow keys on the keyboard can be used to activate the up and down buttons.
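
The Log-Likelihood keyness measure named in the Keyword List section can be illustrated with a short Python sketch. It uses the widely cited two-corpus log-likelihood formulation; whether AntConc computes keyness in exactly this way is an assumption, and the function names `log_likelihood` and `significance` are hypothetical. The critical values are the ones quoted in this readme.

```python
import math

def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Two-corpus log-likelihood for one word.

    freq_*: occurrences of the word in each corpus.
    size_*: total number of word tokens in each corpus.
    Assumption: common textbook formulation, not AntConc source code.
    """
    total = size_target + size_ref
    combined = freq_target + freq_ref
    expected_target = size_target * combined / total
    expected_ref = size_ref * combined / total
    ll = 0.0
    if freq_target > 0:
        ll += freq_target * math.log(freq_target / expected_target)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * ll

# Critical values as listed in the Keyword List section, strictest first.
CRITICAL = [(15.13, "p < 0.0001"), (10.83, "p < 0.001"),
            (6.63, "p < 0.01"), (3.84, "p < 0.05")]

def significance(ll):
    """Map a log-likelihood value to the strictest level it reaches."""
    for value, label in CRITICAL:
        if ll >= value:
            return label
    return "not significant at p < 0.05"

# Made-up counts: 50 hits in a 10,000-word target, 10 in a 10,000-word reference.
score = log_likelihood(50, 10000, 10, 10000)
print(round(score, 2), significance(score))
```

The value itself does not say in which direction the word is unusual; comparing the observed target frequency with the expected one distinguishes a keyword from a 'negative keyword'.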
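AntConc: Detailed User Guide (full version)
AntConc 3.2.0 User Guide [1]
1. Extracting concordances
1.1 Setting the search term
(1) Single-term search
a) In the File drop-down menu, click "Open Files" and select the corpus files to open (to open a whole folder, choose "Open Directory");
b) In the "Search Term" box, type the term to search for, e.g. go;
c) In the "Search Window Size" box, set how much context is shown in each concordance line (the AntConc readme measures this in characters on either side of the search term);
d) Click "Start" to begin the search.
The results are shown in Figure 1.1.
Figure 1.1 Results of a single-term search
(2) Multi-term search
● Searching for several terms
Besides single terms, AntConc can search for several terms at once: type the "|" symbol between the terms.
Example: to retrieve all tense forms of the verb go, enter go|went|gone|goes in "Search Term".
● Searching with a context word
To narrow a concordance search, you can require that a context word appear within a given span around the search term.
Example: to study phrases of the type a … of, AntConc can extract all matching items as follows:
a) In the "Search Term" box, type a;
b) Click the "Advanced" button next to "Search Term" to open the "Advanced Search" window (Figure 1.2). Tick "Use context words and horizons", type of in the "Context Words" box, and confirm with the adjacent button. To change the context word, first click the clear button to remove the old one, then repeat the steps above. You also need to set the position of the context word relative to the search term: in this study, of is the second word to the right of a, so set "Context Horizon" accordingly, then click "Apply";
c) Back in the concordance window, click "Start" to run the search.
The results yield chunks such as a lot of and a bit of.
● Searching with a list of terms
When a study requires many search terms, a term list can be used instead of "|"; this is especially suitable when the number of search terms is large.
[1] This guide was written by Zhang Xingjuan (张杏娟), a 2007 graduate student at the School of Foreign Studies, South China Normal University, and revised and supplemented by her supervisor He Anping (何安平). The method for range-restricted searches was provided by Dr. D. Lee of City University of Hong Kong, to whom thanks are due.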
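The two search options above (terms joined with "|" and a context word at a set distance) can be imitated outside AntConc. The following Python sketch is only an illustration under stated assumptions: the function name `kwic`, its parameters, and the simple letters-only tokenizer are hypothetical and are not AntConc's own implementation.

```python
import re


def kwic(text, search_term, context_word=None, horizon=(0, 0), span=3):
    """Return (left context, hit, right context) tuples.

    search_term  -- one or more words joined by "|", e.g. "go|went|gone|goes"
    context_word -- optional word that must occur inside the horizon
    horizon      -- (from, to) word offsets; negative = left, positive = right,
                    so (2, 2) means "exactly two words to the right"
    span         -- number of context words returned on each side (assumption)
    """
    tokens = re.findall(r"[A-Za-z']+", text)      # naive tokenizer (assumption)
    lowered = [t.lower() for t in tokens]
    wanted = {t.lower() for t in search_term.split("|")}
    hits = []
    for i, tok in enumerate(lowered):
        if tok not in wanted:
            continue
        if context_word is not None:
            start, stop = horizon
            positions = [i + off for off in range(start, stop + 1) if off != 0]
            if not any(0 <= p < len(tokens) and lowered[p] == context_word.lower()
                       for p in positions):
                continue                            # context word not in horizon
        left = " ".join(tokens[max(0, i - span):i])
        right = " ".join(tokens[i + 1:i + 1 + span])
        hits.append((left, tokens[i], right))
    return hits


sample = "She went home after a lot of work. He goes out for a bit of fun. They have gone."
print(kwic(sample, "go|went|gone|goes"))
print(kwic(sample, "a", context_word="of", horizon=(2, 2)))
```

The second call keeps a hit only when of is the second word to its right, which mirrors the a … of example in the guide.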
AntConc Software User Manual

AntConc (Windows, Macintosh OS X, and Linux)
Build 3.4.1
Laurence Anthony, Ph.D.
Center for English Language Education in Science and Engineering, School of Science and Engineering, Waseda University, 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
January 31, 2014

Introduction
AntConc is a freeware, multiplatform tool for carrying out corpus linguistics research and data-driven learning. It runs on any computer running Microsoft Windows (tested on Win 98/Me/2000/NT, XP, Vista, Win 7), Macintosh OS X (tested on 10.4.x, 10.5.x, 10.6.x), and Linux (tested on Ubuntu 10, Linux Mint). It is developed in Perl using ActiveState's PerlApp compiler to generate executables for the different operating systems.

Getting Started (No installation necessary)
Windows
On Windows systems, simply double click the AntConc icon and this will launch the program.
Macintosh OS X
On Macintosh systems, simply double click the AntConc icon and this will launch the program.
Linux
On Linux systems, change the permissions to allow AntConc to be run as an executable file. Next, double click the AntConc executable and it will launch.
Caution: Do not place user settings files from older versions of AntConc in the same folder as new versions. This can cause unexpected problems and may even prevent AntConc from starting. It is recommended that you delete your earlier settings file and export it again from the AntConc file menu.

Overview of Tools
AntConc contains seven tools that can be accessed either by clicking on their 'tabs' in the tool window, or using the function keys F1 to F7.
Concordance Tool: This tool shows search results in a 'KWIC' (KeyWord In Context) format. This allows you to see how words and phrases are commonly used in a corpus of texts.
Concordance Plot Tool: This tool shows search results plotted as a 'barcode' format. This allows you to see the position where search results appear in target texts.
File View Tool: This tool shows the text of individual files.
This allows you to investigate in more detail the results generated in other tools of AntConc.
Clusters/N-Grams: The Clusters Tool shows clusters based on the search condition. In effect it summarizes the results generated in the Concordance Tool or Concordance Plot Tool. The N-Grams Tool, on the other hand, scans the entire corpus for 'N' (e.g. 1 word, 2 words, …) length clusters. This allows you to find common expressions in a corpus.
Collocates: This tool shows the collocates of a search term. This allows you to investigate non-sequential patterns in language.
Word List: This tool counts all the words in the corpus and presents them in an ordered list. This allows you to quickly find which words are the most frequent in a corpus.
Keyword List: This tool shows which words are unusually frequent (or infrequent) in the corpus in comparison with the words in a reference corpus. This allows you to identify characteristic words in the corpus, for example, as part of a genre or ESP study.

Concordance Tool
This tool shows search results in a 'KWIC' (KeyWord In Context) format. This allows you to see how words and phrases are commonly used in a corpus of texts.
The following steps produce a set of concordance lines from a corpus and demonstrate the main features of this tool.
1) Select one or more files for processing using the 'Open File(s)...' or 'Open Dir...' options in the 'File' menu. The list of selected files is shown in the left frame of the main window.
2) Enter a search term on which to build concordance lines in the search box.
3) Choose the number of text characters to be output on either side of the search term, using the increase and decrease buttons on the right of the button bar under the "Search Window Size" title (default value is 50 characters).
4) Click on the 'Start' button to start generating the concordance lines. The concordance generation can be halted at any time by clicking on the 'Stop' button.
5) Use the Kwic Sort options to rearrange the concordance lines at three different levels. 0 is the search word; 1L, 2L... are words to the left of the target word; 1R, 2R... are words to the right of the target word.
6) Click on the 'Sort' button to start the sorting process.
7) Move the cursor over the highlighted search term in one of the concordance lines. The cursor will change to a small hand icon. Clicking on the highlighted search term will allow you to view the search term hit as it appears in the original file via the File View Tool (see below).
8) Click on the “Clone Results” button to create a copy of the results so that different sets of results can be compared.
The total number of concordance lines generated (Concordance Hits) is shown at the top of the tool window. This number will flash with the word "FINISHED" when processing has been completed, and will flash with the word "NO HITS" if no hits are generated for a particular search term.
Search terms can be specified as being "words" (default) or "character strings" by activating or deactivating the "Word" search term option. Also, searches can be either “case insensitive” (default) or “case sensitive” by activating or deactivating the "Case" search term option. Searches can also be made using full regular expressions by activating the "Regex" option. For details on how to use regular expressions, consult one of the many texts on the subject, e.g., Mastering Regular Expressions (O'Reilly & Associates Press) or type "regular expressions" in a web search engine to find many sites on the subject (e.g., /quickstart.html). AntConc supports Perl regular expressions.
By clicking on the "Advanced Search" button, more complex searches become possible. The first advanced search option allows you to import a set of search terms, either by typing them one per line, or by loading in a list of search terms from a file. Here, each line will be treated as a separate search term. This feature allows you to use a large set of search terms without having to re-type them each time. The second advanced search option allows you to define context words and a context window within which the search term(s) must appear. For example, to search for "student" where it appears within three words to the left or right of the word "university," set the search term as "student," the context word as "university," and set the context window as 'From' 3L 'To' 3R.
A number of menu preferences are available with this tool. (See below).

Concordance Plot Tool
This tool shows concordance search results plotted in a 'barcode' format, with the length of the text normalized to the width of the bar and each hit shown as a vertical line within the bar. This allows you to see the position where search results appear in target texts. The tool also allows you to see which files include the target search term, and can also be used to identify where the search term hits cluster together. An example of the use of the Plot Tool is in determining where specific content words appear in a technical paper, or where an actor or story character appears during the course of a play or novel.
The number of hits and length of each text is shown to the right of the barcode plot, and the plot itself can be enlarged or reduced in size using the “Plot Zoom” buttons.
If you move the cursor over the highlighted search term in one of the concordance lines, the cursor will change to a small hand icon. Clicking on the highlighted search term will allow you to view the search term hit as it appears in the original file via the File View Tool (see below).
Search terms can be specified as being "words" (default) or "character strings", and searches can be “case insensitive” (default), “case sensitive,” or "Regex" based. Advanced searches are also available. For details see the Concordance Tool explanation.

File View Tool
This tool shows the raw text of individual files. This allows you to investigate in more detail the results generated in other tools of AntConc.
The following steps produce a view of the original file and demonstrate the main features of this tool.
1) Select a file to view in the “Corpus Files” list on the left of the main window.
2) If a search term has been specified, the search term hits will be highlighted throughout the text. Search options are the same as for the Concordance Tool and Concordance Plot Tool.
3) Use the "Hit Location" buttons to jump to the appropriate hit in the file.
4) Change the search term and click on the 'Start' button to view other hits in the file.
5) Click on the highlighted text to generate a set of KWIC lines using the highlighted text as the search term.
6) Click on the “Clone Results” button to create a copy of the results so that different sets of results can be compared.
Search terms can be specified as being "words" (default) or "character strings", and searches can be “case insensitive” (default), “case sensitive,” or "Regex" based. Advanced searches are also available. For details see the Concordance Tool explanation.
The following shortcut is unique to the File View Tool:
CTRL-Click = Jumps to the nearest hit in the window

Clusters/N-Grams Tool
The Clusters Tool
This allows you to search for a word or pattern and group (cluster) the results together with the words immediately to the left or right of the search term. In effect it summarizes the results generated in the Concordance Tool or Concordance Plot Tool.
The clusters can be ordered by frequency, the start or end of the word, the range of the cluster (number of files in which the cluster appears), or the probability of the first word in the cluster preceding the remaining words. All list orderings can also be inverted by activating the “Invert Order” option.
Also, you can select the minimum and maximum length (number of words) in each cluster, and the minimum frequency of clusters displayed. It is also possible to select if the search term always appears on the left (default) or right of the cluster.
Note: In the current version, if more than one word is specified as the search term, only the first word will appear on the right if the "Search Term on Right" option is selected.
The following steps produce a set of cluster results and demonstrate the main features of this tool.
1) Choose the appropriate ordering options (see above for details).
2) Press the 'Start' button. At any time, the generation of the clusters list can be halted using the 'Stop' button.
3) Click on the cluster to generate a set of KWIC lines using the text as the search term.
4) Click on the “Clone Results” button to create a copy of the results so that different sets of results can be compared.

The N-Grams Tool
This allows you to scan the entire corpus for 'N' word clusters (e.g. 1 word, 2 words, …). This allows you to find common expressions in a corpus. For example, n-grams of size 2 for the sentence "this is a pen" are 'this is', 'is a' and 'a pen'.
All ordering options available in the Clusters Tool are also available in the N-Grams Tool. You can also select the minimum and maximum size (number of words) in each n-gram, and the minimum frequency and range of n-grams displayed.
The following steps produce a set of N-gram results and demonstrate the main features of this tool.
1) Click on the "N-Grams" option above the search entry box.
2) Choose the appropriate ordering options.
3) Press the 'Start' button. At any time, the generation of the n-grams list can be halted using the 'Stop' button.
4) Click on the n-gram to generate a set of KWIC lines using the text as the search term.
5) Click on the “Clone Results” button to create a copy of the results so that different sets of results can be compared.
In both the Clusters Tool and N-Grams Tool, search terms can be specified as being "words" (default) or "character strings", and searches can be “case insensitive” (default), “case sensitive,” or "Regex" based. Advanced searches are also available for the Clusters Tool. For details see the Concordance Tool explanation. A number of menu preferences are available with this tool. (See below).

Collocates Tool
This tool allows you to search for collocates of a search term. This allows you to investigate non-sequential patterns in language.
The collocates can be ordered either by total frequency, frequency on the left or right of the search term, or the start or end of the word. They can also be ordered by the value of a statistical measure between the search term and the collocate. The value measures how 'related' the search term and the collocate are. Current possible statistical measures are listed below. All list orderings can also be inverted. Also, you can select the span of words to the left and right of the search term in which to find collocates, and the minimum frequency of collocates displayed. If only a one-word span is required, for example, to see which words appear directly on the right of the search term, check the "Same" box to keep the minimum and maximum span size the same.
Statistical Measures:
(MI) Mutual Information: Using equations described in M. Stubbs, Collocations and Semantic Profiles, Functions of Language 2, 1 (1995)
(T-Score) T-Score: Using equations described in M. Stubbs, Collocations and Semantic Profiles, Functions of Language 2, 1 (1995)
The following steps produce a set of collocate results and demonstrate the main features of this tool.
1) Choose the appropriate ordering options.
2) Press the 'Start' button. At any time, the generation of the collocates list can be halted using the 'Stop' button.
3) Click on one of the collocates to generate a set of KWIC lines using the text as the search term.
4) Click on the “Clone Results” button to create a copy of the results so that different sets of results can be compared.
Search terms can be specified as being "words" (default) or "character strings", and searches can be “case insensitive” (default), “case sensitive,” or "Regex" based. Advanced searches are also available for the Collocates Tool. For details see the Concordance Tool explanation. A number of menu preferences are available with this tool. (See below).

Word List Tool
This tool counts all the words in the corpus and presents them in an ordered list. This allows you to quickly find which words are the most frequent in a corpus.
The words can be ordered either by frequency or the start or end of the word, and the ordering can be inverted. The word list can also be generated in case-insensitive mode, where words in upper and lower case are treated the same (default), or case-sensitive mode, where words in upper and lower case are treated separately.
The following steps produce a word list and demonstrate the main features of this tool.
1) Choose the appropriate ordering options.
2) Press the 'Start' button. At any time, the generation of the word list can be halted using the 'Stop' button.
3) Click on the word to generate a set of KWIC lines using the text as the search term.
4) Click on the “Clone Results” button to create a copy of the results so that different sets of results can be compared.
A number of menu preferences are available with this tool.
(See below).

Keyword List
This tool shows which words are unusually frequent (or infrequent) in the corpus in comparison with the words in a reference corpus. This allows you to identify characteristic words in the corpus, for example, as part of a genre or ESP study.
The following steps produce a keyword list and demonstrate the main features of this tool.
1) Select a set of target files.
2) Go to the 'Preferences' menu and choose the 'Keyword Preferences' option.
3) Choose the keyword generation method (a statistical measure) to calculate the 'keyness' of the target file words. The default setting of Log Likelihood is recommended. When using either Log Likelihood or Chi-squared as the statistical measure, the following significance values apply (see: /llwizard.html):
   95th percentile; 5% level; p < 0.05; critical value = 3.84
   99th percentile; 1% level; p < 0.01; critical value = 6.63
   99.9th percentile; 0.1% level; p < 0.001; critical value = 10.83
   99.99th percentile; 0.01% level; p < 0.0001; critical value = 15.13
4) Choose a threshold for the number of keywords to be displayed.
5) Choose whether or not to view 'Negative Keywords' (target file words with an unusually low frequency compared with the frequency in the reference corpus).
6) Choose one of the reference corpus options. Select "Use raw file(s)" when you will use raw text (.txt) files to serve as the reference corpus. Select "Use word list(s)" when you will use one or more word lists that are generated from a reference corpus. The "Use word list(s)" option allows you to generate keywords even when the original reference corpus is not available. The format for a word list is:
   RANK FREQUENCY WORD (separated by any type of white space, including spaces and tabs).
   1 12838 the
   2 11289 a
   3 8583 of
   ...
   Note that blank lines and lines beginning with # will be ignored. Also, AntConc will check that the file(s) are correctly formatted and report any errors.
7) Load the reference corpus of text (.txt) files, in the same way that the target files are chosen.
8) The reference corpus directory will be shown (if appropriate), and the list of reference corpus files will appear at the bottom of the Keyword Preferences option menu.
9) Click ‘Apply’ in the Keyword Preferences menu and return to the main Keywords window.
10) Choose suitable options for displaying the list of generated Keywords (in a similar manner to the options for generating a Word List).
11) Press the 'Start' button. At any time, the generation of the keyword list can be halted using the 'Stop' button.
12) Click on the keyword to generate a set of KWIC lines using the text as the search term.
13) Click on the “Clone Results” button to create a copy of the results so that different sets of results can be compared.
A number of menu preferences are available with this tool. (See below).

MENU OPTIONS
Menu options are divided into three groups, "File", "Global Settings" and "Tool Preferences". The options available in each group will be described below.

<FILE>
Options here relate to reading files into AntConc and writing files to the hard disk containing data of various types. There are also options to export all current settings to a file and import user settings from a file. If a user settings file becomes corrupted for any reason, simply restart the program or use the "Restore Default Settings" option to return the program to its original state.

<GLOBAL SETTINGS>
Categories here will have an effect on multiple tools in AntConc:
<Character Encoding> AntConc is fully Unicode compliant, meaning that it can handle data in any language, including all European languages and Asian languages. The character encoding of the data to be read by AntConc should be specified here. For example, if you are working with data saved in a Western language, it will usually be encoded in iso-8859-1. On the other hand, Japanese texts are usually encoded in shiftjis. By specifying the correct encoding, data from all languages can be processed correctly within AntConc. The default is Unicode UTF-8, which is an international standard designed to display all characters of the languages of the world in a single encoding. I recommend you use this encoding if you create any corpus.
<Colors> In the Color Settings category, you can edit the colors used to display results and other information.
<Files> In the File Settings category, you can choose to display the full path of a file or just the name.
<Fonts> In the Font Settings category, you can edit the font types, sizes, and styles used to display file names, results, and the search term.
<Tags> In the Tag Settings category, you can choose to display or hide any tags that are contained in the corpus files. You can also choose to search with tags but hide them in the results display. Embedded tags (e.g. book_NN1), non-embedded tags (e.g. <noun>book</noun>), and header tags can be shown or hidden by activating or deactivating the option.
<Token (Word) Definition> In the Token (Word) Definition category, you can choose which characters, numbers and so on will define a "word". For example, in some cases only letters will be considered words, but at other times, it might be desirable to include numbers, dashes and so on. AntConc is fully Unicode compliant, meaning that it can handle data in any language, including all European languages and Asian languages. For this reason, the default option refers to 'letters' in the broadest sense. Letters, for example, include all English letters (a to z, A to Z) but also all Japanese 'letter' characters.
It is also possible to define your own "token" definition, or append characters to the standard classes.
For more information on the Unicode standards see:
http://www.cs.tut.fi/~jkorpela/unicode/guide.html
/
/Public/5.0.0/ucd/UCD.html
/Public/UNIDATA/PropList.txt
/charts/
<Wildcard Settings> In the Wildcard Settings category, you can edit the default wildcard characters so that they do not clash with a search entry. For example, the "or" wildcard default character (a 'pipe' character | ) can be changed to a slash (/) here. There are special wildcards to deal with whitespace issues.

<TOOL PREFERENCES>
Each tool (with the exception of Concordance Plot Tool and File View Tool) has a preferences category, where settings can be fine-tuned. All tool preference categories allow you to show or hide the different frames in which the results are displayed. For example, you can choose to hide the frame showing file names in the Concordance Tool display window.
<Concordance> In addition to the above, the following settings can be made:
● Using the “Treat case in sort” option causes capitalized words to appear before lower-case words.
● Using the “Sort by characters instead of words” option, it is possible to arrange the results by CHARACTERS to the left or right of the first letter of the search term. This makes it possible to search for spelling differences.
● Using the “Hide search term in KWIC display” option, the search term can be hidden in the KWIC lines, allowing instructors to quiz students on possible words to fit the gap.
● Using the “Put delimiter around hits in KWIC display” option (the default), the chosen delimiter character is added around the hit in the KWIC display. This makes it easier to see the hit and also eases later processing of the data in a spreadsheet software program.
● Use the “Delimiter” option to select the delimiter character.
● Use the “Line break replacement” option to select a character to replace line breaks with.
<Clusters/N-Grams> In addition to the above, the following settings can be made:
● Using the “Treat all data as lowercase” option (default) causes all words to be transformed to lower-case words. This is useful to get accurate counts of words in certain cases.
● Using the “Treat case in sort” option causes capitalized words to appear before lower-case words.
<Collocates Preferences> In addition to the above, the following settings can be made:
● Use the "Selected Collocate Measure" to choose the statistical measure for measuring collocate strength. Currently, two statistical measures can be used: Mutual Information (MI) and T-Score. See the tool explanation above for references to the statistics.
● Using the “Treat all data as lowercase” option causes all words to be transformed to lower-case words. This is useful to get accurate counts of words in certain cases.
● Using the “Treat case in sort” option causes capitalized words to appear before lower-case words.
<Word List Preferences> In addition to the above, the following settings can be made:
● Using the “Treat all data as lowercase” option causes all words to be transformed to lower-case words. This is useful to get accurate counts of words in certain cases.
● Using the “Treat case in sort” option causes capitalized words to appear before lower-case words.
● Use the lemma list options to select a lemma list. A 'lemma list' can be loaded from a file, which can then be used to generate a lemma list instead of a word list. When the lemma list function is used, the 'lemma word form(s)' column will show the words in the corpus associated with each lemma. A lemma list can be created by specifying the 'lemma entry' followed by '->' followed by one or more 'words' that should be assigned to the lemma, separated by one or more non-tokens. See the example below:
   be->is, are
   play->play, plays, playing, played
  Note that in the example above, commas and spaces are assumed to be NOT defined as tokens. For this reason, if the lemma list available on the AntConc webpage is used, a 'dash' needs to be added to the token (word) definition for the lemma list to be processed correctly, as hyphenated words are used to the right of the lemma definition.
● Using the "Word List Range" option, a wordlist can be generated using all words (“Use all words”), or a specific set of words (“Use specific words below”), or ignoring a certain set of words (“Use a stoplist below”). The range of words to be used (or ignored) can be entered directly, or can be stored in files which are then read by AntConc by pressing the 'Open' button. A combination of words in a file and words directly entered can also be used.
<Keyword List Preferences> In addition to the above, the following settings can be made:
● Using the “Treat all data as lowercase” option causes all words to be transformed to lower-case words. This is useful to get accurate counts of words in certain cases.
● Using the “Treat case in sort” option causes capitalized words to appear before lower-case words.
● Use the "Keyness Generation Method" to choose the statistical measure for measuring keyword strength. Currently, two statistical measures can be used: Chi-Squared and Log-Likelihood (the default). The default option for the 'keyness' measure is recommended.
● Use the "Threshold Value" option to choose a cut-off for the keyness values generated. The default option (“All values”) for the threshold value is recommended.
● Use the "Show Negative Keywords" option to view words that are unusually INFREQUENT in the target corpus compared with the reference corpus.
● Use the "Use raw file(s)" option to use raw reference corpus file(s) as the reference corpus.
● Use the "Use word list(s)" option to use word list(s) that correspond to a reference corpus. The word list(s) should be formatted as described in the tool explanation.
● Click the "Add Directory" or "Add Files" buttons to select the reference corpus files.
● Click the "Swap with Target Files" button to swap the main and reference corpora. Note that this will only make sense when raw corpus files are being used.

<HELP>
The help menu provides access to this detailed readme file. It also shows some general information about AntConc, including the current version number and date of release.

SHORTCUTS
Here is a list of shortcuts that apply to all tools using window panes for results.
CTRL-C = Copies the currently selected text
CTRL-A = Selects all text in the window pane
ALT-A = Selects all text in all window panes showing
Double click = Selects the current word
Triple click = Selects the current line in the window pane
SHIFT-click = Selects continuous lines across all window panes showing
CTRL-click = Selects discontinuous lines across all window panes showing
DELETE = Deletes any selected lines that span across all window panes
INSERT = Keeps any selected lines that span across all window panes, and deletes all others
For any 'spinbox' widgets (e.g. the search term entry box) the 'UP' and 'DOWN' arrow keys on the keyboard can be used to activate the up and down buttons.
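The Keyword List tool described in this readme ranks words by a keyness statistic and recommends Log Likelihood, with the critical values 3.84, 6.63, 10.83 and 15.13. The Python sketch below shows how such a value can be computed for one word. It assumes the widely used two-corpus log-likelihood formula (observed versus expected frequencies); the readme does not spell out AntConc's exact computation, so the function names and the formula choice are assumptions for illustration.

```python
import math

# Critical values quoted in the readme, highest first.
CRITICAL_VALUES = [
    (15.13, "p < 0.0001"),
    (10.83, "p < 0.001"),
    (6.63, "p < 0.01"),
    (3.84, "p < 0.05"),
]


def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Log-likelihood keyness of one word (assumed two-corpus formula).

    freq_* -- occurrences of the word in the target / reference corpus
    size_* -- total number of tokens in the target / reference corpus
    """
    total_freq = freq_target + freq_ref
    total_size = size_target + size_ref
    expected_target = size_target * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    ll = 0.0
    for observed, expected in ((freq_target, expected_target),
                               (freq_ref, expected_ref)):
        if observed > 0:                 # a zero count contributes nothing
            ll += observed * math.log(observed / expected)
    return 2 * ll


def significance(ll):
    """Map a log-likelihood value to the strictest level it passes."""
    for critical, label in CRITICAL_VALUES:
        if ll >= critical:
            return label
    return "not significant at p < 0.05"


ll = log_likelihood(freq_target=30, size_target=1000, freq_ref=10, size_ref=1000)
print(round(ll, 2), significance(ll))
```

A word that is equally frequent in both corpora scores 0; a negative keyword is one whose observed target frequency falls below its expected frequency.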
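Counting words with AntConc
AntConc is a text analysis tool that reports how vocabulary is used in a text, including word frequencies.
To count the words in a text with AntConc, first load the text: in the "File" menu, choose "Open File(s)..." and select the file to analyze.
Next, click the "Word List" tab and press "Start" to generate the list of words in the text; the numbers of word types and word tokens are displayed above the list.
The Word List tool can also be reached with the function keys (F1 to F7 switch between the seven tools) instead of clicking its tab.
In short, AntConc makes it easy to obtain word counts for a text as a starting point for more detailed analysis.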
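For readers who want to check such counts independently, a word list with type and token totals can be approximated in a few lines of Python. This is a rough sketch, not AntConc's algorithm: it assumes that a token is any run of letters and that case is ignored, and the function name `word_list` is hypothetical.

```python
import re
from collections import Counter


def word_list(text):
    """Return (token count, type count, frequency-ranked word list).

    Assumption: a token is a run of Unicode letters, compared in lower case.
    """
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(tokens)
    # Highest frequency first; ties broken alphabetically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return len(tokens), len(counts), ranked


n_tokens, n_types, ranked = word_list("The cat saw the other cat.")
print(n_tokens, n_types)
for rank, (word, freq) in enumerate(ranked, start=1):
    print(rank, freq, word)
```

The printed rows follow the RANK FREQUENCY WORD layout that the AntConc readme describes for reference word lists.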
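AntConc 3.2.0 User Guide: searching with a list of terms
● Searching with a list of terms
When a study requires many search terms, the following method can be used as an alternative to "|"; it is especially suitable when the number of search terms is large.
Example: studying the sensory verbs watch, sound, feel, hear, smell
a) Type all the terms to be searched into a plain-text (TXT) file; up to 250 terms can be entered. Then name and save the file. Note that the terms must be arranged in a column, one per line, for example:
feel
feels
felt
b) Click the "Advanced" button next to "Search Term" and select "Use search term(s) from list below".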
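The one-term-per-line file described above is easy to produce and check with a script. The Python sketch below reads such a file and counts how often each listed term occurs in a text. The function names and the file name `terms.txt` are hypothetical, and the matching is a simple whole-word, case-insensitive comparison rather than AntConc's own search.

```python
import re
from collections import Counter
from pathlib import Path


def load_terms(path):
    """Read search terms from a text file, one per line, skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def count_terms(text, terms):
    """Count whole-word, case-insensitive occurrences of each listed term."""
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    wanted = {term.lower() for term in terms}
    counts = Counter(tok for tok in tokens if tok in wanted)
    return {term: counts[term.lower()] for term in terms}


if __name__ == "__main__":
    # Hypothetical file in the layout the guide describes: one term per line.
    Path("terms.txt").write_text("feel\nfeels\nfelt\n", encoding="utf-8")
    terms = load_terms("terms.txt")
    print(count_terms("I feel what she feels; we felt it and feel it.", terms))
```

Keeping the list in a file means the same set of terms can be reused across searches without retyping it.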
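AntConc Basic Operations
Presenter: Li Guangwei (李广伟)
01 Introduction to AntConc
02 AntConc functions
03 AntConc demonstration

Introduction to AntConc
AntConc is a free corpus search tool developed by Professor Laurence Anthony of Waseda University, Japan. It is used mainly in corpus linguistics, translation studies, and foreign language teaching.

AntConc functions
Figure 1 AntConc main window on startup
As Figure 1 shows, AntConc provides the tools Concordance, Concordance Plot (position of hits), File View, Clusters/N-Grams, Collocates, Word List, and Keyword List.
◆ The software can extract concordances, collocate lists, and word frequency lists. Each function is illustrated below, using Huangdi Neijing: Suwen (《黄帝内经·素问》) as the example.

AntConc demonstration: extracting concordances
◆ To extract concordances with the Concordance tool, first click the File menu and choose Open Files to select the corpus files (to open a whole folder, choose Open Directory). Then type "Huangdi" in the input box under Search Term.
◆ Click "Start"; the results are displayed in KWIC format, as shown in Figure 2.
Figure 2 Concordance window for "Huangdi"
As Figure 2 shows, the word "Huangdi" is highlighted in blue; it occurs 644 times in the English translation of Huangdi Neijing: Suwen.
AntConc can also search for several terms at once by typing the "|" symbol between them, for example entering "do|does|did|doing|done" in "Search Term" (Figure 3). Alternatively, click "Advanced" and tick "Use search term(s) from list below". Type the terms into the box below (or load a txt word list to search with), making sure each word is on its own line. When the settings are complete, click "Apply" and return to the concordance window.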
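Searching for word forms in AntConc
AntConc is a corpus analysis program that helps you search and process text data.
To retrieve the different forms of a word with AntConc, proceed as follows:
1. Open AntConc and load the corpus you want to analyze.
2. Use the File menu to select the text file(s) or folder to analyze.
3. To see every form in context, open the Concordance tool, enter the forms separated by "|" in the search box (for example go|went|gone|goes), and click "Start".
4. To count forms grouped under one headword, load a lemma list in the Word List preferences and generate the word list. Each entry then shows its frequency, and the 'lemma word form(s)' column shows the forms found in the corpus for each lemma.
5. Sort the list as needed to inspect how often particular forms occur.
Note that the word forms AntConc retrieves depend on the texts in your corpus and on the search terms or lemma list you supply.
AntConc does not identify inflected forms automatically, whether the corpus contains one language or several.
For each language you therefore need a search pattern or lemma list that covers the forms of interest.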
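Grouping word forms under a headword can also be tried outside AntConc. The Python sketch below assumes a lemma list in the `lemma->form1, form2` layout given in the AntConc readme (for example `be->is, are`); the function names and the letters-only tokenizer are hypothetical, and only forms listed to the right of `->` are counted.

```python
import re
from collections import Counter, defaultdict


def parse_lemma_list(lines):
    """Build a form -> lemma mapping from lines such as 'be->is, are'."""
    form_to_lemma = {}
    for line in lines:
        line = line.strip()
        if not line or "->" not in line:
            continue                      # skip blank or malformed lines
        lemma, forms = line.split("->", 1)
        for form in re.findall(r"[A-Za-z']+", forms):
            form_to_lemma[form.lower()] = lemma.strip().lower()
    return form_to_lemma


def count_forms_by_lemma(text, form_to_lemma):
    """Return {lemma: Counter of the listed forms found in the text}."""
    grouped = defaultdict(Counter)
    for token in re.findall(r"[A-Za-z']+", text.lower()):
        lemma = form_to_lemma.get(token)
        if lemma is not None:
            grouped[lemma][token] += 1
    return dict(grouped)


mapping = parse_lemma_list(["be->is, are", "play->play, plays, playing, played"])
result = count_forms_by_lemma(
    "She plays well and they played too; it is fun to play.", mapping)
for lemma, forms in result.items():
    print(lemma, sum(forms.values()), dict(forms))
```

The per-lemma totals correspond to a lemmatized word list, while the inner counters correspond to the individual word forms.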
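Instructions for AntConc 3.2.0 [1]

1. Extracting concordances
Setting the search term
(1) Searching for a single term
a) In the File menu, click "Open Files" and select the corpus files to open (choose "Open Directory" to open a whole folder);
b) Type the term to search for, such as go, in the "Search Term" box;
c) In the "Search Window Size" box, set how much context each concordance line displays;
d) Click "Start" to run the search.
The results are shown in Figure 1.1.
Figure 1.1 Results of a single-term search

[1] This manual was written by Zhang Xingjuan, a 2007 postgraduate student at the School of Foreign Studies, South China Normal University, and was revised and supplemented by the supervisor, He Anping. The method for searching within a specified range was provided by Dr. D. Lee of City University of Hong Kong, to whom thanks are due.

(2) Searching for multiple terms
● Multi-term search
Besides single terms, AntConc can search for several terms at once: separate the terms with the "|" symbol.
Example: to retrieve the different forms of the verb go, type go|went|gone|goes in "Search Term".
● Context-word search
To narrow a concordance search, a context word can be required to occur within a set span around the search term.
Example: to study phrases of the type a … of, AntConc can retrieve all matching items as follows:
a) Type a in the "Search Term" box;
b) Click "Advanced" next to "Search Term" to open the "Advanced Search" window (Figure 1.2). Tick "Use context words and horizons", type of in the "Context Words" box, and add it. To change the context word, first clear the existing one and then repeat these steps. The distance between the context word and the search term must also be set: in this study of is in the second position to the right of a, so set "Context Horizon" to that position and then click "Apply";
c) Back in the concordance window, click "Start" to run the search.
Figure 1.2 The Advanced Search window
The results include chunks such as a lot of and a bit of.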
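The context-word search above amounts to checking whether a second word appears at a fixed distance from the search term. The following is a minimal sketch of that check, not AntConc's implementation. The function name `context_search`, the simple tokenizer, and the sample sentence are assumptions, and the sketch tests a single position rather than a range.

```python
import re

def context_search(text, term, context_word, position):
    """Find `term` where `context_word` sits exactly `position` tokens to its right."""
    # Simplistic tokenizer (letters and apostrophes only); an assumption for this sketch.
    tokens = re.findall(r"[a-z']+", text.lower())
    found = []
    for i, tok in enumerate(tokens):
        j = i + position
        if tok == term and j < len(tokens) and tokens[j] == context_word:
            found.append(" ".join(tokens[i:j + 1]))
    return found

# Hypothetical sample sentence.
sample = "I read a lot of books, ate a bit of cake and saw a dog."
print(context_search(sample, "a", "of", 2))
```

With the position set to 2, the sketch returns the chunks in which of is the second word to the right of a and ignores the final "a dog".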
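● Searching with a term list
When many terms have to be searched, the following method can be used instead of "|". It is especially convenient when the number of search terms is large.
Example: studying the perception verbs watch, sound, feel, hear, smell
a) Type all the search terms into a TXT file (up to 250 terms), then name and save the file. Note that the terms must be arranged in a column, one per line, for example:
feel
feels
felt
b) Click "Advanced" next to "Search Term" and select "Use search term(s) from list below". Click the button for loading a file, browse to the location where the file was saved, select the file, and then click "Apply";
c) Back in the concordance window, click "Start" to run the search.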
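A term list with one entry per line is easy to turn into a single search pattern. The sketch below shows one way this might be done in Python; it is an illustration under assumptions, not a description of how AntConc processes the list. The function name `pattern_from_term_list` and the sample sentence are hypothetical, and the list is given as a string here rather than read from a saved file.

```python
import re

def pattern_from_term_list(list_text):
    """Build one search pattern from a term list with one term per line."""
    terms = [line.strip() for line in list_text.splitlines() if line.strip()]
    # Try longer terms first so that "feels" is preferred over "feel".
    terms.sort(key=len, reverse=True)
    alternation = "|".join(re.escape(t) for t in terms)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)

# The three forms from the example above, one per line as in the TXT file.
term_list = "feel\nfeels\nfelt\n"
pattern = pattern_from_term_list(term_list)
found = pattern.findall("She felt fine, but he feels that we feel tired.")
print(found)
```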
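(3) Category searches
● Wildcard search
● Tag search
For research purposes, some corpora have been processed and annotated with various symbols; these are known as "tagged corpora". Examples are LOBTA, which carries part-of-speech tags, and CLEC, which carries error-type tags. Typing a tag into the search box retrieves every word that carries that tag.
Example: to extract all the nouns in the LOBTA corpus, simply type *_NN (NN is the noun tag; for the full set of part-of-speech tags, see the appendix on p. 113 of He Anping, 2004, 《语料库语言学与英语教学》 [Corpus Linguistics and English Teaching]).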
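The tag search above relies on each token being written as word_TAG. A minimal Python sketch of the same lookup follows. The function name `words_with_tag` and the tagged sample line are assumptions made for illustration; only the NN tag comes from the text, the other tags in the sample are invented placeholders. Unlike a wildcard search, this sketch matches the tag exactly.

```python
import re

def words_with_tag(tagged_text, tag):
    """Return the word part of every token that carries exactly `tag`."""
    # A token looks like word_TAG; the lookahead makes sure the tag ends there,
    # so asking for NN does not also return tokens tagged NNS.
    pattern = re.compile(r"(\S+)_" + re.escape(tag) + r"(?=\s|$)")
    return pattern.findall(tagged_text)

# Hypothetical tagged line; tags other than NN are placeholders.
tagged = "the_ATI dog_NN chased_VBD a_AT cat_NN"
print(words_with_tag(tagged, "NN"))
```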
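(4) Searching within a specified range
a) In the Concordance search window, select "Regex" (regular expression) and enter \[.*\] as the search term. This retrieves all the text between the opening delimiter "[" and the closing delimiter "]" in the corpus. Other delimiters work in the same way.
b) In the Concordance search window, select "Regex" (regular expression) and enter \[.*write.*\] as the search term. This retrieves the concordance lines in which "write" occurs between "[" and "]". Other search terms work in the same way.
Search terms entered this way are case-sensitive, but the wildcard * can be used.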
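The two regular expressions above can be tried out in Python before they are used on a corpus. The sketch below uses Python's `re` module and an invented sample sentence. It also shows a non-greedy variant, which is an addition of this sketch and not part of the original instructions: in Python, the greedy `.*` runs from the first "[" to the last "]" on a line, so the variant is useful when a line contains more than one bracketed stretch.

```python
import re

# Hypothetical sample line with two bracketed stretches.
text = "He said [please write soon] and left [no reply]."

# The pattern from the text: greedy, so it spans both bracketed stretches.
greedy = re.findall(r"\[.*\]", text)

# Non-greedy variant: each match stays inside one pair of brackets.
bracketed = re.findall(r"\[.*?\]", text)

# "write" inside one pair of brackets; [^\]]* cannot cross a closing bracket.
with_write = re.findall(r"\[[^\]]*write[^\]]*\]", text)

print(greedy)
print(bracketed)
print(with_write)
```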
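Analysing the search results
(1) Frequency and distribution
The frequency is the number of times the search term occurs; it is shown in the "Concordance Hits" box. Click "Concordance Plot" to see how the search term is distributed across the corpus texts.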
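(2) Highlighting the surrounding context words
For a particular teaching or research purpose, selected words around the search term can be highlighted. To do this, use "Kwic Sort": R1 and L1 stand for the first word to the right and the first word to the left of the search term respectively. Up to three columns of highlighted words can be set at once, and each is sorted alphabetically. The results are shown in the figure.
To give the highlighted words the same colour, change the colours under "Color Settings" in the settings drop-down menu. To highlight letters within a word rather than whole words, open the "Concordance" options in the Tool Preferences drop-down menu and select "Sort by characters instead of words" (listed under Other Options together with "Treat case in sort"), as shown in the figure.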
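Sorting concordance lines by the neighbouring words is a plain multi-key sort. The sketch below orders lines first by the first word to the right (R1) and then by the first word to the left (L1). It is an illustration only: the function name `sort_key` and the three sample lines are assumptions, and AntConc's own sorting options are richer than this.

```python
import re

def sort_key(line):
    """Sort key for a (left, hit, right) line: R1 first, then L1."""
    left, _, right = line
    right_words = re.findall(r"\w+", right)
    left_words = re.findall(r"\w+", left)
    r1 = right_words[0].lower() if right_words else ""
    l1 = left_words[-1].lower() if left_words else ""
    return (r1, l1)

# Hypothetical concordance lines for the search term "look".
lines = [
    ("they always", "look", "up to her"),
    ("please", "look", "at this"),
    ("do not", "look", "at me"),
]
ordered = sorted(lines, key=sort_key)
for left, hit, right in ordered:
    print(f"{left:>12} {hit} {right}")
```

Lines that share the same R1 word ("at") end up next to each other, which is what makes recurring patterns to the right of the search term easy to spot.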
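Figure: Concordance results with highlighted context words
(3) Extracting a collocate list
Click "Collocates" in the main window to obtain the collocate list for a search term. The position of the collocates, their minimum frequency, and the sort order of the list can all be set.
Example: the collocates one word to the right of look
a) Click "Collocates" in the main window;
b) Type look in the "Search Term" box;
c) Set the position of the collocates, here the first word to the right;
d) Click "Start" to run the search; the results are shown in the figure.
e) Use "Sort by Freq" and the other sort options to order the collocate list as needed, for example by frequency or alphabetically.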
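Counting the words at a fixed position next to a search term can be sketched with `collections.Counter`. This is a simplified illustration that ranks collocates by raw frequency only; the function name `collocates`, its parameters, and the sample text are assumptions and do not reproduce AntConc's output format.

```python
from collections import Counter
import re

def collocates(text, term, offset=1, min_freq=1):
    """Count the words found `offset` tokens away from `term` (1 = first word to the right)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(
        tokens[i + offset]
        for i, tok in enumerate(tokens)
        if tok == term and 0 <= i + offset < len(tokens)
    )
    # Keep only collocates that reach the minimum frequency, most frequent first.
    return [(word, n) for word, n in counts.most_common() if n >= min_freq]

# Hypothetical sample text.
sample = "Look at me. Do not look at them; look for the key and look at it."
print(collocates(sample, "look"))
```

A negative offset looks to the left instead, for example `offset=-1` for the word immediately before the search term.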
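Figure: Collocate list results
(4) Extracting collocation clusters
The "Clusters" tool can also be used to extract word clusters, and the position of the search term within the cluster can be set.
Example: clusters that begin with ask
a) Click "Clusters" in the main window;
b) Type ask in the "Search Term" box;
c) Set the position of the search term, for example "On the left";
d) Set the length of the clusters, for example a minimum of 3 and a maximum of 3;
e) Click "Start" to run the search. The results are shown in the figure: every ask appears at the left edge of its cluster.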
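Extracting clusters that start with a given word is a matter of collecting fixed-length token sequences whose first token is the search term. The following is a minimal sketch under assumptions: the function name `clusters_on_left` and the sample text are hypothetical, and only a single cluster size is handled.

```python
from collections import Counter
import re

def clusters_on_left(text, term, size=3):
    """Count clusters of `size` words that have `term` as their first word."""
    tokens = re.findall(r"[a-z']+", text.lower())
    grams = (
        " ".join(tokens[i:i + size])
        for i in range(len(tokens) - size + 1)
        if tokens[i] == term
    )
    return Counter(grams).most_common()

# Hypothetical sample text.
sample = "Ask for help when you need it. I ask for help and they ask for advice."
print(clusters_on_left(sample, "ask"))
```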
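(5) Hiding, sorting and deleting
"Hiding" means blanking out the search term in the results, which is useful for teaching or testing. The procedure is:
a) Type the search term, such as look, in the "Search Term" box;
b) Click Tool Preferences, choose "Concordance", select "Hide search term in KWIC display", and click "Apply";
c) Click "Start" to run the search.
The results look like this:
you always do your own homework Do you ******* for help when you think it necessary Do you help

2. Extracting frequency lists
Single-word and N-gram frequency lists
A single-word frequency list is the word list of the target corpus: it lists each word form together with its frequency. To produce one:
a) Select the target corpus for which the list is to be generated;
b) Open the "Word List" tool and set the sort order, for example "Sort by Freq". The list can also be sorted alphabetically from the beginning or from the end of the word;
c) Click "Start" to run the search; the results are shown in the figure.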
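Figure: Single-word frequency list results
An N-gram frequency list gives the frequencies of the multi-word sequences in the target corpus. For example, the 2-gram list for the sentence "This is a pen" consists of "this is", "is a", and "a pen". To extract an N-gram frequency list:
a) Select the target corpus for which the list is to be generated;
b) Open the "Clusters" tool and tick "N-Grams";
c) Choose the sort order of the list, for example "Sort by Freq";
d) Click "Start" to run the search; the results are shown in the figure.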
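Note that the length of the N-grams must be set before the search is started, using the minimum and maximum size fields.
Figure: N-gram frequency list results

Regrouping word forms: lemmatization
Lemmatization strips the endings from all the inflected forms of a word belonging to the same part of speech and groups them under one lemma, whose frequency is then counted as a whole. This shortens the frequency list and draws attention to word formation.
To lemmatize a frequency list: after generating the list in the "Word List" tool, open the Tool Preferences menu, choose "Lemma List Options", click "Open" and "Load" to load the lemma list file (which can be downloaded from this website), and click "Apply" (see Figure 2.3). Part of the lemmatized result is shown in the figure that follows.
Figure 2.3 The lemma list settings window
Figure: The frequency list after lemmatization (excerpt)
In the figure, the 1142 occurrences of a and the 133 occurrences of an are grouped under the single lemma a, giving 1275 occurrences in total.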
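The effect of a lemma list on a frequency list can be sketched as a simple merge of counts. In the sketch below, the counts for a and an are the ones quoted above; the entry for pen, the dictionary layout of the lemma list, and the function name `lemmatize_counts` are assumptions made for illustration and do not reflect the format of AntConc's lemma list file.

```python
from collections import Counter

def lemmatize_counts(word_counts, lemma_list):
    """Merge the counts of inflected forms into their lemma."""
    # lemma_list maps a lemma to its other forms, e.g. {"a": ["an"]}.
    form_to_lemma = {
        form: lemma for lemma, forms in lemma_list.items() for form in forms
    }
    merged = Counter()
    for word, n in word_counts.items():
        merged[form_to_lemma.get(word, word)] += n
    return merged

# 1142 and 133 come from the example above; the count for "pen" is invented.
counts = {"a": 1142, "an": 133, "pen": 7}
lemmas = {"a": ["an"]}
merged = lemmatize_counts(counts, lemmas)
print(merged.most_common())
```

Words that are not covered by the lemma list keep their own entry, so the list gets shorter only where forms are actually grouped.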
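3. Extracting a keyword list
A keyword list is obtained by comparing the frequency lists of two corpora: it contains the words that are markedly more frequent in one corpus than in the other. The first is called the target corpus and the second the reference corpus. The reference corpus is usually the larger of the two, so that the unusually frequent words of the target corpus stand out and reveal its topics or characteristic content.
Highlighting words that are significantly more frequent in the target corpus than in the reference corpus
The procedure is:
a) In the File menu, click "Open Files" and select the target corpus to be compared (choose "Open Directory" to compare a whole folder);
b) Click "Keyword List" in the main window;
c) Click Tool Preferences and choose "Keyword List", as shown in the figure;
d) Select "Show negative keywords" to include in the results the words that are markedly more frequent in the reference corpus than in the target corpus, choose the reference corpus, and finally click "Apply". The results are shown in the figure.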
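The comparison behind a keyword list can be sketched with a keyness statistic. The text does not say which statistic is used, so the sketch below assumes log-likelihood, a measure commonly used for this purpose; the function names, the `show_negative` parameter, and all the frequencies in the two toy corpora are invented for illustration.

```python
import math

def log_likelihood(a, b, c, d):
    """Keyness of a word: a, b = its frequency in target/reference; c, d = corpus sizes."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the target corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keyword_list(target, reference, show_negative=False):
    """Rank words by keyness; negative scores mark words overused in the reference."""
    c, d = sum(target.values()), sum(reference.values())
    rows = []
    for word in set(target) | set(reference):
        a, b = target.get(word, 0), reference.get(word, 0)
        overused = a / c > b / d
        if overused or show_negative:
            score = log_likelihood(a, b, c, d)
            rows.append((word, score if overused else -score))
    return sorted(rows, key=lambda row: row[1], reverse=True)

# Invented frequency lists standing in for a target and a reference corpus.
target = {"qi": 50, "the": 100, "yin": 30}
reference = {"the": 1000, "qi": 1, "of": 500}
for word, score in keyword_list(target, reference, show_negative=True):
    print(f"{word:>5} {score:8.2f}")
```

Words with positive scores are relatively more frequent in the target corpus; with `show_negative=True` the words that are relatively more frequent in the reference corpus are listed as well, with negative scores.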