AntConc Detailed User Manual


AntConc 3.2.1 User Manual (Chinese version)


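Chinese translated version of the readme

AntConc 3.2.1 User Manual (Windows, Macintosh OS X, and Linux)
###############################################################
Laurence Anthony, Ph.D.
Center for English Language Education in Science and Engineering
School of Science and Engineering
Waseda University
3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
March 10, 2007
###############################################################

AntConc started out as a relatively simple concordance program, but it has gradually grown into a rather useful text analysis tool.

It is written in Perl 5.8 using ActiveState's excellent Komodo development environment, which is cross-platform and supports multiple programming languages.

The program is launched simply by double-clicking the executable file, which can be downloaded from the Laurence Anthony laboratory website.

The program runs in any windowing environment, including computers running Win 98/Me/2000/NT and XP, as well as Macintosh OS X and Linux.

If you find any problem using the program on a particular operating system, please let me know.

AntConc includes the following tools:
Concordance
Concordance Plot
File View
Clusters
N-Grams (part of Clusters)
Collocates
Word List
Keyword List

Note that each tool can be opened either by clicking its tab in the tool window or by using the function keys F1 to F7.

Concordance
The Concordance tool generates concordance lines (also called KWIC lines, for "key word in context") from one or more target files selected by the user.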
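To make the idea of a concordance (KWIC) line concrete, here is a minimal Python sketch. It is not AntConc's own code: the function name `kwic`, its parameters, and the sample sentence are assumptions chosen for illustration. It finds each whole-word hit and prints a fixed number of characters of context on either side.

```python
import re

def kwic(text, term, width=30, case_sensitive=False):
    """Return KWIC lines with `width` characters of context on each side of a hit.

    `kwic` and its parameters are hypothetical names for this sketch.
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    # \b restricts matches to whole words, similar in spirit to a "word" search.
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", flags)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so that all hits line up in one column.
        lines.append(f"{left:>{width}} {match.group(0)} {right:<{width}}")
    return lines

sample = "We go to school. They will go home, and then we go again."
for line in kwic(sample, "go", width=12):
    print(line)
```

Aligning the search term in a single column is what makes it easy to scan the words that precede and follow it.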

Detailed AntConc Instructions (compiled by Ouyang Wen)


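AntConc 3.2.0 Instructions [1]

[1] This manual was written by Zhang Xingjuan, a 2007 graduate student at the School of Foreign Studies, South China Normal University, and was corrected and supplemented by supervisor He Anping. The restricted-range search method was provided by Dr. D. Lee of City University of Hong Kong, to whom thanks are due.

Date: 2021.03.12. Created by: Ouyang Wen.

1. Extracting concordances

1.1 Setting the search term

(1) Single-term search

a) In the File drop-down menu, click "open files" and select the corpus files to open (to open a whole folder, choose "open directory");
b) Type the term to search for, such as go, in the "Search Term" box;
c) Use "Search Window Size" to set how much context appears in each concordance line (the AntConc README measures this in characters on either side of the search term);
d) Click Start to begin the search.

The search results are shown in Figure 1.1.

Figure 1.1 Results of a single-term search

(2) Multi-term search

● Searching for several terms

Besides searching for a single term, AntConc can search for several terms at once. To do so, type the "|" symbol between the search terms.

Example: to search for the various forms of the verb go, enter go|went|gone|goes in "Search Term".

● Searching with a context word

To narrow a concordance search, you can require that a context word appear within a given range around the search term.

Example: to study phrases of the type a … of, AntConc can extract all matching items as follows:

a) Type a in the "Search Term" box;
b) Click the Advanced button next to "Search Term" to open the "Advanced Search" window, shown in Figure 1.2. Select "Use context words and horizons", type of in the "Context Words" box, and click the add button. To change the context word, first click the clear button to remove the old one, then repeat the steps above. You also need to set the position of the context word relative to the search term. In this study, of is the second word to the right of a, so the "Context Horizon" is set to that position (2R). Finally, click the button that applies the settings;

Figure 1.2 The Advanced Search window

c) Back in the concordance window, click Start to begin the search.

The search extracts chunks such as a lot of and a bit of.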
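The two search styles described in this section can be sketched in Python as follows. This is only an illustration under stated assumptions, not AntConc's implementation: `tokenize` and `search` are hypothetical helpers, the "|" separator is treated as a plain list of alternatives, and the context horizon is expressed as token offsets (2 stands for 2R, -1 would stand for 1L).

```python
import re

def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def search(tokens, term_pattern, context_word=None, offsets=()):
    """Return the indices of tokens matching any alternative in `term_pattern`.

    `term_pattern` uses "|" between alternatives, e.g. "go|went|gone|goes".
    If `context_word` is given, a hit is kept only when that word occurs at
    one of the relative `offsets` (hypothetical parameter names).
    """
    terms = set(term_pattern.lower().split("|"))
    hits = []
    for i, token in enumerate(tokens):
        if token not in terms:
            continue
        if context_word is not None:
            positions = [i + off for off in offsets]
            if not any(0 <= p < len(tokens) and tokens[p] == context_word
                       for p in positions):
                continue
        hits.append(i)
    return hits

tokens = tokenize("She went home with a lot of books and a bit of cake, "
                  "then goes to a shop.")

# Multi-term search: any form of "go".
go_hits = search(tokens, "go|went|gone|goes")
print([tokens[i] for i in go_hits])

# Context-word search: "a" with "of" exactly two tokens to the right (2R).
a_of_hits = search(tokens, "a", context_word="of", offsets=(2,))
print([" ".join(tokens[i:i + 3]) for i in a_of_hits])
```

The final "a" in "a shop" is dropped by the second search because no "of" follows it at the required position.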

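● Searching for multi-word items

When a study requires searching for several terms, the following method can be used instead of "|". It is especially suitable when there are many search terms.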

ANTCOR Ant Series Network Products: 2.4GHz Super Wireless Terminal User Manual


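Contents

ANTCOR Ant Series Network Products
2.4GHz Super Wireless Terminal
1. Product Overview
2. LCD Configuration
2.1 Welcome Screen
2.2 Screen Menu
2.2.1 Device Information
2.2.2 Wireless Information
2.2.3 Wireless Scan
2.2.4 Learning Process
2.2.5 Restore Factory Settings
3. Web Page Configuration
3.1 Local Settings
3.2 Logging In to the Device
3.3 Wireless Modes
3.3.1 Station Mode Setup
3.3.2 Repeater Mode Setup
3.3.3 3G Wireless Router Mode Setup
3.3.4 Wireless Router PPPoE Setup
3.3.5 Wireless Router Setup
3.4 How to Configure Learned Information
3.5 Advanced Settings
3.6 System Services
3.7 System Settings
3.8 Forum
Appendix A: Wireless Security Settings
Appendix B: Wireless Tools
7.1 Antenna Alignment
7.2 Ping
7.3 Traceroute
7.4 Site Survey
Appendix C: FAQ (Ant Chariot wireless questions and answers)

Notice

Without the express written permission of the company, no organization or individual may imitate, copy, transcribe, or translate any part or all of this content.

It may not be distributed commercially in any form or by any means, or used for any commercial or profit-making purpose.

The product specifications and information mentioned in this manual are for reference only and are subject to change without notice.

Unless otherwise agreed, this manual serves only as a usage guide, and none of the statements or information in it constitutes a warranty of any kind.

1. Product Overview

Thank you for using the ANTCOR Ant Series 2.4GHz Super Wireless Terminal (Ant Chariot).

This manual will help you complete installation and setup.

The package should contain the following items:
2.4GHz Super Wireless Terminal (Ant Chariot)
5 dBi antenna
5 V, 2 A power supply
Crossover network cable
Warranty card

Note: if anything is missing, please contact your dealer.

The ANTCOR Ant Series indoor Super Wireless Terminal (Ant Chariot) operates in the 2.4GHz band, complies with the IEEE 802.11b/g standards, and uses OFDM (orthogonal frequency-division multiplexing). Its actual data rate exceeds 20 Mbps, and it offers high speed and long transmission range, which makes it well suited to wireless coverage for residential communities, rural areas, campuses, and wireless city deployments.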

AntConc Software User Manual


AntConc (Windows, Macintosh OS X, and Linux)
Build 3.4.1

Laurence Anthony, Ph.D.
Center for English Language Education in Science and Engineering, School of Science and Engineering, Waseda University, 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
January 31, 2014

Introduction

AntConc is a freeware, multiplatform tool for carrying out corpus linguistics research and data-driven learning. It runs on any computer running Microsoft Windows (tested on Win 98/Me/2000/NT, XP, Vista, Win 7), Macintosh OS X (tested on 10.4.x, 10.5.x, 10.6.x), and Linux (tested on Ubuntu 10, Linux Mint). It is developed in Perl using ActiveState's PerlApp compiler to generate executables for the different operating systems.

Getting Started (No installation necessary)

Windows
On Windows systems, simply double click the AntConc icon and this will launch the program.

Macintosh OS X
On Macintosh systems, simply double click the AntConc icon and this will launch the program.

Linux
On Linux systems, change the permissions to allow AntConc to be run as an executable file. Next, double click the AntConc executable and it will launch.

Caution: Do not place user settings files from older versions of AntConc in the same folder as new versions. This can cause unexpected problems and may even prevent AntConc from starting. It is recommended that you delete your earlier settings file and export it again from the AntConc file menu.

Overview of Tools

AntConc contains seven tools that can be accessed either by clicking on their 'tabs' in the tool window, or using the function keys F1 to F7.

Concordance Tool: This tool shows search results in a 'KWIC' (KeyWord In Context) format. This allows you to see how words and phrases are commonly used in a corpus of texts.

Concordance Plot Tool: This tool shows search results plotted as a 'barcode' format. This allows you to see the position where search results appear in target texts.

File View Tool: This tool shows the text of individual files. This allows you to investigate in more detail the results generated in other tools of AntConc.

Clusters/N-Grams: The Clusters Tool shows clusters based on the search condition. In effect it summarizes the results generated in the Concordance Tool or Concordance Plot Tool. The N-Grams Tool, on the other hand, scans the entire corpus for 'N' (e.g. 1 word, 2 words, …) length clusters. This allows you to find common expressions in a corpus.

Collocates: This tool shows the collocates of a search term. This allows you to investigate non-sequential patterns in language.

Word List: This tool counts all the words in the corpus and presents them in an ordered list. This allows you to quickly find which words are the most frequent in a corpus.

Keyword List: This tool shows which words are unusually frequent (or infrequent) in the corpus in comparison with the words in a reference corpus. This allows you to identify characteristic words in the corpus, for example, as part of a genre or ESP study.

Concordance Tool

This tool shows search results in a 'KWIC' (KeyWord In Context) format. This allows you to see how words and phrases are commonly used in a corpus of texts.

The following steps produce a set of concordance lines from a corpus and demonstrate the main features of this tool.

1) Select one or more files for processing using the 'Open File(s)...' or 'Open Dir...' options in the 'File' menu. The list of selected files is shown in the left frame of the main window.
2) Enter a search term on which to build concordance lines in the search box.
3) Choose the number of text characters to be output on either side of the search term, using the increase and decrease buttons on the right of the button bar under the "Search Window Size" title (default value is 50 characters).
4) Click on the 'Start' button to start generating the concordance lines. The concordance generation can be halted at any time by clicking on the 'Stop' button.
5) Use the Kwic Sort options to rearrange the concordance lines at three different levels. 0 is the search word, 1L, 2L... are words to the left of the target word, 1R, 2R... are words to the right of the target word.
6) Click on the 'Sort' button to start the sorting process.
7) Move the cursor over the highlighted search term in one of the concordance lines. The cursor will change to a small hand icon. Clicking on the highlighted search term will allow you to view the search term hit as it appears in the original file via the File View Tool (see below).
8) Click on the "Clone Results" button to create a copy of the results so that different sets of results can be compared.

The total number of concordance lines generated (Concordance Hits) is shown at the top of the tool window. This number will flash with the word "FINISHED" when processing has been completed, and will flash with the word "NO HITS" if no hits are generated for a particular search term.

Search terms can be specified as being "words" (default) or "character strings" by activating or deactivating the "Word" search term option. Also, searches can be either "case insensitive" (default) or "case sensitive" by activating or deactivating the "Case" search term option. Searches can also be made using full regular expressions by activating the "Regex" option. For details on how to use regular expressions, consult one of the many texts on the subject, e.g., Mastering Regular Expressions (O'Reilly & Associates Press) or type "regular expressions" in a web search engine to find many sites on the subject (e.g., /quickstart.html). AntConc supports Perl regular expressions.

By clicking on the "Advanced Search" button, more complex searches become possible. The first advanced search option allows you to import a set of search terms, either by typing them one per line, or by loading in a list of search terms from a file. Here, each line will be treated as a separate search term. This feature allows you to use a large set of search terms without having to re-type them each time. The second advanced search option allows you to define context words and a context window within which the search term(s) must appear. For example, to search for "student" where it appears within three words to the left or right of the word "university," set the search term as "student," the context word as "university," and set the context window as 'From' 3L 'To' 3R.

A number of menu preferences are available with this tool. (See below).

Concordance Plot Tool

This tool shows concordance search results plotted in a 'barcode' format, with the length of the text normalized to the width of the bar and each hit shown as a vertical line within the bar. This allows you to see the position where search results appear in target texts. The tool also allows you to see which files include the target search term, and can also be used to identify where the search term hits cluster together. An example of the use of the Plot Tool is in determining where specific content words appear in a technical paper, or where an actor or story character appears during the course of a play or novel.

The number of hits and length of each text is shown to the right of the barcode plot, and the plot itself can be enlarged or reduced in size using the "Plot Zoom" buttons.

If you move the cursor over the highlighted search term in one of the concordance lines, the cursor will change to a small hand icon. Clicking on the highlighted search term will allow you to view the search term hit as it appears in the original file via the File View Tool (see below).

Search terms can be specified as being "words" (default) or "character strings", and searches can be "case insensitive" (default), "case sensitive," or "Regex" based. Advanced searches are also available. For details see the Concordance Tool explanation.

File View Tool

This tool shows the raw text of individual files. This allows you to investigate in more detail the results generated in other tools of AntConc.

The following steps produce a view of the original file and demonstrate the main features of this tool.

1) Select a file to view in the "Corpus Files" list on the left of the main window.
2) If a search term has been specified, the search term hits will be highlighted throughout the text. Search options are the same as for the Concordance Tool and Concordance Plot Tool.
3) Use the "Hit Location" buttons to jump to the appropriate hit in the file.
4) Change the search term and click on the 'Start' button to view other hits in the file.
5) Click on the highlighted text to generate a set of KWIC lines using the highlighted text as the search term.
6) Click on the "Clone Results" button to create a copy of the results so that different sets of results can be compared.

Search terms can be specified as being "words" (default) or "character strings", and searches can be "case insensitive" (default), "case sensitive," or "Regex" based. Advanced searches are also available. For details see the Concordance Tool explanation.

The following shortcut is unique to the File View Tool:
CTRL-Click = Jumps to the nearest hit in the window

Clusters/N-Grams Tool

The Clusters Tool

This allows you to search for a word or pattern and group (cluster) the results together with the words immediately to the left or right of the search term. In effect it summarizes the results generated in the Concordance Tool or Concordance Plot Tool.

The clusters can be ordered by frequency, the start or end of the word, the range of the cluster (number of files in which the cluster appears), or the probability of the first word in the cluster preceding the remaining words. All list orderings can also be inverted by activating the "Invert Order" option. Also, you can select the minimum and maximum length (number of words) in each cluster, and the minimum frequency of clusters displayed. It is also possible to select if the search term always appears on the left (default) or right of the cluster.

Note: In the current version, if more than one word is specified as the search term, only the first word will appear on the right if the "Search Term on Right" option is selected.

The following steps produce a set of cluster results and demonstrate the main features of this tool.

1) Choose the appropriate ordering options (see above for details).
2) Press the 'Start' button. At any time, the generation of the clusters list can be halted using the 'Stop' button.
3) Click on the cluster to generate a set of KWIC lines using the text as the search term.
4) Click on the "Clone Results" button to create a copy of the results so that different sets of results can be compared.

The N-Grams Tool

This allows you to scan the entire corpus for 'N' word clusters (e.g. 1 word, 2 words, …). This allows you to find common expressions in a corpus. For example, n-grams of size 2 for the sentence "this is a pen" are 'this is', 'is a' and 'a pen'.

All ordering options available in the Clusters Tool are also available in the N-Grams Tool. You can also select the minimum and maximum size (number of words) in each n-gram, and the minimum frequency and range of n-grams displayed.

The following steps produce a set of N-gram results and demonstrate the main features of this tool.

1) Click on the "N-Grams" option above the search entry box.
2) Choose the appropriate ordering options.
3) Press the 'Start' button. At any time, the generation of the n-grams list can be halted using the 'Stop' button.
4) Click on the n-gram to generate a set of KWIC lines using the text as the search term.
5) Click on the "Clone Results" button to create a copy of the results so that different sets of results can be compared.

In both the Clusters Tool and N-Grams Tool, search terms can be specified as being "words" (default) or "character strings", and searches can be "case insensitive" (default), "case sensitive," or "Regex" based. Advanced searches are also available for the Clusters Tool. For details see the Concordance Tool explanation. A number of menu preferences are available with this tool. (See below).

Collocates Tool

This tool allows you to search for collocates of a search term. This allows you to investigate non-sequential patterns in language.

The collocates can be ordered either by total frequency, frequency on the left or right of the search term, or the start or end of the word. They can also be ordered by the value of a statistical measure between the search term and the collocate. The value measures how 'related' the search term and the collocate are. Current possible statistical measures are listed below. All list orderings can also be inverted. Also, you can select the span of words to the left and right of the search term in which to find collocates, and the minimum frequency of collocates displayed. If only a one-word span is required, for example, to see which words appear directly on the right of the search term, check the "Same" box to keep the minimum and maximum span size the same.

Statistical Measures:
(MI) Mutual Information: Using equations described in M. Stubbs, Collocations and Semantic Profiles, Functions of Language 2, 1 (1995)
(T-Score) T-Score: Using equations described in M. Stubbs, Collocations and Semantic Profiles, Functions of Language 2, 1 (1995)

The following steps produce a set of collocate results and demonstrate the main features of this tool.

1) Choose the appropriate ordering options.
2) Press the 'Start' button. At any time, the generation of the collocates list can be halted using the 'Stop' button.
3) Click on one of the collocates to generate a set of KWIC lines using the text as the search term.
4) Click on the "Clone Results" button to create a copy of the results so that different sets of results can be compared.

Search terms can be specified as being "words" (default) or "character strings", and searches can be "case insensitive" (default), "case sensitive," or "Regex" based. Advanced searches are also available for the Collocates Tool. For details see the Concordance Tool explanation. A number of menu preferences are available with this tool. (See below).

Word List Tool

This tool counts all the words in the corpus and presents them in an ordered list. This allows you to quickly find which words are the most frequent in a corpus.

The words can be ordered either by frequency or the start or end of the word, and the ordering can be inverted. The word list can also be generated in case-insensitive mode, where words in upper and lower case are treated the same (default), or case-sensitive mode, where words in upper and lower case are treated separately.

The following steps produce a word list and demonstrate the main features of this tool.

1) Choose the appropriate ordering options.
2) Press the 'Start' button. At any time, the generation of the word list can be halted using the 'Stop' button.
3) Click on the word to generate a set of KWIC lines using the text as the search term.
4) Click on the "Clone Results" button to create a copy of the results so that different sets of results can be compared.

A number of menu preferences are available with this tool. (See below).

Keyword List

This tool shows which words are unusually frequent (or infrequent) in the corpus in comparison with the words in a reference corpus. This allows you to identify characteristic words in the corpus, for example, as part of a genre or ESP study.

The following steps produce a keyword list and demonstrate the main features of this tool.

1) Select a set of target files.
2) Go to the 'Preferences' menu and choose the 'Keyword Preferences' option.
3) Choose the keyword generation method (a statistical measure) to calculate the 'keyness' of the target file words. The default setting of Log Likelihood is recommended. When using either Log Likelihood or Chi-squared as the statistical measure, the following significance values apply (see: /llwizard.html):
95th percentile; 5% level; p < 0.05; critical value = 3.84
99th percentile; 1% level; p < 0.01; critical value = 6.63
99.9th percentile; 0.1% level; p < 0.001; critical value = 10.83
99.99th percentile; 0.01% level; p < 0.0001; critical value = 15.13
4) Choose a threshold for the number of keywords to be displayed.
5) Choose whether or not to view 'Negative Keywords' (target file words with an unusually low frequency compared with the frequency in the reference corpus).
6) Choose one of the reference corpus options. Select "Use raw file(s)" when you will use raw text (.txt) files to serve as the reference corpus. Select "Use word list(s)" when you will use one or more word lists that are generated from a reference corpus. The "Use word list(s)" option allows you to generate keywords even when the original reference corpus is not available. The format for a word list is:
RANK FREQUENCY WORD (separated by any type of white space, including spaces and tabs).
1 12838 the
2 11289 a
3 8583 of
...
Note that blank lines and lines beginning with # will be ignored. Also, AntConc will check that the file(s) are correctly formatted and report any errors.
7) Load the reference corpus of text (.txt) files, in the same way that the target files are chosen.
8) The reference corpus directory will be shown (if appropriate), and the list of reference corpus files will appear at the bottom of the Keyword Preferences option menu.
9) Click 'Apply' in the Keyword Preferences menu and return to the main Keywords window.
10) Choose suitable options for displaying the list of generated Keywords (in a similar manner to the options for generating a Word List).
11) Press the 'Start' button. At any time, the generation of the keyword list can be halted using the 'Stop' button.
12) Click on the keyword to generate a set of KWIC lines using the text as the search term.
13) Click on the "Clone Results" button to create a copy of the results so that different sets of results can be compared.

A number of menu preferences are available with this tool. (See below).

MENU OPTIONS

Menu options are divided into three groups, "File", "Global Settings" and "Tool Preferences". The options available in each group will be described below.

<FILE>
Options here relate to reading files into AntConc and writing files to the hard disk containing data of various types. There are also options to export all current settings to a file and import user settings from a file. If a user settings file becomes corrupted for any reason, simply restart the program or use the "Restore Default Settings" option to return the program to its original state.

<GLOBAL SETTINGS>
Categories here will have an effect on multiple tools in AntConc:

<Character Encoding> AntConc is fully Unicode compliant, meaning that it can handle data in any language, including all European languages and Asian languages. The character encoding of the data to be read by AntConc should be specified here. For example, if you are working with data saved in a Western language, it will usually be encoded in iso-8859-1. On the other hand, Japanese texts are usually encoded in shiftjis. By specifying the correct encoding, data from all languages can be processed correctly within AntConc. The default is Unicode UTF-8, which is an international standard designed to display all characters of the languages of the world in a single encoding. I recommend you use this encoding if you create any corpus.

<Colors> In the Color Settings category, you can edit the colors used to display results and other information.

<Files> In the File Settings category, you can choose to display the full path of a file or just the name.

<Fonts> In the Font Settings category, you can edit the font types, sizes, and styles used to display file names, results, and the search term.

<Tags> In the Tag Settings category, you can choose to display or hide any tags that are contained in the corpus files. You can also choose to search with tags but hide them in the results display. Embedded tags (e.g. book_NN1), non-embedded tags (e.g. <noun>book</noun>), and header tags can be shown or hidden by activating or deactivating the option.

<Token (Word) Definition> In the Token (Word) Definition category, you can choose which characters, numbers and so on will define a "word". For example, in some cases only letters will be considered words, but at other times, it might be desirable to include numbers, dashes and so on. AntConc is fully Unicode compliant, meaning that it can handle data in any language, including all European languages and Asian languages. For this reason, the default option refers to 'letters' in the broadest sense. Letters, for example, include all English letters (a to z, A to Z) but also all Japanese 'letter' characters. It is also possible to define your own "token" definition, or append characters to the standard classes.
For more information on the Unicode standards see:
http://www.cs.tut.fi/~jkorpela/unicode/guide.html
/Public/5.0.0/ucd/UCD.html
/Public/UNIDATA/PropList.txt
/charts/

<Wildcard Settings> In the Wildcard Settings category, you can edit the default wildcard characters so that they do not clash with a search entry. For example, the "or" wildcard default character (a 'pipe' character | ) can be changed to a forward slash (/) here. There are special wildcards to deal with whitespace issues.

<TOOL PREFERENCES>
Each tool (with the exception of Concordance Plot Tool and File View Tool) has a preferences category, where settings can be fine-tuned. All tool preference categories allow you to show or hide the different frames in which the results are displayed. For example, you can choose to hide the frame showing file names in the Concordance Tool display window.

<Concordance> In addition to the above, the following settings can be made:
● Using the "Treat case in sort" option causes capitalized words to appear before lower-case words.
● Using the "Sort by characters instead of words" option, it is possible to arrange the results by CHARACTERS to the left or right of the first letter of the search term. This makes it possible to search for spelling differences.
● Using the "Hide search term in KWIC display" option, the search term can be hidden in the KWIC lines, allowing instructors to quiz students on possible words to fit the gap.
● Using the "Put delimiter around hits in KWIC display" option (the default), the chosen delimiter character is added around the hit in the KWIC display. This makes it easier to see the hit and also eases later processing of the data in a spreadsheet software program.
● Use the "Delimiter" option to select the delimiter character.
● Use the "Line break replacement" option to select a character to replace line breaks with.

<Clusters/N-Grams> In addition to the above, the following settings can be made:
● Using the "Treat all data as lowercase" option (default) causes all words to be transformed to lower-case words. This is useful to get accurate counts of words in certain cases.
● Using the "Treat case in sort" option causes capitalized words to appear before lower-case words.

<Collocates Preferences> In addition to the above, the following settings can be made:
● Use the "Selected Collocate Measure" to choose the statistical measure for measuring collocate strength. Currently, two statistical measures can be used: Mutual Information (MI) and T-Score. See the tool explanation above for references to the statistics.
● Using the "Treat all data as lowercase" option causes all words to be transformed to lower-case words. This is useful to get accurate counts of words in certain cases.
● Using the "Treat case in sort" option causes capitalized words to appear before lower-case words.

<Word List Preferences> In addition to the above, the following settings can be made:
● Using the "Treat all data as lowercase" option causes all words to be transformed to lower-case words. This is useful to get accurate counts of words in certain cases.
● Using the "Treat case in sort" option causes capitalized words to appear before lower-case words.
● Use the lemma list options to select a lemma list. A 'lemma list' can be loaded from a file, which can then be used to generate a lemma list instead of a word list. When the lemma list function is used, the 'lemma word form(s)' column will show the words in the corpus associated with each lemma. A lemma list can be created by specifying the 'lemma entry' followed by '->' followed by one or more 'words' that should be assigned to the lemma, separated by one or more non-tokens. See the example below:
be->is, are
play->play, plays, playing, played
Note that in the example above, commas and spaces are assumed to be NOT defined as tokens. For this reason, if the lemma list available on the AntConc webpage is used, a 'dash' needs to be added to the token (word) definition for the lemma list to be processed correctly, as hyphenated words are used to the right of the lemma definition.
● Using the "Word List Range" option, a wordlist can be generated using all words ("Use all words"), or a specific set of words ("Use specific words below"), or ignoring a certain set of words ("Use a stoplist below"). The range of words to be used (or ignored) can be entered directly, or can be stored in files which are then read by AntConc by pressing the 'Open' button. A combination of words in a file and words directly entered can also be used.

<Keyword List Preferences> In addition to the above, the following settings can be made:
● Using the "Treat all data as lowercase" option causes all words to be transformed to lower-case words. This is useful to get accurate counts of words in certain cases.
● Using the "Treat case in sort" option causes capitalized words to appear before lower-case words.
● Use the "Keyness Generation Method" to choose the statistical measure for measuring keyword strength. Currently, two statistical measures can be used: Chi-Squared and Log-Likelihood (the default). The default option for the 'keyness' measure is recommended.
● Use the "Threshold Value" option to choose a cut-off for the keyness values generated. The default option ("All values") for the threshold value is recommended.
● Use the "Show Negative Keywords" option to view words that are unusually INFREQUENT in the target corpus compared with the reference corpus.
● Use the "Use raw file(s)" option to use raw reference corpus file(s) as the reference corpus.
● Use the "Use word list(s)" option to use word list(s) that correspond to a reference corpus. The word list(s) should be formatted as described in the tool explanation.
● Click the "Add Directory" or "Add Files" buttons to select the reference corpus files.
● Click the "Swap with Target Files" button to swap the main and reference corpora. Note that this will only make sense when raw corpus files are being used.

<HELP>
The help menu provides access to this detailed readme file. It also shows some general information about AntConc, including the current version number and date of release.

SHORTCUTS

Here is a list of shortcuts that apply to all tools using window panes for results.

CTRL-C = Copies the currently selected text
CTRL-A = Selects all text in the window pane
ALT-A = Selects all text in all window panes showing
Double click = Selects the current word
Triple click = Selects the current line in the window pane
SHIFT-click = Selects continuous lines across all window panes showing
CTRL-click = Selects discontinuous lines across all window panes showing
DELETE = Deletes any selected lines that span across all window panes
INSERT = Keeps any selected lines that span across all window panes, and deletes all others

For any 'spinbox' widgets (e.g. the search term entry box) the 'UP' and 'DOWN' arrow keys on the keyboard can be used to activate the up and down buttons.

Processing Chinese with AntConc


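Processing Chinese with AntConc: concordance, wordlist, N-gram

I don't know what trick laohong used! My method was this: I selected everything under "letter token classes" in Token Definition, then selected the first item under Chinese Encoding, and that was all. I don't think I need to explain the rest.

In addition, I found that with the options I set this afternoon, Chinese corpora that have not been word-segmented can also be searched and displayed as full text.

Sorry, everyone. After posting this morning I went off to move house, and I was so worn out that I have only just got home and turned on the computer.

Here is what everyone wanted to know, namely how I process Chinese with AntConc:

1. Text format: Did you notice that the text I used to test AntConc, posted above, contains Simplified Chinese, Traditional Chinese, and English? To display all three properly in the same text, I re-saved all the texts as UTF-8.

In other words, the corpus texts I process with AntConc are saved as UTF-8, not GB or Big5.

The Chinese text has also been word-segmented.

Search this site for automatic word segmentation and part-of-speech tagging tools: SegTag, ICTCLAS, NEUCSP, Hylanda, WinAT, and so on.

2. AntConc settings: under Language Encodings in Global Settings, I did not choose any option under Chinese Encodings. Instead I chose Unicode (UTF-8) under Unicode Encodings.

The other settings can be left at their defaults.

3. Functions: with these settings, all of AntConc's functions can handle Chinese text. That is, you can use AntConc to produce Concordance, Wordlist, Cluster, N-Gram, and other results for word-segmented Chinese.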
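The preparation described in this post (re-save the text as UTF-8, then work with word-segmented Chinese) can be sketched in Python. This is only an illustration: `to_utf8` and `word_list` are hypothetical helper names, the source encoding `gb18030` is an assumption about how a GB-family file might be encoded, and the sample text is invented and already segmented with spaces.

```python
from collections import Counter

def to_utf8(raw_bytes, source_encoding="gb18030"):
    """Re-encode bytes from a legacy Chinese encoding to UTF-8.

    `source_encoding` is an assumption; use "big5" for Big5 files.
    """
    return raw_bytes.decode(source_encoding).encode("utf-8")

def word_list(segmented_text):
    """Count space-separated tokens, most frequent first."""
    return Counter(segmented_text.split()).most_common()

# Simulate a GB-encoded, word-segmented file held in memory.
gb_bytes = "语料库 语言学 研究 语料库".encode("gb18030")
utf8_bytes = to_utf8(gb_bytes)

frequencies = word_list(utf8_bytes.decode("utf-8"))
print(frequencies)
```

Because segmentation puts spaces between Chinese words, a simple whitespace split is enough for counting in this sketch. Unsegmented text would need a segmenter such as the tools named above.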

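WordSmith finally has a free competitor!

Question: how do I display the chi-squared and mutual information values?

1. The chi-squared test is used for keywords; it needs a word list from a reference corpus and a word list from the corpus being analyzed.
2. Under Tool Preferences, choose Collocates Preferences, select the MI or T value under "show statistics measure", and then select "show collocate".
3. 3.2.1w is the latest version, so this should not be a version problem.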
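For readers unsure what the MI and T values measure, here is a minimal sketch using the common textbook definitions. It is an assumption that these match AntConc's own calculation, which may for instance adjust the expected frequency for the collocation span; the function and parameter names are illustrative only.

```python
import math


def mutual_information(pair_freq, node_freq, collocate_freq, corpus_size):
    """MI = log2(observed / expected) co-occurrence frequency."""
    expected = node_freq * collocate_freq / corpus_size
    return math.log2(pair_freq / expected)


def t_score(pair_freq, node_freq, collocate_freq, corpus_size):
    """T = (observed - expected) / sqrt(observed)."""
    expected = node_freq * collocate_freq / corpus_size
    return (pair_freq - expected) / math.sqrt(pair_freq)


# Hypothetical counts: node and collocate each occur 100 times in a
# 10,000-token corpus and co-occur 8 times, so the expected value is 1.
print(mutual_information(8, 100, 100, 10_000))
print(t_score(8, 100, 100, 10_000))
```

MI rewards pairs that co-occur far more often than chance, even when they are rare, while the T value favours pairs with solid raw frequency, which is why tools usually let the user choose between them.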


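Using AntConc

English word frequency. Developed in Japan; supports Chinese. Example: the word-segmented People's Daily corpus for January 1998 from the Institute of Computational Linguistics, Peking University.
● Compute word frequencies and generate a frequency list;
● Compute n-gram frequencies;
● Save the results.
Set the language encoding before processing Chinese; otherwise the text is displayed as garbled characters.

AntConc includes the following tools: Concordance, Concordance Plot, File View, Clusters, N-Grams (part of Clusters), Collocates, Word List, Keyword List.

Move the pointer over the highlighted search term in one of the concordance lines and it turns into a hand; click the search term to see where it occurs in the original text. Note: the total number of concordance lines is shown under "Concordance Hits". When processing ends, "FINISHED" is displayed; if no concordance lines were produced, "NO HITS" is displayed and the concordance window is not updated.

Using the options above "Search Term", the search term can be treated as a word (the default) or as a word fragment via the Word option, made case-insensitive via Case, or treated as a full regular expression via "Regex". /quickstart.html

Press the Advanced button for more complex searches. There are two advanced search options:
● Define a set of search terms, either typed one per line or loaded from a file containing a term list. This lets the user work with a large set of search terms without retyping them each time;
● Define context words and a context horizon, i.e. the range around the search term within which the context words must appear.

Keyword list:
● Choose the threshold for the number of keywords displayed;
● Choose whether to show negative keywords ("Show negative keywords"), i.e. words that are unusually infrequent in the target corpus compared with the reference corpus;
● Choose text file(s) as the reference corpus; the reference corpus files are listed in the window under the reference corpus options;
● Click Apply to return to the main window;
● Choose the sort option for the keyword list;
● Press Start; processing can be stopped at any time;
● Clicking a keyword produces a set of KWIC lines for it.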

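Basic Operations in AntConc

Presenter: Li Guangwei
Contents: AntConc overview, AntConc features, AntConc demonstration

AntConc overview
AntConc is a free corpus search tool developed by Professor Laurence Anthony of Waseda University, Japan. It is used mainly in corpus linguistics, translation studies and foreign language teaching.

AntConc features
Figure 1: The AntConc main window on startup
As Figure 1 shows, AntConc provides the tabs "Concordance", "Concordance Plot", "File View", "Clusters/N-Gram", "Collocates", "Word List" and "Keyword List".
◆ The software can extract concordances, collocate lists, word frequency lists and more. Each is illustrated below using the Huangdi Neijing Suwen (《黄帝内经·素问》) as an example.

AntConc demonstration: extracting concordances
◆ To extract concordances with the Concordance tool, first click the File menu, choose Open Files and select the corpus files to open (to open a whole folder, choose Open Directory). Then type "Huangdi" in the input box under Search Term at the bottom.
◆ Click "Start"; the results are displayed in the KWIC pane, as shown in Figure 2.
Figure 2: Concordance display for "Huangdi"
As Figure 2 shows, the word "Huangdi" is highlighted in blue; it occurs 644 times in the English translation of the Huangdi Neijing Suwen.

AntConc can also search for several terms at once by typing the "|" symbol between them, for example "do|does|did|doing|done" in "Search Term" (Figure 3). Alternatively, click "Advanced" and tick "Use search term(s) from list below", then type the terms into the box below (a txt word list can also be loaded directly), one word per line. When done, click "Apply" to return to the concordance window.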
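What a "|" search produces can be imitated in a few lines of code. This is only a sketch of the KWIC idea, not AntConc's implementation: the function name `kwic`, the sample sentence and the character-based `width` parameter are assumptions made for illustration.

```python
import re


def kwic(text, pattern, width=30):
    """Return (left, hit, right) context for every whole-word match.

    `pattern` uses "|" between alternatives, as in the Search Term box;
    `width` is the number of characters of context kept on each side.
    """
    regex = re.compile(r"\b(?:" + pattern + r")\b", re.IGNORECASE)
    lines = []
    for match in regex.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append((left, match.group(0), right))
    return lines


# Hypothetical sample text.
sample = "What did you do? She does what he did, and it is done."
for left, hit, right in kwic(sample, "do|does|did|doing|done", width=15):
    print(f"{left:>15} {hit} {right}")
```

Each printed line puts the hit in the middle with its left and right context on either side, which is the layout the Concordance tool displays.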


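Three Major Functions of a Corpus

I figure some of you may not be English majors and need to process Chinese corpora, so I have translated part of Anthony's AntConc manual: not all of it, only the steps for certain functions. The version is AntConc 3.2.1w (Windows), 2007.

1. Steps for using the Concordance tool
1) Choose one or more files to process via Open File or Open Dir in the File menu. The selected files are listed in order in the left pane of the main window.
2) Type a search term in the input box under Search Term on the left.
3) Use the increase and decrease buttons of the "Search Window Size" control on the right to choose the number of characters shown on either side of the search term.
4) Press "Start" to generate the concordance lines. The search can be stopped at any time by pressing "Stop".
5) Use the controls under Kwic Sort to choose the target words by which to re-sort the concordance lines: 0 is the search term, 1L and 2L are the first and second words to its left, and 1R and 2R are the first and second words to its right. Note that up to three sort levels are available; when the software first starts, levels 2 and 3 are not selected.
6) Press "Sort" to begin sorting.
7) Move the pointer over the highlighted search term in one of the concordance lines. It is blue by default, which distinguishes it from the target words of the previous sorting step; it is the original search term. The pointer turns into a hand icon. Click the highlighted search term to see where it occurs in the original text. See the "File View" tool.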
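The Kwic Sort positions in step 5 can be sketched in code. This is an illustration under stated assumptions rather than AntConc's own logic: the helper names (`concordance_lines`, `word_at`, `kwic_sort`) and the sample sentence are hypothetical, and tokens are simply whitespace-separated words.

```python
def concordance_lines(tokens, term, span=3):
    """Collect (left_words, hit, right_words) for each occurrence of term."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == term.lower():
            lines.append((tokens[max(0, i - span):i], tok, tokens[i + 1:i + 1 + span]))
    return lines


def word_at(line, position):
    """Return the word at '0' (search term), '1L', '2L', '1R', '2R', ..."""
    left, hit, right = line
    if position == "0":
        return hit.lower()
    n, side = int(position[:-1]), position[-1]
    words = left[::-1] if side == "L" else right  # 1L is the nearest left word
    return words[n - 1].lower() if n <= len(words) else ""


def kwic_sort(lines, levels=("1R", "2R", "3R")):
    """Sort concordance lines alphabetically on up to three positions."""
    return sorted(lines, key=lambda line: tuple(word_at(line, p) for p in levels))


tokens = "go home now and go away then we go back".split()
lines = concordance_lines(tokens, "go")
for left, hit, right in kwic_sort(lines, levels=("1R",)):
    print(" ".join(left), f"[{hit}]", " ".join(right))
```

Sorting on 1R groups lines by what follows the search term, while sorting on 1L groups them by what precedes it; adding further levels breaks ties between lines that share the first sort word.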

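Today let us look at what a corpus is. Again, to keep things easy to follow, I will avoid technical terms for now.

You can think of it this way: a corpus is a collection of language material. Those of us in foreign languages are often asked to translate things, and sometimes we run into text types we have never dealt with before, such as an advertisement or an instruction manual. At such times we wish we had similar advertisements or manuals written in the target language to refer to, so that we would at least know how such texts are worded and what their typical text structure looks like.

The collection of texts we gather in this way can be regarded as a simple corpus. In essence, then, a corpus is a collection of texts.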



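AntConc 3.2.0 User Guide [1]

[1] This guide was written by Zhang Xingjuan, a 2007 graduate student at the School of Foreign Studies, South China Normal University, with corrections and additions by supervisor He Anping. The range-restricted search method was provided by Dr. D. Lee of City University of Hong Kong, to whom thanks are due.

1. Extracting concordances
1.1 Setting the search term
(1) Single-term search
a) Click "open files" in the File drop-down menu and select the corpus files to open (to open a whole folder, choose "open directory");
b) Type the term to search for, e.g. go, in the "Search Term" box;
c) In the "Search Window Size" box, set how much context each concordance line shows (according to the AntConc readme, the number of characters on either side of the search term);
d) Click Start to begin the search.
The results are shown in Figure 1.1.
Figure 1.1: Results of a single-term search

(2) Multiple-term search
● Searching for several terms
Besides single terms, AntConc can search for several terms at once: type the "|" symbol between the terms.
Example: to find the inflected forms of the verb go, enter go|went|gone|goes in "Search Term".
● Searching with a context word
To narrow a concordance search, you can require that a context word occur within a given range around the search term.
Example: to study phrases of the type a … of, all instances can be extracted with AntConc as follows:
a) Type a in the "Search Term" box;
b) Click Advanced next to "Search Term" to open the "Advanced Search" dialog, shown in Figure 1.2. Tick "Use context words and horizons", type of in the "Context Words" box, and click the button beside it (shown only as an image in the source). To reset the context word, first click the clear button (also an image in the source) to remove the old one, then repeat the steps above. You also need to set the position of the context word relative to the search term: here of is two places to the right of a, so set the "Context Horizon" accordingly (the value appears as an image in the source), and finally click Apply;
c) Back in the concordance window, click Start to begin the search.
The search retrieves chunks such as a lot of and a bit of.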
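The effect of a context word plus horizon can be sketched as a filter over concordance hits. This is a minimal illustration, not AntConc's code: the function name `search_with_context`, the sample sentence, and the horizon of 0 words to the left and 2 to the right are assumptions, since the source shows the actual horizon setting only as an image.

```python
def search_with_context(tokens, term, context_word, left=0, right=2):
    """Keep hits of `term` only if `context_word` lies within the horizon.

    `left` and `right` are the number of words checked on each side of
    the hit (an assumed reading of the Context Horizon setting).
    """
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() != term:
            continue
        window = tokens[max(0, i - left):i] + tokens[i + 1:i + 1 + right]
        if context_word in (w.lower() for w in window):
            hits.append(" ".join(tokens[i:i + 1 + right]))
    return hits


# Hypothetical sample sentence.
tokens = "I have a lot of books and a bit of time but a pen is missing".split()
print(search_with_context(tokens, "a", "of"))
```

With this sample, the occurrence of a in "a pen is" is dropped because of does not appear within two words to its right, which mirrors how the horizon narrows the results to chunks of the a … of type.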

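● Searching with a list of terms
When several terms have to be searched, the following method can be used instead of "|"; it is especially convenient when there are many terms.
Example: studying the sensory verbs watch, sound, feel, hear, smell
a) Type all the search terms into a TXT file (up to 250 words), then name and save the file. Note that the terms must be listed one per line, e.g.:
feel
feels
felt
b) Click Advanced next to Search Term and choose "Use search term(s) from list below". Click the load button (shown only as an image in the source), select the file saved above from its folder, and click Apply;
c) Back in the concordance window, click Start to begin the search.

(3) Wildcard searches

Symbol   Meaning                   Search term   Result
*        zero or more characters   book*         all words beginning with book, e.g. book, books, booking, bookshop
                                   *book         all words ending in book, e.g. book, notebook
                                   *book*        both of the above types of word
+        zero or one character     book+         words beginning with book followed by zero or one letter, e.g. book, books
?        any one character         ?ough         words ending in the letter sequence ough preceded by one letter, e.g. cough, rough
@        zero or one word          think@of      phrases with think and of separated by at most one word, e.g. think of, think highly of
#        any one word              look#         combinations of look with one following word, e.g. look after, look at

● Searching tagged corpora
For research purposes some corpora have been processed and annotated with various codes; these are called tagged corpora. Examples are LOBTAG, which carries part-of-speech tags, and CLEC, which carries error-type tags. To search, simply type a tag to retrieve every word carrying that tag.
Example: to extract all nouns from the LOBTAG corpus, just type *_NN (NN is the noun tag; for the full set of part-of-speech tags, see the appendix, p. 113, of He Anping, 2004, 《语料库语言学与英语教学》 (Corpus Linguistics and English Teaching)).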
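One way to understand the character-level wildcards in the table is to translate them into regular expressions. The sketch below does this for *, + and ? only; the word-level wildcards @ and # are left out for brevity. The mapping and the helper names are assumptions for illustration and are not taken from AntConc's source.

```python
import re

# Assumed regex equivalents of the character-level wildcards.
WILDCARDS = {"*": r"\w*", "+": r"\w?", "?": r"\w"}


def wildcard_to_regex(term):
    """Translate a wildcard search term into a compiled regular expression."""
    parts = [WILDCARDS.get(ch, re.escape(ch)) for ch in term]
    return re.compile("".join(parts), re.IGNORECASE)


def match_tokens(term, tokens):
    """Return the tokens that match the whole wildcard term."""
    regex = wildcard_to_regex(term)
    return [t for t in tokens if regex.fullmatch(t)]


words = ["book", "books", "booking", "notebook", "cough", "rough", "though"]
print(match_tokens("book*", words))
print(match_tokens("*book", words))
print(match_tokens("book+", words))
print(match_tokens("?ough", words))

# Tagged tokens in a hypothetical word_TAG format.
tagged = ["book_NN", "read_VB", "books_NNS"]
print(match_tokens("*_NN", tagged))
```

Matching against the whole token is what makes *_NN select only the singular noun tag here: books_NNS has extra characters after NN and is therefore skipped.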

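(4) Searching within a specified range
a) On the Concordance search screen, select "Regex" (regular expression) and enter \[.*\] as the search term to extract all text in the corpus that lies between the opening and closing symbols "[" and "]". Other delimiter symbols work in the same way.
b) On the Concordance search screen, select "Regex" (regular expression) and enter \[.*write.*\] as the search term to extract the concordance lines for "write" that fall between "[" and "]". Other search terms work in the same way.
The search term entered is case-sensitive, but the wildcard * can be used.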
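The two patterns can be tried outside AntConc with any regular expression engine. The sketch below uses Python's `re` module and a made-up sentence; it also shows that `.*` is greedy, so when a line holds several bracketed spans the pattern runs from the first "[" to the last "]". The lazy and negated-class variants are additions for illustration, not part of the original guide.

```python
import re

# Hypothetical sample line with two bracketed spans.
text = "He said [I will write soon] and left [no reply]."

# Pattern from the guide: greedy, spans from the first "[" to the last "]".
greedy = re.findall(r"\[.*\]", text)

# Lazy variant: stops at the nearest closing bracket.
lazy = re.findall(r"\[.*?\]", text)

# Only bracketed spans that contain "write", without crossing a "]".
with_write = re.findall(r"\[[^\]]*write[^\]]*\]", text)

print(greedy)
print(lazy)
print(with_write)
```

If each line of the corpus contains at most one bracketed span, the greedy pattern from the guide and the stricter variants return the same matches.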

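1.2 Analyzing the search results
(1) Frequency and distribution
● The frequency, i.e. the number of times the search term occurs, is shown under "Concordance Hits".
● Click the Concordance Plot tab to see how the search term is distributed across the corpus files.

(2) Highlighting context words
For a particular purpose, such as teaching, certain words around the search term can be highlighted. To do so, use "Kwic Sort": R1 and L1 stand for the first word to the right and to the left of the search term. Up to three columns of highlighted words can be set at a time, each sorted alphabetically. The results are shown in Figure 1.3.
Figure 1.3: Results with context words highlighted
To give the highlighted items a uniform colour, change the colours via "Color Settings" in the drop-down menu (the menu name appears only as an image in the source). If what should be highlighted is letters within a word rather than whole words, choose "Sort by characters instead of words" under "Concordance" in the Tool Preferences drop-down menu, as shown in Figure 1.4.
Figure 1.4: The Tool Preferences drop-down menu

(3) Extracting a collocate list
Click the Collocates tab in the main window to obtain a collocate list for the search term. You can also set the position of the collocates, their minimum frequency and the sort order of the list.
Example: examining the words one place to the right of look
a) Click the Collocates tab in the main window;
b) Type look in the "Search Term" box;
c) Set the collocate position (here, 1R);
d) Click Start to begin the search; the results are shown in Figure 1.5;
e) Click "Sort by Freq" to choose how the collocate list is ordered, e.g. by frequency or alphabetically.
Figure 1.5: Results of extracting a collocate list

(4) Extracting clusters
The Clusters tool can also be used to extract multi-word chunks, and the position of the search term within the chunk can be set.
Example: finding chunks that begin with ask
a) Click the Clusters tab in the main window;
b) Type ask in the "Search Term" box;
c) Set the position of the search term, e.g. "On the left";
d) Set the cluster length, e.g. Min. Size: 3, Max. Size: 3;
e) Click Start to begin the search. The results are shown in Figure 1.6; every ask appears at the left edge of its chunk.
Figure 1.6: Results of extracting clusters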
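Both the collocate list and the cluster list come down to counting what appears next to the search term. The following is a rough sketch under simple assumptions (whitespace tokens, exact word match); the function names and sample sentence are hypothetical, and AntConc's own options, such as spans wider than one position, are not reproduced.

```python
from collections import Counter


def collocates(tokens, term, offset=1, min_freq=1):
    """Count words at a fixed offset from the term (1 = 1R, -1 = 1L)."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        j = i + offset
        if tok.lower() == term and 0 <= j < len(tokens):
            counts[tokens[j].lower()] += 1
    return [(w, c) for w, c in counts.most_common() if c >= min_freq]


def clusters(tokens, term, size=3):
    """Count chunks of `size` words that start with the term (term on the left)."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok.lower() == term and i + size <= len(tokens):
            counts[" ".join(t.lower() for t in tokens[i:i + size])] += 1
    return counts.most_common()


# Hypothetical sample text.
tokens = "look at me and look at him then look after her".split()
print(collocates(tokens, "look"))
print(clusters(tokens, "look", size=3))
```

Raising `min_freq` corresponds to the minimum-frequency setting mentioned in (3), and `size` plays the role of the Min./Max. Size setting in (4) when both are set to the same value.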

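(5) Hiding, sorting and deleting
"Hiding" means blanking out the search term in the search results, which is useful for teaching or testing. The procedure is as follows:
a) Type the term to search for, e.g. look, in the "Search Term" box;
b) Open Tool Preferences, choose "Concordance", tick "Hide search term in KWIC display", and click Apply;
c) Click Start to begin the search.
The result looks like this:
you always do your own homework? Do you ******* for help when you think it necessary? Do you help

2. Extracting word frequency lists
2.1 Single-word and n-gram frequency lists
A single-word frequency list is the word list of the target corpus; the results show each word form together with its frequency. The procedure is as follows:
a) Select the target corpus for which the single-word frequency list is to be generated;
b) Open the Word List tool and set the sort order of the list, e.g. "Sort by Freq"; the list can also be sorted alphabetically by the beginning or the end of the word;
c) Click Start to begin; the results are shown in Figure 2.1.
Figure 2.1: Results of a single-word frequency list

An n-gram frequency list is the frequency list of multi-word sequences in the target corpus. For example, the 2-gram list for the sentence "This is a pen" is: "this is", "is a", "a pen".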
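The "This is a pen" example can be reproduced with a short sketch. The helper name `ngrams` and the whitespace tokenization are assumptions made for illustration; they are not how AntConc itself tokenizes text.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "This is a pen".lower().split()

word_freq = Counter(tokens)            # single-word frequency list
bigram_freq = Counter(ngrams(tokens, 2))  # 2-gram frequency list

print(ngrams(tokens, 2))  # ['this is', 'is a', 'a pen']
print(bigram_freq.most_common())
```

A sentence of four words yields three 2-grams; in general a text of k tokens yields k - n + 1 n-grams, which is why n-gram lists grow long quickly on real corpora.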

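The n-gram frequency list is extracted as follows:
a) Select the target corpus for which the n-gram frequency list is to be generated;
b) Open the Clusters tool and select N-Grams (both appear only as images in the source; N-Grams is part of the Clusters tool);
c) Set the n-gram size (the value appears as an image in the source);
d) Choose the sort order of the list, e.g. "Sort by Freq";
e) Click Start to begin; the results are shown in Figure 2.2.
Figure 2.2: Results of an n-gram frequency list

2.2 Regrouping word forms: lemmatizing
Lemmatizing strips the endings from all inflected forms of a word within the same part of speech and groups them under a single lemma whose frequency is counted as a whole. The benefit is a more compact frequency list and greater attention to word formation.
To lemmatize a frequency list: after generating the list in the Word List tool, pull down the Tool Preference menu, choose Lemma list options, click open and load to load the lemma1 file (downloadable from this site), and click Apply (see Figure 2.3). Part of the lemmatized result is shown in Figure 2.4.
Figure 2.3: The lemmatizing settings
Figure 2.4: The frequency list after lemmatizing (partial)
In the figure, the 1,142 instances of a and the 133 instances of an are grouped under the single lemma a, giving 1,275 instances in total.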
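The regrouping of a and an can be sketched as a merge of frequency counts under a form-to-lemma mapping. Only the counts for a and an come from the guide; the go/went counts, the `lemma_map` dictionary and the function name are hypothetical, and the format of the actual lemma1 file is not reproduced here.

```python
from collections import Counter


def lemmatize_counts(word_freq, lemma_map):
    """Merge the frequencies of word forms that share a lemma."""
    grouped = Counter()
    for word, freq in word_freq.items():
        grouped[lemma_map.get(word, word)] += freq
    return grouped


# a/an counts are from the guide; go/went counts are made up.
word_freq = {"a": 1142, "an": 133, "go": 50, "went": 20}
# Hypothetical form -> lemma mapping.
lemma_map = {"an": "a", "went": "go", "goes": "go", "gone": "go"}

grouped = lemmatize_counts(word_freq, lemma_map)
print(grouped["a"])  # 1142 + 133 = 1275
```

Forms that are missing from the mapping are kept under their own spelling, so an incomplete lemma list shortens the frequency list without dropping any tokens.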

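3. Extracting a keyword list
A keyword list is the set of words that, when the word frequency lists of two corpora are compared, turn out to be markedly more frequent in one than in the other. The first corpus is called the target corpus; the second is the reference corpus, which is usually larger. The comparison brings out the unusually frequent words of the target corpus and so reveals its topics or characteristic content.

3.1 Highlighting words that are significantly more frequent in the target corpus than in the reference corpus
The procedure is as follows:
a) Click "open files" in the File drop-down menu and select the files of the target corpus (to compare a whole folder, choose "open directory");
b) Click the Keyword List tab in the main window;
c) Open Tool Preferences and choose "Keyword List", as shown in Figure 3.1;
d) Tick "Show negative keywords" to include in the results the words that are markedly more frequent in the reference corpus than in the target corpus;
e) Click the button for adding files (shown only as an image in the source), select the reference corpus, and finally click Apply;
Figure 3.1: The Tool Preferences dialog
f) Click Start to begin; the results are shown in Figure 3.2.
Figure 3.2: Results of extracting a keyword list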
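To give a feel for how a word earns its place on a keyword list, here is a sketch of the standard chi-squared statistic for a 2×2 table (word versus all other words, target versus reference corpus). Whether AntConc uses exactly this form, for example without a continuity correction, is an assumption; the function names and the counts in the example are hypothetical.

```python
def chi_squared(freq_target, size_target, freq_ref, size_ref):
    """Chi-squared for a 2x2 table of one word against two corpora."""
    a, b = freq_target, freq_ref                              # the word itself
    c, d = size_target - freq_target, size_ref - freq_ref     # all other tokens
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def keyness_direction(freq_target, size_target, freq_ref, size_ref):
    """'positive' if relatively more frequent in the target corpus, else 'negative'."""
    return "positive" if freq_target / size_target > freq_ref / size_ref else "negative"


# Hypothetical counts: a word occurring 20 times in a 100-token target
# corpus and 10 times in a 100-token reference corpus.
print(chi_squared(20, 100, 10, 100))
print(keyness_direction(20, 100, 10, 100))
```

A word with the same relative frequency in both corpora scores 0; the larger the score, the stronger the evidence that the difference is not chance, and the direction tells positive keywords apart from the negative keywords enabled in step d).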
