Java Treeview — 微阵列数据的可扩展可视化
- 1、下载文档前请自行甄别文档内容的完整性,平台不提供额外的编辑、内容补充、找答案等附加服务。
- 2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
- 3、如文档侵犯您的权益,请联系客服反馈,我们会尽快为您处理(人工客服工作时间:9:00-18:30)。
上海第二工业大学本科外文翻译Java Treeview—extensible visualization of microarray data ABSTRACT
Summary: Open source software encourages innovation by allowing users to extend the functionality of existing applications.Treeview is a popular application for the visualization of microarray data, but is closed-source and platform-specific,which limits both its current utility and suitability as a platform for further development. Java Treeview is an open-source,cross-platform rewrite that handles very large datasets well,and supports extensions to the file format that allow the results of additional analysis to be visualized and compared.The combination of a general file format and open source makes Java Treeview an attractive choice for solving a class of visualization problems. An applet version is also available that can be used on any website with no special server-side setup.
Availability: under GPL.
Contact: alokito@
INTRODUCTION
Java Treeview is an enhanced, open source, cross-platform rewrite of the original, windows-only Treeview (Eisen et al.,1998). The original Treeview provides a simple interface for viewing the results of hierarchical clustering. The clustering is done by a separate program that creates a tab-delimited text file called a ‘clustered data file’ or CDT file. The simple structure of the CDT file allows developers to use it as an output format for other tools. Although it was originally developed for gene expression data, Treeview has since been used to view the results of hierarchical clustering of other types of data, including GFP reporter levels and motif significance scores. In addition to making the
functionality of Treeview available to a large audience, Java Treeview supports a generalized CDT format that allow many additional details, such as colors of genes,arrays, nodes and heights of terminal branches, to be specified.Furthermore, the generic structure of the generalized CDT and open source nature of Java Treeview encourage not only further enrichment of the dendrogram representation, but also the development of completely novel visualizations, several of which are presented here. These features dramatically expand the utility of Treeview, and have met with an encouraging response; in the 6 months following the 1.0 release, Java Treeview has been downloaded over 4000 times through word of mouth, the majority of times byWindows users with access to the original parison of multiple analysis methods is an important task that can be aided by superior visualization tools. New methods of microarray analysis appear almost daily in the literature. Also, to the researcher in the field it is tempting to consider minor modifications to existing methods. Many of these analyses associate a score or annotation with genes,arrays or nodes. The generalized CDT file, described in the manual, provides a natural place to put this information. Java Treeview loads the generalized CDT file into a standard data structure and makes it available for representation within an existing or novel visualization. For example, the location of genes on a microarray may be relevant to an investigation of spatial bias, the location of genes on the chromosome may be of interest in an array CGH experiment, and in many cases arrays may be annotated to distinct classes, for instance in a cancer study different types of source tumors. These additional types of data can be incorporated into the CDT file and represented either alongside the dendrogram or in a separate tabbed display (see Features and Fig. 1). All displays are linked together with a shared selection model, so that genes selected according to a particular criterion in one display are also selected in the other displays. This aids in comparison of the different visualizations, and adds value over special purpose visualization tools.
There are a wide variety of tools available to the genomics researcher. A few examples, include A V A (Zhou and Liu,2003), TRANSPATH (Krull et al., 2003), Genesis (Sturn et al., 2002) and J-Express (Dysvik and Jonassen, 2001).However, many of these tools are monolithic applications that proscribe the types of data and analysis that may be done, are available for only a few platforms, or come with commercial entanglements. Java Treeview is thus closest in spirit to the TableView application (Johnson et al., 2003), but is built to display large phylogeny-based datasets with hierarchical trees; under Mac OSX, displaying a clustergram of 233 arrays × 5380 genes = 1 253 540 cells consumes 125 MB of RAM; displaying one of 183 arrays × 21 819 genes = 3 992 877 cells consumes 300 MB of RAM. This puts the display of even very large clusters into the realm of cross-platform computing. Combined with an analysis program, such as XCluster (Sherlock, 2000) (/∼sherlock/cluster .html) or Cluster 3.0(De Hoon et al., 2004), a spreadsheet editor such as Excel,and scripted automation using packages such as PCL Analyisis(), Java Treeview is a powerful tool for discovering patterns in genome-wide datasets.
BUILT-IN FUNCTIONALITY OF JA V A
TREEVIEW
Java Treeview provides fine control over the appearance of the dendrogram. Within the generalized CDT file, the foreground and background colors of gene and array annotations,as well as the color of each node branch in the array and gene dendrograms can be independently specified, greatly increasing the information that can be represented. In addition,time can be used instead of correlation to organize the dendrogram, allowing the visualization of phylogenetic trees. A customizable context-specific information window in the upper left displays information such as the number 3247
Downloaded from https:///bioinformatics/article-abstract/20/17/3246/186177
by guest on 24 May 2018
A.J.Saldanha
of genes selected, the correlation of the selected node and relevant annotations depending upon the cursor location.Instead of a single annotation, arbitrary combinations of annotations can be displayed adjacent to the dendrogram.Clicking on a gene or array annotation opens relevant Web databases in an external browser, provided that there is an annotation column containing an identifier and an URL template for the external database. Templates for SGD(yeast),SOURCE(human), WormBase, FlyBase, the mouse Genome Database and GenomeNet Escherichia coli are provided by default, and there is a repository on the website describing additional URL templates. Java Treeviewcurrently supports three newvisualizations in addition to the dendrogram. Arbitrary gene scores can be used to produce scatterplots, gene locations can be used to produce a karyoscope plot and aligned sequence can be used to create a dendrogram-like view of sequence data. Motivating uses of the first two visualizations are provided in the accompanying figure, and sequence alignment visualization is described in the examples section of the website. Examples, mini-tutorials and an extensive User Manual are available from the website ().
An open-source, PERL based package, ‘helper-scripts’, is provided to aid in the incorporation of data into the .cdt format for visualization. Currently, all views support output to raster-based images(PNG, PPM and JPEG) and output of subsets of the data to tab-delimited text for further analysis. The dendrogram display also supports output to vector-based postscript files.
译:
摘要
简介:开源软件通过允许用户扩展现有应用程序的功能来鼓励创新.Treeview是用于微阵列数据可视化的流行应用程序,但是是封闭源代码和特定于平台的,它将当
前的实用性和适用性限制为进一步发展的平台。
Java Treeview是一个开源的跨平台重写,可以很好地处理非常大的数据集,并且支持对文件格式的扩展,从而可以显示和比较附加分析的结果。
一般文件格式和开源组合Java Treeview是解决一类可视化问题的有吸引力的选择。
小应用程序版本也可用于任何没有特殊服务器端设置的网站。
可用性:在GPL下。
联系人:alokito@
介绍
Java Treeview是一个增强的,开源的,跨平台重写原始的,仅限Windows的Treeview (Eisen et al。
,1998)。
原始的Treeview提供了一个简单的界面来查看层次聚类的结果。
集群由一个单独的程序完成,该程序创建一个名为“集群数据文件”或CDT 文件的制表符分隔文本文件。
CDT文件的简单结构允许开发人员将其用作其他工具的输出格式。
虽然它最初是为基因表达数据而开发的,但TreeView已被用于查看其他类型数据(包括GFP记者级别和主题显着性分数)的等级聚类结果。
除了使Treeview的功能可供广大读者使用外,Java Treeview还支持广义的CDT格式,该格式允许指定诸如基因,数组,节点和终端分支高度等许多额外细节。
此外,广义CDT的结构和Java Treeview的开源特性不仅鼓励树形图表示的进一步丰富,而且还鼓励完全新颖的可视化的发展,其中几个在这里展示。
这些功能极大地扩展了Treeview的实用性,并得到了令人鼓舞的回应;在1.0版本发布后的6个月内,Java Treeview已经通过口碑被下载了超过4000次,大部分时间由Windows用户访问原始Treeview。
多种分析方法的比较是一项重要任务,可以由上级可视化工具。
微阵列分析的新方法几乎每天都在文献中出现。
此外,对该领域的研究人员来说,考虑对现有方法进行较小的修改是很有诱惑力的。
许多这些分析将分数或注释与基因,阵列或节点相关联。
手册中描述的广义CDT文件提供了一个放置此信息的自然场所。
Java Treeview将广义CDT文件加载到标准数据结构中,并使其可用于在现有或新颖的可视化中进行表示。
例如,微阵列上基因的位置可能与调查空间偏倚有关,染色体上基因的位置可能在阵列CGH实验中感兴趣,并且在许多情况下,阵列可能被
注释为不同的类别,例如在癌症研究中不同类型的来源肿瘤。
这些附加类型的数据可以合并到CDT文件中,并且可以在树状图的旁边显示,也可以在单独的标签显示中显示(参见特征和图1)。
所有的显示都与一个共享的选择模型连接在一起,以便在其他显示中选择根据特定标准在一个显示中选择的基因。
这有助于比较不同的可视化,并为特殊用途的可视化工具增加了价值。
基因组学研究人员可以使用各种各样的工具。
一些例子包括AVA(Zhou和Liu,2003),TRANSPATH(Krull等人,2003),Genesis(Sturn等人,2002)和J-Express(Dysvik 和Jonassen,2001)。
然而,其中许多工具是单片应用程序,它们可以禁止可能完成的数据和分析类型,仅适用于少数平台,或者与商业纠纷一起使用。
因此,Java Treeview与TableView应用程序在精神上最为接近(Johnson et al。
,2003),但它的目的是显示具有分层树的大型基于系统发育的数据集;在Mac OSX下,显示233个阵列×5380个基因= 1 253 540个单元的聚类图消耗125 MB的RAM;显示183个阵列中的一个×21 819个基因= 3个992 877个细胞消耗300 MB的RAM。
这使得即使是非常大的集群也能进入跨平台计算领域。
结合诸如XCluster(Sherlock,2000)(/~sherlock/cluster.html)或Cluster 3.0(De Hoon等人,2004)的分析程序,电子表格编辑器如Excel和使用软件包(如PCL Analyisis)的脚本自动化(http:// pcl-analysiJava Treeview是一种用于发现全基因组数据集中模式的强大工具.
内置的JAVATREEVIEW功能
Java Treeview提供了对树状图外观的精细控制。
在广义CDT文件中,基因和数组注释的前景和背景颜色,以及阵列中每个节点分支的颜色和基因树状图可以独立指定,大大增加了可以表示的信息。
此外,可以使用时间代替相关性来组织树状图,从而允许进化树的可视化。
左上角的可自定义的特定于上下文的信息窗口显示https:///bioinformatics/article-abstract/20/17/3246/1861 77下载的数字3247A.J .Saldanhaof基因被选择,所选节点和相关注释的相关性取决于光标的位置。
取代单个注释,注释的任意组合可显示在树形图的旁边。
单击基因或数组注释可打开相关的Web数据库一个外部浏览器,前提是有一个注释列包含
外部数据库的标识符和URL模板。
SGD(酵母),SOURCE(人类),WormBase,FlyBase,鼠标Genome Database和GenomeNet Escherichia coli的模板默认提供,网站上还有一个存储库,用于描述其他URL模板。
除树状图外,Java Treeview目前支持三种新的可视化。
可以使用任意基因评分来产生散点图,可以使用基因位置来产生核聚变图,并且可以使用对齐序列来创建序列数据的类树状图。
在附图中提供了前两种可视化的激励使用,序列对齐可视化在网站的示例部分进行了描述。
可以从网站()获得示例,迷你教程和广泛的用户手册。
提供了一个基于PERL的开放源代码软件包'helper-scripts',以帮助合并数据转换为.cdt格式进行可视化。
目前,所有视图都支持输出到基于栅格的图像(PNG,PPM 和JPEG),并将数据的子集输出到制表符分隔的文本以供进一步分析。
树状图显示也支持输出到基于矢量的postscript文件。