Cloud Computing Test Bed: Hadoop as a Service Case Study
Filesystem clusters of 5000+ machines
Pools of 10000+ clients
5+ petabyte filesystems
All in the presence of frequent hardware failures
Utility Computing
− Don’t buy computers, lease computing power
− Upload, run, download
− Ownership model
Next step: Cloud Computing
− Service and data are in the cloud, accessible with any device connected to the cloud with a browser
− A key technical issue for developers:
Evolution of Computing with Network (2/2)
Grid Computing
− Resource sharing across several domains
− Decentralized, open standards
− Global resource sharing
Applications on the Web
The Cloud
Cloud Computing
Cloud computing is a concept of using the internet to allow people to access technology-enabled services.
Graduation Project Report: English Source Text and Chinese Translation
School of Computer and Control Engineering, June 2017
English source text: Cloud Computing
Cloud Computing at a Higher Level
In many ways, cloud computing is simply a metaphor for the Internet, the increasing movement of compute and data resources onto the Web. But there's a difference: cloud computing represents a new tipping point for the value of network computing. It delivers higher efficiency, massive scalability, and faster, easier software development. It's about new programming models, new IT infrastructure, and the enabling of new business models.
For those developers and enterprises who want to embrace cloud computing, Sun is developing critical technologies to deliver enterprise scale and systemic qualities to this new paradigm:
(1) Interoperability: while most current clouds offer closed platforms and vendor lock-in, developers clamor for interoperability.
What the Enterprise Needs to Know about Cloud Computing
In 1982, Sun Microsystems pioneered the concept that "the network is the computer."
Coca-Cola is moving the email service used by its 35,000 employees from the Lotus Notes platform to Microsoft's Exchange Online service platform.
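NASDAQ uses Amazon's S3 (Amazon Simple Storage Service) cloud computing service to store historical stock and fund data, and also uses rich Internet applications to create new sources of revenue growth.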
Cloud Computing
Cloud Computing
Cloud computing is an Internet-based form of computing in which shared hardware and software resources and information are provided to computers and other devices on demand. The "cloud" is a metaphor for the network, that is, the Internet. The core idea of cloud computing is to manage and schedule a large number of network-connected computing resources in a unified way, forming a resource pool that serves users on demand; the network that provides these resources is called the "cloud." In the narrow sense, cloud computing is a delivery and usage model for IT infrastructure: the required resources are obtained over the network on demand and in a scalable way. In the broad sense, it is a delivery and usage model for services: the required services are obtained over the network on demand and in an easily scalable way. These services may be IT, software, or Internet-related services, or other kinds of services.
Definition of cloud computing
A cloud is a kind of parallel and distributed system made up of a collection of interconnected, virtualized computers. These virtual computers are dynamically provisioned and presented as one or more unified computing and storage resources, which are offered to consumers according to service agreements negotiated between the provider and the consumer. Computing based on such clouds is called cloud computing. Simply put, cloud computing is an Internet-based supercomputing model: it pools the large storage capacity and processor resources of PCs, servers, and other storage devices so that they are managed in a unified way and work together.
A Brief History
In 1983, Sun Microsystems proposed that "the network is the computer" ("The Network is the Computer").
In March 2006, Amazon launched the Elastic Compute Cloud (EC2) service.
On August 9, 2006, Google CEO Eric Schmidt first proposed the concept of "cloud computing" at the Search Engine Strategies conference (SES San Jose 2006). Google's "cloud computing" grew out of the "Google 101" project run by Google engineer Christophe Bisciglia.
In October 2007, Google and IBM launched a program on US university campuses, including Carnegie Mellon University, the Massachusetts Institute of Technology, Stanford University, the University of California at Berkeley, and the University of Maryland, to promote cloud computing. The program aimed to reduce the cost of distributed computing technology for academic research and to provide the related hardware, software, and technical support: several hundred PCs plus BladeCenter and System x servers, with these computing platforms providing 1,600 processors and supporting open-source platforms including Linux, Xen, and Hadoop. Students can use these platforms to develop research projects based on large-scale computing.
On January 30, 2008, Google announced the launch of its "cloud computing academic program" in Taiwan, working with National Taiwan University, National Chiao Tung University, and other schools to bring this advanced large-scale, fast computing technology to their campuses.
On February 1, 2008, IBM (NYSE: IBM) announced that it would build the world's first cloud computing center (Cloud Computing Center) for Chinese software companies in the Taihu New Town Science and Education Industrial Park in Wuxi, China.
On July 29, 2008, Yahoo, HP, and Intel announced a joint research program covering the United States, Germany, and Singapore, launching a cloud computing research test bed to advance cloud computing. The plan calls for six data centers, built with partners, to serve as the research test platform, each configured with 1,400 to 4,000 processors. The partners include the Infocomm Development Authority of Singapore, the Steinbuch Computing Center at the University of Karlsruhe in Germany, the University of Illinois at Urbana-Champaign in the United States, Intel Research, HP Labs, and Yahoo.
On August 3, 2008, information on the US Patent and Trademark Office website showed that Dell was applying for a "cloud computing" trademark, a move aimed at strengthening its position in a technology that may reshape the future.
On March 5, 2010, Novell and the Cloud Security Alliance (CSA) jointly announced a vendor-neutral initiative called the "Trusted Cloud Computing Initiative (Trusted Cloud Initiative)".
In July 2010, NASA and vendors including Rackspace, AMD, Intel, and Dell announced support for the open-source OpenStack project. In October 2010, Microsoft announced support for integrating OpenStack with Windows Server 2008 R2, and Ubuntu added OpenStack to its 11.04 release.
In February 2011, Cisco Systems officially joined OpenStack, focusing on developing OpenStack network services.
On October 20, 2011, the "Shengtai Yun" product MongoIC officially opened. It is China's first professional MongoDB cloud service and the world's first cloud service to support MongoDB database recovery.
Cloud computing principles
Dedicated software, following priority and scheduling algorithms, assigns computations or data to be stored to the individual nodes of the cloud environment, where each node is a computer in the distributed system.
Cloud computing features
To qualify as cloud computing, a system must have the following five characteristics:
1) Horizontal scalability
Horizontal scalability is the ability to connect and integrate multiple clouds so that they work together. For example, a cloud that provides computing services (a compute cloud) can use a cloud storage service (a storage cloud) to hold temporary intermediate values. Similarly, two compute clouds can easily be combined into a larger compute cloud.
2) Vertical scalability
Vertical scalability means increasing the capacity of the whole cloud by enhancing the performance of one or more of its nodes.
Moreover, to meet the needs of market growth, cloud nodes must be upgradable, that is, vertically scalable.
3) Internet-centric
Cloud computing platforms are Internet-centric: storage and computing power are distributed across the nodes connected by the network, which reduces the computing burden on the terminal. The computing architecture of the Internet thus evolves from "server + client" to "cloud service platform + client". This is a major change for the Internet: its functions will become more powerful, and it may even change the existing pattern of enterprise information systems.
4) Virtualization
The underlying hardware, including servers, storage, and network equipment, is virtualized to establish a shared resource pool.
5) User transparency
Transparency to users is an indispensable feature of the cloud, and it largely determines how convenient the cloud is to use. User transparency includes operational transparency and technical transparency.
(1) Operational transparency: in a cloud computing environment, all operations must be transparent to users; that is, computing operations or data stored in the cloud behave no differently to the user than the corresponding operations on the local machine.
(2) Technical transparency: users do not need to care how the cloud nodes work together or how the cloud scales. This includes transparency of both horizontal and vertical scalability.
Application
Cloud IoT
The Internet of Things has two business models: 1. MAI (M2M Application Integration), i.e., internal MaaS; 2. MaaS (M2M As A Service), with MMO and multi-tenant models. As IoT traffic grows, its demand for data storage and computing capacity will call for "cloud computing" capabilities: 1. Cloud computing, moving from computing centers to data centers: in the early stage of the Internet of Things, PoPs can meet the demand; 2.
In the advanced stage of the Internet of Things, MVNO/MMO operators may appear (they have existed abroad for many years), and cloud computing will need virtualization, integration technologies, SOA, and other technologies to deliver ubiquitous networked services: TaaS (everyTHING As A Service).
Cloud Security
Cloud security, as the name suggests, is a new term that evolved from "cloud computing." Cloud security uses software on a large number of clients across a mesh network to detect abnormal behavior, obtain the latest information about Trojans and malicious programs on the Internet, and push it to servers for automatic analysis and processing; the solutions for these viruses and Trojans are then distributed to every client. The strategy of cloud security is: the more users, the safer each user is. Because such a large user base covers every corner of the Internet, as soon as a website is infected with a new Trojan or a new virus appears, it is intercepted immediately.
Cloud Storage
Cloud storage is a new concept that extends and develops the concept of cloud computing. It refers to a system that uses cluster applications, grid technology, distributed file systems, and similar functions to make a large number of different types of storage devices on the network work together through application software, jointly providing external data storage and business access. When the core of a cloud computing system's work is storing and managing large amounts of data, the system needs a large number of storage devices and in effect becomes a cloud storage system. Cloud storage is therefore a cloud computing system whose core is data storage and management.
Cloud Games
Cloud gaming is a cloud-based way of playing games: all games run on the server side, and the rendered game frames are compressed and delivered to the user over the network.
On the client side, the user's gaming device does not need a high-end processor or graphics card; it only needs basic video decompression capability. At present, cloud gaming has not yet become the networking model for consoles and handhelds: so far the Xbox 360 still uses LIVE, PlayStation uses the PlayStation Network, and the Wii uses Wi-Fi. However, within a few years or a decade it is quite possible that the cloud will replace these and become the ultimate direction of their network development. If this vision becomes reality, console manufacturers will become network operators: instead of continuing to invest huge research and development costs in new consoles, they would only need to spend a small fraction of that money upgrading their servers to achieve almost the same effect. Users could avoid buying future consoles while still getting top-quality game graphics (although the video output hardware must of course be excellent). One can imagine a handheld showing the same graphics as a home console, and a home console as simple to use as today's set-top box; the home console might even replace the TV set-top box and usher in a new era of television viewing.
Epilogue
Cloud computing is a new and promising computing model. Its fundamental purpose is to share the computing and storage resources of every node in the cloud and to provide end users with the functions they need.
This paper describes the concept, principles, and characteristics of cloud computing; introduces the features and design principles of service-based architecture; proposes a practical four-layer cloud computing service architecture and analyzes the characteristics and functions of each layer in detail; and finally describes the workflow of the cloud-based service architecture and shows that it outperforms grid computing in scalability and transparency to the user.
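• Lower client-side load
• Lower total cost of ownership
• Resources scale on demand
• Applications become highly available
• Pay per use
• Application development can be relatively decoupled from infrastructure maintenance
• No need to provision large amounts of equipment for one-off tasks or rare load peaks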
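Cloud computing emerged with the development of processor technology, virtualization technology, distributed storage technology, broadband Internet technology, and automated management technology. This large-scale computing capacity is usually built from large distributed clusters and server virtualization software.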
2011: Cloud Computing
1990: Grid Computing
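A cloud is a set of self-maintaining, self-managing virtual computing resources, usually large server clusters including compute servers, storage servers, bandwidth resources, and so on. Cloud computing is an Internet-based supercomputing model: it pools the large amounts of information and processor resources stored on PCs, mobile phones, and other devices so that they work together. It is a way of computing in which massively scalable IT capabilities are provided as a service to external customers.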
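Data management technology: Cloud computing stores massive amounts of data and then reads and analyzes it extensively, so improving the data update rate and further improving the random read rate are problems that future data management technology must solve. The best-known cloud data management technology is Google's BigTable, and the Hadoop development team is developing a similar open-source data management module.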
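Data storage technology: A cloud computing system must meet the needs of many users at the same time and serve them in parallel. Cloud data storage technology must therefore be distributed, with high throughput and high transfer rates. The main data storage technologies today are Google's GFS (Google File System, not open source) and HDFS (Hadoop Distributed File System, open source), and both have become de facto standards.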
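The distributed, high-throughput requirement above is usually met by splitting files into large fixed-size blocks and storing several replicas of each block on different machines. The Python sketch below shows that idea in isolation. The 64 MiB block size, the replication factor of 3, the simple rotating placement rule, and the names `place_blocks` and `BlockPlacement` are illustrative assumptions; this is not the GFS or HDFS placement algorithm (HDFS, for instance, also takes rack topology into account).

```python
from dataclasses import dataclass, field

# Illustrative defaults (assumptions, not values taken from the text above).
BLOCK_SIZE = 64 * 1024 * 1024   # 64 MiB per block
REPLICATION = 3                 # copies kept of every block


@dataclass
class BlockPlacement:
    block_id: int
    offset: int          # byte offset of the block inside the file
    length: int          # bytes in this block (the last block may be shorter)
    nodes: list[str] = field(default_factory=list)  # datanodes holding a replica


def place_blocks(file_size: int, datanodes: list[str],
                 block_size: int = BLOCK_SIZE,
                 replication: int = REPLICATION) -> list[BlockPlacement]:
    """Split a file into fixed-size blocks and give each block `replication`
    distinct datanodes, rotating the first node so load spreads evenly."""
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds the number of datanodes")
    placements = []
    for block_id, offset in enumerate(range(0, file_size, block_size)):
        length = min(block_size, file_size - offset)
        first = block_id % len(datanodes)
        nodes = [datanodes[(first + r) % len(datanodes)] for r in range(replication)]
        placements.append(BlockPlacement(block_id, offset, length, nodes))
    return placements


if __name__ == "__main__":
    for block in place_blocks(150 * 1024 * 1024, ["dn1", "dn2", "dn3", "dn4"]):
        print(block.block_id, block.length // (1024 * 1024), "MiB", block.nodes)
```

With these values, a 150 MiB file on four datanodes becomes three blocks of 64, 64, and 22 MiB, and the third block is stored on dn3, dn4, and dn1. Rotating the first replica is only one way to spread load; a real system would also consider free space and node failures.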
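Virtual machine technology: Virtual machines, that is, server virtualization, are an important cornerstone of the underlying cloud architecture. In server virtualization, the virtualization software must abstract the hardware; allocate, schedule, and manage resources; and isolate virtual machines from the host operating system and from one another. Typical implementations today (essentially de facto standards) include Citrix Xen, VMware ESX Server, and Microsoft Hyper-V.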
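Distributed programming and computing: So that users can easily enjoy the services cloud computing provides and can write simple programs with the programming model to achieve specific goals, the cloud programming model must be very simple. The complex parallel execution and task scheduling in the background must be transparent to users and programmers. The programming tools of the "cloud" initiatives announced by IT vendors today are all based on the Map-Reduce programming model.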
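To illustrate why the programming model can stay simple, the following Python sketch runs the classic word-count example in the Map-Reduce style within a single process. The user writes only `map_phase` and `reduce_phase`; the `shuffle` function stands in for the grouping that a framework such as Hadoop performs between the two phases, and that a real framework would also parallelize across machines. All function names are hypothetical and are not part of the Hadoop API.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split: str):
    """User-written mapper: emit (word, 1) for every word in one input split."""
    for word in split.lower().split():
        yield word, 1


def shuffle(pairs):
    """Stand-in for the framework: group intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """User-written reducer: add up the counts for one word."""
    return key, sum(values)


def word_count(splits: list[str]) -> dict[str, int]:
    # In a real cluster each split would be mapped on a different node.
    intermediate = chain.from_iterable(map_phase(s) for s in splits)
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


if __name__ == "__main__":
    print(word_count(["the cloud", "the grid and the cloud"]))
```

Here `word_count(["the cloud", "the grid and the cloud"])` returns `{'the': 3, 'cloud': 2, 'grid': 1, 'and': 1}`. Nothing in the two user-written functions mentions nodes, threads, or scheduling, which is the transparency the paragraph above asks for.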
Technical benefits
• Allow federated cluster experiments/benchmarks
• Allow innovation at all levels of the cloud computing infrastructure stack
• Commitment to openness in sharing software, tools, best practices
• Collection of usage statistics
• Example research areas:
What worked well
− Security sweet spot: OpenID & AWS REST Auth & Diffie-Hellman
− MapReduce also useful for CPU-bound apps, but completion time prediction and efficient failover become trickier
− Stream + Python: a powerful combination
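The "Stream + Python" point refers to writing MapReduce jobs as ordinary scripts that exchange text records. The sketch below assumes the Hadoop Streaming conventions: the mapper reads lines on standard input and writes `key<TAB>value` lines, the framework sorts mapper output by key, and the reducer therefore sees equal keys on adjacent lines. The file name `wc.py` and its `map`/`reduce` command-line arguments are hypothetical.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines):
    """Sum counts per word; assumes input is sorted by key, as the framework does."""
    records = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(records, key=lambda record: record[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Only read stdin when explicitly invoked as a map or reduce stage.
if __name__ == "__main__" and sys.argv[1:] in (["map"], ["reduce"]):
    stage = mapper if sys.argv[1] == "map" else reducer
    for record in stage(sys.stdin):
        print(record)
```

Outside Hadoop, the same pipeline can be approximated locally with a shell pipe such as `cat input.txt | python wc.py map | sort | python wc.py reduce`, where `sort` plays the role of the framework's shuffle.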
8 9/18/2011
Hadoop as a Service
Why a Hadoop service?
• Simplify installation, setup, and provisioning
• Research questions:
− How to support multi-tenancy with QoS differentiation
− How to optimize workflows across users with fluctuating capacity requirements
Differentiated Hadoop services continued
Market-based resource allocator, Tycoon ()
− Continuous bidding (of spending rates) for resource capacity
P: Some users are more risk-averse than others (can tolerate fewer fluctuations)
S: Bid on nodes based on a predicted guarantee to deliver a QoS level
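Assuming a proportional-share rule, in which a user's share of a node is its spending rate divided by the total of all spending rates on that node, the bidding strategy above can be sketched as follows. A risk-averse user estimates competing bids pessimistically and bids enough to keep a target share even then. The predictor (mean plus `k` standard deviations of past competing bids) and all function names are hypothetical illustrations, not Tycoon's interface.

```python
import statistics


def proportional_share(bids: dict[str, float], capacity: float) -> dict[str, float]:
    """Each user's share of a node is its spending rate over the sum of all rates."""
    total = sum(bids.values())
    if total <= 0:
        return {user: 0.0 for user in bids}
    return {user: capacity * bid / total for user, bid in bids.items()}


def pessimistic_competition(history: list[float], k: float = 1.0) -> float:
    """Hypothetical predictor: mean of past competing totals plus k std deviations."""
    return statistics.mean(history) + k * statistics.pstdev(history)


def bid_for_share(target_share: float, others_total: float) -> float:
    """Smallest rate x with x / (x + others_total) >= target_share, 0 <= target_share < 1."""
    if not 0 <= target_share < 1:
        raise ValueError("target_share must be in [0, 1)")
    return target_share * others_total / (1 - target_share)


if __name__ == "__main__":
    guard = pessimistic_competition([8.0, 10.0, 12.0], k=1.0)
    print(round(bid_for_share(0.5, guard), 2))
```

A larger `k` buys a stronger guarantee at a higher spending rate, so `k` is one way to express how much fluctuation a user is willing to tolerate.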
− Why a Hadoop service?
− Differentiated Hadoop services
− Economic job optimization
− Experiment results
− Demo
Cloud Computing Test Bed
Key features:
− On-demand creation
− Dynamic resource flexing
Differentiated Hadoop services
• More important jobs should preempt less important jobs
• Time-critical jobs need to meet deadlines
• Test jobs need no stringent QoS guarantees
• How to get users to truthfully reveal their resource requirements?
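The first requirement amounts to priority-based preemption. The minimal sketch below shows that baseline with a fixed number of slots: an arriving job evicts the least important running job if it is more important, and otherwise waits. Priorities are integers declared by users (higher means more important), deadlines are not modeled, and `Job` and `PreemptiveScheduler` are hypothetical names. The sketch does nothing about the last question: users could simply declare every job important, which is the motivation for a market-based allocator.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    priority: int                      # higher value = more important (assumed convention)
    name: str = field(compare=False)   # ignored when jobs are compared


class PreemptiveScheduler:
    """Fixed number of slots; a more important job evicts the least important running one."""

    def __init__(self, slots: int):
        self.slots = slots
        self.running: list[Job] = []   # min-heap, least important running job at index 0
        self.waiting: list[Job] = []   # jobs that were queued or preempted

    def submit(self, job: Job) -> None:
        if len(self.running) < self.slots:
            heapq.heappush(self.running, job)
        elif self.running and self.running[0].priority < job.priority:
            # Preempt: pop the least important running job and start the new one.
            self.waiting.append(heapq.heapreplace(self.running, job))
        else:
            self.waiting.append(job)

    def running_names(self) -> list[str]:
        return sorted(job.name for job in self.running)
```

With two slots, submitting a test job (priority 1), a production job (5), and a deadline job (9) leaves the production and deadline jobs running and moves the test job to the waiting list.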
• VLAN-isolated mini-datacenter
− Reset, reboot, power up, power down, get status
• Bias towards large and short experiments
• Site coordination required, e.g., accounting
Economic job optimization
− Not all subtasks need maximum capacity at all times
− Automatically rescale the capacity as needed to optimize the cost/benefit ratio of the workflow as a whole
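Rescaling only pays off because applications do not speed up linearly with capacity. The sketch below picks a node count for one workflow stage by maximizing the value of the time saved minus the cost of the nodes. It assumes Amdahl's law as the scalability model and a fixed price per node-hour; these, the parameter names, and `best_node_count` are illustrative assumptions rather than the optimizer used in the case study.

```python
def runtime(serial_hours: float, parallel_fraction: float, nodes: int) -> float:
    """Amdahl's law (assumed model): only `parallel_fraction` of the work speeds up."""
    return serial_hours * ((1 - parallel_fraction) + parallel_fraction / nodes)


def best_node_count(serial_hours: float, parallel_fraction: float,
                    price_per_node_hour: float, value_per_hour_saved: float,
                    max_nodes: int) -> int:
    """Node count in [1, max_nodes] maximizing value of time saved minus node cost."""
    def net_benefit(nodes: int) -> float:
        hours = runtime(serial_hours, parallel_fraction, nodes)
        saved = serial_hours - hours
        cost = nodes * hours * price_per_node_hour
        return value_per_hour_saved * saved - cost

    return max(range(1, max_nodes + 1), key=net_benefit)


if __name__ == "__main__":
    print(best_node_count(10.0, 0.9, 1.0, 12.0, 40))
    print(best_node_count(10.0, 0.5, 1.0, 15.0, 40))
```

For a 10-hour stage that is 90% parallelizable, with nodes priced at 1 unit per hour and each hour saved worth 12 units, the best choice is 10 nodes; a stage that is only 50% parallelizable, with each hour saved worth 15 units, is best run on 4 nodes. Evaluating each stage separately is what lets capacity shrink for subtasks that would not benefit from it.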
Cloud Computing Test Bed: Hadoop as a Service Case Study
Thomas Sandholm, Kevin Lai from Hewlett-Packard Laboratories, Palo Alto
What is the cloud computing test bed?
University of Illinois, Urbana-Champaign
Karlsruhe University, Germany
Intel Research, Pittsburgh
Yahoo! Research, Sunnyvale
HP Labs, Palo Alto, Bristol
Part I: Cloud Computing Test Bed
− What is the cloud computing test bed?
− Technical benefits
− Infrastructure stack
Part II: Hadoop as a Service
Experiment setup
• 2 competing Hadoop users
• GridMix on 40 nodes
• 25 GiB input data
• 6-13 MapReduce tasks
• 7-12 minutes/job
• 30-job sequential workflow/user
Optimization strategies continued
Best Response:
P: When other users place competing bids, the optimal configuration/allocation might change
S: Continuously compute game-theoretic best-response bids to maximize utility
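One way to compute a best response, assuming proportional-share allocation on each node, is to treat the other users' current bids y_j as fixed and choose spending rates x_j that maximize the weighted share sum_j w_j x_j / (x_j + y_j) under a total budget X. Setting up the Lagrangian gives x_j = sqrt(w_j y_j) (X + sum_k y_k) / (sum_k sqrt(w_k y_k)) - y_j, where the sums run over the nodes that receive a positive bid; those turn out to be the nodes with the highest w_j / y_j ratios. The weights, the utility form, and the function name are assumptions for illustration.

```python
import math


def best_response(budget: float, weights: list[float], others: list[float]) -> list[float]:
    """Spending rates x_j maximizing sum_j w_j * x_j / (x_j + y_j) with sum_j x_j == budget.

    Assumes budget > 0, weights[j] > 0, and others[j] = y_j > 0 (the competing bids
    on node j, treated as fixed while this user responds).
    """
    def closed_form(active: list[int]) -> dict[int, float]:
        root_sum = sum(math.sqrt(weights[k] * others[k]) for k in active)
        scale = (budget + sum(others[k] for k in active)) / root_sum
        return {k: math.sqrt(weights[k] * others[k]) * scale - others[k] for k in active}

    # Nodes with the best value-per-competition ratio w_j / y_j are funded first.
    order = sorted(range(len(weights)), key=lambda j: weights[j] / others[j], reverse=True)
    active: list[int] = []
    for j in order:
        if closed_form(active + [j])[j] <= 0:
            break  # this node, and every node after it, would get a non-positive bid
        active.append(j)

    bids = [0.0] * len(weights)
    if active:
        for k, x in closed_form(active).items():
            bids[k] = x
    return bids


if __name__ == "__main__":
    print(best_response(1.0, [4.0, 1.0], [1.0, 1.0]))
    print(best_response(3.0, [4.0, 1.0], [1.0, 1.0]))
```

With a budget of 1 and weights 4 and 1 against competing bids of 1 on each node, the whole budget goes to the first node; with a budget of 3, the second node also becomes worth funding (about 7/3 and 2/3). Re-running this whenever competing bids change gives the continuous best-response behavior described above.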
Experiment results
10-12% performance improvement
Experiment results continued
45% efficiency improvement
Lessons learned
Infrastructure stack (figure), from top to bottom:
− Application Services (example: Mahout, Pig)
− Infrastructure Services (example: Hadoop, NFS)
− Virtual Resource Set (example: Tycoon, Eucalyptus)
− Physical Resource Set (example: Tashi, Emulab)
− Hardware (example: computers, network, storage)
Interface labels in the figure: "Optional Interfaces" and "Mandatory Interface Recommendations"
Data Reduction:
P: Early phases of the workflow are more data-intensive
S: Use decaying spending rates when bidding
P: During map/reduce sync-up, some nodes may be bottlenecks
S: Redistribute funds to active bottlenecks
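Both strategies reshape how a fixed budget is spent: over time (more in the data-intensive early phases) and across nodes (more on whichever nodes currently hold up a map/reduce synchronization point). The minimal sketch below assumes a geometric decay model and moves a fixed fraction of the non-bottleneck nodes' funds; the decay factor, the fraction, and both function names are illustrative assumptions, not parameters from the HP experiments.

```python
def decaying_rates(initial_rate: float, decay: float, phases: int) -> list[float]:
    """Spending rate for each workflow phase, shrinking by `decay` per phase (assumed model)."""
    return [initial_rate * decay ** phase for phase in range(phases)]


def redistribute(funds: dict[str, float], bottlenecks: set[str],
                 fraction: float = 0.5) -> dict[str, float]:
    """Move `fraction` of every non-bottleneck node's funds, split evenly, onto the
    bottleneck nodes. The total budget is unchanged."""
    targets = [node for node in funds if node in bottlenecks]
    if not targets:
        return dict(funds)
    moved = {node: amount * fraction for node, amount in funds.items()
             if node not in bottlenecks}
    bonus = sum(moved.values()) / len(targets)
    return {node: amount + bonus if node in bottlenecks else amount - moved[node]
            for node, amount in funds.items()}


if __name__ == "__main__":
    print(decaying_rates(8.0, 0.5, 4))
    print(redistribute({"a": 4.0, "b": 4.0, "c": 2.0}, {"c"}))
```

Here the per-phase rates are 8, 4, 2, and 1, and when node `c` becomes the bottleneck its funds rise from 2 to 6 while `a` and `b` each give up half of theirs.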
Infocomm Development Authority, Singapore
1000-4000 cores/site
Goal of the cloud computing test bed
Promote
− collaborative cloud computing research among
− Application scalability profile not perfectly linear
Optimization strategies
P: Some tasks/nodes are more performance-critical than others
S: Declare the relative priority of mappers and reducers or critical nodes and split the budget accordingly (e.g., master funding boost)
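Declaring relative priorities reduces budget splitting to a weighted proportion. In the sketch below each node's role (for example master, mapper, or reducer) maps to a declared weight, and the budget is divided in proportion to those weights, so a larger master weight acts as the "master funding boost". The role names, weight values, and function names are hypothetical.

```python
def node_weights(roles: dict[str, str], priority: dict[str, float]) -> dict[str, float]:
    """Look up the declared relative priority of each node's role."""
    return {node: priority[role] for node, role in roles.items()}


def split_budget(budget: float, weights: dict[str, float]) -> dict[str, float]:
    """Divide the budget across nodes in proportion to their weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {node: budget * weight / total for node, weight in weights.items()}


if __name__ == "__main__":
    roles = {"master": "master", "n1": "mapper", "n2": "mapper"}
    print(split_budget(100.0, node_weights(roles, {"master": 2.0, "mapper": 1.0})))
```

With a master weight of 2 and mapper weights of 1, a budget of 100 splits into 50 for the master and 25 for each mapper.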