Cloud Computing: Translated Foreign-Language References


Hadoop Cloud Computing: Translated Foreign Literature


This document contains the English original together with its Chinese translation.

Original text: Meet Hadoop

In pioneer days they used oxen for heavy pulling, and when one ox couldn't budge a log, they didn't try to grow a larger ox. We shouldn't be trying for bigger computers, but for more systems of computers.
—Grace Hopper

Data!

We live in the data age. It's not easy to measure the total volume of data stored electronically, but an IDC estimate put the size of the "digital universe" at 0.18 zettabytes in 2006, and is forecasting a tenfold growth by 2011 to 1.8 zettabytes. A zettabyte is 10²¹ bytes, or equivalently one thousand exabytes, one million petabytes, or one billion terabytes. That's roughly the same order of magnitude as one disk drive for every person in the world.

This flood of data is coming from many sources. Consider the following:

• The New York Stock Exchange generates about one terabyte of new trade data per day.
• Facebook hosts approximately 10 billion photos, taking up one petabyte of storage.
• Ancestry.com, the genealogy site, stores around 2.5 petabytes of data.
• The Internet Archive stores around 2 petabytes of data, and is growing at a rate of 20 terabytes per month.
• The Large Hadron Collider near Geneva, Switzerland, will produce about 15 petabytes of data per year.

So there's a lot of data out there. But you are probably wondering how it affects you. Most of the data is locked up in the largest web properties (like search engines), or scientific or financial institutions, isn't it? Does the advent of "Big Data," as it is being called, affect smaller organizations or individuals?

I argue that it does. Take photos, for example. My wife's grandfather was an avid photographer, and took photographs throughout his adult life. His entire corpus of medium format, slide, and 35mm film, when scanned in at high resolution, occupies around 10 gigabytes. Compare this to the digital photos that my family took last year, which take up about 5 gigabytes of space. My family is producing photographic data at 35 times the rate my wife's grandfather did, and the rate is increasing every year as it becomes easier to take more and more photos.

More generally, the digital streams that individuals are producing are growing apace. Microsoft Research's MyLifeBits project gives a glimpse of the archiving of personal information that may become commonplace in the near future. MyLifeBits was an experiment where an individual's interactions—phone calls, emails, documents—were captured electronically and stored for later access. The data gathered included a photo taken every minute, which resulted in an overall data volume of one gigabyte a month. When storage costs come down enough to make it feasible to store continuous audio and video, the data volume for a future MyLifeBits service will be many times that.

The trend is for every individual's data footprint to grow, but perhaps more importantly the amount of data generated by machines will be even greater than that generated by people. Machine logs, RFID readers, sensor networks, vehicle GPS traces, retail transactions—all of these contribute to the growing mountain of data.

The volume of data being made publicly available increases every year too.
Organizations no longer have to merely manage their own data: success in the future will be dictated to a large extent by their ability to extract value from other organizations' data.

Initiatives such as Public Data Sets on Amazon Web Services, among others, exist to foster the "information commons," where data can be freely (or, in the case of AWS, for a modest price) shared for anyone to download and analyze. Mashups between different information sources make for unexpected and hitherto unimaginable applications.

Take, for example, the Astrometry.net project, which watches the Astrometry group on Flickr for new photos of the night sky. It analyzes each image and identifies which part of the sky it is from, as well as any interesting celestial bodies, such as stars or galaxies. Although it's still a new and experimental service, it shows the kind of things that are possible when data (in this case, tagged photographic images) is made available and used for something (image analysis) that was not anticipated by the creator.

It has been said that "More data usually beats better algorithms," which is to say that for some problems (such as recommending movies or music based on past preferences), however fiendish your algorithms are, they can often be beaten simply by having more data (and a less sophisticated algorithm).

The good news is that Big Data is here. The bad news is that we are struggling to store and analyze it.

Data Storage and Analysis

The problem is simple: while the storage capacities of hard drives have increased massively over the years, access speeds—the rate at which data can be read from drives—have not kept up. One typical drive from 1990 could store 1,370 MB of data and had a transfer speed of 4.4 MB/s, so you could read all the data from a full drive in around five minutes. Almost 20 years later, one-terabyte drives are the norm, but the transfer speed is around 100 MB/s, so it takes more than two and a half hours to read all the data off the disk.

This is a long time to read all the data on a single drive, and writing is even slower. The obvious way to reduce the time is to read from multiple disks at once. Imagine if we had 100 drives, each holding one hundredth of the data. Working in parallel, we could read the data in under two minutes.

Only using one hundredth of a disk may seem wasteful. But we can store one hundred datasets, each of which is one terabyte, and provide shared access to them. We can imagine that the users of such a system would be happy to share access in return for shorter analysis times, and, statistically, that their analysis jobs would be likely to be spread over time, so they wouldn't interfere with each other too much.

There's more to being able to read and write data in parallel to or from multiple disks, though. The first problem to solve is hardware failure: as soon as you start using many pieces of hardware, the chance that one will fail is fairly high. A common way of avoiding data loss is through replication: redundant copies of the data are kept by the system so that in the event of failure, there is another copy available. This is how RAID works, for instance, although Hadoop's filesystem, the Hadoop Distributed Filesystem (HDFS), takes a slightly different approach, as you shall see later. The second problem is that most analysis tasks need to be able to combine the data in some way; data read from one disk may need to be combined with the data from any of the other 99 disks.
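As a quick check of the figures quoted above, the short Python sketch below (not part of the original text) reproduces the drive-scan arithmetic; the drive sizes and transfer rates are the ones cited in the passage.

```python
# Rough check of the drive-scan figures quoted above (illustrative values only).

def read_time_hours(capacity_mb: float, rate_mb_s: float, drives: int = 1) -> float:
    """Time to stream a full dataset split evenly across `drives` disks."""
    return capacity_mb / (rate_mb_s * drives) / 3600

# A typical 1990 drive: 1,370 MB at 4.4 MB/s -> about 5 minutes.
print(f"1990 drive: {read_time_hours(1370, 4.4) * 60:.1f} minutes")

# A one-terabyte drive at 100 MB/s -> roughly 2.8 hours.
print(f"1 TB drive: {read_time_hours(1_000_000, 100):.1f} hours")

# The same terabyte spread over 100 drives read in parallel -> under 2 minutes.
print(f"100 drives in parallel: {read_time_hours(1_000_000, 100, drives=100) * 60:.1f} minutes")
```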
Various distributed systems allow data to be combined from multiple sources, but doing this correctly is notoriously challenging. MapReduce provides a programming model that abstracts the problem from disk reads and writes, transforming it into a computation over sets of keys and values. We will look at the details of this model in later chapters, but the important point for the present discussion is that there are two parts to the computation, the map and the reduce, and it's the interface between the two where the "mixing" occurs. Like HDFS, MapReduce has reliability built in.

This, in a nutshell, is what Hadoop provides: a reliable shared storage and analysis system. The storage is provided by HDFS, and analysis by MapReduce. There are other parts to Hadoop, but these capabilities are its kernel.

Comparison with Other Systems

The approach taken by MapReduce may seem like a brute-force approach. The premise is that the entire dataset—or at least a good portion of it—is processed for each query. But this is its power. MapReduce is a batch query processor, and the ability to run an ad hoc query against your whole dataset and get the results in a reasonable time is transformative. It changes the way you think about data, and unlocks data that was previously archived on tape or disk. It gives people the opportunity to innovate with data. Questions that took too long to get answered before can now be answered, which in turn leads to new questions and new insights.

For example, Mailtrust, Rackspace's mail division, used Hadoop for processing email logs. One ad hoc query they wrote was to find the geographic distribution of their users. In their words: "This data was so useful that we've scheduled the MapReduce job to run monthly and we will be using this data to help us decide which Rackspace data centers to place new mail servers in as we grow." By bringing several hundred gigabytes of data together and having the tools to analyze it, the Rackspace engineers were able to gain an understanding of the data that they otherwise would never have had and, furthermore, they were able to use what they had learned to improve the service for their customers. You can read more about how Rackspace uses Hadoop in Chapter 14.

RDBMS

Why can't we use databases with lots of disks to do large-scale batch analysis? Why is MapReduce needed? The answer to these questions comes from another trend in disk drives: seek time is improving more slowly than transfer rate. Seeking is the process of moving the disk's head to a particular place on the disk to read or write data. It characterizes the latency of a disk operation, whereas the transfer rate corresponds to a disk's bandwidth.

If the data access pattern is dominated by seeks, it will take longer to read or write large portions of the dataset than streaming through it, which operates at the transfer rate. On the other hand, for updating a small proportion of records in a database, a traditional B-Tree (the data structure used in relational databases, which is limited by the rate at which it can perform seeks) works well. For updating the majority of a database, a B-Tree is less efficient than MapReduce, which uses Sort/Merge to rebuild the database.

In many ways, MapReduce can be seen as a complement to an RDBMS. (The differences between the two systems are shown in Table 1-1.) MapReduce is a good fit for problems that need to analyze the whole dataset, in a batch fashion, particularly for ad hoc analysis.
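To make the map and reduce steps described above concrete, here is a minimal, single-process Python sketch of the classic word-count computation. It only imitates the key/value model (map, group by key, reduce); it is an illustration, not Hadoop code.

```python
# A toy, single-process imitation of the MapReduce word-count pattern:
# map emits (key, value) pairs, the framework groups values by key
# (the "mixing" between map and reduce), and reduce aggregates each group.
from itertools import groupby
from operator import itemgetter

def map_fn(line: str):
    """Map: one input line -> a stream of (word, 1) pairs."""
    for word in line.split():
        yield word.lower(), 1

def reduce_fn(word: str, counts):
    """Reduce: one key plus all of its values -> a single (word, total) pair."""
    return word, sum(counts)

def run_job(lines):
    # Map phase.
    pairs = [kv for line in lines for kv in map_fn(line)]
    # Shuffle/sort phase: group intermediate pairs by key.
    pairs.sort(key=itemgetter(0))
    # Reduce phase.
    return [reduce_fn(word, (v for _, v in group))
            for word, group in groupby(pairs, key=itemgetter(0))]

if __name__ == "__main__":
    sample = ["the quick brown fox", "the lazy dog", "the fox"]
    print(run_job(sample))  # [('brown', 1), ('dog', 1), ('fox', 2), ...]
```

On a real cluster the same two user-written functions would be run in a distributed fashion; only the grouping between them is handled by the framework.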
An RDBMS is good for point queries or updates, where the dataset has been indexed to deliver low-latency retrieval and update times of a relatively small amount of data. MapReduce suits applications where the data is written once and read many times, whereas a relational database is good for datasets that are continually updated.

Table 1-1. RDBMS compared to MapReduce

             Traditional RDBMS            MapReduce
Data size    Gigabytes                    Petabytes
Access       Interactive and batch        Batch
Updates      Read and write many times    Write once, read many times
Structure    Static schema                Dynamic schema
Integrity    High                         Low
Scaling      Nonlinear                    Linear

Another difference between MapReduce and an RDBMS is the amount of structure in the datasets that they operate on. Structured data is data that is organized into entities that have a defined format, such as XML documents or database tables that conform to a particular predefined schema. This is the realm of the RDBMS. Semi-structured data, on the other hand, is looser, and though there may be a schema, it is often ignored, so it may be used only as a guide to the structure of the data: for example, a spreadsheet, in which the structure is the grid of cells, although the cells themselves may hold any form of data. Unstructured data does not have any particular internal structure: for example, plain text or image data. MapReduce works well on unstructured or semi-structured data, since it is designed to interpret the data at processing time. In other words, the input keys and values for MapReduce are not an intrinsic property of the data, but they are chosen by the person analyzing the data.

Relational data is often normalized to retain its integrity and remove redundancy. Normalization poses problems for MapReduce, since it makes reading a record a nonlocal operation, and one of the central assumptions that MapReduce makes is that it is possible to perform (high-speed) streaming reads and writes.

A web server log is a good example of a set of records that is not normalized (for example, the client hostnames are specified in full each time, even though the same client may appear many times), and this is one reason that logfiles of all kinds are particularly well suited to analysis with MapReduce.

MapReduce is a linearly scalable programming model. The programmer writes two functions—a map function and a reduce function—each of which defines a mapping from one set of key-value pairs to another. These functions are oblivious to the size of the data or the cluster that they are operating on, so they can be used unchanged for a small dataset and for a massive one. More importantly, if you double the size of the input data, a job will run twice as slowly; but if you also double the size of the cluster, a job will run as fast as the original one. This is not generally true of SQL queries.

Over time, however, the differences between relational databases and MapReduce systems are likely to blur, both as relational databases start incorporating some of the ideas from MapReduce (such as Aster Data's and Greenplum's databases) and, from the other direction, as higher-level query languages built on MapReduce (such as Pig and Hive) make MapReduce systems more approachable to traditional database programmers.

Grid Computing

The High Performance Computing (HPC) and Grid Computing communities have been doing large-scale data processing for years, using such APIs as the Message Passing Interface (MPI).
Broadly, the approach in HPC is to distribute the work across a cluster of machines, which access a shared filesystem, hosted by a SAN. This works well for predominantly compute-intensive jobs, but it becomes a problem when nodes need to access larger data volumes (hundreds of gigabytes, the point at which MapReduce really starts to shine), since the network bandwidth is the bottleneck and compute nodes become idle.

MapReduce tries to colocate the data with the compute node, so data access is fast because it is local. This feature, known as data locality, is at the heart of MapReduce and is the reason for its good performance. Recognizing that network bandwidth is the most precious resource in a data center environment (it is easy to saturate network links by copying data around), MapReduce implementations go to great lengths to preserve it by explicitly modelling network topology. Notice that this arrangement does not preclude high-CPU analyses in MapReduce.

MPI gives great control to the programmer, but it requires that he or she explicitly handle the mechanics of the data flow, exposed via low-level C routines and constructs such as sockets, as well as the higher-level algorithm for the analysis. MapReduce operates only at the higher level: the programmer thinks in terms of functions of key and value pairs, and the data flow is implicit.

Coordinating the processes in a large-scale distributed computation is a challenge. The hardest aspect is gracefully handling partial failure—when you don't know whether a remote process has failed or not—and still making progress with the overall computation. MapReduce spares the programmer from having to think about failure, since the implementation detects failed map or reduce tasks and reschedules replacements on machines that are healthy. MapReduce is able to do this because it is a shared-nothing architecture, meaning that tasks have no dependence on one another. (This is a slight oversimplification, since the output from mappers is fed to the reducers, but this is under the control of the MapReduce system; in this case, it needs to take more care rerunning a failed reducer than rerunning a failed map, since it has to make sure it can retrieve the necessary map outputs and, if not, regenerate them by running the relevant maps again.) So from the programmer's point of view, the order in which the tasks run doesn't matter. By contrast, MPI programs have to explicitly manage their own checkpointing and recovery, which gives more control to the programmer but makes them more difficult to write.

MapReduce might sound like quite a restrictive programming model, and in a sense it is: you are limited to key and value types that are related in specified ways, and mappers and reducers run with very limited coordination between one another (the mappers pass keys and values to reducers). A natural question to ask is: can you do anything useful or nontrivial with it?

The answer is yes. MapReduce was invented by engineers at Google as a system for building production search indexes because they found themselves solving the same problem over and over again (and MapReduce was inspired by older ideas from the functional programming, distributed computing, and database communities), but it has since been used for many other applications in many other industries. It is pleasantly surprising to see the range of algorithms that can be expressed in MapReduce, from image analysis, to graph-based problems, to machine learning algorithms.
It can't solve every problem, of course, but it is a general data-processing tool. You can see a sample of some of the applications that Hadoop has been used for in Chapter 14.

Volunteer Computing

When people first hear about Hadoop and MapReduce, they often ask, "How is it different from SETI@home?" SETI, the Search for Extra-Terrestrial Intelligence, runs a project called SETI@home in which volunteers donate CPU time from their otherwise idle computers to analyze radio telescope data for signs of intelligent life outside Earth. SETI@home is the most well known of many volunteer computing projects; others include the Great Internet Mersenne Prime Search (to search for large prime numbers) and Folding@home (to understand protein folding and how it relates to disease).

Volunteer computing projects work by breaking the problem they are trying to solve into chunks called work units, which are sent to computers around the world to be analyzed. For example, a SETI@home work unit is about 0.35 MB of radio telescope data, and takes hours or days to analyze on a typical home computer. When the analysis is completed, the results are sent back to the server, and the client gets another work unit. As a precaution to combat cheating, each work unit is sent to three different machines and needs at least two results to agree to be accepted.

Although SETI@home may be superficially similar to MapReduce (breaking a problem into independent pieces to be worked on in parallel), there are some significant differences. The SETI@home problem is very CPU-intensive, which makes it suitable for running on hundreds of thousands of computers across the world, since the time to transfer the work unit is dwarfed by the time to run the computation on it. Volunteers are donating CPU cycles, not bandwidth.

MapReduce is designed to run jobs that last minutes or hours on trusted, dedicated hardware running in a single data center with very high aggregate bandwidth interconnects. By contrast, SETI@home runs a perpetual computation on untrusted machines on the Internet with highly variable connection speeds and no data locality.

Translation: Meet Hadoop

In the old days, people used oxen to pull heavy loads, and when one ox could not budge a log, they did not think of breeding a bigger ox.

Current State of Cloud Computing Research: Literature Review and Foreign Literature


This document includes, for this topic, the foreign-language article and a literature review.
Title: An exploratory study on factors affecting the adoption of cloud computing by information professionals
Author: Aharony, Noa
Journal: The Electronic Library, 33(2), 308-328.
Year: 2015

I. Foreign-Language Article

An exploratory study on factors affecting the adoption of cloud computing by information professionals
Aharony, Noa

Purpose - The purpose of this study is to explore what factors may influence information professionals to adopt new technologies, such as cloud computing, in their organizations. The objectives of this study are as follows: to what extent does the technology acceptance model (TAM) explain information professionals' intentions towards cloud computing, and to what extent do personal characteristics, such as cognitive appraisal and openness to experience, explain information professionals' intentions to use cloud computing.

Design/methodology/approach - The research was conducted in Israel during the second semester of the 2013 academic year and encompassed two groups of information professionals: librarians and information specialists. Researchers used seven questionnaires to gather the following data: personal details, computer competence, attitudes to cloud computing, behavioral intention, openness to experience, cognitive appraisal and self-efficacy.

Findings - The current study found that the behavioral intention to use cloud computing was impacted by several of the TAM variables, personal characteristics and computer competence.

Originality/value - The study expands the scope of research about the TAM by applying it to information professionals and cloud computing, and highlights the importance of individual traits, such as cognitive appraisal, personal innovativeness, openness to experience and computer competence, when considering technology acceptance. Further, the current study proposes that if directors of information organizations assume that novel technologies may improve their organizations' functioning, they should be familiar with both the TAM and the issue of individual differences. These factors may help them choose the most appropriate workers.

Keywords: Cloud computing, TAM, Cognitive appraisal, Information professionals, Openness to experience

Introduction

One of the innovations that information technology (IT) has recently presented is the phenomenon of cloud computing. Cloud computing is the result of advancements in various technologies, including the Internet, hardware, systems management and distributed computing (Buyya et al., 2011). Armbrust et al. (2009) suggested that cloud computing is a collection of applications using hardware and software systems to deliver services to end users via the Internet. Cloud computing offers a variety of services, such as storage and different modes of use (Leavitt, 2009). Cloud computing enables organizations to deliver support applications and avoid the need to develop their own IT systems (Feuerlicht et al., 2010).

Due to the growth of cloud computing use, the question arises as to what factors may influence information professionals to adopt new technologies, such as cloud computing, in their organizations. Assuming that using new technologies may improve the functioning of information organizations, this study seeks to explore whether information professionals, who often work with technology and use it as an important vehicle in their workplace, are familiar with technological innovations and whether they are ready to use them in their workplaces.
As the phenomenon of cloud computing is relatively new, there are not many surveys that focus on it and, furthermore, no one has so far focussed on the attitudes of information professionals towards cloud computing. The research may contribute to an understanding of the variables that influence attitudes towards cloud computing and may lead to further inquiry in this field.The current study uses the well-known technology acceptance model (TAM), a theory for explaining individuals' behaviours towards technology (Davis, 1989; Venkatesh, 2000), as well as personal characteristics, such as cognitive appraisal and openness to new experiences, as theoretical bases from which we can predict factors which may influence information professionals adopting cloud computing in their workplaces. The objectives of this study are to learn the following: the extent to which the TAM explains information professionals' attitudes towards cloud computing, and the extent to which personal characteristics, such as cognitive appraisal and openness to experiences, explain the intention of information professionals to use cloud computing.Theoretical backgroundCloud computingResearchers have divided cloud computing into three layers: Software as a Service (SaaS), Platform as a Service (PaaS) and Infrastructure as a Service (IaaS). SaaS has changed the concept of software as a product to that of a service instead. The software runs in the cloud and the user can access it via the Internet to work on an application. PaaS enables powerful tools for developers to create the applications, without having to deal with concerns about the infrastructure. IaaS provides complete infrastructure resources (e.g. servers, software, network equipment and storage). With IaaS, consumers do not have to purchase the latest technology, perform maintenance, upgrade software or buy software licenses (Anuar et al. , 2013). Cloud computing deployment can be divided into four types: private clouds, public clouds, community clouds and hybrid clouds (Mell and Grance, 2011). Public clouds have open access, private clouds run within organizations, community clouds containresources that are shared with others in the community and hybridclouds encompass two or more cloud models. Anuar et al. (2013) presented the main characteristics of cloud computing: flexible scale that enables flexible-scale capabilities for computing; virtualization that offers a new way of getting computing resources remotely, regardless of the location of the user or the resources; high trust , as the cloud offers more reliability to end users than relying on local resources; versatility , because cloud services can serve different sectors in various disciplines use the same cloud; and on demand service , as end users can tailor their service needs and pay accordingly.As cloud computing is relatively new, there are not a lot of surveys that focus on it. Several researchers conducted in-depth interviews investigating respondents' attitudes towards keeping their virtual possessions in the online world (Odom et al. , 2012). Teneyuca (2011) reported on a survey of cloud computing usage trends that included IT professionals as respondents. Results revealed preferences for virtualization and cloud computing technologies. However, the major reasons for cloud computing adoption being impeded were the lack of cloud computing training (43 per cent) and security concerns (36 per cent). 
Another report showed that nearly 40 per cent of Americans think that saving data to their hard drive is more secure than saving it to a cloud (Teneyuca, 2011). A further study (Ion et al., 2011) explored private users' privacy attitudes and beliefs about cloud computing in comparison with those in companies. Anuar et al. (2013) investigated cloud computing in an academic institution, claiming that cloud computing technology enhances performance within the academic institution. A study that was carried out in the education arena examined factors that led students to adopt cloud computing technology (Behrend et al. , 2010). Technology acceptance modelThe TAM (Davis, 1989) is a socio-technical model which aims to explain user acceptance of an information system. It is based on the theory of reasoned action (TRA) (Fishbein and Ajzen, 1975) which seeks to understand how people construct behaviours. The model suggests that technology acceptance can be explained according to the individual's beliefs, attitudes and intentions (Davis, 1989). The TAM hypothesizes that one's intention is the best predictor of usage behaviour and suggests that an individual's behavioural intention to use technology is determined by two beliefs: perceived usefulness (PU) and perceived ease of use (PEOU). PU refers to the individual's perception that using a technology will improve performance and PEOU addresses a user's perceptions that using a particular system would be free of effort (Davis, 1989). The current study concentrates on PEOU as the researchers wanted to examine if information professionals' perceptions about new technology is affected by its simplicity and friendly interface. Earlier research mainly investigated personal behaviour to use new information systems and technology in the following: corporate environments (Gefen and Straub, 1997);Web shopping (Chang et al. , 2002; Lin and Lu, 2000);education, particularly e-learning (Park, 2009) and m-learning (Aharony, 2014); and the library arena (Aharony, 2011; Park et al. , 2009).Personal innovativenessA construct which may contribute to information professionals' intention behaviour to use cloud computing is personal innovativeness, a major characteristic in innovation diffusion research in general (Agarwal and Prasad, 1998; Rogers, 1983, 1995). Agarwal and Prasad (1998) have coined the term "personal innovativeness in the domain of IT" (PIIT), which describes a quite stable characteristic of the individual across situational considerations. Previous studies found that personal innovativeness is a significant determinant of PEOU, as well as of PU (Agarwal and Karahanna, 2000; Lewis et al. , 2003). Several researchers have suggested that innovative people will search for intellectually or sensorially stimulating experiences (Uray and Dedeoglu, 1997).Openness to experienceAnother variable that may predict respondents' perspectives towards cloud computing is openness to experience which addresses the tendency to search for new and challenging experiences, to think creatively and to enjoy intellectual inquiries (McCrae and Sutin, 2009). People who are highly open to experience are perceived as also open to new challenges, thoughts and emotions (McCrae and Costa, 2003). Studies reported that there is a positive relation between openness to experience and intelligence tests (Gignac et al. , 2004). According to Weiss et al. (2012), challenging transitions may influence differently those who are high or low in openness to experience. 
Those who are high may approach these situations with curiosity, emphasizing the new possibilities offered to them. However, those who are low in openness may be threatened and try to avoid them by adhering to predictable environments. Various researchers note that people who are high in openness to experience are motivated to resolve new situations (McCrae, 1996; Sorrentino and Roney, 1999). Furthermore, openness to experience is associated with cognitive flexibility and open-mindedness (McCrae and Costa, 1997), and negatively associated with rigidity, uncertainty and inflexibility (Hodson and Sorrentino, 1999). Thus, people who are less open to experience tend to avoid novelty and prefer certainty. Studies reveal that openness to experience declines in the later years (Allemand et al. , 2007; Donnellan and Lucas, 2008).Challenge and threatThe following section will focus on the personality characteristics of challenge and threat that might affect information professionals' behavioural intention to use cloud computing. Challenge and threat are the main variables of a unidimensional, bipolar motivational state. They are the result of relative evaluations of situational demands and personal resources that are influenced both by cognitive and affective processes in motivated performance situations (Vick et al. , 2008). According to Lazarus and Folkman (1984), challenge refers to the potential for growth or gain and is characterized by excitement and eagerness, while threat addresses potential harm and is characterized by anxiety, fear and anger. Situations that suggest low demands and high resources are described as challenging, while those that suggest high demands and low resources are perceived as threatening (Seginer, 2008). In general, challenge or threat can take place in situations such as delivering a speech, taking a test, sports competitions or performing with another person on a cooperative or competitive task.The challenge appraisal suggests that with effort, the demands of the situation can be overcome (Lazarus et al. , 1980; Park and Folkman, 1997). On the other hand, threat appraisal indicates potential danger to one's well-being or self-esteem (Lazarus, 1991; Lazarus and Folkman, 1984), as well as low confidence in one's ability to cope with the threat (Bandura, 1997; Lazarus, 1991; Lazarus and Folkman, 1984). Different studies (Blascovich et al. , 2002; Blascovich and Mendes, 2000; Lazarus and Folkman, 1984; Lazarus et al. , 1980) have found that challenge leads to positive feelings associated with enjoyment, better performance, eagerness and anticipation of personal rewards or benefits. Several studies which focussed on the threat and challenge variable were carried out in the library and information science environment as well (Aharony, 2009, 2011).Self-efficacyAn additional variable which may influence individuals' behavioural intention to use cloud computing is self-efficacy. The concept of self-efficacy was developed in the discipline of "social learning theory" by Bandura (1997). Self-efficacy addresses individuals' beliefs that they possess the resources and skills needed to perform and succeed in a specific task. Therefore, individuals' previous performance and their perceptions of relevant resources available may influence self-efficacy beliefs (Bandura, 1997). 
Self-efficacy is not just an ability perception, it encompasses the motivation and effort required to complete the task and it helps determine which activities are required, the effort in pursuing these activities and persistence when facing obstacles (Bandura, 1986, 1997). The construct of self-efficacy is made up of four principal sources of information:"mastery experience" refers to previous experience, including success and failure; "vicarious experience" addresses observing the performances, successes and failures of others;"social persuasion" includes verbal persuasion from peers, colleagues and relatives; and"physiological and emotional states" from which people judge their strengths, capabilities and vulnerabilities (Bandura, 1986, 1994, 1995).As self-efficacy is based on self-perceptions regarding different behaviours, it is considered to be situation specific. In other words, a person may exhibit high levels of self-efficacy within one domain, while exhibiting low levels within another (Cassidy and Eachus, 2002). Thus, self-efficacy has generated research in various disciplines such as medicine, business, psychology and education (Kear, 2000; Lev, 1997; Schunk, 1985; Koul and Rubba, 1999). Computer self-efficacy is a sub-field of self-efficacy. It is defined as one's perceived ability to accomplish a task with the use of a computer (Compeau and Higgins, 1995). Various studies have noted that training and experience play important roles in computer self-efficacy (Compeau and Higgins, 1995; Kinzie et al. , 1994; Stone and Henry, 2003). Several studies have investigated the effect of computer self-efficacy on computer training performance (Compeau and Higgins, 1995) and on IT use (Easley et al. , 2003).HypothesesBased on the study objectives and assuming that PEOU, personal innovativeness,cognitive appraisal and openness to experience may predict information professionals' behavioural intention to use cloud computing, the underlying assumptions of this study are as follows:H1. High scores in respondent PEOU will be associated with high scores in their behavioural intention to use cloud computing.H2. High scores in respondents' personal innovativeness will be associated with high scores in their behavioural intention to use cloud computing.H3. Low scores in respondents' threat and high scores in respondents' challenge will be associated with high scores in their behavioural intention to use cloud computing. H4. High scores in respondents' self-efficacy will be associated with high scores in their behavioural intention to use cloud computing.H5. High scores in respondents' openness to experience will be associated with high scores in their behavioural intention to use cloud computing.H6. High scores in respondents' computer competence and in social media use will be associated with high scores in their behavioural intention to use cloud computing. MethodologyData collectionThe research was conducted in Israel during the second semester of the 2013 academic year and encompassed two groups of information professionals: librarians and information specialists. The researchers sent a message and a questionnaire to an Israeli library and information science discussion group named "safranym", which included school, public and academic librarians, and to an Israeli information specialist group named "I-fish", which consists of information specialists that work in different organizations. Researchers explained the study's purpose and asked their members to complete the questionnaire. 
These two discussion groups consist of about 700 members; 140 responses were received, giving a reply percentage of 20 per cent. Data analysisOf the participants, 25 (17.9 per cent) were male and 115 (82.1 per cent) were female. Their average age was 46.3 years.MeasuresThe current study is based on quantitative research. Researchers used seven questionnaires to gather the following data: personal details, computer competence, attitudes towards cloud computing, behavioural intention, openness to experience, cognitive appraisal and self-efficacy.The personal details questionnaire had two statements. The computer competence questionnaire consisted of two statements rated on a 5-point Likert scale (1 = strongest disagreement; 5 = strongest agreement). The cloud computing attitude questionnaire, based on Liuet al. (2010), was modified for this study and consisted of six statements rated on a seven-point Likert scale (1 = strongest disagreement; 7 = strongest agreement). A principal components factor analysis using Varimax rotation with Kaiser Normalization was conducted and explained 82.98 per cent of the variance. Principal components factor analysis revealed two distinct factors. The first related to information professionals' personal innovativeness (items 2, 3 and 5), and the second to information professionals' perceptions about cloud computing ease ofuse (PEOU) (items 1, 4, and 6); the values of Cronbach's Alpha were 0.89 and 0.88, respectively.The behavioural intention questionnaire, based on Liu et al. (2010), was modified for this study and consisted of three statements rated on a six-point Likert scale (1 = strongest disagreement; 6 = strongest agreement). Its Cronbach's Alpha was 0.79. The openness to experience questionnaire was derived from the Big Five questionnaire (John et al. , 1991) and consisted of eight statements rated on a five-point Likert scale (1 = strongest disagreement; 5 = strongest agreement); Cronbach's Alpha was 0.81. The cognitive appraisal questionnaire measured information professionals' feelings of threat versus challenge when confronted with new situations. It consisted of 10 statements rated on a six-point scale (1 = fully disagree; 6 = fully agree). This questionnaire was previously used (Aharony, 2009, 2011; Yekutiel, 1990) and consisted of two factors: threat (items 1, 2, 3, 5, 7 and 8) and challenge (items 4, 6, 9 and 10). Cronbach's Alpha was 0.70 for the threat factor and 0.89 for the challenge factor.The self-efficacy questionnaire was based on Askar and Umay's (2001) questionnaire and consisted of 18 statements rated on a five-point scale (1 = fully disagree; 5 = fully agree); Cronbach's Alpha was 0.96.FindingsTo examine the relationship between openness to experience, cognitive appraisal (threat, challenge and self-efficacy), TAM variables (personal innovativeness and PEOU), and behavioural intention to use cloud computing, researchers performed Pearson correlations, which are given in Table I.Table I presents significant correlations between research variables and the dependent variable (behavioural intention to use cloud computing). All correlations are positive, except the one between threat and behavioural intention to use cloud computing. Hence, the higher these measures, the greater the behavioural intention to use cloud computing. A significant negative correlation was found between threat and the dependent variable. 
Therefore, the more threatened respondents are, the lower is their behavioural intention to use cloud computing.

Regarding the correlations between research variables, significant positive correlations were found between openness to experience and challenge, self-efficacy, personal innovativeness and PEOU. A significant negative correlation was found between openness to experience and threat. That is, the more open to experience respondents are, the more challenged they are, the higher their self-efficacy, personal innovativeness and PEOU, and the less threatened they are. In addition, significant negative correlations were found between threat and self-efficacy, personal innovativeness and PEOU. We can conclude that the more threatened respondents are, the less self-efficient and personally innovative they are, and the less they perceive cloud computing as easy to use. Significant positive correlations were also found between self-efficacy and personal innovativeness and PEOU. Thus, the more self-efficient respondents are, the more personally innovative they are and the more they perceive cloud computing as easy to use.

The study also examined two variables associated with computer competence: computer use and social media use. Table II presents correlations between these two variables and the other research variables. Significant, high correlations were found between the computer competence variables and openness to experience, self-efficacy, personal innovativeness, PEOU and behavioural intention to use cloud computing. Hence, the higher respondents' computer competence, the more they are open to experience, self-efficient and personally innovative, the more they perceive cloud computing as easy to use, and the higher is their behavioural intention to use cloud computing.

Researchers also examined relationships with demographic variables. To examine the relationship between age and the other research variables, the researchers performed Pearson correlations. A significant negative correlation was found between age and PEOU, r = -0.21, p < 0.05. We may assume that the younger the respondents are, the more they perceive cloud computing as easy to use. To examine whether there are differences between males and females concerning the research variables, a MANOVA was performed; it did not reveal a significant difference between the two groups, F(7,130) = 1.88, p > 0.05.

The researchers also conducted a hierarchical regression using behavioural intention to use cloud computing as the dependent variable. The predictors were entered in five steps: respondents' openness to experience; respondents' computer competence (computer use and social media use); cognitive appraisal (threat, challenge and self-efficacy); TAM variables (personal innovativeness and PEOU); and interactions with the TAM variables. The entry of the first four steps was forced, while the interactions were entered according to their contribution to the explained variance of behavioural intention to use cloud computing. The regression explained 54 per cent of the variance in behavioural intention to use cloud computing. Table III presents the standardized and unstandardized coefficients of the hierarchical regression of respondents' behavioural intention to use cloud computing.

The first step introduced the openness variable, which contributed significantly by adding 13 per cent to the explained variance of behavioural intention to use cloud computing.
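For readers unfamiliar with the block-wise procedure described above, the following Python sketch shows how predictors can be entered in steps and how the explained variance added by each step (ΔR²) is obtained. It uses statsmodels with entirely synthetic data and hypothetical column names; it illustrates the method only and does not reproduce the study's analysis.

```python
# A minimal sketch of a hierarchical (block-wise) OLS regression.
# Data and column names are synthetic/hypothetical; the point is the Delta-R^2 per block.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 140  # same order of magnitude as the study's sample
df = pd.DataFrame(rng.normal(size=(n, 8)),
                  columns=["openness", "computer_use", "social_media",
                           "threat", "challenge", "self_efficacy",
                           "innovativeness", "peou"])
# Synthetic outcome loosely related to a few predictors, plus noise.
df["intention"] = (0.3 * df["openness"] + 0.4 * df["self_efficacy"]
                   + 0.5 * df["peou"] + rng.normal(size=n))

blocks = [["openness"],
          ["computer_use", "social_media"],
          ["threat", "challenge", "self_efficacy"],
          ["innovativeness", "peou"]]

predictors, prev_r2 = [], 0.0
for i, block in enumerate(blocks, start=1):
    predictors += block
    model = sm.OLS(df["intention"], sm.add_constant(df[predictors])).fit()
    print(f"step {i}: R^2 = {model.rsquared:.2f}, "
          f"Delta R^2 = {model.rsquared - prev_r2:.2f}")
    prev_r2 = model.rsquared
```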
The beta coefficient of the openness variable is positive; hence, the more open to experience respondents are, the higher is their behavioural intention to use cloud computing. The second step introduced the two computer competence variables (computer use and social media use) which contributed 5 per cent to the explained variance of behavioural intention. Of these two variables, only the social media variable contributed significantly and its beta coefficient was positive. In other words, the more respondents use social media, the higher is their behavioural intention to use cloud computing. Note that Pearson correlations found significant positive correlations between these two variables and behavioural intention to use cloud computing. It seems that because of the correlation between these two variables, r = 0.33, p < 0.001, the computer use variable did not contribute to the regression.As the third step, researchers added respondents' personal appraisal variables (threat and challenge, and self-efficacy), and this also contributed significantly by adding 25 per cent to the explained variance of behavioural intention. The beta coefficients of challenge and of self-efficacy were positive, while that of threat was negative. Therefore, we may conclude that the more respondents perceived themselves as challenged and self-efficient, and the less they perceived themselves as threatened, the higher is their behavioural intention to use cloud computing. The inclusion of this step caused a decrease in the [beta] size of the openness to experience variable that changed it into an insignificant one, and may suggest a possibility of mediation. Sobel tests indicated that self-efficacy mediates between openness to experience and behavioural intention (z = 4.68, p < 0.001). Hence, the more respondents are open to experience, the higher is their self-efficacy and, as a result, the higher is their behavioural intention to use cloud computing.The fourth step added the TAM variables (respondents' PEOU and personal innovation), and this also contributed significantly by adding 9 per cent to the explained variance of behavioural intention to use cloud computing. The beta coefficient of this variable was positive; therefore, the more respondents perceived themselves to be personally innovative and cloud computing as easy to use, the higher is their behavioural intention to use cloud computing. Note that in this step there was a decrease in the [beta] size of self-efficacy. Sobel tests indicated that of the two variables, PEOU mediates between self-efficacy and behavioural intention (z = 4.77, p < 0.001). Thus, the more respondents perceive themselves as self-efficient, the higher they perceive cloud computing's PEOU and, as a result, the higher is their behavioural intention to use it.As the fifth step, researchers added the interaction between computer use X personal innovativeness. This interaction added 2 per cent to the explained variance of behavioural intention to use cloud computing and is presented in Figure 1.Figure 1 shows a correlation between personal innovation and behavioural intention to use cloud computing among respondents who are low and high in computer use. This correlation is higher among respondents who are low in computer use, [beta] = . 40, p < 0.05, than among those who are high in computer use, [beta] = 0.04, p < 0.05. 
It seems that, especially among participants who are low in computer use, the higher their personal innovativeness, the higher is their behavioural intention to use cloud computing.

Discussion

The present research explored the extent to which the TAM and personal characteristics, such as threat and challenge, self-efficacy and openness to experience, explain information professionals' perspectives on cloud computing. Researchers divided the study hypotheses into three categories. The first (consisting of H1-H2) refers to the TAM, the second (H3-H5) to personality characteristics and, finally, H6 to computer competence. All hypotheses were accepted. Regarding the first category of hypotheses, results show that both were accepted. Findings suggest that high scores in PEOU and personal innovativeness are associated with high scores in respondents' intention to adopt cloud computing. These findings can be associated with previous…

Cloud Computing and the Internet of Things: Translated Foreign Literature


Document information
Title: Integration of Cloud Computing with Internet of Things: Challenges and Open Issues
Authors: H. F. Atlam et al.
Source: IEEE International Conference on Internet of Things, 2017
Word count: 4,176 English words (23,870 characters); 7,457 Chinese characters

Foreign-language article:

Integration of Cloud Computing with Internet of Things: Challenges and Open Issues

Abstract - The Internet of Things (IoT) is becoming the next Internet-related revolution. It allows billions of devices to be connected and communicate with each other to share information that improves the quality of our daily lives. On the other hand, Cloud Computing provides on-demand, convenient and scalable network access which makes it possible to share computing resources; indeed, this, in turn, enables dynamic data integration from various data sources. There are many issues standing in the way of the successful implementation of both Cloud and IoT. The integration of Cloud Computing with the IoT is the most effective way in which to overcome these issues. The vast number of resources available on the Cloud can be extremely beneficial for the IoT, while the Cloud can gain more publicity and improve its limitations with real world objects in a more dynamic and distributed manner. This paper provides an overview of the integration of the Cloud into the IoT by highlighting the integration benefits and implementation challenges. Discussion will also focus on the architecture of the resultant Cloud-based IoT paradigm and its new application scenarios. Finally, open issues and future research directions are also suggested.

Keywords: Cloud Computing, Internet of Things, Cloud-based IoT, Integration.

I. INTRODUCTION

It is important to explore the common features of the technologies involved in the field of computing. Indeed, this is certainly the case with Cloud Computing and the Internet of Things (IoT) - two paradigms which share many common features. The integration of these numerous concepts may facilitate and improve these technologies. Cloud computing has altered the way in which technologies can be accessed, managed and delivered. It is widely agreed that Cloud computing can be used for utility services in the future. Although many consider Cloud computing to be a new technology, it has, in actual fact, been involved in and encompassed various technologies such as grid, utility computing, virtualisation, networking and software services. Cloud computing provides services which make it possible to share computing resources across the Internet. As such, it is not surprising that the origins of Cloud technologies lie in grid, utility computing, virtualisation, networking and software services, as well as distributed and parallel computing. On the other hand, the IoT can be considered both a dynamic and a global networked infrastructure that manages self-configuring objects in a highly intelligent way. The IoT is moving towards a phase where all items around us will be connected to the Internet and will have the ability to interact with minimum human effort. The IoT normally includes a number of objects with limited storage and computing capacity. It could well be said that Cloud computing and the IoT will be the future of the Internet and next-generation technologies.
However, Cloud services depend on service providers which are extremely interoperable, while IoT technologies are based on diversity rather than interoperability.

This paper provides an overview of the integration of Cloud Computing into the IoT; this involves an examination of the benefits resulting from the integration process and the implementation challenges encountered. Open issues and research directions are also discussed. The remainder of the paper is organised as follows: Section II provides the basic concepts of Cloud computing, the IoT, and Cloud-based IoT; Section III discusses the benefits of integrating the IoT into the Cloud; the Cloud-based IoT architecture is presented in Section IV; Section V illustrates different Cloud-based IoT application scenarios. Following this, the challenges facing Cloud-based IoT integration and open research directions are discussed in Section VI and Section VII respectively, before Section VIII concludes the paper.

II. BASIC CONCEPTS

This section reviews the basic concepts of Cloud Computing, the IoT, and Cloud-based IoT.

1. Cloud Computing
There exist a number of proposed definitions for Cloud computing, although the most widely agreed upon seems to be that put forth by the National Institute of Standards and Technology (NIST). Indeed, the NIST has defined Cloud computing as "a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction". As stated in this definition, Cloud computing comprises four types of deployment models, three different service models, and five essential characteristics.

Cloud computing deployment models are most commonly classified as belonging to the public Cloud, where resources are made available to consumers over the Internet. Public Clouds are generally owned by a for-profit organisation (e.g. Amazon EC2). Conversely, the infrastructure of a private Cloud is commonly provided by a single organisation to serve the particular purposes of its users. The private Cloud offers a secure environment and a higher level of control (e.g. Microsoft Private Cloud). Hybrid Clouds are a mixture of private and public Clouds. This choice is provided for consumers as it makes it possible to overcome some of the limitations of each model. In contrast, a community Cloud is a Cloud infrastructure which is delivered to a group of users by a number of organisations which share the same need.

In order to allow consumers to choose the service that suits them, services in Cloud computing are provided at three different levels, namely: the Software as a Service (SaaS) model, where software is delivered through the Internet to users (e.g. Google Apps); the Platform as a Service (PaaS) model, which offers a higher-level integrated environment that can build, test, and deploy specific software (e.g. Microsoft Azure); and finally, the Infrastructure as a Service (IaaS) model, in which infrastructure such as storage, hardware and servers is delivered as a service (e.g. Amazon Web Services).

2. Internet of Things
The IoT represents a modern approach where boundaries between real and digital domains are progressively eliminated by consistently changing every physical device into a smart alternative ready to provide smart services. All things in the IoT (smart devices, sensors, etc.) have their own identity.
They are combined to form the communication network and will become actively participating objects. These objects include not only daily usable electronic devices, but also things like food, clothing, materials, parts and subassemblies; commodities and luxury items; monuments and landmarks; and various forms of commerce and culture. In addition, these objects are able to create requests and alter their states. Thus, all IoT devices can be monitored, tracked and counted, which significantly decreases waste, loss, and cost.

The concept of the IoT was first mentioned by Kevin Ashton in 1999, when he stated that "The Internet of Things has the potential to change the world, just as the Internet did. Maybe even more so". Later, the IoT was formally presented by the International Telecommunication Union (ITU) in 2005. A great many definitions of the IoT have been put forth by numerous organisations and researchers. According to the ITU (2012), the IoT is "a global infrastructure for the Information Society, enabling advanced services by interconnecting (physical and virtual) things based on, existing and evolving, interoperable information and communication technologies". The IoT introduces a variety of opportunities and applications. However, it faces many challenges which could potentially hinder its successful implementation, such as data storage, heterogeneous resource-constrained Things, scalability, variable geospatial deployment, and energy efficiency.

3. Cloud-Based Internet of Things
The IoT and Cloud computing are both rapidly developing services, and each has its own unique characteristics. On the one hand, the IoT approach is based on smart devices which intercommunicate in a global network and dynamic infrastructure. It enables ubiquitous computing scenarios. The IoT is typically characterised by widely distributed devices with limited processing capabilities and storage. These devices encounter issues regarding performance, reliability, privacy, and security. On the other hand, Cloud computing comprises a massive network with unlimited storage capabilities and computation power. Furthermore, it provides a flexible, robust environment which allows for dynamic data integration from various data sources. Cloud computing has partially resolved most of the IoT issues. Indeed, the IoT and the Cloud are two comparatively challenging technologies, and are being combined in order to change the current and future environment of internetworking services.

The Cloud-based Internet of Things is a platform which allows for the smart usage of applications, information, and infrastructure in a cost-effective way. While the IoT and Cloud computing are different from each other, their features are almost complementary, as shown in Table 1 (table not reproduced; its caption reads "TABLE 1. Comparison of the IoT with Cloud Computing"). This complementarity is the primary reason why many researchers have proposed their integration.

III. BENEFITS OF INTEGRATING IOT WITH CLOUD

Since the IoT suffers from limited capabilities in terms of processing power and storage, it must also contend with issues such as performance, security, privacy and reliability. The integration of the IoT into the Cloud is certainly the best way to overcome most of these issues. The Cloud can even benefit from the IoT by expanding its limits to real world objects in a more dynamic and distributed way, and by providing new services for billions of devices in different real life scenarios.
In addition, the Cloud provides simplicity of use and reduces the cost of the usage of applications and services for end-users. The Cloud also simplifies the flow of IoT data gathering and processing, and provides quick, low-cost installation and integration for complex data processing and deployment. The benefits of integrating the IoT into the Cloud are discussed in this section as follows.

1. Communication
Application and data sharing are two significant features of the Cloud-based IoT paradigm. Ubiquitous applications can be transmitted through the IoT, whilst automation can be utilised to facilitate low-cost data distribution and collection. The Cloud is an effective and economical solution which can be used to connect, manage, and track anything by using built-in apps and customised portals. The availability of fast systems facilitates dynamic monitoring and remote object control, as well as real-time data access. It is worth noting that, although the Cloud can greatly develop and facilitate IoT interconnection, it still has weaknesses in certain areas. Thus, practical restrictions can appear when an enormous amount of data needs to be transferred from the Internet to the Cloud.

2. Storage
As the IoT can be used on billions of devices, it comprises a huge number of information sources, which generate an enormous amount of semi-structured or non-structured data. This is known as Big Data, and has three characteristics: variety (e.g. data types), velocity (e.g. data generation frequency), and volume (e.g. data size). The Cloud is considered to be one of the most cost-effective and suitable solutions when it comes to dealing with the enormous amount of data created by the IoT. Moreover, it produces new chances for data integration, aggregation, and sharing with third parties.

3. Processing capabilities
IoT devices are characterised by limited processing capabilities which prevent complex on-site data processing. Instead, gathered data is transferred to nodes that have high capabilities; indeed, it is here that aggregation and processing are accomplished. However, achieving scalability remains a challenge without an appropriate underlying infrastructure. Offering a solution, the Cloud provides unlimited virtual processing capabilities and an on-demand usage model. Predictive algorithms and data-driven decision making can be integrated into the IoT in order to increase revenue and reduce risks at a lower cost.

4. Scope
With billions of users communicating with one another and a variety of information being collected, the world is quickly moving towards the Internet of Everything (IoE) realm - a network of networks with billions of things that generate new chances and risks. The Cloud-based IoT approach provides new applications and services based on the expansion of the Cloud through the IoT objects, which in turn allows the Cloud to work with a number of new real-world scenarios, and leads to the emergence of new services.

5. New abilities
The IoT is characterised by the heterogeneity of its devices, protocols, and technologies. Hence, reliability, scalability, interoperability, security, availability, and efficiency can be very hard to achieve. Integrating the IoT into the Cloud resolves most of these issues. It provides other features such as ease-of-use and ease-of-access, with low deployment costs.

6. New Models
Cloud-based IoT integration empowers new scenarios for smart objects, applications, and services.
Some of the new models are listed as follows:
• SaaS (Sensing as a Service), which allows access to sensor data;
• EaaS (Ethernet as a Service), the main role of which is to provide ubiquitous connectivity to control remote devices;
• SAaaS (Sensing and Actuation as a Service), which provides control logics automatically;
• IPMaaS (Identity and Policy Management as a Service), which provides access to policy and identity management;
• DBaaS (Database as a Service), which provides ubiquitous database management;
• SEaaS (Sensor Event as a Service), which dispatches messaging services that are generated by sensor events;
• SenaaS (Sensor as a Service), which provides management for remote sensors;
• DaaS (Data as a Service), which provides ubiquitous access to any type of data.

IV. CLOUD-BASED IOT ARCHITECTURE
According to a number of previous studies, the well-known IoT architecture is typically divided into three different layers: the application, perception, and network layers. Most assume that the network layer is the Cloud layer, which realises the Cloud-based IoT architecture, as depicted in Fig. 1.

Fig. 1. Cloud-based IoT architecture

The perception layer is used to identify objects and gather data, which is collected from the surrounding environment. In contrast, the main objective of the network layer is to transfer the collected data to the Internet/Cloud. Finally, the application layer provides the interface to different services.

V. CLOUD-BASED IOT APPLICATIONS
The Cloud-based IoT approach has introduced a number of applications and smart services, which have affected end users' daily lives. TABLE 2 presents a brief discussion of certain applications which have been improved by the Cloud-based IoT paradigm.

TABLE 2. CLOUD-BASED IOT APPLICATIONS

VI. CHALLENGES FACING CLOUD-BASED IOT INTEGRATION
There are many challenges which could potentially prevent the successful integration of the Cloud-based IoT paradigm. These challenges include:

1. Security and privacy
Cloud-based IoT makes it possible to transport data from the real world to the Cloud. Indeed, one particularly important issue which has not yet been resolved is how to provide appropriate authorisation rules and policies while ensuring that only authorised users have access to the sensitive data; this is crucial when it comes to preserving users' privacy, and particularly when data integrity must be guaranteed. In addition, when critical IoT applications move into the Cloud, issues arise because of the lack of trust in the service provider, information regarding service level agreements (SLAs), and the physical location of data. Sensitive information leakage can also occur due to multi-tenancy. Moreover, public key cryptography cannot be applied at all layers because of the processing power constraints imposed by IoT objects. New challenges also require specific attention; for example, the distributed system is exposed to a number of possible attacks, such as SQL injection, session riding, cross-site scripting, and side-channel attacks. Moreover, important vulnerabilities, including session hijacking and virtual machine escape, are also problematic.

2. Heterogeneity
One particularly important challenge faced by the Cloud-based IoT approach is related to the extensive heterogeneity of devices, platforms, operating systems, and services that exist and might be used for new or developed applications.
Cloud platforms suffer from heterogeneity issues; for instance, Cloud services generally come with proprietary interfaces, thus allowing for resource integration based on specific providers. In addition, the heterogeneity challenge can be exacerbated when end-users adopt multi-Cloud approaches, and thus services will depend on multiple providers to improve application performance and resilience.

3. Big data
With the number of IoT devices predicted to reach 50 billion by 2020, it is important to pay more attention to the transportation, access, storage, and processing of the enormous amount of data which will be produced. Indeed, given recent technological developments, it is clear that the IoT will be one of the core sources of Big Data, and that the Cloud can facilitate the storage of this data for a long period of time, in addition to subjecting it to complex analysis. Handling the huge amount of data produced is a significant issue, as the application's whole performance is heavily reliant on the properties of this data management service. Finding a perfect data management solution which will allow the Cloud to manage massive amounts of data is still a big issue. Furthermore, data integrity is a vital element, not only because of its effect on the service's quality, but also because of security and privacy issues, the majority of which relate to outsourced data.

4. Performance
Transferring the huge amount of data created by IoT devices to the Cloud requires high bandwidth. As a result, the key issue is obtaining adequate network performance in order to transfer data to Cloud environments; indeed, this is because broadband growth is not keeping pace with storage and computation evolution. In a number of scenarios, services and data provision should be achieved with high reactivity. This is because timeliness might be affected by unpredictable matters, and real-time applications are very sensitive to performance efficiency.

5. Legal aspects
Legal aspects have been very significant in recent research concerning certain applications. For instance, service providers must adapt to various international regulations. On the other hand, users should contribute to data collection.

6. Monitoring
Monitoring is a primary action in Cloud Computing when it comes to performance, resource management, capacity planning, security, SLAs, and troubleshooting. As a result, the Cloud-based IoT approach inherits the same monitoring demands from the Cloud, although there are still some related challenges that are impacted by the velocity, volume, and variety characteristics of the IoT.

7. Large scale
The Cloud-based IoT paradigm makes it possible to design new applications that aim to integrate and analyse data coming from the real world into IoT objects. This requires interacting with billions of devices which are distributed throughout many areas. The large scale of the resulting systems raises many new issues that are difficult to overcome. For instance, achieving computational capability and storage capacity requirements is becoming difficult. Moreover, the distribution of IoT devices makes the monitoring process more difficult, as IoT devices have to cope with connectivity issues and latency dynamics.

VII. OPEN ISSUES AND RESEARCH DIRECTIONS
This section will address some of the open issues and future research directions related to Cloud-based IoT which still require more research effort.
These issues include:

1. Standardisation
Many studies have highlighted the lack of standards, which is considered critical in relation to the Cloud-based IoT paradigm. Although a number of proposed standardisations have been put forth by the scientific community for the deployment of IoT and Cloud approaches, it is obvious that architectures, standard protocols, and APIs are required to allow for interconnection between heterogeneous smart things and the generation of new services, which make up the Cloud-based IoT paradigm.

2. Fog Computing
Fog computing is a model which extends Cloud computing services to the edge of the network. Similar to the Cloud, the Fog provides application services to end users. The Fog can essentially be considered an extension of Cloud Computing which acts as an intermediary between the edge of the network and the Cloud; indeed, it works with latency-sensitive applications that require nearby nodes to satisfy their delay requirements. Although storage, computing, and networking are the main resources of both the Fog and the Cloud, the Fog has certain features, such as edge location and location awareness, that provide geographical distribution and low latency; moreover, it involves a very large number of nodes and supports mobility and real-time interaction, in contrast with the Cloud.

3. Cloud Capabilities
As in any networked environment, security is considered to be one of the main issues of the Cloud-based IoT paradigm. There are more chances of attacks on both the IoT side and the Cloud side. In the IoT context, data integrity, confidentiality, and authenticity can be guaranteed by encryption. However, insider attacks cannot be resolved in this way, and such techniques are also hard to apply on IoT devices with limited capabilities.

4. SLA enforcement
Cloud-based IoT users need the data they create to be conveyed and processed in accordance with application-dependent constraints, which can be tough to guarantee in some cases. Ensuring a specific Quality of Service (QoS) level regarding Cloud resources by depending on a single provider raises many issues. Thus, multiple Cloud providers may be required to avoid SLA violations. However, dynamically choosing the most appropriate mixture of Cloud providers still represents an open issue due to time, costs, and heterogeneity of QoS management support.

5. Big data
In the previous section, we discussed Big Data as a critical challenge that is tightly coupled with the Cloud-based IoT paradigm. Although a number of contributions have been proposed, Big Data is still considered a critical open issue, and one in need of more research. The Cloud-based IoT approach involves the management and processing of huge amounts of data stemming from various locations and from heterogeneous sources; indeed, in the Cloud-based IoT, many applications need complicated tasks to be performed in real time.

6. Energy efficiency
Recent Cloud-based IoT applications involve frequent data transmission from IoT objects to the Cloud, which quickly consumes node energy. Thus, achieving energy efficiency in data processing and transmission remains a significant open issue. Several directions have been suggested to overcome this issue, such as compression technologies for efficient data transmission, and data caching techniques for reusing collected data in time-insensitive applications.

7. Security and privacy
Although security and privacy are both critical research issues which have received a great deal of attention, they are still open issues which require more effort.
Indeed, adapting to different threats from hackers is still an issue. Moreover, another problem is providing the appropriate authorisation rules and policies while ensuring that only authorised users have access to sensitive data; this is crucial for preserving users' privacy, specifically when data integrity must be guaranteed.

VIII. CONCLUSION
The IoT is becoming an increasingly ubiquitous computing service which requires huge volumes of data storage and processing capabilities. The IoT has limited capabilities in terms of processing power and storage, while there also exist consequential issues such as security, privacy, performance, and reliability. As such, the integration of the Cloud into the IoT is very beneficial in terms of overcoming these challenges. In this paper, we presented the need for the creation of the Cloud-based IoT approach. Discussion also focused on the Cloud-based IoT architecture, different application scenarios, the challenges facing successful integration, and open research directions. In future work, a number of case studies will be carried out to test the effectiveness of the Cloud-based IoT approach in healthcare applications.

Chinese translation: Integration of Cloud Computing and the Internet of Things: Challenges and Open Issues. Abstract: The Internet of Things (IoT) is becoming the next Internet-related revolution.


Cloud Computing Foreign Literature Translation References (the document contains the English original and its Chinese translation side by side)

Original text: Technical Issues of Forensic Investigations in Cloud Computing Environments
Dominik Birk
Ruhr-University Bochum, Horst Goertz Institute for IT Security, Bochum, Germany

Abstract—Cloud Computing is arguably one of the most discussed information technologies today. It presents many promising technological and economical opportunities. However, many customers remain reluctant to move their business IT infrastructure completely to the cloud. One of their main concerns is Cloud Security and the threat of the unknown. Cloud Service Providers (CSP) encourage this perception by not letting their customers see what is behind their virtual curtain. A seldom discussed, but in this regard highly relevant, open issue is the ability to perform digital investigations. This continues to fuel insecurity on the sides of both providers and customers. Cloud Forensics constitutes a new and disruptive challenge for investigators. Due to the decentralized nature of data processing in the cloud, traditional approaches to evidence collection and recovery are no longer practical. This paper focuses on the technical aspects of digital forensics in distributed cloud environments. We contribute by assessing whether it is possible for the customer of cloud computing services to perform a traditional digital investigation from a technical point of view. Furthermore, we discuss possible solutions and possible new methodologies helping customers to perform such investigations.

I. INTRODUCTION
Although the cloud might appear attractive to small as well as to large companies, it does not come along without its own unique problems. Outsourcing sensitive corporate data into the cloud raises concerns regarding the privacy and security of data. Security policies, companies' main pillar concerning security, cannot be easily deployed into distributed, virtualized cloud environments. This situation is further complicated by the unknown physical location of the company's assets. Normally, if a security incident occurs, the corporate security team wants to be able to perform their own investigation without dependency on third parties. In the cloud, this is not possible anymore: the CSP obtains all the power over the environment and thus controls the sources of evidence. In the best case, a trusted third party acts as a trustee and guarantees for the trustworthiness of the CSP. Furthermore, the implementation of the technical architecture and circumstances within cloud computing environments bias the way an investigation may be processed. In detail, evidence data has to be interpreted by an investigator in a proper manner, which is hardly possible due to the lack of circumstantial information. (We would like to thank the reviewers for the helpful comments and Dennis Heinson (Center for Advanced Security Research Darmstadt - CASED) for the profound discussions regarding the legal aspects of cloud forensics.) For auditors, this situation does not change: questions about who accessed specific data and information cannot be answered by the customers if no corresponding logs are available. With the increasing demand for using the power of the cloud for processing also sensitive information and data, enterprises face the issue of Data and Process Provenance in the cloud [10]. Digital provenance, meaning meta-data that describes the ancestry or history of a digital object, is a crucial feature for forensic investigations.
In combination with a suitable authentication scheme, it provides information about who created and who modified what kind of data in the cloud. These are crucial aspects for digital investigations in distributed environments such as the cloud. Unfortunately, the aspects of forensic investigations in distributed environments have so far been mostly neglected by the research community. Current discussion centers mostly around security, privacy and data protection issues [35], [9], [12]. The impact of forensic investigations on cloud environments was little noticed, albeit mentioned by the authors of [1] in 2009: "[...] to our knowledge, no research has been published on how cloud computing environments affect digital artifacts, and on acquisition logistics and legal issues related to cloud computing environments." This statement is also confirmed by other authors [34], [36], [40] stressing that further research on incident handling, evidence tracking and accountability in cloud environments has to be done. At the same time, massive investments are being made in cloud technology. Combined with the fact that information technology increasingly transcends people's private and professional lives, thus mirroring more and more of people's actions, it becomes apparent that evidence gathered from cloud environments will be of high significance to litigation or criminal proceedings in the future. Within this work, we focus on the notion of cloud forensics by addressing the technical issues of forensics in all three major cloud service models and consider cross-disciplinary aspects. Moreover, we address the usability of various sources of evidence for investigative purposes and propose potential solutions to the issues from a practical standpoint. This work should be considered as a surveying discussion of an almost unexplored research area. The paper is organized as follows: we discuss the related work and the fundamental technical background information on digital forensics, cloud computing and the fault model in Sections II and III. In Section IV, we focus on the technical issues of cloud forensics and discuss the potential sources and nature of digital evidence as well as investigations in XaaS environments, including the cross-disciplinary aspects. We conclude in Section V.

II. RELATED WORK
Various works have been published in the field of cloud security and privacy [9], [35], [30] focussing on aspects of protecting data in multi-tenant, virtualized environments. Desired security characteristics for current cloud infrastructures mainly revolve around isolation of multi-tenant platforms [12], security of hypervisors in order to protect virtualized guest systems, and secure network infrastructures [32]. Albeit digital provenance, describing the ancestry of digital objects, still remains a challenging issue for cloud environments, several works have already been published in this field [8], [10] contributing to the issues of cloud forensics. Within this context, cryptographic proofs for verifying data integrity, mainly in cloud storage offers, have been proposed, yet they lack practical implementations [24], [37], [23]. Traditional computer forensics already has well-researched methods for various fields of application [4], [5], [6], [11], [13]. Also, the aspects of forensics in virtual systems have been addressed by several works [2], [3], [20], including the notion of virtual introspection [25].
In addition, the NIST has already addressed Web Service Forensics [22], which has a huge impact on investigation processes in cloud computing environments. In contrast, the aspects of forensic investigations in cloud environments have mostly been neglected by both the industry and the research community. One of the first papers focusing on this topic was published by Wolthusen [40], after Bebee et al. had already introduced problems within cloud environments [1]. Wolthusen stressed that there is an inherent strong need for interdisciplinary work linking the requirements and concepts of evidence arising from the legal field to what can be feasibly reconstructed and inferred algorithmically or in an exploratory manner. In 2010, Grobauer et al. [36] published a paper discussing the issues of incident response in cloud environments; unfortunately, no specific issues and solutions of cloud forensics were proposed, which will be done within this work.

III. TECHNICAL BACKGROUND
A. Traditional Digital Forensics
The notion of Digital Forensics is widely known as the practice of identifying, extracting and considering evidence from digital media. Unfortunately, digital evidence is both fragile and volatile and therefore requires the attention of special personnel and methods in order to ensure that evidence data can be properly isolated and evaluated. Normally, the process of a digital investigation can be separated into three different steps, each having its own specific purpose:

1) In the Securing Phase, the major intention is the preservation of evidence for analysis. The data has to be collected in a manner that maximizes its integrity. This is normally done by a bitwise copy of the original media. As can be imagined, this represents a huge problem in the field of cloud computing where you never know exactly where your data is and additionally do not have access to any physical hardware. However, the snapshot technology, discussed in section IV-B3, provides a powerful tool to freeze system states and thus makes digital investigations, at least in IaaS scenarios, theoretically possible.

2) We refer to the Analyzing Phase as the stage in which the data is sifted and combined. It is in this phase that the data from multiple systems or sources is pulled together to create as complete a picture and event reconstruction as possible. Especially in distributed system infrastructures, this means that bits and pieces of data are pulled together for deciphering the real story of what happened and for providing a deeper look into the data.

3) Finally, at the end of the examination and analysis of the data, the results of the previous phases will be reprocessed in the Presentation Phase. The report, created in this phase, is a compilation of all the documentation and evidence from the analysis stage. The main intention of such a report is that it contains all results and is complete and clear to understand.

Apparently, the success of these three steps strongly depends on the first stage. If it is not possible to secure the complete set of evidence data, no exhaustive analysis will be possible. However, in real-world scenarios often only a subset of the evidence data can be secured by the investigator. In addition, an important definition in the general context of forensics is the notion of a Chain of Custody. This chain clarifies how and where evidence is stored and who takes possession of it. Especially for cases which are brought to court it is crucial that the chain of custody is preserved.
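To make the Securing Phase and the chain-of-custody requirement above concrete, the following is a minimal, hypothetical Python sketch (not part of the original paper): it fingerprints an acquired evidence image with SHA-256 and appends a custody record to a log. The file paths, handler name, and record format are illustrative assumptions.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a (possibly large) evidence image in chunks to avoid loading it into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_custody(image: Path, handler: str, log: Path) -> dict:
    """Append a chain-of-custody entry (who, when, what, fingerprint) to a JSON-lines log."""
    entry = {
        "acquired_at": datetime.now(timezone.utc).isoformat(),
        "image": str(image),
        "sha256": sha256_of_file(image),
        "handler": handler,
    }
    with log.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry


if __name__ == "__main__":
    # Hypothetical paths; in practice the image would be a bitwise copy of the original medium.
    print(record_custody(Path("evidence/vm_disk.img"), handler="investigator-01", log=Path("custody.log")))
```

Re-hashing the image at any later point and comparing the digest against the logged value gives a simple integrity check before the Analyzing Phase begins.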
B. Cloud Computing
According to the NIST [16], cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications and services) that can be rapidly provisioned and released with minimal CSP interaction. This definition of cloud computing brought several new characteristics such as multi-tenancy, elasticity, pay-as-you-go and reliability. Within this work, the following three models are used: in the Infrastructure as a Service (IaaS) model, the customer is using the virtual machine provided by the CSP for installing his own system on it. The system can be used like any other physical computer with a few limitations. However, the additional customer power over the system comes along with additional security obligations. Platform as a Service (PaaS) offerings provide the capability to deploy application packages created using the virtual development environment supported by the CSP. For the efficiency of the software development process, this service model can act as an accelerator. In the Software as a Service (SaaS) model, the customer makes use of a service run by the CSP on a cloud infrastructure. In most of the cases this service can be accessed through an API or a thin client interface such as a web browser. Closed-source public SaaS offers such as Amazon S3 and Google Mail can only be used in the public deployment model, leading to further issues concerning security, privacy and the gathering of suitable evidence. Furthermore, two main deployment models, private and public cloud, have to be distinguished. Common public clouds are made available to the general public. The corresponding infrastructure is owned by one organization acting as a CSP and offering services to its customers. In contrast, the private cloud is exclusively operated for an organization but may not provide the scalability and agility of public offers. The additional notions of community and hybrid cloud are not exclusively covered within this work. However, independently from the specific model used, the movement of applications and data to the cloud comes along with limited control for the customer over the application itself, the data pushed into the applications and also over the underlying technical infrastructure.

C. Fault Model
Be it an account for a SaaS application, a development environment (PaaS) or a virtual image of an IaaS environment, systems in the cloud can be affected by inconsistencies. Hence, for both customer and CSP it is crucial to have the ability to assign faults to the causing party, even in the presence of Byzantine behavior [33]. Generally, inconsistencies can be caused by the following two reasons:

1) Maliciously Intended Faults
Internal or external adversaries with specific malicious intentions can cause faults on cloud instances or applications. Economic rivals as well as former employees can be the reason for these faults and pose a constant threat to customers and CSP. This model also includes a malicious CSP, albeit one assumed to be rare in real-world scenarios. Additionally, from the technical point of view, the movement of computing power to a virtualized, multi-tenant environment can pose further threats and risks to the systems. One reason for this is that if a single system or service in the cloud is compromised, all other guest systems and even the host system are at risk.
Hence, besides the need for further security measures, precautions for potential forensic investigations have to be taken into consideration.

2) Unintentional Faults
Inconsistencies in technical systems or processes in the cloud do not implicitly have to be caused by malicious intent. Internal communication errors or human failures can lead to issues in the services offered to the customer (i.e. loss or modification of data). Although these failures are not caused intentionally, both the CSP and the customer have a strong intention to discover the reasons and deploy corresponding fixes.

IV. TECHNICAL ISSUES
Digital investigations are about control of forensic evidence data. From the technical standpoint, this data can be available in three different states: at rest, in motion or in execution. Data at rest is represented by allocated disk space. Whether the data is stored in a database or in a specific file format, it allocates disk space. Furthermore, if a file is deleted, the disk space is de-allocated for the operating system but the data is still accessible, since the disk space has not been re-allocated and overwritten. This fact is often exploited by investigators, who explore this de-allocated disk space on hard disks. In case the data is in motion, data is transferred from one entity to another; e.g., a typical file transfer over a network can be seen as a data-in-motion scenario. Several encapsulated protocols contain the data, each leaving specific traces on systems and network devices which can in turn be used by investigators. Data can be loaded into memory and executed as a process. In this case, the data is neither at rest nor in motion but in execution. On the executing system, process information, machine instructions and allocated/de-allocated data can be analyzed by creating a snapshot of the current system state. In the following sections, we point out the potential sources of evidential data in cloud environments and discuss the technical issues of digital investigations in XaaS environments, as well as suggest several solutions to these problems.

A. Sources and Nature of Evidence
Concerning the technical aspects of forensic investigations, the amount of potential evidence available to the investigator strongly diverges between the different cloud service and deployment models. The virtual machine (VM), hosting in most of the cases the server application, provides several pieces of information that could be used by investigators. On the network level, network components can provide information about possible communication channels between the different parties involved. The browser on the client, often acting as the user agent for communicating with the cloud, also contains a lot of information that could be used as evidence in a forensic investigation. Independently from the model used, the following three components could act as sources of potential evidential data.

1) Virtual Cloud Instance: The VM within the cloud, where, for instance, data is stored or processes are handled, contains potential evidence [2], [3]. In most of the cases, it is the place where an incident happened and hence provides a good starting point for a forensic investigation. The VM instance can be accessed by both the CSP and the customer who is running the instance. Furthermore, virtual introspection techniques [25] provide access to the runtime state of the VM via the hypervisor, and snapshot technology supplies a powerful technique for the customer to freeze specific states of the VM.
Therefore, virtual instances can still be running during analysis, which leads to the case of live investigations [41], or can be turned off, leading to static image analysis. In SaaS and PaaS scenarios, the ability to access the virtual instance for gathering evidential information is highly limited or simply not possible.

2) Network Layer: Traditional network forensics is known as the analysis of network traffic logs for tracing events that have occurred in the past. Since the different ISO/OSI network layers provide various pieces of information on protocols and communication between instances within as well as with instances outside the cloud [4], [5], [6], network forensics is theoretically also feasible in cloud environments. However, in practice, ordinary CSP currently do not provide any log data from the network components used by the customer's instances or applications. For instance, in case of a malware infection of an IaaS VM, it will be difficult for the investigator to get any form of routing information and network log data in general, which is crucial for further investigative steps. This situation gets even more complicated in the case of PaaS or SaaS. So again, the situation of gathering forensic evidence is strongly affected by the support the investigator receives from the customer and the CSP.

3) Client System: On the system layer of the client, it completely depends on the used model (IaaS, PaaS, SaaS) if and where potential evidence could be extracted. In most of the scenarios, the user agent (e.g. the web browser) on the client system is the only application that communicates with the service in the cloud. This especially holds for SaaS applications which are used and controlled by the web browser. But also in IaaS scenarios, the administration interface is often controlled via the browser. Hence, in an exhaustive forensic investigation, the evidence data gathered from the browser environment [7] should not be omitted.

a) Browser Forensics: Generally, the circumstances leading to an investigation have to be differentiated: in ordinary scenarios, the main goal of an investigation of the web browser is to determine if a user has been the victim of a crime. In complex SaaS scenarios with high client-server interaction, this constitutes a difficult task. Additionally, customers make heavy use of third-party extensions [17] which can be abused for malicious purposes. Hence, the investigator might want to look for malicious extensions, searches performed, websites visited, files downloaded, information entered in forms or stored in local HTML5 stores, web-based email contents and persistent browser cookies for gathering potential evidence data. Within this context, it is essential to investigate the appearance of malicious JavaScript [18] leading to, e.g., unintended AJAX requests and hence modified usage of administration interfaces. Generally, the web browser contains a lot of electronic evidence data that could be used to give an answer to both of the above questions - even if the private mode is switched on [19].

B. Investigations in XaaS Environments
Traditional digital forensic methodologies permit investigators to seize equipment and perform detailed analysis on the media and data recovered [11]. In a distributed infrastructure organization like the cloud computing environment, investigators are confronted with an entirely different situation. They no longer have the option of seizing physical data storage.
Data and processes of the customer are dispersed over an undisclosed number of virtual instances, applications and network elements. Hence, it is questionable whether the established findings of the computer forensic community in the field of digital forensics have to be revised and adapted to the new environment. Within this section, specific issues of investigations in SaaS, PaaS and IaaS environments will be discussed. In addition, cross-disciplinary issues which affect several environments uniformly will be taken into consideration. We also suggest potential solutions to the mentioned problems.

1) SaaS Environments: Especially in the SaaS model, the customer does not obtain any control of the underlying operating infrastructure such as the network, servers, operating systems or the application that is used. This means that no deeper view into the system and its underlying infrastructure is provided to the customer. Only limited user-specific application configuration settings can be controlled, contributing to the evidence which can be extracted from the client (see Section IV-A3). In a lot of cases this urges the investigator to rely on high-level logs which are eventually provided by the CSP. Given the case that the CSP does not run any logging application, the customer has no opportunity to create any useful evidence through the installation of any toolkit or logging tool. These circumstances do not allow a valid forensic investigation and lead to the assumption that customers of SaaS offers do not have any chance to analyze potential incidents.

a) Data Provenance: The notion of Digital Provenance is known as meta-data that describes the ancestry or history of digital objects. Secure provenance that records ownership and process history of data objects is vital to the success of data forensics in cloud environments, yet it is still a challenging issue today [8]. Albeit data provenance is of high significance also for IaaS and PaaS, it poses a huge problem specifically for SaaS-based applications: current globally acting public SaaS CSP offer Single Sign-On (SSO) access control to the set of their services. Unfortunately, in case of an account compromise, most of the CSP do not offer any possibility for the customer to figure out which data and information has been accessed by the adversary. For the victim, this situation can have tremendous impact: if sensitive data has been compromised, it is unclear which data has been leaked and which has not been accessed by the adversary. Additionally, data could be modified or deleted by an external adversary or even by the CSP, e.g. due to storage reasons. The customer has no ability to prove otherwise. Secure provenance mechanisms for distributed environments can improve this situation but have not been practically implemented by CSP [10].

Suggested Solution: In private SaaS scenarios this situation is improved by the fact that the customer and the CSP are probably under the same authority. Hence, logging and provenance mechanisms could be implemented which contribute to potential investigations. Additionally, the exact location of the servers and the data is known at any time. Public SaaS CSP should offer additional interfaces for the purpose of compliance, forensics, operations and security matters to their customers. Through an API, the customers should have the ability to receive specific information such as access, error and event logs that could improve their situation in case of an investigation (a minimal sketch of the customer-side handling of such logs is given below).
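The following is a minimal, hypothetical illustration of the customer-side half of the interface suggested above. It assumes the log records have already been retrieved from a CSP API (no specific provider offers exactly this, and the record fields are invented) and preserves them in a hash-chained, append-only file so that later tampering with the archive is detectable and the records stay usable as evidence. This sketch is not part of the original paper.

```python
import hashlib
import json
from pathlib import Path


def preserve_logs(records: list[dict], archive: Path) -> str:
    """Append CSP-provided log records to a hash-chained JSON-lines archive.

    Each stored line carries the SHA-256 of the previous line, so later deletion
    or modification of any record breaks the chain and becomes detectable.
    """
    prev_hash = "0" * 64
    if archive.exists():
        *_, last = archive.read_text(encoding="utf-8").splitlines() or [""]
        if last:
            prev_hash = hashlib.sha256(last.encode("utf-8")).hexdigest()
    with archive.open("a", encoding="utf-8") as fh:
        for record in records:
            line = json.dumps({"prev": prev_hash, "record": record}, sort_keys=True)
            fh.write(line + "\n")
            prev_hash = hashlib.sha256(line.encode("utf-8")).hexdigest()
    return prev_hash  # fingerprint of the archive head after this batch


if __name__ == "__main__":
    # Illustrative records, standing in for access/error/event logs fetched from a CSP API.
    sample = [
        {"event": "login", "user": "alice", "ts": "2011-05-01T10:00:00Z"},
        {"event": "object_read", "user": "alice", "object": "report.pdf"},
    ]
    print(preserve_logs(sample, Path("csp_logs.chain")))
```

Verifying the archive later is a matter of re-hashing each stored line and comparing it against the "prev" field of the line that follows it.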
Furthermore, due to the limited ability to receive forensic information from the server and to prove the integrity of stored data in SaaS scenarios, the client has to contribute to this process. This could be achieved by implementing Proofs of Retrievability (POR), in which a verifier (client) is enabled to determine that a prover (server) possesses a file or data object and that it can be retrieved unmodified [24]. Provable Data Possession (PDP) techniques [37] could be used to verify that an untrusted server possesses the original data without the need for the client to retrieve it. Although these cryptographic proofs have not been implemented by any CSP, the authors of [23] introduced a new data integrity verification mechanism for SaaS scenarios which could also be used for forensic purposes.

2) PaaS Environments: One of the main advantages of the PaaS model is that the developed software application is under the control of the customer and, except for some CSP, the source code of the application does not have to leave the local development environment. Given these circumstances, the customer theoretically obtains the power to dictate how the application interacts with other dependencies such as databases, storage entities, etc. CSP normally claim this transfer is encrypted, but this statement can hardly be verified by the customer. Since the customer has the ability to interact with the platform over a prepared API, system states and specific application logs can be extracted. However, potential adversaries who compromise the application during runtime should not be able to alter these log files afterwards.

Suggested Solution: Depending on the runtime environment, logging mechanisms could be implemented which automatically sign and encrypt the log information before its transfer to a central logging server under the control of the customer. Additional signing and encrypting could prevent potential eavesdroppers from being able to view and alter log data information on the way to the logging server. Runtime compromise of a PaaS application by adversaries could be monitored by push-only mechanisms for log data, presupposing that the information needed to detect such an attack is logged. Increasingly, CSP offering PaaS solutions give developers the ability to collect and store a variety of diagnostics data in a highly configurable way with the help of runtime feature sets [38].

3) IaaS Environments: As expected, even virtual instances in the cloud get compromised by adversaries. Hence, the ability to determine how defenses in the virtual environment failed and to what extent the affected systems have been compromised is crucial, not only for recovering from an incident. Forensic investigations also gain leverage from such information and contribute to resilience against future attacks on the systems. From the forensic point of view, IaaS instances provide much more evidence data usable for potential forensics than PaaS and SaaS models do. This is due to the ability of the customer to install and set up the image for forensic purposes before an incident occurs. Hence, as proposed for PaaS environments, log data and other forensic evidence information could be signed and encrypted before it is transferred to third-party hosts, mitigating the chance that a maliciously motivated shutdown process destroys the volatile data. Although IaaS environments provide plenty of potential evidence, it has to be emphasized that the customer VM is in the end still under the control of the CSP.
The CSP controls the hypervisor, which is, for example, responsible for enforcing hardware boundaries and routing hardware requests among different VMs. Hence, besides the security responsibilities of the hypervisor, the CSP exerts tremendous control over how the customer's VMs communicate with the hardware and can theoretically intervene in processes executed on the hosted virtual instance through virtual introspection [25]. This could also affect encryption or signing processes executed on the VM and therefore lead to the leakage of the secret key. Although this risk can be disregarded in most of the cases, the impact on the security of high-security environments is tremendous.

a) Snapshot Analysis: Traditional forensics expects target machines to be powered down to collect an image (dead virtual instance). This situation completely changed with the advent of the snapshot technology, which is supported by all popular hypervisors such as Xen, VMware ESX and Hyper-V. A snapshot, also referred to as the forensic image of a VM, provides a powerful tool with which a virtual instance can be cloned in one click, including the running system's memory. Due to the invention of the snapshot technology, systems hosting crucial business processes do not have to be powered down for forensic investigation purposes. The investigator simply creates and loads a snapshot of the target VM for analysis (live virtual instance). This behavior is especially important for scenarios in which a downtime of a system is not feasible or practical due to existing SLAs. However, the information on whether the machine is running or has been properly powered down is crucial [3] for the investigation. Live investigations of running virtual instances are becoming more common, providing evidence data that…
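The suggested solution above for PaaS, and reused for IaaS, is to sign and encrypt log data on the instance before it is transferred to a logging server under the customer's control. The following Python sketch illustrates that idea only; it is not code from the paper, the third-party `cryptography` package (Fernet) is my choice for the encryption layer, and the keys, source name, and record format are assumptions.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone

from cryptography.fernet import Fernet  # third-party package: pip install cryptography

# Hypothetical keys; in practice they would be provisioned and held by the customer,
# not hard-coded in the application source.
SIGNING_KEY = b"customer-held-hmac-key"
ENCRYPTION_KEY = Fernet.generate_key()


def protect_log_entry(message: str, source: str) -> bytes:
    """Sign, then encrypt, a single log record before it leaves the PaaS/IaaS instance.

    The HMAC lets the customer detect later tampering; the Fernet layer keeps the
    content confidential from eavesdroppers on the way to the logging server.
    """
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "source": source,
        "message": message,
    }
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    signature = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest().encode("utf-8")
    return Fernet(ENCRYPTION_KEY).encrypt(payload + b"." + signature)


if __name__ == "__main__":
    token = protect_log_entry("admin login from 203.0.113.7", source="app-frontend")
    # Shipping the token to the customer-controlled logging server (e.g. push-only over
    # HTTPS) is omitted here; the server decrypts and re-checks the HMAC on receipt.
    print(token[:60])
```

On receipt, the customer-controlled logging server would decrypt the token and re-check the HMAC, so that log entries altered in transit, or by an adversary who later compromises the instance, become detectable.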


云计算大数据外文翻译文献(文档含英文原文和中文翻译)原文:Meet HadoopIn pioneer days they used oxen for heavy pulling, and when one ox couldn’t budge a log, they didn’t try to grow a larger ox. We shouldn’t be trying for bigger computers, but for more systems of computers.—Grace Hopper Data!We live in the data age. It’s not easy to measure the total volume of data stored electronically, but an IDC estimate put the size of the “digital universe” at 0.18 zettabytes in2006, and is forecasting a tenfold growth by 2011 to 1.8 zettabytes. A zettabyte is 1021 bytes, or equivalently one thousand exabytes, one million petabytes, or one billion terabytes. That’s roughly the same order of magnitude as one disk drive for every person in the world.This flood of data is coming from many sources. Consider the following:• The New York Stock Exchange generates about one terabyte of new trade data perday.• Facebook hosts approximately 10 billion photos, taking up one petabyte of storage.• , the genealogy site, stores around 2.5 petabytes of data.• The Internet Archive stores around 2 petabytes of data, and is growing at a rate of20 terabytes per month.• The Large Hadron Collider near Geneva, Switzerland, will produce about 15 petabytes of data per year.So there’s a lot of data out there. But you are probably wondering how it affects you.Most of the data is locked up in the largest web properties (like search engines), orscientific or financial institutions, isn’t it? Does the advent of “Big Data,” as it is being called, affect smaller organizations or individuals?I argue that it does. Take photos, for example. My wife’s grandfather was an avid photographer, and took photographs throughout his adult life. His entire corpus of medium format, slide, and 35mm film, when scanned in at high-resolution, occupies around 10 gigabytes. Compare this to the digital photos that my family took last year,which take up about 5 gigabytes of space. My family is producing photographic data at 35 times the rate my wife’s grandfather’s did, and the rate is increasing every year as it becomes easier to take more and more photos.More generally, the digital streams that individuals are producing are growing apace. Microsoft Research’s MyLifeBits project gives a glimpse of archiving of personal in formation that may become commonplace in the near future. MyLifeBits was an experiment where an individual’s interactions—phone calls, emails, documents were captured electronically and stored for later access. The data gathered included a photo taken every minute, which resulted in an overall data volume of one gigabyte a month. When storage costs come down enough to make it feasible to store continuous audio and video, the data volume for a future MyLifeBits service will be many times that.The trend is f or every individual’s data footprint to grow, but perhaps more importantly the amount of data generated by machines will be even greater than that generated by people. Machine logs, RFID readers, sensor networks, vehicle GPS traces, retail transactions—all of these contribute to the growing mountain of data.The volume of data being made publicly available increases every year too. 
Organizations no longer have to merely manage their own data: success in the future will be dictated to a large extent by their ability to extract value from other organizations’ data.Initiatives such as Public Data Sets on Amazon Web Services, , and exist to foster the “information commons,” where data can be freely (or in the case of AWS, for a modest price) shared for anyone to download and analyze. Mashups between different information sources make for unexpected and hitherto unimaginable applications.Take, for example, the project, which watches the Astrometry groupon Flickr for new photos of the night sky. It analyzes each image, and identifies which part of the sky it is from, and any interesting celestial bodies, such as stars or galaxies. Although it’s still a new and experimental service, it shows the kind of things that are possible when data (in this case, tagged photographic images) is made available andused for something (image analysis) that was not anticipated by the creator.It has been said that “More data usually beats better algorithms,” which is to say that for some problems (such as recommending movies or music based on past preferences),however fiendish your algorithms are, they can often be beaten simply by having more data (and a less sophisticated algorithm).The good news is that Big Data is here. The bad news is that we are struggling to store and analyze it.Data Storage and AnalysisThe problem is simple: while the storage capacities of hard drives have increased massively over the years, access speeds--the rate at which data can be read from drives--have not kept up. One typical drive from 1990 could store 1370 MB of data and had a transfer speed of 4.4 MB/s, so you could read all the data from a full drive in around five minutes. Almost 20years later one terabyte drives are the norm, but the transfer speed is around 100 MB/s, so it takes more than two and a half hours to read all the data off the disk.This is a long time to read all data on a single drive and writing is even slower. The obvious way to reduce the time is to read from multiple disks at once. Imagine if we had 100 drives, each holding one hundredth of the data. Working in parallel, we could read the data in under two minutes.Only using one hundredth of a disk may seem wasteful. But we can store one hundred datasets, each of which is one terabyte, and provide shared access to them. We can imagine that the users of such a system would be happy to share access in return for shorter analysis times, and, statistically, that their analysis jobs would be likely to be spread over time, so they wouldn`t interfere with each other too much.There`s more to being able to read and write data in parallel to or from multiple disks, though. The first problem to solve is hardware failure: as soon as you start using many pieces of hardware, the chance that one will fail is fairly high. A common way of avoiding data loss is through replication: redundant copies of the data are kept by the system so that in the event of failure, there is another copy available. This is how RAID works, for instance, although Hadoop`s filesystem, the Hadoop Distributed Filesystem (HDFS),takes a slightly different approach, as you shall see later. The second problem is that most analysis tasks need to be able to combine the data in some way; data read from one disk may need to be combined with the data from any of the other 99 disks. 
Various distributed systems allow data to be combined from multiple sources, but doing this correctly is notoriously challenging. MapReduce provides a programming model that abstracts the problem from disk reads and writes, transforming it into a computation over sets of keys and values. We will look at the details of this model in later chapters, but the important point for the present discussion is that there are two parts to the computation, the map and the reduce, and it's the interface between the two where the "mixing" occurs. Like HDFS, MapReduce has reliability built-in.

This, in a nutshell, is what Hadoop provides: a reliable shared storage and analysis system. The storage is provided by HDFS, and analysis by MapReduce. There are other parts to Hadoop, but these capabilities are its kernel.

Comparison with Other Systems
The approach taken by MapReduce may seem like a brute-force approach. The premise is that the entire dataset—or at least a good portion of it—is processed for each query. But this is its power. MapReduce is a batch query processor, and the ability to run an ad hoc query against your whole dataset and get the results in a reasonable time is transformative. It changes the way you think about data, and unlocks data that was previously archived on tape or disk. It gives people the opportunity to innovate with data. Questions that took too long to get answered before can now be answered, which in turn leads to new questions and new insights.

For example, Mailtrust, Rackspace's mail division, used Hadoop for processing email logs. One ad hoc query they wrote was to find the geographic distribution of their users. In their words: This data was so useful that we've scheduled the MapReduce job to run monthly and we will be using this data to help us decide which Rackspace data centers to place new mail servers in as we grow. By bringing several hundred gigabytes of data together and having the tools to analyze it, the Rackspace engineers were able to gain an understanding of the data that they otherwise would never have had, and, furthermore, they were able to use what they had learned to improve the service for their customers. You can read more about how Rackspace uses Hadoop in Chapter 14.

RDBMS
Why can't we use databases with lots of disks to do large-scale batch analysis? Why is MapReduce needed? The answer to these questions comes from another trend in disk drives: seek time is improving more slowly than transfer rate. Seeking is the process of moving the disk's head to a particular place on the disk to read or write data. It characterizes the latency of a disk operation, whereas the transfer rate corresponds to a disk's bandwidth. If the data access pattern is dominated by seeks, it will take longer to read or write large portions of the dataset than streaming through it, which operates at the transfer rate. On the other hand, for updating a small proportion of records in a database, a traditional B-Tree (the data structure used in relational databases, which is limited by the rate at which it can perform seeks) works well. For updating the majority of a database, a B-Tree is less efficient than MapReduce, which uses Sort/Merge to rebuild the database.

In many ways, MapReduce can be seen as a complement to an RDBMS. (The differences between the two systems are shown in Table 1-1.) MapReduce is a good fit for problems that need to analyze the whole dataset, in a batch fashion, particularly for ad hoc analysis.
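Before continuing the comparison with an RDBMS, here is a small, self-contained Python sketch (not from the book) that imitates the map-and-reduce flow just described for a word count: the map step emits key-value pairs, the "mixing" step groups values by key, and the reduce step sums them. A real Hadoop job would express the same two functions through the MapReduce API or Hadoop Streaming; this simulation only illustrates the programming model.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: turn one input record (a line of text) into (word, 1) pairs."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """The 'mixing' between map and reduce: group all emitted values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


if __name__ == "__main__":
    records = ["the quick brown fox", "the lazy dog", "the quick dog"]
    mapped = (pair for line in records for pair in map_phase(line))
    counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
    print(counts)  # {'the': 3, 'quick': 2, 'brown': 1, 'fox': 1, 'lazy': 1, 'dog': 2}
```

Because the map and reduce functions see only keys and values, the same two functions would work unchanged whether the input is three lines or three terabytes; the framework, not the programmer, decides how the records are split across machines.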
An RDBMS is good for point queries or updates, where the dataset has been indexed to deliver low-latency retrieval and update times of a relatively small amount of data. MapReduce suits applications where the data is written once, and read many times, whereas a relational database is good for datasets that are continually updated.

Table 1-1. RDBMS compared to MapReduce
             Traditional RDBMS            MapReduce
Data size    Gigabytes                    Petabytes
Access       Interactive and batch        Batch
Updates      Read and write many times    Write once, read many times
Structure    Static schema                Dynamic schema
Integrity    High                         Low
Scaling      Nonlinear                    Linear

Another difference between MapReduce and an RDBMS is the amount of structure in the datasets that they operate on. Structured data is data that is organized into entities that have a defined format, such as XML documents or database tables that conform to a particular predefined schema. This is the realm of the RDBMS. Semi-structured data, on the other hand, is looser, and though there may be a schema, it is often ignored, so it may be used only as a guide to the structure of the data: for example, a spreadsheet, in which the structure is the grid of cells, although the cells themselves may hold any form of data. Unstructured data does not have any particular internal structure: for example, plain text or image data. MapReduce works well on unstructured or semistructured data, since it is designed to interpret the data at processing time. In other words, the input keys and values for MapReduce are not an intrinsic property of the data, but they are chosen by the person analyzing the data.

Relational data is often normalized to retain its integrity and remove redundancy. Normalization poses problems for MapReduce, since it makes reading a record a nonlocal operation, and one of the central assumptions that MapReduce makes is that it is possible to perform (high-speed) streaming reads and writes. A web server log is a good example of a set of records that is not normalized (for example, the client hostnames are specified in full each time, even though the same client may appear many times), and this is one reason that logfiles of all kinds are particularly well suited to analysis with MapReduce.

MapReduce is a linearly scalable programming model. The programmer writes two functions—a map function and a reduce function—each of which defines a mapping from one set of key-value pairs to another. These functions are oblivious to the size of the data or the cluster that they are operating on, so they can be used unchanged for a small dataset and for a massive one. More importantly, if you double the size of the input data, a job will run twice as slow. But if you also double the size of the cluster, a job will run as fast as the original one. This is not generally true of SQL queries.

Over time, however, the differences between relational databases and MapReduce systems are likely to blur, both as relational databases start incorporating some of the ideas from MapReduce (such as Aster Data's and Greenplum's databases) and, from the other direction, as higher-level query languages built on MapReduce (such as Pig and Hive) make MapReduce systems more approachable to traditional database programmers.

Grid Computing
The High Performance Computing (HPC) and Grid Computing communities have been doing large-scale data processing for years, using such APIs as Message Passing Interface (MPI).
Broadly, the approach in HPC is to distribute the work across a cluster of machines, which access a shared filesystem, hosted by a SAN. This works well for predominantly compute-intensive jobs, but becomes a problem when nodes need to access larger data volumes (hundreds of gigabytes, the point at which MapReduce really starts to shine), since the network bandwidth is the bottleneck, and compute nodes become idle.MapReduce tries to colocate the data with the compute node, so data access is fast since it is local. This feature, known as data locality, is at the heart of MapReduce and is the reason for its good performance. Recognizing that network bandwidth is the most precious resource in a data center environment (it is easy to saturate network links by copying data around),MapReduce implementations go to great lengths to preserve it by explicitly modelling network topology. Notice that this arrangement does not preclude high-CPU analyses in MapReduce.MPI gives great control to the programmer, but requires that he or she explicitly handle the mechanics of the data flow, exposed via low-level C routines and constructs, such as sockets, as well as the higher-level algorithm for the analysis. MapReduce operates only at the higher level: the programmer thinks in terms of functions of key and value pairs, and the data flow is implicit.Coordinating the processes in a large-scale distributed computation is a challenge. The hardest aspect is gracefully handling partial failure—when you don’t know if a remote process has failed or not—and still making progress with the overall computation. MapReduce spares the programmer from having to think about failure, since the implementation detects failed map or reduce tasks and reschedules replacements on machines that are healthy. MapReduce is able to do this since it is a shared-nothing architecture, meaning that tasks have no dependence on one other. (This is a slight oversimplification, since the output from mappers is fed to the reducers, but this is under the control of the MapReduce system; in this case, it needs to take more care rerunning a failed reducer than rerunning a failed map, since it has to make sure it can retrieve the necessary map outputs, and if not, regenerate them by running the relevant maps again.) So from the programmer’s point of view, the order in which the tasks run doesn’t matter. By contrast, MPI programs have to explicitly manage their own checkpointing and recovery, which gives more control to the programmer, but makes them more difficult to write.MapReduce might sound like quite a restrictive programming model, and in a sense itis: you are limited to key and value types that are related in specified ways, and mappers and reducers run with very limited coordination between one another (the mappers pass keys and values to reducers). A natural question to ask is: can you do anything useful or nontrivial with it?The answer is yes. MapReduce was invented by engineers at Google as a system for building production search indexes because they found themselves solving the same problem over and over again (and MapReduce was inspired by older ideas from the functional programming, distributed computing, and database communities), but it has since been used for many other applications in many other industries. It is pleasantly surprising to see the range of algorithms that can be expressed in MapReduce, from image analysis, to graph-based problems,to machine learning algorithms. 
It can’t solve every problem, of course, but it is a general data-processing tool.You can see a sample of some of the applications that Hadoop has been used for in Chapter 14.Volunteer ComputingWhen people first hear about Hadoop and MapReduce, they often ask, “How is it different from SETI@home?” SETI, the Search for Extra-Terrestrial Intelligence, runs a project called SETI@home in which volunteers donate CPU time from their otherwise idle computers to analyze radio telescope data for signs of intelligent life outside earth. SETI@home is the most well-known of many volunteer computing projects; others include the Great Internet Mersenne Prime Search (to search for large prime numbers) and Folding@home (to understand protein folding, and how it relates to disease).Volunteer computing projects work by breaking the problem they are trying to solve into chunks called work units, which are sent to computers around the world to be analyzed. For example, a SETI@home work unit is about 0.35 MB of radio telescope data, and takes hours or days to analyze on a typical home computer. When the analysis is completed, the results are sent back to the server, and the client gets another work unit. As a precaution to combat cheating, each work unit is sent to three different machines, and needs at least two results to agree to be accepted.Although SETI@home may be superficially similar to MapReduce (breaking a problem into independent pieces to be worked on in parallel), there are some significant differences. The SETI@home problem is very CPU-intensive, which makes it suitable for running on hundreds of thousands of computers across the world, since the time to transfer the work unit is dwarfed by the time to run the computation on it. Volunteers are donating CPU cycles, not bandwidth.MapReduce is designed to run jobs that last minutes or hours on trusted, dedicated hardware running in a single data center with very high aggregate bandwidth interconnects. By contrast, SETI@home runs a perpetual computation on untrusted machines on the Internet with highly variable connection speeds and no data locality.译文:初识Hadoop古时候,人们用牛来拉重物,当一头牛拉不动一根圆木的时候,他们不曾想过培育个头更大的牛。
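To make the two-part computation described in the English excerpt above concrete, the following is a minimal, illustrative word-count job written against the Hadoop Java MapReduce API. The class names and the word-count task itself are my own choices rather than anything taken from the excerpt; the point is only that the framework sorts and groups the mapper’s output by key before handing it to the reducer, which is where the “mixing” between the two phases happens.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map phase: for each input line, emit (word, 1) pairs.
class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        for (String token : line.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }
}

// Reduce phase: the framework has already grouped the 1s by word,
// so the reducer only has to add them up.
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : counts) {
            sum += count.get();
        }
        context.write(word, new IntWritable(sum));
    }
}
```

A driver class would wire these two into a Job and submit it to the cluster; that part is omitted here for brevity.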

Cloud Computing: Foreign Literature and Translation

1. Introduction
Cloud computing is an Internet-based model of computing that delivers a variety of services through shared computing resources. As cloud computing has become widespread, many researchers have studied the field in depth. This piece introduces a foreign-language paper on cloud computing and provides a corresponding translation.

2. Overview of the Paper
Authors: Antonio Fernández Anta, Chryssis Georgiou, Evangelos Kranakis. Year of publication: 2019. The paper surveys the development and application of cloud computing. It introduces the basic concepts of cloud computing, including its characteristics, architecture, and service models, as well as its challenges and outlook.

3. Research Content
The survey reviews the basic concepts and enabling technologies of cloud computing. It first defines cloud computing and compares it with traditional computing, examining its advantages and shortcomings in depth. It then describes the cloud computing architecture, covering cloud service providers, cloud service consumers, and the basic components of cloud services. After the architectural overview, it presents the three service models: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). Each model is described in terms of its definition, characteristics, and application cases, giving the reader a deeper understanding. The paper also discusses the challenges facing cloud computing, including security, privacy protection, performance, and reliability, and considers its prospects and future directions.

4. Translation of the Paper
"Cloud Computing: A Survey" is a comprehensive introduction to cloud computing. It explains the definition, architecture, and service models of cloud computing in detail and discusses its advantages, shortcomings, and challenges. It also makes predictions about the future development of cloud computing. For readers studying cloud computing and related fields, the paper is a useful reference: it helps them understand the basic concepts, architecture, and service models, and prompts them to think about the challenges cloud computing faces and how to address them.

5. Conclusion.

Beyond the Desktop: An Introduction to Cloud Computing (graduation thesis foreign literature translation)

超越台式机一个关于云计算的介绍毕业论文外文翻译翻译部分英文原文 Beyond the Desktop: An Introduction to Cloud Computing Michael Miller In a world that sees new technological trends bloom and fade on almost a dailybasis one new trend promises more longevity. This trend is called cloud computingand it will change the way you use your computer and the Internet. Cloud computing portends a major change in how we store information and runapplications. Instead of running program sand data on an individual desktop computereverything is hosted in the “cloud”—a nebulous assemblage of computers and serversaccessed via the Internet. Cloud computing lets you access all your applications anddocuments from anywhere in the world freeing you from the confines of the desktopand making it easier for group members in different locations to collaborate.PART 1 Understanding Cloud Computing The emergence of cloud computing is the computing equivalent of the electricityrevolution of a century ago. Before the advent of electrical utilities every farm andbusiness produced its own electricity from freestanding generators. After the electricalgrid was created farms and businesses shut down their generators and boughtelectricity from the utilities at a much lower price and with much greater reliabilitythan they could produce on their own. Look for the same type of revolution to occur as cloud computing takes hold.The desktop-centric notion of computing that we hold today is bound to fall by thewayside as we come to expect the universal access 24/7 reliability andubiquitouscollaboration promised by cloud computing. It is the way of the future.Cloud Computing: What It Is—and What It Isn’t With traditional desktopcomputing you run copies of software programs on eachcomputer you own. The documents you create are stored on the computer on whichthey were created. Although documents can be accessed from other computers on thenetwork they can’t be accessed by computers outside the network. The whole scene is PC-centric. With cloud computing the software programs you use aren’t run from yourpersonal computer but are rather stored on servers accessed via the Internet. If yourcomputer crashes the software is still available for others to use. Same goes for thedocuments you create they’re stored on a collection of servers accessed via theInternet. Anyone with permission can not only access the documents but can also editand collaborate on those documents in real time. Unlike traditional computing thiscloud computing model isn’t PC-centric it’sdocument-centric. Which PC you use to access a document simplyisn’t important. Butthat’s a simplification. Let’s look in more detail at what cloud computingis—and just asimportant what it isn’t.What Cloud Computing Isn’t First cloud computing isn’t network computing. With networkcomputing applications/documents are hosted on a single company’s server and accessed overthe company’s network. Cloud computing is a lot biggerthan that. It encompassesmultiple companies multiple servers andmultiple networks. Plus unlike networkcomputing cloud services and storage are accessible from anywhere in the world overan Internet connection with network computing access is over the company’snetwork only. Cloud computing also isn’t traditional outsourcing where a company farms outsubcontracts its computing services to an outside firm. While an outsourcing firmmight host a company’s data or applications those documents and programs are onlyaccessible to the company’s employees via the company’s network not to the entireworld via the Internet. 
So despite superficial similarities networking computing and outsourcing are notcloud computing.What Cloud Computing Is Key to the definition of cloud computing is the “cloud” itself. For our purposesthe cloud is a large group of interconnected computers. These computers can bepersonal computers ornetwork servers they can be public or private. For example Google hosts a cloud that consists of both smallish PCs and largerservers. Google’s cloud is a private one that is Google owns it that is publiclyaccessible by Google’s users. This cloud of computers extends beyond a single company or enterprise. Theapplications and data served by the cloud are available to broad group of userscross-enterprise and cross-platform. Access is via the Internet. Any authorized usercan access these docs and apps from any computer over any Internet connection. Andto the user the technology and infrastructure behind the cloud is invisible. It isn’t apparent and in most casesdoesn’t matter whether cloud services arebased on HTTP HTML XML JavaScript or other specific technologies. _ Cloud computing is user-centric. Once you as a user are connected to the cloudwhatever is stored there—documents messages images applicationswhatever—becomes yours. In addition not only is the data yours but you can alsoshare it with others. In effect any device that accesses your data in the cloud alsobecomes yours. _ Cloud computing is task-centric. Instead offocusing on the application andwhat it can do the focus is on what you need done and how the application can do itfor you. Traditional applications—wordprocessing spreadsheets email and soon—are becoming less important than thedocuments they create.PART 2 Understanding Cloud Computing _ Cloud computing is powerful. Connecting hundreds or thousands of computerstogether in a cloud creates a wealth of computing power impossible with a singledesktop PC. _ Cloud computing is accessible. Because data is stored in the cloud users caninstantly retrieve more information from multiple repositories. You’re not limited t o asingle source of data asyou are with a desktop PC. _ Cloud computing is intelligent. Withall the various data stored on thecomputers in a cloud data mining and analysis are necessary to access thatinformation in an intelligent manner. _ Cloud computing is programmable. Many of the tasks necessary with cloudcomputing must be automated. For example to protect theintegrity of the datainformation stored on a single computer in the cloud must be replicated on othercomputers in the cloud. If that one compu ter goes offline the cloud’sprogrammingautomatically redistributesthat computer’s data to a new computer in the cloud. All these definitions behind us what constitutes cloud computing in the realworld As you’ll learn throughout this book a raft of web-hostedInternet-accessibleGroup-collaborative applications are currently available with many more on the way.Perhaps the best and most popular examples of cloud computing applications todayare the Google family of applications—Google Docs amp SpreadsheetsGoogleCalendar Gmail Picasa and the like. All of these applications are hosted on Google’sservers are accessible to any user with an Internet connection and can be used forgroup collaboration from anywhere in the world. In short cloud computing enables a shift from the computer to the user fromapplications to tasks and from isolated data to datathat can be accessed fromanywhere and shared with anyone. 
The user no longer has to take on the task of datamanagement he doesn’t even have to remember where the data is. All that matters isthat the data is in the cloud and thus immediately available to that user and to otherauthorized users.From Collaboration to the Cloud: A Short History of Cloud Computing Cloud computing has as its antecedents bothclient/server computing andpeer-to-peer distributed computing. It’s all a matter of how centralizedstoragefacilitates collaboration and how multiple computers work together to increasecomputing power.Client/Server Computing: Centralized Applications and Storage In the antediluvian days of computing pre-1980 or so everything operated ontheclient/server model. All the software applications all the data and all the controlresided on huge mainframe computers otherwise known as servers. If a user wantedto access specific data or run a program he had to connect to the mainframe gainappropriate access and then do his business while essentially “renting” the programor data from the server. Users connected to the server via a computer terminal sometimes called aworkstation or client. This computer was sometimes called a dumb terminal because itdidn’t have a lot if any memory storage space or processing power. It was merelya device that connected the user to and enabled him to use the mainframe computer. Users accessed the mainframe only when granted permission and the informationtechnology IT staff weren’t in the habit of handing out access casually. Even on amainframe computer processing power is limited—and the IT staff were theguardians of that power. Access was not immediate nor could two users access thesame data at the same time. Beyond that users pretty much had to take whatever the IT staff gavethem—with no variations. Want to customize a reportto show only a subset of thenormal information Can’t do it. Want to create a new report to look at some new dataYou can’t do it although the IT staff can—but on their schedulewhich might beweeks from now. The fact is when multiple people are sharing a single computer even if thatcomputer is a huge mainframe you have to wait your turn. Need to rerun a financialreport No problem—if you don’t mind waiting until this afternoon ortomorrowmorning. There isn’t always immediate access in aclient/server environment andseldom is there immediate gratification. So the client/server model while providing similar centralized storage differedfrom cloud computing in that it did not have a user-centric focus with client/servercomputing all the control rested with the mainframe—and with the guardians of thatsingle computer. It was not a user-enabling environment.Peer-to-Peer Computing: Sharing Resources As you can imagine accessing a client/server system was kind of a “hurry up andwait” experience. The server part of the system also created a huge bottleneck. Allcommunications between computers had to go through the server first howeverinefficient that might be. The obvious need to connect one computer to another without first hitting theserver led to the development of peer-to-peer P2P computing. P2P computingdefines a network architecture in which each computer has equivalent capabilities andresponsibilities. This is in contrast to the traditionalclient/server network architecturein which one or more computers are dedicated to serving the others. This relationshipis sometimes characterized as a master/slave relationship with the central server asthe master and the client computer as the slave. P2P was an equalizing concept. 
In the P2P environment every computer is aclient and a serverthere are no masters and slaves. By recognizing all computers onthe network as peers P2P enables direct exchange of resources and services. There isno need for a central server because any computer can function in that capacity whencalled on to do so. P2P was also a decentralizing concept. Control is decentralized with allcomputers functioning as equals. Content is also dispersed among the various peercomputers. No centralized server is assigned to host the available resources andservices. Perhaps the most notable implementation of P2P computing is the Internet. Manyof today’s usersforget or never knew that the Internet was initially conceived underits original ARPAnet guise as a peer-to-peer system that would share computingresources across the United States. The various ARPAnet sites—and there weren’tmany of them—were connectedtogether not as clients and servers but as equals. The P2P nature of the early Internet was best exemplified by the Usenet enet which was created back in 1979 was anetwork of computers accessed viathe Internet each of which hosted the entire contents of the network. Messages werepropagated between the peer computers users connecting to any single Usenet serverhad access to all or substantially all the messages posted to each individualserver.Although the users’ connection to the U senet server was of the traditionalclient/server nature the relationship between the Usenet servers was definitelyP2P—and presaged the cloud computing of today. That said not every part of the Internet is P2P in nature. With thedevelopment ofthe World Wide Web came a shift away from P2P back to the client/server model. Onthe web each website is served up by a group of computers and sites’ visitors useclient software web browsers to access it. Almost all content is centralized allcontrol is centralized and the clients have no autonomy or control in the process.Distributed Computing: Providing More Computing Power One of the most important subsets of the P2P model is that ofdistributedcomputing where idle PCs across a network or across the Internet are tapped toprovide computing power for large processor-intensive projects. It’s a simpleconceptall about cycle sharing between multiple computers. Apersonal computer running full-out 24 hours a day 7 days a week is capableof tremendous computing power. Most people don’t use their computers 24/7however so a good portion of a computer’s resources go unused. Distributedcomputing uses those resources. When a computer is enlisted for a distributed computing project software isinstalled on the machine to run various processing activities during those periodswhenthe PC is typically unused. The results of that spare-time processing areperiodically uploaded to the distributed computing network and combined withsimilar results from other PCs in the project. The resultif enough computers areinvolved simulates the processing power of much larger mainframes andsupercomputers—which is necessary for some very large and complexcomputingprojects. For example genetic research requires vast amounts of computing power. Left totraditional means it might take years to solve essential mathematical problems. Byconnecting together thousands or millions of individual PCs more power is appliedto the problem and the results are obtained that much sooner. 
Distributed computing dates back to 1973 when multiple computers werenetworked together at the Xerox PARC labs and worm software was developed tocruise through the network looking for idle resources. A more practical application ofdistributed computing appeared in 1988 when researchers at the DEC DigitalEquipment Corporation System Research Center developed software that distributedthe work to factor large numbers among workstations within their laboratory. By 1990a group of about 100 users utilizing this software had factored a 100-digit number.By 1995 this same effort had been expanded to the web to factor a 130-digit number.It wasn’t long before distributed computing hit the Internet. The first majorInternet-based distributed computing project was launched in 1997which employed thousands of personal computers to crack encryption codes. Evenbigger was SETIhome launched in May 1999 which linked together millions ofindividual computers to search forintelligent life in outer space. Many distributedcomputing projects are conducted within large enterprises using traditional networkconnections to form the distributed computing network. Other larger projects utilizethe computers of everyday Internet users with the computing typically taking placeoffline and then uploaded once a day viatraditional consumer Internet connections.Understanding Cloud Architecture The key to cloud computing is the“cloud”—a massive network of servers or evenindividual PCs interconnected in a grid.These computers run in parallel combiningthe resources of each to generatesupercomputing-like po.。

Cloud Technology and Services: Foreign Literature Translation with Chinese-English Parallel Text

Translation: Leveraging the Emerging Frontier of Cloud Technology and Services for Asset Optimization (the document contains the English original and a Chinese translation)

Abstract. Maximizing return on investment is a major focus for every company, and information is seen as the means of doing so. That information is used to track performance and improve financial results, chiefly by optimizing the utilization of company assets. The capacity and speed with which current technology can collect information and distribute it across the organization keep increasing and have, in fact, outpaced the industry's ability to absorb and use it. Today, production operators are drowning in data as a result of an improved ability to monitor assets: intelligent motor protection, smart instruments, and condition-monitoring systems often deliver more than 32 pieces of information per device, each with associated alarms, and operators are frequently not equipped to understand or act on that information. Production companies need to take full advantage of subject-matter expertise dedicated to this purpose by locating their engineering staff in regional centers, and those engineers need enough knowledge to understand the alerts and alarms raised by these intelligent devices and to take appropriate action. The available information can help find ways to increase production, reduce unplanned maintenance, and ultimately reduce downtime. However, finding information in real time, or getting useful information in real time without spending significant non-productive time making the data useful, is a huge challenge. This paper introduces cloud technology as an economical way to acquire, visualize, and report condition-based data. It then discusses using cloud technology to make engineering resources and field data available in a secure format accessible through a web browser. We cover approaches to asset optimization with multiple cloud services and the benefits seen in pilots and projects.

When heavy-industry companies are viewed globally, world-class operations achieve an Overall Equipment Effectiveness (OEE) score of 91 percent. Historically, the oil and gas industry has lagged that score by more than ten points (Aberdeen Group, "Operational Risk Management," October 2011). OEE is the product of the quality, availability, and efficiency scores, and of these, availability appears to affect the oil and gas industry the most.
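For reference, and as an addition of mine rather than part of the translated abstract, the OEE score mentioned above is conventionally computed as a product of three factors:

```latex
\mathrm{OEE} = \text{Availability} \times \text{Performance (efficiency)} \times \text{Quality}
% Example combination only: 0.95 \times 0.96 \times 1.00 \approx 0.91,
% i.e. roughly the "world class" 91 percent level cited above.
```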

Deeper study of the root causes behind oil and gas availability scores points to rotating assets as the root cause in 70% of cases of lost production or unplanned downtime. Given the industry's struggle with critical asset failures, there are nevertheless methods that can help drive effectiveness scores higher and meet operating-efficiency targets. Over the next decade, the pursuit of offshore oil reserves will involve complex extraction methods and extensive use of subsea extraction technology.

New Technology: Cloud Computing Foreign Literature

Outline:
1. Introduction: (1) research background; (2) purpose and significance.
2. Basic concepts of cloud computing: (1) definition; (2) characteristics.
3. Key technologies: (1) virtualization; (2) distributed computing; (3) big data processing; (4) automated management.
4. Deployment models: (1) public cloud; (2) private cloud; (3) hybrid cloud; (4) community cloud.
5. Service models: (1) Infrastructure as a Service (IaaS); (2) Platform as a Service (PaaS); (3) Software as a Service (SaaS).
6. Security issues: (1) data privacy; (2) identity authentication; (3) access control; (4) data backup and recovery.
7. Applications by industry: (1) education; (2) healthcare; (3) finance; (4) manufacturing.
8. Challenges and development trends: (1) data security and privacy protection; (2) incomplete laws and regulations; (3) development and certification of technical standards; (4) development trends.
9. Conclusion: (1) summary; (2) research outlook.

Attachments: the attachments to this document include the relevant statistics and charts.

Legal terms and notes:
- Data privacy: the right of an individual or organization, when using cloud computing services, to have its data kept from being obtained or used by unauthorized third parties.
- Identity authentication: the process of confirming a user's identity in a cloud computing environment, to ensure that access is legitimate and secure.
- Access control: authorizing and managing access to cloud computing resources so as to protect them from unauthorized access.
- Data backup and recovery: copying data to another physical location to prevent data loss, and restoring it when needed.
- Public cloud: Internet-based computing resources and services offered by a third-party provider for use by the general public.
- Private cloud: cloud computing infrastructure used within a single organization and designed to meet that organization's particular needs.
- Hybrid cloud: a deployment model that combines public and private clouds to satisfy different requirements.
- Community cloud: cloud resources and services used by a specific community of users, usually provided by a government or an industry organization.

Cloud Computing Foreign Translation: Original Text

Implementation Issues of A Cloud Computing PlatformBo Peng,Bin Cui and Xiaoming LiDepartment of Computer Science and Technology,Peking University{pb,bin.cui,lxm}@AbstractCloud computing is Internet based system development in which large scalable computing resources are provided“as a service”over the Internet to users.The concept of cloud computing incorporates web infrastructure,software as a service(SaaS),Web2.0and other emerging technologies,and has attracted more and more attention from industry and research community.In this paper,we describe our experience and lessons learnt in construction of a cloud computing platform.Specifically,we design a GFS compatiblefile system with variable chunk size to facilitate massive data processing,and introduce some implementation enhancement on MapReduce to improve the system throughput.We also discuss some practical issues for system implementation.In association of the China web archive(Web InfoMall) which we have been accumulating since2001(now it contains over three billion Chinese web pages), this paper presents our attempt to implement a platform for a domain specific cloud computing service, with large scale web text mining as targeted application.And hopefully researchers besides our selves will benefit from the cloud when it is ready.1IntroductionAs more facets of work and personal life move online and the Internet becomes a platform for virtual human society,a new paradigm of large-scale distributed computing has emerged.Web-based companies,such as Google and Amazon,have built web infrastructure to deal with the internet-scale data storage and computation. If we consider such infrastructure as a“virtual computer”,it demonstrates a possibility of new computing model, i.e.,centralize the data and computation on the“super computer”with unprecedented storage and computing capability,which can be viewed as a simplest form of cloud computing.More generally,the concept of cloud computing can incorporate various computer technologies,including web infrastructure,Web2.0and many other emerging technologies.People may have different perspectives from different views.For example,from the view of end-user,the cloud computing service moves the application software and operation system from desktops to the cloud side,which makes users be able to plug-in anytime from anywhere and utilize large scale storage and computing resources.On the other hand,the cloud computing service provider may focus on how to distribute and schedule the computer resources.Nevertheless,the storage and computing on massive data are the key technologies for a cloud computing infrastructure.Google has developed its infrastructure technologies for cloud computing in recent years,including Google File System(GFS)[8],MapReduce[7]and Bigtable[6].GFS is a scalable distributedfile system,which Copyright0000IEEE.Personal use of this material is permitted.However,permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists,or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.Bulletin of the IEEE Computer Society Technical Committee on Data Engineering61emphasizes fault tolerance since it is designed to run on economically scalable but inevitably unreliable(due to its sheer scale)commodity hardware,and delivers high performance service to a large number of clients. 
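The abstract above highlights a GFS-compatible file system (TFS) whose chunks have variable sizes, a design the paper returns to in Section 2.1.2. One practical consequence, sketched below purely as an illustration (the ChunkInfo record and the class are my own, not TPlatform code), is that a client can no longer turn a byte offset into a chunk index by dividing by a fixed chunk size; it has to walk the list of per-chunk sizes it fetched from the master when the file was opened.

```java
import java.util.List;

// Hypothetical per-chunk metadata as a client might receive it on open:
// a chunk handle, the chunk's current size, and the hosts holding replicas.
record ChunkInfo(long handle, long size, List<String> replicaHosts) {}

class VariableChunkFile {
    private final List<ChunkInfo> chunks; // in file order, fetched once on open

    VariableChunkFile(List<ChunkInfo> chunks) {
        this.chunks = chunks;
    }

    /** Map a byte offset in the file to {chunk index, offset within that chunk}. */
    long[] locate(long fileOffset) {
        long remaining = fileOffset;
        for (int i = 0; i < chunks.size(); i++) {
            long size = chunks.get(i).size();
            if (remaining < size) {
                return new long[] {i, remaining};
            }
            remaining -= size;
        }
        throw new IllegalArgumentException("offset past end of file: " + fileOffset);
    }
}
```

With fixed-size chunks the same lookup is a single division, which is why the variable-size design requires the client to cache all chunk metadata up front, as the paper describes.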
Bigtable is a distributed storage system based on GFS for structured data management.It provides a huge three-dimensional mapping abstraction to applications,and has been successfully deployed in many Google products.MapReduce is a programming model with associated implementation for massive data processing. MapReduce provides an abstraction by defining a“mapper”and a“reducer”.The“mapper”is applied to every input key/value pair to generate an arbitrary number of intermediate key/value pairs.The“reducer”is applied to all values associated with the same intermediate key to generate output key/value pairs.MapReduce is an easy-to-use programming model,and has sufficient expression capability to support many real world algorithms and tasks.The MapReduce system can partition the input data,schedule the execution of program across a set of machines,handle machine failures,and manage the inter-machine communication.More recently,many similar systems have been developed.KosmosFS[3]is an open source GFS-Like system,which supports strict POSIX interface.Hadoop[2]is an active Java open source project.With the support from Yahoo,Hadoop has achieved great progress in these two years.It has been deployed in a large system with4,000nodes and used in many large scale data processing tasks.In Oct2007,Google and IBM launched“cloud computing initiative”programs for universities to promote the related teaching and research work on increasingly popular large-scale ter in July2008,HP, Intel and Yahoo launched a similar initiative to promote and develop cloud computing research and education. Such cloud computing projects can not only improve the parallel computing education,but also promote the research work such as Internet-scale data management,processing and scientific computation.Inspired by this trend and motivated by a need to upgrade our existing work,we have implemented a practical web infrastructure as cloud computing platform,which can be used to store large scale web data and provide high performance processing capability.In the last decade,our research and system development focus is on Web search and Web Mining,and we have developed and maintained two public web systems,i.e.,Tianwang Search Engine[4]and Web Archive system Web infomall[1]as shown in Figure1.(a)Tianwang(b)Web infomallFigure1:Search engine and Chines web archive developed at SEWM group of PKU During this period,we have accumulated more than50TB web data,built a PC cluster consisting of100+ PCs,and designed various web application softwares such as webpage text analysis and processing.With the increase of data size and computation workload in the system,we found the cloud computing technology is a promising approach to improve the scalability and productivity of the system for web services.Since2007,we62started to design and develop our web infrastructure system,named“Tplatform”,including GFS-likefile system “TFS”[10]and MapReduce computing environment.We believe our practice of cloud computing platform implementation could be a good reference for researchers or engineers who are interested in this area.2TPlatform:A Cloud Computing PlatformIn this section,we briefly introduce the implementation and components of our cloud computing platform, named“Tplatform”.Wefirst present the overview of the system,followed by the detailed system implementation and some practical issues.Figure2:The System Framework of TplatformFig2shows the overall system framework of the“Tplatform”,which consists of three layers,i.e.,PC cluster, infrastructure for 
cloud computing platform,and data processing application layer.The PC cluster layer provides the hardware and storage devices for large scale data processing.The application layer provides the services to users,where users can develop their own applications,such as Web data analysis,language processing,cluster and classification,etc.The second layer is the main focus of our work,consisting offile system TFS,distributed data storage mechanism BigTable,and MapReduce programming model.The implementation of BigTable is similar to the approach presented in[6],and hence we omit detailed discussion here.2.1Implementation of File SystemThefile system is the key component of the system to support massive data storage and management.The designed TFS is a scalable,distributedfile system,and each TFS cluster consists of a single master and multiple chunk servers and can be accessed by multiple client.632.1.1TFS ArchitectureIn TFS,files are divided into variable-size chunks.Each chunk is identified by an immutable and globally unique 64bit chunk handle assigned by the master at the time of chunk creation.Chunk servers store the chunks on the local disks and read/write chunk data specified by a chunk handle and byte range.For the data reliability, each chunk is replicated on multiple chunk servers.By default,we maintain three replicas in the system,though users can designate different replication levels for differentfiles.The master maintains the metadata offile system,which includes the namespace,access control information, the mapping fromfiles to chunks,and the current locations of chunks.It also controls system-wide activities such as garbage collection of orphaned chunks,and chunk migration between chunk servers.Each chunk server periodically communicates with the master in HeartBeat messages to report its state and retrieve the instructions.TFS client module is associated with each application by integrating thefile system API,which can commu-nicate with the master and chunkservers to read or write data on behalf of the application.Clients interact with the master for metadata operations,but all data-bearing communication goes directly to the chunkservers.The system is designed to minimize the master’s involvement infile accessing operations.We do not provide the POSIX API.Besides providing the ordinary read and write operations,like GFS,we have also provided an atomic record appending operation so that multiple clients can append concurrently to afile without extra synchronization among them.In the system implementation,we observe that the record appending operation is the key operation for system performance.We design our own system interaction mechanism which is different from GFS and yields better record appending performance.2.1.2Variable Chunk SizeIn GFS,afile is divided intofixed-size chunks(e.g.,64MB).When a client uses record appending operation to append data,the system checks whether appending the record to the last chunk of a certainfile may make the chunk overflowed,i.e.,exceed the maximum size.If so,it pads all the replica of the chunk to the maximum size, and informs the client that the operation should be continued on the new chunk.(Record appending is restricted to be at most one-fourth of the chunk size to keep worst case fragmentation at an acceptable level.)In case of write failure,this approach may lead to duplicated records and incomplete records.In our TFS design,the chunks of afile are allowed to have variable sizes.With the proposed system in-teraction mechanism,this 
strategy makes the record appending operation more efficient.Padding data,record fragments and record duplications are not necessary in our system.Although this approach brings some extra cost,e.g.,every data structure of chunk needs a chunk size attribute,the overall performance is significantly improved,as the read and record appending operations are the dominating operations in our system and can benefit from this design choice.2.1.3File OperationsWe have designed differentfile operations for TFS,such as read,record append and write.Since we allow variable chunk size in TFS,the operation strategy is different from that of GFS.Here we present the detailed implementation of read operation to show the difference of our approach.To read afile,the client exchanges messages with the master,gets the locations of chunks it wants to read from,and then communicates with the chunk servers to retrieve the data.Since GFS uses thefixed chunk size, the client just needs to translate thefile name and byte offset into a chunk index within thefile,and sends the master a request containing thefile name and chunk index.The master replies with the corresponding chunk handle and locations of the replicas.The client then sends a request to one of the replicas,most likely the closest one.The request specifies the chunk handle and a byte range within that chunk.Further reads of the same chunk do not require any more client-master interaction unless the cached information expires or thefile is reopened.64In our TFS system,the story is different due to the variable chunk size strategy.The client can not translate the byte offset into a chunk index directly.It has to know all the sizes of chunks in thefile before deciding which chunk should be read.Our solution is quite straightforward,when a client opens afile using read mode,it gets all the chunks’information from the master,including chunk handle,chunk size and locations,and use these information to get the proper chunk.Although this strategy is determined by the fact of variable chunk size, its advantage is that the client only needs to communicate with the master once to read the wholefile,which is much efficient than GFS’original design.The disadvantage is that when a client has opened afile for reading, later appended data by other clients is invisible to this client.But we believe this problem is negligible,as the majority of thefiles in web applications are typically created and appended once,and read by data processing applications many times without modifications.If in any situation this problem becomes critical,it can be easily overcome by set an expired timestamp for the chunks’information and refresh it when invalid.The TFS demonstrates our effort to build an infrastructure for large scale data processing.Although our system has the similar assumptions and architectures as GFS,the key difference is that the chunk size is variable, which makes our system able to adopt different system interactions for record appending operation.Our record appending operation is based on chunk level,thus the aggregate record appending performance is no longer restricted by the network bandwidth of the chunk servers that store the last chunk of thefile.Our experimental evaluation shows that our approach significantly improves the concurrent record appending performance for singlefile by25%.More results on TFS have been reported in[10].We believe the design can apply to other similar data processing infrastructures.2.2Implementation of MapReduceMapReduce system is 
another major component of the cloud computing platform,and has attracted more and more attentions recently[9,7,11].The architecture of our implementation is similar to Hadoop[2],which is a typical master-worker structure.There are three roles in the system:Master,Worker and User.Master is the central controller of the system,which is in charge of data partitioning,task scheduling,load balancing and fault tolerance processing.Worker runs the concrete tasks of data processing and computation.There exist many workers in the system,which fetch the tasks from Master,execute the tasks and communicate with each other for data er is the client of the system,implements the Map and Reduce functions for computation task,and controls theflow of computation.2.2.1Implementation EnhancementWe make three enhancements to improve the MapReduce performance in our system.First,we treat intermediate data transfer as an independent task.Every computation task includes map and reduce subtasks.In a typical implementation such as Hadoop,reduce task starts the intermediate data transfer,which fetches the data from all the machines conducting map tasks.This is an uncontrollable all-to-all communication,which may incur network congestion,and hence degrade the system performance.In our design,we split the transfer task from the reduce task,and propose a“Data transfer module”to execute and schedule the data transfer task independently. With appropriate scheduling algorithm,this method can reduce the probability of network congestion.Although this approach may aggravate the workload of Master when the number of transfer tasks is large,this problem can be alleviated by adjusting the granularity of transfer task and integrating data transfer tasks with the same source and target addresses.In practice,our new approach can significantly improve the data transfer performance.Second,task scheduling is another concern on MapReduce system,which helps to commit resources be-tween a variety of tasks and schedule the order of task execution.To optimize the system resource utility,we adopt multi-level feedback queue scheduling algorithm in our design.Multiple queues are used to allocate the concurrent tasks,and each of them is assigned with a certain priority,which may vary for different tasks with respect to the resources requested.Our algorithm can dynamically adjust the priority of running task,which65balances the system workload and improves the overall throughput.The third improvement is on data serialization.In MapReduce framework,a computation task consists of four steps:map,partition,group and reduce.The data is read in by map operation,intermediate data is gener-ated and transferred in the system,andfinally the results are exported by reduce operation.There exist frequent data exchanges between memory and disk which are generally accomplished by data serialization.In our imple-mentation of MapReduce system,we observed that the simple native data type is frequently used in many data processing applications.Since memory buffer is widely used,most of the data already reside in the memory before they are de-serialized into a new data object.In other words,we should avoid expensive de-serialization operations which consume large volume of memory space and degrade the system performance.To alleviate this problem,we define the data type for key and value as void*pointer.If we want to de-serialize the data with native data type,a simple pointer assignment operation can replace the de-serialization operation,which is much more 
efficient.With this optimization,we can also sort the data directly in the memory without data de-serialization.This mechanism can significantly improve the MapReduce performance,although it introduces some cost overhead for buffer management.2.2.2Performance Evaluation on MapReduceDue to the lack of benchmark which can represent the typical applications,performance evaluation on MapRe-duce system is not a trivial task.Wefirst use PennySort as the simple benchmark.The result shows that the performance of intermediate data transfer in the shuffle phase is the bottle neck of the system,which actually motivated us to optimize the data transfer module in MapReduce.Furthermore,we also explore a real applica-tion for text mining,which gathers statistics of Chinese word frequency in webpages.We run the program on a 200GB Chinese Web collection.Map function analyzes the content of web page,and produces every individual Chinese word as the key value.Reduce function sums up all aggregated values and exports the frequencies.In our testbed with18nodes,the job was split into3385map tasks,30reduce tasks and101550data transfer tasks, the whole job was successfully completed in about10hours,which is very efficient.2.3Practical Issues for System ImplementationThe data storage and computation capability are the major factors of the cloud computing platform,which determine how well the infrastructure can provide services to end users.We met some engineering and technical problems during the system implementation.Here we discuss some practical issues in our work.2.3.1System Design CriteriaIn the system design,our purpose is to develop a system which is scalable,robust,high-performance and easy to be maintained.However,some system design issues may be conflicted,which places us in a dilemma in many cases.Generally,we take three major criteria into consideration for system design:1)For a certain solution, what is bottleneck of the procedure which may degenerate the system performance?2)Which solution has better scalability andflexibility for future change?3)Since network bandwidth is the scarce resource of the system, how to fully utilize the network resource in the implementation?In the following,we present an example to show our considerations in the implementation.In the MapReduce system,fault tolerance can be conducted by either master or workers.Master takes the role of global controller,maintains the information of the whole system and can easily decide whether a failed task should be rerun,and when/where to be rerun.Workers only keep local information,and take charge of reporting the status of running tasks to Master.Our design combines the advantages of these two factors.The workers can rerun a failed task for a certain number of times,and are even allowed to skip some bad data records which cause the failure.This distributed strategy is more robust and scalable than centralized mechanism,i.e., only re-schedule failed tasks in the Master side.662.3.2Implementation of Inter-machine CommunicationSince the implementation of cloud computing platform is based on the PC cluster,how to design the inter-machine communication protocol is the key issue of programming in the distributed environment.The Remote Procedure Call(RPC)middle ware is a popular paradigm for implementing the client-server model of distributed computing,which is an inter-process communication technology that allows a computer program to cause a subroutine or procedure to execute on another computer in a PC cluster without the programmer 
explicitly coding the details for this remote interaction.In our system,all the services and heart-beat protocols are RPC calls.We exploit Internet Communications Engine(ICE),which is an object-oriented middleware that provides object-oriented RPC,to implement the RPC framework.Our approach performs very well under our system scale and can support asynchronous communication model.The network communication performance of our system with ICE is comparable to that of special asynchronous protocols with socket programming,which is much more complicated for implementation.2.3.3System Debug and DiagnosisDebug and Diagnosis in distributed environment is a big challenge for researchers and engineers.The overall system consists of various processes distributed in network,and these processes communicate each other to execute a complex task.Because of the concurrent communications in such system,many faults are generally not easy to be located,and hence can hardly be debugged.Therefore,we record complete system log in our system.In All the server and client sides,important software boundaries such as API and RPC interfaces are all logged.For example,log for RPC messages can be used to check integrality of protocol,log for data transfer can be used to validate the correctness of transfer.In addition,we record performance log for performance tuning. In our MapReduce system,log in client side records the details of data read-in time,write-out time of all tasks, time cost of sorting operation in reduce task,which are tuning factors of our system design.In our work,the recorded log not only helps us diagnose the problems in the programs,but also helps find the performance bottleneck of the system,and hence we can improve system implementation accordingly. However,distributed debug and diagnosis are still low efficient and labor consuming.We expect better tools and approaches to improve the effectiveness and efficiency of debug and diagnosis in large scale distributed system implementation.3ConclusionBased on our experience with Tplatform,we have discussed several practical issues in the implementation of a cloud computing platform following Google model.It is observed that while GFS/MapReduce/BigTable provides a great conceptual framework for the software core of a cloud and Hadoop stands for the most popular open source implementation,there are still many interesting implementation issues worth to explore.Three are identified in this paper.•The chunksize of afile in GFS can be variable instead offixed.With careful implementation,this design decision delivers better performance for read and append operations.•The data transfer among participatory nodes in reduce stage can be made”schedulable”instead of”un-controlled”.The new mechanism provides opportunity for avoiding network congestions that degrade performance.•Data with native types can also be effectively serialized for data access in map and reduce functions,which presumably improves performance in some cases.67While Tplatform as a whole is still in progress,namely the implementation of BigTable is on going,the finished parts(TFS and MapReduce)are already useful.Several applications have shown the feasibility and advantages of our new implementation approaches.The source code of Tplatform is available from[5]. 
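Section 2.2.1 above states that task scheduling uses a multi-level feedback queue so that the priority of a running task can be adjusted dynamically. The sketch below is a generic multi-level feedback queue in Java, given only to illustrate the technique the paper names; the class names and the simple demotion policy are my assumptions, not TPlatform's actual scheduler.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// A schedulable unit of work, tagged with its current priority level.
class Task {
    final String name;
    int level = 0;                      // 0 = highest priority
    Task(String name) { this.name = name; }
}

// New tasks start at the top queue; tasks that keep running are demoted,
// so long-running work gradually yields to fresh, short tasks.
class MultiLevelFeedbackQueue {
    private final List<Deque<Task>> levels = new ArrayList<>();

    MultiLevelFeedbackQueue(int numLevels) {
        for (int i = 0; i < numLevels; i++) {
            levels.add(new ArrayDeque<>());
        }
    }

    void submit(Task task) {
        task.level = 0;
        levels.get(0).addLast(task);
    }

    /** Next task to run: the highest non-empty priority level wins. */
    Task next() {
        for (Deque<Task> queue : levels) {
            Task task = queue.pollFirst();
            if (task != null) {
                return task;
            }
        }
        return null;                    // nothing runnable
    }

    /** A task that used up its time slice is demoted one level and re-queued. */
    void demote(Task task) {
        task.level = Math.min(task.level + 1, levels.size() - 1);
        levels.get(task.level).addLast(task);
    }
}
```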
AcknowledgmentThis work was Supported by973Project No.2007CB310902,IBM2008SUR Grant for PKU,and National Natural Science foundation of China under Grant No.60603045and60873063.References[1]China Web InfoMall.,2008.[2]The Hadoop Project./,2008.[3]The KosmosFS Project./,2008.[4]Tianwang Search.,2008.[5]Source Code of Tplatform Implementation./˜webg/tplatform,2009.[6]F.Chang,J.Dean,S.Ghemawat,W.C.Hsieh,D.A.Wallach,M.Burrows,T.Chandra,A.Fikes,and R.E.Gruber.Bigtable:a distributed storage system for structured data.In OSDI’06:Proceedings of the7th USENIX Symposium on Operating Systems Design and Implementation,pages15–15,2006.[7]J.Dean and S.Ghemawat.Mapreduce:Simplified data processing on large clusters.In OSDI’04:Proceed-ings of the5th USENIX Symposium on Operating Systems Design and Implementation,pages137–150, 2004.[8]G.Sanjay,G.Howard,and L.Shun-Tak.The googlefile system.In Proceedings of the17th ACM Sympo-sium on Operating Systems Principles,pages29–43,2003.[9]H.Yang,A.Dasdan,R.Hsiao,and D.S.Parker.Map-reduce-merge:simplified relational data processingon large clusters.In SIGMOD’07:Proceedings of the2007ACM SIGMOD international conference on Management of data,pages1029–1040,2007.[10]Z.Yang,Q.Tu,K.Fan,L.Zhu,R.Chen,and B.Peng.Performance gain with variable chunk size ingfs-likefile systems.In Journal of Computational Information Systems,pages1077–1084,2008.[11]M.Zaharia,A.Konwinski,A.D.Joseph,R.Katz,and I.Stoica.Improving mapreduce performance inheterogeneous environments.In OSDI’07:Proceedings of the8th USENIX Symposium on Operating Systems Design and Implementation,pages29–42,2007.68。
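The paper's third conclusion bullet notes that keys and values with native types can be handled without a full de-serialization step; its own implementation does this with void* pointers. A rough Java analogue of the same idea, reading packed integers through a ByteBuffer view instead of materializing one object per value, is sketched below; it illustrates the general technique only and is not TPlatform code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

// Interpret a byte[] of packed 32-bit integers in place, avoiding the cost of
// building one boxed object per value (a loose analogue of the void* trick).
class NativeTypeView {
    /** Sum the values in a buffer whose length is a multiple of 4 bytes. */
    static int sum(byte[] raw) {
        IntBuffer ints = ByteBuffer.wrap(raw)
                                   .order(ByteOrder.LITTLE_ENDIAN)
                                   .asIntBuffer();
        int total = 0;
        while (ints.hasRemaining()) {
            total += ints.get();
        }
        return total;
    }
}
```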

Java Web and Cloud Computing: Foreign Literature Translation

文献信息文献标题:Java Web Deployment in Cloud Computing(云计算中的Java Web 部署)文献作者:Ankit Kumar Sahu文献出处:《International Journal of Computer Applications》,2013, 75(15):31-34.字数统计:英文2007单词,10665字符;中文3308汉字外文文献Java Web Deployment in Cloud Computing Abstract Cloud Computing is a revolutionary IT field in today’s world. Several technologies like java, python provides deployment on clouds using respective tools. In this paper, the web deployment in cloud computing with java and the performance issues of the java deployment on a cloud is discussed.There are several tools available to deploy a java web- application on clouds, some of these are Google App Engine (By Google), Windows Azure (By Microsoft), Amazon EC2 (By Amazon) etc. Cloud Computing is providing many facilities for deployment but is also having performance issues which is a major factor for the web-applications. Issues with java-deployment on cloud would try to resolve through the framework customization. A java web-application is deployed on Google cloud and examined of its performance, further in this paper.General Terms:Cloud Computing, Google-App-Engine, Java Web- Deployment.1.INTRODUCTIONCloud Computing is a service used over a network for multi- purposes like software, platform, infrastructure service and provides a better a way for virtualization in the IT field. There are several fields that are affected by the cloud computing likedeployment. Cloud Computing makes the IT fields enable for better performance using lesser resources. It includes delivery of the application as services throughout the internet and the software that provide services in the data- centre and hardware and the prototype shift. The data centre- software and hardware is known as a cloud.Many of the companies are shifting to the cloud services like Google App Engine has been started by Google. Microsoft started Windows Azure, Amazon started EC2.A Web- Application is deployed on Google App Engine as a sample application. There are several terms which are discussed as follows:WHAT A CLOUD IS:A cloud is a pool of virtualized computer resources.A cloud can multitude a range of different loads, including wedge-style back-end works and collaborating, User-facing applications. It allows loads to be located and scaled-out rapidly through the quick provisioning of Virtual machines or somatic machines. It supports completed, self-recovering, extremely accessible programming prototypes those allow loads to improve from many obvious hardware/software disasters. It observers resource use in real time to enable rebalancing of provisions when desired.A Cloud is an implicit world available for applications- deployment with optimized cost, whereas Cloud Computing is a regular word for anything that involves distributing services over the Internet. At its humblest, it is providing the assets and proficiencies of information technology enthusiastically as a service. Cloud Computing is a style of computing in which enthusiastically accessible and often virtualized assets are delivered as a service over the Internet.ADV ANTAGES OF CLOUD COMPUTING:•It is swift, with ease and speed of deployment.•Its cost is use-based, and will likely be abridged.•In house IT costs are condensed.•Capital investment is cheap.•The latest technology is offered always.•The use of standard technology is optimistic and facilitate.CLOUD SERVICES:A cloud is a pool of systems, resources and their classes that provides all facilities as per the user-end’s requirements. 
All the resources, applications are part of a cloud. Cloud Computing provides following classification of its services:IaaS-Infrastructure as aService PaaS-Platform as a ServiceSaaS-Software as a Service1.) Infrastructure as a ServiceThe IaaS is further classified into:i.) Computation as a Service (CaaS):In this kind of service, the virtual machine servers are lent. The cost of the virtual machine servers are based on the capacity of the machine like memory attributes of the server, its operating system and all deployment features.ii.) Data as a Service (DaaS):In this kind of service, Storage is provided for all end-users for storing data. The cost estimation of the service is based on the scale of Gigabyte (GB) and decided bythe provider.There are several cloud computing platforms for the world. The Cloud Computing platform for Google is Google App Engine, which has an efficient and better system for deployment and all.Google has provided a standard all-inclusive answer to Cloud Computing, known as the Google App Engine. Google App Engine provides several features for its clients such as fine grained computing, data storage, data transfer etc. Google App Engine provides VPN (Virtual Private Network), elastic IP-Addressing etc. Google App Engine has become a standard model in Cloud Computing.2.) Platform as a ServicePlatform as a Service (PaaS) provides an environment which is good in performance. Platform like OS could also be used over a network (internet) by any end-users at its requirement. There would be no need of having an installed OS; one could load its OS using the PaaS. The Applications could also be served as a platform services.Microsoft has provided Windows Azure as cloud computing servers. Windows Azure is an effort to provide PaaS services to the users. Window Azure Platform (WAP) is the cloud-OS offered by Microsoft. WAP includes several services with the cloud-OS. Azure services as virtual machine servers as its runtime environments.3.) Software as a ServiceAs PaaS, SaaS provides Software as a Service on a cloud platform. Using SaaS, software may have been developed, installed and updated on the end-user’s request. It reduces the management costs and the software conventions with a Rent Model.2.RELATED WORKAs per the growing development of Cloud Computing, it is going to be the future of the IT-Field. But there are several sections in which the cloud computing is still facing problems. Some of the issues are Performance, Security, Cost, Reliability etc. If the cloud computing wants to be the acceptable future in the IT field, it would have to overcome these issues. In this paper, the performance issue is examined and tried toresolve. To examine the performance issue, a java web-application is deployed to a cloud server (Google App Engine) as follows:STEPS TO DEPLOY A JA V A WEB-APPLCATION ON GOOGLE APP ENGINE:STEP 1: A Java Web-Application can be deployed on Google App Engine using the appengine-java-sdk tool. First download the appengine-java-sdk tool.STEP 2: Using the appcfg.cmd file, the application can be deployed. Assume the application name is MySampleApplication, the command would be:appcfg.cmd update //*PATH TO APPLICATION FOLDER *// MySampleApplicationSTEP 3: The Sample Web-Application is deployed now and would be available on for ease access. The Application Administration can be accessed on .Thus a java web-application is deployed on cloud server (Google App Engine). 
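The three deployment steps above never show the application being deployed. Purely for context, a minimal servlet such as the hypothetical HelloCloudServlet below is the kind of class that would be packaged into MySampleApplication (together with WEB-INF/web.xml and the appengine-web.xml descriptor that holds the application id) before running appcfg.cmd update.

```java
import java.io.IOException;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Smallest useful unit of a classic App Engine Java web application:
// one servlet mapped to a URL pattern in WEB-INF/web.xml.
public class HelloCloudServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws IOException {
        response.setContentType("text/plain");
        response.getWriter().println("Hello from Google App Engine");
    }
}
```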
The performance issue is discussed in this paper which is a major issue in cloud computing for its users. The performance issue includes the delay for running the application on cloud server. The web-server sends its output of the web-application to the client. If the cloud server takes longer time to run the application, this would lead to a delay and this issue is discussed in this paper.3.ISSUES WITH WEB-DEPLOYMENT ON CLOUDThere are several issues in Web-Deployment in cloud computing. As per the current research, the cloud computing is still facing challenges in its fields. Some of the issues are specified here, one of those would be tried to overcome:3.1.Performance IssueThis kind of issues is for those users which are far away from cloud in the factor of distance. The cloud may affect with its lesser performance. The root cause of this issue can be due to High-Latency-Delay.3.2.Security IssueThis kind of issue has been a primary issue in IT. Cloud Computing is also facing this issue. Security attacks and threats are still the problem for the Servers. Several Kinds of security ideas may be used to avoid these issues.3.3.Cost IssueThe cost estimation is measured on the basis of Bandwidth. If the Application needs large bandwidth, the cost would be increased for these kinds of applications. Small Applications need lesser bandwidth, their costs aren’t an issue. Large Applications faces these kinds of issues.3.4.Reliability IssueThis issue includes the reliability of the cloud. The cloud is reliable if and only if its infrastructure and its resources are reliable. These kinds of issues are resolved through individual resources of the clouds.These issues are basic problems in the cloud computing. The area of the problem resolving would be the Performance Issue in the Cloud Computing. This kind of issue could be resolved by the filtering of individual applications. The Web- Application used in the Cloud would be a Java-Web- Application. The Java Cloud requirements are continuously growing better and more refined.All Popular clouds are shifting due to the experimental results. In Cloud Computing, the loads ad costs are not definite in the market, so the prices seem to be changing, every so often in remarkable ways. Even the cloud sellers hasn’t fixed thecost, they are only guessing the costs, like any X Dollars for Y Transactions.Like in Google App Engine, the biggest problem for developers will be adjusting to Google’s non-relational data stores. When Google App Engine was introduced, there were not so many database-projects in the market for clouds.4.OBJECTIVEAs per the given Problem Formulations, the basic objective will be to determine the less-performance causes in Java Web- Application on Clouds.Following phases would be useful in Performance issues:-Choosing a suitable platform for java application like Google App Engine etc.-The Cloud Framework (i.e. The App Engine web-app Framework for Google App Engine) should be configured as per the requirement of the J2EE application.-The infrastructure of the cloud services should be more specific to the Java-Applications.Following methods can be used for the above proposed objective:•The Structure of the Cloud Computing in the respect of Deployment should be in a good manner. 
The Framework used in the Virtual Machine Server should be independent and optimized so that could be good at performance.•The resources which is used in the cloud computing should be single independent resource and should mention the required configuration (hardware and software) and should be platform independent a reliable to use at large scale.These issues would be resolved by several possible solutions. In the cloud services, to perform some specific task, there are several resources arranged in a particular order. The Cloud Computing provides us customizing its resources according to the individual application and its use.5.METHODOLOGYThe Methodology will used to overcome this issue would be:•Optimization in Framework:As per the Google app engine, the cloud computing has its dependency on the framework of the Web-Application. So the very first approach would be followed the optimization of the framework of the application in a cloud. The only framework, which has been customized for the cloud computing is Spring Framework yet now.While using Spring Framework in cloud deployment, if the application takes a longer time to load, would be thrown as DeadlineExceededException and the control will be shifted to the framework and now framework would take the respective decisions. For a better Cloud Deployment, the entire framework should be optimized.•Reducing or Avoiding the use of module Scanning:In Google App Engine, The Spring Framework process a set of observation as a signal-flag to any other object in its execution .Sometimes the requested resource cannot be availed due to the resource sharing, this also restricts the application-speed in its performance. The Component Scanning is also responsible for making the application with lesser performance and lesser efficient due to its time taking process. To avoid this problem, the Component Scanning would be avoided. In the mandatory case of using Component Scanning, it would be reduced.This Methodology would work with JA V A Web- Applications that are implemented in Spring Framework.6.CONCLUSIONAccording to the results, the performance issue can be easily handled by filtering the web-application individually and the framework-customization. This issue has been a major issue for the cloud-users which would be handled by the suggested idea. This would decrease the High Latency Delay for the application performance and the application would not take longer time to run on the cloud server.中文译文云计算中的Java Web部署摘要云计算是当今世界一个革命性的IT领域。
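To make the methodology of Section 5 above concrete, the following hedged sketch shows the explicit-registration style it argues for: no component scanning, so the Spring container does not walk the classpath during the request that triggers start-up (the situation the paper associates with DeadlineExceededException). The service and repository classes are hypothetical stand-ins, not taken from the paper; in a web application this configuration class would be named in the deployment descriptor instead of a component-scan element.

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Hypothetical application classes, present only so the example compiles.
class ReportRepository { }
class ReportService {
    private final ReportRepository repository;
    ReportService(ReportRepository repository) { this.repository = repository; }
}

// Explicit bean registration: every bean is declared by hand, so Spring
// skips classpath component scanning at start-up -- the step Section 5
// identifies as a source of start-up delay on Google App Engine.
@Configuration
public class AppConfig {

    @Bean
    public ReportRepository reportRepository() {
        return new ReportRepository();
    }

    @Bean
    public ReportService reportService() {
        return new ReportService(reportRepository());
    }
}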

大数据和云计算技术研究外文文献翻译2017

大数据和云计算技术研究外文文献翻译2017 外文文献翻译原文及译文文献出处Bryant R. The research of big data and cloud computing technology [J]. Information Systems, 2017, 3(5): 98-109 原文 The research of big data and cloudcomputing technology Bryant RoyAbstractMobile Internet and the rapid development of Internet of things and cloud computing technology open the prelude of the era of mobile cloud, big data is becoming more and more attract the line of sight of people.The emergence of the Internet shortens people, the distance between people and the world, the whole world into a "global village", people through the network barrier-free exchange, exchange information and work together. At the same time, with the rapid development of Internet, mature and popular database technology, high memory, high-performance storage devices and storage media, human in daily study, life and work of the amount of data is growing exponentially. Big data problem is produced under such background, become research hot topic in academia and relevant industry, and as one of the important frontier research topic in the field of information technology, attracting more and more scholars studying the effects of large data related problems.Key words: Big data; Data analysis; Cloud computing1 IntroductionBig data is a kind of can reflect the material world and spiritual world motion state and the change of state of information resources, it has the complexity, sparse of decision usefulness, high-speed growth, value and repeatable mining, generally has many potential value. Based on the perspective of big data resource view and management, we think that big data is a kind of important resources that can support management decisions. Therefore, in order to effectively manage the resources and give full play to their potential value, need to study and solve this kind of resource acquisition, processing and application, the definition of the property industrydevelopment and policy guarantee management issues.Big data has the following features:Complexity, as many definition points out, the form and the characteristic of big data is very complicated. The complexity of the large data in addition to performance in its quantity scale, the source of the universality and diversity of morphological structure, but alsoin the change of state and the uncertainty of the respect such as development way. Decision usefulness, big data itself is objective existence of large-scale data resource. Its direct function is limited. Through the analysis and mining, and found its knowledge, can provideall kinds of practical application with other resources to provide decision support, the value of big data is mainly reflected by its decision usefulness. The total stock of non-renewable natural resources with the mining and gradually reduce human, while big data with highspeed growth, namely along with the continuous mining, large data resources not only will not reduce, instead will increase rapidly. Sparse sex value, great amount of data that the data in has brought many opportunities at the same time, also brought a lot of challenges. One of the major challenges is the problem of big data value low density, large data resources quantity is big, but its useful value is sparse, this increases the difficulty of the development and use of big data resources.2 Processing of big data2.1 Data collectionBig data, originally meant the number and types of the more complex, therefore, it becomes especially important to get the data information through various methods. 
Data acquisition is the basis of a large data processing in the process step, the common methods for data collection with RFID, the classification of the data retrieval tools such as Google and other search engines, as well as bar code technology and so on. And because the emergence of the mobile devices, such as the rapidpopularity of smart phones and tablets, makes a large number of mobile application software is developed, social network gradually large, it also accelerated the velocity ofcirculation of information and acquisition precision. 2.2 Data processing and integrationData processing and integration is mainly completed to have properly deal with the data collected, cleaning demising, and further integrationof storage. According to the mentioned above, is one of the features large data diversity. This decision through various channels to obtain the data types and structures are very complex, brought after the data analysis and processing of great difficulty. Through data processing and integration this step, first, the structure of complex data into asingle or a structure for easy handling, to lay a good foundation forthe later data analysis, because not all of the information in the data are required, therefore, will need to be "noise" and these data cleaning, to ensure the quality and reliability of the data. Commonly used method is in the process of data processing design some data filter, through clustering or the rules of the correlation analysis method will be useless or wrong pick out from the group of data filtering to preventits adverse influence on the final data results. Then these good integration and the data storage, this is an important step, if it'spure random placement, will affect the later data access, could easily lead to data accessibility issues, the general solution is now for specific types of data to establish specialized database, the different kinds of data information classify placement, can effectively reduce the number of data query and the access time, increase the speed of data extraction. 2.3 Data analysisData analysis is the core part in the whole big data processing, because in the process of data analysis, find the value of the data.After a step data processing and integration, the data will become the raw data for data analysis, according to the requirements of theapplication of the data required for further processing and analysis data. Traditional data processing method of data mining, machine learning, intelligent algorithm, statistical analysis, etc., and these methods have already can't meet the demand of the era of big data analysis. In terms of data analysis technology, Google is the most advanced one, Google as big data is the most widely used Internet company, in 2006, the first to put forward the concept of "cloud computing", its internal dataapplications are backing, Google's own internal research and development of a series of cloud computing technology.2.4 Data interpretationData information for the majority of the users, the most concerned about is not the analysis of the data processing, but the explanationfor big data analysis and display, as a result, in a perfect data analysis process, the results of data interpretation steps is very important. If the results of data analysis can not properly display,will create trouble for data users, even mislead users. According to the traditional way is to use text output or download user personal computer display. 
But with the increase of amount of data, data analysis, the result is often more complex, according to the traditional way is not enough to satisfy the demand of the data analysis results output, therefore, in order to improve data interpretation, show ability, now most of the enterprise data visualization technology is introduced as a way to explain the big data is the most powerful. Through thevisualization result analysis, can vividly show the user the data analysis results, more convenient for users to understand and accept the results. Common visualization techniques are based on a collection of visualization technology, technology, based on image technology based on ICONS. Pixel oriented technology and distributed technology, etc.3 Challenges posed by big data3.1 Big data security and privacy issuesWith the development of big data, data sources and is finding wider and wider application fields: casual web browsing on the Internet willbe a series of traces left behind. In the network login related websites need to input personal important information, such as id number, address, phone number etc. Ubiquitous cameras and sensors will record thepersonal behavior and location information, etc. Through related data analysis, data experts can easily dig up people's habits and personal important information. If this information is applied proper, can help enterprises to understand the needs of the customers at any time in the field of related and habits,facilitate enterprises to adjust the corresponding production plan, make greater economic benefits. But if these important information is stolen by bad molecules, followed is the security of personal information, property, etc. In order to solve the problem of the data of the era of large data privacy, academia and industry are put forwardtheir own solutions. In addition, the data of the era of big data update speed change, and the general data privacy protection technology aremostly based on static data protection, this gives privacy has brought new challenges.Under the condition of complex changes, how to implement the data privacy protection will be one of the key directions in the study of data in the future. 3.2 Large data integration and management Throughout the development of large data, the source of the large data and application is more and more widely, in order to spread in different data management system of the data collected, it is necessary for data integration and management. Although the data integration and management have a lot of methods, but the traditional data storage method already can't meet the demand of the era of big data processing, it is faced with new challenges. Big data era, one of thecharacteristics of big data is the diversity of data types. The data type by gradually transforms the traditional structured data semi-structured and unstructured data. In addition, the data sources are increasingly diversified, the traditional data mostly come from a small number of military enterprise or institute computer terminals. Now, with the popularity of the Internet and mobile devices in the global, the data storage is especially important.You can see by the above, the traditional way of data storage is not enough to meet the demand of present data storage, in order to deal with more and more huge amounts of data and increasingly complex data structures, many companies are working on is suitable for the era of big data distributed file system and distributed parallel database. 
In theprocess of data storage, data format conversion is necessary, and it is very critical and complex, it puts forward higher requirements on data storage system.3.3 The ecological environment question in the big data Theecological environment problems of big data firstly refer to data resource management and sharing. This is an era of information opening, the open architecture of the Internet can make people in different corners of the earth all share network resources at the same time, it brought great convenience to the scientific research work. But not allof the data can be Shared, unconditional some data for the value of its special properties and is protected by the law can be unconditional. Due to the relevant legal measures is not sound enough, now still lack of a strong enough data protection consciousness, so there is always the data information stolen or data ownership problems, it has both technical problems and legal problems. How to protect the interests of the parties under the premise of solving the problem of data sharing is going to be most important challenges in the era of big data. In the era of big data, data of production and the application field is no longer limited to a few special occasions, almost all of the fields such as you can see the figure of big data, therefore, involve the problem of data in the field of cross is inevitable.,along with the development of large data influence the results of analysis of large data set to be the state governance mode, enterprise decision-making, organization and business process, such as personallifestyles will have a significant impact, and the impact model is worth in-depth research in the future.译文大数据和云计算技术研究Bryant Roy摘要移动互联网、物联网和云计算技术的迅速发展,开启了移动云时代的序幕,大数据也越来越吸引人们的视线。
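Before the next reference, a small illustration of the "data filter" idea from Section 2.2 of the paper above: malformed and out-of-range records are treated as noise and dropped before integration and storage. The record format ("sensorId,value") and the thresholds are invented for the example; this is a sketch of the concept, not a real cleaning pipeline.

import java.util.List;
import java.util.stream.Collectors;

// Toy version of the cleaning/de-noising filter described in Section 2.2:
// records arrive as "sensorId,temperature" strings; malformed lines and
// physically implausible readings are removed before the data is
// integrated and stored.
public class RecordFilter {

    public static List<String> clean(List<String> rawRecords) {
        return rawRecords.stream()
                .filter(r -> r != null && r.matches("\\w+,-?\\d+(\\.\\d+)?")) // structural check
                .filter(r -> {
                    double value = Double.parseDouble(r.split(",")[1]);
                    return value > -60 && value < 60;                          // outlier check
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> raw = List.of("s1,21.5", "s2,abc", "s3,999", "s4,-3.2");
        System.out.println(clean(raw));   // prints [s1,21.5, s4,-3.2]
    }
}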

云计算英文论文

云计算英文论文Cloud Computing: An Overview and Future PerspectivesIntroduction:Cloud computing has become an integral part of our digital lives, providing us with on-demand access to computing resources and services over the internet. In recent years, this technology has revolutionized the way we store, process, and retrieve data. This paper aims to provide an overview of cloud computing, its current advancements, and future prospects.1. Understanding Cloud Computing:1.1 Definition: Cloud computing refers to the practice of using a network of remote servers hosted on the internet to store, manage, and process data instead of relying on a local server or personal computer.1.2 Key Characteristics:a) On-demand self-service: Users can access computing resources without human interaction with the service provider.b) Broad network access: Services and applications are available over the internet and can be accessed using various devices.c) Resource pooling: Multiple users share resources to ensure efficient utilization and scalability.d) Rapid elasticity: Computing resources can be scaled up or down as per users' requirements.e) Measured service: Users are billed for the actual usage of resources, allowing cost optimization.2. Cloud Deployment Models:2.1 Public Cloud: Services and infrastructure are provided by a third-party service provider and are available to the general public over the internet.2.2 Private Cloud: Cloud infrastructure is solely dedicated to a single organization, ensuring greater control and security.2.3 Hybrid Cloud: Combines the features of both public and private clouds, allowing organizations to leverage the benefits of both models.2.4 Community Cloud: Shared infrastructure is used by several organizations with shared concerns, such as security or compliance.3. Advantages and Challenges of Cloud Computing:3.1 Advantages:a) Cost savings: Eliminates the need for upfront infrastructure investments, reducing operational costs.b) Scalability and flexibility: Resources can be easily scaled up or down based on demand, ensuring optimal resource utilization.c) Accessibility: Allows users to access data and applications from anywhere with internet connectivity.d) Reliability and resilience: Service providers ensure high availability and backup options to prevent data loss.3.2 Challenges:a) Security and privacy concerns: Storing data on remote servers may raise concerns about data confidentiality and security breaches.b) Network dependency: Reliance on internet connectivity can impact accessibility and performance.c) Vendor lock-in: Switching between cloud service providers may be difficult due to proprietary technologies and formats.4. Current Trends in Cloud Computing:4.1 Edge Computing: Performing data processing and analysis closer to the edge devices, reducing latency and bandwidth requirements.4.2 Serverless Computing: Users focus on writing and deploying code rather than managing infrastructure, enhancing scalability and resource management.4.3 Containerization: Isolating applications and their dependencies into lightweight containers, improving deployment efficiency and portability.4.4 Hybrid Multi-cloud: Organizations leverage multiple cloud providers simultaneously to achieve the best combination of features, cost, and availability.5. 
Future Perspectives:5.1 Artificial Intelligence and Machine Learning: Integration of AI and ML algorithms into cloud systems can enable intelligent decision-making and automated processes.5.2 Quantum Computing: Harnessing the power of quantum computing can revolutionize cloud services by offering faster computation and enhanced encryption methods.5.3 Internet of Things (IoT): Cloud computing can seamlessly integrate with IoT devices, facilitating real-time data processing and analytics.5.4 Blockchain Technology: Incorporating blockchain can enhance cloud security, data integrity, and decentralized resource management.Conclusion:Cloud computing has transformed the digital landscape, providing organizations and individuals with cost-effective and scalable solutions. Despite security concerns and challenges, it continues to evolve, offering new possibilities such as edge computing, serverless computing, and hybrid multi-cloud environments. Looking ahead, the integration of AI, quantum computing, IoT, and blockchain technology will further revolutionize the capabilities of cloud computing, opening doors to limitless possibilities.。
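Section 4.2 above characterises serverless computing as writing and deploying code without managing infrastructure. One widely used concrete form of that model is a function handler; the hedged sketch below uses the AWS Lambda Java interface (from the aws-lambda-java-core library), and the handler's purpose and names are invented for illustration rather than taken from the text.

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import java.util.Map;

// A serverless function: the provider provisions, scales and bills the
// underlying compute per invocation; the developer supplies only this
// handler, which is the point of the "serverless computing" trend above.
public class GreetingHandler implements RequestHandler<Map<String, String>, String> {

    @Override
    public String handleRequest(Map<String, String> event, Context context) {
        String name = event.getOrDefault("name", "world");
        context.getLogger().log("invoked for " + name);
        return "Hello, " + name + "!";
    }
}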

外文文献及翻译_ Cloud Computing 云计算

本科毕业设计外文文献及译文文献、资料题目:Cloud Computing文献、资料来源:云计算概述(英文版)文献、资料发表(出版)日期:2009年5月院(部):专业:班级:姓名:学号:指导教师:翻译日期:外文文献:Cloud Computing1. Cloud Computing at a Higher LevelIn many ways, cloud computing is simply a metaphor for the Internet, the increasing movement of compute and data resources onto the Web. But there’s a difference: cloud computing represents a new tipping point for the value of network computing. It delivers higher efficiency, massive scalability, and faster, easier software development. It’s about new programming models, new IT infrastructure, and the enabling of new business models.For those developers and enterprises who want to embrace cloud computing, Sun is developing critical technologies to deliver enterprise scale and systemic qualities to this new paradigm:(1) Interoperability — while most current clouds offer closed platforms and vendor lock-in, developers clamor for interoperability. Sun’s open-source product strategy and Java™ principles are focused on providing interoperability for large-scale computing resources. Think of the existing cloud “islands” merging into a new, interoperable “Intercloud” where applications can be moved to and operate across multiple platforms.(2) High-density horizontal computing —Sun is pioneering high-power-density compute-node architectures and extreme-scale Infiniband fabrics as part of our top-tier HPC deployments. This high-density technology is being incorporated into our large-scale cloud designs.(3)Data in the cloud — More than just compute utilities, cloud computing is increasingly about petascale data. Sun’s Open Storage products offer hybrid data servers with unprecedented efficiency and performance for the emerging data-intensive computing applications that will become a key part of the cloud.These technology bets are focused on driving more efficient large-scale cloud deployments that can provide the infrastructure for next-generation business opportunities: social networks, algorithmic trading, continuous risk analysis, and so on.2. Why Cloud Computing?(1)Clouds: Much More Than Cheap ComputingCloud computing brings a new level of efficiency and economy to delivering IT resources on demand — and in the process it opens up new business models and market opportunities.While many people think of current cloud computing offerings as purely “pay by the drink” compute platforms, they’re really a convergence of two major interdependent IT trends: IT Efficiency — Minimize costs where companies are converting their IT costs from capital expenses to operating expenses through technologies such as virtualization. Cloud computing begins as a way to improve infrastructure resource deployment and utilization, but fully exploiting this infrastructure eventually leads to a new application development model.Business Agility — Maximize return using IT as a competitive weapon through rapid time to market, integrated application stacks, instant machine image deployment, and petascale parallel programming. Cloud computing is embraced as a critical way to revolutionize time to service. But inevitably these services must be built on equally innovative rapid-deployment-infrastructure models.To be sure, these trends have existed in the IT industry for years. 
However, the recent emergence of massive network bandwidth and virtualization technologies has enabled this transformation to a new services-oriented infrastructure.Cloud computing enables IT organizations to increase hardware utilization rates dramatically, and to scale up to massive capacities in an instant — without constantly having to invest in new infrastructure, train new personnel, or license new software. It also creates new opportunities to build a better breed of network services, in less time, for less money.IT Efficiency on a Whole New ScaleCloud computing is all about efficiency. It provides a way to deploy and access everything from single systems to huge amounts of IT resources — on demand, in real time, at an affordable cost. It makes high-performance compute and high-capacity storage available to anyone with a credit card. And since the best cloud strategies build on concepts and tools that developers already know, clouds also have the potential to redefine the relationship between information technology and the developers and business units that depend on it.Reduce capital expenditures — Cloud computing makes it possible for companies to convert IT costs from capital expense to operating expense through technologies such as virtualization.Cut the cost of running a datacenter — Cloud computing improves infrastructure utilizationrates and streamlines resource management. For example, clouds allow for self-service provisioning through APIs, bringing a higher level of automation to the datacenter and reducing management costs.Eliminate over provisioning — Cloud computing provides scaling on demand, which, when combined with utility pricing, removes the need to overprovision to meet demand. With cloud computing, companies can scale up to massive capacities in an instant.For those who think cloud computing is just fluff, take a closer look at the cloud offerings that are already available. Major Internet providers , Google, and others are leveraging their infrastructure investments and “sharing” their large-scale economics. Already the bandwidth used by Amazon Web Services (AWS) exceeds that associated with their core e-tailing services. Forward-looking enterprises of all types —from Web 2.0 startups to global enterprises — are embracing cloud computing to reduce infrastructure costs.Faster, More Flexible ProgrammingCloud computing isn’t only about hardware —it’s also a programming revolution. Agile, easy-to-access, lightweight Web protocols —coupled with pervasive horizontally scaled architecture — can accelerate development cycles and time to market with new applications and services. New business functions are now just a script away.Accelerated cycles — The cloud computing model provides a faster, more efficient way to develop the new generation of applications and services. Faster development and testing cycles means businesses can accomplish in hours what used to take days, weeks, or months.Increase agility —Cloud computing accommodates change like no other model. For example, Animoto Productions, makers of a mashup tool that creates video from images and music, used cloud computing to scale up from 50 servers to 3,500 in just three days. Cloud computing can also provide a wider selection of more lightweight and agile development tools, simplifying and speeding up the development process.The immediate impact will be unprecedented flexibility in service creation and accelerated development cycles. 
But at the same time, development flexibility could become constrained by APIs if they’re not truly open. Cloud computing can usher in a new era of productivity for developers if they build on platforms that are designed to be federated rather than centralized. But there’s a major shift underway in programming culture and the languages that will be used inclouds.Today, the integrated, optimized, open-source Apache, MySQL, PHP/Perl/Python (AMP) stack is the preferred platform for building and deploying new Web applications and services. Cloud computing will be the catalyst for the adoption of an even newer stack of more lightweight, agile tools such as lighttpd, an open-source Web server; Hadoop, the free Java software framework that supports data-intensive distributed applications; and MogileFS, a file system that enables horizontal scaling of storage across any number of machines.(2)Compelling New Opportunities: The Cloud EcosystemBut cloud computing isn’t just about a proliferation of Xen image stacks on a restricted handful of infrastructure providers. It’s also about an emerging ecosyst em of complementary services that provide computing resources such as on-ramps for cloud abstraction, professional services to help in deployment, specialized application components such as distributed databases, and virtual private datacenters for the entire range of IT providers and consumers.These services span the range of customer requirements, from individual developers and small startups to large enterprises. And they continue to expand the levels of virtualization, a key architectural component of the cloud that offers ever-higher abstractions of underlying services.(3) How Did Cloud Computing Start?At a basic level, cloud computing is simply a means of delivering IT resources as services. Almost all IT resources can be delivered as a cloud service: applications, compute power, storage capacity, networking, programming tools, even communications services and collaboration tools.Cloud computing began as large-scale Internet service providers such as Google, Amazon, and others built out their infrastructure. An architecture emerged: massively scaled, horizontally distributed system resources, abstracted as virtual IT services and managed as continuously configured, pooled resources. This architectural model was immortalized by George Gilder in his Oc tober 2006 Wired magazine article titled “The Information Factories.” The server farms Gilder wrote about were architecturally similar to grid computing, but where grids are used for loosely coupled, technical computing applications, this new cloud model was being applied to Internet services.Both clouds and grids are built to scale horizontally very efficiently. Both are built to withstand failures of individual elements or nodes. Both are charged on a per-use basis. But whilegrids typically process batch jobs, with a defined start and end point, cloud services can be continuous. What’s more, clouds expand the types of resources available—file storage, databases, and Web services — and extend the applicability to Web and enterprise applications.At the same time, the concept of utility computing became a focus of IT design and operations. As Nick Carr observed in his book The Big Switch, computing services infrastructure was beginning to parallel the development of electricity as a utility. 
Wouldn’t it b e great if you could purchase compute resources, on demand, only paying for what you need, when you need it?For end users, cloud computing means there are no hardware acquisition costs, no software licenses or upgrades to manage, no new employees or consultants to hire, no facilities to lease, no capital costs of any kind —and no hidden costs. Just a metered, per-use rate or a fixed subscription fee. Use only what you want, pay only for what you use.Cloud computing actually takes the utility model to the next level. It’s a new and evolved form of utility computing in which many different types of resources (hardware, software, storage, communications, and so on) can be combined and recombined on the fly into the specific capabilities or services customers require. From CPU cycles for HPC projects to storage capacity for enterprise-grade backups to complete IDEs for software development, cloud computing can deliver virtually any IT capability, in real time. Under the circumstances it is easy to see that a broad range of organizations and individuals would like to purchase “computing” as a service, and those firms already building hyperscale distributed data centers would inevitably choose to begin offering this infrastructure as a service.(4)Harnessing Cloud ComputingSo how does an individual or a business take advantage of the cloud computing trend? It’s not just about loading machine images consisting of your entire software stack onto a public cloud like AWS — there are several different ways to exploit this infrastructure and explore the ecosystem of new business models.Use the CloudThe number and quality of public, commercially available cloud-based service offerings is growing fast. Using the cloud is often the best option for startups, research projects, Web 2.0 developers, or niche players who want a simple, low-cost way to “load and go.”If you’re an Internet startup today, you will be mandated by your investors to keep you IT spend to aminimum. This is certainly what the cloud is for.Leverage the CloudTypically, enterprises are using public clouds for specific functions or workloads. The cloud is an attractive alternative for:Development and testing — this is perhaps the easiest cloud use case for enterprises (not just startup developers). Why wait to order servers when you don’t even know if the project will pass the proof of concept?Functional offloading —you can use the cloud for specific workloads. For example, SmugMug does its image thumbnailing as a batch job in the cloud.Augmentation — Clouds give you a new option for handling peak load or anticipated spikes in demand for services. This is a very attractive option for enterprises, but also potentially one of the most difficult use cases. Success is dependent on the statefulness of the application and the interdependence with other datasets that may need to be replicated and load-balanced across the two sites.Experimenting — Why download demos of new software, and then install, license, and test it? In the future, software evaluation can be performed in the cloud, before licenses or support need to be purchased.Build the CloudMany large enterprises understand the economic benefits of cloud computing but want to ensure strict enforcement of security policies. 
So they’re experimenting fir st with “private” clouds, with a longer-term option of migrating mature enterprise applications to a cloud that’s able to deliver the right service levels.Other companies may simply want to build private clouds to take advantage of the economics of resource pools and standardize their development and deployment processes.Be the CloudThis category includes both cloud computing service providers and cloud aggregators —companies that offer multiple types of cloud services.As enterprises and service providers gain experience with the cloud architecture model and confidence in the security and access-control technologies that are available, many will decide to deploy externally facing cloud services. The phenomenal growth rates of some of the publiccloud offerings available today will no doubt accelerate the momentum. Amazon’s EC2 was introduced only two years ago and officially graduated from beta to general availability in October 2008.Cloud service providers can:Provide new routes to market for startups and Web 2.0 application developersOffer new value-added capabilities such as analyticsDerive a competitive edge through enterprise-level SLAsHelp enterprise customers develop their own cloudsIf you’re building large datacenters today, you should proba bly be thinking about whether you’re going to offer cloud services.(5)Public, Private, and Hybrid CloudsA company may choose to use a service provider’s cloud or build its own — but is it always all or nothing? Sun sees an opportunity to blend the advantages of the two primary options: Public clouds are run by third parties, and jobs from many different customers may be mixed together on the servers, storage systems, and other infrastructure within the cloud. End users don’t know who else’s job may be me running on the same server, network, or disk as their own jobs.Private clouds are a good option for companies dealing with data protection and service-level issues. Private clouds are on-demand infrastructure owned by a single customer who controls which applications run, and where. They own the server, network, and disk and can decide which users are allowed to use the infrastructure.But even those who feel compelled in the short term to build a private cloud will likely want to run applications both in privately owned infrastructure and in the public cloud space. This gives rise to the concept of a hybrid cloud.Hybrid clouds combine the public and private cloud models. You own parts and share other parts, though in a controlled way. Hybrid clouds offer the promise of on-demand, externally provisioned scale, but add the complexity of determining how to distribute applications across these different environments. While enterprises may be attracted to the promise of a hybrid cloud, this option, at least initially, will likely be reserved for simple stateless applications that require no complex databases or synchronization.3. Cloud Computing Defined(1)Cornerstone TechnologyWhile the basic technologies of cloud computing such as horizontally scaled, distributed compute nodes have been available for some time, virtualization — the abstraction of computer resources —is the cornerstone technology for all cloud architectures. 
With the ability to virtualize servers (behind a hypervisor-abstracted operating system), storage devices, desktops, and applications, a wide array of IT resources can now be allocated on demand.The dramatic growth in the ubiquitous availability of affordable high-bandwidth networking over the past several years is equally critical. What was available to only a small percentage of Internet users a decade ago is now offered to the majority of Internet users in North America, Europe, and Asia: high bandwidth, which allows massive compute and data resources to be accessed from the browser. Virtualized resources can truly be anywhere in the cloud — not just across gigabit datacenter LANs and WANs but also via broadband to remote programmers and end users.Additional enabling technologies for cloud computing can deliver IT capabilities on an absolutely unprecedented scale. Just a few examples:Sophisticated file systems such as ZFS can support virtually unlimited storage capacities, integration of the file system and volume management, snapshots and copy-on-write clones, on-line integrity checking, and repair.Patterns in architecture allow for accelerated development of superscale cloud architectures by providing repeatable solutions to common problems.New techniques for managing structured, unstructured, and semistructured data can provide radical improvements in data-intensive computing.Machine images can be instantly deployed, dramatically simplifying and accelerating resource allocation while increasing IT agility and responsiveness.(2)The Architectural Services Layers of Cloud ComputingWhile the first revolution of the Internet saw the three-tier (or n-tier) model emerge as a general architecture, the use of virtualization in clouds has created a new set of layers: applications, services, and infrastructure. These layers don’t just encapsu late on-demand resources; they also define a new application development model. And within each layer ofabstraction there are myriad business opportunities for defining services that can be offered on a pay-per-use basis.Software as a Service (SaaS)SaaS is at the highest layer and features a complete application offered as a service, on demand, via multitenancy —meaning a single instance of the software runs on the provider’s infrastructure and serves multiple client organizations.The most widely known example of SaaS is , but there are now many others, including the Google Apps offering of basic business services such as e-mail. Of course, ’s multitenant application has preceded the definition of cloud computing by a few years. On the other hand, like many other players in cloud computing, now operates at more than one cloud layer with its release of , a companion application development environment, or platform as a service.Platform as a Service (PaaS)The middle layer, or PaaS, is the encapsulation of a development environment abstraction and the packaging of a payload of services. The archetypal payload is a Xen image (part of Amazon Web Services) containing a basic Web stack (for example, a Linux distro, a Web server, and a programming environment such as Pearl or Ruby).PaaS offerings can provide for every phase of software development and testing, or they can be specialized around a particular area, such as content management.Commercial examples include Google App Engine, which serves applications on Google’s infrastructure. 
PaaS services such as these can provide a great deal of flexibility but may be constrained by the capabilities that are available through the provider.Infrastructure as a Service (IaaS)IaaS is at the lowest layer and is a means of delivering basic storage and compute capabilities as standardized services over the network. Servers, storage systems, switches, routers, and other systems are pooled (through virtualization technology, for example) to handle specific types of workloads — from batch processing to server/storage augmentation during peak loads.The best-known commercial example is Amazon Web Services, whose EC2 and S3 services offer bare-bones compute and storage services (respectively). Another example is Joyent whose main product is a line of virtualized servers which provide a highly scalable on-demandinfrastructure for running Web sites, including rich Web applications written in Ruby on Rails, PHP, Python, and Java.中文译文:云计算1.更高层次的云计算在很多情况下,云计算仅仅是互联网的一个隐喻,也就是网络上运算和数据资源日益增加的一个隐喻。
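The whitepaper above points to Hadoop as the Java framework behind data-intensive distributed applications in the emerging cloud stack. As a hedged illustration of what such a job looks like, and not something that appears in the original text, here is the classic word-count pattern reduced to its core; the same code runs on a single machine or across a horizontally scaled cluster.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Classic word count: mappers and reducers run in parallel across the
// distributed nodes the whitepaper describes, so scale comes from the
// framework rather than from the application code.
public class WordCount {

    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);   // emit (word, 1) for each token
                }
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();                 // add up the counts for this word
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}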

外文文献翻译大数据和云计算2017

大数据和云计算技术外文文献翻译(含:英文原文及中文译文)文献出处:Bryant R. The research of big data and cloud computing technology [J]. Information Systems, 2017, 3(5): 98-109英文原文The research of big data and cloud computing technologyBryant RoyAbstractThe rapid development of mobile Internet, Internet of Things, and cloud computing technologies has opened the prelude to the era of mobile cloud, and big data is increasingly attracting people's attention. The emergence of the Internet has shortened the distance between people, people, and the world. The entire world has become a "global village," and people have accessibility, information exchange, and collaborative work through the Internet. At the same time, with the rapid development of the Internet, the maturity and popularity of database technologies, and the emergence of high-memory, high-performance storage devices and storage media, the amount of data generated by humans in daily learning, living, and work is growing exponentially. The big data problem is generated under such a background. It has become a hot topic in scientific research and related industry circles. As one of the most cutting-edge topics in the field of information technology, it has attracted more andmore scholars to study the issue of big data.Keywords: big data; data analysis; cloud computing1 IntroductionBig data is an information resource that can reflect changes in the state and state of the physical world and the spiritual world. It has complexity, decision-making usefulness, high-speed growth, sparseness, and reproducibility. It generally has a variety of potential values. Based on the perspective of big data resources and management, big data is considered as an important resource that can support management decisions. Therefore, in order to effectively manage this resource and give full play to its potential value, it is necessary to study and solve such management problems as the acquisition, processing, application, definition of property rights, industrial development, and policy guarantee. Big data has the following characteristics:Complexity, as pointed out by many definitions, forms and characteristics of big data are extremely complex. In addition to the complexity of big data, the breadth of its sources, and the diversity of its morphological structure, the complexity of big data also manifests itself in uncertainties in its state changes and development methods. The usefulness of decision-making, big data itself is an objective large-scale data resources, and its direct function is limited. By analyzing, digging, and discovering the knowledge contained in it, it can provide decisionsupport for other practical applications that are difficult to provide with other resources. The value of big data is also reflected mainly through its decision-making usefulness. With rapid growth, this feature of big data resources is different from natural resources such as oil. The total stock of non-renewable natural resources will gradually decrease with the continuous exploitation of human beings. Big data, however, has rapid growth, that is, with continuous exploitation, big data resources will not only not decrease but will increase rapidly. The sparseness of value and the large amount of data in big data have brought many opportunities and brought many challenges. One of its main challenges is the low density of big data values. 
Although the number of big data resources is large, the useful value contained in it is sparse, which increases the difficulty of developing and utilizing big data resources.2 Big data processing flowData AcquisitionBig data, which originally meant a large quantity and variety of types, was extremely important for obtaining data information through various methods. Data collection is the most basic step in the process of big data processing. At present, commonly used data collection methods include RFID, data search and classification tools such as Google and other search engines, and bar code technology. And due to the emergence of mobile devices, such as the rapid spread of smart phones and tabletcomputers, a large amount of mobile software has been developed and applied, and social networks have become increasingly large. This has also accelerated the speed of information circulation and acquisition accuracy.Data Processing and IntegrationThe processing and integration of data is mainly to complete the proper processing of the collected data, cleaning and denoising, and further integrated storage. According to the foregoing, one of the characteristics of big data is diversity. This determines that the type and structure of data obtained through various channels are very complex, and brings great difficulties to subsequent data analysis and processing. Through the steps of data processing and integration, these complex structural data are first converted into a single or easy-to-handle structure, which lays a good foundation for future data analysis because not all information in these data is required. Therefore, these data must also be “de-noised” and cleaned to ensure da ta quality and reliability. The commonly used method is to design some data filters during the data processing process, and use the rule method of clustering or association analysis to pick out unwanted or erroneous outlier data and filter it out to prevent it from adversely affecting the final data result; These integrated data are integrated and stored. This is a very important step. If it is simply placed at random, it will affect the access to future data. It is easy to causedata access problems. Now the general solution is to The establishment of a special database for specific types of data, and the placement of these different types of data information, can effectively reduce the time for data query and access, and increase the speed of data extraction.Data AnalysisData analysis is the most central part of the overall big data processing process, because in the process of data analysis, the value of the data will be found. After the processing and integration of the previous step data, the resulting data becomes the original data for data analysis, and the data is further processed and analyzed according to the application requirements of the required data. The traditional methods of data processing analysis include data mining, machine learning, intelligent algorithms, and statistical analysis. These methods can no longer meet the needs of data analysis in the era of big data. (Google is the most advanced data analysis technology, Google as the Internet The most widely used company for big data, pioneered the concept of "cloud computing" in 2006. 
The application of various internal data is based on Google's own internal research and development of a series of cloud computing technologies.Data InterpretationFor the majority of users of data and information, the most concerned is not the analysis and processing of data, but the interpretationand presentation of the results of big data analysis. Therefore, in a complete data analysis process, the interpretation of data results is crucial. important. If the results of data analysis cannot be properly displayed, data users will be troubled and even mislead users. The traditional data display method is to download the output in text form or display the processing result on the user's personal computer. However, as the amount of data increases, the results of data analysis tend to be more complicated. The use of traditional data display methods is insufficient to meet the output requirements of data analysis results. Therefore, in order to increase the number of dataAccording to explanations and demonstration capabilities, most companies now introduce data visualization technology as the most powerful way to explain big data. By visualizing the results, you can visualize the data analysis results to the user, which is more convenient for users to understand and accept the results. Common visualization technologies include collection-based visualization technology, icon-based technology, image-based technology, pixel-oriented technology, and distributed technology.3 Big Data ChallengesBig Data Security and Privacy IssuesWith the development of big data, the sources and applications of data are becoming more and more extensive. When browsing the webfreely on the Internet, a series of browsing trails are left. When logging in to a related website on the Internet, you need to input personal important information, such as an ID card. Number, mobile number, address, etc. Cameras and sensors are everywhere to record personal behavior and location information. Through relevant data analysis, data experts can easily discover people's behavior habits and personal important information. If this information is used properly, it can help companies in related fields to understand the needs and habits of customers at any time, so that enterprises can adjust their production plans and achieve greater economic benefits. However, if these important information are stolen by bad people, security issues such as personal information and property will follow. In order to solve the problem of data privacy in the era of big data, academics and industry have come up with their own solutions. In addition, the speed of updating and changing data in the era of big data is accelerating, and general data privacy protection technologies are mostly based on static data protection, which brings new challenges to privacy protection. How to implement data privacy and security protection under complex and changing conditions will be one of the key directions for future big data research.Big Data Integration and ManagementLooking at the development process of big data, the sources and applications of big data are becoming more and more extensive. In orderto collect and collect data distributed in different data management systems, it is necessary to integrate and manage data. Although there are many methods for data integration and management, the traditional data storage methods can no longer meet the data processing requirements in the era of big data, which is facing new challenges. data storage. 
In the era of big data, one of the characteristics of big data is the diversity of data types. Data types are gradually transformed from traditional structured data into semi-structured and unstructured data. In addition, the sources of data are also gradually diversified. Most of the traditional data comes from a small number of military companies or research institutes' computer terminals; now, with the popularity of the Internet and mobile devices in the world, the storage of data is particularly important (by As can be seen in the previous article, traditional data storage methods are insufficient to meet the current data storage requirements. To deal with more and more massive data and increasingly complex data structures, many companies have started to develop distributed files suitable for the era of big data. System and distributed parallel database. In the data storage process, the data format of the transfer change is necessary, but also very critical and complex, which puts higher requirements on data storage systems.Big Data Ecological EnvironmentThe eco-environmental problem of big data involves firstly the issueof data resource management and sharing. This is an era of normalization and openness. The open structure of the Internet allows people to share all network resources in different corners of the earth at the same time. This has brought great convenience to scientific research. However, not all data can be shared unconditionally. Some data are protected by law because of their special value attributes and cannot be used unconditionally. Because the relevant legal measures are still not sound enough and lack sufficient data protection awareness, there is always the problem of data theft or ownership of data. This has both technical and legal issues. How to solve the problem of data sharing under the premise of protecting multiple interests will be an important challenge in the era of big data (In the era of big data, the production and application of data is not limited to a few special occasions, almost all areas, etc. Everyone can see the big data, so the data cross-cutting issues involved in these areas are inevitable. With the deepening of the influence of big data, big data analysis results will inevitably be on the national governance model, corporate decision-making, organization and Business processes, personal lifestyles, etc. will have a huge impact, and this mode of influence is worth further study in the future.中文译文大数据和云计算技术研究Bryant Roy摘要移动互联网、物联网和云计算技术的迅速发展,开启了移动云时代的序幕,大数据也越来越吸引人们的视线。
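The privacy discussion above (Big Data Security and Privacy Issues) mentions identifiers such as ID and phone numbers that should be protected before data is analysed. One small, common technique is to pseudonymise such fields with a salted hash before they leave the collection system; the sketch below shows the idea using only the standard Java library, with an invented record layout, and is not a complete privacy solution (key management, access control and re-identification risk still matter).

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Pseudonymisation sketch: the raw identifier never reaches the analysis
// cluster, only its salted SHA-256 digest does.
public class Pseudonymizer {

    private final byte[] salt;

    public Pseudonymizer(byte[] salt) {
        this.salt = salt.clone();
    }

    public String pseudonymize(String identifier) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(salt);
            byte[] digest = md.digest(identifier.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    public static void main(String[] args) {
        Pseudonymizer p = new Pseudonymizer("demo-salt".getBytes(StandardCharsets.UTF_8));
        System.out.println(p.pseudonymize("13800138000"));   // phone number -> opaque token
    }
}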

云计算技术的应用与发展趋势(英文中文双语版优质文档)

云计算技术的应用与发展趋势(英文中文双语版优质文档)
With the continuous development of information technology, cloud computing technology has become an indispensable part of enterprise information construction. Cloud computing can help enterprises realize a range of functions such as resource sharing, data storage and processing, and application development and deployment. This article discusses three aspects: the application of cloud computing technology, its advantages, and its development trend.
1. Application of Cloud Computing Technology
1. Resource sharing
Cloud computing technology can bring different resources together to realize resource sharing. Enterprises can use cloud computing to share servers, storage devices, network devices, and other resources, so as to maximize resource utilization.
2. Data storage and processing
Cloud computing technology can help enterprises store and process massive amounts of data. Through cloud computing, enterprises can store data in the cloud and realize remote access and backup. At the same time, cloud computing can help enterprises analyze and process data and provide more accurate decision support.
3. Application development and deployment
Cloud computing technology can help enterprises develop and deploy applications faster and more conveniently. Enterprises can deploy applications on the cloud and access and manage them remotely. Cloud platforms also provide a variety of development tools and environments, which makes application development easier.
2. Advantages of cloud computing technology
1. High flexibility
Cloud computing can flexibly adjust the usage and allocation of resources according to the needs of the enterprise, so as to make optimal use of them. It also supports elastic expansion and contraction, which helps enterprises cope with business peaks and valleys.
2. High security
Cloud computing can ensure the security of enterprise data through data encryption, identity authentication, access control, and other means. It can also provide a multi-level security protection system to guard against risks such as hacker attacks and data leakage.
3. Cost-effectiveness
Compared with the traditional IT construction model, cloud computing costs less. With cloud computing, enterprises can avoid large-scale hardware investment and maintenance costs and save on R&D and operating expenses.
4. Convenient management
Cloud computing can help enterprises achieve unified resource management and monitoring. Enterprises can centrally manage multiple servers, storage devices, network devices, and other resources, which makes unified monitoring and management easier.
5. Strong scalability
Cloud computing can quickly increase or decrease the usage and configuration of resources according to the needs of the enterprise, so as to expand or contract the business rapidly. It also offers a variety of expansion methods, such as horizontal expansion and vertical expansion, so enterprises can expand their business on demand.
3. The development trend of cloud computing technology
1. The advent of the multi-cloud era
With the development of cloud computing technology, the multi-cloud era has arrived. Enterprises can choose different cloud platforms and deploy services on multiple clouds to achieve high availability and elastic expansion.
2. Combination of artificial intelligence and cloud computing
Artificial intelligence is one of today's hot technologies, and cloud computing can provide strong support for its development. Cloud platforms offer high-performance computing and storage resources, providing better conditions for training and deploying artificial intelligence.
3. The rise of edge computing
Edge computing refers to deploying computing and storage resources at the edge of the network to provide faster and more convenient computing and storage services. With the development of the Internet of Things and the popularization of 5G networks, edge computing will become an important extension of cloud computing.
4. Guarantee of security and privacy
With the widespread application of cloud computing, data security and privacy protection have become important issues. In the future, cloud computing will pay more attention to security measures such as data encryption, identity authentication, and access control to ensure the security and privacy of corporate and personal data.
To sum up, cloud computing technology has become an indispensable part of enterprise information construction. Through cloud computing, enterprises can realize resource sharing, data storage and processing, and application development and deployment. Cloud computing also offers high flexibility, high security, cost-effectiveness, convenient management, and strong scalability. In the future, with the multi-cloud era, the combination of artificial intelligence and cloud computing, the rise of edge computing, and the protection of security and privacy, the importance and application value of cloud computing in enterprise information construction will continue to grow.
随着信息技术的不断发展,云计算技术已经成为企业信息化建设中不可或缺的一部分。

云计算毕业论文外文翻译

Cloud Computing
Cloud computing is a model for delivering dynamically scalable, virtualized computing resources as services over the Internet. It offers convenient, on-demand network access to a shared pool of configurable computing resources (networks, servers, storage, applications, and services) that can be provisioned quickly with minimal management effort or interaction with the service provider. Computation is distributed across a large number of networked computers rather than performed on a local machine or a single remote server, which makes enterprise data centers operate more like the Internet itself. This allows companies to direct resources to the applications that need them and to access computing and storage systems on demand. The shift resembles the move from individual generators to centralized power plants: computing power becomes a tradable commodity, like gas, water, and electricity, that is convenient and inexpensive to obtain. The biggest difference is that it is delivered over the Internet. The characteristics of cloud services bear a certain resemblance to clouds and the water cycle on the Internet, so "cloud" is an apt metaphor.
Cloud computing has helped make computing more broadly network-based. In my opinion it has four significant advantages: it provides reliable and secure data storage centers, so users need not worry about data loss, virus attacks, and similar problems; it places minimal demands on client devices and is highly convenient; it lets data and applications be shared easily across different devices; and it opens up an almost unlimited range of possibilities for using the network. On the other hand, there are two main drawbacks. The first is security: because computing power and data reside in the cloud, ensuring the security of customer data becomes critically important. Security has two aspects. One is that data must not be lost; providers generally handle this with backups, but losses still occur occasionally. The other is that data must not leak; providers take measures against outsiders such as hackers obtaining data, but the risk posed by a provider's internal staff remains significant. The second drawback is network delay or interruption. Cloud services are usually accessed remotely over a network; although network speeds are improving rapidly, they still lag LAN speeds or introduce delays, and if the network goes down the service cannot be reached at all.
云计算
云计算是一种通过Internet以办事的方式提供动态可伸缩的虚拟化的资源的计算模式。

Java Web与云计算外文翻译文献

文献信息文献标题:Java Web Deployment in Cloud Computing(云计算中的Java Web 部署)文献作者:Ankit Kumar Sahu文献出处:《International Journal of Computer Applications》,2013, 75(15):31-34.字数统计:英文2007单词,10665字符;中文3308汉字外文文献Java Web Deployment in Cloud Computing Abstract Cloud Computing is a revolutionary IT field in today’s world. Several technologies like java, python provides deployment on clouds using respective tools. In this paper, the web deployment in cloud computing with java and the performance issues of the java deployment on a cloud is discussed.There are several tools available to deploy a java web- application on clouds, some of these are Google App Engine (By Google), Windows Azure (By Microsoft), Amazon EC2 (By Amazon) etc. Cloud Computing is providing many facilities for deployment but is also having performance issues which is a major factor for the web-applications. Issues with java-deployment on cloud would try to resolve through the framework customization. A java web-application is deployed on Google cloud and examined of its performance, further in this paper.General Terms:Cloud Computing, Google-App-Engine, Java Web- Deployment.1.INTRODUCTIONCloud Computing is a service used over a network for multi- purposes like software, platform, infrastructure service and provides a better a way for virtualization in the IT field. There are several fields that are affected by the cloud computing likedeployment. Cloud Computing makes the IT fields enable for better performance using lesser resources. It includes delivery of the application as services throughout the internet and the software that provide services in the data- centre and hardware and the prototype shift. The data centre- software and hardware is known as a cloud.Many of the companies are shifting to the cloud services like Google App Engine has been started by Google. Microsoft started Windows Azure, Amazon started EC2.A Web- Application is deployed on Google App Engine as a sample application. There are several terms which are discussed as follows:WHAT A CLOUD IS:A cloud is a pool of virtualized computer resources.A cloud can multitude a range of different loads, including wedge-style back-end works and collaborating, User-facing applications. It allows loads to be located and scaled-out rapidly through the quick provisioning of Virtual machines or somatic machines. It supports completed, self-recovering, extremely accessible programming prototypes those allow loads to improve from many obvious hardware/software disasters. It observers resource use in real time to enable rebalancing of provisions when desired.A Cloud is an implicit world available for applications- deployment with optimized cost, whereas Cloud Computing is a regular word for anything that involves distributing services over the Internet. At its humblest, it is providing the assets and proficiencies of information technology enthusiastically as a service. Cloud Computing is a style of computing in which enthusiastically accessible and often virtualized assets are delivered as a service over the Internet.ADV ANTAGES OF CLOUD COMPUTING:•It is swift, with ease and speed of deployment.•Its cost is use-based, and will likely be abridged.•In house IT costs are condensed.•Capital investment is cheap.•The latest technology is offered always.•The use of standard technology is optimistic and facilitate.CLOUD SERVICES:A cloud is a pool of systems, resources and their classes that provides all facilities as per the user-end’s requirements. 
All the resources, applications are part of a cloud. Cloud Computing provides following classification of its services:IaaS-Infrastructure as aService PaaS-Platform as a ServiceSaaS-Software as a Service1.) Infrastructure as a ServiceThe IaaS is further classified into:i.) Computation as a Service (CaaS):In this kind of service, the virtual machine servers are lent. The cost of the virtual machine servers are based on the capacity of the machine like memory attributes of the server, its operating system and all deployment features.ii.) Data as a Service (DaaS):In this kind of service, Storage is provided for all end-users for storing data. The cost estimation of the service is based on the scale of Gigabyte (GB) and decided bythe provider.There are several cloud computing platforms for the world. The Cloud Computing platform for Google is Google App Engine, which has an efficient and better system for deployment and all.Google has provided a standard all-inclusive answer to Cloud Computing, known as the Google App Engine. Google App Engine provides several features for its clients such as fine grained computing, data storage, data transfer etc. Google App Engine provides VPN (Virtual Private Network), elastic IP-Addressing etc. Google App Engine has become a standard model in Cloud Computing.2.) Platform as a ServicePlatform as a Service (PaaS) provides an environment which is good in performance. Platform like OS could also be used over a network (internet) by any end-users at its requirement. There would be no need of having an installed OS; one could load its OS using the PaaS. The Applications could also be served as a platform services.Microsoft has provided Windows Azure as cloud computing servers. Windows Azure is an effort to provide PaaS services to the users. Window Azure Platform (WAP) is the cloud-OS offered by Microsoft. WAP includes several services with the cloud-OS. Azure services as virtual machine servers as its runtime environments.3.) Software as a ServiceAs PaaS, SaaS provides Software as a Service on a cloud platform. Using SaaS, software may have been developed, installed and updated on the end-user’s request. It reduces the management costs and the software conventions with a Rent Model.2.RELATED WORKAs per the growing development of Cloud Computing, it is going to be the future of the IT-Field. But there are several sections in which the cloud computing is still facing problems. Some of the issues are Performance, Security, Cost, Reliability etc. If the cloud computing wants to be the acceptable future in the IT field, it would have to overcome these issues. In this paper, the performance issue is examined and tried toresolve. To examine the performance issue, a java web-application is deployed to a cloud server (Google App Engine) as follows:STEPS TO DEPLOY A JA V A WEB-APPLCATION ON GOOGLE APP ENGINE:STEP 1: A Java Web-Application can be deployed on Google App Engine using the appengine-java-sdk tool. First download the appengine-java-sdk tool.STEP 2: Using the appcfg.cmd file, the application can be deployed. Assume the application name is MySampleApplication, the command would be:appcfg.cmd update //*PATH TO APPLICATION FOLDER *// MySampleApplicationSTEP 3: The Sample Web-Application is deployed now and would be available on for ease access. The Application Administration can be accessed on .Thus a java web-application is deployed on cloud server (Google App Engine). 
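The source of MySampleApplication itself is not reproduced in the paper. As a rough, hypothetical illustration of what such a deployable unit can contain, the servlet below (class name and response text are invented, and the standard javax.servlet API exposed by the App Engine Java SDK of that period is assumed) returns a plain-text page and reports its own server-side processing time, the kind of figure that matters for the performance discussion that follows.

import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Minimal servlet of the kind deployed in the steps above. It reports how long
// the request took to process on the server, one crude way to observe the
// server-side component of the latency discussed later in the paper.
public class SampleServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        long start = System.nanoTime();
        // ... application work would happen here ...
        long elapsedMicros = (System.nanoTime() - start) / 1_000;
        resp.setContentType("text/plain");
        PrintWriter out = resp.getWriter();
        out.println("Hello from MySampleApplication");
        out.println("server-side processing time (us): " + elapsedMicros);
    }
}

The reported time covers only the work done inside the servlet; network latency between the user's browser and the cloud server comes on top of it.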
The performance issue is discussed in this paper which is a major issue in cloud computing for its users. The performance issue includes the delay for running the application on cloud server. The web-server sends its output of the web-application to the client. If the cloud server takes longer time to run the application, this would lead to a delay and this issue is discussed in this paper.3.ISSUES WITH WEB-DEPLOYMENT ON CLOUDThere are several issues in Web-Deployment in cloud computing. As per the current research, the cloud computing is still facing challenges in its fields. Some of the issues are specified here, one of those would be tried to overcome:3.1.Performance IssueThis kind of issues is for those users which are far away from cloud in the factor of distance. The cloud may affect with its lesser performance. The root cause of this issue can be due to High-Latency-Delay.3.2.Security IssueThis kind of issue has been a primary issue in IT. Cloud Computing is also facing this issue. Security attacks and threats are still the problem for the Servers. Several Kinds of security ideas may be used to avoid these issues.3.3.Cost IssueThe cost estimation is measured on the basis of Bandwidth. If the Application needs large bandwidth, the cost would be increased for these kinds of applications. Small Applications need lesser bandwidth, their costs aren’t an issue. Large Applications faces these kinds of issues.3.4.Reliability IssueThis issue includes the reliability of the cloud. The cloud is reliable if and only if its infrastructure and its resources are reliable. These kinds of issues are resolved through individual resources of the clouds.These issues are basic problems in the cloud computing. The area of the problem resolving would be the Performance Issue in the Cloud Computing. This kind of issue could be resolved by the filtering of individual applications. The Web- Application used in the Cloud would be a Java-Web- Application. The Java Cloud requirements are continuously growing better and more refined.All Popular clouds are shifting due to the experimental results. In Cloud Computing, the loads ad costs are not definite in the market, so the prices seem to be changing, every so often in remarkable ways. Even the cloud sellers hasn’t fixed thecost, they are only guessing the costs, like any X Dollars for Y Transactions.Like in Google App Engine, the biggest problem for developers will be adjusting to Google’s non-relational data stores. When Google App Engine was introduced, there were not so many database-projects in the market for clouds.4.OBJECTIVEAs per the given Problem Formulations, the basic objective will be to determine the less-performance causes in Java Web- Application on Clouds.Following phases would be useful in Performance issues:-Choosing a suitable platform for java application like Google App Engine etc.-The Cloud Framework (i.e. The App Engine web-app Framework for Google App Engine) should be configured as per the requirement of the J2EE application.-The infrastructure of the cloud services should be more specific to the Java-Applications.Following methods can be used for the above proposed objective:•The Structure of the Cloud Computing in the respect of Deployment should be in a good manner. 
The framework used on the virtual machine server should be independent and optimized so that it performs well.
•The resources used in cloud computing should be single, independent resources, should state the required hardware and software configuration, and should be platform independent and reliable enough to use at large scale.
These issues can be addressed by several possible solutions. In a cloud service, several resources are arranged in a particular order to perform a specific task, and cloud computing lets us customize those resources to the individual application and its use.
5. METHODOLOGY
The methodology used to overcome this issue is:
•Optimization of the framework: On Google App Engine, performance depends on the framework of the web application, so the first approach is to optimize the application's framework in the cloud. At present, the only framework that has been customized for cloud computing is the Spring Framework. When the Spring Framework is used in a cloud deployment and the application takes too long to load, a DeadlineExceededException is thrown and control is handed back to the framework, which then takes the corresponding decisions. For a better cloud deployment, the entire framework should be optimized.
•Reducing or avoiding the use of component scanning: In Google App Engine, the Spring Framework processes a set of annotations as signals to other objects during execution. Sometimes a requested resource cannot be obtained because of resource sharing, which also restricts the application's speed. Component scanning likewise makes the application slower and less efficient because it is a time-consuming process. To avoid this problem, component scanning should be avoided, or reduced where its use is mandatory (a configuration sketch follows at the end of this section). This methodology applies to Java web applications implemented with the Spring Framework.
6. CONCLUSION
According to the results, the performance issue can be handled by examining each web application individually and customizing the framework. This has been a major issue for cloud users, and the suggested approach addresses it: it reduces the high-latency delay, and the application no longer takes too long to run on the cloud server.
中文译文
云计算中的Java Web部署
摘要 云计算是当今世界一个革命性的IT领域。
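A minimal sketch of the second methodology step, assuming a Spring-based application: rather than letting the container scan the classpath for annotated components, the beans are registered explicitly in a configuration class. ReportService and ReportRepository are invented placeholder types, not classes from the paper.

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Hypothetical placeholder beans; in a real application these would be the
// application's own service and repository classes.
class ReportRepository { }

class ReportService {
    private final ReportRepository repository;
    ReportService(ReportRepository repository) { this.repository = repository; }
}

// Explicit bean registration instead of classpath component scanning.
// Dropping @ComponentScan in favour of direct @Bean definitions removes the
// classpath scan that the methodology identifies as avoidable startup work
// on App Engine, where a slow first request can run into the request deadline.
@Configuration
public class AppConfig {

    @Bean
    public ReportRepository reportRepository() {
        return new ReportRepository();
    }

    @Bean
    public ReportService reportService() {
        return new ReportService(reportRepository());
    }
}

Startup work is then limited to the beans that are actually listed, which is the point of the recommendation: less work before the first request is served, and less chance of hitting the request deadline that surfaces as DeadlineExceededException.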

大数据、云计算技术与审计外文文献翻译最新译文毕业设计附件外文文献翻译:原文+译文文献出处:Chaudhuri S. Big data,cloud computing technology and the audit[J]. IT Professional Magazine, 2016, 2(4): 38-51.原文Big data,cloud computing technology and the auditChaudhuri SAbstractAt present, large data along with the development of cloud computing technology, is a significant impact on global economic and social life. Big data and cloud computing technology to modern audit provides a new technology and method of auditing organizations and audit personnel to grasp the big data, content and characteristics of cloud computing technology, to promote the further development of the modern audit technology and method.Keywords: big data, cloud computing technology, audit, advice1 Related concept1.1 Large dataThe word "data" (data) is the meaning of "known" in Latin, can also be interpreted as "fact”. In 2009, the concept of “big data” gradually begins to spread in society. The conce pt of "big data" truly become popular, it is because the Obama administration in 2012 high-profile announced its "big data research and development plan”. It marks the era of "big data" really began to enter the social economic life.” Big data" (bigdata), or "huge amounts of data, refers to the amount of data involved too big to use the current mainstream software tools, in a certain period of time to realize collection, analysis, processing, or converted to help decision-makers decision-making information available. Internet data center (IDC) said "big data" is for the sake of more economical, more efficient from high frequency, large capacity, different structures and types of data to derive value and design of a new generation of architecture and technology, and use it to describe and define the information explosion times produce huge amounts of data, and name the related technology development and innovation. Big data has four characteristics: first, the data volume is huge, jumped from TB level to the level of PB.Second, processing speed, the traditionaldata mining technology are fundamentally different. Third, many data types’pictures, location information, video, web logs, and other forms. Fourth, the value of low density, high commercial value.1.2 Cloud computing"Cloud computing" concept was created in large Internet companies such as Google and IBM handle huge amounts of data in practice. On August 9, 2006, Google CEO Eric Schmidt (Eric Schmidt) in the search engine assembly for the first time put forward the concept of "cloud computing”. In October 2007, Google and IBM began in the United States university campus to promote cloud computing technology plan, the project hope to reduce the cost of distributed computing technology in academic research, and provide the related hardware and software equipment for these universities and technical support (Michael Mille, 2009).The world there are many about the definition of"cloud computing”.” Cloud computing" is the increase of the related services based on Internet, use and delivery mode, is through the Internet to provide dynamic easy extension and often virtualized resources. 
American national standards institute of technology (NIST) in 2009 about cloud computing is defined as: "cloud computing is a kind of pay by usage pattern, this pattern provides available, convenient, on-demand network access, enter the configurable computing resources Shared pool resources (including network, servers, storage, applications, services, etc.), these resources can be quick to provide, just in the management of the very few and or little interaction with service providers."1.3 The relationship between big data and cloud computingOverall, big data and cloud computing are complementary to each other. Big data mainly focus on the actual business, focus on "data", provide the technology and methods of data collection, mining and analysis, and emphasizes the data storage capacity. Cloud computing focuses on "computing", pay attention to IT infrastructure, providing IT solutions, emphasizes the ability to calculate, the data processing ability. If there is no large data storage of data, so the cloud computing ability strong again, also hard to find a place; If there is no cloud computing ability of data processing, the big data storage of data rich again, and ultimately, used in practice. From a technical point of view, large data relies on the cloud computing. Huge amounts of data storage technology, massive data management technology, graphs programming model is the key technology of cloud computing, are also big data technology base. And the data will be "big", themost important is the technology provided by the cloudcomputing platform. After the data is on the "cloud", broke the past their segmentation of data storage, more easy to collect and obtain, big data to present in front of people. From the focus, the emphasis of the big data and cloud computing. The emphasis of the big data is all sorts of data, broad, deep huge amounts of data mining, found in the data value, forcing companies to shift from "business-driven" for "data driven”. And the cloud is mainly through the Internet, extension, and widely available computing and storage resources and capabilities, its emphasis is IT resources, processing capacity and a variety of applications, to help enterprises save IT deployment costs. Cloud computing the benefits of the IT department in enterprise, and big data benefit enterprise business management department.2 Big data and cloud computing technology analysis of the influence of the audit2.1 Big data and cloud computing technology promote the development of continuous audit modeIn traditional audit, the auditor only after completion of the audited business audit, and audit process is not audit all data and information, just take some part of the audit. This after the event, and limited audit on the audited complex production and business operation and management system is difficult to make the right evaluation in time, and for the evaluation of increasingly frequent and complex operation and management activities of the authenticity and legitimacy is too slow. Along with the rapid development of information technology, more and more audit organization began to implement continuous audit way, to solve the problem of the time difference between audit results and economic activity. However, auditors for audit, often limited by current business conditions and information technology means,the unstructured data to digital, or related detail data cannot be obtained, the causes to question the judgment of the are no specific further and deeper. 
And big data and cloud computing technology can promote the development of continuous audit mode, make the information technology and big data and cloud computing technology is better, especially for the business data and risk control "real time" to demand higher specific industry, such as banking, securities, insurance industry, the continuous audit in these industries is imminent.2.2 Big data and cloud computing technology to promote the application of overall audit modeThe current audit mode is based on the evaluation of audit risk to implement sampling audit. In impossible to collect and analyze the audited all economic business data, the current audit modemainly depends on the audit sampling, from the perspective of the local inference as a whole, namely to extract the samples from working on the audit, and then deduced the whole situation of the audit object. The sampling audit mode, due to the limited sample drawn, and ignored the many and the specific business activity, the auditors cannot find and reveal the audited major fraud, hidden significant audit risks. Big data and cloud computing technology for the auditor, is not only a technical means are available, the technology and method will provide the auditor with the feasibility of implementing overall audit mode. Using big data and cloud computing technology, cross-industry, across the enterprise to collect and analysis of the data, can need not random sampling method, and use to collect and analyze all the data of general audit mode. Use of big data and cloud computing technology overall audit mode is to analyze all thedata related to the audit object allows the auditor to establish overall audit of the thinking mode; can make the modern audit for revolutionary change. Auditors to implement overall audit mode, can avoid audit sampling risk. If could gather all the data in general, you can see more subtle and in-depth information, deep analysis of the data in multiple perspectives, to discover the hidden details in the data information of value to the audit problem. At the same time, the auditor implement overall audit mode, can be found from the audit sampling mode can find problems.2.3 Big data and cloud computing technology for integrated application of the audit resultsAt present, the auditor audit results is mainly provided to the audit report of the audited, its format is fixed, single content, contains less information. As the big data and cloud computing technology is widely used in the audit, the auditor audit results in addition to the audit report, and in the process of audit collection, mining, analysis and processing of large amounts of information and data, can be provided to the audited to improve management, promote the integrated application of the audit results, improve the comprehensive application effect of the audit results. First of all, the auditor in the audit to obtain large amounts of data and related information of summary and induction, financial, business and find the inner rules of operation and management etc, common problems and development trend, through the summary induces a macroscopic and comprehensive strong audit information, to provide investors and other stakeholders audited data prove that, correlation analysis and decision making Suggestions, thus promoting the improvement of the audited management level. 
Second, auditorsby using big data and cloud computing technology can be the same problem in different category analysis and processing, from a differentAngle and different level of integration of refining to satisfy the needs of different levels. Again, the auditor will audit results for intelligent retained, by big data and cloud computing technology, to regulation and curing the problem in the system, in order to calculate or determine the problem developing trend, an early warning of the auditees.3 Big data and cloud computing technology promote the relationship between the applications of evidenceAuditors in the audit process should be based on sufficient and appropriate audit evidence audit opinion, and issue the audit report. However, under the big data and cloud computing environment, auditors are faced with both a huge amount data screening test, and facing the challenge of collecting appropriate audit evidence. Auditors when collecting audit evidence, the traditional thinking path is to collect audit evidence, based on the causal relationship between the big data analysis will be more use of correlation analysis to gather and found that the audit evidence. But from the perspective of audit evidence found, because of big data technology provides an unprecedented interdisciplinary, quantitative dimensions available, made a lot of relevant information to the audit records and analysis. Big data and cloud computing technology has not changed the causal relationship between things, but in the big data and cloud computing technology the development and use of correlation, makes the analysis of data dependence on causal logic relationship is reduced, and even more inclined to application based on the analysis of correlation data, on the basis ofcorrelation analysis of data validation is large, one of the important characteristics of cloud computing technology. In the big data and cloud computing environment, the auditor can collect audit evidence are mostly electronic evidence. Electronic evidence itself is very complex, and cloud computing technology makes it more difficult to obtain evidence of the causal. Auditors should collect from long-term dependence on cause and effect and found that the audit evidence, into a correlation is used to collect and found that the audit evidence.译文大数据、云计算技术与审计Chaudhuri S摘要目前,大数据伴随着云计算技术的发展,正在对全球经济社会生活产生巨大的影响。
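As an illustration of the overall (full-population) audit mode described in section 2.2, the sketch below screens every transaction in a data set instead of drawing a sample. The Transaction record and the two exception rules are invented placeholders; real audit tests are defined by the auditor and, once the population no longer fits on a single machine, would run on the kind of distributed storage and processing platform discussed above.

import java.math.BigDecimal;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: examines every record in the population rather than a sample,
// flagging entries that match simple exception rules for follow-up by the auditor.
public class FullPopulationScreen {

    record Transaction(String id, String approver, String beneficiary, BigDecimal amount) {}

    static List<Transaction> flagExceptions(List<Transaction> population, BigDecimal threshold) {
        return population.stream()
                .filter(t -> t.amount().compareTo(threshold) > 0      // unusually large amount
                          || t.approver().equals(t.beneficiary()))    // self-approval
                .collect(Collectors.toList());
    }
}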


云计算外文翻译参考文献(文档含中英文对照即英文原文和中文翻译)原文:Technical Issues of Forensic Investigations in Cloud Computing EnvironmentsDominik BirkRuhr-University BochumHorst Goertz Institute for IT SecurityBochum, GermanyRuhr-University BochumHorst Goertz Institute for IT SecurityBochum, GermanyAbstract—Cloud Computing is arguably one of the most discussedinformation technologies today. It presents many promising technological and economical opportunities. However, many customers remain reluctant to move their business IT infrastructure completely to the cloud. One of their main concerns is Cloud Security and the threat of the unknown. Cloud Service Providers(CSP) encourage this perception by not letting their customers see what is behind their virtual curtain. A seldomly discussed, but in this regard highly relevant open issue is the ability to perform digital investigations. This continues to fuel insecurity on the sides of both providers and customers. Cloud Forensics constitutes a new and disruptive challenge for investigators. Due to the decentralized nature of data processing in the cloud, traditional approaches to evidence collection and recovery are no longer practical. This paper focuses on the technical aspects of digital forensics in distributed cloud environments. We contribute by assessing whether it is possible for the customer of cloud computing services to perform a traditional digital investigation from a technical point of view. Furthermore we discuss possible solutions and possible new methodologies helping customers to perform such investigations.I. INTRODUCTIONAlthough the cloud might appear attractive to small as well as to large companies, it does not come along without its own unique problems. Outsourcing sensitive corporate data into the cloud raises concerns regarding the privacy and security of data. Security policies, companies main pillar concerning security, cannot be easily deployed into distributed, virtualized cloud environments. This situation is further complicated by the unknown physical location of the companie’s assets. Normally,if a security incident occurs, the corporate security team wants to be able to perform their own investigation without dependency on third parties. In the cloud, this is not possible anymore: The CSP obtains all the power over the environmentand thus controls the sources of evidence. In the best case, a trusted third party acts as a trustee and guarantees for the trustworthiness of the CSP. Furthermore, the implementation of the technical architecture and circumstances within cloud computing environments bias the way an investigation may be processed. In detail, evidence data has to be interpreted by an investigator in a We would like to thank the reviewers for the helpful comments and Dennis Heinson (Center for Advanced Security Research Darmstadt - CASED) for the profound discussions regarding the legal aspects of cloud forensics. proper manner which is hardly be possible due to the lackof circumstantial information. For auditors, this situation does not change: Questions who accessed specific data and information cannot be answered by the customers, if no corresponding logs are available. With the increasing demand for using the power of the cloud for processing also sensible information and data, enterprises face the issue of Data and Process Provenance in the cloud [10]. Digital provenance, meaning meta-data that describes the ancestry or history of a digital object, is a crucial feature for forensic investigations. 
In combination with a suitable authentication scheme, it provides information about who created and who modified what kind of data in the cloud. These are crucial aspects for digital investigations in distributed environments such as the cloud. Unfortunately, the aspects of forensic investigations in distributed environment have so far been mostly neglected by the research community. Current discussion centers mostly around security, privacy and data protection issues [35], [9], [12]. The impact of forensic investigations on cloud environments was little noticed albeit mentioned by the authors of [1] in 2009: ”[...] to our knowledge, no research has been published on how cloud computing environments affect digital artifacts,and on acquisition logistics and legal issues related to cloud computing env ironments.” This statement is also confirmed by other authors [34], [36], [40] stressing that further research on incident handling, evidence tracking and accountability in cloud environments has to be done. At the same time, massive investments are being made in cloud technology. Combined with the fact that information technology increasingly transcendents peoples’ private and professional life, thus mirroring more and more of peoples’actions, it becomes apparent that evidence gathered from cloud environments will be of high significance to litigation or criminal proceedings in the future. Within this work, we focus the notion of cloud forensics by addressing the technical issues of forensics in all three major cloud service models and consider cross-disciplinary aspects. Moreover, we address the usability of various sources of evidence for investigative purposes and propose potential solutions to the issues from a practical standpoint. This work should be considered as a surveying discussion of an almost unexplored research area. The paper is organized as follows: We discuss the related work and the fundamental technical background information of digital forensics, cloud computing and the fault model in section II and III. In section IV, we focus on the technical issues of cloud forensics and discuss the potential sources and nature of digital evidence as well as investigations in XaaS environments including thecross-disciplinary aspects. We conclude in section V.II. RELATED WORKVarious works have been published in the field of cloud security and privacy [9], [35], [30] focussing on aspects for protecting data in multi-tenant, virtualized environments. Desired security characteristics for current cloud infrastructures mainly revolve around isolation of multi-tenant platforms [12], security of hypervisors in order to protect virtualized guest systems and secure network infrastructures [32]. Albeit digital provenance, describing the ancestry of digital objects, still remains a challenging issue for cloud environments, several works have already been published in this field [8], [10] contributing to the issues of cloud forensis. Within this context, cryptographic proofs for verifying data integrity mainly in cloud storage offers have been proposed,yet lacking of practical implementations [24], [37], [23]. Traditional computer forensics has already well researched methods for various fields of application [4], [5], [6], [11], [13]. Also the aspects of forensics in virtual systems have been addressed by several works [2], [3], [20] including the notionof virtual introspection [25]. 
In addition, the NIST already addressed Web Service Forensics [22] which has a huge impact on investigation processes in cloud computing environments. In contrast, the aspects of forensic investigations in cloud environments have mostly been neglected by both the industry and the research community. One of the first papers focusing on this topic was published by Wolthusen [40] after Bebee et al already introduced problems within cloud environments [1]. Wolthusen stressed that there is an inherent strong need for interdisciplinary work linking the requirements and concepts of evidence arising from the legal field to what can be feasibly reconstructed and inferred algorithmically or in an exploratory manner. In 2010, Grobauer et al [36] published a paper discussing the issues of incident response in cloud environments - unfortunately no specific issues and solutions of cloud forensics have been proposed which will be done within this work.III. TECHNICAL BACKGROUNDA. Traditional Digital ForensicsThe notion of Digital Forensics is widely known as the practice of identifying, extracting and considering evidence from digital media. Unfortunately, digital evidence is both fragile and volatile and therefore requires the attention of special personnel and methods in order to ensure that evidence data can be proper isolated and evaluated. Normally, the process of a digital investigation can be separated into three different steps each having its own specificpurpose:1) In the Securing Phase, the major intention is the preservation of evidence for analysis. The data has to be collected in a manner that maximizes its integrity. This is normally done by a bitwise copy of the original media. As can be imagined, this represents a huge problem in the field of cloud computing where you never know exactly where your data is and additionallydo not have access to any physical hardware. However, the snapshot technology, discussed in section IV-B3, provides a powerful tool to freeze system states and thus makes digital investigations, at least in IaaS scenarios, theoretically possible.2) We refer to the Analyzing Phase as the stage in which the data is sifted and combined. It is in this phase that the data from multiple systems or sources is pulled together to create as complete a picture and event reconstruction as possible. Especially in distributed system infrastructures, this means that bits and pieces of data are pulled together for deciphering the real story of what happened and for providing a deeper look into the data.3) Finally, at the end of the examination and analysis of the data, the results of the previous phases will be reprocessed in the Presentation Phase. The report, created in this phase, is a compilation of all the documentation and evidence from the analysis stage. The main intention of such a report is that it contains all results, it is complete and clear to understand. Apparently, the success of these three steps strongly depends on the first stage. If it is not possible to secure the complete set of evidence data, no exhaustive analysis will be possible. However, in real world scenarios often only a subset of the evidence data can be secured by the investigator. In addition, an important definition in the general context of forensics is the notion of a Chain of Custody. This chain clarifies how and where evidence is stored and who takes possession of it. Especially for cases which are brought to court it is crucial that the chain of custody is preserved.B. 
Cloud ComputingAccording to the NIST [16], cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications and services) that can be rapidly provisioned and released with minimal CSP interaction. The new raw definition of cloud computing brought several new characteristics such as multi-tenancy, elasticity, pay-as-you-go and reliability. Within this work, the following three models are used: In the Infrastructure asa Service (IaaS) model, the customer is using the virtual machine provided by the CSP for installing his own system on it. The system can be used like any other physical computer with a few limitations. However, the additive customer power over the system comes along with additional security obligations. Platform as a Service (PaaS) offerings provide the capability to deploy application packages created using the virtual development environment supported by the CSP. For the efficiency of software development process this service model can be propellent. In the Software as a Service (SaaS) model, the customer makes use of a service run by the CSP on a cloud infrastructure. In most of the cases this service can be accessed through an API for a thin client interface such as a web browser. Closed-source public SaaS offers such as Amazon S3 and GoogleMail can only be used in the public deployment model leading to further issues concerning security, privacy and the gathering of suitable evidences. Furthermore, two main deployment models, private and public cloud have to be distinguished. Common public clouds are made available to the general public. The corresponding infrastructure is owned by one organization acting as a CSP and offering services to its customers. In contrast, the private cloud is exclusively operated for an organization but may not provide the scalability and agility of public offers. The additional notions of community and hybrid cloud are not exclusively covered within this work. However, independently from the specific model used, the movement of applications and data to the cloud comes along with limited control for the customer about the application itself, the data pushed into the applications and also about the underlying technical infrastructure.C. Fault ModelBe it an account for a SaaS application, a development environment (PaaS) or a virtual image of an IaaS environment, systems in the cloud can be affected by inconsistencies. Hence, for both customer and CSP it is crucial to have the ability to assign faults to the causing party, even in the presence of Byzantine behavior [33]. Generally, inconsistencies can be caused by the following two reasons:1) Maliciously Intended FaultsInternal or external adversaries with specific malicious intentions can cause faults on cloud instances or applications. Economic rivals as well as former employees can be the reason for these faults and state a constant threat to customers and CSP. In this model, also a malicious CSP is included albeit he isassumed to be rare in real world scenarios. Additionally, from the technical point of view, the movement of computing power to a virtualized, multi-tenant environment can pose further threads and risks to the systems. One reason for this is that if a single system or service in the cloud is compromised, all other guest systems and even the host system are at risk. 
Hence, besides the need for further security measures, precautions for potential forensic investigations have to be taken into consideration.2) Unintentional FaultsInconsistencies in technical systems or processes in the cloud do not have implicitly to be caused by malicious intent. Internal communication errors or human failures can lead to issues in the services offered to the costumer(i.e. loss or modification of data). Although these failures are not caused intentionally, both the CSP and the customer have a strong intention to discover the reasons and deploy corresponding fixes.IV. TECHNICAL ISSUESDigital investigations are about control of forensic evidence data. From the technical standpoint, this data can be available in three different states: at rest, in motion or in execution. Data at rest is represented by allocated disk space. Whether the data is stored in a database or in a specific file format, it allocates disk space. Furthermore, if a file is deleted, the disk space is de-allocated for the operating system but the data is still accessible since the disk space has not been re-allocated and overwritten. This fact is often exploited by investigators which explore these de-allocated disk space on harddisks. In case the data is in motion, data is transferred from one entity to another e.g. a typical file transfer over a network can be seen as a data in motion scenario. Several encapsulated protocols contain the data each leaving specific traces on systems and network devices which can in return be used by investigators. Data can be loaded into memory and executed as a process. In this case, the data is neither at rest or in motion but in execution. On the executing system, process information, machine instruction and allocated/de-allocated data can be analyzed by creating a snapshot of the current system state. In the following sections, we point out the potential sources for evidential data in cloud environments and discuss the technical issues of digital investigations in XaaS environmentsas well as suggest several solutions to these problems.A. Sources and Nature of EvidenceConcerning the technical aspects of forensic investigations, the amount of potential evidence available to the investigator strongly diverges between thedifferent cloud service and deployment models. The virtual machine (VM), hosting in most of the cases the server application, provides several pieces of information that could be used by investigators. On the network level, network components can provide information about possible communication channels between different parties involved. The browser on the client, acting often as the user agent for communicating with the cloud, also contains a lot of information that could be used as evidence in a forensic investigation. Independently from the used model, the following three components could act as sources for potential evidential data.1) Virtual Cloud Instance: The VM within the cloud, where i.e. data is stored or processes are handled, contains potential evidence [2], [3]. In most of the cases, it is the place where an incident happened and hence provides a good starting point for a forensic investigation. The VM instance can be accessed by both, the CSP and the customer who is running the instance. Furthermore, virtual introspection techniques [25] provide access to the runtime state of the VM via the hypervisor and snapshot technology supplies a powerful technique for the customer to freeze specific states of the VM. 
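When a snapshot is used as the forensic image of a virtual instance, the securing phase still has to make the image's integrity demonstrable later on. One simple supporting measure, sketched below on the assumption that the snapshot has already been exported to a file, is to record a cryptographic digest of the image at acquisition time; how the snapshot is produced and exported depends on the hypervisor and is not specified here.

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.HexFormat;

// Records a cryptographic fingerprint of an acquired snapshot file so that the
// image analysed later can be shown to be bit-identical to the image acquired,
// supporting the chain of custody described earlier.
public class EvidenceDigest {

    static String sha256Of(Path snapshotImage) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(snapshotImage)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        return HexFormat.of().formatHex(md.digest());
    }
}

Because the digest is computed over the exported image file, taking it leaves the running instance itself untouched.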
Therefore, virtual instances can be still running during analysis which leads to the case of live investigations [41] or can be turned off leading to static image analysis. In SaaS and PaaS scenarios, the ability to access the virtual instance for gathering evidential information is highly limited or simply not possible.2) Network Layer: Traditional network forensics is knownas the analysis of network traffic logs for tracing events that have occurred in the past. Since the different ISO/OSI network layers provide several information on protocols and communication between instances within as well as with instances outside the cloud [4], [5], [6], network forensics is theoretically also feasible in cloud environments. However in practice, ordinary CSP currently do not provide any log data from the network components used by the customer’s instances or applications. For instance, in case of a malware infection of an IaaS VM, it will be difficult for the investigator to get any form of routing information and network log datain general which is crucial for further investigative steps. This situation gets even more complicated in case of PaaS or SaaS. So again, the situation of gathering forensic evidence is strongly affected by the support the investigator receives from the customer and the CSP.3) Client System: On the system layer of the client, it completely depends on the used model (IaaS, PaaS, SaaS) if and where potential evidence could beextracted. In most of the scenarios, the user agent (e.g. the web browser) on the client system is the only application that communicates with the service in the cloud. This especially holds for SaaS applications which are used and controlled by the web browser. But also in IaaS scenarios, the administration interface is often controlled via the browser. Hence, in an exhaustive forensic investigation, the evidence data gathered from the browser environment [7] should not be omitted.a) Browser Forensics: Generally, the circumstances leading to an investigation have to be differentiated: In ordinary scenarios, the main goal of an investigation of the web browser is to determine if a user has been victim of a crime. In complex SaaS scenarios with high client-server interaction, this constitutes a difficult task. Additionally, customers strongly make use of third-party extensions [17] which can be abused for malicious purposes. Hence, the investigator might want to look for malicious extensions, searches performed, websites visited, files downloaded, information entered in forms or stored in local HTML5 stores, web-based email contents and persistent browser cookies for gathering potential evidence data. Within this context, it is inevitable to investigate the appearance of malicious JavaScript [18] leading to e.g. unintended AJAX requests and hence modified usage of administration interfaces. Generally, the web browser contains a lot of electronic evidence data that could be used to give an answer to both of the above questions - even if the private mode is switched on [19].B. Investigations in XaaS EnvironmentsTraditional digital forensic methodologies permit investigators to seize equipment and perform detailed analysis on the media and data recovered [11]. In a distributed infrastructure organization like the cloud computing environment, investigators are confronted with an entirely different situation. They have no longer the option of seizing physical data storage. 
Data and processes of the customer are dispensed over an undisclosed amount of virtual instances, applications and network elements. Hence, it is in question whether preliminary findings of the computer forensic community in the field of digital forensics apparently have to be revised and adapted to the new environment. Within this section, specific issues of investigations in SaaS, PaaS and IaaS environments will be discussed. In addition, cross-disciplinary issues which affect several environments uniformly, will be taken into consideration. We also suggest potential solutions to the mentioned problems.1) SaaS Environments: Especially in the SaaS model, the customer does notobtain any control of the underlying operating infrastructure such as network, servers, operating systems or the application that is used. This means that no deeper view into the system and its underlying infrastructure is provided to the customer. Only limited userspecific application configuration settings can be controlled contributing to the evidences which can be extracted fromthe client (see section IV-A3). In a lot of cases this urges the investigator to rely on high-level logs which are eventually provided by the CSP. Given the case that the CSP does not run any logging application, the customer has no opportunity to create any useful evidence through the installation of any toolkit or logging tool. These circumstances do not allow a valid forensic investigation and lead to the assumption that customers of SaaS offers do not have any chance to analyze potential incidences.a) Data Provenance: The notion of Digital Provenance is known as meta-data that describes the ancestry or history of digital objects. Secure provenance that records ownership and process history of data objects is vital to the success of data forensics in cloud environments, yet it is still a challenging issue today [8]. Albeit data provenance is of high significance also for IaaS and PaaS, it states a huge problem specifically for SaaS-based applications: Current global acting public SaaS CSP offer Single Sign-On (SSO) access control to the set of their services. Unfortunately in case of an account compromise, most of the CSP do not offer any possibility for the customer to figure out which data and information has been accessed by the adversary. For the victim, this situation can have tremendous impact: If sensitive data has been compromised, it is unclear which data has been leaked and which has not been accessed by the adversary. Additionally, data could be modified or deleted by an external adversary or even by the CSP e.g. due to storage reasons. The customer has no ability to proof otherwise. Secure provenance mechanisms for distributed environments can improve this situation but have not been practically implemented by CSP [10]. Suggested Solution: In private SaaS scenarios this situation is improved by the fact that the customer and the CSP are probably under the same authority. Hence, logging and provenance mechanisms could be implemented which contribute to potential investigations. Additionally, the exact location of the servers and the data is known at any time. Public SaaS CSP should offer additional interfaces for the purpose of compliance, forensics, operations and security matters to their customers. Through an API, the customers should have the ability to receive specific information suchas access, error and event logs that could improve their situation in case of aninvestigation. 
Furthermore, due to the limited ability of receiving forensic information from the server and proofing integrity of stored data in SaaS scenarios, the client has to contribute to this process. This could be achieved by implementing Proofs of Retrievability (POR) in which a verifier (client) is enabled to determine that a prover (server) possesses a file or data object and it can be retrieved unmodified [24]. Provable Data Possession (PDP) techniques [37] could be used to verify that an untrusted server possesses the original data without the need for the client to retrieve it. Although these cryptographic proofs have not been implemented by any CSP, the authors of [23] introduced a new data integrity verification mechanism for SaaS scenarios which could also be used for forensic purposes.2) PaaS Environments: One of the main advantages of the PaaS model is that the developed software application is under the control of the customer and except for some CSP, the source code of the application does not have to leave the local development environment. Given these circumstances, the customer obtains theoretically the power to dictate how the application interacts with other dependencies such as databases, storage entities etc. CSP normally claim this transfer is encrypted but this statement can hardly be verified by the customer. Since the customer has the ability to interact with the platform over a prepared API, system states and specific application logs can be extracted. However potential adversaries, which can compromise the application during runtime, should not be able to alter these log files afterwards. Suggested Solution:Depending on the runtime environment, logging mechanisms could be implemented which automatically sign and encrypt the log information before its transfer to a central logging server under the control of the customer. Additional signing and encrypting could prevent potential eavesdroppers from being able to view and alter log data information on the way to the logging server. Runtime compromise of an PaaS application by adversaries could be monitored by push-only mechanisms for log data presupposing that the needed information to detect such an attack are logged. Increasingly, CSP offering PaaS solutions give developers the ability to collect and store a variety of diagnostics data in a highly configurable way with the help of runtime feature sets [38].3) IaaS Environments: As expected, even virtual instances in the cloud get compromised by adversaries. Hence, the ability to determine how defenses in the virtual environment failed and to what extent the affected systems havebeen compromised is crucial not only for recovering from an incident. Also forensic investigations gain leverage from such information and contribute to resilience against future attacks on the systems. From the forensic point of view, IaaS instances do provide much more evidence data usable for potential forensics than PaaS and SaaS models do. This fact is caused throughthe ability of the customer to install and set up the image for forensic purposes before an incident occurs. Hence, as proposed for PaaS environments, log data and other forensic evidence information could be signed and encrypted before itis transferred to third-party hosts mitigating the chance that a maliciously motivated shutdown process destroys the volatile data. Although, IaaS environments provide plenty of potential evidence, it has to be emphasized that the customer VM is in the end still under the control of the CSP. 
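For the signing half of that suggestion, a minimal sketch (class and method names invented for illustration) authenticates each log entry with an HMAC before it leaves the instance for the central logging server; encrypting the entry and managing the key would have to be added on top of this.

import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Sketch of the "sign before shipping" idea for application logs. Each entry is
// authenticated with an HMAC so that later tampering on the logging path can be
// detected; confidentiality would require an additional cipher step and proper
// key management, which are omitted here.
public class SignedLogEntry {

    static String seal(String logLine, byte[] hmacKey) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(hmacKey, "HmacSHA256"));
        byte[] tag = mac.doFinal(logLine.getBytes(StandardCharsets.UTF_8));
        // entry and tag travel together to the central logging server
        return logLine + " | hmac=" + Base64.getEncoder().encodeToString(tag);
    }
}

A receiver holding the same key can recompute the tag and detect tampering along the way, but the scheme only helps as long as the signing key cannot be read off the instance, and for a hosted virtual machine that ultimately depends on the CSP.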
He controls the hypervisor which is e.g. responsible for enforcing hardware boundaries and routing hardware requests among different VM. Hence, besides the security responsibilities of the hypervisor, he exerts tremendous control over how customer’s VM communicate with the hardware and theoretically can intervene executed processes on the hosted virtual instance through virtual introspection [25]. This could also affect encryption or signing processes executed on the VM and therefore leading to the leakage of the secret key. Although this risk can be disregarded in most of the cases, the impact on the security of high security environments is tremendous.a) Snapshot Analysis: Traditional forensics expect target machines to be powered down to collect an image (dead virtual instance). This situation completely changed with the advent of the snapshot technology which is supported by all popular hypervisors such as Xen, VMware ESX and Hyper-V.A snapshot, also referred to as the forensic image of a VM, providesa powerful tool with which a virtual instance can be clonedby one click including also the running system’s mem ory. Due to the invention of the snapshot technology, systems hosting crucial business processes do not have to be powered down for forensic investigation purposes. The investigator simply creates and loads a snapshot of the target VM for analysis(live virtual instance). This behavior is especially important for scenarios in which a downtime of a system is not feasible or practical due to existing SLA. However the information whether the machine is running or has been properly powered down is crucial [3] for the investigation. Live investigations of running virtual instances become more common providing evidence data that。
