Translated Foreign-Language Literature on Big Data
English Essay Format for Academic Articles in the Big Data Field
## Big Data: A Comprehensive Review

### Introduction

Big data refers to massive, complex, and rapidly generated datasets that are difficult to process using traditional data management tools. The advent of big data has revolutionized various industries, from healthcare to finance, transportation, and agriculture. In this paper, we present a comprehensive review of big data, including its characteristics, challenges, opportunities, and applications.

### Characteristics of Big Data

Big data is often characterized by the following attributes:

Volume: Big data datasets are massive, typically ranging from terabytes to petabytes or even exabytes in size.

Variety: Big data comes in various formats, including structured, semi-structured, and unstructured data.

Velocity: Big data is generated rapidly and continuously, requiring real-time or near-real-time processing.

Veracity: Big data quality can vary, and it is essential to address data cleansing and validation.

### Challenges in Big Data Analytics

Big data analytics presents several challenges:

Data storage and management: Storing and managing large and diverse datasets requires efficient and scalable data storage solutions.

Data processing: Traditional data processing tools are often inadequate for handling big data, necessitating specialized big data processing techniques.

Data analysis: Extracting meaningful insights from big data requires advanced analytics techniques and machine learning algorithms.

Data security and privacy: Protecting big data from unauthorized access, breaches, and data loss is a significant challenge.

### Opportunities of Big Data

Despite the challenges, big data presents numerous opportunities:

Improved decision-making: Big data analytics enables data-driven decision-making, providing invaluable insights into customer behavior, market trends, and operational patterns.

Predictive analytics: Big data allows for predictive analytics, identifying patterns and forecasting future events.

Real-time analytics: Processing big data in near-real-time enables instant decision-making and rapid response to changing conditions.

Innovation: Big data analytics drives innovation by fostering new products, services, and business models.

### Applications of Big Data

Big data finds applications in numerous domains:

Healthcare: Big data analytics helps improve patient diagnosis, treatment, and disease prevention.

Finance: Big data is used for risk assessment, fraud detection, and personalized financial services.

Transportation: Big data optimizes traffic flow, improves safety, and enhances the overall transportation system.

Agriculture: Big data supports precision farming, crop yield prediction, and sustainable agriculture practices.

Retail: Big data analytics enables personalized recommendations, customer segmentation, and supply chain optimization.

### Conclusion

Big data has emerged as a transformative force in the modern world. Its vast volume, variety, velocity, and veracity present challenges but also offer unprecedented opportunities for data-driven decision-making, predictive analytics, real-time insights, and innovation. As the amount of data continues to grow exponentially, the role of big data analytics will only become more critical in shaping the future of various industries and sectors.

### Chinese answer: Big Data: A Comprehensive Review (Chinese version of the essay above).
Review of Translated Foreign-Language References on Big Data
(The document contains the English original with a parallel Chinese translation.)

Original text: Data Mining and Data Publishing

Data mining is the extraction of interesting patterns or knowledge from huge amounts of data. The initial idea of privacy-preserving data mining (PPDM) was to extend traditional data mining techniques to work with data modified to mask sensitive information. The key issues were how to modify the data and how to recover the data mining result from the modified data. Privacy-preserving data mining considers the problem of running data mining algorithms on confidential data that is not supposed to be revealed even to the party running the algorithm. In contrast, privacy-preserving data publishing (PPDP) may not necessarily be tied to a specific data mining task, and the data mining task may be unknown at the time of data publishing. PPDP studies how to transform raw data into a version that is immunized against privacy attacks but that still supports effective data mining tasks. Privacy preservation for both data mining (PPDM) and data publishing (PPDP) has become increasingly popular because it allows the sharing of privacy-sensitive data for analysis purposes. One well-studied approach is the k-anonymity model [1], which in turn led to other models such as confidence bounding, l-diversity, t-closeness, (α,k)-anonymity, etc. In particular, all known mechanisms try to minimize information loss, and such an attempt provides a loophole for attacks. The aim of this paper is to present a survey of most of the common attack techniques against anonymization-based PPDM and PPDP and to explain their effects on data privacy.

Although data mining is potentially useful, many data holders are reluctant to provide their data for data mining for fear of violating individual privacy. In recent years, studies have been made to ensure that the sensitive information of individuals cannot be identified easily.

Anonymity models and k-anonymization techniques have been the focus of intense research in the last few years. In order to ensure anonymization of data while at the same time minimizing the information loss resulting from data modifications, several extended models have been proposed, which are discussed as follows.

1. k-Anonymity

k-anonymity is one of the most classic models; it is a technique that prevents joining attacks by generalizing and/or suppressing portions of the released microdata so that no individual can be uniquely distinguished from a group of size k. A data set is k-anonymous (k ≥ 1) if each record in the data set is indistinguishable from at least (k − 1) other records within the same data set. The larger the value of k, the better the privacy is protected. k-anonymity can ensure that individuals cannot be uniquely identified by linking attacks.

2. Extended Models

k-anonymity does not provide sufficient protection against attribute disclosure. The notion of l-diversity attempts to solve this problem by requiring that each equivalence class has at least l well-represented values for each sensitive attribute. l-diversity has some advantages over k-anonymity, because a k-anonymous data set permits strong attacks due to a lack of diversity in the sensitive attributes. In this model, an equivalence class is said to have l-diversity if there are at least l well-represented values for the sensitive attribute. However, there are semantic relationships among the attribute values, and different values have very different levels of sensitivity.
In the (α,k)-anonymity model, after anonymization, in any equivalence class the frequency (as a fraction) of a sensitive value is no more than α.

3. Related Research Areas

Several polls show that the public has an increased sense of privacy loss. Since data mining is often a key component of information systems, homeland security systems, and monitoring and surveillance systems, it gives the wrong impression that data mining is a technique for privacy intrusion. This lack of trust has become an obstacle to the benefits of the technology. For example, the potentially beneficial data mining research project Terrorism Information Awareness (TIA) was terminated by the US Congress due to its controversial procedures for collecting, sharing, and analyzing the trails left by individuals. Motivated by the privacy concerns about data mining tools, a research area called privacy-preserving data mining (PPDM) emerged in 2000. The initial idea of PPDM was to extend traditional data mining techniques to work with data modified to mask sensitive information. The key issues were how to modify the data and how to recover the data mining result from the modified data. The solutions were often tightly coupled with the data mining algorithms under consideration. In contrast, privacy-preserving data publishing (PPDP) may not necessarily be tied to a specific data mining task, and the data mining task is sometimes unknown at the time of data publishing. Furthermore, some PPDP solutions emphasize preserving the data truthfulness at the record level, but PPDM solutions often do not preserve such a property. PPDP differs from PPDM in several major ways, as follows:

1) PPDP focuses on techniques for publishing data, not techniques for data mining. In fact, it is expected that standard data mining techniques are applied on the published data. In contrast, the data holder in PPDM needs to randomize the data in such a way that data mining results can be recovered from the randomized data. To do so, the data holder must understand the data mining tasks and algorithms involved. This level of involvement is not expected of the data holder in PPDP, who usually is not an expert in data mining.

2) Both randomization and encryption do not preserve the truthfulness of values at the record level; therefore, the released data are basically meaningless to the recipients. In such a case, the data holder in PPDM may consider releasing the data mining results rather than the scrambled data.

3) PPDP primarily "anonymizes" the data by hiding the identity of record owners, whereas PPDM seeks to directly hide the sensitive data. Excellent surveys and books on randomization and cryptographic techniques for PPDM can be found in the existing literature. A family of research work called privacy-preserving distributed data mining (PPDDM) aims at performing some data mining task on a set of private databases owned by different parties. It follows the principle of Secure Multiparty Computation (SMC) and prohibits any data sharing other than the final data mining result. Clifton et al. present a suite of SMC operations, like secure sum, secure set union, secure size of set intersection, and scalar product, that are useful for many data mining tasks. In contrast, PPDP does not perform the actual data mining task but concerns itself with how to publish the data so that the anonymous data are useful for data mining. We can say that PPDP protects privacy at the data level while PPDDM protects privacy at the process level. They address different privacy models and data mining scenarios.
In the field of statistical disclosure control (SDC), research works focus on privacy-preserving publishing methods for statistical tables. SDC focuses on three types of disclosures, namely identity disclosure, attribute disclosure, and inferential disclosure. Identity disclosure occurs if an adversary can identify a respondent from the published data. Revealing that an individual is a respondent of a data collection may or may not violate confidentiality requirements. Attribute disclosure occurs when confidential information about a respondent is revealed and can be attributed to the respondent. Attribute disclosure is the primary concern of most statistical agencies in deciding whether to publish tabular data. Inferential disclosure occurs when individual information can be inferred with high confidence from statistical information in the published data.

Some other works in SDC focus on the study of the non-interactive query model, in which the data recipients can submit one query to the system. This type of non-interactive query model may not fully address the information needs of data recipients because, in some cases, it is very difficult for a data recipient to accurately construct a query for a data mining task in one shot. Consequently, there is a series of studies on the interactive query model, in which the data recipients, including adversaries, can submit a sequence of queries based on previously received query results. The database server is responsible for keeping track of all queries of each user and determining whether or not the currently received query violates the privacy requirement with respect to all previous queries. One limitation of any interactive privacy-preserving query system is that it can only answer a sublinear number of queries in total; otherwise, an adversary (or a group of corrupted data recipients) will be able to reconstruct all but a 1 − o(1) fraction of the original data, which is a very strong violation of privacy. When the maximum number of queries is reached, the query service must be closed to avoid privacy leaks. In the case of the non-interactive query model, the adversary can issue only one query and, therefore, the non-interactive query model cannot achieve the same degree of privacy defined by the interactive model. One may consider that privacy-preserving data publishing is a special case of the non-interactive query model.

This paper presents a survey of most of the common attack techniques against anonymization-based PPDM and PPDP and explains their effects on data privacy. k-anonymity is used to protect respondents' identity and reduces linking attacks. In the case of a homogeneity attack, a simple k-anonymity model fails, and we need a concept that prevents this attack; the solution is l-diversity. All tuples are arranged in a well-represented form, and the adversary is diverted across l values of the sensitive attribute. l-diversity is limited in the case of a background-knowledge attack, because no one can predict the knowledge level of an adversary. It is observed that, when using generalization and suppression, we also apply these techniques to attributes that do not need this extent of privacy, and this reduces the precision of the published table.
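To make the k-anonymity and l-diversity definitions above concrete, the following minimal Python sketch checks both properties on a small released table. The column names, the sample records, and the "distinct values" reading of "well-represented" are illustrative assumptions, not taken from the surveyed papers; pandas is assumed to be available.

```python
import pandas as pd

def is_k_anonymous(df, quasi_identifiers, k):
    """Every combination of quasi-identifier values must appear in at least k records."""
    group_sizes = df.groupby(quasi_identifiers).size()
    return bool((group_sizes >= k).all())

def is_l_diverse(df, quasi_identifiers, sensitive, l):
    """Every equivalence class must contain at least l distinct sensitive values
    (the simplest, distinct-values reading of 'well-represented')."""
    distinct_counts = df.groupby(quasi_identifiers)[sensitive].nunique()
    return bool((distinct_counts >= l).all())

# Hypothetical released microdata: ages and ZIP codes already generalized into ranges.
released = pd.DataFrame({
    "age":     ["20-29", "20-29", "20-29", "30-39", "30-39", "30-39"],
    "zipcode": ["130**", "130**", "130**", "148**", "148**", "148**"],
    "disease": ["flu",   "flu",   "cancer", "flu",  "hepatitis", "cancer"],
})

qi = ["age", "zipcode"]
print(is_k_anonymous(released, qi, k=3))           # True: each class has 3 records
print(is_l_diverse(released, qi, "disease", l=2))  # True: each class has >= 2 diseases
```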
e-NSTAM (extended Sensitive Tuples Anonymity Method) is applied to sensitive tuples only and reduces information loss; however, this method fails in the case of multiple sensitive tuples. Generalization with suppression also causes data loss, because suppression emphasizes not releasing values that do not fit the k factor. Future work in this area can include defining a new privacy measure along with l-diversity for multiple sensitive attributes, and focusing on generalizing attributes without suppression using other techniques for achieving k-anonymity, because suppression reduces the precision of the published table.

Chinese translation (begins): Data Mining and Data Publishing. Data mining is the extraction of large numbers of interesting patterns or knowledge from huge amounts of data.
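To illustrate the generalization and suppression operations discussed throughout this article, here is a minimal, hypothetical pandas sketch. The sample data, the generalization hierarchy (10-year age bands, 3-digit ZIP prefixes) and the helper name are illustrative assumptions, not an optimal anonymization algorithm and not code from the paper.

```python
import pandas as pd

def generalize_and_suppress(df, k):
    """Toy sketch of the two operations named in the text: generalization (coarsen
    age into 10-year bands, truncate ZIP codes to a 3-digit prefix) followed by
    suppression (withhold any record whose equivalence class is still smaller than k)."""
    out = df.copy()
    band = out["age"] // 10 * 10
    out["age"] = band.astype(str) + "-" + (band + 9).astype(str)
    out["zipcode"] = out["zipcode"].str[:3] + "**"
    class_size = out.groupby(["age", "zipcode"])["age"].transform("size")
    return out[class_size >= k]  # suppression step: records in small classes are dropped

raw = pd.DataFrame({
    "age":     [23, 27, 25, 34, 38, 52],
    "zipcode": ["13053", "13068", "13021", "14850", "14853", "02139"],
    "disease": ["flu", "flu", "cancer", "flu", "hepatitis", "cancer"],
})

print(generalize_and_suppress(raw, k=2))  # the lone 50-59/021** record is suppressed
```

As the article notes, the more records suppression removes, the lower the precision of the published table, which is exactly the information-loss trade-off the extended models try to manage.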
English Essay on Big Data Applications
Title: The Application of Big Data: Transforming Industries
In today's digital age, the proliferation of data has become unprecedented, ushering in the era of big data. This vast amount of data holds immense potential, revolutionizing various sectors and industries. In this essay, we will explore the applications of big data and its transformative impact across different domains.

One of the primary areas where big data has made significant strides is in healthcare. With the advent of electronic health records (EHRs) and wearable devices, healthcare providers can now collect and analyze vast amounts of patient data in real time. This data includes vital signs, medical history, genomic information, and more. By applying advanced analytics and machine learning algorithms to this data, healthcare professionals can identify patterns, predict disease outbreaks, personalize treatments, and improve overall patient care. For example, predictive analytics can help identify patients at risk of developing chronic conditions such as diabetes or heart disease, allowing for proactive interventions to prevent or mitigate these conditions.

Another sector that has been transformed by big data is finance. In the financial industry, data-driven algorithms are used for risk assessment, fraud detection, algorithmic trading, and customer relationship management. By analyzing large volumes of financial transactions, market trends, and customer behavior, financial institutions can make more informed decisions, optimize investment strategies, and enhance the customer experience. For instance, banks employ machine learning algorithms to detect suspicious activities and prevent fraudulent transactions in real time, safeguarding both the institution and its customers.

Furthermore, big data has revolutionized the retail sector, empowering companies to gain deeper insights into consumer preferences, shopping behaviors, and market trends. Through the analysis of customer transactions, browsing history, social media interactions, and demographic data, retailers can personalize marketing campaigns, optimize pricing strategies, and enhance inventory management. For example, e-commerce platforms utilize recommendation systems powered by machine learning algorithms to suggest products based on past purchases and browsing behavior, thereby improving customer engagement and driving sales.

The transportation industry is also undergoing a profound transformation fueled by big data. With the proliferation of GPS-enabled devices, sensors, and telematics systems, transportation companies can collect vast amounts of data on vehicle performance, traffic patterns, weather conditions, and logistics operations. By leveraging this data, companies can optimize route planning, reduce fuel consumption, minimize delivery times, and enhance overall operational efficiency. For instance, ride-sharing platforms use predictive analytics to forecast demand, allocate drivers more effectively, and optimize ride routes, resulting in improved service quality and customer satisfaction.

In addition to these sectors, big data is making significant strides in fields such as manufacturing, agriculture, energy, and government. In manufacturing, data analytics is used for predictive maintenance, quality control, and supply chain optimization. In agriculture, precision farming techniques enabled by big data help optimize crop yields, minimize resource usage, and mitigate environmental impact.
In energy, smart grid technologies leverage big data analytics to optimize energy distribution, improve grid reliability, and promote energy efficiency. In government, big data is utilized for urban planning, public safety, healthcare management, and policy formulation.

In conclusion, the application of big data is transforming industries across the globe, enabling organizations to make data-driven decisions, unlock new insights, and drive innovation. From healthcare and finance to retail and transportation, the impact of big data is profound and far-reaching. As we continue to harness the power of data analytics and machine learning, we can expect further advancements and breakthroughs that will shape the future of our society and economy.
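The healthcare use case described in this essay (flagging patients at risk of chronic conditions) can be sketched with a standard classifier. The snippet below is a minimal, hypothetical illustration using scikit-learn on synthetic features; the feature choices and the label-generating rule are assumptions for demonstration, not a clinical model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic patient features: age, BMI, fasting glucose (all made up for illustration).
rng = np.random.default_rng(0)
X = rng.normal(loc=[50, 27, 100], scale=[12, 4, 15], size=(1000, 3))
# Hypothetical label: higher age/BMI/glucose raise the chance of a diabetes diagnosis.
risk = 0.04 * (X[:, 0] - 50) + 0.10 * (X[:, 1] - 27) + 0.05 * (X[:, 2] - 100)
y = (rng.random(1000) < 1 / (1 + np.exp(-risk))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Rank the held-out patients by predicted risk so care teams could intervene early.
scores = model.predict_proba(X_test)[:, 1]
print("indices of the five highest-risk patients:", np.argsort(scores)[-5:])
```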
Translated Foreign Literature on Customer Relationship Management and Big Data
Document information. Title: Customer relationship management and big data enabled: Personalization & customization of services (Chinese: customer relationship management and big data: personalization and customization of services). Authors and source: Anshari M, Almunawar M N, Lim S A, et al. Customer relationship management and big data enabled: Personalization & customization of services[J]. Applied Computing and Informatics, 2019, 15(2): 94-101. Word count: 3,633 English words (20,174 characters); 6,464 Chinese characters.

Foreign-language original

Customer relationship management and big data enabled: Personalization & customization of services

Abstract: The emergence of big data brings a new wave of Customer Relationship Management (CRM) strategies supporting personalization and customization of sales, services and customer service. CRM needs big data for better customer experiences, especially personalization and customization of services. Big data is a popular term used to describe data characterized by the volume, velocity, variety, veracity, and value of both structured and unstructured data. Big data requires new tools and techniques to capture, store and analyse it, and is used to improve decision making for enhancing customer management. The aim of the research is to examine big data in a CRM scenario. The method of data collection for this study was a literature review and thematic analysis of recent studies. The study reveals that CRM with big data has enabled business to become more aggressive in terms of marketing strategy, for example push notifications through smartphones to potential target audiences.

Keywords: Big data; Data analytics; CRM; Web 2.0; Social networks

1. Introduction

Managing good customer relationships in an organization relies on the concepts, tools, and strategies of customer relationship management (CRM). CRM, as a tool built on Web/App technology, gives organizations the ability to understand customers or potential customers and their usual practices, and thus deliver particular activities that might convince them to make transactions and decisions. CRM has been discussed in many fields such as business, health care, science, and other service industries. The massive adoption of big data across sectors has triggered an assessment of the front-end perspective, especially managing customer relationships. It is pivotal to examine the role of big data within CRM strategies.

Big data marks a quantum leap into a digital era in which the public generates huge amounts of data in all sectors and industries. The data captured, collected, and processed by organizations through digital sensors, communications, computation, and storage contain information that is valuable to businesses, sciences, government, and society at large. A large amount of data streams from smartphones, computers, parking meters, buses, trains, and supermarkets. Search engine companies collect enormous amounts of data per day and turn these data into useful information for others as well as for their own use.

Big data sources can come in structured or unstructured data formats. These data sources are gathered from multiple channels like social networks, voice recordings, image processing, video recordings, open government data (OGD), and online customer activities. Those activities are analysed so that the business can understand the patterns or behavior of their customers. Big data can help a business portray customer behavior and gain value from it, especially in sales, customer service, marketing and promotion.

Public and private organizations see the potential of big data and mine it into big value. Many organizations have made huge investments to collect, integrate, and analyse data, and to use it to run business activities.
For instance, in marketing activities as part of CRM's modules, customers are exposed to a lot of marketing messages every day, and many people simply ignore those messages unless they find value in the messages received. Email campaigns are distributed to the public or to random customers about a new product so that customers might be interested in having one. Email campaigning may turn into a disappointing situation because customers feel bombarded with spam, which leads to an increased number of unsubscribes. Marketing strategy is about understanding customers' habits and behavior regarding a product or service so that the messages are perceived as valuable to them. Unfortunately, many organizations simplify marketing strategies by focusing on a short-term relationship with their customers, with no path for attracting, retaining, and extending a long-term relationship. Therefore, there is a need for personalization and customization of marketing that fits each and every potential customer.

CRM as a front line in the organization requires extensive, accurate supporting data analytics to ensure that potential customers engage in transactions, since customers make buying decisions every day and every decision depends on consideration of cost, benefits, and value. At this point, big data aims to support CRM strategies so that the organization can quantify sales transactions, promotion, product awareness, and the building of long-term relationships and loyalty. Furthermore, the paper addresses the following question: how can big data in CRM enhance CRM strategies in delivering personalization and customization of services for customers? The structure of this study is organized as follows. The next section is a literature review of related work. Section 3 explains the methodology and results of our study. Section 4 presents a discussion of our findings. Recommendations for suggested future research directions are presented in Section 5, and Section 6 concludes the paper.

2. Literature review

In conventional business practice, data were collected as a record of business activities with no formal intention of treating them as an important asset; they were only collected for specific purposes, such as retailers recording sales for accounting, or counting visits to advertising banners for calculating advertisement revenue. Since many organizations, private or public, have realized the value of the gathered data as an asset, data are no longer treated according to their initial purpose only. The capability of processing huge amounts of data has created a new industry of data analytics services. For example, IBM and Twitter entered a partnership on data analytics for the purpose of selling analytical information to corporate clients, in order to provide businesses with real-time conversations to make smarter decisions. With IBM's analytical skills and Twitter's massive data source, the two created an interesting strategic partnership, as both partners leverage their respective strengths and expertise. Big data is considered the most recent development of decision-support data management. Big data has a big impact on business systems ranging from CRM and ERP to SCM. The next section discusses recent literature on CRM and big data.

2.1. Big data

Big data is a huge amount of data that can hardly be processed with traditional processing tools to extract its value. It has an impact on various fields like business, healthcare, finance, security, communication, agriculture, and even traffic control.
Big data creates opportunities for businesses that can use it to generate business value. The purpose is to gain value from volumes and a variety of data by allowing velocity of analysis. This is known as the 5 Vs model: volume, velocity, variety, value, and veracity (Fig. 1). Volume means processing data at massive scale from any data type gathered. The explosion of data volumes improves knowledge sharing and people's awareness. Big data is a particularly massive volume with large data sets, and the content of those data cannot be analysed using traditional database tools, management, and processing. Velocity means real-time data processing, specifically data collection and analysis; velocity processes very large data in real time. In addition, big data escalates processing speed beyond that of older methods of computing. Variety covers all types of data from various channels, including structured and unstructured data such as audio, video, images, location data (for example Google Maps), webpages, and text, as well as traditional structured data; some of the semi-structured data can use Hadoop, which focuses on analysing the volumes of data involved, mining the data, and the calculations involved in large amounts of computing. Finally, veracity refers to data authenticity, with interest in data sources such as web log files, social media, enterprise content, transactions, and data applications; data need valid information to ensure their authenticity and safety.

Fig. 1. Big data's components

Many organizations have been deploying big data applications in running their business activities to gain value from big data analytics. Value is generated from big data processing that supports the right decisions. Organizations need to refine and process data to gain value from big data analytics. For instance, value generated from big data analytics can help to reveal the condition of, and save the life of, a newborn baby: by recording, examining and analysing every heartbeat of an infant, data analytics helps to finalize the indicators for the newborn. One application of big data is optimizing machine or device performance. For instance, the Toyota Prius is fitted with cameras, GPS and sophisticated computers and sensors to ensure safety precautions on the road automatically.

Big data also reduces maintenance costs; for instance, organizations deploy a cloud computing approach where data are stored in the cloud. The emergence of cloud computing has enabled big data analytics to be cost-efficient, easily accessed, and reliable. Cloud computing is robust, reliable and responsive when there are issues, because these are the responsibility of the cloud service provider. Service outages are unacceptable to the business: whenever data analytics goes down, marketing activities are disrupted and customers have to question whether to trust such a system. Therefore, reliability is a competitive advantage of cloud computing in big data applications.

In addition, businesses have aggressively built their organizations on big data capabilities. Unfortunately, the fact is that only 8% of marketers have comprehensive and effective solutions for collecting and analysing those data. Evans Data Corporation conducted a survey of big data and advanced analytics in organizations (Fig. 2). Customer-centered departments such as marketing, sales, and customer service are the dominant users, accounting for 38.2% of all big data and advanced analytics apps.
The marketing department has the most common users (14.4%) of data analytics, followed by IT (13.3%) and research at 13% (Columbus, 2015).

Fig. 2. Big data analytics usage in organizations. Source: Evans Data Corporation

2.2. Customer relationship management and social CRM

Any business requires Customer Relationship Management (CRM) to sustain itself and survive in the long term. CRM is a tool and strategy for managing customer interactions, using technology to automate business processes. CRM consists of sales, marketing, and customer service activities (Fig. 3). The aims are to find and attract new customers, and to nurture and retain them for future business. Business uses CRM to meet customers' expectations and to align with the organization's mission and objectives, in order to bring about sustainable performance and effective customer relationships.

Fig. 3. CRM scope & modules

The emergence of Web 2.0 has been based on collaboration platforms like wikis, blogs, and social media, aiming to facilitate creativity, collaboration, and sharing among users for tasks other than just emailing and retrieving information. The concept of a social network defines an organization as a system that contains objects such as people, groups, and other organizations linked together by a range of relationships. Web 2.0 is a tool that can be used to communicate a political agenda to the public via social networks. Users can gain access to the data on Web 2.0 enabled sites and exercise control over such data. Web 2.0 represents a revolution in how people communicate, facilitating peer-to-peer collaboration and easy access to real-time communication. The rapid growth of Web 2.0 has impacted organizations that cannot manage their customer relationships using traditional CRM techniques. Social CRM is a recent approach and set of strategies to reveal patterns in customer management, behavior, or anything related to multi-channel customer interactions, as expressed in Fig. 4. Social CRM makes more precise analysis possible based on people's conversations in social media, and thus helps organizations provide more accurate programs or activities matching customers' interests and preferences.

Fig. 4. CRM 1.0 vs CRM 2.0

Marketing is one of CRM's activities, the process of promoting and selling products or services, which also includes research and advertisement. Social networks enable social marketing, which is necessary if marketing teams expect to go viral and receive customers' attention. Marketing is defined as "the activity, set of institutions, and processes for creating, communicating, delivering, and exchanging offerings that have value for customers, clients, partners, and society at large." Marketing should focus on building relationships and meanings. The same applies to sales and customer services, where organizations use social networks as a tool to make as many sales as possible and to handle customers' complaints on social media. Since social networks are part of the big data sources, the next question is how big data will impact CRM strategies.

Social media has empowered customers to hold conversations, and business organizations may utilize the increasing amount of data from people's conversations that is available to them for the company's benefit, such as understanding customer preferences, complaints, and expectations. The Web 2.0 platform allows customers to express their opinions. In the context of CRM, social networks provide a means of strengthening relationships between customers and service providers.
It might be utilized to create long-term relationships between business organizations and their customers and the public in general. Adopting social networks into CRM is known as Social CRM, or a second generation of CRM (CRM 2.0), which empowers customers to express their opinions and expectations about products or services. Social CRM has become a must-have strategy for any organization nowadays to understand its customers better. By playing a significant role in the management of relationships, Social CRM stimulates fundamental changes in customer behavior. Social CRM has an impact on multi-channel relationships in all areas; neither the public nor the private sector is an exception.

3. Method

The study investigates the factors that an organization considers when adopting big data. The objective of the study is to investigate recent big data adoption in organizations. The method consisted of an in-depth analysis of the latest research on big data in business organizations. The data for this report were gathered through a literature review of articles ranging from 2010 to 2015. The reason for choosing this time period is the velocity of big data; older articles might contain irrelevant information. Content analysis is applied to review the literature on big data published in peer-reviewed journals. The review process is then clustered into themes. We enhance and integrate various possible solutions into a proposed model. We chose only English-language articles published in peer-reviewed journals. After removing duplicates and articles beyond the scope of this study, the remaining articles were reviewed to extract the features of CRM and big data capabilities shown in Fig. 5.

Fig. 5. Big data and marketing

4. Discussion

Businesses realize that their most valuable assets are relationships with customers and all stakeholders. In fact, building personal and social relationships has become an important area in marketing; relationships are important as market-based assets that contribute to customer value. As the amount of data increases, some business organizations use advanced, powerful computers with huge storage to run big data analytics and to increase their performance, resulting in tremendous cost savings. Businesses manage structured and unstructured data sources such as social marketing, retail databases, recorded customer activity, logistics, and enterprise data to establish a quality level of CRM strategies, by having the ability and knowledge to recognize big data and its advantages, while big data analytics is the process of revealing the variety of data types in big data itself. There are several CRM strategies that can be realized through big data and big data analytics.

Since big data can provide a pattern of customer information, businesses can predict and anticipate what the needs of their customers are nowadays. Fig. 5 indicates a basic framework for how big data can contribute to generating CRM strategy. Big data has helped shape many industries and changed the way businesses operate nowadays. Big companies, especially technology giants such as Amazon and Google, have definitely benefited from this shift and will continue to benefit from the sheer volume of data they generate.
Data velocity shows how marketers can have access to real-time data, for example real-time analytics of interactions on internet sites and social media.

With the influence of big data on CRM, a new paradigm has been created that allows accessibility and availability of information, resulting in greater take-up by big and small businesses alike. Big data offers pervasive knowledge acquisition in CRM activities. Big data will support long-term relationships through understanding customers' life cycles and behavior from a more comprehensive perspective. Customers voluntarily generate a huge amount of data daily by detailing their interests and preferences about products or services to the public through various channels. Therefore, big data analytics can produce a comprehensive view of customers so that the organization can enhance services fitting customer attention, engagement, participation, and personalization. The study introduces several fundamental concepts of marketing with big data that are closely related to customer-based CRM strategies in an organization engaging the customer life cycle.

CRM with big data brings the promise of a big transformation that can affect how an organization delivers CRM strategies. There are many benefits of using big data in CRM, and the following are just some of them: accurate and up-to-date profiling of target customers, predicting trends in customer reactions to marketing messages and product offerings, creating personalized messages and product offerings that build emotional attachment, maximizing value chain strategies, producing accurate assessment measures, effective digital marketing and campaign-based strategies, customer retention (which is the cheaper option), and creating tactics and getting product insights. The combination of big data and CRM can certainly enhance long-term relationships with customers and manifest in an impressive set of CRM activities. An example of the successful usage of big data in CRM is Netflix using big data to run its streaming video service: instead of using traditional methods of data gathering, Netflix was able to find out what its customers want and make measurable marketing decisions. Big data can deliver better CRM strategies than earlier processes, at double the speed.

CRM with big data features becomes more aggressive in terms of marketing strategy, for example push notifications through smartphones to potential target audiences. Web or app users who comment, like a page, or come back to visit the Web or apps are potential customers targeted for push notifications. Technically, there are many third parties for apps or the Web that can help a business set up push notifications delivered right to the users. For instance, there are also many plugins that support web push facilities in CMS-based websites. Notifications can be set up to be auto-generated or manual whenever new content is available, directed at customer convenience in the form of a text message, link sharing, or a smartphone notification offering a promotion at a nearby shop. CRM aims to quantify sales transactions, promotion and product awareness, while its strategies build long-term relationships and loyalty.
Businesses cannot simplify marketing strategies by focusing only on a short-term relationship with customers, without any path for attracting, retaining, and extending a long-term relationship.

In addition, the organization can also create better customer personas by using the profile data as the backbone for creating accurate personifications of the customers. The organization will also have data on the customers' needs and preferences, and can use these data to provide better content for the audience, content that is relevant and valuable to them. All these data can also provide valuable information for the management team to improve marketing budget management, by ensuring with the help of data that business operational processes stay on budget and become more focused and targeted.

5. Challenges

Big data in CRM has very much to offer, but with its ability to collect and produce big amounts of data, big data could also be a downfall without the proper expertise and tools to obtain and analyse it. Many challenges must be managed before this potential can be fully realized. Firstly, problems occur when organizations have a shortage of technical support and expertise. Secondly, it is difficult to track customer behavior, especially trailing customers moving from brand awareness to conversion. It is challenging to connect the dots from online to offline channels, for example from when and where a customer sees or reads about a product to finally purchasing the product. Thirdly, CRM with big data may need more user-friendly data analytics tools for producing reports, especially when it comes to utilizing the data appropriately across channels and when staff do not understand the effectiveness of their efforts in the process. There is no one-size-fits-all solution: staff need to integrate big data into their strategies, especially product lines and content offerings, and each customer journey is unique. Until such tools are available, many CRM staff will continue to search for solutions to overcome this challenge. The last challenge refers to data authenticity: data sources such as web log files, social media, enterprise content, transactions and data applications may need valid information to ensure their authenticity and safety. For example, all the posts or tweets we publish on social networks are observed by whoever manages the big data. Finally, there is a possibility that the research may lack generalizability because it requires case studies and primary data collection from business organizations; this research plans to reach a larger number of participants in the future.

6. Conclusion

CRM is about understanding human behavior and interests. Big data can be expected to improve customer relationships as it allows interactivity, multi-way communication, personalization, and customization. The recent developments in big data analytics have optimized processes and growth, generated aggressive marketing strategies, and delivered value for each customer and potential customer. CRM enabled with big data engages customers in delivering effective CRM activities, where marketing teams at the organizations turn ideas into executable marketing programs. Big data enhances CRM strategies by better understanding customers' habits and behaviors, so that the business can deliver CRM that is more personalized and customized for each and every customer.
Finally, CRM with big data will make tools and strategies more personalized and customized to customers, because businesses understand their target audiences and the intended messages to send.

Chinese translation: Customer relationship management and big data enabled: personalization & customization of services. Abstract: The emergence of big data brings a new wave of customer relationship management (CRM) strategies supporting personalization and customization of sales, services and customer service.
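One common way to turn the customer profile data discussed in this paper into personas and personalized campaigns is clustering-based segmentation. The sketch below uses k-means from scikit-learn on hypothetical behavioural features (recency, frequency, yearly spend); the feature names, the synthetic data and the number of segments are illustrative assumptions, not part of the paper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical customer profiles: days since last purchase, purchases per year, yearly spend.
rng = np.random.default_rng(42)
profiles = np.vstack([
    rng.normal([10, 40, 2000], [5, 8, 400], size=(300, 3)),  # frequent, high-value buyers
    rng.normal([90, 5, 150], [20, 2, 50], size=(300, 3)),    # lapsed, low-value buyers
    rng.normal([30, 15, 600], [10, 4, 150], size=(300, 3)),  # mid-tier buyers
])

# Standardize the features so no single scale dominates, then cluster into segments.
scaled = StandardScaler().fit_transform(profiles)
segments = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scaled)

# Each segment's average profile can seed a persona and a tailored campaign.
for s in range(3):
    print(f"segment {s}: mean profile =", profiles[segments == s].mean(axis=0).round(1))
```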
Big Data Literature Review (English Version)
The development and tendency of Big Data

Abstract: "Big Data" is the most popular IT term after the "Internet of Things" and "Cloud computing". From the source, development, status quo and tendency of big data, we can understand every aspect of it. Big data is one of the most important technologies around the world, and every country has its own way of developing the technology.

Key words: big data; IT; technology

1 The source of big data

Although the famous futurist Toffler proposed the concept of "Big Data" in 1980, for a long time it did not receive enough attention, because the IT industry and the use of information sources were still at a primary stage of development[1].

2 The development of big data

It was not until the financial crisis of 2008, when IBM (a multinational corporation in the IT industry) proposed the concept of the "Smart City" and vigorously promoted the Internet of Things and cloud computing, that information data grew massively and the need for the technology became very urgent. Under this condition, some American data processing companies focused on developing large-scale concurrent processing systems, "Big Data" technology became available sooner, and the Hadoop mass data concurrent processing system received wide attention. Since 2010, IT giants have proposed their own products in the big data area. Big companies such as EMC, HP, IBM and Microsoft have all purchased other manufacturers related to big data in order to achieve technical integration[1]. Based on this, we can see how important the big data strategy is. The development of big data owes much to some big IT companies such as Google, Amazon, China Mobile and Alibaba, because they needed an optimized way to store and analyse data. Besides, there are also demands from health systems, geographic space remote sensing and digital media[2].

3 The status quo of big data

Nowadays America is in the lead in big data technology and market application. The US federal government announced a "Big Data Research and Development" plan in March 2012, which involved six federal government departments, the National Science Foundation, the Health Research Institute, the Department of Energy, the Department of Defense, the Advanced Research Projects Agency and the Geological Survey, in order to improve the ability to extract information and viewpoints from big data[1]. Thus, it can speed up scientific and engineering discovery, and it is a major move to push research institutions to make innovations. The federal government put big data development in a strategic place, which has a big impact on every country. At present, many big European institutions are still at the primary stage of using big data and seriously lack big data technology. Most improvements and technologies of big data come from America. Therefore, Europe faces real challenges in keeping in step with the development of big data. However, the financial services industry, especially investment banking in London, is one of the earliest adopting industries in Europe; its experiments and technology in big data are as good as those of the giant institutions of America, and its investment in big data has been maintained with promising efforts. In January 2013, the British government announced that 1.89 million pounds would be invested in big data and energy-saving computation technology for earth observation and health care[3]. The Japanese government has also taken up the challenge of a big data strategy in a timely manner.
In July 2013, Japan's communications ministry proposed a comprehensive strategy called "Energy ICT of Japan" which focused on big data applications. In June 2013, the Abe cabinet formally announced the new IT strategy, "The announcement of creating the most advanced IT country". This announcement comprehensively expounded that Japan's new national IT strategy has, at its core, the development of open public data and big data from 2013 to 2020[4].

Big data has also drawn the attention of the Chinese government. The "Guiding opinions of the State Council on promoting the healthy and orderly development of the Internet of Things" promote accelerating core technologies including sensor networks, intelligent terminals, big data processing, intelligent analysis and service integration. In December 2012, the National Development and Reform Commission added data analysis software into a special guide, and at the beginning of 2013 the Ministry of Science and Technology announced that big data research is one of the most important contents of the "973 program"[1]. This program requests research on the expression, measurement and semantic understanding of multi-source heterogeneous data, research on modeling theory and computational models, promotion of hardware and software system architectures through energy-optimal distributed storage and processing, and analysis of the relationships among complexity, computability and processing efficiency[1]. Above all, this can provide theoretical evidence for setting up a scientific system of big data.

4 The tendency of big data

4.1 See the future by big data

At the beginning of 2008, by mining and analyzing user-behavior data, Alibaba found that the total number of sellers was on a slippery slope and that procurement from Europe and America was also sliding. They accurately predicted the trend of world economic trade half a year earlier, so they avoided the financial crisis[2]. Document [3] cites an example in which a cholera outbreak could be predicted one year earlier by mining and analysing data on storms, droughts and other natural disasters[3].

4.2 Great changes and business opportunities

With the recognition of big data's value, giants in every industry are all spending more money in the big data industry; then great changes and business opportunities come[4]. In the hardware industry, big data is facing the challenges of management, storage and real-time analysis. Big data will have an important impact on the chip and storage industries; besides, some new industries will be created because of big data[4]. In the software and service area, the urgent demand for fast data processing will bring a great boom to the data mining and business intelligence industries. The hidden value of big data can create a lot of new companies, new products, new technologies and new projects[2].

4.3 Development direction of big data

The storage technology for big data was at first the relational database. Due to its canonical design, friendly query language, and efficient handling of online transactions, the relational database dominated the market for a long time. However, its strict design pattern, the functionality it gives up to ensure consistency, and its poor expansibility are problems exposed in big data analysis. Then the NoSQL data storage model and Bigtable, proposed by Google, started to be in fashion[5]. Big data analysis technology using the MapReduce technological frame proposed by Google is used to deal with large-scale concurrent batch transactions. Using a file system to store unstructured data does not lose functionality and also gains expansibility.
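To make the MapReduce frame mentioned above concrete, here is a minimal, single-machine Python sketch of the classic word-count job: a map phase emits (key, 1) pairs, a shuffle groups them by key, and a reduce phase sums each group. Real deployments such as Hadoop distribute these phases across nodes; this is only an illustration of the programming model, with made-up input documents.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # Emit a (word, 1) pair for every word in one input record.
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    # Group intermediate pairs by key, as the framework would between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Aggregate all values for one key.
    return key, sum(values)

documents = ["big data needs new tools", "data mining extracts value from big data"]
intermediate = chain.from_iterable(map_phase(d) for d in documents)
counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(counts["data"])  # 3
```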
Later, big data analysis platforms appeared, such as HAVEn proposed by HP and FusionInsight proposed by Huawei. Beyond doubt, this situation will continue, and new technologies and measures will emerge, such as next-generation data warehouses, Hadoop distributions and so on[6].

Conclusion

In this paper we analysed the development and tendency of big data. Based on this, we know that big data is still at a primary stage and there are many problems that need to be dealt with, but the commercial value and market value of big data point to the direction of development in the information age.

References

[1] Li Chunwei, Development report of China's E-Commerce enterprises, Beijing, 2013, pp. 268-270.
[2] Li Fen, Zhu Zhixiang, Liu Shenghui, The development status and the problems of large data, Journal of Xi'an University of Posts and Telecommunications, vol. 18, pp. 102-103, Sep. 2013.
[3] Kira Radinsky, Eric Horvitz, Mining the Web to Predict Future Events[C]//Proceedings of the 6th ACM International Conference on Web Search and Data Mining, WSDM 2013, New York: Association for Computing Machinery, 2013, pp. 255-264.
[4] Chapman A, Allen M D, Blaustein B, It's About the Data: Provenance as a Tool for Assessing Data Fitness[C]//Proc of the 4th USENIX Workshop on the Theory and Practice of Provenance, Berkeley, CA: USENIX Association, 2012: 8.
[5] Li Ruiqin, Zheng Jianguo, Big data Research: Status quo, Problems and Tendency[J], Network Application, Shanghai, 1994, pp. 107-108.
[6] Meng Xiaofeng, Wang Huiju, Du Xiaoyong, Big Data Analysis: Competition and Survival of RDBMS and MapReduce[J], Journal of Software, 2012, 23(1): 32-45.
Translated Foreign Literature on Big Data Mining
Document information. Title: A Study of Data Mining with Big Data (Chinese: A Study of Big Data Mining). Foreign authors: V H Shastri, V Sreeprada. Source: International Journal of Emerging Trends and Technology in Computer Science, 2016, 38(2): 99-103. Word count: 2,291 English words (12,196 characters); 3,868 Chinese characters.

Foreign-language original:

A Study of Data Mining with Big Data

Abstract: Data has become an important part of every economy, industry, organization, business, function and individual. Big Data is a term used to identify large data sets whose size is typically larger than that of a typical database. Big data introduces unique computational and statistical challenges. Big Data is at present expanding in most domains of engineering and science. Data mining helps to extract useful information from huge data sets despite their volume, variability and velocity. This article presents the HACE theorem that characterizes the features of the Big Data revolution, and proposes a Big Data processing model from the data mining perspective.

Keywords: Big Data, Data Mining, HACE theorem, structured and unstructured.

I. Introduction

Big Data refers to the enormous amounts of structured and unstructured data that overflow the organization. If this data is properly used, it can lead to meaningful information. Big data includes a large amount of data which requires a lot of processing in real time. It provides room to discover new values, to understand in-depth knowledge from hidden values, and a space to manage the data effectively. A database is an organized collection of logically related data which can be easily managed, updated and accessed. Data mining is the process of discovering interesting knowledge such as associations, patterns, changes, anomalies and significant structures from large amounts of data stored in databases or other repositories.

Big Data has 3 Vs as its characteristics: volume, velocity and variety. Volume means the amount of data generated every second; the data is in a state of rest, and volume is also known as the scale characteristic. Velocity is the speed with which the data is generated; big data involves high-speed data, and the data generated from social media is an example. Variety means different types of data can be included, such as audio, video or documents; it can be numerals, images, time series, arrays, etc.

Data mining analyses the data from different perspectives and summarizes it into useful information that can be used for business solutions and predicting future trends. Data mining (DM), also called Knowledge Discovery in Databases (KDD) or Knowledge Discovery and Data Mining, is the process of automatically searching large volumes of data for patterns such as association rules. It applies many computational techniques from statistics, information retrieval, machine learning and pattern recognition. Data mining extracts only the required patterns from the database in a short time span. Based on the type of patterns to be mined, data mining tasks can be classified into summarization, classification, clustering, association and trend analysis.

Big Data is expanding in all domains, including science and engineering fields such as the physical, biological and biomedical sciences.

II. BIG DATA with DATA MINING

Generally, big data refers to a collection of large volumes of data generated from various sources like the internet, social media, business organizations, sensors, etc. We can extract useful information from it with the help of data mining.
Data mining is a technique for discovering patterns, as well as descriptive, understandable models, from large-scale data.

Volume is the size of the data, which can reach petabytes or more. The scale and growth in size make it difficult to store and analyse using traditional tools. Big Data should be used to mine large amounts of data within a predefined period of time. Traditional database systems were designed to address small amounts of data which were structured and consistent, whereas Big Data includes a wide variety of data such as geospatial data, audio, video, unstructured text and so on.

Big Data mining refers to the activity of going through big data sets to look for relevant information. To process large volumes of data from different sources quickly, Hadoop is used. Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. Its distributed file system supports fast data transfer rates among nodes and allows the system to continue operating uninterrupted in the event of node failure. It runs MapReduce for distributed data processing and works with structured and unstructured data.

III. BIG DATA characteristics: the HACE THEOREM

We have a large volume of heterogeneous data, there exist complex relationships among the data, and we need to discover useful information from this voluminous data. Let us imagine a scenario in which blind people are asked to draw an elephant. From the information each blind person collects, one may think the trunk is a wall, a leg is a tree, the body is a wall and the tail is a rope. The blind men can exchange information with each other.

Figure 1: Blind men and the giant elephant

Some of the characteristics include:

i. Vast data with heterogeneous and diverse sources: One of the fundamental characteristics of big data is the large volume of data represented by heterogeneous and diverse dimensions. For example, in the biomedical world, a single human being is represented by name, age, gender, family history, etc., and X-ray and CT scan images and videos are used. Heterogeneity refers to the different types of representations of the same individual, and diversity refers to the variety of features used to represent a single piece of information.

ii. Autonomous with distributed and decentralized control: the sources are autonomous, i.e., automatically generated; they generate information without any centralized control. We can compare this with the World Wide Web (WWW), where each server provides a certain amount of information without depending on other servers.

iii. Complex and evolving relationships: As the size of the data becomes infinitely large, the relationships that exist within it also grow. In the early stages, when data is small, there is no complexity in the relationships among the data. Data generated from social media and other sources have complex relationships.

IV. TOOLS: OPEN SOURCE REVOLUTION

Large companies such as Facebook, Yahoo, Twitter and LinkedIn benefit from and contribute work to open source projects. In Big Data mining, there are many open source initiatives. The most popular of them are:

Apache Mahout: Scalable machine learning and data mining open source software based mainly on Hadoop. It has implementations of a wide range of machine learning and data mining algorithms: clustering, classification, collaborative filtering and frequent pattern mining.

R: an open source programming language and software environment designed for statistical computing and visualization.
R was designed by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, beginning in 1993, and is used for statistical analysis of very large data sets.
MOA: Stream data mining open source software to perform data mining in real time. It has implementations of classification, regression, clustering, frequent item set mining and frequent graph mining. It started as a project of the Machine Learning group of the University of Waikato, New Zealand, famous for the WEKA software. The streams framework provides an environment for defining and running stream processes using simple XML based definitions and is able to use MOA, Android and Storm.
SAMOA: a new upcoming software project for distributed stream mining that will combine S4 and Storm with MOA.
Vowpal Wabbit: open source project started at Yahoo! Research and continuing at Microsoft Research to design a fast, scalable, useful learning algorithm. VW is able to learn from terafeature datasets. It can exceed the throughput of any single machine network interface when doing linear learning, via parallel learning.

V. DATA MINING FOR BIG DATA
Data mining is the process by which data coming from different sources is analysed to discover useful information. Data mining contains several algorithms which fall into four categories (a minimal clustering sketch is given after the challenges discussed below):
1. Association rule
2. Clustering
3. Classification
4. Regression
Association is used to search for relationships between variables. It is applied in searching for frequently visited items; in short, it establishes relationships among objects. Clustering discovers groups and structures in the data. Classification deals with associating an unknown structure to a known structure. Regression finds a function to model the data.
The different data mining algorithms are:
Table 1. Classification of Algorithms
Data mining algorithms can be converted into big MapReduce algorithms on a parallel computing basis.
Table 2. Differences between Data Mining and Big Data

VI. CHALLENGES IN BIG DATA
Meeting the challenges of Big Data is difficult. The volume is increasing every day. The velocity is increasing with the number of Internet-connected devices. The variety is also expanding, and the organizations' capability to capture and process the data is limited. The following are the challenges in the area of Big Data when it is handled:
1. Data capture and storage
2. Data transmission
3. Data curation
4. Data analysis
5. Data visualization
The challenges of big data mining can be divided into three tiers. The first tier is the setup of data mining algorithms. The second tier includes information sharing and data privacy, and domain and application knowledge. The third tier includes local learning and model fusion for multiple information sources, mining from sparse, uncertain and incomplete data, and mining complex and dynamic data.

Figure 2: Phases of Big Data Challenges

Generally, mining data from different data sources is tedious, as the size of the data is larger. Big data is stored at different places, so collecting that data is a tedious task, and applying basic data mining algorithms to it becomes an obstacle. Next we need to consider the privacy of the data. The third case is the mining algorithms themselves: when we apply data mining algorithms to these subsets of data, the results may not be very accurate.
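Section V above groups data mining algorithms into four families. As a concrete illustration of the clustering family, the following is a minimal sketch using NumPy and scikit-learn; the synthetic data, the library choice and the parameter k = 3 are assumptions made here for illustration and are not taken from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 2-D points standing in for a small slice of a larger data set.
rng = np.random.default_rng(42)
points = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.5, size=(100, 2)),
    rng.normal(loc=[0.0, 5.0], scale=0.5, size=(100, 2)),
])

# Group the points into k = 3 clusters; each label tells which cluster a point belongs to.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(points)

print("cluster sizes:", np.bincount(labels))
print("cluster centers:\n", kmeans.cluster_centers_)
```

On genuinely large data sets the same idea is usually run in a distributed fashion, for example through the MapReduce-style implementations in Apache Mahout mentioned above, but the algorithmic core stays the same.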
VII. FORECAST OF THE FUTURE
There are some challenges that researchers and practitioners will have to deal with during the next years:
Analytics Architecture: It is not clear yet what an optimal architecture of analytics systems should be to deal with historic data and with real-time data at the same time. An interesting proposal is the Lambda architecture of Nathan Marz. The Lambda Architecture solves the problem of computing arbitrary functions on arbitrary data in real time by decomposing the problem into three layers: the batch layer, the serving layer, and the speed layer. It combines in the same system Hadoop for the batch layer and Storm for the speed layer. The properties of the system are: robust and fault tolerant, scalable, general and extensible, allows ad hoc queries, minimal maintenance, and debuggable.
Statistical significance: It is important to achieve significant statistical results and not be fooled by randomness. As Efron explains in his book about Large Scale Inference, it is easy to go wrong with huge data sets and thousands of questions to answer at once.
Distributed mining: Many data mining techniques are not trivial to parallelize. To have distributed versions of some methods, a lot of research is needed, with practical and theoretical analysis, to provide new methods.
Time evolving data: Data may be evolving over time, so it is important that Big Data mining techniques are able to adapt and, in some cases, to detect change first. For example, the data stream mining field has very powerful techniques for this task.
Compression: When dealing with Big Data, the quantity of space needed to store it is very relevant. There are two main approaches: compression, where we don't lose anything, or sampling, where we choose the data that is most representative. Using compression, we may take more time and less space, so we can consider it as a transformation from time to space. Using sampling, we are losing information, but the gains in space may be in orders of magnitude. For example, Feldman et al. use core sets to reduce the complexity of Big Data problems. Core sets are small sets that provably approximate the original data for a given problem. Using merge-reduce, the small sets can then be used for solving hard machine learning problems in parallel.
Visualization: A main task of Big Data analysis is how to visualize the results. As the data is so big, it is very difficult to find user-friendly visualizations. New techniques and frameworks to tell and show stories will be needed, as for example the photographs, infographics and essays in the beautiful book "The Human Face of Big Data".
Hidden Big Data: Large quantities of useful data are getting lost since new data is largely untagged and unstructured. The 2012 IDC study on Big Data explains that in 2012, 23% (643 exabytes) of the digital universe would be useful for Big Data if tagged and analyzed. However, currently only 3% of the potentially useful data is tagged, and even less is analyzed.
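The Compression paragraph above contrasts lossless compression with sampling, where only a representative subset of the data is kept. A classic way to draw a fixed-size uniform sample from a stream whose total length is unknown in advance is reservoir sampling; the sketch below is an illustrative pure-Python version and is not code from the paper.

```python
import random

def reservoir_sample(stream, k, seed=0):
    """Return k items drawn uniformly at random from an iterable of unknown length."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)       # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)        # j is uniform over 0..i inclusive
            if j < k:
                reservoir[j] = item      # keep the new item with probability k/(i+1)
    return reservoir

# Example: sample 5 records from a simulated stream of one million records.
print(reservoir_sample(range(1_000_000), k=5))
```

The trade-off described in the text applies directly: the sample loses information, but it lets downstream mining run on a data set that is orders of magnitude smaller.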
VIII. CONCLUSION
The amount of data is growing exponentially due to social networking sites, search and retrieval engines, media sharing sites, stock trading sites, news sources and so on. Big Data is becoming the new area for scientific data research and for business applications. Data mining techniques can be applied to big data to acquire useful information from large datasets. They can be used together to acquire a useful picture from the data. Big Data analysis tools like MapReduce over Hadoop and HDFS help organizations.

中文译文:大数据挖掘研究
摘要 数据已经成为各个经济、行业、组织、企业、职能和个人的重要组成部分。
大数据应用的参考文献
以下是关于大数据应用的一些参考文献:
1. "Big Data: A Revolution That Will Transform How We Live, Work, and Think" by Viktor Mayer-Schönberger and Kenneth Cukier
2. "Hadoop: The Definitive Guide" by Tom White
3. "Big Data: A Primer" by Eric Siegel
4. "Data Science for Business" by Foster Provost and Tom Fawcett
5. "Big Data Analytics: Turning Big Data into Big Money" by Frank J. Ohlhorst
6. "The Big Data-Driven Business: How to Use Big Data to Win Customers, Beat Competitors, and Boost Profits" by Russell Glass and Sean Callahan
7. "Data-Driven: Creating a Data Culture" by Hilary Mason and DJ Patil
8. "Big Data at Work: Dispelling the Myths, Uncovering the Opportunities" by Thomas H. Davenport
9. "The Human Face of Big Data" by Rick Smolan and Jennifer Erwitt
10. "Big Data: Techniques and Technologies in Geoinformatics" edited by Hassan A. Karimi and Abdulrahman Y. Zekri
这些文献包括了关于大数据的定义、技术、应用案例以及商业价值等方面的内容,可以作为深入了解和研究大数据应用的参考资源。
数据分析外文文献+翻译
文献1:《数据分析在企业决策中的应用》。该文献探讨了数据分析在企业决策中的重要性和应用。
研究发现,通过数据分析可以获取准确的商业情报,帮助企业更好地理解市场趋势和消费者需求。
通过对大量数据的分析,企业可以发现隐藏的模式和关联,从而制定出更具竞争力的产品和服务策略。
数据分析还可以提供决策支持,帮助企业在不确定的环境下做出明智的决策。
因此,数据分析已成为现代企业成功的关键要素之一。
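下面给出一个简化的代码示意,说明文中所说的"通过对大量数据的分析发现隐藏的模式和关联"在实践中可能的样子。示例使用 pandas 对一份虚构的销售数据做分组汇总和相关性分析;数据内容和字段名均为此处假设,并非出自该文献。

```python
import pandas as pd

# 虚构的销售明细数据(字段名为示例假设)
sales = pd.DataFrame({
    "region":   ["North", "North", "South", "South", "East", "East"],
    "ad_spend": [100, 150, 80, 120, 200, 240],        # 广告投入
    "visits":   [1200, 1700, 900, 1300, 2100, 2600],  # 访问量
    "revenue":  [10.5, 14.8, 8.1, 11.9, 19.7, 23.4],  # 销售额(万元)
})

# 按地区分组汇总, 观察不同市场的趋势
print(sales.groupby("region")[["ad_spend", "revenue"]].sum())

# 计算数值字段之间的相关系数, 发现潜在关联(例如广告投入与销售额的关系)
print(sales[["ad_spend", "visits", "revenue"]].corr())
```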
文献2:《机器学习在数据分析中的应用》。该文献探讨了机器学习在数据分析中的应用。
研究发现,机器学习可以帮助企业更高效地分析大量的数据,并从中发现有价值的信息。
机器学习算法可以自动学习和改进,从而帮助企业发现数据中的模式和趋势。
通过机器学习的应用,企业可以更准确地预测市场需求、优化业务流程,并制定更具策略性的决策。
因此,机器学习在数据分析中的应用正逐渐受到企业的关注和采用。
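作为上文"机器学习可用于预测市场需求"的一个极简示意,下面用 scikit-learn 拟合一个线性回归模型,根据历史月份数据外推下一期需求。数据为虚构示例,模型与参数选择仅作说明,并非该文献中的实现。

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# 虚构的历史数据: 月份序号 -> 当月需求量
months = np.array([[1], [2], [3], [4], [5], [6]])
demand = np.array([102, 110, 118, 125, 133, 140])

model = LinearRegression()
model.fit(months, demand)            # 从历史数据中学习需求随时间的趋势

next_month = np.array([[7]])
print("预测第 7 个月需求:", model.predict(next_month)[0])
```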
文献3:《数据可视化在数据分析中的应用》该文献探讨了数据可视化在数据分析中的重要性和应用。
研究发现,通过数据可视化可以更直观地呈现复杂的数据关系和趋势。
可视化可以帮助企业更好地理解数据,发现数据中的模式和规律。
数据可视化还可以帮助企业进行数据交互和决策共享,提升决策的效率和准确性。
因此,数据可视化在数据分析中扮演着非常重要的角色。
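与上文"数据可视化可以更直观地呈现复杂的数据关系和趋势"相对应,下面是一个使用 matplotlib 的最小示例,把同一份虚构数据分别画成折线图和柱状图;数据与图表样式均为此处假设,仅作示意。

```python
import matplotlib.pyplot as plt

quarters = ["Q1", "Q2", "Q3", "Q4"]
revenue = [120, 135, 150, 180]   # 虚构的季度销售额
cost = [90, 100, 110, 130]       # 虚构的季度成本

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# 折线图: 展示随时间变化的趋势
ax1.plot(quarters, revenue, marker="o", label="revenue")
ax1.plot(quarters, cost, marker="s", label="cost")
ax1.set_title("Trend by quarter")
ax1.legend()

# 柱状图: 便于逐项对比
ax2.bar(quarters, revenue)
ax2.set_title("Revenue by quarter")

plt.tight_layout()
plt.show()
```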
翻译
文献1标题: The Application of Data Analysis in Business Decision-making
文献2标题: The Application of Machine Learning in Data Analysis
文献3标题: The Application of Data Visualization in Data Analysis
翻译摘要: 本文献研究了数据分析在企业决策中的应用,以及机器学习和数据可视化在数据分析中的作用。
大数据英文版
大数据英文版Big Data: Revolutionizing the Way We Analyze and Utilize InformationIntroduction:In this era of digital transformation, the rapid growth of data has become a defining characteristic of our society. Big data refers to the massive volume, velocity, and variety of information that is generated from various sources such as social media, sensors, and online transactions. The ability to effectively analyze and utilize this data has revolutionized industries and transformed the way we make decisions. This article explores the impact of big data, its applications, challenges, and the future prospects of this emerging field.1. The Impact of Big Data:Big data has had a profound impact on various sectors, including business, healthcare, finance, and education. By harnessing the power of data analytics, organizations can gain valuable insights, make informed decisions, and improve their operational efficiency. For instance, retailers can analyze customer purchasing patterns to personalize marketing campaigns and enhance customer satisfaction. In the healthcare sector, big data analytics can be used to predict disease outbreaks, improve patient care, and optimize resource allocation.2. Applications of Big Data:2.1 Business Intelligence:Big data analytics enables organizations to gain a competitive edge by extracting actionable insights from vast amounts of structured and unstructured data. Companies can analyze customer behavior, market trends, and competitor strategies to make data-driven decisions and drive innovation. Moreover, big data analytics can help optimize supply chain management, detect fraud, and improve customer relationship management.2.2 Healthcare:Big data has the potential to revolutionize healthcare by enabling personalized medicine, improving patient outcomes, and reducing costs. By analyzing electronic health records, genomic data, and real-time patient monitoring, healthcare providers can identify patterns, predict diseases, and develop targeted treatment plans. Additionally, big data analytics can enhance clinical research, facilitate drug discovery, and improve healthcare delivery.2.3 Finance:The finance industry heavily relies on big data analytics to detect fraudulent activities, assess creditworthiness, and optimize investment strategies. By analyzing large volumes of financial data, including market trends, customer transactions, and social media sentiment, financial institutions can make more accurate risk assessments and improve their decision-making processes. Furthermore, big data analytics can help identify potential market opportunities and enhance regulatory compliance.2.4 Education:Big data analytics is transforming the education sector by providing insights into student performance, learning patterns, and personalized learning experiences. By analyzing student data, educators can identify at-risk students, tailor instructional approaches, and develop targeted interventions. Moreover, big data analytics can facilitate adaptive learning platforms, improve curriculum design, and enable lifelong learning.3. Challenges of Big Data:While big data offers immense opportunities, it also presents several challenges that need to be addressed:3.1 Data Privacy and Security:The vast amount of data collected raises concerns about privacy and security. Organizations must ensure that data is stored securely, and appropriate measures aretaken to protect sensitive information. 
Additionally, regulations and policies need to be in place to safeguard individuals' privacy rights.3.2 Data Quality and Integration:Big data comes from various sources and in different formats, making it challenging to ensure data quality and integrate disparate datasets. Data cleansing and integration techniques are essential to ensure accurate and reliable analysis.3.3 Scalability and Infrastructure:The sheer volume and velocity of big data require robust infrastructure and scalable systems to store, process, and analyze the data in a timely manner. Organizations need to invest in advanced technologies and tools to handle the growing demands of big data analytics.4. Future Prospects of Big Data:The future of big data looks promising, with ongoing advancements in technology and increased adoption across industries. The emergence of artificial intelligence and machine learning algorithms will further enhance the capabilities of big data analytics. Additionally, the integration of big data with the Internet of Things (IoT) will generate new opportunities for data-driven decision-making and predictive analytics.Conclusion:Big data has revolutionized the way we analyze and utilize information, enabling organizations to gain valuable insights, make data-driven decisions, and drive innovation. Its applications span across various sectors, including business, healthcare, finance, and education. However, challenges such as data privacy, quality, and infrastructure need to be addressed to fully harness the potential of big data. With ongoing advancements and increased adoption, big data is set to play a pivotal role in shaping the future of industries and society as a whole.。
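Section 3.2 above notes that data cleansing and integration are essential because big data arrives from many sources in many formats. As a small, hedged illustration of what this can look like in practice, the following pandas sketch merges two hypothetical sources, removes duplicates and fills a missing value; the column names and cleaning rules are assumptions made for this example, not part of the article.

```python
import pandas as pd

# Two hypothetical sources describing the same customers in different formats.
crm = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],
    "city": ["berlin", "PARIS", "PARIS", None],
})
web = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "monthly_visits": [14, 3, 22],
})

# Cleansing: drop exact duplicates, normalize text casing, fill missing values.
crm = crm.drop_duplicates()
crm["city"] = crm["city"].str.title().fillna("Unknown")

# Integration: join the two sources on a shared key before analysis.
customers = crm.merge(web, on="customer_id", how="left")
print(customers)
```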
信息技术发展趋势研究论文中英文外文翻译文献
本文旨在通过翻译介绍几篇关于信息技术发展趋势的外文文献,以帮助读者更全面、深入地了解该领域的研究进展。
以下是几篇相关文献的简要介绍:1. 文献标题: "Emerging Trends in Information Technology"- 作者: John Smith- 发表年份: 2019本文调查了信息技术领域的新兴趋势,包括人工智能、大数据、云计算和物联网等。
通过对相关案例的分析,研究人员得出了一些关于这些趋势的结论,并探讨了它们对企业和社会的潜在影响。
2. 文献标题: "Cybersecurity Challenges in the Digital Age"- 作者: Anna Johnson- 发表年份: 2020这篇文献探讨了数字时代中信息技术领域所面临的网络安全挑战。
通过分析日益复杂的网络威胁和攻击方式,研究人员提出了一些应对策略,并讨论了如何提高组织和个人的网络安全防护能力。
3. 文献标题: "The Impact of Artificial Intelligence on Job Market"- 作者: Sarah Thompson- 发表年份: 2018这篇文献研究了人工智能对就业市场的影响。
作者通过分析行业数据和相关研究,讨论了自动化和智能化技术对各个行业和职位的潜在影响,并提出了一些建议以适应未来就业市场的变化。
以上是对几篇外文文献的简要介绍,它们涵盖了信息技术发展趋势的不同方面。
读者可以根据需求进一步查阅这些文献,以获得更深入的了解和研究。
互联网大数据金融中英文对照外文翻译文献
互联网大数据金融中英文对照外文翻译文献(文档含英文原文和中文翻译)原文:Internet Finance's Impact on Traditional FinanceAbstractAs the advances in modern information and Internet technology, especially the develop of cloud computing, big data, mobile Internet, search engines and social networks, profoundly change, even subvert many traditional industries, and the financial industry is no exception. In recent years, financial industry has become the most far-reaching area influenced by Internet, after commercial distribution and the media. Many Internet-based financial service models have emerged, and have had a profound and huge impact on traditional financial industries. "Internet-Finance" has win the focus of public attention.Internet-Finance is low cost, high efficiency, and pays more attention to the user experience, and these features enable it to fully meet the special needs of traditional "long tail financial market", to flexibly provide more convenient and efficient financial services and diversified financial products, to greatly expand the scope and depth of financial services, to shorten the distance between people space and time, andto establish a new financial environment, which effectively integrate and take use of fragmented time, information, capital and other scattered resources, then add up to form a scale, and grow a new profit point for various financial institutions. Moreover, with the continuous penetration and integration in traditional financial field, Internet-Finance will bring new challenges, but also opportunities to the traditional. It contribute to the transformation of the traditional commercial banks, compensate for the lack of efficiency in funding process and information integration, and provide new distribution channels for securities, insurance, funds and other financial products. For many SMEs, Internet-Finance extend their financing channels, reduce their financing threshold, and improve their efficiency in using funds. However, the cross-industry nature of the Internet Finance determines its risk factors are more complex, sensitive and varied, and therefore we must properly handle the relationship between innovative development and market regulation, industry self-regulation.Key Words:Internet Finance; Commercial Banks; Effects; Regulatory1 IntroductionThe continuous development of Internet technology, cloud computing, big data, a growing number of Internet applications such as social networks for the business development of traditional industry provides a strong support, the level of penetration of the Internet on the traditional industry. The end of the 20th century, Microsoft chairman Bill Gates, who declared, "the traditional commercial bank will become the new century dinosaur". Nowadays, with the development of the Internet electronic information technology, we really felt this trend, mobile payment, electronic bank already occupies the important position in our daily life.Due to the concept of the Internet financial almost entirely from the business practices, therefore the present study focused on the discussion. Internet financial specific mode, and the influence of traditional financial industry analysis and counter measures are lack of systemic research. Internet has always been a key battleground in risk investment, and financial industry is the thinking mode of innovative experimental various business models emerge in endlessly, so it is difficult to use a fixed set of thinking to classification and definition. 
The mutual penetration andintegration of Internet and financial, is a reflection of technical development and market rules requirements, is an irreversible trend. The Internet bring traditional financial is not only a low cost and high efficiency, more is a kind of innovative thinking mode and unremitting pursuit of the user experience. The traditional financial industry to actively respond to. Internet financial, for such a vast blue ocean enough to change the world, it is very worthy of attention to straighten out its development, from the existing business model to its development prospects."Internet financial" belongs to the latest formats form, discusses the Internet financial research of literature, but the lack of systemic and more practical. So this article according to the characteristics of the Internet industry practical stronger, the several business models on the market for summary analysis, and the traditional financial industry how to actively respond to the Internet wave of financial analysis and Suggestions are given, with strong practical significance.2 Internet financial backgroundInternet financial platform based on Internet resources, on the basis of the big data and cloud computing new financial model. Internet finance with the help of the Internet technology, mobile communication technology to realize financing, payment and information intermediary business, is a traditional industry and modern information technology represented by the Internet, mobile payment, cloud computing, data mining, search engines and social networks, etc.) Produced by the combination of emerging field. Whether financial or the Internet, the Internet is just the difference on the strategic, there is no strict definition of distinction. As the financial and the mutual penetration and integration of the Internet, the Internet financial can refer all through the Internet technology to realize the financing behavior. Internet financial is the Internet and the traditional financial product of mutual infiltration and fusion, the new financial model has a profound background. The emergence of the Internet financial is a craving for cost reduction is the result of the financial subject, is also inseparable from the rapid development of modern information technology to provide technical support.2.1 Demands factorsTraditional financial markets there are serious information asymmetry, greatly improve the transaction risk. Exhibition gradually changed people's spending habits, more and more high to the requirement of service efficiency and experience; In addition, rising operating costs, to stimulate the financial main body's thirst for financial innovation and reform; This pulled by demand factors, become the Internet financial produce powerful inner driving force.2.2 Supply driving factorData mining, cloud computing and Internet search engines, such as the development of technology, financial and institutional technology platform. Innovation, enterprise profit-driven mixed management, etc., for the transformation of traditional industry and Internet companies offered financial sector penetration may, for the birth and development of the Internet financial external technical support, become a kind of externalization of constitution. 
In the Internet "openness, equality, cooperation, share" platform, third-party financing and payment, online investment finance, credit evaluation model, not only makes the traditional pattern of financial markets will be great changes have taken place, and modern information technology is more easily to serve various financial entities. For the traditional financial institutions, especially in the banking, securities and insurance institutions, more opportunities than the crisis, development is better than a challenge.3 Internet financial constitute the main body3.1 Capital providersBetween Internet financial comprehensive, its capital providers include not only the traditional financial institutions, including penetrating into the Internet. In terms of the current market structure, the traditional financial sector mainly include commercial Banks, securities, insurance, fund and small loan companies, mainly includes the part of the Internet companies and emerging subject, such as the amazon, and some channels on Internet for the company. These companies is not only the providers of capital market, but also too many traditional so-called "low net worth clients" suppliers of funds into the market. In operation form, the former mainly through the Internet, to the traditional business externalization, the latter mainlythrough Internet channels to penetrate business, both externalization and penetration, both through the Internet channel to achieve the financial business innovation and reform.3.2 Capital demandersInternet financial mode of capital demanders although there is no breakthrough in the traditional government, enterprise and individual, but on the benefit has greatly changed. In the rise and development of the Internet financial, especially Internet companies to enter the threshold of made in the traditional financial institutions, relatively weak groups and individual demanders, have a more convenient and efficient access to capital. As a result, the Internet brought about by the universality and inclusive financial better than the previous traditional financial pattern.3.3 IntermediariesInternet financial rely on efficient and convenient information technology, greatly reduces the financial markets is the wrong information. Docking directly through Internet, according to both parties, transaction cost is greatly reduced, so the Internet finance main body for the dependence of the intermediary institutions decreased significantly, but does not mean that the Internet financial markets, there is no intermediary institutions. In terms of the development of the Internet financial situation at present stage, the third-party payment platform plays an intermediary role in this field, not only ACTS as a financial settlement platform, but also to the capital supply and demand of the integration of upstream and downstream link multi-faceted, in meet the funds to pay at the same time, have the effect of capital allocation. Especially in the field of electronic commerce, this function is more obvious.3.4 Large financial dataBig financial data collection refers to the vast amounts of unstructured data, through the study of the depth of its mining and real-time analysis, grasp the customer's trading information, consumption habits and consumption information, and predict customer behavior and make the relevant financial institutions in the product design, precise marketing and greatly improve the efficiency of risk management, etc. 
Financial services platform based on the large data mainly refers to with vast tradingdata of the electronic commerce enterprise's financial services. The key to the big data from a large number of chaotic ability to rapidly gaining valuable information in the data, or from big data assets liquidation ability quickly. Big data information processing, therefore, often together with cloud computing.4 Global economic issuesFOR much of the past year the fast-growing economies of the emerging world watched the Western financial hurricane from afar. Their own banks held few of the mortgage-based assets that undid the rich world’s financial firms. Commodity exporters were thriving, thanks to high prices fo r raw materials. China’s economic juggernaut powered on. And, from Budapest to Brasília, an abundance of credit fuelled domestic demand. Even as talk mounted of the rich world suffering its worst financial collapse since the Depression, emerging economies seemed a long way from the centre of the storm.No longer. As foreign capital has fled and confidence evaporated, the emerging world’s stockmarkets have plunged (in some cases losing half their value) and currencies tumbled. The seizure in the credit market caused havoc, as foreign banks abruptly stopped lending and stepped back from even the most basic banking services, including trade credits.Like their rich-world counterparts, governments are battling to limit the damage (see article). That is easiest for those with large foreign-exchange reserves. Russia is spending $220 billion to shore up its financial services industry. South Korea has guaranteed $100 billion of its banks’ debt. Less well-endowed countries are asking for help.Hungary has secured a EURO5 billion ($6.6 billion) lifeline from the European Central Bank and is negotiating a loan from the IMF, as is Ukraine. Close to a dozen countries are talking to the fund about financial help.Those with long-standing problems are being driven to desperate measures. Argentina is nationalising its private pension funds, seeminglyto stave off default (see article). But even stalwarts are looking weaker. Figures released this week showed that China’s growth slowed to 9% in the year to the third quarter-still a rapid pace but a lot slower than the double-digit rates of recent years.The various emerging economies are in different states of readiness, but the cumulative impact of all this will be enormous. Most obviously, how these countries fare will determine whether the world economy faces a mild recession or something nastier. Emerging economies accounted for around three-quarters of global growth over the past 18 months. But their economic fate will also have political consequences.In many places-eastern Europe is one example (see article)-financial turmoil is hitting weak governments. But even strong regimes could suffer. Some experts think that China needs growth of 7% a year to contain social unrest. More generally, the coming strife will shape the debate about the integration of the world economy. Unlike many previous emerging-market crises, today’s mess spread from the rich world, largely thanks to increasingly integrated capital markets. If emerging economies collapse-either into a currency crisis or a sharp recession-there will be yet more questioning of the wisdom of globalised finance.Fortunately, the picture is not universally dire. All emerging economies will slow. Some will surely face deep recessions. 
But many are facing the present danger in stronger shape than ever before, armed with large reserves, flexible currencies and strong budgets. Good policy-both at home and in the rich world-can yet avoid a catastrophe.One reason for hope is that the direct economic fallout from the rich world’s d isaster is manageable. Falling demand in America and Europe hurts exports, particularly in Asia and Mexico. Commodity prices have fallen: oil is down nearly 60% from its peak and many crops and metals have done worse. That has a mixed effect. Although it hurtscommodity-exporters from Russia to South America, it helps commodity importers in Asia and reduces inflation fears everywhere. Countries like Venezuela that have been run badly are vulnerable (see article), but given the scale of the past boom, the commodity bust so far seems unlikely to cause widespread crises.The more dangerous shock is financial. Wealth is being squeezed as asset prices decline. China’s house prices, for instance, have started falling (see article). This will dampen domestic confidence, even though consumers are much less indebted than they are in the rich world. Elsewhere, the sudden dearth of foreign-bank lending and the flight of hedge funds and other investors from bond markets has slammed the brakes on credit growth. And just as booming credit once underpinned strong domestic spending, so tighter credit will mean slower growth.Again, the impact will differ by country. Thanks to huge current-account surpluses in China and the oil-exporters in the Gulf, emerging economies as a group still send capital to the rich world. But over 80 have deficits of more than 5% of GDP. Most of these are poor countries that live off foreign aid; but some larger ones rely on private capital. For the likes of Turkey and South Africa a sudden slowing in foreign financing would force a dramatic adjustment. A particular worry is eastern Europe, where many countries have double-digit deficits. In addition, even some countries with surpluses, such as Russia, have banks that have grown accustomed to easy foreign lending because of the integration of global finance. The rich world’s bank bail-outs may limit the squeeze, but the flow of capital to the emerging world will slow. The Institute of International Finance, a bankers’ group, expects a 30% decline in net flows of private capital from last year.This credit crunch will be grim, but most emerging markets can avoid catastrophe. The biggest ones are in relatively good shape. The morevulnerable ones can (and should) be helped.Among the giants, China is in a league of its own, with a $2 trillion arsenal of reserves, a current-account surplus, little connection to foreign banks and a budget surplus that offers lots of room to boost spending. Since the country’s leaders have made clear that they will do whatev er it takes to cushion growth, China’s economy is likely to slow-perhaps to 8%-but not collapse. Although that is not enough to save the world economy, such growth in China would put a floor under commodity prices and help other countries in the emerging world.The other large economies will be harder hit, but should be able to weather the storm. India has a big budget deficit and many Brazilian firms have a large foreign-currency exposure. But Brazil’s economy is diversified and both countries have plenty of reserves to smooth the shift to slower growth. With $550 billion of reserves, Russia ought to be able to stop a run on the rouble. 
In the short-term at least, the most vulnerable countries are all smaller ones.There will be pain as tighter credit forces adjustments. But sensible, speedy international assistance would make a big difference. Several emerging countries have asked America’s Federal Reserve for liquidity support; some hope that China will bail them out. A better route is surely the IMF, which has huge expertise and some $250 billion to lend. Sadly, borrowing from the fund carries a stigma. That needs to change. The IMF should develop quicker, more flexible financial instruments and minimise the conditions it attaches to loans. Over the past month deft policymaking saw off calamity in the rich world. Now it is time for something similar in the emerging world.5 ConclusionsInternet financial model can produce not only huge social benefit, lower transaction costs, provide higher than the existing direct and indirect financingefficiency of the allocation of resources, to provide power for economic development, will also be able to use the Internet and its related software technology played down the traditional finance specialized division of labor, makes the financial participants more mass popularization, risk pricing term matching complex transactions, tend to be simple. Because of the Internet financial involved in the field are mainly concentrated in the field of traditional financial institutions to the current development is not thorough, namely traditional financial "long tail" market, can complement with the original traditional financial business situation, so in the short term the Internet finance from the Angle of the size of the market will not make a big impact to the traditional financial institutions, but the Internet financial business model, innovative ideas, and its apparent high efficiency for the traditional financial institutions brought greater impact on the concept, also led to the traditional financial institutions to further accelerate the mutual penetration and integration with the Internet.译文:互联网金融对传统金融的影响作者:罗萨米;拉夫雷特摘要网络的发展,深刻地改变甚至颠覆了许多传统行业,金融业也不例外。
外文文献翻译大数据和云计算2017
大数据和云计算技术外文文献翻译(含:英文原文及中文译文)文献出处:Bryant R. The research of big data and cloud computing technology [J]. Information Systems, 2017, 3(5): 98-109英文原文The research of big data and cloud computing technologyBryant RoyAbstractThe rapid development of mobile Internet, Internet of Things, and cloud computing technologies has opened the prelude to the era of mobile cloud, and big data is increasingly attracting people's attention. The emergence of the Internet has shortened the distance between people, people, and the world. The entire world has become a "global village," and people have accessibility, information exchange, and collaborative work through the Internet. At the same time, with the rapid development of the Internet, the maturity and popularity of database technologies, and the emergence of high-memory, high-performance storage devices and storage media, the amount of data generated by humans in daily learning, living, and work is growing exponentially. The big data problem is generated under such a background. It has become a hot topic in scientific research and related industry circles. As one of the most cutting-edge topics in the field of information technology, it has attracted more andmore scholars to study the issue of big data.Keywords: big data; data analysis; cloud computing1 IntroductionBig data is an information resource that can reflect changes in the state and state of the physical world and the spiritual world. It has complexity, decision-making usefulness, high-speed growth, sparseness, and reproducibility. It generally has a variety of potential values. Based on the perspective of big data resources and management, big data is considered as an important resource that can support management decisions. Therefore, in order to effectively manage this resource and give full play to its potential value, it is necessary to study and solve such management problems as the acquisition, processing, application, definition of property rights, industrial development, and policy guarantee. Big data has the following characteristics:Complexity, as pointed out by many definitions, forms and characteristics of big data are extremely complex. In addition to the complexity of big data, the breadth of its sources, and the diversity of its morphological structure, the complexity of big data also manifests itself in uncertainties in its state changes and development methods. The usefulness of decision-making, big data itself is an objective large-scale data resources, and its direct function is limited. By analyzing, digging, and discovering the knowledge contained in it, it can provide decisionsupport for other practical applications that are difficult to provide with other resources. The value of big data is also reflected mainly through its decision-making usefulness. With rapid growth, this feature of big data resources is different from natural resources such as oil. The total stock of non-renewable natural resources will gradually decrease with the continuous exploitation of human beings. Big data, however, has rapid growth, that is, with continuous exploitation, big data resources will not only not decrease but will increase rapidly. The sparseness of value and the large amount of data in big data have brought many opportunities and brought many challenges. One of its main challenges is the low density of big data values. 
Although the number of big data resources is large, the useful value contained in it is sparse, which increases the difficulty of developing and utilizing big data resources.2 Big data processing flowData AcquisitionBig data, which originally meant a large quantity and variety of types, was extremely important for obtaining data information through various methods. Data collection is the most basic step in the process of big data processing. At present, commonly used data collection methods include RFID, data search and classification tools such as Google and other search engines, and bar code technology. And due to the emergence of mobile devices, such as the rapid spread of smart phones and tabletcomputers, a large amount of mobile software has been developed and applied, and social networks have become increasingly large. This has also accelerated the speed of information circulation and acquisition accuracy.Data Processing and IntegrationThe processing and integration of data is mainly to complete the proper processing of the collected data, cleaning and denoising, and further integrated storage. According to the foregoing, one of the characteristics of big data is diversity. This determines that the type and structure of data obtained through various channels are very complex, and brings great difficulties to subsequent data analysis and processing. Through the steps of data processing and integration, these complex structural data are first converted into a single or easy-to-handle structure, which lays a good foundation for future data analysis because not all information in these data is required. Therefore, these data must also be “de-noised” and cleaned to ensure da ta quality and reliability. The commonly used method is to design some data filters during the data processing process, and use the rule method of clustering or association analysis to pick out unwanted or erroneous outlier data and filter it out to prevent it from adversely affecting the final data result; These integrated data are integrated and stored. This is a very important step. If it is simply placed at random, it will affect the access to future data. It is easy to causedata access problems. Now the general solution is to The establishment of a special database for specific types of data, and the placement of these different types of data information, can effectively reduce the time for data query and access, and increase the speed of data extraction.Data AnalysisData analysis is the most central part of the overall big data processing process, because in the process of data analysis, the value of the data will be found. After the processing and integration of the previous step data, the resulting data becomes the original data for data analysis, and the data is further processed and analyzed according to the application requirements of the required data. The traditional methods of data processing analysis include data mining, machine learning, intelligent algorithms, and statistical analysis. These methods can no longer meet the needs of data analysis in the era of big data. (Google is the most advanced data analysis technology, Google as the Internet The most widely used company for big data, pioneered the concept of "cloud computing" in 2006. 
The application of various internal data is based on Google's own internal research and development of a series of cloud computing technologies.Data InterpretationFor the majority of users of data and information, the most concerned is not the analysis and processing of data, but the interpretationand presentation of the results of big data analysis. Therefore, in a complete data analysis process, the interpretation of data results is crucial. important. If the results of data analysis cannot be properly displayed, data users will be troubled and even mislead users. The traditional data display method is to download the output in text form or display the processing result on the user's personal computer. However, as the amount of data increases, the results of data analysis tend to be more complicated. The use of traditional data display methods is insufficient to meet the output requirements of data analysis results. Therefore, in order to increase the number of dataAccording to explanations and demonstration capabilities, most companies now introduce data visualization technology as the most powerful way to explain big data. By visualizing the results, you can visualize the data analysis results to the user, which is more convenient for users to understand and accept the results. Common visualization technologies include collection-based visualization technology, icon-based technology, image-based technology, pixel-oriented technology, and distributed technology.3 Big Data ChallengesBig Data Security and Privacy IssuesWith the development of big data, the sources and applications of data are becoming more and more extensive. When browsing the webfreely on the Internet, a series of browsing trails are left. When logging in to a related website on the Internet, you need to input personal important information, such as an ID card. Number, mobile number, address, etc. Cameras and sensors are everywhere to record personal behavior and location information. Through relevant data analysis, data experts can easily discover people's behavior habits and personal important information. If this information is used properly, it can help companies in related fields to understand the needs and habits of customers at any time, so that enterprises can adjust their production plans and achieve greater economic benefits. However, if these important information are stolen by bad people, security issues such as personal information and property will follow. In order to solve the problem of data privacy in the era of big data, academics and industry have come up with their own solutions. In addition, the speed of updating and changing data in the era of big data is accelerating, and general data privacy protection technologies are mostly based on static data protection, which brings new challenges to privacy protection. How to implement data privacy and security protection under complex and changing conditions will be one of the key directions for future big data research.Big Data Integration and ManagementLooking at the development process of big data, the sources and applications of big data are becoming more and more extensive. In orderto collect and collect data distributed in different data management systems, it is necessary to integrate and manage data. Although there are many methods for data integration and management, the traditional data storage methods can no longer meet the data processing requirements in the era of big data, which is facing new challenges. data storage. 
In the era of big data, one of the characteristics of big data is the diversity of data types. Data types are gradually transformed from traditional structured data into semi-structured and unstructured data. In addition, the sources of data are also gradually diversified. Most of the traditional data comes from a small number of military companies or research institutes' computer terminals; now, with the popularity of the Internet and mobile devices in the world, the storage of data is particularly important (by As can be seen in the previous article, traditional data storage methods are insufficient to meet the current data storage requirements. To deal with more and more massive data and increasingly complex data structures, many companies have started to develop distributed files suitable for the era of big data. System and distributed parallel database. In the data storage process, the data format of the transfer change is necessary, but also very critical and complex, which puts higher requirements on data storage systems.Big Data Ecological EnvironmentThe eco-environmental problem of big data involves firstly the issueof data resource management and sharing. This is an era of normalization and openness. The open structure of the Internet allows people to share all network resources in different corners of the earth at the same time. This has brought great convenience to scientific research. However, not all data can be shared unconditionally. Some data are protected by law because of their special value attributes and cannot be used unconditionally. Because the relevant legal measures are still not sound enough and lack sufficient data protection awareness, there is always the problem of data theft or ownership of data. This has both technical and legal issues. How to solve the problem of data sharing under the premise of protecting multiple interests will be an important challenge in the era of big data (In the era of big data, the production and application of data is not limited to a few special occasions, almost all areas, etc. Everyone can see the big data, so the data cross-cutting issues involved in these areas are inevitable. With the deepening of the influence of big data, big data analysis results will inevitably be on the national governance model, corporate decision-making, organization and Business processes, personal lifestyles, etc. will have a huge impact, and this mode of influence is worth further study in the future.中文译文大数据和云计算技术研究Bryant Roy摘要移动互联网、物联网和云计算技术的迅速发展,开启了移动云时代的序幕,大数据也越来越吸引人们的视线。
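上文 "Data Processing and Integration" 一节提到,可以在数据处理过程中设计数据过滤器,剔除错误或异常的离群数据。下面给出一个基于 z 分数的简化过滤示例(使用 NumPy);数据、阈值等均为此处假设,仅作示意,并非该文献中的方法。

```python
import numpy as np

# 虚构的传感器读数, 其中混入了两个明显异常的值
readings = np.array([20.1, 19.8, 20.4, 21.0, 19.9, 85.0, 20.2, -40.0, 20.6])

mean = readings.mean()
std = readings.std()

# z 分数过滤: 偏离均值超过阈值个标准差的点视为离群值
# 阈值取 2 仅适用于这个很小的示例样本; 实际中常配合更大的样本使用更高的阈值(如 3)
z_scores = np.abs((readings - mean) / std)
clean = readings[z_scores < 2.0]
outliers = readings[z_scores >= 2.0]

print("保留的数据:", clean)
print("被过滤的离群值:", outliers)
```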
大数据外文翻译文献
大数据外文翻译文献大数据外文翻译文献(文档含中英文对照即英文原文和中文翻译)原文:What is Data Mining?Many people treat data mining as a synonym for another popularly used term, “Knowledge Discovery in Databases”, or KDD. Alternatively, others view data mining as simply an essential step in the process of knowledge discovery in databases. Knowledge discovery consists of an iterative sequence of the following steps:· data cleaning: to remove noise or irrelevant data,· dat a integration: where multiple data sources may be combined,·data selection : where data relevant to the analysis task are retrieved from the database,·data transformation : where data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations, for instance,·data mining: an essential process where intelligent methods are applied in order to extract data patterns,·pattern evaluation: to identify the truly interesting patterns representing knowledge based on some interestingness measures, and ·knowledge presentation: where visualization and knowledge representation techniques are used to present the mined knowledge to the user .The data mining step may interact with the user or a knowledge base. The interesting patterns are presented to the user, and may be stored as new knowledge in the knowledgebase. Note that according to this view, data mining is only one step in the entire process, albeit an essential one since it uncovers hidden patterns for evaluation.We agree that data mining is a knowledge discovery process. However, in industry, in media, and in the database research milieu, the term “data mining” is becoming more popular than the longer term of “knowledge discovery in databases”. Therefore, in this book, we choose to use the term “data mining”. We adop t a broad view of data mining functionality: data mining is the process of discovering interestingknowledge from large amounts of data stored either in databases, data warehouses, or other information repositories.Based on this view, the architecture of a typical data mining system may have the following major components:1. Database, data warehouse, or other information repository. This is one or a set of databases, data warehouses, spread sheets, or other kinds of information repositories. Data cleaning and data integration techniques may be performed on the data.2. Database or data warehouse server. The database or data warehouse server is responsible for fetching the relevant data, based o n the user’s data mining request.3. Knowledge base. This is the domain knowledge that is used to guide the search, or evaluate the interestingness of resulting patterns. Such knowledge can include concept hierarchies, used to organize attributes or attribute values into different levels of abstraction. Knowledge such as user beliefs, which can be used to assess a pattern’s interestingness based on its unexpectedness, may also be included. Other examples of domain knowledge are additional interestingness constraints or thresholds, and metadata (e.g., describing data from multipleheterogeneous sources).4. Data mining engine. This is essential to the data mining system and ideally consists of a set of functional modules for tasks such ascharacterization, association analysis, classification, evolution and deviation analysis.5. Pattern evaluation module. This component typically employs interestingness measures and interacts with the data mining modules so as to focus the search towards interesting patterns. 
It may access interestingness thresholds stored in the knowledge base. Alternatively, the pattern evaluation module may be integrated with the mining module, depending on the implementation of the data mining method used. For efficient data mining, it is highly recommended to push the evaluation of pattern interestingness as deep as possible into the mining process so as to confine the search to only the interesting patterns.6. Graphical user interface. This module communicates between users and the data mining system, allowing the user to interact with the system by specifying a data mining query or task, providing information to help focus the search, and performing exploratory data mining based on the intermediate data mining results. In addition, this component allows the user to browse database and data warehouse schemas or data structures, evaluate mined patterns, and visualize the patterns in different forms.From a data warehouse perspective, data mining can be viewed as an advanced stage of on-1ine analytical processing (OLAP). However, data mining goes far beyond the narrow scope of summarization-styleanalytical processing of data warehouse systems by incorporating more advanced techniques for data understanding.While there may be many “data mining systems” on the market, not all of them can perform true data mining. A data analysis system that does not handle large amounts of data can at most be categorized as a machine learning system, a statistical data analysis tool, or an experimental system prototype. A system that can only perform data or information retrieval, including finding aggregate values, or that performs deductive query answering in large databases should be more appropriately categorized as either a database system, an information retrieval system, or a deductive database system.Data mining involves an integration of techniques from mult1ple disciplines such as database technology, statistics, machine learning, high performance computing, pattern recognition, neural networks, data visualization, information retrieval, image and signal processing, and spatial data analysis. We adopt a database perspective in our presentation of data mining in this book. That is, emphasis is placed on efficient and scalable data mining techniques for large databases. By performing data mining, interesting knowledge, regularities, or high-level information can be extracted from databases and viewed or browsed from different angles. The discovered knowledge can be applied to decision making, process control, information management, query processing, and so on. Therefore,data mining is considered as one of the most important frontiers in database systems and one of the most promising, new database applications in the information industry.A classification of data mining systemsData mining is an interdisciplinary field, the confluence of a set of disciplines, including database systems, statistics, machine learning, visualization, and information science. Moreover, depending on the data mining approach used, techniques from other disciplines may be applied, such as neural networks, fuzzy and or rough set theory, knowledge representation, inductive logic programming, or high performance computing. 
Depending on the kinds of data to be mined or on the given data mining application, the data mining system may also integrate techniques from spatial data analysis, Information retrieval, pattern recognition, image analysis, signal processing, computer graphics, Web technology, economics, or psychology.Because of the diversity of disciplines contributing to data mining, data mining research is expected to generate a large variety of data mining systems. Therefore, it is necessary to provide a clear classification of data mining systems. Such a classification may help potential users distinguish data mining systems and identify those that best match their needs. Data mining systems can be categorized according to various criteria, as follows.1) Classification according to the kinds of databases mined.A data mining system can be classified according to the kinds of databases mined. Database systems themselves can be classified according to different criteria (such as data models, or the types of data or applications involved), each of which may require its own data mining technique. Data mining systems can therefore be classified accordingly.For instance, if classifying according to data models, we may have a relational, transactional, object-oriented, object-relational, or data warehouse mining system. If classifying according to thespecial types of data handled, we may have a spatial, time -series, text, or multimedia data mining system , or a World-Wide Web mining system . Other system types include heterogeneous data mining systems, and legacy data mining systems.2) Classification according to the kinds of knowledge mined.Data mining systems can be categorized according to the kinds of knowledge they mine, i.e., based on data mining functionalities, such as characterization, discrimination, association, classification, clustering, trend and evolution analysis, deviation analysis , similarity analysis, etc.A comprehensive data mining system usually provides multiple and/or integrated data mining functionalities.Moreover, data mining systems can also be distinguished based on the granularity or levels of abstraction of the knowledge mined, includinggeneralized knowledge(at a high level of abstraction), primitive-level knowledge(at a raw data level), or knowledge at multiple levels (considering several levels of abstraction). An advanced data mining system should facilitate the discovery of knowledge at multiple levels of abstraction.3) Classification according to the kinds of techniques utilized.Data mining systems can also be categorized according to the underlying data mining techniques employed. These techniques can be described according to the degree of user interaction involved (e.g., autonomous systems, interactive exploratory systems, query-driven systems), or the methods of data analysis employed(e.g., database-oriented or data warehouse-oriented techniques, machine learning, statistics, visualization, pattern recognition, neural networks, and so on ) .A sophisticated data mining system will often adopt multiple datamining techniques or work out an effective, integrated technique which combines the merits of a few individual approaches.什么是数据挖掘?许多人把数据挖掘视为另一个常用的术语—数据库中的知识发现或KDD的同义词。
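The iterative KDD sequence listed earlier (data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation, knowledge presentation) can be illustrated end to end on a toy table. The sketch below uses pandas and scikit-learn; the data, the chosen algorithm (a small decision tree) and the evaluation step are assumptions made for illustration, not content from the original text.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier, export_text

# Raw data with typical problems: a duplicate row and a missing value.
raw = pd.DataFrame({
    "age":     [25, 32, 47, 47, None, 52, 38],
    "income":  [30, 45, 80, 80, 50, 95, 60],
    "churned": [1, 0, 0, 0, 1, 0, 1],
})

# 1) Data cleaning: drop duplicates, fill the missing age with the median.
clean = raw.drop_duplicates().copy()
clean["age"] = clean["age"].fillna(clean["age"].median())

# 2)-3) Data integration and selection: here we simply select the relevant columns.
features = clean[["age", "income"]]
target = clean["churned"]

# 4) Data transformation: scale the features to a comparable range.
X = StandardScaler().fit_transform(features)

# 5) Data mining: fit a small classification model.
model = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, target)

# 6) Pattern evaluation: a rough check of how well the mined pattern fits the data.
print("training accuracy:", model.score(X, target))

# 7) Knowledge presentation: show the learned rules in a readable form.
print(export_text(model, feature_names=["age", "income"]))
```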
大数据外文翻译参考文献综述
大数据外文翻译参考文献综述(文档含中英文对照,即英文原文和中文翻译)

原文:Data Mining and Data Publishing

Data mining is the extraction of vast interesting patterns or knowledge from huge amounts of data. The initial idea of privacy-preserving data mining (PPDM) was to extend traditional data mining techniques to work with data modified to mask sensitive information. The key issues were how to modify the data and how to recover the data mining result from the modified data. Privacy-preserving data mining considers the problem of running data mining algorithms on confidential data that is not supposed to be revealed even to the party running the algorithm. In contrast, privacy-preserving data publishing (PPDP) may not necessarily be tied to a specific data mining task, and the data mining task may be unknown at the time of data publishing. PPDP studies how to transform raw data into a version that is immunized against privacy attacks but that still supports effective data mining tasks. Privacy preservation for both data mining (PPDM) and data publishing (PPDP) has become increasingly popular because it allows sharing of privacy-sensitive data for analysis purposes. One well-studied approach is the k-anonymity model [1], which in turn led to other models such as confidence bounding, l-diversity, t-closeness, (α,k)-anonymity, etc. In particular, all known mechanisms try to minimize information loss, and such an attempt provides a loophole for attacks. The aim of this paper is to present a survey of the most common attack techniques against anonymization-based PPDM and PPDP and to explain their effects on data privacy.

Although data mining is potentially useful, many data holders are reluctant to provide their data for data mining for fear of violating individual privacy. In recent years, studies have been made to ensure that the sensitive information of individuals cannot be identified easily.

Anonymity models and k-anonymization techniques have been the focus of intense research in the last few years. In order to ensure anonymization of data while at the same time minimizing the information loss resulting from data modifications, several extending models have been proposed, which are discussed as follows.

1. k-Anonymity

k-anonymity is one of the most classic models; the technique prevents joining attacks by generalizing and/or suppressing portions of the released microdata so that no individual can be uniquely distinguished from a group of size k. In k-anonymous tables, a data set is k-anonymous (k ≥ 1) if each record in the data set is indistinguishable from at least (k − 1) other records within the same data set. The larger the value of k, the better the privacy is protected. k-anonymity can ensure that individuals cannot be uniquely identified by linking attacks.

2. Extending Models

Since k-anonymity does not provide sufficient protection against attribute disclosure, the notion of l-diversity attempts to solve this problem by requiring that each equivalence class has at least l well-represented values for each sensitive attribute. l-diversity has some advantages over k-anonymity, because a k-anonymous dataset permits strong attacks due to lack of diversity in the sensitive attributes. In this model, an equivalence class is said to have l-diversity if there are at least l well-represented values for the sensitive attribute. However, because there are semantic relationships among the attribute values, different values can have very different levels of sensitivity.
To account for this, the (α,k)-anonymity model mentioned above additionally requires that, after anonymization, the frequency (in fraction) of any sensitive value within an equivalence class is no more than α.

3. Related Research Areas

Several polls show that the public has an increased sense of privacy loss. Since data mining is often a key component of information systems, homeland security systems, and monitoring and surveillance systems, it gives a wrong impression that data mining is a technique for privacy intrusion. This lack of trust has become an obstacle to the benefit of the technology. For example, the potentially beneficial data mining research project, Terrorism Information Awareness (TIA), was terminated by the US Congress due to its controversial procedures of collecting, sharing, and analyzing the trails left by individuals. Motivated by the privacy concerns over data mining tools, a research area called privacy-preserving data mining (PPDM) emerged in 2000. The initial idea of PPDM was to extend traditional data mining techniques to work with the data modified to mask sensitive information. The key issues were how to modify the data and how to recover the data mining result from the modified data. The solutions were often tightly coupled with the data mining algorithms under consideration. In contrast, privacy-preserving data publishing (PPDP) may not necessarily be tied to a specific data mining task, and the data mining task is sometimes unknown at the time of data publishing. Furthermore, some PPDP solutions emphasize preserving the data truthfulness at the record level, but PPDM solutions often do not preserve such a property. PPDP differs from PPDM in several major ways, as follows:

1) PPDP focuses on techniques for publishing data, not techniques for data mining. In fact, it is expected that standard data mining techniques are applied on the published data. In contrast, the data holder in PPDM needs to randomize the data in such a way that data mining results can be recovered from the randomized data. To do so, the data holder must understand the data mining tasks and algorithms involved. This level of involvement is not expected of the data holder in PPDP, who usually is not an expert in data mining.

2) Both randomization and encryption do not preserve the truthfulness of values at the record level; therefore, the released data are basically meaningless to the recipients. In such a case, the data holder in PPDM may consider releasing the data mining results rather than the scrambled data.

3) PPDP primarily "anonymizes" the data by hiding the identity of record owners, whereas PPDM seeks to directly hide the sensitive data. Excellent surveys and books on randomization and cryptographic techniques for PPDM can be found in the existing literature. A family of research work called privacy-preserving distributed data mining (PPDDM) aims at performing some data mining task on a set of private databases owned by different parties. It follows the principle of Secure Multiparty Computation (SMC) and prohibits any data sharing other than the final data mining result. Clifton et al. present a suite of SMC operations, like secure sum, secure set union, secure size of set intersection, and scalar product, that are useful for many data mining tasks. In contrast, PPDP does not perform the actual data mining task, but is concerned with how to publish the data so that the anonymous data are useful for data mining. We can say that PPDP protects privacy at the data level while PPDDM protects privacy at the process level. They address different privacy models and data mining scenarios.
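To make the anonymity models described above concrete, the following sketch groups a released table into equivalence classes over its quasi-identifiers and checks k-anonymity together with a simple distinct-values reading of l-diversity. It is only an illustration: the sample table, the column names, and the use of distinct values as a stand-in for "well-represented" values are assumptions made for this example, not part of the surveyed methods.

```python
from collections import defaultdict

def check_anonymity(records, quasi_identifiers, sensitive_attr, k, l):
    """Group records into equivalence classes on the quasi-identifiers and
    report whether every class satisfies k-anonymity and (distinct) l-diversity."""
    classes = defaultdict(list)
    for row in records:
        key = tuple(row[q] for q in quasi_identifiers)
        classes[key].append(row[sensitive_attr])

    k_anonymous = all(len(vals) >= k for vals in classes.values())
    l_diverse = all(len(set(vals)) >= l for vals in classes.values())
    return k_anonymous, l_diverse

if __name__ == "__main__":
    # Hypothetical released microdata: age band and ZIP prefix are the
    # quasi-identifiers, "disease" is the sensitive attribute.
    table = [
        {"age": "20-30", "zip": "476**", "disease": "flu"},
        {"age": "20-30", "zip": "476**", "disease": "cancer"},
        {"age": "20-30", "zip": "476**", "disease": "flu"},
        {"age": "30-40", "zip": "479**", "disease": "hepatitis"},
        {"age": "30-40", "zip": "479**", "disease": "flu"},
        {"age": "30-40", "zip": "479**", "disease": "cancer"},
    ]
    print(check_anonymity(table, ["age", "zip"], "disease", k=3, l=2))
```

In the sample table each equivalence class contains three records and at least two distinct diseases, so the call returns (True, True); removing a single row from either class would already break 3-anonymity.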
In the field of statistical disclosure control (SDC), the research works focus on privacy-preserving publishing methods for statistical tables. SDC focuses on three types of disclosure, namely identity disclosure, attribute disclosure, and inferential disclosure. Identity disclosure occurs if an adversary can identify a respondent from the published data. Revealing that an individual is a respondent of a data collection may or may not violate confidentiality requirements. Attribute disclosure occurs when confidential information about a respondent is revealed and can be attributed to the respondent. Attribute disclosure is the primary concern of most statistical agencies in deciding whether to publish tabular data. Inferential disclosure occurs when individual information can be inferred with high confidence from statistical information in the published data.

Some other works in SDC focus on the study of the non-interactive query model, in which the data recipients can submit one query to the system. This type of non-interactive query model may not fully address the information needs of data recipients because, in some cases, it is very difficult for a data recipient to accurately construct a query for a data mining task in one shot. Consequently, there is a series of studies on the interactive query model, in which the data recipients, including adversaries, can submit a sequence of queries based on previously received query results. The database server is responsible for keeping track of all queries of each user and determining whether or not the currently received query has violated the privacy requirement with respect to all previous queries. One limitation of any interactive privacy-preserving query system is that it can only answer a sublinear number of queries in total; otherwise, an adversary (or a group of corrupted data recipients) will be able to reconstruct all but a 1 − o(1) fraction of the original data, which is a very strong violation of privacy. When the maximum number of queries is reached, the query service must be closed to avoid privacy leaks. In the case of the non-interactive query model, the adversary can issue only one query and, therefore, the non-interactive query model cannot achieve the same degree of privacy defined by the interactive model. One may consider that privacy-preserving data publishing is a special case of the non-interactive query model.

This paper presents a survey of the most common attack techniques against anonymization-based PPDM and PPDP and explains their effects on data privacy. k-anonymity is used to protect respondents' identity and reduces the risk of linking attacks; in the case of a homogeneity attack, however, a simple k-anonymity model fails, and we need a concept that prevents this attack, namely l-diversity. All tuples are arranged in a well-represented form, so an adversary is diverted among l possible sensitive values. l-diversity is limited in the case of a background-knowledge attack, because no one can predict the knowledge level of an adversary. It is also observed that, when using generalization and suppression, these techniques are applied even to attributes that do not need this extent of privacy, and this reduces the precision of the published table.
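The loss of precision that generalization introduces can be seen in a small sketch. The code below is a hypothetical illustration (the records, the banding width, and the ZIP-prefix length are all assumptions made for the example): it coarsens the quasi-identifiers of raw records into ranges and prefixes, then reports the smallest equivalence-class size, i.e., the k actually achieved by that generalization.

```python
from collections import Counter

def generalize(record, age_band=10, zip_prefix=3):
    """Coarsen quasi-identifiers: an exact age becomes a band, and the ZIP code
    keeps only a prefix. The finer values are lost, which is the precision cost
    of generalization discussed above."""
    lo = (record["age"] // age_band) * age_band
    return (f"{lo}-{lo + age_band - 1}", record["zip"][:zip_prefix] + "**")

def achieved_k(records):
    """Smallest equivalence-class size after generalization: the k that the
    released table actually satisfies."""
    classes = Counter(generalize(r) for r in records)
    return min(classes.values()), dict(classes)

if __name__ == "__main__":
    raw = [
        {"age": 23, "zip": "47610"},
        {"age": 27, "zip": "47614"},
        {"age": 25, "zip": "47618"},
        {"age": 36, "zip": "47905"},
        {"age": 31, "zip": "47909"},
    ]
    k, classes = achieved_k(raw)
    print("achieved k =", k)   # 2: the ("30-39", "479**") class has two records
    print(classes)
```

Widening the age band or shortening the ZIP prefix raises the achieved k but makes the published values even coarser, which is exactly the precision trade-off described above; suppression simply drops the records or values that no reasonable banding can fit into a class of size k.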
e-NSTAM (extended Sensitive Tuples Anonymity Method) is applied to sensitive tuples only and reduces information loss; however, this method also fails in the case of multiple sensitive tuples. Generalization with suppression is also a cause of data loss, because suppression emphasizes not releasing values that do not fit the k factor. Future work in this direction can include defining a new privacy measure along with l-diversity for multiple sensitive attributes, and focusing on generalizing attributes without suppression, using other techniques that achieve k-anonymity, because suppression reduces the precision of the published table.

译文:数据挖掘和数据发布

数据挖掘是从海量数据中提取大量有趣模式或知识的过程。
大数据挖掘外文翻译文献
大数据挖掘是一种通过分析和解释大规模数据集来发现实用信息和模式的过程。
它涉及到从结构化和非结构化数据中提取知识和洞察力,以支持决策制定和业务发展。
随着互联网的迅猛发展和技术的进步,大数据挖掘已经成为许多领域的关键技术,包括商业、医疗、金融和社交媒体等。
在大数据挖掘中,外文翻译文献起着重要的作用。
外文翻译文献可以提供最新的研究成果和技术发展,帮助我们了解和应用最先进的大数据挖掘算法和方法。
本文将介绍一篇与大数据挖掘相关的外文翻译文献,以帮助读者深入了解这一领域的最新发展。
标题:"A Survey of Big Data Mining Techniques for Knowledge Discovery"这篇文献是由Xiaojuan Zhu等人于2022年发表在《Expert Systems with Applications》杂志上的一篇综述文章。
该文献对大数据挖掘技术在知识发现方面的应用进行了全面的调研和总结。
以下是该文献的主要内容和贡献:
1. 引言
本文首先介绍了大数据挖掘的背景和意义。
随着互联网和传感器技术的快速发展,我们每天都会产生大量的数据。
这些数据包含了珍贵的信息和洞察力,可以用于改进业务决策和发现新的商机。
然而,由于数据量庞大和复杂性高,传统的数据挖掘技术已经无法处理这些数据。
因此,大数据挖掘成为了一种重要的技术。
2. 大数据挖掘的挑战
本文接着介绍了大数据挖掘面临的挑战。
由于数据量庞大,传统的数据挖掘算法无法有效处理大规模数据。
此外,大数据通常是非结构化的,包含各种类型的数据,如文本、图像和视频等。
因此,如何有效地从这些非结构化数据中提取实用的信息和模式也是一个挑战。
3. 大数据挖掘技术
接下来,本文介绍了一些常用的大数据挖掘技术。
这些技术包括数据预处理、特征选择、分类和聚类等。
数据预处理是指对原始数据进行清洗和转换,以提高数据质量和可用性。
特征选择是指从大量的特征中选择最实用的特征,以减少数据维度和提高模型性能。
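下面给出一个基于 scikit-learn 的最小示例(其中的合成数据集、特征数量和各项参数均为本文为说明而假设,并非该综述原文的内容),粗略演示上面提到的数据预处理、特征选择与分类的基本流程:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 构造一个带冗余特征的合成数据集,模拟"维度高、含噪声"的情形
X, y = make_classification(n_samples=1000, n_features=50,
                           n_informative=8, n_redundant=20,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

pipeline = make_pipeline(
    StandardScaler(),                  # 数据预处理:标准化
    SelectKBest(f_classif, k=10),      # 特征选择:保留得分最高的 10 个特征
    LogisticRegression(max_iter=1000)  # 分类模型
)
pipeline.fit(X_train, y_train)
print("测试集准确率:", round(pipeline.score(X_test, y_test), 3))
```

示例先对特征做标准化,再用单变量统计检验保留得分最高的 10 个特征,最后训练一个逻辑回归分类器;真正的大数据场景通常还需要分布式计算框架来处理无法放入单机内存的数据。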
互联网大数据金融中英文对照外文翻译文献
互联网大数据金融中英文对照外文翻译文献(文档含英文原文和中文翻译)

原文:Internet Finance's Impact on Traditional Finance

Abstract

As modern information and Internet technology advances, especially the development of cloud computing, big data, mobile Internet, search engines and social networks, it profoundly changes, and even subverts, many traditional industries, and the financial industry is no exception. In recent years, the financial industry has become the area most deeply influenced by the Internet, after commercial distribution and the media. Many Internet-based financial service models have emerged and have had a profound and huge impact on traditional financial industries. "Internet Finance" has won the focus of public attention.

Internet Finance is low cost, high efficiency, and pays more attention to the user experience. These features enable it to fully meet the special needs of the traditional "long tail" financial market, to flexibly provide more convenient and efficient financial services and diversified financial products, to greatly expand the scope and depth of financial services, to shorten the distance between people in space and time, and to establish a new financial environment that effectively integrates and makes use of fragmented time, information, capital and other scattered resources, which then add up to scale and grow into a new profit point for various financial institutions. Moreover, with its continuous penetration into and integration with the traditional financial field, Internet Finance will bring new challenges, but also opportunities, to the traditional sector. It contributes to the transformation of the traditional commercial banks, compensates for the lack of efficiency in the funding process and in information integration, and provides new distribution channels for securities, insurance, funds and other financial products. For many SMEs, Internet Finance extends their financing channels, reduces their financing threshold, and improves their efficiency in using funds. However, the cross-industry nature of Internet Finance means its risk factors are more complex, sensitive and varied, and therefore we must properly handle the relationship between innovative development, market regulation, and industry self-regulation.

Key Words: Internet Finance; Commercial Banks; Effects; Regulatory

1 Introduction

The continuous development of Internet technology, cloud computing, big data, and a growing number of Internet applications such as social networks provides strong support for the business development of traditional industries and deepens the penetration of the Internet into them. At the end of the 20th century, Microsoft chairman Bill Gates declared that "the traditional commercial bank will become the dinosaur of the new century". Nowadays, with the development of Internet and electronic information technology, we really feel this trend: mobile payment and electronic banking already occupy an important position in our daily life.

Because the concept of Internet finance comes almost entirely from business practice, present studies focus mainly on discussion; systematic research on the specific modes of Internet finance, its influence on the traditional financial industry, and the corresponding countermeasures is still lacking. The Internet has always been a key battleground for venture investment, and in the financial industry innovative, experimental business models emerge endlessly, so it is difficult to use a fixed frame of thinking to classify and define them.
The mutual penetration and integration of the Internet and finance reflects both technical development and the requirements of market rules, and it is an irreversible trend. What the Internet brings to traditional finance is not only low cost and high efficiency, but also an innovative mode of thinking and an unremitting pursuit of user experience, to which the traditional financial industry has to respond actively. Internet finance is a blue ocean vast enough to change the world, and it is well worth the attention needed to straighten out its development, from its existing business models to its development prospects. "Internet finance" is one of the latest business formats, and the literature discussing Internet finance research is still lacking in system and practicality. Therefore, this article, in line with the strongly practical character of the Internet industry, summarizes and analyzes the several business models on the market and gives analysis and suggestions on how the traditional financial industry should actively respond to the Internet finance wave, which has strong practical significance.

2 Internet financial background

Internet finance is a new financial model based on Internet resources and built on big data and cloud computing. Internet finance uses Internet technology and mobile communication technology to realize financing, payment and information intermediary business; it is an emerging field produced by the combination of traditional industry and modern information technology represented by the Internet (mobile payment, cloud computing, data mining, search engines, social networks, and so on). Whether it is called "finance" or "Internet" is only a difference of strategy; there is no strict definitional distinction. With the mutual penetration and integration of finance and the Internet, Internet finance can refer to all financing behavior realized through Internet technology. Internet finance is the product of the mutual infiltration and fusion of the Internet and traditional finance, and this new financial model has a profound background. The emergence of Internet finance is the result of financial players' craving for cost reduction, and it is also inseparable from the technical support provided by the rapid development of modern information technology.

2.1 Demand factors

In traditional financial markets there is serious information asymmetry, which greatly increases transaction risk. The development of e-commerce has gradually changed people's spending habits, and the requirements for service efficiency and experience are higher and higher; in addition, rising operating costs stimulate financial players' thirst for financial innovation and reform. These demand-pull factors have become the powerful inner driving force behind the emergence of Internet finance.

2.2 Supply driving factors

The development of technologies such as data mining, cloud computing and Internet search engines has provided a technology platform for financial institutions; innovation, the profit motive of enterprises, mixed operation and so on have made it possible for traditional industries to transform and for Internet companies to penetrate the financial sector, providing external technical support for the birth and development of Internet finance and allowing it to emerge outside the established financial system.
On the Internet's platform of "openness, equality, cooperation and sharing", third-party financing and payment, online investment and wealth management, and credit evaluation models not only bring great changes to the traditional pattern of financial markets, but also let modern information technology serve various financial entities more easily. For traditional financial institutions, especially banking, securities and insurance institutions, there are more opportunities than crisis, and more room for development than challenge.

3 The main participants in Internet finance

3.1 Capital providers

Because Internet finance is comprehensive and cross-industry, its capital providers include not only the traditional financial institutions but also the Internet companies penetrating into finance. In terms of the current market structure, the traditional financial sector mainly includes commercial banks, securities, insurance, funds and small loan companies, while the latter group mainly includes some Internet companies and emerging players, such as Amazon and some companies that use the Internet as a channel. These companies are not only providers of capital to the market; they also bring the funds of many traditional so-called "low net worth clients" into the market. In terms of operation, the former mainly externalize their traditional business through the Internet, while the latter mainly penetrate the business through Internet channels; both externalization and penetration achieve innovation and reform of the financial business through the Internet channel.

3.2 Capital demanders

Although the capital demanders in the Internet finance model are still the traditional government, enterprises and individuals, the benefits they receive have greatly changed. With the rise and development of Internet finance, and especially with Internet companies lowering the entry threshold set by traditional financial institutions, relatively weak groups and individual demanders have more convenient and efficient access to capital. As a result, the universality and inclusiveness brought about by the Internet are better than under the previous traditional financial pattern.

3.3 Intermediaries

Internet finance relies on efficient and convenient information technology, which greatly reduces information asymmetry in financial markets. Because the two parties can connect directly through the Internet, transaction costs are greatly reduced, so the dependence of Internet finance participants on intermediary institutions has decreased significantly; this does not mean, however, that there are no intermediary institutions in Internet financial markets. At the present stage of development, the third-party payment platform plays the intermediary role in this field: it not only acts as a financial settlement platform, but also integrates the upstream and downstream links between capital supply and demand in many ways, and, while meeting payment needs, it also has the effect of allocating capital. This function is especially obvious in the field of electronic commerce.

3.4 Large financial data

Big financial data refers to the collection of vast amounts of unstructured data; through deep mining and real-time analysis of these data, institutions can grasp customers' trading information, consumption habits and consumption information, predict customer behavior, and thereby greatly improve the efficiency of the relevant financial institutions in product design, precision marketing and risk management.
A financial services platform based on big data mainly refers to the financial services of e-commerce enterprises that hold vast amounts of trading data. The key to big data lies in the ability to rapidly obtain valuable information from large amounts of chaotic data, or in the ability to quickly turn big data assets into value. Big data processing is therefore usually combined with cloud computing.

4 Global economic issues

For much of the past year the fast-growing economies of the emerging world watched the Western financial hurricane from afar. Their own banks held few of the mortgage-based assets that undid the rich world's financial firms. Commodity exporters were thriving, thanks to high prices for raw materials. China's economic juggernaut powered on. And, from Budapest to Brasília, an abundance of credit fuelled domestic demand. Even as talk mounted of the rich world suffering its worst financial collapse since the Depression, emerging economies seemed a long way from the centre of the storm.

No longer. As foreign capital has fled and confidence evaporated, the emerging world's stockmarkets have plunged (in some cases losing half their value) and currencies tumbled. The seizure in the credit market caused havoc, as foreign banks abruptly stopped lending and stepped back from even the most basic banking services, including trade credits.

Like their rich-world counterparts, governments are battling to limit the damage (see article). That is easiest for those with large foreign-exchange reserves. Russia is spending $220 billion to shore up its financial services industry. South Korea has guaranteed $100 billion of its banks' debt. Less well-endowed countries are asking for help. Hungary has secured a €5 billion ($6.6 billion) lifeline from the European Central Bank and is negotiating a loan from the IMF, as is Ukraine. Close to a dozen countries are talking to the fund about financial help.

Those with long-standing problems are being driven to desperate measures. Argentina is nationalising its private pension funds, seemingly to stave off default (see article). But even stalwarts are looking weaker. Figures released this week showed that China's growth slowed to 9% in the year to the third quarter, still a rapid pace but a lot slower than the double-digit rates of recent years.

The various emerging economies are in different states of readiness, but the cumulative impact of all this will be enormous. Most obviously, how these countries fare will determine whether the world economy faces a mild recession or something nastier. Emerging economies accounted for around three-quarters of global growth over the past 18 months. But their economic fate will also have political consequences.

In many places (eastern Europe is one example, see article) financial turmoil is hitting weak governments. But even strong regimes could suffer. Some experts think that China needs growth of 7% a year to contain social unrest. More generally, the coming strife will shape the debate about the integration of the world economy. Unlike many previous emerging-market crises, today's mess spread from the rich world, largely thanks to increasingly integrated capital markets. If emerging economies collapse, either into a currency crisis or a sharp recession, there will be yet more questioning of the wisdom of globalised finance.

Fortunately, the picture is not universally dire. All emerging economies will slow. Some will surely face deep recessions.
But many are facing the present danger in stronger shape than ever before, armed with large reserves, flexible currencies and strong budgets. Good policy, both at home and in the rich world, can yet avoid a catastrophe.

One reason for hope is that the direct economic fallout from the rich world's disaster is manageable. Falling demand in America and Europe hurts exports, particularly in Asia and Mexico. Commodity prices have fallen: oil is down nearly 60% from its peak and many crops and metals have done worse. That has a mixed effect. Although it hurts commodity exporters from Russia to South America, it helps commodity importers in Asia and reduces inflation fears everywhere. Countries like Venezuela that have been run badly are vulnerable (see article), but given the scale of the past boom, the commodity bust so far seems unlikely to cause widespread crises.

The more dangerous shock is financial. Wealth is being squeezed as asset prices decline. China's house prices, for instance, have started falling (see article). This will dampen domestic confidence, even though consumers are much less indebted than they are in the rich world. Elsewhere, the sudden dearth of foreign-bank lending and the flight of hedge funds and other investors from bond markets has slammed the brakes on credit growth. And just as booming credit once underpinned strong domestic spending, so tighter credit will mean slower growth.

Again, the impact will differ by country. Thanks to huge current-account surpluses in China and the oil-exporters in the Gulf, emerging economies as a group still send capital to the rich world. But over 80 have deficits of more than 5% of GDP. Most of these are poor countries that live off foreign aid; but some larger ones rely on private capital. For the likes of Turkey and South Africa a sudden slowing in foreign financing would force a dramatic adjustment. A particular worry is eastern Europe, where many countries have double-digit deficits. In addition, even some countries with surpluses, such as Russia, have banks that have grown accustomed to easy foreign lending because of the integration of global finance. The rich world's bank bail-outs may limit the squeeze, but the flow of capital to the emerging world will slow. The Institute of International Finance, a bankers' group, expects a 30% decline in net flows of private capital from last year.

This credit crunch will be grim, but most emerging markets can avoid catastrophe. The biggest ones are in relatively good shape. The more vulnerable ones can (and should) be helped.

Among the giants, China is in a league of its own, with a $2 trillion arsenal of reserves, a current-account surplus, little connection to foreign banks and a budget surplus that offers lots of room to boost spending. Since the country's leaders have made clear that they will do whatever it takes to cushion growth, China's economy is likely to slow, perhaps to 8%, but not collapse. Although that is not enough to save the world economy, such growth in China would put a floor under commodity prices and help other countries in the emerging world.

The other large economies will be harder hit, but should be able to weather the storm. India has a big budget deficit and many Brazilian firms have a large foreign-currency exposure. But Brazil's economy is diversified and both countries have plenty of reserves to smooth the shift to slower growth. With $550 billion of reserves, Russia ought to be able to stop a run on the rouble.
In the short term at least, the most vulnerable countries are all smaller ones. There will be pain as tighter credit forces adjustments. But sensible, speedy international assistance would make a big difference. Several emerging countries have asked America's Federal Reserve for liquidity support; some hope that China will bail them out. A better route is surely the IMF, which has huge expertise and some $250 billion to lend. Sadly, borrowing from the fund carries a stigma. That needs to change. The IMF should develop quicker, more flexible financial instruments and minimise the conditions it attaches to loans. Over the past month deft policymaking saw off calamity in the rich world. Now it is time for something similar in the emerging world.

5 Conclusions

The Internet finance model can produce huge social benefits: it lowers transaction costs and allocates resources more efficiently than existing direct and indirect financing, thereby providing power for economic development. It can also use the Internet and its related software technology to play down the specialized division of labor in traditional finance, making financial participation more popular and mass-based and making complex transactions such as risk pricing and term matching simpler. Because the fields in which Internet finance is involved are mainly concentrated where traditional financial institutions are not yet deeply developed, namely the traditional financial "long tail" market, it can complement the existing traditional financial business. In the short term, therefore, Internet finance will not have a big impact on traditional financial institutions in terms of market size, but its business models, innovative ideas, and apparent high efficiency have a greater impact on traditional financial institutions at the conceptual level, and have also led traditional financial institutions to further accelerate their mutual penetration and integration with the Internet.

译文:互联网金融对传统金融的影响

作者:罗萨米;拉夫雷特

摘要

网络的发展,深刻地改变甚至颠覆了许多传统行业,金融业也不例外。
关于大数据的学术英文文献
Big Data: Challenges and Opportunities in the Digital Age

Introduction

In the contemporary digital era, the advent of big data has revolutionized various aspects of human society. Big data refers to vast and complex datasets generated at an unprecedented rate from diverse sources, including social media platforms, sensor networks, and scientific research. While big data holds immense potential for transformative insights, it also poses significant challenges and opportunities that require thoughtful consideration. This article aims to elucidate the key challenges and opportunities associated with big data, providing a comprehensive overview of its impact and future implications.

Challenges of Big Data

1. Data Volume and Variety: Big data datasets are characterized by their enormous size and heterogeneity. Dealing with such immense volumes and diverse types of data requires specialized infrastructure, computational capabilities, and data management techniques.

2. Data Velocity: The continuous influx of data from various sources necessitates real-time analysis and decision-making. The rapid pace at which data is generated poses challenges for data processing, storage, and efficient access.

3. Data Veracity: The credibility and accuracy of big data can be a concern due to the potential for noise, biases, and inconsistencies in data sources. Ensuring data quality and reliability is crucial for meaningful analysis and decision-making.

4. Data Privacy and Security: The vast amounts of data collected and processed raise concerns about privacy and security. Sensitive data must be protected from unauthorized access, misuse, or breaches. Balancing data utility with privacy considerations is a key challenge.

5. Skills Gap: The analysis and interpretation of big data require specialized skills and expertise in data science, statistics, and machine learning. There is a growing need for skilled professionals who can effectively harness big data for valuable insights.

Opportunities of Big Data

1. Improved Decision-Making: Big data analytics enables organizations to make informed decisions based on comprehensive data-driven insights. Data analysis can reveal patterns, trends, and correlations that would be difficult to identify manually.

2. Personalized Experiences: Big data allows companies to tailor products, services, and marketing strategies to individual customer needs. By understanding customer preferences and behaviors through data analysis, businesses can provide personalized experiences that enhance satisfaction and loyalty.

3. Scientific Discovery and Innovation: Big data enables advancements in various scientific fields, including medicine, genomics, and climate modeling. The vast datasets facilitate the identification of complex relationships, patterns, and anomalies that can lead to breakthroughs and new discoveries.

4. Economic Growth and Productivity: Big data-driven insights can improve operational efficiency, optimize supply chains, and create new economic opportunities. By leveraging data to streamline processes, reduce costs, and identify growth areas, businesses can enhance their competitiveness and contribute to economic development.

5. Societal Benefits: Big data has the potential to address societal challenges such as crime prevention, disease control, and disaster management. Data analysis can empower governments and organizations to make evidence-based decisions that benefit society.

Conclusion

Big data presents both challenges and opportunities in the digital age.
The challenges of data volume, velocity, veracity, privacy, and skills gap must be addressed to harness the full potential of big data. However, the opportunities for improved decision-making, personalized experiences, scientific discoveries, economic growth, and societal benefits are significant. By investing in infrastructure, developing expertise, and establishing robust data governance frameworks, organizations and individuals can effectively navigate the challenges and realize the transformative power of big data. As the digital landscape continues to evolve, big data will undoubtedly play an increasingly important role in shaping the future of human society and technological advancement.
数据库英文参考文献(最新推荐120个)
由于我国经济的高速发展,计算机科学技术在当前各个科技领域中迅速发展,成为了应用最广泛的技术之一。其中数据库又是计算机科学技术中发展最快、应用最广泛的重要分支之一,它已成为计算机信息系统和计算机应用系统的重要技术基础和支柱。下面是数据库英文参考文献的分享,希望对你有所帮助。
下面是数据库英文参考文献的分享,希望对你有所帮助。
数据库英文参考文献一:[1]Nú?ez Matías,Weht Ruben,Nú?ez Regueiro Manuel. Searching for electronically two dimensional metals in high-throughput ab initio databases[J]. Computational Materials Science,2020,182.[2]Izabela Karsznia,Marta Przychodzeń,Karolina Sielicka. Methodology of the automatic generalization of buildings, road networks, forests and surface waters: a case study based on the Topographic Objects Database in Poland[J]. Geocarto International,2020,35(7).[3]Alankrit Chaturvedi. Secure Cloud Migration Challenges and Solutions[J]. Journal of Research in Science and Engineering,2020,2(4).[4]Ivana Nin?evi? Pa?ali?,Maja ?uku?i?,Mario Jadri?. Smart city research advances in Southeast Europe[J]. International Journal of Information Management,2020.[5]Jongseong Kim,Unil Yun,Eunchul Yoon,Jerry Chun-Wei Lin,Philippe Fournier-Viger. One scan based high average-utility pattern mining in static and dynamic databases[J]. Future Generation Computer Systems,2020.[6]Jo?o Peixoto Martins,António Andrade-Campos,Sandrine Thuillier. Calibration of Johnson-Cook Model Using Heterogeneous Thermo-Mechanical Tests[J]. Procedia Manufacturing,2020,47.[7]Anna Soriani,Roberto Gemignani,Matteo Strano. A Metamodel for the Management of Large Databases: Toward Industry 4.0 in Metal Forming[J]. Procedia Manufacturing,2020,47.[8]Ayman Elbadawi,Karim Mahmoud,Islam Y. Elgendy,Mohammed Elzeneini,Michael Megaly,Gbolahan Ogunbayo,Mohamed A. Omer,Michelle Albert,Samir Kapadia,Hani Jneid. Racial disparities in the utilization and outcomes of transcatheter mitral valve repair: Insights from a national database[J]. Cardiovascular Revascularization Medicine,2020.[9]Maurizio Boccia,Antonio Sforza,Claudio Sterle. Simple Pattern Minimality Problems: Integer Linear Programming Formulations and Covering-Based Heuristic Solving Approaches[J]. INFORMS Journal on Computing,2020.[10]. Inc.; Patent Issued for Systems And User Interfaces For Dynamic Access Of Multiple Remote Databases And Synchronization Of Data Based On User Rules (USPTO 10,628,448)[J]. Computer Technology Journal,2020.[11]. Bank of America Corporation; Patent Issued for System For Electronic Data Verification, Storage, And Transfer (USPTO 10,628,058)[J]. Computer Technology Journal,2020.[12]. Information Technology - Database Management; Data from Technical University Munich (TU Munich) Advance Knowledge in Database Management (Make the most out of your SIMD investments: counter control flow divergence in compiled query pipelines)[J]. Computer Technology Journal,2020.[13]. Information Technology - Database Management; Studies from Pontifical Catholic University Update Current Data on Database Management (General dynamic Yannakakis: conjunctive queries with theta joins under updates)[J]. Computer Technology Journal,2020.[14]Kimothi Dhananjay,Biyani Pravesh,Hogan James M,Soni Akshay,Kelly Wayne. Learning supervised embeddings for large scale sequence comparisons.[J]. PloS one,2020,15(3).[15]. Information Technology; Studies from University of California San Diego (UCSD) Reveal New Findings on Information Technology (A Physics-constrained Data-driven Approach Based On Locally Convex Reconstruction for Noisy Database)[J]. Information Technology Newsweekly,2020.[16]. Information Technology; Researchers from National Institute of Information and Communications Technology Describe Findings in Information Technology (Efficient Discovery of Weighted Frequent Neighborhood Itemsets in Very Large Spatiotemporal Databases)[J]. Information Technology Newsweekly,2020.[17]. 
Information Technology; Investigators at Gdansk University of Technology Report Findings in Information Technology (A Framework for Accelerated Optimization of Antennas Using Design Database and Initial Parameter Set Estimation)[J]. Information Technology Newsweekly,2020.[18]. Information Technology; Study Results from Palacky University Update Understanding of Information Technology (Evaluation of Replication Mechanisms on Selected Database Systems)[J]. Information Technology Newsweekly,2020.[19]Runfola Daniel,Anderson Austin,Baier Heather,Crittenden Matt,Dowker Elizabeth,Fuhrig Sydney,Goodman Seth,Grimsley Grace,Layko Rachel,MelvilleGraham,Mulder Maddy,Oberman Rachel,Panganiban Joshua,Peck Andrew,Seitz Leigh,Shea Sylvia,Slevin Hannah,Youngerman Rebecca,Hobbs Lauren. geoBoundaries: A global database of political administrative boundaries.[J]. PloS one,2020,15(4).[20]Dupré Damien,Krumhuber Eva G,Küster Dennis,McKeown Gary J. A performance comparison of eight commercially available automatic classifiers for facial affect recognition.[J]. PloS one,2020,15(4).[21]Partha Pratim Banik,Rappy Saha,Ki-Doo Kim. An Automatic Nucleus Segmentation and CNN Model based Classification Method of White Blood Cell[J]. Expert Systems With Applications,2020,149.[22]Hang Dong,Wei Wang,Frans Coenen,Kaizhu Huang. Knowledge base enrichment by relation learning from social tagging data[J]. Information Sciences,2020,526.[23]Xiaodong Zhao,Dechang Pi,Junfu Chen. Novel trajectory privacy-preserving method based on clustering using differential privacy[J]. Expert Systems With Applications,2020,149.[24]. Information Technology; Researchers at Beijing University of Posts and Telecommunications Have Reported New Data on Information Technology (Mining top-k sequential patterns in transaction database graphs)[J]. Internet Weekly News,2020.[25]Sunil Kumar Sharma. An empirical model (EM: CCO) for clustering, convergence and center optimization in distributive databases[J]. Journal of Ambient Intelligence and Humanized Computing,2020(prepublish).[26]Naryzhny Stanislav,Klopov Nikolay,Ronzhina Natalia,Zorina Elena,Zgoda Victor,Kleyst Olga,Belyakova Natalia,Legina Olga. A database for inventory of proteoform profiles: "2DE-pattern".[J]. Electrophoresis,2020.[27]Noel Varela,Jesus Silva,Fredy Marin Gonzalez,Pablo Palencia,Hugo Hernandez Palma,Omar Bonerge Pineda. Method for the Recovery of Images in Databases of Rice Grains from Visual Content[J]. Procedia Computer Science,2020,170.[28]Ahmad Rabanimotlagh,Prabhu Janakaraj,Pu Wang. Optimal Crowd-Augmented Spectrum Mapping via an Iterative Bayesian Decision Framework[J]. Ad Hoc Networks,2020.[29]Ismail Boucherit,Mohamed Ould Zmirli,Hamza Hentabli,Bakhtiar Affendi Rosdi. Finger vein identification using deeply-fused Convolutional Neural Network[J]. Journal of King Saud University - Computer and Information Sciences,2020.[30]Sachin P. Patel,S.H. Upadhyay. Euclidean Distance based Feature Ranking andSubset Selection for Bearing Fault Diagnosis[J]. Expert Systems With Applications,2020.[31]Julia Fomina,Denis Safikanov,Alexey Artamonov,Evgeniy Tretyakov. Parametric and semantic analytical search indexes in hieroglyphic languages[J]. Procedia Computer Science,2020,169.[32]Selvine G. Mathias,Sebastian Schmied,Daniel Grossmann. An Investigation on Database Connections in OPC UA Applications[J]. Procedia Computer Science,2020,170.[33]Abdourrahmane Mahamane Atto,Alexandre Benoit,Patrick Lambert. Timed-image based deep learning for action recognition in video sequences[J]. 
Pattern Recognition,2020.[34]Yonis Gulzar,Ali A. Alwan,Abedallah Zaid Abualkishik,Abid Mehmood. A Model for Computing Skyline Data Items in Cloud Incomplete Databases[J]. Procedia Computer Science,2020,170.[35]Xiaohan Yang,Fan Li,Hantao Liu. Deep feature importance awareness based no-reference image quality prediction[J]. Neurocomputing,2020.[36]Dilana Hazer-Rau,Sascha Meudt,Andreas Daucher,Jennifer Spohrs,Holger Hoffmann,Friedhelm Schwenker,Harald C. Traue. The uulmMAC Database—A Multimodal Affective Corpus for Affective Computing in Human-Computer Interaction[J]. Sensors,2020,20(8).[37]Tomá? Pohanka,Vilém Pechanec. Evaluation of Replication Mechanisms on Selected Database Systems[J]. ISPRS International Journal of Geo-Information,2020,9(4).[38]Verheggen Kenneth,Raeder Helge,Berven Frode S,Martens Lennart,Barsnes Harald,Vaudel Marc. Anatomy and evolution of database search engines-a central component of mass spectrometry based proteomic workflows.[J]. Mass spectrometry reviews,2020,39(3).[39]Moscona Leon,Casta?eda Pablo,Masrouha Karim. Citation analysis of the highest-cited articles on developmental dysplasia of the hip.[J]. Journal of pediatric orthopedics. Part B,2020,29(3).[40]Nasseh Daniel,Schneiderbauer Sophie,Lange Michael,Schweizer Diana,Heinemann Volker,Belka Claus,Cadenovic Ranko,Buysse Laurence,Erickson Nicole,Mueller Michael,Kortuem Karsten,Niyazi Maximilian,Marschner Sebastian,Fey Theres. Optimizing the Analytical Value of Oncology-Related Data Based on an In-Memory Analysis Layer: Development and Assessment of the Munich OnlineComprehensive Cancer Analysis Platform.[J]. Journal of medical Internet research,2020,22(4).数据库英文参考文献二:[41]Meiling Chai,Changgeng Li,Hui Huang. A New Indoor Positioning Algorithm of Cellular and Wi-Fi Networks[J]. Journal of Navigation,2020,73(3).[42]Mandy Watson. How to undertake a literature search: a step-by-step guide[J]. British Journal of Nursing,2020,29(7).[43]. Patent Application; "Memorial Facility With Memorabilia, Meeting Room, Secure Memorial Database, And Data Needed For An Interactive Computer Conversation With The Deceased" in Patent Application Approval Process (USPTO 20200089455)[J]. Computer Technology Journal,2020.[44]. Information Technology; Data on Information Technology Detailed by Researchers at Complutense University Madrid (Hr-sql: Extending Sql With Hypothetical Reasoning and Improved Recursion for Current Database Systems)[J]. Computer Technology Journal,2020.[45]. Science - Metabolomics; Study Data from Wake Forest University School of Medicine Update Knowledge of Metabolomics (Software tools, databases and resources in metabolomics: updates from 2018 to 2019)[J]. Computer Technology Journal,2020.[46]. Sigma Computing Inc.; Researchers Submit Patent Application, "GeneratingA Database Query To Dynamically Aggregate Rows Of A Data Set", for Approval (USPTO 20200089796)[J]. Computer Technology Journal,2020.[47]. Machine Learning; Findings on Machine Learning Reported by Investigators at Tongji University (Comparing Machine Learning Algorithms In Predicting Thermal Sensation Using Ashrae Comfort Database Ii)[J]. Computer Technology Journal,2020.[48]. Sigma Computing Inc.; "Generating A Database Query Using A Dimensional Hierarchy Within A Graphical User Interface" in Patent Application Approval Process (USPTO 20200089794)[J]. Computer Technology Journal,2020.[49]Qizhi He,Jiun-Shyan Chen. A physics-constrained data-driven approach based on locally convex reconstruction for noisy database[J]. 
Computer Methods in Applied Mechanics and Engineering,2020,363.[50]José A. Delgado-Osuna,Carlos García-Martínez,JoséGómez-Barbadillo,Sebastián Ventura. Heuristics for interesting class association rule mining a colorectal cancer database[J]. Information Processing andManagement,2020,57(3).[51]Edival Lima,Thales Vieira,Evandro de Barros Costa. Evaluating deep models for absenteeism prediction of public security agents[J]. Applied Soft Computing Journal,2020,91.[52]S. Fareri,G. Fantoni,F. Chiarello,E. Coli,A. Binda. Estimating Industry 4.0 impact on job profiles and skills using text mining[J]. Computers in Industry,2020,118.[53]Estrela Carlos,Pécora Jesus Djalma,Dami?o Sousa-Neto Manoel. The Contribution of the Brazilian Dental Journal to the Brazilian Scientific Research over 30 Years.[J]. Brazilian dental journal,2020,31(1).[54]van den Oever L B,Vonder M,van Assen M,van Ooijen P M A,de Bock G H,Xie X Q,Vliegenthart R. Application of artificial intelligence in cardiac CT: From basics to clinical practice.[J]. European journal of radiology,2020,128.[55]Li Liu,Deborah Silver,Karen Bemis. Visualizing events in time-varying scientific data[J]. Journal of Visualization,2020,23(2–3).[56]. Information Technology - Database Management; Data on Database Management Discussed by Researchers at Arizona State University (Architecture of a Distributed Storage That Combines File System, Memory and Computation In a Single Layer)[J]. Information Technology Newsweekly,2020.[57]. Information Technology - Database Management; New Findings from Guangzhou Medical University Update Understanding of Database Management (GREG-studying transcriptional regulation using integrative graph databases)[J]. Information Technology Newsweekly,2020.[58]. Technology - Laser Research; Reports from Nicolaus Copernicus University in Torun Add New Data to Findings in Laser Research (Nonlinear optical study of Schiff bases using Z-scan technique)[J]. Journal of Technology,2020.[59]Loeffler Caitlin,Karlsberg Aaron,Martin Lana S,Eskin Eleazar,Koslicki David,Mangul Serghei. Improving the usability and comprehensiveness of microbial databases.[J]. BMC biology,2020,18(1).[60]Caitlin Loeffler,Aaron Karlsberg,Lana S. Martin,Eleazar Eskin,David Koslicki,Serghei Mangul. Improving the usability and comprehensiveness of microbial databases[J]. BMC Biology,2020,18(1).[61]Dean H. Barrett,Aderemi Haruna. Artificial intelligence and machine learningfor targeted energy storage solutions[J]. Current Opinion in Electrochemistry,2020,21.[62]Chenghao Sun. Research on investment decision-making model from the perspective of “Internet of Things + Big data”[J]. Future Generation Computer Systems,2020,107.[63]Sa?a Adamovi?,Vladislav Mi?kovic,Nemanja Ma?ek,Milan Milosavljevi?,Marko ?arac,Muzafer Sara?evi?,Milan Gnjatovi?. An efficient novel approach for iris recognition based on stylometric features and machine learning techniques[J]. Future Generation Computer Systems,2020,107.[64]Olivier Pivert,Etienne Scholly,Grégory Smits,Virginie Thion. Fuzzy quality-Aware queries to graph databases[J]. Information Sciences,2020,521.[65]Javier Fernando Botía Valderrama,Diego José Luis Botía Valderrama. Two cluster validity indices for the LAMDA clustering method[J]. Applied Soft Computing Journal,2020,89.[66]Amer N. Kadri,Marie Bernardo,Steven W. Werns,Amr E. Abbas. TAVR VS. SAVR IN PATIENTS WITH CANCER AND AORTIC STENOSIS: A NATIONWIDE READMISSION DATABASE REGISTRY STUDY[J]. Journal of the American College of Cardiology,2020,75(11).[67]. 
Information Technology; Findings from P. Sjolund and Co-Authors Update Knowledge of Information Technology (Whole-genome sequencing of human remains to enable genealogy DNA database searches - A case report)[J]. Information Technology Newsweekly,2020.[68]. Information Technology; New Findings from P. Yan and Co-Researchers in the Area of Information Technology Described (BrainEXP: a database featuring with spatiotemporal expression variations and co-expression organizations in human brains)[J]. Information Technology Newsweekly,2020.[69]. IDERA; IDERA Database Tools Expand Support for Cloud-Hosted Databases[J]. Information Technology Newsweekly,2020.[70]Adrienne Warner,David A. Hurley,Jonathan Wheeler,Todd Quinn. Proactive chat in research databases: Inviting new and different questions[J]. The Journal of Academic Librarianship,2020,46(2).[71]Chidentree Treesatayapun. Discrete-time adaptive controller based on IF-THEN rules database for novel architecture of ABB IRB-1400[J]. Journal of the Franklin Institute,2020.[72]Tian Fang,Tan Han,Cheng Zhang,Ya Juan Yao. Research and Construction of the Online Pesticide Information Center and Discovery Platform Based on Web Crawler[J]. Procedia Computer Science,2020,166.[73]Dinusha Vatsalan,Peter Christen,Erhard Rahm. Incremental clustering techniques for multi-party Privacy-Preserving Record Linkage[J]. Data & Knowledge Engineering,2020.[74]Ying Xin Liu,Xi Yuan Li. Design and Implementation of a Business Platform System Based on Java[J]. Procedia Computer Science,2020,166.[75]Akhilesh Kumar Bajpai,Sravanthi Davuluri,Kriti Tiwary,Sithalechumi Narayanan,Sailaja Oguru,Kavyashree Basavaraju,Deena Dayalan,Kavitha Thirumurugan,Kshitish K. Acharya. Systematic comparison of the protein-protein interaction databases from a user's perspective[J]. Journal of Biomedical Informatics,2020,103.[76]P. Raveendra,V. Siva Reddy,G.V. Subbaiah. Vision based weed recognition using LabVIEW environment for agricultural applications[J]. Materials Today: Proceedings,2020,23(Pt 3).[77]Christine Rosati,Emily Bakinowski. Preparing for the Implementation of an Agnis Enabled Data Reporting System and Comprehensive Research Level Data Repository for All Cellular Therapy Patients[J]. Biology of Blood and Marrow Transplantation,2020,26(3).[78]Zeiser Felipe André,da Costa Cristiano André,Zonta Tiago,Marques Nuno M C,Roehe Adriana Vial,Moreno Marcelo,da Rosa Righi Rodrigo. Segmentation of Masses on Mammograms Using Data Augmentation and Deep Learning.[J]. Journal of digital imaging,2020.[79]Dhaked Devendra K,Guasch Laura,Nicklaus Marc C. Tautomer Database: A Comprehensive Resource for Tautomerism Analyses.[J]. Journal of chemical information and modeling,2020,60(3).[80]Pian Cong,Zhang Guangle,Gao Libin,Fan Xiaodan,Li Fei. miR+Pathway: the integration and visualization of miRNA and KEGG pathways.[J]. Briefings in bioinformatics,2020,21(2).数据库英文参考文献三:[81]Marcello W. M. Ribeiro,Alexandre A. B. Lima,Daniel Oliveira. OLAP parallel query processing in clouds with C‐ParGRES[J]. Concurrency and Computation: Practice and Experience,2020,32(7).[82]Li Gao,Peng Lin,Peng Chen,Rui‐Zhi Gao,Hong Yang,Yun He,Jia‐Bo Chen,Yi ‐Ge Luo,Qiong‐Qian Xu,Song‐Wu Liang,Jin‐Han Gu,Zhi‐Guang Huang,Yi‐Wu Dang,Gang Chen. A novel risk signature that combines 10 long noncoding RNAs to predict neuroblastoma prognosis[J]. Journal of Cellular Physiology,2020,235(4).[83]Julia Krzykalla,Axel Benner,Annette Kopp‐Schneider. Exploratory identification of predictive biomarkers in randomized trials with normal endpoints[J]. 
Statistics in Medicine,2020,39(7).[84]Jianye Ching,Kok-Kwang Phoon. Measuring Similarity between Site-Specific Data and Records from Other Sites[J]. ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems, Part A: Civil Engineering,2020,6(2).[85]Anne Kelly Knowles,Justus Hillebrand,Paul B. Jaskot,Anika Walke. Integrative, Interdisciplinary Database Design for the Spatial Humanities: the Case of the Holocaust Ghettos Project[J]. International Journal of Humanities and Arts Computing,2020,14(1-2).[86]Sheng-Feng Sung,Pei-Ju Lee,Cheng-Yang Hsieh,Wan-Lun Zheng. Medication Use and the Risk of Newly Diagnosed Diabetes in Patients with Epilepsy: A Data Mining Application on a Healthcare Database[J]. Journal of Organizational and End User Computing (JOEUC),2020,32(2).[87]Rashkovits Rami,Lavy Ilana. Students' Difficulties in Identifying the Use of Ternary Relationships in Data Modeling[J]. International Journal of Information and Communication Technology Education (IJICTE,2020,16(2).[88]Yusuf Akhtar,Dipti Prasad Mukherjee. Context-based ensemble classification for the detection of architectural distortion in a digitised mammogram[J]. IET Image Processing,2020,14(4).[89]Gurpreet Kaur,Sukhwinder Singh,Renu Vig. Medical fusion framework using discrete fractional wavelets and non-subsampled directional filter banks[J]. IET Image Processing,2020,14(4).[90]Qian Liu,Bo Jiang,Jia-lei Zhang,Peng Gao,Zhi-jian Xia. Semi-supervised uncorrelated dictionary learning for colour face recognition[J]. IET Computer Vision,2020,14(3).[91]Yipo Huang,Leida Li,Yu Zhou,Bo Hu. No-reference quality assessment for live broadcasting videos in temporal and spatial domains[J]. IET Image Processing,2020,14(4).[92]Panetta Karen,Wan Qianwen,Agaian Sos,Rajeev Srijith,Kamath Shreyas,Rajendran Rahul,Rao Shishir Paramathma,Kaszowska Aleksandra,Taylor Holly A,Samani Arash,Yuan Xin. A Comprehensive Database for Benchmarking Imaging Systems.[J]. IEEE transactions on pattern analysis and machine intelligence,2020,42(3).[93]Rahnev Dobromir,Desender Kobe,Lee Alan L F,Adler William T,Aguilar-Lleyda David,Akdo?an Ba?ak,Arbuzova Polina,Atlas Lauren Y,Balc? Fuat,Bang Ji Won,Bègue Indrit,Birney Damian P,Brady Timothy F,Calder-Travis Joshua,Chetverikov Andrey,Clark Torin K,Davranche Karen,Denison Rachel N,Dildine Troy C,Double Kit S,Duyan Yaln A,Faivre Nathan,Fallow Kaitlyn,Filevich Elisa,Gajdos Thibault,Gallagher Regan M,de Gardelle Vincent,Gherman Sabina,Haddara Nadia,Hainguerlot Marine,Hsu Tzu-Yu,Hu Xiao,Iturrate I?aki,Jaquiery Matt,Kantner Justin,Koculak Marcin,Konishi Mahiko,Ko? Christina,Kvam Peter D,Kwok Sze Chai,Lebreton Ma?l,Lempert Karolina M,Ming Lo Chien,Luo Liang,Maniscalco Brian,Martin Antonio,Massoni Sébastien,Matthews Julian,Mazancieux Audrey,Merfeld Daniel M,O'Hora Denis,Palser Eleanor R,Paulewicz Borys?aw,Pereira Michael,Peters Caroline,Philiastides Marios G,Pfuhl Gerit,Prieto Fernanda,Rausch Manuel,Recht Samuel,Reyes Gabriel,Rouault Marion,Sackur Jér?me,Sadeghi Saeedeh,Samaha Jason,Seow Tricia X F,Shekhar Medha,Sherman Maxine T,Siedlecka Marta,Skóra Zuzanna,Song Chen,Soto David,Sun Sai,van Boxtel Jeroen J A,Wang Shuo,Weidemann Christoph T,Weindel Gabriel,WierzchońMicha?,Xu Xinming,Ye Qun,Yeon Jiwon,Zou Futing,Zylberberg Ariel. The Confidence Database.[J]. Nature human behaviour,2020,4(3).[94]Taipalus Toni. The Effects of Database Complexity on SQL Query Formulation[J]. Journal of Systems and Software,2020(prepublish).[95]. 
Information Technology; Investigators from Deakin University Target Information Technology (Conjunctive query pattern structures: A relational database model for Formal Concept Analysis)[J]. Computer Technology Journal,2020.[96]. Machine Learning; Findings from Rensselaer Polytechnic Institute Broaden Understanding of Machine Learning (Self Healing Databases for Predictive Risk Analytics In Safety-critical Systems)[J]. Computer Technology Journal,2020.[97]. Science - Library Science; Investigators from Cumhuriyet University Release New Data on Library Science (Scholarly databases under scrutiny)[J]. Computer Technology Journal,2020.[98]. Information Technology; Investigators from Faculty of Computer Science and Engineering Release New Data on Information Technology (FGSA for optimal quality of service based transaction in real-time database systems under different workload condition)[J]. Computer Technology Journal,2020.[99]Muhammad Aqib Javed,M.A. Naveed,Azam Hussain,S. Hussain. Integrated data acquisition, storage and retrieval for glass spherical tokamak (GLAST)[J]. Fusion Engineering and Design,2020,152.[100]Vinay M.S.,Jayant R. Haritsa. Operator implementation of Result Set Dependent KWS scoring functions[J]. Information Systems,2020,89.[101]. Capital One Services LLC; Patent Issued for Computer-Based Systems Configured For Managing Authentication Challenge Questions In A Database And Methods Of Use (USPTO 10,572,653)[J]. Journal of Robotics & Machine Learning,2020.[102]Ikawa Fusao,Michihata Nobuaki. In Reply to Letter to the Editor Regarding "Treatment Risk for Elderly Patients with Unruptured Cerebral Aneurysm from a Nationwide Database in Japan".[J]. World neurosurgery,2020,135.[103]Chen Wei,You Chao. Letter to the Editor Regarding "Treatment Risk for Elderly Patients with Unruptured Cerebral Aneurysm from a Nationwide Database in Japan".[J]. World neurosurgery,2020,135.[104]Zhitao Xiao,Lei Pei,Lei Geng,Ying Sun,Fang Zhang,Jun Wu. Surface Parameter Measurement of Braided Composite Preform Based on Faster R-CNN[J]. Fibers and Polymers,2020,21(3).[105]Xiaoyu Cui,Ruifan Cai,Xiangjun Tang,Zhigang Deng,Xiaogang Jin. Sketch‐based shape‐constrained fireworks simulation in head‐mounted virtual reality[J]. Computer Animation and Virtual Worlds,2020,31(2).[106]Klaus B?hm,Tibor Kubjatko,Daniel Paula,Hans-Georg Schweiger. New developments on EDR (Event Data Recorder) for automated vehicles[J]. Open Engineering,2020,10(1).[107]Ming Li,Ruizhi Chen,Xuan Liao,Bingxuan Guo,Weilong Zhang,Ge Guo. A Precise Indoor Visual Positioning Approach Using a Built Image Feature Database and Single User Image from Smartphone Cameras[J]. Remote Sensing,2020,12(5).[108]Matthew Grewe,Phillip Sexton,David Dellenbach. Use Risk‐Based Asset Prioritization to Develop Accurate Capital Budgets[J]. Opflow,2020,46(3).[109]Jose R. Salvador,D. Mu?oz de la Pe?a,D.R. Ramirez,T. Alamo. Predictive control of a water distribution system based on process historian data[J]. Optimal Control Applications and Methods,2020,41(2).[110]Esmaeil Nourani,Vahideh Reshadat. Association extraction from biomedicalliterature based on representation and transfer learning[J]. Journal of Theoretical Biology,2020,488.[111]Ikram Saima,Ahmad Jamshaid,Durdagi Serdar. Screening of FDA approved drugs for finding potential inhibitors against Granzyme B as a potent drug-repurposing target.[J]. Journal of molecular graphics & modelling,2020,95.[112]Keiron O’Shea,Biswapriya B. Misra. 
Software tools, databases and resources in metabolomics: updates from 2018 to 2019[J]. Metabolomics,2020,16(D1).[113]. Information Technology; Researchers from Virginia Polytechnic Institute and State University (Virginia Tech) Describe Findings in Information Technology (A database for global soil health assessment)[J]. Energy & Ecology,2020.[114]Moosa Johra Muhammad,Guan Shenheng,Moran Michael F,Ma Bin. Repeat-Preserving Decoy Database for False Discovery Rate Estimation in Peptide Identification.[J]. Journal of proteome research,2020,19(3).[115]Huttunen Janne M J,K?rkk?inen Leo,Honkala Mikko,Lindholm Harri. Deep learning for prediction of cardiac indices from photoplethysmographic waveform: A virtual database approach.[J]. International journal for numerical methods in biomedical engineering,2020,36(3).[116]Kunxia Wang,Guoxin Su,Li Liu,Shu Wang. Wavelet packet analysis for speaker-independent emotion recognition[J]. Neurocomputing,2020.[117]Fusao Ikawa,Nobuaki Michihata. In Reply to Letter to the Editor Regarding “Treatment Risk for Elderly Patients with Unruptured Cerebral Aneurysm from a Nationwide Database in Japan”[J]. World Neurosurgery,2020,135.[118]Wei Chen,Chao You. Letter to the Editor Regarding “Treatment Risk for Elderly Patients with Unruptured Cerebral Aneurysm from a Nationwide Database in Japan”[J]. World Neurosurgery,2020,135.[119]Lindsey A. Parsons,Jonathan A. Jenks,Andrew J. Gregory. Accuracy Assessment of National Land Cover Database Shrubland Products on the Sagebrush Steppe Fringe[J]. Rangeland Ecology & Management,2020,73(2).[120]Jing Hua,Yilu Xu,Jianjun Tang,Jizhong Liu,Jihao Zhang. ECG heartbeat classification in compressive domain for wearable devices[J]. Journal of Systems Architecture,2020,104.以上就是关于数据库英文参考文献的全部内容,希望看完后对你有所启发。
云计算大数据外文翻译文献
(文档含英文原文和中文翻译)

原文:

Meet Hadoop

In pioneer days they used oxen for heavy pulling, and when one ox couldn't budge a log, they didn't try to grow a larger ox. We shouldn't be trying for bigger computers, but for more systems of computers.

—Grace Hopper

Data!

We live in the data age. It's not easy to measure the total volume of data stored electronically, but an IDC estimate put the size of the "digital universe" at 0.18 zettabytes in 2006 and forecast a tenfold growth by 2011, to 1.8 zettabytes. A zettabyte is 10^21 bytes, or equivalently one thousand exabytes, one million petabytes, or one billion terabytes. That's roughly the same order of magnitude as one disk drive for every person in the world.

This flood of data is coming from many sources. Consider the following:

• The New York Stock Exchange generates about one terabyte of new trade data per day.
• Facebook hosts approximately 10 billion photos, taking up one petabyte of storage.
• Ancestry.com, the genealogy site, stores around 2.5 petabytes of data.
• The Internet Archive stores around 2 petabytes of data, and is growing at a rate of 20 terabytes per month.
• The Large Hadron Collider near Geneva, Switzerland, will produce about 15 petabytes of data per year.

So there's a lot of data out there. But you are probably wondering how it affects you. Most of the data is locked up in the largest web properties (like search engines), or scientific or financial institutions, isn't it? Does the advent of "Big Data," as it is being called, affect smaller organizations or individuals?

I argue that it does. Take photos, for example. My wife's grandfather was an avid photographer, and took photographs throughout his adult life. His entire corpus of medium format, slide, and 35mm film, when scanned in at high resolution, occupies around 10 gigabytes. Compare this to the digital photos that my family took last year, which take up about 5 gigabytes of space. My family is producing photographic data at 35 times the rate my wife's grandfather did, and the rate is increasing every year as it becomes easier to take more and more photos.

More generally, the digital streams that individuals are producing are growing apace. Microsoft Research's MyLifeBits project gives a glimpse of the archiving of personal information that may become commonplace in the near future. MyLifeBits was an experiment where an individual's interactions—phone calls, emails, documents—were captured electronically and stored for later access. The data gathered included a photo taken every minute, which resulted in an overall data volume of one gigabyte a month. When storage costs come down enough to make it feasible to store continuous audio and video, the data volume for a future MyLifeBits service will be many times that.

The trend is for every individual's data footprint to grow, but perhaps more importantly the amount of data generated by machines will be even greater than that generated by people. Machine logs, RFID readers, sensor networks, vehicle GPS traces, retail transactions—all of these contribute to the growing mountain of data.

The volume of data being made publicly available increases every year too.
Organizations no longer have to merely manage their own data: success in the future will be dictated to a large extent by their ability to extract value from other organizations' data.

Initiatives such as Public Data Sets on Amazon Web Services exist to foster the "information commons," where data can be freely (or, in the case of AWS, for a modest price) shared for anyone to download and analyze. Mashups between different information sources make for unexpected and hitherto unimaginable applications.

Take, for example, the Astrometry.net project, which watches the Astrometry group on Flickr for new photos of the night sky. It analyzes each image and identifies which part of the sky it is from, as well as any interesting celestial bodies, such as stars or galaxies. Although it's still a new and experimental service, it shows the kind of things that are possible when data (in this case, tagged photographic images) is made available and used for something (image analysis) that was not anticipated by the creator.

It has been said that "More data usually beats better algorithms," which is to say that for some problems (such as recommending movies or music based on past preferences), however fiendish your algorithms are, they can often be beaten simply by having more data (and a less sophisticated algorithm).

The good news is that Big Data is here. The bad news is that we are struggling to store and analyze it.

Data Storage and Analysis

The problem is simple: while the storage capacities of hard drives have increased massively over the years, access speeds--the rate at which data can be read from drives--have not kept up. One typical drive from 1990 could store 1370 MB of data and had a transfer speed of 4.4 MB/s, so you could read all the data from a full drive in around five minutes. Almost 20 years later, one-terabyte drives are the norm, but the transfer speed is around 100 MB/s, so it takes more than two and a half hours to read all the data off the disk.

This is a long time to read all the data on a single drive, and writing is even slower. The obvious way to reduce the time is to read from multiple disks at once. Imagine if we had 100 drives, each holding one hundredth of the data. Working in parallel, we could read the data in under two minutes.

Only using one hundredth of a disk may seem wasteful. But we can store one hundred datasets, each of which is one terabyte, and provide shared access to them. We can imagine that the users of such a system would be happy to share access in return for shorter analysis times, and, statistically, that their analysis jobs would be likely to be spread over time, so they wouldn't interfere with each other too much.

There's more to being able to read and write data in parallel to or from multiple disks, though. The first problem to solve is hardware failure: as soon as you start using many pieces of hardware, the chance that one will fail is fairly high. A common way of avoiding data loss is through replication: redundant copies of the data are kept by the system so that in the event of failure, there is another copy available. This is how RAID works, for instance, although Hadoop's filesystem, the Hadoop Distributed Filesystem (HDFS), takes a slightly different approach, as you shall see later. The second problem is that most analysis tasks need to be able to combine the data in some way; data read from one disk may need to be combined with the data from any of the other 99 disks. Various distributed systems allow data to be combined from multiple sources, but doing this correctly is notoriously challenging.
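To make the two problems above concrete, the short Python sketch below simulates scanning 100 "drives" in parallel and then combining the partial results. The in-memory chunks, the counting task, and the worker count are invented for illustration; they stand in for real disks and a real analysis job.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical layout: the dataset is split across 100 "drives", each
# holding a list of records. Real systems would use separate physical
# disks or machines; lists keep the sketch self-contained.
drives = [[f"record-{d}-{i}" for i in range(1_000)] for d in range(100)]

def scan_drive(chunk):
    # Per-drive work: scan one chunk and return a partial result.
    return len(chunk)

def combine(partials):
    # The "second problem": merge partial results from all drives.
    return sum(partials)

with ThreadPoolExecutor(max_workers=10) as pool:
    partial_counts = list(pool.map(scan_drive, drives))

print(combine(partial_counts))  # 100 drives * 1,000 records = 100000
```

The parallel scan shortens the read, but the `combine` step is the part that is hard to get right at scale, which is exactly the gap the programming model described next is meant to fill.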
MapReduce provides a programming model that abstracts the problem from disk reads and writes, transforming it into a computation over sets of keys and values. We will look at the details of this model in later chapters, but the important point for the present discussion is that there are two parts to the computation, the map and the reduce, and it's the interface between the two where the "mixing" occurs. Like HDFS, MapReduce has reliability built-in.

This, in a nutshell, is what Hadoop provides: a reliable shared storage and analysis system. The storage is provided by HDFS, and analysis by MapReduce. There are other parts to Hadoop, but these capabilities are its kernel.

Comparison with Other Systems

The approach taken by MapReduce may seem like a brute-force approach. The premise is that the entire dataset—or at least a good portion of it—is processed for each query. But this is its power. MapReduce is a batch query processor, and the ability to run an ad hoc query against your whole dataset and get the results in a reasonable time is transformative. It changes the way you think about data, and unlocks data that was previously archived on tape or disk. It gives people the opportunity to innovate with data. Questions that took too long to get answered before can now be answered, which in turn leads to new questions and new insights.

For example, Mailtrust, Rackspace's mail division, used Hadoop for processing email logs. One ad hoc query they wrote was to find the geographic distribution of their users. In their words: This data was so useful that we've scheduled the MapReduce job to run monthly and we will be using this data to help us decide which Rackspace data centers to place new mail servers in as we grow. By bringing several hundred gigabytes of data together and having the tools to analyze it, the Rackspace engineers were able to gain an understanding of the data that they otherwise would never have had, and, furthermore, they were able to use what they had learned to improve the service for their customers. You can read more about how Rackspace uses Hadoop in Chapter 14.

RDBMS

Why can't we use databases with lots of disks to do large-scale batch analysis? Why is MapReduce needed? The answer to these questions comes from another trend in disk drives: seek time is improving more slowly than transfer rate. Seeking is the process of moving the disk's head to a particular place on the disk to read or write data. It characterizes the latency of a disk operation, whereas the transfer rate corresponds to a disk's bandwidth. If the data access pattern is dominated by seeks, it will take longer to read or write large portions of the dataset than streaming through it, which operates at the transfer rate. On the other hand, for updating a small proportion of records in a database, a traditional B-Tree (the data structure used in relational databases, which is limited by the rate at which it can perform seeks) works well. For updating the majority of a database, a B-Tree is less efficient than MapReduce, which uses Sort/Merge to rebuild the database.
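The seek-versus-transfer trade-off can be put into rough numbers. The figures below (10 ms per seek, 100 MB/s transfer, a 1 TB dataset of 100-byte records) are assumptions chosen for the sketch, not measurements from the text, but they show why rewriting a whole dataset by streaming can beat updating it seek by seek once enough of it changes.

```python
# Assumed figures (illustrative only): a disk with 10 ms seeks and a
# 100 MB/s transfer rate, holding a 1 TB dataset of 100-byte records.
SEEK_TIME_S = 0.010
TRANSFER_RATE_BPS = 100e6
DATASET_BYTES = 1e12
RECORD_BYTES = 100

def update_by_seeking(fraction_updated):
    """Time to update records in place: roughly one seek per record touched."""
    records_touched = DATASET_BYTES / RECORD_BYTES * fraction_updated
    return records_touched * SEEK_TIME_S

def rebuild_by_streaming():
    """Time to stream the whole dataset once, as a sort/merge rebuild would."""
    return DATASET_BYTES / TRANSFER_RATE_BPS

for fraction in (0.0001, 0.01, 0.5):
    print(f"update {fraction:>7.2%}: seek {update_by_seeking(fraction):>11.0f} s, "
          f"stream {rebuild_by_streaming():.0f} s")
# Once more than a tiny fraction of the records change, streaming the
# entire dataset is faster than seeking to each record individually.
```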
In many ways, MapReduce can be seen as a complement to an RDBMS. (The differences between the two systems are shown in Table 1-1.) MapReduce is a good fit for problems that need to analyze the whole dataset, in a batch fashion, particularly for ad hoc analysis. An RDBMS is good for point queries or updates, where the dataset has been indexed to deliver low-latency retrieval and update times of a relatively small amount of data. MapReduce suits applications where the data is written once and read many times, whereas a relational database is good for datasets that are continually updated.

Table 1-1. RDBMS compared to MapReduce

|  | Traditional RDBMS | MapReduce |
| --- | --- | --- |
| Data size | Gigabytes | Petabytes |
| Access | Interactive and batch | Batch |
| Updates | Read and write many times | Write once, read many times |
| Structure | Static schema | Dynamic schema |
| Integrity | High | Low |
| Scaling | Nonlinear | Linear |

Another difference between MapReduce and an RDBMS is the amount of structure in the datasets that they operate on. Structured data is data that is organized into entities that have a defined format, such as XML documents or database tables that conform to a particular predefined schema. This is the realm of the RDBMS. Semi-structured data, on the other hand, is looser, and though there may be a schema, it is often ignored, so it may be used only as a guide to the structure of the data: for example, a spreadsheet, in which the structure is the grid of cells, although the cells themselves may hold any form of data. Unstructured data does not have any particular internal structure: for example, plain text or image data. MapReduce works well on unstructured or semi-structured data, since it is designed to interpret the data at processing time. In other words, the input keys and values for MapReduce are not an intrinsic property of the data, but they are chosen by the person analyzing the data.

Relational data is often normalized to retain its integrity and remove redundancy. Normalization poses problems for MapReduce, since it makes reading a record a nonlocal operation, and one of the central assumptions that MapReduce makes is that it is possible to perform (high-speed) streaming reads and writes.

A web server log is a good example of a set of records that is not normalized (for example, the client hostnames are specified in full each time, even though the same client may appear many times), and this is one reason that logfiles of all kinds are particularly well-suited to analysis with MapReduce.

MapReduce is a linearly scalable programming model. The programmer writes two functions—a map function and a reduce function—each of which defines a mapping from one set of key-value pairs to another. These functions are oblivious to the size of the data or the cluster that they are operating on, so they can be used unchanged for a small dataset and for a massive one. More importantly, if you double the size of the input data, a job will run twice as slow. But if you also double the size of the cluster, a job will run as fast as the original one. This is not generally true of SQL queries.
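As a minimal sketch of this two-function model, here is a word count written in plain Python rather than against the Hadoop Java API; the in-memory shuffle step that groups intermediate pairs by key is a stand-in for what the framework does between the map and reduce phases.

```python
from collections import defaultdict

def map_fn(key, value):
    # key: a document name (unused here); value: one line of text.
    # Emits intermediate (word, 1) pairs.
    for word in value.split():
        yield word.lower(), 1

def reduce_fn(key, values):
    # key: a word; values: every count emitted for that word.
    yield key, sum(values)

def run_job(records):
    # The "mixing" between the two phases: group intermediate pairs by key.
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    results = {}
    for key in sorted(groups):
        for out_key, out_value in reduce_fn(key, groups[key]):
            results[out_key] = out_value
    return results

lines = [("doc1", "the quick brown fox"), ("doc2", "the lazy dog")]
print(run_job(lines))
# {'brown': 1, 'dog': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 2}
```

Note that `map_fn` and `reduce_fn` say nothing about how large the input is or how many machines run them, which is what makes the model linearly scalable.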
Over time, however, the differences between relational databases and MapReduce systems are likely to blur, both as relational databases start incorporating some of the ideas from MapReduce (such as Aster Data's and Greenplum's databases) and, from the other direction, as higher-level query languages built on MapReduce (such as Pig and Hive) make MapReduce systems more approachable to traditional database programmers.

Grid Computing

The High Performance Computing (HPC) and Grid Computing communities have been doing large-scale data processing for years, using such APIs as Message Passing Interface (MPI). Broadly, the approach in HPC is to distribute the work across a cluster of machines, which access a shared filesystem, hosted by a SAN. This works well for predominantly compute-intensive jobs, but it becomes a problem when nodes need to access larger data volumes (hundreds of gigabytes, the point at which MapReduce really starts to shine), since the network bandwidth is the bottleneck and compute nodes become idle.

MapReduce tries to colocate the data with the compute node, so data access is fast because it is local. This feature, known as data locality, is at the heart of MapReduce and is the reason for its good performance. Recognizing that network bandwidth is the most precious resource in a data center environment (it is easy to saturate network links by copying data around), MapReduce implementations go to great lengths to preserve it by explicitly modelling network topology. Notice that this arrangement does not preclude high-CPU analyses in MapReduce.

MPI gives great control to the programmer, but it requires that he or she explicitly handle the mechanics of the data flow, exposed via low-level C routines and constructs such as sockets, as well as the higher-level algorithm for the analysis. MapReduce operates only at the higher level: the programmer thinks in terms of functions of key and value pairs, and the data flow is implicit.

Coordinating the processes in a large-scale distributed computation is a challenge. The hardest aspect is gracefully handling partial failure—when you don't know whether a remote process has failed or not—and still making progress with the overall computation. MapReduce spares the programmer from having to think about failure, since the implementation detects failed map or reduce tasks and reschedules replacements on machines that are healthy. MapReduce is able to do this because it is a shared-nothing architecture, meaning that tasks have no dependence on one another. (This is a slight oversimplification, since the output from mappers is fed to the reducers, but this is under the control of the MapReduce system; in this case, it needs to take more care rerunning a failed reducer than rerunning a failed map, since it has to make sure it can retrieve the necessary map outputs, and, if not, regenerate them by running the relevant maps again.) So from the programmer's point of view, the order in which the tasks run doesn't matter. By contrast, MPI programs have to explicitly manage their own checkpointing and recovery, which gives more control to the programmer but makes them more difficult to write.
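The "reschedule on a healthy machine" idea is easy to picture with a toy scheduler. The worker list, the simulated failure rate, and the retry limit below are invented for the sketch; they illustrate the shared-nothing retry pattern, not how Hadoop's scheduler is actually implemented.

```python
import random

def run_task(task, worker):
    # Simulate a task attempt that sometimes fails (e.g. the worker dies).
    if random.random() < 0.2:
        raise RuntimeError(f"{worker} failed while running {task}")
    return f"output-of-{task}"

def run_with_retries(tasks, workers, max_attempts=3):
    """Run independent (shared-nothing) tasks, rescheduling failed attempts."""
    results = {}
    for task in tasks:
        for _ in range(max_attempts):
            worker = random.choice(workers)
            try:
                results[task] = run_task(task, worker)
                break  # this task is done; completion order does not matter
            except RuntimeError:
                continue  # reschedule on another (hopefully healthy) worker
        else:
            raise RuntimeError(f"{task} failed {max_attempts} times")
    return results

print(run_with_retries([f"map-{i}" for i in range(5)], ["w1", "w2", "w3"]))
```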
MapReduce might sound like quite a restrictive programming model, and in a sense it is: you are limited to key and value types that are related in specified ways, and mappers and reducers run with very limited coordination between one another (the mappers pass keys and values to reducers). A natural question to ask is: can you do anything useful or nontrivial with it?

The answer is yes. MapReduce was invented by engineers at Google as a system for building production search indexes because they found themselves solving the same problem over and over again (and MapReduce was inspired by older ideas from the functional programming, distributed computing, and database communities), but it has since been used for many other applications in many other industries. It is pleasantly surprising to see the range of algorithms that can be expressed in MapReduce, from image analysis, to graph-based problems, to machine learning algorithms. It can't solve every problem, of course, but it is a general data-processing tool. You can see a sample of some of the applications that Hadoop has been used for in Chapter 14.

Volunteer Computing

When people first hear about Hadoop and MapReduce, they often ask, "How is it different from SETI@home?" SETI, the Search for Extra-Terrestrial Intelligence, runs a project called SETI@home in which volunteers donate CPU time from their otherwise idle computers to analyze radio telescope data for signs of intelligent life outside Earth. SETI@home is the most well-known of many volunteer computing projects; others include the Great Internet Mersenne Prime Search (to search for large prime numbers) and Folding@home (to understand protein folding, and how it relates to disease).

Volunteer computing projects work by breaking the problem they are trying to solve into chunks called work units, which are sent to computers around the world to be analyzed. For example, a SETI@home work unit is about 0.35 MB of radio telescope data, and takes hours or days to analyze on a typical home computer. When the analysis is completed, the results are sent back to the server, and the client gets another work unit. As a precaution to combat cheating, each work unit is sent to three different machines, and needs at least two results to agree to be accepted.

Although SETI@home may be superficially similar to MapReduce (breaking a problem into independent pieces to be worked on in parallel), there are some significant differences. The SETI@home problem is very CPU-intensive, which makes it suitable for running on hundreds of thousands of computers across the world, since the time to transfer the work unit is dwarfed by the time to run the computation on it. Volunteers are donating CPU cycles, not bandwidth.

MapReduce is designed to run jobs that last minutes or hours on trusted, dedicated hardware running in a single data center with very high aggregate bandwidth interconnects. By contrast, SETI@home runs a perpetual computation on untrusted machines on the Internet with highly variable connection speeds and no data locality.

译文:初识Hadoop

古时候,人们用牛来拉重物,当一头牛拉不动一根圆木的时候,他们不曾想过培育个头更大的牛。
大数据外文翻译文献(文档含中英文对照即英文原文和中文翻译)

原文:

What is Data Mining?

Many people treat data mining as a synonym for another popularly used term, "Knowledge Discovery in Databases", or KDD. Alternatively, others view data mining as simply an essential step in the process of knowledge discovery in databases. Knowledge discovery consists of an iterative sequence of the following steps (a minimal end-to-end sketch of this sequence is given after the list of system components below):

· data cleaning: to remove noise or irrelevant data,
· data integration: where multiple data sources may be combined,
· data selection: where data relevant to the analysis task are retrieved from the database,
· data transformation: where data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations, for instance,
· data mining: an essential process where intelligent methods are applied in order to extract data patterns,
· pattern evaluation: to identify the truly interesting patterns representing knowledge based on some interestingness measures, and
· knowledge presentation: where visualization and knowledge representation techniques are used to present the mined knowledge to the user.

The data mining step may interact with the user or a knowledge base. The interesting patterns are presented to the user, and may be stored as new knowledge in the knowledge base. Note that according to this view, data mining is only one step in the entire process, albeit an essential one since it uncovers hidden patterns for evaluation.

We agree that data mining is a knowledge discovery process. However, in industry, in media, and in the database research milieu, the term "data mining" is becoming more popular than the longer term of "knowledge discovery in databases". Therefore, in this book, we choose to use the term "data mining". We adopt a broad view of data mining functionality: data mining is the process of discovering interesting knowledge from large amounts of data stored either in databases, data warehouses, or other information repositories.

Based on this view, the architecture of a typical data mining system may have the following major components:

1. Database, data warehouse, or other information repository. This is one or a set of databases, data warehouses, spreadsheets, or other kinds of information repositories. Data cleaning and data integration techniques may be performed on the data.

2. Database or data warehouse server. The database or data warehouse server is responsible for fetching the relevant data, based on the user's data mining request.

3. Knowledge base. This is the domain knowledge that is used to guide the search, or evaluate the interestingness of resulting patterns. Such knowledge can include concept hierarchies, used to organize attributes or attribute values into different levels of abstraction. Knowledge such as user beliefs, which can be used to assess a pattern's interestingness based on its unexpectedness, may also be included. Other examples of domain knowledge are additional interestingness constraints or thresholds, and metadata (e.g., describing data from multiple heterogeneous sources).

4. Data mining engine. This is essential to the data mining system and ideally consists of a set of functional modules for tasks such as characterization, association analysis, classification, evolution and deviation analysis.

5. Pattern evaluation module. This component typically employs interestingness measures and interacts with the data mining modules so as to focus the search towards interesting patterns. It may access interestingness thresholds stored in the knowledge base. Alternatively, the pattern evaluation module may be integrated with the mining module, depending on the implementation of the data mining method used. For efficient data mining, it is highly recommended to push the evaluation of pattern interestingness as deep as possible into the mining process so as to confine the search to only the interesting patterns.

6. Graphical user interface. This module communicates between users and the data mining system, allowing the user to interact with the system by specifying a data mining query or task, providing information to help focus the search, and performing exploratory data mining based on the intermediate data mining results. In addition, this component allows the user to browse database and data warehouse schemas or data structures, evaluate mined patterns, and visualize the patterns in different forms.
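Referring back to the iterative sequence of steps listed at the start of this section, the Python sketch below walks one tiny, made-up customer dataset through cleaning, integration, selection, transformation, mining, evaluation, and presentation. The record layout, the "high spender" pattern, and the interestingness threshold are all invented for illustration.

```python
# Hypothetical records, as if pulled from two overlapping sources.
raw = [
    {"id": 1, "age": 34, "spend": 120.0},
    {"id": 2, "age": None, "spend": 80.0},   # noisy record (missing value)
    {"id": 1, "age": 34, "spend": 120.0},    # duplicate from a second source
    {"id": 3, "age": 51, "spend": 300.0},
]

def clean(records):
    # Data cleaning: drop records with missing values.
    return [r for r in records if all(v is not None for v in r.values())]

def integrate(records):
    # Data integration: merge duplicates coming from multiple sources.
    return list({r["id"]: r for r in records}.values())

def select(records):
    # Data selection: keep only the attributes relevant to the task.
    return [{"id": r["id"], "spend": r["spend"]} for r in records]

def transform(records):
    # Data transformation: consolidate into a mining-ready form.
    return [{"id": r["id"], "high_spender": r["spend"] >= 100.0} for r in records]

def mine(records):
    # Data mining: extract a (trivial) pattern from the prepared data.
    share = sum(r["high_spender"] for r in records) / len(records)
    return {"pattern": "high_spender_share", "value": share}

def evaluate(pattern, threshold=0.5):
    # Pattern evaluation: keep only patterns passing an interestingness threshold.
    return pattern if pattern["value"] >= threshold else None

def present(pattern):
    # Knowledge presentation: report the mined knowledge to the user.
    print(pattern or "no interesting pattern found")

present(evaluate(mine(transform(select(integrate(clean(raw)))))))
```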
From a data warehouse perspective, data mining can be viewed as an advanced stage of on-line analytical processing (OLAP). However, data mining goes far beyond the narrow scope of summarization-style analytical processing of data warehouse systems by incorporating more advanced techniques for data understanding.

While there may be many "data mining systems" on the market, not all of them can perform true data mining. A data analysis system that does not handle large amounts of data can at most be categorized as a machine learning system, a statistical data analysis tool, or an experimental system prototype. A system that can only perform data or information retrieval, including finding aggregate values, or that performs deductive query answering in large databases, should be more appropriately categorized as either a database system, an information retrieval system, or a deductive database system.

Data mining involves an integration of techniques from multiple disciplines such as database technology, statistics, machine learning, high performance computing, pattern recognition, neural networks, data visualization, information retrieval, image and signal processing, and spatial data analysis. We adopt a database perspective in our presentation of data mining in this book. That is, emphasis is placed on efficient and scalable data mining techniques for large databases. By performing data mining, interesting knowledge, regularities, or high-level information can be extracted from databases and viewed or browsed from different angles. The discovered knowledge can be applied to decision making, process control, information management, query processing, and so on. Therefore, data mining is considered one of the most important frontiers in database systems and one of the most promising new database applications in the information industry.

A classification of data mining systems

Data mining is an interdisciplinary field, the confluence of a set of disciplines, including database systems, statistics, machine learning, visualization, and information science. Moreover, depending on the data mining approach used, techniques from other disciplines may be applied, such as neural networks, fuzzy and/or rough set theory, knowledge representation, inductive logic programming, or high performance computing.
Depending on the kinds of data to be mined or on the given data mining application, the data mining system may also integrate techniques from spatial data analysis, information retrieval, pattern recognition, image analysis, signal processing, computer graphics, Web technology, economics, or psychology.

Because of the diversity of disciplines contributing to data mining, data mining research is expected to generate a large variety of data mining systems. Therefore, it is necessary to provide a clear classification of data mining systems. Such a classification may help potential users distinguish data mining systems and identify those that best match their needs. Data mining systems can be categorized according to various criteria, as follows.

1) Classification according to the kinds of databases mined. A data mining system can be classified according to the kinds of databases mined. Database systems themselves can be classified according to different criteria (such as data models, or the types of data or applications involved), each of which may require its own data mining technique. Data mining systems can therefore be classified accordingly. For instance, if classifying according to data models, we may have a relational, transactional, object-oriented, object-relational, or data warehouse mining system. If classifying according to the special types of data handled, we may have a spatial, time-series, text, or multimedia data mining system, or a World-Wide Web mining system. Other system types include heterogeneous data mining systems and legacy data mining systems.

2) Classification according to the kinds of knowledge mined. Data mining systems can be categorized according to the kinds of knowledge they mine, i.e., based on data mining functionalities, such as characterization, discrimination, association, classification, clustering, trend and evolution analysis, deviation analysis, similarity analysis, etc. A comprehensive data mining system usually provides multiple and/or integrated data mining functionalities. Moreover, data mining systems can also be distinguished based on the granularity or levels of abstraction of the knowledge mined, including generalized knowledge (at a high level of abstraction), primitive-level knowledge (at a raw data level), or knowledge at multiple levels (considering several levels of abstraction). An advanced data mining system should facilitate the discovery of knowledge at multiple levels of abstraction.

3) Classification according to the kinds of techniques utilized. Data mining systems can also be categorized according to the underlying data mining techniques employed. These techniques can be described according to the degree of user interaction involved (e.g., autonomous systems, interactive exploratory systems, query-driven systems), or the methods of data analysis employed (e.g., database-oriented or data warehouse-oriented techniques, machine learning, statistics, visualization, pattern recognition, neural networks, and so on). A sophisticated data mining system will often adopt multiple data mining techniques or work out an effective, integrated technique which combines the merits of a few individual approaches.

什么是数据挖掘?许多人把数据挖掘视为另一个常用的术语—数据库中的知识发现或KDD的同义词。