Big Data Introduction (English Version)
Eventual Consistency
At some point in the future, data will converge to a consistent state. No guarantee is made about when.
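The convergence described above can be illustrated with a minimal sketch. It assumes a last-write-wins rule keyed on a timestamp, which is only one of several possible conflict-resolution strategies; the `Replica` class and `sync` function are hypothetical names made up for this example, not part of any particular database.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of the data; each key maps to a (timestamp, value) pair."""
    store: dict = field(default_factory=dict)

    def write(self, key, value, timestamp):
        # Keep the write only if it is newer than what this replica holds.
        current = self.store.get(key)
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


def sync(a, b):
    """Exchange entries in both directions so each replica keeps the newest."""
    for key, (ts, value) in list(a.store.items()):
        b.write(key, value, ts)
    for key, (ts, value) in list(b.store.items()):
        a.write(key, value, ts)


r1, r2 = Replica(), Replica()
r1.write("stock", 10, timestamp=1)   # the write reaches r1 only
stale = r2.read("stock")             # r2 has not seen the write yet
sync(r1, r2)                         # replication catches up at some later point
converged = r1.read("stock") == r2.read("stock")
```

Between the write and the call to `sync`, a reader of `r2` sees stale data. The model promises only that the replicas agree once synchronization has happened, not when that will be.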
3 NoSQL
JSON Structure
{ field1: value1, field2: value2 … fieldN: valueN }

var mydoc = {
  _id: ObjectId("5099803df3f4948bd2f98391"),
  name: { first: "Alan", last: "Turing" },
  birth: new Date('Jun 23, 1912'),
  death: new Date('Jun 07, 1954'),
  contribs: [ "Turing machine", "Turing test", … ],
  views: NumberLong(1250000)
}
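As a rough illustration of the same document outside the database shell, the sketch below builds it as a plain Python dictionary and round-trips it through JSON text. It rests on several assumptions: the `ObjectId`, `Date`, and `NumberLong` wrappers are replaced with plain strings and integers, because standard JSON has no such types, and `contribs` contains only the two entries visible on the slide.

```python
import json
from datetime import date

# The document from the slide, using only types that plain JSON supports.
mydoc = {
    "_id": "5099803df3f4948bd2f98391",        # ObjectId stored as a string (assumption)
    "name": {"first": "Alan", "last": "Turing"},
    "birth": date(1912, 6, 23).isoformat(),   # dates stored as ISO 8601 strings (assumption)
    "death": date(1954, 6, 7).isoformat(),
    "contribs": ["Turing machine", "Turing test"],
    "views": 1250000,
}

text = json.dumps(mydoc)       # serialize to the interchange format
restored = json.loads(text)    # parse it back into nested dicts and lists
last_name = restored["name"]["last"]
```

Nested fields such as `name.last` are reached by ordinary key lookups, with no table schema declared in advance.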
3 NoSQL
RDBMS vs NoSQL
Row DB:
001:10,Smith,Joe,40000;
002:12,Jones,Mary,50000;
003:11,Johnson,Cathy,44000;
004:22,Jones,Bob,55000;
index: 001:40000;002:50000;003:44000;004:55000;

Column DB:
10:001,12:002,11:003,22:004;
Smith:001,Jones:002,Johnson:003,Jones:004;
Joe:001,Mary:002,Cathy:003,Bob:004;
40000:001,50000 …;
Smith:001,Jones:002,004,Johnson:003;…
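The two layouts above can be reproduced from the same four records. The following is a minimal sketch, assuming each record is a tuple of row ID, employee ID, last name, first name, and salary; the function names `row_layout` and `column_layout` are hypothetical and chosen only for this illustration.

```python
# Each record: (rowid, employee id, last name, first name, salary)
records = [
    ("001", 10, "Smith", "Joe", 40000),
    ("002", 12, "Jones", "Mary", 50000),
    ("003", 11, "Johnson", "Cathy", 44000),
    ("004", 22, "Jones", "Bob", 55000),
]


def row_layout(rows):
    """Serialize record by record: all values of one row sit together."""
    parts = []
    for rowid, *values in rows:
        parts.append(rowid + ":" + ",".join(str(v) for v in values) + ";")
    return "".join(parts)


def column_layout(rows):
    """Serialize column by column: each value is tagged with its rowid."""
    parts = []
    for col in range(1, len(rows[0])):
        cells = [f"{row[col]}:{row[0]}" for row in rows]
        parts.append(",".join(cells) + ";")
    return "".join(parts)


row_text = row_layout(records)
column_text = column_layout(records)
```

In `row_text` every salary is separated from the next by the other fields of a record, while in `column_text` all salaries are adjacent.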
3 NoSQL
• Here, NoSQL refers to document-oriented databases.
• SQL databases do not scale well horizontally.
• NoSQL is schemaless, but not formless: documents use the JSON format.
• JSON is a data interchange format.
• Examples: MongoDB, CouchDB.
3 NoSQL
Benefits
• Column-oriented organizations are more efficient when an aggregate needs to be computed over many rows but only for a notably smaller subset of all columns of data, because reading that smaller subset of data can be faster than reading all data.
• Column-oriented organizations are more efficient when new values of a column are supplied for all rows at once, because that column data can be written efficiently and replace old column data without touching any other columns for the rows.
• Row-oriented organizations are more efficient when many columns of a single row are required at the same time, and when row-size is relatively small, as the entire row can be retrieved with a single disk seek.
• Row-oriented organizations are more efficient when writing a new row if all of the column data is supplied at the same time, as the entire row can be written with a single disk seek.
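The first benefit can be made concrete by counting how many stored values each layout has to touch to sum a single column. This is a toy model under stated assumptions: it counts values read rather than measuring real disk I/O, and the data and function names are hypothetical.

```python
rows = [
    {"id": 10, "last": "Smith", "first": "Joe", "salary": 40000},
    {"id": 12, "last": "Jones", "first": "Mary", "salary": 50000},
    {"id": 11, "last": "Johnson", "first": "Cathy", "salary": 44000},
    {"id": 22, "last": "Jones", "first": "Bob", "salary": 55000},
]

# Column-oriented view of the same data: one list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def total_salary_row_store(rows):
    """Sum salaries when every access fetches a whole row."""
    total, values_read = 0, 0
    for row in rows:
        values_read += len(row)      # the entire row is read to reach one field
        total += row["salary"]
    return total, values_read


def total_salary_column_store(columns):
    """Sum salaries when only the salary column is fetched."""
    salaries = columns["salary"]
    return sum(salaries), len(salaries)


row_total, row_reads = total_salary_row_store(rows)
col_total, col_reads = total_salary_column_store(columns)
```

Both layouts give the same total, but the row-oriented scan touches every field of every row while the column-oriented scan touches only the salaries. Fetching one complete employee record shows the opposite trade-off: the row store returns one row, whereas the column store must visit every column.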
Variety
The type and nature of the data.
Velocity
In this context, the speed at which the data is generated and processed to meet the demands and challenges that lie in the path of growth and development.
Variability
Inconsistency in the data set can hamper the processes that handle and manage it.
Veracity
The quality of captured data can vary greatly, affecting accurate analysis.
on Alibaba’s marketplaces
US$1,133,942
spent on Alibaba
CONTENTS
1 Definition
2 Characteristic
3 NoSQL
4 RDBMS
5 MapReduce
6 Applications
1 Definition
on a day-to-day basis
volume of data
BIG DATA
for better decisions
important data
2 Characteristic
Volume
The quantity of generated and stored data.
BIG DATA
EVERY MINUTE…
Didi rides hailed:
1,388
cabs
2,777
private cars
EVERY MINUTE…
395,833 people log in to WeChat
194,44… people are video or audio chatting
EVERY MINUTE…
625,000
Youku Tudou videos being watched
EVERY MINUTE…
64,814
posts and reposts on Weibo
4,166,667 search queries
SEARCH
EVERY MINUTE…
774 people buy something
3 NoSQL
Basic Availability
Spread data across many storage systems with a high degree of replication.
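A minimal sketch of why replication supports availability: if every write is copied to several nodes, a read can still succeed while some of them are down. The `Node` and `ReplicatedStore` classes are hypothetical and exist only for this illustration; real systems add partitioning, quorums, and failure detection, which are omitted here.

```python
class Node:
    """A single storage system that may be up or down."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}


class ReplicatedStore:
    """Writes go to every reachable node; reads use the first node that answers."""

    def __init__(self, nodes):
        self.nodes = nodes

    def put(self, key, value):
        written = 0
        for node in self.nodes:
            if node.up:
                node.data[key] = value
                written += 1
        return written               # number of replicas that accepted the write

    def get(self, key):
        for node in self.nodes:
            if node.up and key in node.data:
                return node.data[key]
        raise KeyError(key)


nodes = [Node("a"), Node("b"), Node("c")]
store = ReplicatedStore(nodes)
copies = store.put("user:1", "Alan")   # replicated to all three nodes
nodes[0].up = False                    # two of the three nodes fail
nodes[1].up = False
value = store.get("user:1")            # still served by the surviving replica
```

A node that was down during a write misses that update, so replicas can disagree until they are resynchronized.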
BASE Model
Soft State
Data consistency is the developer's problem and should not be handled by the database.