云计算的安全架构与技术 补充材料2PPT课件

合集下载
  1. 1、下载文档前请自行甄别文档内容的完整性,平台不提供额外的编辑、内容补充、找答案等附加服务。
  2. 2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
  3. 3、如文档侵犯您的权益,请联系客服反馈,我们会尽快为您处理(人工客服工作时间:9:00-18:30)。

Workers
Workers
Workers
GFS: underlying storage system
▪ Goal
▪ global view ▪ make huge files available in the face of node failures
▪ Master Node (meta server)
How does it work?
Locality issue
▪ Master scheduling policy
▪ Asks GFS for locations of replicas of input ▪ Map tasks typically split into 64MB (== GFS block
▪ Map
▪ Process a key/value pair to generate intermediate key/value pairs
▪ Reduce
▪ Merge all intermediate values associated with the same key
▪ Partition
Function
U
C
E
Result
▪ Map:
▪ Accepts input key/value pair
▪ Emits intermediate key/value pair
▪ Reduce :
▪ Accepts intermediate key/value* pair
▪ Emits output key/value pair
reduce(string key, iterator values) //key: word //values: list of counts int results = 0; for each v in values result += ParseInt(v); Emit(AsString(result));
▪ Centralized, index all chunks on data servers
▪ Chunk server (data server)
▪ split into contiguous chunks, typically 16-64MB. ▪ Each chunk replicated (usually 2x or 3x).
▪ Origin from Google, [OSDI’04] ▪ A simple programming model ▪ Functional model ▪ For large-scale data processing
▪ Exploits large set of commodity computers ▪ Executes process in distributed manner ▪ Offers high availability
Very big data
Split data Split data Split data
Split data
count count count
count
count count count
merge
merged count
count
Map+Reduce
Very big data
R
M A P
E
Partitioning D
Take a Close Look at MapReduce
Xuanhua Shi
Acknowledgement
▪ Most of the slides are from Dr. Bing Chen, ▪ Some slides are from SHADI IBRAHIM,
What is MapReduce
The design and how it works
Architecture overview
Master node
user Job tracker
Slave node 1 Task tracker
Slave node 2 Task tracker
Slave node N Task tracker
size) ▪ Map tasks scheduled so GFS input block replica are
on same machine or same rack
▪ Effect
▪ Thousands of machines read input at local disk speed ▪ Without this, rack switches limit read rate
Motivation
▪ Lots of demands for very large scale data
processing
▪ A certain common themes for these
demands
▪ Lots of machines needed (scaling) ▪ Two basic operations on the input
▪ Map ▪ Reduce
Distributed Grep
Very big data
Split data Split data Split data
Split data
grep grep grep
grep
matches
Biblioteka Baidumatches
matches
cat
matches
All matches
Distributed Word Count
▪ By default : hash(key) mod R ▪ Well balanced
Diagram (1)
Diagram (2)
A Simple Example
▪ Counting words in a large set of documents
map(string value) //key: document name //value: document contents for each word w in value EmitIntermediate(w, “1”);
▪ Try to keep replicas in different racks.
GFS architecture
GFS Master
Client
C0 C1 C5 C2
C1 C5 C3
C0 C5

C2
1 Chunkserver Chunkserver 2
N Chunkserver
Functions in the Model
相关文档
最新文档