2002New Hardware-Slides on how to select systems
合集下载
相关主题
- 1、下载文档前请自行甄别文档内容的完整性,平台不提供额外的编辑、内容补充、找答案等附加服务。
- 2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
- 3、如文档侵犯您的权益,请联系客服反馈,我们会尽快为您处理(人工客服工作时间:9:00-18:30)。
Find commonly used systems for a problem
– May not obviously be in the same category, e.g. database vs key-value store
Requires quickly parsing “jargon” for each one
1. Project Architecture
Software has to meet many conflicting goals
– Robustness – Performance – Security – Time to market
Is there any way to improve in all of them? YES: keep the software small.
" How to decide which one to use when?
Key Considerations
1. Project architecture 2. Available alternatives 3. Workload envelope 4. Ease of management
• E.g. HDFS, S3, Kafka
– Do apps need to know details of the system?
• Consistency, availability, performance quirks
– Is it easy to plug in existing code?
4. Ease of Management
Many factors matter for real-world use:
– How easy is the system to Βιβλιοθήκη Baidunstall or upgrade? – Can you find people trained in using/managing it? – How can you monitor it / tell it’s working? – How can you troubleshoot if something’s wrong?
More Generally
Good design needs to be easy to understand and easy to change Key considerations for big data systems:
– Is the data easy to access from other tools?
– Similar to reading the papers we covered
3. Workload Envelope
What workload characteristics does the system support?
– Scale in various dimensions (total data size, number of fields, length of records, etc) – Performance metrics (throughput, latency, etc) – Deployment environment (e.g. multi-datacenter) – Consistency and availability
Example
Summary
• Define the problem you want to solve • List most common alternatives • Design and run an evaluation task (for both workload and management actions) • Write down and share the results
How to Find Workload Envelope
Design an evaluation workload you can test on different systems
– E.g. load and query fake data
Read about other deployed use cases
How to Select Technologies
Dec 8, 2015
The Problem
We’ve seen a lot of different systems:
– Storage: GFS, BigTable, Dynamo, databases, … – Resource sharing: Mesos, Borg, EC2, ... – Analytics: MapReduce, Spark, Dremel, Naiad, ... – Serving: Tao, Unicorn, Druid, ...
Typical Designs
Other Data Web Server
Logging Bus" (e.g. Kafka)
Archival Storage" (e.g. HDFS)
Database
Many Apps
Many Apps
2. Available Alternatives
– May not obviously be in the same category, e.g. database vs key-value store
Requires quickly parsing “jargon” for each one
1. Project Architecture
Software has to meet many conflicting goals
– Robustness – Performance – Security – Time to market
Is there any way to improve in all of them? YES: keep the software small.
" How to decide which one to use when?
Key Considerations
1. Project architecture 2. Available alternatives 3. Workload envelope 4. Ease of management
• E.g. HDFS, S3, Kafka
– Do apps need to know details of the system?
• Consistency, availability, performance quirks
– Is it easy to plug in existing code?
4. Ease of Management
Many factors matter for real-world use:
– How easy is the system to Βιβλιοθήκη Baidunstall or upgrade? – Can you find people trained in using/managing it? – How can you monitor it / tell it’s working? – How can you troubleshoot if something’s wrong?
More Generally
Good design needs to be easy to understand and easy to change Key considerations for big data systems:
– Is the data easy to access from other tools?
– Similar to reading the papers we covered
3. Workload Envelope
What workload characteristics does the system support?
– Scale in various dimensions (total data size, number of fields, length of records, etc) – Performance metrics (throughput, latency, etc) – Deployment environment (e.g. multi-datacenter) – Consistency and availability
Example
Summary
• Define the problem you want to solve • List most common alternatives • Design and run an evaluation task (for both workload and management actions) • Write down and share the results
How to Find Workload Envelope
Design an evaluation workload you can test on different systems
– E.g. load and query fake data
Read about other deployed use cases
How to Select Technologies
Dec 8, 2015
The Problem
We’ve seen a lot of different systems:
– Storage: GFS, BigTable, Dynamo, databases, … – Resource sharing: Mesos, Borg, EC2, ... – Analytics: MapReduce, Spark, Dremel, Naiad, ... – Serving: Tao, Unicorn, Druid, ...
Typical Designs
Other Data Web Server
Logging Bus" (e.g. Kafka)
Archival Storage" (e.g. HDFS)
Database
Many Apps
Many Apps
2. Available Alternatives