Architecture and Practice of Sina Weibo's Real-Time Data Analysis Service

Metric
InfluxDB, Prometheus, Elasticsearch
Pipeline
• Spark Streaming
• Flink
• Java

[Figure labels: Spark, Flink, Java, SQL]

[Figure labels: map, flatMap, UDF, PipelineJob job, Spark, Flink, Spark Streaming]
3. Spark, Kafka 0.9+, offset
4. Flink
[Figure: a Pipeline Job built from Pipeline stages. Stage labels: Flattener, ……, Splitter, Selector, Agg, Substring. Other labels: SQL, UDF, UDAF, Pro Pipeline stage]
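The stage names in this part of the deck suggest a job assembled by chaining per-record transformations. What follows is a minimal sketch in Python, assuming each stage takes one record (a dict, mirroring the `Map<String,Object>` type that appears in the deck) and returns zero or more records, flatMap-style. The behaviour given to `flattener` and `selector` is guessed from their names and is not the talk's implementation; `run_pipeline`, the field names, and the sample events are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]
Stage = Callable[[Record], Iterable[Record]]  # one record in, zero or more out


def flattener(sep: str = ".") -> Stage:
    """Assumed behaviour: flatten nested dicts into dotted top-level keys."""
    def flatten(record: Record, prefix: str = "") -> Record:
        out: Record = {}
        for key, value in record.items():
            name = f"{prefix}{sep}{key}" if prefix else key
            if isinstance(value, dict):
                out.update(flatten(value, name))
            else:
                out[name] = value
        return out
    return lambda record: [flatten(record)]


def selector(*fields: str) -> Stage:
    """Assumed behaviour: keep only the named fields; drop records with none of them."""
    def select(record: Record) -> List[Record]:
        kept = {k: record[k] for k in fields if k in record}
        return [kept] if kept else []
    return select


def run_pipeline(records: Iterable[Record], stages: List[Stage]) -> List[Record]:
    """Apply every stage in order; each stage is applied flatMap-style."""
    current = list(records)
    for stage in stages:
        current = [out for rec in current for out in stage(rec)]
    return current


# Hypothetical input: one nested event and one record with no selected fields.
events = [{"uid": 1, "client": {"os": "ios", "ver": "9.1"}}, {"noise": True}]
result = run_pipeline(events, [flattener(), selector("uid", "client.os")])
print(result)
```

Modelling every stage as "record to list of records" lets a filter, a one-to-one mapping, and a splitter share one signature, which is one plausible reason a stage-based design would sit on top of `flatMap`.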
• Databus
• docker container
• Summon
[Figure labels: Kafka, HDFS, Databus, YARN, Hive, Presto, Summon, ES, Pinot, JSON, Avro, PipelineJob, SQL, Spark, Flink]
• Hive
• Redis, Etcd
hdfs kafka 0.09
partition
offset
flatMap
foreachPartition
Map<String,Object>
Map<String,Object>
[Figure labels: Web Server, Alert Manager, JVM, Kafka, HDFS, HTTP, JSON, Avro, Summon]
[Figure labels: Pipeline stage, Pre stage; stages: Adder, Convertor, Replacer, Casewhen; condition]
1. SparkConf, driver, executor
2. Spark
3. Flink
4. Flink TaskManager
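The list above names `SparkConf` with driver and executor, and the Flink TaskManager, but the settings the talk actually tuned are not recoverable from the slide. As a hedged illustration only, the sketch below renders a few resource settings into launch arguments. The property names are standard Spark and Flink configuration keys; every value, and the helper names `spark_submit_args` and `flink_conf_yaml`, are placeholders chosen for this example.

```python
from typing import Dict, List

# Placeholder values; only the key names are real Spark properties.
spark_conf: Dict[str, str] = {
    "spark.driver.memory": "2g",
    "spark.executor.memory": "4g",
    "spark.executor.cores": "2",
}

# Placeholder values; only the key names are real Flink options.
flink_conf: Dict[str, str] = {
    "taskmanager.memory.process.size": "4096m",
    "taskmanager.numberOfTaskSlots": "2",
}


def spark_submit_args(conf: Dict[str, str]) -> List[str]:
    """Render settings as repeated `--conf key=value` arguments for spark-submit."""
    args: List[str] = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


def flink_conf_yaml(conf: Dict[str, str]) -> str:
    """Render settings as `key: value` lines in Flink's configuration file style."""
    return "\n".join(f"{key}: {value}" for key, value in sorted(conf.items()))


print(" ".join(["spark-submit"] + spark_submit_args(spark_conf)))
print(flink_conf_yaml(flink_conf))
```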
Kafka
Form Kafka
1. Kafka
2. Spark Kafka Direct
StreamingListener
at least once at most once
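The difference between the two delivery guarantees comes down to when the consumer saves its offset relative to processing a message. The simulation below is a sketch, not the talk's code: it uses an in-memory list in place of a Kafka partition, and `consume`, `run`, and the crash flag are hypothetical names. Committing before processing gives at-most-once (a crash can lose a message); committing after processing gives at-least-once (a crash can replay a message).

```python
from typing import Callable, List


def consume(log: List[str], committed: int, process: Callable[[str], None],
            commit_first: bool) -> int:
    """Consume from `committed` onward and return the offset that survives a crash.

    commit_first=True  -> at most once  (offset saved before processing)
    commit_first=False -> at least once (offset saved after processing)
    """
    offset = committed
    try:
        while offset < len(log):
            if commit_first:
                committed = offset + 1
                process(log[offset])
            else:
                process(log[offset])
                committed = offset + 1
            offset += 1
    except RuntimeError:
        pass  # simulated crash: only `committed` is remembered
    return committed


def run(commit_first: bool, crash_after_effect: bool) -> List[str]:
    """Process ["a", "b", "c"], crashing once on "b", then restart from the saved offset."""
    log = ["a", "b", "c"]
    seen: List[str] = []          # side effects visible downstream
    state = {"crashed": False}

    def process(msg: str) -> None:
        fail = msg == "b" and not state["crashed"]
        if fail and not crash_after_effect:
            state["crashed"] = True
            raise RuntimeError("crash before the side effect")
        seen.append(msg)
        if fail:
            state["crashed"] = True
            raise RuntimeError("crash after the side effect")

    committed = consume(log, 0, process, commit_first)  # first run crashes on "b"
    consume(log, committed, process, commit_first)      # restart
    return seen


print(run(commit_first=True, crash_after_effect=False))   # at most once: "b" is lost
print(run(commit_first=False, crash_after_effect=True))   # at least once: "b" is repeated
```

With at-least-once delivery, downstream writes generally need to be idempotent or deduplicated if exact counts matter.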
Leader Controller
Label
Worker
• ARIMA
• RNN
• TensorFlow Time Series
• Prophet
• XGBoost
• InfluxDB
• Prometheus
• Elasticsearch
Grafana
[Figure labels: Config Server, Pipeline, DControl, Metrics DB, YARN, ZooKeeper, Web]
• TBScheduler
• worker



Leader
zookeeper