Big Data Analytics Storage Solutions
Fraud / theft protection
What action should I take?
Decision management
What did I learn, what’s best?
Cognitive
Why did it happen?
Reporting and analysis
Call Centers
Large-scale data integration and transformation
Stream Computing
InfoSphere Streams
Low-latency analytics on streaming data: Velocity, Variety & Volume, Data-In-Motion
MPP Data Warehouse
Netezza High Capacity Appliance
Queryable archive of structured data
Warehouse
BI and Predictive Analytics
Streams
Raw Data Structured Data Text Analytics Data Mining Entity Analytics Machine Learning
BigInsights
Navigation and Discovery
Relationship management: build and maintain a single view of the grid
Grid
Real-time pricing for time-of-use tariffs, or timely demand/response services
What could happen?
Predictive analytics and modeling
OLTP System Data ERP data Traditional Sources New Sources
RFID
The need for a new kind of infrastructure
Systems of Engagement (SoE)
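Run business-critical applications in a reliable, secure environment
Access and process massive volumes of data, both structured and unstructured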
Data in Many Forms
Information Governance, Security and Business Continuity
© Copyright IBM Corporation 2014
IBM Big Data Platform
InfoSphere BigInsights
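Hadoop-based low-latency analytics for varied, high-volume static data (Data-At-Rest)
Apache Hadoop: an open framework for distributed processing of large data sets across clusters of servers, using a simple programming model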
Hadoop Information Integration
InfoSphere Information Server
High availability for data analytics, so that customer preferences are known at all times
On premise, Cloud, As a service
Terabyte-scale data requirements across applications: a common virtualized storage platform
IBM Big Data & Analytics Reference Architecture
All Data Sources
Streaming Data
Outage Mgmt
Information Integration & Governance
Systems Security Storage
Predict which customers are suited to which time-of-use tariffs or demand/response services
Billing systems
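Cleanse and validate data before it is loaded into the data warehouse; the data may come from many customers, billing systems, or outage management systems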
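The cleanse-and-validate step before a warehouse load can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the field names (`meter_id`, `timestamp`, `kwh`), the `Reading` type, and the rejection rules (unparseable fields, negative or non-finite usage, duplicate meter/timestamp pairs) do not come from the slides or from any IBM product.

```python
import math
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Reading:
    """One validated meter reading (hypothetical schema)."""
    meter_id: str
    timestamp: datetime
    kwh: float


def clean_readings(raw_rows):
    """Split raw rows into (valid readings, rejected rows).

    A row is rejected if a field is missing or unparseable, if the usage is
    negative or not finite, or if the same meter/timestamp was already seen.
    """
    valid, rejected = [], []
    seen = set()
    for row in raw_rows:
        try:
            meter_id = row["meter_id"].strip()
            ts = datetime.fromisoformat(row["timestamp"])
            kwh = float(row["kwh"])
        except (KeyError, ValueError, TypeError, AttributeError):
            rejected.append(row)
            continue
        key = (meter_id, ts)
        if not meter_id or not math.isfinite(kwh) or kwh < 0 or key in seen:
            rejected.append(row)
            continue
        seen.add(key)
        valid.append(Reading(meter_id, ts, kwh))
    return valid, rejected
```

Keeping the rejected rows, rather than silently dropping them, lets the source systems be audited later; only the valid list would go on to the warehouse load.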
What Could Happen? Case Management
Geo Spatial
Descriptive
Video & Image What Has Happened?
Analytic Applications
Cloud Services Relational
Exploration and Discovery
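Hadoop explained: MapReduce, HDFS
HDFS spreads data across multiple storage nodes.
HDFS is designed on the assumption that storage nodes can fail, so it keeps three or more copies of each piece of data on different nodes; reliability is achieved at the level of the whole system.
An HDFS file system is made up of a cluster of server nodes; each server serves blocks of data over the network using an HDFS-specific block protocol.
The HDFS NameNode is a potential single point of failure.
IBM removes that single point of failure while also improving file system performance: Elastic Storage - SNC (available as beta code)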
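The replication idea described above can be sketched in a few lines of Python. This is a simplified illustration, not HDFS code: the round-robin placement is an assumption chosen for brevity (real HDFS placement is rack-aware), and the function names, node names, and block size are hypothetical.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut a byte string into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str], replication: int = 3) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes, round-robin.

    Round-robin is an illustrative simplification of replica placement.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def readable_after_failure(placement: dict[int, list[str]], failed: set[str]) -> bool:
    """True if every block still has at least one replica on a live node."""
    return all(
        any(node not in failed for node in replicas)
        for replicas in placement.values()
    )
```

With four nodes and three replicas per block, any two nodes can fail and every block stays readable; losing three nodes that together hold all replicas of one block makes the file unreadable.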
Big Data & Analytics
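The right decision, in the right place, at the right time
Speed: responding promptly to business opportunities that can arise at any moment requires a flexible, real-time infrastructure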
System of Record (SoR)
The dynamics of SoR and SoE:
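- Increase flexibility and efficiency by optimizing workloads and resource deployment
- Improve IT economics by adopting new technologies, including those based on open standards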
Big Data Platform Capabilities
Information Ingest Real-time Analytics Warehouse & Data Marts Analytic Appliances
Advanced Analytics/ New Insights
Cognitive
IBM Big Data & Analytics Infrastructure
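Case study: Smart Metering. Big data analytics applications can deliver real business value
Grid Operations, Field Service
Optimize grid operations and maintenance; reduce the number and duration of outages
Detect energy losses, electricity theft, and fraud promptly
Systems of Insight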
Creative, holistic thought, intuition Systems Of Engagement
Hadoop and Streams
New Approach
Data Warehouse Transaction Data Internal App Data Structured Mainframe Data
What is Hadoop?
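What: open-source software that distributes data computation across a cluster of commodity servers and storage
Why: traditional computing architectures scale up (vertically), feeding data to the CPU through faster SANs, large memory, and multi-level caches, which is comparatively expensive.
What: Hadoop splits a large data set into small data sets and distributes them across many ordinary servers; it is a scale-out (horizontal) model.
Why: Scalable, Flexible, Cost Effective, Fault Tolerant
Components: MapReduce, HDFS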
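The split-and-distribute model can be illustrated with a minimal, single-process sketch of the map, shuffle, and reduce phases in Python. It uses no Hadoop API; the function names and the meter-reading records are assumptions chosen to match the smart-metering case in this deck. In a real cluster each chunk would be mapped on a different server.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit (key, value) pairs for one chunk of (meter_id, kwh) records."""
    for meter_id, kwh in chunk:
        yield meter_id, kwh


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key; here, total usage per meter."""
    return key, sum(values)


def run_job(chunks):
    """Run map over every chunk, then shuffle, then reduce each key."""
    pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Because each chunk is mapped independently and each key is reduced independently, both phases can be spread over many ordinary servers, which is what makes the model scale out.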
A new architecture for big data analytics
All Data
Data Zone
IBM Watson Foundations Application Zone
New/Enhanced Applications
Meters
Real-time Data Processing & Analytics
What is happening?
Smart Analytics System Netezza 1000
BI plus custom analytics on structured data
Operational analytics on structured data
InfoSphere Warehouse
High-capacity data analytics on structured data
Informix Timeseries
Time-structured analytics
Resource Planning
Smart Metering
Resource planning: more accurate forecasts of electricity usage
Customer Service / Customer Operations
Higher customer satisfaction
Regulatory compliance: achieve genuinely effective compliance
Case study: strengthening Smart Metering with big data analytics
All Data
Deep Analytics data zone EDW and data mart zone
Discovery and exploration
Fraud / theft protection
What action should I take?
Decision management
What did I learn, what’s best?
Customer self-serve portals
What is happening?
Analyze customer electricity usage; detect electricity theft, meter tampering, and similar behavior
ERP
Location
Operational data zone
Customers
Landing, Exploration and Archive data zone
Intelligence Analysis
Exploration, Integrated Warehouse, and Mart Zones
Discovery Deep Reflection Operational Predictive
Decision Management
Data at Rest
Stream Processing Data Integration Master Data
Multimedia Web Logs Social Data Text Data: emails Sensor data: images
Repeatable Linear
Accumulation
Systems of Insight Unstructured Enterprise Exploratory Integration Dynamic and Context
IBM Storage Solutions
Storage for data analytics
IBM STG 谢文华 wenhuax@cn.ibm.com
Extending from enterprise data to big data
Structured, analytical, logical Systems of Record
Traditional Approach
Learn Dynamically?
New/ Enhanced Applications
Watson
Text Data
Applications Data
Prescriptive
Best Outcomes?
Alerts
Automated Process Time Series
Predictive
What could happen?
Predictive analytics and modeling
Outage Mgmt
Grid
Information Integration & Governance
Systems Security Storage
Billing systems
On premise, Cloud, As a service
What Do You Have? ISV Solutions
Social Network
New Infrastructure Leverages Data Types
Real-time Analytics
Streams
Data in Motion
Video/Audio Network/Sensor Entity Analytics Predictive Information Ingestion and Operational Information Landing Area, Analytics Zone and Archive
Discovery and exploration
Customer self-serve portals
ERP
Location
Operational data zone
Customers
Landing, Exploration and Archive data zone
Deep Analytics data zone EDW and data mart zone
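IBM Watson Foundations
Complex query processing over historical electricity-usage data
Navigation and Discovery: take a global view across the enterprise's structured and unstructured data and discover the value in it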
New/Enhanced Applications
Collect, store, and analyze data in real time, at up to 50,000 data points/sec
Meters
Real-time Data Processing & Analytics
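To give a feel for real-time processing at this rate, here is a minimal Python sketch of a tumbling-window aggregation over a time-ordered stream, followed by the arithmetic for the quoted peak of 50,000 data points/sec. It is not InfoSphere Streams code: the function name, the `(timestamp, value)` event shape, and the one-second window are assumptions for illustration.

```python
def tumbling_windows(events, window_seconds=1.0):
    """Yield (window_start, count, total) for each non-empty window.

    `events` must be an iterable of (timestamp_seconds, value) pairs in
    time order; a window is emitted as soon as a later window begins.
    """
    current_start = None
    count = 0
    total = 0.0
    for ts, value in events:
        start = (ts // window_seconds) * window_seconds
        if current_start is not None and start != current_start:
            yield current_start, count, total
            count, total = 0, 0.0
        current_start = start
        count += 1
        total += value
    if current_start is not None:
        yield current_start, count, total


# Scale check for the quoted peak rate: points per day at 50,000 points/sec.
PEAK_POINTS_PER_SEC = 50_000
points_per_day = PEAK_POINTS_PER_SEC * 86_400  # 4,320,000,000
```

Only the running count and total for the open window are held in memory, which is why this style of processing suits data in motion; at the quoted peak rate a full day would amount to about 4.32 billion data points.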