big data idg_cvw 2
• Machine learning for consumer content
• Surfacing news you want but might not see
• Using location data to learn about you
• Predicting where you are and where you’re going
• Honorable mention: Placed
• 70,000 variables for credit underwriting
• Less chance of default means lower costs
• Video chat + NLP + semantic analysis
• MindMeld surfaces content as we talk
• Massive pricing database and 8+B data points
• What to buy and when to buy it
• Machine learning + UI to clean up data
• Enabling big data by making data usable
Machine learning does the hard work
• Skytree
• Ayasdi
• BeyondCore
• Mahout
• Cheap satellites + Hadoop = Query the earth
• Daily imaging and location data analysis
Bigger, faster, smarter
How a new wave of startups is building around data
Data is everywhere
• Your own data (servers, apps, etc.)
• APIs (pick a web service, e.g., Twitter)
• Data marketplaces
– Factual
– Datafiniti
– Infochimps
– Windows Azure Marketplace
• Public data
– Many government agencies (in U.S.)
Data management has been solved
Hadoop
• HDFS/MapR FS/Quantcast FS
• MapReduce
• YARN/Mesos/Spark/Corona/(Storm)
• Impala/Drill
NoSQL
• HBase
• MongoDB
• Cassandra
• Riak
• Redis
• Couchbase
Lessons learned
• Data is the fuel, not the output
– Users want to see results, not analyze data
– Results mean content, advice, etc., not more numbers
• Don’t be afraid to let people and machines work in tandem
– But make the humans’ work as easy as possible
• Go big
– Satellites, not sentiment analysis
You can use that data
BI
• Platfora
• Hive/Shark
• Hadapt
• Teradata/Aster Data
• EMC Greenplum
• ClearStory
• Precog
• Metamarkets/Druid
• SAP HANA
Applications/platforms
• Continuuity
• Infochimps
• WibiData
• 0xdata
Areas for innovation
Physical goods (e.g., True & Co.)
• Better products through data
• Something like continuous development
Really smart devices (e.g., Nest but better)
• Connected to each other
• Collective intelligence
Social science/crime (e.g., bullying algorithm)
• Spotting problematic behavior