uai2010_graphlab


Gibbs Update:
- Read samples on adjacent vertices
- Read edge potentials
- Compute a new sample for the current vertex
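The three steps above can be sketched as a single update routine. This is a minimal sketch for a toy pairwise model; the names `graph`, `values`, and `edge_potential` are illustrative and not GraphLab's actual API.

```python
import random

def gibbs_update(vertex, graph, values, edge_potential, num_states=2):
    """Toy Gibbs update: read neighbor samples and edge potentials,
    then resample the current vertex from its conditional."""
    weights = []
    for s in range(num_states):
        w = 1.0
        for nbr in graph[vertex]:                 # read adjacent vertices
            w *= edge_potential(s, values[nbr])   # read edge potentials
        weights.append(w)
    # Draw a new sample proportional to the unnormalized weights.
    r = random.random() * sum(weights)
    acc = 0.0
    for s, w in enumerate(weights):
        acc += w
        if r < acc:
            values[vertex] = s
            return s
    values[vertex] = num_states - 1
    return num_states - 1
```

With a purely attractive potential and all neighbors in the same state, the update deterministically agrees with the neighbors.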
16
Update Function Schedule
[Figure: scheduler queue of pending update tasks over vertices a, h, i, b, d]
Exponential Parallelism
10
Exponentially Increasing Parallel Performance
[Figure: sequential processor speed (GHz) is now constant, while data keeps growing: 13 million Wikipedia pages, 3.6 billion photos on Flickr]
Algorithm Parameters? Sufficient Statistics? Sum of all the vertices?
23
Shared Data Table (SDT)
Global constant parameters
Constant: Temperature
Constant: Total # Samples
[Figure: data graph vertices a-k partitioned between CPU 1 and CPU 2]
17
Update Function Schedule
[Figure: CPU 1 and CPU 2 consume scheduled update tasks over vertices a-k in parallel]
18
Static Schedule
Scheduler determines the order of Update Function Evaluations
[Figure: processor performance vs. release date, 1988-2010, log scale; single-core performance growth has flattened]
2
Parallel Programming is Hard
Designing efficient Parallel Algorithms is hard
A New Framework for Parallel Machine Learning
Yucheng Low Aapo Kyrola Carlos Guestrin Joseph Gonzalez Danny Bickson Joe Hellerstein
GraphLab
Carnegie Mellon
[Figure: example input values 12.9, 24.1, 42.3, 84.3, 21.3, 18.4, 25.8, 84.4]
Embarrassingly parallel: independent computation, no communication needed
6
MapReduce – Reduce Phase
31
Obtaining More Parallelism
Not all update functions will modify the entire scope!
Belief Propagation: only uses edge data
Gibbs Sampling: only needs to read adjacent vertices
Parallel design challenges
Avoid these problems by using high-level abstractions.
3
MapReduce – Map Phase
[Figure: CPUs 1-4 each independently map one value: 12.9, 42.3, 21.3, 25.8]
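The map/reduce pattern illustrated by these figures can be sketched on the slide's example values. The function names here are illustrative, not a real MapReduce API.

```python
def map_phase(chunks, map_fn):
    # Each chunk could run on its own CPU with no communication.
    return [[map_fn(x) for x in chunk] for chunk in chunks]

def reduce_phase(mapped, reduce_fn, zero):
    # Aggregate the per-CPU results into a single value.
    out = zero
    for chunk in mapped:
        for x in chunk:
            out = reduce_fn(out, x)
    return out

# Example values from the figure, split across four "CPUs".
chunks = [[12.9, 24.1], [42.3, 84.3], [21.3, 18.4], [25.8, 84.4]]
mapped = map_phase(chunks, lambda x: x)          # identity map for the demo
total = reduce_phase(mapped, lambda a, b: a + b, 0.0)
```

The map phase is embarrassingly parallel; only the reduce phase needs to see results from every CPU.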
Sync!
Accumulate function: Add
Apply function: Divide by |V|
[Figure: Sync aggregates vertex values across the graph, then applies the final transformation]
25
Shared Data Table (SDT)
Global constant parameters
Global computation (Sync Operation)
Expresses data dependencies; supports iterative computation
Simplifies the design of parallel programs:
- Abstracts away hardware issues
- Automatic data synchronization
- Addresses multiple hardware architectures
9
Feature Extraction, Cross Validation
Belief Propagation, SVM, Sampling, Neural Networks
Common Properties
1) Sparse Data Dependencies
   • Sparse Primal SVM
   • Tensor/Matrix Factorization
GraphLab Model
Data Graph
Shared Data Table
Update Functions and Scopes
Scheduling
14
Data Graph
A Graph with data associated with every vertex and edge.
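A minimal sketch of such a data graph, assuming a simple undirected adjacency structure; the class and attribute names are illustrative, not GraphLab's actual API.

```python
class DataGraph:
    """Toy data graph: arbitrary data on every vertex and edge."""
    def __init__(self):
        self.vertex_data = {}   # vertex id -> data
        self.edge_data = {}     # (u, v) -> data
        self.neighbors = {}     # vertex id -> set of adjacent ids

    def add_vertex(self, v, data=None):
        self.vertex_data[v] = data
        self.neighbors.setdefault(v, set())

    def add_edge(self, u, v, data=None):
        # Undirected: record adjacency in both directions.
        self.edge_data[(u, v)] = data
        self.neighbors.setdefault(u, set()).add(v)
        self.neighbors.setdefault(v, set()).add(u)
```

For Gibbs sampling, vertex data would hold the current sample and counts, and edge data the binary potentials.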
Obtain different algorithms simply by changing a flag!
--scheduler=fifo
--scheduler=priority
--scheduler=splash
22
Global Information
What if we need global information?
2) Local Computations
   • Sampling
   • Belief Propagation
3) Iterative Updates
   • Expectation Maximization
   • Optimization
10
Gibbs Sampling
Constant: Temperature
Constant: Total # Samples
Sync: Loglikelihood
Sync: Sample Statistics
26
Safety and Consistency
27
Write-Write Race
If adjacent update functions write simultaneously
GraphLab design ensures race-free operation
29
Scope Rules
Guaranteed safety for all update functions
30
Full Consistency
Only allow update functions two vertices apart to run in parallel
Reduced opportunities for parallelism
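One way such a guarantee could be emulated (an assumed scheme for illustration, not necessarily GraphLab's implementation) is to lock a vertex together with its neighbors in a fixed global order before running its update function, so updates with overlapping scopes never run concurrently:

```python
import threading

locks = {}  # vertex id -> threading.Lock

def run_with_full_consistency(vertex, neighbors, update_fn):
    """Acquire locks on the whole scope, run the update, release."""
    scope = sorted([vertex] + list(neighbors))  # global order avoids deadlock
    for v in scope:
        locks.setdefault(v, threading.Lock())
    for v in scope:
        locks[v].acquire()
    try:
        return update_fn(vertex)
    finally:
        for v in reversed(scope):
            locks[v].release()
```

Acquiring locks in one global order is a standard deadlock-avoidance technique; the cost is exactly the reduced parallelism the slide notes, since neighbors of a running update must wait.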
1) Sparse Data Dependencies
2) Local Computations
3) Iterative Updates
[Figure: 3×3 grid data graph over variables X1-X9]
11
GraphLab is the Solution
Designed specifically for ML needs
20
Dynamic Schedule
[Figure: dynamic schedule; CPU 1 and CPU 2 pull update tasks over vertices a-k from a shared queue, and new tasks are inserted as updates run]
21
Dynamic Schedule
Update Functions can insert new tasks into the schedule
FIFO Queue → Wildfire BP [Selvatici et al.]
Priority Queue → Residual BP [Elidan et al.]
Splash Schedule → Splash BP [Gonzalez et al.]
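The idea that update functions insert new tasks can be sketched with a plain FIFO queue; the names are illustrative, and the real schedulers above differ mainly in their queue discipline.

```python
from collections import deque

def run_dynamic_schedule(initial_tasks, update_fn):
    """FIFO dynamic scheduler: an update may reschedule other vertices."""
    queue = deque(initial_tasks)
    executed = []
    while queue:
        v = queue.popleft()
        executed.append(v)
        for w in update_fn(v):   # update returns vertices to reschedule
            queue.append(w)
    return executed

# Example: updating vertex 0 schedules vertex 1, which schedules nothing.
reschedules = {0: [1], 1: []}
order = run_dynamic_schedule([0], lambda v: reschedules[v])
```

Swapping the `deque` for a priority queue keyed on residuals would give the priority-schedule variant.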
[Figure: CPU 1 and CPU 2 aggregate intermediate values in the reduce phase]
Fold/Aggregation
7
Related Data
Interdependent Computation: Not MapReduceable
Race conditions and deadlocks
Parallel memory bottlenecks
Architecture-specific concurrency
Difficult to debug
Graduate students and ML experts repeatedly solve the same parallel design challenges
[Figure: data graph for Gibbs sampling over variables X2-X11.
Vertex data: x3 is a sample value, C(X3) the sample counts.
Edge data: Φ(X6,X9) is a binary potential.]
15
Update Functions
Update Functions are operations which are applied on a vertex and transform the data in the scope of the vertex
The implementation here is multi-core; a distributed implementation is in progress.
12
GraphLab
A New Framework for Parallel Machine Learning
Carnegie Mellon
GraphLab
24
Sync Operation
Sync is a fold/reduce operation over the graph.
Accumulate performs an aggregation over vertices.
Apply makes a final modification to the accumulated data.
Example: compute the average of all the vertices.
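The slide's example, computing the average of all vertices with Accumulate = add and Apply = divide by |V|, can be sketched as follows (function names are illustrative):

```python
def sync(vertex_values, accumulate, apply_fn, zero):
    """Fold an accumulator over every vertex, then apply a final transform."""
    acc = zero
    for v in vertex_values:
        acc = accumulate(acc, v)
    return apply_fn(acc, len(vertex_values))

# Accumulate = add, Apply = divide by |V|.
values = [1.0, 5.0, 2.0, 3.0]
avg = sync(values, lambda a, v: a + v, lambda a, n: a / n, 0.0)
```

Any commutative, associative accumulate function could be folded this way in parallel chunks before a single apply step.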
[Figure: left and right updates write the same value concurrently; the final value depends on which write lands last]
28
Race Conditions + Deadlocks
Just one of the many possible races.
Race-free code is extremely difficult to write.
8
Parallel Computing and ML
Not all algorithms are efficiently data parallel
Data-Parallel vs. Complex Parallel Structure:
Kernel Methods, Tensor Factorization, Deep Belief Networks, Lasso
Synchronous Schedule: every vertex updated simultaneously
Round Robin Schedule: every vertex updated sequentially
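A minimal sketch contrasting the two static schedules, with illustrative helper names; the synchronous sweep reads only old values, while round robin lets later updates see earlier writes.

```python
def synchronous_sweep(values, neighbors, update):
    # Every vertex updated "simultaneously": all reads see the old values.
    old = dict(values)
    return {v: update(v, old, neighbors) for v in values}

def round_robin_sweep(values, neighbors, update):
    # Vertices updated sequentially, in place.
    for v in sorted(values):
        values[v] = update(v, values, neighbors)
    return values
```

On a two-vertex graph with a neighbor-averaging update, the two schedules already produce different results after one sweep, which is why the choice of schedule affects convergence.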
19
Need for Dynamic Scheduling
[Figure: some regions of the graph have converged while others are slowly converging; focus effort on the latter]