PYTORCH AND THE NEW CHALLENGES OF ML
LeCun's Law and the Rise of Deep Learning
[Figure: citation count per year (2001-2018) for "Gradient-Based Learning Applied to Document Recognition", LeCun et al., 1998]
TRANSLATION · SPARK AR · OCULUS VR · BLOOD DONATIONS
400T+ PREDICTIONS PER DAY
1B+ PHONES RUNNING NEURAL NETS GLOBALLY
WHAT IS PYTORCH?
- SIMPLICITY OVER COMPLEXITY
- HARDWARE ACCELERATED INFERENCE
- DISTRIBUTED TRAINING
- DYNAMIC NEURAL NETWORKS
- EAGER & GRAPH-BASED EXECUTION (see the sketch below)
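The bullets above are slide-level; as a minimal, hedged illustration (not from the deck) of what "dynamic neural networks" and eager execution mean in practice, consider a data-dependent branch decided by ordinary Python control flow at runtime:

    import torch

    def step(x, w):
        h = x @ w
        # Data-dependent branch: the "graph" is just whatever code ran,
        # so it can differ from one input to the next.
        if h.norm() > 1.0:
            h = h / h.norm()
        return torch.relu(h)

    x = torch.randn(4, 8)
    w = torch.randn(8, 8, requires_grad=True)
    out = step(x, w)
    out.sum().backward()   # autograd follows the branch that actually executed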
BUILT BY THE COMMUNITY · BUILT FOR PRODUCTION · DESIGNED FOR RESEARCHERS
BUILT BY THE COMMUNITY
~1,200 CONTRIBUTORS · 50%+ YOY GROWTH · 22K PYTORCH FORUM USERS
DESIGNED FOR RESEARCHERS
GROWTH IN ARXIV MENTIONS IN RESEARCH PAPERS
16K+ STUDENTS ENROLLED IN COURSES
21M MINUTES OF UDACITY WATCH TIME IN THE LAST 12 MONTHS
FAST.AI COURSES: Practical Deep Learning for Coders V3 · Part 2: Deep Learning from the Foundations · Introduction to Machine Learning for Coders · A Code-First Introduction to Natural Language Processing
BUILT FOR PRODUCTION
PYTORCH: RESEARCH → PRODUCTION
CORE PRINCIPLES: DEVELOPER EFFICIENCY · BUILDING FOR SCALE

DEVELOPER EFFICIENCY: ENABLING A HIGH VELOCITY OF MODEL ITERATION AND INNOVATION
CLEAN APIS
NAMED TENSORS (EXPERIMENTAL)

Today, we name and access dimensions by comment:

    # Tensor[N, C, H, W]
    images = torch.randn(32, 3, 56, 56)
    images.sum(dim=1)
    images.select(dim=1, index=0)

But naming explicitly leads to more readable and maintainable code:

    NCHW = ['N', 'C', 'H', 'W']
    images = torch.randn(32, 3, 56, 56, names=NCHW)
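With names attached, dimensions can then be accessed by name rather than by position; a short sketch assuming the experimental named-tensor API:

    import torch

    NCHW = ['N', 'C', 'H', 'W']
    images = torch.randn(32, 3, 56, 56, names=NCHW)

    # Reduce and index by name instead of by positional dim.
    channel_sums = images.sum('C')           # equivalent to images.sum(dim=1)
    first_channel = images.select('C', 0)    # equivalent to images.select(dim=1, index=0)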
TORCHSCRIPT

Models are TorchScript programs, an optimizable subset of Python:
+ Same "models are programs" idea
+ Production deployment
+ No Python dependency
+ Compilation for performance optimization

    class RNN(nn.Module):
        def __init__(self, W_h, U_h, W_y, b_h, b_y):
            super(RNN, self).__init__()
            self.W_h = nn.Parameter(W_h)
            self.U_h = nn.Parameter(U_h)
            self.W_y = nn.Parameter(W_y)
            self.b_h = nn.Parameter(b_h)
            self.b_y = nn.Parameter(b_y)

        def forward(self, x, h):
            y = []
            for t in range(x.size(0)):
                h = torch.tanh(x[t] @ self.W_h + h @ self.U_h + self.b_h)
                y += [torch.tanh(h @ self.W_y + self.b_y)]
                if t % 10 == 0:
                    print("stats: ", h.mean(), h.var())
            return torch.stack(y), h

    # one annotation!
    script_rnn = torch.jit.script(RNN(W_h, U_h, W_y, b_h, b_y))
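Once scripted, the module runs like a normal module and can be serialized for deployment without a Python dependency; a minimal usage sketch (the weight shapes here are illustrative, not from the deck):

    import torch

    # Illustrative sizes: input 8, hidden 16, output 4.
    W_h, U_h = torch.randn(8, 16), torch.randn(16, 16)
    W_y = torch.randn(16, 4)
    b_h, b_y = torch.randn(16), torch.randn(4)

    script_rnn = torch.jit.script(RNN(W_h, U_h, W_y, b_h, b_y))

    x = torch.randn(20, 8)          # sequence of 20 steps
    h0 = torch.zeros(16)
    y, h_final = script_rnn(x, h0)  # y: (20, 4)

    script_rnn.save("rnn.pt")       # loadable from C++ via torch::jit::load
    loaded = torch.jit.load("rnn.pt")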
CORE PRINCIPLES: DEVELOPER EFFICIENCY · BUILDING FOR SCALE

BUILDING FOR SCALE: HIGH PERFORMANCE EXECUTION FOR MODEL TRAINING AND INFERENCE
3X ML DATA GROWTH IN ONE YEAR
- FB data used in an ML pipeline in 2018: 30%
- FB data used in an ML pipeline today: 50%

- Workflows trained: 3X increase
- Ranking engineers: 2X increase
- Compute consumed: 3X increase
OPTIMIZING FOR HARDWARE BACKENDS

PYTORCH DEVELOPMENT ENV → PYTORCH JIT → hardware backends: MKL-DNN · CUDA/cuDNN · (Q)NNPACK · FBGEMM · XLA · Glow · TVM
1. Feature Engineering: Bryce Canyon (70X HDDs + integrated compute) · Lightning (30X flash drives, JBOF) · Tioga Pass (dual CPU, high mem)
2. Training: Big Basin (8X GPU SXM2 + 2X CPU) · Tioga Pass
3. Inference: Twin Lakes (single-socket CPU card, low mem)
QUANTIZATION
Efficient inference on server and mobile devices using reduced-precision math: less memory, compute speedup.
Three approaches, in increasing order of control:
- Dynamic quantization
- Post-training quantization
- Quantization-aware training
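As a concrete example of the first approach, dynamic quantization in PyTorch is a one-line transform; a minimal sketch (the model and the choice of nn.Linear/qint8 are illustrative):

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

    # Weights are stored in int8; activations are quantized on the fly at inference.
    quantized = torch.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )

    out = quantized(torch.randn(1, 128))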
PYTORCH: RESEARCH PROTOTYPING → PRODUCTION DEPLOYMENT

NAMED TENSORS
PyTorch set the bar for ML developer UX by focusing on expressivity and productivity:
"I want to write a program, not (manually) build a graph."
Where are similar areas for improvement today?
Data has semantic meaning! But we force users to drop that context and use an abstract "Tensor" mathematical object.
Key Insight: Named Dimensions
Inspired by and done in collaboration with Prof. Alexander Rush, now at Cornell Tech.
Today we name and access dimensions by comment, but naming explicitly leads to more readable and maintainable code. By retaining semantic meaning, we also avoid common "Tensor Pitfalls":
- Accidental Broadcasting
- Accidental Alignment
Accidental Broadcasting
We didn't expect broadcasting to happen, but it did. We can catch this automatically: broadcast by position, but check that dimension names are aligned. (See the sketch below.)
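A hedged sketch of how that check plays out with the experimental named-tensor API: dimensions are aligned from the right and their names must match, so a size coincidence no longer broadcasts silently:

    import torch

    # 'N' and 'W' happen to share size 2 -- the classic silent-broadcast trap.
    imgs = torch.randn(2, 3, 3, 2, names=('N', 'C', 'H', 'W'))
    per_batch_scale = torch.rand(2, names=('N',))

    # Plain tensors would silently scale along 'W'; with names this raises,
    # because right-aligned broadcasting tries to pair 'W' with 'N'.
    try:
        imgs * per_batch_scale
    except RuntimeError as err:
        print(err)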
Accidental Alignment
No 1→N broadcast occurs across semantically distinct dimensions, but the sizes happen to match. And there are so many formats! There is a "time bomb" if we ever normalize the wrong format and the "unaligned" dimensions have the same size. If we broadcast by name (align_as), we only need a single normalize function for all formats.
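A minimal sketch of that single normalize, assuming named tensors where align_as matches dimensions by name and inserts size-1 dims for missing names (the mean/std values are the usual ImageNet constants, used here only as an illustration):

    import torch

    def normalize(img, mean, std):
        # align_as reorders/expands mean and std by name, so the same
        # function handles NCHW, NHWC, or any other layout.
        return (img - mean.align_as(img)) / std.align_as(img)

    mean = torch.tensor([0.485, 0.456, 0.406], names=('C',))
    std = torch.tensor([0.229, 0.224, 0.225], names=('C',))

    nchw = torch.randn(2, 3, 5, 5, names=('N', 'C', 'H', 'W'))
    nhwc = torch.randn(2, 5, 5, 3, names=('N', 'H', 'W', 'C'))

    out1 = normalize(nchw, mean, std)   # mean aligned to shape (1, 3, 1, 1)
    out2 = normalize(nhwc, mean, std)   # mean aligned to shape (1, 1, 1, 3)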
