NetApp Deduplication Technology
[Diagram labels: Block Write, Log New FPs, Change Log File, qsort, Merge Sort, Fingerprint File]
© 2008 NetApp. All rights reserved. NetApp Confidential -- Do Not Distribute 12
Initialization (only necessary on pre-existing volume)
[Diagram labels: Block Write, Log New FPs, Change Log File, Gather, Gatherer File, qsort, Merge Sort]
A-SIS Deduplication: How it really works!
[Diagram labels: Block Write, Log New FPs, Change Log File, Fingerprint File, SIS Check, Sort by Inode, Update Inode, qsort, Merge Sort, Duplicate Entry File, Block Ref Count File]
Why Deduplication for FAS? Lower storage costs
[Chart: effective $/GB for FC-based systems and SATA-based systems (RAID-DP); series: Primary (FC), Primary & NearStore (SATA), Dedupe Space Savings, "Other" Space Savings]
[Diagram labels: Block Write, Log New FPs, Change Log File, Fingerprint File, Byte-by-Byte Compare, Increment and decrement Block Ref. Count File, Update new inode]
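The stages named in the diagram (log new fingerprints, sort, byte-by-byte compare, adjust the block reference counts) can be illustrated with a minimal Python sketch. It is not NetApp's implementation: the fingerprint function (MD5 here), the in-memory lists, and the names `fingerprint` and `deduplicate` are assumptions chosen only to make the flow concrete.

```python
import hashlib
from itertools import groupby


def fingerprint(block: bytes) -> str:
    # Assumption: the slides do not name the fingerprint function; MD5 is a stand-in.
    return hashlib.md5(block).hexdigest()


def deduplicate(blocks):
    """Return (block_map, ref_counts) for a list of equally sized blocks.

    block_map[i] is the index of the stored block that logical block i points to.
    ref_counts[i] is how many logical blocks point at stored block i.
    """
    # "Change log": one (fingerprint, block index) record per written block.
    change_log = [(fingerprint(b), i) for i, b in enumerate(blocks)]
    # Sorting brings records with equal fingerprints next to each other.
    change_log.sort()

    block_map = list(range(len(blocks)))
    ref_counts = {i: 1 for i in range(len(blocks))}
    for _, group in groupby(change_log, key=lambda rec: rec[0]):
        indices = [i for _, i in group]
        keeper = indices[0]
        for dup in indices[1:]:
            # Byte-by-byte compare guards against fingerprint collisions.
            if blocks[dup] == blocks[keeper]:
                block_map[dup] = keeper
                ref_counts[keeper] += 1  # increment the shared block
                ref_counts[dup] -= 1     # decrement the released block
    return block_map, ref_counts
```

With four blocks where the first, third, and fourth are identical, every duplicate is remapped to the first copy and only two blocks keep a non-zero reference count.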
Space savings vary with your data type. The NetApp space savings estimation tool is available for POC testing.
[Chart: space savings by data type, axis from 0% to 100%]
NetApp Deduplication History
NetApp Deduplication for FAS:
– Formerly named "A-SIS deduplication"
– Supports R200, FAS2000, FAS3000, and FAS6000
– Note: the minimum supported Data ONTAP version is 7.2.4
[Diagram stages shown: Sorting, Deduplicating]
Two ways to benefit from Deduplication for FAS
Time-Based Deduplication
[Diagram: Backup 1, Backup 2, Backup 3, Backup 4]
Volume Deduplication
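[Diagram: Original Data Volume; Duplicates Identified And Removed; Actual Storage Consumed]
Volume duplicate scan
– Removes duplicate data within a single volume
– Suited to archival storage and lightly loaded primary storage systems
– Deduplication runs periodically and scans for duplicates based on changes
– Savings are expressed as a percentage of the whole volume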
[Diagram labels: INODE 1, INODE 2, IND (x4), DATA (x4)]
[Diagram stage shown: Gathering (Gather, Gatherer File)]
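Deduplication: block-level duplicate merging
[Diagram: original data file; duplicate blocks identified; duplicate blocks removed (after byte-level verification)]
Files are unchanged from the point of view of applications and users.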
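The industry's first general-purpose deduplication technology
As of May 2008, about 6,600 licenses had been installed
– Total system capacity of about 185 PB
– Average space savings of 30%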
FAS systems that support Deduplication
[sis status output, verification stage]
Path        State    Status  Progress
/vol/vol5   Enabled  Active  30MB Verified
OR
/vol/vol5   Enabled  Active  10% Merged
Application-transparent duplicate merging
Significant capacity savings for:
– Backup data
– Archive data
– Lightly accessed primary data
Underlying technology: WAFL block sharing
Deduplication implements block sharing within the WAFL file system tree.
A single data block can be referenced 256 times.
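To make the reference limit concrete, here is a small Python sketch of reference-counted block sharing. The 256 limit comes from the slide. Everything else is hypothetical: the `BlockStore` class, its in-memory dictionaries, and the choice to start a fresh copy once a block reaches the limit are assumptions for illustration and are not taken from WAFL.

```python
MAX_REFS = 256  # from the slide: a single data block can be referenced 256 times


class BlockStore:
    """Toy content-addressed store with a per-block reference limit."""

    def __init__(self):
        self.blocks = {}      # block id -> bytes
        self.ref_counts = {}  # block id -> number of references
        self.by_content = {}  # bytes -> id of the newest copy with that content
        self.next_id = 0

    def write(self, data: bytes) -> int:
        """Store data, sharing an existing block when its limit allows it."""
        block_id = self.by_content.get(data)
        if block_id is not None and self.ref_counts[block_id] < MAX_REFS:
            self.ref_counts[block_id] += 1
            return block_id
        # Assumption: once the limit is reached, a new physical copy is started.
        block_id = self.next_id
        self.next_id += 1
        self.blocks[block_id] = data
        self.ref_counts[block_id] = 1
        self.by_content[data] = block_id
        return block_id
```

Writing the same block 257 times in this sketch yields two physical blocks: the first holds 256 references and the second holds one.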
NetApp Deduplication
– FAS6040, FAS6080
– FAS3070, FAS3040, FAS3020
– FAS2050, FAS2020
New in ONTAP 7.3: NetApp Deduplication for V-Series
– All V-Series
A-SIS Deduplication Upcoming Features
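Deduplication for FAS
Advanced Single Instance Storage (A-SIS)
– Block-level duplicate identification
Volume-level operation
Supports any protocol
– CIFS/NFS, FCP/iSCSI, FTP, HTTP, NDMP
Application transparent
– Content agnostic
Minimal overhead
– Write overhead <10%
– Read overhead 0%
– Capacity overhead 1-3%
Available free of charge from Data ONTAP 7.2.4 onward on any FAS or R200 storage system with a NearStore license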
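[Diagram legend: Original Data, Deduplicated Data, New Data, Actual Storage Consumed]
Time-based duplicate scan
– Removes duplicates across multiple backup copies
– The space savings ratio improves over time
– Run the Deduplication scan at the end of each backup
– Apparent space savings ratio: 20:1 or more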
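Why the ratio improves over time can be shown with a short calculation. The sketch below assumes a simplified model that is not in the slides: every backup is a full copy of a data set of constant size, and a fixed fraction of blocks (`change_rate`) is new in each backup after the first. The function name `dedupe_ratio` is hypothetical.

```python
def dedupe_ratio(num_backups: int, change_rate: float) -> float:
    """Logical-to-physical ratio after num_backups full backups.

    Sizes are measured in units of one full backup.
    """
    logical = float(num_backups)
    # The first backup is stored whole; each later one adds only its changed blocks.
    physical = 1.0 + (num_backups - 1) * change_rate
    return logical / physical
```

Under this model the ratio climbs with each retained backup and approaches 1 / change_rate, so ratios near 20:1 need both many retained backups and a low change rate.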
[Diagram stage shown: Checking (SIS Check)]
Deduplication: "sis status" progress information and stages
Filer> sis status
Gathering
Path        State    Status  Progress
/vol/vol5   Enabled  Active  25 MB Scanned
Sorting
Path        State    Status  Progress
/vol/vol5   Enabled  Active  25 MB Searched
Deduplicating
Path        State    Status  Progress
/vol/vol5   Enabled  Active  40MB (20%) done
Verifying
Path: /vol/vol5
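For scripting around these stages, the progress rows can be split into fields. The Python sketch below assumes the four-column layout shown on this slide (Path, State, Status, Progress) and that paths start with `/vol/`; real output may differ between Data ONTAP releases. `SisStatus` and `parse_sis_status` are hypothetical names.

```python
from typing import List, NamedTuple


class SisStatus(NamedTuple):
    path: str
    state: str
    status: str
    progress: str


def parse_sis_status(output: str) -> List[SisStatus]:
    """Parse rows such as '/vol/vol5 Enabled Active 25 MB Scanned'."""
    rows = []
    for line in output.splitlines():
        # Progress may contain spaces, so split off only the first three fields.
        fields = line.strip().split(None, 3)
        # Skip the header row and anything that is not a volume path.
        if len(fields) == 4 and fields[0].startswith("/vol/"):
            rows.append(SisStatus(*fields))
    return rows
```

The free-text progress field is kept whole so that callers can decide how to interpret values such as "25 MB Scanned" or "40MB (20%) done".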
Check status
– sis status [-l] <vol>
Check the space saved!
– df -s <vol>
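The saved-space percentage can be checked by hand from the used and saved figures. The sketch below assumes the percentage is computed as saved / (used + saved), which is how the `%saved` column of `df -s` is commonly described; treat that formula and the function name `percent_saved` as assumptions to verify against your release.

```python
def percent_saved(used_kb: int, saved_kb: int) -> float:
    """Share of the logical data that deduplication removed, in percent."""
    total = used_kb + saved_kb
    if total == 0:
        return 0.0  # empty volume: nothing stored, nothing saved
    return 100.0 * saved_kb / total
```

A volume with 700 KB used and 300 KB saved would report 30% under this formula, matching the average savings quoted elsewhere in the deck only by coincidence of the example numbers.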
A-SIS Deduplication: Space Savings
A-SIS Deduplication: Commands
Activate the license
– license add <a_sis>
Enable deduplication
– sis on <vol>
Deduplicate existing data
– sis start -s <vol>
Schedule when to deduplicate, or run it manually
– sis config [-s schedule] <vol>
– sis start <vol>
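The order in which these commands are typically issued can be captured in a small helper. The Python sketch below only builds the command strings listed on this slide and on the status slide; it does not run them. The function name `dedupe_command_plan` and its parameters are hypothetical, the schedule is passed through as an opaque string, and the license step is left out because it needs a site-specific key.

```python
from typing import List, Optional


def dedupe_command_plan(volume: str,
                        schedule: Optional[str] = None,
                        existing_data: bool = False) -> List[str]:
    """Return the console commands, in order, for one volume."""
    commands = [f"sis on {volume}"]
    if schedule is not None:
        # The schedule syntax is not shown in the slides; it is passed through unchanged.
        commands.append(f"sis config -s {schedule} {volume}")
    # -s scans data that was already on the volume before deduplication was enabled.
    commands.append(f"sis start -s {volume}" if existing_data else f"sis start {volume}")
    commands.append(f"sis status -l {volume}")
    commands.append(f"df -s {volume}")
    return commands
```

Calling `dedupe_command_plan("/vol/vol5", existing_data=True)` lists the enable, initial scan, status, and space-check commands for that volume.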