Automated O&M Management Solution: White Paper


IT Data Center O&M Services White Paper

IT Services White Paper
Dr. Peng Telecom & Media Group Co., Ltd.
November 2013

Contents
Chapter 1 O&M Service Overview
  1.
  2.
  3.
Chapter 2 Monitoring and Inspection Services
  1. Real-time monitoring
  2. Routine monitoring
Chapter 3 Server O&M Management Services
  1. Server health checks
  2. Routine server maintenance
  3. Server configuration management
  4. Server performance management
Chapter 4 Network O&M Management Services
  1. Network topology planning and optimization
  2. Network device installation, configuration, and commissioning
  3. High-availability configuration and maintenance of network devices
  4. Network device performance management
Chapter 5 Storage O&M Management Services
  1. Storage device installation, configuration, and commissioning
  2. Storage capacity management
  3. Storage performance management
Chapter 6 Database Management Services
  1. Database installation, configuration, and commissioning
  2. Database performance management
  3. Database capacity management
  4. Database backup and recovery management
Chapter 7 Security Management
  1. Server security management
  2. Network security management
Chapter 8 Management Policies and Processes
  1. Service support
  2. Service delivery
Chapter 9 Emergency Management
  1. Emergency plan development and maintenance
  2. Emergency drills

Chapter 1 O&M Service Overview
  1.
  2.
  3.
Chapter 2 Monitoring and Inspection Services
  1. Real-time monitoring
  2. Routine monitoring
Chapter 3 Server O&M Management Services
  1. Server health checks
To improve system availability, it is essential to eliminate faults before they occur.

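Complete Automated O&M Solution

1. Introduction
This document provides a complete automated O&M solution that helps organizations achieve efficient and reliable O&M management.

The solution covers automation tool selection, the implementation process, and monitoring and alerting.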

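2. Automation Tool Selection
Consider the following key factors when selecting automation tools:
- Functional coverage: the tool should cover the required O&M tasks, such as configuration management, deployment management, and orchestration.
- Extensibility: the tool should support flexible extension and customization to fit the organization's specific needs.
- Community support: prefer open-source tools with an active community, so that patches, fixes, and new features arrive promptly.
- Reliability and stability: prefer tools that are widely used and proven over time, to reduce risk.

Based on these considerations, we recommend the following tools:
- Configuration management: Ansible
- Deployment management: Kubernetes
- Monitoring and alerting: Prometheus

3. Implementation Process
The following basic process helps the automated O&M solution roll out smoothly:
1. Environment preparation: build the infrastructure the automation platform needs, such as servers and networks.
2. Tool installation and configuration: install and configure the selected tools, and confirm that they are compatible and consistent with the target systems.
3. Resource definition and management: define and manage the required resources and configuration data, including servers, applications, and networks.
4. Alert setup: configure the monitoring and alerting system, and set suitable alert rules for key metrics and events.
5. Testing and verification: test and verify the automated workflows to confirm that they run correctly and behave as expected.
6. Continuous optimization: review and tune the solution regularly so that it keeps pace with system and business changes.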

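4. Monitoring and Alerting
Monitoring and alerting is a critical part of an automated O&M solution. Key points:
- Monitoring metrics: define key performance indicators and events, such as CPU usage, memory utilization, and service outages.
- Real-time monitoring: make sure the monitoring system can observe system state and performance data in real time.
- Alert notification: configure alert rules and set up timely notification channels, such as email, SMS, and Slack.
- Alert handling: define and follow an alert handling process that covers issue tracking, troubleshooting, and repair.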
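The alert rules described above can be sketched as a small threshold check. This is a minimal illustration rather than how Prometheus or any specific tool evaluates rules; the metric names, thresholds, and channel names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    """One alert rule: fire when a metric exceeds its threshold."""
    metric: str       # hypothetical metric name, e.g. "cpu_percent"
    threshold: float  # alert when the sampled value is strictly greater
    channel: str      # notification channel: "email", "sms", "slack", ...


def evaluate(rules, sample):
    """Return (channel, message) pairs for every rule the sample violates.

    Metrics missing from the sample are skipped rather than treated as zero.
    """
    notifications = []
    for rule in rules:
        value = sample.get(rule.metric)
        if value is not None and value > rule.threshold:
            message = f"{rule.metric}={value} exceeds {rule.threshold}"
            notifications.append((rule.channel, message))
    return notifications


# Hypothetical rules for the CPU and memory metrics mentioned in the text.
RULES = [
    AlertRule("cpu_percent", 90.0, "email"),
    AlertRule("memory_percent", 85.0, "slack"),
]
```

Calling `evaluate(RULES, sample)` with one sample of collected metrics yields the notifications to send; delivery itself (mail, SMS, chat) is left out of the sketch.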

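5. Conclusion
This document has presented a complete automated O&M solution covering automation tool selection, the implementation process, and monitoring and alerting.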

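Automated O&M Implementation Plan

As software and systems grow more complex, traditional manual O&M can no longer meet enterprise needs.

To raise O&M efficiency and lower O&M cost, enterprises increasingly turn to automated O&M.

The following is an example implementation plan covering automated monitoring, automated deployment, and automated fault handling.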

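1. Automated Monitoring
Automated monitoring is the foundation of automated O&M. It observes system state and performance metrics in real time.

A monitoring plan should cover the following:
- Define key performance indicators and thresholds, such as system load, network traffic, and disk space, and trigger an alert when a metric exceeds its threshold.
- Use monitoring tools to monitor servers, network devices, and applications automatically, collect metric data, and display it on dashboards.
- Build a centralized log management platform that integrates log collection, storage, and analysis, so that problems can be diagnosed quickly.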

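2. Automated Deployment
Automated deployment relies on tools and scripts, and it greatly reduces the time and errors of manual operation.

A deployment plan should include the following steps:
- Use a version control tool (such as Git) to manage code versions, so that every deployment is traceable.
- Write deployment scripts that automate the deployment process, including installing dependencies, creating configuration files, compiling code, and releasing to production.
- Use virtualization or container technology (such as Docker) to standardize the deployment environment, and use a container orchestration tool (such as Kubernetes) for release and scale-out.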
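The deployment-script step above can be sketched as an ordered pipeline that stops at the first failure, so a broken build never reaches production. The step names and placeholder functions are hypothetical; a real script would run package installs, template rendering, and build commands in their place.

```python
def run_pipeline(steps):
    """Run (name, callable) steps in order and stop at the first failure.

    Returns the names of the completed steps and the name of the step
    that failed (None when every step succeeded).
    """
    completed = []
    for name, action in steps:
        try:
            action()
        except Exception:
            return completed, name
        completed.append(name)
    return completed, None


def _ok():
    """Placeholder for a real step such as installing dependencies."""


def _fail():
    """Placeholder for a step that breaks, e.g. a compile error."""
    raise RuntimeError("build failed")


# Hypothetical step list mirroring the stages named in the text.
STEPS = [
    ("install_dependencies", _ok),
    ("render_config", _ok),
    ("build", _fail),
    ("release", _ok),
]
```

Stopping on the first error is a deliberate choice here: the release step is never attempted after a failed build, and the returned step name tells the operator where to look.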

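3. Automated Fault Handling
Automated fault handling diagnoses and repairs faults quickly, which improves system availability and stability.

A fault handling plan should include the following steps:
- Automate the response to monitoring alerts: when a metric exceeds its threshold, trigger a fault handling program that diagnoses and handles the fault automatically.
- Write self-healing scripts that repair faults automatically, for example by restarting a service, adjusting configuration parameters, or putting a node into maintenance mode.
- Build a self-healing system that discovers, investigates, and repairs faults automatically, and records the handling process for later analysis.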
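The self-healing flow above can be sketched as a lookup from alert type to remediation, with every outcome recorded for later analysis. This is an illustrative sketch only; the alert types, handler names, and the shape of the audit log are assumptions, and the handlers return a description instead of touching a real system.

```python
def restart_service(alert):
    """Placeholder: a real handler would call the service manager here."""
    return f"restart {alert['target']}"


def enter_maintenance(alert):
    """Placeholder: a real handler would drain traffic from the node."""
    return f"maintenance mode on {alert['target']}"


# Hypothetical mapping from alert type to remediation handler.
HANDLERS = {
    "service_down": restart_service,
    "disk_full": enter_maintenance,
}


def self_heal(alert, handlers, audit_log):
    """Run the remediation for an alert and record what was done.

    Alerts without a known handler are recorded as escalated, so that a
    person can pick them up. Returns True when a handler ran.
    """
    handler = handlers.get(alert["type"])
    if handler is None:
        audit_log.append((alert["type"], alert["target"], "escalated to on-call"))
        return False
    action = handler(alert)
    audit_log.append((alert["type"], alert["target"], action))
    return True
```

Keeping the audit log separate from the handlers means the handling record exists even when no automatic repair is available.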

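With this implementation plan, an enterprise can substantially raise O&M efficiency and reduce O&M labor cost and error rates.

Automated O&M also monitors system state in real time and finds and resolves problems promptly, which improves system stability and availability.

For these reasons, automated O&M has become an important way for enterprises to strengthen their core competitiveness.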

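CAICT Enterprise IT O&M Development White Paper

I. Overview
In recent years, as information technology has continued to develop and spread, IT system O&M has become increasingly important to enterprises.

As the foundation and support of enterprise informatization, IT O&M plays a vital role in keeping the business running stably and growing.

The China Academy of Information and Communications Technology (CAICT), a leading domestic research institution for communications and information technology, has studied the development of enterprise IT O&M in depth and written this white paper to offer enterprises reference and guidance.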

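II. Current State of Enterprise IT O&M
1. The importance of enterprise IT O&M
IT O&M is a key part of enterprise informatization and determines how stably and efficiently the enterprise runs as a whole.

Sound IT O&M keeps business systems running normally, keeps data secure and reliable, and ensures that faults are handled promptly, which gives strong support to the enterprise's development.

2. Problems in enterprise IT O&M
Although enterprises pay more and more attention to IT O&M, problems remain in practice.

Examples include insufficient staff skills, workflows that are not standardized, and disorderly management of devices and systems.

These problems seriously reduce the efficiency and quality of enterprise IT O&M.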

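III. CAICT Recommendations for Enterprise IT O&M
1. Improve staff skills
CAICT recommends that enterprises invest more in training and learning for IT O&M staff to raise their technical skills and service awareness.

Only a highly skilled IT O&M team can protect the enterprise's IT systems effectively.

2. Standardize IT O&M processes
Standardized O&M processes are the basis for keeping IT systems running normally.

CAICT encourages enterprises to establish a complete IT O&M management system that defines the responsibilities and procedures for each task, so that O&M work proceeds in an orderly way.

3. Deploy advanced O&M tools
Suitable tools and systems are critical to the efficiency and quality of IT O&M.

CAICT recommends that enterprises actively adopt advanced O&M tools to strengthen system monitoring, fault analysis, and fault handling.

4. Strengthen device and system management
Devices and systems are the foundation of IT O&M, and managing them well improves the stability and reliability of IT systems.

CAICT recommends that enterprises strengthen device and system management and carry out regular inspection and maintenance to keep them running normally.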

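IV. Closing Remarks
Developing enterprise IT O&M is a systematic undertaking that requires comprehensive planning and effective measures.

CAICT will continue to study enterprise IT O&M issues in depth and provide enterprises with further guidance and support.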

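Huawei Edge OTN Solution Technical White Paper V1.1

Edge OTN Solution Technical White Paper
Document version: V1.1
Release date: 2021-03-20
Huawei Technologies Co., Ltd.

Copyright © Huawei Technologies Co., Ltd. 2021. All rights reserved.

Without the prior written permission of Huawei, no organization or individual may excerpt or copy any part or all of this document, or distribute it in any form.

Trademark notice
The Huawei logo and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd.

All other trademarks or registered trademarks mentioned in this document are the property of their respective owners.

Notice
The products, services, and features you purchase are subject to the commercial contract and terms of Huawei. All or part of the products, services, or features described in this document may not be within the scope of your purchase or use.

Unless otherwise agreed in the contract, Huawei makes no express or implied statement or warranty regarding the content of this document.

This document is updated from time to time because of product version upgrades or other reasons.

Unless otherwise agreed, this document serves only as a usage guide, and none of its statements, information, or recommendations constitutes an express or implied warranty.

Huawei Technologies Co., Ltd.
Address: Huawei Headquarters Office Building, Bantian, Longgang District, Shenzhen
Postal code: 518129
Website: https://
Customer service email: ******************
Customer service phone: 4008302118

Contents
1 Trends and Challenges of FMEC Network Convergence
  1.1 Rapid growth in demand for premium services
  1.2 Converged services become the trend
  1.3 Challenges in building FMEC networks
  1.4 Summary
2 Edge OTN Is the Best Choice for Converged FMEC Network Construction
  2.1 Edge OTN architecture
  2.2 Precise network deployment based on value areas
  2.3 Summary
3 Edge OTN Key Technologies
  3.1 Enhanced environmental adaptability
  3.2 Hybrid gray and colored light transmission
  3.3 Liquid OTN technology
  3.4 High-precision time synchronization
4 Huawei Edge OTN Solution
  4.1 Precise planning tools
  4.2 All-scenario deployment capability
  4.3 Innovative optical-layer and electrical-layer solutions
    4.3.1 Simplified optical layer
    4.3.2 X+Y distributed electrical layer
    4.3.3 Innovative line rates
    4.3.4 Typical smooth evolution solutions
  4.4 Intelligent O&M
    4.4.1 NCE intelligent management and control
    4.4.2 Automatic optical-layer commissioning
    4.4.3 Intelligent fiber management
    4.4.4 Intelligent optical performance management
5 Summary
A Acronyms and Abbreviations

1 Trends and Challenges of FMEC Network Convergence
1.1 Rapid growth in demand for premium services
Broadband has become an essential basic resource for production and daily life.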

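Enterprise AIOps Solution White Paper

Contents
Background
Organizers
Authors
  Initiators
  Advisors
  Reviewers
  Core authors of this edition
1. Overview
2. AIOps Goals
3. AIOps Capability Framework
4. AIOps Platform Capabilities
5. AIOps Team Roles
  5.1 O&M engineer
  5.2 O&M development engineer
  5.3 O&M AI engineer
6. Common AIOps Application Scenarios
  6.1 Efficiency improvement
    6.1.1 Intelligent change
    6.1.2 Intelligent Q&A
    6.1.3 Intelligent decision-making
    6.1.4 Capacity forecasting
  6.2 Quality assurance
    6.2.1 Anomaly detection
    6.2.2 Fault diagnosis
    6.2.3 Fault prediction
    6.2.4 Fault self-healing
  6.3 Cost management
    6.3.1 Cost optimization
    6.3.2 Resource optimization
    6.3.3 Capacity planning
    6.3.4 Performance optimization
7. AIOps Implementation and Key Technologies
  7.1 Data collection
  7.2 Data processing
  7.3 Data storage
  7.4 Offline and online computing
  7.5 Algorithms for AIOps
Notes
Appendix: Cases
  Case 1: A technical approach to anomaly detection on massive time series
    1. Case statement
    2. Common problems and solutions in anomaly detection on massive time series
    3. Summary
  Case 2: Root-cause alert analysis in financial scenarios
    1. Case overview
    2. Root-cause alert analysis workflow
    3. Root-cause alert analysis methods
    4. Summary
  Case 3: Self-healing for single-data-center faults
    1. Case overview
    2. Loss-mitigation workflow for single-data-center faults
    3. Common problems and solutions in single-data-center fault self-healing
    4. Architecture of single-data-center fault self-healing
    5. Summary

Background
AIOps, or intelligent O&M, applies machine learning to existing O&M data (logs, monitoring information, application information, and so on) to solve problems that automated O&M has not been able to solve. Its goals are to improve the system's predictive capability and stability, reduce IT cost, and make the enterprise's products more competitive.

Gartner introduced the AIOps concept in 2016 and predicted that AIOps adoption would reach 50% by 2020.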

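O&M White Paper

An O&M white paper is a document that describes O&M information and strategy in detail. It is intended to help an organization or enterprise implement efficient O&M management and support.

The main contents of an O&M white paper are as follows:

1. Introduction: this part introduces the definition and goals of O&M management.
We explain why O&M is so important to business continuity and stability, and list the benefits that O&M optimization can bring.

2. Team and responsibilities: this part covers the team structure and organization, and defines the duties and responsibilities of each role.
We describe O&M team members at different levels, from administrators to engineers, and the tasks each one carries out.

3. Processes and policies: this part describes the processes and policies the O&M team must follow.
We refer to commonly used ITIL (Information Technology Infrastructure Library) processes, such as change management, problem management, and release management.
We also introduce key policies such as the emergency response plan and the backup and recovery policy.

4. Tools and technologies: this part covers the tools and technologies O&M requires.
We introduce monitoring tools, automation tools, and fault diagnosis tools, and explain how they help the O&M team manage and support systems and applications.

5. Security and compliance: security and compliance are critical to O&M.
This part discusses the security best practices and compliance standards the O&M team should follow.
We cover key areas such as access control, authentication, and data protection.

6. Continuous improvement: the O&M team must keep improving and innovating to adapt to new technologies and business needs.
This part describes continuous improvement methods and tools, such as Kaizen and the PDCA (Plan-Do-Check-Act) cycle.

7. Results and measurement: finally, we explain how to measure and evaluate the performance of the O&M team.
We discuss key performance indicators (KPIs) and reporting mechanisms.

Guided by an O&M white paper, organizations and enterprises can build a sound O&M management framework, improve efficiency, reduce risk, and deliver more stable services.

Such a white paper helps the O&M team organize and manage its work, and it also gives other teams and stakeholders clear guidance and understanding.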

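IT O&M Management Solutions

Introduction
As information technology continues to develop, enterprises demand more and more from IT O&M management.

An effective IT O&M management solution helps an enterprise improve O&M efficiency, reduce cost, and keep systems stable.

This article introduces several common IT O&M management solutions to help enterprises choose the one that fits their needs.

1. Automated O&M Management Solutions
1.1 Automated O&M tools: these tools execute tasks automatically, which reduces manual intervention and improves efficiency.

1.2 Automated monitoring systems: these systems monitor the running state of systems in real time, so that problems are found and resolved promptly.

1.3 Automated configuration management: this keeps system configuration consistent and lowers the risk of configuration errors.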
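The consistency check behind automated configuration management (1.3) can be sketched as a comparison between the desired and the actual settings of a host. This is a minimal illustration with hypothetical setting names; real tools such as Ansible also apply the changes, which this sketch does not attempt.

```python
def find_drift(desired, actual):
    """Compare desired and actual settings.

    Returns {key: (desired_value, actual_value)} for every key whose values
    differ. A key missing on one side shows up with None on that side.
    """
    drift = {}
    for key in sorted(set(desired) | set(actual)):
        want, have = desired.get(key), actual.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


# Hypothetical baseline and a host that has drifted from it.
DESIRED = {"ntp_server": "10.0.0.1", "ssh_port": 22}
ACTUAL = {"ntp_server": "10.0.0.1", "ssh_port": 2222, "telnet": "on"}
```

An empty result means the host matches its baseline; anything else is a list of settings to correct or to review.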

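2. Cloud O&M Management Solutions
2.1 Cloud monitoring services: these services help an enterprise monitor how cloud resources are used and adjust resource allocation in time.

2.2 Automatic scaling services: these services scale cloud resources automatically according to demand, which improves system elasticity and stability.

2.3 Cloud security management: this protects data in the cloud and guards against data leaks and attacks.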
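The automatic scaling service in 2.2 can be sketched as a proportional rule: scale the instance count by the ratio of observed to target utilization, then clamp it to configured limits. This is an assumption made for illustration, not the algorithm of any particular cloud provider; the parameter names and default limits are hypothetical.

```python
import math


def desired_instances(current, utilization_pct, target_pct, minimum=1, maximum=10):
    """Return the instance count that would bring utilization near the target.

    utilization_pct and target_pct are whole-number percentages; the result
    is rounded up and clamped to the range [minimum, maximum].
    """
    if target_pct <= 0:
        raise ValueError("target_pct must be positive")
    raw = math.ceil(current * utilization_pct / target_pct)
    return max(minimum, min(maximum, raw))
```

For example, four instances running at 90% against a 60% target give `desired_instances(4, 90, 60)`, which evaluates to 6. Rounding up favors spare capacity over saturation, and the clamp keeps a misreported metric from scaling the service to zero or without bound.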

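3. Container O&M Management Solutions
3.1 Container orchestration tools: these tools help an enterprise manage container clusters and deploy and schedule containers automatically.

3.2 Container monitoring systems: these systems monitor the running state of containers, so that problems are found and resolved promptly.

3.3 Container security management: this keeps the container environment secure and prevents containers from being attacked or abused.

4. DevOps O&M Management Solutions
4.1 Automated deployment tools: these tools enable continuous integration and continuous deployment, which shortens the software release cycle.

4.2 Automated testing tools: these tools help an enterprise run automated tests and improve software quality.

4.3 Team collaboration tools: a DevOps solution also includes collaboration tools that help O&M team members work together efficiently.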

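5. Intelligent O&M Management Solutions
5.1 AI-based monitoring systems: these systems use machine learning algorithms to predict and diagnose faults automatically.

5.2 Intelligent analysis tools: these tools help an enterprise analyze O&M data, uncover potential problems, and propose solutions.

5.3 Intelligent O&M platforms: these platforms integrate a range of intelligent tools and provide an all-round intelligent O&M management solution.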
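As a rough illustration of the automated fault detection mentioned in 5.1, the sketch below flags metric values that deviate strongly from the preceding window. A rolling z-score is a simple statistical stand-in, swapped in here for the machine learning algorithms the text refers to; the window size and limit are hypothetical defaults.

```python
import statistics


def find_anomalies(values, window=5, z_limit=3.0):
    """Return indexes whose value deviates strongly from the prior window.

    Each value is compared with the mean and population standard deviation
    of the `window` values before it. When the window is perfectly flat,
    any different value counts as an anomaly.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mean = statistics.fmean(history)
        spread = statistics.pstdev(history)
        if spread == 0:
            if values[i] != mean:
                anomalies.append(i)
        elif abs(values[i] - mean) / spread > z_limit:
            anomalies.append(i)
    return anomalies
```

A spike also inflates the spread of the windows that follow it, so the values right after a spike are judged more leniently. That is one reason production systems tend to use more robust models than this sketch.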

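Automated O&M Solutions

Introduction
As software systems grow more complex, traditional manual O&M can no longer meet the need for large-scale, efficient operations.

Automated O&M solutions emerged in response. They use automation tools and techniques to execute O&M tasks automatically, which improves system stability and O&M efficiency.

This article introduces the concept and principles of automated O&M and several common automated O&M solutions.

Overview of Automated O&M
Automated O&M means using automation tools and techniques to execute O&M tasks automatically.

Writing scripts, using configuration management tools, and automating deployment reduce manual intervention and improve O&M efficiency and system stability.

The main goals of automated O&M are:
1. Lower the cost of manual operation: automation reduces the time and workload of manual operation and so lowers the cost of manual O&M.
2. Improve O&M efficiency: automation tools execute O&M tasks quickly and accurately.
3. Strengthen system stability: automation reduces human error, lowers the probability of system faults, and improves stability.

Principles of Automated O&M
Automated O&M should follow several principles to deliver its intended effect and benefit:
1. Plan and design a sound automation strategy: before automating, plan and design thoroughly, define the goals and scope of automation, and choose suitable tools and techniques, so that automation actually delivers value.
2. Guarantee the correctness of O&M tasks: automated O&M performs important operations on systems, so the tasks must be correct and must not cause system faults.
Test and verify automation scripts thoroughly to confirm that they are correct and stable.
3. Optimize and improve continuously: automated O&M is an ongoing process. Keep refining the strategy and tools according to actual conditions and needs to raise efficiency and effectiveness.

Common Automated O&M Solutions
1. Configuration management tools
A configuration management tool manages system configuration and components, and automates the deployment, update, and management of configuration.

Common configuration management tools include Ansible, Puppet, and Chef.

These tools offer rich functionality and flexible ways to express configuration, which makes automated configuration management possible.

2. Automated deployment tools
An automated deployment tool deploys applications and systems automatically, and carries out deployments and updates quickly and accurately.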

Unified IT O&M Management Platform (BMC) Solution Technical White Paper v4.3

BMC Unified IT O&M Management Platform Solution Technical White Paper
BMC Software (China) Co., Ltd. (博思软件(中国)有限公司)
January 2010

Document notice
The text, charts, and other content in this document are for internal use by BMC Software (China) Co., Ltd. and the recipient only. Do not distribute it to third parties without the written permission of both parties.

Document properties
Project name:
Document subject: Technical white paper
Document number:
Document version: 4.1
Version date: 2010.1.10
Document status:
Author:

Document changes
2.0  2007.9.15
3.0  2009.6.6
4.0  2009.12.29  陈傲寒
4.3  2010.1.17

Document distribution

Contents
1 Solution architecture
  1.1 Logical structure of the solution
  1.2 CMS/CMDB configuration management system
  1.3 Centralized monitoring platform
    1.3.1 Data collection layer
    1.3.2 Data processing layer
  1.4 Automation management platform
  1.5 Process management platform
  1.6 Data presentation platform
  1.7 Corresponding BMC products
    1.7.1 CMS/CMDB configuration management system
    1.7.2 Centralized monitoring platform
    1.7.3 Automation management platform

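H3C SeerEngine-DC Underlay Automated O&M Technical White Paper V1.0

Copyright © 2020 New H3C Technologies Co., Ltd. All rights reserved.

Without the prior written permission of the company, no organization or individual may excerpt or copy any part or all of this document, or distribute it in any form.

Except for the trademarks of New H3C Technologies Co., Ltd., all trademarks, product logos, and product names of other companies in this manual are the property of their respective owners.

The information in this document is subject to change without notice.

Contents
1 Overview
  1.1 Background
  1.2 Technical advantages
2 Automated Underlay Deployment
  2.1 Concepts
    2.1.1 Network tiers
    2.1.2 Fabric
    2.1.3 Device roles
    2.1.4 Device types
    2.1.5 Automation templates
    2.1.6 Device configuration templates
    2.1.7 Device list
    2.1.8 Whitelist
    2.1.9 Fine-grained configuration
    2.1.10 TFTP service
    2.1.11 DHCP server
    2.1.12 Software version library
    2.1.13 Address pool for automated deployment
    2.1.14 Management network address pool
  2.2 Operating mechanism
    2.2.1 Configuration workflow
    2.2.2 Running workflow
3 Device Maintenance
  3.1 Device software upgrade
    3.1.1 Version library management
    3.1.2 Upgrade workflow
  3.2 Device backup and replacement
    3.2.1 Device backup
    3.2.2 Device replacement
4 Typical Networking Applications
  4.1 Pre-configuration for automated deployment

1 Overview
1.1 Background
The IT infrastructure made up of cloud, network, and terminals is undergoing major technological change. Traditional terminals are becoming intelligent and mobile, and traditional IT architectures are migrating to the cloud to provide elastic scaling and on-demand delivery of computing resources.

In this environment, the traditional siloed data center architecture, in which each application has its own architecture, scales poorly and generalizes poorly. It no longer meets the deployment requirements of cloud services.

The existing IT infrastructure must be transformed. Breaking down the boundaries between the network platform, the cloud management platform, and the terminal platform turns the whole IT system into a converged architecture that can carry all applications.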

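Digital O&M White Paper

The digital O&M white paper introduces the concept, applications, and practice of digital O&M.

Its core points are as follows:
1. The concept of digital O&M: digital O&M is the process of managing O&M with digital technology, and it covers automation, intelligence, and fine-grained management.

2. Applications of digital O&M: digital O&M is widely applied in many fields, such as IT O&M, intelligent manufacturing, and smart cities.

It helps enterprises improve efficiency, reduce cost, and raise service quality, and it is an important part of digital transformation.

3. The practice of digital O&M: practice must fit the enterprise's actual situation. It includes setting a digital O&M strategy, building a digital O&M team, and improving the digital O&M system.

Enterprises need to keep exploring and practicing, and gradually refine their digital O&M experience and methods.

In short, digital O&M is an important trend of the digital era. Enterprises should explore and practice it actively and keep raising their digital O&M capability.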

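AFCI White Paper

According to this article, the AFCI white paper is a technical white paper published by the Alibaba Cloud Intelligent Computing Alliance (AFCI). It describes Alibaba Cloud's technical philosophy, product architecture, and solutions in intelligent computing.

Following the content of the AFCI white paper, this article shares related reference material to explore the development trends and application scenarios of intelligent computing.

First, the AFCI white paper stresses the importance and prospects of edge computing.

Edge computing moves data processing and analysis from the cloud to edge devices close to the user, which improves computing efficiency and reduces latency.

The rise of edge computing owes much to the spread of the Internet of Things and to more capable edge devices. It is now widely used in fields such as industrial automation and intelligent transportation.

On edge computing, see the IDC report 《边缘计算技术发展与应用研究报告》 (Research Report on the Development and Application of Edge Computing Technology), which describes the technical architecture, application scenarios, and market prospects of edge computing.

Second, the AFCI white paper discusses the convergence of artificial intelligence and blockchain.

Artificial intelligence and blockchain are two technologies that have drawn wide attention in recent years.

Artificial intelligence advances by improving data processing capability and the intelligence of algorithmic models, while blockchain advances by building secure, trusted, decentralized networks.

Combining the two makes it possible to verify algorithmic models and protect data privacy.

On this convergence, see the book 《人工智能与区块链融合的研究与应用》 (Research and Application of the Convergence of Artificial Intelligence and Blockchain), which examines the topic from the perspectives of theory, algorithms, and applications.

In addition, the AFCI white paper stresses the importance of cloud-native technology.

Cloud native is a methodology for building and running applications on cloud platforms, with the goal of high availability, elasticity, and scalability.

Cloud-native technology includes containers, microservice frameworks, and automated O&M tools.

On cloud-native technology, see the book 《云原生技术与实践》 (Cloud-Native Technology and Practice), which covers the basic concepts, principles, and practical cases of cloud-native technology.

Finally, the AFCI white paper mentions AI-based Internet of Things technology.

The Internet of Things (IoT) connects sensors, devices, and other physical entities over the Internet to form a network in which devices exchange information and are controlled intelligently.

With capabilities such as perception, recognition, and reasoning, AI-based IoT technology supports intelligent sensing of the physical world and intelligent decision-making.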

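Data Center O&M Management Technology White Paper

1. Introduction
The data center is an indispensable part of the modern enterprise. It hosts the enterprise's critical applications, business data, and information systems.

Applying data center O&M management technology effectively improves the stability, availability, and security of the data center, and so safeguards business operations and data security.

This white paper introduces the concepts, principles, and practices of data center O&M management technology to help enterprises understand and apply it.

2. Overview of Data Center O&M Management Technology
Data center O&M management technology refers to the set of operations that monitor, manage, and maintain data center resources effectively by means of management tools and technical measures.

Its core goal is to improve the efficiency, reliability, and security of the data center.

It includes, but is not limited to, the following areas:

2.1 Infrastructure management
Infrastructure management covers the physical equipment of the data center, including equipment room environment monitoring, equipment inspection, rack management, and power management.

Managing the infrastructure effectively improves the stability and availability of the data center.

2.2 Server management
Server management covers the servers of the data center, including server monitoring, performance management, configuration management, and capacity planning.

Configuring and managing server resources sensibly improves resource utilization and performance.

2.3 Network management
Network management covers the network devices of the data center, including network topology management, traffic monitoring, bandwidth management, and security management.

Managing the network effectively improves bandwidth utilization and network security.

2.4 Storage management
Storage management covers the storage devices of the data center, including storage administration, backup and recovery, and storage performance management.

Managing storage devices effectively improves the data center's backup and recovery capability.

3. Principles of Data Center O&M Management Technology
The following principles apply when using data center O&M management technology:

3.1 Automation
Data center O&M management should use automation tools or scripts to monitor and manage data center resources automatically.

This reduces manual intervention and errors, and improves O&M efficiency and reliability.

3.2 Unified management
Data center O&M management should use a unified management platform or tool to manage all types of data center resources, including physical equipment, servers, network devices, and storage devices.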

BMC Unified IT O&M Management Platform Solution Technical White Paper

BMC Software (China) Co., Ltd. (博思软件(中国)有限公司)
January 2010

Document properties
Customer name:
Project name:
Document subject: Technical white paper
Document number:
Document version: 4.1
Version date: 2010.1.10
Document status:
Author:

Document changes
Version  Revision date  Revised by  Description
1.0      2005.3.26
2.0      2007.9.15
3.0      2009.6.6
4.0      2009.12.29     XXXX
4.1      2010.1.10

Document distribution
Organization  Name  Purpose

Contents
1 Solution architecture
  1.1 Logical structure of the solution
  1.2 CMS/CMDB configuration management system
  1.3 Centralized monitoring platform
    1.3.1 Data collection layer
    1.3.2 Data processing layer
  1.4 Automation management platform
  1.5 Process management platform
  1.6 Data presentation platform
  1.7 BMC products corresponding to this solution
    1.7.1 CMS/CMDB configuration management system
    1.7.2 Centralized monitoring platform
    1.7.3 Automation management platform
    1.7.4 Process management platform
2 System components and functions
  2.1 CMS/CMDB configuration management system
    2.1.1 System logical architecture

Huawei Ansible Automation Technology White Paper (January 20, 2018)

Ansible Automation Technology White Paper
Issue 01
Date 2018-01-20

Copyright © Huawei Technologies Co., Ltd. 2018. All rights reserved.
No part of this document may be reproduced or transmitted in any form or by any means without prior written consent of Huawei Technologies Co., Ltd.

Trademarks and Permissions
The Huawei logo and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd.
All other trademarks and trade names mentioned in this document are the property of their respective holders.

Notice
The purchased products, services and features are stipulated by the contract made between Huawei and the customer. All or part of the products, services and features described in this document may not be within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements, information, and recommendations in this document are provided "AS IS" without warranties, guarantees or representations of any kind, either express or implied.
The information in this document is subject to change without notice. Every effort has been made in the preparation of this document to ensure accuracy of the contents, but all statements, information, and recommendations in this document do not constitute a warranty of any kind, express or implied.

Huawei Technologies Co., Ltd.
Address: Huawei Industrial Base, Bantian, Longgang, Shenzhen 518129, People's Republic of China
Website:
Email: ******************
Tel: 4008302118

Contents
1 Background and Pain Points
2 Ansible Solution
  2.1 Introduction to the Solution for Connecting Ansible and CE Switches
  2.2 Using Huawei Ansible Modules
    2.2.1 Installing Ansible
    2.2.2 Using ZTP to Deploy Devices
    2.2.3 Configuring Services in Batches
  2.3 Application Scenarios Recommended by Huawei
    2.3.1 Standard Templates
      2.3.1.1 Template for DCN3.0 Delivery
      2.3.1.2 Template for Manual VXLAN Delivery
      2.3.1.3 Process of Using a Template to Configure Overlay and Underlay Networks
    2.3.2 Using Ansible to Perform O&M
3 Method of Using Ansible
  3.1 Preparing the Environment
    3.1.1 Configuring a CE Switch
    3.1.2 Installing Ansible
  3.2 Configuration Procedure
    3.2.1 Creating the Inventory Hosts File
    3.2.2 Creating a Playbook
    3.2.3 Running a Playbook
4 Application Constraints and Model Requirements
5 Summary

Figures
Figure 1-2 Ansible architecture
Figure 1-3 Ansible architecture expanded with CE plugins and modules
Figure 1-4 Typical application of Ansible
Figure 2-1 Connecting Ansible to CE switches
Figure 2-2 Process of setting up a connection between Ansible and a switch
Figure 2-3 Process of deploying devices using ZTP
Figure 2-4 Batch configuration of services
Figure 2-5 Example yml file
Figure 2-6 Process of using a template
Figure 2-7 Process of using Ansible to perform O&M

Ansible Automation Technology White Paper
Keywords: Ansible, automation
Abstract: Ansible enables automated management of CE switches during data center O&M.

1 Background and Pain Points
The rapid development of data centers creates an urgent need for automated service deployment and O&M, to address challenges such as insufficient O&M manpower, low maintenance efficiency, and long service delivery times.

Currently, zero touch provisioning (ZTP) can only complete initial configuration and does not provide a batch modification function.

Popular automated O&M tools currently include Puppet, Chef, SaltStack, and Ansible. Table 1-1 compares these tools.

Table 1-1 Comparison between automated O&M tools

Unlike its equivalents, Ansible does not require an agent on the devices it connects to. The Ansible version is therefore decoupled from the device software version, which is why Ansible is popular among customers.

Figure 1-2 Ansible architecture

Table 1-2 Description of Ansible modules

The Huawei CloudFabric open ecosystem was developed to eliminate interoperability problems between cloud platforms, management tools, and network devices from different vendors, and thereby improve integrated deployment capability and the maturity of data center network solutions. Huawei has extended the Ansible platform so that CE switches can be managed on it.

Figure 1-3 Ansible architecture expanded with CE plugins and modules

Ansible provides a network automation platform that makes network management as simple, efficient, and plug-in-free as system and application management. The Huawei CloudFabric solution integrates with Ansible to make automated network O&M and management more secure, efficient, and reliable.

In terms of tool positioning, Ansible is a configuration management tool without a "brain".
Figure 1-4 shows a typical application of Ansible.

Figure 1-4 Typical application of Ansible

Service policies can be managed by the administrator, the cloud platform, or the automation framework, depending on the actual application scenario.

2 Ansible Solution

2.1 Introduction to the Solution for Connecting Ansible and CE Switches

Figure 2-1 Connecting Ansible to CE switches (POD1 to PODn; CLI commands shown in the figure:)
    1. bridge-domain bd-id
    2. vxlan vni vni-id
    3. interface nve nve-number
    4. source ip-address
    5. vni vni-id head-end peer-list ip-address &<1-10>

Huawei provides multiple types of service modules that function as APIs for playbooks to invoke. These modules further encapsulate the CLI or NETCONF interfaces for typical services (for example, one module encapsulates the six CLI commands for configuring a basic VXLAN Layer 2 gateway into an API), which reduces the development workload. For the modules that are provided by Huawei, visit https:///HuaweiSwitch/CloudEngine-Ansible/tree/master/library.
Table 2-1 provides the features and functions for which modules are provided.

Table 2-1 Features and functions for which modules are provided

NOTE: The features listed in Table 2-1 are used to complete basic and common functions, and not all parameters can be delivered through Ansible.

Customers can invoke several modules in playbooks to deploy services, such as service A in Figure 2-1.

If existing modules cannot meet customers' requirements, the customers can invoke the modules that provide CLI or NETCONF interfaces in playbooks to develop their own playbooks or modules, such as service B in Figure 2-1.

The following section describes how to use Huawei Ansible modules.

2.2 Using Huawei Ansible Modules

If a switch has been installed on the network, you can use the management IP address to remotely log in to the switch, without the need to use ZTP.

If a switch has just been delivered to the site and has not been installed, you need to use Ansible to automatically deploy services by following the procedure provided in this section.

Figure 2-2 shows the process of setting up a connection between Ansible and a switch.

Figure 2-2 Process of setting up a connection between Ansible and a switch

2.2.1 Installing Ansible

Huawei Ansible modules are released on GitHub at https:///HuaweiSwitch/CloudEngine-Ansible.

For details about the installation procedure, see the install.sh file.

2.2.2 Using ZTP to Deploy Devices

After Ansible is installed, it can manage a device only when the device IP address is reachable.

If a large number of devices need to be deployed and configured, ZTP is the preferred tool because configuration through the serial port is inefficient. Figure 2-3 shows the process of deploying devices using ZTP.

Figure 2-3 Process of deploying devices using ZTP (example device addresses in the figure: 192.168.10.2 and 192.168.10.3)

Steps 1 and 2 are standard ZTP steps, and step 3 is a simplified one, which only involves configuring the management IP address and enabling SSH and NETCONF.
After step 4 is complete, each device has an independent management IP address and the NETCONF and SSH connections are enabled.

2.2.3 Configuring Services in Batches

Figure 2-4 Batch configuration of services (Spine 1 to Spine 4 and Leaf 1 to Leaf 6 are divided into groups; the VNI.yml, QOS.yml, and ACL.yml files take a group as the configuration object)

Divide configuration objects into groups. Table 2-2 provides group information.

Table 2-2 Groups of configuration objects

Compile yml files based on service requirements to invoke the required modules. Figure 2-5 provides an example yml file.

Figure 2-5 Example yml file

After the VNI.yml script is executed, Ansible will automatically create VNI 3 on spine 1 and spine 2.

2.3 Application Scenarios Recommended by Huawei

2.3.1 Standard Templates

Based on the characteristics of Ansible and years of deployment experience, Huawei developed multiple deployment templates to help customers deploy services quickly, which improves the efficiency of initial service deployment.

2.3.1.1 Template for DCN3.0 Delivery

In the DCN3.0 delivery scenario, Ansible is used to configure the underlay network.

2.3.1.2 Template for Manual VXLAN Delivery

If no Agile Controller is deployed, Ansible templates cover the configurations of both underlay and overlay networks.

2.3.1.3 Process of Using a Template to Configure Overlay and Underlay Networks

Figure 2-6 Process of using a template

The process of using a template is as follows:
1. Use tools provided by Huawei to generate device configurations based on the typical scenario.
2. Copy the configurations to Ansible, and run Ansible.
3. Ansible updates configurations for devices.

2.3.2 Using Ansible to Perform O&M

Figure 2-7 Process of using Ansible to perform O&M (collect information, then match it against preconfigured information)

The administrator can set the information of concern, such as the MAC address table size and optical module power, in playbooks provided by Ansible.
The playbooks then periodically collect the information from devices and process it. If an exception is detected, Ansible triggers an email notification.

3 Method of Using Ansible

3.1 Preparing the Environment

Table 3-1 lists the environment version information required for installing Ansible.

Table 3-1 Environment version information

3.1.1 Configuring a CE Switch

Ansible establishes a connection with a CE switch using SSH. Therefore, you need to configure an SSH login user on the CE switch.

The procedure for configuring an SSH user on the CE switch is as follows:

Step 1 Generate a local key pair on the CE switch.
    <HUAWEI> system-view
    [~HUAWEI] rsa local-key-pair create   // Generate the local RSA host and server key pairs.
    The key name will be: HUAWEI
    The range of public key size is (2048~2048).
    NOTE: Key pair generation will take a short while.
    [*HUAWEI] commit

Step 2 Configure the SSH user login interface.
    [HUAWEI] user-interface vty 8 13
    [HUAWEI-ui-vty13] authentication-mode aaa
    [*HUAWEI-ui-vty13] commit
    [*HUAWEI-ui-vty13] protocol inbound ssh   // Configure the VTY user interface to support the SSH protocol.
    [*HUAWEI-ui-vty13] commit

Step 3 Create an SSH user on the CE switch.
    # Create an SSH user root0001.
    [HUAWEI] aaa
    [HUAWEI-aaa] local-user root0001 password irreversible-cipher Root_$0001   // Configure a local user name and password.
    [*HUAWEI-aaa] local-user root0001 level 3   // Set the local user level to 3.
    [*HUAWEI-aaa] local-user root0001 service-type ssh   // Set the service type of the local user to SSH.
    [*HUAWEI-aaa] quit
    [*HUAWEI] ssh user root0001 authentication-type password   // Set the authentication mode for the SSH user root0001 to password authentication.
    [*HUAWEI] commit

Step 4 Enable the STelnet server function on the CE switch.
    [HUAWEI] stelnet server enable
    [*HUAWEI] commit

Step 5 Set the service type of the SSH user root0001 to all.
    [HUAWEI] ssh user root0001 service-type all
    [*HUAWEI] commit

----End

3.1.2 Installing Ansible

The following uses Ubuntu 14 as an example to describe how to install Ansible and use CE modules.
The procedure varies in other operating systems because different Ansible versions are used.

Command to install Ansible:
    # sudo pip install -v ansible==2.4.2.0

Huawei Ansible modules have been integrated into the Ansible mainline. You can use the Huawei modules directly as long as you have Ansible v2.4.2.0 or a later version installed.

3.2 Configuration Procedure

3.2.1 Creating the Inventory Hosts File

When managing a large-scale network, network administrators need to manage hosts running different services. Network devices can be considered as hosts for Ansible. Information about these hosts is saved in Ansible's inventory hosts file. The inventory hosts file is a static file in INI format and is stored at /etc/ansible/hosts by default.

You can specify a different location using the ANSIBLE_HOSTS environment variable, or using the -i parameter when running ansible and ansible-playbook.

The /etc/hosts file contains IP addresses and the corresponding host names. Updating the file is not mandatory, but you can modify it to make host IP addresses easier to maintain.
For example, you can add the following host information to the /etc/hosts file.
    # vi /etc/hosts
    127.0.0.1 localhost
    10.10.10.10 ce12800-1   # Indicates the IP address and name of a host.
    10.10.10.11 ce12800-2

After completing the configuration, you can run the ping command to check whether the configured host names take effect.
    # ping ce12800-1

Step 1 (Optional) Create the inventory hosts file.
By default, the inventory hosts file is created during Ansible installation and is located at /etc/ansible/hosts.
    # ls /etc/ansible/
    ansible.cfg hosts
If the inventory hosts file does not exist, you can create it manually.
    # touch /etc/ansible/hosts

Step 2 Define hosts and host groups.
    # vi /etc/ansible/hosts
    [all:vars]
    ansible_connection=local
    ansible_ssh_user=root0001
    ansible_ssh_pass=Root_$0001
    ansible_ssh_port=22

    [spine]
    ce12800-1   # Add a host to a host group using the host name.
    ce12800-2

    [leaf]
    10.10.10.12   # Add a host to a host group using the IP address.
    10.10.10.13

The value in brackets ([ ]) indicates a host group name. Host group names are used to classify systems, which makes different systems easier to manage.

In the host group all, vars indicates that the group defines variables. The variables are:
● ansible_connection: specifies the host connection type.
● ansible_ssh_user: specifies the user name of the connected host. The value must be the same as that configured on the host.
● ansible_ssh_pass: specifies the password corresponding to a host user name. The value must be the same as that configured on the host.
● ansible_ssh_port: specifies the SSH port number. The default value is 22. The value must be the same as that configured on the host.

The inventory file shown above defines two hosts in the host group spine. The two hosts are added to the host group using their host names ce12800-1 and ce12800-2 respectively.
When Ansible runs, the host names are resolved to IP addresses based on the configuration in the /etc/hosts file.

Two hosts are defined in the host group leaf using their IP addresses 10.10.10.12 and 10.10.10.13.

----End

3.2.2 Creating a Playbook

Create a playbook named ce-vlan.yml and save it in your working directory. The following procedure uses /usr/huawei/ansible as the working directory.

Step 1 Create a playbook.
    # touch /usr/huawei/ansible/ce-vlan.yml

Step 2 Edit the playbook.
You can edit the ce-vlan.yml file using an editor, such as vi, vim, or gedit, or copy content edited in another file into the ce-vlan.yml file. The following is the content of the ce-vlan.yml file.

    ---
    - name: "sample playbook"
      gather_facts: no
      hosts: spine
      tasks:
        - name: "Create vlan 100"
          ce_vlan: vlan_id=100 state=present host={{ inventory_hostname }}
            username={{ ansible_ssh_user }} password={{ ansible_ssh_pass }}
            port={{ ansible_ssh_port }}
        - name: "Add interface to vlan 100"
          ce_switchport: interface=10ge2/0/10 mode=access access_vlan=100 state=present
            host={{ inventory_hostname }} username={{ ansible_ssh_user }}
            password={{ ansible_ssh_pass }} port={{ ansible_ssh_port }}

The --- in the first row marks the beginning of the YAML file.

An Ansible playbook is a YAML list. Each item in the list is a dictionary of key-value pairs. All items of a list start with a hyphen and a space, and have the same indentation.

The preceding playbook defines two tasks. The first task creates VLAN 100, and the second task adds 10GE interface 2/0/10 to VLAN 100 in access mode.

Parameters in the playbook are described as follows:
● gather_facts: indicates whether to collect switch status.
In the example playbook, its value is no, which means that the switch status will not be collected.
●name: indicates the description of the playbook.
●tasks: indicates that the following content is about Ansible tasks.
●name under tasks: indicates the name or description of a specific task.
●ce_vlan: indicates the VLAN configuration module of the CE switch.
●ce_switchport: indicates the switchport configuration module of the CE switch.

Table 3-2 Description of ce_vlan module parameters
Table 3-3 Description of ce_switchport module parameters

For the switch functions and features supported by the CloudEngine Ansible library and description of related parameters, visit https:///HuaweiSwitch/CloudEngine-Ansible/tree/master/docs.

3.2.3 Running a Playbook

Before running a playbook, ensure the following:
1. (Optional) The host names have been configured correctly in /etc/hosts.
2. The inventory hosts file has been configured correctly.
3. The playbook has been created.

Perform the following steps to run the playbook:

Step 1 Run a playbook.

    # cd /usr/huawei/ansible
    # ansible-playbook ce-vlan.yml
    PLAY [sample playbook] ***********************************************************
    TASK [Create vlan 100] ***********************************************************
    changed: [ce12800-1]
    changed: [ce12800-2]
    TASK [Add interface to vlan 100] *************************************************
    changed: [ce12800-1]
    changed: [ce12800-2]
    PLAY RECAP ***********************************************************************
    ce12800-1 : ok=2 changed=2 unreachable=0 failed=0
    ce12800-2 : ok=2 changed=2 unreachable=0 failed=0

The fields in the command output are described as follows:
●PLAY: indicates the running playbook. The name of the playbook defined in the ce-vlan.yml file is included in the brackets.
●TASK: indicates the ongoing task. The task name defined in the playbook is included in the brackets. The result of each task is displayed in real time.
In this example, changed: [ce12800-1] under a task indicates that the task has been executed correctly on the specified host and that the configuration of the host has changed.
●PLAY RECAP: indicates the playbook execution result, including the number of successful tasks, configuration changes, host unreachable events, and failed tasks on each host.

Step 2 Verify configurations on the CE switches.

After running the playbook, log in to the CE switches to check whether the configurations of the switches are consistent with the playbook execution result.

    <HUAWEI>display vlan
    The total number of vlans is : 2
    --------------------------------------------------------------------------------
    U: Up; D: Down; TG: Tagged; UT: Untagged;
    MP: Vlan-mapping; ST: Vlan-stacking;
    #: ProtocolTransparent-vlan; *: Management-vlan;
    MAC-LRN: MAC-address learning; STAT: Statistic;
    BC: Broadcast; MC: Multicast; UC: Unknown-unicast;
    FWD: Forward; DSD: Discard;
    --------------------------------------------------------------------------------
    VID Ports
    --------------------------------------------------------------------------------
    1   UT:Eth-Trunk100(D) 10GE2/0/0(D) 10GE2/0/1(D) 10GE2/0/2(D)
        10GE2/0/3(D) 10GE2/0/4(D) 10GE2/0/5(D) 10GE2/0/6(D)
        10GE2/0/7(D) 10GE2/0/8(D) 10GE2/0/9(D) 10GE2/0/11(D)
        10GE2/0/12(D) 10GE2/0/13(D) 10GE2/0/14(D) 10GE2/0/15(D)
        10GE2/0/16(D) 10GE2/0/18(D) 10GE2/0/19(D) 10GE2/0/20(D)
        10GE2/0/21(D) 10GE2/0/22(D) 10GE2/0/23(D) 10GE2/0/24(D)
        10GE2/0/26(D) 10GE2/0/27(D) 10GE2/0/28(D) 10GE2/0/29(D)
        10GE2/0/30(D) 10GE2/0/31(D) 10GE2/0/32(D) 10GE2/0/33(D)
        10GE2/0/34(D) 10GE2/0/35(D) 10GE2/0/36(D) 10GE2/0/37(D)
        10GE2/0/38(D) 10GE2/0/39(D) 10GE2/0/40(D) 10GE2/0/41(D)
        10GE2/0/42(D) 10GE2/0/43(D) 10GE2/0/44(D) 10GE2/0/45(D)
        10GE2/0/46(D) 10GE2/0/47(D)
    100 UT:10GE2/0/10(D) //The interface has been added to VLAN 100.

----End

The playbook execution result is displayed on the server.
To save the execution result in a file, perform the following steps:

Step 1 Create the templates directory under the directory where the playbook is saved, and add a vlan.j2 file in the directory. The vlan.j2 file is the template used to save the VLAN information of the hosts after the playbook is executed.

    # cd /usr/huawei/ansible
    # mkdir templates # Create the templates directory.
    # cd templates
    # vi vlan.j2
    {{ data.end_state_vlans_list | to_nice_json }}

Here end_state_vlans_list is the result field that holds the VLAN list after the playbook is executed. For more information about playbook templates, visit /ansible/playbooks_templating.html.

Step 2 Create the configs directory under the directory where the playbook is saved. The file recording the playbook execution result will be saved in this directory.

    # cd /usr/huawei/ansible
    # mkdir configs # Create the configs directory.

Step 3 Add a task to write the playbook execution result to a file.

Use the vi editor to edit the ce-vlan.yml file. The file content is as follows after the task is added. The register: data line stores the ce_vlan result in the variable data, which the vlan.j2 template reads.

    ---
    - name: "sample playbook"
      gather_facts: no
      hosts: spine
      tasks:
        - name: "Create vlan 200"
          ce_vlan: vlan_id=200 state=present host={{ inventory_hostname }}
            username={{ ansible_ssh_user }} password={{ ansible_ssh_pass }}
            port={{ ansible_ssh_port }}
          register: data
        - name: "collection data to file"
          template: src=vlan.j2 dest=configs/vlan.json

Step 4 Execute the ce-vlan.yml file.

    # cd /usr/huawei/ansible
    # ansible-playbook ce-vlan.yml
    PLAY [sample playbook] *********************************************************
    TASK [Create vlan 200] *********************************************************
    changed: [ce12800-1]
    changed: [ce12800-2]
    TASK [collection data to file] *************************************************
    changed: [ce12800-1]
    changed: [ce12800-2]
    PLAY RECAP *********************************************************************
    ce12800-1 : ok=2 changed=2 unreachable=0 failed=0
    ce12800-2 : ok=2 changed=2 unreachable=0 failed=0

Step 5 In the configs directory, check the file that stores the VLAN information
after the playbook is executed.

    # cd /usr/huawei/ansible/configs
    # cat vlan.json
    [
    "1",
    "2",
    "100",
    "110",
    "200"
    ]

----End

Ansible Automation Technology White Paper

4 Application Constraints and Model Requirements

The constraints of using Ansible are as follows:
●Ansible can only run on Linux.
●Software version dependency: The host where Ansible runs must have an operating system that Ansible supports, such as Debian, Ubuntu, or Red Hat.
●Recommended CE switch models: All CE switch models are supported.

5 Summary

Ansible does not require an agent on the device to which it connects, so Ansible is decoupled from the device software. In addition, Ansible has a strong community and software expansion capabilities, which allow customers to easily integrate Ansible into their own environments.
The integration of Huawei CE switches and Ansible lets existing IT O&M capabilities be used for switch management. After the integration, configurations can be modified centrally and in batches, greatly improving automated deployment and O&M capabilities. In addition, typical configuration templates are provided to reduce the workload of initial service deployment.
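
The PLAY RECAP counters described in section 3.2.3 can also be checked by a script instead of by eye. The following is a minimal Python sketch, not part of the white paper: the function names parse_recap and failed_hosts are hypothetical, and the line format is assumed to match the sample output shown above (newer Ansible releases append further counters, which this pattern ignores).

```python
import re

# Matches one host line of the PLAY RECAP section, for example:
# "ce12800-1 : ok=2 changed=2 unreachable=0 failed=0"
RECAP_LINE = re.compile(
    r"^(?P<host>\S+)\s*:\s*ok=(?P<ok>\d+)\s+changed=(?P<changed>\d+)"
    r"\s+unreachable=(?P<unreachable>\d+)\s+failed=(?P<failed>\d+)"
)


def parse_recap(output: str) -> dict:
    """Return {host: {"ok": n, "changed": n, "unreachable": n, "failed": n}}."""
    hosts = {}
    in_recap = False
    for line in output.splitlines():
        if line.startswith("PLAY RECAP"):
            in_recap = True  # host lines follow this banner
            continue
        if not in_recap:
            continue
        match = RECAP_LINE.match(line.strip())
        if match:
            counters = {
                key: int(value)
                for key, value in match.groupdict().items()
                if key != "host"
            }
            hosts[match.group("host")] = counters
    return hosts


def failed_hosts(recap: dict) -> list:
    """Hosts that were unreachable or had at least one failed task."""
    return sorted(
        host
        for host, counters in recap.items()
        if counters["unreachable"] or counters["failed"]
    )


# Sample text copied from the command output in section 3.2.3.
SAMPLE = """\
PLAY RECAP ***********************************************************************
ce12800-1 : ok=2 changed=2 unreachable=0 failed=0
ce12800-2 : ok=2 changed=2 unreachable=0 failed=0
"""

if __name__ == "__main__":
    recap = parse_recap(SAMPLE)
    print(recap)
    print("hosts needing attention:", failed_hosts(recap))
```

A wrapper script could feed the captured output of ansible-playbook to parse_recap and stop a batch change as soon as failed_hosts returns a non-empty list.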

Viavi Solutions White Paper: Mobile Network Automation for Optimized Performance


White Paper
Mobile Networks: Automation for Optimized Performance

Mobile networks are becoming increasingly important worldwide as people transition to a more transient lifestyle. People now use mobile networks to work remotely, stream video, and access social media applications. Soon, mobile networks will play a major role in areas such as the Internet of Things (IoT), cloud computing, and vehicle communication.

This dependency on mobile networks has increased Quality of Experience (QoE) pressures on service providers at a time when bandwidth demands are also at an all-time high. How can service providers keep up with bandwidth needs and keep QoE at high levels?

Service providers are doing their best to meet these demands by making macro level adjustments to networks to achieve incremental improvements in performance. But this has come at a cost. Service providers are seeing profits decline as more money and staff are needed to keep networks running in this new, complex environment. Even with the increase in operating expenditures (OpEx), traditional network optimization is not enough to keep up with the dynamic nature of today’s network traffic.

What is needed is a way to automate network performance to create major leaps in optimization on a granular level, while also decreasing OpEx and freeing up staff to maintain the infrastructure and plan for expanding the network to deliver greater capacities. Major advancements have been made in recent months to make automated optimization a reality. Let’s take a closer look at the limitations of current network optimization methods, how automated optimization can overcome these limitations, and how this new method of optimizing networks can create a strategic advantage for service providers when the time comes to deploy 5G.

Challenges Facing Networks

As mobile networks continue to evolve, there are three main challenges that service providers face: interdependency, non-uniformity, and complexity.
Each is a problem on its own, but together they create a network environment that is nearly impossible to optimize using traditional methods.

Many of the metrics used to optimize networks are now interdependent. Changing a parameter, or parameters, to enhance the characteristics in one part of the network can have implications on other characteristics in other parts of the network. For instance, trying to increase data throughput in a certain area could affect voice traffic – either positively or negatively – in the network.

This could also have a detrimental effect on design. Current designs that focus on one Key Performance Indicator (KPI) differ from designs that focus on other KPIs. This means that designs that focus on a specific KPI in isolation may or may not be the right choice for the overall performance of the network – especially as networks become increasingly non-uniform.

Extreme non-uniformity is the new normal for mobile networks as regular users become power users and the overall subscriber population becomes more mobile. According to the VIAVI Mobile Data Trends report, 50 percent of data is consumed by only one percent of users. In addition, 50 percent of data is consumed in less than one percent of a network area, and this area is constantly changing. This change can be dramatic. In extreme cases, the amount of data that a cell is expected to support can increase by orders of magnitude over a period of a few minutes.

This last data point is an important one. Not only has it become increasingly difficult to optimize networks because of non-uniformity, the non-uniformity is now dynamic. As this trend continues to grow, it will make it impossible to manually optimize networks in the future as this method cannot keep pace with the dynamic changes taking place. This leads to the overall problem with optimizing mobile networks: complexity.
Not only are subscribers using networks in new and dynamic ways, technologies such as LTE, VoLTE, and heterogeneous networks (HetNets) have added layers of complexity: a change to one network layer alters not only how that layer responds to the traffic it must convey, but also how that layer interacts with other layers. For example, changing an LTE layer may make it more or less attractive at a given location to traffic on the 3G network, and vice-versa.

The number of tunable parameters is now enormous. For example, tuning just two parameters on each of 100 cells – where each parameter has 10 possible values – creates 10^200 different ways these cells could be configured. That’s more than the number of atoms in the observable universe!

Limitations of Network-Centric Optimization

The three main challenges put a spotlight on the limitations of current optimization methods. While networks have become increasingly complex and dynamic, most optimization efforts are still primarily network-centric: a problem is located using network statistics and then adjustments are made to the network parameters to solve the problem. This network-centric approach of characterizing a problem using network statistics and then making macro site parameter adjustments no longer works when optimization is needed on a more granular level. This approach is also less effective when the intention is to change the configuration such that the performance is improved, rather than solve a specific problem.

Taking this a step further, most macro-based adjustments create and maintain a baseline for overall network performance, but do little to optimize performance for specific locations within the network at any given time. For example, workers based in an office might tend to use voice services during the morning but then leave their office during lunch hour and go outside. While outside, their usage might migrate away from voice to data services.
This illustrates the changing nature of the services demanded from the network and where they need to be delivered. An effective optimization would have to configure the network to deliver an acceptable user experience for this cohort of users, not just during the work hours and lunch break, but also during the commute time, evenings, and weekends. At each of these times the usage profile will be different and the locations will generally change. Taking automation to the limit sees the network able to adapt its configuration as the day progresses in response to the changes in the demands placed on it.

But current optimization methods can only see macro locations based on overall network metrics. This creates “blind optimization” where multiple types of users at the various locations around the network are blended into one as the network tries to optimize an entire area. Doing so creates an imbalance where some users will have more resources than they need, while others will experience impaired usability.

Another limitation is the iterative approach toward optimization – making small, incremental changes over time – due to the inter-dependent nature of today’s networks. This ensures that changes will not have an adverse effect on the network, but it also means that improvements are small with no major step changes in optimization. Most of these changes are use case driven and analyzed in isolation. If there is a problem with VoLTE performance for instance, current methods typically try to solve the problem in isolation without considering how it will affect other parameters such as data performance or energy consumption.

Drive testing is often used in network optimization. However, drive testing uses synthetic data and is OpEx heavy.
It can also take a considerable amount of time and effort to come to a network design optimized for the drive test traffic rather than the commercial users of the network.

Most of all, today’s network-centric methods only focus on the network itself and have limited ability to measure or enhance the subscriber experience of using a network.

Benefits of Automated Subscriber-Centric Optimization

New methods of optimization take the focus from the network to the subscriber. Subscriber-centric optimization considers where subscribers are located, how they are using the network, and what their current QoE is at any given time. But what must happen behind the scenes to make this happen?

Several advancements have made subscriber-centric optimization possible. Solutions can now collect, locate, store, and analyze data from mobile connection events, creating a repository of location intelligence from all subscribers throughout a network. This location intelligence is then transformed to deliver subscriber-centric performance engineering and Radio Access Network (RAN) planning information.

Most recently, subscriber-centric performance has been taken one step further by automating network performance optimization. This new automated subscriber-centric optimization addresses the network challenges created by interdependency, non-uniformity and complexity, and can keep up with increasingly dynamic traffic patterns.

Where traditional network optimization is a manual process and can take up to two weeks per site, automated optimization can optimize multiple sites at a time within hours rather than days. Where the focus of manual optimization must be a single site or a small group of sites, automated network optimization can focus on much larger clusters of hundreds of sites.
Not only is the focus on larger clusters of sites possible with an automated approach, it is desirable since the exponential growth in possible parameterizations gives the optimization more scope to find configurations that maximize the performance for the mix of subscribers and applications in that region of the network. Once the area for optimization is selected, goals and success criteria are then established. KPI constraints and trade-off levels are then selected.

The optimization task is then scheduled – typically processing tens of millions of events based on subscriber data with granular location intelligence. If the results create the intended improvement, the changes can be actuated into the network. The result is a fast turnaround with major step improvements in optimization without adversely affecting other parts of the network.

Because this approach is automated, it also greatly reduces the staffing and OpEx needed to optimize a network. Engineers are typically able to turn around optimized designs for large areas in a very short time. In addition, automated subscriber-centric optimization directly maps revenue to QoE to keep service providers profitable and subscribers happy.

In addition, the problems of interdependency and non-uniformity are overcome. Automated optimization can analyze KPIs in parallel and predict the impact of planned changes to make sure other parameters of the network will not be negatively affected. Algorithms calculate effects by predicting gains and the net costs of those gains to the network before any changes are made; and predictive decision making can resolve contradictions before they happen.

This more proactive approach saves time and prevents subscribers from experiencing negative events that are common using traditional, reactionary optimization methods.
As an added benefit, the ability to use granular data at the subscriber level also allows network optimization to prioritize specific subscriber groups such as VIPs or high-net individuals.

In summary, traditional methods focus on network and synthetic data, are OpEx heavy, and take considerable effort and time to come to a conclusion that does not necessarily end up addressing the QoE and capacity issues. However, using subscriber-centric data ensures optimization is aligned with subscriber QoE, is OpEx light, and delivers network designs in a significantly shorter timeframe.

Automated Subscriber-Centric Optimization in Action

Automated optimization sounds good in theory, but does it work with real network traffic? Let’s look at a few real-life examples.

A major mobile provider wanted to maximize data coverage and throughput by reducing the number of LTE data users on 3G. The goal was to improve data traffic volumes on an already optimized network while maintaining 3G voice service. The network had 233 cells across two Radio Network Controllers (RNCs).

[Figure: traditional versus automated optimization workflow]
Traditional:
    Collect: review areas; drive test prioritized sites; collect PM stats
    Analyze: overshooters; drop calls, congestion, load balancing
    Actuate: once manually correlated, fix all sectors that have the issues
    Confirm: re-drive problem areas; make final adjustments; check PM stats
Automated optimization:
    Collect (time = 1 hour): select clusters; establish goals and constraints; schedule task
    Analyze (time = 1 hour): process millions of events; granular subscriber data; automatic analysis
    Actuate (time = 1 hour): actuate optimization design into the network
    Confirm (time = 1 hour): compare results with predictions

Automated optimization used subscriber-centric intelligence to analyze the current subscriber usage. Based on this intelligence, power changes were made to 67 cells, and 63 cells received antenna e-tilt changes.
The result was a 1.3-point improvement in the LTE quality index and a 24 percent increase in data traffic volume – all without affecting 3G voice services. See diagram on left of page.

Another service provider wanted to maximize retainability of VoLTE calls and improve VoLTE throughput while maintaining accessibility. They also wanted to make sure the changes wouldn’t impact data services. Automated subscriber-centric optimization maintained VoLTE accessibility at 99.82 percent while improving VoLTE retainability from 97.48 percent to 98.03 percent. At the same time, the mean throughput improved by more than 13 percent. This was a major step change improvement without impacting data services. See diagram on right of page.

Voice and data are not the only uses of automated optimization. Service providers can also use it to optimize energy consumption to reduce OpEx without affecting subscriber services.

One service provider wanted to reduce energy consumption on their 3G network at major sites in a city while ensuring service availability. Automated optimization analyzed subscriber usage at key sites outside of normal hours and analyzed handset carrier capability. The solution also determined the optimal carrier configuration per site to optimize energy consumption while maintaining service levels. The result was a reduction in energy costs by 25 percent, saving the provider an estimated $2.4 million annually.

These step changes in optimization were all possible because real subscriber-centric intelligence was being used instead of traditional synthetic data. This allowed the service providers to see what the true results would be once the changes were actuated. Automated optimization allows engineers to establish specific goals to optimize aspects such as capacity, throughput, service drops or energy savings.
Service providers can also focus on a select set of parameters for the most cost effective improvements such as only changing power or e-tilt parameters.

Automated Optimization and 5G

Subscriber-centric automation will become even more important as mobile networks become more complex. An analysis of a number of third-party industry resources shows that networks will see several major changes by 2025:
• 720 percent increase in video traffic
• 700 billion things will be connected to the Internet
• 66 times increase in wireless traffic
• 2000 times increase in cloud objects
• 620 times increase in data analyzed in the cloud

For mobile networks, service providers are looking to 5G to keep up with this changing demand. Although a lot of progress has been made, the standards for 5G have not been finalized. But the capabilities 5G must have to keep up with demand are staggering. According to the GSMA, 5G must:
• Deliver 1G to 10G connections to end points in the field
• Have 99.999 percent availability
• Reduce energy usage by 90 percent

A key characteristic of 5G is the expectation that it will be able to deliver connectivity to an even wider range of devices than are seen today. This will include public safety, and a plethora of IoT devices such as connected cars, smart meters and asset trackers. These devices will have a vast range of different requirements in terms of bandwidth, latency, jitter, reliability, and dynamics that will require a network to tailor the service to each set of subscribers and devices. The specific requirements for each group further compound the problem of network-centric optimization as it’s unable to discern the impact on each device and how it needs to change to meet QoE targets.

There will also be a trend towards RAN centralization and virtualization with the functionality of a traditional base station being split between centralized units and distributed units.
In many cases these will need to be configured, managed and optimized in the context of their topology and transport constraints, and the subscribers they are serving. Advanced, coordinated radio transmission and reception schemes will be available which will provide better resilience to adverse radio conditions such as poor coverage and interference, but will come at a cost by placing more demands on the transport network.

© 2017 VIAVI Solutions Inc. Product specifications and descriptions in this document are subject to change without notice. mobilenetworks-wp-maa-nse-ae 30186254 900 1017

The advent of 5G will also bring more use of Network Function Virtualization (NFV) and Software Defined Networks (SDN) to deliver network infrastructure. This will also require configuration, management and optimization. Other inflections such as Mobile Edge Computing will mean that functionality can be distributed and configured to meet constraints such as service latency and usage of transmission bandwidth.

5G will need to coexist and interwork with older technologies such as 2G, 3G and 4G. Networks will gain another layer that must work optimally with the older technologies so that devices are still able to achieve their QoE targets. Any system that automates network optimization must perform effectively by taking advantage of all the layers, managing the selection of each layer, and transitions between them such that it sweats the assets and drives performance.

Taken together, these various developments make tomorrow’s network more powerful by allowing devices more ways to achieve their various QoE needs.
But this also creates a problem for management and optimization since there will be many more parameters to tune, the number of possible configurations explodes exponentially, and finding the optimal configurations becomes much harder.

The other impact of this increased configurability is the interdependency between different parts of the network. If changes are made in the RAN to address an interference problem, this may change the backhaul demands on a network. This issue is further compounded as some subscribers may derive service from different cells. The relationship between a 5G network and the 2/3/4G layers may change as subscribers derive a service from these other layers in addition to – or instead of – the 5G layer. In addition, more devices may be attracted to the 5G layer. The network load could change as a result and place more demands on virtualized core elements. Any optimization solution must be able to consider the holistic impact of configuration changes that are under consideration, as well as their ability to deliver the variety of QoE required by the different devices. Doing this effectively in the complex and configurable network will require advanced modelling of radio, RAN, transport and core elements along with mature configuration optimization capability to optimize the infrastructure and spectrum assets while delivering the required service.

The only way for this to happen is to automate optimization using subscriber-centric methods as a starting point and then add more automated features as they become available. Eventually, networks will need to have the capabilities of self-configuration, self-optimization and self-healing to keep up with subscriber demand and maintain a high level of QoE.

This may sound like science fiction, but it must happen and time is not on the industry’s side. Currently, most service providers are planning mass deployments of 5G by 2020.
Some service providers are already planning to make smaller deployments in 2018 and 2019. This means that automated subscriber-centric optimization is not a “nice to have” feature, but a vital step toward future networks. It’s the only way service providers will be able to keep up with the complexity of networks and the dynamic traffic patterns of the future.
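
The configuration count quoted in the complexity discussion (two parameters on each of 100 cells, ten possible values per parameter) can be verified with a few lines of arithmetic. The Python sketch below is an illustration only: the function name configuration_count is hypothetical, and the figure of roughly 10^80 atoms in the observable universe is a commonly cited order-of-magnitude estimate assumed here, not a number taken from the white paper.

```python
# Inputs taken from the example in the text.
CELLS = 100
PARAMS_PER_CELL = 2
VALUES_PER_PARAM = 10

# Assumption: commonly cited order-of-magnitude estimate, not from the paper.
ATOMS_IN_OBSERVABLE_UNIVERSE = 10 ** 80


def configuration_count(cells: int, params_per_cell: int, values_per_param: int) -> int:
    """Each of the cells * params_per_cell parameters is chosen independently,
    so the number of joint configurations is values ** (number of parameters)."""
    return values_per_param ** (cells * params_per_cell)


count = configuration_count(CELLS, PARAMS_PER_CELL, VALUES_PER_PARAM)

if __name__ == "__main__":
    # Python integers are arbitrary precision, so the value is exact.
    print("number of tunable parameters:", CELLS * PARAMS_PER_CELL)
    print("configurations = 10 **", len(str(count)) - 1)
    print("exceeds atom estimate:", count > ATOMS_IN_OBSERVABLE_UNIVERSE)
```

The exponent is the total number of independent parameters (200), which is why adding cells or parameters grows the search space exponentially rather than linearly.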

O&M Automation Platform White Paper


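Contents
I. Overview
II. Features
1. Overall platform functions
2. Installation and deployment
3. Configuration updates
4. Task execution
5. Monitoring and alerting
6. Inspection management
III. Technical characteristics
1. Developed in Python
2. Integration with cloud computing platforms
3. Rule knowledge base
4. Standard RESTful API
5. O&M console

I. Overview

This product is an O&M automation platform that combines installation and deployment, configuration updates, task execution, monitoring and alerting, and inspection management. It pairs the experience of O&M administrators with O&M tools and introduces a rich O&M rule base to assist administrators with day-to-day operations work.

The platform is grounded in the traditional data center architecture and also supports private and public cloud platforms built on frameworks such as OpenStack, so that traditional O&M and cloud O&M are handled together.

Its design principles are "platform-based, modular, loosely coupled, and fully open". Building on a platform and on modules integrates tools and aggregates functions, replacing the current situation in which operation and inspection tools run separately, and brings all O&M work into one unified platform. Every module provides standardized interfaces, which satisfies the modular and loosely coupled principles and allows easy integration with functional modules of other systems. The core of the platform starts from configuration management and works with monitoring tools to manage each application system across its full life cycle, from deployment of basic resources to application release and on to operation and maintenance, with the final aim of automated, visualized, and intelligent O&M.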

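II. Features

1. Overall platform functions

(1) Permission management
Permission management currently means controlling which O&M function modules an ordinary platform user can use. Permissions are managed centrally by the administrator.

For example, if user A has only the installation and deployment permission, all other functions are hidden from user A.

(2) User management
The administrator adds, modifies, and deletes ordinary platform users. Users can also register a platform account themselves and apply for permissions.

The registration function can be enabled or disabled.

(3) Notification management
Users receive notifications of serious events that occur while the platform is running and can view them from the menu bar of the platform interface.

(4) Rule base management
Each module of the platform needs a rule base to support the execution of O&M operations.

At present the rule bases are distributed across the modules and managed independently.

2. Installation and deployment

This function has two parts. The first pushes operating systems to physical machines and installs them automatically. The second automatically installs, updates, and uninstalls middleware, databases, and other software on the target operating system.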


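Automated O&M Management Solution

Contents
1 Challenges Facing IT O&M Management
2 The Automation Solution That Emerged in Response
3 Automation Application Scenarios
3.1 Automated Disaster Recovery Switchover
3.2 Fault Scene Snapshots
3.3 Batch Device Operations
3.4 Periodic Job Scheduling
3.5 Emergency Handling Procedures
3.6 Critical Configuration Backup and Baseline Comparison
4 Product Overview
4.1 Centralized Management of O&M Scripts
4.2 Visual Workflow Configuration Engine
4.3 Manual Intervention in Job Workflows
4.4 Job Execution Verification and Continuous Monitoring
4.5 Automatic Generation of Job Operation Manuals
4.6 Presentation of Job Execution Results
4.7 Configuration Backup and Baseline Library Management
5 Product Advantages
6 Operating Environment

1 Challenges Facing IT O&M Management

[Figure: share of IT operating expenses across new system development, maintenance development, and O&M management; the values shown are 24%, 31%, and 45%.]

⏹ O&M scripts scattered across individual servers create management risk and consume substantial management cost;
⏹ Day-to-day operations consume a great deal of manpower, carry a high risk of operator error, and execute inefficiently;
➢ Low controllability of the operation process and high O&M risk:
⏹ There is no guarantee that operations match the approved execution plan, so actual operations are hard to control;
⏹ Day-to-day operations demand highly skilled staff, which creates a significant staffing risk;
➢ Low transparency of O&M operations:
⏹ Actual operations are hard to supervise, creating the risk of "black box" operations;
⏹ Daily work cannot be tied effectively to the operations actually performed, which hinders later audits;

2 The Automation Solution That Emerged in Response

Given the many problems in IT O&M management, manual work alone can no longer meet technical and business requirements. Factors that lower the cost of IT services, such as standardization, automation, architecture optimization, and process optimization, are therefore receiving more and more attention.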

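Among these, IT O&M automation means automating the large volume of routine, repetitive work in IT operations, turning what used to be done by hand into automated operations.

Automation is a step up for IT O&M work. It is not merely a maintenance process but also a process of improving management; it is the highest level of IT O&M and the direction of future development.

Since its emergence, one of the key attributes of IT O&M automation has gone beyond replacing manual operations. What matters more is in-depth detection and global analysis: how to optimize performance and service under current conditions while maximizing return on investment.

The impact of automation on IT O&M is no longer limited to the relationship between people and devices; it has reached the level where customer-facing services drive IT O&M decisions.

Drawing on trends in IT O&M automation and on its grasp of user needs from many years of IT service experience, 融海咨询 developed its own automation solution.

Solution overview: automation provides automated scheduling and "one-click" handling of IT O&M operations. Exception handling support verifies operation results and continuously monitors job outcomes. Time constraints manage O&M jobs that run on a schedule. Complex relational conditions handle the dependencies within complex job workflows. The UserTasks manual interface lets O&M staff take part in job scheduling workflows. A graphical workflow customization platform supports overall planning of business activities. Integration with other platforms unifies management with alerting, monitoring, and other O&M systems.

The result is job scheduling automation that is powerful, easy to use, secure, and reliable.

The automation solution moves enterprise IT O&M operations toward being command-based, standardized, and process-driven.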

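Automation goals:

3 Automation Application Scenarios

3.1 Automated Disaster Recovery Switchover

As the number of IT devices keeps growing and IT systems become more complex, the traditional practice of performing disaster recovery (DR) switchover by hand has hit bottlenecks, mainly:
➢ Low efficiency: the RTO (Recovery Time Objective) cannot be guaranteed;
➢ High operational risk: as systems grow more complex, so do the operations, and the probability of operator error rises with them;
➢ Over-reliance on individual skill: completing the entire switchover in the shortest possible time places very high demands on the operator's skill level and familiarity with the procedures and the environment.

➢ Opaque process: DR switchover has documented procedures, but during a switchover only the operator knows the status of each step and which step is currently running. No one else can see it.

By configuring the DR switchover process in a standardized way, the solution makes switchover management visible in four respects: configuration, execution, process, and procedures.

➢ Visual configuration: configuration works in a Visio-like manner. Each operation step is configured as a node, and the whole switchover workflow is built by dragging nodes and connecting them. This avoids writing code, lowers the barrier to using the system, and improves usability;
➢ Visual execution: a graphical interface replaces cumbersome command-line execution. The administrator selects the workflow in the interface and clicks Execute, and several people can confirm in the interface before execution;
➢ Visual process: an operation workflow view lets everyone see the overall switchover workflow, the step currently running, and the status of every node. To suit different habits, the execution process is shown in both a flowchart view and a tree view;
➢ Visual procedures: when an administrator finishes configuring an automated workflow, the system automatically generates the operating procedure document for it, from which users can learn the complete details of that workflow.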

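3.2 Fault Scene Snapshots

Traditional IT monitoring systems raise an alert promptly when a fault occurs, but O&M staff, vendors, and developers receive only an alert message and cannot obtain anything more from the monitoring platform for fault analysis and prevention.

By combining monitoring (or monitoring integration) with job scheduling, the solution captures the fault scene comprehensively at the moment a fault occurs, covering both the local host environment and the cross-server, cross-device environment. It preserves as much of the scene as possible to help administrators, vendors, and developers analyze the incident in detail afterwards and define response plans.

With fault scene snapshots:
1. The fault scene is captured from every angle, and the content captured can be tailored to the needs of different roles;
2. When a fault occurs, the system captures and saves the scene on its own and distributes it to the different roles for joint diagnosis;
3. Response plans can be incorporated into the platform through customization, giving early warning before a fault occurs and handling it promptly so that it does not recur.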

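3.3 Batch Device Operations

As IT environments expand and the number of IT devices grows, even simple O&M operations multiply. This increases the O&M workload and, because attention drops during repetitive work, also multiplies the number of operator errors.

Examples include high-volume, repetitive operations such as batch program updates, batch inspections, and batch password changes.

The solution provides a parallel processing platform for batch jobs so that batch operations run on many devices at the same time.

An automated workflow applies a simple operation to a large number of devices, monitors the execution of the job, and checks the results.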
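
The batch pattern described above (run one operation on many devices in parallel, then check every result) can be sketched with the Python standard library. This is an illustrative sketch under stated assumptions, not the product's implementation: run_batch and fake_inspection are hypothetical names, and fake_inspection stands in for a real device operation such as an inspection or a password change.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_batch(devices, operation, max_workers=8):
    """Run operation(device) on every device concurrently.

    Returns {device: ("ok", result)} or {device: ("failed", error message)},
    so one failing device does not stop the rest of the batch.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(operation, device): device for device in devices}
        for future in as_completed(futures):
            device = futures[future]
            try:
                results[device] = ("ok", future.result())
            except Exception as exc:  # record the failure and keep going
                results[device] = ("failed", str(exc))
    return results


def fake_inspection(device: str) -> str:
    """Hypothetical stand-in for a real per-device operation."""
    if device.endswith("-bad"):
        raise RuntimeError("unreachable")
    return f"{device}: inspection passed"


if __name__ == "__main__":
    outcome = run_batch(["sw-1", "sw-2", "sw-bad"], fake_inspection)
    for device in sorted(outcome):
        print(device, outcome[device])
```

Collecting a status per device, rather than raising on the first error, mirrors the requirement to monitor the whole job and check every result after the batch completes.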

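Deploying batch device operation workflows:
1. Runs operations on many devices concurrently, shortening batch execution time, improving efficiency, and making system upgrades more consistent;
2. Reduces mistakes caused by operator fatigue and lapses of attention during batch operations, and therefore reduces production faults caused by human error;
3. Raises the level of IT O&M automation, reduces manual effort, and lowers operating costs.

3.4 Periodic Job Scheduling

As more IT application systems go live, periodic and repetitive O&M operations keep increasing.

These operations take up a large share of O&M staff's daily working time and carry the risk of manual error.

Complex job workflows also require O&M staff with strong technical skills and deep familiarity with the systems, which raises the likelihood of operational incidents when staff or roles change.

Examples of jobs that can be automated include daily inspections, end-of-day batch operations, transaction data collection, month-end batch processing, and year-end batch processing.

The solution provides a unified platform for running and controlling applications, coordinating job scheduling across platforms, job stages, and devices.

The platform turns periodic, repetitive batch jobs and large, complex job workflows into fixed automated workflows. A time-based scheduling engine triggers them at the specified times according to predefined time rules.

Jobs are thus scheduled automatically and periodically, and O&M staff only need to review the execution process, whether the job succeeded, and the execution report (screenshots, command output, and other result information).

Automated workflows organize hundreds or thousands of batch jobs, standardize batch runs, monitor job execution, and check the results.

Automating periodic job scheduling:
1. Lowers the skill requirements for key positions. In the past, batch jobs had to be operated and assessed by O&M experts with full knowledge of every business system; now ordinary O&M staff can complete them with the automation tool.

2. Removes hidden fault risks and protects job efficiency. On the one hand, the integrity and compliance of critical data are validated; on the other, when an exception occurs, the faulty data source is located quickly for troubleshooting.

3. Reduces the time and effort staff spend on routine O&M work, freeing them for more important IT O&M tasks.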
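
The time-based scheduling described in section 3.4 can be illustrated with a small calculation of which daily jobs are due and when a job next runs. The sketch below is a simplified assumption of how such an engine might decide: the names next_daily_run and due_jobs and the job names in the demo are hypothetical, and only daily time rules are covered (month-end and year-end rules would need further logic).

```python
from datetime import datetime, time, timedelta


def next_daily_run(now: datetime, run_at: time) -> datetime:
    """Next occurrence of a daily job scheduled at run_at, strictly after now."""
    candidate = datetime.combine(now.date(), run_at)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot has passed; use tomorrow
    return candidate


def due_jobs(now: datetime, schedule: dict, last_run: dict) -> list:
    """Jobs whose slot for today has arrived and which have not run since it.

    schedule maps job name -> daily run time; last_run maps job name -> the
    datetime of its most recent execution (missing if it has never run).
    """
    due = []
    for name, run_at in schedule.items():
        slot = datetime.combine(now.date(), run_at)
        previous = last_run.get(name)
        if slot <= now and (previous is None or previous < slot):
            due.append(name)
    return sorted(due)


if __name__ == "__main__":
    now = datetime(2024, 1, 15, 23, 30)
    schedule = {
        "daily_inspection": time(8, 0),
        "end_of_day_batch": time(23, 0),
        "data_collection": time(23, 45),
    }
    last_run = {"daily_inspection": datetime(2024, 1, 15, 8, 0, 5)}
    print("due now:", due_jobs(now, schedule, last_run))
    print("next inspection:", next_daily_run(now, schedule["daily_inspection"]))
```

Tracking the last run time per job keeps a job from being triggered twice for the same slot if the scheduler loop wakes up more than once after the scheduled time.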

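3.5 Emergency Handling Procedures

When an IT system suffers a serious fault, time is money.

Two things become critical: shortening the time between the administrator receiving the notification and starting to handle the fault, and shortening the handling itself while raising its success rate.

Examples: a full file system that prevents new logs from being written; database archive log space that is full; a full database tablespace that prevents data from being written; a production system fault that requires an urgent switch to the backup system.

Automated handling is integrated with monitoring alerts, so a fault is detected immediately and the predefined emergency handling workflow is invoked. After the fault is handled, the system invokes a check workflow to verify recovery.

When all operations are complete, the system sends the administrator the snapshot of the fault scene, the handling result, and the post-handling check result, and the administrator confirms that the whole workflow ran correctly.

Emergency handling workflows:
1. Greatly shorten the interval between fault occurrence and response, winning valuable time for system recovery and greatly reducing the impact of the fault;
2. Relieve administrators who, under heavy pressure and high tension during emergency handling, may run steps in the wrong order, skip a step, or make mistakes. Handling emergencies through the automation platform improves both operating efficiency and the success rate of fault handling.

3. Automatically save the fault scene snapshot, the handling process and result, and the post-recovery check result as a report, providing a basis for later review and statistics.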

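3.6 Critical Configuration Backup and Baseline Comparison

Configuration files are critical to IT systems. A configuration file deleted or modified by mistake often leads to very serious consequences.

In daily O&M work, administrators spend a great deal of time periodically backing up important configuration files of operating systems, databases, middleware, application systems, and other software. When a configuration file is modified by mistake or tampered with maliciously, the change is hard to detect and is usually noticed only after serious consequences appear.

Examples of important files: Oracle's initSID.ora, listener.ora, sqlnet.ora, and tnsnames.ora files, the operating system's /etc/passwd file, and the configuration files of WebLogic and application systems.

The baseline protection module helps an enterprise establish security baselines for its information systems and continuously monitors the integrity and consistency of critical files and systems.

The platform periodically backs up configuration files at every layer and level of the enterprise and builds file baselines from the backups.

The platform periodically scans the modification date, size, content, and other attributes of configuration files and compares the scan results with the baseline version. When the two differ, it promptly notifies the administrator for review and, in extreme cases, takes the baseline version as authoritative, backing up and then updating the current environment.

The file baseline management function:
1. Backs up important configurations automatically, greatly easing administrators' daily workload and reducing omissions during backup;
2. Detects configuration anomalies through automatic scanning and comparison, overcoming the difficulty of noticing configuration changes;
3. Automatically restores tampered configuration files, helping to prevent major faults caused by configuration changes;
4. Keeps configuration consistent between primary and standby systems in a two-site, three-center deployment.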
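
The scan-and-compare step of baseline management can be sketched as follows. This is a minimal illustration that assumes file content is compared by SHA-256 digest; the function names fingerprint, build_baseline, and compare_to_baseline are hypothetical, and the demo uses temporary files rather than real Oracle or system configuration files.

```python
import hashlib
import tempfile
from pathlib import Path


def fingerprint(path: Path) -> str:
    """SHA-256 digest of the file content."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_baseline(paths) -> dict:
    """Record the current digest of every file as the baseline."""
    return {str(path): fingerprint(Path(path)) for path in paths}


def compare_to_baseline(baseline: dict) -> dict:
    """Return {path: "missing" | "modified"} for files that drifted."""
    drift = {}
    for name, expected in baseline.items():
        path = Path(name)
        if not path.exists():
            drift[name] = "missing"
        elif fingerprint(path) != expected:
            drift[name] = "modified"
    return drift


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        listener = Path(workdir) / "listener.ora"
        tnsnames = Path(workdir) / "tnsnames.ora"
        listener.write_text("LISTENER = original\n")
        tnsnames.write_text("ORCL = original\n")

        baseline = build_baseline([listener, tnsnames])
        print("right after baseline:", compare_to_baseline(baseline))

        listener.write_text("LISTENER = tampered\n")  # simulate a bad edit
        tnsnames.unlink()                              # simulate a deletion
        print("after changes:", compare_to_baseline(baseline))
```

A content digest catches edits that keep the same size and timestamp; a production tool would likely also store the backup copy itself so that a drifted file can be restored from the baseline.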

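4 Product Overview

4.1 Centralized Management of O&M Scripts

In daily O&M work, administrators accumulate extensive O&M knowledge and turn part of it into O&M scripts that simplify routine management.

The product provides centralized management of O&M scripts, bringing the many scattered day-to-day scripts under central management and unified distribution.

This gives O&M scripts central storage and unified version control, as well as automatic distribution, batch distribution, and batch updates.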
