ABSTRACT Fault Tolerant High Performance Computing by a Coding Approach
Dell Server Panel Indicator Error Codes

Yes
Any
E1222
VCACHE #PwrGd
High
VCACHE # voltage regulator has failed.
AC Cycle or SEL clear
Yes
Any
E1223
VRM #PwrGd
High
VRM # voltage regulator has failed.
Failing device is reseated/replaced/repaired.
Yes
Any
W1228
ROMB Batt < 24 hr
Low
This is a predictive failure warning telling the user that the PERC5I RAID battery has less than 24 hours of charge left in it. We provide this message as a warning to the customer.
System Phase When Event Can Occur?
E1210
CMOSBatt
Low
CMOS battery is missing or the voltage is outside of the allowable range.
Failing device is reseated/replaced/repaired.
LCD Messages
The following table provides details on the error messages that may be displayed on the system LCD.
The Chinese for 'fault-tolerant' - Reply

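[The Chinese for 'fault-tolerant'] Applications of Fault-Tolerance Technology in Computing

[Introduction] As computer technology continues to advance, system reliability and fault tolerance have become increasingly important concerns. Because of its central role in system design and implementation, fault-tolerance technology has become an active research topic in computing. Its main goal is to make systems reliable: a system should recover automatically when a fault occurs and keep running stably. This article examines the meaning of the term, its principles, and its applications in computing.

[Main text] I. Meaning of the term
In "fault-tolerant", "fault" refers to a failure of a system or device, while "tolerant" denotes a strong ability to cope with failures and to self-repair. Fault-tolerance technology uses system design and implementation to ensure that, when a hardware or software fault occurs, the system keeps running stably and the fault does not bring down the whole system.

II. Principles of fault-tolerance technology
The principles mainly comprise the following:
1. Redundancy: critical components or data are duplicated so that, when one component or copy fails, the system can switch quickly to a backup, ensuring continued availability.
2. Self-checking and self-repair: the system periodically checks its own state and detects and repairs faults promptly. This can be achieved with fault-detection algorithms, automatic backups and similar means.
3. Error detection and correction: the core of fault tolerance is identifying errors present in the system and correcting them. This can be achieved with checksums, error-control codes and similar methods.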
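The redundancy and error-detection principles above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a method taken from the article: the function names `with_checksum`, `is_intact` and `majority_vote` are hypothetical, CRC-32 (from the standard `zlib` module) stands in for "a checksum", and three replicas with majority voting stand in for "redundant copies".

```python
import zlib
from collections import Counter


def with_checksum(payload: bytes) -> tuple[bytes, int]:
    """Pair a payload with its CRC-32 so later corruption can be detected."""
    return payload, zlib.crc32(payload)


def is_intact(payload: bytes, checksum: int) -> bool:
    """Error detection: recompute the CRC-32 and compare with the stored one."""
    return zlib.crc32(payload) == checksum


def majority_vote(replicas: list[bytes]) -> bytes:
    """Error correction through redundancy: return the value held by a
    strict majority of replicas, or raise if no such majority exists."""
    value, count = Counter(replicas).most_common(1)[0]
    if count <= len(replicas) // 2:
        raise ValueError("no majority among replicas")
    return value


if __name__ == "__main__":
    data = b"sensor=42"
    payload, crc = with_checksum(data)
    print(is_intact(payload, crc))        # the untouched copy passes the check
    print(is_intact(b"sensor=43", crc))   # a corrupted copy fails the check
    print(majority_vote([data, b"sensor=43", data]))  # the bad replica is outvoted
```

Detection alone only reports that something is wrong; it is the redundant copies that make recovery possible, which is why the two mechanisms are usually combined.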
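III. Applications of fault-tolerance technology in computing
Fault-tolerance technology is applied very widely in computing, including in operating systems, databases and networks.
1. Operating systems
The operating system is the core of a computer system; it manages and controls hardware resources and keeps applications running correctly. In operating systems, fault tolerance appears mainly in the following forms:
- Fault-tolerant file systems: data redundancy, error detection and automatic repair mechanisms ensure the reliability and availability of the file system.
- Fault-tolerant virtual memory: fault-tolerance mechanisms in memory management prevent memory faults from crashing the system.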
C Language Error Messages: English-Chinese Reference

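Error messages: Turbo C source errors fall into three types: fatal errors, errors, and warnings. Fatal errors are usually internal compiler errors; errors are program syntax errors, disk or memory access errors, command-line errors and the like; warnings merely point out suspicious conditions and do not stop compilation.
The fatal errors and errors are listed below in alphabetical order (A to Z), each with its meaning and how to handle it.

(1) Fatal errors

Fatal errors A-B

Bad call of in-line function
Analysis and handling: an in-line function defined by a macro was not called correctly. An in-line function begins and ends with two underscores (__).

Irreducible expression tree
Analysis and handling: the expression on the source line is too complex for the code generator to produce code for it. Such expressions must be avoided.

Register allocation failure
Analysis and handling: the expression on the source line is too complex for the code generator to produce code for it. Simplify the expression or avoid it altogether.

(2) Errors

# operator not followed by macro argument name
Analysis and handling: in a macro definition, # marks a macro argument that is to be converted to a string. The # must be followed by a macro argument name.

'xxxxxx' not an argument
Analysis and handling: the source program declares this identifier as a function argument, but the identifier does not appear in the function's argument list.

Ambiguous symbol 'xxxxxx'
Analysis and handling: two or more structures have a field with the same name but a different offset or type. Referring to that field in a variable or expression without the structure name is ambiguous; rename one of the fields or add the structure name to the reference.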
Schneider Ground Fault Protection

4.2. A multisource system with several earthings
Conclusion
5.1. Implementation
5.2. Wiring diagram study
5.2.1. Single-source system
5.2.2. Multisource / single-ground system
5.2.3. Multisource / multiground system
5.3.1. Depending on the installation system
5.3.2. Advantages and disadvantages depending on the type of GFP
For the user or the operator, electrical power supply must be:
■ risk free (safety of persons and goods)
■ always available (continuity of supply).
These needs signify:
■ in terms of safety, using technical solutions to prevent the risks that are caused by insulation faults. These risks are:
- electrification (even electrocution) of persons
- destruction of loads and the risk of fire.
The occurrence of an insulation fault is not negligible. Safety of electrical installations is ensured by:
- respecting installation standards
- implementing protection devices in conformity with product standards (in particular with the different IEC 60947 standards).
■ in terms of availability, choosing appropriate solutions. The coordination of protection devices is a key factor in attaining this goal.
IndraDrive Series Fault Codes

Error Messages
F9001 Error internal function call.
F9002 Error internal RTOS function call
F9003 Watchdog
F9004 Hardware trap
F8000 Fatal hardware error
F8010 Autom. commutation: Max. motion range when moving back
F8011 Commutation offset could not be determined
F8012 Autom. commutation: Max. motion range
F8013 Automatic commutation: Current too low
F8014 Automatic commutation: Overcurrent
F8015 Automatic commutation: Timeout
F8016 Automatic commutation: Iteration without result
F8017 Automatic commutation: Incorrect commutation adjustment
F8018 Device overtemperature shutdown
F8022 Enc. 1: Enc. signals incorr. (can be cleared in ph. 2)
F8023 Error mechanical link of encoder or motor connection
F8025 Overvoltage in power section
F8027 Safe torque off while drive enabled
F8028 Overcurrent in power section
F8030 Safe stop 1 while drive enabled
F8042 Encoder 2 error: Signal amplitude incorrect
F8057 Device overload shutdown
F8060 Overcurrent in power section
F8064 Interruption of motor phase
F8067 Synchronization PWM-Timer wrong
F8069 +/-15Volt DC error
F8070 +24Volt DC error
F8076 Error in error angle loop
F8078 Speed loop error.
F8079 Velocity limit value exceeded
F8091 Power section defective
F8100 Error when initializing the parameter handling
F8102 Error when initializing power section
F8118 Invalid power section/firmware combination
F8120 Invalid control section/firmware combination
F8122 Control section defective
F8129 Incorrect optional module firmware
F8130 Firmware of option 2 of safety technology defective
F8133 Error when checking interrupting circuits
F8134 SBS: Fatal error
F8135 SMD: Velocity exceeded
F8140 Fatal CCD error.
F8201 Safety command for basic initialization incorrect
F8203 Safety technology configuration parameter invalid
F8813 Connection error mains choke
F8830 Power section error
F8838 Overcurrent external braking resistor
F7010 Safely-limited increment exceeded
F7011 Safely-monitored position, exceeded in pos. Direction
F7012 Safely-monitored position, exceeded in neg. Direction
F7013 Safely-limited speed exceeded
F7020 Safe maximum speed exceeded
F7021 Safely-limited position exceeded
F7030 Position window Safe stop 2 exceeded
F7031 Incorrect direction of motion
F7040 Validation error parameterized - effective threshold
F7041 Actual position value validation error
F7042 Validation error of safe operation mode
F7043 Error of output stage interlock
F7050 Time for stopping process exceeded
8.3.15 F7051 Safely-monitored deceleration exceeded
8.4 Travel Range Errors (F6xxx)
8.4.1 Behavior in the Case of Travel Range Errors
8.4.2 F6010 PLC Runtime Error
8.4.3 F6024 Maximum braking time exceeded
8.4.4 F6028 Position limit value exceeded (overflow)
8.4.5 F6029 Positive position limit exceeded
8.4.6 F6030 Negative position limit exceeded
8.4.7 F6034 Emergency-Stop
8.4.8 F6042 Both travel range limit switches activated
8.4.9 F6043 Positive travel range limit switch activated
8.4.10 F6044 Negative travel range limit switch activated
8.4.11 F6140 CCD slave error (emergency halt)
8.5 Interface Errors (F4xxx)
8.5.1 Behavior in the Case of Interface Errors
8.5.2 F4001 Sync telegram failure
8.5.3 F4002 RTD telegram failure
8.5.4 F4003 Invalid communication phase shutdown
8.5.5 F4004 Error during phase progression
8.5.6 F4005 Error during phase regression
8.5.7 F4006 Phase switching without ready signal
8.5.8 F4009 Bus failure
8.5.9 F4012 Incorrect I/O length
8.5.10 F4016 PLC double real-time channel failure
8.5.11 F4017 S-III: Incorrect sequence during phase switch
8.5.12 F4034 Emergency-Stop
8.5.13 F4140 CCD communication error
8.6 Non-Fatal Safety Technology Errors (F3xxx)
8.6.1 Behavior in the Case of Non-Fatal Safety Technology Errors
8.6.2 F3111 Refer. missing when selecting safety related end pos
8.6.3 F3112 Safe reference missing
8.6.4 F3115 Brake check time interval exceeded
Troubleshooting Guide | Rexroth IndraDrive | Electric Drives and Controls | Bosch Rexroth AG
8.6.5 F3116 Nominal load torque of holding system exceeded
8.6.6 F3117 Actual position values validation error
8.6.7 F3122 SBS: System error
8.6.8 F3123 SBS: Brake check missing
8.6.9 F3130 Error when checking input signals
8.6.10 F3131 Error when checking acknowledgment signal
8.6.11 F3132 Error when checking diagnostic output signal
8.6.12 F3133 Error when checking interrupting circuits
8.6.13 F3134 Dynamization time interval incorrect
8.6.14 F3135 Dynamization pulse width incorrect
8.6.15 F3140 Safety parameters validation error
8.6.16 F3141 Selection validation error
8.6.17 F3142 Activation time of enabling control exceeded
8.6.18 F3143 Safety command for clearing errors incorrect
8.6.19 F3144 Incorrect safety configuration
8.6.20 F3145 Error when unlocking the safety door
8.6.21 F3146 System error channel 2
8.6.22 F3147 System error channel 1
8.6.23 F3150 Safety command for system start incorrect
8.6.24 F3151 Safety command for system halt incorrect
8.6.25 F3152 Incorrect backup of safety technology data
8.6.26 F3160 Communication error of safe communication
8.7 Non-Fatal Errors (F2xxx)
8.7.1 Behavior in the Case of Non-Fatal Errors
8.7.2 F2002 Encoder assignment not allowed for synchronization
8.7.3 F2003 Motion step skipped
8.7.4 F2004 Error in MotionProfile
8.7.5 F2005 Cam table invalid
8.7.6 F2006 MMC was removed
8.7.7 F2007 Switching to non-initialized operation mode
8.7.8 F2008 RL The motor type has changed
8.7.9 F2009 PL Load parameter default values
8.7.10 F2010 Error when initializing digital I/O (-> S-0-0423)
8.7.11 F2011 PLC - Error no. 1
8.7.12 F2012 PLC - Error no. 2
8.7.13 F2013 PLC - Error no. 3
8.7.14 F2014 PLC - Error no. 4
8.7.15 F2018 Device overtemperature shutdown
8.7.16 F2019 Motor overtemperature shutdown
8.7.17 F2021 Motor temperature monitor defective
8.7.18 F2022 Device temperature monitor defective
8.7.19 F2025 Drive not ready for control
8.7.20 F2026 Undervoltage in power section
8.7.21 F2027 Excessive oscillation in DC bus
8.7.22 F2028 Excessive deviation
8.7.23 F2031 Encoder 1 error: Signal amplitude incorrect
8.7.24 F2032 Validation error during commutation fine adjustment
8.7.25 F2033 External power supply X10 error
8.7.26 F2036 Excessive position feedback difference
8.7.27 F2037 Excessive position command difference
8.7.28 F2039 Maximum acceleration exceeded
8.7.29 F2040 Device overtemperature 2 shutdown
8.7.30 F2042 Encoder 2: Encoder signals incorrect
8.7.31 F2043 Measuring encoder: Encoder signals incorrect
8.7.32 F2044 External power supply X15 error
8.7.33 F2048 Low battery voltage
8.7.34 F2050 Overflow of target position preset memory
8.7.35 F2051 No sequential block in target position preset memory
8.7.36 F2053 Incr. encoder emulator: Pulse frequency too high
8.7.37 F2054 Incr. encoder emulator: Hardware error
8.7.38 F2055 External power supply dig. I/O error
8.7.39 F2057 Target position out of travel range
8.7.40 F2058 Internal overflow by positioning input
8.7.41 F2059 Incorrect command value direction when positioning
8.7.42 F2063 Internal overflow master axis generator
8.7.43 F2064 Incorrect cmd value direction master axis generator
8.7.44 F2067 Synchronization to master communication incorrect
8.7.45 F2068 Brake error
8.7.46 F2069 Error when releasing the motor holding brake
8.7.47 F2074 Actual pos. value 1 outside absolute encoder window
8.7.48 F2075 Actual pos. value 2 outside absolute encoder window
8.7.49 F2076 Actual pos. value 3 outside absolute encoder window
8.7.50 F2077 Current measurement trim wrong
8.7.51 F2086 Error supply module
8.7.52 F2087 Module group communication error
8.7.53 F2100 Incorrect access to command value memory
8.7.54 F2101 It was impossible to address MMC
8.7.55 F2102 It was impossible to address I2C memory
8.7.56 F2103 It was impossible to address EnDat memory
8.7.57 F2104 Commutation offset invalid
8.7.58 F2105 It was impossible to address Hiperface memory
8.7.59 F2110 Error in non-cyclical data communic. of power section
8.7.60 F2120 MMC: Defective or missing, replace
8.7.61 F2121 MMC: Incorrect data or file, create correctly
8.7.62 F2122 MMC: Incorrect IBF file, correct it
8.7.63 F2123 Retain data backup impossible
8.7.64 F2124 MMC: Saving too slowly, replace
8.7.65 F2130 Error comfort control panel
8.7.66 F2140 CCD slave error
8.7.67 F2150 MLD motion function block error
8.7.68 F2174 Loss of motor encoder reference
8.7.69 F2175 Loss of optional encoder reference
8.7.70 F2176 Loss of measuring encoder reference
8.7.71 F2177 Modulo limitation error of motor encoder
8.7.72 F2178 Modulo limitation error of optional encoder
8.7.73 F2179 Modulo limitation error of measuring encoder
8.7.74 F2190 Incorrect Ethernet configuration
8.7.75 F2260 Command current limit shutoff
8.7.76 F2270 Analog input 1 or 2, wire break
8.7.77 F2802 PLL is not synchronized
8.7.78 F2814 Undervoltage in mains
8.7.79 F2815 Overvoltage in mains
8.7.80 F2816 Softstart fault power supply unit
8.7.81 F2817 Overvoltage in power section
8.7.82 F2818 Phase failure
8.7.83 F2819 Mains failure
8.7.84 F2820 Braking resistor overload
8.7.85 F2821 Error in control of braking resistor
8.7.86 F2825 Switch-on threshold braking resistor too low
8.7.87 F2833 Ground fault in motor line
8.7.88 F2834 Contactor control error
8.7.89 F2835 Mains contactor wiring error
8.7.90 F2836 DC bus balancing monitor error
8.7.91 F2837 Contactor monitoring error
8.7.92 F2840 Error supply shutdown
8.7.93 F2860 Overcurrent in mains-side power section
8.7.94 F2890 Invalid device code
8.7.95 F2891 Incorrect interrupt timing
8.7.96 F2892 Hardware variant not supported
8.8 SERCOS Error Codes / Error Messages of Serial Communication
9 Warnings (Exxxx)
9.1 Fatal Warnings (E8xxx)
9.1.1 Behavior in the Case of Fatal Warnings
9.1.2 E8025 Overvoltage in power section
9.1.3 E8026 Undervoltage in power section
9.1.4 E8027 Safe torque off while drive enabled
9.1.5 E8028 Overcurrent in power section
9.1.6 E8029 Positive position limit exceeded
9.1.7 E8030 Negative position limit exceeded
9.1.8 E8034 Emergency-Stop
9.1.9 E8040 Torque/force actual value limit active
9.1.10 E8041 Current limit active
9.1.11 E8042 Both travel range limit switches activated
9.1.12 E8043 Positive travel range limit switch activated
9.1.13 E8044 Negative travel range limit switch activated
9.1.14 E8055 Motor overload, current limit active
9.1.15 E8057 Device overload, current limit active
9.1.16 E8058 Drive system not ready for operation
9.1.17 E8260 Torque/force command value limit active
9.1.18 E8802 PLL is not synchronized
9.1.19 E8814 Undervoltage in mains
9.1.20 E8815 Overvoltage in mains
9.1.21 E8818 Phase failure
9.1.22 E8819 Mains failure
9.2 Warnings of Category E4xxx
9.2.1 E4001 Double MST failure shutdown
9.2.2 E4002 Double MDT failure shutdown
9.2.3 E4005 No command value input via master communication
9.2.4 E4007 SERCOS III: Consumer connection failed
9.2.5 E4008 Invalid addressing command value data container A
9.2.6 E4009 Invalid addressing actual value data container A
9.2.7 E4010 Slave not scanned or address 0
9.2.8 E4012 Maximum number of CCD slaves exceeded
9.2.9 E4013 Incorrect CCD addressing
9.2.10 E4014 Incorrect phase switch of CCD slaves
9.3 Possible Warnings When Operating Safety Technology (E3xxx)
9.3.1 Behavior in Case a Safety Technology Warning Occurs
9.3.2 E3100 Error when checking input signals
9.3.3 E3101 Error when checking acknowledgment signal
9.3.4 E3102 Actual position values validation error
9.3.5 E3103 Dynamization failed
9.3.6 E3104 Safety parameters validation error
9.3.7 E3105 Validation error of safe operation mode
9.3.8 E3106 System error safety technology
9.3.9 E3107 Safe reference missing
9.3.10 E3108 Safely-monitored deceleration exceeded
9.3.11 E3110 Time interval of forced dynamization exceeded
9.3.12 E3115 Prewarning, end of brake check time interval
9.3.13 E3116 Nominal load torque of holding system reached
9.4 Non-Fatal Warnings (E2xxx)
9.4.1 Behavior in Case a Non-Fatal Warning Occurs
9.4.2 E2010 Position control with encoder 2 not possible
9.4.3 E2011 PLC - Warning no. 1
9.4.4 E2012 PLC - Warning no. 2
9.4.5 E2013 PLC - Warning no. 3
9.4.6 E2014 PLC - Warning no. 4
9.4.7 E2021 Motor temperature outside of measuring range
9.4.8 E2026 Undervoltage in power section
9.4.9 E2040 Device overtemperature 2 prewarning
9.4.10 E2047 Interpolation velocity = 0
9.4.11 E2048 Interpolation acceleration = 0
9.4.12 E2049 Positioning velocity >= limit value
9.4.13 E2050 Device overtemp. Prewarning
9.4.14 E2051 Motor overtemp. prewarning
9.4.15 E2053 Target position out of travel range
9.4.16 E2054 Not homed
9.4.17 E2055 Feedrate override S-0-0108 = 0
9.4.18 E2056 Torque limit = 0
9.4.19 E2058 Selected positioning block has not been programmed
9.4.20 E2059 Velocity command value limit active
9.4.21 E2061 Device overload prewarning
9.4.22 E2063 Velocity command value > limit value
9.4.23 E2064 Target position out of num. range
9.4.24 E2069 Holding brake torque too low
9.4.25 E2070 Acceleration limit active
9.4.26 E2074 Encoder 1: Encoder signals disturbed
9.4.27 E2075 Encoder 2: Encoder signals disturbed
9.4.28 E2076 Measuring encoder: Encoder signals disturbed
9.4.29 E2077 Absolute encoder monitoring, motor encoder (encoder alarm)
9.4.30 E2078 Absolute encoder monitoring, opt. encoder (encoder alarm)
9.4.31 E2079 Absolute enc. monitoring, measuring encoder (encoder alarm)
9.4.32 E2086 Prewarning supply module overload
9.4.33 E2092 Internal synchronization defective
9.4.34 E2100 Positioning velocity of master axis generator too high
9.4.35 E2101 Acceleration of master axis generator is zero
9.4.36 E2140 CCD error at node
9.4.37 E2270 Analog input 1 or 2, wire break
9.4.38 E2802 HW control of braking resistor
9.4.39 E2810 Drive system not ready for operation
9.4.40 E2814 Undervoltage in mains
9.4.41 E2816 Undervoltage in power section
9.4.42 E2818 Phase failure
9.4.43 E2819 Mains failure
9.4.44 E2820 Braking resistor overload prewarning
9.4.45 E2829 Not ready for power on
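The section titles in this list tie the first two characters of a diagnostic code to a category (for example "Travel Range Errors (F6xxx)" and "Fatal Warnings (E8xxx)"). A minimal Python sketch of that lookup is given here for orientation. The names `CATEGORY_BY_PREFIX` and `classify` are hypothetical and are not part of any Bosch Rexroth software; only prefixes that have a titled section in the list are mapped, so F7xxx, F8xxx and F9xxx codes, whose category names are not stated here, return `None`.

```python
import re
from typing import Optional

# Prefix -> category, taken only from the section titles in the list.
# Prefixes without a titled section (F7, F8, F9) are deliberately absent.
CATEGORY_BY_PREFIX = {
    "F6": "Travel range error",
    "F4": "Interface error",
    "F3": "Non-fatal safety technology error",
    "F2": "Non-fatal error",
    "E8": "Fatal warning",
    "E4": "Warning of category E4xxx",
    "E3": "Safety technology warning",
    "E2": "Non-fatal warning",
}

# A diagnostic code is the letter E or F followed by exactly four digits.
CODE_PATTERN = re.compile(r"^[EF]\d{4}$")


def classify(code: str) -> Optional[str]:
    """Return the category for a code such as 'F2026', or None when the
    list does not name a category for that prefix."""
    normalized = code.strip().upper()
    if not CODE_PATTERN.match(normalized):
        raise ValueError(f"not a diagnostic code: {code!r}")
    return CATEGORY_BY_PREFIX.get(normalized[:2])


if __name__ == "__main__":
    for sample in ("F2026", "E8025", "F8022"):
        print(sample, "->", classify(sample))
```

Returning `None` for unmapped prefixes, instead of guessing a category, keeps the sketch limited to what the list itself states.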
SH/T 3018-2003 Design Code for Safety Instrumented Systems in Petrochemical Industry - Sinopec Standard [1]
Design Code for Safety Instrumented Systems in Petrochemical Industry (SH/T 3018-2003)
Contents
1 Scope
1.1 This code applies to the engineering design of safety instrumented systems for new, revamped and expanded petrochemical units (or plants); the engineering design of storage and transportation systems, utilities and auxiliary facilities may follow it by reference.
1.2 The engineering design of a safety instrumented system must meet the safety integrity level requirements of the petrochemical unit (or plant).
1.3 The related standards are as follows:
1.4 When implementing this standard, the requirements of the relevant mandatory national standards and codes currently in force shall also be met.
2 Terms and definitions
2.1 Dangerous failure
2.2 Safe failure
2.3 Safety instrumented system (SIS)
2.4 Safety integrity level (SIL)
2.5 Final element
2.6 Logic solver
2.7 Programmable electronic system (PES)
2.8 Process control system (PCS)
2.9 Redundancy
2.10 Fault tolerant
2.11 Voting
2.12 Fail to safe
2.13 Overt fault
2.14 Covert fault
2.15 Mean time between failures (MTBF)
2.16 Mean time to repair (MTTR)
2.17 Mean time to failure (MTTF)
2.18 Availability (A)
2.19 Reliability (R)
2.20 Sensor
3 Basic principles
3.1 The safety instrumented system is independent of the process control system and performs its safety protection functions independently.
3.2 When the process reaches predetermined conditions, the safety instrumented system acts and brings the controlled process to a safe state.
3.3 The functions of the safety instrumented system are determined according to the following: process hazard and operability analysis; protection of personnel, process, equipment and the environment; safety integrity level.
Backstepping Fault Tolerant Control for Induction Motor

Where: (3)
(4)
And :
,
,
,
1
(5)
where $\sigma$ is the coefficient of dispersion given by:
$\sigma = 1 - \dfrac{M^2}{L_s L_r}$
$L_s$, $L_r$, $M$ are stator, rotor and mutual inductance,
respectively. , are respectively stator and rotor
In the two-phase model, it is possible to model this effect
thinking of a sinusoidal component which corrupts the stator
currents, i.e: ∑
∑
So:
∑
∑
(8)
1, … , With: is faults number, and:
II. INDUCTION MOTOR MODEL
In the stator reference frame, the state-space model of the voltage-fed induction motor is derived from the Park model. The state vector is composed of the stator current components ($i_{s\alpha}$, $i_{s\beta}$), the rotor flux components ($\Phi_{r\alpha}$, $\Phi_{r\beta}$) and the rotor rotating pulsation $\omega$, whereas the control vector is composed of the stator voltage components ($u_{s\alpha}$, $u_{s\beta}$), and the external disturbance is represented by the load torque
Comprehensive Collection of Common English for Aircraft Maintenance

飞机A/C(是aircraft 的英文缩写形式)航前检查preflight check航后检查postflight check短停检查transit check定检scheduled maintenance发现found或revealed(注:对已发生事情的描述常用过去式)故障trouble 或failure 或fault失效fail 或malfunction不工作inoperative 缩写INOP工作不稳定rough工作稳定smoothly排故troubleshooting为了排故for troubleshooting不能fail to…或can’t根据refer 或per to 或according to;维护手册AMM更换replaced件号part number 缩写P/N时控件time control part拆下removed安装installed上件part on下件part off放行标准DDG(是dispatch deviation guide的英文缩写形式)最低设备清单MEL(是minimum equipment list的缩写形式)同意放行dispatch approved 或released因为due 或because因停场时间不足due time short没有备件lack parts 或no spare parts available 或no parts in stock待件wait for parts申请保留apply for reservation保留故障defer defect保留项目defer item保留期限due time关闭保留项目close deferred item 撤消保留项目rescind deferred item 前部forward 缩写fwd后部after 缩写aft左left 缩写L 或LH右right 缩写R 或RH上面upper下面lower内侧inboard 缩写I/B外侧outboard 缩写O/B内侧发动机inboard engine外侧发动机overboard engine起落架landing gear 缩写LDG主轮main wheel前轮nose wheel测试test通电测试power-on test安装测试test for installation操作测试operational test系统测试test for system正常normal 或OK异常abnormal在空中in flight在地面on ground地面检查GND check试车检查running-up test(同机)对串件interchanged…with…(不同机间)对串件exchanged…with… 或robbe d…from…更新数据库updated database开关switch电门switch正常位NORM备用位ALTN人工manual自动auto选择select(注:通常也用缩写形式SEL)按钮button旋钮knob按压press把住hold释放release顺时针clockwise逆时针counterclockwise预置preset重新设置reset一致agree不一致disagree不对称asymmetric卡阻jammed杆lever 或stick 或column操纵杆control column控制面板control panel手柄handle方向盘steering wheel放电刷static(静电)discharge(排放)wick遮光板glareshield风挡windscreen 或windshield 雨刷wiper调整adjust 或regulate重新调整readjust销子pin 例如安全销safety pin插头plug插座socket插针pin电阻resistor线路wire在…之间between…and…引线lead跳开关circuit breaker(断路器)brake failure 刹车失灵继电器relay螺帽nut螺栓bolt螺钉screw松动loose脱落fall off拧紧tighten丢失lost 或missing鸟击birdstrike凹坑dent损坏damaged烧蚀burn through扎伤punctured烧坏burn out磨损wearing 或worn在范围内within limits超标out of limits见线exposed threads油箱tank燃油fuel滑油oil液压油hydraulic fluid泄露leak(注:也常用名词leakage)& 是and的简写符号,表示“和”,“又”等意思No. 
是number的缩写,表示号码,例如1号为No.1 机长captain副驾驶first officer 缩写F/O观察员observer乘务员attendant飞机大概机头nose机腹belly蒙皮skin机身airframe翼肋rib翼梁spar机翼wing翼尖wing tip前缘leading edge后缘trailing edge操纵面control surface客舱cabin 或passenger compartment座位seat排row(如:第5排译作row 5)过道aisle地板floor天花板ceiling隔板partition 厨房galley厕所toilet驾驶舱cockpit货舱cargo轮舱wheel well 缩写W/W设备舱bayATA 21空调空调air-conditioning 缩写a/c空调舱air-conditioning pack自动驾驶autopilot冲压空气作动器ram air actuator出气活门air outlet valve排气活门exhaust valve温度控制活门TEMP CONT valve引气bleed air自动automatic缩写AUTO人工manual正常normal 缩写NORM备用alternate 缩写ALTN设备冷却equipment cooling排气扇exhaust fan供气扇supply fan低流量low flow头顶分配管overhead distribution duct 进气管inlet duct主热交换器primary heat exchanger次热交换器secondary heat exchanger 增压室plenum增压pressurization加热heating压力控制pressure control滤网filter温度指示器temperature indicator客舱高度cabin altitude压差指示器differential press indicator客舱爬升率cabin rate of climb空气循环机air cycle machine 缩写ACM主分配管main distribution manifold客舱压力控制组件cabin press control module水分离器water separator压力选择面板press selector panel控制继电器control relay人工超控继电器manual override relay传感器sensor过热电门overheat switch压气机出口过热电门compressor outlet overheat s witch外溢活门outflow valve单向活门check valve关断活门shutoff valve释压活门relief valve配平调节活门trim modulating valve风挡windshieldATA22 自动驾驶自动驾驶autopilot液压hydraulic 缩写HYD副翼aileron襟翼flap缝翼slat安定面stabilizer 方向舵rudder升降舵elevator扰流板spoiler减速板airbrake传感器transducer发射机transmitter马赫配平作动器mach trim actuator自动油门auto-throttle偏航阻尼器yaw damper失速管理stall management机长captain副驾驶first officer 缩写F/O观察员observer伺服马达servo motor速度配平speed trim巡航cruise起飞/复飞电门takeoff/go around switch 俯仰pitch横滚roll驱动组件drive unit跳开关circuit breakerATA23 通信通信communication旅客广播passenger address 缩写PA天线antenna选呼select call 缩写SELCAL娱乐系统entertainment system磁带机tape reproducer内话interphone语音记录器voice recorder天线偶合器antenna coupler乘务员面板attendant panel话筒headset指示灯indicator light氧气面罩oxygen mask音频控制面板audio control panel音频选择面板audio select panel语音/数据继电器voice/data relay呼叫电门call switch收发机transceiver应急dingwei发射机emergency locator transmitter 
无线电频率组件radio frequency unit遥控电子组件remote electronics unitATA24 电源电源electrical power电瓶battery电压计voltmeter发电机generator启动机starter静变流机static inverter外电源external power地面服务电门ground service switch变压器transformer汇流条电源控制组件bus power control unit备用电源控制组件standby power control unit变频器converter整流器rectifier保险丝fuse ATA25设备和装饰设备equipment装饰furnishing医药箱medical kit厨房galley厕所lavatory旅客座位passenger seat过道aisle逃离绳escape lanyard隔板sidewall panel电子舱electronics bay旅客服务组件passenger service unit站位station逃离滑梯escape slide内窗inner window后货舱AFT cargoATA26 防火防火fire protection灭火瓶fire extinguisher bottle头顶探测器overhead detector龙骨梁keel beam过热探测器overheat detector发动机支架engine strut大翼过热探测器wing overheat detector主轮舱main wheel well烟雾探测smoke detector厕所烟雾指示灯lavatory smoke indicator light 火警灯fire warning light过热探测控制组件overheat detector control modul e测试电门test switch故障/不工作和过热/火测试电门FAULT/INOP and OVHT/FIRE test switch灭火测试电门extinguisher switchATA27 飞行控制飞行控制flight control机翼wing翼尖wing tip副翼配平作动器aileron trim actuator升降舵调整片elevator tab地面扰流板ground spoiler前缘缝翼leading edge slat缩写LE slat后缘襟翼trailing edge flap缩写TE flap方向舵配平作动器rudder trim actuator副翼组件aileron assembly抖杆stick shaker放出extend收起retract保险fuse襟翼位置指示器flap position indicator襟翼收放测试test for R/E(retract/ extend的缩写形式)flaps预位arm转换机构transfer mechanism驱动电动马达drive electric motor驱动液压马达drive hydraulic motor失速警告测试面板stall warning test panel放下down收上up 安定面配平stabilizer trim控制杆control column速度刹车speed-brake自动缝翼控制活门auto-slat control valve 内锁活门interlock valve旁通活门bypass valve控制轮control wheelATA28 燃油燃油fuel中央油箱center tank加油指示fueling indicator预选preselect加油喷嘴refuel nozzle燃油系统面板fuel system panel大翼加油面板wing fueling panel增压泵boost pump加油浮子电门refuel float switchATA 29液压液压hydraulic液压管路hydraulic line液压作动筒hydraulic actuator液压泵hydraulic pump压力控制活门pressure control valve压力传感器pressure sensor释压活门relief valve流量控制活门flow control valve密封圈seals刹车压力指示brake press indicator故障探测fault detector热交换器heat exchanger排放油滤drain filter储油箱reservoir回油滤return filter低压电门low pressure switch液压过热警告电门hydraulic overheat warning sw 
itch压力传感器pressure transmitterATA30 防冰/防雨防冰和雨anti-ice & rain风挡windshield雨刮rain wiper皮托管pitot废物waste窗户加热电源window heat power加热面板heat panel总温传感器total temperature sensor防冰自动油门电门anti-ice auto-throttle switch风挡传感器电门windshield sensor switch热电门thermal switch左/右侧窗户温度热控制组件L / R side window h eat control unit进气整流罩inlet cowl热防冰活门thermal anti-ice valveATA31 指示/记录系统指示indicating 仪表indicator 或gauge记录recording显示display控制面板control panel飞行记录仪flight recorder位置传感器position sensor起落架音响警告landing gear aural warning时钟显示clock display主警告master caution打印机printer测试接头test connector耦合器coupler马赫空速mach airspeed固态飞行数据记录仪solid state flight data record er控制杆位置传感器control column position sensor 控制轮control wheel地面扰流板内锁活门GND spoiler interlock valve 升降舵位置传感器elevator position sensor方向舵踏板位置传感器rudder pedal position sen sor副翼位置传感器aileron position地面扰流板升起压力电门GND spoiler up press s witch显示组件display unitATA32 起落架起落架landing gear 缩写LDG机轮wheel主轮main wheel前轮nose wheel左外left outboard 或left outside(注:left通常可用大写L表示)左内left inboard 或left inside右外right outboard 或right outside(注:right通常可用大写R表示)右内right inboard 或right inside刹车brake刹车蓄压器brake accumulator主起落架main landing gear上锁作动器uplock actuator下锁作动器downlock actuator前起落架nose landing gear转弯steering防滞anti-skid起落架锁定gear locked起落架舱门wheel door轮舱wheel well放轮gear extension收轮gear retraction收放测试test for R/E(retract/ extend的缩写形式)起落架被卡阻gear jammed轮胎tyre爆胎burst瘪胎deflated tyre轮胎被扎破puncture轮胎tyre见线exposed threads磨损wear 或worn out裂纹cracks超限out of limits在范围内within limits 空/地继电器air/ground relay自动刹车autobrake刹车保险brake fuse人工释放manual extension液压刹车压力指示器hydraulic brake pressure indi cator停留刹车parking brake故障灯fault light解除预位灯disarm light不工作灯INOP light限流器flow limiter扭力杆torsion link人工放出机构manual extension mechanism放下并锁上传感器down and locked sensor收上并锁上传感器up and locked sensor下锁弹簧downlock spring减振支柱shock strut侧支柱side strut阻力支柱drag strut人工放出限制电门manual extension limit switch 刹车踏板电门brake pedal switch前轮转弯电门nose wheel steering switch防滞传感器antiskid transducer刹车调节活门brake metering valve加注活门charging valve选择活门selector 
valve转换活门transfer valve隔离活门isolation valve防滞活门antiskid valve往复活门shuttle valveATA33 灯灯light闪光灯strobe light航行灯navigation light 缩写NAV light 防撞灯anti-collision light着陆灯landing light信号灯signal light下滑灯approach light机翼照明灯wing illumination light标志灯logo light滑行灯taxi light应急照明灯emergency light机身灯fuselage light转弯灯turnoff light尾灯tail light窗灯window light荧光灯fluorescent lamp地板灯floor light阅读灯reading lamp灯座lamp socket 或lamp base灯架lampholder地图灯cap light灯组件light module明bright暗dim灯泡bulb烧坏burn out灯罩lamp cover灯亮on 或illuminate灯灭off 或turn off ATA34 导航导航navigation 缩写NAV天线antenna数据库database顶端top底端bottom下滑道glide slope指点信标marker beacon无线电高度接收机radio altitude receiver无线电高度发射天线radio altitude transmitter ante nna气象雷达weather radar近地警告ground proximity warning 缩写GND PR OX WARN无线电导航radio navigation姿态指示仪attitude indicator测距仪询问机DME interrogator应答机transponder收发机transceiver惯导inertial reference主注意master cautionATA35 氧气氧气oxygen氧气面罩oxygen mask流量控制flow control充气控制inflation control热补偿thermal compensator机组氧气瓶crew oxygen cylinder氧气发生器oxygen generator氧气压力指示oxygen pressure indicator旅客氧气面罩passenger oxygen mask机组氧气面罩crew oxygen mask氧气系统组件oxygen system module氧气系统面板oxygen system panel安全销safety pin压力调节器pressure regulator氧气指示继电器oxygen indicator relay高度压力电门altitude press switch机组氧气传感器crew oxygen transducer乘务员服务组件attendant service unit厕所服务组件lavatory unit旅客服务组件passenger service unit超压释放活门overpressure relief valveATA36 气源系统气源pneumatics引气活门bleed air valve双压力指示dual pressure indicator引气调节器bleed air regulator预冷控制活门传感器precooler control valve sens or超温电门over temperature switch恒温器thermostat总管压力传感器manifold press transmitterAPU引气活门APU bleed air valve单向活门check valve引气隔离活门bleed air isolation valve预冷控制活门precooler control valve 地面气源ground pneumaticATA38 水和废物水和废物water & waste真空通风机vacuum blower排放管drain line排放接头drain fitting厕所水加热器lavatory water heater水压缩机water compressor水量指示water quantity indicator空气过滤air filter水服务面板water service panel控制手柄control handle废物量指示waste quantity indicator不工作/测试电门INOP/TEST switch逻辑控制组件logic control 
module压缩机控制继电器compressor control relay 内锁电门interlock switch水压限制电门water pressure limit switch 废物箱waste tank水箱water tank真空马桶vacuum toilet加注/溢流活门fill/overflow valve球形活门ball valveATA49 辅助动力装置辅助动力装置auxiliary power unit 缩写APU 进气门作动器air inlet actuator导向叶片guide vane燃油增压泵fuel boost pump排气温度exhaust gas temperature 缩写EGT 数据记忆组件data memory module滑油位传感器oil level sensor滑油温度传感器oil temperature sensor速度传感器speed sensor进气门位置电门air inlet door position switch 热电耦thermocouple燃油控制组件fuel control unit点火组件ignition unit喘振活门surge valveATA52 门门door登机门entry door警告电门warning switch平衡组件counterbalance assembly紧急出口emergency exit货舱门闩cargo door latch接近面板access panel门上锁面板door unlock panel安定面配平stabilizer trimATA56 窗窗window门窗door mounted window旅客窗passenger compartment windowATA71 动力装置动力装置power plant 发动机engine发动机短舱nacelle发动机吊舱pod风扇fan风扇叶片fan blade进气罩inlet cowl涡流控制装置vortex control device 缩写VCD 释压门pressure relief door接近门access door滑油箱oil tank风扇罩fan cowl铰链hinge风扇罩面板闩fan cowl panel latch发动机架engine mount发动机燃烧室engine combustor高压涡轮组件high pressure turbine assembly 附件齿轮箱accessory gearbox传动齿轮箱transfer gearbox转子,叶轮spinner气缸cylinder转速revolution per minute 缩写RPM尾喷管nozzle磁堵magnetic plug发动机慢车位idle发动机喘振surge发动机停车engine shutdown发动机熄火engine flame outATA73 发动机燃油和控制发动机燃油和控制engine fuel & control燃油系统fuel system燃油箱fuel tank通气孔vent加油栓fuel hydrant燃油管路fuel line放油活门dump valve燃油泵fuel pump燃油滤fuel filter接头joint燃油流量表fuel flow indicator导线束wiring harness燃油总管fuel manifold燃油喷嘴fuel nozzle发动机控制面板engine control panel插头plug备用电源继电器alternate power relay风扇进气温度传感器fan inlet temperature sensor 压差电门differential pressure switch电子发动机控制electronic engine control 缩写EE C燃油流量传感器fuel flow transmitterATA74 点火点火ignition点火激励器ignition exciter发动机起动电门engine start switch发动机点火电门engine ignition switchATA75 发动机空气发动机空气engine air可变引气活门variable bleed valve 缩写VBV高压涡轮间隙活门HP turbine clearance valve 缩写HPTCV低压涡轮间隙活门LP turbine clearance valve 缩写LPTCV位置传感器position transducer过渡引气活门transient bleed valveATA76 发动机控制发动机控制engine control起动杆start lever点火电门ignition switchATA77 
发动机指示发动机指示engine indicatingATA78 排气排气exhaust反推thrust reverser套筒sleeve同步锁SYNC lock顺序电门sequence relay控制电门control switch发动机附件组件engine accessory unitATA79 发动机滑油发动机滑油engine oil润滑剂lubricant润滑lubrication滑油管路oil line探测器detector回油scavenge回油过滤scavenge filter主滑油/燃油热交换器main oil/fuel heat exchanger 滑油压力传感器oil pressure sensor滑油压力表oil pressure indicator滑油温度表oil temperature indicator滑油散热器oil cooler滑油温度传感器oil temperature sensor滑油滤旁通警告电门oil filter bypass warning switc h滑油箱oil tank滑油量传感器oil quantity transmitter润滑组件lubrication unit常用语句……检查发现……1. 航后检查发现左后航行灯不亮。
Common C++ Debugging Errors: English-Chinese Translation

C++ debugging error messages with explanations

Ambiguous operators need parentheses ---- an ambiguous combination of operators must be parenthesized
Ambiguous symbol 'xxx' ---- the symbol xxx is ambiguous
Argument list syntax error ---- syntax error in the argument list
Array bounds missing ] ---- the closing bracket of the array bounds is missing
Array size too large ---- the array is too large
Bad character in parameters ---- the parameters contain an invalid character
Bad file name format in include directive ---- the file name in an #include directive is malformed
Bad ifdef directive syntax ---- syntax error in an #ifdef directive
Bad undef directive syntax ---- syntax error in an #undef directive
Bit field too large ---- the bit field is too wide
Call of non-function ---- the name being called is not a function
Call to function with no prototype ---- a function is called with no prototype in scope
Cannot modify a const object ---- a const object may not be modified
Case outside of switch ---- a case label appears outside a switch statement
Case syntax error ---- syntax error in a case label
Code has no effect ---- the statement has no effect
Compound statement missing } ---- a compound statement is missing its closing "}"
Conflicting type modifiers ---- the type modifiers conflict with each other
Constant expression required ---- a constant expression is required
Constant out of range in comparison ---- a constant used in a comparison is out of range
Conversion may lose significant digits ---- the conversion may lose significant digits
Conversion of near pointer not allowed ---- a near pointer may not be converted here
Could not find file 'xxx' ---- file xxx could not be found
Declaration missing ; ---- a declaration is missing its ";"
Declaration syntax error ---- syntax error in a declaration
Default outside of switch ---- a default label appears outside a switch statement
Define directive needs an identifier ---- a #define directive needs an identifier
Division by zero ---- division by zero
Do statement must have while ---- a do statement is missing its while part
Enum syntax error ---- syntax error in an enum type
Enumeration constant syntax error ---- syntax error in an enumeration constant
Error directive: xxx ---- an #error directive was processed; xxx is its message
Error writing output file ---- the output file could not be written
Expression syntax error ---- syntax error in an expression
Extra parameter in call ---- the call passes more arguments than the function declares
File name too long ---- the file name is too long
Function call missing ) ---- a function call is missing its closing parenthesis
Function definition out of place ---- a function definition appears where it is not allowed
Function should return a value ---- the function must return a value
Goto statement missing label ---- a goto statement has no label
Hexadecimal or octal constant too large ---- a hexadecimal or octal constant is too large
Illegal character 'x' ---- illegal character x
Illegal initialization ---- illegal initialization
Illegal octal digit ---- illegal octal digit
Illegal pointer subtraction ---- illegal pointer subtraction
Illegal structure operation ---- illegal operation on a structure
Illegal use of floating point ---- a floating-point operand is used where it is not allowed
Illegal use of pointer ---- illegal use of a pointer
Improper use of a typedef symbol ---- a typedef name is used improperly
In-line assembly not allowed ---- in-line assembly is not allowed
Incompatible storage class ---- the storage classes are incompatible
Incompatible type conversion ---- the types in the conversion are incompatible
Incorrect number format ---- the number is malformed
Incorrect use of default ---- default is used incorrectly
Invalid indirection ---- invalid indirection
Invalid pointer addition ---- invalid pointer addition
Irreducible expression tree ---- the compiler cannot generate code for the expression
Lvalue required ---- an lvalue (an assignable object such as a variable) is required
Macro argument syntax error ---- syntax error in a macro argument
Macro expansion too long ---- the expanded macro is too long
Mismatched number of parameters in definition ---- the number of parameters in the definition does not match
Misplaced break ---- a break statement is not allowed here
Misplaced continue ---- a continue statement is not allowed here
Misplaced decimal point ---- a decimal point is not allowed here
Misplaced elif directive ---- an #elif directive is not allowed here
Misplaced else ---- an else is not allowed here
Misplaced else directive ---- an #else directive is not allowed here
Misplaced endif directive ---- an #endif directive is not allowed here
Must be addressable ---- the operand must be addressable
Must take address of memory location ---- the address operator must be applied to something stored in memory
No declaration for function 'xxx' ---- function xxx has not been declared
No stack ---- no stack is available
No type information ---- no type information
Non-portable pointer assignment ---- the pointer assignment is not portable (pointer and non-pointer mixed without a cast)
Non-portable pointer comparison ---- the pointer comparison is not portable
Non-portable pointer conversion ---- the pointer conversion is not portable
Not a valid expression format type ---- the expression format type is not valid
Not an allowed type ---- the type is not allowed here
Numeric constant too large ---- the numeric constant is too large
Out of memory ---- not enough memory
Parameter 'xxx' is never used ---- parameter xxx is never used
Pointer required on left side of -> ---- the left operand of -> must be a pointer
Possible use of 'xxx' before definition ---- xxx may be used before it is defined (warning)
Possibly incorrect assignment ---- the assignment may be incorrect
Redeclaration of 'xxx' ---- xxx is declared more than once
Redefinition of 'xxx' is not identical ---- the two definitions of xxx differ
Register allocation failure ---- register allocation failed
Repeat count needs an lvalue ---- the repeat count must be an lvalue
Size of structure or array not known ---- the size of the structure or array is not known
Statement missing ; ---- a statement is missing its ";"
Structure or union syntax error ---- syntax error in a structure or union
Structure size too large ---- the structure is too large
Subscripting missing ] ---- a subscript is missing its closing bracket
Superfluous & with function or array ---- an unnecessary "&" is applied to a function or array
Suspicious pointer conversion ---- suspicious pointer conversion
Symbol limit exceeded ---- too many symbols
Too few parameters in call ---- the call passes fewer arguments than the function declares
Too many default cases ---- more than one default label in a switch statement (only one is allowed)
Too many error or warning messages ---- too many error or warning messages
Too many types in declaration ---- too many types in a declaration
Too much auto memory in function ---- the function uses too much local storage
Too much global data defined in file ---- the file defines too much global data
Two consecutive dots ---- two consecutive dots
Type mismatch in parameter xxx ---- the type of parameter xxx does not match
Type mismatch in redeclaration of 'xxx' ---- the type in the redeclaration of xxx does not match
Unable to create output file 'xxx' ---- output file xxx cannot be created
Unable to open include file 'xxx' ---- include file xxx cannot be opened
Unable to open input file 'xxx' ---- input file xxx cannot be opened
Undefined label 'xxx' ---- label xxx is not defined
Undefined structure 'xxx' ---- structure xxx is not defined
Undefined symbol 'xxx' ---- symbol xxx is not defined
Unexpected end of file in comment started on line xxx ---- the file ended inside the comment that starts on line xxx
Unexpected end of file in conditional started on line xxx ---- the file ended inside the conditional directive that starts on line xxx
Unknown assembler instruction ---- unknown assembler instruction
Unknown option ---- unknown option
Unknown preprocessor directive: 'xxx' ---- preprocessor directive xxx is not recognized
Unreachable code ---- the code can never be reached
Unterminated string or character constant ---- a string or character constant is missing its closing quote
User break ---- the user interrupted the program
Void functions may not return a value ---- a void function may not return a value
Wrong number of arguments ---- the wrong number of arguments is passed
'xxx' not an argument ---- xxx is not an argument
'xxx' not part of structure ---- xxx is not a member of the structure
xxx statement missing ( ---- the xxx statement is missing its opening parenthesis
xxx statement missing ) ---- the xxx statement is missing its closing parenthesis
xxx statement missing ; ---- the xxx statement is missing its semicolon
'xxx' declared but never used ---- xxx is declared but never used
'xxx' is assigned a value which is never used ---- xxx is assigned a value that is never used
Zero length structure ---- the structure has zero length
Common English Vocabulary for CNC Machine Tools

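Category: Machinery, Automation - CNC. Published: 2022/12/25 15:17:01. langfly

T WORD ERROR (T code error)
LOW OIL LEVEL (oil level low)
SPINPLE FAULT (spindle fault)
SPINDLE ALARM (spindle alarm)
EXTERNAL EMG STOP (emergency stop button pressed)
AC NOT READY (AC panel not ready)
SPINPLE LUBE FAULT (spindle lubrication fault)
T CODE ERROR (T code error, illegal T code)
M CODE ERROR (M code error, illegal M code)
SERVO NOT READY (servo not ready)
NC NOT READY (NC not ready)
TURRET FAULT (turret fault)
TURRET LIMIT (turret limit reached)
DC 24V OPEN (24 V DC open circuit)
+24V NOT READY (+24 V not ready)
GRAR DRIFT (gear position drift)
PLEASE AXIS RETURN HOME (axis not homed)
PLEASE DRUM RETURN HOME (tool magazine not homed)
AIRPRESS FAILURE (air pressure fault)
UNCL TOOL FALL (tool unclamp failed)
AIR PRESSURE DROP (compressed-air pressure too low)
CLAMP TOOL FALL (tool clamp failed)
DRUM NOT PARKED (tool magazine not at its home position)
X ZERO POINT NOT REACHED (X axis not homed)
Y ZERO POINT NOT REACHED (Y axis not homed)
Z ZERO POINT NOT REACHED (Z axis not homed)
4TH ZERO POINT NOT REACHED (4th axis not homed)
X AXIS OVERTRAVL (X axis overtravel)
Y AXIS OVERTRAVL (Y axis overtravel)
Z AXIS OVERTRAVL (Z axis overtravel)
COUNTER SWITCH REEOR (counter switch fault)
MASTERT RANSFER OVER TEMP (main transformer overheated)
Z AXIS NOT AT FIRST REF POSITION (Z axis not at the first reference point)
SPINDLE ORIENTATION FALLURE (spindle orientation failed)
TOOL DESENT OR TOOL DATA REEOR (tool data error)
PLEASE UNLOAD THE TOOL ON SPRINELK (please remove the tool from the spindle)
PLEASE LOAD TOOL ON APINDLE (please load a tool on the spindle)
A AXIS UNCLAMP FAIL (A axis unclamp failed)
A AXIS CLAMP FAIL (A axis clamp failed)
DRUM OUT TO APRONDLEIS FALL (tool magazine failed to swing to the tool-change position)
MG SWING OVERLOAD (tool magazine swing overload)
DRUM BACK PARK IS FALL (tool magazine failed to swing back to its home position)
TURRENT MOTOR1 OVERLOAD (tool magazine travel motor overload)
COOLANT MOTOR OVERLOAD (coolant pump overload)
DRUM ATC FAULT (automatic tool change failed)
TOOLS UNLOCKED (tool not locked)
BATTERY ALARM (battery alarm)
DRUM POSITION SWITCH ERROR (tool magazine position detection switch fault)
DRUM NOW NOT AT PARK (tool magazine not at its home position)
IT DANGOU TO MOVE DRUM (tool magazine movement inhibited)
POT UO FAILOR POT NOT AT UP POSITION (tool pot not in the horizontal position)
POT DOWN FAIL (tool pot swing-down failed)
IT IS DANGOUR TO MOVE ARM (tool-change arm movement inhibited)
THE SPINDLE STATU IS ERROR (spindle status error)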
DT Product Defect Classification and Codes, Rev. 2

缺陷类别Defect Type A 代码Defect code 106描述Description 缺陷类别Defect Type B 代码Defect code 203描述Description 缺陷类别Defect Type B 代码Defect code 203描述Description 缺陷类别Defect Type B 代码Defect code 204描述Description 螺纹乱牙/Wrong thread form 螺纹乱牙/Wrong thread pitch 原因分析Root cause analysis1、二次装夹/ Second clamping1、刀补错误/Wrong tools compensation2、换刀调试错误/tool change operational error3、测量失误/Measurement mistake原因分析Root cause analysis1、二次装夹/ Second clamping螺纹尺寸超差/Thread dimension out of tolerance 原因分析Root cause analysis工件断裂-校直断裂/Part fracture 3、校直机器故障/straightening machine malfunction原因分析Root cause analysis1、热处理变形过大/much distortion by heat treatment2、校直操作错误/wrong straightening operation缺陷类别Defect Type B 代码Defect code 212描述Description 缺陷类别Defect Type B 代码Defect code 213描述Description 缺陷类别Defect Type B 代码Defect code 222描述Description 缺陷类别Defect Type B 代码Defect code 222描述Description 冲击端外径尺寸超差/Impact O.D. out ot tolerance 原因分析Root cause analysis1、卡抓跳动大/chuck runout out of tolerance2、工件跳动大/part runout out of tolerance 腰孔位置错误/Water hole wrong position 腰孔尺寸超差/Water hole dimension out of tolerance 原因分析Root cause analysis 1、刀具错误/Wrong tolling 2、操作失误/Operational error 3、机床故障/machine malfunction原因分析Root cause analysis1、二次装夹/ Second clamping3、冲击端外径偏小/Impact of smaller dia.冲击端外圆尺寸超差/Impact O.D dimension OOT 原因分析Root cause analysis 1、程序错误/ Wrong program2、刀补错误/Wrong tools compensation3、看错图纸/Misunderstanding drawings4、机床故障/Machine malfunction缺陷类别Defect Type B 代码Defect code 223描述Description 缺陷类别Defect Type B 代码Defect code 224描述Description 缺陷类别Defect Type B 代码Defect code 225描述Description 缺陷类别Defect Type B 代码Defect code 225描述Description 3、测量失误/Measurement mistake 4、刀具损坏/Tool damaged螺纹端外圆破损(铣花键)/Thread O.D damaged 原因分析Root cause analysis 1、限位错误/Wrong limit2、操作失误(退刀失误)//Operational (tool withdrawal) error 2、机床故障/machine malfunction原因分析Root cause analysis冲击端外圆破损/Imapct outer circle damaged 原因分析Root cause analysis 螺纹端外径尺寸超差/O.D of thread end out of tolerance 原因分析Root cause analysis1、外圆尺寸超差/O.D out of tolerance 
1、刀补错误/Wrong tools compensation2、换刀调试错误/tool change operational error 螺纹端外圆破损(铣花键)/Thread O.D damaged 1、装夹错误/wrong clamping 2、搬运碰伤/ damaged by handling缺陷类别Defect Type B 代码Defect code 226描述Description 缺陷类别Defect Type B 代码Defect code 226描述Description 缺陷类别Defect Type B 代码Defect code 227描述Description 缺陷类别Defect Type B 代码Defect code 228描述Description 花键位置错误/Wrong position of spline 原因分析Root cause analysis1、分度错误/Wrong cutting pitch深孔尺寸(倒角)超差/Deep hole chamfer OOT 原因分析Root cause analysis 1、操作失误/Operational error2、看错图纸/Misunderstanding drawings原因分析Root cause analysis 1、刀具错误/Wrong tools2、刀补错误/Wrong tools compensation3、看错图纸/Misunderstood the drawing4、程序错误/Wrong program be selected花键尺寸超差(花键底径过小)/Spline dimension OOT 2、刀具问题/Caused by tooling1、机床故障/machine malfunction 花键尺寸超差/spline dimension out of tolerance 原因分析Root cause analysis缺陷类别Defect Type B 代码Defect code 234描述Description 缺陷类别Defect Type B 代码Defect code 243描述Description 缺陷类别Defect Type B 代码Defect code 244描述Description 缺陷类别Defect Type B 代码Defect code 245描述Description 2、卡爪跳动大/chuck runout out of tolerance 3、刀具问题/Caused by tooling孔内留有残余钻头/the residual drill in the hole 外圆粗糙度不合格/Outer circularity roughness bad 原因分析Root cause analysis1、中心孔粗糙度不合格/center hole roughness bad 原因分析Root cause analysis1、刀刃问题/Caused by tooling2、操作不当/Improper operation螺纹破损/Thread damaged 原因分析Root cause analysis 1、刀具损坏/Tool damaged 2、机床故障/machine malfunction花键破损/Spline damaged 原因分析Root cause analysis 1、机床故障(顶针松动)/Machine malfunction (loose thimble)2、刀具损坏/Tool damaged缺陷类别Defect Type B 代码Defect code 246描述Description 缺陷类别Defect Type B 代码Defect code 247描述Description 缺陷类别Defect Type B 代码Defect code 248描述Description 缺陷类别Defect Type B 代码Defect code 249描述Description 2、走刀速度过快/Operate at high speed 3、操作失误/Operational error腰孔形状误差/Flushing shape error 原因分析Root cause analysis 1、刀具错误/Wrong tools腰孔未钻通/ Flushing slot unthrough 原因分析Root cause analysis 2、看错图纸/Misunderstanding drawings1、刀具崩刃/Tool broken2、看错图纸/Misunderstanding drawings3、机床限位错误/Wrong limitO 形槽深度尺寸超差/O ring 
depth dimension OOT 原因分析Root cause analysis 腰孔尺寸超差/ Flushing slot dimension OOT 原因分析Root cause analysis 1、刀具错误/Wrong tool1、操作失误/Operational error2、看错图纸/Misunderstanding drawings缺陷类别Defect Type B 代码Defect code 250描述Description 缺陷类别Defect Type B 代码Defect code 251描述Description Approved by Xu JunfeiO 形槽破损/O ring slot damaged 原因分析Root cause analysis Zhang Dong Wan XiaoyanChen Jiannian1、刀具崩刃/Tool broken2、操作失误/Operational errorPrepared by Checked by 冲击端面倒角不合格/The raidus of chuck end OOT 原因分析Root cause analysis1、工艺问题/Technical problem2、程序错误/Wrong program be selected倒角小倒角合格。
FAULT TOLERANCE

Safety and Reliability
Safety: freedom from those conditions that can cause death, injury, occupational illness, damage to (or loss of) equipment (or property), or environmental harm
Faults can remain dormant for long periods
Usually related to resource usage e.g. memory leaks
Failure Modes
Failure mode
Value domain
Timing domain
Arbitrary (Fail uncontrolled)
E.g. hardware components which have an adverse reaction to radioactivity
Many faults in communication systems are transient
Permanent faults remain in the system until they are repaired; e.g., a broken wire or a software design error
Intermittent faults are transient faults that occur from time to time
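The distinction between transient and permanent faults suggests a simple tolerance strategy: retry an operation a bounded number of times, and treat it as a permanent fault only if it keeps failing. The following is a minimal sketch of that idea, not a method taken from these notes; the names `TransientFault`, `PermanentFault`, `call_with_retry` and `make_flaky` are all hypothetical.

```python
class TransientFault(Exception):
    """A fault that may disappear if the operation is simply tried again."""


class PermanentFault(Exception):
    """A fault that persists until something is repaired."""


def call_with_retry(operation, max_attempts=3):
    """Run `operation`, retrying on transient faults up to `max_attempts` times.

    Assumption: a fault that survives every attempt is reported as permanent.
    """
    last_fault = None
    for _ in range(max_attempts):
        try:
            return operation()
        except TransientFault as fault:
            last_fault = fault  # remember the fault and try again
    raise PermanentFault(
        f"operation still failing after {max_attempts} attempts"
    ) from last_fault


def make_flaky(failures_before_success):
    """Build a test operation that fails a fixed number of times, then succeeds."""
    remaining = [failures_before_success]

    def operation():
        if remaining[0] > 0:
            remaining[0] -= 1
            raise TransientFault("message lost")
        return "delivered"

    return operation
```

With the default of three attempts, `call_with_retry(make_flaky(2))` returns normally, while `call_with_retry(make_flaky(5))` raises `PermanentFault`. An intermittent fault would look like a transient one on each occurrence, so a bounded retry cannot tell the two apart by itself.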
Aims
To understand the factors which affect the reliability of a system and introduce how software design faults can be tolerated
Aurora Error Codes + Light-Intensity Module (Chinese-English) (1)

[0x1034, 0x1034] <The sensors for the TEC (tail-end clamp) sliding rod are not in their normal state (the upper sensor is not in the SL-Release state; the lower sensor is not in the SL-Lock state).>
[0x1017, 0x1017] <Please confirm whether the TEC is correctly installed!>
[0x1018, 0x1018] <Please confirm whether there is a plate on the drum. If there is, synchronize the drum, run the automatic plate-unload operation and take the plate away; wait for the plate door to close, then run the self-check again. If there is not, check the installation of the TEC.>
[0x1021, 0x1021] <Before the self-check is re-run, the drum RO (plate roller) is not in the up state!>
[0x1022, 0x1022] <Before the self-check is re-run, the HC (head clamp) press bar is not in the up state!>
[0x1023, 0x1023] <Before the self-check is re-run, the TEC press bar is not in the up state!>
[0x1024, 0x1024] <Before the self-check is re-run, the TEC sliding rod is not in the SL-Lock state!>
[0x1025, 0x1025] <Before the self-check is re-run, the dynamic-balance insertion piston is not in the out state!>
[0x1026, 0x1026] <Before the self-check is re-run, the PD (plate door) is not in the closed state!>
[0x1027, 0x1027] <Before the self-check is re-run, the PT (plate track) is not in the up state!>
[0x1031, 0x1031] <The sensors for the drum RO (plate roller) are not in their normal state (the upper left and right sensors are not in the up state; the lower left and right sensors are not in the down state).>
[0x1032, 0x1032] <The sensors for the HC press bar are not in their normal state (the upper left and right sensors are not in the up state; the lower left and right sensors are not in the down state).>
[0x1033, 0x1033] <The sensors for the TEC press bar are not in their normal state (the upper left and right sensors are not in the up state; the lower left and right sensors are not in the down state).>
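Codes 0x1021 to 0x1027 all describe a precondition that must hold before the self-check is re-run. A small sketch of how such a precondition table might be checked is given below. Only the error codes and the required states come from the list above; the part names, the `state` dictionary and the function `failed_preconditions` are hypothetical and do not describe the device's real firmware or API.

```python
# Each entry: (error code, part name, state required before re-running self-check).
# Part names are invented for illustration; codes and states follow the list above.
PRECONDITIONS = [
    (0x1021, "drum_roller", "up"),
    (0x1022, "hc_press_bar", "up"),
    (0x1023, "tec_press_bar", "up"),
    (0x1024, "tec_sliding_rod", "locked"),
    (0x1025, "balance_piston", "out"),
    (0x1026, "plate_door", "closed"),
    (0x1027, "plate_track", "up"),
]


def failed_preconditions(state):
    """Return the error codes whose precondition is not met by `state`.

    `state` maps a part name to its current state; a missing part counts as a failure.
    """
    return [
        code
        for code, part, required in PRECONDITIONS
        if state.get(part) != required
    ]
```

For example, a state in which every part is in its required position yields an empty list, and opening only the plate door yields `[0x1026]`.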
Fault-tolerant systems

Chapter 1
Hume: a Functionally-Inspired Language for Safety-Critical Systems
Kevin Hammond and Greg Michaelson

Abstract: Hume is a novel functionally-based language that targets safety-critical applications. The language is designed to support rigorous cost and space analyses, whilst providing a high level of abstraction including polymorphic type inference, automatic memory management, higher-order functions, exception handling and a good range of primitive types.

1.1 INTRODUCTION
This paper describes the Hume programming language. Hume (Higher-order Unified Meta-Environment) is a strongly typed, mostly-functional language with an integrated tool set for developing, proving and assessing concurrent, safety-critical systems. Hume aims to extend the frontiers of safety-critical language design, introducing new levels of abstraction and provability. A full description of Hume including a formal dynamic semantics and a partially completed static semantics can be found at /hume. We anticipate publishing this material as a formal language definition in the near future.

1.1.1 Background
Hume is named for the Scottish Enlightenment sceptical philosopher David Hume (1711-1776), who counseled that:

    To begin with clear and self-evident principles, to advance by timorous and sure steps, to review frequently our conclusions, and examine accurately all their consequences; though by these means we shall make both a slow and a short progress in our systems; are the only methods, by which we can ever hope to reach truth, and attain a proper stability and certainty in our determinations.
    D. Hume, An Enquiry Concerning Human Understanding, 1748

These sentiments epitomise the philosophy of programming language design that we have followed in constructing Hume.

1.2 MOTIVATION AND OBJECTIVES
Since the focus of the Hume design is on safety critical applications, it is paramount that Hume programs have predictable and, preferably, provable properties. However, the strong properties of program equivalence, termination and time and
space use are undecidable for Turing complete languages. Conversely, languages in which such properties are decidable (i.e. finite state machines) lack expressiveness. The goal of the Hume language design is to support a high level of expressive power, whilst providing strong guarantees of dynamic behavioural properties such as execution time and space usage. Hume reflects these considerations in: the separation of the expression and coordination aspects of the language; and the provision of an integrated tool set, spanning both static and dynamic program analysis and manipulation.

1.2.1 Important Design Characteristics
Any system that is aimed at safety critical applications must meet a number of stringent criteria if it is to be used with confidence. Hume has been designed to support Leveson's guidelines for Safeware (software intended for safety critical applications) [11], as well as similar guidelines provided by other authors [2]. In general, safety critical systems must meet both strong correctness criteria and strict performance criteria. The latter are most easily attained by working at a low level, whereas the former are most easily attained by working at a high level. A primary objective of the Hume design is to allow both types of criteria to be met while working at a high level of abstraction.

The Hume language has been designed to allow relatively simple formal cost models to be developed, capable of costing both space and time usage. This requires some restrictions on the expression language in cases which are cost or space critical. This first version of the language design is deliberately rather spartan, allowing experimentation with essential features but omitting some desirable syntax or other language features, such as type classes. Future versions of the language should address these omissions. The language definition does support a wide range of (particularly numeric) basic types. This is because issues of type coercion and type safety are fundamental to ensuring both
correctness and safety.

Both system level and process level exceptions are supported, including the ability to set timeouts for expression computations. Exceptions may be raised from within the expression language but can only be handled by the process language. This reduces the cost of handling exceptions and maintains a pure expression language, as well as simplifying the expression cost calculus.

A radical design decision for safety critical systems is the use of automatic memory management techniques. Automatic memory management has the advantage of reducing errors due to poor manual management of memory. The disadvantage is potentially excessive time or space usage. Hume implementations will use static analysis tools to limit space usage, and will incorporate recent developments in bounded-time memory management techniques [3].

1.3 THE HUME LANGUAGE
This section introduces the Hume language informally. A full description, from which this section is abstracted, can be found in the Hume language definition [13].

1.3.1 Structure of the Language
In common with other coordination language approaches such as Linda [4], Hume takes a layered approach. The outermost layer is a static declaration language that provides definitions of types, streams etc. to be used in the dynamic parts of the language. The innermost layer is a fairly conventional expression language which is used to define values and (potentially higher-order) functions using purely functional expressions. Finally, the middle layer is a coordination language that links functions into possibly concurrent processes.

Exceptions transcend the layers. They are declared by exception <id> <type> within declarations, raised by raise <id> <expr> within expressions and handled by boxes.

1.3.2 The Hume Expression Language
The Hume expression language is a purely functional language with a strict semantics. It is intended to be used for the description of single, one-shot, non-reentrant processes. It is deterministic, and has statically bounded time and
space behaviour. In order to achieve this, expressions that are not the target of dynamic timeouts must be restricted to statically-checkable primitive recursive forms. This allows the construction of verifiable static cost analyses. Dynamically bounded time and space expressions (within-expressions) specify that evaluation of the associated expression must complete within the specified constant time or space. Failure to do so causes the Timeout or HeapOverflow exception to be raised.

Note that the expression language has no concept of external, imperative state. Such state considerations are encapsulated entirely within the coordination language.

1.3.3 The Hume Coordination Language
The Hume coordination language is a finite state language for the description of multiple, interacting, re-entrant processes built from the purely functional expression layer. The coordination language is designed to have statically provable properties that include both process equivalence and safety properties such as the absence of deadlock, livelock or resource starvation. The coordination language also inherits properties from the expression language that is embedded within it.

The basic unit of coordination is the box, an abstract notion of a process that specifies the links between its input and output channels in terms of functional expressions, and which provides exception handling facilities including timeouts and system exceptions. The coordination language is responsible for interaction with external, imperative state through streams and ports that are ultimately connected to external devices.

1.3.4 The Hume Declaration Language
The declaration language introduces types and values that scope over either or both the coordination and expression languages. The coordination language is embedded in the declaration language through box and wiring declarations while the expression language is embedded through function and value declarations.
While it is possible to define recursive and mutually recursive functions, it is not possible to do the same for simple values.

1.4 EXAMPLE PROGRAMS
1.4.1 Parity Checking
Our first example is a simple even parity checker whose input is wired to a communications port (comm1), and whose output is wired to its second input. The initial parity value is defined to be true.

    box even_parity
    in (b::word1, p::bool)
    out (p'::bool)
    match
      (0,true) -> true
    | (1,true) -> false
    | (0,false) -> false
    | (1,false) -> true;

    wire even_parity (comm1, even_parity.p' initially true) (even_parity.p);

1.4.2 Checksum
The second example calculates the checksum of 256 input numbers. It returns a string which is either "OK" or an error message. This example illustrates (somewhat artificially) the use of exceptions and handlers.

    type short = int16;
    type long = int32;

    box checksum
    in (n::short, sum::long, count::short)
    out (sum'::long, count'::short, message::string)
    handles CheckSumError
    match
      (n,sum,256) -> (0, 0, if n = sum then "OK" else raise CheckSumError ())
    | (n,sum,count) -> (sum+n, count+1, "")
    handle
      CheckSumError _ -> (0, 0, "Checksum error:sum=" + tostring(sum) + "checksum=" + tostring(n));

    wire checksum (comm1, checksum.sum' initially 0, checksum.count' initially 0) (checksum.sum, checksum.count, stdout);

1.5 COSTING HUME PROGRAMS
Figures 1.1-1.6 give rules that can be used to statically derive a bounded time cost for first-order Hume programs (we are independently working on time systems for higher-order programs [12]). The rules are a simple big-step operational semantics, with extensions to timeouts and exceptions. We have deliberately used a very simple cost semantics here rather than the more general form that we are developing as part of our work in parallel computation [12]. The latter is intended for arbitrary recursion and requires the use of a constraint solving engine, whereas Hume is restricted to primitive recursive forms. It would be straightforward to alter these rules to cover space rather than time cost, as required. The cost rules specified here derive notation from the Hume
static and dynamic semantics, which is in turn derived from that used for Haskell and Standard ML.

[Figures 1.1 and 1.2: cost axioms for declarations and for boxes]

Only two forms of declaration are interesting. Variable declarations are evaluated at the start of a declaration block. They therefore have a fixed dynamic cost. The cost of a function (of the form var matches) is defined as the cost of matching all patterns in the set of matches, plus the cost of the most expensive expression on the right-hand-sides of the matches. This therefore computes an upper bound on the cost of evaluating a function body. Recursive definitions are assigned infinite cost. The cost is bound to the function name in the environment so that the cost of each and every function call is correctly calculated. The result of a declaration sequence is the union of the environment recording function costs.

Figure 1.2 gives the cost of a single box iteration. This is defined similarly to the cost of a function definition, but including the upper bound of the exception handling cost. There are two cases. If no timeout clause is given, then the matches must be finite. Otherwise the cost is the lesser of the timeout and the actual execution cost plus the handler cost.

Figures 1.3-1.4 give cost rules for expressions. The cost of a function application is the cost of evaluating the body of the function plus the cost of each argument. The cost of building a new constructor such as a list or tuple is the cost of evaluating the arguments to that constructor plus one for each argument (representing the cost of building each heap cell). Note that the cost of a non-null list of exp1 ... expn includes the cost of the final [], and is therefore greater than that of the equivalently sized tuple or vector of exp1 ... expn. The cost of coercing an expression to a given type using an as-expression depends on the original type of the expression and the one
that it is coerced to. There is, of course, no cost for a simple type cast. The cost of raising an exception is the cost of the enclosed expression plus one (representing the cost of actually throwing that exception). Finally the cost of an expression enclosed within a timeout is the minimum of the expression cost and the specified timeout.

Figure 1.5 gives costs for pattern matches. Two values are returned from a match sequence. The first is the total cost of matching all the patterns in the match sequence. This places an upper bound on the match cost. The second is the cost of evaluating the most expensive right-hand-side. This places an upper bound on the expression cost. The cost of matching multiple patterns in a single match is the sum of matching each pattern. Wildcard patterns ( [...]

[Figure 1.3: Cost axioms for expressions (variables, applications and constructors)]
[Figure 1.4: cost rules for expressions, continued]
[Figure 1.5: Cost axioms for pattern matches]

[...] approach separates finite data structures such as tuples from potentially infinite structures such as streams. This allows the definition of functions that are guaranteed to be primitive recursive. In contrast with Hume, it is necessary to identify functions that may be more generally recursive.

Ada is widely used for embedded systems, and many tools have been constructed to assist the understanding of space and time behaviour [2]. Compared with ANSI standard Ada, Hume provides a much higher level of abstraction with a far more rigorously defined semantics, which is specifically designed to support cost semantics.

Finally, there has been recent interest in using variants of Java as the basis for embedded systems, though to our knowledge there is as yet no specifically safety-critical design. Two interesting variants are Embedded Java [17] and RTJava [9], for soft real-time applications. Like Hume, both languages support dynamic memory allocation with automatic garbage collection and provide strong exception handling mechanisms. The primary differences from Hume are the incorporation of arbitrary recursion in all cases, an absence of formal design principles, the use of a single-layered approach in which coordination is merged with computation, and of course the use of an object-oriented expression language rather than one that is purely functional. We believe that the design choices made here are more suitable for applications where safety is paramount. For example, the use of purely functional rather than dynamically-linked object-oriented design allows straightforward static reasoning about the meaning of programs, at the cost of convenience in modifying a running system.

[Figure 1.6: Cost axioms for exception handlers]

1.7 CONCLUSION
This paper has introduced the Hume language, a novel functionally-inspired language that is geared towards safety critical applications. A key feature of the design is the incorporation of a purely functional expression language that provides strong guarantees of functional correctness with equally strong guarantees of time and space behaviour. To our knowledge, this is the first fully-featured functional language that supports such guarantees.

To illustrate the key features of our language design, we have produced a simple bounded first-order time cost analysis. The analysis makes a number of simplifying assumptions, such as costing all heap allocations, exception raises and variable bindings identically (as 1 in this set of definitions). It would, however, be straightforward to parameterise the analysis on more realistic costs, or even to modify it so as to calculate space rather than time
costs.Clearly,Hume is still at an early stage of development.None of our language tools has yet been constructed and important design details are still to be resolved. We feel,however,that the properties designed into the language should yield long-term benefits in the safety-critical arena.11REFERENCES[1]W.Ackermann,Solvable Cases of the Decision Problem,North Holland,1954.[2]J.G.P.Barnes,High Integrity Ada:the Spark Approach,Addison-Wesley,1997.[3]C.Clack,“The Demense Model for Bounded Automatic Memory Management”,Un-published Report,University College,London,2000.[4]D.Gelernter,and N.Carriero,“Coordination Languages and Their Significance”,CACM,32(2),February,1992,pp.97–107.[5]A.D.Gordon,S.Marlow and S.L.Peyton Jones,“A Semantics for Imprecise Excep-tions”,Proc.Symposium on Principles of Programming Languages(POPL2000), January2000.[6]M.G.Hinchey and J.P.Bowen,Applications of Formal Methods,Prentice-Hall,1995.[7]International Organisation for Standardisation,“ISO:Information Processing Sys-tems—Open Systems Interconnection—LOTOS,A Formal Description Technique based on the Temporal Ordering of Observational Behaviour”,ISO8807,Geneva, August1988.[8]International Organisation for Standardisation,“ISO:Information Processing Sys-tems—Open Systems Interconnection—Estelle,A Formal Description Technique based on an Extended State Transition Model”,ISO9074,Geneva,May1989. [9]D.Jensen,“‘Real-time Java’.What could and what should it mean?”URL:/carnahan/real-time/vocab/ rt-java。
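As a supplement to the cost rules described in the text above, the following Python sketch shows how such a bounded time-cost analysis could be computed over a tiny expression tree. It is an illustration only, not the Hume tooling: the class names (`Leaf`, `Con`, `App`, `If`, `Raise`, `Within`) and the `cost` function are hypothetical, leaves are assumed to cost 0, and the conditional rule (condition cost plus the more expensive branch) is a reading of the partly illegible expression figure rather than something the prose states.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Leaf:
    """Constant or variable reference; costing it at 0 is an assumption."""


@dataclass(frozen=True)
class Con:
    """Constructor application (list, tuple, ...)."""
    args: tuple


@dataclass(frozen=True)
class App:
    """Call of a named function whose body cost is stored in the environment."""
    func: str
    args: tuple


@dataclass(frozen=True)
class If:
    cond: object
    then: object
    otherwise: object


@dataclass(frozen=True)
class Raise:
    body: object


@dataclass(frozen=True)
class Within:
    """Expression guarded by a timeout."""
    body: object
    limit: float


def cost(expr, env):
    """Upper bound on the time cost of expr; env maps function name -> body cost."""
    if isinstance(expr, Leaf):
        return 0
    if isinstance(expr, Con):
        # Argument costs plus one per argument (one heap cell each).
        return sum(cost(a, env) for a in expr.args) + len(expr.args)
    if isinstance(expr, App):
        # Body cost plus the cost of each argument.
        return env[expr.func] + sum(cost(a, env) for a in expr.args)
    if isinstance(expr, If):
        # Assumed rule: condition plus the more expensive branch.
        return cost(expr.cond, env) + max(cost(expr.then, env),
                                          cost(expr.otherwise, env))
    if isinstance(expr, Raise):
        # Enclosed expression plus one for throwing the exception.
        return cost(expr.body, env) + 1
    if isinstance(expr, Within):
        # Minimum of the expression cost and the timeout.
        return min(cost(expr.body, env), expr.limit)
    raise TypeError(f"unknown expression: {expr!r}")


# Recursive definitions are bound to infinite cost in the environment.
ENV = {"step": 3, "loop": math.inf}
```

Binding a recursive function to `math.inf` means any call to it has unbounded cost unless it is wrapped in a timeout, in which case the `min` in the `Within` case restores a finite bound.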
Fault Tolerant Data Flow Modeling Using the Generic Modeling Environment

Mark L. McKelvin Jr., Jonathan Sprinkle, Claudio Pinello+, and Alberto Sangiovanni-Vincentelli
Electrical Engineering and Computer Science Department, University of California, Berkeley, CA 94720
+ General Motors Berkeley Labs, Berkeley, CA 94720
{mckelvin, sprinkle, pinello, alberto}@

Abstract

Designing embedded software for safety-critical, real-time feedback control applications is a complex and error-prone task. Fault tolerance is an important aspect of safety. In general, fault tolerance is achieved by duplicating hardware components, a solution that is often more expensive than needed. In applications such as automotive electronics, a subset of the functionalities has to be guaranteed while others are not crucial to the safety of the operation of the vehicle. In this case, we must make sure that this subset is operational under the potential faults of the architecture. A model of computation called Fault-Tolerant Data Flow (FTDF) was recently introduced to describe, at the highest level of abstraction of the design, the fault tolerance requirements on the functionality of the system. Then, the problem of implementing the system efficiently on a platform consists of finding a mapping of the FTDF model on the components of the platform. A complete design flow for this kind of application requires a user-friendly graphical interface to capture the functionality of the system with the FTDF model, algorithms for choosing an architecture optimally, (possibly automatic) code generation for the parts of the system to be implemented in software, and verification tools. In this paper, we use the Generic Modeling Environment (GME) developed at Vanderbilt University to design a graphical design capture system and to provide the infrastructure for automatic code generation.
The design flow is embedded into the Metropolis environment developed at the University of California at Berkeley to provide the necessary verification and analysis framework.

1. Introduction

Designers of complex heterogeneous embedded systems are often faced with increasing design costs and time-to-market pressures. In real-time feedback control systems, sensors and actuators interact with a plant, and complex control algorithms execute periodically on an execution platform, as shown in Figure 1. The execution platform is a distributed system that consists of software components that implement a control algorithm and a hardware layer that includes a set of processing elements, or electronic control units (ECUs), connected via communication media such as busses. Safety-critical systems may contain a number of redundant components for fault tolerance, thus adding to the system design complexity. The lack of a rigorous design method and supporting tools often leads to costly iterations and to inconsistencies between system specification and implementation. Furthermore, design specifications may change throughout the process, so it becomes difficult for a system to evolve as changes are made in specifications.

When designing fault tolerant systems, we assume that hardware or software components may fail. Faults in embedded software are commonly caused by deviations from specification of the supporting platforms or by programming errors. Fault tolerance is a technique that attempts to neutralize the potential faults to avoid system failures, by incorporating redundancy in system components. Blindly duplicating all components that may fail in the implementation platform is clearly an inefficient solution, especially when, as is the case for automotive applications, only a subset of the functionality of the system has to be operational when a fault occurs. To optimize the implementation of systems with partial fault tolerance, Pinello et al. recently introduced a new model of computation (MoC), Fault Tolerant Data Flow (FTDF) [1], a data flow [2][4] variant that is designed to address the specification of fault tolerance in safety-critical, real-time feedback control systems. A model of computation is a mathematical formalism that describes the interaction between components in a system [5] and has well-defined semantics that enable formal validation techniques. The FTDF MoC enables formal analysis and automatic synthesis tools and techniques.

A synthesis-based design methodology is proposed in [1] to involve the designer in a high-level exploration of the tradeoff between fault coverage and cost. It features a synthesis tool that uses FTDF as the central programming model. The synthesis tool automatically deduces the necessary software and hardware replication, distributes each software process on the execution platform, and derives an optimal scheduling of the processes on each ECU to minimize latency. The tools that support the synthesis-based design flow are embedded in the Metropolis [11] design environment.

In this paper, we extend the method by allowing a designer to construct an application in the FTDF domain in a user-friendly environment and by implementing the FTDF specification as automatically generated software. The tool used to do so is the Generic Modeling Environment (GME) [8], a tool that takes a Model Integrated Computing (MIC) [6] approach to constructing domain specific environments.

1.1 Model Integrated Computing and Generic Modeling Environment

MIC facilitates model analysis and automatic program synthesis by incorporating Model Integrated Program Synthesis (MIPS) to transform a model in a specific domain to a physical artifact. More precisely, GME is a domain specific modeling tool that supports the MIC methodology and contains a MIPS environment for interpreting and generating physical artifacts of that model.
GME uses the Unified Modeling Language (UML) and the Object Constraint Language (OCL) technologies [10] to construct domain specific environments.

GME is a component-based architecture used for constructing domain specific modeling environments, developed by the Institute for Software Integrated Systems at Vanderbilt University. GME uses UML class diagrams to compose domain specific environments. GME supports MIPS through the Builder Object Network version 2.0 (BON2) [8]. BON2 is a GME component interface used to transform models to physical artifacts. BON2 consists of software classes and interfaces to support the interpretation of a model. These classes and interfaces are automatically generated by GME, and most of them are independent of the specific domain. However, a domain-specific interface is generated that allows traversal of objects in a specific domain. We illustrate with an example how this approach can yield a better implementation than a manual coding method.

This paper is organized as follows. In Section 2, FTDF semantics and structure are reviewed. Section 3 gives the overall design methodology and the various steps in the process. Section 4 follows with a simple example. Results of the example application are given in Section 5, and Section 6 concludes.

Figure 1. An example of an execution platform.

1.2 Related Work

UML-based tools such as the ones described in [12] and [13] are used for modeling safety-critical applications. These tools are familiar to the software programmer, who is often not the domain expert, and they provide flexibility in the modeling of domains. Our work closely relates to commercial graphical block diagram and data flow oriented environments, such as Matlab/Simulink [14] by MathWorks and SCADE [15] by Esterel Technologies, which are commonly used to model safety-critical systems.
They are general-purpose, control-dominated design environments, and their semantics do not explicitly deal with fault tolerance. Our work provides precise semantics using the FTDF model of computation and a more natural way of visually describing the structural dependencies amongst components in a safety-critical application, similar to using reliability block-diagram models, as described by Viswanadham and others [16].

2. Fault Tolerant Data Flow Semantics

FTDF is a synchronous [3] MoC: every actor executes once per iteration, satisfying the precedence order dictated by the data dependencies. Then, the next iteration may start. This section reviews the fundamental components of an FTDF model: tokens, actors, and communication media. An FTDF graph structure provides the structural dependencies amongst components in an FTDF model.

2.1 Tokens

Tokens are encapsulations of data. In addition, in the FTDF domain, tokens are appended with two fields: the epoch field and the valid field. The valid field is used to record the Boolean outcome of some fault-detection algorithm (e.g. majority voting, checksum, CRC). Moreover, an actor may explicitly mark any of its output tokens as invalid to quickly inform the downstream actors of some error. An actor receiving a token can check the token’s validity before attempting to use it. The epoch field is used in the execution model as a synchronization mechanism for the distributed processes. Finally, replicas of the sending actor will run on different ECUs and produce replicas of the same token.

2.2 Actors

In an actor-oriented design framework [7], actors are functional components that execute and communicate with other actors in a model. An actor contains ports that are connected via an abstraction of communication channels, or connections. Actors also contain a firing rule and a firing function. A firing rule is a guard condition that must be satisfied by input values to the actor.
The firing function executes body code that implements a particular functionality of the actor. A firing is a single execution of the firing rule followed by the firing function of an actor.

In FTDF, actors are typed. The FTDF actors have four types. Source actors initiate execution of their code without accepting any input tokens and produce output tokens. Sink actors accept input tokens to fire and produce no output tokens. Source and sink actors are abstractions of sensors and actuators, respectively. Regular actors have inputs and outputs. The firing rule for N-input regular actors prescribes that the actor fires when all N inputs are available. Firing on all inputs is typical of other dataflow languages [2][4]. The input actor is a regular actor that can fire on a subset of inputs. For example, an input actor that may fire if at least two of three inputs are available would have the following set of firing rules: U = [{*, *, *}, {⊥, *, *}, {*, ⊥, *}, {*, *, ⊥}], where “*” represents the presence of a value and “⊥” denotes the absence of a value. Input actors behave similarly to N-of-M components, which are often found in reliability block diagram models.

2.3 Communication Media

Communication media act as unidirectional channels that transmit tokens between actors. Media may be affected by communication errors. The composition of actors and communication media is represented as a directed, acyclic graph called an FTDF graph. The rules given here are simplifications of the rules given in [1]. Given a set of actors A and a set of communication media M, an FTDF graph G is given as G = (V, E), where V is the set of vertices, E is the set of directed arcs, V = A, and E = M. In the FTDF semantics, an FTDF graph is legal if the following conditions hold:

1. G is connected.
2. Data types between input and output ports connected via a channel must be the same.
3. G is acyclic.
4. At least one source and one sink actor must exist in the model.

An FTDF graph represents data dependencies of the actors (causality relations) that must be satisfied when scheduling the actors for execution on an implementation platform.

3. Design Methodology

The overall design flow of a modeling environment for FTDF applications is presented in Figure 2. The flow begins with the designer constructing a visual application model using the FTDF paradigm. GME stores objects in the model in a model database. The designer instantiates an interpreter from the GME user interface to initiate the model interpretation on objects in the model database. The graphical model is interpreted to generate a configuration file for the execution model (the run-time environment). The configuration file is used to generate and build the code for a complete executable model of the FTDF graph.

3.1 Fault Tolerant Data Flow Domain Construction

The FTDF domain is constructed as a paradigm in GME. The paradigm is created using UML class diagrams, an internal syntactic structure in GME. The structure is used to define a finite set of visual objects. An instantiation of the FTDF paradigm allows the designer to use elements of that set of objects to construct an FTDF application model. The primary visual objects that may be used in the GME graphical interface are channels, ports, actors, and ECUs. Objects are parameterized to make it easy to differentiate between multiple components and to configure them. If a designer chooses to duplicate an actor or ECU, a copy is created that carries all attributes of the object, and it is automatically labeled to distinguish the two copies. If any attributes should differ, it is the designer’s job to alter them.

The UML class diagram in Figure 3 depicts the relationships between these objects. A port is a typed, atomic object in the FTDF paradigm that does not provide any hierarchy. It can be of type internal or external.
The types are used to distinguish whether a port is connected to actors on the same ECU or to actors located on another ECU. A channel is a typed object that forms a connection between two ports: a source port and a destination port. The ports place constraints on the type of channel that connects two ports. For example, it is possible to have a system implement multiple protocols on different types of channels. Ports ensure that the channel type matches the type of the ports on each end of a channel.

An actor is a typed object that contains one or more ports, depending on the type of actor. GME makes it easy to add, delete, or modify actor types in the domain from within GME. Actors have attributes that are used to configure important properties, including fields for implementing a firing rule and a firing function (also called an execution rule in Figure 3) for each actor and a field for specifying a timeout value for the execution model. Ideally, the code entered into the firing rule and firing function fields would be user-specified code in the host programming language used to implement the execution model, such as ANSI C.

An ECU object is a hierarchical abstraction of a processing element. It contains at least one actor object. It concurrently executes its member actors. Structural relationships between different ECUs are captured with typed channels and ports.

Figure 2. Design flow for a domain-specific environment for specifying FTDF applications.

3.2 Interpreter Design

The BON2 component generates an interface for the network of objects used in an FTDF paradigm during model interpretation. The FTDF interpreter utilizes the interfaces generated by BON2 to traverse FTDF objects. Model checking in GME can be done using OCL, using the interpreter, or by carefully defining appropriate relationships between a domain’s UML diagrams. For the FTDF paradigm, we chose to use the interpreter and relationships between FTDF objects.
The interpreter in the FTDF paradigm not only transforms a model, but it can also be used to check whether the designer has constructed a legal FTDF graph.

The interpreter uses a visitor-style traversal scheme introduced by Gamma et al. [9] that begins by traversing each ECU object in a model. This style eases modifications or extensions to the FTDF paradigm. Each actor of an ECU is traversed, and the actor information is obtained. Each actor’s connections can be queried to allow traversal of channel objects to determine the source and destination ports and associated actors. As the objects are traversed, a configuration file is created. This configuration file is a textual description of a network of FTDF application model objects and provides enough information to generate the code for the executable model. The configuration file has a flexible format so that it may be modified later. A sample configuration format for a single actor on a single processor is given in Figure 4. The sample configuration file is composed of unique information about each ECU and actor in the FTDF network. It contains destination actors and user-specified C code that is obtained from the actor objects.

Figure 3. A UML class diagram for objects in the FTDF paradigm.

Figure 4. Sample template for a configuration file.

3.3 The Executable Model

Applications in the domain of safety-critical control systems are expected to execute periodically on a possibly infinite stream of input data on an implementation platform with bounded memory. The executable model is a distributed system based on a runtime platform. The model can be executed to simulate the FTDF application, including the timing behavior of components. It can also be used as a final implementation if the target execution platform directly supports the runtime platform. The runtime platform and the executable model are currently implemented as a separate entity from the GME environment to allow changes in the underlying architecture.
The runtime platform is a debugged C library that creates the necessary data structures for coordinating the communication and computation of a network of FTDF components on a target platform. Our target platform is a network of personal computers running the Linux operating system and communicating with an Internet Protocol (IP) stack. This enables rapid prototyping and simulation on various systems that can be networked using the IP protocol suite. Ideally, the designer could choose the target platform on which to execute the model, for example using a real-time operating system instead of Linux. The runtime platform parses the configuration file that is generated by the interpreter and compiles the code into one executable file per host. The configuration file produced by the interpreter contains information from the FTDF application model, such as the network of ECUs, actor connections, and additional attributes such as actor code specified by the designer, to allow reconstruction of the FTDF application in the runtime platform.

In the runtime platform, actors are functions coded in C with well-defined interfaces. The designer only has to care about the functions he or she writes that characterize the functionality of that actor. Since a legal FTDF graph operates under synchronous semantics and each actor in a schedule is assumed to execute once in a cycle, a schedule of actors on the runtime platform may execute infinitely (assuming no faults that may cause an actor’s firing function to not execute) with bounded memory. As a result, channels between actors are implemented with one-place buffers. ECUs operate concurrently.

The runtime platform supports the simulation of more than one ECU on the same host, using Unix-based processes to run actors on different ECUs concurrently. Communication channels are simulated using pipes between processes to transmit and receive tokens or, when operating over a network of hosts, using the IP stack.

4. An Example

A simple example is given to illustrate the code generation capability of the FTDF design environment. The original application is illustrated in Figure 5. Figure 6 illustrates the application modeled in the GME graphical user interface using FTDF objects.

Figure 5. Example application.

The example given contains five actors. The actors are a source and a sink actor, denoted as “Sensor” and “Actuator” respectively, two regular actors denoted as “Controller” and “Fine Controller” in Figure 5, and an input actor denoted as “Arbiter”. The firing function of each is very simple; each actor’s firing rule and firing function contain a total of 12 lines of C code in the execution platform. Figure 7 illustrates a sample output after executing the interpreter in the FTDF paradigm on this example model. The configuration file is then read and parsed by the runtime platform. The runtime platform configures memory requirements and other data structures based on this configuration file. For the sake of simplicity, the firing rule and firing function are given in pseudo-code in Figure 7. Ideally, this would be C code checked by the compiler during compilation on the runtime platform.

Figure 6. Example application modeled in GME.

5. Results

The goal of this project is to create an FTDF design environment in GME that will aid in automatic deployment of embedded software for fault tolerant applications. The approach uses a visual modeling environment where the designer can easily enter models and realize application instances from the FTDF domain, such that both errors in application specification and the time to specify an application can be reduced. The results given in this section are from the sample application described in the previous section.

Table 1 gives a summary of the estimated time and effort to create the FTDF design environment. The FTDF paradigm is constructed in GME, along with the interpreter.
The runtime environment is implemented in 1084 lines of C code. Those three tasks are performed only once, and their results can be reused for any subsequent application. The estimated time to develop and create the FTDF paradigm in GME assumes a basic understanding of UML class diagrams. In GME, the software architecture that supports an interpreter design is composed of a number of C++ files. To implement an interpreter for the FTDF paradigm, the first author implemented over 800 additional lines of code. The code utilizes the software architecture of GME to produce the configuration file that is used by the runtime environment to configure an FTDF application.

Table 2 contrasts, for GME and for a manual approach, the time it takes to enter the application and to configure the runtime environment with the appropriate connections between actors, the location of actors on specific ECUs, and the actor code for each actor in an application. Configuring the runtime environment consists of producing the configuration file shown in Figure 7 that details the structure of the application. The results in this table will vary depending on the size of the application. The example application contains 5 actor instances, with approximately 50 total lines of code for actor code implementation, and 2 ECU instances. One can therefore expect the gain of using the FTDF paradigm in GME to specify an application for the runtime environment to scale with the application size. Table 3 gives a summary of the number of lines of code used to implement the given example application.
These numbers will vary with the size of the application.

Table 1: Implementation time of the design environment.

Task                         Time (hours)
FTDF paradigm development    12
Interpreter design           40
Runtime environment          85

Table 2: Time for entering example application and configuring the runtime environment.

Method    Time to enter application (hours)    Time to produce configuration (minutes)
GME       1                                    0.15
Manual    2                                    20

Table 3: Lines of code used to implement the given example application.

Item                                                                       Lines of code
Actor code for the application (firing rules and firing functions only)    50
Configuration file                                                         24

6. Conclusions

In this paper, we presented a design flow for fault-tolerant applications running on a distributed implementation platform. The Fault-Tolerant Data Flow (FTDF) model of computation has been used for the specification and modeling of fault tolerance in safety-critical, real-time feedback control applications. To aid in the design process of specifying an FTDF model and integrating validation and analysis tools, a design environment for FTDF was developed in the Generic Modeling Environment (GME). The environment allows the domain expert to model a system hierarchically using FTDF semantics and to automatically generate code for an execution of that model.

Figure 7. Configuration file for example.

An intuitive user interface is constructed that allows a visual representation of an FTDF model. The user interface allows a designer to quickly produce an execution model that may be used to verify timing requirements or generate code for a target system. Results show that the semi-automatic approach to code generation we propose in this paper made it quicker to prototype an execution model. Freeing the domain expert from detailed coding, or from a difficult interaction with coding experts, results in reduced errors and shorter design time.

The runtime system as of today is not completely integrated with the design environment.
Complete integration in the environment, as well as of the environment in Metropolis, is one of our future goals. In addition, reliability analysis tools are being considered for addition to the design flow.

7. Acknowledgements

This work is partially supported by the Center for Hybrid and Embedded Software and Systems under the National Science Foundation ITR (Cooperative Agreement) CCR-0225610, MARCO/Gigascale Research Center, and General Motors Berkeley Labs. Interactions with the Metropolis design team from UC Berkeley are recognized and acknowledged.

8. References

[1] C. Pinello, L. P. Carloni, and A. L. Sangiovanni-Vincentelli. “Fault-tolerant deployment of embedded software for cost-sensitive real-time feedback control applications,” In Proc. Conf. Design, Automation, and Test in Europe (DATE), 2004.
[2] E. A. Lee and D. G. Messerschmitt. “Synchronous data flow,” Proc. of the IEEE, vol. 75, no. 9, September 1987.
[3] A. Benveniste, P. Caspi, S. Edwards, N. Halbwachs, P. Le Guernic, and R. de Simone. “The synchronous languages twelve years later,” In Proc. of the IEEE, March 1997.
[4] E. A. Lee and T. M. Parks. “Dataflow process networks,” Proc. of the IEEE, vol. 83, no. 5, pp. 773-801, May 1995.
[5] S. Edwards, L. Lavagno, E. Lee, and A. Sangiovanni-Vincentelli. “Design of embedded systems: formal methods, validation and synthesis,” Proc. of the IEEE, vol. 85, no. 3, March 1997.
[6] J. Sprinkle. “Model-integrated computing,” IEEE Potentials, vol. 23, no. 1, pp. 28-30, February 2004.
[7] E. A. Lee and S. Neuendorffer. “Classes and subclasses in actor oriented designs,” Proc. of the Conference on Formal Methods and Models for Codesign (MEMOCODE), June 2004.
[8] A. Ledeczi, M. Maroti, A. Bakay, G. Karsai, J. Garrett, C. Thomason IV, G. Nordstrom, J. Sprinkle, and P. Volgyesi. “The generic modeling environment,” Workshop on Intelligent Signal Processing, Budapest, Hungary, May 17, 2001.
[9] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, 1995.
[10] OMG UML Documentation website, available at:/technology/uml .
[11] The Metropolis Project Team. “The metropolis meta model version 0.4,” Technical Report UCB/ERL M04/38, University of California, Berkeley, CA USA 94720, September 2004.
[12] J. Jurjens, E. Fernandez, R. France, and B. Rumpe. “Critical systems development with UML,” In Proc. of UML'03 satellite workshop, TUM technical report, 2003.
[13] C. Leangsuksun, H. Song, and L. Shen. “Reliability modeling using UML,” The 2003 International Conference on Software Engineering Research and Practice, Las Vegas, June 2003.
[14] MathWorks. “MATLAB/SIMULINK”, available at:.
[15] Esterel Technologies. “SCADE Suite for Avionics”, available at: .
[16] N. Viswanadham, V. V. S. Sarma, and M. G. Singh. Reliability of Computer and Control Systems. North-Holland, vol. 8, 1987.
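As a supplement to Sections 2.2 and 2.3 above, the following Python sketch illustrates how the two-of-three firing rule set U and the four legality conditions of an FTDF graph could be checked. It is an illustration under stated assumptions, not the paper's interpreter or runtime library: the function names `can_fire` and `is_legal`, the tuple layout of a channel, and the `"int"` port type are hypothetical, and the wiring of the five example actors is assumed, since the text names the actors but does not spell out the connections of Figure 5.

```python
from collections import defaultdict

PRESENT, ABSENT = "*", "⊥"

# Firing rule set U for an input actor that fires on at least two of three inputs.
TWO_OF_THREE = [
    (PRESENT, PRESENT, PRESENT),
    (ABSENT, PRESENT, PRESENT),
    (PRESENT, ABSENT, PRESENT),
    (PRESENT, PRESENT, ABSENT),
]


def can_fire(rules, inputs):
    """inputs holds one entry per port; None means no token has arrived."""
    observed = tuple(ABSENT if tok is None else PRESENT for tok in inputs)
    return observed in rules


def is_legal(actors, channels):
    """actors: name -> kind ('source', 'sink', 'regular' or 'input').
    channels: list of (src_actor, src_port_type, dst_actor, dst_port_type)."""
    kinds = set(actors.values())
    if "source" not in kinds or "sink" not in kinds:           # condition 4
        return False
    if any(st != dt for _, st, _, dt in channels):             # condition 2
        return False
    succ, neighbours = defaultdict(set), defaultdict(set)
    for src, _, dst, _ in channels:
        succ[src].add(dst)
        neighbours[src].add(dst)
        neighbours[dst].add(src)
    # Condition 1: the graph is (weakly) connected, checked by a traversal.
    start = next(iter(actors))
    seen, stack = {start}, [start]
    while stack:
        for n in neighbours[stack.pop()]:
            if n not in seen:
                seen.add(n)
                stack.append(n)
    if seen != set(actors):
        return False
    # Condition 3: the graph is acyclic, checked with Kahn's algorithm.
    indegree = {a: 0 for a in actors}
    for src in succ:
        for dst in succ[src]:
            indegree[dst] += 1
    ready = [a for a, d in indegree.items() if d == 0]
    removed = 0
    while ready:
        a = ready.pop()
        removed += 1
        for dst in succ[a]:
            indegree[dst] -= 1
            if indegree[dst] == 0:
                ready.append(dst)
    return removed == len(actors)


# The five actors of Section 4; the connections below are an assumed wiring.
EXAMPLE_ACTORS = {
    "Sensor": "source", "Controller": "regular", "Fine Controller": "regular",
    "Arbiter": "input", "Actuator": "sink",
}
EXAMPLE_CHANNELS = [
    ("Sensor", "int", "Controller", "int"),
    ("Sensor", "int", "Fine Controller", "int"),
    ("Controller", "int", "Arbiter", "int"),
    ("Fine Controller", "int", "Arbiter", "int"),
    ("Arbiter", "int", "Actuator", "int"),
]
```

Because a legal graph is acyclic and every actor fires once per iteration, the order in which Kahn's algorithm removes actors is also one valid firing order for a single iteration, which is why the acyclicity check and scheduling are closely related.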
Stud Welding Fault Codes

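1. Weld head rests in R-Position --- weld head stays in the home position without moving
2. No SOW --- no contact signal with the workpiece
3. Weld head not in R-Position --- weld head not in position
4. SOW not reset --- SOW signal has not been reset
5. Sequence fault during weld process --- feeder does not run in sequence
6. Program preselection missing --- program preselection failed
7. Program not active --- program has not been activated
8. Lift fault --- lift distance alarm
9. Short circuit welding --- welding short circuit
10. Droptime Timeout --- drop time out of tolerance
11. Idle welding --- welding interrupted
12. Measuring line broken --- measuring line damaged
13. Maintenance free counter necessary ---
14. Maintenance collet necessary --- collet needs to be replaced
15. Maintenance weld tool necessary --- weld tool needs servicing
16. Maintenance feed tube necessary --- feed tube needs servicing
17. Handgun, Sequence fault --- incorrect operating sequence
18. Feeder: Datum line not found --- data line not found
19. Wrong stud loaded --- wrong stud loaded
20. Feeder resp. SD2 sequence fault ---
21. Tolerance exceedance general --- general out-of-tolerance condition
22. SMPS: Control beyond operating range --- output current lower than the set current
23. Earth measuring line broken --- earth line damaged
24. SMPS: Fault security circuit --- SMPS safety circuit alarm
25. System configuration --- system configuration
26. LWL-data connection feeder --- fibre-optic communication cable damaged
27. LWL-data connection customer interface --- robot interface error
28. LWL-data connection SMPS --- open circuit between CPU and power module
29. SMPS-Program not loaded --- program not loaded
30. Feeder: Voltage still at security relay --- no 24 V supply
31. Feeder configuration --- feeder configuration problem
32. Weld start not reset --- weld start signal has not been reset
33. No stud loaded --- no stud in the collet
34. Test mode conditions not met --- simulated welding
35. Condition partial OS not met ---
36. Feeder not in automatic-mode --- not in automatic mode
37. ZCPU: RAM-Module application --- memory card problem
38. ZCPU: RAM-Module memory configuration --- memory card configuration problem
39. ZCPU: RAM-Module missing or defective --- memory card error alarm
40. SMPS: Boot Loader started --- power supply circuit board damaged
41. Feeder: Auxiliary supplies missing --- auxiliary power supply missing
42. Feeder: Fault at stud divider --- stud feeding error at the diverter
43. Air pressure too low --- external air pressure too low
44. Feeder: feed tube locking --- feed tube not connected to the weld tool
45. Feeder: Feed cycle Timeout --- feed cycle timed out
46. Feeder: Protective gas pressure controller --- feeder gas pressure alarm
47. DSP-Communication --- DSP card damaged
48. Feeder: +5v console missing --- 5 V supply missing
49. Feeder: LM/solenoid undervoltage --- voltage too low
50. Wrong stud --- studs cannot be separated
51. Feeder: LM/solenoid overvoltage --- motor or solenoid voltage too high
52. LM/solenoid not connected --- no feedback detected from the LM solenoid
53. Feeder: short circuit outlet LM/solenoid --- short circuit in the lift motor safety circuit
54. Feeder: Amplifier earth fault --- amplifier earth fault
55. Feeder: no path measurement signals --- no measurement signal
56. Feeder: no space for lift --- insufficient lift space
57. Feeder: Lift height Timeout --- lift time out of tolerance
58. Feeder: Amplifier excess temperature --- amplifier temperature too high
59. Feeder: no valid Flash-SW ROM DSP --- system did not detect the software
60. Feeder: no valid Flash-SW ROM C161 --- feeder software not valid
61. SD-5: Voltages missing --- voltage error
62. Stud divider: Position not reached --- stud divider not in position
63. SD-5: not in start-position --- stud has not reached the specified position
64. Feeder: LM system fault --- lift motor system alarm
65. Feeder: LM not ready --- not ready for the next weld
66. Feeder: Amplifier-card missing/defective --- amplifier card fault/alarm
67. Feeder: stud length too short --- stud protrusion length insufficient
68. Feeder: Colour marking not possible --- feeder circuit board software error
69. Feeder: Wrong software for detected hardware --- software does not match the detected hardware
70. Feeder: +5v Encoder missing --- no 5 V encoder supply
71. SMPS: Temperature too high --- temperature too high
72. SMPS: Hardware fault --- power module fault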
Common Causes of Centerless Grinding Errors

Intermittent Cut
Grinding wheel not correctly trued; regulating wheel "camming" (regulating wheel profile not trued properly); warped workpiece.
Fish Tails
Dirty coolant; loose grit in wheel.
Grinding Wheel Too Hard
May cause burn marks on the bar; can cause squeal (a harsh noise) because it is not free cutting; glazing may occur on the wheel face; soft materials may "load" the wheel; often produces chatter on the workpiece; sizing trouble, due to no spark-out; may cause heat checks or cracks.
Feed Lines
Grinding wheel not relieved on the exit side; work guides not properly set; on long bar fixtures: excessive pressure on the workpiece by the roller hold-down.
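Common CNC Machine Tool Alarm Messages (with Meanings)

Anyone who operates CNC machine tools runs into alarm messages sooner or later, and the English wording can be a headache for newcomers. The list below collects common CNC alarm messages, shown as they appear on the machine, with their meanings, as a small quick reference. Entries without a meaning had none in the source.

T WORD ERROR (T code error)
LOW OIL LEVEL (oil level low)
SPINPLE FAULT (spindle fault)
SPINDLE ALARM (spindle alarm)
EXTERNAL EMG STOP (emergency stop button pressed)
AC NOT READY (AC panel not ready)
SPINPLE LUBE FAULT (spindle lubrication fault)
T CODE ERROR (T code error, illegal T code)
M CODE ERROR (M code error, illegal M code)
SERVO NOT READY (servo not ready)
NC NOT READY (NC not ready)
TURRET FAULT (turret fault)
TURRET LIMIT (turret limit)
DC 24V OPEN (24 V DC circuit open)
+24V NOT READY (+24 V not ready)
GRAR DRIFT (gear position drift)
PLEASE AXIS RETURN HOME (axis not homed)
PLEASE DRUM RETURN HOME (tool magazine not homed)
AIR PRESSURE DROP (compressed air pressure too low)
CLAMP TOOL FALL (tool clamping failed)
DRUM NOT PARKED (tool magazine not at home position)
X ZERO POINT NOT REACHED (X axis not homed)
Y ZERO POINT NOT REACHED (Y axis not homed)
Z ZERO POINT NOT REACHED (Z axis not homed)
4TH ZERO POINT NOT REACHED (4th axis not homed)
X AXIS OVERTRAVL (X axis overtravel)
Y AXIS OVERTRAVL (Y axis overtravel)
Z AXIS OVERTRAVL (Z axis overtravel)
COUNTER SWITCH REEOR (counter switch fault)
MASTERT RANSFER OVER TEMP (main transformer overheated)
Z AXIS NOT AT FIRST REF POSITION (Z axis not at the first reference point)
SPINDLE ORIENTATION FALLURE (spindle orientation failed)
TOOL DESENT OR TOOL DATA REEOR (tool data error)
PLEASE UNLOAD THE TOOL ON SPRINELK (please remove the tool from the spindle)
PLEASE LOAD TOOL ON APINDLE (please load a tool into the spindle)
A AXIS UNCLAMP FAIL (A axis unclamp failed)
A AXIS CLAMP FAIL (A axis clamp failed)
DRUM OUT TO APRONDLEIS FALL (tool magazine failed to swing to the tool-change position)
MG SWING OVERLOAD (tool magazine swing overload)
DRUM BACK PARK IS FALL (tool magazine failed to swing back to home position)
TURRENT MOTOR1 OVERLOAD (tool magazine traverse motor overload)
COOLANT MOTOR OVERLOAD (coolant pump overload)
DRUM ATC FAULT (automatic tool change failed)
TOOLS UNLOCKED (tool not locked)
BATTERY ALARM (battery alarm)
DRUM POSITION SWITCH ERROR (tool magazine position detection switch fault)
DRUM NOW NOT AT PARK (tool magazine not at home position)
IT DANGOU TO MOVE DRUM (tool magazine movement prohibited)
POT UO FAILOR POT NOT AT UP POSITION (tool pot not in the horizontal position)
POT DOWN FAIL (tool pot tilt-down failed)
IT IS DANGOUR TO MOVE ARM (tool-change arm movement prohibited)
THE SPINDLE STATU IS ERROR (spindle status error)
ARM MOTOR OR ARM SWITCH FALL (tool-change arm or arm switch fault)
CENTRE LUBRICATION FALL (central lubrication fault)
THE WORK NOT CLAMPED (workpiece not clamped)
AUTO TOOL CHANGE FAULT (automatic tool change failed)
TOOL DATA OUT OF RANGE (commanded tool number out of range)
THE ORDER TOOL NOW IN SPINDLE (target tool is already in the spindle)
THE THREE SPINDLE SWITCH FAULT (proximity switches on the spindle)
THE CENTRE COOLANT IS LOWER (through-tool coolant pump level too low)
DRUM RETURN 1# POSITION FAULT (tool magazine automatic homing failed)
SPINDLE OVERLOAD (spindle overload)
TURRENT MOTOR OVERLOAD (turret rotation motor overload)
CHIP CONVEYER OVERLOAD (chip conveyor overload)
HARD LIMIT OR SERVO ALARM (hard limit or servo alarm)
NO LUB OIL (no lubricating oil)
INDEX HEAD UNLOCKED (indexing head not locked)
MT NOT READY (machine not ready)
MG OVERLOAD (tool magazine overload)
LUB EMPTY (lubricant empty)
AIR PRESSURE (insufficient air pressure)
COOLANT NOT READY (coolant not ready)
LUBE EMPTY (oil mist level low, lubricant level low)
LUB PRESSURE LOW (lubrication pressure low)
CONVEY VERLOAD (chip conveyor overload)
LUB OVERLOAD (lubrication overload)
LUBE PRESSURE LOW (oil mist pressure low)
SERIAL SPINDLE ALARM (serial spindle alarm)
NC BATTERY LOW ALARM (NC battery low alarm)
MAGAZINE MOVE LIMIT SWITCH ERROR
SPINDLE TOOL UNLAMP POSITION LIMIT SWITCH ERROR
MAGAZINE NOT IN POSITION OR SENSOR ERROR
AIR PRESSURE LOW ALARM (air pressure low alarm)
MOTOR OVERLOAD (motor overload)
T CODE > MAGAZINE TOOLS
T CODE < 1 ERROR
5TH AXIS HARDWARE OVERTRAVER LIMIT ERROR
DOOR IS OPENED (door open)
LUB PRESSURE SWITCH ERROR (oil pressure switch error)
SPINDLE OIL COOLANT UNIT ERROR
SPINDLE LOAD ABNORMAL (spindle load abnormal)
TRANSDUCER ALARM (sensor alarm)
BED-HEAD LUBRICATE OFF (headstock lubrication off)
EMG OFF
HYDRAULIC CHUCK PRESS LOW (hydraulic chuck pressure low)
HYDRAULIC TAIL PRESS LOW (hydraulic tailstock pressure low)
LUB 0IL LOW (oil pressure low)
TURRET CODE ERROR (turret code error)
TURRET RUN OVERTIME (turret run timed out)
MANUAL HANDLE INTERRUPT
TRY TO RUN SPINDLE WHILE CHUCK NOT LOCK
TRY TO RUN SPINDLE WHILE TAIL NOT LOCK
SPINDLE NEUTRAL GEAR
MAGAZINE ADJUST
HYDRAULIC NOT RUN
SAFETY DOOR BE OPENED
SAFETY DOOR NOT CLOSE
NOT ALL AXIS HAVE GONE BACK REF
IN ADJUST,IGNORE GOING BACK REF
AFTER EXCHANGE TOOL,CYCLE START
ATC MOTOR QF16 OFF
X AXIS IS LOCKED (X axis locked)
Y AXIS IS LOCKED (Y axis locked)
Z AXIS IS LOCKED (Z axis locked)
A AXIS IS LOCKED (A axis locked)
SPINDLE MOTOR FAN QF26 OFF (spindle motor fan QF26 off)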
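CNC Self-Criticism Letter

Part 1: Self-criticism letter on a CNC tool crash. How to prevent tool crashes when operating a CNC lathe. A tool crash is a machine accident in which the tool (including the tool post, carriage, and so on) unexpectedly collides with the workpiece, chuck, or tailstock while it is moving. It is the accident that a novice CNC lathe operator is most likely to cause. At best a crash degrades the accuracy of the machine, and at worst it damages the machine, so operators must take it very seriously.
To prevent crashes, work is needed on two fronts: the operator and the program. Operators should note the following points: (1) Regularly check that the lathe's limit stops are in the correct position and have not worked loose. (Note, however, that the machine limits only protect at the extremes of travel. Because tool overhang and blank size vary, in most cases the limits cannot effectively prevent a crash during machining.)
(2) After entering a program, check it carefully for errors so that a mistyped coordinate does not cause a crash.
(3) Set the tools and the tool offsets correctly. When touching off in the Z direction by trial cut, make sure the Z zero used for tool setting is the same as the Z zero used in the program, so that inconsistent workpiece coordinate systems do not cause a crash.
(4) During the first runs, set the rapid override low (for example, 25%).
(5) After a program is written, debug it in single-block mode first, with the display switched to a page that shows both the workpiece coordinates and the program being executed.
(6) While debugging, keep watching the current absolute coordinates and the endpoint coordinates of the next block to determine how far the tool is about to move, then look at the distance between the current tool position and the workpiece to judge whether a collision is possible. Pay special attention to the following points: ★ Take special care with the first G00 move in the program (and the first G00 move after each tool change). Many crashes happen in this block. While it runs, keep your left hand on the Pause (Feed Hold) button and press it if necessary.
★ If you are not yet proficient, set the first G00 target some distance away from the blank, then use a second G00 to position to the machining start point, so that problems can be caught in time during single-block running.
★ If the next block is a tool-change command, consider the overhang of the tools involved, and run the block only after confirming that the turret will not hit anything as it rotates.
(7) On GSK980-series controls, if G50 is used to set the coordinates during tool setting, note that after a return to machine zero the absolute coordinates may be reset to their initial values (depending on the system parameter settings), which can lead to an accident.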
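The check described in point (6) and in the first two starred notes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of any controller: the function names `travel` and `rapid_endpoint_is_risky`, the 2 mm clearance, diameter programming in X, and a blank whose face sits at Z0 and extends toward negative Z are all hypothetical choices made for the example.

```python
def travel(current, target):
    """Per-axis distance (X, Z) the tool will move from `current` to `target`."""
    return (target[0] - current[0], target[1] - current[1])


def rapid_endpoint_is_risky(target, stock_diameter, stock_face_z=0.0, clearance=2.0):
    """Flag a G00 endpoint that lands inside the blank plus a safety clearance.

    Assumptions (hypothetical, for illustration only): X is programmed as a
    diameter, the blank face is at `stock_face_z`, and the blank extends
    toward negative Z. External turning only.
    """
    x, z = target
    inside_radially = x < stock_diameter + 2.0 * clearance  # clearance is radial
    inside_axially = z < stock_face_z + clearance
    return inside_radially and inside_axially


# Current absolute position and the endpoints of the next rapid moves (X, Z).
current = (200.0, 150.0)
first_rapid = (60.0, 5.0)    # first G00: stop some distance from the blank
second_rapid = (56.0, 3.0)   # second G00: position to the machining start
mistyped = (40.0, 1.0)       # e.g. a wrong X value entered by mistake

move = travel(current, first_rapid)
checks = [rapid_endpoint_is_risky(p, stock_diameter=50.0)
          for p in (first_rapid, second_rapid, mistyped)]
print(move, checks)
```

A flagged endpoint does not prove a crash will happen; it only marks a block that deserves a closer look before it is run in single-block mode.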
Fault Tolerant High Performance Computing by a Coding Approach∗

Zizhong Chen, Graham E. Fagg, Edgar Gabriel, Julien Langou, Thara Angskun, George Bosilca, and Jack Dongarra
Computer Science Department, University of Tennessee
1122 Volunteer Blvd., Suite 233, Knoxville, TN 37996-3450, USA
{zchen,fagg,egabriel,langou,angskun,bosilca,dongarra}@

ABSTRACT
As the number of processors in today's high performance computers continues to grow, the mean-time-to-failure of these computers is becoming significantly shorter than the execution time of many current high performance computing applications. Although today's architectures are usually robust enough to survive node failures without suffering complete system failure, most of today's high performance computing applications cannot survive node failures and therefore, whenever a node fails, have to abort and restart from the beginning or from a stable-storage-based checkpoint.

This paper explores the use of the floating-point arithmetic coding approach to build fault survivable high performance computing applications that can adapt to node failures without aborting. Although the use of erasure codes over Galois fields has been attempted in theory for diskless checkpointing, few actual implementations exist. This probably derives from concerns about both the efficiency and the complexity of implementing such codes in high performance computing applications. In this paper, we introduce the simple but efficient floating-point arithmetic coding approach into diskless checkpointing and address the associated round-off error issue. We also implement a floating-point arithmetic version of the Reed-Solomon coding scheme in a conjugate gradient equation solver and evaluate both the performance and the numerical impact of this scheme. Experimental results demonstrate that the proposed floating-point arithmetic coding approach is able to survive a small number of simultaneous node failures with low performance overhead and little
numerical impact.

[...] application processes are aborted and the whole application is restarted from the last checkpoint. The major source of overhead in all stable-storage-based checkpoint systems is the time it takes to write checkpoints to stable storage [21]. Checkpointing an application on, say, a ten-thousand-processor computer means that all critical data for the application on all ten thousand processors have to be written to stable storage periodically, which may introduce an unacceptable amount of overhead into the checkpointing system. Restarting such an application means that all processes have to be recreated and all data for each process have to be re-read from stable storage into memory or regenerated by computation, which often adds a large amount of overhead to the restart. It may also be very expensive or unrealistic for many large systems such as grids to provide the large amount of stable storage necessary to hold the process state of an application with thousands of processes. Therefore, given the high frequency of failures expected in next generation computing systems, the classical checkpoint/restart fault tolerance approach may become a very inefficient way to deal with failures. Alternative fault tolerance approaches need to be investigated.

In this paper, we implement and evaluate an alternative approach to building fault tolerant high performance computing applications so that they can survive a small number of simultaneous processor failures without aborting the whole application. Based on diskless checkpointing [21] and FT-MPI, a fault tolerant version of MPI that we developed [9, 10], our fault tolerance approach removes stable storage from fault tolerance and takes an application-level approach, which gives the application developer the opportunity to achieve as low a fault tolerance overhead as possible according to the specific characteristics of an application. Unlike in the traditional checkpoint/restart fault tolerance paradigm, in our fault tolerance
framework, if a small number of application processes fail, the surviving application processes are not aborted. Instead, the application keeps all surviving processes and adapts itself to the failures.

Although the use of erasure codes over Galois fields has been attempted in theory [18] for diskless checkpointing, few actual implementations exist. This probably derives from concerns about both the efficiency and the complexity of implementing such codes in high performance computing applications [16, 19]. In this paper, we introduce the simple but efficient floating-point arithmetic coding approach into diskless checkpointing and address the associated round-off error issue. We also implement a floating-point arithmetic version of the Reed-Solomon coding scheme in a conjugate gradient equation solver and evaluate both the performance and the numerical impact of this coding scheme. Experimental results demonstrate that the proposed floating-point arithmetic coding approach can survive a small number of simultaneous processor failures with low performance overhead and little numerical impact.

The rest of the paper is organized as follows. Section 2 gives a brief introduction to FT-MPI from the user's point of view. Section 3 introduces the floating-point arithmetic coding approach into diskless checkpointing and addresses the associated round-off error issue. In Section 4, we give a detailed presentation of how to write a fault survivable application with FT-MPI, using a conjugate gradient equation solver as an example. In Section 5, we evaluate both the performance overhead of our fault tolerance approach and the numerical impact of our floating-point arithmetic encoding. Section 6 discusses the limitations of our approach and possible improvements. Section 7 concludes the paper and discusses future work.

2. FT-MPI: A FAULT TOLERANT MPI IMPLEMENTATION
Current parallel programming paradigms for high-performance computing systems are typically based on message passing, especially on the
Message-Passing Interface (MPI) specification [17]. However, the current MPI specification does not deal with the case where one or more process failures occur during runtime. MPI gives the user a choice between two ways of handling failures. The first, which is also the default mode of MPI, is to immediately abort all the processes of the application. The second is only slightly more flexible: it hands control back to the user application without guaranteeing, however, that any further communication can occur.

2.1 FT-MPI Overview
FT-MPI [10] is a fault tolerant version of MPI that provides basic system services to support fault survivable applications. FT-MPI implements the complete MPI-1.2 specification and some parts of the MPI-2 document, and extends some of the semantics of MPI to give the application the possibility of surviving process failures. FT-MPI can survive the failure of n-1 processes in an n-process job and, if required, can re-spawn the failed processes. However, the application is still responsible for recovering the data structures and the data of the failed processes.

Although FT-MPI provides basic system services to support fault survivable applications, prevailing benchmarks show that the performance of FT-MPI is comparable [11] to that of current state-of-the-art MPI implementations.

2.2 FT-MPI Semantics
FT-MPI provides semantics that answer the following questions:
1. What is the status of an MPI object after recovery?
2. What is the status of ongoing communication and messages during and after recovery?

When running an FT-MPI application, two parameters specify the modes in which the application runs.

The first parameter, the 'communicator mode', indicates the status of an MPI object after recovery. FT-MPI provides four different communicator modes, which can be specified when starting the application:
• ABORT: like any other MPI implementation, FT-MPI can abort on an error.
• BLANK: failed processes are not replaced, all
surviving processes have the same rank as before the crash and MPI_COMM_WORLD has the same size as before.
• SHRINK: failed processes are not replaced; however, the new communicator after the crash has no 'holes' in its list of processes. Thus, processes might have a new rank after recovery and the size of MPI_COMM_WORLD will change.
• REBUILD: failed processes are re-spawned, and surviving processes have the same rank as before. The REBUILD mode is the default and the most used mode of FT-MPI.

The second parameter, the 'communication mode', indicates how messages that are in flight when an error occurs are treated. FT-MPI provides two different communication modes, which can be specified when starting the application:
• CONT/CONTINUE: all operations which returned the error code MPI [...]

In this section, we explore the possibility of treating the checkpoint data as floating-point numbers rather than bit-streams. In the following subsections, we discuss how the local checkpoint can be encoded so that applications can survive failures, and address the associated round-off error issue.

3.1 Neighbor-Based Encoding
In neighbor-based encoding, a neighbor processor is first defined for each computation processor. Then, in addition to keeping a local checkpoint in its own memory, each computation processor stores a copy of its local checkpoint in the memory of its neighbor processor. Whenever a computation processor fails, the lost local checkpoint data can be recovered from its neighbor processor.

The performance overhead of neighbor-based encoding is usually very low. The checkpoints are localized to only two processors: a computation processor and its neighbor. Recovery involves only the failed processors and their neighbors. No global communication or encoding/decoding calculation is needed in checkpoint or recovery.

Because no floating-point operations are involved in checkpoint and recovery, neighbor-based encoding introduces no round-off errors.

Depending on how we define the neighbor processor of a
computation processor,there are three neighbor-based en-coding schemes3.1.1MirroringThe mirroring scheme of neighbor-based encoding is origi-nally proposed in[21].In this scheme,if there are n compu-tation processors,another n checkpoint processors are ded-icated as neighbors of the computation processors.The i-th computation processor simply stores a copy of its local checkpoint data in the i-th checkpoint processor(see Figure 1(a)).Up to n processor failures may be tolerated,although the failure of both a computation processor and its neighbor pro-cessor can not be tolerated.If we assume that the failure of each processor are independent and identically distributed, then the probability that the mirroring scheme survives k processor failures isC k n2k2 processor failures in a n processor job de-pending on the distribution of the failed processors.Compared with mirroring scheme,the advantage of the ring neighbor scheme is that there is no processor redun-dancy in the scheme.However,two copies of checkpoints have to be maintained in the memory of each computation processor.The degree of fault tolerance of the ring neighbor scheme is also lower than the mirroring scheme.(a) (b) (c)Figure1:Neighbor-Based Schemes3.1.3Pair NeighborAnother possibility is to organize all computation proces-sors as pairs(assume there are even number of computation processors).The two processors in a pair are neighbors of each other.Each processor sends a copy of its local check-point to its neighbor processor(see Figure1(c)).Like the ring neighbor scheme,there is no processor re-dundancy used in the paired neighbor scheme and two copies of checkpoints have to be maintained in the memory of each computation processor.However,compared with the ring neighbor scheme,the degree of fault tolerance for the pair neighbor scheme are improved.Like the mirroring scheme,if we assume that the failure of each processes are independent and identically dis-tributed,then the probability that the pair 
neighbor scheme survives k failures in an n-processor job is

$$\frac{C_{n/2}^{k}\, 2^{k}}{C_{n}^{k}}.$$

[...] Assume $P_i$ is the local checkpoint data in the memory of the i-th computation processor, and $C$ is the checksum of the local checkpoints held in the checkpoint processor. If we look at the checkpoint data as an array of real numbers, then the checkpoint encoding actually establishes the identity

$$P_1 + \dots + P_n = C \qquad (1)$$

between the checkpoint data $P_i$ on the computation processors and the checksum data $C$ on the checksum processor. If any processor fails, then identity (1) becomes an equation with one unknown. Therefore, the data on the failed processor can be reconstructed by solving this equation.

Because floating-point arithmetic is used in checkpoint and recovery, both introduce round-off errors. However, the checkpoint involves only additions, and the recovery involves additions and only one subtraction. In practice, the increased possibility of overflows, underflows, and cancellations due to round-off errors in the checkpoint and recovery algorithm is negligible.

The basic checksum scheme can survive only one failure. However, it can be used to construct a one dimensional checksum scheme that survives certain multiple failures.

Figure 2: Checksum Based Schemes, panels (a) and (b)

3.2.2 One Dimensional Checksum Scheme
The one dimensional checksum scheme works as follows. Assume the program is running on mn processors. Partition the mn processors into m groups with n processors in each group. Dedicate one checksum processor to each group. Within each group, checkpoints are taken using the basic checksum scheme (see Figure 2(b)).

The advantage of this scheme is that the checkpoints are localized to a subgroup of processors, so the checkpoint encoding in each subgroup can be done in parallel. Therefore, compared with the basic checksum scheme, the performance of the one dimensional checksum scheme is usually better.
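The basic checksum identity (1), which each group of the one dimensional scheme applies independently, can be illustrated with a minimal single-process Python sketch. It is not the FT-MPI implementation: the processors are simulated as plain lists, and the names `encode_checksum` and `recover_lost`, the group size, and the checkpoint length are assumptions made for the example.

```python
import random


def encode_checksum(local_checkpoints):
    """Element-wise floating-point sum C = P_1 + ... + P_n (identity (1))."""
    length = len(local_checkpoints[0])
    return [sum(p[j] for p in local_checkpoints) for j in range(length)]


def recover_lost(local_checkpoints, checksum, failed):
    """Rebuild the checkpoint of processor `failed` from the survivors and C.

    Per element this is a sum over the survivors followed by one subtraction.
    """
    survivors = [p for i, p in enumerate(local_checkpoints) if i != failed]
    return [c - sum(p[j] for p in survivors) for j, c in enumerate(checksum)]


random.seed(0)
n, length = 4, 5                      # 4 simulated processors, 5 values each
P = [[random.uniform(-1.0, 1.0) for _ in range(length)] for _ in range(n)]
C = encode_checksum(P)                # held by the checksum processor

failed = 2                            # pretend processor 2 is lost
recovered = recover_lost(P, C, failed)

# The recovered data differs from the lost data only by round-off.
max_err = max(abs(a - b) for a, b in zip(recovered, P[failed]))
print(max_err)
```

In a real run the sum would be formed by a reduction across processes rather than inside one process; the arithmetic per element stays the same.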
If we assume that the failures of the processes are independent and identically distributed, then the probability that the one dimensional checksum scheme survives k (k < m) failures is

$$\frac{C_{m}^{k}\,(n+1)^{k}}{C_{m(n+1)}^{k}}.$$

[...]

and

$$C_{i_{h+1}} = a_{i_{h+1}1} P_1 + \dots + a_{i_{h+1}n} P_n$$
$$\vdots$$
$$C_{i_m} = a_{i_m 1} P_1 + \dots + a_{i_m n} P_n. \qquad (4)$$

Let $A_r$ denote the coefficient matrix of the linear system (3). If $A_r$ has full column rank, then $P_{j_1}, \dots, P_{j_k}$ can be recovered by solving (3), and $C_{i_{h+1}}, \dots, C_{i_m}$ can be recovered by substituting $P_{j_1}, \dots, P_{j_k}$ into (4).

Whether we can recover the lost data on the failed processes depends directly on whether $A_r$ has full column rank. However, $A_r$ in (3) can be any sub-matrix (including minor) of A, depending on the distribution of the failed processors. If every square sub-matrix (including minor) of A is non-singular and no more than m processes have failed, then $A_r$ is guaranteed to have full column rank. Therefore, to be able to recover from any set of no more than m failures, the checkpoint matrix A has to satisfy the condition that every square sub-matrix (including minor) of A is non-singular.
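To make the recovery step concrete, here is a minimal single-process Python sketch of weighted checksum recovery. It rests on several assumptions that are not taken from the paper's code: processors are simulated as lists, the checkpoint matrix is drawn with `random.gauss` purely for illustration, exactly as many surviving checksums as failed computation processors are used so that the system is square, and the helper `solve` is a plain Gaussian elimination with partial pivoting written for this example.

```python
import random


def solve(matrix, rhs):
    """Solve matrix * X = rhs by Gaussian elimination with partial pivoting.

    `rhs` has one row per equation and one column per checkpoint element.
    """
    k = len(matrix)
    aug = [list(matrix[r]) + list(rhs[r]) for r in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, len(aug[r])):
                aug[r][c] -= factor * aug[col][c]
    width = len(rhs[0])
    x = [[0.0] * width for _ in range(k)]
    for r in range(k - 1, -1, -1):
        for e in range(width):
            acc = aug[r][k + e] - sum(aug[r][c] * x[c][e] for c in range(r + 1, k))
            x[r][e] = acc / aug[r][r]
    return x


random.seed(1)
n, m, length = 6, 3, 4   # computation processors, checksum processors, values each
P = [[random.uniform(-1.0, 1.0) for _ in range(length)] for _ in range(n)]
A = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
# Weighted checksums: C_i = a_i1 * P_1 + ... + a_in * P_n
C = [[sum(A[i][j] * P[j][e] for j in range(n)) for e in range(length)]
     for i in range(m)]

failed = [1, 4]            # failed computation processors
lost_checksum = 1          # one checksum processor fails as well
alive_checksums = [0, 2]   # surviving checksum processors
survivors = [j for j in range(n) if j not in failed]

# Reduced system: move the surviving P_j to the right-hand side.
A_r = [[A[i][j] for j in failed] for i in alive_checksums]
rhs = [[C[i][e] - sum(A[i][j] * P[j][e] for j in survivors) for e in range(length)]
       for i in alive_checksums]
recovered = solve(A_r, rhs)

# Substitute the recovered data back to rebuild the lost checksum.
restored = list(P)
for t, j in enumerate(failed):
    restored[j] = recovered[t]
rebuilt_checksum = [sum(A[lost_checksum][j] * restored[j][e] for j in range(n))
                    for e in range(length)]

max_err = max(abs(recovered[t][e] - P[j][e])
              for t, j in enumerate(failed) for e in range(length))
checksum_err = max(abs(a - b) for a, b in zip(rebuilt_checksum, C[lost_checksum]))
print(max_err, checksum_err)
```

The size of `max_err` depends on how well conditioned the reduced matrix `A_r` is, which is why the choice of checkpoint matrix matters for the accuracy of the recovery.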
How can we find such matrices? It is well known that some structured matrices, such as Vandermonde matrices and Cauchy matrices, have the property that every square sub-matrix (including minor) is non-singular.

However, in computer floating-point arithmetic, where no computation is exact due to round-off errors, it is well known that, in solving a linear system of equations, a condition number of $10^k$ for the coefficient matrix leads to a loss of accuracy of about k decimal digits in the solution. Therefore, in order to get a reasonably accurate recovery, the checkpoint matrix A actually has to satisfy the stronger condition that every square sub-matrix (including minor) of A is well-conditioned.

It is well known [8] that Gaussian random matrices are well-conditioned. To estimate how well conditioned Gaussian random matrices are, we have proved the following theorem:

Theorem 1. Let $G_{m \times n}$ be an m×n real random matrix whose elements are independent and identically distributed standard normal random variables, and let $\kappa_2(G_{m \times n})$ be the 2-norm condition number of $G_{m \times n}$. Then, for any m ≥ 2, n ≥ 2 and x ≥ |n−m|+1, $\kappa_2(G_{m \times n})$ satisfies

P κ2(G_{m×n}) [...] √ [...] x [...] |n−m|+1 [...],

and

E(ln κ2(G_{m×n})) < ln n [...]

[...]

Compute r(0) = b − A x(0) for some initial guess x(0)
for i = 1, 2, ...
    solve M z(i−1) = r(i−1)
    ρ(i−1) = r(i−1)^T z(i−1)
    if i = 1
        p(1) = z(0)
    else
        β(i−1) = ρ(i−1) / ρ(i−2)
        p(i) = z(i−1) + β(i−1) p(i−1)
    endif
    q(i) = A p(i)
    α(i) = ρ(i−1) / p(i)^T q(i)
    x(i) = x(i−1) + α(i) p(i)
    r(i) = r(i−1) − α(i) q(i)
    check convergence; continue if necessary
end

Figure 4: Preconditioned Conjugate Gradient Algorithm

We then incorporate the basic weighted checksum scheme into the PCG code. Assume the PCG code uses n MPI processes to do computation. We dedicate another m MPI processes to hold the weighted checksums of the local checkpoints of the n computation processes. The checkpoint matrix we use is a pseudo-random matrix. Note that the sparse matrix does not change during computation; therefore, we only need to checkpoint three vectors (i.e. the iterate, the residual and the search direction) and two scalars (i.e. the iteration index and ρ(i−1) in Figure 4). The communicator
mode we use is the REBUILD mode. The communication mode we use is the NOOP/RESET mode. Therefore, when processes fail, FT-MPI drops all ongoing messages and re-spawns all failed processes without changing the rank of the surviving processes.

An FT-MPI application can detect and handle failure events using two different methods: either the return code of every MPI function is checked, or the application makes use of MPI error handlers. The second method gives users the possibility of incorporating fault tolerance into applications that call existing parallel numerical libraries which do not check the return code of their MPI calls. In the PCG code, we detect and handle failure events by checking the return code of every MPI function.

The recovery algorithm in PCG makes use of the longjmp function of the C standard library. When the return code of an MPI function indicates that an error has occurred, all surviving processes set their state variable to RECOVER and jump to the recovery section of the code. The recovery algorithm consists of the following steps:
1. Re-spawn the failed processes and recover the FT-MPI runtime environment by calling a specific, predefined MPI function.
2. Determine how many processes have died and which ones.
3. Recover the lost data from the weighted checksums using the algorithm described in Section 4.3.1.
4. Resume the computation.

Another issue is how a process can determine whether it is a surviving process or a re-spawned process. FT-MPI offers the user two ways to solve this problem:
• In the first method, when a process is a replacement for a failed process, the return value of its MPI [...] INIT PROCS).
• The second possibility is that the application introduces a static variable. By comparing the value of this variable to the value on the other processes, the application can detect whether everybody has been newly started (in which case all processes will have the pre-initialized value), or whether a subset of processes have a different value, since each
processes modifies the value of this variable after the initial check.This second approach is somewhat more complex,however, it is fully portable and can also be used with any other non fault-tolerant MPI library.In PCG,each process checks whether it is a re-spawned pro-cess or a surviving process by checking the return code of its MPI5.EXPERIMENTAL EV ALUATIONIn this section,we evaluate both the performance over-head of our fault tolerance approach and the numerical im-pact of ourfloating-point arithmetic encoding using the PCG code implemented in the last section.We performed four sets of experiments to answer the fol-lowing four questions:1.What is the performance of FT-MPI compared withother state-of-the-art MPI implementations?2.What is the performance overhead of performing check-pointing?3.What is the performance overhead of performing re-covery?4.What is the numerical impact of round-offerrors inrecovery?For each set of experiments,we test PCG with four different problems.The size of the problems and the number of com-putation processors used(not include checkpoint processors) for each problem are listed in table1.All experiments were performed on a cluster of64dual-processor2.4GHz AMD Opteron nodes.Each node of the cluster has2GB of memory and runs the Linux operating system.The nodes are connected with a Gigabit Ethernet. The timer we used in all measurements is MPISize of the ProblemProb#115329,220Prob#3601,316,880Prob#1Prob#3MPICH-1.2.61985.310199.8510.92331.4FT-MPI1052.26606.9482.72247.5FT-MPI rcvr1061.36634.0Figure5:PCG Performance with Different MPI Im-plementations5.2Performance Overhead of TakingCheckpointThe purpose of the second set of experiments is to mea-sure the performance penalty of taking checkpoints to sur-vive general multiple simultaneous processor failures.There is no processor failures involved in this set of experiments. 
At each run,we divided the processors into two classes.The first class of processors are dedicated to perform PCG com-putation work.The second class of processors are dedicated to perform checkpoint.In table3and4,thefirst column of the table indicates the number of checkpoint processors used in each test.If the number of checkpoint processors used in a run is zero,then there is no checkpoint in this run.For all experiments,we ran PCG for2000iterations and checkpoint every100iterations.Table3:PCG Execution Time(in seconds)with CheckpointTime Prob#2Prob#4480.32241.81ckpt1055.16614.5484.42250.33ckpt1059.96619.7488.12254.75ckpt1064.36625.1Table3reports the execution time of each test.In order to reduce the disturbance of the noise of the program execution time to the checkpoint time,we measure the time used for checkpointing separately for all experiments.Table4reports the individual checkpoint time for each experiment.Figure6Table4:PCG Checkpointing Time(in seconds)Time Prob#2Prob#42.6 5.52ckpt 5.810.66.010.24ckpt9.915.09.814.1Prob#1Prob#30proc1052.26606.9485.82256.02proc1063.66633.5490.02262.14proc1068.86638.2494.92267.5Prob#1Prob#31proc 5.018.23.79.23proc 6.020.04.510.45proc7.021.5From table6,we can see the recovery time increases ap-proximately linearly as the number of failed processors in-creases.However,the recovery time for a failure of one pro-cessor is much longer than the increase of the recovery timefrom a failure of k(where k>0)processors to a failure ofk+1processors.This is because,from no failure to a failurewith one failed processor,the additional work the PCG hasto perform includesfirst setting up the recovery environmentand then recovering data.However,from a failure with k(where k>0)processors to a failure with k+1processors,the only additional work is to recover data for an additionalprocessor.From Figure7,we can see the overheads for recovery in alltests are within1%of the program execution time,which isagain within the noise margin of a program 
execution time.5.4Numerical Impact of Round-Off Errors inRecoveryAs discussed in Section3,our diskless checkpointing schemesare based onfloating-point arithmetic encodings,therefore,introduce round-offerrors into the checkpointing system.The experiments in this sub-section are designed to measureFigure7:PCG Recovery Overheadthe numerical impact of the round-offerrors in our check-pointing system.All experiment configurations are the same as previous section except that we report the norm of the residual at the end of each computation.Note that if no failures occur,the computation proceeds with the same computational data as without checkpoint. Therefore,the computational results are affected only when there is a recovery in the computation.Table7reports the norm of the residual at the end of each computation when there is0,1,2,3,4,and5simultaneous process failures. Table7:Numerical Impact of Round-OffErrors in PCG RecoveryResidual Prob#2Prob#43.050e-6 3.071e-61proc 4.500e-6 4.472e-62.973e-6 2.731e-63proc 3.213e-6 3.585e-63.438e-6 2.732e-65proc 4.082e-6 4.238e-6 From table7,we can see that the norm of the residuals are different for different number of simultaneous process failures.This is because,after recovery,due to the impact of round-offerrors in the recovery algorithm,the PCG com-putations are performed based on different recovered data. However,table7also indicates that the residuals with re-covery do not have much difference from the residuals with-out recovery.This is because the PCG algorithm we use are numerically stable for the test problems and,from the floating-point number point of view,all the recovered data do not have much difference from the lost data. 
However, if an application is itself numerically unstable, then the small difference between the lost data and the recovered data can be amplified. Fortunately, most high performance numerical computing applications require numerically stable algorithms. Therefore, our recovery schemes would have very little impact on them.

6. DISCUSSION
The size of the checkpoint affects the performance of any checkpointing scheme. The larger the checkpoint, the higher the diskless checkpoint overhead. In the PCG example, we only need to checkpoint three vectors and two scalars periodically; therefore, the performance overhead is very low.

Diskless checkpointing is well suited to applications that modify a small amount of memory between checkpoints. There are many such applications in the high performance computing field. For example, in typical iterative methods for sparse matrix computation, the sparse matrix is often not modified during program execution; only some vectors and scalars are modified between checkpoints. For this type of application, the overhead of surviving a small number of processor failures is very low.

Even for applications which modify a relatively large amount of memory between two checkpoints, decent performance results for surviving a single processor failure were still reported in [16, 19].

The basic weighted checksum scheme implemented in the PCG example has a higher performance overhead than the other schemes discussed in Section 3. When an application is executed on a large number of processors, to survive general multiple simultaneous processor failures, the one dimensional weighted checksum scheme will achieve a much lower performance overhead than the basic weighted checksum scheme.
If processors fail one after another (i.e. there are no multiple simultaneous processor failures), the neighbor-based schemes can achieve even lower performance overhead. It was shown in [6] that neighbor-based checkpointing was an order of magnitude faster than parity-based checkpointing, but incurred twice as much storage overhead.

Diskless checkpointing cannot survive a failure of all processors. Also, to survive a failure that occurs during checkpoint or recovery, the storage overhead would double. If an application needs to tolerate these types of failures, a two level recovery scheme [24] which uses both diskless checkpointing and stable-storage-based checkpointing is a good choice.

Another drawback of our fault tolerance approach is that it requires the programmer to be involved in the fault tolerance. However, if the fault tolerance schemes are implemented in numerical software such as LFC [5], then transparent fault tolerance can also be achieved for programmers using these software tools.

7. CONCLUSION AND FUTURE WORK
In this paper, we presented how to build fault tolerant high performance computing applications with FT-MPI by a coding approach. We introduced the simple but efficient floating-point arithmetic coding approach into diskless checkpointing and addressed the associated round-off error issue. We implemented a floating-point arithmetic version of the Reed-Solomon coding scheme in a conjugate gradient equation solver and evaluated both the performance and the numerical impact of this coding scheme. Experimental results demonstrated that the proposed floating-point arithmetic coding approach is able to survive a small number of simultaneous node failures with low performance overhead and little numerical impact.

In the future, we will evaluate our fault tolerance approach on systems with larger numbers of processors. We would also like to evaluate our fault tolerance approach with more applications and more coding schemes.