OpenEmbedded is an automated build framework for creating and delivering embedded Linux systems.

The goal of this open-source project is to provide a flexible, extensible infrastructure for building custom Linux distributions.

OpenEmbedded provides a way to manage packages and the build system and to deal with problems such as system dependencies.

This article walks step by step through the parameters of the OpenEmbedded MiniProgram and related topics.

Step 1: What is the OpenEmbedded MiniProgram?
The OpenEmbedded MiniProgram is a small program designed for the OpenEmbedded automated build framework.

It is driven by a set of parameters and helps users build their embedded Linux systems with OpenEmbedded more conveniently.

Step 2: Parameters of the OpenEmbedded MiniProgram
The OpenEmbedded MiniProgram provides several parameters that let users flexibly control the build process and the resulting system.

Some common parameters are described below:
1. `layers`: specifies the layer configuration that the OpenEmbedded project requires.

A layer is a way of logically organizing and managing the packages, configuration files, and scripts that an OpenEmbedded build needs.


2. `target`: specifies the target hardware platform or device.

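Different hardware platforms or devices can have different architectures and characteristics, and OpenEmbedded generates a system suited to the chosen target.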

3. `image`: specifies the type of image file to generate.

OpenEmbedded can generate many kinds of image files, such as root filesystem images and kernel images.


4. `machine`: determines which device the system is built for.


5. `distro`: specifies the distribution to use.

A distribution here means a specific release of an integrated collection of software built on the Linux kernel.
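This article does not document the MiniProgram's own interface, so the following is only a sketch under an assumed mapping onto standard OpenEmbedded/BitBake settings. `layers` becomes the `BBLAYERS` list in `conf/bblayers.conf`. `machine` and `distro` become `MACHINE` and `DISTRO` in `conf/local.conf`. `image` is the image recipe passed to `bitbake`. The helper `render_build_config` and the layer paths are hypothetical. `qemux86-64`, `poky`, and `core-image-minimal` are example values commonly used with the Poky reference build. `target` is left out because plain OpenEmbedded selects the target hardware through `MACHINE`.

```python
from pathlib import PurePosixPath


def render_build_config(layers, machine, distro, image):
    """Hypothetical helper: turn MiniProgram-style parameters into BitBake
    configuration text plus a build command (the mapping is an assumption)."""
    # conf/bblayers.conf: BBLAYERS lists layer paths, one per continued line.
    # PurePosixPath normalizes paths, e.g. it drops a trailing slash.
    bblayers = 'BBLAYERS ?= " \\\n' + "".join(
        f"  {PurePosixPath(p)} \\\n" for p in layers
    ) + '"\n'
    # conf/local.conf: "?=" assigns only if the variable is not already set.
    local_conf = f'MACHINE ?= "{machine}"\nDISTRO ?= "{distro}"\n'
    # The image is built by naming its recipe on the bitbake command line.
    command = ["bitbake", image]
    return bblayers, local_conf, command


if __name__ == "__main__":
    bblayers, local_conf, cmd = render_build_config(
        ["/srv/poky/meta", "/srv/poky/meta-poky"],  # hypothetical layer paths
        machine="qemux86-64", distro="poky", image="core-image-minimal",
    )
    print(bblayers + local_conf + " ".join(cmd))
```

To do by hand what such a front end would automate, write the two fragments into the build directory's `conf/` folder and run the returned command.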

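In this article, we take a deep dive into the various parts of the OpenMMLab source code to help readers better understand this powerful machine-learning framework.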











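MMSegmentation is implemented on PyTorch and supports many semantic segmentation algorithms, such as FCN, PSPNet, and DeepLabV3. Mask R-CNN, an instance segmentation model, is provided by OpenMMLab's MMDetection toolbox rather than by MMSegmentation.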










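OpenHarmony is a distributed operating system open-sourced by Huawei. It aims to provide unified low-level operating system support for different kinds of IoT devices.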

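This article analyzes the OpenHarmony code in depth to help readers better understand the internal structure and implementation principles of this open-source project.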

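I. Code architecture
OpenHarmony's code architecture consists of three main parts: the kernel layer, the system service layer, and the application framework layer.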


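In the OpenHarmony code, these three layers are clearly organized and their responsibilities are well separated. The dependencies between modules are also well managed and maintained.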

This architecture gives OpenHarmony good extensibility and portability, so it can adapt readily to different types of IoT devices.
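One way to make "clear layering" checkable is a downward-only dependency rule: a module may depend on modules in its own layer or in lower layers, but never on a higher one. The sketch below encodes the article's three layers in that order. The module names are hypothetical. It illustrates the rule only and is not OpenHarmony's actual dependency checker.

```python
# The article's three layers, ordered from lowest to highest.
LAYERS = ["kernel", "system_service", "app_framework"]


def find_layer_violations(module_layer, dependencies):
    """Return (module, dependency) pairs that point to a higher layer.

    module_layer: module name -> layer name (one of LAYERS)
    dependencies: module name -> list of module names it depends on
    Same-layer and downward dependencies are allowed.
    """
    rank = {layer: i for i, layer in enumerate(LAYERS)}
    violations = []
    for module, deps in dependencies.items():
        for dep in deps:
            if rank[module_layer[dep]] > rank[module_layer[module]]:
                violations.append((module, dep))
    return violations


if __name__ == "__main__":
    # Hypothetical modules, for illustration only.
    module_layer = {
        "hal": "kernel",
        "scheduler": "kernel",
        "soft_bus": "system_service",
        "ui": "app_framework",
    }
    dependencies = {
        "ui": ["soft_bus"],
        "soft_bus": ["hal", "scheduler"],
        "scheduler": ["soft_bus"],  # kernel reaching upward: a violation
    }
    print(find_layer_violations(module_layer, dependencies))
```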

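II. Kernel layer code analysis
The most important parts of OpenHarmony's kernel layer are the hardware abstraction layer (HAL) and the kernel service layer (Kernel Service).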



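The HAL code defines and implements abstract interfaces for various hardware devices. These interfaces give low-level device operations a single, uniform form, which underpins the portability of the system.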
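A HAL is essentially an interface that upper layers program against, with one implementation per board or chip. The Python sketch below illustrates that idea only. `UartHal`, `LoopbackUart`, and `send_greeting` are hypothetical names. OpenHarmony's real HAL interfaces are defined in C/C++ and look different.

```python
from abc import ABC, abstractmethod


class UartHal(ABC):
    """Hypothetical HAL interface: upper layers call only these methods."""

    @abstractmethod
    def write(self, data: bytes) -> int:
        """Send bytes; return the number of bytes written."""

    @abstractmethod
    def read(self, size: int) -> bytes:
        """Receive up to `size` bytes."""


class LoopbackUart(UartHal):
    """Stand-in board implementation that echoes written bytes back."""

    def __init__(self) -> None:
        self._buffer = bytearray()

    def write(self, data: bytes) -> int:
        self._buffer.extend(data)
        return len(data)

    def read(self, size: int) -> bytes:
        chunk = bytes(self._buffer[:size])
        del self._buffer[:size]
        return chunk


def send_greeting(uart: UartHal) -> bytes:
    """Portable service code: works with any UartHal implementation."""
    uart.write(b"hello")
    return uart.read(5)


if __name__ == "__main__":
    print(send_greeting(LoopbackUart()))
```

Porting to new hardware then means writing another `UartHal` subclass. `send_greeting` itself does not change.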

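The Kernel Service code implements basic system scheduling and management functions such as process management and scheduling. The stability and efficiency of these functions are critical to the performance and stability of the whole system.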
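To make "scheduling" concrete, here is a simulation of classic round-robin scheduling. It is a textbook policy chosen for illustration and does not describe OpenHarmony's kernel scheduler. The process names and burst times are made up.

```python
from collections import deque


def round_robin(burst_times, quantum):
    """Simulate round-robin scheduling on one CPU.

    burst_times: dict of process name -> CPU time needed (all arrive at t=0).
    Returns a dict of process name -> completion time.
    """
    if quantum <= 0:
        raise ValueError("quantum must be positive")
    ready = deque(burst_times.items())
    clock = 0
    finished = {}
    while ready:
        name, remaining = ready.popleft()
        run = min(quantum, remaining)  # run for at most one time slice
        clock += run
        remaining -= run
        if remaining > 0:
            ready.append((name, remaining))  # unfinished: back of the queue
        else:
            finished[name] = clock
    return finished


if __name__ == "__main__":
    print(round_robin({"A": 5, "B": 3, "C": 1}, quantum=2))
```

With a quantum of 2, C finishes at time 5, B at time 8, and A at time 9. Short jobs complete early, and the long job is not starved.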



Sun T-Series Server Product Introduction

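Introduction to Sun UltraSPARC T-Series Servers
Wang Dianhua, Technical Support Management Department
• Problems customers face today
• Introducing the UltraSPARC T1 processor
• Introducing the UltraSPARC T2 processor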

[Chart: SPECweb2005 and SPECjAppServer2004 comparison with IBM p5+ p550 and HP rx2660 (Itanium II); values shown: performance 16407/884, power usage 340 W]

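• Best performance, performance per watt, and performance per rack unit
• Best data-center fit, including power consumption and rack density
• Highest compute scalability, including multithreading and large memory capacity
Sun Fire/SPARC Enterprise T5120/T5220: the new generation of the UltraSPARC T-series platform
General availability (GA): 9 Nov. 2007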
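• Sina chose 12 Sun Fire T1000 servers running Solaris 10 and Sun Java System Directory Server to replace 30 Dell Xeon servers running Linux. The result:
  • 5 times the server performance
  • only 1/4 of the original rack space
  • only 1/7 of the original power consumption
  • lower maintenance, power, and cooling costs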

• 8 cores per chip and 4 threads per core, for 32 threads in total
• One processor is equivalent to 32 processors of the past
• More than 100 patents



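I. About the OpenHarmony project
1. OpenHarmony is an all-scenario distributed operating system launched by Huawei and incubated by the OpenAtom Foundation. It aims to build an open, collaborative, and shared technology ecosystem.

2. Huawei contributed OpenHarmony's open-source code under the Apache 2.0 license. The goal is to give developers worldwide an operating system platform that they can freely use, modify, and distribute.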

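II. OpenHarmony project structure
1. Kernel layer
a. The kernel layer includes the Harmony microkernel and the Harmony architecture, and provides a lightweight, high-performance, low-power operating system kernel.

b. The Harmony microkernel uses a microkernel architecture. It supports multi-core scheduling and lightweight global lock management to schedule and manage resources efficiently.

2. Core service layer
a. The core service layer includes the common runtime service framework, the distributed soft bus, and other components. It supports device drivers, kernel services, and system services.

b. The common runtime service framework provides a cross-platform application framework and runtime environment, so applications can be developed and deployed in one way across multiple devices and scenarios.

3. Basic component layer
a. The basic component layer includes infrastructure, application frameworks, media services, the graphics engine, and other components. It supports basic functions such as multimedia, graphics, and security.

b. The infrastructure includes the file system, the network protocol stack, the hardware abstraction layer, and other components, so that the operating system integrates and interoperates seamlessly with hardware devices.

4. Application framework layer
a. The application framework layer includes the application environment, the interaction framework, data storage, and other components. It provides varied application support and development tools.

b. The application environment includes application containers, secure containers, container engines, and other components. These isolate applications from one another securely and allow multi-tenant sharing.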

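III. Advantages and features of OpenHarmony
1. Openness
a. OpenHarmony supports many architectures and device types, including IoT, smart devices, automotive, and smart home, and forms an integrated open platform.

b. The open platform provides unified development interfaces and standards. This lowers developers' learning and development costs and lets hardware and software integrate and interoperate seamlessly.

2. Distributed architecture
a. Built on the Harmony microkernel and architecture, OpenHarmony implements a lightweight distributed architecture and efficient distributed communication.

b. The distributed architecture lets multiple devices work together and interconnect across scenarios. This enables integrated management of smart hardware and unified deployment of smart services.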



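The OpenHarmony Build System

Outline:
I. Introduction to the OpenHarmony build system
- What is OpenHarmony
- The role of the OpenHarmony build system
II. The OpenHarmony build system in detail
- Basic build flow
- Main components
- Toolchain
- Build environment
- Build options
III. Applying the OpenHarmony build system
- Use on different platforms
- How developers use the OpenHarmony build system
IV. Advantages and disadvantages of the OpenHarmony build system
- Advantages: efficient, flexible, sustainable
- Disadvantages: complex, requires specialized knowledge

Body:
I. Introduction to the OpenHarmony build system
OpenHarmony is an open-source, all-scenario distributed operating system. Its goal is to provide a unified operating system platform for all kinds of smart devices.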

The OpenHarmony build system is the key tool that lets this operating system run on different hardware platforms.

It converts the OpenHarmony source code into executable binaries, so the system can run on many hardware platforms.

II. The OpenHarmony build system in detail
The OpenHarmony build system is made up of several components that together form a complete build flow.

1. Toolchain
The toolchain is the core of the OpenHarmony build system. It includes tools such as the compiler, the linker, and the debugger.


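2. Build environment
The build environment is the foundation of the OpenHarmony build system. It includes the runtime environment that the compiler needs, along with libraries, header files, and similar resources.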
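The toolchain's split between compiling and linking can be sketched as a list of commands. This is a generic GCC-style illustration with an assumed cross-compiler prefix; `arm-linux-gnueabihf-` is only an example. OpenHarmony's real build is driven by its own scripts on top of GN and Ninja and uses its own toolchain configuration. The helper name `plan_build` is hypothetical.

```python
from pathlib import PurePosixPath


def plan_build(sources, output, cross_prefix=""):
    """Return the compile and link commands for a small C program.

    cross_prefix is a toolchain prefix such as "arm-linux-gnueabihf-";
    an empty prefix means the host gcc.
    """
    cc = f"{cross_prefix}gcc"
    objects = []
    commands = []
    for src in sources:
        obj = str(PurePosixPath(src).with_suffix(".o"))
        # Compile step: -c stops after producing an object file.
        commands.append([cc, "-c", src, "-o", obj])
        objects.append(obj)
    # Link step: the gcc driver invokes the linker on all object files.
    commands.append([cc, *objects, "-o", output])
    return commands


if __name__ == "__main__":
    for cmd in plan_build(["main.c", "util/io.c"], "app", "arm-linux-gnueabihf-"):
        print(" ".join(cmd))
```

On a machine where that toolchain is installed, each command could then be run with `subprocess.run(cmd, check=True)`.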




III. Applying the OpenHarmony build system
The OpenHarmony build system can be applied to many hardware platforms, including embedded devices, smartphones, and tablets.

OpenEmbedded is an open-source software framework for embedded Linux. It aims to simplify the development and customization of embedded systems.





OpenEmbedded uses BitBake as its build tool and describes software components in files called recipes (`.bb` files).
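Recipe file names also carry information. By convention a recipe is named `<PN>_<PV>.bb`, so the package name and version can be read from the file name. The sketch below parses only that common form. It ignores `.bbappend` files and less common variants. The helper name is hypothetical.

```python
import re

# Conventional recipe file names: <PN>_<PV>.bb (e.g. busybox_1.36.1.bb).
# A name without "_<version>" (e.g. core-image-minimal.bb) has no PV part.
_RECIPE_RE = re.compile(r"^(?P<pn>[^_]+)(?:_(?P<pv>[^_]+))?\.bb$")


def parse_recipe_filename(filename):
    """Split a recipe file name into (PN, PV); PV is None if absent."""
    match = _RECIPE_RE.match(filename)
    if match is None:
        raise ValueError(f"not a recipe file name: {filename!r}")
    return match.group("pn"), match.group("pv")


if __name__ == "__main__":
    print(parse_recipe_filename("busybox_1.36.1.bb"))
    print(parse_recipe_filename("core-image-minimal.bb"))
```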








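After configuration, build software with the BitBake command, for example `bitbake <recipe>`. Here `<recipe>` is the name of the recipe to build (for example `core-image-minimal`), not the path to its `.bb` file.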
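Calling BitBake from a script mostly means assembling the right argument list. The sketch below assumes the build environment has already been set up, so that `bitbake` is on `PATH`. In Poky this is done by sourcing `oe-init-build-env`. The helper names are hypothetical. `-c <task>` is BitBake's option for running a single task.

```python
import shutil
import subprocess


def bitbake_command(target, task=None):
    """Build the argv for `bitbake <target>` or `bitbake -c <task> <target>`."""
    cmd = ["bitbake"]
    if task is not None:
        cmd += ["-c", task]  # run one task (e.g. "compile") instead of a full build
    cmd.append(target)
    return cmd


def run_bitbake(target, task=None):
    """Run bitbake if it is on PATH (i.e. after sourcing the environment script)."""
    if shutil.which("bitbake") is None:
        raise RuntimeError("bitbake not found; source the build environment first")
    return subprocess.run(bitbake_command(target, task), check=True)


if __name__ == "__main__":
    print(bitbake_command("busybox", task="compile"))
```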




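OpenEmbedded is an open-source framework for building embedded Linux distributions. It is widely used to build Linux distributions, embedded systems, and IoT devices.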



1. What is OpenEmbedded?
OpenEmbedded is a build system for building embedded Linux distributions.



2. Key features of OpenEmbedded
- Flexibility: OpenEmbedded supports a wide range of processor architectures and operating systems and can be customized as needed.

- Extensibility: OpenEmbedded provides a modular build framework, so software packages are easy to add or remove.

- Automation: the OpenEmbedded build process is automated, which saves developers time and effort.

- Community support: OpenEmbedded has a large community of active developers who can offer help and solve problems.

3. Parameter configuration
Several important parameters must be configured when building an embedded system with OpenEmbedded.

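Common configuration parameters include:
- MACHINE: specifies the target machine configuration, including the processor architecture and hardware platform.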


- DISTRO: selects the distribution to build.
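In OpenEmbedded, a `MACHINE` or `DISTRO` value refers to a configuration file that some layer provides, at `conf/machine/<MACHINE>.conf` or `conf/distro/<DISTRO>.conf`. The sketch below finds which layer in a given list provides such a file. This is a simplification: BitBake itself locates these files through `BBPATH`. The helper and layer names are hypothetical.

```python
from pathlib import Path


def find_config(layers, kind, name):
    """Return the first layer file conf/<kind>/<name>.conf, or None.

    kind is "machine" (for MACHINE) or "distro" (for DISTRO);
    layers are searched in list order.
    """
    if kind not in ("machine", "distro"):
        raise ValueError("kind must be 'machine' or 'distro'")
    for layer in layers:
        candidate = Path(layer) / "conf" / kind / f"{name}.conf"
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # Prints None unless a layer named meta-demo with that file exists here.
    print(find_config(["meta-demo"], "machine", "myboard"))
```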







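OpenHarmony 4.0 Beta2 Released; Multi-Platform Development Framework ArkUI-X Debuts
August 7: OpenHarmony recently published the OpenHarmony 4.0 Beta2 update on Gitee, which continues to improve the capabilities of the standard system.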

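In addition, the Ark development framework ArkUI-X 1.0.0 reached its Canary1 release on August 6. Its main capabilities include:
- Application development paradigm: supports the ArkTS-based declarative development paradigm.

- Application development model: supports the Stage model.

- Developer tools: provides two ArkUI-X application build tools, DevEco Studio (IDE) and ACE Tools (command line).

- Hybrid development: the ArkTS declarative paradigm and the Stage model can be embedded in existing iOS/Android apps, which load, parse, and run them.

- Cross-language calls: provides two mechanisms, FFI (Node-API) and platform bridging, for API extension and platform plug-in development.

- Basic testing and debugging: supports unit, UI, and XTS integration tests as well as ArkTS breakpoint debugging.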

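According to the announcement, the Ark development framework (ArkUI for short) provides complete infrastructure for OpenHarmony application UI development. This includes concise UI syntax, rich UI features (components, layouts, animations, and interaction events), and a live UI preview tool, which together support visual interface development.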

ArkUI-X extends ArkUI to multiple OS platforms. It currently supports OpenHarmony, HarmonyOS, Android, and iOS, and more platforms will be added over time.


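The ArkTS widget refresh mechanism has been updated to support refreshing content through a data proxy.

ArkTS widgets support static widget configuration and static image display.

Animation effects have been enhanced. Layout properties, background image size and position properties, and visibility properties now support implicit animations. List supports animated scrollToIndex, Tabs supports a blur animation, and popups support show/hide animations. Custom animations are also supported, to meet developers' varied animation needs.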

Intel Releases Xeon Scalable (Skylake) Processors with a 28-Core, 56-Thread Design












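Sun extended and improved Unix and launched the Solaris operating system, which offers better security and stability.
Sun workstations support various Unix graphical interfaces, such as the X Window System, which make Unix friendlier and easier to use.
With the rise of open-source software, Sun workstations will continue to support and promote the combination of Unix and open-source software.
Sun workstations use Unix's powerful networking capabilities to provide various network services, such as FTP, SMTP, and POP3.
The development tools and environments available on Unix work equally well on Sun workstations, which makes software development convenient.
Sun workstations are hardware-optimized for the Unix operating system and provide a high-performance working environment.
The ifconfig and ip commands configure network interface parameters, including the IP address and subnet mask.
The ping and traceroute commands test network connectivity and the path packets take.
The ssh and ftp commands connect to remote servers for file transfer and management tasks.
As technology advanced, Sun workstations grew into one of the representative products for high-performance computing and network servers.
Sun workstations use a RISC architecture with high computing performance and processing speed, which suits applications that need heavy computation and data processing.
Sun workstations are highly stable and reliable and can run without failure for long periods, which suits mission-critical applications that must run continuously.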
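`ping` and `traceroute` rely on ICMP, which usually requires raw sockets or elevated privileges. As a swapped-in technique, the sketch below performs a TCP connect test instead. This answers a slightly different question: is a particular service port (for example 22 for ssh) reachable and accepting connections? The function name is hypothetical.

```python
import socket


def tcp_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    This is a TCP connect test, not an ICMP ping: it checks that a service
    (e.g. sshd on port 22) is listening and reachable.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, name resolution failure
        return False


if __name__ == "__main__":
    print(tcp_reachable("127.0.0.1", 22))
```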






















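DolphinScheduler is a distributed, easily extensible, visual workflow task scheduling system. It is an open-source project well suited to enterprise scheduling needs.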


Secondary development of DolphinScheduler can build on its interfaces and plug-in mechanism, extending and customizing it to meet specific business needs.

The following sections briefly introduce DolphinScheduler's architecture and explain how to do secondary development on it.

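1. DolphinScheduler architecture
DolphinScheduler uses a layered architecture with the following main modules:
- API module: provides HTTP interfaces, accepts user requests, and forwards them to the Scheduler module for processing.

- Scheduler module: schedules and executes tasks; it consists of a scheduler and an executor.

- Master module: manages tasks, including creating, updating, and deleting them.

- Worker module: actually executes tasks; workers register with the Master to obtain tasks and then execute them.

- Alert module: raises alerts about task execution status.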
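A toy dispatcher can illustrate the relationship in which workers register with the Master and then receive tasks. This is not DolphinScheduler code. The real system is written in Java and coordinates its servers through a registry center such as ZooKeeper. The class, worker, and task names are hypothetical, and round-robin is only one possible assignment policy.

```python
from itertools import cycle


class Master:
    """Toy master: workers register, then tasks are handed out round-robin."""

    def __init__(self):
        self.workers = []

    def register(self, worker_name):
        self.workers.append(worker_name)

    def dispatch(self, tasks):
        """Assign each task to a registered worker; return {worker: [tasks]}."""
        if not self.workers:
            raise RuntimeError("no workers registered")
        plan = {w: [] for w in self.workers}
        # zip stops when the tasks run out; cycle repeats the worker list.
        for task, worker in zip(tasks, cycle(self.workers)):
            plan[worker].append(task)
        return plan


if __name__ == "__main__":
    master = Master()
    master.register("worker-1")
    master.register("worker-2")
    print(master.dispatch(["extract", "transform", "load"]))
```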


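2. Secondary development of DolphinScheduler
Secondary development of DolphinScheduler can start from the following areas:
2.1. Custom task types
DolphinScheduler supports custom task types, so you can extend it to fit your own needs.

The steps are as follows:
- Implement the Task interface and override its methods to define your own task logic.

- Create a submodule named tasks in the project and put your task type classes in it.

- Configure the custom task type's information in the project-ext.json file, including its identifier, name, and description.

- Rebuild and repackage the project, then deploy the new task type to DolphinScheduler.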
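The pattern behind these steps, implementing an interface and then registering the implementation under a type identifier, can be sketched as follows. DolphinScheduler's real plug-in API is Java. `Task`, `register_task_type`, `TASK_TYPES`, `EchoTask`, and `run_task` are hypothetical Python stand-ins that mirror only the structure.

```python
from abc import ABC, abstractmethod

TASK_TYPES = {}  # type identifier -> task class (hypothetical registry)


def register_task_type(type_id):
    """Class decorator that records a task class under an identifier."""
    def decorator(cls):
        if type_id in TASK_TYPES:
            raise ValueError(f"duplicate task type: {type_id}")
        TASK_TYPES[type_id] = cls
        return cls
    return decorator


class Task(ABC):
    """Hypothetical stand-in for the Task interface the steps mention."""

    def __init__(self, params):
        self.params = params

    @abstractmethod
    def handle(self):
        """Run the task and return an exit code (0 = success)."""


@register_task_type("ECHO")
class EchoTask(Task):
    def handle(self):
        print(self.params.get("message", ""))
        return 0


def run_task(type_id, params):
    """Look up the task class by identifier, instantiate it, and run it."""
    return TASK_TYPES[type_id](params).handle()


if __name__ == "__main__":
    print(run_task("ECHO", {"message": "hello from a custom task"}))
```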

2.2. Custom plug-ins
DolphinScheduler provides a plug-in mechanism, so custom plug-ins can add specific functionality to the system.

Sun Open-Sources the UltraSPARC T2 Processor


[Abstract] Sun recently made the hardware design description of its UltraSPARC T2 processor fully open; in theory…

This open-source version is called OpenSPARC T2 and is distributed under the GPL 2.0 license. Details of the UltraSPARC T1 had already been published in March 2006.




This IoT Operating System Is Awesome: A Chinese OS Goes Open Source
小金子程序员 (Xiaojinzi), Juejin, 2022-04-09 21:30
Hi everyone, I'm Xiaojinzi.


Recently, while browsing GitHub, I came across a nice open-source project called rt-thread. It currently has 3.2K stars on Gitee, and I think it is worth sharing with everyone.

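RT-Thread is an open-source IoT operating system from China with very strong scalability. It ranges from a tiny kernel that runs on ARM Cortex-M0 chips, through mid-range ARM Cortex-M3/4/7 systems, up to feature-rich systems running on MIPS32 and ARM Cortex-A processors. RT-Thread combines a real-time operating system (RTOS) kernel with middleware components. It is a small but elegant Chinese open-source system. It cannot be compared with Windows, but it is well worth promoting and learning from.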

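Features of RT-Thread
• Very low resource usage and an ultra-low-power design: the smallest kernel (the Nano version) needs only 1.2 KB of RAM and 3 KB of Flash.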






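Supported IDEs and compilers
The main IDEs and compilers that RT-Thread supports are MDK (Keil), IAR, GCC, and RT-Thread Studio.
Finally, if you want to study this project, see its repository: /rtthread/rt-thread.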

Multithreaded Application Acceleration with Chip Multithreading (CMT), Multicore/Multithread UltraSPARC® Processors
UltraSPARC Processor Applications
White Paper
August 2008
Sun Microsystems, Inc.

Table of Contents
Introduction
  Traditional single-threaded architectures have hit a wall
  Multicore processors can still have memory latency issues
Chip multithreading (CMT): a breakthrough solution
  CMT enables more efficient use of microprocessor resources
Boosting telco infrastructure performance with no added cost
Virtualization at the chip level
  Economy by simplification
  Flexibility through threading
Near-linear crypto acceleration
  Master and helper threads: microparallelism
CMT accelerates string searching
  Fast, cost-effective virus protection
  Eliminating the need for TCAMs
Conclusion
For more information
Disclosures

Chapter 1
Introduction
Billions of devices, ranging from 3G cell phones to netbook computers, are now in use, and consumers increasingly demand instantaneous access to information, whether the latest news story or a sports video clip. To meet this burgeoning demand, the processors that power modern networks must deliver huge improvements in performance and throughput.

Traditional single-threaded architectures have hit a wall
Standard methods of increasing real-world application performance, such as higher clock speeds and pipeline branch prediction, have been yielding diminishing returns for the last few years. Higher-frequency processors waste a lot of power and generate so much heat that they need expensive HVAC systems to properly cool the systems in which they run.
In addition, a critical gating factor is memory latency/speed of data access.
Memory speed has grown at a much slower rate than processor speeds, because memory suppliers have focused on increasing density and lowering cost.

Multicore processors can still have memory latency issues
An early stage in the evolution of microprocessor design has been to group two or four conventional processor cores on a single physical die, creating a multicore processor. However, most current offerings simply replicate cores from existing single-threaded processor designs. This approach typically yields only slight improvements in aggregate performance, since it ignores key performance bottlenecks such as memory speed and hardware thread context switching. While multiple programs can be accommodated in parallel, the principal problem of cache misses and consequent stalling is not addressed.

Chapter 2
Chip multithreading (CMT): a breakthrough solution
Sun Microsystems was the first company to recognize that the speed of data access from memory was the critical bottleneck, and has overcome this problem with the chip multithreading (CMT) architecture that is the basis of the UltraSPARC® T1 and T2 processors.

Figure 1. The UltraSPARC T2 processor uses two instruction pipelines per core, with four threads per pipeline, for a total of eight threads per core, and up to 64 threads per processor.

Using Sun's CMT architecture, applications are divided into active threads, and each processor core is designed to switch between up to eight threads on each clock cycle (UltraSPARC T2 processor). Even if a particular thread stalls while waiting for data to be available from memory, the core can switch immediately to another thread and the pipeline remains continuously active, doing useful work. The processor's entire execution pipeline is thus optimized to execute active threads as much as possible, rather than be held up by any particular thread waiting for data to be available from memory. The negative effect of memory latency is therefore masked and minimized.
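The latency-hiding argument can be checked with a toy simulation. The model is an assumption and does not reproduce the real T2 microarchitecture. It has one single-issue pipeline. Each cycle, the pipeline issues one instruction from the next ready thread in round-robin order (fine-grained interleaving). Every `work_between_misses`-th instruction of a thread is a cache miss, and the thread cannot issue again for `miss_latency` cycles. The parameter values are illustrative, not measured.

```python
def pipeline_utilization(num_threads, work_between_misses, miss_latency, cycles=10_000):
    """Fraction of cycles in which a single-issue core issues an instruction.

    Toy fine-grained multithreading model: each cycle the core picks the next
    ready thread in round-robin order. After a thread issues
    `work_between_misses` instructions, its last one misses in cache and the
    thread cannot issue again for `miss_latency` cycles.
    """
    ready_at = [0] * num_threads  # first cycle each thread may issue again
    issued = [0] * num_threads    # instructions since the thread's last miss
    busy = 0
    nxt = 0                       # round-robin pointer
    for cycle in range(cycles):
        for offset in range(num_threads):
            t = (nxt + offset) % num_threads
            if ready_at[t] <= cycle:
                busy += 1
                issued[t] += 1
                if issued[t] == work_between_misses:
                    issued[t] = 0
                    ready_at[t] = cycle + 1 + miss_latency  # stalled on memory
                nxt = (t + 1) % num_threads
                break
    return busy / cycles


if __name__ == "__main__":
    for threads in (1, 8, 16):
        print(threads, round(pipeline_utilization(threads, 4, 20), 3))
```

With 4 instructions between misses and a 20-cycle miss, utilization is about 0.17 with one thread, about 0.71 with eight, and about 0.93 with sixteen. Every thread here misses on the same fixed schedule, so the threads tend to stall in lockstep. The toy model can therefore understate what a workload with irregular misses would achieve. The point is the trend, not the exact numbers.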
CMT enables more efficient use of microprocessor resources
The old prevailing philosophy of instruction-level parallelism (ILP) required complicated tactics, such as deep pipelines, large caches, speculative prefetches, and out-of-order execution. This resulted in complex, hot, and power-hungry processors.
By contrast, thread-level parallelism (TLP) enables the use of much simpler pipelines that focus on scaling with threads rather than frequency. These simpler pipelines can process a large number of simultaneous threads, rather than running a single thread as quickly as possible. This approach is more congruent with the profile of present-day applications, which typically need to address high simultaneous user or transaction counts. An added advantage of the high pipeline utilization is that an efficient processor consumes less energy and consequently runs much cooler.

Figure 2. Chip multithreading combines chip multiprocessing and fine-grained hardware multithreading.
[Figure 2 legend: chip multiprocessing, n = cores per processor; FG-MT (fine-grained multithreading), m = strands (hardware threads) per core; CMT (chip multithreading), n × m = threads per processor]

Fine-grained multithreading works in tandem with chip multiprocessing to produce a multiplier effect on application performance without a corresponding need for increased frequency and the resulting heat generation. Since multiple instructions can execute at every clock cycle, the improvement in throughput and execution speed can be very high.

Chapter 3
Boosting telco infrastructure performance with no added cost
A good example is provided in a set of benchmarks run recently at Continuous Computing, a global provider of integrated services that helps telecommunication equipment manufacturers deploy next-generation networks.
Among the company's key offerings is the Trillium software suite, a line of more than 60 standards-based telecommunication protocols. A key product among these is the Session Initiation Protocol (SIP), which forms the backbone for voice communications.
In May 2008, Continuous Computing ran the multicore Trillium SIP on a server using an eight-core UltraSPARC T2 multithreaded processor and achieved a breakthrough 6,000 calls per second, more than twice as many as any other processor that they had previously tested. Most importantly, the application effectively utilized all 64 available threads to scale performance via higher CPU utilization, versus the addition of more CPUs. This equates to a company needing less than half its previous hardware footprint to serve the same number of customers, thereby reducing infrastructure and power costs while improving density.

Calls per     Messages per second    TCP CPU utilization   UDP CPU utilization
second (CPS)  (sent plus received)   (mpstat)              (mpstat)
500           6,000                  6                     6
1,000         12,000                 11                    12
1,500         18,000                 17                    18
2,000         24,000                 23                    24
2,500         30,000                 29                    30
3,000         36,000                 35                    36
3,500         42,000                 42                    42
4,000         48,000                 49                    48
4,500         54,000                 56                    55
5,000         60,000                 63                    62
5,500         66,000                 70                    69
6,000         72,000                 78                    76

Figure 3. Continuous Computing's benchmark of the Trillium SIP protocol on an UltraSPARC T2 processor-based Sun Netra™ T5220 server showed near-linear scalability as more threads were used and processor utilization increased.[1]

When an application is run as multiple threads, as in the case of the Trillium SIP protocol discussed above, it scales almost linearly because more threads progressively raise the utilization of the processor. Better performance is achieved through greater utilization efficiency of a single processor, as opposed to the cost and complexity of adding multiple processors.

1. Source: /papers/multicoresip/.

Chapter 4
Virtualization at the chip level
The model underlying the UltraSPARC T1 and T2 processor architectures is that of fine-grained interleaved threading, where the core can issue from a different thread (or process) on each clock cycle. Where previously there was only one running thread monopolizing the processor, now multiple threads can share its apparatus and features. The effect of multiple threads is thus to virtualize the processor: the processor essentially can be considered as a group of individual, bootable machines, each with memory and I/O carved out.

Economy by simplification
A positive secondary outcome is that what was formerly a deep process pipeline that had to do branch prediction in hardware to be efficient is now replaced by very short pipelines where the effect of a branch miss is resolved in two clock ticks. The savings on prediction-oriented hardware enable the use of fewer transistors and storage, and thus less power consumption.
In current network devices, there might be separate chipsets designated for the control unit, for routing, and for media. This requires knowledge of three different instruction sets (that of a microprocessor, a specialized network processor, and a digital signal processing chip), each made by a different vendor. Chip multithreading enables individual threads in the same processor to be devoted to each of these functions, so the knowledge requirement for engineers can be reduced to one chip architecture.

Flexibility through threading
The ability to devote specific threads to certain functions makes the processor very flexible. For instance, one could designate two threads to processing, two to the network interface, and the rest to important features such as firewall and encryption.
CMT especially shines in improving the implementation of some of these latter security functions.

Chapter 5
Near-linear crypto acceleration
In this age of wireless and mobile computing, encryption of packets across networks is becoming increasingly important. It is critical for privacy of VPNs, secure VoIP, and cellular 3G networks. Encryption also is fundamental for the protection of intellectual property in media gateways, and for securing file systems and databases. The traditional challenge is that it adds a lot of computational overhead and throttles packet throughput. An example of an important encryption tool for 3G mobile telephony is the KASUMI algorithm.[2]
Current software implementations of this algorithm on traditional processors use multiple lookup tables sized to all fit in their small Level 1 caches. This use of multiple small tables requires a significant number of arithmetic operations in order to access the various tables, manipulate the data, and recombine the results from them.
For an optimized CMT implementation, one can combine many small lookup tables into a few large ones that can instead reside in Level 2 cache. Lookup tables are constant and can be precomputed. The net effect is that the number of instructions to access the data from the lookup table can be greatly reduced. Normally the trade-off is that this would result in a greater number of memory stalls, but this plays to the strength of CMT: on a memory stall the core simply switches to another ready thread, and the stalled thread consumes much less of the core's resources while it waits.
As a result, as the number of threads is increased, performance scales almost linearly.
For the UltraSPARC T2 processor, per-core KASUMI performance is around eight times the performance of a single thread, and per-chip KASUMI performance is close to 64 times a single-strand performance.

Master and helper threads: microparallelism
Another optimization technique that is well suited to CMT processors is microparallelism,[3] or the division of small chunks of work between multiple threads. These helper threads are assigned to helping master threads in the task of rapidly processing performance-limiting serial components. It is the key to eliminating single-threaded performance limitations.
In a CMT processor, the inter-thread latency is very low because the threads communicate through the Level 2 cache. The time this takes is on the order of tens of clock ticks, as opposed to the hundreds of clock ticks that are characteristic of "bare-metal" latency when communicating across processors. Furthermore, problems such as hot lock are overcome.
Microparallelism means that one can think of adding threading to operations that have traditionally been considered single-threaded, such as traversing a linked list, B-Copy, and string location. By judicious use of helper threads, CMT can greatly speed up applications in a wide range of areas, such as genomics, high-performance computing, and Web searches.

2. KASUMI is a block cipher that forms the heart of the 3GPP confidentiality algorithm f8 and the 3GPP integrity algorithm f9.
3. "Multicore Processors and Microparallelism," by Lawrence Spracklen, 3 April 2008, /publications/presentations/multicore-expo-2008-multicore-processors-and-microparallelism.html.

Chapter 6
CMT accelerates string searching
One of the core functions needed for text identification algorithms in data repositories is real-time string searching. A popular procedure used in this space is the Aho-Corasick algorithm.[4] Recently, Sun compared the results of an Aho-Corasick string search using an UltraSPARC T2 processor-based system (Sun SPARC® Enterprise T5220 server) to the published results[5] of a similar search using an IBM Cell Broadband Engine (Cell/BE) DD3 Blade.[6]
To test the performance of their processor, IBM used a 4.4 MB variant of the King James Version of the Bible with a dictionary of the 20,000 most used words in the English language (average word length of 7.59 characters). To approximate the dictionary and Bible IBM used, Sun used a dictionary of 25,144 English words[7] (average word length: 8.22 characters) and a 4.6 MB variant of the King James Version of the Bible.[8]
The results of the test are summarized in the following table:

System                                Processor                   Throughput (Gb/sec)  Processors  Cores  Frequency (GHz)
Sun SPARC Enterprise T5220 server     UltraSPARC T2               24.6                 1           8      1.4
IBM Cell Broadband Engine DD3 Blade   IBM Cell Broadband Engine   3.8                  2           16     3.2

Figure 4. In a test of string searching, the UltraSPARC T2 processor outperformed the IBM Cell Broadband Engine in throughput and efficiency.

4. The Aho-Corasick algorithm is a string-searching algorithm. A kind of dictionary-matching algorithm, it locates elements of a finite set of strings (the "dictionary") within an input text.
5. See: IEEE Computer, Volume 41, Number 4, pp. 42–50, April 2008.
6. See "Disclosures" for details.
7. The OpenSolaris™ file /source/xref/onnv/onnv-gate/usr/src/cmd/spell/list.
8. /users/bmcgin/kjv12.zip.

Comparing the systems, the throughput of the single-processor, UltraSPARC T2-based Sun SPARC Enterprise T5220 server was 6.5 times greater than that of the IBM system, which used twice as many processors and cores, running at over twice the clock speed. This also demonstrates the huge advantage obtained from rapid access to the large, shared Level 2 cache in the UltraSPARC T2 processor.
In addition to its throughput superiority, the UltraSPARC T2's multicore/multithreaded architecture was much easier to code for than the IBM Cell Broadband Engine. IBM's article cites numerous, elaborate optimizations that were required to achieve its results. By contrast, Sun implemented the Aho-Corasick algorithm using ANSI C and a simple compilation; no special optimizations of the algorithm were required to achieve the reported performance. So the UltraSPARC T2 processor not only delivered superior throughput, but did so while saving significant time and effort in software development.

Fast, cost-effective virus protection
A practical application of this algorithm is in screening for computer viruses. Each virus has a certain signature: the "words" by which it can be identified. Virus recognition requires deep packet inspection, that is, a detailed examination of every byte of every packet received, so a real-time method of screening for viruses is necessary. CMT offers a simple solution. One could build a dictionary starting with the particular sequences of bytes that characterize a virus, and add to the dictionary all such words for all relevant viruses. The text of incoming packets can then be searched for the presence of these dictionary words.

Eliminating the need for TCAMs
Typically, virus signatures are stored in ternary content addressable memory (TCAM).
This is a specialized type of memory that is designed for searching in a single lookup operation but which is very expensive to add on to a network device. By being able to scan and identify viruses in real time in software itself, a designer using the UltraSPARC T2 processor can avoid the additional cost and complexity of both the TCAM interface and the additional memory.

Chapter 7
Conclusion
As the preceding benchmarks show, the combination of multithreaded software running on a multicore/multithreaded CMT processor results in a huge increase in application performance, with no additional hardware. The ability to designate the functions of different threads on the fly enables tremendous flexibility and customizability, as well as radical levels of consolidation. The increase in performance afforded by better processor utilization, rather than the addition of more processors, also reduces time to market by simplifying design and development, and it enables smaller form factors. CMT technology-based devices can draw less power, while at the same time being compact and powerful enough to handle the most complex real-time networking tasks. Sun's CMT, multicore/multithreaded processor architecture is the best option for system designers building embedded network infrastructure.

For more information
For additional information on how UltraSPARC processors can help with application throughput and more efficient utilization of compute resources, go to /microelectronics.

Disclosures
Pattern matching benchmark: Sun SPARC Enterprise T5220 server (1x 1.4 GHz UltraSPARC T2 processor, one chip, eight cores); the Solaris™ 10 Operating System (OS); Sun C 5.9; 21.7 GB/sec.
IBM Cell Broadband Engine (Cell/BE) DD3 Blade (2x 3.2 GHz Cell Broadband Engine, two chips, 16 cores); Linux kernel v2.6.16; IBM CBE Software Development Kit v2.1; 3.8 GB/sec.
IBM results were obtained from: Figure 7(d) of IEEE Computer, Volume 41, Number 4, pp. 42–50, April 2008.
Sun benchmark results as of 8/01/2008.

Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, CA 95054 USA. Phone 1-650-960-1300 or 1-800-555-9SUN (9786).
© 2008 Sun Microsystems, Inc. All rights reserved. Sun, Sun Microsystems, the Sun logo, OpenSolaris, Solaris, and Sun Netra are trademarks or registered trademarks of Sun Microsystems, Inc., or its subsidiaries in the United States and other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc. Information subject to change without notice. SunWIN #541622 Lit. #SYWP14470-0 8/08
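Readers who want to experiment with the Aho-Corasick search used in the Chapter 6 benchmark can start from the minimal Python sketch below. It is a textbook implementation, not Sun's ANSI C code, which the paper does not reproduce. The function names are hypothetical.

```python
from collections import deque


def build_automaton(words):
    """Build the Aho-Corasick trie with failure links and output sets."""
    goto = [{}]   # goto[state][symbol] -> next state
    fail = [0]    # failure link for each state
    out = [set()] # dictionary words recognized on reaching each state
    for word in words:
        if not word:
            raise ValueError("dictionary words must be non-empty")
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(word)
    # Breadth-first pass: depth-1 states keep fail = 0 (the root).
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]  # inherit matches that end here too
    return goto, fail, out


def search(text, words):
    """Yield (start_index, word) for every dictionary word occurring in text."""
    goto, fail, out = build_automaton(words)
    state = 0
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            yield i - len(word) + 1, word


if __name__ == "__main__":
    print(sorted(search("ushers", ["he", "she", "his", "hers"])))
```

Iterating over `bytes` yields integers, so the same functions also work on raw packet data with byte-string signatures, which is the virus-screening use that Chapter 6 describes. For example, `list(search(b"xxEVILyy", [b"EVIL"]))` returns `[(2, b"EVIL")]`. A pure-Python version is far slower than compiled C. In a CMT setting, the natural way to scale it is to scan independent inputs, such as separate packets or files, on separate hardware threads.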
