

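HUNAN UNIVERSITY Graduation Thesis

Thesis title: An Approach to Speech Interaction and Software Design for Humanoid Robots

Student name: Chen Ming (陈明)

Student ID: 201208070103

Major and class: Intelligence Science and Technology, Class 1

College: College of Information Science and Engineering

Supervisor: Li Renfa (李仁发)

Dean of the college: Li Renfa (李仁发)

June 1, 2016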

An Approach to Speech Interaction and Software Design for Humanoid Robots

Abstract (Chinese version)

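This thesis presents a study of speech recognition on the NAO robot and covers several common behavior interactions of the robot. Speech recognition is an interdisciplinary technology that draws on phonetics, acoustics, linguistics, signal processing, and artificial intelligence, and its range of applications keeps growing. The NAO robot serves as a standard robot platform in competitions, education, and scientific research, so research based on it is in line with current trends.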

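The first part of the thesis introduces the basic methods and background of speech recognition and briefly describes the structure and functions of the NAO robot. On the theory side, Chapter 3 presents the GMM-HMM, that is, the Gaussian mixture model combined with the hidden Markov model. Both models are applied to speech recognition in the experiments.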
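For orientation, the two models can be written in their usual textbook form. The notation below (mixture weights $w_k$ and $c_{jk}$, initial probabilities $\pi_i$, transition probabilities $a_{ij}$, model parameters $\lambda$) is an assumption chosen for illustration and is not taken from Chapter 3.

```latex
% GMM: density of a feature vector x with K weighted Gaussian components
p(x) = \sum_{k=1}^{K} w_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k),
\qquad w_k \ge 0, \quad \sum_{k=1}^{K} w_k = 1

% GMM-HMM: emission density of hidden state j for the observation o_t
b_j(o_t) = \sum_{k=1}^{K} c_{jk} \, \mathcal{N}(o_t \mid \mu_{jk}, \Sigma_{jk}),
\qquad \sum_{k=1}^{K} c_{jk} = 1

% Likelihood of O = (o_1, ..., o_T), summed over all state sequences q
P(O \mid \lambda) = \sum_{q_1, \dots, q_T} \pi_{q_1} \, b_{q_1}(o_1)
    \prod_{t=2}^{T} a_{q_{t-1} q_t} \, b_{q_t}(o_t)
```

In a typical isolated-word setup, one such model is trained per word and the recognizer picks the word whose model gives the highest likelihood.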

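In the speech recognition experiments, the audio stream captured by the NAO robot goes through a series of processing steps: channel separation, filtering, framing, Hamming windowing, speech feature extraction, and machine learning training on the sample audio. Once this processing is complete, the result is sent from a MATLAB client on the local computer to a server running in Choregraphe, the NAO robot's control software. The robot then performs the behavior that corresponds to the returned recognition result.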
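The framing and Hamming-window steps can be sketched as follows. This is a minimal illustration in Python rather than the MATLAB code used in the thesis. The function names are hypothetical, and the frame length and hop size are left as parameters because the values used in the experiments are not stated here.

```python
import math


def hamming_window(n):
    """Return the n-point Hamming window w[k] = 0.54 - 0.46*cos(2*pi*k/(n-1))."""
    if n <= 0:
        raise ValueError("window length must be positive")
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]


def frame_signal(samples, frame_len, hop_len):
    """Split samples into overlapping frames of frame_len, advancing by hop_len.

    Trailing samples that do not fill a whole frame are dropped.
    """
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame_len and hop_len must be positive")
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(list(samples[start:start + frame_len]))
        start += hop_len
    return frames


def windowed_frames(samples, frame_len, hop_len):
    """Frame the signal, then multiply every frame by a Hamming window."""
    window = hamming_window(frame_len)
    return [
        [s * w for s, w in zip(frame, window)]
        for frame in frame_signal(samples, frame_len, hop_len)
    ]
```

As an assumed example, a 16 kHz signal with 25 ms frames and a 10 ms hop would be processed with `windowed_frames(samples, 400, 160)`. Each windowed frame would then go on to feature extraction.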

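Beyond speech recognition, the thesis also designs three behavior interaction functions for the NAO robot: orientation movement, a multi-task parallel dance, and conversation on a given topic. Orientation movement lets the robot move by a specified angle and in a specified direction of rotation. The multi-task parallel dance combines several tasks at the behavior layer: frame-by-frame motion design for the head, feet, and arms, color changes of the LED groups, and dialogue on a given topic written according to the QiChat Syntax section of the official Aldebaran Robotics documentation.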

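Overall, the project and this thesis consist of two main parts: speech recognition and behavior interaction design. For speech recognition, the NAO robot captures the target audio stream, which is transferred to the local computer over FTP for further processing. For behavior design, several behaviors are built in Choregraphe, the NAO robot's top-level control software. Specific behaviors among them are triggered according to the speech recognition result, which completes the specified design tasks.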

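Keywords: Gaussian mixture model; hidden Markov model; orientation movement; multi-task parallel dance; audio stream framing; window function; speech feature extraction; TCP/IP communication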


An Approach to Speech Interaction and Software Design for Humanoid Robots

Abstract

This thesis presents research on speech recognition and common behavior interactions based on a NAO robot. Speech recognition is an interdisciplinary technique that draws on acoustics, phonetics, linguistics, signal processing, and artificial intelligence, and it is applied in an increasing number of fields. NAO robots serve as a standard platform in competitions, in elementary and tertiary education, and in scientific research, so research based on them is in line with current mainstream work.

The thesis begins with a brief discussion of basic approaches and background in speech recognition, followed by an introduction to the structure and functions of NAO robots. On the theory side, Chapter 3 focuses on the GMM (Gaussian mixture model) and the HMM (hidden Markov model). Both models are applied in the experiments.

In the speech recognition experiments, the NAO robot first captures the audio stream, which the local computer then downloads with FTP commands issued in the MATLAB command window. The target audio stream is then processed in a series of steps: separating the audio channels (NAO's microphones capture 4 channels), filtering the waveform, framing the audio, applying a Hamming window, extracting audio features, and training on the audio data set with machine learning methods. After these steps, the result is sent to the NAO robot's socket server in Choregraphe over TCP/IP. Once the result reaches NAO, the robot performs the pre-designed behavior that corresponds to it.
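The last hop of this pipeline, sending a recognition result to a socket server that selects a behavior, can be sketched as below. This is an illustrative Python 3 sketch, not the thesis implementation: in the thesis the client side runs in MATLAB and the server runs inside Choregraphe. The newline-terminated message format, the label strings, and the behavior names in `BEHAVIORS` are all hypothetical.

```python
import socket
import threading

# Hypothetical mapping from recognized labels to behavior names.
BEHAVIORS = {
    "move": "orientation_movement",
    "dance": "multi_task_dance",
    "talk": "topic_dialogue",
}


def serve_once(server_sock, on_label):
    """Accept one client, read one newline-terminated label, pass it to on_label."""
    conn, _addr = server_sock.accept()
    with conn:
        data = b""
        while not data.endswith(b"\n"):
            chunk = conn.recv(1024)
            if not chunk:  # client closed the connection early
                break
            data += chunk
    on_label(data.decode("utf-8").strip())


def send_label(host, port, label):
    """Client side: open a TCP connection and send one label."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall((label + "\n").encode("utf-8"))


def demo():
    """Run server and client on the loopback interface; return chosen behaviors."""
    chosen = []
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    worker = threading.Thread(
        target=serve_once,
        args=(server, lambda label: chosen.append(BEHAVIORS.get(label, "unknown"))),
    )
    worker.start()
    send_label("127.0.0.1", port, "dance")
    worker.join(timeout=5)
    server.close()
    return chosen
```

On the robot, the callback would start the matching behavior instead of appending its name to a list. That part depends on the NAOqi API and is left out of the sketch.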

In addition to speech recognition, several behaviors of the NAO robot are studied and designed. These behaviors are orientation movement, multi-task dancing, and talking on a given topic. Orientation movement enables the robot to move in a specific direction with an accurate distance and angle of rotation. Multi-task dancing builds on the concept of parallel processing, and it needs the convergence of the
