Gesture Learning Based on Manifold Learning


Introduction to Artificial Intelligence — Zhihuishu chapter test answers (2023, Harbin Engineering University)

Chapter 1 Test

1. Which of the following statements about intelligence is wrong? () A: All life has intelligence B: Bacteria do not have intelligence C: At present, human intelligence is the highest level in nature D: From the perspective of life, intelligence is the basic ability of life to adapt to the natural world
Answer: Bacteria do not have intelligence
2. Which of the following techniques is unsupervised learning in artificial intelligence? () A: Neural network B: Support vector machine C: Decision tree D: Clustering
Answer: Clustering
3. To which period can the history of the development of artificial intelligence be traced back? () A: 1970s B: Late 19th century C: Early 21st century D: 1950s
Answer: Late 19th century
4. Which of the following fields does not belong to the scope of artificial intelligence application? () A: Aviation B: Medical C: Agriculture D: Finance
Answer: Aviation
5. The first artificial neuron model in human history was the MP model, proposed by Hebb. () A: True B: False
Answer: False
6. Big data will bring considerable value in government public services, medical services, retail, manufacturing, and personal location services. () A: False B: True
Answer: True

Chapter 2 Test

1. Which of the following options is not human reason? () A: Value rationality B: Intellectual rationality C: Methodological rationality D: Cognitive rationality
Answer: Intellectual rationality
2. When did life begin? () A: Between 10 billion and 4.5 billion years ago B: Between 13.8 billion and 10 billion years ago C: Between 4.5 billion and 3.5 billion years ago D: Before 13.8 billion years ago
Answer: Between 4.5 billion and 3.5 billion years ago
3. Which of the following statements is true regarding philosophical thinking about artificial intelligence? () A: Philosophical thinking has hindered the progress of artificial intelligence. B: Philosophical thinking has contributed to the development of artificial intelligence. C: Philosophical thinking is only concerned with the ethical implications of artificial intelligence. D: Philosophical thinking has no impact on the development of artificial intelligence.
Answer: Philosophical thinking has contributed to the development of artificial intelligence.
4. What is the rational nature of artificial intelligence? () A: The ability to communicate effectively with humans. B: The ability to feel emotions and express creativity. C: The ability to reason and make logical deductions. D: The ability to learn from experience and adapt to new situations.
Answer: The ability to reason and make logical deductions.
5. Which of the following statements is true regarding the rational nature of artificial intelligence? () A: The rational nature of artificial intelligence includes emotional intelligence. B: The rational nature of artificial intelligence is limited to logical reasoning. C: The rational nature of artificial intelligence is not important for its development. D: The rational nature of artificial intelligence is only concerned with mathematical calculations.
Answer: The rational nature of artificial intelligence is limited to logical reasoning.
6. Connectionism believes that the basic element of human thinking is the symbol, not the neuron, and that the human cognitive process is a self-organizing process of symbol operations rather than of weights. () A: True B: False
Answer: False

Chapter 3 Test

1. The brains of all organisms can be divided into three primitive parts: forebrain, midbrain, and hindbrain. Specifically, the human brain is composed of the brainstem, the cerebellum, and the cerebrum (forebrain). () A: False B: True
Answer: True
2. The neural connections in the brain are chaotic. () A: True B: False
Answer: False
3. The following statement about the left and right halves of the brain and their functions is wrong (). A: When dictating questions, the left brain is responsible for logical thinking and the right brain for language description. B: The left brain is like a scientist, good at abstract thinking and complex calculation, but lacking rich emotion. C: The right brain is like an artist, creative in music, art, and other artistic activities, and rich in emotion. D: The left and right hemispheres of the brain have the same shape but quite different functions; they are generally called the left brain and the right brain respectively.
Answer: When dictating questions, the left brain is responsible for logical thinking and the right brain for language description.
4. What is the basic unit of the nervous system? () A: Neuron B: Gene C: Atom D: Molecule
Answer: Neuron
5. What is the role of the prefrontal cortex in cognitive functions? () A: It is responsible for sensory processing. B: It is involved in emotional processing. C: It is responsible for higher-level cognitive functions. D: It is involved in motor control.
Answer: It is responsible for higher-level cognitive functions.
6. What is the definition of intelligence? () A: The ability to communicate effectively. B: The ability to perform physical tasks. C: The ability to acquire and apply knowledge and skills. D: The ability to regulate emotions.
Answer: The ability to acquire and apply knowledge and skills.

Chapter 4 Test

1. The feedforward neural network is based on the mathematical model of the neuron and is composed of neurons connected together in specific ways. Different artificial neural networks generally have different structures, but the basis is still the mathematical model of the neuron. () A: True B: False
Answer: True
2. In the perceptron, the weights are adjusted by learning so that the network can produce the desired output for any input. () A: True B: False
Answer: True
3. A convolutional neural network is a feedforward neural network with many advantages and excellent performance for large-scale image processing. Among the following options, the advantages of convolutional neural networks are (). A: Implicit learning avoids explicit feature extraction B: Weight sharing C: Translation invariance D: Strong robustness
Answer: Implicit learning avoids explicit feature extraction; Weight sharing; Strong robustness
4. In a feedforward neural network, information travels in which direction? () A: Forward B: Both A and B C: None of the above D: Backward
Answer: Forward
5. What is the main feature of a convolutional neural network? () A: They are used for speech recognition. B: They are used for natural language processing. C: They are used for reinforcement learning. D: They are used for image recognition.
Answer: They are used for image recognition.
6. Which of the following is a characteristic of deep neural networks? () A: They require less training data than shallow neural networks. B: They have fewer hidden layers than shallow neural networks. C: They have lower accuracy than shallow neural networks. D: They are more computationally expensive than shallow neural networks.
Answer: They are more computationally expensive than shallow neural networks.

Chapter 5 Test

1. Machine learning refers to how a computer simulates or realizes human learning behavior to acquire new knowledge or skills, and reorganizes existing knowledge structures to continuously improve its own performance. () A: True B: False
Answer: True
2. The best decision sequence of a Markov decision process is solved by the Bellman equation, and the value of each state is determined not only by the current state but also by later states. () A: True B: False
Answer: True
3. AlexNet's contributions to this work include: (). A: Using GPUs (NVIDIA GTX 580) to reduce training time B: Using the rectified linear unit (ReLU) as the nonlinear activation function C: Using overlapping pooling to avoid the averaging effect of average pooling D: Using Dropout to selectively ignore individual neurons during training to avoid over-fitting the model
Answer: A; B; C; D (all of the above)
4. In supervised learning, what is the role of the labeled data? () A: To evaluate the model B: To train the model C: None of the above D: To test the model
Answer: To train the model
5. In reinforcement learning, what is the goal of the agent? () A: To identify patterns in input data B: To minimize the error between the predicted and actual output C: To maximize the reward obtained from the environment D: To classify input data into different categories
Answer: To maximize the reward obtained from the environment
6. Which of the following is a characteristic of transfer learning? () A: It can only be used for supervised learning tasks B: It requires a large amount of labeled data C: It involves transferring knowledge from one domain to another D: It is only applicable to small-scale problems
Answer: It involves transferring knowledge from one domain to another

Chapter 6 Test

1. Image segmentation is the technology and process of dividing an image into several specific regions with unique properties and extracting objects of interest. Among the following statements about image segmentation algorithms, the wrong one is (). A: The region-growing method completes segmentation by calculating the mean shift vector. B: The watershed algorithm, MeanShift segmentation, region growing, and Otsu threshold segmentation can all perform image segmentation. C: The watershed algorithm is often used to segment objects that are connected in the image. D: Otsu threshold segmentation, also known as the maximum between-class variance method, automatically selects the global threshold T from the histogram of the entire image.
Answer: The region-growing method completes segmentation by calculating the mean shift vector.
2. Camera calibration is a key step when using machine vision to measure objects, and its calibration accuracy directly affects measurement accuracy. Camera calibration generally involves converting object point coordinates between several coordinate systems. Which coordinate systems are meant here? () A: Image coordinate system B: Image plane coordinate system C: Camera coordinate system D: World coordinate system
Answer: Image coordinate system; Image plane coordinate system; Camera coordinate system; World coordinate system
3. Commonly used digital image filtering methods: (). A: Bilateral filtering B: Median filtering C: Mean filtering D: Gaussian filtering
Answer: Bilateral filtering; Median filtering; Mean filtering; Gaussian filtering
4. Application areas of digital image processing include: () A: Industrial inspection B: Biomedical science C: Scenario simulation D: Remote sensing
Answer: Industrial inspection; Biomedical science
5. (Same as question 1, with the options reordered.)
Answer: The region-growing method completes segmentation by calculating the mean shift vector.

Chapter 7 Test

1. Blind search can be applied to many different search problems, but it has not been widely used because of its low efficiency. () A: False B: True
Answer: True
2. Which of the following search methods uses a FIFO queue? (). A: Breadth-first search B: Random search C: Depth-first search D: Generate-and-test method
Answer: Breadth-first search
3. What causes the complexity of semantic networks? (). A: There is no recognized formal representation system B: The quantifier network is inadequate C: The means of knowledge representation are diverse D: The relationships between nodes can be linear, nonlinear, or even recursive
Answer: The means of knowledge representation are diverse; The relationships between nodes can be linear, nonlinear, or even recursive
4. In a knowledge graph taking Leonardo da Vinci as an example, a character entity is represented by a node, and the relationship between the artist and the character is represented by an edge. Search is the process of finding the action sequence of an intelligent system. () A: True B: False
Answer: True
5. Which of the following statements about common path-search methods is wrong? () A: With the artificial potential field method, when there are obstacles at any distance around the target point, the path easily becomes unreachable. B: The A* algorithm occupies too much memory during the search, its search efficiency is reduced, and the optimal result cannot be guaranteed. C: The artificial potential field method can quickly search for a collision-free path with strong flexibility. D: The A* algorithm can solve the shortest path in state-space search.
Answer: With the artificial potential field method, when there are obstacles at any distance around the target point, the path easily becomes unreachable.

Chapter 8 Test

1. The languages of human communication — spoken language, written language, sign language — and the Python language are all natural languages. () A: True B: False
Answer: False
2. The following statement about machine translation is wrong (). A: The analysis stage of machine translation is mainly lexical analysis and pragmatic analysis. B: The essence of machine translation is the discovery and application of bilingual translation laws. C: The four stages of machine translation are retrieval, analysis, conversion, and generation. D: At present, natural-language machine translation generally takes the sentence as the translation unit.
Answer: The analysis stage of machine translation is mainly lexical analysis and pragmatic analysis.
3. Which of the following fields does machine translation belong to? () A: Expert system B: Machine learning C: Human sensory simulation D: Natural language system
Answer: Natural language system
4. The following statements about language are wrong: ()
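The feedforward idea tested in the Chapter 4 questions — information traveling only forward through the layers — can be sketched in a few lines of NumPy. All layer sizes and weights below are made-up illustration values, not anything from the course:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up 2-layer feedforward network: input -> hidden (ReLU) -> output.
W1 = rng.normal(size=(4, 8))   # input dim 4, hidden dim 8
b1 = np.zeros(8)
W2 = rng.normal(size=(8, 3))   # output dim 3
b2 = np.zeros(3)

def forward(x):
    """Information travels strictly forward: no feedback connections."""
    h = np.maximum(0.0, x @ W1 + b1)  # ReLU hidden layer
    return h @ W2 + b2                # linear output layer

x = rng.normal(size=4)
y = forward(x)
print(y.shape)  # (3,)
```

A recurrent network, by contrast, would feed `h` back into the next step; the absence of any such feedback path is exactly what "feedforward" means in question 4.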

Grasp Identification of Human Hands and Grasp Planning of Dexterous Hands


GRASP IDENTIFICATION OF HUMAN HANDS AND GRASP PLANNING OF DEXTEROUS HANDS*
Li Jiting, Zhang Yuru, Zhang Qixian
Robotics Institute, Beihang University
Abstract: This paper studies grasp planning methods for a dexterous hand that grasps with its fingertips.

Under the same operating environment and with the same object, the human hand determines the positions of the grasp contact points. A hand-motion measurement device measures the human grasp positions, which are converted through a defined mapping into the grasp positions of the dexterous hand and the pose of its palm frame; the grasp configuration is then determined from the hand's own structure by inverse kinematics.
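The last step above — recovering a joint configuration from a desired fingertip position by inverse kinematics — can be sketched for a planar two-link chain, for which a closed-form solution exists. The link lengths and the target point are made-up; this is only an illustration of the general idea, not the paper's hand model:

```python
import numpy as np

def two_link_ik(x, y, l1, l2):
    """Closed-form inverse kinematics of a planar 2-link chain.

    Returns joint angles (theta1, theta2) placing the fingertip at (x, y),
    using the 'elbow-down' solution.
    """
    d2 = x * x + y * y
    c2 = (d2 - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    c2 = np.clip(c2, -1.0, 1.0)          # guard against rounding error
    theta2 = np.arccos(c2)
    k1 = l1 + l2 * np.cos(theta2)
    k2 = l2 * np.sin(theta2)
    theta1 = np.arctan2(y, x) - np.arctan2(k2, k1)
    return theta1, theta2

def forward(theta1, theta2, l1, l2):
    """Forward kinematics, used here to check the IK solution."""
    x = l1 * np.cos(theta1) + l2 * np.cos(theta1 + theta2)
    y = l1 * np.sin(theta1) + l2 * np.sin(theta1 + theta2)
    return x, y

t1, t2 = two_link_ik(0.5, 0.4, 0.4, 0.35)
print(forward(t1, t2, 0.4, 0.35))  # recovers the target (0.5, 0.4)
```

A real dexterous finger has more joints and joint limits, so in practice the inverse solution is chosen among multiple branches or computed numerically.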

Key words: grasp planning, identification of human hand motion, dexterous hand

0 Introduction

The main functions of a dexterous hand are grasping and manipulating objects; grasping is the basis of manipulation.

Where the dexterous hand should grasp an object, and in what manner, are questions that grasp planning must answer.

The ideal goal of grasp planning is fully autonomous planning, which requires modeling the many factors involved in a grasp: the grasp task, the shape of the grasped object, the dexterous hand itself, and so on.

However, among the many factors that influence grasp planning, some are hard to describe with accurate mathematical models, such as the grasp task, objects with complex shapes, the pre-grasp configuration, and the transition between different grasp types.

On the other hand, from an application point of view, building a mathematical model for every manipulated object is clearly impractical if arbitrary objects are to be grasped.

To solve the grasp planning problem of dexterous hands effectively, some researchers have adopted a new master-slave strategy of learning from the human hand. The core idea is to bring the human into the control loop: the human does the planning and the dexterous hand executes the motion, so that human intelligence is combined with the robot's mechanical capability.

For example, NASA's JPL laboratory uses a wearable hand-motion measurement device to map the motion of human hand joints onto the joints of a dexterous hand [1].

NASA's Johnson Space Center maps motion between the human hand and the dexterous hand by measuring electromyographic signals of hand motion at the forearm [2].

The Institute of Robotics and System Dynamics of the German Aerospace Center (DLR) uses a data glove for the motion mapping [3].

Design of a Cognitive Rehabilitation Robot System Based on Machine Vision

…encounter difficulties: first, based on machine vision, a highlighted hint is presented through the interface. If the patient is still unable to complete the task, the robotic arm … image information from the time domain to the frequency domain, and target detection is accomplished by comparing the descriptors of reference objects with those of the objects under test.
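The time-domain-to-frequency-domain matching mentioned here is commonly done with Fourier descriptors of an object's contour: the boundary points are treated as complex numbers, transformed with the FFT, and compared by descriptor distance. A minimal sketch of that general idea (the square and circle contours are made-up test shapes, not data from this system):

```python
import numpy as np

def fourier_descriptor(contour, n_coeffs=8):
    """Fourier descriptor of a closed 2-D contour.

    contour: (N, 2) array of boundary points, treated as complex z = x + iy.
    Dropping the DC term gives translation invariance; dividing by the
    first harmonic's magnitude gives scale invariance.
    """
    z = contour[:, 0] + 1j * contour[:, 1]
    F = np.fft.fft(z)
    mags = np.abs(F[1:n_coeffs + 1])
    return mags / mags[0]

def descriptor_distance(a, b):
    return float(np.linalg.norm(a - b))

# Made-up contours: a square-like shape, the same shape scaled and shifted,
# and a circle for comparison.
t = np.linspace(0, 2 * np.pi, 64, endpoint=False)
square = np.stack([np.sign(np.cos(t)), np.sign(np.sin(t))], axis=1)
same = 3.0 * square + np.array([10.0, -5.0])
circle = np.stack([np.cos(t), np.sin(t)], axis=1)

d_same = descriptor_distance(fourier_descriptor(square), fourier_descriptor(same))
d_diff = descriptor_distance(fourier_descriptor(square), fourier_descriptor(circle))
print(d_same < d_diff)  # True: the matching shape has the closer descriptor
```

Because the descriptor is invariant to translation and scale, the reference object is matched even when it appears at a different position or size in the image.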
…a complete assisted-rehabilitation system was built, and kinematic modeling and analysis were carried out for the six-degree-of-freedom robotic arm used; the arm's motion and obstacle-avoidance strategies were analyzed.

Finally, complete tests were performed on the rehabilitation-system prototype with simulated patients. The simulation and real-world test results fully demonstrate the feasibility of the assisted-rehabilitation strategy and the efficiency of this assisted rehabilitation robot system.
Keywords: cognitive rehabilitation; machine vision; Fourier descriptors; modulus-displacement algorithm; robotic arm
It has been proved that, to a certain extent, MCI can be slowed down or even cured by human intervention. However, due to the lack and uneven distribution of related medical resources, the rehabilitation process is often untimely and inadequate, and lacks continuity and effectiveness. To address this, this project designs an assisted cognitive rehabilitation robot system based on machine vision, intended to improve the rehabilitation efficiency for MCI and related conditions.

Mock AI English Interview Questions and Answers


1. Q: What is the difference between a neural network and a deep learning model?
A: A neural network is a set of algorithms, modeled loosely after the human brain, designed to recognize patterns. A deep learning model is a neural network with multiple layers, allowing it to learn more complex patterns and features from data.

2. Q: Explain the concept of 'overfitting' in machine learning.
A: Overfitting occurs when a machine learning model learns the training data too well, including its noise and outliers, resulting in poor generalization to new, unseen data.

3. Q: What is the role of 'bias' in an AI model?
A: Bias in an AI model refers to the systematic errors introduced by the model during the learning process. It can be due to the choice of model, the training data, or the algorithm's assumptions, and it can lead to unfair or inaccurate predictions.

4. Q: Describe the importance of data preprocessing in AI.
A: Data preprocessing is crucial in AI, as it involves cleaning, transforming, and reducing the data to a format suitable for the model to learn from effectively. Proper preprocessing can significantly improve the performance of AI models by ensuring that the input data is relevant, accurate, and free from noise.

5. Q: How does reinforcement learning differ from supervised learning?
A: Reinforcement learning is a type of machine learning where an agent learns to make decisions by performing actions in an environment to maximize a reward signal. It differs from supervised learning, where the model learns from labeled data to predict outcomes based on input features.

6. Q: What is the purpose of a 'convolutional neural network' (CNN)?
A: A convolutional neural network (CNN) is a type of deep learning model that is particularly effective for processing data with a grid-like topology, such as images. CNNs use convolutional layers to automatically and adaptively learn spatial hierarchies of features from input images.

7. Q: Explain the concept of 'feature extraction' in AI.
A: Feature extraction in AI is the process of identifying and extracting relevant pieces of information from raw data. It is a crucial step in many machine learning algorithms, as it reduces the dimensionality of the data and focuses on the most informative aspects that can be used for prediction or classification.

8. Q: What is the significance of 'gradient descent' in training AI models?
A: Gradient descent is an optimization algorithm used to minimize a function by iteratively moving in the direction of steepest descent, defined by the negative of the gradient. In the context of AI, it is used to minimize a model's loss function, refining the model's parameters to improve its accuracy.

9. Q: How does 'transfer learning' work in AI?
A: Transfer learning is a technique where a pre-trained model is used as the starting point for learning a new task. It leverages the knowledge gained from one problem to improve performance on a different but related problem, reducing the need for large amounts of labeled data and computational resources.

10. Q: What is the role of 'regularization' in preventing overfitting?
A: Regularization is a technique used to prevent overfitting by adding a penalty term to the loss function, which discourages overly complex models. It helps control the model's capacity, forcing it to generalize better to new data by not fitting the training data too closely.
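Questions 8 and 10 above can be combined into one small worked example: gradient descent minimizing a least-squares loss with an L2 (ridge) penalty. The data and hyperparameters are made-up illustration values:

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up linear-regression data: y = X @ w_true + noise.
X = rng.normal(size=(100, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=100)

def loss(w, lam):
    """Mean squared error plus an L2 regularization penalty."""
    return np.mean((X @ w - y) ** 2) + lam * np.sum(w ** 2)

def grad(w, lam):
    """Gradient of the regularized loss with respect to w."""
    return 2 * X.T @ (X @ w - y) / len(y) + 2 * lam * w

w = np.zeros(3)
lam, lr = 0.01, 0.1
for _ in range(500):               # gradient descent: step against the gradient
    w -= lr * grad(w, lam)

print(loss(w, lam) < loss(np.zeros(3), lam))  # True: training reduced the loss
```

The penalty term `lam * sum(w**2)` shrinks the weights slightly toward zero, which is exactly the capacity-control effect described in the regularization answer.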

Dynamic Gesture Recognition Fusing Global Hand Motion and Local Finger Motion


第32卷第9期计算机辅助设计与图形学学报Vol.32No.9 2020年9月Journal of Computer-Aided Design & Computer Graphics Sept. 2020融合手势全局运动和手指局部运动的动态手势识别缪永伟, 李佳颖, 孙树森(浙江理工大学信息学院杭州 310018)(***************.cn)摘要: 传统基于手部轮廓或手部运动轨迹的动态手势识别方法, 其提取的特征通常难以准确表示动态手势之间的区别. 针对动态手势的复杂时序、空间可变性、特征表示不准确等问题, 提出一种融合手势全局运动和手指局部运动的手势识别方法. 首先进行动态手势数据预处理, 包括去除手势无效帧、手势帧数据补全和关节长度归一化; 然后根据给定的手部关节坐标, 利用手势距离函数分段提取动态手势关键帧, 并基于手势关键帧提取手在空间中的全局运动特征和手内部手指的局部运动特征; 其次融合手势全局运动和手指局部运动的关键帧手势特征, 并采用线性判别分析进行特征降维; 最后利用带高斯核的支持向量机实现动态手势识别与分类. 对DHG-14/28动态手势数据集中14类手势和28类手势数据集进行实验, 其分类识别准确率分别为98.57%和88.29%, 比现有方法分别提高11.27%和4.89%. 实验结果表明, 该方法能准确地表征动态手势并进行手势识别.关键词: 动态手势识别; 手势全局运动; 手指局部运动; 关键帧; 线性判别分析; 支持向量机中图法分类号: TP391. 41 DOI: 10.3724/SP.J.1089.2020.18126Dynamic Gesture Recognition Combining Global Gesture Motion and Local Finger MotionMiao Yongwei, Li Jiaying, and Sun Shusen(College of Information Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018)Abstract: Traditional gesture recognition methods always focus on hand contours or hand movement track, and the extracted gesture features are often difficult to represent the difference between dynamic gestures accurately. To overcome the issues of complex time series, the spatial variability and inaccurate feature rep-resentation of different dynamic gestures, a novel dynamic gesture recognition method is proposed here by combining global gesture motion and local finger motion. Firstly, based on the given hand joint positions, several data pre-processing steps are performed for dynamic gesture data, such as removing of the invalid gesture frames, completing the gesture frames, and the normalization of joint lengths for different gestures.Secondly, the key gesture frames will be extracted according to the distance function defined by the differ-ence of hand translation and rotation, fused by the difference of panning and rotating of fingers. 
Meanwhile, according to the extracted key gesture frames, the gesture features of global gesture motion and local finger motion can be calculated. Finally, by combining the extracted gesture features, dynamic hand gestures can be classified and recognized using linear discriminant analysis (LDA) and Gaussian kernel based SVM. The proposed method has been evaluated on the DHG-14/28 datasets, which includes 14 kinds of gestures and 28 kinds of gestures. And the accuracy of hand gesture recognition is 98.57% and 88.29% respectively, which is收稿日期: 2019-10-24; 修回日期: 2020-03-28. 基金项目: 国家自然科学基金(61972458); 浙江理工大学科研基金(17032001-Y).缪永伟(1971—), 男, 博士, 教授, 博士生导师, CCF杰出会员, 主要研究方向为计算机图形学、数字几何处理、计算机视觉、机器学习; 李佳颖(1995—), 女, 硕士研究生, 主要研究方向为计算机图形学、机器学习; 孙树森(1975—), 男, 博士, 副教授, 主要研究方向为计算机图形学、虚拟现实.第9期缪永伟, 等: 融合手势全局运动和手指局部运动的动态手势识别 149311.27% and 4.89% higher than the existing methods. Experimental results demonstrate that our method can represent the difference between dynamic hand gestures accurately and recognize them effectively.Key words: dynamic gesture recognition; global gesture motion; local finger motion; key frame; linear discrimi-nant analysis; support vector machine作为计算机图形学、虚拟现实、人机交互和手语翻译等领域的一种重要交互模式, 手势交互提供了一种简单便捷的交互体验[1]. 根据手势是否具有时序性, 可以将手势分为静态手势和动态手势2类[2], 静态手势顾名思义指的是单帧静止的手势, 而动态手势指的是一段时间内连续的多帧手势. 相比于静态手势, 动态手势由于不仅需要关注手部手形的变化, 还要关注手指在时间、空间中的运动而变得难以准确识别[3]. 通常, 复杂动态手势的运动规律具有以下3个明显特点: (1) 时间的可变性. 动态手势的运动速度不确定, 对于相同的手势, 不同的人可以用不同的速度来完成; 即使是同一个人, 每次的完成速度也并不一样. (2) 手势完整性的可变性. 在许多情况下, 与系统预先定义的手势相比, 用户/操作员的手势是不完整的或冗余的. (3) 空间的可变性. 手势的运动空间和运动距离是不同的, 不同的人所做相同手势的距离和范围也总是不同的. 这些特点将导致难以准确表征不同动态手势的特征. 动态手势的复杂时序、空间可变性、特征表示不准确等问题, 给动态手势的识别和分类带来困难与挑战[2].许多动态手势识别的工作都是基于RGB图像、深度图像、光流信息或手势轨迹[4]. Simonyan 等[4]利用双数据流特征进行动态手势分类, 其中一个数据流利用静态的RGB图像进行分类, 而另一个数据流利用光流和轨迹信息. RGB图像信息中包含了单帧手势的局部特征信息, 光流和轨迹信息中包含了手势的全局特征信息, 但是该方法并没有将2个数据流的特征相结合, 仅仅是分开使用2个数据流. 本文考虑手势全局运动特征和手内部手指局部运动特征, 并将融合2个特征进行动态手势识别和分类. 
基于手势图像, Molchanov等[5]采用联接时间分类(connectionist temporal classifica-tion, CTC)方法解决动态手势时序问题, 但是该方法具有条件独立性, 假设不同时间帧的输出之间是独立的, 对于动态手势序列而言, 手势序列是具有时间空间连续性的, 该假设并不符合动态手势运动.最近受益于Intel real sense, Microsoft Kinect, OpenPose等硬件设备的广泛使用以及高精度手部跟踪方法的发展, 使得人们很容易获取高精度的手部骨架数据. 实际上, 手部骨骼的运动通常能准确反映不同动态手势的特征差异[3,6]. 基于手部关节点坐标输入, 针对动态手势时间的可变性和手势完整性的可变性等问题, 本文首先提出动态手势关键帧的有效提取方法, 从而去除不同动态手势中的冗余帧, 并将不同长度的动态手势视频统一到同一长度; 然后基于动态手势关键帧, 将手势运动特征表征为手部在空间中的全局运动和手内部手指的局部运动, 并融合2类特征进行降维; 最后利用带高斯核的支持向量机(support vector ma-chine, SVM)实现有效的动态手势识别. 本文提出了一种动态手势特征表示, 该表示能够有效表征动态手势的运动特征, 并为手势准确识别奠定了基础.1 相关工作动态手势的时空信息特征处理是动态手势识别与分类的关键和难点[2]. 动态手势识别大致可以分为传统手工特征提取方法和深度学习方法等.针对动态手势的传统手工特征提取方法, 大多采用动态时间规划(dynamic time warping, DTW)[7-8]、傅里叶时间金字塔[9]、隐马尔可夫模型(hidden Markov models, HMM)[10]等解决动态手势的时空信息处理问题. 其中DTW方法[7-8]采用两两对比的策略来规整时间信息, 该方法依赖于一个标准手势版本进行对比, 但是在手势数据集中并没有这个标准版本可供对比, 只能人为设定标准手势; 傅里叶时间金字塔方法[9]采用将完整的手势帧进行分段提取的方式来处理动态手势的时空信息特征; HMM则认为动态事件的下一状态只与上一状态有关, 与之前的状态都没有关系[10], 其忽略了动态手势的连贯性.针对动态手势识别的深度学习方法往往利用HMMs[10]、长短期时间记忆(long short-term mem-1494计算机辅助设计与图形学学报 第32卷ory, LSTM)[11-12]、广义时间规划(generalized time warping, GTW)[13], DTW [7-8]、空间金字塔池化(spatial pyramid pooling, SPP)[14]等解决时空信息处理问题. Wu 等[15]使用HMMs, 结合深度置信网络和卷积神经网络, 从RGB-D 数据中提取骨架特征中的时间依赖性. 然而, 由于深度置信网络采取无监督方式学习, 并没有结合手势类别对数据进行压缩. Nguyen 等[6]提出一种基于手部关节点坐标的对称正定(symmetrical positive determined, SPD)矩阵流形学习的神经网络方法. 该网络由3个部分组成: 一层卷积层、一层时空高斯聚合层和从骨架数据中学习到的最终SPD 矩阵. 该方法与本文类似利用关节之间的物理链接点提取特征. 然而, 该方法对时间序列的处理较粗糙, 为了捕获骨架序列的时间顺序, 采用了时空手势识别网络构造许多子序列: 原始序列、将原始序列分成2个子序列、再分成3个子序列等. Abavisani 等[16]提出了一种基于多模态训练的单模态动态手势识别方法, 对时间和位置信息利用时空语义对齐损失进行对齐, 这与协方差矩阵对齐密切相关. 然而, 利用神经网络进行动态手势识别的方法中网络设计往往难以充分考虑动态手势特定的手势运动特征. 本文提出了一种新的动态手势识别方法. 该方法将动态手势的运动分为手部在空间的全局运动和手内部手指的局部运动2部分, 并利用关键帧提取解决时间信息处理问题.2 动态手势识别方法本文从动态手势运动的内在特性出发, 结合手势所具有的个体差异性、时空连续性等特点, 提出了一种新的动态手势识别框架. 如图1所示, 该框架输入为动态手势3D 关节坐标, 首先进行数据预处理, 包括去除手势无效帧、手势帧数据补全和关节长度归一化; 然后提取动态手势关键帧, 并基于手势关键帧提取手在空间中运动的全局特征和手内部手指的局部特征; 并将两者特征融合后进行线性判别分析(linear discriminant analysis, LDA)特征降维, 最后利用带高斯核的SVM 进行动态手势识别分类. 
该框架结合动态手势的时空连续特性, 解决了手势的时序问题, 同时有效提取了手部运动全局特征和手指运动局部特征.图1 动态手势识别框架2.1 动态手势数据预处理首先, 针对动态手势的时间可变性, 本文认为对于相同的手势动作, 由于测试者的动作有快有慢导致手势视频中出现较多冗余帧. 另外, 在手势提取过程中, 由于初始化问题或者出于提取关节位置信息考虑, 通常会需要测试者保持若干秒静止状态, 该手势帧与手势类别无关, 本文中将与手势类别无关的手势帧定义为手势无效帧, 为避免在关键帧提取中产生干扰, 需要首先去除手势无效帧.其次, 针对动态手势的完整性, 对不满关键帧帧数的手势将采用手势帧补全的方法, 使得动态手势数据帧数达到关键帧帧数的要求.最后, 针对动态手势的空间可变性, 本文认为当不同人做相同手势时, 不同的手掌大小和不同的手势幅度等通常会产生个体差异性. 本文将利用关节长度归一化方法消除个体差异性带来的影响, 从而解决动态手势的空间可变性问题. 2.1.1 手势的无效帧删除动态手势是一段时间内连续变化的手势序列, 手的形状和位置随着时间而变化. 动态手势数据集通常通过深度相机或数据手套获取, 获取的动态手势通常存在如何定义起始帧和结束帧的问题.第9期缪永伟, 等: 融合手势全局运动和手指局部运动的动态手势识别 1495本文所采用的数据集序列中, 要求参与者在每一个序列开始前的几秒内将整个手完全打开在摄像机前, 这一操作主要用于初始化手势估计算法. 因此, 每个手势序列中都有一些与手势类别无关的手势无效帧, 为了避免无效帧对手势分类造成干扰, 首先需要删除手势无效帧. 另外, 动态手势起始帧提取也是动态手势分类中的一个难点, 本文采用的动态手势数据集中已手工标注了有效的起止帧, 因此本文只需根据数据集中提供的手势起止帧数, 删除起始帧之前和结束帧之后的无效帧. 2.1.2 手势帧数据补全当手势关键帧确定之后, 对于关键帧帧数不足的手势处理问题, 本文考虑如果直接将小于关键帧数量的手势视为无效手势删除, 随着关键帧数量的增加, 数据集中的手势数量将急剧下降. 因此, 本文对帧数小于关键帧的手势数据采用数据补全的方法, 利用重复手势帧进行数据补全; 即从起始帧开始不断依次重复所有现有帧, 且为了保持手势运动特性, 将重复的手势帧直接插在被重复的手势帧之后, 直至手势视频达到规定帧数为止, 然后删除一个起始帧. 通过手势帧补全, 可以使训练数据集中样本数保持不变, 而重复现有手势帧可以有效地保持动态手势的完整性, 更好地说明手势识别准确率的提升和本文方法的泛化性. 2.1.3 手部关节长度归一化手势数据集通常需要由不同参与者采集数据, 并保持手势的通用性. 但是, 不同的参与者手的大小和关节之间的长度不同. 为了消除手部的个体差异性, 本文将手部关节长度归一化为相同长度, 即改变关节长度但不改变关节间的夹角. 例如, 在握拳手势时可能会出现指尖穿过手掌平面的异常运动. de Smedt 等[3]将手部关节长度归一化为数据集的平均长度, 但增加了计算量. 本文在标准手指长度的基础上, 对手部关节长度进行归一化.不妨以某一帧为例简述归一化过程. 利用,i j W 表示第j 帧中第i 个关节点位置. 为方便起见,归一化过程中下标j 均省略, 即表示为i W , 其中,0,1,2,,21i = . 利用向量表示22个关节点构成的关节对, 即15,1216,10,14,18 ,6,10,14,18i i i i i i i --≠⎧=⎨-=⎩W W V W W 且≤≤.归一化过程为,iii iL =V V V015, 0+,1216,10,14,18+, 6,10,14,18i i i i i i i i -=⎧⎪=≠⎨⎪=⎩W W V W V W ≤≤且 (1)需要指出的是, 本文对手部关节长度归一化时基于一个标准手指长度进行, 标准手指长度参考ACT hand 关节段[17]确立, 其中, i L 为对应第i 节关节段标准长度.2.2 动态手势特征表示首先, 从全局来看, 动态手势是手随着时间的流逝发生的一系列空间上的变化, 该变化可以根据物体运动的特性划分为平移运动和旋转运动. 其中平移运动通过手部中心点的移动距离表示, 根据手的运动特性, 手掌中心点的位置可以唯一确定手在空间中的位置. 旋转运动则是通过手的主方向向量的改变来进行刻画, 本文中手的主方向定义为: 手肘指向手掌中心点的向量. 考虑交互手势特征, 并不包含手绕中指指根关节与手肘连线所在直线的自旋转运动, 所以本文没有考虑自旋转运动的特征.其次, 从局部来看, 除了手在空间上的变化, 还有手内部手指的局部运动引起的手形变化, 本文将手部关节等同于21段链段结构. 
而手指的局部运动是由手指的关节弯曲所引起, 可以理解为链段之间的角度变化引起的链段结构的整体变化. 考虑旋转矩阵所使用的元素多达16个, 而欧拉角会出现万向节死锁现象, 故本文中利用旋转四元数表示该变化. 而对于链段结构而言, 细微的角度误差将被累积, 经过多段链段后容易引起较大的距离误差[3], 故本文为消除由于角度误差累积引起的距离误差, 将手指相对距离特征加入手指的局部运动特征中. 同样考虑手部物理特征, 手指不存在绕该指指根关节与手肘连线所在直线的自旋转运动.综上所述, 本文基于手势的几何特性和时间空间连续性的角度, 提出了动态手势的4个特征表示. 动态手势运动的过程包括整只手在空间中的全局运动(即手在空间中的平移运动、旋转运动)和手内部手指的局部运动(即手内部手指的平移运动、旋转运动). 具体表示如下:(1) 手在空间中的平移运动.手在空间中的移动过程通过前后2帧手中心1496计算机辅助设计与图形学学报 第32卷点(关节点1)的距离刻画, 即1,11,1j j j j T T --=--W W .(2) 手在空间中的旋转运动.手在空间中的翻转信息通过前后2帧之间的手主方向向量距离刻画, 本文中手的主方向定义为10-W W , 翻转信息表示为1,0,1011,,1j j j j j j P P ---=----W W W W . (3) 手内部手指的平移运动.手指的平移运动则利用手指指尖相对距离特征刻画. 为避免因关节段之间旋转角度信息作为特征而出现旋转误差累积的现象, 本文提取手指相邻指尖之间的距离和手指指尖相对于手腕的距离作为手指平移特征, 具体表征为手指相邻指尖之间的距离094D =-W W ,1139D =-W W , 21713D =-W W ,32117D =-W W ,以及手指指尖相对于手腕的距离440D =-W W ,590D =-W W ,1630D =-W W , 1770D =-W W ,2810D =-W W .(4) 手内部手指的旋转运动.手的弯曲变化利用手部关节之间的旋转四元数刻画, 以00001111(,,)(,,)x y z x y z V V 关节段之间的四元数为例, 可得四元数中的旋转角度特征为Q = 01cos(arccos()/2)⨯V V .2.3 动态手势关键帧提取 2.3.1 手势距离函数为了有效提取动态手势的关键帧, 融合手势全局运动和手指局部运动的4个特征表示, 本文提出了一种手势距离函数, 并通过对手势距离进行排序选取动态手势中特征变化显著的手势帧作为关键帧, 即产生运动突变的帧作为手势关键帧. 定义一个动态手势前后2帧之间的距离为1381,,12,,103141()()()(),1,,j i j i j k j k j i k j j j j L Q Q D D P P T T j S E λλλλ--==--=-+-+-+-=+∑∑ (2)其中动态手势起始帧序号为S , 结束帧序号为E .实验中参数取1234=100, =1, =1, =1λλλλ. 2.3.2 关键帧分段提取对于动态手势视频序列, 若直接选取手势距离函数最大的前k 帧作为关键帧, 容易出现关键帧全是邻近帧的情况, 如对于图2所示的有效帧为第44~66帧的向上滑动的手势动态序列, 直接利用手势距离函数提取出的关键帧为第52~56帧, 这些关键帧全是邻近帧, 从而无法有效地表示整个手势过程. 为方便观察, 以深度图为例, 如图2所示, 图中显示不经过分段直接提取手势关键帧时出现严重信息冗余, 且不包含起始手势, 丢失了动态手势的完整信息. 为了避免信息冗余和保持手势的完整性, 需要考虑分段提取动态手势的关键帧.a. 输入的动态手势视频帧b. 提取的动态手势关键帧图2 不分段提取的手势关键帧第9期缪永伟, 等: 融合手势全局运动和手指局部运动的动态手势识别 1497在采用分段提取动态手势关键帧中, 假设手势起始帧为S F , 结束帧为E F , 则整个有效手势可表示为,}{,S E F F . 若提取k 帧关键帧则可将整个手势均匀分成k 段, 经分段后手势段I 为1(1){{,,},,{,,}}S S d S k d E +-+-⋅=F F F I F (3) 其中, (1)/d E S k -+⎢⎥⎣⎦=. 然后在每个手势段内选取距离函数式(2)最大的帧作为该段关键帧.本文数据集删除无效帧后帧数范围为7~149帧, 可选取的关键帧帧数范围较广. 为不失一般性, 考虑人体动作识别视频序列长度与手势动作的差异性, 通过手势识别准确率的对比实验选取关键帧帧数为31帧. 最后, 为保证手势完整性, 添加手势起止帧作为关键帧. 
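The key-frame procedure described above — compute an inter-frame gesture distance, split the sequence into k equal segments, keep the frame with the largest distance in each segment, and add the start and end frames — can be sketched as follows. A plain joint-displacement sum stands in for the paper's weighted distance function, and the test gesture is made-up:

```python
import numpy as np

def key_frames(joints, k):
    """Segment-wise key-frame selection.

    joints: (T, 22, 3) array of per-frame 3-D joint positions.
    Returns sorted indices of k per-segment key frames plus start/end frames.
    """
    T = len(joints)
    # Stand-in inter-frame distance: total joint displacement between frames.
    dist = np.linalg.norm(joints[1:] - joints[:-1], axis=2).sum(axis=1)
    dist = np.concatenate([[0.0], dist])          # frame 0 has no predecessor
    bounds = np.linspace(0, T, k + 1, dtype=int)  # k equal segments
    keys = {int(s + np.argmax(dist[s:e]))
            for s, e in zip(bounds[:-1], bounds[1:]) if e > s}
    keys |= {0, T - 1}                            # keep gesture start/end frames
    return sorted(keys)

# Made-up gesture: 60 frames of 22 joints with a motion burst near frame 30.
rng = np.random.default_rng(0)
seq = np.cumsum(rng.normal(scale=0.01, size=(60, 22, 3)), axis=0)
seq[30] += 0.5                                    # sudden movement at frame 30
print(key_frames(seq, 5))
```

Picking the per-segment maximum rather than the global top-k is what prevents the selected key frames from all clustering around one motion burst, matching the segmentation argument made in the text.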
若起始帧(结束帧)已经包含在关键帧中则选取该帧的邻近帧, 即后1帧或前1帧取代该帧, 并添加起止帧. 算法步骤如下.算法1. 关键帧提取算法.输入. 动态手势的22个关节点3D 坐标信息. 输出. 该动态手势的k 帧关键帧.Step1. 根据手势起止帧, 删除手势无效帧{}{}1,,,S E N ''←F F F F .Step2. 补全手势帧{}{},,,,S E S E ''←F F F F .Step3. 利用式(1)对每一帧的关节长度进行归一化, 得到归一化后关节点位置信息,,0,1,,21,i j i S j E =≤≤W .Step4. 将动态手势按照式(3)进行分段.Step5. 根据式(2)计算视频段内前后2帧距离j L . Step6. 在每个视频段中分别选取具有最大距离的帧作为其关键帧,,,m m m F F F .Step7. 加入手势起止帧S F 和E F , 最终得到动态手势的关键帧为,,,,,S m m m E F F F F F .以抓取手势为例, 说明本文中关键帧提取的有效性. 图3a 给出了抓取手势中的每隔5帧手势深度图, 分别对应第10帧、第15帧、第20帧、第25帧、第30帧、第35帧、第40帧、第45帧手势图; 图3b 给出了利用算法1提取的抓取手势的关键帧, 分别对应第10帧、第17帧、第22帧、第31帧、第38帧、第40帧、第45帧手势图, 可以看出利用分段提取动态手势的关键帧能够有效地表示手势的完整变化过程.a. 每隔5帧的手势深度图b. 分段提取的手势关键帧图3 抓取手势的关键帧提取本文采用的动态手势关键帧提取算法包含手势数据预处理、手势分段、手势帧距离计算等, 由于手势帧数的不同, 其关键帧提取时间也不尽相同. 表1给出了对不同手势帧帧数统计其关键帧提表1 关键帧提取时间统计手势帧帧数平均时间/s0~31 0.0032~40 40.1541~50 41.8951~60 42.25 61~70 46.07 >70 74.63取的平均时间列表. 当手势帧帧数不超过31帧时, 仅需补全手势帧, 不计关键帧提取时间; 随着手势帧帧数的增多, 关键帧提取时间变长. 本文中的关键帧提取实时性较低, 在未来工作中将探讨如何进一步提高关键帧提取的实时性.2.4 动态手势识别和分类 2.4.1 手势特征融合由于本文提出的融合手势全局运动和手指局部运动的特征将共同表征一个动态手势, 类似于Luvizon 等[18]将特征进行融合的思路, 本文将特征进行联接融合为单个手势的m 维特征向量=Y 1,],[ m y y . 在含有N 个样本的数据集中分别得到1498计算机辅助设计与图形学学报 第32卷N 个手势特征向量为,1,,[],1,2,,,. m i i i y y i N ==Y对特征向量中各维特征分别归一化,,i j ji j jf f f σ-=.其中, ,1/;N j i j j i f f N σ===∑ 从而得到N 个手势的归一化特征向量为,1,[], ,,,,1,2i i i m f f i N ==F .2.4.2 手势特征降维对于SVM 来说, 本文的样本特征向量维数过多, 在动态手势关键帧中存在信息冗余. 为了使变量相互独立并去除手势特征中的噪声, 同时考虑样本中存在的类别标签, 这里采用监督降维中的LDA 方法进行特征降维. 该方法降维原理如下: 同类数据应尽可能接近, 不同类别的数据应尽量远离, 即投影后类内方差最小, 类间方差最大. 本文在降维过程中, 充分利用手势类别的先验知识. 将手势特征映射到一个低维空间中, 该过程充分利用了手势类别的信息, 使得不同类别手势间的特征方差最大, 同一类别手势间的特征方差最小, 方便进行手势识别和分类.2.4.3 基于带高斯核SVM 的手势识别和分类与其他机器学习分类方法相比, SVM 理论避开了高维空间的复杂性并直接利用核函数向高维空间进行映射, 再利用线性可分情况下的求解方法直接求解对应的高维空间决策问题. 当核函数已知时可以简化高维空间问题的求解难度. 同时SVM 有很好的理论基础, 不涉及概率测度, 最终的决策函数也只由少量的支持向量决定, 计算复杂度取决于支持向量的数目, 而不是样本空间的维数, 从而避免了维数灾难.本文采用带高斯核的SVM 实现对动态手势的识别和分类. 该方法能根据有限样本信息找到特定训练样本的学习精度与学习能力之间的折中, 在解决小样本、非线性和高维识别方面具有优势.3 实验结果与分析本文实验平台为Intel Core i5-7500, 4 GB RAM, 操作系统为Windows10 64位. 
本文基于手部关节点的3D 坐标信息, 通过确定起止帧、删除手势无效帧; 然后进行关节长度归一化以消除个体差异性, 提取手势关键帧, 再分别提取手在空间中的全局运动和手内部手指的局部运动特征, 并进行特征融合和LDA 降维; 最后利用带高斯核的SVM 进行动态手势识别与分类.3.1 实验数据集本文方法所采用的数据集是DHG-14/28动态手势数据集[3], 该数据集中包含有14类动态手势类别, 如表2所示, 并以2种方式执行手势: 只用一个手指的方式和整个手的方式. 每个手势由20名参与者以上述2种方式完成, 每个执行方式各完成5次, 共2 800个动态手势序列. 14种手势中5种为Fine 类手势, 9种为Coarse 类手势. 同时, 数据集中不仅包含动态手势视频帧深度图像, 还包含2D 深度图像中和3D 空间中的22个手部关节坐标, 其中深度图像分辨率为640×480, 深度图和手骨架均以30帧/s 的速度拍摄获取.表2 数据集中包含的手势类别序号 手势类别1 Grab(抓取) Fine2 Expand(展开) Fine3 Pinch(抓紧) Fine4 Rotation CW(顺时针旋转) Fine5 Rotation CCW(逆时针旋转) Fine6 Tap(轻敲) Coarse7 Swipe right(向右滑动) Coarse8 Swipe left(向左滑动) Coarse 9Swipe up(向上滑动) Coarse10 Swipe down(向下滑动) Coarse 11 Swipe X(在空中画X) Coarse 12 Swipe V(在空中画V) Coarse 13 Swipe +(在空中画+) Coarse14 Shake(摇手) Coarse3.2 手势关键帧帧数的确定需要说明的是, 在动态手势关键帧提取中首先需要确定手势关键帧帧数, 选取合适的关键帧帧数将影响手势的识别准确率. 本文对比手势识别准确率, 对不同关键帧帧数k 值进行实验分析. 如图4所示, 随着关键帧帧数的增加, 手势识别准确率有所上升且趋于稳定; 当关键帧帧数大于31时, 手势识别准确率趋于下降. 从图5可以看出, 对于DHG-14/28动态手势数据集[3]中28种手势,关键帧帧数31k =时, 手势识别准确率为88.29%, 达到最高. 实验表明, 若关键帧帧数较少, 则同一种手势的关键帧选取可能具有较大差异性, 导致手势识别准确率较低. 因此, 为了提高手势识别准。

Research and Application of Gesture Recognition Technology Based on Neural Networks


Abstract: Gesture recognition technology has wide applications in human-computer interaction, in fields such as virtual reality and smart homes.

This paper proposes a neural-network-based gesture recognition technique that combines a convolutional neural network (CNN) with a recurrent neural network (RNN) to improve the accuracy and robustness of gesture recognition.

Experimental results show that the proposed method achieves good recognition performance and can accurately recognize common gestures.

Keywords: gesture recognition; neural network; convolutional neural network; recurrent neural network

Abstract: Gesture recognition technology has a wide range of applications in human-computer interaction, such as virtual reality, smart homes and other fields. This paper proposes a gesture recognition technology based on neural networks, which combines convolutional neural networks (CNN) and recurrent neural networks (RNN) to improve the accuracy and robustness of gesture recognition. Experimental results show that the method proposed in this paper has good recognition effect and can accurately recognize common gestures.
Keywords: Gesture recognition; Neural network; Convolutional neural network; Recurrent neural network

Introduction
With the spread of smart devices and the development of human-computer interaction technology, gesture recognition has become an important mode of interaction.
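The CNN-plus-RNN combination described in the abstract can be illustrated with a minimal NumPy forward pass: a 1-D convolution extracts local features from each time step, and a plain RNN summarizes the sequence. All layer sizes and weights below are hypothetical stand-ins, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(42)

def conv1d_relu(x, w, b):
    """Valid 1-D convolution over time followed by ReLU.
    x: (T, C_in), w: (K, C_in, C_out), b: (C_out,)."""
    k = w.shape[0]
    out = np.stack([np.tensordot(x[t:t + k], w, axes=([0, 1], [0, 1])) + b
                    for t in range(x.shape[0] - k + 1)])
    return np.maximum(out, 0.0)

def rnn_last_state(x, wx, wh, bh):
    """Plain tanh RNN; returns the final hidden state. x: (T, C)."""
    h = np.zeros(wh.shape[0])
    for t in range(x.shape[0]):
        h = np.tanh(x[t] @ wx + h @ wh + bh)
    return h

# Hypothetical sizes: 30 frames, 8 input channels, 16 conv filters, 12 hidden units.
T, C_in, C_out, H = 30, 8, 16, 12
x = rng.normal(size=(T, C_in))                              # one gesture sequence
w = rng.normal(scale=0.1, size=(3, C_in, C_out)); b = np.zeros(C_out)
wx = rng.normal(scale=0.1, size=(C_out, H))
wh = rng.normal(scale=0.1, size=(H, H)); bh = np.zeros(H)

feat = conv1d_relu(x, w, b)                 # CNN stage: local features per step
state = rnn_last_state(feat, wx, wh, bh)    # RNN stage: sequence summary
```

In a full system, `state` would feed a softmax classification layer; here it simply demonstrates how the two network types chain together.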

Adaptive Neural Robust Control for a Free-Floating Space Manipulator with Unknown Model


WANG Chao, JING Lijian, YE Xiaoping, JIANG Lihong, ZHANG Wenhui (School of Engineering, Lishui University, Lishui 323000, Zhejiang, China)

Abstract: To address the fact that the precise mathematical model of a free-floating space manipulator is difficult to obtain, and that the parameters of its dynamic model change under external disturbances, a neural-network controller is used to compensate the manipulator's dynamic model, and an adaptive learning law for the network weights enables online real-time adjustment, avoiding dependence on the mathematical model. An adaptive robust controller is designed to suppress external disturbances and compensate the approximation error, improving the robustness and control accuracy of the system. The stability of the closed-loop system is proved on the basis of Lyapunov theory. Simulation experiments verify the effectiveness of the proposed control method, which is significant for research on free-floating space robots.

Keywords: space robot; neural network; robust control; adaptive; stability
CLC number: TP24    Document code: A    Article ID: 1672-5581(2019)02-0153-06
English title: Adaptive neural robust control for a free-floating space manipulator facing an unknown model

Federated Learning Based on Dynamic Regularization


With the development of artificial intelligence, more and more enterprises and institutions apply it across commercial and scientific domains.

In practice, however, data confidentiality and privacy make data sharing and collaborative learning a bottleneck for the development of AI.

To address this, researchers have proposed a new collaborative learning method: federated learning based on dynamic regularization.

The basic idea of federated learning is to keep the training data distributed across multiple devices or nodes. Each node trains only on its local data and uploads its local model parameters to a central server for aggregation, which updates the global model.

This protects data privacy effectively, but differences in data distribution and sample size across nodes lead to overfitting and underfitting of the model.

To mitigate this, researchers proposed a new dynamic regularization method, namely federated learning based on dynamic regularization.

It builds on conventional federated learning by introducing a regularization term that constrains the model parameters, reducing overfitting and underfitting.

Unlike conventional regularization, the dynamic method adjusts the regularization coefficient according to each node's data distribution and sample size, achieving better generalization.

Specifically, federated learning with dynamic regularization proceeds as follows:

1. Distribute the training data across multiple nodes; each node trains only on its local data and obtains local model parameters.

2. Upload the local model parameters to the central server for model aggregation.

3. During aggregation, introduce a dynamic regularization term that constrains the model parameters.

4. Adjust the regularization coefficient according to each node's data distribution and sample size, improving the model's generalization ability.
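The four steps above can be simulated end to end on a toy linear-regression task. The proximal form of the regularizer and the sample-size-based coefficient rule below are illustrative assumptions for the sketch, not the exact method of any specific paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: clients share a true linear model but differ in sample count.
w_true = np.array([2.0, -1.0])
clients = []
for n in (10, 40, 200):                          # heterogeneous data volumes
    X = rng.normal(size=(n, 2))
    y = X @ w_true + 0.01 * rng.normal(size=n)
    clients.append((X, y))

def local_update(X, y, w_global, mu, steps=20, lr=0.05):
    """Step 1: gradient descent on local MSE plus a proximal regularizer
    (mu/2)*||w - w_global||^2 pulling the local model toward the global one."""
    w = w_global.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y) + mu * (w - w_global)
        w -= lr * grad
    return w

w_global = np.zeros(2)
for _ in range(10):                               # communication rounds
    local_models = []
    for X, y in clients:
        # Step 4 ("dynamic" coefficient): an illustrative rule that regularizes
        # small clients harder, since their updates are noisier.
        mu = 1.0 / len(y)
        local_models.append(local_update(X, y, w_global, mu))   # Steps 1 and 3
    # Step 2: server aggregates with sample-size-weighted averaging.
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    w_global = np.average(local_models, axis=0, weights=sizes)
```

After a few rounds the global model approaches the shared true model even though no raw data ever leaves a client.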

In experiments, federated learning with dynamic regularization performs well.

Compared with conventional federated learning, it effectively reduces the model's overfitting and underfitting and improves its generalization, while adapting the regularization coefficient to each node's data distribution and sample size for better adaptability and robustness.

In summary, federated learning with dynamic regularization is a new collaborative learning method that effectively addresses data privacy and sharing, adjusting the regularization coefficient dynamically according to node data distributions and sample sizes to achieve better generalization.

Dynamic Gesture Recognition Based on Leap Motion


ABSTRACT
Dynamic hand gesture recognition is a crucial but challenging task in the pattern recognition and computer vision communities. In this paper, we propose a novel feature vector which is suitable for representing dynamic hand gestures, and present a satisfactory solution to recognizing dynamic hand gestures with a Leap Motion controller (LMC) only. These have not been reported in other papers. The feature vector with depth information is computed and fed into the Hidden Conditional Neural Field (HCNF) classifier to recognize dynamic hand gestures. The systematic framework of the proposed method includes two main steps: feature extraction and classification with the HCNF classifier. The proposed feature vector, which consists of single-finger features and double-finger features, has two main benefits. First, single-finger features solve the problem of mislabeling caused by executing dynamic hand gestures in different positions. Second, double-finger features help in distinguishing the different types of interactions between adjacent fingertips. The HCNF-based classifier considers the two main factors for dynamic hand gesture recognition: different kinds of features and the complex underlying structure of dynamic hand gesture sequences. The proposed method is evaluated on two dynamic hand gesture datasets with frames acquired with a LMC. The recognition accuracy is 89.5% for the LeapMotion-Gesture3D dataset and 95.0% for the Handicraft-Gesture dataset.
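The thesis body defines the exact single-finger and double-finger features; as a rough illustration of the idea only, one can build position-invariant "single-finger" features from palm-relative fingertip offsets and "double-finger" features from adjacent-fingertip distances. The function name and array shapes below are assumptions, not the thesis's definitions.

```python
import numpy as np

def finger_features(palm, tips):
    """Position-invariant per-frame features from Leap Motion style data.

    palm: (3,) palm center; tips: (5, 3) fingertip positions (thumb..pinky).
    Single-finger part: fingertip offsets relative to the palm, so the same
    gesture executed at different positions yields the same values.
    Double-finger part: distances between adjacent fingertips, which help
    separate interactions between neighboring fingers (e.g., pinches).
    """
    single = (tips - palm).ravel()                          # 15 values
    double = np.linalg.norm(np.diff(tips, axis=0), axis=1)  # 4 values
    return np.concatenate([single, double])                 # 19-D descriptor
```

Because both parts depend only on relative geometry, translating the whole hand leaves the descriptor unchanged, which is the mislabeling problem the single-finger features are meant to solve.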
Experimental results show that the proposed method is suitable for certain dynamic hand gesture recognition tasks.
KEY WORDS: Dynamic hand gesture recognition, Depth data, Leap Motion controller, Hidden Conditional Neural Field

Contents
Abstract
Chapter 1 Introduction: foreword; research status at home and abroad (data-glove-based, computer-vision-based, and depth-information-based gesture recognition); research content; thesis structure
Chapter 2 Feature analysis and extraction: 3D histograms of oriented gradients; DT features; chain code; single-finger and double-finger features
Chapter 3 Classifier models: hidden Markov models (definition, three basic problems, limitations); conditional random fields (undirected graphical models, definition and form, basic problems, characteristics); hidden conditional random fields (definition, learning and inference); hidden conditional neural fields (definition, learning and inference)
Chapter 4 Experimental results and analysis: a Leap Motion-based dynamic gesture acquisition system; the dynamic gesture databases; results and analysis
Chapter 5 Conclusion and outlook
References; publications and research activities; acknowledgements

Chapter 1 Introduction
1.1 Foreword
Human-computer interaction (HCI) technology [1] studies the interplay between people and machines; its goal is to use every available information channel for communication between humans and machines, improving the naturalness and efficiency of interaction.

Dynamic Gesture Recognition with an Improved Convolutional Neural Network


FU Tian-Hao, YU Li-Ge (Key Laboratory of Advanced Process Control for Light Industry, Ministry of Education, Jiangnan University, Wuxi 214122, China). Corresponding author: FU Tian-Hao, E-mail: *****************
Citation: Fu TH, Yu LG. Improved dynamic gesture recognition method based on convolutional neural network. Computer Systems & Applications, 2020, 29(9): 225-230. /1003-3254/7546.html

Abstract: A dynamic gesture recognition algorithm based on convolutional neural network and support vector machine classification (CNN-Softmax-SVM) is proposed to solve the problems of low recognition rate and few gesture recognition types in monocular vision. Firstly, a fast fingertip detection and tracking algorithm based on the YCbCr and HSV color spaces is employed, which can acquire the fingertip trajectory in real time against complex backgrounds. Secondly, the fingertip trajectory is used as input to the joint CNN-Softmax-SVM network, and finally the dynamic gesture trajectory is recognized by the trained network. The test results show that the combined CNN-Softmax-SVM algorithm can identify dynamic gesture trajectories well.
Keywords: dynamic gesture recognition; skin color detection; Convolutional Neural Network (CNN); Support Vector Machine (SVM); color space

There are many modes of human-computer interaction: mouse and keyboard, touch screens, speech, motion sensing, and so on. As a new mode, gesture interaction has become one of the important forms of HCI. The hand is the most dexterous organ of the body, and using it for interaction makes interaction more convenient and more general. Contactless dynamic gesture interaction has already been applied to motion-sensing games, driver assistance, and sign language recognition, bringing users a simple and convenient experience. But gestures vary in time and space, and the hand itself is a complex deformable body, so this interaction mode is still experimental: the theory is immature and the recognizable range is small. Research on dynamic gesture recognition is therefore of real significance. Common methods include template matching with dynamic time warping (DTW) [1,2] and pattern recognition with hidden Markov models [3,4].
Compared with other approaches, DTW is at a disadvantage for large data volumes, complex gestures, and composite gestures [5]; hidden-Markov-model recognition is complicated, needs large amounts of training data [2], and covers few classes. Because deep models have strong nonlinear modeling capability and can express higher-level, more abstract internal features, deep learning has recently been applied to dynamic gesture recognition [6,7]. Reference [6] used the EgoHands gesture dataset and improved the parameters and structure of a deep convolutional network, training a model that reaches 85.9% accuracy at 16.8 frames per second. Reference [7] used an improved multi-dimensional convolutional neural network to extract spatio-temporal gesture features, fused multi-sensor information, and classified micro-gestures with a support vector machine, reaching over 87% accuracy on a multi-class dynamic gesture dataset with good robustness to background and illumination. Dynamic gesture recognition still faces complex background interference, difficulty detecting and locating fingertip feature points, and changing illumination [8], all of which lower the recognition rate.
(Computer Systems & Applications, ISSN 1003-3254, 2020, 29(9): 225-230. doi: 10.15888/cnki.csa.007546. Received 2019-12-30; revised 2020-01-22, 2020-02-25; accepted 2020-03-11; published online 2020-09-04.)

To address these problems, the proposed trajectory feature extraction effectively excludes complex-background interference and the difficulty of acquiring and locating fingertip feature points, and the proposed joint CNN-Softmax-SVM algorithm recognizes dynamic gestures effectively, greatly raising the recognition rate. The overall flow of the dynamic gesture recognition system is shown in Fig. 1: first, the coordinates of a fingertip sticker are extracted in the HSV color space to obtain the moving fingertip's centroid; skin-color segmentation and a ray-casting test exclude complex-background interference outside the skin contour, yielding the binarized trajectory image to be recognized; then CNN-Softmax and CNN-SVM are trained on handwriting-trajectory samples to obtain trajectory classification models; finally the trained models recognize the dynamic gesture.
(Fig. 1. Flow of the dynamic gesture recognition system.)

1 Dynamic gesture feature extraction
1.1 Fingertip trajectory points per frame
To collect the ordered fingertip trajectory points reliably against complex backgrounds, a portable sticker, fully covered by the fingertip, is attached to the finger, and the centroid of the sticker contour in each frame locates the fingertip center. Each frame is first converted from the RGB (red, green, blue) color space to HSV (hue, saturation, value) with Eq. (1), where H is hue, S saturation, and V brightness; Fig. 2a shows the RGB image and Fig. 2b its HSV conversion. The sticker used here is blue, and the HSV thresholds for Fig. 2c are set to H ∈ [100°, 124°], S ∈ [0.169, 1], V ∈ [0.180, 1], giving the binarized sticker image of Fig. 2c.
(Fig. 2. Fingertip sticker preprocessing: a. RGB color space; b. HSV color space; c. binarization of the blue component.)
Contour detection on Fig. 2c gives the sticker outline in each frame, and the centroid of the contour, computed from its moments, is taken as the fingertip center.

1.2 Rejecting abnormal trajectory points
When the complex background contains colors similar to the sticker, they can still be misjudged as fingertip centers, so a candidate is kept only if it lies inside the skin contour. Skin color in RGB is strongly affected by brightness, so RGB is converted to the YCbCr color space to extract the chroma components Cr and Cb; within a certain range the influence of the luminance component Y can be ignored. Using Eq. (2), the Cr and Cb components of Fig. 2a are extracted in the ranges Cr ∈ [127, 176] and Cb ∈ [85, 129], as shown in Figs. 3a and 3b. Otsu's method adaptively thresholds these maps into binary images, whose logical AND yields the binarized skin map; contour detection then yields the coordinate set of the connected skin-region contours.
(Fig. 3. Skin segmentation: a. Cr component; b. Cb component; c. skin binarization.)
A ray-casting test decides whether a fingertip center lies inside the connected skin-contour region.
Starting from the point to be tested, a horizontal ray is cast to the right and its intersections with the skin contour are counted: an odd count marks a fingertip center (Fig. 4a), an even count an abnormal interference point (Fig. 4b).
(Fig. 4. Ray casting rejects abnormal fingertip trajectory points.)

1.3 Building the dynamic gesture
For every frame the skin contour and fingertip center are obtained; after rejecting background outliers, the fingertip centers are stored in order in an array. Taking the first fingertip center as the gesture's start state and the last as its end state, connecting the centers in array order yields the motion trajectory representing the gesture. Fig. 5 shows the acquisition of the dynamic gesture for the digit "2"; the trajectory of Fig. 5b connects the fingertip centers of 150 frames in order and serves as the recognition input.
(Fig. 5. Acquiring the dynamic gesture: a. trajectory points over consecutive frames; b. motion trajectory points.)

2 Joint CNN-Softmax-SVM gesture recognition
2.1 Convolutional neural network
A convolutional neural network (CNN) is a deep feed-forward artificial neural network widely used in image classification and other computer vision problems. A CNN mainly comprises convolution layers, subsampling layers, and fully connected layers [9]. A convolution layer consists of filters that slide over the width and height of the input image, computing dot products between input regions and learned weights to extract texture features and enhance the image features. The input of the j-th feature map of layer m is [10]

  x_j^m = f( Σ_{i ∈ M_j} x_i^{m-1} * k_{ij}^m + b_j^m ),

where M_j is the selected subset of input feature maps, x_i^{m-1} is the output of the i-th neuron of convolution layer m-1, k_{ij}^m is the convolution kernel matrix, b_j^m is the bias of the j-th feature map of convolution layer m, "*" denotes convolution, and f(·) is the activation function, typically sigmoid, tanh, or ReLU; here ReLU is used:

  f(x) = max(0, x).

If the feature maps after a convolution layer are still large, a subsampling layer reduces the dimensionality of each map:

  x_j^m = f( β_j^m · down(x_j^{m-1}) + b_j^m ),

where β_j^m is the subsampling weight, b_j^m the bias, and down(·) the subsampling function, which partitions the input map into non-overlapping n x n blocks with a sliding window and takes the sum, mean, or maximum of each block.

The extracted feature vector passes through the fully connected layer and then through the Softmax layer to obtain classification weights:

  P_j = e^{a_j} / Σ_{k=1}^{n} e^{a_k},

where a_j is the j-th value of the vector after the fully connected layer, which is unbounded; after the Softmax function, P_j is the class probability in [0, 1], n is the number of output classes, and e is Euler's number.

2.2 Support vector machine
A support vector machine (SVM) [11] is
最后将置信度最大的正样本类作为最终的预测类别:PC i 其中, 是第i 类的精确率.2.3 联合CNN-Softmax-SVM 算法联合CNN-Softmax-SVM 算法是同时采用在原CNN 模型的基础上添加了SVM 分类过程和权重判定层, 其算法结构如图6所示.图6 算法模型结构将待识别轨迹特征输入卷积层、降采样层、全连接层得到一维形式的特征向量之后, 分别通过CNN 网络的全连接层的Softmax 层和用SVM 分类函数代替CNN 网络的全连接层的分类函数, 得到两种预分类结果, 将两种预分类结果输入权重判定层, 通过权重判定决定最终输出. 联合CNN-Softmax-SVM 算法的实现步骤如算法1.算法1. 联合CNN-Softmax-SVM 算法输入:¯M1) 待识别手势轨迹图;P CNN −SVM ={p 1(¯x 1),p 1(¯x 2),p 1(¯x 3),···,p 1(¯x n )}¯x i p 1(¯x i )2) 采用CNN-SVM 训练样本后的每种类别识别正确率的平均值集合, 其中是采用C N N -SVM 训练测试集的第i 类, 是采用CNN-SVM 训练样本后的第i 类平均测试正确率;P CNN −Softmax ={p 2(¯y 1),p 2(¯y 2),p 2(¯y 3),···,p 2(¯y n )}¯y i p 2(¯y i )3) 采用CNN-Softmax 训练样本后的每种类别识别正确率的平均值集合, 其中是采用CNN-Softmax 训练测试集的第i 类; 是采用CNN-Softmax 训练样本后的第i 类平均测试正确率;Class _x输出: 识别类别¯M⃗m 1 经过卷积层、降采样层、全连接层得到一维特征向量.⃗m⃗m 2 根据式(6)获取的类别y CNN-Softmax ; 根据式(14)获取的类别x CNN-SVM .¯PCNN −SVM =n ∑i =1p 1(¯x i )n ¯P CNN −Softmax =n ∑i =1p 2(¯y i )n3 获取CNN-SVM 测试样本识别正确率的平均值;获取CNN-Softmax 测试样本识别正确率的平均值y CNN −Softmax x CNN −SVM 4 if = then计算机系统应用2020 年 第 29 卷 第 9 期Class _x y CNN −Softmax 5 =6 else¯PCNN −SVM ⩾¯P CNN −Softmax 7 if then p 1(x CNN −SVM )⩾p 2(y CNN −Softmax )8 if Class _x x CNN −SVM 9 =10 elseClass _x y CNN −Softmax 11 =12 end if 13 elsep 1(x CNN −SVM )⩽p 2(y CNN −Softmax )14 if Class _x =y CNN −Softmax 15 16 elseClass _x =x CNN −SVM 17 18 end if 19 end if 20 end if3 实验分析100≤H ≤12443≤S ≤25546≤V ≤255实验环境的硬件: intel(R) Core(TM) i5-9400 CPU @ 2.90 Hz 2.90 GHz; 内存: 8.00 GB; 系统类型: 64位操作系统, 基于x64的处理器; 便携式手贴纸: 贴纸形状为圆形, 贴纸直径为1 cm, 贴纸颜色采用蓝色(在HSV 颜色空间下贴纸颜色范围: , ,); 摄像头: 罗技(Logitech) C270i IPTV 高清网络摄像头720P.实验从NIST 数据集[15]中整理出0-9数字字符样本图片各2000张用于改进卷积神经网络训练和测试,实验样本数据的训练和预测都是仅在CPU 下运行. 训练相关参数如表1.表1 参数分配表参数CNN-Softmax CNN-SVM 训练批量图大小128×128128×128学习率e −3e −3迭代次数80008000SVM 惩罚系数C—1K f D f CNN 中所有网络各层的参数设置如表2所示, 其中F 、S 、P 分别表示卷积池化窗的大小、窗口滑动的步长、图像的边界填充, 表示在当前网络层中卷积池化窗的个数, 表示当前网络层输出特征的维度.表2 CNN 各层参数设置网络层类型K f F S P D f卷积层1325×510124×124×32下采样层1—2×22062×62×32卷积层2645×51058×58×64下采样层2—2×22029×29×64CNN-Softmax 和CNN-SVM 在训练过程中的误差下降散点图如图7所示, 训练过程的精度折线图如图8所示. 
The mean test accuracy of CNN-Softmax is 98.23%, and that of CNN-SVM is 98.04%.
(Fig. 7. Scatter plot of decreasing training error. Fig. 8. Training accuracy curves.)

Eight volunteers who regularly use desktop computers were invited for simple interaction, each performing 15 trials per digit. The hand-to-camera distance threshold was set within 0-60 cm and the finger-to-camera-center angle threshold within 0-18.3°. The joint CNN-Softmax-SVM algorithm recognized the dynamic gestures; the results are shown in the confusion matrix of Fig. 9, whose first row lists the recognized results and first column the true classes. The confusion matrix shows that the method achieves high accuracy on dynamic gesture trajectories, above 95% for most digits.

Because the monocular writing trajectories of "0" and "6", "2" and "3", "4" and "6", "5" and "6", and "5" and "8" are somewhat similar, a degree of misjudgment occurs. CNN-Softmax alone recognizes "5" and "6" below 93%; CNN-SVM alone recognizes the digit trajectories "2", "3", and "4" below 93%; the proposed joint CNN-Softmax-SVM reaches a mean accuracy above 95% on the digits 0-9, improving the recognition of similar characters.
(Fig. 9. Confusion matrix of the volunteers' digit gestures.)

Table 3 compares the proposed joint CNN-Softmax-SVM algorithm with CNN-Softmax and CNN-SVM used alone. Although the joint algorithm's mean recognition time is slightly longer, it still meets real-time requirements, and its recognition rate improves markedly over the other methods.

Table 3. Comparison of dynamic gesture recognition methods
  Method            Mean recognition rate  Mean time per frame/s
  CNN-Softmax       0.944                  0.026
  CNN-SVM           0.937                  0.017
  CNN-Softmax-SVM   0.959                  0.045

4 Conclusions and outlook
The monocular trajectory feature extraction effectively removes complex-background interference, strengthens the validity of the gesture data, and adapts to most dynamic gesture variations. The joint CNN-Softmax-SVM recognition algorithm extracts dynamic trajectory features effectively and also raises the recognition rate. Because some digits have similar monocular writing trajectories, some misjudgment remains; future work should further improve the recognition rate and recognition time for similar character trajectories. Beyond digit trajectories, gesture recognition of letter trajectories could also be studied to better serve human-computer interaction.

References
[1] Plouffe G, Cretu AM. Static and dynamic hand gesture recognition in depth data using dynamic time warping. IEEE Transactions on Instrumentation and Measurement, 2016, 65(2): 305-316.
[2] Li K, Wang YX, Sun YP. An improved DTW dynamic gesture recognition method. Journal of Chinese Computer Systems, 2016, 37(7): 1600-1603.
[3] Chen GL, Ge KK, Li CH. Complex dynamic gesture recognition based on multi-feature HMM fusion. Journal of Huazhong University of Science and Technology (Natural Science Edition), 2018, 46(12): 42-47.
[4] Ma ZH, Li L, Qiao YT, et al. Research on dynamic gesture recognition based on multi-sensor fusion. Computer Engineering and Applications, 2017, 53(17): 153-159.
[5] Chen TT, Yao H, Zuo MZ, et al. A survey of dynamic gesture recognition based on depth information. Computer Science, 2018, 45(12): 42-51, 76.
[6] Wang J, Zhu EC, Huang SN, et al. A dynamic gesture recognition method based on deep learning. Computer Simulation, 2018, 35(2): 366-370.
[7] Li LX, Wang Y, Wu JJ, et al. A micro-gesture recognition method based on an improved multi-dimensional convolutional neural network.
Computer Engineering, 2018, 44(9): 243-249.
[8] Jian CF, Xiang XY, Zhang MY. Mobile terminal gesture recognition based on improved FAST corner detection. IET Image Processing, 2019, 13(6): 991-997.
[9] LeCun Y, Bottou L, Bengio Y, et al. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998, 86(11): 2278-2324.
[10] Chang L, Deng XM, Zhou MQ, et al. Convolutional neural networks in image understanding. Acta Automatica Sinica, 2016, 42(9): 1300-1312.
[11] Astorino A, Fuduli A. The proximal trajectory algorithm in SVM cross validation. IEEE Transactions on Neural Networks and Learning Systems, 2016, 27(5): 966-977.
[12] Peng Q, Ji GS, Xie LJ, et al. Application of convolutional neural networks in vehicle recognition. Journal of Frontiers of Computer Science and Technology, 2018, 12(2): 282-291.
[13] Zhu SX, Li Y, Zhu YJ, et al. RBF support vector machines for multi-class overlapping face recognition. Control Engineering of China, 2019, 26(4): 773-776.
[14] Li YT. Improved classification algorithm based on support vector machine. Computer Systems & Applications, 2019, 28(10): 145-151.
[15] Grother PJ. NIST special database 19. https:///srd/nist-special-database-19. (2010-08-27) [2019-04-27].
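The ray-casting test of Section 1.2 above, which keeps a fingertip candidate only if a rightward horizontal ray crosses the skin contour an odd number of times, can be sketched in pure Python. A polygonal contour is assumed (in practice the contour would come from contour detection on the binarized skin image).

```python
def point_in_contour(pt, contour):
    """Even-odd rule: cast a horizontal ray to the right and count crossings.

    contour: list of (x, y) vertices of a closed polygon (the skin outline).
    An odd number of crossings means pt lies inside the contour (a valid
    fingertip candidate); an even number means background clutter.
    """
    x, y = pt
    crossings = 0
    n = len(contour)
    for i in range(n):
        (x1, y1), (x2, y2) = contour[i], contour[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge meets that horizontal line.
            x_hit = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_hit > x:          # crossing lies to the right of pt
                crossings += 1
    return crossings % 2 == 1
```

For example, with a square skin contour, a point inside sees one crossing (odd, kept) and a point to its right sees none (even, rejected), matching Figs. 4a and 4b.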

Design and Implementation of an Industrial Robot Teaching System Based on Natural Gesture Interaction

With the development of human-computer interaction technology, teaching robots through natural interaction has become feasible; natural gesture interaction is a new HCI technique that is fast, convenient, and imposes a low cognitive load [1]. Teaching industrial robots through natural gestures offers a low learning cost, high precision, and a simple interaction process: operators need neither a deep understanding of robot control nor robot-programming expertise, and can complete robot control and teaching with simple natural gestures alone. This work combines natural gesture interaction with industrial robot teaching: a gesture/pose sensor acquires the position and orientation of the hand and transmits them to the industrial robot to position the manipulator.

(1. South China University of Technology, Guangzhou 510006; 2. Guangdong Institute of Intelligent Manufacturing, Guangzhou 510070)

Abstract: An industrial robot teaching scheme based on natural gesture interaction is proposed. A Leap Motion sensor collects hand data; the palm coordinates are dynamically mapped into the robot arm's coordinate system, so that the arm follows the palm position in real time, while a clustering algorithm recognizes the gesture type to drive the end-effector gripper in grabbing, placing, stopping, and other actions. An experimental teaching platform was further built around an Epson SCARA robot; the results show good teaching performance, with an error ...

The teaching system consists of a master data acquisition and processing module and a slave robot module. The master side, built from a PC and the Leap Motion somatosensory sensor, handles gesture data acquisition, data optimization, gesture recognition, and data storage; the slave side, comprising the robot controller, manipulator, end gripper, gripper controller, and an Arduino module, performs the various task actions triggered by the master's action signals. Master and slave communicate over a USB interface. In the data-processing flow, palm-position mapping and pose mapping run simultaneously and independently, so that during teaching the position control of the manipulator and the pose control of the end gripper do not interfere, and grabbing, placing, and similar operations are possible at any position in the robot's workspace. The system was developed with Visual Studio 2015; every 0.45 s the control program sends position and gesture-class data to the controlled robot, while the data ...

Research on Deep-Learning-Based Gesture Recognition and Its Application in Virtual Experiments


Abstract: This paper explores deep-learning-based gesture recognition and its application in virtual experiments.

Deep learning has become a very powerful tool in computer vision, applied widely to image recognition and video analysis in particular.

Gesture recognition, an important research direction in computer vision, has significant application value for human-computer interaction, smart homes, virtual reality, and other fields.

The paper first briefly introduces gesture recognition and the relevant deep learning techniques, then details deep-learning-based gesture recognition algorithms, including convolutional neural networks, recurrent neural networks, and autoencoders.

Commonly used gesture datasets and evaluation metrics are also introduced to better assess the performance of the different algorithms.

Finally, the designed gesture recognition system is applied to virtual experiments for experimental validation.

The results show that the designed deep-learning-based gesture recognition algorithm recognizes gestures well, with high accuracy and good stability.

This research offers new ideas and methods for the development of virtual reality technology.

Keywords: gesture recognition; deep learning; convolutional neural network; recurrent neural network; autoencoder; virtual experiment.

Abstract
This paper aims to explore research on gesture recognition based on deep learning and its application in virtual experiments. Deep learning has become a very powerful tool in the field of computer vision, especially in image recognition and video analysis. Gesture recognition, as an important research direction in computer vision, has important application value for human-computer interaction, smart homes, virtual reality, and other fields. The paper first briefly introduces the relevant technologies of gesture recognition and deep learning, then elaborates the algorithms of gesture recognition based on deep learning, including the convolutional neural network, recurrent neural network, and autoencoder, and introduces commonly used gesture datasets and evaluation indicators for better evaluation of the different algorithms. Finally, the designed gesture recognition system is applied to virtual experiments for experimental verification. The results show that the deep-learning-based algorithm achieves gesture recognition well, with high accuracy and good stability, providing new ideas and methods for the development of virtual reality technology.
Keywords: gesture recognition; deep learning; convolutional neural network; recurrent neural network; autoencoder; virtual experiment

In this study, a gesture recognition system was designed around three deep models. First, a CNN extracts features from the gesture images, effectively reducing the complexity and dimensionality of the data; an RNN then models the temporal information of the gesture sequence; finally, an autoencoder reduces the dimensionality of the feature vectors, improving recognition accuracy while lowering computational cost. Applied to virtual experiments, the system achieved high accuracy and good stability in recognizing gestures, and showed advantages in accuracy and robustness over traditional methods.

The deep learning approach also opens opportunities for more advanced gesture recognition systems. The present study covers only hand gestures; extending it to body gestures could yield more immersive virtual reality experiences in which users interact with the virtual environment using their entire body. In virtual reality specifically, users currently interact through handheld controllers; with gesture recognition they could instead use natural hand movements, giving a more intuitive experience that more closely resembles real-world interaction, while also reducing the cost and complexity of VR systems by eliminating expensive handheld controllers.

Deep-learning-based gesture recognition also has potential applications in healthcare. Individuals with limited mobility due to disabilities or injuries could use it to interact with their environment and perform everyday tasks without physical assistance, greatly improving their quality of life and independence. Patients with spinal cord injuries who cannot use traditional input devices such as keyboards or mice could control their computers with natural hand movements, and patients with Parkinson's disease or other conditions affecting motor control could drive assistive devices such as wheelchairs, prosthetic limbs, or even robotic exoskeletons.

Several challenges must still be addressed before such systems become mainstream. One major challenge is the need for large amounts of diverse, labeled training data to recognize a wide range of gestures; collecting and labeling these data is time-consuming and expensive, and insufficient diversity risks bias. This could be addressed by developing algorithms capable of learning from smaller datasets or by using data augmentation to artificially enlarge the dataset. Another challenge is the need for real-time processing, especially where immediate feedback is required; deep models are computationally intensive, so efficient algorithms and specialized hardware such as graphics processing units (GPUs) or field-programmable gate arrays (FPGAs) are needed. There are also challenges related to noise and variability in the data, such as changes in lighting or variations in hand shape and position, which must be overcome to build accurate and reliable systems for a wide range of applications.

In conclusion, the development of deep-learning-based gesture recognition systems has the potential to revolutionize the way we interact with technology, especially in virtual reality and healthcare applications. Further research and development could lead to more advanced and sophisticated systems that provide a more immersive and natural human-computer interaction experience, and more applications of this technology are likely to emerge in the coming years.

Sample essay: Research on Static Gesture Recognition Algorithms Based on Deep Learning (2024)


Research on Static Gesture Recognition Algorithms Based on Deep Learning, Part 1

I. Introduction
With the rapid development of artificial intelligence, deep learning is applied ever more widely in computer vision.

Static gesture recognition, an important means of human-computer interaction, has broad application prospects.

This paper studies deep-learning-based static gesture recognition algorithms, aiming to improve recognition accuracy and real-time performance and to provide technical support for practical applications.

II. Background and current research
Static gesture recognition is a research focus in computer vision; traditional methods rely mainly on hand-crafted feature extraction and classifier design.

These methods, however, are sensitive to illumination, background, hand pose, and similar factors, limiting recognition accuracy.

In recent years, with the development of deep learning, convolutional neural networks (CNNs) have achieved notable results in static gesture recognition.

By learning from large amounts of data, deep learning algorithms can extract gesture features automatically, improving the accuracy and robustness of recognition.

III. Algorithm principle and implementation
The proposed deep-learning-based static gesture recognition algorithm mainly uses a convolutional neural network for feature learning and classification. The implementation steps are:

1. Dataset preparation: collect an image dataset covering multiple gestures and preprocess it (normalization, denoising, etc.).

2. Network construction: design a suitable network structure with convolution, pooling, and fully connected layers; by learning from large amounts of data, the network extracts gesture features automatically.

3. Training: train the network on the labeled gesture images, optimizing its parameters by backpropagation.

4. Testing and evaluation: test the trained network on a test set, evaluating accuracy, recall, and other performance metrics.

5. Real-time recognition: deploy the trained network in real scenarios for real-time static gesture recognition.
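Steps 1-4 can be condensed into a small runnable sketch. To keep it self-contained, a linear softmax classifier on synthetic "gesture feature" vectors stands in for the CNN; all data, sizes, and hyperparameters below are artificial.

```python
import numpy as np

rng = np.random.default_rng(1)

# Step 1 (data): synthetic 16-D feature vectors for 3 gesture classes.
n_cls, dim, n_per = 3, 16, 60
means = rng.normal(scale=3.0, size=(n_cls, dim))
X = np.vstack([means[c] + rng.normal(size=(n_per, dim)) for c in range(n_cls)])
y = np.repeat(np.arange(n_cls), n_per)

# Step 2 (model): a softmax classifier standing in for the CNN.
W = np.zeros((dim, n_cls))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Step 3 (training): batch gradient descent on cross-entropy.
onehot = np.eye(n_cls)[y]
for _ in range(200):
    P = softmax(X @ W)
    W -= 0.1 * X.T @ (P - onehot) / len(X)

# Step 4 (evaluation): accuracy on the training pool; a held-out
# test split would be used in practice.
acc = (softmax(X @ W).argmax(axis=1) == y).mean()
```

Step 5 (real-time recognition) would simply apply `softmax(x @ W).argmax()` to each incoming feature vector.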

IV. Experiments and analysis
Experiments were carried out on several public datasets, with image data under different illumination, backgrounds, and hand poses. The results show that the deep-learning-based static gesture recognition algorithm has high accuracy and robustness. In detail:

1. Accuracy: across the datasets, the proposed algorithm achieves high accuracy on the static gesture recognition task, outperforming traditional hand-crafted-feature methods.

2. Robustness: the algorithm is strongly robust to interference from illumination, background, and hand pose, recognizing static gestures accurately in complex environments.

3. Real-time performance: with an optimized network structure and algorithmic pipeline, the algorithm also performs well in real time, meeting the needs of practical applications.

Human Pose Estimation and Action Recognition in Computer Vision


Computer vision refers to the intelligent analysis and understanding of images and video by computers.

Within computer vision, human pose estimation and action recognition form an important research direction, whose goal is to identify and analyze human poses and actions with computer vision algorithms, enabling intelligent analysis and understanding of human motion.

Human pose estimation means accurately estimating pose information from images or video: the positions of the body's joints, posture angles, and joint motion trajectories.

This technology is valuable in many application areas, such as human-computer interaction, augmented reality, virtual reality, motion analysis, and medical rehabilitation.

The key problem in pose estimation is detecting and localizing the body's joints accurately.

Current pose estimation is based mainly on deep learning and convolutional neural networks: trained on large amounts of labeled data, deep models learn feature representations of the joints and predict the human pose accurately.
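A common concrete form of this prediction is heatmap decoding: the model outputs one confidence map per joint, and the peak of each map gives that joint's pixel coordinates. A minimal version with synthetic Gaussian heatmaps (all values below are hypothetical):

```python
import numpy as np

def decode_heatmaps(heatmaps):
    """Recover (x, y) pixel coordinates from per-joint heatmaps.

    heatmaps: (J, H, W) array, one confidence map per joint.
    Returns a (J, 2) array of integer peak coordinates.
    """
    J, H, W = heatmaps.shape
    flat_idx = heatmaps.reshape(J, -1).argmax(axis=1)
    ys, xs = np.unravel_index(flat_idx, (H, W))
    return np.stack([xs, ys], axis=1)

# Synthetic example: Gaussian blobs centered at known joint positions.
H, W = 64, 48
joints = np.array([[10, 20], [30, 5], [40, 60]])   # (x, y) per joint
yy, xx = np.mgrid[0:H, 0:W]
maps = np.stack([np.exp(-((xx - x) ** 2 + (yy - y) ** 2) / 8.0)
                 for x, y in joints])
coords = decode_heatmaps(maps)
```

Real systems refine the integer argmax (e.g., with a sub-pixel offset), but the decoding idea is the same.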

Action recognition means identifying and analyzing human actions from images or video with computer vision techniques.

Human movement carries rich semantic information that conveys intent and emotion, so action recognition is widely applied in human-computer interaction, video surveillance, motion analysis, intelligent driving, and other fields.

Its key problem is extracting useful action features from video sequences and classifying and recognizing them with machine learning algorithms. Deep-learning approaches have made remarkable progress here: trained on large-scale labeled data, deep models learn high-level representations of actions and classify and recognize them accurately.

Human pose estimation and action recognition are interrelated research directions: in many application scenarios, accurate pose estimation is the basis of action recognition, since estimated poses yield features with richer semantic information for classifying and recognizing actions.

Both technologies are widely applied in image and video analysis, for example in human-computer interaction, motion analysis, and intelligent driving, helping machines understand human actions and intentions for more intelligent interaction and understanding. They also face challenges, such as pose estimation and action recognition in multi-person scenes and the demand for real-time performance.

In summary, human pose estimation and action recognition are important research directions in computer vision.

Sample essay: Research on Key Technologies of Posture and Gesture Sensing and Computing Based on Deep Machine Learning (2024)


Research on Key Technologies of Posture and Gesture Sensing and Computing Based on Deep Machine Learning, Part 1

I. Introduction
With the rapid development of artificial intelligence, deep machine learning is applied ever more widely across fields.

Posture and gesture sensing, as an important means of human-computer interaction, matters greatly for making interaction natural and convenient.

This paper studies the key technologies of posture and gesture sensing and computing based on deep machine learning, providing theoretical support and reference for research and applications in related fields.

II. The importance of posture and gesture sensing
Posture and gestures are important non-verbal information in human communication, intuitive and vivid.

In HCI, capturing and analyzing a person's posture and gestures enables more natural and convenient interaction.

Deep-learning-based posture and gesture sensing can capture, analyze, and understand human movement in real time, improving the naturalness and intelligence of human-machine interaction.

III. Applications of deep machine learning in posture and gesture sensing
Deep machine learning plays an important role here: by building deep neural network models, human movements can be recognized and predicted precisely.

Specifically, its applications include the following aspects:

1. Data preprocessing: image processing, video analysis, and related techniques preprocess the captured human-movement data and extract useful feature information.

2. Model construction: deep neural network models, trained on large amounts of data, automatically extract and recognize the features of human movements.

3. Action recognition: the trained models identify and analyze body movements, recognizing and predicting them precisely.

4. Real-time interaction: feeding the recognition results back to the computer or other devices enables real-time human-machine interaction.

IV. Key technologies
The key research topics in deep-learning-based posture and gesture sensing include:

1. Dataset construction: building rich and diverse human-movement datasets to support model training and testing.

2. Model optimization: improving recognition accuracy and generalization by optimizing the network structure, tuning parameters, and similar measures.

3. Real-time processing: raising the system's real-time processing capability through algorithmic and hardware optimization, enabling fast recognition of and response to human movements.

4. Multimodal fusion: combining posture and gesture sensing with other technologies (speech recognition, natural language processing, etc.) for a more complete and natural mode of human-machine interaction.
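Point 4's multimodal fusion is often done at the score level: each modality's classifier emits class probabilities, which are combined by a weighted average. The modalities, probabilities, and weights below are made up purely for illustration.

```python
import numpy as np

def fuse_scores(score_lists, weights):
    """Weighted score-level fusion of per-modality class probabilities.

    score_lists: sequence of (C,) probability vectors, one per modality.
    weights: relative trust in each modality (normalized internally).
    Returns the fused distribution and the winning class index.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    fused = sum(wi * np.asarray(s) for wi, s in zip(w, score_lists))
    return fused, int(np.argmax(fused))

# Gesture model is unsure between classes 0 and 1; speech favors class 1.
gesture = [0.48, 0.46, 0.06]
speech = [0.20, 0.70, 0.10]
fused, label = fuse_scores([gesture, speech], weights=[0.6, 0.4])
```

Because both inputs are probability distributions and the weights are normalized, the fused vector is itself a valid distribution.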

Research on Deep-Learning-Based Hand Action Recognition Technology


I. Introduction
With the continuous development of artificial intelligence, deep learning has become one of the most promising fields, achieving major breakthroughs in computer vision, speech recognition, and other areas.

This paper discusses research on deep-learning-based hand action recognition.

II. Hand action recognition
1. Why hand actions matter
The hand is one of the body's important organs, and hand actions are among the abilities people use most frequently in daily life.

Accurate recognition of hand actions is therefore crucial for the development of products such as smart wristbands and smart gloves.

2. Recognition methods
Hand action recognition methods include traditional machine learning methods and deep-learning-based methods.

Traditional machine learning requires manual feature engineering, extracting features by hand, and is relatively inefficient.

Deep-learning-based methods need no feature engineering, learn features automatically, and achieve higher accuracy and efficiency.

III. Deep-learning-based hand action recognition
1. Deep learning architecture
Deep-learning-based hand action recognition usually combines a convolutional neural network (CNN) with a recurrent neural network (RNN).

The CNN is a neural network model specialized for image processing; it obtains image features mainly through stacked convolution kernels and pooling operations. The RNN is a neural network model applied mainly to sequential data; its defining property is memory: information from past time steps is passed to the neurons of the current step.

In hand action recognition, the CNN is responsible for extracting features from the images, while the RNN models the temporal information.

2. Dataset construction
Dataset quality plays a crucial role in the accuracy of hand action recognition.

The dataset should cover hand actions of different people, hand actions under different lighting, angles, and distances, and actions of different speeds and amplitudes.

3. Experimental results
The experiments use a hand action dataset with 9 action classes: fist, open palm, move left, move right, move forward, move backward, move up, move down, and still.

The dataset is split into a training set of 8,000 images and a test set of 2,000 images.

The results show that the proposed deep-learning-based hand action recognition technique performs well on this dataset, reaching an accuracy of 97.12%.
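Evaluations like the one above (train/test split plus overall and per-class accuracy) are typically reported through a confusion matrix. A minimal helper, with hypothetical labels and predictions:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows: true class, columns: predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Hypothetical predictions for a 3-class subset of the action labels.
y_true = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 0, 1, 1, 1, 1, 2, 2, 2, 0])
cm = confusion_matrix(y_true, y_pred, 3)
overall = np.trace(cm) / cm.sum()           # overall accuracy
per_class = cm.diagonal() / cm.sum(axis=1)  # recall per action class
```

Off-diagonal entries show exactly which actions are confused with which, which is more informative than the single overall figure.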

IV. Summary
This paper introduced the importance of hand action recognition and the difference between traditional machine learning methods and deep-learning-based methods.

Simulation Research on Dynamic Gesture Contour Extraction for Human-Computer Interaction


PANG Lei; CHEN Qi-xiang

Abstract: To improve the precision of dynamic gesture recognition, experiments were conducted on gesture feature extraction and dynamic gesture recognition. Gesture images are processed with a skin-color-model-based segmentation method, and gesture contour features and gesture motion features are extracted separately. A gesture recognition algorithm based on an HMM-NBC model is proposed; ten gestures are defined and a dynamic gesture sample library is built for recognition experiments, with comparison against a support-vector-machine gesture recognition algorithm. The study shows that the HMM-NBC algorithm recognizes gestures markedly faster than the SVM algorithm and has a high recognition rate, averaging 88.8%.

Journal: Machinery Design & Manufacture, 2019(001), pp. 253-256
Keywords: contour extraction; human-computer interaction; dynamic gesture; hand-shape segmentation
CLC: TH16; TP391

1 Introduction
In recent years, with the continuous development of artificial intelligence, vision-based gesture interaction has gradually become a hot topic in human-computer interaction research [1]. Gesture recognition involves computer graphics, neural networks, artificial intelligence, image processing, and other disciplines, and is an emerging product of modern computing technology [2].

Vision-based gesture recognition is easily affected by complex backgrounds and ambient illumination, which limits the number of recognizable gesture types and lowers the recognition rate [3]; research on gesture detection and recognition that raises the recognition rate therefore has real practical significance.

Reference [4] proposed a fingertip detection algorithm based on pixel classification, with a recognition rate of 80.31%; reference [5] used the dynamic time warping algorithm and evaluated the model's recognition rate and system performance, reaching 83.25%; reference [6] proposed complex dynamic gesture recognition with an HMM-FNN model, combining speed, orientation, and position features and quantizing directions into 12 angular sectors to obtain trajectory direction-angle features.

This paper proposes a gesture recognition algorithm based on an HMM-NBC model, defines ten gestures, builds a dynamic gesture sample library, and studies gesture recognition; the work has theoretical and practical value for dynamic gesture recognition in several respects.

2 Dynamic gesture contour extraction
Gesture feature extraction is the process of representing the original data as effective feature vectors and is a key step in classification and recognition [7-8]. Dynamic gesture contour extraction can be seen as the combination of hand motion and hand-shape change, and the quality of the extracted features strongly affects the classifier's recognition rate; therefore gesture contour features and gesture motion features are extracted separately.
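The 12-sector direction quantization mentioned above (used by reference [6] to turn a trajectory into a discrete observation sequence for an HMM) can be sketched as follows; the function name is an assumption.

```python
import math

def quantize_directions(points, n_bins=12):
    """Map a 2-D trajectory to a sequence of direction codes in [0, n_bins).

    points: list of (x, y) trajectory points. Each consecutive pair becomes
    the index of the angular sector (of width 360/n_bins degrees) that
    contains its motion direction, measured counterclockwise from +x.
    """
    codes = []
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        ang = math.atan2(y2 - y1, x2 - x1) % (2 * math.pi)
        codes.append(int(ang / (2 * math.pi / n_bins)) % n_bins)
    return codes
```

The resulting code sequence is exactly the kind of discrete symbol stream an HMM-based recognizer consumes as observations.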

A Gesture Recognition Algorithm Based on Convolutional Neural Networks


ZHU Wen-Wen, YE Xi-Ning (School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237)
Journal: Journal of East China University of Science and Technology (Natural Science Edition), 2018, 044(002)

Abstract: Gesture recognition is a hot topic in human-computer interaction, prosthesis control, rehabilitation, and related fields. To satisfy the real-time and accuracy requirements of gesture recognition, this paper analyzes LeNet-5 and proposes the LeNet-A network, suited to acceleration signals. LeNet-A takes the low-cost acceleration signal as the input of the network, which has six layers: two convolution layers, two pooling layers, one fully connected layer, and one output layer. Considering the particular complexity of acceleration-based gesture recognition, a Dropout layer is introduced and the size and number of convolution kernels are changed; the activation function is replaced by the ReLU function and the classifier by a Softmax classifier. The Ninapro database, an available public resource for gesture recognition research with more than 50 gestures from intact and amputated subjects, is chosen as the dataset. Before classification, the signals are processed by low-pass filtering, down-sampling, data balancing, and training/test-set segmentation. The simulation results show that the proposed network has a great advantage for gesture recognition in both intact subjects and amputees, attaining average accuracies of 90.37% and 79.99% respectively, increases of about 12% and 31% compared with the best results in the existing literature. A test on one subject with and without the rest gesture shows 93.60% accuracy without it, 3.1% higher than with it, so the rest gesture has a relatively large influence on the overall classification accuracy. The latter experiment also illustrates that the proposed network not only improves gesture recognition accuracy distinctly, with
characteristics of good real-time,but also has quite strong robustness to noise.These further verify that the proposed method in this paper is quite significant for gesture recognition.%手势识别是人机交互、智能假肢、医疗康复等领域的研究热点.为了满足手势识别实时性和准确性的需求,本文以成本较小的加速度信号作为数据,在对LeNet-5卷积神经网络进行分析的基础上,提出了一种适合加速度信号的LeNet-A网络.该网络针对基于加速度的手势分类特有的复杂性,增加Dropout层,改变卷积核大小、卷积核数量、激活函数以及分类器.在Ninapro数据集上的实验结果表明,该网络在正常受试者和截肢者的识别率上均表现出很大的优势,平均精度分别为90.37% 和79.99%,比目前最佳分类器提升了12% 和31% 左右.该网络还具有较好的实时性和抗噪性.【总页数】10页(P260-269)【作者】朱雯文;叶西宁【作者单位】华东理工大学信息科学与工程学院,上海200237;华东理工大学信息科学与工程学院,上海200237【正文语种】中文【中图分类】TP391【相关文献】1.基于卷积神经网络的手势识别算法设计与实现 [J], 张斌;孙旭飞;吴一鹏2.基于组合色彩空间和卷积神经网络的手势识别算法 [J], 纪国华3.基于卷积神经网络与CUDA加速计算的手势识别算法应用研究 [J], 姜洋洋4.基于卷积神经网络的手势识别算法研究 [J], 程冉;史健芳5.基于深度卷积神经网络和支持向量机的手势识别算法 [J], 闫俊伢;吴迪;滕华因版权原因,仅展示原文概要,查看原文内容请购买。

Gesture Recognition Based on Manifold Learning

Heeyoul Choi, Brandon Paulson, and Tracy Hammond
Dept. of Computer Science, Texas A&M University
3112 TAMU, College Station, TX 77843-3112, USA
{hchoi,bpaulson,hammond}@

Abstract. Current feature-based gesture recognition systems use human-chosen features to perform recognition. Effective features for classification can also be automatically learned and chosen by the computer. In other recognition domains, such as face recognition, manifold learning methods have been found to be good nonlinear feature extractors. Few manifold learning algorithms, however, have been applied to gesture recognition. Current manifold learning techniques focus only on spatial information, making them undesirable for use in the domain of gesture recognition, where stroke timing data can provide helpful insight into the recognition of hand-drawn symbols. In this paper, we develop a new algorithm for multi-stroke gesture recognition, which integrates timing data into a manifold learning algorithm based on a kernel Isomap. Experimental results show it to perform better than traditional human-chosen feature-based systems.

Keywords: Sketch Recognition, Manifold Learning, Kernel Isomap.

1 Introduction

Sketched gestures are a natural form of input for many domains, such as drawing mathematical symbols, graphs, binary trees, finite state machines, electrical circuit diagrams, military course of action diagrams, and many more. To allow the computer to understand these diagrams, gesture recognition systems have been built for a large number of domains [1,2,3,4,5,6,7,8,9,10,11]. Many techniques for gesture recognition have been developed; however, current gesture recognition systems still struggle to achieve high recognition accuracy while providing drawing freedom.

As in other recognition domains (such as image and speech), feature selection is crucial for efficient and qualified performance in gesture recognition. Previous research has provided several suggestions for good feature
sets [12,13,14,11]. Rubine suggested 13 features based on stylistic drawing features and time [12]. Long modified Rubine's features by adding 11 new features and removing 2 time-based features [13]. Also, ink features were proposed in [11]. These feature sets were well designed, but they rely on manually entered methods to extract them, which becomes tiresome. As Mahoney says in [15], designing features manually is a tedious task and may be extended only by the designer's intuition, which is subjective. Moreover, feature design is sensitive to problems such as jitters. More interestingly, machine learning research in other domains has shown that computers are able to select their own features with similar results. As stated in [15], some researchers have tried to learn some features automatically as well as manually, which leads to a mixture of hand-chosen and machine-generated features. Human-generated features are not easily extendable, whereas it is trivial to add some dimensions in feature space for a computer. In addition, hand-chosen features are extracted based on human-observable properties, whereas machine-generated features can be optimized for clear advantages in classification.

N. da Vitoria Lobo et al. (Eds.): SSPR & SPR 2008, LNCS 5342, pp. 247–256, 2008. © Springer-Verlag Berlin Heidelberg 2008

In machine learning, Hidden Markov Models (HMMs) [16], which use temporal and spatial structure, have been widely used for gesture recognition. Several sketch recognition systems have been built which use HMMs to help predict stroke ordering [17,18,19]. Bayesian networks have also been applied [20] to recognize multi-stroke shapes using LADDER shape descriptions [21].
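To make the contrast with machine-generated features concrete, a few hand-chosen stylistic measurements of the kind found in these feature sets can be sketched as follows. The function and the particular selection below are illustrative, not the exact published definitions:

```python
import math

def stylistic_features(points):
    """A few hand-chosen stylistic features in the spirit of
    Rubine-style feature sets.  `points` is a time-ordered list of
    (x, y) samples for one stroke; the selection is illustrative."""
    (x0, y0), (x2, y2) = points[0], points[2]
    # Cosine and sine of the initial angle of the gesture.
    d = math.hypot(x2 - x0, y2 - y0) or 1.0
    cos0, sin0 = (x2 - x0) / d, (y2 - y0) / d
    # Length of the bounding-box diagonal.
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    diag = math.hypot(max(xs) - min(xs), max(ys) - min(ys))
    # Total length of the stroke path.
    length = sum(math.hypot(bx - ax, by - ay)
                 for (ax, ay), (bx, by) in zip(points, points[1:]))
    return [cos0, sin0, diag, length]
```

A feature-based recognizer then classifies the resulting fixed-length vectors; the point made above is that such dimensions can instead be generated automatically, and extended at will.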
Some dissimilarity-based approaches have been proposed in [22] with image-based methods and in [23] with graph-based methods.

To our knowledge, however, manifold learning has not yet been applied to gesture recognition, despite evidence that manifold learning methods are good nonlinear feature extractors. Manifold learning is an effective method for representation that works by recovering meaningful low-dimensional structures hidden in high-dimensional data [24,25,26,27,28]. Previous research has used manifold learning to analyze 3D hand gestures [29], but manifold learning has not yet been applied to gesture (or sketch) recognition. In this work we apply the kernel Isomap manifold learning method to classify sketch data, because it has a projection property, which provides the ability to project (map) new test data into (onto) the same feature space as the training data. Kernel Isomap requires a dissimilarity matrix to find a low-dimensional mapping from which it produces a feature set. We introduce a new algorithm to measure the dissimilarity between shapes that is based on both spatial and temporal information. We also show how this algorithm can be modified to accommodate shapes drawn with multiple strokes. For recognizing shapes, we compare our method with the widely used Rubine method [12] and the $1 recognizer [30], a recently proposed automatic sketch recognizer.

2 Kernel Isomap

Manifold learning involves inducing a smooth, nonlinear, low-dimensional manifold from a set of data points. Recently, various mapping methods (for example, see [24,25,26,27]) have been developed in the machine learning community, and their wide applications have started to draw attention in pattern recognition and signal processing. Isomap is one of the representative isometric mapping methods; it extends metric multidimensional scaling (MDS) by using Dijkstra's geodesic distances (shortest paths) on a weighted graph instead of Euclidean distances [25].

The geodesic distance matrix used in Isomap can be interpreted as a kernel matrix. However, the kernel matrix based on the doubly centered geodesic distance matrix is not always positive semi-definite. Kernel Isomap [31] exploits a constant-shifting method such that the geodesic distance-based kernel matrix is guaranteed to be positive semi-definite, i.e., a Mercer kernel [32] matrix. This kernel Isomap has a generalization property, enabling us to project test data points onto an associated low-dimensional manifold using a kernel trick, as in kernel PCA [33], whereas, in general, most embedding methods (including Isomap, LLE, and the Laplacian eigenmap) do not have such a property. See [31] for the details.

3 Implementation

3.1 Dissimilarity from Sketch Data

To apply the kernel Isomap to sketch classification, what we need is not the raw data points but a dissimilarity matrix, which is enough to find a low-dimensional space. The first step in our algorithm is to scale each character to the same width and height. Then, we need a consistent (and large) number of points in each sketch. Thus, we interpolate points between any two consecutive points that are adjacent in time but far away in distance, to ensure that each gesture has the same number of points.

Dissimilarity. To calculate the dissimilarity matrix, we calculate the sum of squared distances between points of the same order in each sketch, as in Fig. 1.
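The scaling, resampling, and point-wise distance just described can be sketched as follows. The helper names and the choice of 64 resampled points are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def normalize(points, n=64):
    """Scale a sketch to the unit box (same width and height) and
    resample to n points spaced evenly along the stroke path,
    preserving the time order of the points."""
    pts = np.asarray(points, dtype=float)
    mins, maxs = pts.min(axis=0), pts.max(axis=0)
    span = np.where(maxs - mins > 0, maxs - mins, 1.0)
    pts = (pts - mins) / span
    # Arc-length position of each original point.
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])
    t = np.linspace(0.0, s[-1], n)
    # Linear interpolation along the stroke.
    return np.column_stack([np.interp(t, s, pts[:, 0]),
                            np.interp(t, s, pts[:, 1])])

def squared_dissimilarity(a, b, n=64):
    """D_ij = sum_k (d_k)^2 over points of the same (time) order."""
    pa, pb = normalize(a, n), normalize(b, n)
    return float(((pa - pb) ** 2).sum())
```

Because both sketches are scaled to the same box and resampled to the same count, the point-wise comparison is meaningful even when the originals differ in size or sampling rate.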
This is the squared dissimilarity in our algorithm. The dissimilarity between the i-th sketch and the j-th sketch, D_{ij}, is then given by

    D_{ij} = \sum_{k=1}^{S} (d_k)^2,    (1)

where d_k is the distance between the k-th points of the two sketches, and S is the number of points in each sketch.

In addition to normalizing each stroke by resampling, we use different weights for different points. When we connect the two parts of '4' or '5', the connection causes confusion in calculating the distance. If we use different weights so that the fake points have less importance than the real parts of the stroke, we can eliminate some of the confusion. Finally, the dissimilarity between the two sketches is given by

    D_{ij} = \sum_{k=1}^{S} (d_k)^2 w_{ik} w_{jk},    (2)

where w_{ik} is the normalized weight of the k-th point in the i-th sketch, given by

    w_{ik} = 1/Z_i if the point is original in the stroke, and w_{ik} = \varepsilon/Z_i otherwise,    (3)

where Z_i is the normalization term and \varepsilon is a constant between 0 and 1. We call a kernel Isomap equipped with this weighted distance a weighted kernel Isomap. The weighted kernel Isomap works well with multi-stroke shapes. For example, some characters, such as the digit '7', might be drawn as one stroke or as two strokes, even by one user. In this case, the weighted kernel Isomap works well, since the weights smooth the boundary between single-stroke and multi-stroke drawings, so that the distance between two instances of the same character can be closer.

Some dissimilarities for sketch data have been proposed in [22]. However, those dissimilarities are image-based, while our proposed dissimilarity is based on the sketch data, including time information.

3.2 Classification

After obtaining features through the kernel Isomap, we use the k-nearest-neighbor (kNN) method, a simple classifier, to check how good the features are. In kNN, we use the Mahalanobis distance instead of the Euclidean distance between points, since the variances of the Rubine features are too different from each other, and thus the Euclidean distance does not find any meaningful structure in Rubine's features. In the case of the kernel Isomap, on the other hand, the
variances of the features are similar to each other, so the Euclidean distance also shows good results. (Two notes: when the weighted kernel Isomap and the kernel Isomap work in the same way, we simply write "(weighted) kernel Isomap". Also, applying kNN in the projected space does not give any significant classification benefit compared to applying it directly in the input data space; however, the kNN results are enough to show how much information for classification our methods retain after the projection to the feature space.)

4 Experiments

We carried out numerical experiments with three different data sets: (a) 26 small letters in English written by one user, where all characters are drawn with one stroke and each letter has 25 data points; a third of the examples of each letter were drawn two weeks later, so there is a reasonable amount of variability in the example data; (b) 10 digits from 0 to 9, drawn by another user, where each class has 25 characters; and (c) 8 different mathematical symbols ('+', '-', 'x', '/', '=', 'sin', 'cos' and 'tan'), drawn by 10 subjects, where each class for each person has 5 characters, each drawn with one stroke. In order to show the improved accuracy of our method, we compared our method with Rubine's algorithm and the $1 recognizer.

4.1 Accuracies

For more robust results, we executed 10-fold cross-validation 50 times. Figs.
2 and 3 show the classification hit rates on the three data sets, respectively, for four approaches: (1) Rubine's method, (2) the $1 recognizer, (3) the kernel Isomap and (4) the weighted kernel Isomap. To make the comparison fair, we extracted 13 features from the (weighted) kernel Isomap, the same number as Rubine's.

Fig. 2. For (a) alphabetical characters and (b) digits, the classification accuracies of four methods: (Method 1) the Rubine method, (Method 2) the $1 recognizer, (Method 3) the kernel Isomap and (Method 4) the weighted kernel Isomap.

Fig. 2 shows the boxplots of the 50 experiments for the English letter and digit sets. In these figures, the weighted kernel Isomap and kernel Isomap features are better than the Rubine features for classification, but similar to each other. Moreover, we can easily extend the number of features in the (weighted) kernel Isomap by taking more eigenvalues and eigenvectors of the kernel matrix, which might produce better performance, whereas the number of Rubine features is fixed at 13 and it is hard to add another feature to them. Note that in (b), some digits such as '4' and '5' are composed of 2 strokes. This verifies that the weights for points in strokes help classification, especially when characters are composed of multiple strokes.
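The kNN evaluation described in Sect. 3.2 — Mahalanobis distance over the extracted features, then a majority vote — can be sketched as follows. This is a minimal version; the small ridge term on the covariance is an added assumption to keep it invertible:

```python
import numpy as np

def knn_classify(train_X, train_y, x, k=3):
    """Classify feature vector x by a majority vote among its k
    nearest training points under the Mahalanobis distance."""
    train_X = np.asarray(train_X, dtype=float)
    # Inverse covariance of the training features; the ridge keeps
    # the matrix invertible when some features are near-constant.
    cov = np.cov(train_X, rowvar=False) + 1e-8 * np.eye(train_X.shape[1])
    inv_cov = np.linalg.inv(cov)
    diffs = train_X - np.asarray(x, dtype=float)
    # Squared Mahalanobis distance of x to every training point.
    d2 = np.einsum('ij,jk,ik->i', diffs, inv_cov, diffs)
    nearest = np.argsort(d2)[:k]
    labels, counts = np.unique(np.asarray(train_y)[nearest],
                               return_counts=True)
    return labels[np.argmax(counts)]
```

With nearly uncorrelated, similarly scaled features such as the kernel Isomap's, the covariance is close to diagonal and this metric behaves much like the Euclidean one, which matches the observation above.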
The average hit rates for (a) are 77.31%, 90.96%, 87.19% and 87.04%, respectively, and for (b) they are 91.80%, 97.28%, 95.90% and 97.80%, respectively. (The $1 recognizer does not extract any features; in effect, it uses all of the features.)

Fig. 3. For the mathematical symbols ('+', '-', 'x', '/', '=', 'sin', 'cos' and 'tan'), the classification accuracies of four methods: (Method 1) the Rubine method, (Method 2) the $1 recognizer, (Method 3) the kernel Isomap and (Method 4) the weighted kernel Isomap. The average hit rates are 60.46%, 82.41%, 92.77% and 92.41%, respectively.

The previous two experiments used one user's data set. The mathematical symbol data set, in contrast, was made by 10 different subjects. However, since each character in this data set was drawn with one stroke, there is no big difference between the kernel Isomap and the weighted kernel Isomap, as in the first experiment. In Fig. 3, the accuracies of our proposed methods are much better than Rubine's.

4.2 Feature Spaces

Figs. 4 and 5 show how good the features are. We plotted the 4 best features from the English letters data set. In Fig. 5, which is from the kernel Isomap, we can see that sketches in the same classes share similar feature values for most features, whereas in Fig. 4, which uses the Rubine feature set, just a few features are useful for classification.

Fig. 4. Four features from Rubine's set. Each axis represents one Rubine feature. Note that the features do not effectively separate the different gestures.

Fig. 5. Four features corresponding to the 4 largest eigenvalues of the kernel matrix. Each axis represents an automatically generated feature. Note that the features effectively separate the different gestures.

Fig. 6. Correlation matrices between the 13 features (red represents high correlation and each axis represents the 13 features): (Left) Rubine features; (Right) (weighted) kernel Isomap features. Note that in the figure on the right all of the automatically chosen Isomap features are highly
uncorrelated, meaning that they are much more effective at encoding distinguishing information.

We calculated the correlation matrix of each feature set (see Fig. 6). The (weighted) kernel Isomap features are uncorrelated, which means they are more efficient than the heavily correlated and redundant Rubine features (seen particularly between features 10 and 11).

5 Discussion

As shown in the figures above, the features from the (weighted) kernel Isomap are more accurate and more efficient for classification than Rubine's. However, there are two drawbacks to the (weighted) kernel Isomap approach. To get the features, a kernel Isomap requires a lot of training data, because manifold learning assumes that the data points are dense. In addition, it takes some time to extract those features during training. After training, however, time does not matter when testing new data points. Also, although Rubine's approach does not need dense data points for training, it too needs training data for classification. More importantly, as we gather more data points, the features of a (weighted) kernel Isomap become better, whereas Rubine's features do not.

The weighted kernel Isomap works well with multi-stroke shapes. Sometimes it is hard to say whether an input character is composed of one stroke or multiple strokes. In this case, the weighted kernel Isomap works well, since the weights smooth the boundary between single-stroke and multi-stroke drawings. There is some room to improve the classification hit rate by providing better weights. The reason why the (weighted) kernel Isomap is better than Rubine's for classification is that it tries to find the best features to represent the data set in an efficient way, whereas Rubine's features are already determined according to stylistic drawing properties. Therefore, the feature set of the kernel Isomap reflects domain- or context-dependent features.

With the letter data set, which has a large amount of orientation variability, the $1 recognizer is
slightly better than our method because of its rotation-invariant distance metric (see footnote 4). However, our method outperforms the $1 recognizer on the math data set, because that set contains much less rotational variability. Furthermore, some shapes, such as '+' and 'x', contain rotational ambiguity, which makes them hard to distinguish with $1. When a domain calls for rotation-invariant features, we can improve our technique by simply using the distance measure of the $1 recognizer. Another disadvantage of the $1 recognizer is that it does not extract any features, so it cannot improve any further (it is fixed in its method of classification), whereas our method can be integrated with other types of classifiers, such as a quadratic classifier or a support vector machine, which may improve overall performance. For the purposes of this work, we used a simple kNN classifier as proof that our manifold learning technique can be used to extract features from sketched data.

6 Conclusion

Gesture recognition is a natural form of human-computer interaction. As in other recognition problems, features are crucial for efficient and high-quality performance in gesture recognition. In general pattern recognition problems, machines can learn effective features which may perform better than hand-chosen ones in classification. Though manifold learning methods have been shown to be good nonlinear feature extractors in many research areas, only a few of them have been applied to gesture recognition. In this paper, we developed a new technique for gesture classification using a manifold learning approach that combines temporal and spatial information while handling multiple strokes with weighted distances. Experimental results confirmed the performance to be better than that of Rubine's feature set.
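For concreteness, the embedding at the core of the approach summarized above — geodesic distances over a neighborhood graph built from the dissimilarity matrix, double centering, and a correction to a positive semi-definite kernel — can be sketched as follows. The diagonal shift used here is a simplified stand-in for the constant-shifting method of [31], and the parameter defaults are illustrative:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path

def kernel_isomap_embed(D, n_neighbors=5, n_components=2):
    """Embed points given only a dissimilarity matrix D, in the
    spirit of kernel Isomap.  The neighborhood graph must be
    connected for the geodesic distances to be finite."""
    n = D.shape[0]
    # Neighborhood graph: keep each point's k smallest dissimilarities.
    W = np.zeros_like(D, dtype=float)
    for i in range(n):
        for j in np.argsort(D[i])[1:n_neighbors + 1]:
            W[i, j] = D[i, j]
    # Geodesic (shortest-path) distances on the graph.
    G = shortest_path(csr_matrix(W), method='D', directed=False)
    # Doubly centered geodesic kernel, as in classical MDS.
    H = np.eye(n) - np.ones((n, n)) / n
    K = -0.5 * H @ (G ** 2) @ H
    # Simplified PSD correction: shift the spectrum up by |lambda_min|
    # (a stand-in for the constant-shifting method of kernel Isomap).
    lam_min = np.linalg.eigvalsh(K)[0]
    if lam_min < 0:
        K += (-lam_min) * np.eye(n)
    vals, vecs = np.linalg.eigh(K)
    top = np.argsort(vals)[::-1][:n_components]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))
```

Each column of the result is one machine-generated feature; the experiments above take the leading 13 such columns as the feature vector for classification.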
Acknowledgements

One of the authors was supported by StarVision Technologies' student sponsorship program, and this work was funded in part by NSF IIS grant 0744150: Developing Perception-based Geometric Primitive-shape and Constraint Recognizers to Empower Instructors to Build Sketch Systems in the Classroom.

Footnote 4: It is computationally expensive to calculate rotation-invariant distances, so it takes much time even to classify a new point.

References

1. LaViola, J., Zeleznik, R.: MathPad2: A system for the creation and exploration of mathematical sketches. ACM Transactions on Graphics (Proceedings of SIGGRAPH) 23(3) (2004)
2. Stahovich, T., Davis, R., Shrobe, H.: Qualitative rigid body mechanics. Artificial Intelligence (2000)
3. Landay, J.A., Myers, B.A.: Sketching interfaces: Toward more human interface design. IEEE Computer 34(3), 56–64 (2001)
4. Forbus, K.D., Usher, J., Chapman, V.: Sketching for military course of action diagrams. In: Proceedings of IUI 2003 (2003)
5. Do, E.Y.L.: VR sketchpad - create instant 3D worlds by sketching on a transparent window. In: de Vries, B., van Leeuwen, J.P., Achten, H.H. (eds.) CAAD Futures 2001, pp. 161–172 (July 2001)
6. Forsberg, A.S., Dieterich, M.K., Zeleznik, R.C.: The music notepad. In: Proceedings of UIST 1998, ACM SIGGRAPH (1998)
7. Igarashi, T., Matsuoka, S., Tanaka, H.: Teddy: A sketching interface for 3D freeform design. In: SIGGRAPH 1999, pp. 409–416 (August 1999)
8. Mahoney, J.V., Fromherz, M.P.J.: Interpreting sloppy stick figures by graph rectification and constraint-based matching. In: Fourth IAPR Int. Workshop on Graphics Recognition, Kingston, Ontario, Canada (2001)
9. Muzumdar, M.: ICEMENDR: Intelligent capture environment for mechanical engineering drawing. Master's thesis, Massachusetts Institute of Technology (1999)
10. Hammond, T., Davis, R.: Tahuti: A geometrical sketch recognition system for UML class diagrams. In: AAAI Spring Symposium on Sketch Understanding, March 25-27, pp. 59–68 (2002)
11. Patel, R., Plimmer, B., Grundy, J., Ihaka, R.: Ink features for diagram recognition. In: Sketch Based Interfaces and
Modeling. IEEE, Eurographics (2007)
12. Rubine, D.: Specifying gestures by example. Computer Graphics 25(4), 329–337 (1991)
13. Long, A.C., Landay, J.A., Rowe, L.A., Michiels, J.: Visual similarity of pen gestures. In: Human Factors in Computing Systems (2000)
14. Sezgin, T.M., Stahovich, T., Davis, R.: Sketch based interfaces: Early processing for sketch understanding. In: Proceedings of the 2001 Perceptive User Interfaces Workshop (PUI 2001) (2001)
15. Mahoney, J.V., Fromherz, M.P.J.: Three main concerns in sketch recognition and an approach to addressing them. In: AAAI Spring Symposium on Sketch Understanding, Stanford, CA, pp. 105–112 (March 2002)
16. Rabiner, L.R., Juang, B.H.: An introduction to hidden Markov models. IEEE Trans. Acoustics, Speech, and Signal Processing Magazine 3, 4–16 (1986)
17. Sezgin, T.M.: Sketch Interpretation Using Multiscale Stochastic Models of Temporal Patterns. PhD thesis, Massachusetts Institute of Technology (May 2006)
18. Sun, Z., Jiang, W., Sun, J.: Adaptive online multi-stroke sketch recognition based on hidden Markov model. In: Yeung, D.S., Liu, Z.-Q., Wang, X.-Z., Yan, H. (eds.)
ICMLC 2005. LNCS, vol. 3930, pp. 948–957. Springer, Heidelberg (2006)
19. Muller, S., Eickeler, S., Rigoll, G.: Image database retrieval of rotated objects by user sketch. In: IEEE Workshop on Content-Based Access of Image and Video Libraries, p. 40 (1998)
20. Alvarado, C., Davis, R.: SketchREAD: A multi-domain sketch recognition engine. In: Proceedings of UIST 2004, pp. 23–32 (2004)
21. Hammond, T., Davis, R.: LADDER, a sketching language for user interface developers. Computers and Graphics 28, 518–532. Elsevier (2005)
22. Kara, L.B., Stahovich, T.F.: An image-based trainable symbol recognizer for sketch-based interfaces. In: Making Pen-Based Interaction Intelligent and Natural, Menlo Park, California, October 21-24. AAAI Fall Symposium, pp. 99–105 (2004)
23. Lee, W., Kara, L.B., Stahovich, T.F.: An efficient graph-based recognizer for hand-drawn symbols. Computers & Graphics 31, 554–567 (2007)
24. Seung, H.S., Lee, D.D.: The manifold ways of perception. Science 290, 2268–2269 (2000)
25. Tenenbaum, J.B., de Silva, V., Langford, J.C.: A global geometric framework for nonlinear dimensionality reduction. Science 290, 2319–2323 (2000)
26. Saul, L., Roweis, S.T.: Think globally, fit locally: Unsupervised learning of low dimensional manifolds. Journal of Machine Learning Research 4, 119–155 (2003)
27. Belkin, M., Niyogi, P.: Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation 15, 1373–1396 (2003)
28. de Silva, V., Tenenbaum, J.B.: Global versus local methods in nonlinear dimensionality reduction. In: Advances in Neural Information Processing Systems, vol. 15, pp. 705–712. MIT Press, Cambridge (2003)
29. Jenkins, O.C., Matarić, M.J.: A spatio-temporal extension to Isomap nonlinear dimension reduction. In: Proc. Int'l. Conf. Machine Learning, Banff, Canada (2004)
30. Wobbrock, J., Wilson, A., Li, Y.: Gestures without libraries, toolkits, or training: A $1 recognizer for user interface prototypes. In: Proc. of the 20th Annual ACM Symposium on User Interface Software and Technology, Newport, RI, USA (2007)
31. Choi, H., Choi, S.: Robust Kernel Isomap. Pattern
Recognition 40(3), 853–862 (2007)
32. Girolami, M.: Mercer kernel-based clustering in feature space. IEEE Transactions on Neural Networks 13(3), 780–784 (2002)
33. Schölkopf, B., Smola, A.J., Müller, K.R.: Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation 10(5), 1299–1319 (1998)
