3D MOTION ESTIMATION OF HEAD AND SHOULDERS IN VIDEOPHONE SEQUENCES

合集下载

三自由度车辆动力学模型英文

三自由度车辆动力学模型英文

三自由度车辆动力学模型英文Three-Degree-of-Freedom Vehicle Dynamics Model.Vehicle dynamics is a crucial aspect of automotive engineering, dealing with the motion of vehicles under the influence of various forces and moments. Among various dynamic models, the three-degree-of-freedom (3DOF) vehicle dynamics model stands out as a simplified yet effective representation for analyzing vehicle handling characteristics. This model captures the essential dynamics of a vehicle by considering the motion in the lateral, longitudinal, and yaw directions.Lateral Motion:The lateral motion of a vehicle refers to its movement perpendicular to the direction of travel. This motion is primarily influenced by factors such as tire-road interaction forces, steering inputs, and vehicle sidewinds. In the 3DOF model, the lateral motion is described by alateral displacement variable, which represents the deviation of the vehicle from its straight-ahead path.Longitudinal Motion:The longitudinal motion of a vehicle corresponds to its movement along the direction of travel. This motion is primarily influenced by factors such as engine torque, braking forces, and rolling resistance. In the 3DOF model, the longitudinal motion is described by a longitudinal velocity variable, which represents the speed of the vehicle along its path.Yaw Motion:Yaw motion refers to the rotation of a vehicle around its vertical axis, which passes through the vehicle's center of gravity. This motion is influenced by moments generated by tire forces and steering inputs. In the 3DOF model, yaw motion is described by a yaw rate variable, which represents the rate of rotation of the vehicle around its vertical axis.Model Equations:The 3DOF vehicle dynamics model is described by a set of ordinary differential equations. These equations represent the laws of motion in the lateral, longitudinal, and yaw directions. The equations are typically derived using Newton's laws of motion and principles of moment balance.The lateral motion equation takes into account tire forces, steering inputs, and sidewinds. The longitudinal motion equation considers factors like engine torque, braking forces, and rolling resistance. The yaw motion equation incorporates tire forces and steering moments to describe the vehicle's rotational dynamics.Applications:The 3DOF vehicle dynamics model finds applications in various areas of automotive engineering, including vehicle handling analysis, suspension design, and control systemdevelopment. It can be used to simulate vehicle responses to different driving scenarios, such as cornering, braking, and acceleration.By analyzing the model's responses, engineers can assess vehicle handling characteristics, identify potential issues, and optimize vehicle design. Additionally, the model can be extended to include more complex dynamic effects, such as tire roll dynamics and vehicle rollover stability, to further enhance its predictive capabilities.Conclusion:The three-degree-of-freedom vehicle dynamics model is a valuable tool for analyzing vehicle handlingcharacteristics and understanding the dynamics of a vehicle under various driving conditions. Its simplicity and effectiveness make it a popular choice for automotive engineering applications, ranging from vehicle design and optimization to control system development. By leveraging this model, engineers can gain insights into vehicledynamics, improve vehicle performance, and enhance overall safety.。

三维建模外文资料翻译--人体动画基础

三维建模外文资料翻译--人体动画基础

外文资料翻译—原文部分Fundamentals of Human Animation<From Peter Ratner.3D Human Modeling and Animation[M].America:Wiley,2003:243~249>If you are reading this part, then you have mostlikely finished building your human character,created textures for it, set up its skeleton, mademorph targets for facial expressions, and arrangedlights around the model. You have then arrived at perhapsthe most exciting part of 3-D design, which isanimating a character. Up to now the work has beensomewhat creative, sometimestedious, and often difficult.It is very gratifying when all your previous effortsstart to pay off as you enliven your character. When animating, there is a creative flow that increases graduallyover time. You are now at the phase where you becomeboth the actor and the director of a movie or play.Although animation appears to be a more spontaneousact, it is nevertheless just as challenging, if notmore so, than all the previous steps that led up to it.Your animations will look pitiful if you do not understandsome basic fundamentals and principles. Thefollowing pointers are meant to give you some direction.Feel free to experiment with them. Bend andbreak the rules whenever you think it will improve theanimation.SOME ANIMATION POINTERS1. Try isolating parts. Sometimes this is referredto as animating in stages. Rather than trying tomove every part of a body at the same time, concentrateon specific areas. Only one section ofthe body is moved for the duration of the animation.Then returning to the beginning of the timeline,another section is animated. By successivelyreturning to the beginning and animating a differentpart each time, the entire process is lessconfusing.2. Put in some lag time. Different parts of the bodyshould not start and stop at the same time.Whenan arm swings, the lower arm should follow afew frames after that. The hand swings after thelower arm. It is like a chain reaction that worksits way through the entire length of the limb.3. Nothing ever comes to a total stop. In life, onlymachines appear to come to a dead stop. Muscles,tendons, force, and gravity all affect the movementof a human. You can prove this toyourself.Try punching the air with a full extension. Noticethat your fist has a bounce at the end. If a part comes to a stop such as a motionhold, keyframe it once and then again after threeto eight or more keyframes. Your motion graphwill then have a curve between the two identicalkeyframes. This will make the part appear tobounce rather than come to a dead stop.4. Add facial expressions and finger movements.Your digital human should exhibit signs of lifeby blinking and breathing. A blink will normallyoccur every 60 seconds. A typical blink might beas follows:Frame 60: Both eyes are open.Frame 61: The right eye closes halfway.Frame 62: The right eye closes all the wayand the left eye closes halfway.Frame 63: The right eye opens halfway andthe left eye closes all the way.Frame 64: The right eye opens all the way andleft eye opens halfway.Frame 65: The left eye opens all the way.Closing the eyes at slightly different timesmakes the blink less mechanical.Changing facial expressions could be justusing eye movements to indicate thoughts runningthroughyour model's head. The hands willappear stiff if you do not add finger movements.Too many students are too lazy to take the time toadd facial and hand movements. If you make theextra effort for these details you will find thatyour animations become much more interesting.5. What is not seen by the camera is unimportant.If an arm goes through a leg but is not seenin the camera view, then do not bother to fix it. Ifyou want a hand to appear close to the body andthe camera view makes it seem to be close eventhough it is not, then why move it any closer? This also applies to sets. There is no need to buildan entire house if all the action takes place in theliving room. Consider painting backdrops ratherthan modeling every part of a scene.6. Use a minimum amount of keyframes. Toomany keyframes can make the character appearto move in spastic motions. Sharp, cartoonlikemovements are created with closely spacedkeyframes. Floaty or soft, languid motions arethe result of widely spaced keyframes. Ananimationwill often be a mixture of both. Try tolook for ways that will abbreviate the motions.You can retain the essential elements of an animationwhile reducing the amount of keyframesnecessary to create a gesture.7.Anchor a part of the body. Unless your characteris in the air, it should have some part of itselflocked to the ground. This could be a foot, ahand, or both. Whichever portion is on theground should be held in the same spot for anumber of frames. This prevents unwanted slidingmotions. When the model shifts its weight,the foot that touches down becomes locked inplace. This is especially true with walkingmotions.There are a number of ways to lock parts of amodel to the ground. One method is to useinverse kinematics. The goal object, which couldbe a null, automatically locks a foot or hand tothe bottom surface. Another method is to manuallykeyframe the part that needs to be motionlessin the same spot. The character or its limbs willhave to be moved and rotated, so that foot orhand stays in the same place. If you are using forwardkinematics, then this could mean keyframingpractically every frame until it is time tounlock that foot or hand.8.A character should exhibit weight. One of themost challenging tasks in 3-D animation is tohave a digital actor appear to have weight andmass. You can use several techniques to achievethis. Squash and stretch, or weight and recoil,one of the 12 principles of animation discussedin Chapter 12, is an excellent way to give yourcharacter weight.By adding a little bounce to your human, heor she will appear to respond to the force of gravity.For example, if your character jumps up andlands, lift the body up a little after it makes contact.For a heavy character, you can do this severaltimes and have it decrease over time. Thiswill make it seem as if the force of the contactcauses the body to vibrate a little.Secondary actions, another one of the 12principles of animation discussed in Chapter 12,are an important way to show the effects of gravityand mass. Using the previous example of ajumping character, when he or she lands, thebelly could bounce up and down, the arms couldhave some spring to them, the head could tilt forward,and so on.Moving or vibrating the object that comes incontact with the traveling entity is anothermethod for showing the force of mass and gravity.A floor could vibrate or a chair that a personsits in respond to the weight by the seat goingdown and recovering back up a little. Sometimesan animator will shake the camera to indicate theeffects of a force.It is important to take into consideration thesize and weight of a character. Heavy objectssuch as an elephant will spend more time on theground, while a light character like a rabbit willspend more time in the air. The hopping rabbithardly shows the effects of gravity and mass.9. Take the time to act out the action. So often, itis too easy to just sit at the computer and trytosolve all the problems of animating a human. Putsome life into the performance by getting up andacting out the motions. This will make the character'sactions more unique and also solve manytiming and positioning problems. The best animatorsare also excellent actors. A mirror is anindispensable tool for the animator. Videotapingyourself can also be a great help.10. Decide whether to use IK, FK, or a blend ofboth. Forward kinematics and inversekinematicshave their advantages and disadvantages. FKallows full control over the motions of differentbody parts. A bone can be rotated and moved to theexact degree and location one desires. The disadvantageto using FK is that when your person hasto interact within an environment,simple movementsbecome difficult. Anchoring a foot to theground so it does not move ischallenging becausewhenever you move the body, the feet slide. Ahand resting on a desk has the same problem.IK moves the skeleton with goal objects suchas a null. Using IK, the task of anchoring feet andhands becomes very simple. The disadvantage toIK is that a great amount of control is packedtogether into the goal objects. Certain posesbecome very difficult to achieve.If the upper body does not require any interactionwith its environment, then consider ablend of both IK and FK. IK can be set up for thelower half of the body to anchor the feet to theground, while FK on the upper body allowsgreater freedom and precision of movements.Every situation involves a different e your judgment to decide which setup fits theanimation most reliably.11.Add dialogue. It has been said that more than90% of student animations that are submitted tocompanies lack dialogue. The few that incorporatespeech in their animations make their workhighly noticeable. If the animation and dialogueare well done, then those few have a greateradvantage than their competition. Companiesunderstand that it takes extra effort and skill to create animation with dialogue.When you plan your story, think about creatinginteraction between characters not only on aphysical level but through dialogue as well.There are several techniques, discussed in thischapter, that can be used to make dialogue manageable.12. Use the graph editor to clean up your animations.The graph editor is a useful tool that all3-D animators should become familiar with. It isbasically a representation of all the objects,lights, and cameras in your scene. It keeps trackof all their activities and properties.A good use of the graph editor is to clean upmorph targets after animating facial expressions.If the default incoming curve in your graph editoris set to arcs rather than straight lines, youwill most likely find that sometimes splines inthe graph editor will curve below a value of zero.This can yield some unpredictable results. Thefacial morph targets begin to take on negativevalues that lead to undesirable facial expressions.Whenever you see a curve bend below a value ofzero, select the first keyframe point to the right ofthe arc and set its curve to linear. A more detaileddiscussion of the graph editor will be found in alater part of this chapter.ANIMATING IN STAGESAll the various components that can be moved on ahuman model often become confusing if you try tochange them at the same time. The performancequickly deteriorates into a mechanicalroutine if youtry to alter all these parts at the same keyframes.Remember, you are trying to create human qualities,not robotic ones.Isolating areas to be moved means that you canlook for the parts of the body that have motion overtime and concentrate on just a few of those. For example,the first thing you can move is the body and legs.When you are done moving them around over theentire timeline, then try rotating thespine. You mightdo this by moving individual spine bones or using aninverse kinematics chain. Now that you have the bodymoving around andbending, concentrate on the arms.If you are not using an IK chain to move the arms,hands, andfingers, then rotate the bones for the upperand lower arm. Do not forget the wrist. Fingermovementscan be animated as one of the last parts. Facialexpressions can also be animated last.Example movies showing the same character animatedin stages can be viewed on the CD-ROM asCD11-1 AnimationStagesMovies. Some sample imagesfrom the animations can also be seen in Figure 11-1.The first movie shows movement only in the body andlegs. During the second stage, the spine and headwere animated. The third time, the arms were moved.Finally, in the fourth and final stage, facial expressionsand finger movements were added.Animating in successive passes should simplifythe process. Some final stages would be used tocleanup or edit the animation.Sometimes the animation switches from one partof the bodyleading to another. For example, somewhereduring the middle of an animation the upperbody begins to lead the lower one. In a case like this,you would then switch from animating the lower bodyfirst to moving the upper part before the lower one.The order in which one animates can be a matterof personal choice. Some people may prefer to dofacial animation first or perhaps they like to move thearms before anything else. Following is a summary ofhow someone might animate a human.1. First pass: Move the body and legs.2. Second pass: Move or rotate the spinal bones, neck, and head.3. Third pass: Move or rotate the arms and hands.4. Fourth pass: Animate the fingers.5. Fifth pass: Animate the eyes blinking.6. Sixth pass: Animate eye movements.7. Seventh pass: Animate the mouth, eyebrows,nose, jaw, and cheeks <you can break these upinto separate passes>.Most movement starts at the hips. Athletes oftenbegin with a windup action in the pelvic area thatworks its way outward to the extreme parts of thebody. This whiplike activity can even beobserved injust about any mundane act. It is interesting to notethat people who study martial arts learn that most oftheir power comes from the lower torso.Students are often too lazy to make finger movementsa part of their animation. There are several methodsthat can make the process less time consuming.One way is to create morph targets of the fingerpositions and then use shape shifting to move the variousdigits. Each finger is positioned in an open andfistlike closed posture. For example, the sections ofthe index finger are closed, while the others are left inan open, relaxed position for one morph target. Thenext morph target would have only the ring fingerclosed while keeping theothers open. During the animation,sliders are then used to open and close the fingersand/or thumbs. Another method to create finger movements is toanimate them in both closed and open positions andthen save the motion files for each digit. Anytime youanimate the same character, you can load the motionsinto your new scene file. It then becomes a simpleprocess of selecting either the closed or the open positionfor each finger and thumb and keyframing themwherever you desire. DIALOGUEKnowing how to make your humans talk is a crucialpart of character animation. Once you adddialogue,you should notice a livelier performance and a greaterpersonality in your character. At first, dialogue mayseem too great a challenge to attempt. Actually, if youfollow some simple rules, you will find that addingspeech to your animations is not as daunting a taskas one would think. The following suggestions shouldhelp.DIALOGUE ESSENTIALS1. Look in the mirror. Before animating, use amirror or a reflective surface such as that on a CDto follow lip movements and facial expressions.2. The eyes, mouth, and brows change the most.The parts of the face that contain the greatestamount of muscle groups are the eyes, brows,and mouth. Therefore, these are the areas thatchange the most when creating expressions.3. The head constantly moves during dialogue.Animate random head movements, no matterhow small, during the entire animation. Involuntarymotions of the head make a point withouthaving to state it outright. For example, noddingand shaking the head communicate, respectively,positive and negative responses. Leaning thehead forward can show anger, while a downwardmovement communicates sadness. Move thehead to accentuate and emphasize certain statements.Listen to the words that are stressed andadd extra head movements to them.4. Communicate emotions. There are six recognizableuniversal emotions: sadness, anger, joy,fear, disgust, and surprise. Other, more ambiguousstates are pain, sleepiness, passion, physicalexertion, shyness, embarrassment, worry, disdain,sternness, skepticism, laughter, yelling,vanity, impatience, and awe.5. Use phonemes and visemes. Phonemes are theindividual sounds we hear in speech. Rather thantrying to spell out a word, recreate the word as aphoneme. For example, the word computer isphonetically spelled "cumpewtrr." Visemes arethe mouth shapes and tongue positionsemployedduring speech. It helps tremendously to draw achart that recreates speech as phonemes combinedwith mouth shapes <visemes> above orbelow a timeline with the frames marked and thesound and volume indicated.6. Never animate behind the dialogue. It is betterto make the mouth shapes one or two framesbefore the dialogue.7. Don't overstate. Realistic facial movements arefairly limited. The mouth does not open thatmuch when talking.8. Blinking is always a part of facial animation.Itoccurs about every two seconds. Differentemotional states affect the rate of blinking. Nervousnessincreases the rate of blinking, whileanger decreases it.9. Move the eyes. To make the character appear tobe alive, be sure to add eye motions. About 80%of the time is spent watching the eyes and mouth,while about 20% is focused on the hands andbody.10. Breathing should be a part of facial animation.Opening the mouth and moving the headback slightly will show an intake of air, whileflaring the nostrils and having the head nod forwarda little can show exhalation. Breathingmovements should be very subtle and hardlynoticeable...外文资料翻译—译文部分人体动画基础<引自Peter Ratner.3D Human Modeling and Animation[M].America:Wiley,2003:243~249> 如果你读到了这部分,说明你很可能已构建好了人物角色,为它创建了纹理,建立起了人体骨骼,为面部表情制作了morph修改器并在模型周围安排好了灯光.接下来就是三维设计中最精彩的部分,即制作角色动画.到目前为止有些工作极富创意,有些枯燥乏味,但都困难重重.在经过了前期的努力后,角色已显示出了活力,这是非常令人高兴的.在制作动画时,创意会随着时间的推移不断涌现.现在你既是电影和戏剧的演员又是导演.虽然动作是很自然的表演,但它即使不比之前的准备步骤更复杂,也极具挑战.如果你不懂一些基础知识和基本原理,制作出的动画会很可笑.以下几点为你提供一些指导.尽管拿它们做实验.只要你认为能改进动画,可随意遵守或打破这些规那么.动画指南:1.尝试分离各部分.有时指的是分阶段制作动画.不要试图同时移动身体的每个部位,应集中精力制作具体部位的动画.在动画的持续时间内只移动身体的一部分.然后返回时间轴的起始位置,制作另一部分的动画.通过不断回到起始位置,每次制作一个不同部位的动画,能使整个过程变得清晰明了.2.添加一些延迟.身体的不同部位不应该同时开始和停止动作.当胳膊摆动时,下臂应该在其随后摆动几帧.在下臂停止摆动后手再摆动.整个手臂的活动就像是一边连串的连锁反应.3.任何一个动作都不会戛然而止.生活中,只有机器会突然停止.肌肉,腱,压力和引力都会影响人体的移动.你可以亲自证明这一点.用力向前推拳直到完全舒展开手臂.注意最终你的拳头会回弹一下.如果一个部位要停止,例如要保持动作,首先把它设置为关键帧,然后在3到8个或更多关键帧后再设置一次关键帧.动作图形会在两个相同的关键帧中间产生一条曲线.这会使动作有一个回弹而不是马上停止.4.添加面部表情和手指动作.数字人体应当通过眨眼和呼吸来呈现生命的气息.通常每隔60秒会眨一下眼睛.典型的眨眼应该如下所述:第60帧:两眼都睁开.第61帧:右眼半合.第62帧:右眼紧闭,左眼半合.第63帧:右眼半睁,左眼紧闭.第64帧:右眼完全睁开,左眼半睁.第65帧:左眼完全睁开.在不同时间闭上眼睛会让眨眼显得更为自然.面部表情的改变可通过眼睛的转动来表明模型脑海中的想法.如果你不添加手指动作,手会显得过于僵硬.很多同学懒得花时间去添加面部和手部动作.如果你花额外的努力在这些细节上,你的动画会变得更有趣.5.摄像机没有拍到的内容不用关注.如果胳膊叉到了腿里但摄像机视图中看不到,就不用费心去更正.如果你希望一只手看上去靠近身体并且摄像机视角看上去也是如此,即使实际并不靠近,也没必要再做调整.这也适用于布景.如果所有的表演都发生在起居室,就没必要建造整幢房子.考虑绘制背景而不是做出场景每一部分的模型.6.尽量少使用关键帧.过多的关键帧会让角色动作看上去有痉挛的感觉.剧烈,类似于卡通的动作是使用分布密集的关键帧制作的.飘逸或柔和、没精打采的动作是通过分布稀疏的关键帧制作的.动画中通常结合使用二者.试着寻找可以简化动作的方法.你可以在保留动画基本元素的同时减少构成姿势所需的关键帧数量.7.通过锁定位置锚定身体的某个部位.除非你的角色在空中,否那么它身体的一些部位应该被锁定在地面上.可以是一只脚,一只手或二者.处于地面的部分应该在好几帧上保持在同一位置.这样可阻止不必要的滑动.当模型移动重量时,落下的脚被锁定在适当的位置.对于行走动作这点特别适用.有很多方法将模型的部位锁定在地面上.除了直接把一只脚或一只手锁定在地面外,另一种方法是把需要保持在相同位置的部位手动变成关键帧.角色或其四肢必须移动或旋转,只有这样,脚可手才能保持在相同位置.8.角色应该显示重量.三维动画中最富挑战性的一项任务是让一个数字演员显得拥有重量和质量.可以使用几种方法来实现.第12章中讨论的动画的12个原理之一的挤压与拉伸〔或者重量与反弹〕是为角色提供重量的好方法.通过为人体添加一些反弹动作,可以展示角色受到重力影响的效果.比如,如果角色跳起后落下,脚触地后身体要稍微向上抬一下.对于一个比较重的角色,可以让这个动作重复几次,一次比一次弱.这显示出接触的力量似乎让身体微微有些振动.第12章中讨论的动画的12个原理中的另外一个——辅助动作是显示重量和质量效果的一种重要方法.就用前面跳跃的角色例子,角色着地时,腹部可以上下颤动,胳膊可以微微弹起,头可以向前倾斜等.移动与正在移动的实体接触的物体或让其振动是另一种显示质量和重力的方法.地板可以振动,有人坐进去的椅子通过下陷再稍微弹回也可以显示出对重量的反应.有时动画师可以晃动摄像机来表明力量的效果.考虑角色的大小和重量很重要.较重的物体如大象大部分时间都在地面上,而较轻的角色如兔子大部分时间在空中.忙碌的兔子很难显示出重力和质量的效果.9.花时间表演动作.我们很容易只是坐在电脑前,努力解决人体动画的所有问题.站起来,实际表演一下动作,会给动画注入活力.这会让角色的动作显得更为独特,也可以解决许多时间和位置安排问题.最好的动画师也是最优秀的演员.对于动画师来说,镜子是不可或缺的工具.录制自己的表演也有很大的用处.10.决定是否使用IK,FK,或两者都用.正向运动和逆向运动各有其优缺点.FK能控制不同身体部位的运动.一个骨骼可被旋转移动到想要的精确位置和程度.使用FK 的缺点是当你的角色处在一个互动的环境内,简单的移动也会变得困难.当你把脚固定在地面上让它不动也会有难度因为当你移动身体时,脚就会滑动.放在桌上的手也会有相同问题.IK没有目标的移动骨骼.使用IK,固定脚和手就变得非常简单.其缺点是大部分的控制会被集中到目标位置.某个特定姿势会变得难以实现.如果上身不需要任何与环境的互动,那就考虑IK和FK两者都用.IK可以设置身体的下半部分把脚固定在地上,而上半部分用FK使身体移动的自由度和精确度更好.每种情况都涉及到一种不同的方法.根据自己判断决定哪种设置最可靠地适合动画.11.添加对话.曾经有个说法是学生提交给公司的动画中有90%以上都缺少对话.只有很少一部分学生在动画中添加了对话,从而极大地提高了作品的吸引力.如果动画和对话配合良好,比起他们的竞争对手,这些学生便具有了相当大的优势.公司了解,要制作拥有对话的动画,需要付出加倍的努力,拥有一流的技术.在计划故事时,考虑在角色之间形成交流,这种交流不仅是身体层面的,而且要通过对话来表现.本章讨论了几种让对话更具管理性的技巧.12.使用图形编辑器来清理动画.图形编辑器是所有三维动画师都应该掌握的有用工具.它基本上是场景中所有物体,灯光和摄像机的代表.它了解它们的所有活动和属性.一种使用图形编辑器的好方法是在制作面部动画后清理morph Shape.如果图形编辑器中的默认引入曲线被设置为弧线而不是直线,有时图形编辑器中的曲线会弯到0以下.这会造成一些不可预知的结果.如果面部开始呈现负值,将会导致变形的面部表情.无论何时看到曲线弯到0值以下,先选择弧形右边的第一个关键点,然后把它的曲线设置为直线.本章后面分步讲解的时候将详细讨论图形编辑器.分阶段制作动画如果试图同时改变人体模型上可以移动的各个部件,制作动画的过程经常会变得混乱不堪.如果试图在同一个关键帧上改变这些部件,表演会迅速沦落为机械的程序.记住,您是在试图模仿人类的动作,而不是机器人的动作.隔离要移动的区域意味着您可以分步寻找要移动的身体部位,一段时间只集中精力于一个部位.比如,可以移动的第一个部位是身体和腿.在整个时间轴上完成对它们的移动后,再试着弯曲脊柱和转动髋部.完成转身和弯腰动作后,再集中精力制作臂部动作.不要忘记手腕.最后可以添加手指动作.也可以最后制作面部表情动画.连续地制作各个部位的动画会简化该流程.可以在最后几个阶段清理或编辑动画.有时动画从身体某一部位切换会引出另一部位.比如,有时在动画中间,上身开始引出下肢.在这种情况下,您要从首先制作下肢动画转换到先移动上身,再移动下肢.制作动画的顺序取决于个人喜好.有些人可能更愿意首先制作面部表情动画,也有些人喜欢首先移动胳膊.以下总结了一些制作人体动画的方法.1.第1轮:移动身体和腿部.2.第2轮:移动或旋转脊骨,脖子和头.3.第3轮:移动或旋转胳膊和手.4.第4轮:制作手指动画.5.第5轮:制作眨眼动画.6.第6轮:制作眼睛动作.7.第7轮:制作嘴、眉毛、鼻子、颚和脸颊动画〔可以把这些再细分成几轮〕.大多数移动从臀部开始.运动员总是从撅到极限的骨盆部位开始结束动作.这种像鞭子的行为在现实生活中也可以看到.有趣的是尚武的人可以发现他们的大多数力量来自于下体.对话在人物动画中,了解如何让人开口说话是一个关健部分.加入对话后,人物就会具有更逼真的表现和更鲜明的个性.起初,对话可能是一项极大的挑战,您连尝试的勇气都没有.实际上,如果遵循一些简单的原那么,就会发现给动画添加对话没有想象中的那么困难.下面这些建议可能会对您有所帮助.对话基础1.看镜子.在制作动画前,使用镜子或CD之类的反射面来观察嘴唇动作和面部表情.2.眼睛、嘴和眉毛是变化最大的部分.脸上包含肌肉组最多的部分是眼睛、眉毛和嘴.因此,制作表情时这些是变化最大的区域.3.对话期间头部要不停地摆动.在整个动画中,添加头部随机摆动的动画,幅度多小都无所谓.下意识的头部动作显然含意丰富.例如,点头和摇头分别表示赞成和反对.头向前伸可以表示生气;低头可以表示伤心;猛然抬头可以表示吃惊.移动头部来强调特定的状态.聆听重读的词语,然后对这些词添加头部动作. 4.传达情感.可以识别的情感一般有6种:伤心、生气、开心、恐惧、厌恶和惊讶.其他比较模糊的状态有痛苦、困倦、热情、用力、害羞、尴尬、担心、鄙视、严厉、怀疑、微笑、欢呼、骄傲、不耐烦等.5.使用音素和发音嘴形.音素是我们在对话中听到的单个声音.使用音素组成单词,而不是试图拼出单词.例如:单词computer根据音素拼为"compewtrr".发音嘴形是说话时嘴的形状和舌头的位置.在时间轴上方或下方绘制图表,在图表中使用音素和嘴形组成话语,标出这些话语所在的帧,并说明声音和音量.这样的图表将非常有用.6.动画不要晚于对话.最好将嘴形设置为早于对话一到两帧.7.不要过于夸X.现实中面部表情的变化是非常有限的.说话时嘴巴不会X得很大.8.眨眼始终是面部动画的一部分.一般每两秒钟就要眨一次眼.不同的情绪状态影响眨眼的频率.紧X时眨眼频率会增加,而生气时会减少.9.转动眼睛.要使人物显得生动,一定要添加眼睛动作.人类大约有80%的时间花在注意他人的眼睛和嘴上,而只有20%的时间关注他人的手和身体.10.呼吸应该是面部动画的一部分.X开嘴的同时头稍微向后仰表示吸气,而鼻孔翕动的同时头稍微向前倾可以表示呼气,呼吸动作幅度应该小到几乎注意不到.- 21 - / 11。

深度图像中的3D手势姿态估计方法综述

深度图像中的3D手势姿态估计方法综述

小型微型计算机系统Journal of Chinese C o m p u t e r Systems 2021年6月第6期 V o l.42 N o.6 2021深度图像中的3D手势姿态估计方法综述王丽萍、汪成\邱飞岳u,章国道1U浙江工业大学计算机科学与技术学院,杭州310023)2(浙江工业大学教育科学与技术学院,杭州310023)E-mail :690589058@ qq. c o m摘要:3D手势姿态估计是计算机视觉领域一个重要的研究方向,在虚拟现实、增强现实、人机交互、手语理解等领域中具有 重要的研究意义和广泛的应用前景_深度学习技术已经广泛应用于3D手势姿态估计任务并取得了重要研究成果,其中深度图 像具有的深度信息可以很好地表示手势纹理特征,深度图像已成为手势姿态估计任务重要数据源.本文首先全面阐述了手势姿 态估计发展历程、常用数据集、数据集标记方式和评价指标;接着根据深度图像的不同展现形式,将基于深度图像的数据驱动手 势姿态估计方法分为基于简单2D深度图像、基于3D体素数据和基于3D点云数据,并对每类方法的代表性算法进行了概括与 总结;最后对手势姿态估计未来发展进行了展望.关键词:3D手势姿态估计;深度学习;深度图像;虚拟现实;人机交互中图分类号:T P391 文献标识码:A文章编号:1000-1220(2021)06-1227■(»Survey of 3D Hand Pose Estimation Methods Using Depth MapW A N G Li-ping' ,W A N G C h e n g1 ,Q I U Fei-yue1'2,Z H A N G G u o-d a o11 (College of Computer Science and Technology .Zhejiang University of Technology .Hangzhou 310023 ’China)2(College of Education Science and Technology.Zhejiang University of Technology,Hangzhou 310023,China)Abstract:3D han d pose estimation is an important research direction in the field of computer vision .which has essencial research sig­nificance and wide application prospects in the fields of virtual reality,a u g m ented reality,h u m a n-c o m p u t e r interaction and sign lan­guage understanding. D e e p learning has been widely used in 3D h and pose estimation tasks and has achieved considerable results. A-m o n g t h e m,the depth information contained in the depth image can well represent the texture characteristics of the h and poses,and the depth image has b e c o m e an important data source for han d pose estimation tasks. Firstly,development history,b e n c h m a r k data sets, marking methods and evaluation metrics of hand pose estimation were introduced. After that,according to the different presentation forms of depth maps,the data-driven hand pose estimation methods based on depth images are divided into simple 2D depth m a p based m e t h o d s,3D voxel data based methods and 3D point cloud data based m e t h ods,and w e further analyzed and su m m a r i z e d the represent­ative algorithms of them. A t the en d of this paper,we discussed the development trend of hand pose estimation in the future.K e y w o r d s:3D hand pose estimation;deep learning;depth m a p;virtual reality;human-c o m p u t e r interactioni引言手势姿态估计是指从输人的图像或者视频中精确定位手 部关节点位置,并根据关节点之间的位置关系去推断出相应 的手势姿态.近年来,随着深度学习技术的发展,卷积神经网 络(Convolution Neural N e t w o r k s,C N N)'1-推动了计算机视觉 领域的快速发展,作为计算机视觉领域的一个重要分支,手势 姿态估计技术引起了研究者广泛关注.随着深度学习技术的快速发展和图像采集硬件设备的提 升,基于传统机器学习的手势姿态估计模型逐渐被基于深度 学习的估计模型所取代,国内外众多研究机构相继开展了针 对该领域的学习研究,有效推动了手势姿态估计技术的发展. 手势姿态估计大赛“H a n d s 2017”[2]和“Ha n ds2019”[3]吸引了国内外众多研究者们参与,综合分析该项赛事参与者提出的 解决方案,虽然不同的方法在计算性能和手势姿态估计精度 上各有差异,但所有参赛者都是使用深度学习技术来解决手 势姿态估计问题,基于深度学习的手势姿态估计已经成为该 领域主流发展趋势.除此之外,潜在的市场需求也是促进手势姿态技术快速 发展的原因之一.手势姿态估计可广泛应用于虚拟现实和增 强现实中,手势作为虚拟现实技术中最重要的交互方式之一, 可以为用户带来更好的沉浸式体验;手势姿态估计还可以应 用于手势识别、机器人抓取、智能手机手势交互、智能穿戴等 场景.由此可见,手势姿态估计技术将给人类的生活方式带来 极大的改变,手势姿态估计技术已成为计算机视觉领域中重 点研究课题,对手势姿态估计的进一步研究具有非常重要的收稿日期:2020-丨1-27收修改稿日期:2021~01-14基金项目:浙江省重点研发计划基金项目(2018C01080)资助.作者简介:王丽萍,女,1964年生,博士,教授,博士生导师,C C F会员,研究方向为计算智能、决策优化,计算机视觉等;汪成,男,1996年生,硕士研究生,研究方向为 计算机视觉、人机交互、虚拟现实;邱飞岳,男,1%5年生,博士,教授,博士生导师,C C F会员,研究方向为智能教育、智能计算、虚拟现实;章国道,男.1988年生,博士研究生,C C F会员,研究方向为计算机视觉、人机交互、过程挖掘.1228小型微型计算机系统2021 年意义.手势姿态估计技术发展至今已取得大量研究成果,有关 手势姿态估计的研究文献也相继由国内外研究者提出.Erol 等人[41第一次对手势姿态估计做了详细的综述,对2007年之 前的手势姿态估计方法进行了分析比较,涉及到手势的建模、面临的问题挑战、各方法的优缺点,并且对未来的研究方向进 行了展望,但该文献所比较的33种方法都是使用传统机器学 习方法实现手势姿态估计,其中只有4种方法使用了深度图 像来作为数据源,且没有讲述数据集、评价标准、深度图像、深 度学习等现如今手势姿态估计主流研究话题;S u p a n c i c等 人[5]以相同的评价指标对13种手势姿态估计方法进行了详 细的对比,强调了数据集的重要性并创建了一个新的数据集;E m a d161对2016年前基于深度图像的手势姿态估计方法做了 综述,该文献也指出具有标记的数据集对基于深度学习的手 势姿态估计的重要性;从2016年-2020年,手势姿态估计技术 日新月异,基于深度学习的手势姿态估计方法相继被提出,Li 等人[7]对手势姿态估计图像采集设备、方法模型、数据集的 创建与标记以及评价指标进行综述,重点指出了不同的图像 采集设备之间的差异对手势姿态估计结果的影响.除了以上 4篇文献,文献[8-12]也对手势姿态估计的某一方面进行了 总结概要,如文献[8]重点讲述了手势姿态估计数据集创建 及标记方法,作者提出半自动标记方法,并创建出了新的手势 姿态估计数据集;文献[9]提出了 3项手势姿态估计挑战任 务;文献[10]对2017年之前的数据集进行了评估对比,指出 了以往数据集的不足之处,创建了数据量大、标记精度髙、手 势更为丰富的数据集“Bighand 2. 2M”;文献[11 ]对2017手 势姿态估计大赛排名前11的方法进行的综述比较,指出了 2017年前髙水准的手势姿态估计技术研究现状,并对未来手 势姿态估计的发展做出了展望.以上所提到的文献是迄今为止手势姿态估计领域较为全 面的研究综述,但这些文献存在一些共同的不足:1)没有讲 述手势姿态估计发展历程;2)对手势姿态估计方法分类不详 细;3)对手势姿态估计种类说明不够明确;4)没有涉及最新 提出的新方法,如基于点云数据和体素数据方法.针对以上存 在的问题,本文在查阅了大量手势姿态估计相关文献基础上,对手势姿态估计方法与研究现状进行了分类、梳理和总结后 得出此文,旨在提供一份更为全面、详细的手势姿态估计研究 综述.本文结构如下:本文第2节介绍相关工作,包括手势姿态估计发展历程、手势姿态估计任务、手势建模、手势姿态估计分类和方法类型;第3节介绍手势姿态估计常用数据集、数据集标记方式和 手势姿态估计方法评价指标;第4节对基于深度图像的手势 姿态估计方法进行详细分类与总结;第5节总结本文内容并 展望了手势姿态估计未来的发展趋势.2相关工作2.1手势姿态估计发展历程手势姿态估计技术的发展经历了 3个时期:基于辅助设 备的手势姿态估计、基于传统机器学习的手势姿态估计和基于深度学习的手势姿态估计,如图1所示.图1手势姿态估计发展历程图Fig.1D ev el op m e nt history of hand pose estimation1) 基于辅助设备的手势姿态估计.该阶段也称为非视觉 手势姿态估计时期,利用硬件传感器设备直接获取手部关节点位置信息.其中较为经典解决方案为Dexvaele等人[13i提出的数据手套方法,使用者穿戴上装有传感器设备的数据手套,通过手套中的传感器直接获取手部关节点的坐标位置,然后根据关节点的空间位置,做出相应的手势姿态估计;W a n g等人[M]使用颜色手套来进行手势姿态估计,使用者穿戴上特制颜色手套来捕获手部关节的运动信息,利用最近颜色相邻法找出颜色手套中每种颜色所在的位置,从而定位手部关节肢体坐标位置.基于辅助设备的手势姿态估计具有一定优点,如具有良好的鲁棒性和稳定性,且不会受到光照、背景、遮挡物等环境因素影响,但昂贵的设备价格、繁琐的操作步骤、频繁的维护校准过程、不自然的处理方式导致基于辅助设备的手势姿态估计技术在实际应用中并没有得到很好地发展[15].2) 基于传统机器学习的手势姿态估计该阶段也称为基于计算机视觉的手势姿态估计时期,利用手部图像解决手势姿态估计问题.在深度学习技术出现之前,研究者主要使用传统机器学习进行手势姿态估计相关的工作,在这一阶段传统机器学习主要关注对图像的特征提取,包括颜色、纹理、方向、轮廓等.经典的特征提取算子有主成分分析(PrincipalC o m p o n e n t A n a l y s i s,P C A)、局部二值模式(Local Binary Pat­terns ,L B P)、线性判别分析( Linear Discriminant Analysis ,L D A)、基于尺度不变的特征(Scale Invariant Feature Trans­form, S I FT) 和方向梯度直方图 (Histogram of Oriented Gradi-e n t,H O G)等.获得了稳定的手部特征后,再使用传统的机器学习算法进行分类和回归,常用的方法有决策树、随机森林和支持向量机等.3) 基于深度学习的手势姿态估计.随着深度学习技术的 发展,卷积神经网络大大颠覆了传统的计算机视觉领域,基于深度学习的手势姿态估计方法应运而生.文献[21 ]以深度图像作为输人数据源,通过卷积神经网络预测输出手部关节点的三维坐标;文献[22]利用深度图的二维和三维特性,提出了一种简单有效的3D手势姿态估计,将姿态参数分解为关节点二维热图、三维热图和三维方向矢量场,通过卷积神经网络进行多任务的端到端训练,以像素局部投票机制进行3D图2 21关节点手部模型图F ig . 2 21 joints hand model2.3手势姿态估计分类本小节我们将对目前基于深度学习的手势姿态估计种类 进行说明.从不同的角度以不同的分类策略,可将手势姿态估 计分为以下几种类型:2.3.1 2D /3D 手势姿态估计根据输出关节点所处空间的维度,可将手势姿态估计分 为2D 手势姿态估计和3D 手势姿态估计.2D 手势姿态估计指的是在2D 图像平面上显示关节点 位置,关节点的坐标空间为平面U ,y ),如图3所示;3D 手势 姿态估计指的是在3D 空间里显示关节点位置,关节点的坐 标空间为(x ,y ,z ),如图4所示.图3 2D 手势姿态估计图 图4 3D 手势姿态估计图Fig . 3 2D hand poseF ig . 4 3D hand poseestim ationestim ation在手势姿态估计的领域中,相较于2D 手势姿态估计,针 对3D 手势姿态估计的研究数量更多,造成这一现象的主要手势姿态估计;文献[23]将体素化后的3D 数据作为3D C N N 网络的输人,预测输出生成的体素模型中每个体素网格是关 节点的可能性;文献[24]首次提出使用点云数据来解决手势 姿态估计问题,该方法首先利用深度相机参数将深度图像转 化为点云数据,再将标准化的点云数据输人到点云特征提取 神经网络提取手部点云数据特征,进而回归出手部关节 点位置坐标.将深度学习技术引人到手势姿态估计任务中,无 论是在预测精度上,还是在处理速度上,基于深度学习手势姿 态估计方法都比传统手势姿态估计方法具有明显的优势,基 于深度神经网络的手势姿态估计已然成为了主流研究趋势. 2.2手势建模手势姿态估计的任务是从给定的手部图像中提取出一组 预定义的手部关节点位置,目标关节点的选择一般是通过参 考真实手部关节点而设定的.根据建模方式的不同,关节点的 个数往往也不同,常见的手部模型关节点个数为14、16、21 等.在手势姿态估计领域,手部模型关节点的个数并没有一个 统一的标准,在大多数手势姿态估计相关的论文和手势姿态 估计常用数据集中,往往采用21关节点的手部模型, 如图2所示.原因为2D 手势姿态估计的应用范围小,基于2D 手势姿态估 计的实际应用价值不大[7],而3D 手势姿态估计可以广泛应 用于虚拟现实、增强现实、人机交互、机器人等领域,吸引了众 多大型公司、研究机构和研究人员致力于3D 手势姿态估计 的研究[29%.由此可见,基于深度图像的3D 手势姿态估计已经成为 手势姿态估计领域主流研究趋势,本文也是围绕深度图像、深 度学习、3D 手势姿态估计这3个方面进行总结叙述.2.3.2R G B/Depth /R G B -D根据输入数据类型的不同,可将手势姿态估计分为:基于R GB 图像的手势姿态估计、基于深度图像的手势姿态估计、基于R G B -D (R G B图像+ D e p t h m a p )图像的手势姿态估计;其中,根据深度图像不同展现形式,将基于深度图像的手势姿 态估计进一步划分为:基于简单2D 深度图像、基于3D 体素 数据、基于3D 点云数据,如图5所示.基于不同数据形式 的手势姿 雜计方m m基于Dqptii Map 深®图 像的手势 姿态估计:@iSDq)th Map深度图多视角深度图 Multi View 体素Volume Voxel点云Point Cloud2D Data3DCNNs基于RGB-D r Dqith Map |图像的手势姿态估计RGB 图人手分割图5手势姿态估计方法分类图F ig . 5 Classification o f hand pose estim ation m ethods2.4方法类型文献[4]根据不同的建模途径和策略,将手势姿态估计 方法划分为模型驱动方法(生成式方法)[31~ ,和数据驱动方 法(判别式方法).研究者结合了模型驱动和数据驱动两种方法的特点,提出混合式方法[3541];在本小节我们将对这3种 手势姿态估计方法类型进行简要概述.2.4.1模型驱动模型驱动方法需要大量的手势模型作为手势姿态估计的 基础.该方法实现的过程为:首先,创建大量符合运动学原理 即合理的手势模型,根据输人的深度图像,选择一个最匹配当 前深度图像的手势模型,提出一个度量模板模型与输入模型 的差异的代价函数,通过最小化代价函数,找到最接近的手势 模型.2.4.2数据驱动数据驱动方法需要大量的手势图像数据作为手势姿态估 计的基础.数据驱动方法所使用的图像数据可以是R G B 图像、深度图像或者是R G B -D 图像中的任意一种或者多种类型 图像相结合.以深度图像为例,基于数据驱动的手势姿态估计 方法可以通过投喂特定标记的手势数据来训练,建立从观察 值到有标记手势离散集之间的直接映射.在这个过程中,根据 手势关节点结果值计算方式的不同,可以将基于数据驱动的Hand PointNet SHPR-Net SO-HandNet Cascade PointNet3D Data基于RGB 图像的 手棘 纖十王丽萍等:深度图像中的3D 手势姿态估计方法综述12291230小型微型计算机系统2021 年手势姿态估计方法进一步分为基于检测和基于回归的方法.2.4.3 混合驱动模型驱动和数据驱动各有优势,模型驱动是基于固定手势模型,手势姿态识别率高;数据驱动基于神经网络,不需要固定手势模型,且对不确定手势和遮挡手势的鲁棒性髙.研究者们结合了两种方法的特点,提出混合式方法解决手势姿态估计问题.常见的混合式手势姿态估计方式有两种:1)先使用模型驱动预估一个手势结果,若预估失败或者预估的结果与手势模型相差较大,则使用数据驱动进行手势姿态估计,在这种方法中,数据驱动只是作为一种备选方案当且仅在模型驱动失败的情况下使用;2)先使用数据驱动预测出一个初始的手势姿势结果,再使用模型驱动对预测的初始手势结果进行优化.3数据集和评价指标数据集对有监督深度学习任务十分重要,对手势姿态估计而言,规模大、标记精度髙、适用性强的手势姿态数据集不仅能提供准确的性能测试和方法评估,还能推进手势姿态估计研究领域的发展.目前常见3D手势姿态估计数据集有:B ig Ha nd2. 2M[I0),N Y U[42).Dexter l[43i,M S R A14[441,IC V L[451,M S R A15 w,H a n d N e t[47】,M S R C[48],等,其中 I C V L、N Y U 和M S R A15是使用最为广泛的手势姿态估计数据集,常用手势姿态估计数据集相关信息如表1所示.表1手势姿态估计数据集Table 1H a n d pose estimation datasets数据集发布时间图像数量类别数关节数标记方式视角图像尺寸I A S T A R20138703020自动3320 x240 Dexter 12013213715手动2320 x240M S R A1420142400621手动3320x240I C V L2014176041016半自动3320 x240N Y U201481009236半自动3640 x480M S R A15201576375921半自动3640 x480M S R C2015102000122合成3512 x424 HandNet2015212928106自动3320x240 BigHand2.2M 2017 2.2M1021自动3640 x 480F H A D2018105459621半自动1640 x4803.1数据集标记方法Y u a n等人指出创建大规模精准数据集的关键因素是快速、准确的标记方式.常用手势姿态数据集标记方式有四 种:手动标记、半自动标记、自动标记和合成数据标记.手动标 记方法因其耗时耗力且存在标记错误情况,导致使用人工手 动标记的手势数据集规模小,不适合用于基于大规模数据驱 动的手势姿态估计方法;半自动标记方法有两种形式,一种是 先使用人工手动标记2D关节信息,再使用算法自动推断3D 关节信息;另一种是先使用算法自动推断出3D关节信息,再 使用人工手动对标记的3D关节信息进行修正,与全手动标 记方法相比,半自动标记方法具有高效性,适用于创建数据规 模大的数据集.合成数据标记方法指的是使用图形图像应用程序,先基于先验手势模型生成仿真手势图像数据,同时自动标记3D关节信息;与手动标记和半自动标记方法相比,合成数据标记方法无需手工介人,有效提高了数据标记效率,适合于大规模数据集的创建;但不足的是,合成的仿真数据无法全面有效地反映真实手势姿态,合成手势数据集中存在手势扭曲、反关节、关节丢失等不符合运动学规律的手势情形,导致丢失真实手势特征.自动标记方法指的在采集手部图像时,使用外部传感器设备对手势关节进行标记.文献[49]的A S T A R数据集使用带有传感器数据手套对手部关节进行标记;B i g H a n d2.2M数据集采用具有6D磁传感器的图像采集标记系统进行自动标记.3.2评价指标3D手势姿态估计方法的评价指标主要包括:1) 平均误差:在测试集图像中,所有预测关节点的平均 误差距离;以21个手势关节点模型为例,会生成21个单关节点平均误差评测值,对21个单关节点平均误差求均值,得到整个测试集的平均误差.2)良好帧占比率:在一个测试图像帧中,若最差关节点 的误差值在设定的阈值范围内,则认为该测试帧为良好帧,测试集中所有的良好帧之和占测试集总帧数的比例,称为良好帧占比率.其中,第1个评价指标反映的是单个关节点预测精准度,平均误差越小,则说明关节定位精准度越高;第2个评价指标反映的是整个测试集测试结果的好坏,在一定的阈值范围内,单个关节的错误定位将造成其他关节点定位无效,该评价指标可以更加严格反映手势姿态估计方法的好坏.4基于深度图像手势姿态估计方法深度图像具有良好的空间纹理信息,其深度值仅与手部表面到相机的实际距离相关,对手部阴影、光照、遮挡等影响因素具有较高的鲁棒性.基于深度学习和深度图像的手势姿态估计方法属于数据驱动,通过训练大量的数据来学习一个能表示从输人的深度图像到手部关节点坐标位置的映射关系,并依据映射关系预测出每个关节点的概率热图或者直接回归出手部关节点的二维或者三维坐标.在本节中,将深度图像在不同数据形式下的3D手势姿态估计方法分为:1) 直接将深度图像作为简单2D图像,使用2D C N N s进 行3D手势姿态估计.2)将深度图像转换成3D体素数据,使用3D C N N s进行 3D手势姿态估计.3)将深度图像转换成3D点云数据,使用点云特征提取 网络提取手部点云数据特征,从而实现手部关节点定位.4.1基于简单2D深度图像早期C. X u等人[50]提出使用随机森林传统机器学习方法直接从手部深度图像中回归出手势关节角度,随着深度学习技术的提出,卷积神经网络在计算机视觉任务中取得了巨大成就,与传统机器学习方法相比具有较大的优势.表2详细列举了基于简单2D深度图像手势姿态估计代表性算法相关信息.其中,受文献[51]启发,T o m p s o n%首次6期王丽萍等:深度图像中的3D 手势姿态估计方法综述1231提出将卷积神经网络应用于手势姿态估计任务中,他们使用 卷积神经网络生成能代表深度图像中手部关节二维概率分布 的热图,先从每幅热图中分别定位出每个关节点的2D 平面 位置,再使用基于模型的逆运动学原理从预估的2D 平面关 节和其对应的深度值估计出关节点三维空间位置.由于手势 复杂多样和手指之间具有高相似性,导致了从热图中预估出 的2D 关节点与真实关节点位置之间可能存在偏差,且当手 部存在遮挡时,深度值并不能很好地表示关节点在三维空间 中的深度信息.针对文献[42]中所存在的问题,G e 等人[52]提 出将手部深度图像投影到多个视图上,并从多个视图的热图 中恢复出手部关节点的三维空间位置,他们使用多视图 C N N s 同时为手部深度图像前视图、侧视图和俯视图生成热 图,从而更精准地定位手关节的三维空间位置.表2基于简单2D 深度图手势姿态估计代表性算法对比 Table2 Com parison of representative algorithmsforhandpose estimation based on2D depth m a p分类算法名称提出时间算法特点平均误差(nun)m j I C V L M S R A 15首次应用C N N ,关ConvNet[42]2014节点二维热图,逆^r e n[55]于简 DeepPrior 单2D Multi-深 V i e w -C N N [52] 度 图 像[54]D e n s e R e g 22]P o s e -R E N [56]J G R -P 20[59]运动学模型.区域集成网络,检2017测关节点三维13.39 7.63 •位置.20178.10 9.50网络.关节点二维热图,2018 多视图 C N N 定位 12.50 - 9.70关节点三维位置.逐像素估计,关节2018 点二维、三维热图,10.20 7.30 7.20单位矢量场.謂迭倾测关节点三u 81 6 79 8 65维位置.漏8 讀 755积网络.O b e r w e g e r 等人使用卷积神经网络直接输出手部关节点三维空间位置,他们认为网络结构对3D 手势姿态估结果 很重要,使用了 4种不同C N N 架构同时预测所有的关节点位 置,通过实验对比得出多尺寸方法对手部关节点位置回归效果更好,同时他们在网络中加入3D 手势姿态先验信息预测 手部关节点位置,并使用了基于C N N 架构的关节点优化网络 对每一个预测的关键点进行更加精准的位置输出;除此之外, 为了进一步提升3D 手势姿态估计的准确性,他们在文献 [21]基础上提出使用迭代优化的方法多次修正手部关节点 位置,对DeepPrior[53]进行改进,提出DeepPrior + + [54]方法, 通过平移、旋转、缩放等方法增强手势姿态估计训练集数据, 以获得更多的可利用信息,并在手势特征提取网络中加人了 残差模块以进一步提升了 3D 手势姿态估计精度.G u o等人[55]提出基于区域集成的卷积神经网络架构 R E N .R E N将卷积层的特征图分成多个局部空间块,并在全连接层将局部特征整合在一起,与之前基于2D 热图、逆运动学约束和反馈回路的手势姿态估计方法相比,R E N 基于单一 网络的方法直接检测出手部关节的三维位置,极大提高了手势姿态估计的性能.然而,R E N 使用统一的网格来提取局部 特征区域,对所有特征都进行同等的处理,这并不能充分获得 特征图的空间信息和具有高度代表性的手势特性.针对该问 题,C h e n 等人[56]提出P o s e -R E N 网络进一步提高手势姿态估 计性能,他们基于R E N 网络预测的手势姿态,将预测的初始 手部姿态和卷积神经网络特征图结合,以提取更优、更具代表 性的手部姿态估计特征,然后根据手部关节拓扑结构,利用树 状的全连接对提取的特征区域进行层次集成,P o s e -R E N 网络 直接回归手势姿态的精准估计,并使用迭代级联方法得到最 终的手势姿态.W a n 等人[22]提出一种密集的逐像素估计的方法,该方法 使用了沙漏网络Hourglass Network-571生成关节点2D 热图和3D热图以及三维单位矢量场,并由此推断出三维手部关节的 位置;他们在文献[58]提出自监督方法,从深度图像中估计3D手势姿态,与以往基于数据驱动的手势姿态估计方法不同的是,他们使用41个球体近似表示手部表面,使用自动标记 的合成手势数据训练神经网络模型,用无标记的真实手势数 据对模型进行了微调,并在网络中采用多视图监督方法以减 轻手部自遮挡对手势姿态估计精度的影响.4.2基于3D 体素数据2D C N N提取的深度图像特征由于缺乏3D 空间信息,不适合直接进行3D 手势姿态估计.将深度图像的3D 体素表示作为3D C N N 的输人,从输入的3D 体素数据中提取关节点 特征,可以更好地捕获手的3D 空间结构并准确地回归手部 关节点3D 手势姿态[60].基于3D 体素数据手势姿态估计流 程如图6所示.基于检测图6基于体素数据手势姿态估计流程图 Fig. 6W o r k f l o w ofhandposeestimationbased o nvoxeldata表3详细列举了基于3D 体素数据手势姿态估计代表性 算法相关信息,其中,G e 等人在文献[61 ]中首次提出使用3DC N N s解决3D 手势姿态估计问题,他们先使用D -T S D F [62]将局部手部图像转换成3D 体素数据表现形式,设计了一个具 有3个三维卷积层、3个三维全连接层的3D 卷积神经网络架 构,用于提取手部体素数据三维特征,并基于提取的三维特征 回归出最终手部关节点三维空间位置;在文献[52]基础上,G e等人[63]提出利用完整手部表面作为从深度图像中计算手势姿态的中间监督,进一步提升了 3D 手势姿态估计精度.M o o n等人[23]指出直接使用深度图像作为2D CN N的输入进行3D 手势姿态估计存在两个严重缺点:缺点1是2D 深 度图像存在透视失真的情况,缺点2是深度图和3D 坐标之 间的高度非线性映射,这种高度非线性映射会直接影响到手 部关节点位置的精准回归.为解决这些问题,他们提出将从深 度图像中进行3D 手势姿态估计的问题,转化为体素到体素。

3D head tracking and pose-robust 2D texture map-based face recognition using a simple ellipsoid mode

3D head tracking and pose-robust 2D texture map-based face recognition using a simple ellipsoid mode
uscript received February 21, 2008. K. H. An and M. J. Chung are with the Electrical Engineering and Computer Science Department, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea (e-mail: akh@ cheonji.kaist.ac.kr, mjchung@ee.kaist.ac.kr).
F
first type of these approaches is often called multi-view face recognition [1], [6]. Multi-view face recognition is a simple extension of frontal face recognition. It treats the whole face image under a certain pose as one vector in a high-dimensional vector space. And the training is done using multi-view face images and a test image is assumed to be matched to one of the existing head poses. Generally, multi-view based approaches should have view-specific classifiers. Therefore, the training and recognition processes are even more time consuming. The second type of approaches is face recognition across pose [3], [4]. It uses a canonical frontal view for face recognition. This method needs a face alignment process to generate a novel frontal view image. Therefore, various well-known frontal face recognition methods can be easily applied to this type of approaches. In this paper, we adopt the latter approach. If we can register face images into frontal views, the recognition task would be much easier. To align a face image into a canonical frontal view, we need to know the pose information of a human head. Therefore, in this paper, we propose a novel method for modeling a human head as a simple 3D ellipsoid. Also, we present 3D head tracking and pose estimation methods using the proposed ellipsoid model. After recovering full motion of the head, we can register face images with pose variations into stabilized view images which are suitable for frontal face recognition. In other words, both training and test face images are back projected to the surface of a 3D ellipsoid according to their poses computed by 3D motion estimation and registered into stabilized view images. By doing so, simple and efficient frontal face recognition can be carried out in the stabilized texture map space instead of the original input image space. To evaluate the feasibility of the proposed approach using a simple ellipsoid model, 3D head tracking experiments are carried out on 45 image sequences with ground truth from Boston University, and several face recognition experiments are conducted on our laboratory database and the Yale Face Database B by using subspace-based face recognition methods such as PCA, PCA+LDA, and DCV [2], [5], [7]. The rest of the paper is organized as follows. In Section 2, we first introduce how to model a human head as a simple 3D ellipsoid. We apply this novel model to 3D head tracking and motion estimation in Section 3. Section 4 describes how to generate a stabilized texture map from an input face image by using the proposed 3D head model and estimated pose information. In Section 5, we show various experimental results to verify the feasibility of our proposed approach. Finally, we conclude the paper in

中英文对照阅读--为什么手指永远取代不了鼠标

中英文对照阅读--为什么手指永远取代不了鼠标

中英文对照阅读---为什么手指永远取代不了鼠标为什么手指永远取代不了鼠标体感技术不断发展之后,人们对体感控制器的期望越来越高。

但是,实测表明,这类产品目前还取代不了传统的鼠标,最近被寄予厚望的手势感应设备Leap Motion也不例外。

我中枪了。

我在桌子上朝空气又抓又砸又戳地比划了八秒钟,弗兰克•威尔提终于掏出手枪把我干掉了。

幸好这只是一场游戏,但是我却连一点取胜的机会都没有。

我的枪——在这场游戏中就是我的食指,一整天都没有击中任何东西。

这款射击游戏Fast Iron》只是最近上市的Leap Motion体感控制器的众多游戏应用里的一款。

Leap Motion是一种可以让用户通过手势控制电脑的外围设备.2012年5月,它刚刚发布的时候,的确带给人很多期待。

现在一年多过去了,这款79美元的设备也终于正式投放市场。

试用一周之后,以它目前的情况看,这种体感控制技术倒是说不出有什么问题。

安装Leap Motion体感控制器的过程出乎意料地简单。

我还以为必须得把这个体感控制器放在离屏幕一定距离的地方才行,要么就是必须得把这款设备放在一定的高度上,但事实证明这些都没必要。

而且也不必下载额外的软件,算是一款即插即玩型的设备,我通过一个USB接口就把它连接到了我的Mac上。

安装Leap Motion的软件也没什么难度,而且给人的感觉就是手势控制似乎很好掌握,似乎传感器可以很清楚地捕捉每个手指的运动。

但是软件的完善程度很不够,让我大跌眼镜——在我的Mac上运行的时候,它的全屏功能自行调节了所有其它应用窗口的大小,这一点非常讨厌。

而且图像也不太稳定,感觉就像分辨率很低一样。

想到这里,我心烦地摆了摆手,但是这款设备却没有探测到这个动作。

事后来看,这说明它灵敏度欠缺。

这个缺点可能会影响我对这款体感控制器的体验。

比如说,就在我选择能与这款设备兼容的软件的时候,我发现图像效果不佳反倒成了这款设备最小的问题。

这款设备有一个专门的Airspace Store 网络商城,主要销售各种第三方开发者针对这款设 He shot me. After a good eight seconds of flailing, grabbing, and poking at the air above my desk, Frank Welty finally unholstered his sidearm and put me out of my misery. Alas, it was only a game, but I never really stood a chance. My shooter, which in this case was my pointer finger, hadn't hit a damn thing all day.The game, Fast Iron, is just one of dozens of apps available for the newly launched Leap Motion Controller. A peripheral that lets users control their computer through hand gestures, this device showed plenty of promise when it was announced in May 2012. Now, more than a year later, the $79 product has come to market, and after a week of feeling it out, it's hard to point a finger at what exactly is wrong with gesture-based computing, at least in its current state.Setting up the Leap Motion Controller was unexpectedly easy. I had imagined having to input measurements like the controller's distance to the screen, or heeding requirements like keeping the device at a certain height, but none of that was necessary. Other than downloading a software suite, the peripheral was more-or-less plug-and-play, the unit powered and connected to my Mac via a USB cable.The Leap Motion software involved minimal hand-holding, and gave the impression that gesture-based controls would be easy to master, with the sensor seeming to pick up each finger and hand rotation cleanly. But the program's lack of finish caught my eye -- running on my Mac, the software's full-screen capability resized all my other applications windows (a huge annoyance), and its graphics looked choppy, almost like they were low-resolution. The device couldn't detect my hands shaking with worry over these concerns, but in hindsight, they were clear indicators of a lack of finish that would plague my experience with the controller.For example, as I continued pawing through the device's compatible software, poor graphics soon became the least of Leap Motion's problems. The company's Airspace Store, a proprietary app marketplace that sells third-party created software for the备开发的软件。

头部姿势估计实时随机森林算法(Random Forests for Real Time Head Pose Estimation)_算法理论_科研数据集

头部姿势估计实时随机森林算法(Random Forests for Real Time Head Pose Estimation)_算法理论_科研数据集

头部姿势估计实时随机森林算法(Random Forests for Real Time Head Pose Estimation)数据介绍:Fast and reliable algorithms for estimating the head pose are essential for many applications and higher-level face analysis tasks. We address the problem of head pose estimation from depth data, which can be captured using the ever more affordable 3D sensing technologies available today.关键词:算法,估算,实时,头部姿势,高品质,低质量,algorithms,estimation,real time,head pose,high-quality,low-quality,数据格式:TEXT数据详细介绍:Random Forests for Real Time Head Pose EstimationFast and reliable algorithms for estimating the head pose are essential for many applications and higher-level face analysis tasks. We address the problem of head pose estimation from depth data, which can be captured using the ever more affordable 3D sensing technologies available today.To achieve robustness, we formulate pose estimation as a regression problem. While detecting specific face parts like the nose is sensitive to occlusions, we learn the regression on rather generic face surface patches. We propose to use random regression forests for the task at hand, given their capability to handle large training datasets.In this page, our research work on head pose estimation is presented, source code is made available and an annotated database can be downloaded for evaluating other methods trying to tackle the same problem.Real time head pose estimation from high-quality depth dataIn our CVPR paper Real Time Head Pose Estimation with Random Regression Forests, we trained a random regression forest on a very large, synthetically generated face database. In our experiments, we show that our approach can handle real data presenting large pose changes, partial occlusions, and facial expressions, even though it is trained only on synthetic neutral face data. We have thoroughly evaluated our system on a publicly available database on which we achieve state-of-the-art performance without having to resort to the graphics card. The video shows the algorithm running in real time, on a frame by frame basis (no temporal smoothing), using as input high resolution depth images acquired with the range scanner of Weise et al.CODEThe discriminative random regression forest code used for the DAGM'11 paper is made available for research purposes. Together with the basic head pose estimation code, a demo is provided to run the estimation directly on the stream of depth images coming from a Kinect, using OpenNI. A sample forest is provided which was trained on the Biwi Kinect Head Pose Database.Because the software is an adaptation of the Hough forest code, the same licence applies:By installing, copying, or otherwise using this Software, you agree to be bound by the terms of the Microsoft Research Shared Source License Agreement (non-commercial use only). If you do not agree, do not install copy or use the Software. The Software is protected by copyright and other intellectual property laws and is licensed, not sold.THE SOFTWARE COMES "AS IS", WITH NO WARRANTIES. THIS MEANS NO EXPRESS, IMPLIED OR STATUTORY WARRANTY, INCLUDING WITHOUT LIMITATION, WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, ANY WARRANTY AGAINST INTERFERENCE WITH YOUR ENJOYMENT OF THE SOFTWARE OR ANY WARRANTY OF TITLE OR NON-INFRINGEMENT. THERE IS NO WARRANTY THAT THIS SOFTWARE WILL FULFILL ANY OF YOUR PARTICULAR PURPOSES OR NEEDS. ALSO, YOU MUST PASS THIS DISCLAIMER ON WHENEVER YOU DISTRIBUTE THE SOFTWARE OR DERIVATIVE WORKS.NEITHER MICROSOFT NOR ANY CONTRIBUTOR TO THE SOFTWARE WILL BE LIABLE FOR ANY DAMAGES RELATED TO THE SOFTWARE OR THIS MSR-SSLA, INCLUDING DIRECT, INDIRECT, SPECIAL, CONSEQUENTIAL OR INCIDENTAL DAMAGES, TO THE MAXIMUM EXTENT THE LAW PERMITS, NO MATTER WHAT LEGAL THEORY IT IS BASED ON. ALSO, YOU MUST PASS THIS LIMITATION OF LIABILITY ONWHENEVER YOU DISTRIBUTE THE SOFTWARE OR DERIVATIVE WORKS.If you do use the code, please acknowledge our papers:Real Time Head Pose Estimation with Random Regression Forests@InProceedings{fanelli_CVPR11,author = {G. Fanelli and J. Gall and L. Van Gool},title = {Real Time Head Pose Estimation with Random Regression Forests}, booktitle = {Computer Vision and Pattern Recognition (CVPR)},year = {2011},month = {June},pages = {617-624}}Real Time Head Pose Estimation from Consumer Depth Cameras@InProceedings{fanelli_DAGM11,author = {G. Fanelli and T. Weise and J. Gall and L. Van Gool},title = {Real Time Head Pose Estimation from Consumer Depth Cameras}, booktitle = {33rd Annual Symposium of the German Association for Pattern Recognition (DAGM'11)},year = {2011},month = {September}}If you have questions concerning the source code, please contact Gabriele Fanelli.Biwi Kinect Head Pose DatabaseThe database was collected as part of our DAGM'11 paper Real Time Head Pose Estimation from Consumer Depth Cameras.Because cheap consumer devices (e.g., Kinect) acquire row-resolution, noisy depth data, we could not train our algorithm on clean, synthetic images as was done in our previous CVPR work. Instead, we recorded several people sitting in front of a Kinect (at about one meter distance). The subjects were asked to freely turn their head around, trying to span all possible yaw/pitch angles they could perform.To be able to evaluate our real-time head pose estimation system, the sequences were annotated using the automatic system of ,i.e., each frame is annotated with the center of the head in 3D and the head rotation angles.The dataset contains over 15K images of 20 people (6 females and 14 males - 4 people were recorded twice). For each frame, a depth image, the corresponding rgb image (both 640x480 pixels), and the annotation is provided. The head pose range covers about +-75 degrees yaw and +-60 degrees pitch. Ground truth is provided in the form of the 3D location of the head and its rotation angles.Even though our algorithms work on depth images alone, we provide the RGB images as well.The database is made available for research purposes only. You are required to cite our work whenever publishing anything directly or indirectly using the data:@InProceedings{fanelli_DAGM11,author = {G. Fanelli and T. Weise and J. Gall and L. Van Gool},title = {Real Time Head Pose Estimation from Consumer Depth Cameras}, booktitle = {33rd Annual Symposium of the German Association for Pattern Recognition (DAGM'11)},year = {2011},month = {September}}Files:Data (5.6 GB, .tgz compressed) Readme fileSample code for reading depth images and ground truthIf you have questions concerning the data, please contact Gabriele Fanelli. 数据预览:点此下载完整数据集。

基于双相机捕获面部表情及人体姿态生成三维虚拟人动画

基于双相机捕获面部表情及人体姿态生成三维虚拟人动画

2021⁃03⁃10计算机应用,Journal of Computer Applications 2021,41(3):839-844ISSN 1001⁃9081CODEN JYIIDU http ://基于双相机捕获面部表情及人体姿态生成三维虚拟人动画刘洁,李毅*,朱江平(四川大学计算机学院,成都610065)(∗通信作者电子邮箱liyi_ws@ )摘要:为了生成表情丰富、动作流畅的三维虚拟人动画,提出了一种基于双相机同步捕获面部表情及人体姿态生成三维虚拟人动画的方法。

首先,采用传输控制协议(TCP )网络时间戳方法实现双相机时间同步,采用张正友标定法实现双相机空间同步。

然后,利用双相机分别采集面部表情和人体姿态。

采集面部表情时,提取图像的2D 特征点,利用这些2D 特征点回归计算得到面部行为编码系统(FACS )面部行为单元,为实现表情动画做准备;以标准头部3D 坐标值为基准,根据相机内参,采用高效n 点投影(EP n P )算法实现头部姿态估计;之后将面部表情信息和头部姿态估计信息进行匹配。

采集人体姿态时,利用遮挡鲁棒姿势图(ORPM )方法计算人体姿态,输出每个骨骼点位置、旋转角度等数据。

最后,在虚幻引擎4(UE4)中使用建立的虚拟人体三维模型来展示数据驱动动画的效果。

实验结果表明,该方法能够同步捕获面部表情及人体姿态,而且在实验测试中的帧率达到20fps ,能实时生成自然真实的三维动画。

关键词:双相机;人体姿态;面部表情;虚拟人动画;同步捕获中图分类号:TP391.4文献标志码:A3D virtual human animation generation based on dual -camera capture of facialexpression and human poseLIU Jie ,LI Yi *,ZHU Jiangping(College of Computer Science ,Sichuan University ,Chengdu Sichuan 610065,China )Abstract:In order to generate a three -dimensional virtual human animation with rich expression and smooth movement ,a method for generating three -dimensional virtual human animation based on synchronous capture of facial expression andhuman pose with two cameras was proposed.Firstly ,the Transmission Control Protocol (TCP )network timestamp method was used to realize the time synchronization of the two cameras ,and the ZHANG Zhengyou ’s calibration method was used to realize the spatial synchronization of the two cameras.Then ,the two cameras were used to collect facial expressions and human poses respectively.When collecting facial expressions ,the 2D feature points of the image were extracted and theregression of these 2D points was used to calculate the Facial Action Coding System (FACS )facial action unit in order toprepare for the realization of expression animation.Based on the standard head 3D coordinate ,according to the camera internal parameters ,the Efficient Perspective -n -Point (EP n P )algorithm was used to realize the head pose estimation.After that ,the facial expression information was matched with the head pose estimation information.When collecting human poses ,the Occlusion -Robust Pose -Map (ORPM )method was used to calculate the human poses and output data such as the position and rotation angle of each bone point.Finally ,the established 3D virtual human model was used to show the effect of data -driven animation in the Unreal Engine 4(UE4).Experimental results show that this method can simultaneously capture facial expressions and human poses and has the frame rate reached 20fps in the experimental test ,so it can generate naturaland realistic three -dimensional animation in real time.Key words:dual -camera;human pose;facial expression;virtual human animation;synchronous capture0引言随着虚拟现实技术走进大众生活,人们对虚拟替身的获取手段及逼真程度都提出较高要求,希望能够通过低成本设备,在日常生活环境下获取替身,并应用于虚拟环境[1]。

Unity3D技术之根运动 (Root Motion) – 工作原理

Unity3D技术之根运动 (Root Motion) – 工作原理

Unity3D技术之根运动 (Root Motion) –工作原理根运动(Root Motion) –工作原理身体变换身体变换(Body Transform) 是角色的质心。

它用于Mecanim 重定位引擎,提供最稳定的位移模型。

身体方向(Body Orientation) 是下半身方向与上半身方向的平均,与Avatar T 字姿势(T-Pose) 相关。

身体变换(Body Transform) 和方向(Orientation) 存储在动画片段(Animation Clip) 中(使用Avatar 中设置的肌肉线条)。

它们是存储在动画片段(Animation Clip) 中的唯一世界空间曲线。

其他一切事项:肌肉线条和IK 目标(四肢(Hands and Feet))相对身体变换存储。

根变换根变换(Root Transform) 是身体变换(Body Transform) 在Y 平面上的投影,在运行时计算。

在每一帧,根变换(Root Transform) 中的变化都会计算出来。

然后变换的变化会被应用到游戏对象(Game Object) 上,使之移动。

角色下面的圆代表根变换动画片段检视器动画片段编辑器(Animation Clip Editor) 设置(根变换旋转(Root Transform Rotation)、根变换位置(Root Transform Position) (Y) 和根变换位置(XZ))让您从身体变换(Body Transform) 控制根变换(Root Transform) 投影。

根据这些设置,身体变换(Body Transform) 的某些部分可能会是转移的根变换(Root Transform)。

例如,您可以决定是否要让运动Y 位置成为根运动(Root Motion)(轨迹)或者姿势(身体转换)的一部分,这就是所谓的合并到姿势(Baked into Pose)。

根变换旋转合并到姿势(Bake into Pose):方向与身体变换(或者姿势(Pose))保持一致。

  1. 1、下载文档前请自行甄别文档内容的完整性,平台不提供额外的编辑、内容补充、找答案等附加服务。
  2. 2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
  3. 3、如文档侵犯您的权益,请联系客服反馈,我们会尽快为您处理(人工客服工作时间:9:00-18:30)。

3D MOTION ESTIMATION OF HEAD AND SHOULDERSIN VIDEOPHONE SEQUENCESMarkus KampmannInstitut für Theoretische Nachrichtentechnik und InformationsverarbeitungUniversität Hannover, Appelstraße 9A, 30167 Hannover, F.R.Germanyemail: kampmann@tnt.uni–hannover.de,WWW: http://www.tnt.uni–hannover.de/~kampmannABSTRACTIn this paper, an approach for 3D motion estima-tion of head and shoulders of persons in video-phone sequences is presented. Since head and shoulders are linked together by the neck, constraints for the motion of head and shoulders exist which are exploited to improve the motion estimation. In this paper, the human neck joint is modelled by a spherical joint between head and shoulders and its 3D position is calculated. Then, rotation and translation of the shoulders are esti-mated and propagated to the head. Finally, a rota-tion of the head around the neck joint is estimated. The presented approach is applied to the video-phone sequences Akiyo and Claire. Compared with an approach without using a neck joint, ana-tomically correct positions of head and shoulders are estimated.1. INTRODUCTIONFor coding of moving images at low bit rates, an object–based analysis–synthesis coder (OBASC) has been introduced [1]. In an OBASC, each real object is described by a model object. A model object is defined by three sets of parameters de-fining its motion, shape and color. In [2], the shape of a model object is represented by a 3D wire-frame. The motion is defined by 6 parameters describing the translation and rotation of the mo-del object in 3D space. The color parameters de-note luminance and chrominance reflectance on the model object surface. Objects may be articu-lated, i.e. may consist of two or more flexibly connected 3D object components. Each object component has its own set of 3D motion, 3D shape and color parameters. All these parameters have to be estimated automatically by image analysis. In the case of typical videophone sequences, the human body in the sequence can be considered as an articulated object and head and shoulders as object components. All three sets of model pa-rameters have to be estimated for these object components. This contribution deals with esti-mating the 3D motion parameters of head and shoulders.In [5], a hierarchical approach for 3D motion estimation of object components is proposed. No constraints for the spatial location of the object components are considered. However, for articu-lated objects like a human body constraints for the spatial location of the object components exist. These constraints can be exploited to improve the motion estimation. By using joints between object components, constraints for the relative motion between the object components are introduced. In [6][7], 2D joint positions in sequent images are estimated and a 3D joint position is calculated. In [8], the 3D joint position is directly estimated using motion parameters from preceding images. Using these joint positions, motion estimation of articulated objects is carried out [6][7][9]. In these algorithms, no a priori knowledge about the object components head and shoulders and the position of the connecting neck joint on a human body is exploited.In this contribution, an algorithm for 3D motion estimation of head and shoulders in videophone sequences is presented which uses a priori know-ledge about the image content of a videophone sequence. Here, the human neck joint is modelled by a spherical joint between head and shoulders. The 3D position of the neck joint is calculated using an automatically generated 3D wireframe of the person in the sequence [4] and knowledge about the position of the neck joint on a human body. This 3D wireframe of the person is splitted into the object components head and shoulders and the motion of head and shoulders is estimated. First, six motion parameters namely three rotationJPȀȀh+[R h](PȀh*JȀ))JȀ(3.4) with the rotation matrix [R h]. [R h] is calculated from the rotation angles R h+(R h,x,R h,y,R h,z)T. For motion estima-tion, the three rotation parameters R h have to be estimated. Here, the same algorithm [2] as for the estimation of the motion parameters of the shoul-ders is used.4. EXPERIMENTAL RESULTSThe described algorithm has been tested using the videophone sequences Akiyo and Claire (CIF, 10Hz). The face model from Fig. 1 is adapted automatically to the individual face, the 3D wire-frame of the person is generated and splitted into the object components head and shoulder. After-wards, the 3D motion of head and shoulders is estimated between two succeeding images of the videophone sequences. Here, the proposed 3D motion estimation algorithm is compared with the hierarchical 3D motion estimation algorithm in [5] which does not use a neck joint for motion estimation. Fig. 3 shows original images of the sequences Akiyo and Claire. Fig. 4 and Fig. 6 show results of the proposed motion estimation algorithm, Fig. 5 results of the motion estimation algorithm from [5]. Since no constraints for the spatial location of head and shoulders are used in [5], anatomically impossible positions of head and shoulders are estimated with the algorithm in [5] (Fig. 5). Using the proposed motion estimation algorithm, anatomically correct positions of head and shoulders are estimated (Fig. 6).5. CONCLUSIONSIn this paper, a new approach for 3D motion es-timation of head and shoulders in videophone se-quences is presented. First, the 3D neck joint posi-tion of the person in the sequence is calculated using an automatically generated 3D wireframe of the person and knowledge about the position of the neck joint on a human body. Then, translation and rotation of the shoulders are estimated and propagated to the head. Finally, a rotation of the head around the neck joint is estimated. Compared with an algorithm without using a neck joint, ana-tomically correct positions of head and shoulders are estimated.6. REFERENCES[1]H.G. Musmann, M. Hötter, J. Ostermann,”Object–oriented analysis–synthesis cod-ing of moving images”, Signal Proces-sing: Image Communications, V ol. 3, No.2, pp. 117–138, November 1989.[2]J. Ostermann, ”Object–based analysis–synthesis Coding based on the sourcemodel of moving rigid 3D objects”, Sig-nal Processing: Image Communications,V ol. 6, pp. 143–161, May 1994.[3]M. Kampmann, R. Farhoud, ”Precise facemodel adaptation for semantic coding ofvideophone sequences”, Picture CodingSymposium (PCS ’97), Berlin, Germany,pp. 675–680, September 1997.[4]M. Kampmann, ”Automatic Generationof 3D wireframes of human persons forvideo coding”, 7. Dortmunder Fernsehse-minar, Dortmund, Germany, pp.105–108, September 1997.[5]R. Koch, ”Dynamic 3–D scene analysisthrough synthesis feedback control”,IEEE T–P AMI, V ol. 15, No. 6, pp.556–568, June 1993.[6]R. Holt, A. Netravali, T. Huang and R.Qian, ”Determining Articulated Motionfrom Perspective Views: A Decomposi-tion Approach”, IEEE Workshop on Mo-tion of Non–Rigid and Articulated Ob-jects, Austin, Texas, pp. 126–137, Nov.1994.[7]J. Web. and J. Aggarwal, ”Structure frommotion of rigid and jointed objects”, Se-venth International Joint Conference onArtifitial Intelligence, Vancouver, S.686–691, August 1981.[8]G. Martínez, ”Analyse–Synthese–Codie-rung basierend auf dem Modell bewegterdreidimensionaler, gegliederter Objekte”,Dissertation, Universität Hannover,1998.[9]G. Martínez, ”3D Motion Estimation ofArticulated 3D Objects for Object–BasedAnalysis–Synthesis Coding (OBASC)”,International Workshop on Coding Tech-niques for Very Low Bit–rate Video(VLBV’95), Tokyo, Japan, G.1, Novem-ber 1995.Fig. 3:Original images (CIF,10Hz): (a) Akiyo frame 57, (b) Akiyo frame 91, (c) Claire frame 32,(d) Claire frame 48.(b)(a)(c)(d)Fig. 4:3D wireframe over original image for the proposed algorithm: (a) Akiyo frame 57, (b) Akiyo frame 91,(c) Claire frame 32, (d) Claire frame 48.(b)(a)(c)(d)Fig. 5:3D wireframe in a side view for the algorithm in [5]: (a) Akiyo frame 57, (b) Akiyo frame 91,(c) Claire frame 32, (d) Claire frame 48.(b)(a)(c)(d)Fig. 6:3D wireframe in a side view for the proposed algorithm: (a) Akiyo frame 57, (b) Akiyo frame 91,(c) Claire frame 32, (d) Claire frame 48.(b)(a)(c)(d)。

相关文档
最新文档