Multicamera People Tracking with a Probabilistic Occupancy Map
François Fleuret, Jérôme Berclaz, Richard Lengagne, and Pascal Fua, Senior Member, IEEE

Abstract—Given two to four synchronized video streams taken at eye level and from different angles, we show that we can effectively combine a generative model with dynamic programming to accurately follow up to six individuals across thousands of frames in spite of significant occlusions and lighting changes. In addition, we also derive metrically accurate trajectories for each of them. Our contribution is twofold. First, we demonstrate that our generative model can effectively handle occlusions in each time frame independently, even when the only data available comes from the output of a simple background subtraction algorithm and when the number of individuals is unknown a priori. Second, we show that multiperson tracking can be reliably achieved by processing individual trajectories separately over long sequences, provided that a reasonable heuristic is used to rank these individuals and that we avoid confusing them with one another.

Index Terms—Multipeople tracking, multicamera, visual surveillance, probabilistic occupancy map, dynamic programming, Hidden Markov Model.

1 INTRODUCTION

In this paper, we address the problem of keeping track of people who occlude each other using a small number of synchronized videos such as those depicted in Fig. 1, which were taken at head level and from very different angles. This is important because this kind of setup is very common for applications such as video surveillance in public places. To this end, we have developed a mathematical framework that allows us to combine a robust approach to estimating the probabilities of occupancy of the ground plane at individual time steps with dynamic programming to track people over time. This results in a fully automated system that can track up to six people in a room for several minutes by using only four cameras, without producing any false positives or false negatives in spite of severe occlusions and lighting variations.
As shown in Fig. 2, our system also provides location estimates that are accurate to within a few tens of centimeters, and there is no measurable performance decrease if as many as 20 percent of the images are lost, and only a small one if 30 percent are. This involves two algorithmic steps:

1. We estimate the probabilities of occupancy of the ground plane, given the binary images obtained from the input images via background subtraction [7]. At this stage, the algorithm only takes into account images acquired at the same time. Its basic ingredient is a generative model that represents humans as simple rectangles that it uses to create synthetic ideal images that we would observe if people were at given locations. Under this model of the images, given the true occupancy, we approximate the probabilities of occupancy at every location as the marginals of a product law minimizing the Kullback-Leibler divergence from the "true" conditional posterior distribution. This allows us to evaluate the probabilities of occupancy at every location as the fixed point of a large system of equations.

2. We then combine these probabilities with a color and a motion model and use the Viterbi algorithm to accurately follow individuals across thousands of frames [3]. To avoid the combinatorial explosion that would result from explicitly dealing with the joint posterior distribution of the locations of individuals in each frame over a fine discretization, we use a greedy approach: we process trajectories individually over sequences that are long enough so that using a reasonable heuristic to choose the order in which they are processed is sufficient to avoid confusing people with each other.

In contrast to most state-of-the-art algorithms that recursively update estimates from frame to frame and may therefore fail catastrophically if difficult conditions persist over several consecutive frames, our algorithm can handle such situations since it computes the global optima of scores summed over many frames. This is what gives it the robustness that Fig. 2 demonstrates. In short, we combine a mathematically well-founded generative model that works in each frame individually with a simple approach to global optimization. This yields excellent performance by using basic color and motion models that could be further improved. Our contribution is therefore twofold. First, we demonstrate that a generative model can effectively handle occlusions at each time frame independently, even when the input data is of very poor quality, and is therefore easy to obtain. Second, we show that multiperson tracking can be reliably achieved by processing individual trajectories separately over long sequences.
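To make the structure of this two-step approach concrete, the sketch below shows how the per-frame occupancy estimation and the per-individual, batch-level tracking could be orchestrated. It is an illustration only, not the authors' implementation: the background subtraction, the occupancy estimation of Section 5, and the per-individual optimizer of Section 4 are reduced to stubs, and all names, grid sizes, and thresholds are placeholders.

```python
import numpy as np

# Minimal sketch of the two-step pipeline (illustration only).
T, C, N_MAX = 100, 4, 6            # batch length, cameras, max number of individuals
G = 100                            # number of ground-plane grid locations
HIDDEN = G                         # index of the virtual hidden location

def background_subtraction(frame, background):
    return np.abs(frame - background) > 0.1            # crude binary mask (stand-in)

def estimate_occupancy_map(masks):
    return np.full(G, 0.01)                            # stub for the POM of Section 5

def track_individual(occupancy, blocked):
    # stub for the per-individual Viterbi optimization of Section 4.2
    traj = []
    for t in range(T):
        free = [k for k in range(G) if (t, k) not in blocked]
        traj.append(max(free, key=lambda k: occupancy[t][k]) if free else HIDDEN)
    return traj

frames = np.random.rand(T, C, 48, 64)                  # synthetic input streams
background = np.zeros((48, 64))

# Step 1: per-frame occupancy probabilities from the binary masks of all cameras.
occupancy = [estimate_occupancy_map([background_subtraction(f, background) for f in view])
             for view in frames]

# Step 2: greedy, per-individual trajectory extraction over the whole batch.
trajectories, blocked = [], set()
for n in range(N_MAX):             # the paper orders individuals by a reliability score
    traj = track_individual(occupancy, blocked)
    blocked |= {(t, k) for t, k in enumerate(traj) if k != HIDDEN}
    trajectories.append(traj)
```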
F. Fleuret, J. Berclaz, and P. Fua are with the Ecole Polytechnique Fédérale de Lausanne, Station 14, CH-1015 Lausanne, Switzerland. E-mail: {francois.fleuret, jerome.berclaz, pascal.fua}@epfl.ch. R. Lengagne is with GE Security-VisioWave, Route de la Pierre 22, 1024 Ecublens, Switzerland. E-mail: richard.lengagne@. Manuscript received 14 July 2006; revised 19 Jan. 2007; accepted 28 Mar. 2007; published online 15 May 2007. Recommended for acceptance by S. Sclaroff. Digital Object Identifier no. 10.1109/TPAMI.2007.1174.

In the remainder of the paper, we first briefly review related works. We then formulate our problem as estimating the most probable state of a hidden Markov process and propose a model of the visible signal based on an estimate of an occupancy map in every time frame. Finally, we present our results on several long sequences.

2 RELATED WORK

State-of-the-art methods can be divided into monocular and multiview approaches, which we briefly review in this section.

2.1 Monocular Approaches

Monocular approaches rely on the input of a single camera to perform tracking. These methods provide a simple and easy-to-deploy setup but must compensate for the lack of 3D information in a single camera view.

2.1.1 Blob-Based Methods

Many algorithms rely on binary blobs extracted from single video [10], [5], [11]. They combine shape analysis and tracking to locate people and maintain appearance models in order to track them, even in the presence of occlusions. The Bayesian Multiple-BLob tracker (BraMBLe) system [12], for example, is a multiblob tracker that generates a blob-likelihood based on a known background model and appearance models of the tracked people. It then uses a particle filter to implement the tracking for an unknown number of people.

Approaches that track in a single view prior to computing correspondences across views extend this approach to multicamera setups. However, we view them as falling into the same category because they do not simultaneously exploit the information from multiple views. In [15], the limits of the field of view of each camera are computed in every other camera from motion information. When a person becomes visible in one camera, the system automatically searches for him in other views where he should be visible. In [4], a background/foreground segmentation is performed on calibrated images, followed by human shape extraction from foreground objects and feature point selection.
Feature points are tracked in a single view, and the system switches to another view when the current camera no longer has a good view of the person.

2.1.2 Color-Based Methods

Tracking performance can be significantly increased by taking color into account. As shown in [6], the mean-shift pursuit technique based on a dissimilarity measure of color distributions can accurately track deformable objects in real time and in a monocular context.

In [16], the images are segmented pixelwise into different classes, thus modeling people by continuously updated Gaussian mixtures. A standard tracking process is then performed using a Bayesian framework, which helps keep track of people, even when there are occlusions. In such a case, models of persons in front keep being updated, whereas the system stops updating occluded ones, which may cause trouble if their appearances have changed noticeably when they re-emerge.

More recently, multiple humans have been simultaneously detected and tracked in crowded scenes [20] by using Monte-Carlo-based methods to estimate their number and positions. In [23], multiple people are also detected and tracked in front of complex backgrounds by using mixture particle filters guided by people models learned by boosting. In [9], multicue 3D object tracking is addressed by combining particle-filter-based Bayesian tracking and detection using learned spatiotemporal shapes. This approach leads to impressive results but requires shape, texture, and image depth information as input. Finally, Smith et al. [25] propose a particle-filtering scheme that relies on Markov chain Monte Carlo (MCMC) optimization to handle entrances and departures. It also introduces a finer modeling of interactions between individuals as a product of pairwise potentials.

2.2 Multiview Approaches

Despite the effectiveness of such methods, the use of multiple cameras soon becomes necessary when one wishes to accurately detect and track multiple people and compute their precise 3D locations in a complex environment.
Occlusion handling is facilitated by using two sets of stereo color cameras [14]. However, in most approaches that only take a set of 2D views as input, occlusion is mainly handled by imposing temporal consistency in terms of a motion model, be it Kalman filtering or more general Markov models. As a result, these approaches may not always be able to recover if the process starts diverging.

Fig. 1. Images from two indoor and two outdoor multicamera video sequences that we use for our experiments. At each time step, we draw a box around people that we detect and assign to them an ID number that follows them throughout the sequence.

Fig. 2. Cumulative distributions of the position estimate error on a 3,800-frame sequence (see Section 6.4.1 for details).

2.2.1 Blob-Based Methods

In [19], Kalman filtering is applied on 3D points obtained by fusing, in a least squares sense, the image-to-world projections of points belonging to binary blobs. Similarly, in [1], a Kalman filter is used to simultaneously track in 2D and 3D, and object locations are estimated through trajectory prediction during occlusion. In [8], a best-hypothesis and a multiple-hypotheses approach are compared to find people tracks from 3D locations obtained from foreground binary blobs extracted from multiple calibrated views.

In [21], a recursive Bayesian estimation approach is used to deal with occlusions while tracking multiple people in multiview. The algorithm tracks objects located in the intersections of 2D visual angles, which are extracted from silhouettes obtained from different fixed views. When occlusion ambiguities occur, multiple occlusion hypotheses are generated, given predicted object states and previous hypotheses, and tested using a branch-and-merge strategy. The proposed framework is implemented using a customized particle filter to represent the distribution of object states.

Recently, Morariu and Camps [17] proposed a method based on dimensionality reduction to learn a correspondence between the appearance of pedestrians across several views. This approach is able to cope with severe occlusion in one view by exploiting the appearance of the same pedestrian in another view and the consistency across views.

2.2.2 Color-Based Methods

Mittal and Davis [18] propose a system that segments, detects, and tracks multiple people in a scene by using a wide-baseline setup of up to 16 synchronized cameras. Intensity information is directly used to perform single-view pixel classification and match similarly labeled regions across views to derive 3D people locations. Occlusion analysis is performed in two ways: First, during pixel classification, the computation of prior probabilities takes occlusion into account. Second, evidence is gathered across cameras to compute a presence likelihood map on the ground plane that accounts for the visibility of each ground plane point in each view.
Ground plane locations are then tracked over time by using a Kalman filter. In [13], individuals are tracked both in image planes and top view. The 2D and 3D positions of each individual are computed so as to maximize a joint probability defined as the product of a color-based appearance model and 2D and 3D motion models derived from a Kalman filter.

2.2.3 Occupancy Map Methods

Recent techniques explicitly use a discretized occupancy map into which the objects detected in the camera images are back-projected. In [2], the authors rely on a standard detection of stereo disparities, which increase counters associated to square areas on the ground. A mixture of Gaussians is fitted to the resulting score map to estimate the likely location of individuals. This estimate is combined with a Kalman filter to model the motion.

In [26], the occupancy map is computed with a standard visual hull procedure. One originality of the approach is to keep, for each resulting connected component, an upper and a lower bound on the number of objects that it can contain. Based on motion consistency, the bounds on the various components are estimated at a certain time frame based on the bounds of the components at the previous time frame that spatially intersect with it.

Although our own method shares many features with these techniques, it differs in two important respects that we will highlight: First, we combine the usual color and motion models with a sophisticated, generative-model-based approach to estimating the probabilities of occupancy, which explicitly handles complex occlusion interactions between detected individuals, as will be discussed in Section 5. Second, we rely on dynamic programming to ensure greater stability in challenging situations by simultaneously handling multiple frames.
3 PROBLEM FORMULATION

Our goal is to track an a priori unknown number of people from a few synchronized video streams taken at head level. In this section, we formulate this problem as one of finding the most probable state of a hidden Markov process, given the set of images acquired at each time step, which we will refer to as a temporal frame. We then briefly outline the computation of the relevant probabilities by using the notations summarized in Tables 1 and 2, which we also use in the following two sections to discuss in more detail the actual computation of those probabilities.

3.1 Computing the Optimal Trajectories

We process the video sequences by batches of T = 100 frames, each of which includes C images, and we compute the most likely trajectory for each individual. To achieve consistency over successive batches, we only keep the result on the first 10 frames and slide our temporal window. This is illustrated in Fig. 3.

We discretize the visible part of the ground plane into a finite number G of regularly spaced 2D locations, and we introduce a virtual hidden location H that will be used to model entrances and departures from and into the visible area. For a given batch, let $L_t = (L_t^1, \ldots, L_t^{N^*})$ be the hidden stochastic processes standing for the locations of individuals, whether visible or not. The number $N^*$ stands for the maximum allowable number of individuals in our world. It is large enough so that conditioning on the number of visible ones does not change the probability of a new individual entering the scene. The $L_t^n$ variables therefore take values in $\{1, \ldots, G, H\}$.

Given $I_t = (I_t^1, \ldots, I_t^C)$, the images acquired at time $t$ for $1 \le t \le T$, our task is to find the values of $L_1, \ldots, L_T$ that maximize

$$P(L_1, \ldots, L_T \mid I_1, \ldots, I_T). \quad (1)$$

As will be discussed in Section 4.1, we compute this maximum a posteriori in a greedy way, processing one individual at a time, including the hidden ones who can move into the visible scene or not. For each one, the algorithm performs the computation under the constraint that no individual can be at a visible location occupied by an individual already processed. In theory, this approach could lead to undesirable local minima, for example, by connecting the trajectories of two separate people. However, this does not happen often because our batches are sufficiently long. To further reduce the chances of this, we process individual trajectories in an order that depends on a reliability score so that the most reliable ones are computed first, thereby reducing the potential for confusion when processing the remaining ones. This order also ensures that if an individual remains in the hidden location, then all the other people present in the hidden location will also stay there and, therefore, do not need to be processed.
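The following sketch illustrates the sliding-batch scheme just described: each batch of T frames is optimized as a whole, only the first 10 frames of the result are committed, and the window then slides forward. It is a minimal illustration, not the authors' code; the per-batch optimizer is a stub standing in for the procedure of Section 4, and how previously committed trajectories constrain the next batch is omitted.

```python
# Sketch of the sliding-batch scheme of Section 3.1 (illustration only).
T, KEEP = 100, 10                       # batch length and number of committed frames

def optimize_batch(batch_frames):
    # stub: the real optimizer returns one trajectory per individual over the batch
    return [[0] * len(batch_frames)]

def process_sequence(frames):
    committed = []                      # per-frame results committed so far
    start = 0
    while start + T <= len(frames):
        batch_result = optimize_batch(frames[start:start + T])
        # commit only the first KEEP frames of the batch result, then slide the window
        committed.extend(zip(*[traj[:KEEP] for traj in batch_result]))
        start += KEEP
    return committed

positions = process_sequence(list(range(3800)))     # e.g., a 3,800-frame sequence
```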
Our experimental results show that our method does not suffer from the usual weaknesses of greedy algorithms, such as a tendency to get caught in bad local minima. We therefore believe that it compares very favorably to stochastic optimization techniques in general and, more specifically, particle filtering, which usually requires careful tuning of metaparameters.

3.2 Stochastic Modeling

We will show in Section 4.2 that since we process individual trajectories, the whole approach only requires us to define a valid motion model $P(L_{t+1}^n \mid L_t^n = k)$ and a sound appearance model $P(I_t \mid L_t^n = k)$.

The motion model $P(L_{t+1}^n \mid L_t^n = k)$, which will be introduced in Section 4.3, is a distribution into a disc of limited radius and center $k$, which corresponds to a loose bound on the maximum speed of a walking human. Entrance into the scene and departure from it are naturally modeled thanks to the hidden location H, for which we extend the motion model. The probabilities to enter and to leave are similar to the transition probabilities between different ground plane locations.

TABLE 1. Notations (deterministic quantities).
TABLE 2. Notations (random quantities).
Fig. 3. Video sequences are processed by batches of 100 frames. Only the first 10 percent of the optimization result is kept, and the rest is discarded. The temporal window is then slid forward, and the optimization is repeated on the new window.

In Section 4.4, we will show that the appearance model $P(I_t \mid L_t^n = k)$ can be decomposed into two terms. The first, described in Section 4.5, is a very generic color-histogram-based model for each individual. The second, described in Section 5, approximates the marginal conditional probabilities of occupancy of the ground plane, given the results of a background subtraction algorithm, in all views acquired at the same time. This approximation is obtained by minimizing the Kullback-Leibler divergence between a product law and the true posterior. We show that this is equivalent to computing the marginal probabilities of occupancy so that, under the product law, the images obtained by putting rectangles of human sizes at occupied locations are likely to be similar to the images actually produced by the background subtraction.

This represents a departure from more classical approaches to estimating probabilities of occupancy that rely on computing a visual hull [26]. Such approaches tend to be pessimistic and do not exploit trade-offs between the presence of people at different locations. For instance, if, due to noise in one camera, a person is not seen in a particular view, then he would be discarded, even if he were seen in all others. By contrast, in our probabilistic framework, sufficient evidence might be present to detect him. Similarly, the presence of someone at a specific location creates an occlusion that hides the presence behind, which is not accounted for by the hull techniques but is by our approach.

Since these marginal probabilities are computed independently at each time step, they say nothing about identity or correspondence with past frames. The appearance similarity is entirely conveyed by the color histograms, which has experimentally proved sufficient for our purposes.
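The exact fixed-point equations for these marginal occupancy probabilities are derived in Section 5, which is not reproduced in this excerpt. Purely to convey their structure, here is a toy, one-dimensional mean-field-style iteration: each location projects to an assumed interval of a binary mask, the product law induces a synthetic average image, and each occupancy probability is updated from how much forcing that location to be occupied or empty improves the match with the mask. This is a schematic illustration only, not the paper's update rule, and all sizes and projections are invented.

```python
import numpy as np

# Toy 1D sketch of the occupancy-map idea (schematic only, not the paper's equations).
W, G = 60, 10                                    # mask width, number of locations
rects = [np.arange(6 * k, 6 * k + 8) % W for k in range(G)]   # assumed projections

mask = np.zeros(W)
mask[rects[2]] = 1.0                             # synthetic people at locations 2 and 7
mask[rects[7]] = 1.0

def synthetic(q):
    # per-pixel probability of being foreground under the product law
    prob_bg = np.ones(W)
    for k in range(G):
        cover = np.zeros(W)
        cover[rects[k]] = 1.0
        prob_bg *= 1.0 - q[k] * cover
    return 1.0 - prob_bg

def fit(q):
    p = np.clip(synthetic(q), 1e-6, 1 - 1e-6)
    return np.sum(mask * np.log(p) + (1 - mask) * np.log(1 - p))   # Bernoulli log-likelihood

q = np.full(G, 0.1)
for _ in range(20):                              # fixed-point sweeps
    for k in range(G):
        q1, q0 = q.copy(), q.copy()
        q1[k], q0[k] = 1.0, 0.0
        delta = np.clip(fit(q1) - fit(q0), -30.0, 30.0)   # occupied vs. empty evidence
        q[k] = 1.0 / (1.0 + np.exp(-delta))

print(np.round(q, 2))                            # high values at locations 2 and 7 only
```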
4 COMPUTATION OF THE TRAJECTORIES

In Section 4.1, we break the global optimization of several people's trajectories into the estimation of optimal individual trajectories. In Section 4.2, we show how this can be performed using the classical Viterbi algorithm based on dynamic programming. This requires a motion model given in Section 4.3 and an appearance model described in Section 4.4, which combines a color model given in Section 4.5 and a sophisticated estimation of the ground plane occupancy detailed in Section 5.

We partition the visible area into a regular grid of G locations, as shown in Figs. 5c and 6, and from the camera calibration, we define for each camera c a family of rectangular shapes $A_1^c, \ldots, A_G^c$, which correspond to crude human silhouettes of height 175 cm and width 50 cm located at every position on the grid.

4.1 Multiple Trajectories

Recall that we denote by $L^n = (L_1^n, \ldots, L_T^n)$ the trajectory of individual $n$. Given a batch of $T$ temporal frames $I = (I_1, \ldots, I_T)$, we want to maximize the posterior conditional probability

$$P(L^1 = l^1, \ldots, L^{N^*} = l^{N^*} \mid I) = P(L^1 = l^1 \mid I) \prod_{n=2}^{N^*} P(L^n = l^n \mid I, L^1 = l^1, \ldots, L^{n-1} = l^{n-1}). \quad (2)$$

Simultaneous optimization of all the $L^n$s would be intractable. Instead, we optimize one trajectory after the other, which amounts to looking for

$$\hat{l}^1 = \arg\max_l P(L^1 = l \mid I), \quad (3)$$
$$\hat{l}^2 = \arg\max_l P(L^2 = l \mid I, L^1 = \hat{l}^1), \quad (4)$$
$$\ldots$$
$$\hat{l}^{N^*} = \arg\max_l P(L^{N^*} = l \mid I, L^1 = \hat{l}^1, L^2 = \hat{l}^2, \ldots). \quad (5)$$

Note that, under our model, conditioning one trajectory, given other ones, simply means that it will go through no already occupied location. In other words,

$$P(L^n = l \mid I, L^1 = \hat{l}^1, \ldots, L^{n-1} = \hat{l}^{n-1}) = P(L^n = l \mid I, \forall k < n, \forall t, L_t^n \neq \hat{l}_t^k), \quad (6)$$

which is $P(L^n = l \mid I)$ with a reduced set of admissible grid locations.

Such a procedure is recursively correct: If all trajectories estimated up to step $n$ are correct, then the conditioning only improves the estimate of the optimal remaining trajectories. This would suffice if the image data were informative enough so that locations could be unambiguously associated to individuals. In practice, this is obviously rarely the case. Therefore, this greedy approach to optimization has undesired side effects. For example, due to partly missing localization information for a given trajectory, the algorithm might mistakenly start following another person's trajectory. This is especially likely to happen if the tracked individuals are located close to each other.

To avoid this kind of failure, we process the images by batches of T = 100 and first extend the trajectories that have been found with high confidence, as defined below, in the previous batches. We then process the lower confidence ones. As a result, a trajectory that was problematic in the past and is likely to be problematic in the current batch will be optimized last and, thus, prevented from "stealing" somebody else's location. Furthermore, this approach increases the spatial constraints on such a trajectory when we finally get around to estimating it.

We use as a confidence score the concordance of the estimated trajectories in the previous batches and the localization cue provided by the estimation of the probabilistic occupancy map (POM) described in Section 5. More precisely, the score is the number of time frames where the estimated trajectory passes through a local maximum of the estimated probability of occupancy. When the POM does not detect a person on a few frames, the score will naturally decrease, indicating a deterioration of the localization information. Since there is a high degree of overlap between successive batches, the challenging segment of a trajectory, due for instance to a failure of the background subtraction or a change in illumination, is met in several batches before it actually happens during the 10 kept frames. Thus, the heuristic will have ranked the corresponding individual among the last ones to be processed when such a problem occurs.
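As an illustration of this reliability heuristic, the sketch below counts, for a previously estimated trajectory, the frames in which it sits on a local maximum of the occupancy map, and orders individuals by that count. The grid size, the toy data, and all names are ours; this is not the authors' code.

```python
import numpy as np

# Sketch of the reliability score of Section 4.1 (illustration only).
GRID_W, GRID_H = 10, 10                       # assumed ground-plane grid

def is_local_max(pom, k):
    # true if grid index k is a local maximum of the occupancy map over its neighborhood
    x, y = k % GRID_W, k // GRID_W
    best = True
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            nx, ny = x + dx, y + dy
            if 0 <= nx < GRID_W and 0 <= ny < GRID_H:
                best &= pom[ny * GRID_W + nx] <= pom[k]
    return best

def confidence(trajectory, poms):
    # trajectory[t] is a grid index (or None when hidden), poms[t] the POM at time t
    return sum(1 for t, k in enumerate(trajectory)
               if k is not None and is_local_max(poms[t], k))

# toy data: two trajectories, the first strongly supported by the POM
poms = [np.random.rand(GRID_W * GRID_H) * 0.1 for _ in range(20)]
traj_a, traj_b = [55] * 20, [12] * 20
for t in range(20):
    poms[t][55] = 0.9                          # strong evidence under trajectory A
processing_order = sorted([traj_a, traj_b], key=lambda tr: confidence(tr, poms), reverse=True)
```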
4.2 Single Trajectory

Let us now consider only the trajectory $L^n = (L_1^n, \ldots, L_T^n)$ of individual $n$ over $T$ temporal frames. We are looking for the values $(l_1^n, \ldots, l_T^n)$ in the subset of free locations of $\{1, \ldots, G, H\}$. The initial location $l_1^n$ is either a known visible location if the individual is visible in the first frame of the batch or H if he is not. We therefore seek to maximize

$$P(L_1^n = l_1^n, \ldots, L_T^n = l_T^n \mid I_1, \ldots, I_T) = \frac{P(I_1, L_1^n = l_1^n, \ldots, I_T, L_T^n = l_T^n)}{P(I_1, \ldots, I_T)}. \quad (7)$$

Since the denominator is constant with respect to $l^n$, we simply maximize the numerator, that is, the probability of both the trajectories and the images. Let us introduce the maximum of the probability of both the observations and the trajectory ending up at location $k$ at time $t$:

$$\Phi_t(k) = \max_{l_1^n, \ldots, l_{t-1}^n} P(I_1, L_1^n = l_1^n, \ldots, I_t, L_t^n = k). \quad (8)$$

We model the processes $L_t^n$ and $I_t$ jointly with a hidden Markov model, that is,

$$P(L_{t+1}^n \mid L_t^n, L_{t-1}^n, \ldots) = P(L_{t+1}^n \mid L_t^n) \quad (9)$$

and

$$P(I_t, I_{t-1}, \ldots \mid L_t^n, L_{t-1}^n, \ldots) = \prod_t P(I_t \mid L_t^n). \quad (10)$$

Under such a model, we have the classical recursive expression

$$\Phi_t(k) = \underbrace{P(I_t \mid L_t^n = k)}_{\text{appearance model}} \; \max_{\tau} \underbrace{P(L_t^n = k \mid L_{t-1}^n = \tau)}_{\text{motion model}} \, \Phi_{t-1}(\tau) \quad (11)$$

to perform a global search with dynamic programming, which yields the classic Viterbi algorithm. This is straightforward, since the $L_t^n$ take values in a finite set of cardinality $G + 1$.
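A log-domain implementation of recursion (11) is straightforward. The sketch below is a generic Viterbi over the $G + 1$ states with synthetic inputs; it is not the authors' code, and the appearance and motion terms are placeholders for the models of Sections 4.3-4.5 and 5.

```python
import numpy as np

# Generic log-domain Viterbi for recursion (11); states 0..G-1 are grid
# locations, state G is the virtual hidden location H (sketch only).
def viterbi(log_appearance, log_motion, start_state):
    # log_appearance[t, k] = log P(I_t | L_t = k)
    # log_motion[j, k]     = log P(L_t = k | L_{t-1} = j)
    T, S = log_appearance.shape
    phi = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    phi[0, start_state] = log_appearance[0, start_state]
    for t in range(1, T):
        scores = phi[t - 1][:, None] + log_motion      # (previous state, current state)
        back[t] = np.argmax(scores, axis=0)
        phi[t] = log_appearance[t] + np.max(scores, axis=0)
    # backtrack the most probable trajectory
    traj = [int(np.argmax(phi[-1]))]
    for t in range(T - 1, 0, -1):
        traj.append(int(back[t, traj[-1]]))
    return traj[::-1]

G, T = 25, 50
log_app = np.log(np.random.rand(T, G + 1))             # synthetic appearance terms
motion = np.random.rand(G + 1, G + 1)
motion /= motion.sum(axis=1, keepdims=True)             # synthetic transition matrix
trajectory = viterbi(log_app, np.log(motion), start_state=G)   # start in the hidden location
```

Conditioning on previously optimized individuals, as in (6), then simply amounts to setting the appearance term to $-\infty$ (zero probability) at the time-location pairs they already occupy.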
4.3 Motion Model

We chose a very simple and unconstrained motion model:

$$P(L_t^n = k \mid L_{t-1}^n = \tau) = \begin{cases} \frac{1}{Z} \, e^{-\rho \, \|k - \tau\|} & \text{if } \|k - \tau\| \le c, \\ 0 & \text{otherwise}, \end{cases} \quad (12)$$

where the constant $\rho$ tunes the average human walking speed, and $c$ limits the maximum allowable speed. This probability is isotropic, decreases with the distance from location $k$, and is zero for $\|k - \tau\|$ greater than a constant maximum distance. We use a very loose maximum distance $c$ of one square of the grid per frame, which corresponds to a speed of almost 12 mph.

We also define explicitly the probabilities of transitions to the parts of the scene that are connected to the hidden location H. This is a single door in the indoor sequences and all the contours of the visible area in the outdoor sequences in Fig. 1. Thus, entrance and departure of individuals are taken care of naturally by the estimation of the maximum a posteriori trajectories. If there is enough evidence from the images that somebody enters or leaves the room, then this procedure will estimate that the optimal trajectory does so, and a person will be added to or removed from the visible area.

4.4 Appearance Model

From the input images $I_t$, we use background subtraction to produce binary masks $B_t$ such as those in Fig. 4. We denote as $T_t$ the colors of the pixels inside the blobs and treat the rest of the images as background, which is ignored. Let $X_t^k$ be a Boolean random variable standing for the presence of an individual at location $k$ of the grid at time $t$. In Appendix B, we show that

$$\underbrace{P(I_t \mid L_t^n = k)}_{\text{appearance model}} \propto \underbrace{P(L_t^n = k \mid X_t^k = 1, T_t)}_{\text{color model}} \; \underbrace{P(X_t^k = 1 \mid B_t)}_{\text{ground plane occupancy}}. \quad (13)$$

The ground plane occupancy term will be discussed in Section 5, and the color model term is computed as follows.

4.5 Color Model

We assume that if someone is present at a certain location $k$, then his presence influences the color of the pixels located at the intersection of the moving blobs and the rectangle $A_k^c$ corresponding to the location $k$. We model that dependency as if the pixels were independent and identically distributed and followed a density in the red, green, and blue (RGB) space associated to the individual. This is far simpler than the color models used in either [18] or [13], which split the body area in several subparts with dedicated color distributions, but has proved sufficient in practice.

If an individual $n$ was present in the frames preceding the current batch, then we have an estimation, for any camera $c$, of his color distribution, since we have previously collected the pixels in all frames at the locations

Fig. 4. The color model relies on a stochastic modeling of the color of the pixels $T_t^c(k)$ sampled in the intersection of the binary image $B_t^c$ produced by the background subtraction and the rectangle $A_k^c$ corresponding to the location $k$.
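The two ingredients of this section can be sketched as follows: a transition matrix built from (12) on a regular grid, and an appearance score formed, as in (13), from a color term and an occupancy term. The grid geometry, the values of $\rho$ and $c$, and the Gaussian form used for the per-pixel color density are our own simplifications for illustration; they are not the paper's settings.

```python
import numpy as np

# Sketches of the motion model (12) and of the appearance factorization (13).
GRID_W, GRID_H = 10, 10
coords = np.array([(k % GRID_W, k // GRID_W) for k in range(GRID_W * GRID_H)], float)

def motion_matrix(rho=1.0, c=1.5):
    # P(L_t = k | L_{t-1} = j) = (1/Z) exp(-rho * ||k - j||) if ||k - j|| <= c, else 0
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    P = np.where(d <= c, np.exp(-rho * d), 0.0)
    return P / P.sum(axis=1, keepdims=True)      # row-normalize: Z depends on the previous location j

def appearance_score(pixels_rgb, color_mean, occupancy_k):
    # (13): color term (i.i.d. per-pixel density, here an isotropic Gaussian) x occupancy term
    diff = pixels_rgb - color_mean
    log_color = -0.5 * np.sum(diff ** 2) / 0.05  # log density up to an additive constant
    return log_color + np.log(occupancy_k + 1e-12)

P = motion_matrix()
pixels = np.random.rand(40, 3)                   # colors inside blob ∩ rectangle A_k^c
score = appearance_score(pixels, color_mean=np.array([0.4, 0.4, 0.5]), occupancy_k=0.8)
```

Transitions to and from the hidden location H, described above for doors and scene contours, would be added as extra rows and columns of this matrix; they are omitted from the sketch.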
6dof average distance of model points
6dof average distance of model pointsThe average distance of model points is an important metric in 6 Degrees of Freedom (6DoF) motion tracking. It helps us understand the accuracy and precision of the trackingsystem in reproducing the motion of a real-world object ina virtual environment. In this context, 6DoF refers to the ability to accurately track and measure six different types of motions: three rotational (pitch, yaw, and roll) andthree translational (surge, sway, and heave). This meansthat for each point on the model being tracked, we are considering its 3D movement in space.模型点的平均距离是6自由度运动跟踪中的一个重要指标。
To calculate the average distance of model points in 6DoF tracking, we first need to have an accurate reference model. The reference model represents the ideal or original position and orientation of the object being tracked. The actual tracked data from the motion tracking system is compared with this reference model to determine how closely it matches.
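As a concrete illustration of the comparison described above, the sketch below computes the mean Euclidean distance between the model points placed under the reference pose and under the pose reported by the tracker (the "ADD"-style score often used in 6DoF evaluation). It is a minimal numpy sketch under that reading of the text; the function and variable names are mine, not from the original.

```python
import numpy as np

def average_distance_of_model_points(model_pts, R_gt, t_gt, R_est, t_est):
    """ADD-style error: mean Euclidean distance between model points transformed
    by the reference (ground-truth) pose and by the estimated, tracked pose."""
    ref = model_pts @ R_gt.T + t_gt      # points under the reference pose
    est = model_pts @ R_est.T + t_est    # points under the tracked pose
    return np.linalg.norm(ref - est, axis=1).mean()

# Toy usage: a unit cube tracked with a small rotation/translation error.
cube = np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)], float)
R_gt, t_gt = np.eye(3), np.zeros(3)
angle = np.deg2rad(2.0)
R_est = np.array([[np.cos(angle), -np.sin(angle), 0],
                  [np.sin(angle),  np.cos(angle), 0],
                  [0, 0, 1]])
t_est = np.array([0.01, 0.0, -0.005])
print(average_distance_of_model_points(cube, R_gt, t_gt, R_est, t_est))
```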
College students' views on digital learning technology: sample English essays
大学生对数字学习技术的看法英文范文全文共6篇示例,供读者参考篇1Digital Learning: The College Kid's ViewHi there! My name is Timmy and I'm 8 years old. I love learning about new things, especially when it comes to technology. Lately, I've been really curious about how college students feel about using digital tools for learning. You see, my older cousin Jill is in her second year at University, and she's always talking about the crazy amount of apps, websites, and gadgets she has to use for her classes. It's like a whole new world compared to the way I learn at my elementary school!From what I've gathered by pestering Jill with a million questions, college is serious business. The professors don't just hand out worksheets or give lectures with a chalkboard like Mrs. Walker does. Nope, they're using all sorts of high-tech stuff that sounds super complicated. But Jill seems to think it's pretty cool, so I wanted to find out more.The first thing Jill told me about was this thing called a "learning management system." Sounds like some kind of robotmanager if you ask me! But apparently, it's just a fancy website where her professors post all the course materials, assignments, and grades. Everything is digital, from the syllabus to the textbooks. No more carrying around a bazillion heavy books in her backpack!Jill said that at first, it was a little overwhelming trying to keep track of everything online. But once she got the hang of it, she realized how convenient it was to have all her stuff in one place. She can even submit her essays and projects through the website, which is way easier than printing out a million pages. Plus, she doesn't have to worry about losing any important documents since they're all safely stored in the cloud.Another thing Jill raves about is how many online resources her professors use. Like, they'll post videos, articles, and interactive simulations to help explain complicated topics. It's not just boring old textbook reading anymore. Jill showed me this one website that lets you explore the human body in 3D – so cool! She said it's like having a whole library of multimedia learning materials right at her fingertips.But overall, Jill seems to think the pros outweigh the cons when it comes to digital learning in college. She likes how flexible and engaging it is compared to traditional methods. Forexample, some of her classes have online discussion forums where students can chat about the course material anytime, anywhere. It's like having a virtual study group on-demand!Another cool thing Jill mentioned is something called "adaptive learning." Apparently, there are these special programs that can adjust the difficulty level and pace of the lessons based on how well each individual student is doing. So if you're struggling with a certain topic, it'll give you more practice and explanations. But if you're breezing through, it'll move you along faster. Talk about personalized learning!Personally, I think all this digital stuff sounds awesome! I can't wait until I'm old enough to go to college and use all those fancy schmancy learning tools. Although, part of me will probably miss the simplicity of just using pencils, paper, and my trusty Number 2s. There's something special about theold-school classroom vibe, you know?But the world is changing, and technology is changing the way we learn. From what I've heard from Jill, most college students are embracing digital learning technologies, even if there are some occasional hiccups and growing pains. 
It's an exciting frontier that's making education more interactive, flexible, and personalized than ever before.Who knows, by the time I get to college, they'll probably have even crazier gadgets and gizmos for learning. Maybe we'll have virtual reality classrooms or computer chips implanted in our brains to download information instantly! A kid can dream, right? For now, I'll stick to reading about the college kid's perspective and counting down the days until I'm one of them.篇2Digital Stuff for Learning is Super Cool!Hi there! My name is Timmy and I'm 8 years old. I really like video games, building things with Legos, and watching cartoons. School is OK I guess, but sometimes it's kind of boring just sitting at my desk all day. That's why I think digital learning technologies are awesome!My big brother Jimmy is in college and he gets to use all sorts of crazy digital stuff for his classes. He showed me some of it and let me play around, and I gotta tell ya - it's way more fun than just reading books and doing worksheets like we do at my school.First of all, Jimmy has these things called "interactive simulations" for his science classes. Basically they let you like, mess around and experiment with stuff on the computer insteadof just reading about it. One simulation let him build virtual circuits and see how the electricity flowed. Another one showed him how different forces affect objects in motion. It's kinda like a video game, but educational! I tried messing with a few of them and it made learning about science seem really cool and not so dry and boring.Then there are these "educational videos" that Jimmy watches sometimes instead of just having a professor lecture at him. The videos have animations, graphics, and people acting stuff out to explain concepts. Way more engaging than just listening to someone drone on and on! Some of the videos even had little quizzes along the way to test if you were paying attention. Jimmy said finding good videos takes some digging, but the right ones can teach way better than a textbook.Oh oh, and get this - Jimmy's statistics class uses this crazy "virtual reality" headset for visualizing data! You can like, walk around inside 3D graphs and scatter plots and stuff. He let me try it once and it was a total trip. Made understanding all those equations and numbers way easier when you can just look at them from the inside out. The future is here, man!But probably the coolest digital learning thing Jimmy gets to use is this "adaptive learning software." Basically it's kinda likethose learning games I play on the iPad, except it tracks how well you're understanding each topic. If you're struggling with something, it automatically adjusts and gives you more practice on that area. If you've got a concept down cold, it stops wasting your time on that and moves you ahead faster. Jimmy said it's kinda like having a super smart private tutor just for you that learns how you learn best.All this digital stuff sounds so awesome compared to the old school way we do things at my elementary school. We just have those clunky computer labs where we sometimes get to play really basic math games and stuff. But from what I've seen of Jimmy's college tech, learning could be way more interactive, visual, and customized if schools took advantage of it.I mean, don't get me wrong - I know digital ain't everything and we'd still need teachers to guide us. 
But combining all those cool technologies with in-person instruction from professors seems like it could make for a way more engaging and effective learning experience. At least way more fun than just reading flat textbooks and doing bookwork all day!Jimmy said a lot of colleges are still kinda behind when it comes to using digital learning tools regularly. Some professors are totally old school and just want to lecture from PowerPointslike they're stuck in the 20th century or something. But he thinks within a few more years, digital will become just as normal as pencils and books. After all, us kids these days are growing up immersed in computers, smartphones, and tablets from a super young age. We're just naturally going to prefer more digital ways of learning over the traditional stuff.So those are my thoughts on digital learning tech that I've picked up from watching my bro. Like I said, it all seems so rad and way more fun and interactive than old school methods. I really hope by the time I get to college, every class will have us using simulations, videos, VR, and adaptive learning software as much as possible. Fingers crossed the teachers and professors wise up and make learning more digital! An 8-year-old technological whiz kid can dream, right? Okay, I'm gonna go play Roblox now. Later!篇3Digital Learning: The College Kid's PlaygroundHey there! My name is Timmy, and I'm in the 5th grade. Today, I want to tell you all about what those big college kids think of digital learning. You know, with all the computers,tablets, and fancy apps they use in class. It's like a whole new world compared to when our parents went to school!First off, let me explain what digital learning really means. It's basically using technology like computers, smartphones, and online resources to help students learn. Instead of just listening to a teacher talk and write on a chalkboard, college students get to watch videos, use interactive apps, and even attend virtual classes from anywhere in the world!Now, you might be thinking, "Wow, that sounds so cool!" And you're absolutely right – most college students love digital learning! Let me tell you why.One of the biggest reasons is that it makes learning way more fun and engaging. Can you imagine sitting through a three-hour lecture without anything but a professor droning on and on? Yikes! With digital tools, classes become more interactive and hands-on. Students can participate in online discussions, work on group projects using shared documents, and even create their own digital content like videos or podcasts. It's like learning through play!Another major perk of digital learning is that it's super flexible. College kids are always on the go, juggling classes, jobs, and social lives. With online courses and materials, they canstudy whenever and wherever they want. No more being tied to a specific classroom or schedule. They can watch recorded lectures during their lunch break or work on assignments late at night when the dorm is quiet.But that's not all! Digital learning also helps students develop important skills for the future. Things like coding, data analysis, and digital literacy are becoming increasingly important in today's job market. By using technology in their studies, college students are better prepared for careers that rely heavily on tech.Despite these challenges, most college students seem to embrace digital learning wholeheartedly. They appreciate the flexibility, engagement, and practical skills it offers. 
Plus, let's be honest – they're part of the generation that grew up with smartphones and tablets, so using technology for learning just feels natural to them.As for me, I can't wait to experience digital learning myself when I eventually make it to college (after conquering middle school and high school, of course!). Just imagine attending a virtual reality lecture on dinosaurs or using a 3D modeling app to design my own buildings. The possibilities are endless!So, there you have it – a glimpse into the college kid's digital learning playground. It's a whole new world out there, filled with interactive tools, flexible schedules, and high-tech skills. While it may not be perfect, it's certainly shaping the future of education in exciting ways. Now, if you'll excuse me, I have to go practice my coding skills on my tablet. Maybe I'll create the next big educational app someday!篇4Digital Stuff is Pretty Cool!My big sister Emily is in college now and she has to use all sorts of digital things for her classes. At first I thought it was just her watching videos on her laptop, but she told me it's way more than that. Let me tell you about all the rad digital learning techs she uses!First up, she has these things called "online lectures." Instead of going to a classroom, her professors just record videos of themselves teaching the material. Emily can watch them whenever she wants on her laptop or even on her phone! She says it's awesome because if she misses something important, she can just rewind and re-watch that part. No more missing out if you space out for a few minutes!But get this - it's not just videos. Her classes also have interactive modules built right into the online portals. She can click through slides, take little quizzes, and even join live video chat rooms to ask questions! The modules make sure she's actually paying attention and understanding before moving on to new topics. It's like having a personal digital tutor.Emily also does a ton of her assignments and homework online now using digital tools. For writing papers, she uses apps that check her grammar and give suggestions in real-time as she types. She can insert citations with just a few clicks. For math and science, she has simulation apps where she can visualize concepts in 3D and run experiments virtually. It's like having a whole science lab on her computer!Speaking of math, Emily doesn't even have to buy expensive calculators anymore. She just uses websites and apps that can solve the craziest equations step-by-step. Some of them even let her graph functions just by typing them in. No more struggling to sketch graphs precisely by hand!But here's the really awesome part - digital tech isn't just one-way anymore. Emily's classes have online discussion boards and shared workspaces where all the students can chat, ask questions, share notes, and work together on group projects.Even though they're not physically in the same room, they can still collaborate and help each other out in real-time.Emily says one of her favorite digital tools is for peer review. When she has to write an essay, she can submit a draft online and it automatically gets shared with a few random classmates. They can all leave comments and suggestions directly on her paper. She said getting feedback from multiple people at once is super helpful for revising her work.Another crazy thing is that Emily's textbooks are all digital now. No more carrying around a ton of heavy books! All the textbooks are downloaded on her laptop and tablet. 
The digital versions have built-in search functions to quickly find keywords and concepts. Some of them even have interactive graphics, videos, and practice questions built right into the pages. Bye bye boring paper textbooks!Emily's professors also use digital tools to track her progress and participation. Her online quiz scores, attendance records, submitted assignments, and more are all logged in a secret digital gradebook. Emily can check her grades for each assignment and weighted categories anytime she wants. No more wondering how she's doing in the class!Finally, the coolest digital tool Emily showed me is for virtual reality learning. She has this headset that she can wear to explore 3D simulated environments based on what she's studying. If she's learning about Ancient Rome, she can just slip on the headset and suddenly be transported to the Roman Colosseum like she's actually there! For anatomy class, she can walk through a 3D model of the human body and look at all the organ systems up close. It's like the real world but not! Emily says virtual reality makes the material way more memorable and fun to learn.I know this was a lot of digital stuff to cover, but I hope I explained everything clearly. College definitely seems a whole lot different from elementary school! From online lectures to virtual reality, digital technology is changing how Emily and her classmates learn in all sorts of high-tech ways. I can't wait until I'm older so I can try out all these awesome digital tools myself. Although to be honest, a part of me will kind of miss the simplicity of reading from a paperback book and writing on actual paper with a pencil. There's something nostalgic about that. But overall, I think digital learning techs sound unbelievably cool and useful. The future of education is going to be a wild digital ride!篇5Title: Big Kids and Their Computer ToysMy big brother goes to a really big school called college. He's 20 years old and he's studying to be an engineer, which means he builds things like robots and bridges. He's always working on his laptop or playing around with some new gadget or app. He says all the "digital learning tech" helps him study better. I篇6Digital Learning Is Cool!Hi there! My name is Emily, and I'm a 4th grader at Sunny Hills Elementary School. Today, I'm going to tell you all about what college students think of digital learning technologies. It's a fascinating topic that my older sister, who's a freshman at State University, has been discussing a lot lately.You see, when my sister started college last fall, she was amazed by all the high-tech gadgets and software that her professors were using in their classes. It was a whole new world compared to the chalkboards and textbooks she was used to in high school!One of the coolest things my sister raves about is this thing called a "smart board." Imagine a gigantic tablet or computer screen at the front of the classroom, but you can write on it with a special pen, and the professor can pull up all sorts of multimedia content like videos, animations, and interactive diagrams! My sister says it makes her lectures way more engaging and easier to understand than just listening to someone drone on and on.Another nifty tool that many of her classes use is online discussion forums. It's like a message board where students can post questions, share ideas, and have conversations about the course material outside of class time. 
My sister finds it super handy for getting clarification on tricky concepts or collaborating with classmates on group projects.Speaking of group projects, colleges are also using a lot of cloud-based tools that allow multiple people to work on the same document or presentation simultaneously from different locations. How awesome is that? My sister and her teammates don't have to constantly email files back and forth or meet up in person to get stuff done.But perhaps the most mind-blowing digital learning technology my sister has encountered is something called"virtual reality." By wearing a special headset, she can be transported into immersive 3D environments that bring abstract concepts to life in stunning detail. For example, in her biology class, she got to take a virtual tour through the human circulatory system and see how blood cells travel through the body's vessels. It's like going on the most epic field trip ever without leaving the classroom!Overall though, my sister says the majority of her peers seem excited about the potential of digital learning to make their studies more interactive, collaborative, and fun. They feel like these tools are better preparing them for the highly digital and connected workplaces they'll eventually join after graduation.Personally, I can't wait until I get to use all of this cool technology when I'm older! Sure, there's something nostalgic about old-school pencil-and-paper learning. But being able to explore the ancient ruins of Rome in virtual reality or beam a hologram of my science project to kids across the world? Sign me up!Maybe by the time I'm in college, we'll have even crazier advancements like learning implants that can directly download knowledge into our brains. A girl can dream, right? For now though, I'll settle for my class's brand-new Chromebooks andhope I can avoid getting fingers smudges all over the shiny touchscreens.What's been your experience with digital learning tech so far? I'd love to hear about the awesome (or not-so-awesome) tools your teachers have introduced. Learning sure has come a long way from just reading dusty textbooks and watching filmstrips on an old projector!At the end of the day, whether you prefer high-tech orlow-tech, the most important thing is having teachers who make lessons engaging and relevant. All the snazzy gadgets in the world won't make a boring subject fun. But when educators harness the power of technology in creative ways, it can unlock new worlds of possibility for students like us.。
iDS-2CD7A46G0-IZHS(Y)(R) 4 MP IR Varifocal Bullet
iDS-2CD7A46G0-IZHS(Y)(R)4 MP IR Varifocal Bullet Network Camera⏹High quality imaging with 4 MP resolution⏹Excellent low-light performance via DarkFighter technology⏹Clear imaging against strong back light due to 140 dB WDRtechnology⏹Efficient H.265+ compression technology to savebandwidth and storage⏹Advanced streaming technology that enables smooth liveview and data self-correcting in poor network⏹ 5 streams and up to 5 custom streams to meet a widevariety of applications⏹Water and dust resistant (IP67) and vandal proof (IK10)FunctionFace CountingWith the embedded deep learning algorithms, the camera integrates multiple intelligences. It counts persons and samples the face features simultaneously, and compares them with the built-in face library, so to remove the duplicated person. It counts persons and reports a face alarm simultaneously to achieve both the entrance control and people counting.Hard Hat DetectionWith the embedded deep learning algorithms, the camera detects the persons in the specified region. It detects whether the person is wearing a hard hat, and captures the head of the person and reports an alarm if not.Multi-Target-Type DetectionWith the embedded deep learning algorithms, the camera detects and captures the face and human body in the specified region and outputs the features, such as gender, age and top color. It supports a structurized modeling of the face and human body to achieve a structurized data collection.Queue ManagementWith embedded deep learning based algorithms, the camera detects queuing-up people number and waiting time of each person. The body feature detection algorithm helps filter out wrong targets and increase the accuracy of detection. MetadataMetadata uses individual instances of application data or the data content. Metadata can be used for third-party application development.Smooth StreamingSmooth streaming offers solutions to improve the video quality in different network conditions. For example, in poor network conditions, adapting to the detected real-time network condition, streaming bit rate and resolution are automatically adjusted to avoid mosaic and lower latency in live view; in multiplayer network conditions, the camera transmits the redundant data for the self-error correcting in back-end device, so that to solve the mosaic problem because of the packet loss and error rate.SpecificationCameraImage Sensor 1/1.8″ Progressive Scan CMOSMin. Illumination Color: 0.001 Lux @ (F1.2, AGC ON); B/W: 0.0003 Lux @ (F1.2, AGC ON) Shutter Speed 1 s to 1/100,000 sSlow Shutter YesP/N P/NWide Dynamic Range 140 dBDay & Night IR cut filterPower-off Memory YesLensFocus Auto, semi-auto, manualLens Type & FOV 2.8 to 12 mm, horizontal FOV: 114.5° to 41.8°, vertical FOV: 59.3° to 23.6°, diagonal FOV: 141.1° to 48°8 to 32 mm, horizontal FOV: 42.5° to 15.1°, vertical FOV: 23.3° to 8.64°, diagonal FOV: 49.6° to 17.3Aperture 2.8 to 12 mm: F1.2 to 2.5 8 to 32 mm: F1.7 to F1.73Lens Mount IntegratedBlue Glass Module Blue glass module to reduce ghost phenomenon. P-Iris YesIlluminatorIR Range 2.8 to 12 mm: 30 m 8 to 32 mm: 80 mWavelength 850 nm Smart Supplement Light YesVideoMax. 
Resolution 2680 ×1520Main Stream 50Hz: 25fps (2680 × 1520, 2560 × 1440, 1920 × 1080, 1280 × 720) 60Hz: 30fps (2680 × 1520, 2560 × 1440, 1920 × 1080, 1280 × 720)Sub-Stream 50Hz: 25fps (704 × 576, 640 × 480) 60Hz: 30fps (704 × 480, 640 × 480)Third Stream 50Hz: 25fps (1920 × 1080, 1280 × 720, 704 × 576, 640 × 480) 60Hz: 30fps (1920 × 1080, 1280 × 720, 704 × 480, 640 × 480)Fourth Stream 50Hz: 25fps (1920 × 1080, 1280 × 720, 704 × 576, 640 × 480) 60Hz: 30fps (1920 × 1080, 1280 × 720, 704 × 480, 640 × 480)Fifth Stream 50Hz: 25fps (704 × 576, 640 × 480) 60Hz: 30fps (704 × 480, 640 × 480)Custom Stream 50Hz: 25fps (1920 × 1080, 1280 × 720, 704 × 576, 640 × 480) 60Hz: 30fps (1920 × 1080, 1280 × 720, 704 × 480, 640 × 480)Video Compression Main stream: H.265+/H.265/H.264+/ H.264Sub-stream/Third stream/Fourth stream/Fifth stream/custom stream:H.265/H.264/MJPEGVideo Bit Rate 32 Kbps to 16 MbpsH.264 Type Baseline Profile/Main Profile/High ProfileH.265 Type Main ProfileCBR/VBRMain stream/Sub-stream/third stream/fourth stream/fifth stream/custom streamScalable Video Coding (SVC) H.265 and H.264 supportRegion of Interest (ROI) 4 fixed regions for each streamAudioEnvironment Noise Filtering YesAudio Sampling Rate 8 kHz/16 kHz/32 kHz/44.1 kHz/48 kHz Audio Compression G.711/G.722.1/G.726/MP2L2/PCM/MP3Audio Bit Rate 64Kbps(G.711)/16Kbps(G.722.1)/16Kbps(G.726)/32-192Kbps(MP2L2)/32Kbps(PCM)/8-320Kbps(MP3)Audio Type Mono soundNetworkSimultaneous Live View Up to 20 channelsAPI ONVIF (PROFILE S, PROFILE G, PROFILE T), ISAPI, SDK, ISUPProtocols TCP/IP, ICMP, HTTP, HTTPS, FTP, SFTP, SRTP, DHCP, DNS, DDNS, RTP, RTSP, RTCP, PPPoE, NTP,UPnP, SMTP, SNMP, IGMP, 802.1X, QoS, IPv6, UDP, Bonjour, SSL/TLSSmooth Streaming YesUser/Host Up to 32 users. 3 user levels: administrator, operator and userSecurity Password protection, complicated password, HTTPS encryption,802.1X authentication (EAP-TLS, EAP-LEAP, EAP-MD5), watermark, IP address filter, basic and digest authentication for HTTP/HTTPS, WSSE and digest authentication for ONVIF, RTP/RTSP OVER HTTPS, Control Timeout Settings, Security Audit Log,TLS 1.2Network Storage microSD/SDHC/SDXC card (256 GB) local storage, and NAS (NFS, SMB/CIFS), auto network replenishment (ANR)Together with high-end Hikvision memory card, memory card encryption and health detection are supported.Client iVMS-4200, Hik-Connect, Hik-CentralWeb Browser Plug-in required live view: IE8+Plug-in free live view: Chrome 57.0+, Firefox 52.0+, Safari 11+ Local service: Chrome 41.0+, Firefox 30.0+ImageSmart IR The IR LEDs on camera should support Smart IR function to automatically adjust power to avoid image overexposure.Day/Night Switch Day, Night, Auto, Schedule, Triggered by Alarm InTarget Cropping YesPicture Overlay LOGO picture can be overlaid on video with 128 × 128 24bit bmp format Image Enhancement BLC, HLC, 3D DNR, Defog, EIS, Distortion CorrectionImage Parameters Switch YesImage Settings Rotate mode, saturation, brightness, contrast, sharpness, gain, white balance adjustable by client software or web browserSNR ≥52 dBInterfaceAlarm 2 input, 2 outputs (max. 24 VDC, 1 A)Audio -IZHSY:1 input (line in), 3.5 mm connector, max. 
input amplitude: 3.3 vpp, input impedance: 4.7 KΩ, interface type: non-equilibrium; 1 output (line out), 3.5 mm connector, max.output amplitude: 3.3 vpp, output impedance: 100 Ω, interface type:non-equilibrium, mono soundRS-485 With -Y: 1 RS-485(half duplex, HIKVISION, Pelco-P, Pelco-D, self-adaptive)1 Vp-p Composite Output(75Ω/CVBS)(For debugging only)On-board Storage Built-in micro SD/SDHC/SDXC slot, up to 256 GBCommunication Interface 1 RJ45 10 M/100 M/1000 M self-adaptive Ethernet portPower Output With -Y: 12 VDC, max. 200 mA (supported by all power supply types) Heater YesSmart Feature-SetBasic Event Motion detection, video tampering alarm, vibration detection, exception (network disconnected, IP address conflict, illegal login, HDD full, HDD error)Smart Event Line crossing detection, up to 4 lines configurableIntrusion detection, up to 4 regions configurableRegion entrance detection, up to 4 regions configurableRegion exiting detection, up to 4 regions configurableUnattended baggage detection, up to 4 regions configurableObject removal detection, up to 4 regions configurableScene change detection, audio exception detection, defocus detectionCounting Yes Intelligent (Deep Learning Algorithm)Face Recognition Detects up to 30 faces simultaneously;Supports swing left and right from -60° to 60°, tilt up and down from -30° to 30°; Up to 3 face libraries with up to 30000 faces each are configurable; Recognizes face identity via face modeling, grading and comparing to those in face library;Supports face library encryption.Multi-target-type Detection Supports simultaneous detection of human body and face Gets 8 face features and 13 human body featuresFace Counting Supports entering/exiting counting (track 30 targets simultaneously);Supports facial recognition;Supports 24-hour dynamic deduplication and face library deduplication (up to 3 face libraries with up to 30000 faces);Supports reporting alarms when detecting the face is the same as that in the face library.Hard Hat Detection Detects up to 30 human targets simultaneously Supports up to 4 shield regionsQueue Management Detects queuing-up people number, and waiting time of each personGenerates reports to compare the efficiency of different queuing-ups and display the changing status of one queueSupports raw data export for further analysisPremier Protection Line crossing, intrusion, region entrance, region exitingSupport alarm triggering by specified target types (human and vehicle)Filtering out mistaken alarm caused by target types such as leaf, light, animal, and flag, etc.GeneralLinkage Method Upload to FTP/NAS/memory card, notify surveillance center, send email, trigger alarm output, trigger recording, trigger captureTrigger recording: memory card, network storage, pre-record and post-record Trigger captured pictures uploading: FTP, SFTP, HTTP, NAS, EmailTrigger notification: HTTP, ISAPI, alarm output, EmailOnline Upgrade Yes Dual Backup YesWeb Client Language 33 languagesEnglish, Russian, Estonian, Bulgarian, Hungarian, Greek, German, Italian, Czech, Slovak, French, Polish, Dutch, Portuguese, Spanish, Romanian, Danish, Swedish, Norwegian, Finnish, Croatian, Slovenian, Serbian, Turkish, Korean, Traditional Chinese, Thai, Vietnamese, Japanese, Latvian, Lithuanian, Portuguese (Brazil),UkrainianGeneral Function Anti- flicker, 5 streams and up to 5 custom streams, EPTZ, heartbeat, mirror, privacy masks, flash log, password reset via e-mail, pixel counterSoftware Reset YesStorage Conditions -40 °C to 60 °C (-40 °F to 140 °F). 
Humidity 95% or less (non-condensing) Startup and OperatingConditions-40 °C to 60 °C (-40 °F to 140 °F). Humidity 95% or less (non-condensing)Power Supply 12 VDC ± 20%, three-core terminal block, reverse polarity protection; PoE:802.3at, Type 2 Class 4Power Consumption and Current 12 VDC, 1.33 A, max. 16.0 WPoE: (802.3at, 42.5V-57V), 0.43 A to 0.31 A, max. 18.0 WPower Interface Three-core terminal block Camera Material Aluminum alloy body Screw Material Mild steelCamera Dimension Without -Y: Ø144 × 347 mm (Ø5.7″ × 13.7″) With -Y: Ø140 × 351 mm (Ø5.5″ × 13.8″)Package Dimension 405 × 190 × 180 mm (15.9″ × 7.5″ × 7.1″) Camera Weight 1920 g (4.2 lb.)With Package Weight 3060 g (6.7 lb.)MetadataMetadata Metadata of intrusion detection, line crossing detection, region entrance detection, region exiting detection, unattended baggage detection, object removal, face capture and queue management are supported.ApprovalClass Class BEMC FCC (47 CFR Part 15, Subpart B); CE-EMC (EN 55032: 2015, EN 61000-3-2: 2014, EN 61000-3-3: 2013, EN 50130-4: 2011 +A1: 2014); RCM (AS/NZS CISPR 32: 2015); IC (ICES-003: Issue 6, 2016); KC (KN 32: 2015, KN 35: 2015)Safety UL (UL 60950-1); CB (IEC 60950-1: 2005 + Am 1:2009 + Am 2: 2013); CE-LVD (EN 60950-1: 2005 + Am 1: 2009 + Am 2:2013); BIS (IS 13252(Part 1): 2010+A1: 2013+A2: 2015); LOA (IEC/EN 60950-1)Environment CE-RoHS (2011/65/EU); WEEE (2012/19/EU); Reach (Regulation (EC) No 1907/2006) Protection Ingress protection: IK10 (IEC 62262:2002), IP67 (IEC 60529-2013)Automotive and Railway With -R: EN50121-4Anti-Corrosion Protection With -Y: NEMA 4X(NEMA 250-2014). Anti-corrosion coating, not recommended in high salinity/humidity/acidity environmentOther With -Y: PVC FREEAvailable ModeliDS-2CD7A46G0-IZHS (2.8 to 12 mm), iDS-2CD7A46G0-IZHS (8 to 32 mm), iDS-2CD7A46G0-IZHSY (2.8 to 12 mm), iDS-2CD7A46G0-IZHSY (8 to 32 mm), iDS-2CD7A46G0-IZHSYR (2.8 to 12 mm) iDS-2CD7A46G0-IZHSYR (8 to 32 mm)DimensionAccessory –Y model:DS-1476ZJ-Y Corner MountDS-1475ZJ-Y Vertical Pole MountWithout –Y model:DS-1475ZJ-SUS DS-1476ZJ-SUS。
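The datasheet lists RTSP among the supported protocols and up to five parallel streams. As a hedged illustration only, the snippet below opens a stream with OpenCV; the IP address, credentials, and the "/Streaming/Channels/101" path are assumptions about a typical configuration, not values taken from this document, so check the device's own RTSP documentation before use.

```python
import cv2

# Hypothetical address, credentials, and stream path -- assumptions, not datasheet values.
url = "rtsp://admin:password@192.168.1.64:554/Streaming/Channels/101"

cap = cv2.VideoCapture(url)
ok, frame = cap.read()
if ok:
    h, w = frame.shape[:2]
    print(f"received frame: {w}x{h}")   # e.g. 2680x1520 when the main stream runs at max resolution
cap.release()
```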
English reading comprehension on the Olympic Games
Olympic Games Reading Comprehension: A Fun Guide

When it comes to Olympic Games reading comprehension, it can be like opening a treasure chest full of amazing stories. You know, the Olympics is not just a bunch of sports events. It's a global celebration of human spirit and athleticism.

Let's start with the types of questions you might encounter in an Olympic-themed reading comprehension. There could be questions about the history of the Olympics. Oh, it's like going on a time-traveling adventure. You'll learn how the ancient Greeks started this wonderful tradition way back when. They were all about glorifying the human body and the spirit of competition. And now, we've got this huge, modern Olympics that has spread across the world. It's not just for the Greeks anymore. It's for everyone from every corner of the earth.

Then there are questions about the different sports in the Olympics. Think of it as a big buffet of sports. There's track and field, which is like the foundation of the sports banquet. Athletes running like the wind, jumping as if they have springs in their legs. Gymnastics? Well, that's like watching human art in motion. Those gymnasts twist and turn in the air, defying gravity almost. Swimming is another big part. It's like the aquatic ballet. Swimmers glide through the water like sleek fish.

The reading passages might also talk about the athletes themselves. These athletes are like superheroes without capes. They've trained for years, sacrificed so much. Take Usain Bolt for example. He was like a lightning bolt on the track. His speed was out of this world. And Simone Biles in gymnastics? She's a wonder. Her skills are so complex and precise, it's as if she has a magic wand that makes her perform those amazing routines.

In terms of understanding the passages, you've got to look for the details. It's like looking for hidden gems in a big pile of rocks. Sometimes the details are about the scores in a game, or the records that an athlete has broken. Other times, it's about the emotions of the athletes. You can feel their joy when they win, their disappointment when they lose. It's all there in the reading.

You also need to understand the context. If a passage is talking about the Olympics being held in a particular city, it's not just about the location. It's about the culture of that city, how it influenced the Games. For instance, when the Olympics were held in Beijing, China showed the world its unique charm. The amazing opening ceremony was like a grand feast for the eyes, with all those traditional Chinese elements. It was a blend of ancient and modern China, and it added a whole new dimension to the Olympic experience.

Now, what if you come across words you don't understand? Well, don't panic. It's like finding a little bump on a smooth road. You can usually figure out the meaning from the context. If a word is related to a sport, like "scull" in rowing, you can kind of guess what it might be based on the other words around it. If it's a more general word, like "endorsement" in a passage about Olympic sponsors, you can think about how it might fit in the overall story.

Another aspect is the comparison between different Olympics. One Olympics might be remembered for a particular event or a new technology used. It's like comparing different chapters in a big book. Each Olympics has its own story to tell, its own set of heroes and memorable moments.

When doing Olympic-themed reading comprehension, you're not just reading words. You're experiencing the magic of the Olympics. You're sharing in the hopes and dreams of the athletes. It's like being a part of the global community that comes together every few years to celebrate the best of human achievement.

In conclusion, Olympic Games reading comprehension is a wonderful way to explore the world of sports, culture, and human spirit. It's a journey full of excitement, discovery, and inspiration.
3D Trajectory Recovery for Tracking Multiple Objects and Trajectory Guided Recognition of Actions
BU CS TR98-019rev.2.To appear in Proc.IEEE Conf.on Computer Vision and Pattern Recognition,June19993D Trajectory Recovery for Tracking Multiple Objectsand Trajectory Guided Recognition of ActionsR´o mer Rosales and Stan SclaroffComputer Science Department,Boston UniversityAbstractA mechanism is proposed that integrates low-level(im-age processing),mid-level(recursive3D trajectory estima-tion),and high-level(action recognition)processes.It isassumed that the system observes multiple moving objectsvia a single,uncalibrated video camera.A novel extendedKalmanfilter formulation is used in estimating the rela-tive3D motion trajectories up to a scale factor.The re-cursive estimation process provides a prediction and errormeasure that is exploited in higher-level stages of actionrecognition.Conversely,higher-level mechanisms providefeedback that allows the system to reliably segment andmaintain the tracking of moving objects before,during,andafter occlusion.The3D trajectory,occlusion,and seg-mentation information are utlized in extracting stabilizedviews of the moving object.Trajectory-guided recognition(TGR)is proposed as a new and efficient method for adap-tive classification of action.The TGR approach is demon-strated using“motion history images”that are then recog-nized via a mixture of Gaussian classifier.The system wastested in recognizing various dynamic human outdoor ac-tivities;e.g.,running,walking,roller blading,and cycling.Experiments with synthetic data sets are used to evaluatestability of the trajectory estimator with respect to noise.1IntroductionTracking non-rigid objects and classifying their motion is achallenging problem.The importance of tracking and mo-tion recognition problems is evidenced by the increasingattention they have received in recent years[26].Effectivesolutions to these problems would lead to breakthroughs inareas such as video surveillance,motion analysis,virtualreality interfaces,robot navigation and recognition.Low-level image processing methods have been shownto work surprisingly well in restricted domains despite thelack of high-level models[9,8,30].Unfortunately,mostof these techniques assume a simplified version of the gen-eral problem;e.g.,there is only one moving object,ob-jects do not occlude each other,or objects appear at a lim-ited range of scales and orientations.While higher-level,model-based techniques can address some of these prob-lems[6,12,15,16,18,21,23],such methods typicallyrequire careful placement of the initial model.These limitations arise because object tracking,3D tra-jectory estimation,and action recognition are treated asseparable problems.In fact,these problems are inexorablyintertwined.For instance,an object needs to be tracked ifits3D trajectory is to be recovered;while at the same time,ing tracking to a plane,if the projective effect is avoidedthrough the use of a top,distant view[14].It is also pos-sible to use some heuristics about body part relations and motion on image plane like[13].In our work,we do not as-sume planar motion or detailed knowledge about the objectand our formulation can handle some changes in shape.More complicated representations like[16,12,18,23,15,21],use model-based techniques,generally articulatedmodels comprised of2D or3D solid primitive,sometimes accounting for self-occlusion by having an explicit object model.Shape models were used by[2]for human tracking.The most relevant motion recognition approaches re-lated to our work are those that employ view-based mod-els[3,9,19].In particular,[9]uses motion 
history im-ages(MHI)and motion energy images(MEI),temporal templates that are matched using a nearest neighbor ap-proach against examples of given motions already learned. The main problem of this method is the requirement of having stationary objects,and the insufficiency of the rep-resentation to discriminate among similar motions.Mo-tion analysis techniques have had the problem that registra-tion of useful,filtered information is a hard labor by itself [9,4,19].Our system provides a functional front-end that supports such tasks.3Basic ApproachA diagram of our approach is illustrated in Fig.1.Thefirst stage of the algorithm is based on the background subtrac-tion methods of[30].The system is initialized by acquiring statistical measurements of the empty scene over a number of video frames.The statistics acquired are the mean and covariance of image pixels in3D color space.The idea is to have a confidence map to guide segmentation of the foreground from the background.Once initialized,moving objects in the scene are seg-mented using a log-likelihood measure between the incom-ing frames and the current background model.The input video stream is low-passfiltered to reduce noise effects.A connected components analysis is then applied to the resulting image.Initial segmentation is usually noisy,so morphological operations and sizefilters are applied.If there are occlusions,or strong similarities between the empty scene model and the objects,then it is possible that regions belonging to the same object may be classi-fied as part of different objects or vice versa.To solve this problem,two sources of information are used:temporal analysis and trajectory prediction.In temporal analysis,a map of the previous segmented and processed frame is kept.This map is used as a possible approximation of the current connected elements.Connec-tivity in the current frame is compared with it,and regions are merged if they were part of the same object in previous frames.This can account for some shadow effects,local background foreground similarities and brief occlusions.In trajectory prediction,an Extended Kalman Filter (EKF)provides an estimate of each object's image bound-ing box position and velocity.The input to the EKF is a2D bounding box that encloses the moving object in the image. The extended Kalmanfilter then estimates the relative3D motion trajectories for each object,based on a3D linear trajectory model.In contrast to trajectory prediction basedLow−LevelReasoningMotionRecognitionFigure1:System Diagram.on a2D model,the3D approach is explicitly designed to handle nonlinear effects of perspective projection.Occlusion prediction is performed based on the current EKF estimates.Given that we know object position and the occupancy map,we can detect occlusions or collisions in the image plane.Our EKF formulation estimates the tra-jectory parameters for these objects assuming locally lin-ear3D motion,also the bounding box parameters are esti-mated.During an occlusion,the EKF can be used to give the maximum likelihood estimate of the current region cov-ered by the object,along with its velocity and position.For each frame,the estimated bounding box is used to resize and resample the moving blob into a canonical view that can be used as input to motion recognition modules. 
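A rough sketch of that first, low-level stage (empty-scene statistics, a per-pixel distance test against the background model, morphological cleanup, connected components, and a size filter) is given below. It is a simplified stand-in for the authors' pipeline, assuming a diagonal color covariance where the paper keeps a full one; all function and variable names are illustrative.

```python
import numpy as np
import cv2

def learn_background(frames):
    """Per-pixel mean and (diagonal) variance of the empty scene in color space,
    estimated from a list of H x W x 3 frames of the static background."""
    stack = np.stack(frames).astype(np.float32)           # (T, H, W, 3)
    return stack.mean(axis=0), stack.var(axis=0) + 1e-6

def segment_foreground(frame, mean, var, thresh=16.0, min_area=200):
    """Flag pixels far (in a squared, Mahalanobis-like distance) from the background
    model, clean the mask morphologically, and return per-blob bounding boxes."""
    d2 = ((frame.astype(np.float32) - mean) ** 2 / var).sum(axis=2)
    mask = (d2 > thresh).astype(np.uint8)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)     # remove speckle noise
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)    # fill small holes
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    boxes = [stats[i, :4] for i in range(1, n) if stats[i, cv2.CC_STAT_AREA] >= min_area]
    return mask, boxes                                         # boxes are (x, y, w, h)
```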
This yields a stabilized of the moving object throughout the tracking sequence,despite changes in scale and position.The resulting translation/scale stabilized images of the object are then fed to an action recognition module.Ac-tions are represented in terms of motion energy images (MEI's)and motion history images(MHI's)[4,9].An MEI is a cumulative motion image,and an MHI is a function of the recency of the motion at every pixel.By using stabi-lized input sequences,it is possible to make the MEI/MHI approach invariant to unrestricted3D translational motion. The stabilized representation is then fed to a moment-based action classifier.The action recognition module employs a mixture of Gaussian classifier,which is learned via the Ex-pectation Maximization(EM).In theory it is necessary to learn representations of every action for all possible trajectory directions.However,the complexity of such an exhaustive approach would be im-practical.We therefore propose a formulation that avoids this complexity without decreasing recognition accuracy. The problem is made tractable via trajectory-guided recog-nition(TGR),and is a direct consequence of our tracking and3D trajectory estimation mechanisms.In TGR,we partition the hemisphere of possible trajectory directions based on the trajectories estimated in the training data. Each partition corresponds to a group of similar trajectory directions.During training and classification,trajectory di-rection information obtained via the EKF is used to deter-mine the direction-partitioned feature space.This allowsautomatic learning and adaptation of the direction space tothose directions that are commonly observed.43D Trajectory from2D Image MotionOur method requires moving blob segmentation and con-nected components analysis as input to the tracking mod-ule.Due to space limitations,readers are to[24]for details of these modules.To reduce the complexity of the trackingproblem,two feature points are selected:two opposite cor-ners of the blob's bounding ing a blob's bounding box alleviates need to searching for corresponding point features in consecutive frames.In general we think that a detailed tracking of features is neither necessary nor easily tenable for non-rigid motion tracking at low resolution.It is assumed that although the object to be trackedis highly non-rigid,the3D size of the object's boundingbox will remain approximately the same,or at least vary smoothly.This assumption might be too strong in some cases;e.g.,if the internal motion of the object's parts can-not be roughly self contained in a bounding box.However, when analyzing basic human locomotion,we believe that these assumptions are a fair approximation.For our representation a3D central projection modelsimilar to[28,1]is used:is the inverse focal length.The origin of the coordinate system isfixed at the image plane. 
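Equation (1) itself did not survive extraction. In the Azarbayejani–Pentland-style central projection the text describes (inverse focal length β, origin fixed on the image plane, orthographic projection recovered at β = 0), the mapping is commonly written as (u, v) = (X, Y)/(1 + βZ); the sketch below should be read as that assumed form, not as a verbatim reconstruction of the paper's equation.

```python
import numpy as np

def project(points_3d, beta):
    """Central projection with inverse focal length beta and the origin on the
    image plane: (u, v) = (X, Y) / (1 + beta * Z).  With beta = 0 this reduces
    to orthographic projection, which is why the model stays well defined there."""
    p = np.asarray(points_3d, dtype=float)
    s = 1.0 + beta * p[:, 2:3]
    return p[:, :2] / s

# Two opposite bounding-box corners sharing the same depth, as in the paper's planar-box state.
corners = np.array([[-0.5, -1.0, 4.0],
                    [ 0.5,  1.0, 4.0]])
print(project(corners, beta=1.0))   # perspective case
print(project(corners, beta=0.0))   # orthographic limit
```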
The model is numerically well defined even in the case of orthographic projection.Our state models a3D planar rectangular bounding box moving along a linear trajectory at constant velocity.Be-cause we are considering the objects as being planar,the depth at both feature points should be the same.The re-duction in the number of degrees of freedom improves the speed of convergence of the EKF and the robustness of the estimates.Our state vector then becomes:(2) where,are the corners of the 3D planar bounding box.The vector represents a corner's3D velocity relative to the camera.The sensitivity in and is directly dependent on theobject depth as objects that are farther away from the cam-era tend to project to fewer image pixels.The sensitivity of is an inverse function of camera focal length,becoming zero in the orthographic case.The3D trajectory and velocity are recovered up to ascale factor.However,the family of allowable solutions allproject to a unique solution on the image plane.We can therefore estimate objects'future positions on the image plane given their motion in space.The use of this3D trajectory model offers significantly improved ro-bustness over methods that employ a2D image trajectory model.Due to perspective foreshortening effects,trajecto-ries in the image plane are nonlinear,and2D models are therefore inaccurate.4.1Extended Kalman Filter Formulation Trajectory estimates are obtained via an extended Kalman Filter(EKF)formulation.Our state is guided by the fol-lowing linear equation:(3) where is our state at time,is the process noise and,the system evolution matrix,is based onfirst or-der Newtonian dynamics in3D space and assumed time invariant.If additional prior information on dynamics is available,then can be changed to better de-scribe the system evolution[22].Our measurement vector is, where are the image plane coordinates for the ob-served feature at time.The measurement vector is related to the state vector via the measurement equation:.Note that is non-linear.The EKF time update equation becomes:(4)(5) where is the process noise covariance.The measurement relationship to the process is nonlin-ear.At each step,the EKF linearizes around our current es-timate using the measurement and state partial derivatives. 
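The time update above, together with the measurement update spelled out next (eqs. 6–8, with H the Jacobian of eq. 9), forms a standard EKF cycle. The sketch below illustrates one such cycle, using a finite-difference Jacobian in place of the analytic one; the function names and the numerical linearization are mine, not the authors'.

```python
import numpy as np

def numerical_jacobian(h, x, eps=1e-6):
    """Finite-difference Jacobian of the (nonlinear) measurement function h at x."""
    z0 = h(x)
    J = np.zeros((z0.size, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        J[:, i] = (h(x + dx) - z0) / eps
    return J

def ekf_step(x, P, z, Phi, Q, R, h):
    """One predict/update cycle: time update with the linear dynamics Phi,
    then a measurement update linearized around the predicted state."""
    # Time update (the paper's eqs. 4-5)
    x_pred = Phi @ x
    P_pred = Phi @ P @ Phi.T + Q
    # Measurement update (eqs. 6-8), with H the Jacobian of h (eq. 9)
    H = numerical_jacobian(h, x_pred)
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - h(x_pred))
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```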
The measurement update equations become:(6)(7)(8)where is the Jacobian of with respect to:(9)where.Finally,the matrix is the Jacobian of with respect to,and is the measurement noise covariance at time.The general assumptions are:and are Gaussian random vectors with, and.For more detail,see[27, 29].Obviously,as more measurements are collected,the er-ror covariance of our estimates tends to decrease.Ex-perimentally40frames were needed for convergence with real data.As will be seen in our experiments,motions that are not linear in3D can also be tracked,but the estimate at the locations of sudden change in velocity or direction is more prone to instantaneous error.The speed of conver-gence when a change in trajectory occurs depends on the filter's expected noise.A reseting mechanism is used to detect when the EKF does not represent the true observa-tions.This is done by comparing the current projection of the estimate with the observation.5Motion RecognitionOur tracking approach allows the construction of an object centered representation.The resulting translation/scale sta-bilized images of the object are then fed to an action recog-nition module.Actions are represented in terms of motion energy images(MEI's)and motion history images(MHI's) [4,9].An MEI is a cumulative motion image,and an MHI is a function of the recency of the motion at every pixel. By using stabilized input sequences,it is possible to make the MEI/MHI approach invariant to unrestricted3D trans-lational motion.The seven Hu moment invariants are then computed for both the MHI and MEI[4,9].The resulting features are combined in a14-dimensional vector.The dimension of this vector is reduced via principal components analysis (PCA)[11].In our experiments,the PCA allowed a dimen-sionality reduction of64%(dim=5),while capturing90% of the variance of our training data.The reduced feature vector is then fed into a maximum likelihood classifier that is based on a mixture of Gaussians model.In the mixture model,each action class is represented by a set of mixture model parameters,.The model pa-rameters and prior distribution,for each action class are estimated during a training phase using the expec-tation maximization(EM)algorithm[10].Given, we calculate using Bayes rule.The motivation for using a mixture model is that there may be a number of variations(or body configurations)for motions within the same action class.The model should adequately span the space of standard configurations for a given action.However,we are only interested infinding a representative set of these modes.We therefore use an information theoretic technique to select the best number of parameters to be used via MDL principle:Figure2:Tracking example:2bodies walking in different trajectories,occluding eachother.Figure3:Normalized views of the2bodies in the sequence,one row per body.6views with their respective regions of support.Figure4:Recovered(top view)motion.Note that motion of body1is from right to left.Body2goes in the opposite direction. 
Estimates at the beginning of the sequence are not very accurate, because error covariance in the EKF is higher.Uncertainty is reduced as enough samples are given.the second body moves in the opposite direction.While the early position estimates are noisy,after20-40framestheEKFconvergesto a stable estimate of the trajectories.6.1Learning and Recognition of ActionsIn order to test our the full action recognition system,wecollected sequences(3hours total)of random people per-forming different actions on a pedestrian road in a outdoorenvironment(Charles River sequences).We trained thesystem to recognize four different actions(walking,run-ning,roller blading,biking)gathered from two different camera viewpoints.The camera was located and angle with respect to the road.Video sequences showing56examples of each actionwere extracted from this data set.The duration of eachexample ranged from one tofive seconds(30-150frames).The recognition experiment was conducted as follows.For each trial,a subset of40training examples per action was select at random from the full example set.The re-maining examples per action(excluded from the test set) were then classified.Example frames from the data set are shown in Fig.5.As before,the estimated bounding boxes for each movingobject are shown overlaid on the input video image.Our approach indicated that only two trajectories where mainlyFigure5:Example frames taken from the river sequences.Actions Running Biking-0.120.35 Running-0.010.11-0.23 Biking0.02-Totals0.350.03Table1:Confusion matrix for classifying four action classes: walking,running,roller blading,and biking.In the experimental trials,the total probability of an error in classification (chance=0.75).observed:either direction along the foot path.There-fore,the system learned just two sets of PDF's ().Results for either view were almost the same;average rates are therefore presented.Classification performance is summarized in the con-fusion matrix shown in Tab.1.Each confusion matrix cell represents the probability of error in classification.The entry at is the probability of action being misclassi-fied as action.The last column is the total probability of a given action being misclassified as another action.The last row represents the probability of misclassification to action class.In the experimental trials,the total probability of an error in classification(chance=0.75). 
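A compact sketch of the classification stage described above (PCA down to a 5-dimensional feature space, one Gaussian mixture per action fitted by EM, and a Bayes-rule decision) follows. It leans on scikit-learn's EM implementation, fixes the number of mixture components instead of selecting it by the MDL principle as the paper does, and uses illustrative names throughout.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

def train_action_models(features_by_action, n_components=3, dim=5):
    """features_by_action: {action_name: (N_a, 14) array of Hu-moment vectors
    from the MEI/MHI pair}.  Fits a shared PCA and one Gaussian mixture per class."""
    all_feats = np.vstack(list(features_by_action.values()))
    pca = PCA(n_components=dim).fit(all_feats)
    models, priors = {}, {}
    total = len(all_feats)
    for action, feats in features_by_action.items():
        z = pca.transform(feats)
        models[action] = GaussianMixture(n_components=n_components).fit(z)  # EM fit
        priors[action] = len(feats) / total
    return pca, models, priors

def classify(x, pca, models, priors):
    """Maximum a posteriori action class for one 14-d feature vector x."""
    z = pca.transform(x.reshape(1, -1))
    scores = {a: m.score_samples(z)[0] + np.log(priors[a]) for a, m in models.items()}
    return max(scores, key=scores.get)
```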
6.2Sensitivity ExperimentsIn order to provide a more comprehensive analysis of the 3D trajectory estimation technique,we tested its sensitiv-ity with synthesized data sets.We conducted Monte Carlo experiments to evaluate the stability of trajectory estima-tion and prediction.In our experiments,two types of noise effects were tested:noise in the2D image measurements, and noise in the3D motion model.Test sequences were generated using a synthetic planar bounding box moving along varying directions in3D from a given starting position.The3D box was then projectedonto the image plane using our camera model(Eq.:1)with unit focal length.Each synthetic image sequence was100 frames long.The set of directions was sampled by the az-imuth and the rotation around the vector correspond-ing to.For sampling we use(different directions).For each experiment, each of the576possible trajectory directions was tested using15sequences with randomly perturbed inputs.All synthetic video sequences were sampled at a pixel resolution of.This was mapped to a physical viewport size of world units.Therefore one pixel width is equivalent to in world units.The depths of the object from the image plane ranged in a scale from to.This resulted in a projected bounding box that occupied approximately3%of the image on average.For all of our experiments we define the error in our es-timate at a given time to be measured in the image plane. The mean squared error(MSE)in estimating the bounding box corner positions was computed over the100frames within each test sequence.To better understand the effect of error due to differences in the projected size of the ob-ject from frame to frame,second error measure,normal-ized MSE was also computed.In this measure,the error at each frame was normalized by length of the projected bounding box diagonal.Figure6:Example of performance with significant measure-ment noise.The plots show error as a function trajectory direction().Normalized mean-square error(upper surface)and non-normalized mean-square error error surfaces(lower surface)are shown.Note that mean-square error varies almost exclusively with(azimuth).To test the sensitivity of the system to noise in the mea-surements,time varying white noise with variance was added to the measured bounding box corner positions at each frame.This was meant to simulate sensor noise and variations in bounding box due to non-rigid deformations of the object.To test the sensitivity of the model formula-tion to perturbations in the actual3D object trajectory,time varying white noise with variance was added to perturb the3D position of the object at each frame.The resulting trajectories were therefore not purely linear.Our experiments have consistently shown that the mean-square error depends exclusively on the azimuth an-gle of the trajectory,.An illustrative example,Fig.6 shows error surfaces for and.The plot shows the mean-square error and normalized mean-square error,over all possible directions.As can be seen,Figure7:Graphs showing the sensitivity with respect to varying levels of measurement noise.Thefirst graph shows the normal-ized mean-square error in the state estimate over various trajec-tory directions.The second graph shows the normalized mean-square error in the state predicted for the future frame. 
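For reference, the two error measures used in these experiments can be written as below; since the text does not spell out whether the squared error or its root is divided by the projected diagonal, the normalized variant here scales the per-frame root error by the diagonal, which is one plausible reading rather than the authors' exact definition.

```python
import numpy as np

def corner_errors(est_corners, true_corners):
    """est_corners, true_corners: (T, 2, 2) arrays holding the two projected
    bounding-box corners (x, y) per frame.  Returns the plain mean-square error
    and a variant where each frame's error is scaled by the projected box diagonal."""
    sq = ((est_corners - true_corners) ** 2).sum(axis=(1, 2))            # per-frame squared error
    diag = np.linalg.norm(true_corners[:, 1] - true_corners[:, 0], axis=1)
    mse = sq.mean()
    normalized = (np.sqrt(sq) / diag).mean()
    return mse, normalized
```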
Figure 8: Sensitivity with respect to varying levels of noise in the 3D motion trajectory. The first graph shows the normalized mean-square error in the state estimate over various trajectory directions. The second graph shows the normalized mean-square error in the predicted state.

As can be seen, the mean-square error is relatively invariant to the second direction angle. Because of this, our graphs collapse that dimension by averaging over it, so that the complexity of visualization is reduced.

Fig. 7 shows the results of experiments designed to test the effect of increasing the measurement noise. We set the model noise relatively low with respect to the real 3D dimensions and varied the measurement noise. As expected, the error is lower at lower noise levels. The first graph shows the normalized mean-square error in the state estimate over various trajectory directions. The second graph shows the normalized mean-square error in the state predicted for a future frame (ten frames ahead). This error is due to the linearization step in the EKF; it affects the accuracy of the "look ahead" needed to foresee an occlusion and to predict the state during occlusions.

The graph in Fig. 8 shows the results of experiments designed to test the effect of increasing noise in the 3D motion trajectory; this corresponds to noise added to the model. Here, the trajectory noise is varied as shown while the measurement noise is kept relatively small. Note that this noise is set to very high values with respect to the expected model noise. The normalized mean-square error in the position estimates is relatively constant over direction for each level of noise and is also higher than the prediction error. The main reason for this is that the expected low measurement error tends to pull the estimates towards highly noisy measurements. The prediction error in general increases with the noise level, showing the higher error incurred in linearizing around the current state-space point.

Figure 9: Sensitivity with respect to the number of frames ahead used to calculate the prediction (normalized mean-square error).

The error decreases after a given look-ahead value due to the normalization effect and the diminishing effect that distant positions have on changes in the image plane.

In a final set of experiments, we tested the accuracy of the trajectory prediction at varying numbers of frames into the future. Results are shown in Fig. 9. Notice that, as expected, the 3D trajectory prediction is more accurate over short time windows; the uncertainty increases with the number of frames into the future for which we want to make the prediction. This is mainly due to the fact that the EKF computes an approximation by linearizing around the current point in state space and, as we pointed out, the underlying process is non-linear. The prediction error in general increases with the look-ahead, showing the higher error as a consequence of linearization.

7 Conclusion
We have shown how to integrate low-level and mid-level mechanisms to achieve more robust and general tracking. 3D trajectories can be successfully estimated, given some constraints on non-rigid objects. Prediction and estimation based on a 3D model gives improved performance over prediction based on the 2D image plane. This trajectory estimation can be used to predict and correct for occlusion. We utilized EKF 3D trajectory estimates in a new framework: Trajectory-Guided Recognition (TGR). This general method significantly reduces the complexity of action classification and could be used with other techniques (e.g., [3, 19]). Our tracking approach allows the construction of an object-centered representation. The resulting translation/scale stabilized images of the object are then fed to the TGR action recognition module that selects
the appropriate classifier based on trajectory direction. The system was tested in classifying four basic actions in a large number of video sequences collected in unconstrained, outdoor scenes. The noise stability properties of the trajectory estimation subsystem were also tested using synthetic data sequences. The results of the experiments are encouraging. Classification performance was quite good, considering the complexity of the task.

References
[1] A. Azarbayejani and A. Pentland. Recursive estimation of motion, structure, and focal length. PAMI, 17(6), 1995.
[2] A. Baumberg and D. Hogg. Learning flexible models from image sequences. ECCV, 1994.
[3] M. Black and Y. Yacoob. Tracking and recognizing rigid and non-rigid facial motion using local parametric models of image motion. ICCV, 1995.
[4] A. Bobick and J. Davis. An appearance-based representation of action. ICPR, 1996.
[5] K. Bradshaw, I. Reid, and D. Murray. The active recovery of 3d motion trajectories and their use in prediction. PAMI, 19(3), 1997.
[6] C. Bregler. Tracking people with twists and exponential maps. CVPR, 1998.
[7] T. Broida and R. Chellappa. Estimating the kinematics and structure of a rigid object from a sequence of monocular images. PAMI, 13(6):497-513, 1991.
[8] T. Darrell and A. Pentland. Classifying hand gestures with a view-based distributed representation. NIPS, 1994.
[9] J. Davis and A. F. Bobick. The representation and recognition of human movement using temporal templates. CVPR, 1997.
[10] A. Dempster, N. Laird, and D. Rubin. Maximum likelihood estimation from incomplete data. J. of the Royal Stat. Soc. (B), 39(1), 1977.
[11] K. Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press, 1972.
[12] D. Gavrila and L. Davis. Tracking of humans in action: a 3-d model-based approach. Proc. ARPA IUE Workshop, 1996.
[13] I. Haritaoglu, D. Harwood, and L. Davis. W4S: A realtime system for detecting and tracking people in 2.5D. ECCV, 1998.
[14] S. Intille and A. F. Bobick. Real time closed world tracking. CVPR, 1997.
[15] S. Ju, M. Black, and Y. Yacoob. Cardboard people: A parameterized model of articulated image motion. Proc. Gesture Recognition, 1996.
[16] I. Kakadiaris, D. Metaxas, and R. Bajcsy. Active part-decomposition, shape and motion estimation of articulated objects: A physics-based approach. CVPR, 1994.
[17] T. Kohonen. Self-organized formation of topologically correct feature maps. Bio. Cybernetics, 43:59-69, 1982.
[18] A. Pentland and B. Horowitz. Recovery of non-rigid motion and structure. PAMI, 13(7):730-742, 1991.
[19] R. Polana and R. Nelson. Low level recognition of human motion. Proc. IEEE Workshop on Nonrigid and Articulate Motion, 1994.
[20] B. Rao, H. Durrant-Whyte, and J. Sheen. A fully decentralized multi-sensor system for tracking and surveillance. IJRR, 12(1), 1993.
[21] J. M. Rehg and T. Kanade. Model-based tracking of self-occluding articulated objects. ICCV, 1995.
[22] D. Reynard, A. Wildenberg, A. Blake, and J. Marchant. Learning dynamics of complex motions from image sequences. ECCV, 1996.
[23] K. Rohr. Towards model-based recognition of human movements in image sequences. CVGIP: IU, 59(1):94-115, 1994.
[24] R. Rosales and S. Sclaroff. Improved tracking of multiple humans with trajectory prediction and occlusion modeling. IEEE CVPR Workshop on the Interp. of Visual Motion, 1998.
[25] R. Rosales and S. Sclaroff. Trajectory guided tracking and recognition of actions. TR BU-CS-99-002, Boston U., 1999.
[26] M. Shah and R. Jain. Motion-Based Recognition. Kluwer Academic, 1997.
[27] H. W. Sorenson. Least-squares estimation: From Gauss to Kalman. IEEE Spectrum, Vol. 7, pp. 63-68, 1970.
[28] R. Szeliski and S. Kang. Recovering 3d shape and motion from image streams using non-linear least squares. CVPR, 1993.
[29] G. Welch and G. Bishop. An introduction to the Kalman filter. TR 95-041, Computer Science, UNC Chapel Hill, 1995.
[30] C. Wren, A. Azarbayejani, T. Darrell, and A. Pentland. Pfinder: Real time tracking of the human body. TR 353, MIT Media Lab, 1996.
CVPR08 Learning realistic human actions from movies
Learning realistic human actions from movies
Ivan Laptev, Marcin Marszałek, Cordelia Schmid, Benjamin Rozenfeld
INRIA Rennes, IRISA; INRIA Grenoble, LEAR - LJK; Bar-Ilan University
ptev@inria.fr, marcin.marszalek@inria.fr, cordelia.schmid@inria.fr, grurgrur@

Abstract
The aim of this paper is to address recognition of natural human actions in diverse and realistic video settings. This challenging but important subject has mostly been ignored in the past due to several problems, one of which is the lack of realistic and annotated video datasets. Our first contribution is to address this limitation and to investigate the use of movie scripts for automatic annotation of human actions in videos. We evaluate alternative methods for action retrieval from scripts and show benefits of a text-based classifier. Using the retrieved action samples for visual learning, we next turn to the problem of action classification in video. We present a new method for video classification that builds upon and extends several recent ideas including local space-time features, space-time pyramids and multi-channel non-linear SVMs. The method is shown to improve state-of-the-art results on the standard KTH action dataset by achieving 91.8% accuracy. Given the inherent problem of noisy labels in automatic annotation, we particularly investigate and show high tolerance of our method to annotation errors in the training set. We finally apply the method to learning and classifying challenging action classes in movies and show promising results.

1. Introduction
In the last decade the field of visual recognition had an outstanding evolution from classifying instances of toy objects towards recognizing the classes of objects and scenes in natural images. Much of this progress has been sparked by the creation of realistic image datasets as well as by the new, robust methods for image description and classification. We take inspiration from this progress and aim to transfer previous experience to the domain of video recognition and the recognition of human actions in particular.

Existing datasets for human action recognition (e.g. [15], see figure 8) provide samples for only a few action classes recorded in controlled and simplified settings. This stands in sharp contrast with the demands of real applications focused on natural video with human actions subjected to individual variations of people in expression, posture, motion and clothing; perspective effects and camera motions; illumination variations; occlusions and variation in scene surroundings. In this paper we address limitations of current datasets and collect realistic video samples with human actions as illustrated in figure 1. In particular, we consider the difficulty of manual video annotation and present a method for automatic annotation of human actions in movies based on script alignment and text classification (see section 2).

Figure 1. Realistic samples for three classes of human actions: kissing; answering a phone; getting out of a car. All samples have been automatically retrieved from script-aligned movies.

Action recognition from video shares common problems with object recognition in static images. Both tasks have to deal with significant intra-class variations, background clutter and occlusions. In the context of object recognition in static images, these problems are surprisingly well handled by a bag-of-features representation [17] combined with state-of-the-art machine learning techniques like Support Vector Machines. It remains, however, an open question whether and how these results generalize to the
recognition of realistic human actions, e.g., in feature films or personal videos.

Figure 2. Example of action annotation from movie scripts: the script text "[1:13:41-1:13:45] A black car pulls up. Two army officers get out." is aligned with the video and yields the annotation "get out of a car". Left: precision of annotation evaluated against visual ground truth; right: the "get out of a car" example.

To evaluate this issue, actions were manually checked in 12 movie scripts against visual ground truth; of 147 actions, only 70% did match, while the others either were misaligned or did not show the annotated action.

Figure 5. Space-time interest points detected for two video frames with the human actions hand shake (left) and get out car (right).

... complexity, the independence from scale selection artifacts and the recent evidence of good recognition performance using dense scale sampling. We also eliminate detections ...

Figure 8. Sample frames from the KTH actions sequences. All six classes (columns) and scenarios (rows) are presented.

... contains a total of 2391 sequences. We follow the experimental ...

Figure 10. Example results for action classification trained on the automatically annotated data (classes: AnswerPhone, GetOutCar, HandShake, HugPerson, Kiss, SitDown, SitUp, StandUp; rows: TP, TN, FP, FN). We show the key frames for test movies with the highest confidence values for true/false positives/negatives.

... achieves 60% precision and scales easily to a large number of action classes. It also provides a convenient semi-automatic tool for generating action samples with manual annotation. Our method for action classification extends recent successful image recognition methods to the spatio-temporal domain and achieves the best up-to-date recognition performance on a standard benchmark [15]. Furthermore, it demonstrates high tolerance to noisy labels in the training set and, therefore, is appropriate for action learning in automatic settings. We demonstrate promising recognition results for eight difficult and realistic action classes in movies.

Future work includes improving the script-to-video alignment and extending the video collection to a much larger dataset. We also plan to improve the robustness of our classifier to noisy training labels based on an iterative learning approach. Furthermore, we plan to experiment with a larger variety of space-time low-level features. In the long term we plan to move away from bag-of-features based representations by introducing detector-style action classifiers.

Acknowledgments. M. Marszałek is supported by the European Community under the Marie-Curie project VISITOR. This work was supported by the European research project CLASS. We would like to thank J. Ponce and A. Zisserman for discussions.

References
[1] T. Berg, A. Berg, J. Edwards, M. Maire, R. White, Y. Teh, E. Learned-Miller, and D. Forsyth. Names and faces in the news. In CVPR, 2004.
[2] A. Bosch, A. Zisserman, and X. Munoz. Representing shape with a spatial pyramid kernel. In CIVR, 2007.
[3] P. Dollár, V. Rabaud, G. Cottrell, and S. Belongie. Behavior recognition via sparse spatio-temporal features. In VS-PETS, 2005.
[4] M. Everingham, J. Sivic, and A. Zisserman. Hello! my name is... Buffy - automatic naming of characters in TV video. In BMVC, 2006.
[5] M. Everingham, L. van Gool, C. Williams, J. Winn, and A. Zisserman. Overview and results of classification challenge, 2007. The PASCAL VOC'07 Challenge Workshop, in conj. with ICCV.
[6] H. Jhuang, T. Serre, L. Wolf, and T. Poggio. A biologically inspired system for action recognition. In ICCV, 2007.
[7] I. Laptev. On space-time interest points. IJCV, 64(2/3):107-123, 2005.
[8] I. Laptev and P. Pérez. Retrieving actions in movies. In ICCV, 2007.
[9] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In CVPR, 2006.
[10] L. Li, G. Wang, and L. Fei-Fei. Optimol: automatic online picture collection via incremental
model learning. In CVPR, 2007.
[11] R. Lienhart. Reliable transition detection in videos: A survey and practitioner's guide. IJIG, 1(3):469-486, 2001.
[12] M. Marszałek, C. Schmid, H. Harzallah, and J. van de Weijer. Learning object representations for visual object class recognition, 2007. The PASCAL VOC'07 Challenge Workshop, in conj. with ICCV.
[13] J. C. Niebles, H. Wang, and L. Fei-Fei. Unsupervised learning of human action categories using spatial-temporal words. In BMVC, 2006.
[14] F. Schroff, A. Criminisi, and A. Zisserman. Harvesting image databases from the web. In ICCV, 2007.
[15] C. Schuldt, I. Laptev, and B. Caputo. Recognizing human actions: A local SVM approach. In ICPR, 2004.
[16] F. Sebastiani. Machine learning in automated text categorization. ACM Computing Surveys, 34(1):1-47, 2002.
[17] J. Willamowski, D. Arregui, G. Csurka, C. R. Dance, and L. Fan. Categorizing nine visual classes using local appearance descriptors. In IWLAVS, 2004.
[18] S.-F. Wong and R. Cipolla. Extracting spatiotemporal interest points using global information. In ICCV, 2007.
[19] S.-F. Wong, T. K. Kim, and R. Cipolla. Learning motion categories using both semantic and structural information. In CVPR, 2007.
[20] J. Zhang, M. Marszalek, S. Lazebnik, and C. Schmid. Local features and kernels for classification of texture and object categories: A comprehensive study. IJCV, 73(2):213-238, 2007.
[21] Large margin winnow methods for text categorization. In KDD-2000 Workshop on Text Mining, 2000.
Learning to Track 3D Human Motion from Silhouettes
Ankur Agarwal ANKUR.AGARWAL@INRIALPES.FR
Bill Triggs BILL.TRIGGS@INRIALPES.FR
GRAVIR-INRIA-CNRS, 655 Avenue de l'Europe, Montbonnot 38330, France
Appearing in Proceedings of the 21st International Conference on Machine Learning, Banff, Canada, 2004. Copyright 2004 by the authors.

Abstract
We describe a sparse Bayesian regression method for recovering 3D human body motion directly from silhouettes extracted from monocular video sequences. No detailed body shape model is needed, and realism is ensured by training on real human motion capture data. The tracker estimates 3D body pose by using Relevance Vector Machine regression to combine a learned autoregressive dynamical model with robust shape descriptors extracted automatically from image silhouettes. We studied several different combination methods, the most effective being to learn a nonlinear observation-update correction based on joint regression with respect to the predicted state and the observations. We demonstrate the method on a 54-parameter full body pose model, both quantitatively using motion capture based test sequences, and qualitatively on a test video sequence.

1. Introduction
We consider the problem of estimating and tracking the 3D configurations of complex articulated objects from monocular images, e.g. for applications requiring 3D human body pose or hand gesture analysis. There are two main schools of thought on this. Model-based approaches presuppose an explicitly known parametric body model, and estimate the pose by either: (i) directly inverting the kinematics, which requires known image positions for each body part (Taylor, 2000); or (ii) numerically optimizing some form of model-image correspondence metric over the pose variables, using a forward rendering model to predict the images, which is expensive and requires a good initialization, and the problem always has many local minima (Sminchisescu & Triggs, 2003). An important sub-case is model-based tracking, which focuses on tracking the pose estimate from one time step to the next starting from a known initialization, based on an approximate dynamical model (Bregler & Malik, 1998; Sidenbladh et al., 2002). In contrast, learning based approaches try to avoid the need for accurate 3D modelling and rendering, and to capitalize on the fact that the set of typical human poses is far smaller than the set of kinematically possible ones, by estimating (learning) a model that directly recovers pose estimates from observable image quantities (Grauman et al., 2003).
In particular,example based methods explicitly store a set of training examples whose3D poses are known, and estimate pose by searching for training image(s)sim-ilar to the given input image,and interpolating from their poses(Athitsos&Sclaroff,2003,Stenger et al.,2003, Mori&Malik,2002,Shakhnarovich et al.,2003).In this paper we take a learning based approach,but in-stead of explicitly storing and searching for similar training examples,we use sparse Bayesian nonlinear regression to distill a large training database into a single compact model that generalizes well to unseen examples.We regress the current pose(body joint angles)against both image descrip-tors(silhouette shape)and a pose estimate computed from previous poses using a learned dynamical model.High di-mensionality and the intrinsic ambiguity in recovering pose from monocular observations makes the regression nontriv-ial.Our algorithm can be related to probabilistic tracking, but we eliminate the need for:(i)an exact body model that must be projected to predict an image;and(ii)a pre-defined error model to evaluate the likelihood of the ob-served image signal given this projection.Instead,pose is estimated directly,by regressing it against a dynamics-based prediction and an observed shape descriptor vector. Regressing on shape descriptors allows appearance varia-tions to be learned automatically,enabling us to work with a simple generic articular skeleton model;while including an estimate of the pose in the regression allows the method to overcome the inherent many-to-one projection ambigui-ties present in monocular image observations.Our strategy makes good use of the sparsity and generaliza-tion properties of our nonlinear regressor,which is a variant of the Relevance Vector Machine(RVM)(Tipping,2000).RVM’s have been used,e.g.,to build kernel regressors for 2D displacement updates in correlation-based patch track-ing (Williams et al.,2003).Human pose recovery is sig-nificantly harder —more ill-conditioned and nonlinear,and much higher dimensional —but by selecting a suffi-ciently rich set of image descriptors,it turns out that we can still obtain enough information for successful regres-sion (Agarwal &Triggs,2004a).Our motion capture based training data models each joint as a spherical one,so formally,we represent 3D body pose by 55-D vectors x including 3joint angles for each of the 18major body joints.The input images are reduced to 100-D observation vectors z that robustly encode the shape of a human image silhouette.Given a temporal sequence of observations z t ,the goal is to estimate the correspond-ing sequence of pose vectors x t .We work as follows:At each time step,we obtain an approximate preliminarypose estimate ˇxt from the previous two pose vectors,us-ing a dynamical model learned by linear least squares re-gression.We then update this to take account of the ob-servations z t using a joint RVM regression over ˇxt and z t —x =r (ˇx ,z )—learned from a set of labelled training examples {(z i ,x i )|i =1...n }.The regressor is a lin-ear combination r (x ,z )≡ k a k φk (x ,z )of prespecified scalar basis functions {φk (x ,z )|k =1...p }(here,instan-tiated Gaussian kernels).The learned regressor is regular in the sense that the weight vectors a k are well-damped to control over-fitting,and sparse in the sense that many of them are zero.Sparsity occurs because the RVM actively selects only the ‘most relevant’basis functions —the ones that really need to have nonzero coefficients to complete the regression 
successfully.Previous work:There is a good deal of prior work on hu-man pose analysis,but relatively little on directly learning 3D pose from image measurements.(Brand,1999)models a dynamical manifold of human body configurations with a Hidden Markov Model and learns using entropy minimiza-tion.(Athitsos &Sclaroff,2000)learn a perceptron map-ping between the appearance and parameter spaces.Human pose is hard to ground truth,so most papers in this area use only heuristic visual inspection to judge their results.How-ever,the interpolated-k -nearest-neighbor learning method of (Shakhnarovich et al.,2003)used a human model ren-dering package (P OSER from Curious Labs)to synthesize ground-truthed training and test images of 13degree of freedom upper body poses with a limited (±40◦)set of ran-dom torso movements and view points,obtaining RMS es-timation errors of about 20◦per d.o.f.In comparison,our regression algorithm estimates full 54d.o.f.body pose and orientation —a problem whose high dimensionality would really stretch the capacity of an example based method such as (Shakhnarovich et al.,2003)—with mean errors of only about 4◦.We also used P OSER to synthesize a largesetFigure 1.Different 3D poses can have very similar image obser-vations,causing the regression from image silhouettes to 3D pose to be inherently multi-valued.of training and test images from different viewpoints,but rather than using random synthetic poses,we used poses taken from real human motion capture sequences.Our re-sults thus relate to real poses and we also capture the dy-namics of typical human motions for temporal consistency.The motion capture data was taken from the public website /graphics/animWeb/humanoid .(Howe et al.,1999)developed a Bayesian learning frame-work to recover 3D pose from known image locations of body joint centres,based on a training set of pose-centre pairs obtained from resynthesized motion capture data.(Mori &Malik,2002)estimate the centres using shape con-text image matching against a set of training images with pre-labelled centres,then reconstruct 3D pose using the al-gorithm of (Taylor,2000).Rather than working indirectly via joint centres,we chose to estimate pose directly from the underlying image descriptors,as we feel that this is likely to prove both more accurate and more robust,pro-viding a generic framework for estimating and tracking any prespecified set of parameters from image observations.(Pavlovic et al.,2000,Ormoneit et al.,2000)learn dynami-cal models for specific human motions.Particle filters and MCMC methods have widely been used in probabilistic tracking frameworks e.g .(Sidenbladh et al.,2002).Most of the previous learning based methods for human track-ing take a generative,model based approach,whereas our approach is essentially discriminative.2.Observations as Shape DescriptorsTo improve resistance to segmentation errors and occlu-sions,we use a robust representation for our image ob-servations.Of the many different image descriptors that could be used for human pose estimation,and in line with (Brand,1999,Athitsos &Sclaroff,2000),we have chosen to base our system on image silhouettes.There are twomain problems with silhouettes:(i)Artifacts such as shadow attachment and poor background segmentation tend to distort their local form.This often causes problems when global descriptors such as shape moments are used,as in (Brand,1999,Athitsos&Sclaroff,2000),because each lo-cal error pollutes every component of the descriptor.To be robust,shape descriptors must have good 
spatial local-ity.(ii)Silhouettes make several discrete and continuous degrees of freedom invisible or poorly visible.It is diffi-cult to tell frontal views from back ones,whether a person seen from the side is stepping with the left leg or the right one,and what are the exact poses of arms or hands that fall within(are‘occluded’by)the torso’s silhouette(see fig.1).These factors limit the performance attainable from silhouette-based methods.Histograms of edge information are a good way to encode local shape robustly(Lowe,1999).Here,we use shape con-texts(histograms of local edge pixels into log-polar bins) (Belongie et al.,2002)to encode silhouette shape quasi-locally over a range of scales,making use of their locality properties and capability to encode approximate spatial po-sition on the silhouette—see(Agarwal&Triggs,2004a). Unlike Belognie et al,we use quite small image regions (roughly the size of a limb)to compute our shape contexts, and for increased locality,we normalize each shape con-text histogram only by the number of points in its region. This is essential for robustness against occlusions,shad-ows,etc.The shape context distributions of all edge points on a silhouette are reduced to100-D histograms by vec-tor quantizing the60-D shape context space using Gaussian weights to vote softly into the few histogram centres nearest to the contexts.This softening allows us to compare his-tograms using simple Euclidean distance rather than,say, Earth Movers Distance(Rubner et al.,1998).Each image observation(silhouette)is thusfinally reduced to a100-D quantized-distribution-of-shape-context vector,giving rea-sonably good robustness to occlusions and to local silhou-ette segmentation failures.3.Tracking and RegressionThe3D pose can only be observed indirectly via ambiguous and noisy image measurements,so it is appropriate to start by considering the Bayesian tracking framework in which our knowledge about the state(pose)x t given the observa-tions up to time t is represented by a probability distribu-tion,the posterior state density p(x t|z t,z t−1,...,z0). Given an image observation z t and a prior p(x t)on the corresponding pose x t,the posterior likelihood for x t is usually evaluated using Bayes’rule,p(x t|z t)∝p(z t|x t)p(x t),where p(z t|x t)is a precise‘generative’ob-servation model that predicts z t and its uncertainty given x t.Unfortunately,when tracking objects as complicated as the human body,the observations depend on a great many factors that are difficult to control,ranging from lighting and background to body shape and clothing style and tex-ture,so any hand-built observation model is necessarily a gross oversimplification.One way around this would be to learn the generative model p(z|x)from examples,then to work backwards via its Ja-cobian to get a linearized state update,as in the extended Kalmanfilter.However,this approach is somewhat indirect, and it may waste a considerable amount of effort modelling appearance details that are irrelevant for predicting pose. Instead,we prefer to learn a‘discriminative’(diagnostic or anti-causal)model p(x|z)for the pose x given the obser-vations z—c.f.the difference between generative and dis-criminative classification,and the regression based trackers of(Jurie&Dhome,2002,Williams et al.,2003).Similarly, in the context of maximum likelihood pose estimation,we would prefer to learn a‘diagnostic’regressor x=x(z), i.e.a point estimator for the most likely state x given the observations z,not a generative predictor z=z(x). 
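As a rough illustration of the descriptor computation described in Section 2 above, the following sketch soft-quantizes the 60-D shape-context vectors of one silhouette against 100 previously learned codebook centres to form the 100-D observation vector z. The Gaussian voting width sigma and the use of precomputed cluster centres are assumptions made for illustration, not values given in the paper.

import numpy as np

def silhouette_descriptor(shape_contexts, centres, sigma=1.0):
    """shape_contexts: (n_points, 60) shape-context histograms of one silhouette.
    centres: (100, 60) codebook centres obtained from clustering training contexts.
    Returns a 100-D quantized-distribution-of-shape-context vector."""
    # Squared Euclidean distance of every shape context to every centre.
    d2 = ((shape_contexts[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    # Gaussian soft votes: each point votes mostly into its few nearest centres.
    votes = np.exp(-d2 / (2.0 * sigma ** 2))
    votes /= votes.sum(axis=1, keepdims=True)   # each edge point contributes one unit of vote
    hist = votes.sum(axis=0)
    return hist / hist.sum()                    # normalized 100-D descriptor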
Unfortunately,this brings up a second problem.In monocu-lar human pose reconstruction,image projection suppresses most of the depth(camera-object distance)information,so the state-to-observation mapping is always many-to-one.In fact,even when the labelled image positions of the pro-jected joint centers are known exactly,there may still be some hundreds or thousands of kinematically possible3D poses,linked by‘kinematicflipping’ambiguities(c.f.e.g. (Sminchisescu&Triggs,2003)).Using silhouettes as im-age observations allows relatively robust feature extraction, but induces further ambiguities owing to the lack of limb labelling:it can be hard to tell back views from front ones, and which leg or arm is which in side views.These ambi-guities make learning to regress x from z difficult because the true mapping is actually multi-valued.A single-valued least squares regressor will tend to either zig-zag erratically between different training poses,or(if highly damped)to reproduce their arithmetic mean(Bishop,1995),neither of which is desirable.Introducing a robustified cost func-tion might help the regressor to focus on just one branch of the solution space so that different regressors could be learned for different branches,but applying this in a heav-ily branched54-D target space is not likely to be straight-forward.To reduce the ambiguity,we can take advantage of the fact that we are tracking and work incrementally from the pre-vious state x t−11(e.g.(D’Souza et al.,2001)).The basic assumption of discriminative tracking is that state informa-tion from the current observation is independent of state in-1As an alternative we tried regressing the pose x t against a sequence of the last few silhouettes(z t,z t−1,...),but the ambi-guities are found to persist for several frames.formation from previous states(dynamics):p(x t|z t,x t−1,...)∝p(x t|z t)p(x t|x t−1,...)(1) The pose reconstruction ambiguity is reflected in the fact that the likelihood p(x t|z t)is typically multimodal(e.g. 
it is obtained by using Bayes' rule to invert the many-to-one generative model p(z|x)). Probabilistically this is fine, but to handle it in the context of point estimation / maximum likelihood tracking, we would in principle need to learn a multi-valued regressor for x_t(z_t) and then fuse each of the resulting pose estimates with the estimate from the dynamics-based regressor x_t(x_{t-1}, ...). Instead, we adopt the working hypothesis that given the dynamics-based estimate (or any other rough initial estimate x̌_t for x_t) it will usually be the case that only one of the observation-based estimates is at all likely a posteriori. Thus, we can use the x̌_t value to "select the correct solution" for the observation-based reconstruction x_t(z_t). Formally this gives a regressor x_t = x_t(z_t, x̌_t), where x̌_t serves mainly as a key to select which branch of the pose-from-observation space to use, not as a useful prediction of x_t in its own right. (To work like this, the regressor must be nonlinear and well-localized in x̌_t.) Taking this one step further, if x̌_t is actually a useful estimate of x_t (e.g. from a dynamical model), we can use a single regressor of the same form, x_t = x_t(z_t, x̌_t), but now with a stronger dependence on x̌_t, to capture the net effect of implicitly reconstructing an observation-estimate x_t(z_t) and then fusing it with x̌_t to get a better estimate of x_t.

4. Learning the Regression Models
In this section we detail the regression methods that we use for recovering 3D human body pose. Poses are represented as real vectors x ∈ R^m. For a full body model, these are 55-dimensional, including 3 joint angles for each of the 18 major body joints. (The subject's overall azimuth, i.e. compass heading angle, θ can wrap around through 360°. We maintain continuity by regressing (a, b) = (cos θ, sin θ) rather than θ, using atan2(b, a) to recover θ from the not-necessarily-normalized vector returned by regression. We thus have 3×18 + 1 = 55 parameters to estimate.) This is not a minimal representation of the true human pose degrees of freedom, but it corresponds to our motion capture based training data, and our regression methods handle such redundant output representations without problems.

4.1. Dynamical (Prediction) Model
Human body dynamics can be modelled fairly accurately with a second order linear autoregressive process, x_t = x̌_t + ε, where x̌_t ≡ Ã x_{t-1} + B̃ x_{t-2} is the second order dynamical estimate of x_t and ε is a residual error vector (c.f. e.g. (Agarwal & Triggs, 2004b)). To ensure dynamical stability and avoid over-fitting, we actually learn the autoregression for x̌_t in the following form:

x̌_t ≡ (I + A)(2 x_{t-1} − x_{t-2}) + B x_{t-1}    (2)

where I is the m×m identity matrix. We estimate A and B by regularized least squares regression against x_t, minimizing ‖ε‖² + λ(‖A‖²_Frob + ‖B‖²_Frob) over the training set, with the regularization parameter λ set by cross-validation to give a well-damped solution with good generalization.

Figure 2. An example of mistracking caused by an over-narrow pose kernel K_x. The kernel width is set to 1/10 of the optimal value, causing the tracker to lose track from about t = 120, after which the state estimate drifts away from the training region and all kernels stop firing by about t = 200. Top: the variation of one parameter (left hip angle) for a test sequence of a person walking in a spiral. Bottom: the temporal activity of the 120 kernels (training examples) during this track. The banded pattern occurs because the kernels are samples taken from along a similar 2.5-cycle spiral walking sequence, each circuit involving about 8 steps. The similarity between adjacent steps and between different circuits is clearly visible, showing that the regressor can locally still generalize well.

4.2. Likelihood (Correction) Model
Now consider the observation model. As discussed above, the underlying density p(x_t|z_t) is highly multimodal owing to the pervasive ambiguities in reconstructing 3D pose from monocular images, so no single-valued regression function x_t = x_t(z_t) can give acceptable point estimates for x_t. This is confirmed in practice: although we have managed to learn moderately successful pose regressors x = x(z), they tend to systematically underestimate pose angles (owing to effective averaging over several possible solutions) and to be subject to occasional glitches where the wrong solution is selected (Agarwal & Triggs, 2004a). Although such regressors can be combined with dynamics-based predictors, this only smooths the results: it cannot remove the underlying underestimation and 'glitchiness'. In default of a reliable method for multi-valued regression, we include a non-linear dependence on x̌_t along with z_t in the observation-based regressor. Our full regression model also includes an explicit x̌_t term to represent the direct contribution of the dynamics to the overall state estimate, so the final model becomes x_t ≡ x̂_t + ε', where ε' is a residual error to be minimized, and:

x̂_t ≡ C x̌_t + Σ_{k=1}^{p} d_k φ_k(x̌_t, z_t) = (C D) (x̌_t ; f(x̌_t, z_t))    (3)

where the right-hand side stacks x̌_t and f(x̌_t, z_t) into a single vector. Here, {φ_k(x, z) | k = 1...p} is a set of scalar-valued basis functions for the regression, and d_k are the corresponding R^m-valued weight vectors. For compactness, we gather these into an R^p-valued feature vector f(x, z) ≡ (φ_1(x, z), ..., φ_p(x, z))^T and an m×p weight matrix D ≡ (d_1, ..., d_p). In the experiments reported here, we used instantiated-kernel bases of the form:

φ_k(x, z) = K_x(x, x_k) · K_z(z, z_k)    (4)

where (x_k, z_k) is a training example and K_x, K_z are (here, independent Gaussian) kernels on x-space and z-space, K_x(x, x_k) = exp(−β_x ‖x − x_k‖²) and K_z(z, z_k) = exp(−β_z ‖z − z_k‖²).

Figure 3. The variation of the RMS test-set tracking error with damping factor s. See the text for discussion.
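To make the prediction and correction steps concrete, here is a minimal sketch of equations (2)-(4). It assumes the matrices A, B, C, D, the kernel widths beta_x, beta_z and the basis training examples (x_k, z_k) have already been learned as described above; the function and variable names are ours, not the authors'.

import numpy as np

def dynamical_prediction(x_prev1, x_prev2, A, B):
    """Eq. (2): second-order autoregressive prediction of the pose at time t."""
    m = len(x_prev1)
    return (np.eye(m) + A) @ (2 * x_prev1 - x_prev2) + B @ x_prev1

def kernel_features(x_check, z, X_train, Z_train, beta_x, beta_z):
    """Eq. (4): joint Gaussian kernel responses against the p training examples.
    X_train: (p, m) poses x_k, Z_train: (p, 100) descriptors z_k."""
    kx = np.exp(-beta_x * ((X_train - x_check) ** 2).sum(axis=1))
    kz = np.exp(-beta_z * ((Z_train - z) ** 2).sum(axis=1))
    return kx * kz                                  # p-vector f(x_check, z)

def corrected_state(x_check, z, C, D, X_train, Z_train, beta_x, beta_z):
    """Eq. (3): combine the damped dynamical estimate with the kernel correction."""
    f = kernel_features(x_check, z, X_train, Z_train, beta_x, beta_z)
    return C @ x_check + D @ f

In a tracking loop, the prediction from the two previous corrected poses would be passed to corrected_state together with the current silhouette descriptor z_t to produce the new pose estimate x_t.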
Building the basis from Gaussians based at training examples in joint (x, z) space forces examples to become relevant only if they have similar estimated poses and similar image silhouettes. It is essential to choose the relative widths of the kernels appropriately. In particular, if the x-kernel is chosen too wide, the method tends to average over (or zig-zag between) several alternative pose-from-observation solutions, which defeats the purpose of including x̌ in the observation regression. On the other hand, by locality, the observation-based state corrections are effectively 'switched off' whenever the state happens to wander too far from the observed training examples x_k. So if the x-kernel is set too narrow, observation information is only incorporated sporadically and mistracking can easily occur.

RVM Training Algorithm
0. Initialize A with ridge regression. Initialize the running scale estimates a_scale = ‖a‖ for the components or vectors a.
1. Approximate the ν log‖a‖ penalty terms with "quadratic bridges" ν(‖a‖/a_scale)² + const (the gradients match at a_scale).
2. Solve the resulting linear least squares problem in A.
3. Remove any components a that have become zero, update the scale estimates a_scale = ‖a‖, and continue from 1 until convergence.
Figure 4. Our RVM training algorithm.

Fig. 2 illustrates this effect, for an x-kernel a factor of 10 narrower than the optimum. The method initially seemed to be sensitive to the kernel width parameters, but after selecting optimal parameters by cross-validation on an independent motion sequence we observed accurate performance over a sufficiently wide range of both kernel widths: a tolerance factor of ~2 on β_x and ~4 on β_z.

The coefficient matrix C in (3) plays an interesting role. Setting C ≡ I forces the correction model to act as a differential update on x̌_t. On the other extreme, C ≡ 0 gives largely observation-based state estimates with only a latent dependence on the dynamics. An intermediate setting, however, turns out to give the best overall results. Damping the dynamics slightly ensures stability and controls drift; in particular, it prevents the observations from disastrously 'switching off' because the state has drifted too far from the training examples, while still allowing a reasonable amount of dynamical smoothing. Usually we estimate the full (regularized) matrix C from the training data, but to get an idea of the trade-offs involved, we also studied the effect of explicitly setting C = sI for s ∈ [0, 1]. We find that a small amount of damping, s_opt ≈ 0.98, gives the best results overall, maintaining a good lock on the observations without losing too much dynamical smoothing (see fig. 3).

4.3. Relevance Vector Regression
The regressor is learned using a Relevance Vector Machine (Tipping, 2001). This sparse Bayesian approach gives similar results to methods such as damped least squares / ridge regression, but selects a much more economical set of active training examples for the kernel basis. We have also tested a number of other training methods (including ridge regression) and bases (including the linear basis). These are not reported here, but the results turn out to be relatively insensitive to the training method used, with the kernel bases having a slight edge.
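A minimal sketch of the boxed training loop of Fig. 4 is given below, written as a re-weighted regularized least squares iteration with column pruning. The ridge parameter used for initialization, the pruning threshold and the fixed iteration count are illustrative choices rather than values from the paper.

import numpy as np

def rvm_train(F, Y, nu=1.0, lam=1e-3, n_iters=50, prune_tol=1e-6):
    """F: (n, p) matrix of basis responses f(x_i); Y: (n, m) matrix of targets y_i.
    Returns an (m, p) weight matrix A in which many columns have been pruned to zero.
    Follows Fig. 4: each nu*log||a|| column penalty is repeatedly replaced by a
    quadratic bridge whose gradient matches the true penalty at a_scale = ||a||."""
    n, p = F.shape
    # Step 0: initialize with ridge regression.
    A = np.linalg.solve(F.T @ F + lam * np.eye(p), F.T @ Y).T      # (m, p)
    active = np.ones(p, dtype=bool)
    for _ in range(n_iters):
        scale = np.linalg.norm(A, axis=0) + 1e-12                  # a_scale per column
        # Steps 1-2: the bridge acts as a per-column ridge weight nu / (2 * a_scale^2),
        # chosen so its gradient equals that of nu*log||a|| at ||a|| = a_scale;
        # solve the resulting weighted regularized least squares problem in A.
        W = np.diag(nu / (2.0 * scale[active] ** 2))
        Fa = F[:, active]
        A[:, active] = np.linalg.solve(Fa.T @ Fa + W, Fa.T @ Y).T
        # Step 3: prune columns (basis functions) that have collapsed to zero.
        dead = np.linalg.norm(A, axis=0) < prune_tol
        A[:, dead] = 0.0
        active = ~dead
        if not active.any():
            break
    return A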
or temporal information),(c)Results from our novel joint regressor,obtained by combining dynamical and state+observation based regression models.(d,e,f)Similar plots for the overall body rotation angle.Note that this angle wraps around360◦,i.e.θ∼=θ±360◦.When regressing y on x(using generic notation),we useEuclidean norm to measure y-space prediction errors,sothe estimation problem takes the form:A:=arg minA ni=1A f(x i)−y i 2+R(A)(5)where R(−)is a regularizer on A.RVM’s take either in-dividual parameters or groups of parameters a(in our case, columns of A),and imposeνlog a regularizers or priors on each group.Rather than using the(Tipping,2000)al-gorithm for training,we use a continuation method based on successively approximating theνlog a regularizers with quadratic“bridges”ν( a /a scale)2chosen to match the prior gradient at a scale,a running scale estimate for a. The bridging functions allow parameters to pass through zero if they need to,without too much risk of premature trapping at zero.The algorithm is sketched infig.4.Regu-larizing over whole columns(rather than individual compo-nents)of A ensures a sparse expansion,as it swaps entire basis functions in or out.5.Experimental Results&AnalysisWe conducted experiments using a database of motion cap-ture data for an m=54d.o.f.body model(3angles for each of18joints,including body orientation w.r.t.the camera). We report mean(over all angles)RMS(over time)absolute difference errors between the true and estimated joint angle vectors,in degrees:D(x,x )=1mmi=1|(x i−x i)mod±180◦|(6)The training silhouettes were created by using Curious Labs’P OSER to re-render poses obtained from real human motion capture data,and reduced to100-D shape descrip-tor vectors as in§2.We used8different sequences totalling ∼2000instantaneous poses for training,and another two sequences of∼400points each as validation and test sets. The dynamical model is learned from the training data ex-actly as described in§4.1,but when training the obser-vation model,wefind that its coverage and capture ra-dius can be increased by including a wider selection ofˇx t values than those produced by the dynamical predictions. Hence,we train the model x=x t(ˇx,z)using a combina-tion of‘observed’samples(ˇx t,z t)(withˇx t computed from (2))and artificial samples generated by Gaussian sampling N(x t,Σ)around the training state x t.The observation z t corresponding to x t is still used,forcing the observation based part of the regressor to rely mainly on the observa-tions,i.e.on recovering x t(or at least an update toˇx t)from z t,usingˇx t mainly as a hint about the inverse solution to choose.The covariance matrixΣis chosen to reflect the local scatter of the training examples,with a larger variance along the tangent to the trajectory at each point to ensure that phase lag between the state estimate and the true state is reliably detected and corrected.Fig.5illustrates the relative contributions of the different terms in our model by plotting tracking results for a mo-tion capture test sequence in which the subject walks in adecreasing spiral.(This sequence was not included in the training set,although similar ones were).The purely dy-namical model(2)provides good estimates for a few time steps,but gradually damps and drifts out of phase.(Such damped oscillations are characteristic of second order linear autoregressive dynamics,trained with enough regulariza-tion to ensure model stability).At the other extreme,using observations alone without any temporal information(i.e. 
Fig. 5 illustrates the relative contributions of the different terms in our model by plotting tracking results for a motion capture test sequence in which the subject walks in a decreasing spiral. (This sequence was not included in the training set, although similar ones were.) The purely dynamical model (2) provides good estimates for a few time steps, but gradually damps and drifts out of phase. (Such damped oscillations are characteristic of second order linear autoregressive dynamics, trained with enough regularization to ensure model stability.) At the other extreme, using observations alone without any temporal information (i.e. C = 0 and K_x = 1) provides noisy reconstructions with occasional 'glitches' due to incorrect reconstructions. Panels (c), (f) show that joint regression on both dynamics and observations gives smoother and stabler tracking. There is still some residual misestimation of the hip angle in (c) at around t = 140 and t = 380. Here, the subject is walking directly towards the camera (heading angle θ ~ 0°), so the only cue for hip angle is the position of the corresponding foot, which is sometimes occluded by the opposite leg. Even humans have difficulty estimating this angle from the silhouette at these points.

Fig. 6 shows some silhouettes and corresponding maximum likelihood pose reconstructions, for the same test sequence. The 3D poses for the first two time steps were set by hand to initialize the dynamical predictions. The average RMS estimation error over all joints using the RVM regressor in this test is 4.1°. Well-regularized least squares regression over the same basis gives similar errors, but has much higher storage requirements. The Gaussian RVM gives a sparse regressor for (3) involving only 348 of the 1927 training examples, thus allowing a significant reduction in the amount of training data that needs to be stored. Reconstruction results on a test video sequence are shown in fig. 7. The reconstruction quality demonstrates the generalized dynamical behavior captured by the model as well as the method's robustness to imperfect visual features, as a naive background subtraction method was used to extract somewhat imperfect silhouettes from the images.

Figure 6. Some sample pose reconstructions for a spiral walking sequence not included in the training data (frames t = 001, 060, 120, 180, 240, 300), corresponding to figures 5(c) & (f). The reconstructions were computed with a Gaussian kernel RVM, using only 348 of the 1927 training examples. The average RMS estimation error per d.o.f. over the whole sequence is 4.1°.

In terms of computational time, the final RVM regressor already runs in real time in Matlab. Silhouette extraction and shape-context descriptor computations are currently done offline, but would be doable online in real time. The (offline) learning process takes about 26 min for the RVM with ~2000 data points, and about the same again for (Matlab) Shape Context extraction and clustering.

The method is reasonably robust to initialization errors. The results shown in figs. 5 and 6 were obtained by initializing from ground truth, but we also tested the effects of automatic (and hence potentially incorrect) initialization. In an experiment in which the tracker was automatically initialized at each time step in turn using the pure observation model, then tracked forwards and backwards using the dynamical tracker, the initialization led to successful tracking in 84% of the cases. The failures occur at the 'glitches',
where the observation model gave completely incorrect initializations.

6. Discussion & Conclusions
We have presented a method that recovers 3D human body pose from sequences of monocular silhouettes by direct nonlinear regression of joint angles against histogram-of-shape-context silhouette shape descriptors and dynamics-based pose estimates. No 3D body model or labelling of image positions of body parts is required. Regressing the pose jointly on image observations and previous poses allows the intrinsic ambiguity of the pose-from-monocular-observations problem to be overcome, thus producing stable, temporally consistent tracking. We use a kernel-based Relevance Vector Machine for the regression, thus selecting a sparse set of relevant training examples as exemplars. The method shows promising results on tracking unseen video sequences, giving an average RMS error of 4.1° per body-joint-angle on real motion capture data.

Future work: We plan to investigate the extension of our regression based system to a complete discriminative Bayesian tracking framework, including multiple hypotheses and robust error models. We would also like to include richer features, such as internal edges in addition to silhouette boundaries, to reduce susceptibility to poor image segmentation.

Acknowledgments
This work was supported by the European Union projects VIBES and LAVA, and the research network PASCAL.