16-Information Fusion in Multimedia Analysis
Tradeoff-(2)
• The modalities may be correlated or independent. Correlation can be observed at different levels, for example among low-level features extracted from different media streams, or among semantic-level decisions obtained from different streams. Independence among the modalities also matters, since it may provide additional cues for reaching a decision. When fusing multiple modalities, both correlation and independence can provide valuable insight, depending on the particular scenario or context.
• Different modalities usually have different confidence levels for different tasks. For example, when detecting the event of a human crying, we may place more confidence in the audio modality than in the video modality.
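Correlation at the low-level feature level can be quantified before choosing a fusion strategy. The sketch below uses the Pearson correlation coefficient, assuming the two feature streams are already time-aligned (one value per frame). The feature names and values are hypothetical. A high |r| suggests the modalities are largely redundant for this feature; a value near zero suggests they may carry independent, complementary cues. Pearson's r only captures linear relationships, so it is a first check rather than a full independence test.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length, time-aligned feature sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two aligned sequences of length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant stream has no linear correlation to measure
    return cov / (sx * sy)

# Hypothetical per-frame features: audio energy and video motion magnitude.
audio_energy = [0.1, 0.4, 0.9, 0.8, 0.2]
video_motion = [0.2, 0.5, 1.0, 0.7, 0.1]
r = pearson(audio_energy, video_motion)
print(f"r = {r:.2f}")
```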
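The “McGurk effect” is taken to suggest that audio signals are suited to conveying descriptive semantics and video to conveying instructive semantics. Only by fusing video and audio can complete, rich semantic information be expressed; separating the two loses part of that information. The McGurk effect thus provides a theoretical basis for research on fused audio-video multimedia retrieval.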
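The McGurk effect gives the strongest support to the view that the human senses interact multimodally. Early explanations of this phenomenon held that each sensory area first acquires information on its own, after which the cerebral cortex integrates it into a new understanding. In recent years, however, new tools have enriched this explanation: recent studies suggest that perception and comprehension are concurrent and interleaved, rather than processed separately and then combined. This is a change at the level of the computational model (from serial to parallel). In other words, the brain may contain more multisensory neurons that act as junctions between the senses and let them interact directly. More interesting still is what this reveals about the brain's mechanism: the brain turns out to be a people-pleaser. When it meets inconsistent or conflicting information, it compromises, splitting the difference so as to offend neither side. Slippery indeed! Without the experimental evidence, one would hardly have guessed that the brain works this way.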
How Does Information Fusion Work?
Multiple types of data, carrying various types of information (redundant and complementary), are related to the things of interest: they are “associated” or “correlated” to the same object, event, or behavior. Fusing them improves estimates about those things, so that estimation algorithms (mathematical techniques) or automated reasoning methods (artificial intelligence techniques) can produce better estimates than any single type of data alone.
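As one concrete instance of an estimation algorithm producing a better estimate than any single source, the sketch below uses inverse-variance weighting. It assumes each source gives an independent, unbiased estimate of the same quantity with known noise variance. The sensor values are hypothetical. Under these assumptions the fused variance is always smaller than that of the best single input.

```python
def fuse_estimates(estimates):
    """Inverse-variance weighted fusion of independent, unbiased estimates.

    estimates: list of (value, variance) pairs, one per sensor or modality.
    Returns (fused_value, fused_variance).
    """
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    fused_value = sum(w * val for w, (val, _) in zip(weights, estimates)) / total
    fused_variance = 1.0 / total
    return fused_value, fused_variance

# Hypothetical: two sensors estimate the same object's position (metres).
value, variance = fuse_estimates([(10.0, 4.0), (12.0, 1.0)])
print(value, variance)  # → 11.6 0.8
```

The fused value leans toward the more reliable sensor (variance 1.0), and the fused variance (0.8) beats both inputs.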
(2) The other approach is decision-level fusion, or late fusion, which fuses multiple modalities in the semantic space.
(3) A combination of the two, known as hybrid fusion, is also practiced.
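Decision-level (late) fusion and one hybrid variant can be sketched as follows. This assumes each unimodal classifier outputs class probabilities over the same labels and uses simple averaging as the combination rule; the labels and scores are hypothetical. The hybrid scheme shown, which feeds an early-fusion classifier's output into the late-fusion step as one more decision, is one possible design among several.

```python
def late_fusion(score_sets):
    """Decision-level (late) fusion: average the per-class scores produced by
    separate classifiers, then pick the highest-scoring class."""
    labels = score_sets[0].keys()
    fused = {lab: sum(s[lab] for s in score_sets) / len(score_sets) for lab in labels}
    return max(fused, key=fused.get), fused

# Hypothetical semantic-level outputs (class probabilities) of two unimodal classifiers.
audio_scores = {"speech": 0.7, "music": 0.3}
video_scores = {"speech": 0.4, "music": 0.6}
label, fused = late_fusion([audio_scores, video_scores])

# Hybrid fusion (assumed scheme): the output of an early-fusion classifier, trained on
# concatenated audio+video features, joins the unimodal decisions in the late step.
early_fusion_scores = {"speech": 0.8, "music": 0.2}  # hypothetical
hybrid_label, hybrid_fused = late_fusion([audio_scores, video_scores, early_fusion_scores])
```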
Information Fusion in Multimedia Retrieval
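[Figure: Multimedia fusion analysis, linking audio, image/video, 3D audio synthesis, music analysis, speech recognition, multimedia fusion indexing, lip-synchronized facial animation, lipreading, natural language processing, and compression/retrieval.]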
Definition of Information Fusion
Motivation
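Multimodal Fusion in Humans
When people communicate, their understanding of information is inherently bimodal (audio-visual), as demonstrated by the “McGurk effect”.
A simple psychology experiment demonstrates the McGurk effect: if the lip movements on screen form “ba, ba, ba” while the ears hear “ga, ga, ga”, the brain reconciles the two conflicting signals into a plausible interpretation, and we hear “da, da, da”. The effect is strong enough to reproduce in a classroom without good equipment or any experimental controls. Take two students, one tall and one short, with the short one hidden behind the tall one. On the teacher's cue, the tall student silently mouths “ba, ba, ba” while the hidden student says “ga, ga, ga” aloud. The whole class then hears “da, da, da”, and it works every time.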
Challenge(1)- Levels of fusion
One of the earliest considerations is to decide what strategy to follow when fusing multiple modalities.
(1) The most widely used strategy is to fuse the information at the feature level, also known as early fusion.
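Feature-level (early) fusion can be sketched as normalizing each modality's features and concatenating them into one vector per sample, which a single classifier then consumes. The sketch assumes both modalities are already time-aligned (one feature vector per sample). It uses z-score normalization so that features on very different scales (for example, an energy value versus a pitch in Hz) contribute comparably; that choice and the feature values are assumptions for illustration.

```python
import statistics

def zscore(column):
    """Z-score one feature dimension; constant features map to 0."""
    mu = statistics.fmean(column)
    sd = statistics.pstdev(column) or 1.0  # avoid division by zero for constant features
    return [(v - mu) / sd for v in column]

def normalize_rows(rows):
    """Z-score each feature dimension across the samples in `rows`."""
    cols = [zscore(list(col)) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def early_fusion(audio_feats, video_feats):
    """Feature-level (early) fusion: normalize each modality, then concatenate per sample."""
    if len(audio_feats) != len(video_feats):
        raise ValueError("modalities must be time-aligned: one feature vector per sample")
    a = normalize_rows(audio_feats)
    v = normalize_rows(video_feats)
    return [ra + rv for ra, rv in zip(a, v)]

# Hypothetical features for 3 aligned samples: audio (energy, pitch in Hz), video (3 dims).
audio = [[0.1, 200.0], [0.5, 220.0], [0.9, 260.0]]
video = [[3.0, 0.2, 10.0], [5.0, 0.4, 10.0], [7.0, 0.9, 10.0]]
fused = early_fusion(audio, video)  # 3 rows of 5 features each
```

In practice the normalization statistics would typically be computed on training data and reused at test time, rather than recomputed per batch as here.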
Tradeoff-(1)
The benefit of multimodal fusion comes at a cost: added complexity in the analysis process. This stems from the differing characteristics of the modalities involved, summarized below:
• Different media are usually captured in different formats and at different rates. For example, a video may be captured at a frame rate that differs from the audio sampling rate, and even two video sources may have different frame rates. The fusion process therefore needs to handle this asynchrony to accomplish the task well.
• Processing time differs across types of media streams, which influences the choice of fusion strategy.
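One simple way to handle the rate mismatch is to resample one stream onto the other's timeline. The sketch below averages the audio feature frames that fall inside each video frame's time span. It assumes scalar audio features (e.g. per-frame energy) and integer frame rates; the 100 Hz audio feature rate and 25 fps video rate are illustrative values, not taken from the slides. Interpolation or upsampling the slower stream are equally valid alternatives.

```python
def align_to_video(audio_feats, audio_rate, video_rate, n_video_frames):
    """Average the audio feature frames that fall inside each video frame's time span.

    Rates are integers (frames per second); integer arithmetic avoids
    floating-point errors when computing frame boundaries.
    """
    aligned = []
    for k in range(n_video_frames):
        i0 = k * audio_rate // video_rate          # first audio frame in video frame k
        i1 = (k + 1) * audio_rate // video_rate    # one past the last audio frame
        window = audio_feats[i0:i1]
        # NaN marks a video frame with no audio coverage
        aligned.append(sum(window) / len(window) if window else float("nan"))
    return aligned

# Hypothetical: 8 audio feature frames at 100 Hz cover 2 video frames at 25 fps.
audio_energy = [float(i) for i in range(8)]
aligned = align_to_video(audio_energy, audio_rate=100, video_rate=25, n_video_frames=2)
print(aligned)  # → [1.5, 5.5]
```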
[Figure: fusion analysis linking audio indexing, image features, and video retrieval.]
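Research directions in multimedia fusion retrieval
(1) Fuse audio and video features into a single retrieval framework according to their temporal relationship;
(2) Use video (or audio) to index audio (or video), so that each modality indexes the other;
(3) Obtain scene-classification results separately from audio and from video, then combine the two results to reach the final decision.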
Challenge(2)-How to fuse?
There are several methods for fusing different modalities, each suited to particular settings. The challenge also includes how the fusion process exploits feature-level and decision-level correlation among the modalities, and how contextual and confidence information influence the overall fusion process.
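Two classic decision-level combination rules show how confidence and modeling assumptions change the outcome. The sketch uses the crying-detection example: the weighted sum rule scales each modality's posterior by a confidence weight, and the product rule multiplies the posteriors. The product rule assumes the modalities are conditionally independent and the classes have equal priors. The probabilities and confidence values are hypothetical, and these two rules are only examples among the many fusion methods available.

```python
import math

def weighted_sum_rule(probs, confidences):
    """Confidence-weighted average of per-modality posteriors P(class | modality)."""
    total = sum(confidences.values())
    classes = next(iter(probs.values())).keys()
    return {c: sum(confidences[m] * probs[m][c] for m in probs) / total for c in classes}

def product_rule(probs):
    """Product of per-modality posteriors, renormalized. Assumes conditionally
    independent modalities and equal class priors."""
    classes = next(iter(probs.values())).keys()
    raw = {c: math.prod(probs[m][c] for m in probs) for c in classes}
    z = sum(raw.values())
    return {c: v / z for c, v in raw.items()}

# Hypothetical posteriors for "is a person crying?" from each modality.
probs = {
    "audio": {"crying": 0.9, "other": 0.1},
    "video": {"crying": 0.3, "other": 0.7},
}
# Hypothetical task-specific confidences: audio is trusted more for crying.
confidences = {"audio": 0.8, "video": 0.2}

weighted = weighted_sum_rule(probs, confidences)                # crying ≈ 0.78
equal = weighted_sum_rule(probs, {"audio": 1.0, "video": 1.0})  # crying ≈ 0.60
product = product_rule(probs)                                   # crying ≈ 0.79
```

Giving the audio stream more weight raises the fused crying score from 0.60 (equal weights) to 0.78.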
These Basic Ideas are Transferable to Many Types of Problems
Fusion System Applications
Example: Fusion-Based Automatic Object Recognition
Fusion System Applications
Industrial and commercial applications: robotics, machine intelligence, remote sensing, image processing, medical systems.
[Figure: text, audio, image, and motion information, with text indexing, Video OCR indexing, and moving-object indexing.]
Challenge(3)-When to fuse?
The time at which fusion should take place is an important consideration in multimodal fusion. Certain characteristics of media, such as varying data capture rates and processing times, make it challenging to synchronize the overall fusion process. This has often been addressed by performing multimedia analysis tasks (such as event detection) over a timeline: a measurable span of time with information denoted at designated points. Accomplishing a task over a timeline requires identifying the designated points at which data or information should be fused. Because the streams are asynchronous and diverse, and because different analysis tasks operate at different temporal granularities, identifying these designated points, i.e. deciding when fusion should take place, is a challenging issue.
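Timeline-based fusion can be sketched as follows. Each stream emits timestamped detections at its own pace, and at each designated point t the most recent score from each stream within a look-back window is averaged. The window length, the averaging rule, the `Detection` record, and the sample values are all assumptions for illustration. Choosing the designated points and window size is exactly the open problem the slide describes.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    time: float   # seconds on the shared timeline
    stream: str   # e.g. "audio" or "video"
    score: float  # detector confidence for the event, in [0, 1]

def fuse_at_points(detections, points, window):
    """At each designated point t, average the latest score from each stream
    observed in (t - window, t]. Returns None for points with no observations."""
    ordered = sorted(detections, key=lambda d: d.time)
    results = {}
    for t in points:
        latest = {}
        for d in ordered:
            if t - window < d.time <= t:
                latest[d.stream] = d.score  # a later detection overwrites an earlier one
        results[t] = sum(latest.values()) / len(latest) if latest else None
    return results

# Hypothetical asynchronous detections from two streams.
detections = [
    Detection(0.20, "audio", 0.2),
    Detection(0.40, "video", 0.6),
    Detection(0.45, "audio", 0.4),
    Detection(0.90, "audio", 0.8),
    Detection(0.96, "video", 0.9),
]
fused = fuse_at_points(detections, points=[0.5, 1.0, 2.0], window=0.5)
```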
Tradeoff-(3)
• Capturing and processing media streams may incur costs, measured in time, money, or other units, and these costs can influence the fusion process. For instance, object localization can be accomplished more cheaply with an RFID (radio-frequency identification) sensor than with a video camera.
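A minimal sketch of cost-aware modality selection for the RFID example: pick the cheapest sensor whose expected accuracy meets the task's requirement. The costs and accuracies are hypothetical, and a single cost unit is assumed. A real system might also weigh combinations of sensors, which would require estimating the accuracy of the fused result.

```python
# Hypothetical sensors for object localization: cost in arbitrary units, expected accuracy.
SENSORS = [
    {"name": "rfid", "cost": 1.0, "accuracy": 0.90},
    {"name": "video_camera", "cost": 10.0, "accuracy": 0.95},
]

def cheapest_sufficient(sensors, required_accuracy):
    """Return the lowest-cost sensor whose expected accuracy meets the requirement,
    or None if no single sensor is good enough."""
    ok = [s for s in sensors if s["accuracy"] >= required_accuracy]
    return min(ok, key=lambda s: s["cost"]) if ok else None

choice = cheapest_sufficient(SENSORS, required_accuracy=0.85)  # the RFID sensor
```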