SAFIR: Spatial Audio Framework for Instrumented Rooms
Michael Schmitz
Saarland University, Saarbrücken, Germany
schmitz@cs.uni-sb.de
Abstract. This paper describes a framework that provides a high-level programming interface in Java to interactively spatialize sound sources in three dimensions. The system adapts to almost arbitrary loudspeaker configurations, which affords high flexibility in deploying the audio system with regard to the number and placement of speakers. 3D sounds can be created and easily spatialized in real time by providing the sound (e.g. as a file) and its virtual position in space.
1 Introduction
Development of tools for user interface design has traditionally focused on visual components; various products for creating widget-based interfaces are available for almost every major programming language. During the past decade, increased interest in integrating sound into visualization systems and user interfaces has led to a variety of applications and tools in the field of auditory displays, as can be observed particularly in the conferences of the ICAD¹ community. A few toolkits for data sonification or acoustic enhancements for widgets have emerged (for example [7]), as well as new technologies for spatial sound reproduction and some high-fidelity implementations for dedicated virtual reality (VR) environments. On the consumer market, the need for spatial audio can likewise be observed in the growing surround-sound support of current soundcards.
Spatial sound as an information display in intelligent environments supports human-computer interaction in several respects: it allows attracting and directing the user's attention, presenting information ambiently, and giving acoustic feedback. The spatial attributes of the acoustic output can also serve as an additional dimension onto which information that is not inherently directional is mapped, as done for instance in [2], where an acoustically enhanced progress bar allowed users to monitor the state of the progress without keeping it in their visual focus.
This work describes the SAFIR² system, a high-level Java API³ for multichannel spatial sound synthesis. The system furthermore permits setting up an arbitrary number of loudspeakers freely in the listening area, whereby the local spatial resolution increases with the speaker density, as described later on.
¹ International Community for Auditory Display
² Spatial Audio Framework for Instrumented Rooms
³ Application Programming Interface
2 Related Work
Several VR labs have developed their own high-fidelity, multichannel 3D audio systems; examples are DIVA [4] and the IEM Cube [3]. A disadvantage of such installations is the need for expensive, specific equipment for which the application must be written. Current consumer soundcards support three-dimensional sound as well, accessible through vendor-specific APIs. Microsoft attempted to standardize the interface for sound programming with DirectSound, but all of these APIs still run only on Windows machines. Moreover, such systems aim at particular speaker setups, typically a quad (4 speakers) or Dolby 5.1 configuration. The spatialization technique on such sound cards is normally HRTF⁴-based (Sensaura, Aureal A3D), where the correct position and orientation of the user (towards the screen) is crucial: HRTF-based spatialization tries to reproduce the signals that arrive at each ear of one listener in a natural listening situation.
A more flexible 3D audio system, supporting low-cost multichannel equipment and compatible with common platforms, is still not available. The SAFIR system is a first step towards closing this gap.
3 The SAFIR System
One of our major requirements for the system design was that the user should be able to move within the instrumented environment without wearing headphones, keeping technology away from the user as much as possible. It should also be possible to set up the system in different locations if necessary, in such a way that the speakers can be fit into the new listening area without rearranging the environment. We decided to use Java in order to integrate the system easily into the existing architecture and to be able to install and run it on different platforms without tuning the code. In addition, we wanted to achieve good results without purchasing specialized hardware. Our solution, SAFIR, enables Java programmers to control multichannel spatial audio easily in real time with low-cost hardware. The playback configuration and speaker setup are very flexible and can be chosen according to the application's needs and budget.
As mentioned earlier, SAFIR is developed in Java and based on the JSyn libraries for sound synthesis. JSyn is a Java API for sound synthesis that uses routines written in C in order to perform audio processing efficiently in real time.
A free JSyn license is currently available for Windows, Linux and MacOS systems. VBAP (Vector Base Amplitude Panning) [5] is used for spatialization, providing computationally cheap functions for positioning virtual sound sources. Since VBAP does not create distance cues⁵, SAFIR controls gains and echoes to express virtual distances. Doppler effects are implemented as well in order to amplify the impression of moving sound objects.
⁴ Head-Related Transfer Function
⁵ Cues in our context are acoustic attributes that indicate characteristics of the sound source.
Fig. 1. Structure of SAFIR
Since the JSyn classes are based on the traditional unit generator model, we structured SAFIR in a similar manner, as can be seen in Figure 1.
The figure describes the signal flow from the sound source signal on the left to the final speaker signals on the right, with each square in between representing one step of the signal processing chain. Further input parameters relevant for the processing steps are shown in the box on top, namely the positions of the loudspeakers, the user and the virtual sound source. SAFIR first generates distance cues, then directional cues, and finally adjusts the speaker channels. The following sections describe these processing steps in the order of the signal flow.
3.1 Distance
Human perception of the distance of a sound source is less accurate than that of its direction. The main distance cue is the decreasing intensity of a sound source with growing distance, caused by the geometric spreading of the sound energy. The 1/r^2 law, also used by SAFIR, expresses this behavior sufficiently: for example, the intensity is multiplied by 1/2^2 = 1/4 if the sound source moves twice as far away. Another (indoor) distance cue is the time and energy relation between the direct and the reflected signal. Sound signals do not only travel directly from a source to a receiver but also reflect from walls and reach the listener indirectly, marginally later and with less energy due to the longer path and the absorption at the walls. This is not perceived as a separate sound event but is interpreted by the brain to estimate the distance of the source. To keep the computational costs low, SAFIR only considers the first reflection and
assumes that it arrives from the same direction as the original source, which would be true in a spherical room. Using such a sphere with a given radius as a simplified room model, it is possible to compute the additional travelling distance of the reflection for all positions. SAFIR sets the radius to the distance of the furthest loudspeaker (which will probably be in a corner of the room) and mixes a delayed and attenuated copy of the source signal into the original.
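For illustration, the following minimal sketch computes both distance cues under the assumptions above; the class and method names are hypothetical, and the listener is assumed to sit at the centre of the spherical room model.

```java
// Hypothetical helper illustrating SAFIR's distance cues (a sketch, not the
// framework's actual code).
public final class DistanceCues {

    private static final double SPEED_OF_SOUND = 343.0; // m/s at room temperature

    /**
     * Amplitude gain for a source at distance r (metres). An amplitude
     * proportional to 1/r makes the intensity follow the 1/r^2 law.
     */
    public static double amplitudeGain(double r) {
        double d = Math.max(r, 1.0); // clamp: treat anything closer than 1 m as 1 m
        return 1.0 / d;
    }

    /**
     * Extra delay (seconds) of the first reflection in the spherical room
     * model: the reflected path runs (radius - r) out to the wall and radius
     * back to the centre, i.e. 2 * (radius - r) longer than the direct path.
     */
    public static double reflectionDelay(double r, double radius) {
        double extraPath = 2.0 * (radius - r);
        return extraPath / SPEED_OF_SOUND;
    }
}
```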
The Doppler effect is created by another delay that is controlled as a linear function of the distance between the virtual sound source and the listener and updated whenever the sound source position changes. This creates a frequency shift, which temporarily increases the pitch of the sound source when it moves towards the user and decreases it when it moves away.
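One plausible realization of this mechanism, sketched below with hypothetical names, slews the delay-line read position towards the distance-dependent target instead of jumping: an approaching source is then read faster (higher pitch) and a receding one slower (lower pitch).

```java
// Hypothetical sketch of a Doppler delay line (not SAFIR's actual code).
public final class DopplerLine {

    private static final double SPEED_OF_SOUND = 343.0; // m/s

    private final double sampleRate;
    private final double maxSlewPerSample; // limits how fast the delay may change
    private double delaySamples;           // current read offset into the delay line
    private double targetSamples;          // desired offset = distance / c * sampleRate

    public DopplerLine(double sampleRate, double maxSlewPerSample) {
        this.sampleRate = sampleRate;
        this.maxSlewPerSample = maxSlewPerSample;
    }

    /** Called whenever the virtual source moves; the delay is linear in distance. */
    public void setDistance(double metres) {
        targetSamples = metres / SPEED_OF_SOUND * sampleRate;
    }

    /** Per-sample update: the rate of change of the delay is the pitch shift. */
    public double nextDelaySamples() {
        double diff = targetSamples - delaySamples;
        delaySamples += Math.max(-maxSlewPerSample, Math.min(maxSlewPerSample, diff));
        return delaySamples;
    }
}
```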
3.2 Direction
SAFIR uses the VBAP algorithm to simulate the direction of a virtual sound source. VBAP is a multichannel spatialization technique that allows positioning sound sources on a surface spanned by a set of loudspeakers. It chooses the speakers that will emit the signal and computes their intensities (local intensity panning). In case all loudspeakers lie in the same horizontal plane, e.g. on a ring around the listener, the algorithm runs in a simplified 2D mode. The number of loudspeakers used and their positions are generally arbitrary, but a higher density of speakers will improve the accuracy of the spatialization.
The first step of the VBAP algorithm is to determine the active loudspeakers. Depending on the position of the virtual sound source relative to the listener, up to three speakers with the shortest angular distance to the sound source become active while all others stay mute. In 2D mode, only up to two speakers become active and elevation is not considered. In special cases, e.g. when the position of the virtual source coincides with a speaker, only that speaker will be active. The relation of the gain factors of the speakers is then determined on a vector basis, as shown in Figure 2:

Fig. 2. VBAP formulation in 2D mode
The unit vectors l1 and l2 point towards the active speakers. The unit vector p, pointing towards the sound source, can be formulated as a linear combination of the speaker vectors l1 and l2:

p = g1 l1 + g2 l2    (1)

g1 and g2 are the resulting intensity factors for the active speakers and can be computed by simple equations. The algorithm works analogously in 3D mode.
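As an illustration of these 'simple equations', the sketch below solves equation (1) for the 2D case with Cramer's rule; the class name is hypothetical, and the constant-power normalisation follows common VBAP practice [5].

```java
// Hypothetical sketch of the 2D VBAP gain computation from equation (1).
public final class Vbap2D {

    /**
     * Returns {g1, g2} for unit speaker direction vectors l1, l2 and unit
     * source direction vector p, each given as {x, y}.
     */
    public static double[] gains(double[] l1, double[] l2, double[] p) {
        // Solve p = g1 * l1 + g2 * l2 by Cramer's rule.
        double det = l1[0] * l2[1] - l2[0] * l1[1];
        double g1 = (p[0] * l2[1] - p[1] * l2[0]) / det;
        double g2 = (p[1] * l1[0] - p[0] * l1[1]) / det;

        // Scale so that g1^2 + g2^2 = 1 (constant perceived power).
        double norm = Math.sqrt(g1 * g1 + g2 * g2);
        return new double[] { g1 / norm, g2 / norm };
    }
}
```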
The output of SAFIR's direction module is between one and three copies of the signal with certain intensities, each of which is routed to one speaker.
3.3 Speaker Channel Adjustments
After the sound sources have passed the distance module, they are routed by the direction block to their active speaker channels as described above. After mixing all sounds that are playing in parallel, SAFIR adjusts the speaker channels to incorporate the different distances of the speakers to the user. The same principles as in 3.1 are applied to attenuate and delay speakers that are closer than others. This is especially important since VBAP assumes that the distances of all speakers to the listener are equal. It gives us more flexibility in positioning the speakers in the room and creates a larger 'sweet spot'⁶.

⁶ Listening area where the spatial impression of the audio playback is convincing
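A minimal sketch of this compensation, again with hypothetical names: speakers closer to the listener than the furthest one are delayed and attenuated so that all channels appear equally distant, as VBAP assumes.

```java
// Hypothetical helper aligning speaker channels to the furthest speaker.
public final class SpeakerAlignment {

    private static final double SPEED_OF_SOUND = 343.0; // m/s

    /**
     * Delay in seconds for a speaker at distance d so that its signal
     * arrives in sync with the furthest speaker at distance dMax.
     */
    public static double alignmentDelay(double d, double dMax) {
        return (dMax - d) / SPEED_OF_SOUND;
    }

    /**
     * Amplitude gain scaling a speaker at distance d down to the level of
     * the furthest one; with amplitude proportional to 1/r this is d / dMax
     * (cf. section 3.1).
     */
    public static double alignmentGain(double d, double dMax) {
        return d / dMax;
    }
}
```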
3.4 System Usage
The system requires the coordinates of the loudspeakers as input parameters; these could be supplied by a VRML model, for instance, or a plain text file. Basically, two Java classes are needed to manage the audio playback: the AudioContext provides methods to specify the user's position at runtime and other parameters that affect the overall audio reproduction, e.g. the general volume. Instantiating this class starts the audio server and allows generating instances of AudioObjects. Each AudioObject represents a virtual sound source and has (among others) methods to interactively position it in space. This separation between context and objects easily allows supplying several rooms from one server if sufficient soundcards are available. Sound sources can be files, IP streams or live recordings from microphone/line-in channels.
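A hypothetical usage example is shown below; only the two class names are taken from the description above, so the constructor arguments and method names (setListenerPosition, createAudioObject, setPosition, play) are assumptions.

```java
// Hypothetical sketch of how a client might use SAFIR's two main classes.
public class SafirDemo {
    public static void main(String[] args) {
        // Instantiating the context starts the audio server; the speaker
        // coordinates could come from a plain text file, as described above.
        AudioContext context = new AudioContext("speakers.txt");
        context.setListenerPosition(0.0, 0.0, 1.7); // user's head position in metres

        // Each AudioObject represents one virtual sound source.
        AudioObject chime = context.createAudioObject("chime.wav");
        chime.setPosition(2.0, -1.0, 1.0);          // place it in the room
        chime.play();

        // Reposition interactively, e.g. to follow a moving projection.
        chime.setPosition(2.0, 1.0, 1.0);
    }
}
```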
3.5 Future Work
SAFIR was developed for the Saarland University Pervasive Instrumented Environment (SUPIE, as described in [1]) and will be used to support interactions in different scenarios: spatial sound will, for example, be attached to the projection of an Everywhere Display and used to guide the user's focus of attention. Acoustic feedback will be convenient for gesture interactions that, for instance, move virtual objects in the environment to reflect changes of the system status.
Beyond that, SAFIR will also be used to support new anthropomorphous interfaces, such as a virtual inhabitant of the environment that can migrate between different displays and devices. Another idea that will be investigated is Talking Objects. This concept allows users to communicate directly with smart objects and builds on results of Reeves and Nass [6], who suggest that users often tend to treat objects similarly to humans. Such approaches imply that users will be involved in dialogues with physical objects that in most cases cannot deliver audio on their own. SAFIR will support this metaphor by providing audio channels that can be spatially related to such objects. The SAFIR library will be made available on our web site so that other interested groups can download it and use it in their projects.
It is planned to integrate a steerable AudioBeam, a directional loudspeaker that can send a very narrow sound beam audible only to selected individuals, such that the environment could create private audio channels for users in the same room. Control of this device will then be incorporated into SAFIR to provide extensions for multiple users and privacy settings.
References
1. A. Butz and A. Krüger. A generalized peephole metaphor for augmented reality and instrumented environments. In Proceedings of the International Workshop on Software Technology for Augmented Reality Systems (STARS), 2003.
2. M. Crease and S. Brewster. Making progress with sounds: the design and evaluation of an audio progress bar. In Proceedings of the Fifth International Conference on Auditory Display, 1998.
3. M. Fellner and R. Höldrich. Physiologische und psychoakustische Grundlagen des räumlichen Hörens, 1998.
4. J. Huopaniemi, L. Savioja, and T. Takala. DIVA virtual audio reality system. In Proceedings of the Third International Conference on Auditory Display, pp. 111-116, 1996.
5. V. Pulkki. Spatial sound generation and perception by amplitude panning techniques, 2001.
6. B. Reeves and C. Nass. The Media Equation: How People Treat Computers, Television, and New Media Like Real People and Places. CSLI Publications and Cambridge University Press, 1996.
7. C. M. Wilson and S. K. Lodha. Listen: a data sonification toolkit. In Proceedings of the Third International Conference on Auditory Display, 1996.