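Music is a series of notes that are started and stopped at different times and played at different volumes. Many different instructions are used to play music, but they all work in much the same way: each one acts on some note. Composing on a computer really means storing sequences of these note instructions; during playback the audio hardware turns them back into sound.
MIDI (file extension .MID) is the standard format for storing digital music.
DirectMusic music segments use the .SGT file extension. Related files include band files (.BND), which contain instrument information; chordmap files (.CDM), which contain chord instructions that modify the music during playback; style files (.STY), which contain playback style information; and template files (.TPL), which contain templates for creating music segments.
MIDI is a very powerful music format. Its one drawback is that the sound quality depends on the synthesizer, because MIDI records only the notes; how they actually sound is decided by the software and hardware that plays them. MP3 files (.MP3) resemble wave files in that they store digitized sound, but unlike a WAV file an MP3 file is heavily compressed, which makes it far smaller while keeping the sound quality close to the original. MP3 files can be played with DirectShow, a very powerful multimedia component that can play almost any media file, video as well as audio; some sound files can only be played through DirectShow.
DirectX Audio is a composite component made up of DirectSound and DirectMusic, as shown in the figure below:
DirectMusic was greatly enhanced in DirectX 8, while DirectSound stayed largely unchanged. DirectSound is the main component for digital sound playback. DirectMusic handles all music formats, including MIDI, DirectMusic native format files, and wave files. When DirectMusic has finished processing them, it passes the result to DirectSound for further processing, which means digitized instruments can be used when playing back MIDI.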
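Using DirectSound
To use DirectSound, create a COM object that communicates with the sound card, then use that object to create individual buffers of sound data, called secondary sound buffers, to hold the audio. The data in these buffers is mixed into the main mixing buffer, called the primary sound buffer, and can then be played in whatever format you specify. The playback format is described by sample rate, number of channels, and bits per sample. Possible sample rates are 8000 Hz, 11025 Hz, 22050 Hz, and 44100 Hz (CD quality).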
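There are two choices for the number of channels: single-channel mono and two-channel stereo. Bits per sample is likewise limited to two choices: 8-bit low-quality sound and 16-bit high-fidelity sound. Unless it is changed, the DirectSound primary buffer defaults to a 22050 Hz sample rate, 8-bit samples, and stereo. DirectSound lets you adjust the playback speed of a sound (which also changes its pitch), adjust the volume, loop playback, and so on. You can even play sounds in a virtual 3D environment to simulate sound that actually surrounds the listener.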
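As a rough illustration of what these format choices cost, the sketch below computes the storage needed for one second of uncompressed audio and estimates how far the pitch moves when a sound is played back at a different rate. The function names are made up for this example and are not part of DirectSound, and expressing the pitch change in semitones is only a convenient assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Bytes needed for one second of uncompressed PCM audio.
std::uint32_t pcm_bytes_per_second(std::uint32_t sample_rate_hz,
                                   std::uint32_t channels,
                                   std::uint32_t bits_per_sample)
{
    assert(bits_per_sample % 8 == 0);  // whole bytes per sample only
    const std::uint32_t bytes_per_frame = channels * (bits_per_sample / 8);
    return sample_rate_hz * bytes_per_frame;
}

// Pitch change, in equal-tempered semitones, when audio recorded at
// recorded_hz is played back at playback_hz. Doubling the rate raises
// the pitch by one octave (12 semitones).
double pitch_shift_semitones(double recorded_hz, double playback_hz)
{
    return 12.0 * std::log2(playback_hz / recorded_hz);
}
```

For example, CD-quality stereo (44100 Hz, 2 channels, 16 bits) needs 176400 bytes per second, while 11025 Hz mono at 16 bits needs 22050.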
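The job, then, is to fill the buffers with sound data. If a sound is too large to fit in a buffer, you have to stream it: load a small piece of the sound, and when that piece has finished playing, load the next piece into the buffer, repeating until the whole sound has been played. Streaming works by tracking the play position within the buffer and having DirectSound tell the application when a given position has been reached, so that the application can refresh the audio data. This signal is called a notification. There is no fixed limit on how many buffers can play at the same time, but the number should be kept small, because every additional buffer costs memory and CPU time.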
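The streaming loop described above can be sketched without any DirectSound calls. In the example below a plain byte vector stands in for the secondary buffer, which is divided into equal sections; `notify_offsets` lists the last byte of each section, where a notification would be requested, and `refill_section` copies in the next piece of the source once a section has finished playing. All names here are hypothetical, and a real implementation would lock and unlock the DirectSound buffer rather than write to a vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a secondary buffer that is refilled in sections.
struct StreamState {
    std::vector<unsigned char> buffer;  // plays the role of the sound buffer
    std::size_t section_bytes;          // size of one section
    std::size_t source_pos;             // bytes of the source loaded so far
};

StreamState make_stream(std::size_t buffer_bytes, std::size_t sections)
{
    assert(buffer_bytes > 0 && sections > 0 && buffer_bytes % sections == 0);
    StreamState s;
    s.buffer.assign(buffer_bytes, 0);
    s.section_bytes = buffer_bytes / sections;
    s.source_pos = 0;
    return s;
}

// Offsets at which a notification would be requested: the last byte of
// every section.
std::vector<std::size_t> notify_offsets(const StreamState& s)
{
    std::vector<std::size_t> offsets;
    for (std::size_t end = s.section_bytes; end <= s.buffer.size();
         end += s.section_bytes)
        offsets.push_back(end - 1);
    return offsets;
}

// Refill the section that has just finished playing with the next piece of
// the source. When the source runs out, the rest of the section is padded
// with `silence` (128 for 8-bit PCM, 0 for 16-bit PCM).
// Returns the number of bytes copied from the source.
std::size_t refill_section(StreamState& s,
                           const std::vector<unsigned char>& source,
                           std::size_t index,
                           unsigned char silence)
{
    assert((index + 1) * s.section_bytes <= s.buffer.size());
    const std::size_t start = index * s.section_bytes;
    const std::size_t n =
        std::min(source.size() - s.source_pos, s.section_bytes);
    std::copy(source.begin() + s.source_pos,
              source.begin() + s.source_pos + n,
              s.buffer.begin() + start);
    std::fill(s.buffer.begin() + start + n,
              s.buffer.begin() + start + s.section_bytes,
              silence);
    s.source_pos += n;
    return n;
}
```

Splitting the buffer into several sections lets one section be rewritten while another is still playing, which is the point of requesting a notification at each section boundary.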
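To use DirectSound and DirectMusic in a project, include the header files dsound.h and dmusici.h and link against DSound.lib. Also link DXGuid.lib, which supplies the GUID definitions the interfaces rely on and makes DirectSound easier to use.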
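The DirectSound COM interfaces are:
IDirectSound8: the main DirectSound interface.
IDirectSoundBuffer8: the sound buffer interface; it holds the data and controls playback. (The primary buffer exposes only the older IDirectSoundBuffer interface.)
IDirectSoundNotify8: the notification object, which tells the application that a specified play position has been reached.
The relationships among these objects are shown in the figure below:
IDirectSound8 is the main interface. It is used to create buffers (IDirectSoundBuffer8), and a buffer interface is in turn used to obtain the notification interface (IDirectSoundNotify8), which tells the application that a specified position has been reached. The notification interface is very useful when streaming audio files.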
Initializing DirectSound
The first step in using DirectSound is to create an IDirectSound8 object, which controls the audio hardware. It is created with the DirectSoundCreate8 function.
The DirectSoundCreate8 function creates and initializes an object that supports the IDirectSound8 interface.
HRESULT DirectSoundCreate8(
    LPCGUID lpcGuidDevice,
    LPDIRECTSOUND8 * ppDS8,
    LPUNKNOWN pUnkOuter
);
Parameters
- lpcGuidDevice: Address of the GUID that identifies the sound device. The value of this parameter must be one of the GUIDs returned by DirectSoundEnumerate, or NULL for the default device, or one of the following values.
Value | Description
DSDEVID_DefaultPlayback | System-wide default audio playback device. Equivalent to NULL.
DSDEVID_DefaultVoicePlayback | Default voice playback device.
- ppDS8: Address of a variable to receive an IDirectSound8 interface pointer.
- pUnkOuter: Address of the controlling object's IUnknown interface for COM aggregation. Must be NULL, because aggregation is not supported.
Return Values
If the function succeeds, it returns DS_OK. If it fails, the return value may be one of the following.
Return code
- DSERR_ALLOCATED
- DSERR_INVALIDPARAM
- DSERR_NOAGGREGATION
- DSERR_NODRIVER
- DSERR_OUTOFMEMORY
Remarks
The application must call the IDirectSound8::SetCooperativeLevel method immediately after creating a device object.
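Creating the primary sound buffer
The primary sound buffer is controlled through an IDirectSoundBuffer object. Creating it does not require the DirectX 8 version of the interface, because the primary buffer interface has never changed. Sound buffers are created with IDirectSound8::CreateSoundBuffer.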
The CreateSoundBuffer method creates a sound buffer object to manage audio samples.
HRESULT CreateSoundBuffer(
    LPCDSBUFFERDESC pcDSBufferDesc,
    LPDIRECTSOUNDBUFFER * ppDSBuffer,
    LPUNKNOWN pUnkOuter
);
Parameters
- pcDSBufferDesc: Address of a DSBUFFERDESC structure that describes the sound buffer to create.
- ppDSBuffer: Address of a variable that receives the IDirectSoundBuffer interface of the new buffer object. Use QueryInterface to obtain IDirectSoundBuffer8. IDirectSoundBuffer8 is not available for the primary buffer.
- pUnkOuter: Address of the controlling object's IUnknown interface for COM aggregation. Must be NULL.
Return Values
If the method succeeds, the return value is DS_OK, or DS_NO_VIRTUALIZATION if a requested 3D algorithm was not available and stereo panning was substituted. See the description of the guid3DAlgorithm member of DSBUFFERDESC. If the method fails, the return value may be one of the error values shown in the following table.
Return code
- DSERR_ALLOCATED
- DSERR_BADFORMAT
- DSERR_BUFFERTOOSMALL
- DSERR_CONTROLUNAVAIL
- DSERR_DS8_REQUIRED
- DSERR_INVALIDCALL
- DSERR_INVALIDPARAM
- DSERR_NOAGGREGATION
- DSERR_OUTOFMEMORY
- DSERR_UNINITIALIZED
- DSERR_UNSUPPORTED
Remarks
DirectSound does not initialize the contents of the buffer, and the application cannot assume that it contains silence.
If an attempt is made to create a buffer with the DSBCAPS_LOCHARDWARE flag on a system where hardware acceleration is not available, the method fails with either DSERR_CONTROLUNAVAIL or DSERR_INVALIDCALL, depending on the operating system.
pcDSBufferDesc is a pointer to a DSBUFFERDESC structure, which holds the information describing the buffer to create.
The DSBUFFERDESC structure describes the characteristics of a new buffer object. It is used by the IDirectSound8::CreateSoundBuffer method and by the DirectSoundFullDuplexCreate8 function.
An earlier version of this structure, DSBUFFERDESC1, is maintained in Dsound.h for compatibility with DirectX 7 and earlier.
typedef struct DSBUFFERDESC {
    DWORD dwSize;
    DWORD dwFlags;
    DWORD dwBufferBytes;
    DWORD dwReserved;
    LPWAVEFORMATEX lpwfxFormat;
    GUID guid3DAlgorithm;
} DSBUFFERDESC;
Members
- dwSize: Size of the structure, in bytes. This member must be initialized before the structure is used.
- dwFlags: Flags specifying the capabilities of the buffer. See the dwFlags member of the DSBCAPS structure for a detailed listing of valid flags.
- dwBufferBytes: Size of the new buffer, in bytes. This value must be 0 when creating a buffer with the DSBCAPS_PRIMARYBUFFER flag. For secondary buffers, the minimum and maximum sizes allowed are specified by DSBSIZE_MIN and DSBSIZE_MAX, defined in Dsound.h.
- dwReserved: Reserved. Must be 0.
- lpwfxFormat: Address of a WAVEFORMATEX or WAVEFORMATEXTENSIBLE structure specifying the waveform format for the buffer. This value must be NULL for primary buffers.
- guid3DAlgorithm: Unique identifier of the two-speaker virtualization algorithm to be used by DirectSound3D hardware emulation. If DSBCAPS_CTRL3D is not set in dwFlags, this member must be GUID_NULL (DS3DALG_DEFAULT). The following algorithm identifiers are defined.
Value | Description | Availability
DS3DALG_DEFAULT | DirectSound uses the default algorithm. In most cases this is DS3DALG_NO_VIRTUALIZATION. On WDM drivers, if the user has selected a surround sound speaker configuration in Control Panel, the sound is panned among the available directional speakers. | Applies to software mixing only. Available on WDM or Vxd Drivers.
DS3DALG_NO_VIRTUALIZATION | 3D output is mapped onto normal left and right stereo panning. At 90 degrees to the left, the sound is coming out of only the left speaker; at 90 degrees to the right, sound is coming out of only the right speaker. The vertical axis is ignored except for scaling of volume due to distance. Doppler shift and volume scaling are still applied, but the 3D filtering is not performed on this buffer. This is the most efficient software implementation, but provides no virtual 3D audio effect. When the DS3DALG_NO_VIRTUALIZATION algorithm is specified, HRTF processing will not be done. Because DS3DALG_NO_VIRTUALIZATION uses only normal stereo panning, a buffer created with this algorithm may be accelerated by a 2D hardware voice if no free 3D hardware voices are available. | Applies to software mixing only. Available on WDM or Vxd Drivers.
DS3DALG_HRTF_FULL | The 3D API is processed with the high quality 3D audio algorithm. This algorithm gives the highest quality 3D audio effect, but uses more CPU cycles. See Remarks. | Applies to software mixing only. Available on Microsoft Windows 98 Second Edition and later operating systems when using WDM drivers.
DS3DALG_HRTF_LIGHT | The 3D API is processed with the efficient 3D audio algorithm. This algorithm gives a good 3D audio effect, but uses fewer CPU cycles than DS3DALG_HRTF_FULL. | Applies to software mixing only. Available on Windows 98 Second Edition and later operating systems when using WDM drivers.
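The member descriptions above imply a few rules that can be checked before a description is handed to CreateSoundBuffer. The sketch below applies them to a hypothetical mirror of the structure so that it compiles without the DirectX SDK. The struct, the function, and the constant names are inventions of this example, and the numeric values are assumed to match Dsound.h; real code should use DSBUFFERDESC and the SDK constants directly.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Stand-ins for constants that real code takes from Dsound.h (values assumed).
const std::uint32_t kPrimaryBufferFlag = 0x00000001;  // DSBCAPS_PRIMARYBUFFER
const std::uint32_t kBufferSizeMin     = 4;           // DSBSIZE_MIN
const std::uint32_t kBufferSizeMax     = 0x0FFFFFFF;  // DSBSIZE_MAX

// Hypothetical mirror of the DSBUFFERDESC members that the rules mention.
struct BufferDescSketch {
    std::uint32_t size;          // dwSize
    std::uint32_t flags;         // dwFlags
    std::uint32_t buffer_bytes;  // dwBufferBytes
    std::uint32_t reserved;      // dwReserved
    const void*   format;        // lpwfxFormat
};

// Returns an empty string if the description obeys the documented rules,
// otherwise a short message naming the first rule that is broken.
std::string check_buffer_desc(const BufferDescSketch& d)
{
    if (d.size == 0)
        return "dwSize must be initialized";
    if (d.reserved != 0)
        return "dwReserved must be 0";
    if (d.flags & kPrimaryBufferFlag) {
        if (d.buffer_bytes != 0)
            return "dwBufferBytes must be 0 for a primary buffer";
        if (d.format != nullptr)
            return "lpwfxFormat must be NULL for a primary buffer";
    } else {
        if (d.buffer_bytes < kBufferSizeMin || d.buffer_bytes > kBufferSizeMax)
            return "dwBufferBytes is outside DSBSIZE_MIN..DSBSIZE_MAX";
        if (d.format == nullptr)
            return "a secondary buffer needs a format (assumption of this sketch)";
    }
    return "";
}
```

Returning a message rather than a bare boolean makes it easier to see which rule a description breaks while debugging.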
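For the primary buffer, the only member that needs a value (apart from dwSize) is dwFlags, a set of flags that determine the buffer's capabilities.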
- dwFlags: Flags that specify buffer-object capabilities. Use one or more of the values shown in the following table.
Value | Description
DSBCAPS_CTRL3D | The buffer has 3D control capability.
DSBCAPS_CTRLFREQUENCY | The buffer has frequency control capability.
DSBCAPS_CTRLFX | The buffer supports effects processing.
DSBCAPS_CTRLPAN | The buffer has pan control capability.
DSBCAPS_CTRLVOLUME | The buffer has volume control capability.
DSBCAPS_CTRLPOSITIONNOTIFY | The buffer has position notification capability. See the Remarks for DSCBUFFERDESC.
DSBCAPS_GETCURRENTPOSITION2 | The buffer uses the new behavior of the play cursor when IDirectSoundBuffer8::GetCurrentPosition is called. In the first version of DirectSound, the play cursor was significantly ahead of the actual playing sound on emulated sound cards; it was directly behind the write cursor. Now, if the DSBCAPS_GETCURRENTPOSITION2 flag is specified, the application can get a more accurate play cursor. If this flag is not specified, the old behavior is preserved for compatibility. This flag affects only emulated devices; if a DirectSound driver is present, the play cursor is accurate for DirectSound in all versions of DirectX.
DSBCAPS_GLOBALFOCUS | The buffer is a global sound buffer. With this flag set, an application using DirectSound can continue to play its buffers if the user switches focus to another application, even if the new application uses DirectSound. The one exception is if you switch focus to a DirectSound application that uses the DSSCL_WRITEPRIMARY flag for its cooperative level. In this case, the global sounds from other applications will not be audible.
DSBCAPS_LOCDEFER | The buffer can be assigned to a hardware or software resource at play time, or when IDirectSoundBuffer8::AcquireResources is called.
DSBCAPS_LOCHARDWARE | The buffer uses hardware mixing.
DSBCAPS_LOCSOFTWARE | The buffer is in software memory and uses software mixing.
DSBCAPS_MUTE3DATMAXDISTANCE | The sound is reduced to silence at the maximum distance. The buffer will stop playing when the maximum distance is exceeded, so that processor time is not wasted. Applies only to software buffers.
DSBCAPS_PRIMARYBUFFER | The buffer is a primary buffer.
DSBCAPS_STATIC | The buffer is in on-board hardware memory.
DSBCAPS_STICKYFOCUS | The buffer has sticky focus. If the user switches to another application not using DirectSound, the buffer is still audible. However, if the user switches to another DirectSound application, the buffer is muted.
DSBCAPS_TRUEPLAYPOSITION | Force IDirectSoundBuffer8::GetCurrentPosition to return the buffer's true play position. This flag is only valid in Windows Vista.
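The following code creates a sound buffer. Because it supplies a buffer size and a format, the buffer it creates is a secondary buffer; for the primary buffer, set DSBCAPS_PRIMARYBUFFER in dwFlags and leave dwBufferBytes at 0 and lpwfxFormat at NULL.
// g_ds: the IDirectSound8* created earlier with DirectSoundCreate8
// ds: an IDirectSoundBuffer* that receives the new buffer
// wave_format: a filled-in WAVEFORMATEX (set up later in this article)
// set up the DSBUFFERDESC structure
DSBUFFERDESC ds_buffer_desc;
// zero out the structure
ZeroMemory(&ds_buffer_desc, sizeof(DSBUFFERDESC));
ds_buffer_desc.dwSize = sizeof(DSBUFFERDESC);
ds_buffer_desc.dwFlags = DSBCAPS_CTRLVOLUME;
ds_buffer_desc.dwBufferBytes = wave_format.nAvgBytesPerSec * 2; // 2 seconds
ds_buffer_desc.lpwfxFormat = &wave_format;
// create the first-version buffer object (IDirectSoundBuffer)
if(FAILED(g_ds->CreateSoundBuffer(&ds_buffer_desc, &ds, NULL)))
{
    // an error occurred; MessageBoxA takes narrow strings even in Unicode builds
    MessageBoxA(NULL, "Unable to create sound buffer", "Error", MB_OK);
}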
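The buffer size above is simply two seconds' worth of nAvgBytesPerSec. For other durations, one possible helper (a hypothetical function, not part of DirectSound) converts milliseconds to bytes and rounds down to a whole number of blocks, since audio data has to be handled in multiples of nBlockAlign.

```cpp
#include <cassert>
#include <cstdint>

// Size in bytes of a buffer that holds `milliseconds` of audio, rounded down
// to a multiple of block_align so the buffer never ends in the middle of a
// sample. The parameters correspond to nAvgBytesPerSec and nBlockAlign.
std::uint32_t buffer_bytes_for_duration(std::uint32_t avg_bytes_per_sec,
                                        std::uint32_t block_align,
                                        std::uint32_t milliseconds)
{
    assert(block_align > 0);
    // 64-bit intermediate so the multiplication cannot overflow.
    const std::uint64_t raw =
        static_cast<std::uint64_t>(avg_bytes_per_sec) * milliseconds / 1000;
    const std::uint64_t aligned = raw - (raw % block_align);
    // Very long durations would not fit in 32 bits; a real caller would also
    // check the result against DSBSIZE_MIN and DSBSIZE_MAX.
    return static_cast<std::uint32_t>(aligned);
}
```

With the 11025 Hz, 16-bit mono format used in this article (22050 bytes per second, 2-byte blocks), a duration of 2000 ms gives 44100 bytes, the same value the code above computes.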
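Setting the format
There are many formats to choose from, but the recommendation here is either 11025 Hz, 16-bit, mono or 22050 Hz, 16-bit, mono. Avoid stereo when picking a format: it costs extra processing time and its benefit is hard to judge. Likewise, avoid any sample size other than 16 bits, because 8-bit samples sound noticeably worse. A higher sample rate generally sounds better, but there is little reason to go above 22050 Hz, which stays reasonably close to CD quality without much audible loss.
The playback format is set by calling IDirectSoundBuffer::SetFormat.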
The SetFormat method sets the format of the primary buffer. Whenever this application has the input focus, DirectSound will set the primary buffer to the specified format.
HRESULT SetFormat(
    LPCWAVEFORMATEX pcfxFormat
);
Parameters
- pcfxFormat: Address of a WAVEFORMATEX structure that describes the new format for the primary sound buffer.
Return Values
If the method succeeds, the return value is DS_OK. If the method fails, the return value may be one of the following error values:
Return code
- DSERR_BADFORMAT
- DSERR_INVALIDCALL
- DSERR_INVALIDPARAM
- DSERR_OUTOFMEMORY
- DSERR_PRIOLEVELNEEDED
- DSERR_UNSUPPORTED
Remarks
The format of the primary buffer should be set before secondary buffers are created.
The method fails if the application has the DSSCL_NORMAL cooperative level.
If the application is using DirectSound at the DSSCL_WRITEPRIMARY cooperative level, and the format is not supported, the method fails.
If the cooperative level is DSSCL_PRIORITY, DirectSound stops the primary buffer, changes the format, and restarts the buffer. The method succeeds even if the hardware does not support the requested format; DirectSound sets the buffer to the closest supported format. To determine whether this has happened, an application can call the GetFormat method for the primary buffer and compare the result with the format that was requested with the SetFormat method.
This method is not available for secondary sound buffers. If a new format is required, the application must create a new DirectSoundBuffer object.
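The comparison suggested in the Remarks can be sketched as follows. The two small structs are hypothetical stand-ins for the fields of interest in WAVEFORMATEX; in a real program the "actual" values would come from calling GetFormat on the primary buffer after SetFormat has succeeded.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical subset of WAVEFORMATEX: the fields worth comparing.
struct PcmFormatSketch {
    std::uint16_t channels;         // nChannels
    std::uint32_t samples_per_sec;  // nSamplesPerSec
    std::uint16_t bits_per_sample;  // wBitsPerSample
};

// Which parts of the requested format were changed by the substitution.
struct FormatDiff {
    bool channels_changed;
    bool rate_changed;
    bool bits_changed;
    bool any() const { return channels_changed || rate_changed || bits_changed; }
};

// Compare the format that was requested with the one actually in effect.
FormatDiff compare_formats(const PcmFormatSketch& requested,
                           const PcmFormatSketch& actual)
{
    FormatDiff diff;
    diff.channels_changed = requested.channels != actual.channels;
    diff.rate_changed     = requested.samples_per_sec != actual.samples_per_sec;
    diff.bits_changed     = requested.bits_per_sample != actual.bits_per_sample;
    return diff;
}
```

If `any()` returns true, DirectSound substituted a nearby format, and the application can decide whether the difference matters.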
The function's only parameter is a pointer to a WAVEFORMATEX structure, which holds the format to set.
The WAVEFORMATEX structure defines the format of waveform-audio data. Only format information common to all waveform-audio data formats is included in this structure. For formats that require additional information, this structure is included as the first member in another structure, along with the additional information.
This structure is part of the Platform SDK and is not declared in Dsound.h. It is documented here for convenience.
typedef struct WAVEFORMATEX {
    WORD wFormatTag;
    WORD nChannels;
    DWORD nSamplesPerSec;
    DWORD nAvgBytesPerSec;
    WORD nBlockAlign;
    WORD wBitsPerSample;
    WORD cbSize;
} WAVEFORMATEX;
Members
- wFormatTag: Waveform-audio format type. Format tags are registered with Microsoft Corporation for many compression algorithms. A complete list of format tags can be found in the Mmreg.h header file. For one- or two-channel PCM data, this value should be WAVE_FORMAT_PCM.
- nChannels: Number of channels in the waveform-audio data. Monaural data uses one channel and stereo data uses two channels.
- nSamplesPerSec: Sample rate, in samples per second (hertz). If wFormatTag is WAVE_FORMAT_PCM, then common values for nSamplesPerSec are 8.0 kHz, 11.025 kHz, 22.05 kHz, and 44.1 kHz. For non-PCM formats, this member must be computed according to the manufacturer's specification of the format tag.
- nAvgBytesPerSec: Required average data-transfer rate, in bytes per second, for the format tag. If wFormatTag is WAVE_FORMAT_PCM, nAvgBytesPerSec should be equal to the product of nSamplesPerSec and nBlockAlign. For non-PCM formats, this member must be computed according to the manufacturer's specification of the format tag.
- nBlockAlign: Block alignment, in bytes. The block alignment is the minimum atomic unit of data for the wFormatTag format type. If wFormatTag is WAVE_FORMAT_PCM or WAVE_FORMAT_EXTENSIBLE, nBlockAlign must be equal to the product of nChannels and wBitsPerSample divided by 8 (bits per byte). For non-PCM formats, this member must be computed according to the manufacturer's specification of the format tag. Software must process a multiple of nBlockAlign bytes of data at a time. Data written to and read from a device must always start at the beginning of a block. For example, it is illegal to start playback of PCM data in the middle of a sample (that is, on a non-block-aligned boundary).
- wBitsPerSample: Bits per sample for the wFormatTag format type. If wFormatTag is WAVE_FORMAT_PCM, then wBitsPerSample should be equal to 8 or 16. For non-PCM formats, this member must be set according to the manufacturer's specification of the format tag. If wFormatTag is WAVE_FORMAT_EXTENSIBLE, this value can be any integer multiple of 8. Some compression schemes cannot define a value for wBitsPerSample, so this member can be zero.
- cbSize: Size, in bytes, of extra format information appended to the end of the WAVEFORMATEX structure. This information can be used by non-PCM formats to store extra attributes for the wFormatTag. If no extra information is required by the wFormatTag, this member must be set to zero. For WAVE_FORMAT_PCM formats (and only WAVE_FORMAT_PCM formats), this member is ignored.
The following code sets the audio format to 11025 Hz, mono, 16-bit.
// setup the WAVEFORMATEX structure
WAVEFORMATEX wave_format;
ZeroMemory(&wave_format, sizeof(WAVEFORMATEX));
wave_format.wFormatTag = WAVE_FORMAT_PCM;
wave_format.nChannels = 1; // mono
wave_format.nSamplesPerSec = 11025;
wave_format.wBitsPerSample = 16;
wave_format.nBlockAlign = (wave_format.wBitsPerSample / 8) * wave_format.nChannels;
wave_format.nAvgBytesPerSec = wave_format.nSamplesPerSec * wave_format.nBlockAlign;
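The two derived members are easy to get wrong when a format is filled in by hand, so it can help to build PCM formats through one function and check the result. The sketch below does this on a hypothetical mirror of WAVEFORMATEX so that it compiles without the Windows headers; the struct and function names are inventions of this example, and the constant 1 is the value of WAVE_FORMAT_PCM.

```cpp
#include <cassert>
#include <cstdint>

const std::uint16_t kWaveFormatPcm = 1;  // value of WAVE_FORMAT_PCM

// Hypothetical mirror of WAVEFORMATEX, member for member.
struct WaveFormatSketch {
    std::uint16_t format_tag;         // wFormatTag
    std::uint16_t channels;           // nChannels
    std::uint32_t samples_per_sec;    // nSamplesPerSec
    std::uint32_t avg_bytes_per_sec;  // nAvgBytesPerSec
    std::uint16_t block_align;        // nBlockAlign
    std::uint16_t bits_per_sample;    // wBitsPerSample
    std::uint16_t cb_size;            // cbSize
};

// Fill in a PCM format, deriving block_align and avg_bytes_per_sec.
WaveFormatSketch make_pcm_format(std::uint32_t samples_per_sec,
                                 std::uint16_t channels,
                                 std::uint16_t bits_per_sample)
{
    assert(channels > 0);
    WaveFormatSketch f;
    f.format_tag        = kWaveFormatPcm;
    f.channels          = channels;
    f.samples_per_sec   = samples_per_sec;
    f.bits_per_sample   = bits_per_sample;
    f.block_align       = static_cast<std::uint16_t>(channels * bits_per_sample / 8);
    f.avg_bytes_per_sec = samples_per_sec * f.block_align;
    f.cb_size           = 0;
    return f;
}

// Check the PCM rules from the member descriptions: 8 or 16 bits per sample,
// block_align = channels * bits / 8, avg_bytes_per_sec = rate * block_align.
bool is_consistent_pcm(const WaveFormatSketch& f)
{
    if (f.format_tag != kWaveFormatPcm) return false;
    if (f.bits_per_sample != 8 && f.bits_per_sample != 16) return false;
    if (f.block_align != f.channels * f.bits_per_sample / 8) return false;
    return f.avg_bytes_per_sec == f.samples_per_sec * f.block_align;
}
```

Calling `make_pcm_format(11025, 1, 16)` yields the same block alignment (2) and data rate (22050 bytes per second) as the hand-written code above.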
Continued in the next part:
Playing Sound and Music with DirectX Audio and DirectShow (2)