Playing Sound and Music with DirectX Audio and DirectShow (1)

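Music is a series of notes that are started and stopped at particular times and played at particular volumes. Many different instructions are used to play music, but they all work in much the same way: they operate on notes. Composing on a computer really means storing sets of notes, and on playback the audio hardware turns those notes into sound.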

MIDI (file extension .MID) is the standard format for storing digital music.

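DirectMusic music segments use the .SGT file extension. Related files include band files (.BND), which contain instrument information; chordmap files (.CDM), which contain chord instructions that modify the music during playback; style files (.STY), which contain playback style information; and template files (.TPL), which contain templates for creating music segments.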

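MIDI is a very powerful music format. Its one drawback is that sound quality depends on the synthesizer: a MIDI file records only the notes, so how good it sounds is decided by the software and hardware that play it. An MP3 file (file extension .MP3) resembles a wave file in that it stores digitized sound, but unlike a WAV file it is heavily compressed while the perceived quality stays nearly the same. MP3 files can be played with DirectShow, a very powerful multimedia component that can play almost any media file, audio or video. Some sound files can only be played through DirectShow.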

DirectX Audio is a composite component made up of two components, DirectSound and DirectMusic (the original post shows this as a figure).

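DirectMusic was greatly enhanced in DirectX 8, while DirectSound stayed largely as it was. DirectSound is the main component for digital sound playback. DirectMusic handles all music formats, including MIDI, DirectMusic native files, and wave files. When it has finished processing them, DirectMusic passes the result to DirectSound for further processing, which means MIDI playback can use digitized instruments.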

Using DirectSound

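To use DirectSound, you create a COM object that communicates with the sound card, then use that object to create individual buffers of sound data (called secondary sound buffers) that hold the audio. The data in these buffers is mixed in the main mixing buffer (called the primary sound buffer) and can then be played in whatever format you specify. The playback format is defined by sample rate, channel count, and sample resolution. The possible sample rates are 8000 Hz, 11025 Hz, 22050 Hz, and 44100 Hz (CD quality).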
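As a rough illustration of what "mixing" secondary buffers into one output means, here is a minimal sketch that sums two 16-bit sample streams and clamps the result to the 16-bit range. The function name `mix_buffers` is made up for this example, and DirectSound's own mixer is internal and may well work differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mixes two 16-bit PCM streams sample by sample.
// The shorter stream is treated as silence once it runs out, and each sum
// is clamped so it cannot wrap around the 16-bit range.
std::vector<std::int16_t> mix_buffers(const std::vector<std::int16_t>& a,
                                      const std::vector<std::int16_t>& b)
{
    std::vector<std::int16_t> out(std::max(a.size(), b.size()));
    for (std::size_t i = 0; i < out.size(); ++i) {
        const int sa = i < a.size() ? a[i] : 0;
        const int sb = i < b.size() ? b[i] : 0;
        const int sum = sa + sb;
        out[i] = static_cast<std::int16_t>(std::min(32767, std::max(-32768, sum)));
    }
    return out;
}
```

Clamping is the simplest way to handle overflow; it distorts loud passages, which is one reason to keep the number of simultaneously playing buffers modest.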

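There are two choices for the channel count: single-channel mono and two-channel stereo. Sample resolution is limited to two choices as well: 8-bit low-quality sound and 16-bit high-fidelity sound. Unless it is changed, the DirectSound primary buffer defaults to a 22050 Hz sample rate, 8-bit resolution, and stereo. DirectSound lets you adjust the playback speed of a sound (which also changes its pitch), adjust the volume, loop playback, and so on. Sounds can even be played in a virtual 3D environment to simulate sound that actually surrounds the listener.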
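To make the cost of each format choice concrete, the sketch below computes how many bytes of PCM data one second of sound takes for a given sample rate, channel count, and sample resolution. The function name `pcm_bytes_per_second` is made up for this illustration and is not part of DirectSound.

```cpp
#include <cstdint>

// Bytes of PCM data consumed per second of playback.
// bits_per_sample is expected to be a multiple of 8 (8 or 16 in this article).
std::uint32_t pcm_bytes_per_second(std::uint32_t sample_rate_hz,
                                   std::uint32_t channels,
                                   std::uint32_t bits_per_sample)
{
    const std::uint32_t bytes_per_frame = channels * (bits_per_sample / 8);
    return sample_rate_hz * bytes_per_frame;
}
```

For a 22050 Hz, 8-bit, stereo format this gives 44100 bytes per second, while CD-quality 44100 Hz, 16-bit stereo needs 176400 bytes per second, four times as much.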

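What you need to do is fill a buffer with sound data. If the sound is too large to fit, you have to stream it: load a small chunk of the sound data, and when that chunk has finished playing, load the next chunk into the buffer, repeating until the whole sound has been played. Streaming works by tracking the play position in the buffer: when playback reaches a given position, the application is told to refresh the audio data. This mechanism is called notification. There is no limit on the number of buffers that can play at the same time, but the count should still be kept low, because every additional buffer costs memory and CPU time.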
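The streaming loop described above can be simulated without any audio hardware. The sketch below is only a model, under the assumption that the buffer is split into equal sections and that a notification fires each time playback finishes a section; the function name `stream_through_ring` and the use of `int` samples are choices made for this example, not DirectSound API.

```cpp
#include <cstddef>
#include <vector>

// Simulates streaming a long sound through a small circular buffer.
// The buffer is split into equal sections. Whenever playback finishes a
// section (the "notification"), that section is refilled with the next
// chunk of source data while the remaining sections keep playing.
// Returns the samples in the order they were "played".
std::vector<int> stream_through_ring(const std::vector<int>& source,
                                     std::size_t sections,
                                     std::size_t section_size)
{
    std::vector<int> played;
    if (sections == 0 || section_size == 0) return played;

    std::vector<int> ring(sections * section_size, 0);
    std::vector<std::size_t> valid(sections, 0);   // samples loaded per section
    std::size_t next_src = 0;                      // next unread source sample

    // Copies the next chunk of the source into one section of the ring.
    auto refill = [&](std::size_t s) {
        std::size_t n = 0;
        while (n < section_size && next_src < source.size())
            ring[s * section_size + n++] = source[next_src++];
        valid[s] = n;
    };

    for (std::size_t s = 0; s < sections; ++s) refill(s);  // initial fill

    std::size_t current = 0;                       // section being played
    while (valid[current] > 0) {
        for (std::size_t i = 0; i < valid[current]; ++i)
            played.push_back(ring[current * section_size + i]);
        refill(current);                           // notification: section done
        current = (current + 1) % sections;
    }
    return played;
}
```

The ring holds only `sections * section_size` samples at a time, yet the whole source comes out in order, which is the point of streaming.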

To use DirectSound and DirectMusic in a project, include the header files dsound.h and dmusici.h and link against DSound.lib. Adding DXGuid.lib as well makes DirectSound easier to use.

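The DirectSound COM interfaces are:

IDirectSound8: the main DirectSound interface.
IDirectSoundBuffer8: the interface for primary and secondary buffers; it holds the data and controls playback.
IDirectSoundNotify8: the notification object; it tells the application that a specified play position has been reached.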

The relationships among these objects (shown as a figure in the original post) are as follows:

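IDirectSound8 is the main interface. You use it to create buffers (IDirectSoundBuffer8), and then use the buffer interface to obtain the notification interface (IDirectSoundNotify8). The notification interface tells the application that a specified position has been reached, which is very useful when streaming audio files.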

Initializing DirectSound

The first step in using DirectSound is to create an IDirectSound8 object, which controls the audio hardware device. It is created with the DirectSoundCreate8 function.

The DirectSoundCreate8 function creates and initializes an object that supports the IDirectSound8 interface.

HRESULT DirectSoundCreate8(
 LPCGUID lpcGuidDevice,
 LPDIRECTSOUND8 * ppDS8,
 LPUNKNOWN pUnkOuter
);

Parameters

lpcGuidDevice

Address of the GUID that identifies the sound device. The value of this parameter must be one of the GUIDs returned by DirectSoundEnumerate, or NULL for the default device, or one of the following values.

Value	Description
DSDEVID_DefaultPlayback	System-wide default audio playback device. Equivalent to NULL.
DSDEVID_DefaultVoicePlayback	Default voice playback device.

ppDS8

Address of a variable to receive an IDirectSound8 interface pointer.

pUnkOuter

Address of the controlling object's IUnknown interface for COM aggregation. Must be NULL, because aggregation is not supported.

Return Values

If the function succeeds, it returns DS_OK. If it fails, the return value may be one of the following.

Return Code
DSERR_ALLOCATED
DSERR_INVALIDPARAM
DSERR_NOAGGREGATION
DSERR_NODRIVER
DSERR_OUTOFMEMORY

Remarks

The application must call the IDirectSound8::SetCooperativeLevel method immediately after creating a device object.

Creating the Primary Sound Buffer

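The primary sound buffer is controlled through an IDirectSoundBuffer object. Creating the primary buffer does not need a DirectX 8 interface, because this interface has never changed. The function that creates sound buffers is IDirectSound8::CreateSoundBuffer.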

The CreateSoundBuffer method creates a sound buffer object to manage audio samples.

HRESULT CreateSoundBuffer(
 LPCDSBUFFERDESC pcDSBufferDesc,
 LPDIRECTSOUNDBUFFER * ppDSBuffer,
 LPUNKNOWN pUnkOuter
);

Parameters

pcDSBufferDesc: Address of a DSBUFFERDESC structure that describes the sound buffer to create.
ppDSBuffer: Address of a variable that receives the IDirectSoundBuffer interface of the new buffer object. Use QueryInterface to obtain IDirectSoundBuffer8. IDirectSoundBuffer8 is not available for the primary buffer.
pUnkOuter: Address of the controlling object's IUnknown interface for COM aggregation. Must be NULL.

Return Values

If the method succeeds, the return value is DS_OK, or DS_NO_VIRTUALIZATION if a requested 3D algorithm was not available and stereo panning was substituted. See the description of the guid3DAlgorithm member of DSBUFFERDESC. If the method fails, the return value may be one of the error values shown in the following table.

Return code
DSERR_ALLOCATED
DSERR_BADFORMAT
DSERR_BUFFERTOOSMALL
DSERR_CONTROLUNAVAIL
DSERR_DS8_REQUIRED
DSERR_INVALIDCALL
DSERR_INVALIDPARAM
DSERR_NOAGGREGATION
DSERR_OUTOFMEMORY
DSERR_UNINITIALIZED
DSERR_UNSUPPORTED

Remarks

DirectSound does not initialize the contents of the buffer, and the application cannot assume that it contains silence.

If an attempt is made to create a buffer with the DSBCAPS_LOCHARDWARE flag on a system where hardware acceleration is not available, the method fails with either DSERR_CONTROLUNAVAIL or DSERR_INVALIDCALL, depending on the operating system.

pcDSBufferDesc is a pointer to a DSBUFFERDESC structure that holds the description of the buffer to create.

The DSBUFFERDESC structure describes the characteristics of a new buffer object. It is used by the IDirectSound8::CreateSoundBuffer method and by the DirectSoundFullDuplexCreate8 function.

An earlier version of this structure, DSBUFFERDESC1, is maintained in Dsound.h for compatibility with DirectX 7 and earlier.

typedef struct DSBUFFERDESC {
 DWORD dwSize;
 DWORD dwFlags;
 DWORD dwBufferBytes;
 DWORD dwReserved;
 LPWAVEFORMATEX lpwfxFormat;
 GUID guid3DAlgorithm;
} DSBUFFERDESC;

Members

dwSize

Size of the structure, in bytes. This member must be initialized before the structure is used.

dwFlags

Flags specifying the capabilities of the buffer. See the dwFlags member of the DSBCAPS structure for a detailed listing of valid flags.

dwBufferBytes

Size of the new buffer, in bytes. This value must be 0 when creating a buffer with the DSBCAPS_PRIMARYBUFFER flag. For secondary buffers, the minimum and maximum sizes allowed are specified by DSBSIZE_MIN and DSBSIZE_MAX, defined in Dsound.h.

dwReserved

Reserved. Must be 0.

lpwfxFormat

Address of a WAVEFORMATEX or WAVEFORMATEXTENSIBLE structure specifying the waveform format for the buffer. This value must be NULL for primary buffers.

guid3DAlgorithm

Unique identifier of the two-speaker virtualization algorithm to be used by DirectSound3D hardware emulation. If DSBCAPS_CTRL3D is not set in dwFlags, this member must be GUID_NULL (DS3DALG_DEFAULT). The following algorithm identifiers are defined.

Value	Description	Availability
DS3DALG_DEFAULT	DirectSound uses the default algorithm. In most cases this is DS3DALG_NO_VIRTUALIZATION. On WDM drivers, if the user has selected a surround sound speaker configuration in Control Panel, the sound is panned among the available directional speakers.	Applies to software mixing only. Available on WDM or Vxd Drivers.
DS3DALG_NO_VIRTUALIZATION	3D output is mapped onto normal left and right stereo panning. At 90 degrees to the left, the sound is coming out of only the left speaker; at 90 degrees to the right, sound is coming out of only the right speaker. The vertical axis is ignored except for scaling of volume due to distance. Doppler shift and volume scaling are still applied, but the 3D filtering is not performed on this buffer. This is the most efficient software implementation, but provides no virtual 3D audio effect. When the DS3DALG_NO_VIRTUALIZATION algorithm is specified, HRTF processing will not be done. Because DS3DALG_NO_VIRTUALIZATION uses only normal stereo panning, a buffer created with this algorithm may be accelerated by a 2D hardware voice if no free 3D hardware voices are available.	Applies to software mixing only. Available on WDM or Vxd Drivers.
DS3DALG_HRTF_FULL	The 3D API is processed with the high quality 3D audio algorithm. This algorithm gives the highest quality 3D audio effect, but uses more CPU cycles. See Remarks.	Applies to software mixing only. Available on Microsoft Windows 98 Second Edition and later operating systems when using WDM drivers.
DS3DALG_HRTF_LIGHT	The 3D API is processed with the efficient 3D audio algorithm. This algorithm gives a good 3D audio effect, but uses fewer CPU cycles than DS3DALG_HRTF_FULL.	Applies to software mixing only. Available on Windows 98 Second Edition and later operating systems when using WDM drivers.

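Apart from dwSize, the only member that has to be set for a primary buffer is dwFlags, a set of flags that determines the buffer's capabilities.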

dwFlags

Flags that specify buffer-object capabilities. Use one or more of the values shown in the following table.

Value	Description
DSBCAPS_CTRL3D	The buffer has 3D control capability.
DSBCAPS_CTRLFREQUENCY	The buffer has frequency control capability.
DSBCAPS_CTRLFX	The buffer supports effects processing.
DSBCAPS_CTRLPAN	The buffer has pan control capability.
DSBCAPS_CTRLVOLUME	The buffer has volume control capability.
DSBCAPS_CTRLPOSITIONNOTIFY	The buffer has position notification capability. See the Remarks for DSCBUFFERDESC.
DSBCAPS_GETCURRENTPOSITION2	The buffer uses the new behavior of the play cursor when IDirectSoundBuffer8::GetCurrentPosition is called. In the first version of DirectSound, the play cursor was significantly ahead of the actual playing sound on emulated sound cards; it was directly behind the write cursor. Now, if the DSBCAPS_GETCURRENTPOSITION2 flag is specified, the application can get a more accurate play cursor. If this flag is not specified, the old behavior is preserved for compatibility. This flag affects only emulated devices; if a DirectSound driver is present, the play cursor is accurate for DirectSound in all versions of DirectX.
DSBCAPS_GLOBALFOCUS	The buffer is a global sound buffer. With this flag set, an application using DirectSound can continue to play its buffers if the user switches focus to another application, even if the new application uses DirectSound. The one exception is if you switch focus to a DirectSound application that uses the DSSCL_WRITEPRIMARY flag for its cooperative level. In this case, the global sounds from other applications will not be audible.
DSBCAPS_LOCDEFER	The buffer can be assigned to a hardware or software resource at play time, or when IDirectSoundBuffer8::AcquireResources is called.
DSBCAPS_LOCHARDWARE	The buffer uses hardware mixing.
DSBCAPS_LOCSOFTWARE	The buffer is in software memory and uses software mixing.
DSBCAPS_MUTE3DATMAXDISTANCE	The sound is reduced to silence at the maximum distance. The buffer will stop playing when the maximum distance is exceeded, so that processor time is not wasted. Applies only to software buffers.
DSBCAPS_PRIMARYBUFFER	The buffer is a primary buffer.
DSBCAPS_STATIC	The buffer is in on-board hardware memory.
DSBCAPS_STICKYFOCUS	The buffer has sticky focus. If the user switches to another application not using DirectSound, the buffer is still audible. However, if the user switches to another DirectSound application, the buffer is muted.
DSBCAPS_TRUEPLAYPOSITION	Force IDirectSoundBuffer8::GetCurrentPosition to return the buffer's true play position. This flag is only valid in Windows Vista.
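The capability flags in the table are bit flags, so several of them are combined into dwFlags with bitwise OR and tested with bitwise AND. The sketch below shows the pattern with stand-in constants so that it builds without the DirectX SDK. The names `CAP_*`, `has_capability`, and `streaming_buffer_flags` are hypothetical, and the numeric values are assumptions for this sketch; real code should use the DSBCAPS_* constants from dsound.h.

```cpp
#include <cstdint>

// Illustrative stand-ins only. In a real project use the DSBCAPS_* constants
// from dsound.h; the numeric values below are assumptions for this sketch.
constexpr std::uint32_t CAP_CTRLPAN            = 0x00000040;
constexpr std::uint32_t CAP_CTRLVOLUME         = 0x00000080;
constexpr std::uint32_t CAP_CTRLPOSITIONNOTIFY = 0x00000100;

// True when every bit of `cap` is set in `flags`.
constexpr bool has_capability(std::uint32_t flags, std::uint32_t cap)
{
    return (flags & cap) == cap;
}

// Flags a streaming buffer might request: volume control plus position
// notifications, combined with bitwise OR.
constexpr std::uint32_t streaming_buffer_flags()
{
    return CAP_CTRLVOLUME | CAP_CTRLPOSITIONNOTIFY;
}
```

Requesting only the capabilities a buffer really needs is the usual advice, since each extra control can restrict how the buffer is allocated.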

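The following code creates a sound buffer. Because it supplies a size and a format, it creates a secondary buffer; a primary buffer would instead set DSBCAPS_PRIMARYBUFFER and leave dwBufferBytes at 0 and lpwfxFormat at NULL, as the member descriptions above require. The snippet assumes that g_ds is the IDirectSound8 pointer, that ds is an IDirectSoundBuffer pointer, and that wave_format is the WAVEFORMATEX structure filled in later in this article.

    // set up the DSBUFFERDESC structure
    DSBUFFERDESC ds_buffer_desc;

    // zero out the structure
    ZeroMemory(&ds_buffer_desc, sizeof(DSBUFFERDESC));

    ds_buffer_desc.dwSize        = sizeof(DSBUFFERDESC);
    ds_buffer_desc.dwFlags       = DSBCAPS_CTRLVOLUME;
    ds_buffer_desc.dwBufferBytes = wave_format.nAvgBytesPerSec * 2;  // 2 seconds
    ds_buffer_desc.lpwfxFormat   = &wave_format;

    // create the first-version (IDirectSoundBuffer) object
    if(FAILED(g_ds->CreateSoundBuffer(&ds_buffer_desc, &ds, NULL)))
    {
        // an error occurred
        MessageBox(NULL, "Unable to create sound buffer", "Error", MB_OK);
    }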
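The buffer size above is two seconds' worth of data. For other durations it helps to round the size to a whole number of sample blocks, since data must always start on a block boundary. The sketch below is one way to do that; the function name `buffer_bytes_for_duration` is made up for this example, and it assumes the result fits in 32 bits.

```cpp
#include <cstdint>

// Number of bytes needed to hold `milliseconds` of PCM audio, rounded down
// to a whole number of blocks so the buffer never ends in mid-sample.
// avg_bytes_per_sec and block_align correspond to the WAVEFORMATEX members
// nAvgBytesPerSec and nBlockAlign.
std::uint32_t buffer_bytes_for_duration(std::uint32_t avg_bytes_per_sec,
                                        std::uint32_t block_align,
                                        std::uint32_t milliseconds)
{
    if (block_align == 0) return 0;
    const std::uint64_t raw =
        static_cast<std::uint64_t>(avg_bytes_per_sec) * milliseconds / 1000;
    return static_cast<std::uint32_t>(raw - raw % block_align);
}
```

With 22050 bytes per second and a block size of 2, a duration of 2000 ms yields 44100 bytes, the same value the snippet above computes as nAvgBytesPerSec * 2.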

Setting the Format

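There are many formats to choose from, but the recommended choices are 11025 Hz, 16-bit, mono or 22050 Hz, 16-bit, mono. When picking a format, avoid stereo: it costs extra processing time and its benefit is hard to judge. Likewise, avoid any sample resolution other than 16 bits, because that degrades sound quality considerably. For the sample rate, higher is better, but there is little reason to go above 22050 Hz; at that rate the result is still close to CD quality without much loss.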

The playback format is set by calling IDirectSoundBuffer::SetFormat.

The SetFormat method sets the format of the primary buffer. Whenever this application has the input focus, DirectSound will set the primary buffer to the specified format.

HRESULT SetFormat(
 LPCWAVEFORMATEX pcfxFormat
);

Parameters

pcfxFormat: Address of a WAVEFORMATEX structure that describes the new format for the primary sound buffer.

Return Values

If the method succeeds, the return value is DS_OK. If the method fails, the return value may be one of the following error values:

Return code
DSERR_BADFORMAT
DSERR_INVALIDCALL
DSERR_INVALIDPARAM
DSERR_OUTOFMEMORY
DSERR_PRIOLEVELNEEDED
DSERR_UNSUPPORTED

Remarks

The format of the primary buffer should be set before secondary buffers are created.

The method fails if the application has the DSSCL_NORMAL cooperative level.

If the application is using DirectSound at the DSSCL_WRITEPRIMARY cooperative level, and the format is not supported, the method fails.

If the cooperative level is DSSCL_PRIORITY, DirectSound stops the primary buffer, changes the format, and restarts the buffer. The method succeeds even if the hardware does not support the requested format; DirectSound sets the buffer to the closest supported format. To determine whether this has happened, an application can call the GetFormat method for the primary buffer and compare the result with the format that was requested with the SetFormat method.

This method is not available for secondary sound buffers. If a new format is required, the application must create a new DirectSoundBuffer object.

The function's only parameter is a pointer to a WAVEFORMATEX structure that holds the format to set.

The WAVEFORMATEX structure defines the format of waveform-audio data. Only format information common to all waveform-audio data formats is included in this structure. For formats that require additional information, this structure is included as the first member in another structure, along with the additional information.

This structure is part of the Platform SDK and is not declared in Dsound.h. It is documented here for convenience.

typedef struct WAVEFORMATEX {
 WORD wFormatTag;
 WORD nChannels;
 DWORD nSamplesPerSec;
 DWORD nAvgBytesPerSec;
 WORD nBlockAlign;
 WORD wBitsPerSample;
 WORD cbSize;
} WAVEFORMATEX;

Members

wFormatTag: Waveform-audio format type. Format tags are registered with Microsoft Corporation for many compression algorithms. A complete list of format tags can be found in the Mmreg.h header file. For one- or two-channel PCM data, this value should be WAVE_FORMAT_PCM.
nChannels: Number of channels in the waveform-audio data. Monaural data uses one channel and stereo data uses two channels.
nSamplesPerSec: Sample rate, in samples per second (hertz). If wFormatTag is WAVE_FORMAT_PCM, then common values for nSamplesPerSec are 8.0 kHz, 11.025 kHz, 22.05 kHz, and 44.1 kHz. For non-PCM formats, this member must be computed according to the manufacturer's specification of the format tag.
nAvgBytesPerSec: Required average data-transfer rate, in bytes per second, for the format tag. If wFormatTag is WAVE_FORMAT_PCM, nAvgBytesPerSec should be equal to the product of nSamplesPerSec and nBlockAlign. For non-PCM formats, this member must be computed according to the manufacturer's specification of the format tag.
nBlockAlign: Block alignment, in bytes. The block alignment is the minimum atomic unit of data for the wFormatTag format type. If wFormatTag is WAVE_FORMAT_PCM or WAVE_FORMAT_EXTENSIBLE, nBlockAlign must be equal to the product of nChannels and wBitsPerSample divided by 8 (bits per byte). For non-PCM formats, this member must be computed according to the manufacturer's specification of the format tag.
Software must process a multiple of nBlockAlign bytes of data at a time. Data written to and read from a device must always start at the beginning of a block. For example, it is illegal to start playback of PCM data in the middle of a sample (that is, on a non-block-aligned boundary).
wBitsPerSample: Bits per sample for the wFormatTag format type. If wFormatTag is WAVE_FORMAT_PCM, then wBitsPerSample should be equal to 8 or 16. For non-PCM formats, this member must be set according to the manufacturer's specification of the format tag. If wFormatTag is WAVE_FORMAT_EXTENSIBLE, this value can be any integer multiple of 8. Some compression schemes cannot define a value for wBitsPerSample, so this member can be zero.
cbSize: Size, in bytes, of extra format information appended to the end of the WAVEFORMATEX structure. This information can be used by non-PCM formats to store extra attributes for the wFormatTag. If no extra information is required by the wFormatTag, this member must be set to zero. For WAVE_FORMAT_PCM formats (and only WAVE_FORMAT_PCM formats), this member is ignored.

The following code sets the audio format to 11025 Hz, mono, 16-bit.

    // setup the WAVEFORMATEX structure
    WAVEFORMATEX wave_format;

    ZeroMemory(&wave_format, sizeof(WAVEFORMATEX));

    wave_format.wFormatTag      = WAVE_FORMAT_PCM;
    wave_format.nChannels       = 1;        // mono
    wave_format.nSamplesPerSec  = 11025;
    wave_format.wBitsPerSample  = 16;
    wave_format.nBlockAlign     = (wave_format.wBitsPerSample / 8) * wave_format.nChannels;
    wave_format.nAvgBytesPerSec = wave_format.nSamplesPerSec * wave_format.nBlockAlign;
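The derived members nBlockAlign and nAvgBytesPerSec are easy to get wrong when a format is filled in by hand, so a small consistency check can be useful. The sketch below uses a stand-in struct, `PcmFormat`, that mirrors only the PCM-relevant WAVEFORMATEX members so it builds without the Windows headers; `PcmFormat`, `make_pcm_format`, and `is_consistent_pcm` are hypothetical names for this example.

```cpp
#include <cstdint>

// Stand-in for the PCM-relevant fields of WAVEFORMATEX so the sketch
// builds without the Windows headers; the field names mirror the real ones.
struct PcmFormat {
    std::uint16_t nChannels;
    std::uint32_t nSamplesPerSec;
    std::uint32_t nAvgBytesPerSec;
    std::uint16_t nBlockAlign;
    std::uint16_t wBitsPerSample;
};

// Builds a format with the derived fields filled in from the three inputs.
PcmFormat make_pcm_format(std::uint32_t sample_rate,
                          std::uint16_t channels,
                          std::uint16_t bits_per_sample)
{
    PcmFormat f{};
    f.nChannels       = channels;
    f.nSamplesPerSec  = sample_rate;
    f.wBitsPerSample  = bits_per_sample;
    f.nBlockAlign     = static_cast<std::uint16_t>(channels * bits_per_sample / 8);
    f.nAvgBytesPerSec = sample_rate * f.nBlockAlign;
    return f;
}

// True when the derived fields follow the PCM rules:
// nBlockAlign = nChannels * wBitsPerSample / 8 and
// nAvgBytesPerSec = nSamplesPerSec * nBlockAlign.
bool is_consistent_pcm(const PcmFormat& f)
{
    if (f.wBitsPerSample != 8 && f.wBitsPerSample != 16) return false;
    if (f.nChannels == 0) return false;
    return f.nBlockAlign == f.nChannels * f.wBitsPerSample / 8
        && f.nAvgBytesPerSec == f.nSamplesPerSec * f.nBlockAlign;
}
```

For 11025 Hz, mono, 16-bit, `make_pcm_format` produces a block size of 2 bytes and 22050 bytes per second, matching the values the snippet above computes.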

Next part: Playing Sound and Music with DirectX Audio and DirectShow (2)

Posted on 2007-07-26 17:51 by lovedday

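# re: Playing Sound and Music with DirectX Audio and DirectShow (1) [not logged in] 2009-09-08 20:04 ln

Great article, I learned a lot from it. I need to build something audio-related soon but can't find much material online. Could you recommend a website or something similar? I also searched for books on this for a long time without finding any, so a recommendation would be much appreciated. Thanks.

# re: Playing Sound and Music with DirectX Audio and DirectShow (1) 2015-03-24 17:24 王小亮

Very inspiring. Learned a lot.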
