concentrate on c/c++ related technology

plan,refactor,daily-build, self-discipline,

shader programming notes

samplers: a window into video memory with associated state defining things like filtering and texture coordinate addressing mode.

in DXSDK version 8.0 or earlier, the application can pass human-readable assembly code to the D3DX library via D3DXAssembleShader and get back a binary representation of the shader, which can be passed to Direct3D via CreateVertexShader/CreatePixelShader.

in DXSDK version 9, the application can pass an HLSL shader to D3DX via the D3DXCompileShader API and get back a binary representation of the compiled shader, which in turn is passed to Direct3D via CreateVertexShader/CreatePixelShader.

the binary asm code generated is a function only of the compile target chosen, not of the specific graphics device in the user's or developer's system.

as a result, it is mostly independent of the platform.

the Direct3D runtime itself doesn't know anything about HLSL, only the binary assembly shader models. the HLSL compiler can therefore be updated independently of the Direct3D runtime.

an application calls D3DX to compile an HLSL shader to binary asm code via D3DXCompileShader.

one parameter of D3DXCompileShader defines the compile target (assembly language model) that the HLSL compiler uses.

if an application compiles HLSL shaders at runtime (rather than offline), it can examine the capabilities of the Direct3D device and select a compile target to match. if the algorithm in the HLSL shader is too complex to execute on the selected compile target, the compilation will fail.
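
as a rough illustration of matching the compile target to the device, here is a minimal C++ sketch that runs without the SDK. the helper name pickPixelShaderTarget and its version-pair parameters are hypothetical, not part of D3DX; a real application would read the version from the device caps.

```cpp
#include <string>

// Hypothetical helper (not a D3DX function): map the pixel shader version
// a device reports (major.minor) to the name of an HLSL compile target.
// Returns an empty string when the device reports no pixel shader support.
std::string pickPixelShaderTarget(int major, int minor) {
    if (major >= 3) return "ps_3_0";
    if (major == 2) return "ps_2_0";
    if (major == 1 && minor >= 4) return "ps_1_4";
    if (major == 1 && minor >= 1) return "ps_1_1";
    return "";
}
```

the returned string is what would be handed to the compiler as the target. D3DX also provides D3DXGetPixelShaderProfile, which may make a hand-written mapping like this unnecessary.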

failure of a given HLSL shader to compile for a particular compile target is an indication that the shader is too complex for the compile target.

another common source of compilation failure is exceeding the maximum instruction count of the chosen compile target.

an algorithm expressed in HLSL may require too many instructions to be executed by the compile target.

a convenient utility that lets developers compile shaders offline is the fxc command-line compiler, which is provided in the DirectX 9.0 SDK.

this utility has a number of convenient options that you can use not only to compile your shaders on the command line but also to generate disassembled code for the specific compile target.

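scalar types

half: 16-bit floating-point value.

int: not all compile targets support integers natively, so they may be emulated with floating-point hardware. integer operations that go outside the range of integers that can be expressed as floats on these platforms are not guaranteed to function as expected.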

vector type

a vector type can be declared in two ways:

vector: with no arguments, the default is a vector of four floats.

vector<type, size>: a vector of the given component type and dimension.

once such a vector is defined, you may access its individual components by using array access syntax or a swizzle.

in the swizzle case, all components must come from a single name space, either {x,y,z,w} or {r,g,b,a}; the two cannot be mixed.

ps_2_0 and lower pixel shader targets do not have native support for arbitrary swizzles.

concise high-level code which uses swizzles can therefore result in fairly nasty binary asm when compiling to these targets.
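
to make the swizzle rules concrete, here is a small CPU-side C++ model of a four-component swizzle. it is only a sketch: Float4, sameNameSpace, componentIndex, and swizzle are illustrative names, and real HLSL swizzles may also have fewer than four letters.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>
#include <string>

using Float4 = std::array<float, 4>;

// True when every letter is from "xyzw", or every letter is from "rgba".
bool sameNameSpace(const std::string& pattern) {
    const bool allXyzw = pattern.find_first_not_of("xyzw") == std::string::npos;
    const bool allRgba = pattern.find_first_not_of("rgba") == std::string::npos;
    return allXyzw || allRgba;
}

// Map one swizzle letter to a component index: x/r = 0, y/g = 1, z/b = 2, w/a = 3.
int componentIndex(char letter) {
    const std::string xyzw = "xyzw";
    const std::string rgba = "rgba";
    std::string::size_type pos = xyzw.find(letter);
    if (pos == std::string::npos) pos = rgba.find(letter);
    if (pos == std::string::npos) throw std::invalid_argument("bad swizzle letter");
    return static_cast<int>(pos);
}

// Apply a four-letter swizzle such as "wzyx" or "bgra" to v.
Float4 swizzle(const Float4& v, const std::string& pattern) {
    if (pattern.size() != 4 || !sameNameSpace(pattern)) {
        throw std::invalid_argument("swizzle must be 4 letters from one name space");
    }
    Float4 result{};
    for (std::size_t i = 0; i < 4; ++i) {
        result[i] = v[componentIndex(pattern[i])];
    }
    return result;
}
```

swizzle(v, "wzyx") reverses the components, and a mixed pattern such as "xgzw" is rejected, mirroring the rule that the two name spaces cannot be combined.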

matrix type

a matrix can also be accessed with array notation.

type modifiers

the row_major and column_major type modifiers can be used to specify the expected layout of a matrix within the hardware constant store.

row_major indicates that each row of the matrix will be stored in a single constant register.

column_major indicates that each column of the matrix will be stored in a single constant register.
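
the two layouts can be modeled on the CPU as follows. this is a sketch for a 4x4 matrix only; Register, Matrix4, RegisterBlock, and the two pack functions are illustrative names rather than anything from the SDK.

```cpp
#include <array>

using Register = std::array<float, 4>;                 // one constant register
using Matrix4 = std::array<std::array<float, 4>, 4>;   // indexed as m[row][col]
using RegisterBlock = std::array<Register, 4>;         // four consecutive registers

// row_major: register i holds row i of the matrix.
RegisterBlock packRowMajor(const Matrix4& m) {
    RegisterBlock regs{};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            regs[r][c] = m[r][c];
    return regs;
}

// column_major: register j holds column j of the matrix.
RegisterBlock packColumnMajor(const Matrix4& m) {
    RegisterBlock regs{};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            regs[c][r] = m[r][c];
    return regs;
}
```

for the same matrix the two functions produce transposed register contents, so the layout the application uploads must match the layout the shader declares.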

storage class modifiers

at global scope, the static storage class modifier indicates that the variable is accessed only by the shader, not by the application via the API. at local scope, static means that the variable's value persists between invocations of the function in which it is declared.

the extern modifier can be used on a global variable to indicate that it can be modified from outside the shader via the API.

the shared modifier specifies that a given global variable is to be shared between effects.

the uniform modifier indicates that a variable is expected to have been set externally to the HLSL shader.

extern float translucency;

const float gloss_bias;

static float gloss_scale;

float diffuse;

diffuse and translucency: can be set via Set*ShaderConstant and can be modified by the shader itself.

gloss_bias: can be set via Set*ShaderConstant but cannot be modified by the shader code.

gloss_scale: cannot be set via the Set*ShaderConstant API and can be modified only within the shader code.

constructor

constructors are commonly used when a shader writer wants to temporarily define a quantity with literal values.

or when a shader writer wants to explicitly pack smaller datatypes together.

type casting

type casting often happens in order to promote or demote a given variable to match a variable to which it is assigned.

type of cast: casting behavior

scalar-to-scalar: always valid.

scalar-to-vector: always valid; the scalar is replicated to fill the vector.

scalar-to-matrix: always valid; the scalar is replicated to fill the matrix.

scalar-to-structure: always valid; the scalar is replicated to fill the structure.

vector-to-scalar: always valid; the first component of the vector is selected.

vector-to-vector: the destination vector must not be larger than the source vector.

vector-to-matrix: the size of the vector must equal the size of the matrix.

vector-to-structure: valid if the structure is not larger than the vector and all components of the structure are numeric.

matrix-to-scalar: always valid; the upper-left component of the matrix is selected.

matrix-to-vector: the size of the matrix must equal the size of the vector.

matrix-to-matrix: the destination matrix must not be larger than the source matrix.

matrix-to-structure: valid if the structure is not larger than the matrix and all components of the structure are numeric.

structure-to-scalar: the structure must contain at least one member.

structure-to-vector: the structure must be at least the size of the vector, and its first components, up to the size of the vector, must be numeric.

structure-to-matrix: the structure must be at least the size of the matrix, and its first components, up to the size of the matrix, must be numeric.

structure-to-object: the structure must contain at least one member, and the type of this member must be identical to the type of the object.

structure-to-structure: the destination structure must not be larger than the source structure.
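
a few of the scalar and vector rules above can be modeled in plain C++. this is a sketch with illustrative names, and it assumes that a vector cast to a smaller vector keeps the left-most components, which the table itself does not spell out.

```cpp
#include <array>

using Float3 = std::array<float, 3>;
using Float4 = std::array<float, 4>;

// scalar-to-vector: replicate the scalar into every component.
Float4 scalarToVector(float s) {
    return {s, s, s, s};
}

// vector-to-scalar: select the first component.
float vectorToScalar(const Float4& v) {
    return v[0];
}

// vector-to-vector with a smaller destination: keep the left-most
// components and drop the rest (assumed truncation behavior).
Float3 vectorToSmallerVector(const Float4& v) {
    return {v[0], v[1], v[2]};
}
```

casting in the other direction, from a three-component vector to a four-component one, has no counterpart here because the destination must not be larger than the source.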

samplers: for each different texture map that you plan to sample in a pixel shader, you must declare a sampler.

an HLSL sampler has a very direct mapping to the API concept of a sampler and, in turn, to the actual silicon in the graphics processor which is responsible for addressing and filtering textures. a sampler must be defined for every texture map that you plan to access in a given shader, but you may use a given sampler multiple times in a shader.

Texture sampling intrinsics

there are four types of textures (1D, 2D, 3D, and cube map) and four types of load (regular, with derivatives, projective, and biased), giving 16 combinations.

the basic intrinsics are tex1D, tex2D, tex3D, and texCUBE.

the texture loading intrinsics which take ddx and ddy parameters compute the texture LOD using these explicit derivatives, which would typically have been previously calculated with the ddx and ddy intrinsics.

tex*proj intrinsics are used to do projective texture reads, where the texture coordinates used to sample the texture are divided by the last component prior to accessing the texture.
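
the divide that a projective read performs can be modeled on the CPU like this. it is a sketch for the 2D case only, with illustrative names; the texture lookup itself is left out.

```cpp
#include <array>

using Float2 = std::array<float, 2>;
using Float4 = std::array<float, 4>;

// Divide the coordinates by the last component (w), which is the step a
// projective read applies before the 2D texture is accessed.
Float2 projectTexCoord(const Float4& t) {
    return {t[0] / t[3], t[1] / t[3]};
}
```

for example, the coordinate (2, 4, 0, 2) would sample the texture at (1, 2).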

tex*bias intrinsics are used to perform biased texture sampling, where the bias can be computed per pixel.

shader input

vertex and pixel shaders have two types of input data: varying and uniform.

the varying input is the data that is unique to each execution of the shader. for a vertex shader, the varying data comes from the vertex streams.

the uniform data is constant for multiple executions of a shader.

uniform input

uniform data can be specified by two methods in HLSL.

first: declare global variables and use them within vertex shaders and pixel shaders.

any use of a global variable within a shader will result in the addition of the variable to the list of uniform variables required by the shader.

second: mark an input parameter of the top-level shader function as uniform.

this marking specifies that the given variable should be added to the list of uniform variables used by the shader.

uniform variables used by the shaders are communicated back to the application via the constant table.

the constant table is a symbol table which defines how the uniform variables used by the shader must be loaded into the constant registers prior to shader execution.

the constant table contains the constant register location of all uniform variables used by the shaders.

the table also includes the type information and the default value.

the constant table generated by the compiler is stored in a compact binary form.
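
the role of the constant table can be pictured with a small C++ model. this is only a sketch: ConstantEntry, ConstantTable, and setConstant are hypothetical names, the register file is a plain vector, and the real table (exposed by D3DX as ID3DXConstantTable) also carries type information and default values.

```cpp
#include <array>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Register = std::array<float, 4>;

// Where one uniform variable lives in the constant register file.
struct ConstantEntry {
    std::size_t registerIndex;   // first register used by the variable
    std::size_t registerCount;   // number of consecutive registers
};

// Symbol table: uniform variable name -> register location.
struct ConstantTable {
    std::map<std::string, ConstantEntry> entries;
};

// Copy 'value' into the registers the table assigns to 'name'.
// Returns false if the name is unknown or the sizes do not match.
bool setConstant(const ConstantTable& table, std::vector<Register>& registers,
                 const std::string& name, const std::vector<Register>& value) {
    const auto it = table.entries.find(name);
    if (it == table.entries.end()) return false;
    const ConstantEntry& entry = it->second;
    if (value.size() != entry.registerCount) return false;
    if (entry.registerIndex + entry.registerCount > registers.size()) return false;
    for (std::size_t i = 0; i < entry.registerCount; ++i) {
        registers[entry.registerIndex + i] = value[i];
    }
    return true;
}
```

the application looks a variable up by name and never needs to hard-code register numbers, which is why the shader can be recompiled without changing the calling code.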

varying input

varying data is specified by marking the input parameters of the top-level shader function with an input semantic.

all top-level shader inputs must either be marked as varying by using semantics or marked with the keyword "uniform", indicating that the value is constant for the execution of the shader.

if a top-level shader input is not marked with a semantic or the uniform keyword, then the shader will fail to compile.

the input semantic is a name used to link the given shader to an output of the previous stage of the graphics pipeline.

pixel and vertex shaders have different sets of input semantics due to different parts of the graphics pipeline that feed each shader unit.

vertex shader input semantics describe the per vertex information to be loaded from a vertex buffer into a form that can be consumed by the vertex shader.

these input semantics map directly to the combination of the D3DDECLUSAGE enum and UsageIndex that is used to describe vertex data elements in a vertex buffer.

pixel shader input semantics describe the information that is provided per pixel by the rasterization unit. the data is interpolated between the outputs of the vertex shader for each vertex of the primitive.

the basic pixel shader input semantics link the color and texture coordinate information to input parameters.

input semantics can be assigned to shader inputs by two methods. first, append a colon ":" and the input semantic name to the input parameter declaration. second, define an input structure with an input semantic assigned to each element of the structure.

shader output

vertex and pixel shaders provide output data to the subsequent graphics pipeline stage.

output semantics specify how data generated by the shader should be linked to the inputs of the next stage.

pixel shader outputs are the values provided to the alpha blending unit for each of the render targets, or the depth value to be written to the depth buffer.

vertex shader output semantics are used to link the shader both to the pixel shader and the rasterization stage.

POSITION is the one output required from every vertex shader; it is consumed by the rasterizer and not exposed to the pixel shader.

TEXCOORDn and COLORn denote outputs that are made available to the pixel shader after interpolation.

pixel shader output semantics bind the output colors of a pixel shader with the correct render target.

the colors output from the pixel shader are linked to the alpha blend stage, which determines how the destination render target will be modified.

in the disassembled vertex shader, dcl_position, dcl_texcoord, and dcl_normal declare the position, texture coordinate, and normal inputs.

the def instruction is a free instruction that appears in the actual assembly instruction stream and defines constants that will be used by the ALU operations.

tex1D: a way for the HLSL shader writer to indicate to the compiler that only one component of the texture coordinate needs to be populated, shaving off an assembly instruction in some cases.

the compiler is smart about removing dead code, but it cannot know about values that ultimately do not matter due to the circumstances of a given shader.

1 try to use vector and matrix types, and use ints instead of floats where integer values are intended.

newer hardware supports static branching, predicated instructions, static looping, dynamic branching, and dynamic looping.

loops are a form of flow control that occurs often in shaders. some hardware allows either static or dynamic branching.

2 static branching is a capability in a shader that allows blocks of code to be switched on or off based on a boolean constant.

most hardware supports only static branching.

3 input parameters.

the method by which data is loaded into registers, either from a vertex buffer into a vertex shader or from the vertex shader outputs into the pixel shader input registers, is well defined in the Direct3D spec. shader input values are always expanded into a vector of four floats.

this means that the datatype declaration is more of a hint than a specification of how the data is loaded into the shader.

a common optimization used by shader assembly writers is to take advantage of the way in which data is expanded when loaded into registers.

it is very common to need the w component set to 1.0 when multiplying by the world matrix, but the vertex buffer typically contains only the x, y, and z components. if the position input parameter is declared as float3, an extra instruction is required to copy 1.0 into the w component; declaring it as float4 lets the automatic expansion supply w = 1.0 instead.
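
the expansion that makes this trick work can be modeled on the CPU. this is a sketch with an illustrative function name, assuming the Direct3D 9 defaults of 0 for missing y and z and 1 for a missing w.

```cpp
#include <array>
#include <cstddef>

using Float4 = std::array<float, 4>;

// Expand 1 to 4 floats read from a vertex stream into a four-float register.
// Components absent from the stream keep the defaults (0, 0, 0, 1).
Float4 expandToFloat4(const float* data, std::size_t count) {
    Float4 reg = {0.0f, 0.0f, 0.0f, 1.0f};
    for (std::size_t i = 0; i < count && i < 4; ++i) {
        reg[i] = data[i];
    }
    return reg;
}
```

a three-float position therefore arrives in the register with w already equal to 1.0, so a shader input declared with four components needs no extra copy.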

another optimization is to make sure to declare all input parameters with the appropriate type for their usage in the shader.

for example, when a value is used as an integer, it is important to declare the parameter as an int to avoid truncation instructions.

precision issue(logp,expp,lit)

logp, expp, and lit can be used as low-precision versions of log, exp, and pow.

log and exp count as 10 instruction slots each, while logp and expp count as only one each. with the full-precision versions it is possible to balloon the size of the generated asm code and potentially run out of instructions, particularly on the vs_1_1 compile target.
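
the slot arithmetic can be sketched in C++ as below. the function names are illustrative, the per-call costs are the ones quoted above, and the slot limit is passed in as a parameter rather than hard-coded, since it depends on the compile target (128 is assumed here for vs_1_1).

```cpp
// Slots consumed by 'callCount' log/exp calls: 10 each at full precision,
// 1 each for the low-precision logp/expp forms.
int slotsUsed(int callCount, bool lowPrecision) {
    return callCount * (lowPrecision ? 1 : 10);
}

// True when the rest of the shader plus the log/exp calls fit the limit.
bool fitsInBudget(int otherSlots, int callCount, bool lowPrecision, int maxSlots) {
    return otherSlots + slotsUsed(callCount, lowPrecision) <= maxSlots;
}
```

with 40 other instructions and 10 calls, full precision needs 140 slots and overflows an assumed 128-slot limit, while the low-precision forms need only 50.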

these low-precision instructions are accessed by casting the output to, or storing it into, the low-precision data type called "half".

a low-precision output from an operation informs the compiler that the operation should be performed with the lowest precision possible.

using ps_1_x compile target.

posted on 2008-11-27 19:53 jolley
