SIMD Technical Introduction and SIMD Development Guide

Introduction to SIMD

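SIMD stands for Single Instruction, Multiple Data. A SIMD instruction set packs several operands into one wide register and applies the same instruction to all of them in lockstep, at the same time.
That is the essence of it. Beginners may find this hard to grasp, but anyone with a solid understanding of how computers represent data will find it natural. Computer data is built from bits, yet arithmetic rarely works on a single bit; one operation normally processes many bits at once. Extend "many bits" to "many operands" and you have the core idea of SIMD.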
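To make the "many operands at once" idea concrete, here is a minimal sketch using the SSE intrinsics from `<xmmintrin.h>`. The function name `addFour` is made up for illustration; the point is only that one `_mm_add_ps` call performs four float additions.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, ...

// Hypothetical helper: adds four pairs of floats with a single SIMD add.
void addFour(const float* a, const float* b, float* out) {
    __m128 va = _mm_loadu_ps(a);       // load a[0..3] into one 128-bit register
    __m128 vb = _mm_loadu_ps(b);       // load b[0..3] into another
    __m128 sum = _mm_add_ps(va, vb);   // four additions in one instruction
    _mm_storeu_ps(out, sum);           // write the four results back to memory
}
```

A scalar version would need a loop of four separate additions; here the loop disappears into the register width.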
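The hard part of using SIMD is preparing the multiple operands and calling the special intrinsic functions. For programmers used to C and C++ it feels like suddenly having to write assembly: the style changes abruptly and takes some getting used to. There are, however, many object-oriented wrappers for SIMD, notably in articles on www.codeproject.com, for example: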
Basics of Single Instruction Multiple Data (SIMD) – CodeProject
Easy SIMD through Wrappers – CodeProject
Fast SIMD Prototyping – CodeProject
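If you would rather not write at the low level, make sure you understand the underlying idea and then write SIMD code with wrapper classes that others have built. This can cost some efficiency, so it suits only some application-level development. In driver-level and algorithm-level code an object-oriented wrapper is often not an option, so even after understanding someone else's wrapper you still need to master raw SIMD operations. Many people call SIMD "dirty work", with good reason: its interface is unusual and hard to get used to.
锐英源 offers professional SIMD technical guidance. It understands the English-language codeproject.com articles listed above in depth, including the authors' ideas and the details of their code. The English text below is the SIMD assignment of a student studying abroad, which 锐英源 translated; course materials and teaching abroad are known for their rigour, and 锐英源 completed the task well. 锐英源's SIMD expertise can be relied on, and collaboration is welcome.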

The Assignment Task

You are to modify the (square-based) multithreaded implementation of a "simple" raytracer from the first assignment to take advantage of SIMD instructions. As time is tight for this assignment you only need to SIMD-ify the sphere/ray intersection code, but this requires a fairly sizeable amount of changes across multiple files.
From the provided (square-based) multithreaded raytracer implementation, you will create multiple subsequent versions that modify the sphere implementation as follows:
1. Replaced array-of-structures (AoS) with structures-of-arrays (SoA) for the sphere container.
2. Rewrote the sphere/ray intersection tests in a "where there is one there is many" approach.
3. Optimised the sphere/ray intersection test to take advantage of SIMD.

Implementation

1. Structures of Arrays

This stage involves removing the Sphere struct (in SceneObjects.h) and modifying the Scene struct (Scene.h) to store sphere data via structure of arrays rather than the currently existing array of structures (currently in the sphereContainer variable).
In order to complete this step you will need to:

Remove the Sphere struct (doing this immediately causes a large number of errors, so you might like to do this only after making other changes).
Replace the definition of sphereContainer (in Scene struct) with an equivalent structure of arrays.
Rewrite the sphere data loading into the Scene struct (dynamically allocated memory for the AoS and the GetSphere function).
Update the Intersection struct as it can no longer use a pointer to a Sphere struct, it must rely on an index into the SoA container. This requires updates to a number of functions in Intersection.h.
Rewrite the isSphereIntersected function to make use of the SoA.
At times this code update may require conversions to and/or from Vector structs to the equivalently stored data in the SoA.
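The difference between the two layouts can be sketched as follows. The field names, the use of `std::vector`, and the helper `toSoA` are assumptions made for illustration; the real Sphere struct in SceneObjects.h likely carries more data (material, for instance) and the assignment code uses its own allocation scheme.

```cpp
#include <cstddef>
#include <vector>

// Array of structures: all fields of one sphere sit together in memory.
struct SphereAoS {          // hypothetical stand-in for the Sphere struct
    float x, y, z;          // centre
    float radius;
};

// Structure of arrays: each field gets its own contiguous array, so four
// consecutive x values can be loaded into one SIMD register directly.
struct SphereSoA {          // hypothetical replacement for sphereContainer
    std::vector<float> x, y, z;
    std::vector<float> radius;
    std::size_t count() const { return x.size(); }
};

// Converts an AoS container into the SoA layout, preserving sphere order
// so that an index identifies the same sphere in both layouts.
SphereSoA toSoA(const std::vector<SphereAoS>& spheres) {
    SphereSoA soa;
    for (const SphereAoS& s : spheres) {
        soa.x.push_back(s.x);
        soa.y.push_back(s.y);
        soa.z.push_back(s.z);
        soa.radius.push_back(s.radius);
    }
    return soa;
}
```

Because order is preserved, the Intersection struct can store a plain integer index where it used to store a Sphere pointer.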

At the end of this stage the program should still produce the same results as the base code. Be thorough in your testing (i.e. test all the scenes) to ensure that everything works correctly before progressing.

2. Where there is One there is Many

This stage involves replacing the isSphereIntersected function with two new versions that use the "where there is one there is many" paradigm.
In order to complete this step you will need to:

Write two functions that take the entire spheres container (or at least a pointer/reference) as an argument and will perform whatever action took place at the calling site. There are two versions as the isInShadow function (in Lighting.cpp) uses a short-circuited approach, exiting as soon as any object collision is discovered, and the objectIntersection function (in Intersection.cpp) finds the closest sphere intersected with (which means it must examine them all).

A version of isSphereIntersected that returns a boolean depending on whether or not a sphere intersects with the ray (i.e. stopping as soon as one is found).
A version of isSphereIntersected that returns void and updates its t parameter to be the closest sphere that intersects with the ray.
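The two versions might look roughly like this in scalar form. Everything here is a sketch under assumptions: the `Ray` and `SphereSoA` types, the function names, the extra `index` output and the 1e-4 epsilon are invented for illustration, and the intersection maths is the textbook quadratic rather than the assignment's actual code. The real isInShadow also has to consider the distance to the light, which is left out.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Ray { float ox, oy, oz, dx, dy, dz; };               // hypothetical: origin + direction
struct SphereSoA { std::vector<float> x, y, z, radius; };   // hypothetical SoA container

// Nearest positive hit distance along the ray for sphere i, or -1 on a miss.
static float hitDistance(const SphereSoA& s, std::size_t i, const Ray& r) {
    const float lx = r.ox - s.x[i], ly = r.oy - s.y[i], lz = r.oz - s.z[i];
    const float a = r.dx * r.dx + r.dy * r.dy + r.dz * r.dz;
    const float b = 2.0f * (r.dx * lx + r.dy * ly + r.dz * lz);
    const float c = lx * lx + ly * ly + lz * lz - s.radius[i] * s.radius[i];
    const float disc = b * b - 4.0f * a * c;
    if (disc < 0.0f) return -1.0f;                 // the ray misses the sphere
    const float root = std::sqrt(disc);
    const float t0 = (-b - root) / (2.0f * a);     // nearer root
    if (t0 > 1e-4f) return t0;
    const float t1 = (-b + root) / (2.0f * a);     // farther root (ray starts inside)
    return t1 > 1e-4f ? t1 : -1.0f;
}

// Any-hit version: returns as soon as one sphere is hit (shadow-ray style).
bool anySphereIntersected(const SphereSoA& s, const Ray& r) {
    for (std::size_t i = 0; i < s.x.size(); ++i)
        if (hitDistance(s, i, r) > 0.0f) return true;
    return false;
}

// Closest-hit version: examines every sphere and only ever lowers t.
void closestSphereIntersected(const SphereSoA& s, const Ray& r, float& t, int& index) {
    for (std::size_t i = 0; i < s.x.size(); ++i) {
        const float d = hitDistance(s, i, r);
        if (d > 0.0f && d < t) { t = d; index = static_cast<int>(i); }
    }
}
```

The caller seeds `t` with the current best distance (or a very large value) and `index` with a "no sphere" marker such as -1.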

3. SIMD Conversion of isSphereIntersected function

This stage involves converting the latter of the two new versions of the isSphereIntersected function from Stage 2 (the void returning one) into a SIMD implementation.
In order to complete this step you will need to:

Convert many of the input parameters (or parts of them) into SIMD-ready values (e.g. the ray starting location and direction need to be converted into a SIMD format).
Step through the SoA in chunks proportional to the number of values stored in each SIMD variable (i.e. 4 or 8).
Convert the calculation to use SIMD. Care must be taken to correctly deal with the various conditional expressions in the loop. There are three if-statements that must be converted to SIMD code via the use of appropriate masking statements.
Consolidate the values calculated using SIMD into a single scalar return value. This should be done after the loop finishes.
Carefully deal with the situation where the sphere count isn't equally divisible by the number in each SIMD calculation (e.g. 9 spheres with 4 in each SIMD value).
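Put together, the steps above could be sketched with 4-wide SSE as below. This is one possible shape, not the assignment's solution: `Ray`, `SphereSoA`, `blend`, `dot3` and `closestSphereSimd` are invented names, the three masked conditions are a guess at what a typical sphere test contains, sphere indices are carried as floats (exact only up to 2^24 spheres), and a non-zero ray direction is assumed. The leftover spheres are handled by copying them into zero-padded arrays and masking out the padding lanes.

```cpp
#include <xmmintrin.h>  // SSE intrinsics
#include <cstddef>
#include <vector>

struct Ray { float ox, oy, oz, dx, dy, dz; };               // hypothetical ray type
struct SphereSoA { std::vector<float> x, y, z, radius; };   // hypothetical SoA container

// Lane-wise select: takes a where the mask is set, b elsewhere.
static inline __m128 blend(__m128 mask, __m128 a, __m128 b) {
    return _mm_or_ps(_mm_and_ps(mask, a), _mm_andnot_ps(mask, b));
}

// Lane-wise 3D dot product of (ax, ay, az) and (bx, by, bz).
static inline __m128 dot3(__m128 ax, __m128 ay, __m128 az,
                          __m128 bx, __m128 by, __m128 bz) {
    return _mm_add_ps(_mm_add_ps(_mm_mul_ps(ax, bx), _mm_mul_ps(ay, by)),
                      _mm_mul_ps(az, bz));
}

void closestSphereSimd(const SphereSoA& s, const Ray& r, float& t, int& index) {
    const std::size_t n = s.x.size();
    // Broadcast the ray into SIMD-ready values.
    const __m128 ox = _mm_set1_ps(r.ox), oy = _mm_set1_ps(r.oy), oz = _mm_set1_ps(r.oz);
    const __m128 dx = _mm_set1_ps(r.dx), dy = _mm_set1_ps(r.dy), dz = _mm_set1_ps(r.dz);
    const __m128 a = _mm_set1_ps(r.dx * r.dx + r.dy * r.dy + r.dz * r.dz);
    const __m128 zero = _mm_setzero_ps(), eps = _mm_set1_ps(1e-4f);
    const __m128 count = _mm_set1_ps(static_cast<float>(n));
    __m128 bestT = _mm_set1_ps(t);                            // per-lane closest distance
    __m128 bestI = _mm_set1_ps(static_cast<float>(index));    // per-lane sphere index

    // Tests the four spheres whose data starts at px/py/pz/pr; base is the first index.
    auto testChunk = [&](const float* px, const float* py, const float* pz,
                         const float* pr, std::size_t base) {
        const __m128 idx = _mm_add_ps(_mm_set1_ps(static_cast<float>(base)),
                                      _mm_set_ps(3.0f, 2.0f, 1.0f, 0.0f));
        const __m128 lx = _mm_sub_ps(ox, _mm_loadu_ps(px));
        const __m128 ly = _mm_sub_ps(oy, _mm_loadu_ps(py));
        const __m128 lz = _mm_sub_ps(oz, _mm_loadu_ps(pz));
        const __m128 rad = _mm_loadu_ps(pr);
        const __m128 h = dot3(dx, dy, dz, lx, ly, lz);        // half of the usual b term
        const __m128 c = _mm_sub_ps(dot3(lx, ly, lz, lx, ly, lz), _mm_mul_ps(rad, rad));
        const __m128 disc = _mm_sub_ps(_mm_mul_ps(h, h), _mm_mul_ps(a, c));
        // Condition 1: real roots exist; padding lanes (idx >= n) are also excluded.
        __m128 ok = _mm_and_ps(_mm_cmplt_ps(idx, count), _mm_cmpge_ps(disc, zero));
        const __m128 root = _mm_sqrt_ps(_mm_max_ps(disc, zero));  // clamp avoids NaN lanes
        const __m128 negH = _mm_sub_ps(zero, h);
        const __m128 t0 = _mm_div_ps(_mm_sub_ps(negH, root), a);
        const __m128 t1 = _mm_div_ps(_mm_add_ps(negH, root), a);
        // Condition 2: prefer the nearer root unless it lies behind the ray origin.
        const __m128 tc = blend(_mm_cmpgt_ps(t0, eps), t0, t1);
        // Condition 3: the hit is in front of the origin and closer than the best so far.
        ok = _mm_and_ps(ok, _mm_and_ps(_mm_cmpgt_ps(tc, eps), _mm_cmplt_ps(tc, bestT)));
        bestT = blend(ok, tc, bestT);
        bestI = blend(ok, idx, bestI);
    };

    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)                                // full chunks of four
        testChunk(&s.x[i], &s.y[i], &s.z[i], &s.radius[i], i);
    if (i < n) {                                              // leftovers, e.g. sphere 9 of 9
        float px[4] = {0.0f}, py[4] = {0.0f}, pz[4] = {0.0f}, pr[4] = {0.0f};
        for (std::size_t k = 0; i + k < n; ++k) {
            px[k] = s.x[i + k]; py[k] = s.y[i + k];
            pz[k] = s.z[i + k]; pr[k] = s.radius[i + k];
        }
        testChunk(px, py, pz, pr, i);
    }

    // After the loop: consolidate the four lanes into one scalar result.
    float ts[4], is[4];
    _mm_storeu_ps(ts, bestT);
    _mm_storeu_ps(is, bestI);
    for (int k = 0; k < 4; ++k)
        if (ts[k] < t) { t = ts[k]; index = static_cast<int>(is[k]); }
}
```

On exact distance ties this sketch may pick a different sphere than a scalar loop would, since each lane keeps its own running minimum; comparing against the scalar version on small scenes is the safest check.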

NOTE: this is easily the hardest stage of the assignment.

Hints / Tips

The techniques required to complete each stage rely heavily on work done in tutorials 4 and 5; refer to them often.

When implementing the SIMD stage, it's best to SIMD-ify small portions of code at a time (e.g. even a single line is often a lot) and then perform the rest of the calculation through the existing scalar code, or even compare the result of the SIMD version to the existing scalar code.

Again for SIMD, it's helpful to render scenes at a greatly reduced size for testing, with an equally small block size, one sample per pixel, and with one thread (e.g. try -size 4 4 -blockSize 4 -samples 1 -threads 1). It may even be helpful to fix the number of rays cast (MAX_RAYS_CAST) to be 1, so that reflection and refraction don't occur. This doesn't produce a very nice image, but images can be easily compared visually for problems (the example is only 16 pixels) and printf statements can be used to verify the correctness of calculations without outputting a huge amount of data (the use of debuggers is somewhat perilous in a multithreaded environment, but that shouldn't be an issue here).
When converting conditional statements to masks it can be helpful to do this with scalars first (although C/C++ implementations often use 1 rather than 0xFFFFFFFF for true, so the calculations may be different).
Write functions to output all the elements in a SIMD value for easier printf debugging (I am not a fan of how these types are shown in the debugger).
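The last two hints can be sketched together. The helper names `selectIfLess`, `lanesOf` and `printLanes` are invented for illustration. The first function does with one scalar what a SIMD compare followed by and/andnot/or does per lane, including the step of turning C++'s 1 into an all-ones mask; the other two dump a SIMD value for printf debugging.

```cpp
#include <xmmintrin.h>  // SSE intrinsics
#include <cstdint>
#include <cstdio>
#include <cstring>

// Scalar rehearsal of a masked select: returns x if a < b, otherwise y,
// without branching on the result of the comparison.
float selectIfLess(float a, float b, float x, float y) {
    // (a < b) is 1 or 0 in C++; subtracting from zero turns 1 into 0xFFFFFFFF,
    // which is what a SIMD comparison produces in a "true" lane.
    const std::uint32_t mask = 0u - static_cast<std::uint32_t>(a < b);
    std::uint32_t xb, yb;
    std::memcpy(&xb, &x, sizeof xb);           // work on the raw bit patterns
    std::memcpy(&yb, &y, sizeof yb);
    const std::uint32_t rb = (mask & xb) | (~mask & yb);
    float result;
    std::memcpy(&result, &rb, sizeof result);
    return result;
}

// Copies the four lanes of v into a plain array, lowest lane first.
void lanesOf(__m128 v, float out[4]) {
    _mm_storeu_ps(out, v);
}

// Prints all four lanes of v on one line, prefixed with a label.
void printLanes(const char* label, __m128 v) {
    float f[4];
    lanesOf(v, f);
    std::printf("%s: [%g, %g, %g, %g]\n", label, f[0], f[1], f[2], f[3]);
}
```

Note that a comparison mask printed this way shows "true" lanes as NaN, because an all-ones bit pattern is a NaN when read as a float.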

Code Walkthrough

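#include <xmmintrin.h> // SSE intrinsics: __m128 and the _mm_* functions
__m128 vec;
vec = _mm_set_ps(0, 1, 2, 3); // stores four floats in vec; from low to high address the order is 3, 2, 1, 0
__m128 vecHypSq = _mm_mul_ps(vec, vec); // lane-wise multiply, result assigned from the return value: 9, 4, 1, 0
float *fp = (float*)&vecHypSq; // cast to a float pointer to read individual lanes
float fget = fp[0] + fp[1] + fp[2]; // fp[0] is the lane that held 3, now squared to 9, so fget = 9 + 4 + 1 = 14
__m128 mask = _mm_cmplt_ps(vec, _mm_set_ps1(1)); // lane-wise "less than" compare; a true lane has all 32 bits set, a false lane is all zero
int flags = _mm_movemask_ps(mask); // mask holds four flags; read them as integer bits, not floats (a true lane viewed as a float is NaN). Here flags is 8: only the highest lane (value 0) is below 1
vec.m128_f32[2] = 8; // write one lane through the m128_f32 member; this member exists only in MSVC's __m128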
