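Single Instruction Multiple Data (SIMD) refers to instruction sets that can replicate multiple operands and pack them into wide registers, then execute the same instruction on all of them at the same time, in lockstep.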
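That is a description of the essence, and beginners may find it hard to grasp. Anyone with a solid understanding of how computers represent data, though, will find SIMD easy to follow. Computer data is built on bits, and a computation rarely handles a single bit; one operation normally processes many bits at once. Extend that idea from many bits to many operands and you have the essence of SIMD.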
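To make the idea concrete, here is a minimal sketch in C++ using SSE intrinsics. The function name `addFour` is made up for illustration. Four additions are issued as one instruction on a 128-bit register.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps

// Hypothetical helper: computes out[i] = a[i] + b[i] for i = 0..3
// using a single SIMD addition instead of four scalar ones.
void addFour(const float a[4], const float b[4], float out[4]) {
    __m128 va = _mm_loadu_ps(a);      // pack four operands into one register
    __m128 vb = _mm_loadu_ps(b);
    __m128 sum = _mm_add_ps(va, vb);  // one instruction, four additions
    _mm_storeu_ps(out, sum);          // unpack the four results
}
```

The unaligned load and store variants are used here so the sketch works with any `float` array; aligned data would allow `_mm_load_ps` and `_mm_store_ps` instead.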
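The hard parts of using SIMD are preparing the multiple operands and calling the special intrinsic functions. It is like asking a programmer who is used to C and C++ to write assembly: the style changes abruptly and takes some getting used to. There are now many object-oriented wrappers for SIMD, though, notably in articles on www.codeproject.com, for example: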
Basics of Single Instruction Multiple Data (SIMD) – CodeProject
Easy SIMD through Wrappers – CodeProject
Fast SIMD Prototyping – CodeProject
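If you would rather not write at the low level, get the essence clear and then write SIMD code with classes that other people have already wrapped. This does cost some efficiency, so it only suits certain application-level development. At the driver and algorithm level an object-oriented approach is often not an option, so even once you understand someone else's wrapper code you still need to master the raw SIMD operations. Many people call SIMD dirty work, and with good reason: its interface is unusual and not easy to get used to.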
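A wrapper of this kind might look roughly like the sketch below. It is not the code from any of the articles listed above; the class name `Vec4f` and its members are assumptions chosen for illustration.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical thin wrapper around __m128 so that SIMD code reads like scalar code.
class Vec4f {
public:
    // Broadcasts one value to all four lanes.
    explicit Vec4f(float v) : data_(_mm_set1_ps(v)) {}
    // Sets the lanes in memory order: x0 goes to lane 0.
    Vec4f(float x0, float x1, float x2, float x3) : data_(_mm_setr_ps(x0, x1, x2, x3)) {}
    explicit Vec4f(__m128 v) : data_(v) {}

    Vec4f operator+(const Vec4f& o) const { return Vec4f(_mm_add_ps(data_, o.data_)); }
    Vec4f operator*(const Vec4f& o) const { return Vec4f(_mm_mul_ps(data_, o.data_)); }

    // Copies the four lanes out to ordinary floats.
    void store(float out[4]) const { _mm_storeu_ps(out, data_); }

private:
    __m128 data_;
};
```

With such a class, `(a * b + Vec4f(1.0f)).store(out)` performs four multiply-adds without any intrinsic appearing in the calling code.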
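锐英源 provides professional SIMD technical guidance. 锐英源 understands the English codeproject.com articles mentioned above in depth, including the authors' ideas and the details of their code. The English text that follows is a SIMD assignment set for a student studying abroad, which 锐英源 translated; teaching materials and courses abroad are known for their rigour, and 锐英源 completed the task well. 锐英源's SIMD expertise can be relied on, and cooperation is welcome.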
You are to modify the (square-based) multithreaded implementation of a "simple" raytracer from the first assignment to take advantage of SIMD instructions. As time is tight for this assignment you only need to SIMD-ify the sphere/ray intersection code, but this requires a fairly sizeable number of changes across multiple files.
From the provided (square-based) multithreaded raytracer implementation, you will create multiple subsequent versions that modify the sphere implementation as follows:
1. Replaced array-of-structures (AoS) with structures-of-arrays (SoA) for the sphere container.
2. Rewrote the sphere/ray intersection tests in a "where there is one, there is many" approach.
3. Optimised the sphere/ray intersection test to take advantage of SIMD.
This stage involves removing the Sphere struct (in SceneObjects.h) and modifying the Scene struct (Scene.h) to store sphere data via structure of arrays rather than the currently existing array of structures (currently in the sphereContainer variable).
In order to complete this step you will need to:
At the end of this stage the program should still produce the same results as the base code. Be thorough in your testing (i.e. test all the scenes) to ensure that everything works correctly before progressing.
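The layout change in this stage can be sketched as follows. The real Sphere and Scene structs are not shown in this text, so the field names and the types `SphereAoS` and `SpheresSoA` are assumptions used only to illustrate the two layouts.

```cpp
#include <cstddef>
#include <vector>

// Array of structures (AoS): a hypothetical stand-in for the original Sphere struct.
struct SphereAoS {
    float x, y, z;   // centre
    float radius;
};

// Structure of arrays (SoA): one array per field, so the same field of
// neighbouring spheres is contiguous in memory and can be loaded four at a time.
struct SpheresSoA {
    std::vector<float> x, y, z;
    std::vector<float> radius;

    std::size_t size() const { return x.size(); }
};

// Converts an AoS container into the SoA layout, preserving sphere order.
SpheresSoA toSoA(const std::vector<SphereAoS>& spheres) {
    SpheresSoA out;
    for (const SphereAoS& s : spheres) {
        out.x.push_back(s.x);
        out.y.push_back(s.y);
        out.z.push_back(s.z);
        out.radius.push_back(s.radius);
    }
    return out;
}
```

A converter like `toSoA` is also handy for testing, since the same scene can be fed to the old and the new code and the results compared.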
This stage involves replacing the isSphereIntersected function with two new versions that use the "where there is one, there is many" paradigm.
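As a rough illustration of the paradigm, the sketch below tests one ray against every sphere in a single call instead of being called once per sphere. The signature, the `Ray` and `SpheresSoA` types, and the name `intersectAllSpheres` are assumptions; the assignment's actual isSphereIntersected signatures are not given in this text.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct Ray { float ox, oy, oz; float dx, dy, dz; };  // direction assumed normalised

struct SpheresSoA { std::vector<float> x, y, z, radius; };  // hypothetical SoA container

// Tests one ray against all spheres. On return hitIndex is the index of the
// nearest sphere hit in front of the ray origin (or -1 on a miss) and hitT is
// the distance to it (infinity on a miss).
void intersectAllSpheres(const Ray& ray, const SpheresSoA& s, float& hitT, int& hitIndex) {
    hitT = std::numeric_limits<float>::infinity();
    hitIndex = -1;
    for (std::size_t i = 0; i < s.x.size(); ++i) {
        // Vector from the ray origin to the sphere centre.
        float lx = s.x[i] - ray.ox, ly = s.y[i] - ray.oy, lz = s.z[i] - ray.oz;
        float tca = lx * ray.dx + ly * ray.dy + lz * ray.dz;  // projection onto the ray
        float d2 = lx * lx + ly * ly + lz * lz - tca * tca;   // squared distance from centre to ray
        float r2 = s.radius[i] * s.radius[i];
        if (d2 > r2) continue;                                // ray's line misses the sphere
        float thc = std::sqrt(r2 - d2);
        float t = tca - thc;                                  // near intersection
        if (t <= 0.0f) t = tca + thc;                         // origin inside sphere: use far one
        if (t > 0.0f && t < hitT) {
            hitT = t;
            hitIndex = static_cast<int>(i);
        }
    }
}
```

Keeping the loop inside the function is what later makes it possible to process several spheres per iteration.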
In order to complete this step you will need to:
This stage involves converting the latter of the two new versions of the isSphereIntersected function from Stage 2 (the void-returning one) into a SIMD implementation.
In order to complete this step you will need to:
NOTE: this is easily the hardest stage of the assignment.
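To give a feel for what the SIMD conversion involves, here is a sketch that tests one ray against four spheres at once with SSE. It is not the assignment's solution: the function name `intersectFourSpheres`, its parameters, and the convention of returning -1 for a miss are all assumptions.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Tests one ray against four spheres at once. cx, cy, cz and radius each point
// to four consecutive floats from hypothetical SoA arrays. On return tOut[i]
// holds the hit distance for sphere i, or -1 if the ray misses it.
// The ray direction is assumed to be normalised.
void intersectFourSpheres(const float rayOrigin[3], const float rayDir[3],
                          const float* cx, const float* cy, const float* cz,
                          const float* radius, float tOut[4]) {
    const __m128 zero = _mm_setzero_ps();

    // Broadcast the ray so every lane sees the same ray; load four centres.
    __m128 lx = _mm_sub_ps(_mm_loadu_ps(cx), _mm_set1_ps(rayOrigin[0]));
    __m128 ly = _mm_sub_ps(_mm_loadu_ps(cy), _mm_set1_ps(rayOrigin[1]));
    __m128 lz = _mm_sub_ps(_mm_loadu_ps(cz), _mm_set1_ps(rayOrigin[2]));
    __m128 dx = _mm_set1_ps(rayDir[0]);
    __m128 dy = _mm_set1_ps(rayDir[1]);
    __m128 dz = _mm_set1_ps(rayDir[2]);

    // tca = dot(L, D), l2 = dot(L, L), d2 = l2 - tca^2, one sphere per lane.
    __m128 tca = _mm_add_ps(_mm_add_ps(_mm_mul_ps(lx, dx), _mm_mul_ps(ly, dy)),
                            _mm_mul_ps(lz, dz));
    __m128 l2 = _mm_add_ps(_mm_add_ps(_mm_mul_ps(lx, lx), _mm_mul_ps(ly, ly)),
                           _mm_mul_ps(lz, lz));
    __m128 d2 = _mm_sub_ps(l2, _mm_mul_ps(tca, tca));
    __m128 r = _mm_loadu_ps(radius);
    __m128 r2 = _mm_mul_ps(r, r);

    // Instead of "if (d2 > r2) return", build a mask of lanes that may hit.
    __m128 hitMask = _mm_cmple_ps(d2, r2);

    // Clamp to zero so missed lanes do not take the square root of a negative.
    __m128 thc = _mm_sqrt_ps(_mm_max_ps(_mm_sub_ps(r2, d2), zero));
    __m128 t0 = _mm_sub_ps(tca, thc);
    __m128 t1 = _mm_add_ps(tca, thc);

    // Select per lane: t0 where t0 > 0, otherwise t1.
    __m128 useT0 = _mm_cmpgt_ps(t0, zero);
    __m128 t = _mm_or_ps(_mm_and_ps(useT0, t0), _mm_andnot_ps(useT0, t1));

    // A lane is a real hit only if it passed the distance test and t > 0.
    __m128 valid = _mm_and_ps(hitMask, _mm_cmpgt_ps(t, zero));
    __m128 miss = _mm_set1_ps(-1.0f);
    _mm_storeu_ps(tOut, _mm_or_ps(_mm_and_ps(valid, t), _mm_andnot_ps(valid, miss)));
}
```

Every branch of the scalar test has become a comparison mask followed by an and/andnot/or select, which is the pattern the tutorials' mask techniques are needed for. A full implementation would still have to loop over all spheres in groups of four and reduce the four results to the nearest hit.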
The techniques required to complete each stage rely heavily on work done in tutorials 4 and 5; refer to them often.
Again for SIMD, it's helpful to render scenes at a greatly reduced size for testing, with an equally small block size, one sample per pixel, and with one thread (e.g. try -size 4 4 -blockSize 4 -samples 1 -threads 1). It may even be helpful to fix the number of rays cast (MAX_RAYS_CAST) to be 1, so that reflection and refraction don't occur. This doesn't produce a very nice image, but images can be easily compared visually for problems (the example is only 16 pixels) and printf statements can be used to verify the correctness of calculations without outputting a huge amount of data (the use of debuggers is somewhat perilous in a multithreaded environment, but that shouldn't be an issue here).
When converting conditional statements to masks it can be helpful to do this with scalars first (although C/C++ implementations often use 1 rather than 0xFFFFFFFF for true, so the calculations may be different).
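The scalar-first approach might look like the sketch below, which rewrites a simple "pick x if a < b, else y" branch as a mask select. The function names are made up for illustration. Note the negation that turns C++'s 1 into the all-ones pattern 0xFFFFFFFF that SIMD comparisons produce.

```cpp
#include <cstdint>
#include <cstring>

// Branching version: the form to be converted.
float selectBranch(float a, float b, float x, float y) {
    if (a < b) return x;
    return y;
}

// Mask version using scalars only: same result, but the selection itself is
// done with bitwise operations, mirroring what the SIMD code will do per lane.
float selectMask(float a, float b, float x, float y) {
    // (a < b) is 1 or 0 in C++; 0 - 1 wraps to 0xFFFFFFFF, 0 - 0 stays 0.
    std::uint32_t mask = 0u - static_cast<std::uint32_t>(a < b);

    // Reinterpret the floats as raw bits (memcpy avoids aliasing problems).
    std::uint32_t xb, yb;
    std::memcpy(&xb, &x, sizeof xb);
    std::memcpy(&yb, &y, sizeof yb);

    // Keep x's bits where the mask is set, y's bits where it is clear.
    std::uint32_t rb = (xb & mask) | (yb & ~mask);

    float r;
    std::memcpy(&r, &rb, sizeof r);
    return r;
}
```

Once the two functions agree on all test inputs, the mask version maps almost line for line onto `_mm_cmplt_ps`, `_mm_and_ps`, `_mm_andnot_ps` and `_mm_or_ps`.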
Write functions to output all the elements in a SIMD value for easier printf debugging (I am not a fan of how these types are shown in the debugger).
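One possible shape for such a debugging helper is sketched here. The names `formatVec` and `printVec` are made up for illustration.

```cpp
#include <cstdio>
#include <string>
#include <xmmintrin.h>  // SSE intrinsics

// Formats the four float lanes of a __m128, lowest lane first.
std::string formatVec(const char* label, __m128 v) {
    float lanes[4];
    _mm_storeu_ps(lanes, v);  // copy the register out to ordinary floats
    char buf[128];
    std::snprintf(buf, sizeof buf, "%s = [%g, %g, %g, %g]",
                  label, lanes[0], lanes[1], lanes[2], lanes[3]);
    return std::string(buf);
}

// Prints the lanes on one line, for printf-style debugging.
void printVec(const char* label, __m128 v) {
    std::printf("%s\n", formatVec(label, v).c_str());
}
```

Calling `printVec("vec", _mm_set_ps(0, 1, 2, 3))` prints `vec = [3, 2, 1, 0]`, which also makes the reversed argument order of `_mm_set_ps` visible.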
#include <xmmintrin.h>  // SSE intrinsics
__m128 vec;
vec = _mm_set_ps(0, 1, 2, 3);  // four floats in vec; from the lowest address upward the lanes hold 3, 2, 1, 0
__m128 vecHypSq = _mm_mul_ps(vec, vec);  // lane-wise multiply; the returned lanes are 9, 4, 1, 0
float *fp = (float*)&vecHypSq;  // cast to a float pointer to read individual lanes
float fget = fp[0] + fp[1] + fp[2];  // fp[0] is the lane that held 3, so this is 9 + 4 + 1 = 14
__m128 mask = _mm_cmplt_ps(vec, _mm_set_ps1(1));  // lane-wise "less than": all bits set where vec < 1, zero elsewhere
unsigned int fget1 = mask.m128_u32[0];  // the mask holds four flags; read them through the integer member (MSVC-only). Lane 0 is 0 here because 3 < 1 is false
vec.m128_f32[2] = 8;  // write lane 2 through the float member (MSVC-only)