C++17 Parallel Algorithms

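C++17 parallel algorithms can improve software performance by making use of multiple CPU cores, and they are worth learning and using. The example below also calls std::thread::hardware_concurrency() to query how many concurrent threads the hardware supports; that function has been part of std::thread since C++11.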

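The C++17 standard library added parallel versions of most general-purpose algorithms to help programs exploit parallel execution for better performance. Almost every computer today has multiple processor cores, yet by default a call to a standard algorithm generally runs on just one of them while the others sit idle. C++17 addresses this: when processing large arrays or data containers, an algorithm can run faster by distributing the work across the available cores.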

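Accordingly, most of the functions in <algorithm> that operate on container ranges have parallel versions. Each of them gained an additional overload that takes an execution policy as its first argument, which determines how the algorithm is executed.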

In C++17, the execution policy argument can take one of the following three values:

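std::execution::seq requests ordinary sequential execution
std::execution::par requests ordinary parallel execution; in this mode the programmer must take care to avoid data races when accessing shared data, but may still allocate memory, lock mutexes, and so on
std::execution::par_unseq requests parallel unsequenced execution; in this mode the functors passed by the programmer should not allocate memory, lock mutexes, or acquire other resources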
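So, for example, to get a parallel version of std::sort, all we have to do is pass the algorithm an execution policy, choosing whichever of the following options best fits the situation: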

std::sort(std::execution::par,
          begin(name_of_container), end(name_of_container));
// same thing as the version without an execution policy
std::sort(std::execution::seq,
          begin(name_of_container), end(name_of_container));
std::sort(std::execution::par_unseq,
          begin(name_of_container), end(name_of_container));
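The warning about data races under std::execution::par can be made concrete with a small sketch. The helper names count_even_par and count_even_algo are hypothetical and exist only for illustration. The first version shares a counter between threads and therefore makes it a std::atomic; the second avoids shared state altogether by letting std::count_if do the counting. With GCC and libstdc++, the parallel policies typically require linking against Intel TBB (for example with -ltbb).

```cpp
#include <algorithm>
#include <atomic>
#include <execution>
#include <vector>

// Hypothetical helper: counts even values with a parallel for_each.
// The lambda updates shared state, so the counter is a std::atomic;
// a plain int incremented from several threads would be a data race.
int count_even_par(const std::vector<int>& data) {
    std::atomic<int> evens{0};
    std::for_each(std::execution::par, data.begin(), data.end(),
                  [&evens](int value) {
                      if (value % 2 == 0) {
                          evens.fetch_add(1, std::memory_order_relaxed);
                      }
                  });
    return evens.load();
}

// Hypothetical helper: same result without any shared state, because
// std::count_if combines the per-thread counts internally.
int count_even_algo(const std::vector<int>& data) {
    return static_cast<int>(
        std::count_if(std::execution::par, data.begin(), data.end(),
                      [](int value) { return value % 2 == 0; }));
}
```

When an algorithm such as std::count_if already expresses the operation, it is usually the simpler choice, since there is no shared variable to protect.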
Let's look at the following example, in which a std::vector of 10,000 integers is sorted with std::sort, both with and without parallelization:

#include <algorithm>
#include <chrono>
#include <cstdlib>    // std::rand
#include <execution>
#include <iostream>
#include <thread>     // std::thread::hardware_concurrency
#include <vector>

using std::milli;
using std::chrono::duration;
using std::chrono::duration_cast;
using std::chrono::high_resolution_clock;

// Prints every element of the vector, labelled with the given status text
void printVector(const char* pStatus, const std::vector<int>& vect)
{
    std::cout << "The vector with " << vect.size() << " elements "
              << pStatus << " sorting : \n";
    for (int val : vect) {
        std::cout << val << " ";
    }
    std::cout << "\n\n";
}

int main() {
    const int numberOfElements = 10000;
    const int numOfIterationCount = 5;

    // hardware_concurrency() has been part of std::thread since C++11
    std::cout << "The number of concurrent threads supported is "
              << std::thread::hardware_concurrency() << "\n\n";

    // Fill the vector with pseudo-random values
    std::vector<int> vect(numberOfElements);
    std::generate(vect.begin(), vect.end(), std::rand);

    //printVector("before (original vector)", vect);

    std::cout << "Let's sort the vector using sort() function WITHOUT PARALLELIZATION : \n";
    for (int i = 0; i < numOfIterationCount; ++i) {
        // Sort a fresh copy each time so every run sees the same input
        std::vector<int> vec_to_sort(vect);
        //printVector("before", vec_to_sort);
        const auto t1 = high_resolution_clock::now();
        std::sort(vec_to_sort.begin(), vec_to_sort.end());
        const auto t2 = high_resolution_clock::now();
        std::cout << "The time taken to sort vector of integers is : "
                  << duration_cast<duration<double, milli>>(t2 - t1).count() << " ms\n";
        //printVector("after", vec_to_sort);
    }

    std::cout << "\n\n";
    std::cout << "Let's sort the vector using sort() function "
                 "and a PARALLEL unsequenced policy (std::execution::par_unseq) : \n";
    for (int i = 0; i < numOfIterationCount; ++i) {
        // Sort a fresh copy each time so every run sees the same input
        std::vector<int> vec_to_sort(vect);
        //printVector("before", vec_to_sort);
        const auto t1 = high_resolution_clock::now();
        std::sort(std::execution::par_unseq, vec_to_sort.begin(), vec_to_sort.end());
        const auto t2 = high_resolution_clock::now();
        std::cout << "The time taken to sort vector of integers is : "
                  << duration_cast<duration<double, milli>>(t2 - t1).count() << " ms\n";
        //printVector("after", vec_to_sort);
    }

    std::cout << "\n\n";

    return 0;
}

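In the implementation above, the parallel version of the algorithm only delivers a speedup over the serial version once the size of the range (numberOfElements) exceeds some threshold, and that threshold can vary with compiler flags, platform, and hardware. Our example uses an arbitrarily chosen size of 10,000 elements.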

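We can experiment with different range sizes to see how they affect the execution time. Naturally, with only ten elements we are unlikely to notice any difference.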

When sorting large data sets, however, parallel execution makes much more sense, and the benefit can be substantial.
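One way to try different range sizes is sketched below. The helpers make_random_vector, time_sort_ms, and report are hypothetical names introduced for this illustration, and the fixed seed and value range are arbitrary assumptions. The sketch uses std::chrono::steady_clock, which is a monotonic clock. With GCC and libstdc++, the parallel policies typically require linking against Intel TBB (for example with -ltbb).

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <execution>
#include <iostream>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: builds n pseudo-random ints from a fixed seed,
// so repeated runs sort the same input.
std::vector<int> make_random_vector(std::size_t n) {
    std::mt19937 gen(42);
    std::uniform_int_distribution<int> dist(0, 1'000'000);
    std::vector<int> v(n);
    std::generate(v.begin(), v.end(), [&] { return dist(gen); });
    return v;
}

// Hypothetical helper: sorts a private copy of the input with the given
// execution policy and returns the elapsed time in milliseconds.
template <class Policy>
double time_sort_ms(Policy&& policy, std::vector<int> data) {
    const auto t1 = std::chrono::steady_clock::now();
    std::sort(std::forward<Policy>(policy), data.begin(), data.end());
    const auto t2 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t2 - t1).count();
}

// Hypothetical helper: prints sequential and parallel timings for one size.
void report(std::size_t n) {
    const std::vector<int> input = make_random_vector(n);
    std::cout << n << " elements: seq "
              << time_sort_ms(std::execution::seq, input) << " ms, par "
              << time_sort_ms(std::execution::par, input) << " ms\n";
}
```

Calling report(10), report(10'000), and report(1'000'000) from main() prints one line per size. The actual numbers depend on the machine, the compiler, and the optimization flags, so a single run should be read as a rough indication only.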

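The algorithms library also defines the for_each() algorithm, which we can now use to parallelize many ordinary range-based for loops. We must make sure, however, that every iteration of the loop can execute independently of the others; otherwise we may run into data races.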
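The independence requirement can be sketched as follows. The function names square_in_place and sum_par are hypothetical. In the first function every iteration touches only its own element, so the loop is safe to parallelize with for_each. The second shows a case where iterations are not independent (they all feed one total), for which std::reduce from <numeric> is a better fit than a shared variable.

```cpp
#include <algorithm>
#include <execution>
#include <numeric>
#include <vector>

// Hypothetical helper: parallel counterpart of
//     for (long long& v : values) v *= v;
// Each iteration reads and writes only its own element, so the
// iterations are independent of one another.
void square_in_place(std::vector<long long>& values) {
    std::for_each(std::execution::par, values.begin(), values.end(),
                  [](long long& v) { v *= v; });
}

// Hypothetical helper: adding into one shared variable from a parallel
// for_each would be a data race; std::reduce combines partial sums instead.
long long sum_par(const std::vector<long long>& values) {
    return std::reduce(std::execution::par, values.begin(), values.end(), 0LL);
}
```

Note that std::reduce may group and reorder the additions, which is harmless for integers but can change the rounding of floating-point sums.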

There are many more details to the concurrency model, and the 锐英源软件 website will keep publishing updates on them.
