High-Performance Big Data Analysis with NumPy, Numba, and Python Asynchronous Programming (3)

Listing 4 shows the main program for the asyncio summary statistics. As you can see, main() is the only method it calls.

import time

import numpy as np

from class_summary_statistics_asyncio import SummaryStatisticsAsyncio


def main(one_dimensional_array):
    # Create the asyncio summary statistics object
    summary_statistics_asyncio = SummaryStatisticsAsyncio()
    # Call its main method
    summary_statistics_asyncio.main(one_dimensional_array)


if __name__ == '__main__':
    # time.perf_counter() is a high-resolution timer for elapsed time
    # (time.clock() is no longer available as of Python 3.8)
    start_time = time.perf_counter()
    one_dimensional_array = np.arange(1000000000, dtype=np.float64)
    main(one_dimensional_array)
    end_time = time.perf_counter()
    print("Program Runtime: {} seconds".format(round(end_time - start_time, 1)))

Listing 4. Main program code for summary statistics with the Python asyncio library
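Because the input in Listing 4 is built with np.arange, the expected statistics have simple closed forms, which makes it easy to sanity-check any implementation. The following is a minimal sketch, not part of the original program: the helper name arange_summary is hypothetical, and n = 1,000,000 is an assumption chosen so that the check runs instantly.

```python
import math
import statistics


def arange_summary(n):
    """Closed-form mean/median and sample standard deviation of 0, 1, ..., n-1."""
    # Mean and median coincide for evenly spaced values
    center = (n - 1) / 2
    # Sum of squared deviations is n(n^2 - 1)/12; dividing by (n - 1) gives n(n + 1)/12
    sample_std = math.sqrt(n * (n + 1) / 12)
    return center, sample_std


# Cross-check the formulas against the statistics module on a small case
small = list(range(10))
small_center, small_std = arange_summary(10)
assert small_center == statistics.mean(small) == statistics.median(small)
assert math.isclose(small_std, statistics.stdev(small))

# For n = 1,000,000 the center is 499999.5 and the sample
# standard deviation is approximately 288675.279
print(arange_summary(1_000_000))
```

A loop that accumulates squared deviations in float64 can differ from the closed form in the last few digits, so compare with a tolerance rather than for exact equality.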

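When the number of rows reaches 1 million, the summary statistics main program with the asyncio library produces the following results. I added start/finish print statements to show how the asynchronous flow works in this particular case.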

start calculate_number_observation() procedure
Number of observations: 1000000000
start calcuate_arithmetic_mean() procedure
starting calculate_median()
Median: 499999.5
finished calculate_median() procedure
Arithmetic mean: 499999.5
start calculate_sample_standard_deviation() procedure
Sample standard deviation: 288675.27893349814
finished calculate_sample_standard_deviation() procedure
finished calcuate_arithmetic_mean() procedure
finished calculate_number_observation() procedure
Program Runtime: 1504.4 seconds
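The nested start/finished lines above are what you would expect when one coroutine awaits another: the outer coroutine cannot finish until the inner one returns. The SummaryStatisticsAsyncio class is not shown in this part, so the following is only a minimal sketch of that pattern. The function names count, mean, and median and the events list are hypothetical, and the real class may be organized differently.

```python
import asyncio
import statistics

events = []  # ordered record of which step started or finished


async def median(data):
    events.append("start median")
    result = statistics.median(data)
    events.append("finished median")
    return result


async def mean(data):
    events.append("start mean")
    # Awaiting median() runs it to completion before mean() continues
    med = await median(data)
    avg = statistics.fmean(data)
    events.append("finished mean")
    return avg, med


async def count(data):
    events.append("start count")
    n = len(data)
    avg, med = await mean(data)
    events.append("finished count")
    return n, avg, med


result = asyncio.run(count([0.0, 1.0, 2.0, 3.0]))
print(events)
print(result)
```

In this sketch every inner step finishes before the step that awaited it, which gives the same nesting as the log above. Note that awaiting coroutines in a chain like this runs them one after another; nothing here executes in parallel.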

NumPy Arrays with the Numba Library

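Listing 5 shows the summary statistics class implemented with the Numba library. You can visit the Numba repository on GitHub to learn more about this open-source, NumPy-aware optimizing compiler for Python. It is worth mentioning that Numba also supports CUDA GPU programming. In the code below, the debugging code has been removed so that the program can run in compiled mode.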

import time
from math import sqrt

import numpy as np
from numba import jit


class SummaryStatisticsNumba(object):
    """
    Calculate the number of observations, arithmetic mean, median,
    and sample standard deviation with the numba library
    """

    def __init__(self):
        pass

    @jit
    def calculate_number_observation(self, one_dimensional_array):
        """
        Calculate the number of observations
        :param one_dimensional_array: one-dimensional numpy array
        :return: number of observations
        """
        number_observation = one_dimensional_array.size
        return number_observation

    @jit
    def calcuate_arithmetic_mean(self, one_dimensional_array, number_observation):
        """
        Calculate the arithmetic mean
        :param one_dimensional_array: one-dimensional numpy array
        :param number_observation: number of observations
        :return: arithmetic mean
        """
        sum_result = 0.0
        for i in range(number_observation):
            sum_result += one_dimensional_array[i]
        arithmetic_mean = sum_result / number_observation
        return arithmetic_mean

    @jit
    def calculate_median(self, one_dimensional_array, number_observation):
        """
        Calculate the median
        :param one_dimensional_array: one-dimensional numpy array
        :param number_observation: number of observations
        :return: median
        """
        # sort() works in place, so the caller's array is reordered
        one_dimensional_array.sort()
        half_position = number_observation // 2
        if not number_observation % 2:
            # Even count: average the two middle values
            median = (one_dimensional_array[half_position - 1] + one_dimensional_array[half_position]) / 2.0
        else:
            median = one_dimensional_array[half_position]
        return median

    @jit
    def calculate_sample_standard_deviation(self, one_dimensional_array, number_observation, arithmetic_mean):
        """
        Calculate the sample standard deviation
        :param one_dimensional_array: one-dimensional numpy array
        :param number_observation: number of observations
        :param arithmetic_mean: arithmetic mean
        :return: sample standard deviation
        """
        sum_result = 0.0
        for i in range(number_observation):
            sum_result += pow((one_dimensional_array[i] - arithmetic_mean), 2)
        # Divide by (n - 1) for the sample variance
        sample_variance = sum_result / (number_observation - 1)
        sample_standard_deviation = sqrt(sample_variance)
        return sample_standard_deviation

Listing 5. Summary statistics class code with the Numba library
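Numba's nopython mode cannot infer a type for an ordinary Python object such as self, so a commonly recommended pattern is to compile module-level functions that take only arrays and scalars. The following is a minimal sketch of that alternative, not the author's code: the function names are hypothetical, and the try/except fallback is an assumption added so that the example still runs, uncompiled, when Numba is not installed.

```python
import math

import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: without Numba, fall back to running the plain Python functions
    def njit(func):
        return func


@njit
def arithmetic_mean(values):
    """Mean of a one-dimensional float array, using an explicit loop."""
    total = 0.0
    for i in range(values.size):
        total += values[i]
    return total / values.size


@njit
def sample_standard_deviation(values, mean):
    """Sample standard deviation (divides by n - 1)."""
    total = 0.0
    for i in range(values.size):
        total += (values[i] - mean) ** 2
    return math.sqrt(total / (values.size - 1))


data = np.arange(10, dtype=np.float64)
mean = arithmetic_mean(data)
std = sample_standard_deviation(data, mean)
print(mean, std)
```

The first call to each function includes compilation time, so when benchmarking it is usual to call the function once as a warm-up before timing it.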
