Computer Integrated Manufacturing System ›› 2024, Vol. 30 ›› Issue (7): 2444-2452.DOI: 10.13196/j.cims.2021.0913

Previous Articles     Next Articles

High-efficiency grouping realigning algorithm based on vector shuffling and DMA

LI Huixiang1,2,ZHANG Huifu1,2,HU Yonghua1,2+,ZHANG Xin1,2,WANG Shuying1,2   

  1. 1.School of Computer Science and Engineering,Hunan University of Science and Technology
    2.Hunan Provincial Key Laboratory for Service Computing and Novel Software Technology,Hunan University of Science and Technology
  • Online:2024-07-31 Published:2024-08-07
  • Supported by:
    Project supported by the Hunan Provincial Department of Education,China(No.20B242,19A169),and the Hunan Provincial Natural Science Foundation,China(No.2023JJ50019).

基于向量混洗和DMA传输的高效分组整序算法研究

李慧祥1,2,张会福1,2,胡勇华1,2+,张鑫1,2,王书盈1,2   

  1. 1.湖南科技大学计算机科学与工程学院
    2.湖南科技大学服务计算与软件服务新技术湖南省重点实验室
  • 作者简介:
    李慧祥(1998-),男,湖南郴州人,硕士研究生,研究方向:算法并行优化,E-mail:1449488105@qq.com;

    张会福(1970-),男,福建松溪人,教授,博士,研究方向:计算机应用技术,E-mail:hfzhang@hnust.edu.cn;

    +胡勇华(1981-),男,湖南湘潭人,教授,博士,研究方向:编译技术、并行计算,通讯作者,E-mail:huyh@hnust.cn;

    张鑫(1998-),女,山西晋中人,硕士研究生,研究方向:算法并行优化;E-mail:2423746399@qq.com;

    王书盈(1997-),女,湖南岳阳人,硕士研究生,研究方向:系统软件、算法并行优化;E-mail:wsy0329@qq.com。
  • 基金资助:
    湖南省教育厅科研资助项目(20B242,19A169);湖南省自然科学基金资助项目(2023JJ50019)。

Abstract: Realignment is a key bottleneck of the performance of fast Fourier transform operation process.For the vector Very Long Instruction Word(VLIW)architecture processor,to accelerate the processing of fast Fourier transform and improve the performance of the application systems in related fields,a group realigning method suitable for arbitrary integer power bases of 2 was proposed.This method included three aspects:dividing the input data into several groups of certain scale,shuffling and realigning within each group,and transferring the data in each group to the destination array in turn through Direct Memory Access(DMA).The method had the advantage of effectively reducing the calculation requirement of inverse sequence numbers and avoiding the requirement of finding index for each single data.Moreover,for architectures with “ping-pong” storage mechanism,a parallel method of shuffling and DMA was proposed based on this method.The specific algorithm implementation and experimental verification were carried out on the FT-M7002 processor.Experimental results showed that the method was not only correct,but also could effectively improve the execution speed of realigning in vector processors,which was very suitable for vector VLIW processors.

Key words: fast fourier transform, bit reverse, realignment, shuffle

摘要: 为提高快速傅里叶变换的处理速度,提升相关应用领域系统的性能,针对向量超长指令字(VLIW)架构处理器,提出一种适用于数据涉及的采样点数量为2的整数次幂的分组整序方法。该方法对输入数据按照一定规模分成若干个组,在每个分组内部进行混洗整序,以及通过直接存储器访问(DMA)传输将每组中的数据依次传输到结果数组,有效减少逆序数的计算需求和消除了单个数据寻址的要求。此外,针对硬件的“乒乓”存储功能提出了数据的向量混洗和DMA传输的并行处理方法,进一步提升分组整序算法的执行效率。通过在FT-M7002处理器上的具体算法实现和实验结果表明,该方法适用于向量VLIW架构处理器,运算结果正确,且有效地提高了整序的执行速度。

关键词: 快速傅里叶变换, 位逆序, 整序, 混洗

CLC Number: