Welcome to the GPU-FFT-Optimization repository! We present cutting-edge algorithms and implementations for optimizing the Fast Fourier Transform (FFT) on Graphics Processing Units (GPUs).
The associated research paper: https://eprint.iacr.org/2023/1410
An NTT variant of GPU-FFT is available: https://github.com/Alisah-Ozcan/GPU-NTT
Requirements:
- CMake >=3.26
- GCC
- CUDA Toolkit
Configure + build:
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DCMAKE_CUDA_ARCHITECTURES=86
cmake --build build --parallel
Install:
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DCMAKE_CUDA_ARCHITECTURES=86
cmake --build build --parallel
cmake --install build
Notes:
- If you install to a system location (default: /usr/local), you may need sudo or set -DCMAKE_INSTALL_PREFIX=/your/prefix.
- If you omit -DCMAKE_CUDA_ARCHITECTURES=..., GPU-FFT defaults to 80;86;89;90.
- If CMake cannot find nvcc, set CUDACXX=/path/to/nvcc or pass -DCMAKE_CUDA_COMPILER=/path/to/nvcc.
- If you change compilers/toolchains, prefer a clean configure: cmake --fresh -S . -B build.
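The value given to -DCMAKE_CUDA_ARCHITECTURES is the GPU's compute capability with the dot removed (8.6 becomes 86). As a minimal sketch, assuming an NVIDIA driver recent enough that nvidia-smi supports the compute_cap query field, the flag could be derived from the installed GPU as shown here. The helper names cap_to_arch and detect_arch are illustrative and not part of GPU-FFT.

```shell
# Convert a compute capability such as "8.6" into the CMake form "86".
cap_to_arch() {
    printf '%s\n' "$1" | tr -d '. '
}

# Query the first GPU's compute capability.
# Assumption: the installed driver supports the compute_cap query field.
detect_arch() {
    local cap
    cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -n 1)
    [ -n "$cap" ] && cap_to_arch "$cap"
}

arch=$(detect_arch || true)
if [ -n "$arch" ]; then
    # Print the configure command instead of running it, so it can be reviewed first.
    echo "cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DCMAKE_CUDA_ARCHITECTURES=$arch"
else
    echo "No GPU detected; omit the flag to use the project defaults."
fi
```

Building for a single architecture generally shortens compile time compared with the multi-architecture default, at the cost of a binary tuned for that one GPU generation.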
GPU-FFT uses C++17/CUDA17 and applies per-configuration compile flags.
- Build type (single-config generators like Makefiles/Ninja): -DCMAKE_BUILD_TYPE=Release|Debug|RelWithDebInfo|MinSizeRel
- Optimization/debug defaults:
  - Release: -O3 -DNDEBUG
  - RelWithDebInfo: -O3 -g -DNDEBUG
  - Debug: -g (and CUDA adds line info)
- Extra warnings: -DGPUFFT_ENABLE_WARNINGS=ON
Example:
cmake -S . -B build -DCMAKE_BUILD_TYPE=RelWithDebInfo -DGPUFFT_ENABLE_WARNINGS=ON
cmake --build build --parallel
Choose one of the data types below by editing the typedef at the top of each benchmark file:
- typedef Float32 BenchmarkDataType;
- typedef Float64 BenchmarkDataType;
To run examples:
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DCMAKE_CUDA_ARCHITECTURES=86 -DGPUFFT_BUILD_EXAMPLES=ON
cmake --build build --parallel
./build/bin/example/cpu_fft_example <RING_SIZE_IN_LOG2> <BATCH_SIZE>
./build/bin/example/gpu_fft_C_C_example <RING_SIZE_IN_LOG2> <BATCH_SIZE>
./build/bin/example/gpu_fft_R_R_example <RING_SIZE_IN_LOG2> <BATCH_SIZE>
./build/bin/example/cpu_ffnt_example <RING_SIZE_IN_LOG2> <BATCH_SIZE>
./build/bin/example/gpu_ffnt_R_R_example <RING_SIZE_IN_LOG2> <BATCH_SIZE>
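Going by the placeholder names, the first argument is the base-2 logarithm of the transform size and the second is the number of transforms in the batch, so an argument of 12 would select a size of 2^12 = 4096. A small sketch that prints the command lines for a few sizes (log2_to_size is an illustrative helper, and the sizes are chosen arbitrarily):

```shell
# Map a log2 argument to the actual transform size, e.g. 12 -> 4096.
log2_to_size() {
    echo $((1 << $1))
}

# Print (without running) one example invocation per size, each with a batch of 1.
for log_n in 10 12 14; do
    echo "size $(log2_to_size "$log_n"): ./build/bin/example/gpu_fft_R_R_example $log_n 1"
done
```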
Example: ./build/bin/example/gpu_fft_R_R_example 12 1
To run benchmarks:
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DCMAKE_CUDA_ARCHITECTURES=86 -DGPUFFT_BUILD_BENCHMARKS=ON
cmake --build build --parallel
./build/bin/benchmark/gpu_fft_C_C_mult_benchmark --disable-blocking-kernel
./build/bin/benchmark/gpu_fft_R_R_mult_benchmark --disable-blocking-kernel
./build/bin/benchmark/gpu_fft_benchmark --disable-blocking-kernel
./build/bin/benchmark/gpu_ffnt_R_R_mult_benchmark --disable-blocking-kernel
Make sure GPU-FFT is installed before integrating it into your project. The installed library provides CMake config files that make it easy to integrate GPU-FFT into your own CMake project. In your CMakeLists.txt, add:
project(<your-project> LANGUAGES CXX CUDA)
find_package(CUDAToolkit REQUIRED)
# ...
find_package(GPUFFT CONFIG REQUIRED)
# ...
target_link_libraries(<your-target> (PRIVATE|PUBLIC|INTERFACE) GPUFFT::fft CUDA::cudart)
# ...
set_target_properties(<your-target> PROPERTIES CUDA_SEPARABLE_COMPILATION ON)
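# Assumption: if GPU-FFT was installed to a non-default prefix (via -DCMAKE_INSTALL_PREFIX),
# find_package may need that prefix when you configure your own project, e.g.:
#   cmake -S . -B build -DCMAKE_PREFIX_PATH=/your/prefix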
# ...
Please use the BibTeX entry below to cite GPU-FFT in academic papers.
@misc{cryptoeprint:2023/1410,
author = {Ali Şah Özcan and Erkay Savaş},
title = {Two Algorithms for Fast GPU Implementation of NTT},
howpublished = {Cryptology ePrint Archive, Paper 2023/1410},
year = {2023},
note = {\url{https://eprint.iacr.org/2023/1410}},
url = {https://eprint.iacr.org/2023/1410}
}
This project is licensed under the Apache License. For more details, please refer to the License file.
If you have any questions or feedback, feel free to contact me:
- Email: alisah@sabanciuniv.edu
- LinkedIn: Profile